Abstract
The course introduces the students to probability theory and applied statistics. It will focus on understanding the theoretical foundations of statistics and on applying the theory using mathematical analysis and simulations in R.
The course intends to give the student tools to identify and solve statistical problems that arise in practical data analysis.
The subjects covered in the course include: probability spaces, random variables, conditional and joint probability, independence, expectation, variance, correlation and covariance, simulation of random variables, law of large numbers, central limit theorem, exploratory data analysis, statistical models, bootstrapping, maximum likelihood estimation, confidence intervals, hypothesis testing.
Formal prerequisites
The course is mandatory for second-semester BSc in Data Science students and requires basic knowledge of programming and mathematics.
Intended learning outcomes
After the course, the student should be able to:
- Apply fundamental definitions and theorems from probability theory and statistics
- Perform basic computations on random variables and simulate random variables using R
- Perform basic statistical modelling and inference (estimation and hypothesis testing) using mathematical analysis and R
- Analyse the sampling distribution of estimators using both mathematical tools and simulation (bootstrapping) with R
- Present a statistical analysis in a clear way that allows the reader to understand the conclusions and the assumptions they are based on
- Do basic programming and data manipulation in R
- Identify statistical problems in a given data analysis
The lectures will introduce the theory and give examples of applying it. The weekly exercises will train the students in applying the theory and using R. The problems that the students solve in the weekly exercises will prepare them for the written exam.
Course literature
Dekking, F.M., Kraaikamp, C., Lopuhaä, H.P., Meester, L.E. (2010), A Modern Introduction to Probability and Statistics - Understanding Why and How, Springer.
Verzani, J. (2014), Using R for Introductory Statistics, Second Edition, CRC Press.
Student Activity Budget
Estimated distribution of learning activities for the typical student:
- Preparation for lectures and exercises: 15%
- Lectures: 25%
- Exercises: 25%
- Assignments: 15%
- Exam with preparation: 10%
- Other: 10%
Ordinary exam
Exam type:
C: Submission of written work, Internal (7-point scale)
C22: Submission of written work – Take home
4-hour take-home exam (disregard the 1-day duration below)
Aids allowed for the exam
• Written and printed books and notes
• E-books and notes on the computer
• Specific software and/or programmes: Students should use a computer with the R programming language installed (with packages as specified by the teachers).
Random fraud control
After the exam, a sample of approx. 20% of students will be contacted and asked to explain some of their exam answers. Students must be available after the exam so that they can be contacted. The purpose is to ensure that students have answered the exam themselves. This check has no impact on the grade. More information will be available on learnIT before the exam.