The course aims to train the students in conducting a thorough and valid analysis of online data sources with the use of basic programming, statistics and business intelligence tools.
Description
This course is based on the assumption that we live in a world where the amount of data grows rapidly and where the need to be able to understand and analyze it becomes more and more pronounced. To keep up with this development it is therefore necessary to learn tools and techniques for interacting with these data sources and understanding the cultural contexts of the data. This course will teach you those tools and techniques using the programming language Python, basic statistics, and business intelligence.
The course aims to provide the students with tools to pose questions and get meaningful answers from data by minimising the amount of time it takes to arrive at the answer and maximising the relevance of the answer.
The course therefore focuses on two perspectives:
- Data understanding: this will give the students a basic, theoretical
understanding of data. We'll learn about different types of data, how
you know which questions to pose, and which answers to expect. We'll
also learn how to verify if the answers are meaningful and relevant,
and how to measure their quality.
- Data analysis: this will give the students a basic understanding of the concrete tools you can use for analysing data. We'll learn how to use the programming language Python for data analysis, how to translate your data questions in practice, how to make the procedure reproducible, and how to verify the answers you get from your code.
To work with today's data sources, we'll also spend time learning how to fetch and preprocess large amounts of data, both from structured sources such as those exposed by an API and from more loosely structured sources that must be fetched using web scraping techniques.
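As a rough illustration of the difference between structured and loosely structured sources, the sketch below extracts the same titles from a JSON payload and from an HTML page using only the Python standard library. The sample strings, the `h2` tag with `class="title"`, and the names `titles_from_json`, `titles_from_html` and `TitleParser` are hypothetical and are not taken from the course material; a real project would fetch the data over the network first.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response: the structure is explicit and machine-readable.
api_response = '{"items": [{"title": "A", "price": 10.5}, {"title": "B", "price": 7.0}]}'

# Hypothetical web page: the same titles are buried in presentation markup.
html_page = (
    '<html><body>'
    '<h2 class="title">A</h2><p>Price: 10.5</p>'
    '<h2 class="title">B</h2><p>Price: 7.0</p>'
    '</body></html>'
)


def titles_from_json(text):
    """Structured source: parse once, then index by key."""
    return [item["title"] for item in json.loads(text)["items"]]


class TitleParser(HTMLParser):
    """Loosely structured source: collect text inside <h2 class="title">."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "h2" and ("class", "title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles.append(data.strip())


def titles_from_html(text):
    parser = TitleParser()
    parser.feed(text)
    return parser.titles


print(titles_from_json(api_response))
print(titles_from_html(html_page))
```

The JSON version relies on a documented structure, while the scraping version depends on markup conventions that the page owner may change at any time, which is one reason scraped data usually needs more verification.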
- Knowledge about fundamental Python programming
- Knowledge about basic scientific theory
Intended learning outcomes
After the course, the student should be able to:
- Write a Python program that extracts and presents information from common data formats
- Explain techniques for processing data in Python, given the size and format of the data
- Account for basic statistical measures and regression models
- Explain the difference between statistical metrics such as precision, recall and accuracy
- Discuss how sample populations relate to real-world populations
- Reason about and describe a falsifiable question that can be addressed with a specific data source
- Provide data-driven answers to falsifiable questions using statistical measures and regression models
- Discuss the validity of an analytical conclusion based on the method and data
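To make the distinction between precision, recall and accuracy named in the outcomes above concrete, here is a minimal sketch for a binary classification task. The labels are made up for illustration, the function names are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention rather than a standard.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def precision(y_true, y_pred):
    """Share of predicted positives that are actually positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    """Share of actual positives that were found."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up example: 3 actual positives among 10 observations.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(accuracy(y_true, y_pred))   # 0.7
print(precision(y_true, y_pred))  # 0.5
print(recall(y_true, y_pred))     # one out of three positives found
```

With imbalanced data like this, accuracy looks reasonable even though the classifier finds only one of the three positives, which is why the three metrics need to be read together.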
The course will mainly consist of lectures, group work, and project work, with a focus on active student participation and practical application of data handling techniques.
- McKinney, Wes: Python for Data Analysis, O'Reilly Media, 2017
- Provost, Foster & Tom Fawcett: Data Science for Business, O'Reilly Media, 2013
- Ceder, Naomi: The Quick Python Book, Manning, 2018
Student Activity Budget
Estimated distribution of learning activities for the typical student
- Preparation for lectures and exercises: 20%
- Lectures: 25%
- Exercises: 25%
- Assignments: 20%
- Exam with preparation: 10%
Ordinary exam
Exam type:
C: Submission of written work, Internal (7-point scale)
C22: Submission of written work – Take home
Submission of up to five pages of Python code analyzing data provided by ITU.
All aids allowed (open book exam).
Time and date
Ordinary Exam - hand out Wed, 1 Jun 2022, 08:00 - 21:00
Ordinary Exam - submission Thu, 2 Jun 2022, 08:00 - 14:00
Reexam - hand out Mon, 15 Aug 2022, 08:00 - 21:00
Reexam - submission Tue, 16 Aug 2022, 08:00 - 14:00