Official course description:

Full info last published 21/08-23
Course info
ECTS points:
Course code:
Participants max:
Offered to guest students:
Offered to exchange students:
Offered as a single subject:
MSc. Master
MSc in Data Science
Course manager
Associate Professor
Associate Professor, Head of study programme
Course semester
Autumn 2023
28 August 2023
26 January 2024
Exam type
External examiner
Grade Scale
Exam Language

This course introduces students to the foundations of handling heterogeneous data sources through the steps of data collection, annotation, processing, cleaning, integration, transformation, and visualization. Ethical issues and dataset bias are also discussed. 


This course teaches how to design, implement and combine a suite of techniques for producing high-quality datasets, starting from the collection of raw data in the wild. The student taking this course will learn how to: 

  • collect and integrate heterogeneous data from various sources (including open data, unstructured data, web scraping, proprietary APIs)
  • annotate it with appropriate metadata (e.g., using crowdsourcing)
  • clean it and transform it to satisfy given quality indicators (e.g., deduplication, anonymization, normalization)
  • iterate over these steps to meet the requirements of specific data science and machine learning problems
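
As a rough illustration of the cleaning and transformation step listed above, the following minimal Python sketch normalizes, pseudonymizes, and deduplicates a handful of records. It is not course material: the record fields (`name`, `email`), the salt value, and all function names are hypothetical and chosen only for this example. Hashing an email with a salt is pseudonymization, which is weaker than full anonymization.

```python
import hashlib


def normalize(record):
    """Trim whitespace and lowercase every text field of one record."""
    return {key: value.strip().lower() for key, value in record.items()}


def pseudonymize(record, salt="example-salt"):
    """Replace the email with a truncated salted SHA-256 hash (hypothetical scheme)."""
    digest = hashlib.sha256((salt + record["email"]).encode("utf-8")).hexdigest()
    cleaned = dict(record)
    cleaned["email"] = digest[:12]
    return cleaned


def deduplicate(records):
    """Keep the first occurrence of each distinct record, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def clean(records):
    """Normalize first, so that near-identical rows collapse during deduplication."""
    return deduplicate([pseudonymize(normalize(r)) for r in records])


# Made-up sample data: the first two rows differ only in case and whitespace.
raw = [
    {"name": " Ada Lovelace ", "email": "Ada@Example.org"},
    {"name": "ada lovelace", "email": "ada@example.org "},
    {"name": "Alan Turing", "email": "alan@example.org"},
]
cleaned = clean(raw)
print(len(cleaned), "unique records")
```

Normalizing before deduplicating matters here: without it, the first two rows would count as distinct records.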

The course will also teach how to identify and address some ethical issues arising from data collection and handling, including possible implications of data biases. In parallel, the student will also gain experience with discussing, presenting, and visualizing key aspects that can document and inform the data processing steps and effectively describe the final datasets produced.  

Formal prerequisites

There are no formal prerequisites for this course for students in the associated MSc program. 

Some experience with Python and basic statistics will be helpful. 

Intended learning outcomes

After the course, the student should be able to:

  • Describe different data collection/annotation/visualization methods with regard to their strengths and weaknesses
  • Apply appropriate data collection/annotation/visualization methods in order to create novel datasets
  • Find suitable connections between dataset properties, analysis methods, and research questions
  • Extract insights from the data analysis and present the results with appropriate visualization and written reporting
  • Discuss the findings with respect to relevant work from the literature, and reflect on their real-world implications
Learning activities

The course will consist of lectures, which may require preparation (such as reading or exploring some data) beforehand, and a group project (or several smaller projects). 

Course literature

You do not need to purchase any books for the course.

We will assign reading material that is available online (such as open access papers). 

Student Activity Budget
Estimated distribution of learning activities for the typical student
  • Preparation for lectures and exercises: 25%
  • Lectures: 25%
  • Project work, supervision included: 40%
  • Exam with preparation: 10%
Ordinary exam
Exam type:
D: Submission of written work with following oral, External (7-point scale)
Exam variation:
D1G: Submission for groups with following oral exam based on the submission. Shared responsibility for the report.
Exam submission description:
During the exam you will present your project as a group. We will then ask individual questions based on the report to assess how well you meet the learning objectives.

Submission: a report with the findings of the project and a GitHub repository with the analysis code
Group submission:
  • Group submission of the project report; group size: 4 people.
Exam duration per student for the oral exam:
20 minutes
Group exam form:
Mixed exam 1: Individual and joint student presentation followed by an individual and a group dialogue. The students make a joint presentation followed by a group dialogue. Subsequently, each student has an individual examination with presentation and/or dialogue with the supervisor and external examiner while the rest of the group is outside the room.

Reexam
Exam type:
D: Submission of written work with following oral, External (7-point scale)
Exam variation:
D11: Submission with following oral exam based on the submission.
Exam duration per student for the oral exam:
20 minutes

Time and date
Ordinary Exam - submission Wed, 13 Dec 2023, 08:00 - 14:00
Ordinary Exam Tue, 9 Jan 2024, 09:00 - 21:00
Ordinary Exam Wed, 10 Jan 2024, 09:00 - 21:00
Reexam - submission Wed, 28 Feb 2024, 08:00 - 14:00
Reexam Thu, 21 Mar 2024, 14:00 - 18:00