Official course description:

Full info last published 15/11-22
Course info
Language:
English
ECTS points:
7.5
Course code:
BSLASDA1KU
Participants max:
95
Offered to guest students:
yes
Offered to exchange students:
yes
Offered as a single subject:
yes
Price for EU/EEA citizens (Single Subject):
10625 DKK
Programme
Level:
Bachelor
Programme:
BSc in Data Science
Staff
Course manager
Assistant Professor
Teacher
Associate Professor
Course semester
Semester
Spring 2023
Start
30 January 2023
End
25 August 2023
Exam
Exam type
ordinary
Internal/External
external examiner
Grade Scale
7-point scale
Exam Language
English
Abstract

Turning the unprecedented amounts of data being collected today into useful information is well beyond the computing power of a single general purpose CPU core. It is, therefore, crucial to know and understand the methods and tools that are able to parallelize various data analysis tasks in an efficient way on multicore CPUs and on a cluster of machines.

With this goal in mind, this course first gives an overview of the popular parallel data processing platforms. Then, it dives into parallelizing various machine learning tasks.

Formal prerequisites
The course is mandatory in the fourth semester of the BSc in Data Science.
The course assumes that students have taken an introductory course on data management or database systems.
Intended learning outcomes

After the course, the student should be able to:

  • Select the right distributed data processing platform and the right subset of functionalities from such platforms for a given task
  • Apply machine learning and data mining in a parallel setting
  • Effectively combine different types of data analysis tasks (machine learning, traditional SQL, …) in a query
  • Reason about the performance of data processing systems in a parallel setting
Learning activities

Each week there will be two hours of lectures covering the weekly topic and two hours of exercises, consisting of coding exercises and activities that guide students in their assignments and exam.

Three assignments will be given. Detailed descriptions of the assignments and their deadlines will be announced on the LearnIT course page. Feedback is given only on assignments handed in before the associated deadline; late submissions receive no feedback.

Course literature
Large-Scale Machine Learning with Python. B. Sjardin, L. Massaron, A. Boschetti. Packt.

Student Activity Budget
Estimated distribution of learning activities for the typical student
  • Lectures: 20%
  • Exercises: 20%
  • Assignments: 40%
  • Exam with preparation: 20%
Ordinary exam
Exam type:
C: Submission of written work, External (7-point scale)
Exam variation:
C11: Submission of written work
Exam submission description:
The exam will have questions covering material from the assignments and exercises (75%) and other course contents (25%).


Re-exam
Exam type:
C: Submission of written work, External (7-point scale)
Exam variation:
C11: Submission of written work

Time and date