Official course description, subject to change:
Basic info last published 10/10-19

Large Scale Data Analysis

Course info
Language:
English
ECTS points:
7.5
Course code:
BSLASDA1KU
Participants min:
1
Participants max:
75
Offered to guest students:
yes
Offered as a single subject:
yes
Price (single subject):
10625 DKK (incl. VAT)
Programme
Level:
Bachelor
Programme:
Bachelor of Science in Data Science
Staff
Course manager
Full Professor
Teacher
Research Assistant
Course semester
Semester
Spring 2020
Start
27 January 2020
End
31 August 2020
Abbreviation
20201
Exam
Exam type
ordinary
Internal/External
no co-examiner
Grade Scale
pass/fail
Exam Language
GB
Abstract

Turning the unprecedented amounts of data collected today into useful information is well beyond the computing power of a single general-purpose CPU core. It is therefore crucial to know and understand the methods and tools that can parallelize data analysis tasks efficiently on multicore CPUs and on clusters of machines.

With this goal in mind, this course first gives an overview of the popular parallel data processing platforms. Then, it dives into parallelizing various machine learning tasks.


Intended learning outcomes

After the course, the student should be able to:

  • Select the right distributed data processing platform and the right subset of functionalities from such platforms for a given task
  • Apply machine learning and data mining in a parallel setting
  • Effectively combine different types of data analysis tasks (machine learning, traditional SQL, …) in a query
  • Reason about the performance of data processing systems in a parallel setting
Ordinary exam
Exam type:
Z. To be decided, - (7-point grading scale)
Exam variation: