Advanced Data Systems
To transform the sheer amount of complex data into timely discoveries that influence society, data-intensive systems (including database systems and machine learning platforms) must utilize the full processing power offered by modern servers.
In this course, you will learn how to design, implement, and evaluate new components of a production-grade open-source data-intensive system. You will learn the techniques for data management on modern hardware (multi-cores, microsecond-scale storage, and 100 GbE) and apply them through hands-on work on the internals of an open-source system.
Computer Systems Performance class.
Intended learning outcomes
After the course, the student should be able to:
- Analyze the functional and performance requirements of a data-intensive system (database system or machine learning platform);
- Navigate the codebase of production-grade open-source software;
- Design and implement components in the context of a production-grade data system;
- Evaluate the performance characteristics of a software system.
Learning activities
The course is based on lectures, a seminar and assignments:
- The lectures focus on fundamental principles underlying the design and implementation of modern operating systems, network systems, file systems and database systems;
- The seminar (presentation and structured discussion of research articles) will focus on recent advances in data systems, including computational storage, cross-layer design and in-network processing, as well as experience reports on machine learning and data systems solutions in modern data centers;
- The assignments will consist of two new software components to be developed in the context of the OX NVMe controller accessed from a user-space NVMe driver. The first component is a data system task (e.g., partitioning, hashing, matrix multiplication) on the host side; the second component is a computational storage component on the storage side. Development will take place on a Stingray platform (100 GbE, ARMv8, SSD) accessed from an x86 server (32 cores, 256 GB RAM).
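As an illustration of the kind of host-side data system task mentioned above, the following is a minimal sketch of hash partitioning. It is not taken from the course material or from the OX codebase: the function name `hash_partition`, its parameters, and the choice of CRC32 as the hash function are assumptions made purely for illustration.

```python
from zlib import crc32


def hash_partition(keys, num_partitions):
    """Split keys into num_partitions buckets by hashing each key.

    Hypothetical helper for illustration only; a real host-side
    component would likely work on raw buffers rather than Python lists.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")

    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        # crc32 is deterministic across runs, unlike Python's built-in
        # hash() for strings, so equal keys always land in the same bucket.
        index = crc32(str(key).encode("utf-8")) % num_partitions
        partitions[index].append(key)
    return partitions
```

Equal keys always map to the same partition, which is the property that lets later steps such as joins or aggregations process each partition independently.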
There are two mandatory deliverables, one for each of the two components to be developed.
If the mandatory activities are not approved, the student will receive the grade NA (not approved) at the ordinary exam and will use an exam attempt.
The course literature is published on the course page in LearnIT.
Ordinary exam
Exam type:
D: Submission of written work followed by an oral exam, internal (7-point grading scale)
D2G: Submission of written work for groups, followed by an oral exam supplemented by the submitted work. The group has a shared responsibility for the content of the report.
The report will consist of a description of the design choices and implementation techniques used during the assignments, as well as an experimental study of the developed system.
Group size: 2 persons.
Group form: Group exam
Duration of the oral exam: 20 minutes per student.