Official course description, subject to change:
In this course, you will learn how to analyze and improve the performance of a computer system with a specific focus on data-intensive systems.
To investigate a system’s performance and come up with solutions to improve it, there are three key things to know:
- How to devise a methodology to design experiments so that you can investigate the system’s performance,
- The toolset to run the necessary experiments and collect the results, and
- At least a basic understanding of the systems layers (hardware, operating systems, etc.) to be able to interpret your results and come up with solutions.
This is what we will cover in this course.
In this class, students will learn how to design and conduct performance analysis experiments and how to troubleshoot existing complex data-intensive systems.
We live in a world that requires near-instant response times and increasingly faster access to very large volumes of business-critical data. Data scientists expect high performance from their data systems in order to reduce time to insight. Software and DevOps engineers are expected to continuously improve the performance of IT systems.
Sub-optimal performance means that you are paying a higher cost for something than you should. This cost could be money, time, energy, carbon footprint, and so on, and these costs are usually interlinked.
To achieve sustainable management and growth of data-intensive systems (e.g., database management systems, big data processing systems, machine learning systems), we must utilize the available computer systems resources (e.g., hardware) well and avoid sub-optimal performance.
Performance analysis of computer systems can
(1) uncover the effects of design or implementation bugs leading to such sub-optimal performance and
(2) help characterize the needs and behavior of different types of systems, helping us target more effective performance optimizations.
This is what we will be learning in this course.
Students should ideally have taken introductory courses on database systems and operating systems, or something similar. Some familiarity with the command-line interface and C/C++ would also help.
Intended learning outcomes
After the course, the student should be able to:
- Design and conduct performance evaluation experiments.
- Formulate hypotheses about the causes of poor performance across different layers of a data system’s stack (i.e., data management components, operating system, file system, network, hardware).
- Select an appropriate set of tools for troubleshooting performance problems.
- Analyze the performance of a complex real-world data system.
Ordinary exam
Exam type:
D: Submission of written work with following oral, External (7-point scale)
D2G: Submission for groups with following oral exam supplemented by the submission. Shared responsibility for the report.