Data Science in Production
Abstract
This course introduces the classes of tasks at the core of most real-world production systems. It teaches advanced solutions to these tasks on complex, large-scale data using state-of-the-art tools.
Description
At the core of most IT production systems are algorithmic solutions to problems of ranking and matching. Solving these two fundamental tasks enables a wide variety of services: retrieving the best list of images for a search query, recommending the best next song to listen to, finding new friends on social media, and much more. In this course, we will introduce advanced concepts of Information Retrieval, Recommender Systems, and Computational Advertising, as well as the Dev-Ops tools used to deploy these services at scale.
In particular, the course will cover the following subjects:
- Information retrieval systems
  - Indexing large-scale data
  - Ranking and weighting for relevance
  - Search strategies
  - Learning to Rank
  - Grouping and detection of near duplicates
  - Elasticsearch
- Recommender systems
  - Content-based recommendations
  - Collaborative filtering
  - Dimensionality reduction
  - Matrix factorization for personalization
  - Multiarmed bandits
  - Link recommendation
- Computational Advertising
  - Advertisement auctions and bidding
  - Advertisement matching
  - A/B testing
- Dev-Ops concepts and tools
  - Deployment
  - Orchestration
  - Dev-ops tools (e.g., Docker, Kubernetes)
- Metrics to evaluate the performance of ranking and matching systems
Formal prerequisites
A solid background in Python programming, Linear Algebra, and fundamentals of machine learning is required.
Intended learning outcomes
After the course, the student should be able to:
- Design and implement a recommender system that satisfies given requirements
- Design and implement simple information retrieval systems
- Design and implement methods to extract structured information from linked data
- Discuss possible architectural solutions to address complex problems of ranking and matching
- Recommend the most appropriate techniques and metrics to evaluate the performance of a given production task
- Design and implement software for basic deployment and orchestration of services
Ordinary exam
Exam type: C: Submission of written work, External (7-point scale)
Exam variation: C11: Submission of written work