Data Science in Production (Spring 2022)
Official course description:
Abstract
This course will introduce classes of tasks that are at the core of most real-world production systems. It will teach advanced solutions to these tasks on complex, large-scale data using state-of-the-art tools.
Description
At the core of most IT production systems there are algorithmic solutions to problems of ranking and matching. Solving these two fundamental tasks enables a wide variety of services: getting the best list of images for a search query, getting a recommendation for the best next song to listen to, finding new friends in online social media, and much more. In this course, we will introduce advanced concepts of Information Retrieval, Recommender Systems, and Linked Data Mining, which use a variety of advanced ranking and matching techniques to enable complex services. We will also introduce basic concepts of computational advertising.
In particular, the course will cover the following subjects:
- Recommender systems
  - Content-based recommendations
  - Collaborative filtering
  - Dimensionality reduction
  - Matrix factorization for personalization
- Information retrieval systems
  - Indexing large-scale data
  - Grouping and detection of near duplicates
  - Ranking and weighting for relevance
  - Search strategies
- Computational advertising
  - Advertisement auctions and bidding
  - Advertisement matching
- Linked-data mining
  - Link analysis
  - Predicting the evolution of graphs
  - Graph representation learning
- Metrics to evaluate the performance of ranking and matching systems
Formal prerequisites
A solid background in Python programming is required. Basic knowledge of statistics, linear algebra, and the fundamentals of machine learning is strongly recommended.
Intended learning outcomes
After the course, the student should be able to:
- Design and implement a recommender system that satisfies given requirements
- Design and implement simple information retrieval systems
- Design and implement methods to extract structured information from linked data
- Discuss possible architectural solutions to address complex problems of ranking and matching
- Recommend the most appropriate techniques and metrics to evaluate the performance of a given production task
Learning activities
The course will consist of lectures and hands-on practice with coding, mostly in Python.
The students will be presented with tasks that are typical of IT production systems and they will be asked to reflect on them and to propose possible solutions. These activities will be similar to those that the students will need to complete for their exam. The students will also have the opportunity to code some of the solutions they come up with and to submit them as optional assignments to receive feedback.
During the lectures, there will be moments for the students to engage with each other and with the teacher through quizzes and open group discussions. After each lecture, the students will be invited to carry out some complementary activities, including reading and watching videos that expand on the concepts discussed during the lecture.
Course literature
Some of the material included in the following books will be part of the course. These books are intended as optional reading and support material. The course will be self-contained, and reading these books is not necessary to pass the exam with top grades.
- Mining massive datasets (http://www.mmds.org/)
- Modern information retrieval (https://www.amazon.it/Modern-Information-Retrieval-Concepts-Technology/dp/0321416910)
- Recommender systems (https://www.amazon.it/Recommender-Systems-Textbook-Charu-Aggarwal/dp/3319296574)
- Graph representation learning (https://www.cs.mcgill.ca/~wlh/grl_book/)
- More literature will be published on the course page in LearnIT.
Student Activity Budget
Estimated distribution of learning activities for the typical student:
- Preparation for lectures and exercises: 15%
- Lectures: 25%
- Exercises: 30%
- Exam with preparation: 30%
Ordinary exam
Exam type: C: Submission of written work, External (7-point scale)
Exam variation:
C11: Submission of written work
The exam will consist of a series of tasks in the domains of recommendation systems, information retrieval, graph mining, or computational advertising. The student will be asked to write a report on how these tasks can be solved and to write code to implement the proposed solutions. The final submission will contain a written report and Python code.
Reexam
Exam type: C: Submission of written work, External (7-point scale)
Exam variation:
C11: Submission of written work