Projects in Data Science (Spring 2024)
Official course description:
Abstract
This course aims to familiarize students with the pipeline for a Data Science project: from a domain-specific context and associated data we need to identify and formulate a domain-specific research question and translate it into a technical problem, which can then be addressed with techniques within Data Science. After performing the relevant data analysis, the results should be communicated in the context of the domain.
Description
The course consists of a Data Science project from start to finish, including the initial problem presentation, technical translation of the problem, some methodology decisions, implementation, evaluation, and translation of the results back into non-technical language. Through this course students will gain experience with online collaboration using platforms such as GitHub and Overleaf.
Formal prerequisites
This course combines knowledge from the first-semester courses Introduction to Data Science and Programming, Linear Algebra and Optimisation, and Foundations of Probability with knowledge that will be acquired during the second semester from the two concurrent courses.
The course is only open to students enrolled in the BSc in Data Science.
Intended learning outcomes
After the course, the student should be able to:
- Identify and delimit a problem in Data Science within a given domain-specific context
- Discuss the relevant options for an appropriate scientific methodology to address the problem; this covers considerations on the data-analytical approach and on the implementational approach
- Carry out the full analysis according to the selected methodology
- Communicate their work to both experts and non-experts; this should cover the entire pipeline from problem formulation to analysis methods and their results
Learning activities
The project is accompanied by a lecture series on the topics of the project, exercise sessions, and project supervision. It concludes with an exam submission of the project report and code, followed by an oral exam.
In this project you will learn to measure features in images of skin lesions, and predict the diagnosis (for example, melanoma) from these features in an automatic way. You will:
- Explore the dataset of skin lesions, filter out low quality images, etc.
- Implement methods to measure "handcrafted" features
- Explore and transform these features
- Use the features with a machine learning classifier to predict the lesion diagnosis
- Perform experiments to evaluate different parts of your method
- Write a report and prepare a presentation about your findings
You will receive data and starter Python code at the start of the project.
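As a rough illustration of the steps listed above, the following minimal sketch measures two handcrafted shape features on a binary lesion mask and feeds them to a very simple classifier. It is not the course's starter code: the function names, the choice of features (left-right asymmetry and bounding-box extent), the nearest-centroid classifier, and the synthetic toy masks are all assumptions made for illustration, and the labels 0 and 1 stand for toy shape classes, not for real diagnoses.

```python
import numpy as np


def _crop_to_bounding_box(mask: np.ndarray) -> np.ndarray:
    """Return the mask cropped to the smallest box containing all lesion pixels."""
    mask = mask.astype(bool)
    if not mask.any():
        raise ValueError("mask contains no lesion pixels")
    row_idx = np.where(mask.any(axis=1))[0]
    col_idx = np.where(mask.any(axis=0))[0]
    return mask[row_idx[0]:row_idx[-1] + 1, col_idx[0]:col_idx[-1] + 1]


def asymmetry(mask: np.ndarray) -> float:
    """Fraction of lesion pixels not covered by the left-right mirrored lesion."""
    crop = _crop_to_bounding_box(mask)
    mirrored = crop[:, ::-1]
    # XOR counts mismatches in both directions, hence the factor 2.
    non_overlap = np.logical_xor(crop, mirrored).sum()
    return float(non_overlap / (2 * crop.sum()))


def extent(mask: np.ndarray) -> float:
    """Lesion area divided by the area of its bounding box."""
    crop = _crop_to_bounding_box(mask)
    return float(crop.sum() / crop.size)


def extract_features(mask: np.ndarray) -> np.ndarray:
    """Hypothetical feature vector: [asymmetry, extent]."""
    return np.array([asymmetry(mask), extent(mask)])


class NearestCentroid:
    """Toy classifier: predict the class whose mean feature vector is closest."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "NearestCentroid":
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[dists.argmin(axis=1)]


def disk(radius: int, size: int = 21) -> np.ndarray:
    """Synthetic symmetric 'lesion' (toy data, not from the course dataset)."""
    yy, xx = np.mgrid[:size, :size]
    c = size // 2
    return (yy - c) ** 2 + (xx - c) ** 2 <= radius ** 2


def lopsided(radius: int, size: int = 21) -> np.ndarray:
    """Synthetic asymmetric 'lesion': a disk with one quadrant removed."""
    mask = disk(radius, size)
    mask[:size // 2, :size // 2] = False
    return mask


# Toy training set: label 0 = symmetric shape, 1 = lopsided shape.
train_masks = [disk(r) for r in (5, 6, 8)] + [lopsided(r) for r in (5, 6, 8)]
X_train = np.stack([extract_features(m) for m in train_masks])
y_train = np.array([0, 0, 0, 1, 1, 1])

clf = NearestCentroid().fit(X_train, y_train)
X_new = np.stack([extract_features(disk(7)), extract_features(lopsided(7))])
print(clf.predict(X_new))
```

In a real project the masks would come from segmenting the lesion images, and the evaluation would use held-out data rather than a handful of toy shapes.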
Mandatory activities
There will be a small mandatory assignment early in the course to ascertain that all students have the necessary computational infrastructure in place and have become sufficiently acquainted with the collaborative platforms used for the group project before embarking on the project itself. The feedback will consist of written comments on the submission on the online platform, describing which of the parts (if any) of the submission are missing.
If students fail to hand in the assignment, or it is not approved, they will be given a second attempt 1-2 weeks later.
If the mandatory activities are not approved, the student will receive the grade NA (not approved) at the ordinary exam and will have used an exam attempt.
Course literature
There are no required books; relevant chapters and papers will be provided via LearnIT.
Student Activity Budget
Estimated distribution of learning activities for the typical student:
- Preparation for lectures and exercises: 5%
- Lectures: 15%
- Exercises: 15%
- Assignments: 10%
- Project work, supervision included: 45%
- Exam with preparation: 10%
Ordinary exam
Exam type: D: Submission of written work with following oral, Internal (7-point scale)
Exam variation:
D2G: Submission for groups with following oral exam supplemented by the submission. Shared responsibility for the report.
You must hand in:
- groupX report.pdf: A project report, written in LaTeX. A template will be provided.
- groupX code.zip: A zip file of your GitHub repository, with code that can reproduce your results and classify other images.
Group
- Group size: 4-5 students
- The exam starts with a presentation by the group of at most 10 minutes.
- We will then ask questions while the whole group is present. The questions can be about your project and about general material covered during the lectures.
- Typically we start with undirected questions (anybody in the group can answer), followed by directed questions (for example, if a specific student has not yet volunteered an answer).
- The group leaves for a few minutes while the examiners deliberate; we then invite you back for your grade and short feedback.
- More feedback will be available afterwards
15 minutes
Group exam : Joint student presentation followed by a group dialogue. All the students are present in the examination room throughout the examination.
Reexam
Exam type: B: Oral exam, Internal (7-point scale)
Exam variation:
B1I: Oral exam with time for preparation. In-house.
30 minutes
Yes
30 minutes
Time and date
Ordinary Exam - submission Thu, 16 May 2024, 08:00 - 14:00
Ordinary Exam Tue, 11 June 2024, 09:00 - 21:00
Ordinary Exam Wed, 12 June 2024, 09:00 - 21:00
Ordinary Exam Thu, 13 June 2024, 09:00 - 21:00
Ordinary Exam Fri, 14 June 2024, 09:00 - 21:00
Reexam Mon, 19 Aug 2024, 09:00 - 18:00
Reexam Tue, 20 Aug 2024, 09:00 - 18:00