Official course description:
Abstract
The course gives an introduction and overview of data engineering techniques and practices.
A data ‘revolution’ is underway, a fourth industrial revolution, one that is already reshaping how knowledge is produced, business is conducted, and governance is enacted. Data has traditionally been time-consuming and costly to generate, analyze, and interpret, and has generally provided a relatively static and coarse snapshot of phenomena.
This state of affairs is now changing. Rather than being scarce and limited in scope, data is increasingly produced as a ‘deluge’, i.e. a vast, relatively low-cost flow of real-time, varied, high-resolution, and relational data.
Data is increasingly becoming open as well. This data abundance (as opposed to data scarcity) is reshaping how we work with, circulate, trade, analyze, and exploit data. This development is founded on the latest wave of information and communication technologies and data sources, such as collective intelligence, artificial intelligence/machine learning, and big data harvested from social media, the internet, and the internet of things. Data is produced through mobile phones, distributed and cloud computing, open-source platforms, crowdsourcing platforms, the plethora of digital devices encountered in homes, workplaces, and public spaces, and interconnected sensors and devices.
These technical infrastructures lead to ever more aspects of everyday life (work, consumption, travel, communication, and leisure) being captured as data. Moreover, they are reconfiguring the production, circulation, and interpretation of data.
The students will gain an understanding of the technical aspects of data management and the opportunities and risks they create for organizations.
During the course, the students will relate to and work with the (changing) nature of database use and design, including:
• Data purpose, representation, and modelling
• Data collection, retrieval, and storage
• Data engineering and data processes
• Architecture of unbundled data systems
Python programming is a prerequisite.
This course is part of the second semester in the bachelor's degree in Global Business Informatics.
Intended learning outcomes
After the course, the student should be able to:
- Explain the difference between the relational and non-relational data models
- Describe the architecture and components of a database system
- Design an ER model and a relational model in a concrete scenario
- Define SQL queries in a concrete scenario
- Define a Python program for batch data processing
- Design a data management process
- Explain the differences between data sourcing methods
The course consists of lectures and exercises. The exercises will mainly be based on statistical analysis of collected crowdsourcing data and on programming tasks in Python and SQL.
The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences.
Author: Rob Kitchin, 1st Edition.
Database Systems: Design, Implementation, & Management.
Authors: Carlos Coronel, Steven Morris, 13th Edition.
Plus readings uploaded to the course page in LearnIT.
Student activity budget
Estimated distribution of learning activities for the typical student:
- Preparation for lectures and exercises: 40%
- Lectures: 20%
- Exercises: 20%
- Exam with preparation: 20%
Ordinary exam
Exam type:
C: Submission of written work, Internal (7-point scale)
C1G: Submission of written work for groups
Size of written product: Max. 20 pages
The project size should be:
1 student: max 10 standard pages
2 students: max 15 standard pages
3 students: max 20 standard pages
The exam is a case-based project.
The students will apply the data engineering techniques learned in the course to a case challenge.
The product should be a written assignment for an external public- or private-sector business partner, with key emphasis on building a database and solving a problem through selected data engineering techniques.
- The assignment must be written in groups of 1-3 members, but is graded individually. The paper must therefore clearly state which student is responsible for which parts, or whether the students have contributed equally to all parts.
The project can be written individually.