IT-Universitetet i København
 
Course description
Course name (Danish): DDM, Distributed Data Mining
Course name (English): DDM, Distributed Data Mining
Semester: Autumn 2005
Offered under: cand.it, Software Development (swu)
ECTS credits: 7.5
Course language: English
Course homepage: https://learnit.itu.dk
Min. number of participants:
Expected number of participants:
Max. number of participants: 20
Formal prerequisites: While not a formal requirement, students are strongly recommended to take the DMI02 class before this one. The class also draws on knowledge of object-oriented programming techniques, as taught on HA.Dat, and requires knowledge of distributed systems at the level of the distributed systems module on HA.Dat. Students need flair for, interest in, and a strong desire to work with statistics and mathematics at the same level as the Data Mining class (DMI), and must have a working knowledge of the mathematics taught in gymnasium (Danish upper secondary school).
Learning objectives: The aim is to offer a course that combines data mining and distributed systems. The goal is to let students work with small, so-called ambient and ubiquitous systems and to apply data mining techniques on these systems. Specifically, students will work with a particular type of data mining algorithm that is applicable in distributed systems. Furthermore, the class will emphasize transaction management as taught in the Distributed Systems module on HA.Dat.
Ultimately, the aim is to prepare students for a successful career in the business intelligence, data mining, and mobile/web services industries, while drawing on and connecting many of the computer science and informatics classes taught at INF.

 
Course content: The class balances advanced data mining algorithms and small distributed systems. Students are given small networked devices and are expected to create a distributed data mining system on those devices. One example could be a distributed web-mining robot. The system is open source, and students will contribute to an open-source project while doing their term projects. Example project: creation of a distributed web-mining and web-server system that collects and mines information from the Internet while being accessible from either mobile or stationary computers.
Learning activities:

The class will be taught in a laboratory/lecture style, with at least 50% of the class time used for group work. The class meets every other week throughout the semester. Groups of two students will be formed in the class.
The number of students is limited to 12 (recommended group size: 3 students), as the class requires specific Java hardware, which takes time to install, set up, and configure. Because of the experimental nature of the class, its quality will suffer if the number of three-person groups exceeds five. Students will produce actual software systems, and this will require substantial per-group supervision.
E-learning: Sitescape (CBS's future e-learning platform) will be used for discussion forums. In particular, students will be asked to make their projects "electronic" so that other groups can see and evaluate the work. More traditionally, the e-learning system will be used to distribute material to students.

Work Required by Students
Students are expected to spend time in the lab, working systematically on setting up their own open-source distributed data mining system. They are also expected to create the mini-project content during the semester and to present their code and programs.

Expected use of student work hours on the various activities in the course
• Homework (5 lectures of 12 hours): 60 hours
• Attending lectures (5 lectures of 6 hours): 30 hours
• Preparing presentations (2 presentations): 10 hours
• Writing mini projects (5 weeks of 20 hours): 100 hours
• Preparation for individual exams (1 exam of 25 hours): 25 hours
• Total: 225 hours

Note: This is a CBS course. See CBS's homepage for further information.

This is an exchange course that only MSc students can sign up for. Interested Master, Diploma, or single-subject students have to sign up at DØK's student administration office, CBS.

Please note that the course takes place at CBS. The exact time of day and other details may differ from those listed below.

Exam form and description: X. experimental examination form (7-scale; external exam), 13-scale, external examiner

 

Literature beyond research articles:
Books:
• "An Introduction to Support Vector Machines" by Nello Cristianini and John Shawe-Taylor, 189 pages
Online:
• "Using Support Vector Machines for Distributed Machine Learning" by Rasmus Pedersen, 107 pages
• "A Tutorial on Support Vector Machines for Pattern Recognition" by Christopher J.C. Burges, 46 pages
• "Java Optimized Processor" by Martin Schoeberl, 150 pages
• "FPGA programming step by step" by Ed Klingman, 6 pages
• "Wireless Development Tutorial Part I" by Jonathan Knudsen and Dana Nourie, 8 pages
• "A First Look at Eclipse Plug-In Programming" by Koray Guclu, 10 pages
• "Linear Algebra Tutorial" by Arizona State University, 50 pages
• "Distributed Data Mining with the Weka Egg" by Rasmus Pedersen, 10 pages

Optional literature:
• "Data Mining, Practical Machine Learning Tools and Techniques with Java Implementations"
• "Advanced Java Networking", Second Edition, by Dick Steflik and Prashant Sridharan
• "Distributed Systems: Concepts and Design", Edition 3, by George Coulouris, Jean Dollimore and Tim Kindberg
 
Schedule (time and place)
The course is held at the following time and place:
Weekday    Time         Lecture/Exercises  Location  Room
Wednesday  09.00-12.00  Lecture            CBS       The course runs in odd weeks
Wednesday  13.00-15.30  Exercises          CBS       The course runs in odd weeks