IT University of Copenhagen
 
Course description
Course name (Danish): Projektklynge: Talekodning og -genkendelse
Course name (English): Project Cluster: Speech Coding and Recognition
Semester: Autumn 2005
Offered under: cand.it., Media Technology and Games (mtg)
Scope in ECTS: 0.00
Course language: English
Course homepage: https://learnit.itu.dk
Min. number of participants:
Expected number of participants: 10
Max. number of participants: 50
Formal prerequisites: Highest level of high-school mathematics or equivalent. Course: Signal Processing or equivalent.
 
Learning objectives: Human hearing and speech are very important means of communication and are gaining importance for IT systems. Having completed the project cluster, the student is able to explain and apply models for the production, perception, and recognition of speech that are necessary for understanding, constructing, and evaluating the performance of IT systems that use speech as one of their input/output media.
Course content: Background:

Natural and synthetic speech is becoming increasingly important in IT systems, where it is applied in, among other things, automatic information delivery systems; reservation and information retrieval systems; animated films and cartoons; and teleconferencing systems. Furthermore, it is expected to be important in future systems for immersive telepresence connecting geographically distant sites, and in networked systems for e-commerce, maintenance, and monitoring.



Contents:

Models for Speech Production: The human vocal tract. Linear prediction used for parameter estimation. Parameters for male, female, and child voices.



Models for Speech Perception:

The human ear. Frequency analysis and pitch perception. Intensity discrimination. Time/frequency masking. Sound localization and auditory perception. The interaction between visual and auditory information.



Speech Coding and Recognition:

Speech coding using CELP (Code-Excited Linear Prediction) algorithms.

Principles of MP3 audio coding.

Speech recognition using HMM (Hidden Markov Model) algorithms.

Noise reduction of speech.



Performance Evaluation:

Estimation of the subjective quality of a speech-based system.

Future applications in Quality of Service (QoS) measures.



Demonstrations of the psychoacoustic properties of the human ear that are important for audio and speech coding.



Hands-on exercises on:

Spectral Analysis of Speech.

Speech Coding and Synthesis.

Speech Recognition.

 
Learning activities:

Please note: You sign up for this project cluster as if it were a normal course. However, please be aware of the following:

  • Enrolment in a project cluster is not considered a binding course registration (only relevant to MSc students).
  • You can register for a project cluster as a fourth activity (in addition to three courses).
  • You or your group must register your project in the Project Base before the deadline for 12-week projects, 16-week projects, 4-week projects, or theses/final projects. Please find the deadlines here: http://www1.itu.dk/sw923.asp
  • The project and the title you register in the Project Base will appear on the diploma once you have passed the project. (Please be aware that the project cluster itself is not registered on the diploma.)
  • Single-subject students at Open University and guest students from other universities who are interested in the project cluster should contact the Student Administration Office, phone +45 72 18 52 05.
The first meeting in the project cluster will be announced via the mailing list.
 
Exam form and description: X. experimental examination form (7-scale; external exam), 13-point scale, external examiner

 

Literature beyond research articles:

T.F. Quatieri, Discrete-Time Speech Signal Processing, Prentice Hall, 2001.

Ted Painter, Andreas Spanias, Perceptual Coding of Digital Audio, Proceedings of IEEE, Vol. 88, No. 4, April 2000.

Sadaoki Furui, Speech Recognition Technology in the Ubiquitous/Wearable Computing Environment, Proc. 2000 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Vol. IV, pp. 3735 - 3738.

Ram R. Rao, Tsuhan Chen, Russell M. Mersereau, Audio-to-Visual Conversion for Multimedia Communication, IEEE Transactions on Industrial Electronics, Vol. 45, No. 1, February 1998.
 
 
Schedule (time and place)
The course takes place at the following time and place:
Weekday    Time           Lecture/Exercises    Place    Room
Thursday   13.30-16.00    Lecture              ITU