IT-Universitetet i København
 
Course description
Course name (Danish): Talekodning og -genkendelse
Course name (English): Speech Coding and Recognition
Semester: Autumn 2003
Offered under: cand.it., Multimedia Technology (mmt)
ECTS credits: 7.50
Course language: English
Course homepage: https://learnit.itu.dk
Min. number of participants:
Expected number of participants: 13
Max. number of participants: 50
Formal prerequisites: Highest level of high-school mathematics or equivalent. Course: Signal Processing or equivalent.
Learning objectives: Human hearing and speech are very important means of communication and are gaining importance for IT systems. The course objectives are to introduce fundamental models for the production, perception, and recognition of speech, which are necessary for understanding, constructing, and evaluating the performance of IT systems that use speech as an input/output medium.
Course content: Course Background:

Natural and synthetic speech is becoming increasingly important in IT systems, where it is applied in, among others, automatic information delivery systems; reservation and information retrieval systems; animated films and cartoons; and teleconferencing systems. It is also expected to be important in future systems for immersive telepresence connecting geographically distant sites, and in networked systems for e-commerce, maintenance, and monitoring.



Course Contents:

Models for Speech Production: The human vocal tract. Linear prediction used for parameter estimation. Parameters for male, female, and child voices.



Models for Speech Perception:

The human ear. Frequency analysis and pitch perception. Intensity discrimination. Time/frequency masking. Sound localization and auditory perception. The interaction between visual and auditory information.



Speech Coding and Recognition:

Speech coding using CELP (Code-Excited Linear Prediction) algorithms.

Principles of MP3 audio coding.

Speech recognition using HMM (Hidden Markov Model) algorithms.

Noise reduction of speech.



Performance Evaluation:

Estimation of the subjective quality of a speech-based system.

Future applications in Quality of Service (QoS) measures.



Demonstrations of the psychoacoustic properties of the human ear that are important for coding audio and speech.



Hands-on exercises on:

Spectral Analysis of Speech.

Speech Coding and Synthesis.

Speech Recognition.
 
Learning activities:

The course consists of lectures in the morning and exercises in the afternoon. The exercises are carried out in Matlab.

Exam form and description: X. experimental examination form (7-scale; external exam), 13-point scale, external examiner

Oral examination with 30 minutes of preparation, with access to the course material and notes. One A4 page prepared during the preparation time may be brought into the examination room.
Grading: 13-point scale. External examiner.
The duration of the examination is approx. 20 minutes.



The speech coding and speech recognition exercises must be approved (pass/fail) prior to the oral examination. The exercises must be handed in no later than the last day of the 12-week course period.

Literature beyond research articles:

T.F. Quatieri, Discrete-Time Speech Signal Processing, Prentice Hall, 2001.

Ted Painter, Andreas Spanias, Perceptual Coding of Digital Audio, Proceedings of the IEEE, Vol. 88, No. 4, April 2000.

Sadaoki Furui, Speech Recognition Technology in the Ubiquitous/Wearable Computing Environment, Proc. 2000 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Vol. IV, pp. 3735-3738.

Ram R. Rao, Tsuhan Chen, Russell M. Mersereau, Audio-to-Visual Conversion for Multimedia Communication, IEEE Transactions on Industrial Electronics, Vol. 45, No. 1, February 1998.