Digital Speech Processing

Published: June 29, 2023
1. Course Title Digital Speech Processing
2. Code 4ФЕИТ05010
3. Study program 9-VMS, 10-DPSM, 19-MV, 21-PNMI, 22-BE
4. Organizer of the study program (unit, institute, department) Faculty of Electrical Engineering and Information Technologies
5. Degree (first, second, third cycle) Second cycle
6. Academic year/semester I/1
7. Number of ECTS credits 6.00
8. Lecturer Dr Branislav Gerazov
9. Course Prerequisites
10. Course Goals (acquired competencies):

The goal of the course is for students to acquire broad knowledge of techniques for the analysis, synthesis and recognition of speech signals. It introduces the main approaches to digital speech processing and their applications through study of the state of the art.

11. Course Syllabus:

1. Fundamentals of digital audio, principles of digitisation, oversampling, jitter.
2. Working with audio signals in the digital domain: quantisation, dither, noise shaping.
3. Fourier transform, Z-transform, amplitude and phase spectrum.
4. Sliding window method, short-time Fourier transform (STFT), spectrograms.
5. Fundamentals of digital filters, filtering, FIR, IIR, FIR filter design.
6. LP, HP, BP, BS, and notch filters, filterbanks, equalisation.
7. Basics of speech production, source-filter model, LP analysis, VOCODER; compression of speech LP10, CELP.
8. Machine learning for speech signals, basics of ASR, feature extraction, MFCCs.
9. DTW, HMM, GMM.
10. Deep learning in ASR, NN, DNN, RNN, LSTM, CNN, transformers, end-to-end systems.
11. Speaker recognition, GMM, EM, UBM, LLRM.
12. Speech synthesis, concatenative, articulatory and formant synthesis, parametric synthesis with HMM and NN.
13. Deep learning for speech synthesis.

12. Learning methods:

Combined learning: lectures with slides and visualisations and independent work on projects.

13. Total number of course hours 180
14. Distribution of course hours 3 + 3
15. Forms of teaching
15.1 Lectures (theoretical teaching) 45 hours
15.2 Exercises (laboratory, practice classes), seminars, teamwork 45 hours
16. Other course activities
16.1 Projects, seminar papers 30 hours
16.2 Individual tasks 30 hours
16.3 Homework and self-learning 30 hours
17. Grading
17.1 Exams 0 points
17.2 Seminar work/project (presentation: written and oral) 50 points
17.3 Activity and participation 20 points
17.4 Final exam 30 points
18. Grading criteria (points)
up to 50 points 5 (five) (F)
from 51 to 60 points 6 (six) (E)
from 61 to 70 points 7 (seven) (D)
from 71 to 80 points 8 (eight) (C)
from 81 to 90 points 9 (nine) (B)
from 91 to 100 points 10 (ten) (A)
19. Conditions for acquiring the teacher’s signature and for taking the final exam Attendance at lectures.
20. Forms of assessment Project assignment and final exam.
21. Language Macedonian and English
22. Method of monitoring of teaching quality Surveys, interviews and self-evaluation.
23. Literature
23.1 Required Literature
No.  Author  Title  Publisher  Year
1.  Lawrence R. Rabiner, Ronald W. Schafer  Theory and Applications of Digital Speech Processing  Pearson  2010
2.  Dan Jurafsky and James H. Martin  Speech and Language Processing  Pearson Education  2014
23.2 Additional Literature
No.  Author  Title  Publisher  Year
1.  Ian Goodfellow, Yoshua Bengio and Aaron Courville  Deep Learning  MIT Press  2016
2.  Lawrence Rabiner, Biing-Hwang Juang  Fundamentals of Speech Recognition  Prentice Hall  1993
3.  Uday Kamath, John Liu, James Whitaker  Deep Learning for NLP and Speech Recognition  Springer  2019