1. | Course Title | Digital Speech Processing | |||||||||||
2. | Code | 4ФЕИТ05010 | |||||||||||
3. | Study program | 9-VMS, 10-DPSM, 19-MV, 21-PNMI, 22-BE | |||||||||||
4. | Organizer of the study program (unit, institute, department) | Faculty of Electrical Engineering and Information Technologies | |||||||||||
5. | Degree (first, second, third cycle) | Second cycle | |||||||||||
6. | Academic year/semester | I/1 | 7. | Number of ECTS credits | 6.00 | ||||||||
8. | Lecturer | Dr Branislav Gerazov | |||||||||||
9. | Course Prerequisites | ||||||||||||
10. | Course Goals (acquired competencies):
The goal of the course is for students to acquire broad knowledge of the techniques for the analysis, synthesis and recognition of speech signals. It is designed to familiarise them with the various approaches to and applications of digital speech processing through study of the state of the art. |
||||||||||||
11. | Course Syllabus:
1. Fundamentals of digital audio, principles of digitisation, oversampling, jitter.
2. Working with audio signals in the digital domain: quantisation, dither, noise shaping.
3. Fourier transform, Z-transform, amplitude and phase spectrum.
4. Sliding window method, short-time Fourier transform (STFT), spectrograms.
5. Fundamentals of digital filters, filtering, FIR, IIR, FIR filter design.
6. LP, HP, BP, BS and notch filters, filterbanks, equalisation.
7. Basics of speech production, source-filter model, LP analysis, VOCODER; compression of speech: LP10, CELP.
8. Machine learning for speech signals, basics of ASR, feature extraction, MFCCs.
9. DTW, HMM, GMM.
10. Deep learning in ASR: NN, DNN, RNN, LSTM, CNN, transformers, end-to-end systems.
11. Speaker recognition, GMM, EM, UBM, LLRM.
12. Speech synthesis: concatenative, articulatory and formant synthesis, parametric synthesis with HMM and NN.
13. Deep learning for speech synthesis. |
||||||||||||
12. | Learning methods:
Combined learning: lectures with slides and visualisations and independent work on projects. |
||||||||||||
13. | Total number of course hours | 180 | |||||||||||
14. | Distribution of course hours | 3 + 3 | |||||||||||
15. | Forms of teaching | 15.1 | Lectures-theoretical teaching | 45 hours | |||||||||
15.2 | Exercises (laboratory, practice classes), seminars, teamwork | 45 hours | |||||||||||
16. | Other course activities | 16.1 | Projects, seminar papers | 30 hours | |||||||||
16.2 | Individual tasks | 30 hours | |||||||||||
16.3 | Homework and self-learning | 30 hours | |||||||||||
17. | Grading | ||||||||||||
17.1 | Exams | 0 points | |||||||||||
17.2 | Seminar work/project (presentation: written and oral) | 50 points | |||||||||||
17.3 | Activity and participation | 20 points | |||||||||||
17.4 | Final exam | 30 points | |||||||||||
18. | Grading criteria (points) | up to 50 points | 5 (five) (F) | ||||||||||
from 51 to 60 points | 6 (six) (E) | ||||||||||||
from 61 to 70 points | 7 (seven) (D) | ||||||||||||
from 71 to 80 points | 8 (eight) (C) | ||||||||||||
from 81 to 90 points | 9 (nine) (B) | ||||||||||||
from 91 to 100 points | 10 (ten) (A) | ||||||||||||
19. | Conditions for obtaining the teacher’s signature and for taking the final exam | Attendance at lectures. | |||||||||||
20. | Forms of assessment | Project assignment and final exam. | |||||||||||
21. | Language | Macedonian and English | |||||||||||
22. | Method of monitoring teaching quality | Surveys, interviews and self-evaluation. | |||||||||||
23. | Literature | ||||||||||||
23.1. | Required Literature | ||||||||||||
No. | Author | Title | Publisher | Year | |||||||||
1. | Lawrence R. Rabiner, Ronald W. Schafer | Theory and Applications of Digital Speech Processing | Pearson | 2010 | |||||||||
2. | Dan Jurafsky and James H. Martin | Speech and Language Processing | Pearson Education | 2014 | |||||||||
23.2. | Additional Literature | ||||||||||||
No. | Author | Title | Publisher | Year | |||||||||
1. | Ian Goodfellow, Yoshua Bengio and Aaron Courville | Deep Learning | MIT Press | 2016 | |||||||||
2. | Lawrence Rabiner, Biing-Hwang Juang | Fundamentals of Speech Recognition | Prentice Hall | 1993 | |||||||||
3. | Uday Kamath, John Liu, James Whitaker | Deep Learning for NLP and Speech Recognition | Springer | 2019 |