Vlado Delić¹, Zoran Perić², Milan Sečujski¹, Nikša Jakovljević¹, Jelena Nikolić², Dragiša Mišković¹, Nikola Simić², Siniša Suzić¹, Tijana Delić¹
Abstract
Speech technologies have been developed for decades as a typical signal processing area, while the last decade has brought huge progress based on new machine learning paradigms. Owing not only to their intrinsic complexity but also to their relation with cognitive sciences, speech technologies are now viewed as a prime example of an interdisciplinary knowledge area. This review article on speech signal analysis and processing, the corresponding machine learning algorithms, and applied computational intelligence aims to give an insight into several fields, covering speech production and auditory perception, cognitive aspects of speech communication and language understanding, both speech recognition and text-to-speech synthesis in more detail, and consequently the main directions in the development of spoken dialogue systems. Additionally, the article discusses the concepts and recent advances in speech signal compression, coding, and transmission, including cognitive speech coding. To conclude, the main intention of this article is to highlight recent achievements and challenges based on new machine learning paradigms that, over the last decade, have had an immense impact on the field of speech signal processing.
Year: 2019 PMID: 31341467 PMCID: PMC6614991 DOI: 10.1155/2019/4368036
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. Interdisciplinary nature of speech technologies, i.e., spoken language processing (adapted from [2]).
Figure 2. Unified framework encompassing the speech signal processing fields within the scope of the article.
Figure 3. Block diagram of speech production and speech perception, and the corresponding processes performed by machines carrying out text-to-speech synthesis (TTS) and automatic speech recognition (ASR).
Figure 4. Components of a human-machine speech dialogue system.
Figure 5. Speech signal quality (MOS) versus bit rate for various speech signal coding techniques.
Figure 6. Forward adaptive pulse-code modulation (PCM): (a) encoder; (b) decoder.
Figure 7. One realization of backward adaptive PCM with a one-codeword memory: (a) encoder; (b) decoder.
Figure 8. Dual-mode quantization scheme: (a) encoder; (b) decoder.
Figure 9. Differential PCM (DPCM): (a) encoder; (b) decoder.
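The encoder/decoder pairs in Figures 6 and 9 can be illustrated with a minimal pure-Python sketch of textbook forward adaptive PCM and first-order DPCM. This is not the article's implementation. The midrise uniform quantizer, the 160-sample frame, the loading factor of 4 (support region of ±4·RMS), the fixed predictor coefficient `a = 0.9`, the residual support `xmax = 0.5`, and every function name below are illustrative assumptions. The forward-adaptive gain is also sent unquantized for brevity, whereas a practical coder would quantize it as well.

```python
import math


def quantize(x, levels, xmax):
    """Index of the midrise uniform cell containing x, clipped to [-xmax, xmax)."""
    step = 2.0 * xmax / levels
    idx = math.floor(x / step)
    return max(-(levels // 2), min(levels // 2 - 1, idx))


def dequantize(idx, levels, xmax):
    """Reconstruction value (cell midpoint) of the midrise uniform quantizer."""
    step = 2.0 * xmax / levels
    return (idx + 0.5) * step


# Forward adaptive PCM: the gain is estimated from the frame being coded
# and transmitted to the decoder as side information.
def fa_pcm_encode(signal, frame_len=160, bits=4, loading=4.0):
    levels = 2 ** bits
    frames = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame)) or 1e-12
        # Side information: the frame RMS (left unquantized in this sketch).
        idxs = [quantize(s, levels, loading * rms) for s in frame]
        frames.append((rms, idxs))
    return frames


def fa_pcm_decode(frames, bits=4, loading=4.0):
    levels = 2 ** bits
    out = []
    for rms, idxs in frames:
        out.extend(dequantize(i, levels, loading * rms) for i in idxs)
    return out


# DPCM: quantize the prediction residual inside a closed loop, so the
# encoder predicts from the same reconstructed samples the decoder has.
def dpcm_encode(signal, a=0.9, bits=4, xmax=0.5):
    levels = 2 ** bits
    prev = 0.0  # the decoder's reconstruction of the previous sample
    idxs = []
    for s in signal:
        pred = a * prev
        i = quantize(s - pred, levels, xmax)
        idxs.append(i)
        prev = pred + dequantize(i, levels, xmax)  # mirror the decoder exactly
    return idxs


def dpcm_decode(idxs, a=0.9, bits=4, xmax=0.5):
    levels = 2 ** bits
    prev, out = 0.0, []
    for i in idxs:
        prev = a * prev + dequantize(i, levels, xmax)
        out.append(prev)
    return out


def snr_db(x, y):
    """Signal-to-quantization-noise ratio of reconstruction y against x, in dB."""
    sig = sum(s * s for s in x)
    noise = sum((s - r) ** 2 for s, r in zip(x, y))
    return 10.0 * math.log10(sig / noise)


if __name__ == "__main__":
    n = 1600
    # A tone whose level jumps by 40 dB halfway through (at a zero crossing).
    x = [(0.01 if k < n // 2 else 1.0) * math.sin(2 * math.pi * 0.01 * k)
         for k in range(n)]
    y = fa_pcm_decode(fa_pcm_encode(x))
    print("FA-PCM SNR, quiet half: %.1f dB" % snr_db(x[: n // 2], y[: n // 2]))
    print("FA-PCM SNR, loud half:  %.1f dB" % snr_db(x[n // 2:], y[n // 2:]))
    z = dpcm_decode(dpcm_encode(x))
    print("DPCM max abs error: %.4f" % max(abs(s - r) for s, r in zip(x, z)))
```

With a per-frame gain, the quiet and loud halves reach a similar SNR, because the quantizer support scales with each frame's RMS. A fixed quantizer sized for the loud half would bury the quiet half in noise. In the DPCM encoder, `prev` is updated with the dequantized residual rather than the original sample. This keeps the encoder's predictor in step with the decoder's, so quantization errors do not accumulate. As long as the residual stays inside `±xmax`, each reconstructed sample is within half a step of the input.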