Literature DB >> 33800348

On the Speech Properties and Feature Extraction Methods in Speech Emotion Recognition.

Juraj Kacur, Boris Puterka, Jarmila Pavlovicova, Milos Oravec

Abstract

Many speech emotion recognition systems have been designed using different features and classification methods. Still, there is a lack of knowledge and reasoning about the underlying speech characteristics and processing, i.e., how basic characteristics, methods, and settings affect the accuracy, and to what extent. This study aims to extend the physical perspective on speech emotion recognition by analyzing basic speech characteristics and modeling methods, e.g., time characteristics (segmentation, window types, and classification regions: lengths and overlaps), frequency ranges, frequency scales, processing of the whole speech signal (spectrograms), of the vocal tract (filter banks, linear prediction coefficient (LPC) modeling), and of the excitation signal (inverse LPC filtering), magnitude and phase manipulations, cepstral features, etc. In the evaluation phase, a state-of-the-art classification method and rigorous statistical tests were applied, namely N-fold cross validation, the paired t-test, and rank and Pearson correlations. The results revealed several settings in the 75% accuracy range (seven emotions). The most successful methods were based on vocal tract features using psychoacoustic filter banks covering the 0-8 kHz frequency range. Spectrograms carrying vocal tract and excitation information also scored well. It was found that even basic processing, like pre-emphasis, segmentation, and magnitude modifications, can dramatically affect the results. Most findings are robust, exhibiting strong correlations across the tested databases.
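As an illustration of the processing chain the abstract describes (pre-emphasis, segmentation with windowing, magnitude spectra, and psychoacoustically scaled filter banks over 0-8 kHz), the following is a minimal NumPy sketch of mel filter-bank feature extraction. All parameter values (frame length, hop, filter count, pre-emphasis coefficient) are illustrative assumptions, not the settings evaluated in the paper.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping (a psychoacoustic frequency scale).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr, f_min=0.0, f_max=8000.0):
    # Triangular filters spaced evenly on the mel scale over f_min..f_max.
    mel_pts = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                       # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                       # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def filterbank_features(signal, sr=16000, frame_len=400, hop=160,
                        n_fft=512, n_filters=26, pre_emph=0.97):
    # 1) Pre-emphasis: y[n] = x[n] - a*x[n-1] boosts high frequencies.
    y = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2) Segmentation into overlapping frames, each Hamming-windowed.
    n_frames = 1 + max(0, (len(y) - frame_len) // hop)
    window = np.hamming(frame_len)
    frames = np.stack([y[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    # 3) Magnitude spectrum of each frame (phase is discarded here).
    mag = np.abs(np.fft.rfft(frames, n=n_fft))
    # 4) Mel filter-bank energies, log-compressed.
    fb = mel_filterbank(n_filters, n_fft, sr)
    return np.log(mag @ fb.T + 1e-10)

# Usage: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
feats = filterbank_features(np.sin(2 * np.pi * 440 * t), sr=sr)
print(feats.shape)  # (98, 26): frames x filter-bank channels
```

Applying a discrete cosine transform to these log energies would yield cepstral features, and replacing the filter bank with LPC analysis would model the vocal tract directly, which are two of the alternatives the study compares.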

Entities:  

Keywords:  LPC; cepstral features; classification; frequency scales; phases; psychoacoustic filter banks; spectrograms; speech emotions; windows

Mesh:

Year:  2021        PMID: 33800348      PMCID: PMC7962835          DOI: 10.3390/s21051888

Source DB:  PubMed          Journal:  Sensors (Basel)        ISSN: 1424-8220            Impact factor:   3.576


References:  6 in total

Review 1.  Deep learning in neural networks: an overview.

Authors:  Jürgen Schmidhuber
Journal:  Neural Netw       Date:  2014-10-13

2.  Evaluating deep learning architectures for Speech Emotion Recognition.

Authors:  Haytham M Fayek; Margaret Lech; Lawrence Cavedon
Journal:  Neural Netw       Date:  2017-03-21

Review 3.  The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology.

Authors:  Jonathan Posner; James A Russell; Bradley S Peterson
Journal:  Dev Psychopathol       Date:  2005

4.  Mental status assessment of disaster relief personnel by vocal affect display based on voice emotion recognition.

Authors:  Shunji Mitsuyoshi; Mitsuteru Nakamura; Yasuhiro Omiya; Shuji Shinohara; Naoki Hagiwara; Shinichi Tokuno
Journal:  Disaster Mil Med       Date:  2017-04-08

5.  Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network.

Authors:  Wei Jiang; Zheng Wang; Jesse S Jin; Xianfeng Han; Chunguang Li
Journal:  Sensors (Basel)       Date:  2019-06-18       Impact factor: 3.576

6.  Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features.

Authors:  Tursunov Anvarjon; Soonil Kwon
Journal:  Sensors (Basel)       Date:  2020-09-12       Impact factor: 3.576

Cited by:  4 in total

1.  Human-Computer Interaction with Detection of Speaker Emotions Using Convolution Neural Networks.

Authors:  Abeer Ali Alnuaim; Mohammed Zakariah; Aseel Alhadlaq; Chitra Shashidhar; Wesam Atef Hatamleh; Hussam Tarazi; Prashant Kumar Shukla; Rajnish Ratna
Journal:  Comput Intell Neurosci       Date:  2022-03-31

2.  The Emotion Probe: On the Universality of Cross-Linguistic and Cross-Gender Speech Emotion Recognition via Machine Learning.

Authors:  Giovanni Costantini; Emilia Parada-Cabaleiro; Daniele Casali; Valerio Cesarini
Journal:  Sensors (Basel)       Date:  2022-03-23       Impact factor: 3.576

3.  Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications.

Authors:  Juraj Kacur; Boris Puterka; Jarmila Pavlovicova; Milos Oravec
Journal:  Sensors (Basel)       Date:  2022-08-22       Impact factor: 3.847

4.  Global and local feature fusion via long and short-term memory mechanism for dance emotion recognition in robot.

Authors:  Yin Lyu; Yang Sun
Journal:  Front Neurorobot       Date:  2022-08-24       Impact factor: 3.493

