Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition.

Hua Zhang^1,2, Ruoyun Gou^1, Jili Shang^1, Fangyao Shen^1, Yifan Wu^1,3, Guojun Dai^1.

Abstract

Speech emotion recognition (SER) is a challenging task because affective expression varies across speakers. SER performance relies heavily on the features extracted from speech signals, and building an effective feature extraction and classification model remains difficult. In this paper, we propose a new SER method based on a Deep Convolution Neural Network (DCNN) and a Bidirectional Long Short-Term Memory with Attention (BLSTMwA) model (DCNN-BLSTMwA). First, we preprocess the speech samples by data augmentation and dataset balancing. Second, we extract three channels of log Mel-spectrograms (static, delta, and delta-delta) as the DCNN input. Then a DCNN model pre-trained on the ImageNet dataset is applied to generate segment-level features, and the segment-level features of each sentence are stacked into utterance-level features. Next, we adopt a BLSTM to learn high-level emotional features for temporal summarization, followed by an attention layer that focuses on emotionally relevant features. Finally, the learned high-level emotional features are fed into a Deep Neural Network (DNN) to predict the final emotion. Experiments on the EMO-DB and IEMOCAP databases yield an unweighted average recall (UAR) of 87.86% and 68.50%, respectively, outperforming most popular SER methods and demonstrating the effectiveness of the proposed method.
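Three pieces of the pipeline described in the abstract are easy to misread from prose alone: how the static/delta/delta-delta channels are derived, what an attention layer over BLSTM outputs computes, and why UAR differs from plain accuracy. The following is a minimal pure-Python sketch of those three pieces under stated assumptions, not the authors' implementation. The function names are hypothetical; the regression-style delta with half-width `n=2` and edge replication, and the single dot-product scoring vector `w` for attention, are assumed choices rather than details from the paper. The DCNN, BLSTM, and DNN themselves are not shown. In practice a library routine such as `librosa.feature.delta` would typically compute the delta channels.

```python
import math
from collections import defaultdict


def deltas(frames, n=2):
    """Regression-based delta coefficients along time (assumed formulation).

    frames: list of T frames, each a list of F log-Mel values.
    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2),
    with out-of-range frames replaced by the nearest edge frame.
    """
    T = len(frames)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(T):
        row = []
        for f in range(len(frames[t])):
            acc = 0.0
            for k in range(1, n + 1):
                nxt = frames[min(t + k, T - 1)][f]
                prev = frames[max(t - k, 0)][f]
                acc += k * (nxt - prev)
            row.append(acc / denom)
        out.append(row)
    return out


def three_channel(log_mel):
    """Stack static, delta and delta-delta log-Mel into a 3-channel input,
    analogous to the RGB channels an ImageNet-pretrained CNN expects."""
    d1 = deltas(log_mel)
    d2 = deltas(d1)
    return [log_mel, d1, d2]


def attention_pool(hidden, w):
    """Softmax attention pooling over time (a generic formulation; the
    paper's exact attention layer may differ).

    hidden: list of T vectors (e.g. BLSTM outputs); w: hypothetical
    learned scoring vector. Returns (pooled vector, attention weights).
    """
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in hidden]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    dim = len(hidden[0])
    pooled = [sum(a * h[j] for a, h in zip(alphas, hidden)) for j in range(dim)]
    return pooled, alphas


def uar(y_true, y_pred):
    """Unweighted average recall: the mean of per-class recalls, so every
    emotion class counts equally regardless of its sample count."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)
```

For example, with true labels `["a", "a", "a", "b"]` and predictions `["a", "a", "b", "b"]`, `uar` returns (2/3 + 1) / 2 = 5/6, about 0.833, while plain accuracy is 0.75. UAR is preferred on class-imbalanced emotion corpora such as IEMOCAP because a minority class cannot be ignored without lowering the score.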

Keywords: attention mechanism; deep convolutional neural network; deep neural network; long short-term memory; speech emotion recognition

Year: 2021 PMID: 33737889 PMCID: PMC7962985 DOI: 10.3389/fphys.2021.643202

Source DB: PubMed Journal: Front Physiol ISSN: 1664-042X Impact factor: 4.566

2 in total

1. Human-Computer Interaction for Recognizing Speech Emotions Using Multilayer Perceptron Classifier.

Authors: Abeer Ali Alnuaim; Mohammed Zakariah; Prashant Kumar Shukla; Aseel Alhadlaq; Wesam Atef Hatamleh; Hussam Tarazi; R Sureshbabu; Rajnish Ratna
Journal: J Healthc Eng Date: 2022-03-28 Impact factor: 2.682

2. Human-Computer Interaction with Detection of Speaker Emotions Using Convolution Neural Networks.

Authors: Abeer Ali Alnuaim; Mohammed Zakariah; Aseel Alhadlaq; Chitra Shashidhar; Wesam Atef Hatamleh; Hussam Tarazi; Prashant Kumar Shukla; Rajnish Ratna
Journal: Comput Intell Neurosci Date: 2022-03-31