Loan Trinh Van1, Thuy Dao Thi Le2, Thanh Le Xuan1, Eric Castelli3,4.
Abstract
Emotional expression plays a very important role in human communication, shaping the information that a speaker conveys to a partner. Humans express emotions in many forms: body language, facial expressions, eye contact, laughter, and tone of voice. Although the world's languages differ, people who do not understand a language can still grasp part of the message that the other speaker wants to convey through these emotional expressions. Among the forms of human emotional expression, emotion expressed through the voice is perhaps the most studied. This article presents our research on speech emotion recognition using deep neural networks: CNN, CRNN, and GRU. We used the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus with four emotions: anger, happiness, sadness, and neutrality. The features used for recognition include the Mel spectral coefficients and other parameters related to the spectrum and the intensity of the speech signal. Data augmentation was performed by changing the voice and adding white noise. The results show that the GRU model gave the highest average recognition accuracy, 97.47%. This result is superior to existing studies on speech emotion recognition with the IEMOCAP corpus.
Keywords: CNN; CRNN; GRU; IEMOCAP; data augmentation; emotion; recognition; speech
Year: 2022 PMID: 35214316 PMCID: PMC8877219 DOI: 10.3390/s22041414
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Distribution of the number of samples for 9 emotions.
Figure 2. The waveform of a disqualified wav file.
25 parameters belonging to set S2.
| Parameter | Number of parameters |
|---|---|
| Spectral Flatness | 1 |
| Spectral Bandwidth | 1 |
| Spectral Centroid | 1 |
| Spectral Contrast | 7 |
| Chroma | 12 |
| Pitch | 1 |
| Spectral RollOff | 1 |
| FRMS (RMS value calculated for each frame) | 1 |
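
As a quick check of the counts in the table above, the following minimal sketch sums the per-feature counts. It assumes that the S2 set consists of the 128 parameters of the S1 set plus these 25 parameters, which is how the 153-parameter figure reported with the results can be read; the dictionary and variable names are illustrative only.

```python
# Per-feature parameter counts copied from the table above.
S2_EXTRA_FEATURES = {
    "spectral_flatness": 1,
    "spectral_bandwidth": 1,
    "spectral_centroid": 1,
    "spectral_contrast": 7,
    "chroma": 12,
    "pitch": 1,
    "spectral_rolloff": 1,
    "frms": 1,
}

# Assumption: S1 holds the 128 Mel spectral coefficients.
S1_SIZE = 128

extra = sum(S2_EXTRA_FEATURES.values())
print(extra)            # 25
print(S1_SIZE + extra)  # 153
```
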
Figure 3. Illustration of the formant shift in two cases: (a) changing a female voice to sound closer to a male voice and (b) changing a male voice to sound closer to a female voice.
Figure 4. Illustration of the waveform and average signal-to-noise ratio before and after white noise addition. (a) Before adding noise. (b) After adding noise.
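
The white-noise augmentation illustrated in Figure 4 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 20 dB target, the synthetic sine-wave input, and the function names are assumptions, since the record does not state the signal-to-noise ratio or tooling that was used.

```python
import math
import random

def add_white_noise(signal, snr_db, seed=0):
    """Return the signal plus Gaussian white noise at a target SNR (dB)."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rescale so the realized noise power equals the target power.
    raw_power = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(target_noise_power / raw_power)
    return [x + scale * n for x, n in zip(signal, noise)]

def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise."""
    signal_energy = sum(x * x for x in clean)
    noise_energy = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(signal_energy / noise_energy)

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_white_noise(clean, snr_db=20.0)
snr = measured_snr_db(clean, noisy)
print(round(snr, 2))
```

Rescaling the noise to the exact target power makes the measured SNR match the requested value, which is convenient when several noise levels are compared.
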
Figure 5. Gated Recurrent Unit.
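
For readers who want the computation behind the unit in Figure 5, here is a minimal sketch of a single GRU time step in plain Python. It assumes one common formulation (update gate z, reset gate r, candidate state, and new state h' = (1 - z) * h + z * candidate); some libraries swap the roles of z and 1 - z, and the parameter names are illustrative.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def gate(W, U, b, x, h, activation):
    """activation(W x + U h + b), computed element-wise."""
    return [activation(wx + uh + bi)
            for wx, uh, bi in zip(matvec(W, x), matvec(U, h), b)]

def gru_step(x, h, p):
    """One GRU step; p maps names to weight matrices and bias vectors."""
    z = gate(p["Wz"], p["Uz"], p["bz"], x, h, sigmoid)   # update gate
    r = gate(p["Wr"], p["Ur"], p["br"], x, h, sigmoid)   # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = gate(p["Wh"], p["Uh"], p["bh"], x, reset_h, math.tanh)
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

# Hypothetical toy parameters: all zeros, 2 inputs, 2 hidden units.
zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {name: zeros for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
params.update({name: [0.0, 0.0] for name in ("bz", "br", "bh")})

h_next = gru_step([1.0, 2.0], [0.4, -0.2], params)
print(h_next)
```

With all-zero parameters both gates equal 0.5 and the candidate state is zero, so the step simply halves the previous hidden state.
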
Configuration of the CNN model with five convolutional layers for 128 parameters (model: “sequential”).
| Layer (Type) | Output Shape | Param # |
|---|---|---|
| BatchNormalization-1 | (None, 372, 128, 1) | 1488 |
| Conv2D-1 | (None, 372, 128, 64) | 640 |
| BatchNormalization-2 | (None, 372, 128, 64) | 256 |
| ELU-1 | (None, 372, 128, 64) | 0 |
| MaxPooling2D-1 | (None, 186, 64, 64) | 0 |
| Dropout-1 | (None, 186, 64, 64) | 0 |
| Conv2D-2 | (None, 186, 64, 128) | 73856 |
| BatchNormalization-3 | (None, 186, 64, 128) | 512 |
| ELU-2 | (None, 186, 64, 128) | 0 |
| MaxPooling2D-2 | (None, 93, 32, 128) | 0 |
| Dropout-2 | (None, 93, 32, 128) | 0 |
| Conv2D-3 | (None, 93, 32, 128) | 147584 |
| BatchNormalization-4 | (None, 93, 32, 128) | 512 |
| ELU-3 | (None, 93, 32, 128) | 0 |
| MaxPooling2D-3 | (None, 46, 16, 128) | 0 |
| Dropout-3 | (None, 46, 16, 128) | 0 |
| Conv2D-4 | (None, 46, 16, 128) | 147584 |
| BatchNormalization-5 | (None, 46, 16, 128) | 512 |
| ELU-4 | (None, 46, 16, 128) | 0 |
| MaxPooling2D-4 | (None, 15, 5, 128) | 0 |
| Dropout-4 | (None, 15, 5, 128) | 0 |
| Conv2D-5 | (None, 15, 5, 64) | 73792 |
| BatchNormalization-6 | (None, 15, 5, 64) | 256 |
| ELU-5 | (None, 15, 5, 64) | 0 |
| MaxPooling2D-5 | (None, 5, 1, 64) | 0 |
| Dropout-5 | (None, 5, 1, 64) | 0 |
| Flatten | (None, 320) | 0 |
| Dense-1 | (None, 128) | 41088 |
| ELU-6 | (None, 128) | 0 |
| Dropout-6 | (None, 128) | 0 |
| Dense-2 | (None, 4) | 516 |
| Total params: 488596 | ||
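
The output shapes and parameter counts in the CNN table can be cross-checked with a short sketch. The kernel size (3x3), "same" padding, and pool sizes (2, 2, 2, 3, 3) are assumptions inferred from the shapes and counts in the table; they are not stated in this record.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Assumed (filters, pool size) per convolutional block.
blocks = [(64, 2), (128, 2), (128, 2), (128, 3), (64, 3)]

height, width, channels = 372, 128, 1
conv_counts = []
for filters, pool in blocks:
    conv_counts.append(conv2d_params(3, channels, filters))
    # Non-overlapping pooling floors each spatial dimension.
    height, width, channels = height // pool, width // pool, filters

flat = height * width * channels
print(conv_counts)  # [640, 73856, 147584, 147584, 73792]
print(flat)         # 320
print(dense_params(flat, 128), dense_params(128, 4))  # 41088 516
```
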
Configuration of the GRU model with 128 parameters (model: “sequential”).
| Layer (Type) | Output Shape | Param # |
|---|---|---|
| BatchNormalization | (None, 372, 128) | 1488 |
| GRU-1 | (None, 372, 256) | 296448 |
| Dropout-1 | (None, 372, 256) | 0 |
| GRU-2 | (None, 512) | 1182720 |
| Dropout-2 | (None, 512) | 0 |
| Dense-1 | (None, 128) | 65664 |
| Activation | (None, 128) | 0 |
| Dropout-3 | (None, 128) | 0 |
| Dense-2 | (None, 4) | 516 |
| Total params: 1546836 | ||
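
The GRU layer sizes above are consistent with a GRU that keeps three gates, each with an input kernel, a recurrent kernel, and two bias vectors. That variant is an assumption inferred from the counts, not something this record states. A minimal sketch of the arithmetic:

```python
def gru_params(input_dim, units):
    """Three gates, each with input and recurrent kernels and two biases."""
    return 3 * (units * (input_dim + units) + 2 * units)

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

layers = {
    "BatchNormalization": 1488,  # taken directly from the table
    "GRU-1": gru_params(128, 256),
    "GRU-2": gru_params(256, 512),
    "Dense-1": dense_params(512, 128),
    "Dense-2": dense_params(128, 4),
}
total = sum(layers.values())
print(layers["GRU-1"], layers["GRU-2"])  # 296448 1182720
print(total)                             # 1546836
```
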
Configuration of the CRNN model for 128 parameters.
| Layer (Type) | Output Shape | Param # | Connected to |
|---|---|---|---|
| Input Layer | (None, 128, 372, 1) | 0 | |
| Conv2D-1 | (None, 128, 372, 64) | 640 | Input Layer |
| Conv2D-2 | (None, 128, 372, 128) | 73856 | Conv2D-1 |
| Conv2D-3 | (None, 128, 372, 256) | 295168 | Conv2D-2 |
| BatchNormalization-3 (BN3) | (None, 128, 372, 256) | 1024 | Conv2D-3 |
| MaxPooling2D-3 (MP3) | (None, 64, 186, 256) | 0 | BN3 |
| Conv2D-4 | (None, 64, 186, 256) | 590080 | MP3 |
| Conv2D-5 | (None, 64, 186, 512) | 1180160 | Conv2D-4 |
| BatchNormalization-5(BN5) | (None, 64, 186, 512) | 2048 | Conv2D-5 |
| MaxPooling2D-5 (MP5) | (None, 32, 93, 512) | 0 | BN5 |
| Conv2D-6 | (None, 32, 93, 512) | 2359808 | MP5 |
| Conv2D-7 | (None, 32, 93, 512) | 2359808 | Conv2D-6 |
| BatchNormalization-7(BN7) | (None, 32, 93, 512) | 2048 | Conv2D-7 |
| Reshape | (None, 32, 47616) | 0 | BN7 |
| Dense (Fc9) | (None, 32, 128) | 6094976 | Reshape |
| LSTM1 | (None, 32, 128) | 131584 | Fc9 |
| LSTM2 | (None, 32, 128) | 131584 | Fc9 |
| Add | (None, 32, 128) | 0 | LSTM1, LSTM2 |
| LSTM3 | (None, 32, 128) | 131584 | Add |
| LSTM4 | (None, 32, 128) | 131584 | Add |
| Concatenate | (None, 32, 256) | 0 | LSTM3, LSTM4 |
| Dropout1 | (None, 32, 256) | 0 | Concatenate |
| Flatten | (None, 8192) | 0 | Dropout1 |
| Dense1 | (None, 512) | 4194816 | Flatten |
| Dropout2 | (None, 512) | 0 | Dense1 |
| Dense2 | (None, 4) | 2052 | Dropout2 |
| Total params: 17682820 | |||
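
The recurrent and fully connected sizes in the CRNN table follow from standard parameter formulas. The sketch below assumes a conventional LSTM with four gates and a single bias vector per gate, which matches the counts listed; the helper names are illustrative.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input and recurrent kernels and one bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

# Reshape merges the last two axes of (32, 93, 512) into one feature axis.
reshape_features = 93 * 512
fc9 = dense_params(reshape_features, 128)
lstm = lstm_params(128, 128)

# Flatten turns the (32, 256) sequence into a single vector.
flatten = 32 * 256
dense1 = dense_params(flatten, 512)

print(reshape_features, fc9)  # 47616 6094976
print(lstm)                   # 131584
print(flatten, dense1)        # 8192 4194816
```
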
Average recognition accuracy (%) for CNN, CRNN, and GRU models.
| Fold | CNN S1 | CNN S2 | CRNN S1 | CRNN S2 | GRU S1 | GRU S2 |
|---|---|---|---|---|---|---|
| 1 | 96.78 | 96.83 | 96.94 | 94.72 | 95.04 | 97.57 |
| 2 | 96.57 | 97.04 | 96.46 | 97.20 | 94.30 | 96.83 |
| 3 | 96.73 | 97.63 | 96.09 | 96.31 | 96.20 | 97.84 |
| 4 | 96.25 | 96.83 | 98.10 | 95.46 | 95.25 | 96.94 |
| 5 | 96.57 | 97.47 | 95.46 | 98.31 | 95.73 | 97.57 |
| 6 | 95.62 | 96.99 | 97.68 | 95.83 | 95.36 | 97.52 |
| 7 | 95.73 | 96.52 | 98.05 | 96.41 | 96.41 | 97.94 |
| 8 | 96.57 | 96.09 | 98.94 | 98.42 | 95.46 | 97.73 |
| 9 | 96.15 | 97.20 | 96.89 | 97.94 | 96.04 | 97.31 |
| Average | 96.33 | 96.96 | 97.18 | 96.73 | 95.53 | 97.47 |
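
The headline figure in the abstract (97.47% for GRU) can be reproduced as the plain mean of the per-fold accuracies in the last column of the table above (GRU with the S2 feature set). A minimal check, with the values copied from that column:

```python
# Per-fold accuracy (%) for GRU with the S2 feature set, from the table above.
gru_s2 = [97.57, 96.83, 97.84, 96.94, 97.57, 97.52, 97.94, 97.73, 97.31]

mean_accuracy = sum(gru_s2) / len(gru_s2)
print(round(mean_accuracy, 2))  # 97.47
```
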
Precision, recall, f1-score, and AUC for CNN model with 153 parameters.
| Fold | Precision ANG | Precision EXC | Precision SAD | Precision NEU | Recall ANG | Recall EXC | Recall SAD | Recall NEU | f1-score ANG | f1-score EXC | f1-score SAD | f1-score NEU | AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 96.09 | 97.39 | 98.54 | 95.95 | 97.21 | 92.56 | 99.75 | 97.41 | 96.65 | 94.91 | 99.14 | 96.67 | 0.997 |
| 2 | 96.53 | 97.91 | 99.26 | 95.54 | 96.98 | 92.80 | 99.75 | 98.02 | 96.75 | 95.29 | 99.51 | 96.76 | 0.997 |
| 3 | 97.25 | 97.46 | 98.78 | 97.25 | 98.84 | 95.29 | 99.75 | 96.95 | 98.04 | 96.36 | 99.26 | 97.10 | 0.998 |
| 4 | 96.98 | 97.12 | 99.27 | 95.10 | 97.21 | 92.06 | 100.0 | 97.56 | 97.10 | 94.52 | 99.63 | 96.31 | 0.997 |
| 5 | 97.24 | 98.69 | 99.02 | 95.98 | 98.14 | 93.30 | 99.75 | 98.17 | 97.69 | 95.92 | 99.39 | 97.06 | 0.997 |
| 6 | 95.50 | 98.40 | 98.31 | 96.39 | 98.60 | 91.32 | 100.0 | 97.56 | 97.03 | 94.72 | 99.15 | 96.97 | 0.997 |
| 7 | 96.74 | 95.44 | 98.54 | 95.76 | 96.51 | 93.55 | 99.51 | 96.49 | 96.62 | 94.49 | 99.02 | 96.13 | 0.997 |
| 8 | 96.29 | 96.78 | 99.02 | 93.84 | 96.51 | 89.58 | 99.75 | 97.56 | 96.40 | 93.04 | 99.39 | 95.67 | 0.996 |
| 9 | 97.24 | 97.15 | 98.31 | 96.52 | 98.37 | 93.05 | 100.0 | 97.26 | 97.80 | 95.06 | 99.15 | 96.89 | 0.995 |
| Average | | 97.37 | | 95.81 | 97.60 | | | 97.44 | 97.12 | | | 96.62 | 0.997 |
| Overall average | | | | | recall: 96.87 | | | | f1-score: 96.99 | | | | |
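
Per-class precision, recall, and f1-score such as those tabulated here are conventionally derived from a confusion matrix. The sketch below shows that derivation on a made-up 4-class matrix (rows are true classes, columns are predicted classes); the matrix values are hypothetical and are not taken from the study.

```python
def per_class_metrics(cm):
    """Return (precision, recall, f1) per class from a square confusion matrix.

    Assumes every class is predicted at least once and occurs at least once.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision = tp / predicted_k
        recall = tp / actual_k
        f1 = 2 * precision * recall / (precision + recall)
        results.append((precision, recall, f1))
    return results

# Hypothetical confusion matrix, class order: ANG, EXC, SAD, NEU.
cm = [
    [90, 5, 0, 5],
    [10, 80, 0, 10],
    [0, 0, 100, 0],
    [5, 5, 0, 90],
]
metrics = per_class_metrics(cm)
for name, (p, r, f1) in zip(["ANG", "EXC", "SAD", "NEU"], metrics):
    print(name, round(100 * p, 2), round(100 * r, 2), round(100 * f1, 2))
```
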
Precision, recall, f1-score, and AUC for CRNN model with 153 parameters.
| Fold | Precision ANG | Precision EXC | Precision SAD | Precision NEU | Recall ANG | Recall EXC | Recall SAD | Recall NEU | f1-score ANG | f1-score EXC | f1-score SAD | f1-score NEU | AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 96.19 | 93.83 | 95.51 | 93.82 | 93.95 | 90.57 | 99.51 | 94.82 | 95.06 | 92.17 | 97.47 | 94.31 | 0.995 |
| 2 | 96.51 | 96.98 | 99.02 | 96.67 | 96.51 | 95.78 | 99.26 | 97.26 | 96.51 | 96.38 | 99.14 | 96.96 | 0.999 |
| 3 | 97.38 | 97.14 | 97.34 | 94.53 | 95.12 | 92.80 | 99.26 | 97.41 | 96.24 | 94.92 | 98.29 | 95.95 | 0.996 |
| 4 | 95.33 | 97.05 | 98.05 | 93.12 | 94.88 | 89.83 | 99.26 | 96.95 | 95.10 | 93.30 | 98.65 | 95.00 | 0.996 |
| 5 | 99.53 | 97.75 | 98.30 | 97.88 | 98.14 | 97.02 | 99.75 | 98.32 | 98.83 | 97.38 | 99.02 | 98.10 | 0.999 |
| 6 | 96.44 | 96.58 | 98.30 | 93.55 | 94.42 | 91.07 | 99.75 | 97.26 | 95.42 | 93.74 | 99.02 | 95.37 | 0.994 |
| 7 | 96.69 | 95.90 | 98.78 | 95.10 | 95.12 | 92.80 | 99.51 | 97.56 | 95.90 | 94.33 | 99.14 | 96.31 | 0.997 |
| 8 | 98.83 | 97.77 | 99.27 | 98.02 | 97.91 | 97.77 | 100.0 | 98.17 | 98.36 | 97.77 | 99.63 | 98.10 | 0.999 |
| 9 | 97.48 | 97.95 | 99.02 | 97.58 | 98.84 | 94.79 | 99.51 | 98.32 | 98.15 | 96.34 | 99.26 | 97.95 | 0.998 |
| Average | 97.15 | | | | 96.1 | | | 97.34 | 96.62 | | | 96.45 | 0.997 |
| Overall average | precision: 96.92 | | | | recall: 96.64 | | | | f1-score: 96.77 | | | | |
Precision, recall, f1-score, and AUC for GRU model with 153 parameters.
| Fold | Precision ANG | Precision EXC | Precision SAD | Precision NEU | Recall ANG | Recall EXC | Recall SAD | Recall NEU | f1-score ANG | f1-score EXC | f1-score SAD | f1-score NEU | AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 97.42 | 97.69 | 100.0 | 96.16 | 96.51 | 94.54 | 99.01 | 99.24 | 96.96 | 96.09 | 99.50 | 97.67 | 0.995 |
| 2 | 98.10 | 94.58 | 98.29 | 96.51 | 95.81 | 95.29 | 99.26 | 96.95 | 96.94 | 94.93 | 98.77 | 96.73 | 0.999 |
| 3 | 98.11 | 96.51 | 98.78 | 97.88 | 96.51 | 96.03 | 100.0 | 98.48 | 97.30 | 96.27 | 99.39 | 98.18 | 0.996 |
| 4 | 96.90 | 94.53 | 99.02 | 97.15 | 94.65 | 94.29 | 99.26 | 98.63 | 95.76 | 94.41 | 99.14 | 97.88 | 0.996 |
| 5 | 99.27 | 94.66 | 99.51 | 97.14 | 95.35 | 96.77 | 99.51 | 98.32 | 97.27 | 95.71 | 99.51 | 97.73 | 0.999 |
| 6 | 96.96 | 96.24 | 99.75 | 97.29 | 96.51 | 95.29 | 99.51 | 98.32 | 96.74 | 95.76 | 99.63 | 97.80 | 0.994 |
| 7 | 98.35 | 95.89 | 99.50 | 98.02 | 96.74 | 98.51 | 98.77 | 97.87 | 97.54 | 97.18 | 99.13 | 97.94 | 0.997 |
| 8 | 98.35 | 95.58 | 99.51 | 97.57 | 96.74 | 96.53 | 99.75 | 97.87 | 97.54 | 96.05 | 99.63 | 97.72 | 0.999 |
| 9 | 98.81 | 96.46 | 99.26 | 95.69 | 96.51 | 94.54 | 99.51 | 98.17 | 97.65 | 95.49 | 99.38 | 96.91 | 0.998 |
| Average | 98.03 | | | 97.05 | 96.15 | | | 98.21 | 97.08 | | | 97.62 | 0.997 |
| Overall average | precision: 97.54 | | | | recall: 97.38 | | | | f1-score: 97.45 | | | | |
Precision, recall, f1-score, and AUC for CNN model with 128 parameters.
| Fold | Precision ANG | Precision EXC | Precision SAD | Precision NEU | Recall ANG | Recall EXC | Recall SAD | Recall NEU | f1-score ANG | f1-score EXC | f1-score SAD | f1-score NEU | AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 96.15 | 96.16 | 97.83 | 96.91 | 98.6 | 93.3 | 100.0 | 95.73 | 97.36 | 94.71 | 98.90 | 96.32 | 0.998 |
| 2 | 95.66 | 98.40 | 97.13 | 95.78 | 97.44 | 91.56 | 100.0 | 96.95 | 96.54 | 94.86 | 98.54 | 96.36 | 0.997 |
| 3 | 96.36 | 95.65 | 98.78 | 96.33 | 98.37 | 92.80 | 100.0 | 96.04 | 97.35 | 94.21 | 99.39 | 96.18 | 0.997 |
| 4 | 96.99 | 95.40 | 97.36 | 95.58 | 97.44 | 92.56 | 99.75 | 95.58 | 97.22 | 93.95 | 98.54 | 95.58 | 0.997 |
| 5 | 96.53 | 95.91 | 98.30 | 95.91 | 96.98 | 93.05 | 99.75 | 96.49 | 96.75 | 94.46 | 99.02 | 96.20 | 0.997 |
| 6 | 93.74 | 96.77 | 97.36 | 95.15 | 97.44 | 89.08 | 100.0 | 95.73 | 95.55 | 92.76 | 98.66 | 95.44 | 0.996 |
| 7 | 97.38 | 96.51 | 97.83 | 93.01 | 95.12 | 89.33 | 100.0 | 97.41 | 96.24 | 92.78 | 98.90 | 95.16 | 0.995 |
| 8 | 97.18 | 96.39 | 98.78 | 94.94 | 96.28 | 92.8 | 99.51 | 97.26 | 96.73 | 94.56 | 99.14 | 96.08 | 0.998 |
| 9 | 96.96 | 97.60 | 98.06 | 93.69 | 96.28 | 90.82 | 99.51 | 97.26 | 96.62 | 94.09 | 98.78 | 95.44 | 0.995 |
| Average | | 96.53 | | 95.26 | 97.11 | | | 96.49 | 96.71 | | | 95.86 | 0.997 |
| Overall average | | | | | recall: 96.29 | | | | f1-score: 96.37 | | | | |
Precision, recall, f1-score, and AUC for CRNN model with 128 parameters.
| Fold | Precision ANG | Precision EXC | Precision SAD | Precision NEU | Recall ANG | Recall EXC | Recall SAD | Recall NEU | f1-score ANG | f1-score EXC | f1-score SAD | f1-score NEU | AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 97.18 | 97.69 | 97.82 | 95.81 | 96.28 | 94.29 | 99.26 | 97.56 | 96.73 | 95.96 | 98.53 | 96.68 | 0.998 |
| 2 | 97.16 | 97.12 | 97.82 | 94.84 | 95.58 | 92.06 | 99.26 | 98.02 | 96.37 | 94.52 | 98.53 | 96.40 | 0.997 |
| 3 | 96.24 | 96.92 | 96.63 | 95.19 | 95.35 | 93.55 | 98.77 | 96.49 | 95.79 | 95.2 | 97.69 | 95.84 | 0.998 |
| 4 | 97.45 | 97.97 | 99.02 | 98.03 | 97.67 | 96.03 | 99.51 | 98.78 | 97.56 | 96.99 | 99.26 | 98.41 | 0.999 |
| 5 | 94.87 | 93.52 | 97.81 | 95.57 | 94.65 | 93.05 | 99.01 | 95.27 | 94.76 | 93.28 | 98.41 | 95.42 | 0.995 |
| 6 | 98.12 | 98.71 | 98.06 | 96.56 | 97.21 | 95.04 | 99.51 | 98.48 | 97.66 | 96.84 | 98.78 | 97.51 | 0.999 |
| 7 | 98.58 | 97.99 | 99.02 | 97.14 | 96.98 | 97.02 | 99.75 | 98.32 | 97.77 | 97.51 | 99.39 | 97.73 | 0.999 |
| 8 | 98.84 | 100.0 | 99.02 | 98.34 | 99.07 | 97.27 | 100.0 | 99.24 | 98.95 | 98.62 | 99.51 | 98.79 | 1.000 |
| 9 | 97.67 | 98.15 | 98.77 | 94.57 | 97.67 | 92.31 | 98.52 | 98.17 | 97.67 | 95.14 | 98.64 | 96.34 | 0.996 |
| Average | | 97.56 | | 96.23 | 96.72 | | | 97.81 | 97.03 | | | 97.01 | 0.998 |
| Overall average | precision: 97.34 | | | | recall: 97.08 | | | | f1-score: 97.2 | | | | |
Precision, recall, f1-score, and AUC for GRU model with 128 parameters.
| Fold | Precision ANG | Precision EXC | Precision SAD | Precision NEU | Recall ANG | Recall EXC | Recall SAD | Recall NEU | f1-score ANG | f1-score EXC | f1-score SAD | f1-score NEU | AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 94.78 | 93.47 | 98.53 | 93.98 | 97.21 | 88.83 | 98.77 | 95.12 | 95.98 | 91.09 | 98.65 | 94.55 | 0.994 |
| 2 | 95.09 | 91.84 | 98.78 | 92.47 | 94.65 | 86.60 | 99.75 | 95.43 | 94.87 | 89.14 | 99.26 | 93.92 | 0.993 |
| 3 | 97.20 | 94.18 | 98.54 | 95.31 | 96.98 | 92.31 | 99.51 | 96.04 | 97.09 | 93.23 | 99.02 | 95.67 | 0.996 |
| 4 | 95.40 | 94.16 | 98.28 | 93.93 | 96.51 | 88.09 | 98.77 | 96.65 | 95.95 | 91.03 | 98.53 | 95.27 | 0.991 |
| 5 | 94.97 | 94.29 | 99.26 | 94.89 | 96.51 | 90.07 | 99.51 | 96.34 | 95.73 | 92.13 | 99.38 | 95.61 | 0.994 |
| 6 | 95.18 | 94.01 | 98.53 | 94.31 | 96.51 | 89.58 | 98.77 | 96.04 | 95.84 | 91.74 | 98.65 | 95.17 | 0.995 |
| 7 | 96.08 | 95.31 | 99.26 | 95.53 | 96.98 | 90.82 | 99.26 | 97.71 | 96.53 | 93.01 | 99.26 | 96.61 | 0.996 |
| 8 | 94.12 | 92.73 | 99.75 | 95.40 | 96.74 | 91.81 | 98.77 | 94.82 | 95.41 | 92.27 | 99.26 | 95.11 | 0.994 |
| 9 | 96.54 | 94.78 | 98.54 | 94.90 | 97.44 | 90.07 | 99.75 | 96.49 | 96.99 | 92.37 | 99.14 | 95.69 | 0.992 |
| Average | 95.48 | | | 94.52 | 96.61 | | | 96.07 | 96.04 | | | 95.29 | 0.994 |
| Overall average | precision: 95.67 | | | | recall: 95.42 | | | | f1-score: 95.53 | | | | |
Recapitulation of the average values of accuracy, precision, recall, f1-score, and AUC for 3 models with 128 (S1 set) and 153 (S2 set) feature parameters.
| Feature set | Accuracy (highest) | Accuracy (lowest) | Precision (highest) | Precision (lowest) | Recall (highest) | Recall (lowest) | f1-score (highest) | f1-score (lowest) | AUC (highest) | AUC (lowest) |
|---|---|---|---|---|---|---|---|---|---|---|
| S1 | 97.18 | 95.53 | 97.34 | 95.67 | 97.08 | 95.42 | 97.2 | 95.53 | 0.998 | 0.994 |
| S2 | 97.47 | 96.96 (CNN) | 97.54 | 96.92 | 97.38 | 96.64 | 97.45 | 96.77 | 0.9973 | |
Figure 6. Correlation between the 25 parameters and recognition accuracy. specc: spectral centroid, specr: spectral rolloff, specf: spectral flatness, specb: spectral bandwidth, spec0-6: spectral contrast, chro0-11: chroma, frms: root-mean-square (RMS) value calculated for each frame.
Figure 7. (a,d,g) Examples of loss and accuracy variation by epoch for training and validation for one fold; (b,e,h) ROC curves (class 0: anger, class 1: exc, class 2: sad, class 3: neu); and (c,f,i) confusion matrices.
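
For a multi-class problem, an ROC curve per class is usually obtained one-vs-rest, and the area under each curve equals the probability that a randomly chosen sample of that class receives a higher score than a randomly chosen sample of any other class. The sketch below computes that area directly from pairwise comparisons; the scores and labels are hypothetical, and this is not necessarily how the AUC values reported here were computed.

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC for one class against all others via pairwise score comparison.

    scores[i] is the model score for class `positive` on sample i.
    Ties between a positive and a negative sample count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores for class 0 (anger) on five samples.
labels = [0, 0, 1, 2, 3]
scores_class0 = [0.9, 0.4, 0.6, 0.2, 0.1]

auc = auc_one_vs_rest(scores_class0, labels, positive=0)
print(round(auc, 3))
```
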
Accuracy, AUC, precision, recall, and f1-score for each emotion for the GRU model using 153 parameters without data augmentation.
| Fold | Accuracy | Precision ANG | Precision EXC | Precision SAD | Precision NEU | Recall ANG | Recall EXC | Recall SAD | Recall NEU | f1-score ANG | f1-score EXC | f1-score SAD | f1-score NEU | AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 73.68 | 70.87 | 63.51 | 78.29 | 76.33 | 67.59 | 46.53 | 99.02 | 78.66 | 69.19 | 53.71 | 87.45 | 77.48 | 0.919 |
| 2 | 76.84 | 80.21 | 70.15 | 92.31 | 69.71 | 71.3 | 46.53 | 94.12 | 88.41 | 75.49 | 55.95 | 93.2 | 77.96 | 0.926 |
| 3 | 76.21 | 78 | 62.2 | 92.23 | 72.63 | 72.22 | 50.5 | 93.14 | 84.15 | 75 | 55.74 | 92.68 | 77.97 | 0.916 |
| 4 | 76.42 | 75.93 | 61.36 | 91.43 | 75.29 | 75.93 | 53.47 | 94.12 | 79.88 | 75.93 | 57.14 | 92.75 | 77.51 | 0.918 |
| 5 | 75.79 | 78.43 | 72.92 | 92.31 | 67.42 | 74.07 | 34.65 | 94.12 | 90.85 | 76.19 | 46.98 | 93.2 | 77.4 | 0.921 |
| 6 | 76.21 | 83.33 | 60.71 | 91.59 | 71.13 | 69.44 | 50.5 | 96.08 | 84.15 | 75.76 | 55.14 | 93.78 | 77.09 | 0.936 |
| 7 | 77.89 | 78.5 | 62.07 | 96.04 | 75 | 77.78 | 53.47 | 95.1 | 82.32 | 78.14 | 57.45 | 95.57 | 78.49 | 0.927 |
| 8 | 75.79 | 80.65 | 59.3 | 91.51 | 72.11 | 69.44 | 50.5 | 95.1 | 83.54 | 74.63 | 54.55 | 93.27 | 77.4 | 0.923 |
| 9 | 73.68 | 69.17 | 58.46 | 87.27 | 73.89 | 76.85 | 37.62 | 94.12 | 81.1 | 72.81 | 45.78 | 90.57 | 77.33 | 0.913 |
| Average | | | 63.41 | | 72.61 | 72.74 | | | 83.67 | 74.79 | | | 77.63 | 0.922 |
| Overall average | | precision: 75.9 | | | | recall: 74.62 | | | | f1-score: 74.63 | | | | |
Summary of research results on speech emotion recognition with IEMOCAP and for four emotions.
| Ref. | Year | Model | Parameters | Average Accuracy (%) |
|---|---|---|---|---|
| [ | 2021 | Acoustic Segment Model (ASM), DNN | Latent Semantic Analysis (LSA) with HMM for ASM | 73.90 |
| [ | 2021 | PATHOSnet (Parallel, Audio-Textual, Hybrid Organisation for emotionS network) | Linguistic features + spectrogram | 80.40 |
| [ | 2021 | SSA-CRNN-r (Self Speaker Attentive Convolutional Neural Network-regularization) | 3-D Log-Mel spectrograms (with delta and delta-deltas) | 95.90 |
| [ | 2021 | FaceNet | Spectrogram | 68.96 |
| [ | 2020 | CRNN deep learning model based on Focal Loss | Spectrogram | 69.33 |
| [ | 2020 | Deep stride convolutional neural network (DSCNN) | Spectrogram | 81.75 |
| [ | 2020 | Combination of DCNN with a SincNet layer, RNN | Combined acoustic and textual data | 80.51 |
| [ | 2020 | Hybrid architecture: DenseBlock + LSTM | Spectrogram | 64.10 |
| [ | 2020 | DCNN | Spectrogram | 83.80 |
| [ | 2020 | Attention-based Convolutional Neural Networks (ACNN) | MFCC | 76.18 |
| [ | 2020 | LSTM | Log-spectra of short-time Fourier transforms (STFTs) | 58.80 |
| [ | 2020 | CNN | Spectrograms and MFCC (Mel-Frequency Cepstral Coefficients) | 74.30 |
| [ | 2020 | Meta Multi-task Learning (MMTL), Meta Learner (CNN+LSTM) + Transfer Learner (Fully Connected Layer) | Spectrogram | 76.64 |
| [ | 2020 | LSTM | MelSpectrogram | 73.00 |
| [ | 2020 | 1D convolutions + Bi-LSTM | Both audio and text information | 72.82 |
| [ | 2020 | CNN, LSTM | 3D log-Mel spectrograms | 80.80 |
| [ | 2020 | DCNN | MFCC, chromagram, Mel-scale spectrogram, tonnetz, spectral contrast | 64.30 |
| [ | 2020 | Bidirectional LSTM | Spectrogram | 71.70 |
| [ | 2019 | Interaction-Aware Attention Network (IAAN) | MFCC, pitch | 66.30 |
| [ | 2019 | DFF-ATMF (Deep Feature Fusion-Audio and Text Modality Fusion), LSTM | MFCC, spectral centroid, chroma stft, spectral contrast (Audio modality + text modality) | 81.37 |
| [ | 2019 | CNN | Text & MFCC | 76.10 |
| [ | 2019 | Emoception Network drawing inspiration from Inception Network | MFSC (Mel-Frequency Spectral Coefficients) | 75.90 |
| [ | 2019 | Multi-head Self-attention + Global Context-aware Attention Long Short-Term Memory recurrent neural network (GCA-LSTM) | MFCC, F0, energy | 79.20 |
| [ | 2018 | LSTM, CNN | MFCC, zero-crossing rate, short-term energy, short-term entropy of energy, spectral centroid and spread, spectral entropy, spectral flux, spectral rolloff | 62.72 |
| [ | 2018 | Multi-channel CNN | Phoneme & Spectrogram | 73.90 |
| [ | 2018 | Fully convolutional network (FCN) + Attention layer | Spectrogram | 70.40 |
| [ | 2017 | CNN, Combined CNN, and LSTM | Spectrogram | 68.00 |
| [ | 2014 | SVM | Low-level acoustic features and derivation, cepstral-based features, GMM supervectors | 71.90 |
| Our Research | 2021 | GRU, CRNN, CNN | 153 parameters | 97.47 (GRU) |