Xiang Chen, Rubing Huang, Xin Li, Lei Xiao, Ming Zhou, Linghao Zhang.
Abstract
Emotional design is an important trend in interaction design. Emotional design in products plays a key role in enhancing the user experience and evoking emotional resonance in users. In recent years, strengthening the emotional design of products on the basis of the user's emotional experience has become a new direction in which most designers develop their design thinking. In emotional interaction design, the machine needs to capture the user's key information in real time, recognize the user's emotional state, and combine a variety of cues to determine the appropriate user model. Against this background, this research uses deep learning for more accurate and effective emotion recognition, thereby optimizing the design of the interactive system and improving the user experience. First, the research discusses how user characteristics such as speech, facial expression, video, and heartbeat can help machines recognize human emotions more accurately. After an analysis of these characteristics, speech is selected as the experimental material. Second, a speech-based emotion recognition method is proposed. The Mel-frequency cepstral coefficients (MFCCs) of the speech signal are used as the input of an improved long short-term memory network (ILSTM). To preserve the integrity of the information and the accuracy of the output at the next time step, ILSTM adds peephole connections to the forget gate and input gate of the LSTM, so that the cell state is fed to the gate layers as additional input. The emotional features produced by ILSTM are passed to an attention layer, where a self-attention mechanism calculates the weight of each frame of the speech signal. The speech features with higher weights are used to distinguish different emotions and complete the emotion recognition of the speech signal. Experiments on the EMO-DB and CASIA datasets verify the effectiveness of the model for emotion recognition. Finally, the feasibility of emotional interaction system design is discussed.
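The ILSTM modification described in the abstract (peephole connections on the forget and input gates) can be illustrated with a single time step. This is a minimal, dependency-free sketch rather than the authors' implementation: the diagonal peephole weights `p_f` and `p_i`, the helper names `ilstm_step` and `init_params`, and the toy dimensions are all assumptions, since this record does not give the paper's exact equations.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def ilstm_step(x, h_prev, c_prev, params):
    """One LSTM step with peepholes on the forget and input gates only (assumed form)."""
    z = h_prev + x  # concatenate previous hidden state and current input
    # Peephole terms: the previous cell state feeds both gates elementwise.
    f = [sigmoid(a + p * c) for a, p, c in
         zip(affine(params["W_f"], z, params["b_f"]), params["p_f"], c_prev)]
    i = [sigmoid(a + p * c) for a, p, c in
         zip(affine(params["W_i"], z, params["b_i"]), params["p_i"], c_prev)]
    # Candidate state and output gate are left as in a standard LSTM.
    g = [math.tanh(a) for a in affine(params["W_g"], z, params["b_g"])]
    o = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def init_params(n_in, n_hidden, seed=0):
    """Small random weights for illustration; hypothetical initialisation."""
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + n_in)]
                for _ in range(n_hidden)]

    params = {}
    for gate in "figo":
        params["W_" + gate] = mat()
        params["b_" + gate] = [0.0] * n_hidden
    for gate in "fi":
        params["p_" + gate] = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    return params
```

In use, `x` would be one frame of MFCC features and the step would be applied frame by frame, carrying `h` and `c` forward. Setting `p_f` and `p_i` to zero recovers a standard LSTM step, which makes the effect of the peepholes easy to isolate.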
Keywords: LSTM; emotion recognition; interaction design; self-attention mechanism; speech
Year: 2021 PMID: 33959083 PMCID: PMC8093774 DOI: 10.3389/fpsyg.2021.674853
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. The relationship between user experience and interaction design.
Figure 2. Interaction design framework based on emotion recognition.
Figure 3. Speech recognition framework based on the method in this paper.
Figure 4. The process of MFCC extraction.
Figure 5. ILSTM structure diagram.
Figure 6. Self-attention mechanism calculation process.
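The per-frame weighting that the abstract attributes to the self-attention layer can be sketched as follows. This is an illustrative simplification, not the calculation shown in Figure 6: it assumes scaled dot-product self-attention with queries, keys, and values all equal to the frame features (no learned projections), and it defines a frame's weight as the average attention it receives. The function name `self_attention_frame_weights` is hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_frame_weights(frames):
    """Scaled dot-product self-attention with Q = K = V = frames (assumption)."""
    n, d = len(frames), len(frames[0])
    scale = math.sqrt(d)
    # attn[i][j]: how strongly frame i attends to frame j; each row sums to 1.
    attn = [softmax([sum(a * b for a, b in zip(q, k)) / scale for k in frames])
            for q in frames]
    # Weight of frame j: average attention it receives from all frames.
    weights = [sum(attn[i][j] for i in range(n)) / n for j in range(n)]
    # Utterance-level feature: weighted sum of the frame features.
    pooled = [sum(w * f[k] for w, f in zip(weights, frames)) for k in range(d)]
    return weights, pooled
```

Frames with larger weights contribute more to the pooled utterance-level vector, which is the vector a classifier would then map to an emotion label.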
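Dataset introduction.
| Dataset | Number of emotions | Emotion categories | Number of utterances | Sampling rate |
| --- | --- | --- | --- | --- |
| EMO-DB | 7 | Neutral, angry, scared, happy, sad, disgusted, bored | 535 | 48 kHz |
| CASIA | 6 | Happy, sad, scared, angry, surprised, neutral | 9,600 | 16 kHz |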
Experimental parameter settings.
| Parameter | Value |
| --- | --- |
| Number of Mel filters | 128 |
| Number of hidden layer units of LSTM/ILSTM | 128 |
| Learning rate | 0.01 |
| Speech signal | 16 bit |
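To make the filterbank setting concrete, the sketch below places 128 triangular-filter center frequencies on the mel scale. Several details are assumptions, because the table does not state them: the HTK-style formula mel = 2595 log10(1 + f/700), the 16 kHz sample rate, the 0 Hz to Nyquist range, and the function names.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale (assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters=128, sample_rate=16000, f_min=0.0, f_max=None):
    """Center frequencies (Hz) of n_filters filters spaced evenly on the mel scale."""
    if f_max is None:
        f_max = sample_rate / 2.0  # Nyquist frequency
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_filters + 2 evenly spaced mel points: two outer edges plus one center per filter.
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * k) for k in range(1, n_filters + 1)]
```

Because the spacing is uniform in mel rather than in Hz, the centers are dense at low frequencies and sparse at high frequencies, which is the property MFCC extraction relies on.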
Recognition rate of the traditional LSTM on EMO-DB.
| Angry | 2 | 3 | 3 | 15 | 1 | 1 | |
| Bored | 0.5 | 0 | 0 | 2 | 5 | 4.5 | |
| Disgust | 15 | 11 | 1.5 | 0.5 | 5 | 0.0 | |
| Fear | 21 | 14 | 5 | 19.5 | 3.5 | 5 | |
| Happy | 54.5 | 0 | 3 | 1 | 0 | 0.5 | |
| Neutral | 1 | 57 | 0 | 0.5 | 0.5 | 2.0 | |
| Sad | 1.5 | 5 | 0 | 0 | 0 | 0 |
The bold values represent the recognition rate of the adopted method.
Recognition rate of the traditional LSTM on CASIA.
| Angry | 11 | 2 | 1 | 7.5 | 1.5 | |
| Happy | 8 | 5 | 3.5 | 13 | 1 | |
| Fear | 2 | 6 | 2 | 4 | 33 | |
| Calm | 1.5 | 7.5 | 10 | 2 | 2 | |
| Surprised | 8 | 7 | 2 | 3 | 1 | |
| Sad | 3 | 1.5 | 33 | 3.5 | 2 |
The bold values represent the recognition rate of the adopted method.
Recognition rate of ILSTM on EMO-DB.
| Angry | 3 | 3 | 2.5 | 17 | 0 | 0.5 | |
| Bored | 0.5 | 0 | 0 | 1 | 4 | 3.5 | |
| Disgust | 13 | 11 | 1 | 0 | 4.5 | 0.0 | |
| Fear | 20 | 16 | 5 | 19 | 4 | 5.5 | |
| Happy | 50 | 0 | 3 | 1 | 0 | 1 | |
| Neutral | 1 | 49 | 0 | 0.5 | 0.5 | 5.0 | |
| Sad | 1 | 7 | 1 | 0 | 0 | 0 |
The bold values represent the recognition rate of the adopted method.
Recognition rate of the proposed method on EMO-DB.
| Angry | 2 | 3 | 2 | 16 | 0 | 0 | |
| Bored | 0 | 0 | 0 | 1 | 4 | 2 | |
| Disgust | 13 | 10 | 1 | 0 | 5 | 0.0 | |
| Fear | 20 | 16 | 4 | 19 | 5 | 5 | |
| Happy | 49 | 1 | 3 | 1 | 0 | 0 | |
| Neutral | 1 | 48 | 0 | 0 | 0 | 5.0 | |
| Sad | 0.5 | 5.5 | 1 | 0 | 0 | 0 |
The bold values represent the recognition rate of the adopted method.
Recognition rate of ILSTM on CASIA.
| Angry | 10.5 | 1 | 1 | 8 | 2 | |
| Happy | 5 | 2 | 3 | 14 | 1 | |
| Fear | 3 | 5 | 1 | 2.5 | 31.5 | |
| Calm | 2 | 8 | 9.5 | 1.5 | 4 | |
| Surprised | 7 | 6 | 3 | 2 | 0.5 | |
| Sad | 2 | 3 | 34 | 2.5 | 0 |
The bold values represent the recognition rate of the adopted method.
Recognition rate of the proposed method on CASIA.
| Angry | 9 | 3 | 0 | 9 | 1 | |
| Happy | 4 | 5 | 2 | 11 | 0 | |
| Fear | 1.5 | 2.5 | 1 | 5 | 31 | |
| Calm | 1 | 4 | 10 | 6.5 | 1.5 | |
| Surprised | 6 | 5 | 4 | 1 | 1 | |
| Sad | 2 | 2 | 30.5 | 1.5 | 3 |
The bold values represent the recognition rate of the adopted method.
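For readers who want to relate such tables to a single figure per emotion, the sketch below shows how a per-class recognition rate is usually derived from a confusion matrix: the diagonal entry divided by the row total. It assumes rows are true labels and columns are predicted labels, which this record does not state, and the counts are toy values for illustration only, not results from the paper.

```python
def recognition_rates(confusion, labels):
    """Per-class recognition rate: diagonal count divided by the row total."""
    rates = {}
    for idx, (label, row) in enumerate(zip(labels, confusion)):
        total = sum(row)
        rates[label] = row[idx] / total if total else 0.0
    return rates


def overall_accuracy(confusion):
    """Fraction of all samples that fall on the diagonal."""
    correct = sum(row[i] for i, row in enumerate(confusion))
    total = sum(sum(row) for row in confusion)
    return correct / total if total else 0.0


# Toy counts for illustration only; these are NOT values from the paper.
toy_labels = ["angry", "happy", "sad"]
toy_confusion = [
    [8, 1, 1],  # true "angry"
    [2, 6, 2],  # true "happy"
    [0, 1, 9],  # true "sad"
]
```

Calling `recognition_rates(toy_confusion, toy_labels)` gives one rate per emotion, while `overall_accuracy(toy_confusion)` summarises the whole matrix in a single number.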