Xu Wu1, Qian Zhang2.
Abstract
The rapid development of computer technology and artificial intelligence is changing people's daily lives, in which spoken language is the most common means of communication. To make the emotional information carried by voice signals usable in artificial intelligence products, this article proposes a voice emotion recognition design, based on an RBF network, for smart home products aimed at aging users. Focusing on smart home design and on the weaker adaptability and learning ability of the aging population, the authors propose a speech emotion recognition method based on a hybrid Hidden Markov Model/Radial Basis Function Neural Network (HMM/RBF) model. The method combines the strong dynamic time-series modeling capability of the HMM with the strong classification and decision-making ability of the RBF network, and combining the two greatly improves the speech emotion recognition rate. Furthermore, by introducing a dynamic optimal learning rate, the convergence time of the network is reduced to 40.25 s and operating efficiency is improved. MATLAB simulation tests show that the recognition speed of the HMM/RBF hybrid model is 9.82-12.28% higher than that of the HMM model or the RBF model alone, confirming the accuracy and superiority of the algorithm and model.
Keywords: HMM; RBF; aging users; artificial intelligence; dynamic optimal learning rate; speech emotion recognition
Year: 2022 PMID: 35602743 PMCID: PMC9114816 DOI: 10.3389/fpsyg.2022.882709
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
FIGURE 1 Schematic diagram of the artificial intelligence interaction process.
Definitions of basic emotions.
| Scholar | Basic emotion |
| Plutchik | Acceptance, anger, anticipation, disgust, joy, fear, sadness, surprise |
| Arnold | Anger, aversion, courage, dejection, desire, despair, hate, hope, love, sadness |
| Ekman | Anger, disgust, fear, joy, sadness, surprise |
| Frijda | Desire, happiness, interest, surprise, wonder, sorrow |
| Gray | Rage, terror, anxiety, joy |
| Izard | Anger, contempt, disgust, distress, fear, guilt, interest, joy, shame, surprise |
| James | Fear, grief, love, rage |
FIGURE 2 The process of speech signal preprocessing.
FIGURE 3 Windowing of speech signals.
FIGURE 4 Pitch frequency of anger emotion.
FIGURE 5 Pitch frequency of happy emotion.
Formant frequency and bandwidth values.
| Formant frequency Fi | 610.97 | 2555.72 | 4645.03 | 5958.66 |
| Bandwidth Bi | 595.35 | 698.61 | 583.90 | 594.35 |
FIGURE 6 Power spectrum curve of channel transfer function.
FIGURE 7 Mel frequency cepstral coefficient extraction process.
FIGURE 8 Schematic diagram of the composition of the HMM model.
FIGURE 9 Mixed model neural network training process.
HMM-RBF hybrid model algorithm.
| len = length(x); % Vector length |
| max_x = max(x); % Maximum value of the vector |
| min_x = min(x); % Minimum value of the vector |
| y = zeros(size(x)); % Preallocate the output vector |
| for i = 1:len |
| y(i) = (x(i)-min_x)/(max_x-min_x); % Min-max normalization to the interval [0, 1] |
| end |
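The normalization step in the listing above divides by the difference between the maximum and the minimum, which is zero when every element of the vector is equal. The following is a minimal Python sketch of the same min-max scaling, not the authors' MATLAB code. The function name `min_max_normalize` and the decision to map a constant vector to zeros are assumptions made here for illustration.

```python
def min_max_normalize(values):
    """Scale a sequence of numbers to the interval [0, 1]."""
    values = [float(v) for v in values]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0.0:
        # Assumption: a constant vector has no spread, so return all zeros
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical feature values, used only to show the call.
features = [2.0, 4.0, 6.0, 10.0]
print(min_max_normalize(features))
```

The smallest input maps to 0 and the largest to 1, which keeps features with different physical ranges on a comparable scale before they are passed to a network.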
FIGURE 10 Comparison of recognition rates of three algorithms.
FIGURE 11 Average recognition rate of three algorithms under different signal-to-noise ratios.
Convergence speed comparison.
| Model | Learning target | Learning rate | Operation time | Number of iterations | Recognition rate |
| HMM/RBF model with fixed learning rate (learning rate 0.01) | 0.01 | 0.01 | 78.56 s | 1985 | 90.98% |
| HMM/RBF model with fixed learning rate (learning rate 0.02) | 0.01 | 0.02 | 66.45 s | 1123 | 90.98% |
| HMM/RBF model with optimal learning rate | 0.01 | Dynamic optimization | 40.25 s | 457 | 90.98% |
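The recognition rate is the same in all three rows (90.98%), so the gain from the dynamic learning rate lies in training time and iteration count. As a rough worked check using only the figures in the table, the dynamic rate needs about 49% less time and 77% fewer iterations than the fixed rate of 0.01, and about 39% less time and 59% fewer iterations than the fixed rate of 0.02. The short Python sketch below reproduces that arithmetic. The dictionary keys and the helper `relative_reduction` are illustrative names, not taken from the source.

```python
# Figures copied from the convergence table: (operation time in s, iterations).
runs = {
    "fixed 0.01": (78.56, 1985),
    "fixed 0.02": (66.45, 1123),
    "dynamic": (40.25, 457),
}


def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


dyn_time, dyn_iters = runs["dynamic"]
summary = {}
for name in ("fixed 0.01", "fixed 0.02"):
    base_time, base_iters = runs[name]
    summary[name] = (
        round(relative_reduction(base_time, dyn_time), 1),
        round(relative_reduction(base_iters, dyn_iters), 1),
    )
    print(f"vs {name}: time -{summary[name][0]}%, iterations -{summary[name][1]}%")
```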