Jaehun Bang, Taeho Hur, Dohyeong Kim, Thien Huynh-The, Jongwon Lee, Yongkoo Han, Oresti Banos, Jee-In Kim, Sungyoung Lee.
Abstract
Personalized emotion recognition trains an individual model for each target user to mitigate the accuracy loss of general models trained on data pooled from multiple users. Existing personalized speech emotion recognition approaches suffer from a cold-start problem: building the personalized model requires a large number of emotionally balanced speech samples from the target user. Collecting that much balanced, labeled speech from a single user is difficult, which makes these approaches hard to apply in real environments. To solve the cold-start problem, we propose the Robust Personalized Emotion Recognition Framework with the Adaptive Data Boosting Algorithm. The framework incrementally builds a customized training model for the target user by reinforcing the dataset: it combines the acquired target user speech with speech from other users and then applies SMOTE (Synthetic Minority Over-sampling Technique)-based data augmentation. Iterative experiments on the IEMOCAP (Interactive Emotional Dyadic Motion Capture) database show that the proposed method adapts to small amounts of target user data and to emotionally imbalanced data.
Keywords: data augmentation; data selection; machine learning; personalization; speech emotion recognition
Year: 2018; PMID: 30400224; PMCID: PMC6264012; DOI: 10.3390/s18113744
Source DB: PubMed; Journal: Sensors (Basel); ISSN: 1424-8220; Impact factor: 3.576
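As a rough illustration of the SMOTE step named in the abstract, the sketch below generates synthetic minority-class feature vectors by interpolating between a sample and one of its nearest same-class neighbors. It shows the generic SMOTE idea in plain NumPy and is not the authors' Adaptive Data Boosting implementation; the function name `smote_oversample`, the neighbor count `k`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows from minority-class samples X_min (generic SMOTE idea)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)               # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest neighbors

    base = rng.integers(0, n, size=n_new)        # random base sample per synthetic row
    pick = rng.integers(0, k, size=n_new)        # random neighbor slot per synthetic row
    nb = neighbors[base, pick]
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Toy usage: 8 hypothetical minority samples with 100 features each.
minority = np.random.default_rng(1).normal(size=(8, 100))
synthetic = smote_oversample(minority, n_new=20, k=3)
print(synthetic.shape)  # (20, 100)
```

Each synthetic row lies on the line segment between two real minority samples, so the augmented class stays inside the region the real samples already cover.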
Figure 1. Proposed robust personalized emotion recognition framework.
Figure 2. Waveforms of a sentence (a) before and (b) after the preprocessing module.
Feature vector scheme description.
| Categories | Statistical Values | Number of Features (100) | Description |
|---|---|---|---|
| 13 MFCC | Mean, StdDev, Min, Max | 52 (13 × 4) | MFCCs are coefficients that represent audio based on the perception of the human auditory system. They are simple to compute, robust to noise, and highly discriminative, among other advantages, and are a commonly used speech feature […]. |
| 10 LPC | Mean, StdDev, Min, Max | 40 (10 × 4) | LPC is a tool used mostly in audio signal processing and speech processing to represent the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model. It is one of the most powerful speech analysis techniques and one of the most useful methods for encoding […]. |
| Pitch | Mean, StdDev, Min, Max | 4 | Pitch and energy are two of the most important features for determining emotion in speech. A speaker's emotional state is strongly related to pitch and energy: the pitch and energy of speech expressing happiness or anger are usually higher than those of speech expressing sadness […]. |
| Energy | Mean, StdDev, Min, Max | 4 | See the description for Pitch. |
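The feature table above lists 13 MFCC, 10 LPC, pitch, and energy values, each summarized by four statistics, for 100 features in total. A minimal sketch of that aggregation follows, assuming the 25 frame-level values have already been extracted for every frame of an utterance. The function name `utterance_vector`, the column layout, and the ordering of the statistics in the output are assumptions, not details taken from the paper.

```python
import numpy as np

N_MFCC, N_LPC = 13, 10  # counts from the feature table; pitch and energy add one column each

def utterance_vector(frames):
    """Collapse per-frame features (n_frames x 25) into one 100-dimensional utterance vector."""
    frames = np.asarray(frames, dtype=float)
    expected = N_MFCC + N_LPC + 2
    if frames.ndim != 2 or frames.shape[1] != expected:
        raise ValueError(f"expected shape (n_frames, {expected}), got {frames.shape}")
    # Four statistics per column: mean, standard deviation, minimum, maximum.
    stats = [frames.mean(axis=0), frames.std(axis=0), frames.min(axis=0), frames.max(axis=0)]
    return np.concatenate(stats)

# Toy usage: 120 frames of random stand-in values.
frames = np.random.default_rng(0).normal(size=(120, 25))
print(utterance_vector(frames).shape)  # (100,)
```

Summarizing over frames gives every utterance a fixed-length vector regardless of its duration, which is what a standard classifier expects as input.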
Figure 3. Insufficient data reinforcement workflow.
Figure 4. Absent emotion data reinforcement workflow.
Figure 5. Heuristic-based data selection workflow.
Organization of existing emotional speech databases.
| Emotional Database | Total Samples | Emotions | Speakers | Avg. Samples per Person | Avg. Samples of Each Emotion per Person |
|---|---|---|---|---|---|
| Emo-DB | 535 | 7 | 10 | 53.5 | 7.6 |
| eNTERFACE | 1166 | 6 | 42 | 27 | 4.5 |
| SAVEE | 480 | 8 | 4 | 120 | 15 |
| IEMOCAP | 10,038 | 10 | 10 | 1003.8 | 100.3 |
Original IEMOCAP dataset structure.
| Emotion | Number of Samples | Rate |
|---|---|---|
| Anger | 1229 | 12.24% |
| Sadness | 1182 | 11.78% |
| Happiness | 495 | 4.93% |
| Neutral | 575 | 5.73% |
| Excited | 2505 | 24.96% |
| Surprise | 24 | 0.24% |
| Fear | 135 | 1.34% |
| Disgust | 4 | 0.03% |
| Frustration | 3830 | 38.16% |
| Other | 59 | 0.59% |
Refined IEMOCAP dataset organization.
| Emotion | Number of Samples | Rate |
|---|---|---|
| Anger | 1766 | 25.51% |
| Sadness | 1336 | 19.29% |
| Happiness | 1478 | 21.34% |
| Neutral | 2345 | 33.86% |
Figure 6. Refined IEMOCAP dataset represented by each user.
Figure 7. The concept of the experimental methodologies.
Experimental results for each classifier (unit %). Columns give the number of target user data samples used for training; each Exp. 1 value spans all columns.
| Classifier | Experiment | 50 | 100 | 150 | 200 | 250 | 300 |
|---|---|---|---|---|---|---|---|
| | Exp. 1 | 48.603 | | | | | |
| | | 0.512 | | | | | |
| | | 0.478 | | | | | |
| | Exp. 2 | 37.245 | 42.257 | 44.752 | 47.583 | 48.823 | 50.542 |
| | | 0.293 | 0.382 | 0.452 | 0.474 | 0.454 | 0.500 |
| | | 0.275 | 0.371 | 0.335 | 0.412 | 0.414 | 0.474 |
| | Exp. 3 | 28.119 | 35.533 | 42.018 | 43.986 | 46.379 | 49.569 |
| | | 0.313 | 0.453 | 0.454 | 0.498 | 0.510 | 0.518 |
| | | 0.197 | 0.297 | 0.367 | 0.390 | 0.419 | 0.415 |
| | Exp. 4 | 35.421 | 47.069 | 49.989 | 51.736 | 53.449 | 55.108 |
| | | 0.461 | 0.490 | 0.523 | 0.529 | 0.546 | 0.559 |
| | | 0.300 | 0.438 | 0.474 | 0.505 | 0.523 | 0.540 |
| | Exp. 1 | 37.291 | | | | | |
| | | 0.3952 | | | | | |
| | | 0.3574 | | | | | |
| | Exp. 2 | 35.178 | 37.916 | 39.784 | 40.529 | 40.707 | 40.027 |
| | | 0.350 | 0.390 | 0.397 | 0.406 | 0.409 | 0.400 |
| | | 0.328 | 0.376 | 0.387 | 0.398 | 0.400 | 0.389 |
| | Exp. 3 | 32.586 | 37.621 | 39.374 | 39.657 | 40.523 | 41.074 |
| | | 0.390 | 0.432 | 0.425 | 0.420 | 0.436 | 0.425 |
| | | 0.298 | 0.386 | 0.382 | 0.390 | 0.412 | 0.407 |
| | Exp. 4 | 36.131 | 42.268 | 47.931 | 53.294 | 56.542 | 60.589 |
| | | 0.399 | 0.447 | 0.500 | 0.540 | 0.573 | 0.615 |
| | | 0.350 | 0.421 | 0.481 | 0.533 | 0.565 | 0.607 |
| | Exp. 1 | 42.048 | | | | | |
| | | 0.462 | | | | | |
| | | 0.441 | | | | | |
| | Exp. 2 | 40.891 | 43.342 | 44.324 | 44.834 | 44.420 | 46.329 |
| | | 0.414 | 0.421 | 0.444 | 0.452 | 0.444 | 0.453 |
| | | 0.412 | 0.421 | 0.443 | 0.450 | 0.444 | 0.457 |
| | Exp. 3 | 36.692 | 46.514 | 52.378 | 57.362 | 60.902 | 64.550 |
| | | 0.535 | 0.570 | 0.590 | 0.620 | 0.650 | 0.669 |
| | | 0.435 | 0.513 | 0.556 | 0.599 | 0.632 | 0.650 |
| | Exp. 4 | 50.925 | 55.448 | 59.302 | 62.293 | 64.722 | 67.633 |
| | | 0.503 | 0.554 | 0.621 | 0.658 | 0.661 | 0.683 |
| | | 0.506 | 0.554 | 0.612 | 0.640 | 0.650 | 0.680 |
Average imbalance ratio for each experiment, by number of target user data samples used for training. The Exp. 1 value spans all columns.
| Experiment | 50 | 100 | 150 | 200 | 250 | 300 |
|---|---|---|---|---|---|---|
| Exp. 1 | 1.755 | | | | | |
| Exp. 2 | 5.646 | 6.074 | 4.087 | 4.021 | 3.188 | 2.707 |
| Exp. 3 | 2.914 | 1.990 | 1.666 | 1.730 | 1.560 | 1.973 |
| Exp. 4 | 1.987 | 1.702 | 1.560 | 1.578 | 1.529 | 1.519 |
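The record does not spell out how the imbalance ratio is computed. A common definition, assumed here, is the size of the largest class divided by the size of the smallest. The sketch below applies that definition to the refined IEMOCAP counts listed earlier; the function name `imbalance_ratio` is hypothetical.

```python
def imbalance_ratio(class_counts):
    """Largest class count divided by smallest; 1.0 means perfectly balanced."""
    counts = list(class_counts.values())
    if not counts or min(counts) <= 0:
        # An absent emotion (zero samples) leaves the ratio undefined.
        raise ValueError("every class needs at least one sample")
    return max(counts) / min(counts)

# Counts from the refined IEMOCAP dataset table.
refined_iemocap = {"Anger": 1766, "Sadness": 1336, "Happiness": 1478, "Neutral": 2345}
print(round(imbalance_ratio(refined_iemocap), 3))  # 1.755
```

Under this assumed definition the refined dataset gives 2345 / 1336 ≈ 1.755, the same value reported for Exp. 1, which suggests (but does not confirm) that the table uses a majority-to-minority ratio.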
Figure 8. Detailed experimental results of the random forest classifier.
Figure 9. Specific experiment results: (a) 10 target user data; (b) 100 target user data; (c) 200 target user data; (d) 300 target user data.