Vlad Pandelea, Edoardo Ragusa, Tommaso Apicella, Paolo Gastaldo, Erik Cambria.
Abstract
Emotion recognition, among other natural language processing tasks, has greatly benefited from the use of large transformer models. Deploying these models on resource-constrained devices, however, is a major challenge due to their computational cost. In this paper, we show that the combination of large transformers, as high-quality feature extractors, and simple hardware-friendly classifiers based on linear separators can achieve competitive performance while allowing real-time inference and fast training. Various solutions, including batch and online sequential learning, are analyzed. Additionally, our experiments show that latency and performance can be further improved via dimensionality reduction and pre-training, respectively. The resulting system is implemented on two types of edge devices, namely an edge accelerator and two smartphones.
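The pipeline described in the abstract — a frozen transformer encoder feeding a hardware-friendly linear separator — can be sketched as follows. This is a minimal illustration, not the authors' implementation: random vectors stand in for the transformer's utterance embeddings, all dimensions are hypothetical, and the separator is fit in closed form with ridge-regularized least squares on one-hot labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for transformer utterance embeddings (e.g., a 128-d encoder).
# In the real pipeline these would come from a frozen pre-trained model.
n_train, n_classes, dim = 200, 7, 128
X = rng.standard_normal((n_train, dim))
y = rng.integers(0, n_classes, n_train)

# One-hot targets for the linear separator.
T = np.eye(n_classes)[y]

# Closed-form ridge-regularized least squares: W = (X^T X + lam*I)^-1 X^T T
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(dim), X.T @ T)

# Inference is a single matrix multiply on top of the extracted features,
# which is what makes the classifier cheap enough for edge devices.
pred = np.argmax(X @ W, axis=1)
train_acc = (pred == y).mean()
```

Training reduces to one linear solve, so it is fast even on-device; only the feature extractor requires a heavyweight forward pass.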
Keywords: deep learning; embedded systems; emotion recognition
Year: 2021 PMID: 34209251 PMCID: PMC8271649 DOI: 10.3390/s21134496
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Pipeline of the system.
Emotion distribution in MELD. Data splits as reported in [58].
| | Train | Validation | Test |
|---|---|---|---|
| Neutral | 4710 | 470 | 1256 |
| Joy | 1743 | 163 | 402 |
| Surprise | 1205 | 150 | 281 |
| Anger | 1109 | 153 | 345 |
| Sadness | 683 | 111 | 208 |
| Disgust | 271 | 22 | 68 |
| Fear | 268 | 40 | 50 |
Sample utterances from the dataset, with the real label in the second column and the predicted label in the third.
| Utterance | Real Label | Prediction |
|---|---|---|
| Ohh, that’s a good one. | Joy | Joy |
| Someone on the subway licked my neck! Licked my neck!! | Disgust | Anger |
| Bob. Bob! Bob!!! What the hell are you doing?! | Surprise | Anger |
| Oh my good God. | Disgust | Joy |
Emotion distribution in IEMOCAP.
| | Train | Validation | Test |
|---|---|---|---|
| Frustrated | 1210 | 258 | 381 |
| Neutral | 1080 | 244 | 384 |
| Angry | 749 | 184 | 170 |
| Sad | 764 | 75 | 245 |
| Excited | 520 | 222 | 299 |
| Happy | 376 | 128 | 144 |
Weighted F1 score on the test set. The results are averaged over three runs.
| | RoBERTa | MobileBERT | BERT-Medium | BERT-Tiny |
|---|---|---|---|---|
| BP | 0.615 | 0.609 | 0.591 | 0.574 |
| Hidden | 0.587 | 0.568 | 0.578 | 0.519 |
| Linear | 0.586 | 0.531 | 0.566 | 0.507 |
| ELM | 0.565 | 0.537 | 0.532 | 0.508 |
| OSELM | 0.569 | 0.54 | 0.524 | 0.486 |
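The ELM rows above refer to extreme learning machines: a random, untrained hidden projection with a nonlinearity, followed by an output layer solved in closed form. A minimal sketch, with random vectors standing in for the extracted transformer features (all sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy features standing in for transformer embeddings.
n, dim, hidden, n_classes = 300, 128, 256, 7
X = rng.standard_normal((n, dim))
y = rng.integers(0, n_classes, n)
T = np.eye(n_classes)[y]

# ELM hidden layer: random weights that are never trained.
A = rng.standard_normal((dim, hidden))
b = rng.standard_normal(hidden)
H = np.tanh(X @ A + b)

# Output weights solved in one shot via the Moore-Penrose pseudo-inverse.
beta = np.linalg.pinv(H) @ T

pred = np.argmax(H @ beta, axis=1)
```

Because only `beta` is learned, training cost is a single pseudo-inverse, which suits the fast on-device training the paper targets.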
Comparison, in terms of weighted F1 score, with the variants that are pre-trained on IEMOCAP.
| | RoBERTa | | MobileBERT | | BERT-Medium | | BERT-Tiny | |
|---|---|---|---|---|---|---|---|---|
| | Regular | Pretrained | Regular | Pretrained | Regular | Pretrained | Regular | Pretrained |
| Hidden | 0.587 | 0.598 | 0.568 | 0.589 | 0.578 | 0.54 | 0.519 | 0.544 |
| Linear | 0.586 | 0.598 | 0.531 | 0.584 | 0.566 | 0.524 | 0.507 | 0.523 |
| ELM | 0.565 | 0.586 | 0.537 | 0.56 | 0.532 | 0.51 | 0.508 | 0.531 |
| OSELM | 0.569 | 0.591 | 0.54 | 0.561 | 0.524 | 0.505 | 0.486 | 0.505 |
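The OSELM rows refer to online sequential ELM, which updates the output weights chunk by chunk with recursive least squares, so past data never needs to be stored — the property that enables on-device incremental training. A sketch under the same toy-feature assumptions as above (all sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

dim, hidden, n_classes = 128, 64, 7
A = rng.standard_normal((dim, hidden))
b = rng.standard_normal(hidden)

def hidden_act(X):
    # Fixed random ELM hidden layer shared by all chunks.
    return np.tanh(X @ A + b)

def make_chunk(n):
    # Random stand-ins for a chunk of labeled feature vectors.
    X = rng.standard_normal((n, dim))
    y = rng.integers(0, n_classes, n)
    return hidden_act(X), np.eye(n_classes)[y]

# Initialization chunk: ordinary least squares on the hidden activations
# (the chunk must have at least `hidden` samples for invertibility).
H0, T0 = make_chunk(100)
P = np.linalg.inv(H0.T @ H0)
beta = P @ H0.T @ T0

# Sequential chunks: recursive least-squares update of P and beta.
for _ in range(5):
    H, T = make_chunk(20)
    K = np.linalg.inv(np.eye(len(H)) + H @ P @ H.T)
    P = P - P @ H.T @ K @ H @ P
    beta = beta + P @ H.T @ (T - H @ beta)
```

After all chunks, `beta` matches what batch least squares would give on the concatenated data, but each update touches only the current chunk.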
Comparison, in terms of weighted F1 score, with the variants to which PCA dimensionality reduction is applied.
| | Linear | Hidden | ELM | OSELM |
|---|---|---|---|---|
| BERT-Tiny 128 | 0.507 | 0.519 | 0.508 | 0.486 |
| BERT-Tiny PCA 64 | 0.496 | 0.514 | 0.509 | 0.472 |
| BERT-Tiny PCA 32 | 0.482 | 0.511 | 0.499 | 0.453 |
| BERT-Medium 512 | 0.566 | 0.578 | 0.532 | 0.524 |
| BERT-Medium PCA 128 | 0.538 | 0.567 | 0.53 | 0.506 |
| BERT-Medium PCA 64 | 0.534 | 0.553 | 0.526 | 0.495 |
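The PCA variants above reduce the extracted features (e.g., 512-d BERT-Medium embeddings to 128 or 64 components) before classification, trading a little F1 for lower classifier latency. A sketch of the reduction via SVD, on random stand-in features (sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-ins for 512-d features, reduced to k = 128 principal components.
n, dim, k = 400, 512, 128
X = rng.standard_normal((n, dim))

# PCA via SVD of the centered data matrix.
mu = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
components = Vt[:k]               # top-k principal directions, (k, dim)

X_red = (X - mu) @ components.T   # (n, k): smaller input to the classifier

# New samples are projected with the same mean and components.
x_new = rng.standard_normal(dim)
z = (x_new - mu) @ components.T
```

The projection matrix is fit once offline; at inference it adds only one small matrix multiply before the linear classifier.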
Performance when only a single user's data is used. Results in terms of weighted F1 score are reported for all users, varying the amount of data used.
| | All | 500 | 200 |
|---|---|---|---|
| Phoebe | 0.475 | 0.479 | 0.431 |
| Joey | 0.522 | 0.525 | 0.498 |
| Ross | 0.497 | 0.477 | 0.447 |
| Rachel | 0.495 | 0.472 | 0.418 |
| Monica | 0.474 | 0.46 | 0.437 |
| Chandler | 0.474 | 0.458 | 0.477 |
Feature extraction on Jetson Nano.
| | TINY | | MED | |
|---|---|---|---|---|
| Working Mode | FP16 | FP32 | FP16 | FP32 |
| Max-N | 17.2 (1.1) | 17.1 (0.7) | 24.28 (0.7) | 24.02 (0.7) |
| 5W | 33.96 (1.5) | 33.59 (1.5) | 41.62 (1.8) | 42.13 (0.3) |
Feature extraction on smartphone.
| | TINY | | | MED | | |
|---|---|---|---|---|---|---|
| CHIPSET | FP32 | FP16 | INT8 | FP32 | FP16 | INT8 |
| Snapdragon 765G | 2.6 (0.5) | 2.6 (0.5) | 1.9 (0.3) | 88.0 (4.5) | 85.0 (2.8) | 32.8 (0.5) |
| HiSilicon Kirin 655 | 12.3 (0.7) | 12.1 (0.5) | 9.6 (0.8) | 460.6 (0.9) | 456.8 (1.7) | 242.5 (0.7) |
Figure 2. Rounding error distribution on MELD test set using TFLite interpreter.
Training of the linear classifier on Jetson Nano.
| Working Mode | # Features | | | |
|---|---|---|---|---|
| 5W | 32 | 44.4 (2.8) | 103.8 (6.7) | 328.2 (12.7) |
| | 64 | 53.4 (3.1) | 149.9 (10.2) | 475.5 (21.1) |
| | 128 | 47.2 (2.3) | 227.8 (12.2) | 799.3 (43.9) |
| | 512 | 78.4 (3.9) | 264.5 (3.0) | 1909.6 (165.4) |
| Max-N | 32 | 28.3 (2.1) | 69.8 (6.2) | 199.8 (12.5) |
| | 64 | 33.3 (2.9) | 96.7 (9.9) | 304.4 (26.5) |
| | 128 | 31.5 (0.9) | 145.7 (14.9) | 525.4 (40.4) |
| | 512 | 52.5 (2.2) | 179.6 (14.9) | 1284.3 (117.4) |
Training of the linear classifier on smartphones.
| CHIPSET | # Features | | | |
|---|---|---|---|---|
| Snapdragon 765G | 32 | 291.4 (6.12) | 749.5 (5.85) | 1847.4 (20.16) |
| | 64 | 255.1 (8.51) | 760.9 (33.46) | 1964.8 (163.69) |
| | 128 | 331.5 (7.96) | 766.5 (73.2) | 2119.8 (90.45) |
| | 512 | 447.2 (8.56) | 1376.2 (66.27) | 2596.8 (138.59) |
| HiSilicon Kirin 655 | 32 | 378.27 (27.40) | 943.8 (28.82) | 2461.4 (40.94) |
| | 64 | 413.8 (28.98) | 1033.8 (28.07) | 2723.3 (34.65) |
| | 128 | 488.1 (31.38) | 1224.9 (31.12) | 3230.9 (28.82) |
| | 512 | 1103.1 (28.90) | 2778.9 (31.16) | 7250.3 (36.28) |