| Literature DB >> 32933178 |
Valentina Franzoni (1), Giulio Biondi (2), Damiano Perri (2), Osvaldo Gervasi (1).
Abstract
This work concludes the first study on mouth-based emotion recognition adopting a transfer learning approach. Transfer learning is paramount for mouth-based emotion recognition, because few datasets are available, and most of them include emotional expressions simulated by actors rather than real-world categorisation. With transfer learning, less training data is needed than when training a whole network from scratch, so the network can be fine-tuned with emotional data more efficiently, improving the convolutional neural network's accuracy in the desired domain. The proposed approach aims at improving emotion recognition dynamically, taking into account not only new scenarios but also situations modified with respect to the initial training phase, because an image of the mouth can be available even when the whole face is visible only from an unfavourable perspective. Typical applications include automated supervision of bedridden critical patients in a healthcare management environment, and portable applications supporting disabled users who have difficulties seeing or recognising facial emotions. This achievement builds on previous preliminary work on mouth-based emotion recognition using deep learning, with the further benefit of having been tested and compared against a set of other networks on an extensive face-based emotion recognition dataset that is well known in the literature. The accuracy of mouth-based emotion recognition was also compared to the corresponding full-face emotion recognition; we found that the loss in accuracy is mostly compensated by consistent performance in the visual emotion recognition domain. We can therefore state that our method proves the importance of mouth detection in the complex process of emotion recognition.
Keywords: convolutional neural networks; emotion recognition; transfer learning
Year: 2020 PMID: 32933178 PMCID: PMC7571064 DOI: 10.3390/s20185222
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Mouth detection, cropping and resizing (source image from the AffectNet database).
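The cropping and resizing step of Figure 1 can be sketched with plain Python. The mouth bounding box would normally come from a face-landmark detector; here the coordinates and the tiny greyscale "image" are hypothetical placeholders.

```python
def crop(image, top, left, height, width):
    """Extract a rectangular region from a 2-D list of pixel rows."""
    return [row[left:left + width] for row in image[top:top + height]]

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list to (out_h, out_w)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]

# Toy 4x4 greyscale "image"; the assumed mouth box is the lower half.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
mouth = crop(img, top=2, left=0, height=2, width=4)
mouth_resized = resize_nearest(mouth, 4, 4)  # back to the network's input size
```

In practice the same two operations are applied to the detected mouth region so every crop matches the CNN's fixed input resolution.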
Figure 2. General scheme of the adapted transfer learning techniques with the considered CNNs.
Figure 3. General scheme of our CNN.
Figure 4. Sample images from the filter dataset.
Number of images per type of emotion considered in our study.
| Neutral | Happy | Surprise | Anger |
|---|---|---|---|
| 1239 | 562 | 463 | 478 |
Final results related to the considered CNNs.
| Network | Accuracy |
|---|---|
| VGG-16 | 71.8% |
| InceptionResNetV2 | 79.5% |
| Inception V3 | 77.0% |
| Xception | 75.5% |
Figure 5. Loss function evolution for each network as a function of the epochs.
Figure 6. Accuracy function evolution for each network as a function of the epochs.
Figure 7. Confusion matrix of the training set for the network InceptionResNetV2.
Number of misclassified images of neutral faces in the three wrong categories over a total of 221 images.
| Happy | Surprise | Anger |
|---|---|---|
| 11 | 7 | 4 |
Figure 8. Confusion matrices of the trained networks, with normalised values.
Figure 9. Confusion matrix of the validation set obtained while running InceptionResNetV2 to analyse the whole face images instead of the mouth portions of the images.
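The normalisation used in Figure 8 divides each row of the confusion matrix by its total, so entries read as per-class recall fractions. A minimal sketch, with purely illustrative counts (not taken from the paper):

```python
def normalise_rows(matrix):
    """Divide each row by its sum; rows of zeros stay all zero."""
    out = []
    for row in matrix:
        s = sum(row)
        out.append([v / s for v in row] if s else [0.0] * len(row))
    return out

# Hypothetical raw counts; rows are true classes, columns predictions.
counts = [
    [90, 5, 3, 2],   # true Neutral
    [4, 92, 2, 2],   # true Happy
    [3, 6, 88, 3],   # true Surprise
    [2, 2, 4, 92],   # true Anger
]
norm = normalise_rows(counts)
```

Row normalisation makes networks comparable even when the classes have different numbers of validation images, as in the dataset above.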