| Literature DB >> 34025503 |
Theresa Küntzler1, T Tim A Höfling2, Georg W Alpers2.
Abstract
Emotional facial expressions can inform researchers about an individual's emotional state. Recent technological advances open up new avenues for automatic Facial Expression Recognition (FER). Based on machine learning, such technology can tremendously increase the amount of processed data. FER is now easily accessible and has been validated for the classification of standardized prototypical facial expressions. However, applicability to more naturalistic facial expressions remains uncertain. Hence, we test and compare performance of three different FER systems (Azure Face API, Microsoft; Face++, Megvii Technology; FaceReader, Noldus Information Technology) with human emotion recognition (A) for standardized posed facial expressions (from prototypical inventories) and (B) for non-standardized acted facial expressions (extracted from emotional movie scenes). For the standardized images, all three systems classify basic emotions accurately (FaceReader is most accurate) and they are mostly on par with human raters. For the non-standardized stimuli, performance drops markedly for all three systems, but Azure still performs similarly to humans. In addition, all systems and humans alike tend to misclassify some of the non-standardized emotional facial expressions as neutral. In sum, emotion recognition by automated facial expression recognition can be an attractive alternative to human emotion recognition for standardized and non-standardized emotional facial expressions. However, we also found limitations in accuracy for specific facial expressions; clearly there is need for thorough empirical evaluation to guide future developments in computer vision of emotional facial expressions.
Keywords: automatic facial coding; facial expression recognition; human emotion recognition; naturalistic expressions; recognition of emotional facial expressions; software evaluation; specific emotions; standardized inventories
Year: 2021 PMID: 34025503 PMCID: PMC8131548 DOI: 10.3389/fpsyg.2021.627561
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Category distributions of test data and drop-outs of Azure, Face++, and FaceReader.
Standardized data:
| | Neutral | Joy | Anger | Disgust | Sadness | Fear | Surprise | Total |
| Absolute frequency of images | 178 | 178 | 178 | 178 | 178 | 178 | 178 | 1246 |
| Relative frequency of images (%) | 14.3 | 14.3 | 14.3 | 14.3 | 14.3 | 14.2 | 14.2 | 100 |
| Drop-outs Face++ (%) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Drop-outs Azure (%) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Drop-outs FaceReader (%) | 0.6 | 0.0 | 0.0 | 1.1 | 1.1 | 2.2 | 1.2 | 0.88 |
Non-standardized data:
| | Neutral | Joy | Anger | Disgust | Sadness | Fear | Surprise | Total |
| Absolute frequency of images | 236 | 270 | 245 | 143 | 254 | 151 | 88 | 1387 |
| Relative frequency of images (%) | 17.0 | 19.5 | 17.7 | 10.3 | 18.3 | 10.9 | 6.3 | 100 |
| Drop-outs Face++ (%) | 0.0 | 0.7 | 0.8 | 0.7 | 0.4 | 0.0 | 1.1 | 0.5 |
| Drop-outs Azure (%) | 16.9 | 11.1 | 25.3 | 26.6 | 25.2 | 17.9 | 23.9 | 20.3 |
| Drop-outs FaceReader (%) | 73.3 | 70.0 | 75.1 | 79.0 | 76.8 | 74.8 | 69.3 | 74.2 |
Percentages are rounded to the first decimal. The base of each percentage is the respective category total. Reading example: Azure did not detect a face in 16.9% of the 236 neutral images of the non-standardized data; in total, 20.3% of the 1387 non-standardized images dropped out because no face was detected.
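As a worked check on the drop-out table above: the overall drop-out rate is the count-weighted average of the per-category rates, not their plain mean (which would give about 21.0% for Azure on the non-standardized data). A minimal sketch using the table's own numbers:

```python
# Verify that the overall drop-out rate (Azure, non-standardized data)
# equals the count-weighted average of the per-category rates.
counts = [236, 270, 245, 143, 254, 151, 88]            # images per category
drop_pct = [16.9, 11.1, 25.3, 26.6, 25.2, 17.9, 23.9]  # % with no face detected

overall = sum(p * n for p, n in zip(drop_pct, counts)) / sum(counts)
print(round(overall, 1))  # 20.3, matching the table's Total column
```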
Figure 1. Averaged human ratings separately for basic emotion categories for standardized (black bars) and non-standardized facial expressions (gray bars). (A) depicts mean genuineness ratings ranging from 1 (very in-genuine) to 7 (very genuine). (B–H) depict mean emotion ratings (percent) for (B) neutral, (C) joy, (D) anger, (E) disgust, (F) sadness, (G) fear, and (H) surprise expressions. Error bars are 95% confidence intervals.
Sensitivity, precision, and accuracy of Azure, Face++, and FaceReader separately for emotion categories.
| Emotion | Azure Sens. (Stand.) | Azure Prec. (Stand.) | Azure Sens. (Non-Stand.) | Azure Prec. (Non-Stand.) | Face++ Sens. (Stand.) | Face++ Prec. (Stand.) | Face++ Sens. (Non-Stand.) | Face++ Prec. (Non-Stand.) | FaceReader Sens. (Stand.) | FaceReader Prec. (Stand.) | FaceReader Sens. (Non-Stand.) | FaceReader Prec. (Non-Stand.) |
| Neutral | 1.00 | 0.63 | 0.94 | 0.38 | 0.94 | 0.70 | 0.40 | 0.34 | 0.99 | 0.92 | 0.68 | 0.20 |
| Joy | 1.00 | 0.98 | 0.85 | 0.88 | 0.99 | 0.96 | 0.48 | 0.76 | 1.00 | 0.99 | 0.42 | 0.92 |
| Anger | 0.51 | 0.91 | 0.38 | 0.87 | 0.49 | 0.84 | 0.15 | 0.36 | 0.96 | 0.99 | 0.14 | 0.42 |
| Disgust | 0.85 | 0.98 | 0.10 | 0.50 | 0.89 | 0.77 | 0.16 | 0.17 | 0.97 | 0.99 | 0.15 | 0.17 |
| Sadness | 0.88 | 0.75 | 0.48 | 0.77 | 0.81 | 0.75 | 0.19 | 0.40 | 0.98 | 0.97 | 0.16 | 0.32 |
| Fear | 0.46 | 0.99 | 0.03 | 0.33 | 0.40 | 0.95 | 0.18 | 0.18 | 0.88 | 0.97 | 0.00 | 0.00 |
| Surprise | 0.98 | 0.73 | 0.56 | 0.43 | 0.97 | 0.71 | 0.66 | 0.20 | 0.98 | 0.93 | 0.34 | 0.33 |
| Average | 0.81 | 0.85 | 0.48 | 0.59 | 0.79 | 0.81 | 0.32 | 0.35 | 0.97 | 0.97 | 0.27 | 0.34 |
| Accuracy | 0.81 | | 0.57 | | 0.79 | | 0.32 | | 0.97 | | 0.31 | |
Stand., standardized data; Non-Stand., non-standardized data; Sens., sensitivity; Prec., precision. Accuracy is a single value per system and data set.
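The per-category metrics reported above (sensitivity per true category, precision per predicted category, overall accuracy) can be sketched from a confusion matrix whose rows are true categories and columns are predicted categories. The 3×3 counts below are hypothetical, chosen only to illustrate the computation; they are not the study's data.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true, columns = predicted.
cm = np.array([[50,  5,  5],   # true class 0
               [10, 40, 10],   # true class 1
               [ 5,  5, 50]])  # true class 2

sensitivity = np.diag(cm) / cm.sum(axis=1)  # correct / all images of the true category
precision   = np.diag(cm) / cm.sum(axis=0)  # correct / all images assigned that category
accuracy    = np.trace(cm) / cm.sum()       # overall fraction classified correctly

print(sensitivity.round(2))  # [0.83 0.67 0.83]
print(precision.round(2))    # [0.77 0.8  0.77]
print(round(float(accuracy), 2))  # 0.78
```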
Figure 2. Confusion matrices indicating classification performance on standardized (left panels) and non-standardized data (right panels): (A) standardized data by Azure, (B) non-standardized data by Azure, (C) standardized data by Face++, (D) non-standardized data by Face++, (E) standardized data by FaceReader, and (F) non-standardized data by FaceReader. Numbers indicate percentages to the base of the true category. Reading example: for the standardized data, Azure classifies 4.5% of the truly fearful expressions as neutral; 45.5% of the fearful images are classified correctly.
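"Percentages to the base of the true category", as used in these matrices, means each row of raw counts is normalized by its row total so that each true category sums to 100. A minimal sketch with made-up counts:

```python
import numpy as np

# Hypothetical raw counts: rows = true category, columns = predicted category.
cm = np.array([[45,  3,  2],
               [ 4, 40,  6],
               [ 1,  9, 40]])

# Normalize each row by its total and scale to percent.
pct = 100 * cm / cm.sum(axis=1, keepdims=True)
print(pct.round(1))  # every row now sums to 100
```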
Figure 3. Classification performance depicted as Receiver Operating Characteristic (ROC) curves and corresponding Area Under the Curve (AUC) values for overall emotion recognition performance of the three FER systems (Azure, Face++, and FaceReader) and human raters, separately for (A) standardized facial expressions and (B) non-standardized facial expressions. The white diagonal line indicates classification performance by chance.
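The AUC summarizing an ROC curve can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (with ties counting half). The sketch below illustrates this for a binary comparison with made-up labels and scores; how the study aggregates the multi-class emotion scores into ROC curves is not specified here.

```python
import numpy as np

# Made-up binary labels and classifier scores, for illustration only.
labels = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

pos = scores[labels == 1]
neg = scores[labels == 0]

# Rank-based AUC: fraction of positive/negative pairs ranked correctly,
# counting ties as 0.5 (equivalent to the Mann-Whitney U statistic).
auc = np.mean([(p > n) + 0.5 * (p == n) for p in pos for n in neg])
print(round(float(auc), 3))  # 0.889 (8 of 9 pairs ranked correctly)
```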