Peter Washington, Haik Kalantarian, John Kent, Arman Husic, Aaron Kline, Emilie Leblanc, Cathy Hou, Onur Cezmi Mutlu, Kaitlyn Dunlap, Yordan Penev, Maya Varma, Nate Tyler Stockham, Brianna Chrisman, Kelley Paskov, Min Woo Sun, Jae-Yoon Jung, Catalin Voss, Nick Haber, Dennis Paul Wall.
Abstract
BACKGROUND: Automated emotion classification could aid those who struggle to recognize emotions, including children with developmental behavioral conditions such as autism. However, most computer vision emotion recognition models are trained on adult emotion and therefore underperform when applied to child faces.
Keywords: affective computing; artificial intelligence; autism spectrum disorder; computer vision; convolutional neural network; digital therapy; emotion recognition; machine learning; mobile health; pediatrics
Year: 2022 PMID: 35394438 PMCID: PMC9034430 DOI: 10.2196/26760
Source DB: PubMed Journal: JMIR Pediatr Parent ISSN: 2561-6722
Figure 1. Pipeline of the model training process. Structured videos enriched with child emotion evocation are collected from a mobile autism therapeutic deployed in the wild. The frames are ranked for their contribution to the target classifier by a maximum entropy active learning algorithm and receive human labels on a rating platform named HollywoodSquares. The frames and their corresponding labels are then transferred onto a ResNet-152 neural network pretrained on the ImageNet data set.
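The maximum entropy ranking step in the pipeline can be sketched as follows. This is an illustrative reimplementation, not the authors' code; the function name and the toy probabilities are assumptions.

```python
import numpy as np

def rank_frames_by_entropy(probs: np.ndarray) -> np.ndarray:
    """Rank frames by the Shannon entropy of the classifier's softmax
    output, most uncertain first, so they can be sent for human labeling."""
    eps = 1e-12  # guard against log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.argsort(-entropy)  # frame indices, highest entropy first

# Toy example: three frames, three emotion classes (values invented).
probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction -> low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction -> high entropy
    [0.70, 0.20, 0.10],
])
order = rank_frames_by_entropy(probs)
# The near-uniform frame is queued for annotation first.
```

Under this scheme, annotator effort on HollywoodSquares is concentrated on the frames the current model is least certain about.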
Figure 2. HollywoodSquares rating interface. Annotators use keyboard shortcuts and the mouse to rapidly annotate a sequence of frames acquired during GuessWhat gameplay.
Emotions represented in the HollywoodSquares data set, including how many children and videos are represented for each emotion category.
| Emotion | Frequency | Number of children | Number of videos |
| Anger | 643 | 28 | 62 |
| Disgust | 1723 | 46 | 95 |
| Fear | 1875 | 41 | 89 |
| Happy | 13,332 | 73 | 228 |
| Neutral | 16,055 | 87 | 289 |
| Sad | 947 | 31 | 93 |
| Surprise | 5393 | 52 | 135 |
Representation of race and ethnicity of children who played the “Emoji” charades category and uploaded a video to the cloud.
| Race/ethnicity | Frequency |
| Arab | 6 |
| Black or African | 16 |
| East Asian | 16 |
| Hispanic | 36 |
| Native American | 7 |
| Pacific Islander | 5 |
| South Asian | 14 |
| Southeast Asian | 7 |
| White or Caucasian | 100 |
| Not specified | 60 |
Figure 3. Examples of frames collected from GuessWhat gameplay, including examples of cropped (A) and original (B) frames. We have displayed these images after obtaining consent from the participants for public sharing.
Figure 4. Confusion matrix for the entirety of the Child Affective Facial Expression data set.
Comparison of several popular neural network architectures trained on the same data set^a.
| Model | Balanced accuracy (%) | F1-score (%) | Number of network parameters |
| ResNet152V2; He et al | 64.12 | 64.2 | 60,380,648 |
| ResNet50V2; He et al | 63.67 | 63.12 | 25,613,800 |
| InceptionV3; Szegedy et al | 59 | 59.66 | 23,851,784 |
| MobileNetV2; Sandler et al | 57.63 | 58.19 | 3,538,984 |
| DenseNet121; Huang et al | 58.2 | 59.19 | 8,062,504 |
| DenseNet201; Huang et al | 57.02 | 58.95 | 20,242,984 |
| Xception; Chollet | 58.16 | 60.58 | 22,910,480 |
^a Default hyperparameters were used for all networks.
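The two metrics reported in the table can both be recovered from a per-model confusion matrix. A minimal sketch follows; the 3-class matrix is invented for illustration and is not one of the paper's results.

```python
import numpy as np

def balanced_accuracy(cm: np.ndarray) -> float:
    """Mean per-class recall from a confusion matrix
    (rows = true class, columns = predicted class)."""
    recalls = np.diag(cm) / cm.sum(axis=1)
    return float(recalls.mean())

def macro_f1(cm: np.ndarray) -> float:
    """Unweighted mean of the per-class F1 scores."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)
    recall = tp / cm.sum(axis=1)
    f1 = 2 * precision * recall / (precision + recall)
    return float(f1.mean())

# Toy confusion matrix for three classes (values invented).
cm = np.array([
    [8, 1, 1],
    [2, 6, 2],
    [1, 1, 8],
])
bal_acc = balanced_accuracy(cm)  # mean of recalls 0.8, 0.6, 0.8 = 0.7333...
mf1 = macro_f1(cm)
```

Balanced accuracy weights every emotion class equally, which matters here because the class frequencies in the HollywoodSquares data set are highly imbalanced (for example, 16,055 neutral frames versus 643 anger frames).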
Figure 5. Confusion matrix for Child Affective Facial Expression Subset A.
Figure 6. Confusion matrix for Child Affective Facial Expression Subset B.
Figure 7. Classifier performance versus original CAFE annotator performance for 10 difficulty bins. The classifier tends to perform well when humans agree on the class and poorly otherwise. The numbers in parentheses represent the number of images in each bin. This highlights the issue of ambiguous labels in affective computing and demonstrates that our model performance scales proportionally to human performance. CAFE: Child Affective Facial Expression.
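The difficulty binning described in the caption can be sketched as grouping images by the fraction of human annotators who chose the modal label. This is an assumed reconstruction, not the authors' code; the helper name, equal-width bin edges, and sample values are all illustrative.

```python
import numpy as np

def agreement_bins(agreement: np.ndarray, n_bins: int = 10) -> np.ndarray:
    """Assign each image to one of n_bins equal-width difficulty bins based
    on its human-agreement fraction (1.0 = unanimous and easy; values near
    chance indicate an ambiguous, hard image)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # digitize with right=True maps edges[i-1] < x <= edges[i] to bin i;
    # clip keeps any agreement of exactly 0.0 in the bottom bin.
    return np.clip(np.digitize(agreement, edges, right=True), 1, n_bins)

# Toy agreement fractions for four images (values invented).
agreement = np.array([1.0, 0.95, 0.42, 0.15])
bins = agreement_bins(agreement)
```

Classifier accuracy can then be computed within each bin and plotted against human agreement, as in the figure.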