Haik Kalantarian1,2, Khaled Jedoui3, Kaitlyn Dunlap1,2, Jessey Schwartz1,2, Peter Washington1,2, Arman Husic1,2, Qandeel Tariq1,2, Michael Ning1,2, Aaron Kline1,2, Dennis Paul Wall1,2,4.
Abstract
BACKGROUND: Autism spectrum disorder (ASD) is a developmental disorder characterized by deficits in social communication and interaction, and restricted and repetitive behaviors and interests. The incidence of ASD has increased in recent years; it is now estimated that approximately 1 in 40 children in the United States is affected. Due in part to this increasing prevalence, access to treatment has become constrained. Hope lies in mobile solutions that provide therapy through artificial intelligence (AI) approaches, including facial and emotion detection AI models developed by mainstream cloud providers and available directly to consumers. However, these solutions may not be sufficiently trained for use in pediatric populations.
Keywords: affect; artificial intelligence; autism; digital data; digital health; emotion; mHealth; machine learning; mobile app; mobile phone
Year: 2020 PMID: 32234701 PMCID: PMC7160704 DOI: 10.2196/13174
Source DB: PubMed Journal: JMIR Ment Health ISSN: 2368-7959
Figure 1. A mobile charades game played between caregiver and child is used to crowdsource emotive video, subsampled and categorized by both manual raters and automatic classifiers. Frames from these videos form the basis of our dataset to evaluate several emotion classifiers.
Figure 2. Prompts from the emoji category are caricatures, but many are still associated with the classic Ekman universal emotions.
Figure 3. Prompts from the faces category are derived from real photos of children over a solid background.
Figure 4. The structure of a single video is characterized by its boundary points, which identify the times at which various prompts were shown to the child.
The distribution of frames per category (N=2602).
| Emotion | Frames, n |
| Neutral | 1393 |
| Emotive | 1209 |
| Happy | 864 |
| Sad | 60 |
| Surprised | 165 |
| Disgusted | 69 |
| Angry | 51 |
Percentage of frames correctly identified by classifier: Azure (Azure Cognitive Services), AWS (Amazon Web Services), SH (Sighthound), and Google (Google Cloud Vision). These results include only frames in which a face was present and the two manual raters agreed on the class. The Google Vision API does not support the neutral label.
| Classifier | Emotive (n=1209), n (%) | Neutral (n=1393), n (%) | All (n=2602), n (%) |
| Azure | 798 (66.00) | 744 (53.40) | 1542 (59.26) |
| AWSa | 829 (68.56) | 679 (48.74) | 1508 (57.95) |
| Google | 785 (64.92) | N/Ab | N/A |
| Sighthound | 664 (54.92) | 902 (64.75) | 1566 (60.18) |
aAWS: Amazon AWS Rekognition.
bN/A: not applicable.
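The accuracy figures in the table above follow a simple filter-and-count procedure: keep only frames where a face was present and the two manual raters agreed, then score each classifier against that agreed label. This is a minimal sketch under assumed field names (`rater1`, `rater2`, `face_detected`, and the classifier keys are hypothetical, not the study's actual data schema):

```python
def accuracy_on_agreed_frames(frames, classifier_key):
    """Percentage of frames a classifier labels correctly, restricted to
    frames with a detected face where both manual raters agree; the shared
    rater label serves as ground truth."""
    agreed = [f for f in frames
              if f["face_detected"] and f["rater1"] == f["rater2"]]
    if not agreed:
        return None
    correct = sum(1 for f in agreed if f[classifier_key] == f["rater1"])
    return 100.0 * correct / len(agreed)

# Hypothetical per-frame records for illustration only.
frames = [
    {"rater1": "happy", "rater2": "happy",   "face_detected": True, "azure": "happy"},
    {"rater1": "sad",   "rater2": "neutral", "face_detected": True, "azure": "sad"},
    {"rater1": "angry", "rater2": "angry",   "face_detected": True, "azure": "neutral"},
]
print(accuracy_on_agreed_frames(frames, "azure"))  # 50.0
```

The second frame is dropped because the raters disagree; of the two remaining frames, the classifier matches one, giving 50%.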
Percentage of frames correctly identified by emotion type for each classifier: Azure (Azure Cognitive Services), AWS (Amazon Web Services), SH (Sighthound), and Google (Google Cloud Vision). These results include only frames in which a face was present and the two manual raters agreed on the class. Note: the Google Vision API does not support the neutral or disgust labels.
| Classifier | Neutral (n=1393), n (%) | Happy (n=864), n (%) | Sad (n=60), n (%) | Surprised (n=165), n (%) | Disgusted (n=69), n (%) | Angry (n=51), n (%) |
| AWS | 679 (48.74) | 709 (82.0) | 19 (31) | 94 (56.9) | 4 (5) | 3 (5) |
| Sighthound | 902 (64.75) | 545 (63.0) | 13 (21) | 90 (54.5) | 10 (14) | 6 (11) |
| Azure | 744 (53.41) | 695 (80.4) | 20 (33) | 80 (48.4) | 0 (0) | 3 (5) |
| Google | N/Aa | 676 (78.2) | 10 (16) | 93 (56.3) | N/A | 6 (11) |
aN/A: not applicable.
Figure 5. Cohen's kappa is a measure of agreement between two raters, and was calculated for each pair of the four evaluated classifiers: Azure (Azure Cognitive Services), AWS (Amazon Web Services), SH (Sighthound), and Google (Google Cloud Vision). Results indicate weak agreement between all pairs of classifiers.
Figure 6. The distribution of frames between the two human raters for each emotion: HP (happy), SD (sad), AG (angry), DG (disgust), NT (neutral), and SC (scared).
Speed of the evaluated classifiers.
| Classifier | Time (seconds) |
| Azure | 28.6 |
| AWS | 90.6 |
| Google | 55.9 |
| Sighthound | 41.1 |
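Timing comparisons like the table above amount to wrapping each classifier's run over the frame set with a wall-clock timer. A minimal sketch; the classifier function here is a hypothetical stand-in, not any provider's actual API:

```python
import time

def time_classifier(classify_fn, frames):
    """Wall-clock seconds to run a classifier function over all frames."""
    start = time.perf_counter()
    for frame in frames:
        classify_fn(frame)
    return time.perf_counter() - start

# Hypothetical constant-output classifier, used only to exercise the timer.
elapsed = time_classifier(lambda frame: "happy", range(1000))
print(f"{elapsed:.3f} s")
```

For remote services such as these, the measured time is dominated by network round trips, so per-frame latency depends on batch size and request concurrency as much as on the model itself.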
Figure 7. A comparison of the performance of each classifier on a set of frames highlights scenarios that may lead to discrepancies in the classifier outputs for various emotions: HP (happy), CF (confused), DG (disgust), N/A (not applicable), AG (angry), SC (scared). Ground truth manual labels are shown on top, with labels derived from each classifier on the bottom.