Machine learning models using mobile game play accurately classify children with autism.

Nicholas Deveau¹, Peter Washington², Emilie Leblanc³, Arman Husic³, Kaitlyn Dunlap³, Yordan Penev³, Aaron Kline³, Onur Cezmi Mutlu⁴, Dennis P Wall^1,3.

Abstract

Digitally-delivered healthcare is well suited to address current inequities in the delivery of care due to barriers of access to healthcare facilities. As the COVID-19 pandemic phases out, we have a unique opportunity to capitalize on the current familiarity with telemedicine approaches and continue to advocate for mainstream adoption of remote care delivery. In this paper, we specifically focus on the ability of GuessWhat?, a smartphone-based charades-style gamified therapeutic intervention for autism spectrum disorder (ASD), to generate a signal that distinguishes children with ASD from neurotypical (NT) children. We demonstrate the feasibility of using "in-the-wild", naturalistic gameplay data to distinguish between ASD and NT children by training a random forest classifier to discern the two classes (AU-ROC = 0.745, recall = 0.769). This performance demonstrates the potential for GuessWhat? to facilitate screening for ASD in historically difficult-to-reach communities. To further examine this potential, future work should expand the size of the training sample and interrogate differences in predictive ability by demographic.

Year: 2022 PMID: 36035501 PMCID: PMC9398788 DOI: 10.1016/j.ibmed.2022.100057

Source DB: PubMed Journal: Intell Based Med ISSN: 2666-5212

Introduction

Remote treatment and progress tracking have transformed the way in which clinicians deliver care to their patients [1]. This trend was catalyzed by COVID-19, and as the pandemic phases out, remote health is positioned to remain a primary component of many forms of care [2,3]. One of the best established forms of remote treatment, telemedicine, has been shown to increase access to care across geographic regions [4]. Telemedicine's success during the COVID-19 pandemic provided a glimpse of a future in which access to care is not determined by one's ability to physically visit a care provider. This moment presents the biomedical community with an opportunity to drastically improve access to care for individuals who have historically been underserved by the medical system. If we are to realize a future of more equitable healthcare delivery, it is critical that we develop new forms of remote care while both patients and clinicians are familiar with remote care workflows. Autism Spectrum Disorder (ASD) is a clear example of a condition well suited to remote care. Although the prevalence of ASD is similar in rural (0.9%) and urban (1.0%) areas, individuals in rural communities face limited access to identification and intervention services [5]. Data mining studies have suggested that up to 80% of counties in the U.S. lack sufficient diagnostic resources [6]. Moreover, early diagnosis and intervention can significantly improve the quality of life for individuals with ASD and their families [7]. As such, novel technologies that allow for remote screening for ASD could address the disparity in early diagnosis of the disorder. The development of technology for remote care also allows us to experiment with novel ways of conceptualizing the patient-care interaction.
Simply providing the current physician interaction through a smartphone may not be the best way to deliver remote care. In fact, delays in the adoption of telehealth have been attributed to “unengaged” and “resisting” users [8]. Moreover, current methods for screening for ASD typically include subjective caregiver-report questionnaires. Feature selection on electronic health records has identified salient behavioral features for predicting ASD [9-12], and these features can be reliably acquired through crowdsourcing by non-expert raters [13-17]. These non-expert feature tags have been used to train machine learning models that can detect ASD with high accuracy, precision, and recall [10,11,18-23]. While digitization of these questionnaires may be one way of addressing gaps in screening for ASD, such questionnaires require literacy and perform worse with non-white and lower-education caregivers [24]. Consequently, naively deploying digitized versions of diagnostic questionnaires risks exacerbating current disparities in the early identification of ASD. A clear need exists for objective methods of screening for ASD that perform equally well across demographic groups. Our lab developed GuessWhat?, a mobile charades-style gamified therapeutic intervention that acquires structured video data from children with ASD for use in behavioral diagnostics research [25-29]. We designed the gamified therapeutic to be an engaging and fun way for parents and children to interact while having the option to support behavioral research and remote therapy by sharing objective gameplay and video data. Computer vision classifiers have been developed with the resulting data streams [29,30], and other computer vision efforts for detecting behavioral features related to early time point diagnostics and longitudinal outcome tracking are possible [13,31,32].
In addition to active data collection and monitoring of structured gameplay sessions, passive data collection and device usage measures can potentially be used for diagnostic purposes. Detecting behavioral and mental health conditions through passive device usage has been termed “digital phenotyping” in the literature [29,30,33-35]. Here, we explore the feasibility of using device usage data during gameplay to predict the presence of ASD in a semi-passive manner. Although this falls outside of the traditional definition of “digital phenotyping”, we argue that passive and semi-passive prediction of behavioral health from device usage also belongs to this broad category. The main goal of this study is to assess the ability of GuessWhat? to generate a signal that distinguishes children with ASD from neurotypical (NT) children. Using only objective behavioral data captured by the game, we successfully trained a classifier that distinguishes the two groups, a critical step toward formalizing the game as an objective and easily deployed remote screening tool for ASD.

Methods

Data collection

We collected behavioral data through at-home gameplay of GuessWhat?, a game developed to acquire structured video from children with ASD for behavioral disease research [25-29]. During gameplay, a parent shows a child a prompt (an image), and the child is asked to act out the image, much as they would in a game of charades. As the child acts out the prompt, the parent guesses what the image is based on the child's acting. The parent then labels the prompt by tilting the phone forward (top of phone away from forehead) if the guess was correct and backward (bottom of phone away from forehead) if it was not. These steps are detailed in Fig. 1.

Fig. 1

Mobile Intervention User Experience. a) GuessWhat? is a charades-style mobile game available for any smartphone device. In a typical game session, b) the parent holds the smartphone to their forehead and tries to guess the emotion mimicked by the child in response to the prompt shown on the phone's screen. Upon guessing, the parent tilts the phone to proceed to the next prompt through the end of the 90-second session. c) After each 90-second game, parent and child can review it together. In-app d) game modes, e) unlocking deck and character choices based on coins earned, and f) activity-based achievement badges reinforce positive progression and ensure optimal child engagement through time.

During gameplay, the app collects metadata regarding whether the parent successfully guessed the prompt acted out by the child. We define a trial to be the delivery of a single prompt to a child during a unique session of gameplay. The unprocessed data stored by the game and used in this study include event-level data that track the following aspects of gameplay: the start time of a prompt, the end time of a prompt, whether the prompt was correct, and whether the prompt was skipped.

Participant recruitment

The Stanford Institutional Review Board approved the study prior to any research activities taking place. The recruitment methods for this study are identical to those of Ref. [36]. This study was conducted remotely. Participants were recruited through GuessWhat's existing userbase, The Hartwell Foundation's KidsFirst autism research database, ResearchMatch.org, and Facebook advertisements. All participants were required to meet the following criteria: 1) they were able to read and speak English, 2) they had a compatible iOS or Android device with internet access, 3) the parent was 18+ years old, 4) the child was between 3 and 12 years of age and diagnosed with ASD. To safeguard against the potential for self-reporting bias, we required the caregiver to confirm that their child's autism diagnosis came from a formal medical assessment. We asked the caregiver to choose a diagnostic label from a menu of choices including Autism Spectrum Disorder, Autistic Disorder, Pervasive Developmental Disorder-Not Otherwise Specified, Asperger Syndrome, ADHD/ADD, Anxiety, and Speech and Language Delay. In addition, we required participants to report on the specific type(s) of therapy being administered to their child. This information requires a specialized understanding of the autism diagnosis and subsequent treatment prescriptions.

Data sample, feature engineering and preprocessing

We collected gameplay data from children with autism spectrum disorder (ASD, n = 28) and neurotypical children (NT, n = 21). Of the ASD children, 19 were male and 9 were female. Of the NT children, 10 were male, 7 were female, and 4 did not provide sex information. Data were acquired between April 2017 and February 2021. Children were classified as ASD or non-ASD based on parent-provided information collected when the parent signed up for GuessWhat?. Children had a mean age of 7.10 ± 5.82 years. In order to minimize missing-data imputation, we focused our analysis only on the most commonly presented prompts, which were images from the CAFE dataset, a collection of images of young children displaying angry, fearful, sad, happy, surprised, disgusted and neutral faces [37]. Future work will expand this image dataset to include prompts derived from videos served on the social media platform TikTok.

Feature engineering

Previous work has shown that differences in emotion recognition tasks can be used to distinguish children with autism from NT children [38,39]. This difference stems from the ability to correctly identify an emotion and the reaction time required to do so (e.g., how long it takes a child to recognize a facial emotion). Consequently, we developed features that allowed us to measure these two constructs. The first set of features (n = 17, detailed in Appendix A1) measured the accuracy with which a child successfully acted out a specific prompt. Each of these features corresponded to one of 17 faces shown during gameplay. A prompt was considered correct if the parent labeled it as such during gameplay. It should be noted that, although parents received instructions for how to correctly label a prompt, we had no way to confirm that they did in fact label it correctly. Thus, for m trials of each face type, we calculated percent correct p to be:

$$p = \frac{1}{m} \sum_{i=1}^{m} c_i$$

where $c_i = 1$ if the parent labeled trial $i$ as correct and $c_i = 0$ otherwise. For a single session of gameplay, we calculated a percent correct feature for each of the prompts shown from the CAFE dataset. The second set of features (n = 17) measured the amount of time it took for a child to act out the prompt and for the parent to label it. We call this prompt duration, d, and we calculated it as follows:

$$d = \frac{1}{m} \sum_{i=1}^{m} \left( t_i^{\mathrm{end}} - t_i^{\mathrm{start}} \right)$$

where $t_i^{\mathrm{start}}$ and $t_i^{\mathrm{end}}$ are the recorded start and end times of trial $i$, and m is the number of times a prompt corresponding to a specific emotion was shown to a child. In other words, it was the average amount of time it took a child to identify a specific emotion. We calculated d for each of the 17 types of faces shown, regardless of whether a child correctly identified the face. Appendix A2 illustrates the input data schema.
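The two feature families can be illustrated with a minimal Python sketch. It assumes a per-trial record holding the prompt identifier, the start and end times, and the parent's label; the `Trial` class, its field names, and the example values are hypothetical and are not taken from the GuessWhat? codebase. The feature names follow the `<PROMPT>_accuracy` / `<PROMPT>_duration` pattern listed in the appendix.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    """One prompt shown in one session (field names are hypothetical)."""
    prompt: str     # face identifier, e.g. "DISGUST_2"
    start: float    # prompt start time, in seconds
    end: float      # prompt end time, in seconds
    correct: bool   # True if the parent labeled the prompt as correct


def session_features(trials):
    """Return one accuracy and one duration feature per prompt in a session."""
    by_prompt = defaultdict(list)
    for t in trials:
        by_prompt[t.prompt].append(t)

    features = {}
    for prompt, group in by_prompt.items():
        m = len(group)  # number of trials for this prompt
        # Proportion of trials the parent labeled as correct.
        features[f"{prompt}_accuracy"] = sum(t.correct for t in group) / m
        # Mean time between prompt start and prompt end.
        features[f"{prompt}_duration"] = sum(t.end - t.start for t in group) / m
    return features


# Made-up session with two DISGUST_2 trials and one HAPPY_3 trial.
session = [
    Trial("DISGUST_2", 0.0, 4.0, True),
    Trial("DISGUST_2", 10.0, 12.0, False),
    Trial("HAPPY_3", 20.0, 23.0, True),
]
feats = session_features(session)
print(feats)
```

Prompts that never appear in a session produce no entry in this sketch; in the study such gaps were handled by the imputation step described under Preprocessing.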

Preprocessing

To avoid sharing distributional information between the training and test sets (i.e., “data leakage”), we carried out the following preprocessing steps separately for each train-test split of the data. The steps included, in order: outlier removal, imputation, standardization, and upsampling. For all features, we considered values more than 3 standard deviations away from the mean to be outliers. We removed these values and then used k-nearest-neighbor imputation (k = 3) to fill missing observations. We then standardized our data by subtracting feature-wise means from each observation and dividing by feature-wise standard deviations. Finally, due to the compounding effects of moderately imbalanced classes within a small dataset, we used SMOTE [40] to upsample the minority class and ensure balanced classes (equal ASD and NT) in each train-test split.
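As a rough illustration of the leakage-avoiding pattern, the following Python sketch applies the outlier and standardization steps to a single feature. It assumes that the mean and standard deviation are estimated on the training partition only and then reused on the test partition; the source does not spell out this detail, and the duration values are invented. The imputation and SMOTE steps are deliberately omitted, so masked values simply remain `None`.

```python
import statistics


def mask_outliers(column, mean, sd, n_sd=3.0):
    """Replace values more than n_sd standard deviations from mean with None."""
    return [
        None if v is None or abs(v - mean) > n_sd * sd else v
        for v in column
    ]


def standardize(column, mean, sd):
    """Subtract mean and divide by sd; missing values stay None."""
    return [None if v is None else (v - mean) / sd for v in column]


def observed(column):
    """Drop missing values."""
    return [v for v in column if v is not None]


# Hypothetical prompt durations (seconds) for one feature.
train = [2.0] * 6 + [4.0] * 6 + [60.0]
test = [3.0, 100.0]

# Step 1: outlier threshold estimated on the training partition only.
mean, sd = statistics.mean(train), statistics.pstdev(train)
train_masked = mask_outliers(train, mean, sd)
test_masked = mask_outliers(test, mean, sd)

# Step 2 (omitted here): k-nearest-neighbor imputation of the None entries.

# Step 3: standardization statistics re-estimated on the cleaned training data
# and applied unchanged to the test partition.
clean = observed(train_masked)
mean, sd = statistics.mean(clean), statistics.pstdev(clean)
train_scaled = standardize(train_masked, mean, sd)
test_scaled = standardize(test_masked, mean, sd)
print(test_scaled)
```

In a scikit-learn workflow, `KNNImputer` and `StandardScaler` fitted inside each training fold would play the same role; upsampling would be applied afterwards with a SMOTE implementation such as the one in imbalanced-learn.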

Modeling

We tested the performance of 4 classifiers on our set of 34 features. We trained and tested our models in Python using the scikit-learn package [41]. We chose to test models from 3 families of classifiers: linear models, support vector machines (SVM) and tree-based methods. Three main criteria drove the choice of these families of models. First, we had no a priori belief about the linearity (or lack thereof) of the relationships between our features, so it was important to model our data using a set of methods that would perform well under various conditions of linearity. Second, our sample size was not particularly large, so it was important to test model types that offered considerable flexibility to prevent overfitting through regularization. Third, we wanted to choose an interpretable model to gain insight into the specific aspects of gameplay that predict ASD. Table 1 describes the types of models we used in our analysis and their relevant attributes.

Table 1

Summary of tested classifiers. Hyperparameter names correspond to those used by scikit-learn v. 1.0.1.

Model	Hyperparameter	Values Tested
XGBoost	learning_rate	0.05, 0.10, 0.15, 0.20, 0.25, 0.30
	max_depth	1, 2, 3
	min_child_weight	1, 3, 5, 7
	gamma	0.1, 0.2, 0.3, 0.4
	colsample_bytree	0.3, 0.4, 0.5, 0.7
Random Forest	max_depth	1, 2, 3
	min_samples_leaf	2, 3, 4, 5
Logistic Regression	penalty	L1, L2
	C	0.1 to 10, 20 values log-spaced
Linear SVM	C	−7 to 4, 50 values log-spaced
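To give a sense of the size of the search space, the sketch below enumerates every hyperparameter combination for the two tree-based models, using the values listed in Table 1. The `expand` helper is illustrative only. The logistic regression and SVM grids are left out because the table gives their `C` values only as log-spaced ranges.

```python
from itertools import product

# Values copied from Table 1 for the two tree-based models.
xgboost_grid = {
    "learning_rate": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
    "max_depth": [1, 2, 3],
    "min_child_weight": [1, 3, 5, 7],
    "gamma": [0.1, 0.2, 0.3, 0.4],
    "colsample_bytree": [0.3, 0.4, 0.5, 0.7],
}
random_forest_grid = {
    "max_depth": [1, 2, 3],
    "min_samples_leaf": [2, 3, 4, 5],
}


def expand(grid):
    """List every hyperparameter combination in a grid as a dict."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


xgboost_candidates = expand(xgboost_grid)
random_forest_candidates = expand(random_forest_grid)
print(len(xgboost_candidates), len(random_forest_candidates))  # prints 1152 12
```

An exhaustive grid search fits one model per combination and per validation fold, so the XGBoost grid is roughly two orders of magnitude more expensive to search than the random forest grid.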

We used a repeated, nested grid search to simultaneously identify the best performing set of hyperparameters for our models and to understand the statistical accuracy of the performance metrics we obtained. The first iteration of the outer loop of our cross-validation procedure randomly split the dataset into 4 equal-sized partitions. Then, using only 3 of these 4 partitions, the inner loop tuned the model hyperparameters using a grid search and 4-fold cross-validation in order to select the best performing model with respect to AU-ROC. It should be noted that, because of the low sample size, we limited our search space to hyperparameters that would lead to more regularization and less overfitting. After the best performing model was selected by the inner loop, its performance (measured by AU-ROC and recall) was tested on the unseen data in the 4th partition from the outer loop. After four iterations of the outer loop, we obtained 4 classifiers, each with a corresponding set of performance metrics. Finally, we repeated the outer loop 7 times to obtain a total of 28 sets of performance metrics from which we bootstrapped distributions of each metric of interest. Fig. 6 provides a visualization of the repeated, nested cross-validation procedure.
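The index bookkeeping behind a repeated, nested cross-validation can be sketched in plain Python as follows. This is an illustration under simplifying assumptions rather than the authors' implementation: the folds are not stratified by class, the function names are hypothetical, and model fitting is left out. The sample size of 49 corresponds to the 28 ASD and 21 NT children.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle indices and split them into k folds of near-equal size."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def repeated_nested_splits(n_samples, n_repeats=7, k_outer=4, k_inner=4, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for every outer fold.

    inner_splits is a list of (inner_train, inner_validation) pairs drawn
    only from outer_train, so the outer test fold is never used for tuning.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        outer_folds = k_fold_indices(range(n_samples), k_outer, rng)
        for i, outer_test in enumerate(outer_folds):
            outer_train = [j for f, fold in enumerate(outer_folds) if f != i for j in fold]
            inner_folds = k_fold_indices(outer_train, k_inner, rng)
            inner_splits = []
            for v, inner_val in enumerate(inner_folds):
                inner_train = [j for f, fold in enumerate(inner_folds) if f != v for j in fold]
                inner_splits.append((inner_train, inner_val))
            yield outer_train, outer_test, inner_splits


splits = list(repeated_nested_splits(n_samples=49))
print(len(splits))  # 7 repeats x 4 outer folds = 28
```

For each yielded triple, hyperparameters would be chosen using only `inner_splits`, and the selected model would then be scored once on `outer_test`. In scikit-learn, a comparable structure can be obtained by passing a `GridSearchCV` estimator to `cross_validate` with a `RepeatedStratifiedKFold` splitter.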

Fig. 6

Repeated nested cross-validation procedure used to separately tune model hyperparameters and evaluate out-of-sample performance.

Feature selection

A simpler model (e.g., a model with fewer features) is often favorable, as it is easier to interpret and can often improve performance by eliminating noisy features. We produced heatmaps in order to inspect the importance of each feature across the 28 iterations of cross-validation used to train and evaluate each model (e.g., Fig. 2). Finally, we retained the models using only duration-based features, which consistently displayed higher feature importance in the random forest classifier. We noted a clear separation between duration-based and accuracy-based features: features using the time taken by the child to act out the prompt were significantly more important on average (Fig. 4; t = 5.15, p = 1e-5) than features relying on the parent's ability to correctly guess the prompt the child is acting out. This gap in feature importance by feature type (duration-based or accuracy-based) could be due to variability in parents' adherence to the GuessWhat? instructions.
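The comparison of average importance between the two feature types can be illustrated with a two-sample t statistic. The source does not state which variant of the t-test was used, so the sketch below assumes Welch's unequal-variance form; the importance values are invented for illustration and are not the study's data.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for the difference in means of samples a and b."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical mean importances, one value per feature (not the study's data).
duration_importance = [0.06, 0.05, 0.07, 0.04, 0.08]
accuracy_importance = [0.01, 0.02, 0.01, 0.03, 0.02]

t = welch_t(duration_importance, accuracy_importance)
print(round(t, 2))
```

A positive statistic indicates that the first group (here, the duration-based features) has the larger mean importance; a p-value would additionally require the t distribution with the Welch-Satterthwaite degrees of freedom.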

Fig. 2

Heatmaps of relative feature importance for each classifier type (all features). The y axis corresponds to the 28 iterations of cross-validation. Plots are presented in decreasing order of AU-ROC performance (as read left to right). Duration-based features tended to be most important in models that produced a higher AU-ROC.

Fig. 4

Difference in average feature importance by feature type.

Final training

After manual feature selection, we followed the same repeated, nested cross-validation procedure as before and re-trained the classifiers using the entire training set on the reduced feature space.

Model evaluation

We evaluated models by comparing the mean values of AU-ROC and recall across the 28 iterations of cross-validation. Later in this paper we discuss the practical considerations of each metric and argue for which of them should drive final model selection.
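For reference, both metrics can be computed for a single held-out fold with a few lines of Python. The sketch assumes that ASD is the positive class (label 1) and uses a 0.5 decision threshold for recall; neither choice is stated in the source, and the labels and scores are invented.

```python
def recall(y_true, y_pred):
    """Fraction of true positives (label 1) that were predicted as positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(predictions_for_positives) / len(predictions_for_positives)


def au_roc(y_true, scores):
    """Probability that a random positive scores higher than a random
    negative, counting ties as one half (the Mann-Whitney formulation)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out fold: 1 = ASD, 0 = NT.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

fold_auc = au_roc(y_true, scores)
fold_recall = recall(y_true, y_pred)
print(fold_auc, fold_recall)
```

Averaging such per-fold values over the 28 outer folds yields the kind of mean AU-ROC and mean recall compared in this study.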

Results

Model performance using full feature set

Using the full set of 34 features (all accuracy and duration measures), we obtained four heatmaps of relative feature importance across each of the 28 iterations of repeated, nested cross-validation (Fig. 2). One heatmap exists for each type of classifier, and each row of the heatmap displays feature importances for the classifier that maximized AU-ROC during the hyperparameter tuning grid search. The 28 rows of each heatmap correspond to the 28 models produced through the 28 iterations of repeated, nested cross-validation. When trained on the full set of 34 features, an XGBoost classifier produced the best model with respect to both AU-ROC (AU-ROC = 0.74) and recall (recall = 0.76). Random forest performed second best with respect to both AU-ROC (AU-ROC = 0.73) and recall (recall = 0.76). Logistic regression and linear SVM performed noticeably worse than tree-based methods with respect to recall, but SVM showed better precision than the other methods. In our discussion we will argue for selecting the model that results in the highest average of AU-ROC and recall. A summary of all performance metrics for all models is found in Table 2.

Table 2

Cross-validated performance metrics for each classifier type obtained through hyperparameter grid search. The minority class (NT) was upsampled using SMOTE to match the size of the majority class (ASD).

Model	Mean AU-ROC		Mean Recall		Mean Accuracy		Mean Precision	
	All	Duration	All	Duration	All	Duration	All	Duration
XGBoost	0.74*	0.70	0.76*	0.72	0.71	0.67	0.64	0.61
Random Forest	0.73	0.75*	0.76	0.77*	0.72*	0.72*	0.65	0.67*
Logistic Regression	0.67	0.69	0.67	0.69	0.65	0.68	0.60	0.67
Linear SVM	0.70	0.71	0.70	0.74	0.69	0.69	0.70*	0.65

Manual inspection of features

Manual inspection of feature importances revealed that duration-based features were, in general, most important in the two best performing tree-based classifiers. This pattern was most pronounced in the random forest (Fig. 2). Lower performing linear methods tended to spread feature importance among both duration-based and accuracy-based features (Fig. 2). More specifically, duration-based features corresponding to faces expressing disgust (in the tree-based methods) most consistently displayed high relative importance across the folds of cross-validation. This is most clearly seen in Figs. 2 and 3, where duration-based features are seen on the left side of the figures.

Fig. 3

Heatmaps of relative feature importance for each classifier type (duration-only features). The y axis corresponds to the 28 iterations of cross-validation. Plots are presented in decreasing order of AU-ROC performance (as read left to right).

Model performance using reduced feature set

Three of the four classifiers saw comparable or improved performance with respect to AU-ROC and recall when trained only on the reduced subset of 17 duration-based features. XGBoost was the only classifier that performed worse when trained only on the reduced feature subset. Performance metrics for each model trained using both the full set of features and the duration-based subset are found in Table 2.

Feature importances using the reduced feature set

Features corresponding to the emotion “disgust” were consistently most important within the highest performing classifier (random forest). Features corresponding to surprise and sadness were consistently highly important across all classifier types except for XGBoost (Fig. 5).

Fig. 5

Feature importances aggregated by emotion across all 4 families of classifiers.

When we aggregated feature importance by emotion (e.g., took the average of all features corresponding to a face showing disgust), the difference in importance between the most important emotion, disgust, and all other emotions was most drastic in the random forest classifier (Fig. 5).
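The aggregation step can be sketched as follows, assuming feature names of the form `DISGUST_2_duration` as listed in the appendix. The helper names and the importance values are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean


def emotion_of(feature_name):
    """Map e.g. 'DISGUST_2_duration' to 'DISGUST'."""
    prompt = re.sub(r"_(duration|accuracy)$", "", feature_name)
    return re.sub(r"_\d+$", "", prompt)


def importance_by_emotion(importances):
    """Average feature importances over all features sharing an emotion."""
    grouped = defaultdict(list)
    for name, value in importances.items():
        grouped[emotion_of(name)].append(value)
    return {emotion: mean(values) for emotion, values in grouped.items()}


# Hypothetical importances (not the study's values).
importances = {
    "DISGUST_duration": 0.20,
    "DISGUST_2_duration": 0.10,
    "SURPRISED_3_duration": 0.05,
    "NEUTRAL_duration": 0.01,
}
by_emotion = importance_by_emotion(importances)
print(by_emotion)
```

Because the number of faces per emotion differs (for example, four disgust faces but two sad faces), averaging rather than summing keeps emotions with more faces from dominating the comparison.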

Classifier selection

As mentioned, we selected the simplest model resulting in the highest average of AU-ROC and recall. According to this criterion, the best performing model across both the full and reduced feature sets was a random forest classifier. Because of the repeated, nested cross-validation, we cannot report a single “best” set of hyperparameters.
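The selection criterion can be checked directly against the mean AU-ROC and mean recall values reported in Table 2. The sketch below uses an unweighted average of the two metrics; the preference for simpler models is not modeled, and the variable names are illustrative.

```python
# (model, feature set) -> (mean AU-ROC, mean recall), copied from Table 2.
table_2 = {
    ("XGBoost", "All"): (0.74, 0.76),
    ("XGBoost", "Duration"): (0.70, 0.72),
    ("Random Forest", "All"): (0.73, 0.76),
    ("Random Forest", "Duration"): (0.75, 0.77),
    ("Logistic Regression", "All"): (0.67, 0.67),
    ("Logistic Regression", "Duration"): (0.69, 0.69),
    ("Linear SVM", "All"): (0.70, 0.70),
    ("Linear SVM", "Duration"): (0.71, 0.74),
}


def selection_score(metrics):
    """Unweighted average of mean AU-ROC and mean recall."""
    au_roc, recall = metrics
    return (au_roc + recall) / 2


ranked = sorted(table_2, key=lambda k: selection_score(table_2[k]), reverse=True)
best = ranked[0]
print(best)  # ('Random Forest', 'Duration')
```

With these values the random forest trained on duration-only features scores 0.76, ahead of XGBoost on all features at 0.75.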

Discussion

Novelty of method

To the best of our knowledge, this was the first study to use naturalistic, smartphone-collected gameplay data to distinguish ASD from NT children in a non-clinical setting. Moreover, the objective nature of the data adds to a growing body of work demonstrating that digital phenotyping can successfully distinguish ASD from NT children [32,42]. Considerable work has focused on researching and developing objective methods of screening for ASD. Many of these methods rely either on genetic information or on image and video data collected for use with computer vision algorithms. In this study, we expand on these modalities, demonstrating that data collected through the use of a digital therapeutic can distinguish ASD from NT children. These results can potentially generalize to other ubiquitous modalities such as wearable computers, which have proven to be clinically useful for addressing certain symptoms related to autism [43-52]. The significance of this is twofold. First, it is reasonable to expect that supplementing the aforementioned efforts to develop digital screening tools with the modality presented in this paper will produce more expressive models that can be used to screen for ASD in a privacy-preserving manner. Privacy concerns are at the forefront of behavioral phenotyping efforts (Washington et al., 2020). Second, GuessWhat? benefits from capturing information during naturalistic interactions between parents and their children. This has a clear benefit: the parent can intervene if the child begins to lose interest or pay less attention to the game. That said, involving the parent in gameplay introduces potential confounding effects of the parent's method of interacting with the game (e.g., the duration-based measures in this study could capture the speed with which a parent marks a prompt as correct rather than the time the child takes to respond to the prompt).

Feature discussion

One of the most striking results of the study was the extent to which the highest performing models nearly exclusively found duration-based features important compared to accuracy based features. This may have been due to the low variance of the accuracy features compared to that of the duration based features. For both ASD and NT, accuracy metrics were skewed heavily towards 1.0 (i.e., always correct), suggesting that these features may not discriminate well between the two conditions. This trend could be driven by latent factors such as parents incorrectly labeling or a game design that was too easy for the majority of children regardless of diagnosis. Positing that the accuracy features may have been introducing noise into our dataset, we opted to train models using only duration-based features, which improved performance in 3 of the 4 types of classifiers. Consequently, our final classifier was a random forest trained on just the duration subset of features (Fig. 3). When we looked at the mean feature importance of features aggregated by emotion, features corresponding to disgust were the most important in the random forest classifier. It has been shown that younger children are worse at discriminating facial expressions of disgust and surprise when images of these expressions are presented alongside specific other emotions [53], suggesting that cognitive development is required for accurate processing of these emotions. This provides a compelling explanation of why we found features corresponding to disgust and surprise to be best at discriminating between ASD and NT children in our study. That said, we must take care to not deduce an oversimplified understanding of these features. Specifically, although emotion recognition is necessary for a child to act out a prompt, it is but one component of a complex interaction between child performance and parent interpretation that could drive the signal found in this study. 
In this paper, we consider this complex dyadic process to be a proxy for emotion recognition, but it likely captures components of theory of mind, metacognition and many other phenomena as well.

Strengths and limitations

This study demonstrated that naturalistic gameplay data involving children's ability to identify and process facial emotions can be used to distinguish ASD and NT children. Moreover, the features that were most important for distinguishing the children were features corresponding to disgust and surprise, a finding consistent with previous literature [53]. Capturing this objective signal “in the wild” is a promising step toward developing novel methods of screening for ASD that complement existing instruments, resulting in more accurate and accessible methods of screening for the disorder. The ability to capture this signal “in the wild” without the use of specialized equipment is well situated for translation due to three key factors. First, an emerging model of remote care that emphasizes telehealth visits was catalyzed by COVID-19, and both clinicians and patients became accustomed to delivering and receiving care through digital tools. Second, there is increasing awareness of the disparities in diagnosing ASD both in rural areas and among low socioeconomic status groups [5,54]. Requiring only a smartphone with internet access, GuessWhat? could be used as part of a broader strategy of addressing these disparities in care for ASD among different groups in the United States. Third, while there is some disagreement about the “patient as a consumer” model [55,56], when we narrow the discussion specifically to the ways in which patients interact with digital health tools and products, it would be naive to assume that patients are not sensitive to the experience of using a digital tool, especially when many of the most successful health and wellness apps provide exemplary experiences. As such, the low barrier to use and straightforward experience provided by GuessWhat? position the platform well for high engagement from the relevant patient populations.
A limitation of this study is that the sample size was too small to evaluate the predictive power of our models across various demographic dimensions, including gender, ethnicity, nationality, age and other diagnoses (e.g., ADHD, dyslexia). As we expand our recruitment efforts, we plan to follow up on this work with models that are validated to perform fairly across demographic subgroups. Additionally, to mitigate the possible bias introduced by parents being lenient with their own children, we should reproduce this study in a clinical setting in which the person displaying the prompts to the child is neither a parent nor informed of the child's diagnosis. Finally, future work should attempt to elucidate the impact of the many dimensions of the child-parent dyadic relationship on the signal found in this study. Future work should specifically interrogate 1) the ability of a child to identify a prompt's emotion, 2) the child's ability to introspect and express the emotion in a way that the parent would recognize, 3) the ability of the parent to identify the relevant emotion, and 4) the parent's ability to press the button quickly. Furthermore, the broader autism phenotype (BAP) is a term that refers to the presence of certain autism-related traits in undiagnosed family members of children with autism [57]. These typically manifest as milder impairments in social and communication abilities. Parents exhibiting the BAP could potentially drive the results found in this study.

Funding

This work was supported in part by funds to DPW from the (1R01EB025025-01, 1R21HD091500-01, 1R01LM013083, 1R01LM013364), the (Award 2014232), , , Coulter Foundation, Lucile Packard Foundation, the , and program grants from Stanford's Human Centered Artificial Intelligence Program, Stanford's Precision Health and Integrated Diagnostics Center (PHIND), Stanford's Beckman Center, Stanford's Bio-X Center, Predictives and Diagnostics Accelerator (SPADA) Spectrum, Stanford's Spark Program in Translational Research, Stanford mediaX, and Stanford's Wu Tsai Neurosciences Institute's Neuroscience: Translate Program. We also acknowledge generous support from David Orr, Imma Calvo, Bobby Dekesyer and Peter Sullivan. P.W. would like to acknowledge support from Mr. Schroeder and the Stanford Interdisciplinary Graduate Fellowship (SIGF) as the Schroeder Family Goldman Sachs Graduate Fellow.

Declaration of competing interest

D.P.W. is the founder of Cognoa.com. This company is developing digital health solutions for pediatric healthcare. AK works as a part-time consultant to Cognoa.com.

Appendix A1

ANGRY_2	NEUTRAL_2
ANGRY_3	SAD
DISGUST	SAD_2
DISGUST_2	SCARED
DISGUST_3	SCARED_2
DISGUST_4	SURPRISED
HAPPY_2	SURPRISED_2
HAPPY_3	SURPRISED_3
NEUTRAL

Appendix A2

ANGRY_2_duration	ANGRY_2_accuracy	DISGUST_2_duration	DISGUST_2_accuracy	...	SURPRISED_3_duration	SURPRISED_3_accuracy

41 in total
