Nathan A Chi, Peter Washington, Aaron Kline, Arman Husic, Cathy Hou, Chloe He, Kaitlyn Dunlap, Dennis P Wall.
Abstract
BACKGROUND: Autism spectrum disorder (ASD) is a neurodevelopmental disorder that results in altered behavior, social development, and communication patterns. In recent years, autism prevalence has tripled, with 1 in 44 children now affected. Given that traditional diagnosis is a lengthy, labor-intensive process that requires the work of trained physicians, significant attention has been given to developing systems that automatically detect autism. We work toward this goal by analyzing audio data, as prosody abnormalities are a signal of autism, with affected children displaying speech idiosyncrasies such as echolalia, monotonous intonation, atypical pitch, and irregular linguistic stress patterns.
Keywords: artificial intelligence; audio; autism; child; diagnosis; digital data; mHealth; machine learning; mobile app; speech
Year: 2022 PMID: 35436234 PMCID: PMC9052034 DOI: 10.2196/35406
Source DB: PubMed Journal: JMIR Pediatr Parent ISSN: 2561-6722
Figure 1. Overview of the audio-based AI detection pipeline. First, the educational video game Guess What? crowdsources the recording of videos of NT children and children with ASD from consenting participants. Audio of children's speech is manually spliced from the videos, and 3 models are trained on this audio data. The first is a random forest classifier, which uses an ensemble of independently trained decision trees. The second is a CNN. The third is a fine-tuned wav2vec 2.0 model. Model 1 takes commonly used speech recognition features as input, model 2 learns from spectrograms of the audio, and model 3 takes the raw audio data itself as input. AI: artificial intelligence; ASD: autism spectrum disorder; CNN: convolutional neural network; NT: neurotypical.
Distribution of 850 audio clips across 5 folds. Each of the 3 models was trained on the same distribution of clips with 5-fold cross-validation.
| Group | Fold 0 | Fold 1 | Fold 2 | Fold 3 | Fold 4 |
| Neurotypical | 87 | 87 | 81 | 83 | 87 |
| Autism spectrum disorder | 87 | 87 | 81 | 83 | 87 |
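The class-balanced folds in the table above can be sketched as follows. This is a minimal illustration, not the authors' code: the integer clip IDs are hypothetical, the names `make_balanced_folds` and `cv_splits` are invented for this example, and random assignment within each class is an assumption, since the source reports only the resulting fold sizes.

```python
import random

# Per-class fold sizes taken from the table above (identical for both groups).
FOLD_SIZES = [87, 87, 81, 83, 87]


def make_balanced_folds(nt_ids, asd_ids, fold_sizes, seed=0):
    """Assign equal numbers of NT and ASD clips to each fold.

    Random assignment within each class is an assumption; the source
    only reports the resulting fold sizes.
    """
    nt, asd = list(nt_ids), list(asd_ids)
    if sum(fold_sizes) != len(nt) or sum(fold_sizes) != len(asd):
        raise ValueError("fold sizes must sum to the number of clips per class")
    rng = random.Random(seed)
    rng.shuffle(nt)
    rng.shuffle(asd)
    folds, start = [], 0
    for size in fold_sizes:
        # Each item is (clip_id, label); label 0 = NT, label 1 = ASD.
        fold = [(c, 0) for c in nt[start:start + size]]
        fold += [(c, 1) for c in asd[start:start + size]]
        folds.append(fold)
        start += size
    return folds


def cv_splits(folds):
    """Yield (train, test) pairs, holding out one fold at a time."""
    for k, test in enumerate(folds):
        train = [item for j, f in enumerate(folds) if j != k for item in f]
        yield train, test


# Hypothetical clip IDs: 425 per class, 850 in total.
folds = make_balanced_folds(range(425), range(425, 850), FOLD_SIZES)
for train, test in cv_splits(folds):
    print(len(train), len(test))
```

Reusing the same `folds` object for every model is one way to guarantee that all 3 models see an identical distribution of clips.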
Figure 2. Mel-frequency spectrogram for a neurotypical child speech segment, spliced from a Guess What? gameplay video. This spectrogram was one of 850 used to train the convolutional neural network model with 8 million parameters, which yielded the highest accuracy of the 3 best-performing models.
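For readers unfamiliar with mel-frequency spectrograms, the following is a minimal NumPy sketch of how one can be computed from raw audio. It is not the authors' preprocessing: the sampling rate, FFT size, hop length, and number of mel bands are assumed values chosen for illustration, the HTK-style mel formula is one common convention among several, and the input is a synthetic tone rather than a speech clip.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumed convention).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64):
    """Return a log-power mel spectrogram of shape (n_mels, n_frames)."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < n_fft:
        signal = np.pad(signal, (0, n_fft - signal.size))
    n_frames = 1 + (signal.size - n_fft) // hop
    window = np.hanning(n_fft)
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T
    # Small constant avoids log(0) in silent bands.
    return 10.0 * np.log10(mel_power + 1e-10).T


# Synthetic 1-second 440 Hz tone standing in for a speech clip.
sr = 16000
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(spec.shape)
```

The resulting 2D array can be treated as a single-channel image, which is the form in which a CNN would typically consume it.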
Figure 3. (A) and (B) represent the same 8M CNN model architecture. This architecture performed best out of all of our tested architectures, including a fine-tuned Inception v3 model. (B) was in part created with the Python package Visualkeras. 8M CNN: convolutional neural network with 8 million parameters.
Performance on the Guess What? data set. Results are reported as mean (SD) over 5 different runs for each model.
| Model | Accuracy, mean (SD) | Precision, mean (SD) | Recall, mean (SD) | F1 score, mean (SD) | AUROCa, mean (SD) |
| Random forest | 0.697 (0.013) | 0.687 (0.010) | 0.744 (0.247) | 0.694 (0.013) | 0.740 (0.09) |
| Convolutional neural network | 0.793 (0.013) | 0.804 (0.014) | 0.793 (0.014) | 0.790 (0.014) | 0.822 (0.010) |
| Wav2vec 2.0 | 0.769 (0.005) | 0.782 (0.021) | 0.746 (0.031) | 0.768 (0.006) | 0.815 (0.077) |
aAUROC: area under the receiver operating characteristic curve.
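The "mean (SD)" entries in the table above can be produced by computing each metric per run and then aggregating. The sketch below is illustrative only: the labels and predictions are toy values invented for this example, treating ASD (label 1) as the positive class is an assumption, and the use of the sample standard deviation is also an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, stdev


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall with label 1 (assumed ASD) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def summarize(runs):
    """Collapse a list of per-run metric dicts into 'mean (SD)' strings."""
    summary = {}
    for name in runs[0]:
        values = [r[name] for r in runs]
        # stdev is the sample SD (n - 1 denominator); this choice is an assumption.
        summary[name] = f"{mean(values):.3f} ({stdev(values):.3f})"
    return summary


# Toy data, not from the study: one label vector and three hypothetical runs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [
    [1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 1, 1],
]
runs = [binary_metrics(y_true, y_pred) for y_pred in predictions]
print(summarize(runs))
```

A large SD on one metric next to small SDs on the others, as in the random forest recall, indicates that the runs disagree mainly on how many positive cases they recover.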
Figure 4. (A) ROC curve for random forest model. (B) Confusion matrix for random forest model. (C) ROC curve for 8M CNN. (D) Confusion matrix for CNN. (E) ROC curve for wav2vec 2.0 model. (F) Confusion matrix for wav2vec 2.0 model. All models were trained and tested on the Guess What? audio data set, composed of child speech segments taken from educational gameplay videos. 8M CNN: convolutional neural network with 8 million parameters; ASD: autism spectrum disorder; AUC: area under the curve; NT: neurotypical; ROC: receiver operating characteristic.
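The two kinds of panel in Figure 4 can be derived from a model's outputs as sketched below. This is a generic illustration under stated assumptions, not the authors' evaluation code: the scores are toy values, label 1 is assumed to denote ASD, the 0.5 decision threshold is an assumption, and both classes must be present for the rates to be defined.

```python
def confusion_matrix(y_true, y_pred):
    """2x2 counts as [[TN, FP], [FN, TP]]; rows are true NT (0) and true ASD (1)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def roc_curve(y_true, scores):
    """(FPR, TPR) points from sweeping the threshold over each distinct score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data, not from the study.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_matrix(y_true, y_pred))
print(roc_curve(y_true, scores))
print(auc(roc_curve(y_true, scores)))
```

The confusion matrix depends on the chosen threshold, whereas the ROC curve and its AUC summarize performance across all thresholds, which is why the two are reported side by side.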