J A Kosmicki, V Sochat, M Duda, D P Wall.
Abstract
Although the prevalence of autism spectrum disorder (ASD) has risen sharply in the last few years, reaching 1 in 68, the average age of diagnosis in the United States remains close to 4 years, well past the developmental window when early intervention has the largest gains. This emphasizes the importance of developing accurate methods to detect risk faster than the current standards of care. In the present study, we used machine learning to evaluate one of the best and most widely used instruments for clinical assessment of ASD, the Autism Diagnostic Observation Schedule (ADOS), to test whether only a subset of behaviors can differentiate between children on and off the autism spectrum. ADOS relies on behavioral observation in a clinical setting and consists of four modules, with module 2 reserved for individuals with some vocabulary and module 3 for higher levels of cognitive functioning. We ran eight machine learning algorithms using stepwise backward feature selection on score sheets of modules 2 and 3 from 4540 individuals. We found that 9 of the 28 behaviors captured by items from module 2, and 12 of the 28 behaviors captured by module 3, are sufficient to detect ASD risk with 98.27% and 97.66% accuracy, respectively. A reduction of more than 55% in the number of behaviors, with negligible loss of accuracy across both modules, suggests a role for computational and statistical methods to streamline ASD risk detection and screening. These results may help enable development of mobile and parent-directed methods for preliminary risk evaluation and/or clinical triage that reach a larger percentage of the population and help to lower the average age of detection and diagnosis.
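The abstract names stepwise backward feature selection but does not spell out its scoring or stopping rule here. The sketch below shows the generic greedy form, assuming a caller-supplied scoring function (in practice something like the cross-validated accuracy of a classifier trained on the kept items). The function name, item names and weights are hypothetical and only illustrate the mechanics.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def backward_elimination(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    min_features: int = 1,
) -> List[Tuple[FrozenSet[str], float]]:
    """Greedy stepwise backward elimination.

    Starts from the full feature set and, at each step, drops the single
    feature whose removal leaves the highest-scoring subset. Returns the
    (subset, score) pair recorded at every step, so a caller can choose a
    trade-off between score and number of features.
    """
    current = frozenset(features)
    history = [(current, score(current))]
    while len(current) > min_features:
        # Score every subset that is one feature smaller; sorted() makes
        # tie-breaking deterministic (max keeps the first best candidate).
        candidates = [(current - {f}, score(current - {f})) for f in sorted(current)]
        current, best = max(candidates, key=lambda pair: pair[1])
        history.append((current, best))
    return history


# Hypothetical toy scorer: made-up per-item weights, with a 0.02 cost per kept
# item so that low-value items pull the score down.
weights = {"item_a": 0.5, "item_b": 0.3, "item_c": 0.05, "item_d": 0.0}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(weights[f] for f in subset) - 0.02 * len(subset)


history = backward_elimination(list(weights), toy_score)
for subset, s in history:
    print(len(subset), sorted(subset), round(s, 2))
```

Each entry in `history` is one point on a score-versus-feature-count curve; the study traced such curves separately for sensitivity and specificity (Figures 1 and 2) before picking a compact classifier.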
Year: 2015 PMID: 25710120 PMCID: PMC4445756 DOI: 10.1038/tp.2015.7
Source DB: PubMed Journal: Transl Psychiatry ISSN: 2158-3188 Impact factor: 6.222
Training and testing data description
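| Data set | Module 2 autism | Module 2 autism spectrum | Module 2 non-spectrum | Module 3 autism | Module 3 autism spectrum | Module 3 non-spectrum |
|---|---|---|---|---|---|---|
| AC | 111 | 16 | 10 | 164 | 33 | 60 |
| AGRE | 314 | 28 | 23 | 454 | 56 | 93 |
| NDAR | 315 | 47 | 282 | 109 | 21 | 27 |
| SSC | 575 | 27 | 0 | 1333 | 233 | 0 |
| SVIP | 14 | 4 | 33 | 21 | 10 | 127 |
| Total | 1329 | 122 | 348 | 2081 | 353 | 307 |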
Abbreviations: AC, Autism Consortium; ADOS, Autism Diagnostic Observation Schedule; AGRE, Autism Genetic Resource Exchange; NDAR, National Database of Autism Research; SSC, Simons Simplex Collection; SVIP, Simons Variance in Individuals Project.
Values are the total numbers of individuals given an ADOS-2 diagnosis of autism, autism spectrum or non-spectrum.
The AGRE data set was used to train the module 3 classifiers.
The NDAR data set was used to train the module 2 classifiers.
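The table gives per-data-set counts, while the footnotes name only the training set for each module. The sketch below derives the implied training and testing class sizes, assuming (as labeled in the code) that autism and autism spectrum are pooled into one ASD class and that every data set other than the training set is used for testing. The pooling assumption is consistent with the sample description, whose Autism N values (1451 and 2434) equal the autism plus autism spectrum totals. The dictionary layout and function name are illustrative.

```python
from typing import Dict, Tuple

# (autism, autism spectrum, non-spectrum) counts per data set, copied from the
# training and testing data table.
COUNTS: Dict[str, Dict[str, Tuple[int, int, int]]] = {
    "module 2": {
        "AC": (111, 16, 10),
        "AGRE": (314, 28, 23),
        "NDAR": (315, 47, 282),
        "SSC": (575, 27, 0),
        "SVIP": (14, 4, 33),
    },
    "module 3": {
        "AC": (164, 33, 60),
        "AGRE": (454, 56, 93),
        "NDAR": (109, 21, 27),
        "SSC": (1333, 233, 0),
        "SVIP": (21, 10, 127),
    },
}
TRAINING_SET = {"module 2": "NDAR", "module 3": "AGRE"}


def split_counts(module: str) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return ((train_asd, train_non), (test_asd, test_non)) for one module.

    Assumption: autism and autism spectrum are pooled into one ASD class, and
    every data set other than the training set is used for testing.
    """
    train = [0, 0]
    test = [0, 0]
    for name, (autism, spectrum, non_spectrum) in COUNTS[module].items():
        bucket = train if name == TRAINING_SET[module] else test
        bucket[0] += autism + spectrum
        bucket[1] += non_spectrum
    return (train[0], train[1]), (test[0], test[1])


for module in COUNTS:
    (tr_asd, tr_non), (te_asd, te_non) = split_counts(module)
    print(f"{module}: train {tr_asd} ASD / {tr_non} non-spectrum, "
          f"test {te_asd} ASD / {te_non} non-spectrum")
```

The four groups per module sum to 4540 individuals, matching the abstract, and the module 3 test sizes (1924 ASD, 214 non-spectrum) agree with the sensitivity and specificity reported in the Figure 3 caption.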
Sample description
| DX | Measure | Module 2 N | Module 2 IQ1 | Module 2 IQ2 | Module 2 IQ3 | Module 2 range | Module 3 N | Module 3 IQ1 | Module 3 IQ2 | Module 3 IQ3 | Module 3 range |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Autism | Age | 1451 | 52 | 68 | 98 | 12–490 | 2434 | 88 | 111 | 141 | 38–559 |
| Autism | Full IQ | 710 | 61 | 77 | 90 | 25–130 | 1782 | 83 | 96 | 108 | 26–167 |
| Autism | VIQ | 702 | 57 | 74 | 87 | 19–129 | 1784 | 82 | 95 | 108 | 19–167 |
| Autism | NVIQ | 705 | 68 | 83 | 95 | 26–139 | 1787 | 85 | 97 | 108 | 26–161 |
| Non-spectrum | Age | 348 | 34 | 37 | 48 | 13–183 | 307 | 80 | 108 | 130 | 35–207 |
| Non-spectrum | Full IQ | 42 | 76 | 88 | 105 | 54–132 | 181 | 87 | 100 | 110 | 63–160 |
| Non-spectrum | VIQ | 42 | 75 | 90 | 105 | 54–123 | 181 | 89 | 100 | 109 | 49–135 |
| Non-spectrum | NVIQ | 43 | 78 | 88 | 103 | 54–137 | 182 | 89 | 98 | 108 | 54–169 |
Abbreviations: DX, ADOS-2 diagnosis; IQ1, first quartile; IQ2, second quartile (median); IQ3, third quartile; VIQ, verbal IQ; NVIQ, nonverbal IQ.
All ages are in months.
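Each row of the sample description reports N, the three quartiles and the range of one measure. A minimal sketch of that summary for a single column, using Python's `statistics.quantiles`; the paper does not state which quartile interpolation method it used, so the library's default ("exclusive") method is an assumption here, and the ages are hypothetical, not study data.

```python
import statistics
from typing import Sequence, Tuple


def summarize(values: Sequence[float]) -> Tuple[int, float, float, float, Tuple[float, float]]:
    """Return (N, first quartile, median, third quartile, (min, max))."""
    # statistics.quantiles uses the "exclusive" interpolation method by default;
    # other methods can give slightly different first and third quartiles.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return len(values), q1, q2, q3, (min(values), max(values))


# Hypothetical ages in months (not data from the study).
ages = [24, 30, 36, 41, 48, 52, 60, 75, 90]
summary = summarize(ages)
print(summary)  # (9, 33.0, 48.0, 67.5, (24, 90))
```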
Machine learning algorithms used in training
| Algorithm | Description | Module 2 sensitivity | Module 2 specificity | Module 2 features | Module 3 sensitivity | Module 3 specificity | Module 3 features |
|---|---|---|---|---|---|---|---|
| ADTree | ADTree is based on boosting and combines multiple types of decision trees. | 0.967 | 0.982 | 10/28 | 0.988 | 0.871 | 9/28 |
| Functional tree | Functional trees use linear/logistic regression at decision nodes and linear models at leaf nodes. | 0.981 | 0.986 | 12/28 | 0.994 | 0.978 | 14/28 |
| LibSVM* | SVMs search for the hyperplane that separates the classes with the largest margin. | 0.997 | 0.979 | 14/28 | 1 | 0.989 | 12/28 |
| LMT | Logistic model trees use decision trees with logistic regression models at leaf nodes. | 0.989 | 0.986 | 9/28 | 0.998 | 0.967 | 15/28 |
| Logistic regression* | Logistic regression predicts a categorical outcome from a set of predictor features. | 0.989 | 0.986 | 9/28 | 0.996 | 0.978 | 19/28 |
| Naive Bayes | Naive Bayes is a probabilistic classifier based on Bayes' theorem. | 0.981 | 0.975 | 14/28 | 0.961 | 0.957 | 14/28 |
| NBTree | Naive Bayes trees are decision trees that use naive Bayes classifiers at leaf nodes. | 0.970 | 0.979 | 8/28 | 0.980 | 0.925 | 14/28 |
| Random forest | Random forest trains multiple decision trees and returns the class predicted by the most trees. | 0.981 | 0.965 | 20/28 | 0.990 | 0.981 | 11/28 |
Abbreviations: ADTree, alternating decision tree; LMT, logistic model trees; NBTree, Naive Bayes Tree; SVM, support vector machine.
*Logistic regression and LibSVM were the top-performing algorithms for module 2 and module 3, respectively, with respect to sensitivity, specificity and number of features.
Description of the eight machine learning algorithms used in training to determine the best algorithm and the optimal number of features. For each algorithm, the sensitivity, specificity and number of features used (out of the 28 available) in its best-performing iteration are listed for modules 2 and 3.
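For reference, the standard textbook forms of the two starred models are written out below for a vector x of selected item scores (9 items for module 2, 12 for module 3) and class labels y_i in {-1, +1}. These are generic forms, not the fitted models: the weights w, intercept b, regularization constant C and the kernel K used with LibSVM are not reported in this section, so they stay symbolic. The SVM decision value f(x) is presumably the quantity plotted in Figure 3.

```latex
\begin{aligned}
\text{Logistic regression:}\quad
  & P(\text{ASD} \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b)
    = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}} \\[4pt]
\text{SVM training (soft margin):}\quad
  & \min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \ \tfrac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_i \xi_i
    \quad \text{s.t.}\quad y_i\bigl(\mathbf{w}^\top \phi(\mathbf{x}_i) + b\bigr) \ge 1 - \xi_i,\ \ \xi_i \ge 0 \\[4pt]
\text{SVM decision value:}\quad
  & f(\mathbf{x}) = \sum_i \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b,
    \qquad \hat{y} = \operatorname{sign} f(\mathbf{x})
\end{aligned}
```

The margin being maximized is 2 / ||w||, which is why the training objective minimizes ||w||^2.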
Figure 1. Module 2 logistic regression and logistic model tree (LMT) training results. Sensitivity and specificity of the module 2 logistic regression and LMT classifiers as a function of the number of features used during training on the National Database of Autism Research (NDAR; Table 1). The nine-feature logistic regression classifier (blue dot) was used in testing.
Figure 2. Module 3 SVM training results. Sensitivity and specificity of the module 3 SVM classifier as a function of the number of features used during training on the Autism Genetic Resource Exchange (AGRE; Table 1). The 12-feature SVM classifier was used in testing. SVM, support vector machine.
Figure 3. Module 3 SVM test results. Decision values of the 12-feature SVM on the testing data for the two classes: autism (red) and non-spectrum (blue). Forty-four misclassified individuals with autism (red triangles) and six misclassified individuals without autism (blue circles) account for the 97.71% sensitivity and 97.20% specificity. ADOS, Autism Diagnostic Observation Schedule; SVM, support vector machine.
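The Figure 3 rates follow directly from the misclassification counts once the test-set class sizes are known. The sketch below assumes the module 3 test set holds every data set except AGRE, with autism and autism spectrum pooled (1924 ASD and 214 non-spectrum individuals, derived from the training and testing data table); the function name is illustrative.

```python
from typing import Tuple


def rates(asd_total: int, asd_missed: int,
          non_total: int, non_missed: int) -> Tuple[float, float, float]:
    """Sensitivity, specificity and accuracy (as percentages, 2 decimals)
    from class totals and the number of misclassified individuals per class."""
    tp = asd_total - asd_missed   # individuals with ASD classified as ASD
    tn = non_total - non_missed   # non-spectrum individuals classified correctly
    sensitivity = 100 * tp / asd_total
    specificity = 100 * tn / non_total
    accuracy = 100 * (tp + tn) / (asd_total + non_total)
    return round(sensitivity, 2), round(specificity, 2), round(accuracy, 2)


# Assumed module 3 test set: 1924 ASD (44 missed), 214 non-spectrum (6 missed).
print(rates(1924, 44, 214, 6))
```

With these assumed totals the function reproduces the caption's 97.71% sensitivity and 97.20% specificity, and the 97.66% module 3 accuracy quoted in the abstract.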
Module 2 activities
| ADOS-2 activity | Needed by logistic regression classifier |
|---|---|
| Construction task | Yes |
| Response to name | No |
| Make-believe play | No |
| Joint interactive play | Yes |
| Conversation | No |
| Response to joint attention | No |
| Demonstration task | Yes |
| Description of a picture | Yes |
| Telling a story from a book | Yes |
| Free play | Yes |
| Birthday party | Yes |
| Snack | Yes |
| Anticipation of a routine with objects | Yes |
| Bubble play | Yes |
Abbreviation: ADOS, Autism Diagnostic Observation Schedule.
List of the 14 observational activities administered in module 2 of the ADOS-2. Of the 14, only 10 are needed to measure the behaviors used by the logistic regression classifier (see Supplemental Discussion).
Module 3 activities
| ADOS-2 activity | Needed by support vector machine classifier |
|---|---|
| Construction task | No |
| Make-believe play | No |
| Joint interactive play | Yes |
| Demonstration task | No |
| Description of a picture | Yes |
| Telling a story from a book | Yes |
| Cartoons | No |
| Conversation and reporting | Yes |
| Emotions | No |
| Social difficulties and annoyance | Yes |
| Break | Yes |
| Friends and marriage | Yes |
| Loneliness | Yes |
| Creating a story | No |
Abbreviation: ADOS, Autism Diagnostic Observation Schedule.
List of the 14 observational activities administered in module 3 of the ADOS-2. Of the 14, only 8 are needed to measure the behaviors used by the support vector machine classifier (see Supplemental Discussion).
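The two activity tables can be encoded directly to check the stated counts (10 of 14 for module 2, 8 of 14 for module 3) and to list the activities that the reduced classifiers do not draw on. This is an illustrative sketch: the dictionary names and the helper `split_activities` are hypothetical, and "not needed" means only "not needed to score the classifier's items", not a recommendation about how the ADOS-2 should be administered.

```python
from typing import Dict, List, Tuple

# True means the activity is needed to score the items used by that module's
# reduced classifier (logistic regression for module 2, SVM for module 3).
MODULE_2: Dict[str, bool] = {
    "Construction task": True,
    "Response to name": False,
    "Make-believe play": False,
    "Joint interactive play": True,
    "Conversation": False,
    "Response to joint attention": False,
    "Demonstration task": True,
    "Description of a picture": True,
    "Telling a story from a book": True,
    "Free play": True,
    "Birthday party": True,
    "Snack": True,
    "Anticipation of a routine with objects": True,
    "Bubble play": True,
}
MODULE_3: Dict[str, bool] = {
    "Construction task": False,
    "Make-believe play": False,
    "Joint interactive play": True,
    "Demonstration task": False,
    "Description of a picture": True,
    "Telling a story from a book": True,
    "Cartoons": False,
    "Conversation and reporting": True,
    "Emotions": False,
    "Social difficulties and annoyance": True,
    "Break": True,
    "Friends and marriage": True,
    "Loneliness": True,
    "Creating a story": False,
}


def split_activities(table: Dict[str, bool]) -> Tuple[List[str], List[str]]:
    """Return (needed, not_needed) activity names, preserving table order."""
    needed = [name for name, used in table.items() if used]
    not_needed = [name for name, used in table.items() if not used]
    return needed, not_needed


for label, table in (("module 2", MODULE_2), ("module 3", MODULE_3)):
    needed, not_needed = split_activities(table)
    print(f"{label}: {len(needed)} of {len(table)} needed; not needed: {not_needed}")
```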