Matthew J Maenner, Marshalyn Yeargin-Allsopp, Kim Van Naarden Braun, Deborah L Christensen, Laura A Schieve.
Abstract
The Autism and Developmental Disabilities Monitoring (ADDM) Network conducts population-based surveillance of autism spectrum disorder (ASD) among 8-year-old children at multiple US sites. To classify ASD, trained clinicians review developmental evaluations collected from multiple health and education sources to determine whether each child meets the ASD surveillance case criteria. The number of evaluations collected has increased dramatically since 2000, challenging the resources and timeliness of the surveillance system. We developed and evaluated a machine learning approach to classify ADDM case status using the words and phrases contained in children's developmental evaluations. We trained a random forest classifier on data from the 2008 Georgia ADDM site, which included 1,162 children with 5,396 evaluations (601 children met ADDM ASD criteria under standard ADDM methods). The classifier used the words and phrases from the evaluations to predict ASD case status. We evaluated its performance on the 2010 Georgia ADDM surveillance data (1,450 children with 9,811 evaluations; 754 children met ADDM ASD criteria). We also estimated ASD prevalence from the algorithm's predictions. Overall, the machine learning approach predicted ASD case statuses that were 86.5% concordant with the clinician-determined case statuses (84.0% sensitivity, 89.4% positive predictive value). The area under the resulting receiver operating characteristic (ROC) curve was 0.932. Algorithm-derived ASD "prevalence" was 1.46%, compared with the published (clinician-determined) estimate of 1.55%. Using only the text contained in developmental evaluations, a machine learning algorithm was able to discriminate between children who do and do not meet ASD surveillance criteria at one surveillance site.
Year: 2016 PMID: 28002438 PMCID: PMC5176307 DOI: 10.1371/journal.pone.0168224
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
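The abstract says the classifier worked from "the words and phrases" in each child's evaluations, and each child can have several evaluations. A minimal sketch of one way to turn that text into classifier input follows. The tokenizer, the choice of one- and two-word phrases, pooling by summing counts across a child's evaluations, and the `min_children` filter are all illustrative assumptions, not the study's documented preprocessing. The resulting child-by-phrase matrix could then be passed to a random forest implementation (for example scikit-learn's `RandomForestClassifier`), which is not shown here.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase alphabetic runs only, so digits and
# punctuation are dropped. Not the study's actual preprocessing.
TOKEN_RE = re.compile(r"[a-z]+")


def phrases(text: str, max_n: int = 2) -> Counter:
    """Count word n-grams (n = 1..max_n) in one evaluation's text."""
    tokens = TOKEN_RE.findall(text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def child_features(evaluations: list[str], max_n: int = 2) -> Counter:
    """Pool phrase counts across all of one child's evaluations."""
    total = Counter()
    for text in evaluations:
        total.update(phrases(text, max_n))
    return total


def to_matrix(children: dict[str, list[str]], min_children: int = 1):
    """Build a child-by-phrase count matrix (hypothetical helper).

    Keeps only phrases that occur for at least `min_children` children.
    Returns (child ids, phrase vocabulary, rows of counts).
    """
    feats = {cid: child_features(evals) for cid, evals in children.items()}
    doc_freq = Counter()
    for counts in feats.values():
        doc_freq.update(counts.keys())  # one vote per child per phrase
    vocab = sorted(p for p, df in doc_freq.items() if df >= min_children)
    ids = sorted(feats)
    rows = [[feats[cid][p] for p in vocab] for cid in ids]
    return ids, vocab, rows


# Toy example with made-up evaluation text.
children = {
    "A": ["Poor eye contact noted.", "Eye contact limited"],
    "B": ["Speech delay."],
}
ids, vocab, rows = to_matrix(children)
print(rows[0][vocab.index("eye contact")])  # 2
```

Pooling at the child level mirrors the fact that case status is assigned per child rather than per evaluation; other choices (binary presence, TF-IDF weighting) are equally plausible.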
Fig 1. Histograms of prediction scores (x-axis) compared with the clinician-assigned surveillance case definition (blue: autism spectrum disorder (ASD); red: non-ASD).
The horizontal bar represents the classification score threshold. Upper panel: classifications for the 2008 (training) data. Lower panel: 2010 (test) data; discordant classifications are highlighted in a lighter shade of blue or red.
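The figure shows each child's prediction score relative to a classification threshold. The sketch below illustrates the two ideas involved, assuming scores where higher means more ASD-like and a hypothetical threshold of 0.5 (the study's threshold value is not given here): the threshold converts scores into case labels, and the area under the ROC curve summarizes discrimination across all possible thresholds. The AUC equals the probability that a randomly chosen case scores higher than a randomly chosen non-case (the Mann-Whitney statistic), so an AUC of 0.932 means cases outscore non-cases in about 93% of such pairs. The scores and labels below are toy data.

```python
def classify(scores, threshold):
    """Label a child ASD (1) when the score is at or above the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def auc(scores, labels):
    """ROC AUC as P(score of a case > score of a non-case); ties count 1/2.

    Pairwise O(P * N) version, written for clarity rather than speed.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = clinician-assigned ASD, 0 = non-ASD.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(classify(scores, 0.5))         # [1, 1, 1, 0, 0, 0]
print(round(auc(scores, labels), 3))  # 0.889
```

Moving the threshold trades sensitivity against positive predictive value, while the AUC does not depend on where the threshold is placed.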
Comparison between clinician-assigned surveillance autism spectrum disorder (ASD) case status and predictions from the random forest algorithm.
| Measure | 2008 (training)* | 2010 (test) |
|---|---|---|
| Number of children abstracted | 1162 | 1450 |
| Number of ASD cases | 601 | 754 |
| Simple agreement (%) | 86.3 | 86.5 |
| Sensitivity (%) | 84.5 | 84.0 |
| Specificity (%) | 88.2 | 89.2 |
| Positive predictive value (%) | 88.5 | 89.4 |
| Negative predictive value (%) | 84.2 | 83.7 |
| Kappa | 0.73 | 0.73 |
| Area under receiver operating characteristic curve | 0.932 | 0.932 |
Footnote: Autism spectrum disorder is abbreviated as ASD
*Note: The 2008 estimates are calculated from the “out-of-bag” sample, which limits the optimistic bias that overfitting would introduce into estimates computed on the training data.
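Every 2010 figure in the table above except the AUC follows from the four 2010 concordance counts reported in the characteristics table (633 true positives, 75 false positives, 121 false negatives, 621 true negatives). The sketch below recomputes them; the `Confusion` class is a hypothetical helper, and kappa is Cohen's kappa with chance agreement taken from the table margins. The 2008 column cannot be checked this way because its cell counts are not given.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # clinician ASD, algorithm ASD
    fp: int  # clinician non-ASD, algorithm ASD
    fn: int  # clinician ASD, algorithm non-ASD
    tn: int  # clinician non-ASD, algorithm non-ASD

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def metrics(self) -> dict:
        n = self.n
        agreement = (self.tp + self.tn) / n
        # Chance agreement for Cohen's kappa, from the marginal rates.
        p_alg_pos = (self.tp + self.fp) / n
        p_clin_pos = (self.tp + self.fn) / n
        expected = p_alg_pos * p_clin_pos + (1 - p_alg_pos) * (1 - p_clin_pos)
        return {
            "agreement_pct": 100 * agreement,
            "sensitivity_pct": 100 * self.tp / (self.tp + self.fn),
            "specificity_pct": 100 * self.tn / (self.tn + self.fp),
            "ppv_pct": 100 * self.tp / (self.tp + self.fp),
            "npv_pct": 100 * self.tn / (self.tn + self.fn),
            "kappa": (agreement - expected) / (1 - expected),
        }


# 2010 counts from the concordance table (1,450 children).
m = Confusion(tp=633, fp=75, fn=121, tn=621).metrics()
for name, value in m.items():
    print(f"{name}: {value:.2f}" if name == "kappa" else f"{name}: {value:.1f}")
# Matches the 2010 column: 86.5, 84.0, 89.2, 89.4, 83.7, 0.73
```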
Autism spectrum disorder (ASD) prevalence per 1,000 children (with 95% confidence interval) for 2010 Georgia Autism and Developmental Disabilities Monitoring Network site: comparison between published and algorithm-derived estimates.
| Group | Published | Algorithm-based | Algorithm:published prevalence ratio |
|---|---|---|---|
| Overall | 15.5 (14.5–16.7) | 14.6 (13.6–15.7) | 0.94 |
| Boys | 25.4 (23.5–27.5) | 24.1 (22.3–26.1) | 0.95 |
| Girls | 5.5 (4.6–6.5) | 4.9 (4.1–5.9) | 0.89 |
| Non-Hispanic White | 18.2 (16.2–20.4) | 17.4 (15.5–19.5) | 0.95 |
| Non-Hispanic Black | 14.0 (12.5–15.7) | 13.0 (11.5–14.6) | 0.93 |
| Hispanic | 10.7 (8.7–13.1) | 10.1 (8.2–12.5) | 0.94 |
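Assuming the published and algorithm-based estimates share the same population denominator (the table does not state the denominator), the prevalence ratio reduces to the ratio of case counts: algorithm cases are true positives plus false positives, clinician cases are true positives plus false negatives. The sketch below applies this to the 2010 concordance counts; `prevalence_ratio` is a hypothetical helper.

```python
def prevalence_ratio(tp: int, fp: int, fn: int) -> float:
    """Algorithm-to-clinician prevalence ratio under a shared denominator.

    Algorithm cases = TP + FP; clinician cases = TP + FN. The population
    count cancels, so it is not needed (assumes both estimates use it).
    """
    return (tp + fp) / (tp + fn)


ratio = prevalence_ratio(tp=633, fp=75, fn=121)  # 2010 counts: 708 / 754
published_overall = 15.5  # published overall prevalence per 1,000
print(round(ratio, 2), round(published_overall * ratio, 1))  # 0.94 14.6
```

Because 15.5 is itself rounded, scaling it reproduces the tabled 14.6 only approximately. The same calculation applies within a subgroup given that subgroup's counts.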
Characteristics of children, by algorithm-clinician concordance on autism spectrum disorder (ASD) case status.
| Characteristic | "True Positives" | "False Positives" | "False Negatives" | "True Negatives" |
|---|---|---|---|---|
| Clinician/Surveillance classification: | ASD | Non-ASD | ASD | Non-ASD |
| Algorithm prediction | ASD | ASD | Non-ASD | Non-ASD |
| Number of children (out of 1450) | 633 | 75 | 121 | 621 |
| Non-Hispanic White (%) | 38.9 (35.1–42.7) | 40.0 (30.0–51.3) | 35.5 (27.6–44.4) | 40.1 (36.3–44.0) |
| Male (%) | 83.7 (80.1–86.4) | 81.3 (71.1–88.5) | 76.0 (67.7–82.8) | 71.8 (68.2–75.2) |
| Known IQ ≤ 70 (%) | 35.4 (31.8–39.2) | 28.0 (19.1–39.0) | 14.9 (9.6–22.3) | 22.7 (19.6–26.2) |
| Previous diagnosis of autistic disorder (%) | 52.1 (48.2–56.0) | 14.7 (8.4–24.4) | 12.4 (7.7–19.4) | 1.4 (0.7–2.7) |
| Previous ASD diagnosis other than autistic disorder (%) | 44.2 (40.4–48.1) | 12.0 (6.4–21.3) | 26.4 (19.4–34.9) | 2.7 (1.7–4.3) |
| Previous ASD special education classification (%) | 65.4 (61.6–69.0) | 12.0 (6.4–21.3) | 15.7 (10.3–23.2) | 0.8 (0.3–1.9) |
| Any autistic disorder/ASD diagnosis or ASD special education classification (%) | 92.3 (89.9–94.1) | 36.0 (26.1–47.3) | 44.6 (36.1–53.5) | 4.7 (3.3–6.6) |
| Number of evaluations (median and IQR) | 7 (4–10) | 7 (3–11) | 3 (2–5) | 4 (2–6) |
| Age in months at first evaluation (median and IQR) | 40 (28–56) | 41 (26–59) | 53 (34–73) | 53 (35–71) |
| Evaluations from school sources only (%) | 22.4 (19.4–25.8) | 30.7 (21.4–41.8) | 36.4 (28.3–45.2) | 43.0 (39.2–46.9) |
| Evaluations from health sources only (%) | 11.7 (9.4–14.4) | 13.3 (7.4–22.8) | 24.8 (18.0–33.2) | 21.6 (18.5–25.0) |
| Evaluations from both school and health sources (%) | 65.9 (62.1–69.5) | 56.0 (44.7–66.7) | 38.8 (30.6–47.7) | 35.3 (31.6–39.1) |
| ADDM reviewers requested a “secondary” review (%) | 11.4 (9.1–14.1) | 64.0 (52.7–73.9) | 44.6 (36.1–53.5) | 31.7 (28.2–35.5) |
*Note: Categories are not mutually exclusive; children often receive multiple diagnoses.
**Note: Surveillance system clinician reviewers requested a second review from another clinician if they felt uncertain about a child’s ASD classification.
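The table does not name the method behind its 95% confidence intervals, but several rows are reproduced to one decimal by the Wilson score interval. For example, 14.9% of 121 false negatives with known IQ ≤ 70 corresponds to 18 children and gives 9.6–22.3, and 81.3% of 75 false positives who are male corresponds to 61 children and gives 71.1–88.5. Treat Wilson as an inference rather than the authors' stated method; not every row is guaranteed to match. `wilson_interval` is a hypothetical helper.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% when z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# False negatives with known IQ <= 70: 18 of 121 children.
lo, hi = wilson_interval(18, 121)
print(f"{100 * 18 / 121:.1f} ({100 * lo:.1f}-{100 * hi:.1f})")  # 14.9 (9.6-22.3)
```

Unlike the simple normal-approximation interval, the Wilson interval stays within 0 to 100% and behaves better for small cells such as the 75 false positives.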