Todd Lingren1, Pei Chen2, Joseph Bochenek3, Finale Doshi-Velez4, Patty Manning-Courtney5,6, Julie Bickel7,8, Leah Wildenger Welchons7,8,9, Judy Reinhold5,6, Nicole Bing5,6, Yizhao Ni1, William Barbaresi10, Frank Mentch11, Melissa Basford12, Joshua Denny3, Lyam Vazquez11, Cassandra Perry13, Bahram Namjou14,15, Haijun Qiu11, John Connolly11, Debra Abrams11, Ingrid A Holm10,13,16,17, Beth A Cobb14, Nataline Lingren18, Imre Solti1,15, Hakon Hakonarson11,19, Isaac S Kohane16,20, John Harley5,14,21, Guergana Savova16,20.
Abstract
OBJECTIVE: Cohort selection is challenging for large-scale electronic health record (EHR) analyses, as International Classification of Diseases 9th edition (ICD-9) diagnostic codes are notoriously unreliable disease predictors. Our objective was to develop, evaluate, and validate an automated algorithm for determining an Autism Spectrum Disorder (ASD) patient cohort from EHR data. We demonstrate its utility via the largest investigation to date of the co-occurrence patterns of medical comorbidities in ASD.
Year: 2016 PMID: 27472449 PMCID: PMC4966969 DOI: 10.1371/journal.pone.0159621
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. ASD Algorithm Project Overview.
ASD–Autism Spectrum Disorder; ICD-9–International Classification of Diseases 9th edition; DSM IV–Diagnostic and Statistical Manual of Mental Disorders 4th edition; ML–machine learning.
Fig 2. ASD Rule-based Algorithm.
ASD–Autism Spectrum Disorder; EHR–Electronic Health Records; ICD-9–International Classification of Diseases 9th edition; DSM IV–Diagnostic and Statistical Manual of Mental Disorders 4th edition; PDD-NOS–pervasive developmental disorder not otherwise specified. Sections 3a., 3b., and 3c. refer to the DSM IV ASD classifications for Autism, Asperger’s, and PDD-NOS, respectively.
Distribution of patients
| Site | Training | Development | Test | Total |
|---|---|---|---|---|
| BCH | 95 | 27 | 28 | 150 |
| CCHMC | 87 | 34 | 31 | 152 |
| Combined | 182 | 61 | 59 | 302 |
BCH–Boston Children’s Hospital; CCHMC–Cincinnati Children’s Hospital and Medical Center.
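As a quick arithmetic check on the table, the sketch below sums the per-site counts to reproduce the Combined row and expresses each split as a fraction of the 302 patients, which works out to roughly 60% training, 20% development, and 20% test. The dictionary layout and helper names are illustrative only, not part of the original study.

```python
# Per-site patient counts, copied from the distribution table.
counts = {
    "BCH": {"Training": 95, "Development": 27, "Test": 28},
    "CCHMC": {"Training": 87, "Development": 34, "Test": 31},
}

def combined(site_counts):
    """Sum each split across all sites (reproduces the Combined row)."""
    totals = {}
    for per_site in site_counts.values():
        for split, n in per_site.items():
            totals[split] = totals.get(split, 0) + n
    return totals

def split_fractions(split_counts):
    """Fraction of all patients that falls into each split."""
    total = sum(split_counts.values())
    return {split: n / total for split, n in split_counts.items()}

combined_counts = combined(counts)
print(combined_counts)  # {'Training': 182, 'Development': 61, 'Test': 59}
for split, frac in split_fractions(combined_counts).items():
    print(f"{split}: {frac:.1%}")
```

The printed fractions are 60.3% (training), 20.2% (development), and 19.5% (test).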
Fig 3. SVM-based Machine Learning Prediction System.
SVM–Support Vector Machines.
Best Machine Learning Results on Test Set.
| Model | Test Set | Precision/PPV | Recall/Sensitivity | F1-Measure | Area under ROC Curve |
|---|---|---|---|---|---|
| 1st stage: Combined | BCH | 0.726 | 0.852 | 0.784 | 0.533 |
| 1st stage: CCHMC | CCHMC | 0.66 | 0.813 | 0.728 | 0.545 |
| 1st stage: Combined | Combined | 0.864 | 0.836 | 0.785 | 0.583 |
| 2nd stage: Combined | BCH | 0.780 | 0.783 | 0.780 | 0.762 |
| 2nd stage: Combined | CCHMC | 0.799 | 0.769 | 0.780 | 0.733 |
| 2nd stage: Combined | Combined | 0.786 | 0.769 | 0.761 | 0.770 |
BCH–Boston Children’s Hospital; CCHMC–Cincinnati Children’s Hospital and Medical Center; PPV–positive predictive value; ROC–Receiver Operating Characteristic.
Rule-based Results
| Evaluation Set | Precision/PPV | Recall/Sensitivity | F1-Measure | Area under ROC Curve |
|---|---|---|---|---|
| BCH | 0.885 | 0.891 (14) | 0.888 | 0.642 |
| CCHMC | 0.840 | 0.622 (48) | 0.715 | 0.599 |
| Combined | 0.866 | 0.758 (62) | 0.808 | 0.579 |
| CHOP (independent validation on 50 patients) | 0.849 | 0.737 (10) | 0.788 | 0.659 |
BCH–Boston Children’s Hospital; CCHMC–Cincinnati Children’s Hospital and Medical Center; CHOP–Children’s Hospital of Philadelphia; PPV–positive predictive value; ROC–Receiver Operating Characteristic.
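The F1-Measure column is the harmonic mean of precision and recall, F1 = 2PR/(P + R). The short check below (helper names are ours) recomputes F1 from the published precision and recall values of the rule-based table. Every row agrees with the reported F1 to within 0.0015; the small CHOP gap (0.789 computed vs. 0.788 reported) most likely comes from the inputs being rounded to three decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) from the rule-based results table.
rule_based = {
    "BCH": (0.885, 0.891, 0.888),
    "CCHMC": (0.840, 0.622, 0.715),
    "Combined": (0.866, 0.758, 0.808),
    "CHOP": (0.849, 0.737, 0.788),
}

for site, (p, r, reported) in rule_based.items():
    computed = f1_score(p, r)
    print(f"{site}: computed F1 = {computed:.3f}, reported = {reported:.3f}")
```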
Fig 4. SVM Grid Search on the Development Set over Features and Cost Parameters.
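Fig 4 reports a grid search over features and SVM cost parameters on the development set. The caption does not give the search space, the SVM implementation, or the selection metric, so the following is only a minimal pure-Python sketch of the general idea under our own assumptions: a hypothetical linear SVM trained by stochastic subgradient descent on the hinge loss (a stand-in for a production SVM library), hypothetical feature sets named `signal`, `noise`, and `both`, cost values 0.01, 0.1, and 1.0, synthetic toy data, and development-set F1 as the selection criterion.

```python
import itertools
import random

def train_linear_svm(X, y, C, epochs=50, seed=0):
    """Hypothetical linear SVM: minimize 0.5*||w||^2 + C*sum(hinge loss)
    by stochastic subgradient descent. Labels must be -1 or +1."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, b, t = [0.0] * d, 0.0, 0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (t + 10)  # decaying step size
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Regularizer gradient, spread evenly over the n examples.
            w = [wj - eta * wj / n for wj in w]
            if y[i] * score < 1:  # hinge loss is active for this example
                w = [wj + eta * C * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * C * y[i]
    return w, b

def predict(w, b, X):
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1 for x in X]

def f1(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == -1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == -1 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def columns(X, cols):
    """Keep only the feature columns listed in cols."""
    return [[row[c] for c in cols] for row in X]

def grid_search(train, dev, feature_sets, costs):
    """Try every (feature set, cost) pair; keep the best dev-set F1."""
    (X_tr, y_tr), (X_dev, y_dev) = train, dev
    best = (-1.0, None, None)
    for (name, cols), C in itertools.product(feature_sets.items(), costs):
        w, b = train_linear_svm(columns(X_tr, cols), y_tr, C)
        score = f1(y_dev, predict(w, b, columns(X_dev, cols)))
        if score > best[0]:
            best = (score, name, C)
    return best

def make_toy_data(n, seed):
    """Synthetic toy data: feature 0 is informative, feature 1 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])
        X.append([label + rng.gauss(0, 0.5), rng.gauss(0, 1.0)])
        y.append(label)
    return X, y

feature_sets = {"signal": [0], "noise": [1], "both": [0, 1]}
costs = [0.01, 0.1, 1.0]
best_score, best_name, best_C = grid_search(
    make_toy_data(100, seed=1), make_toy_data(60, seed=2), feature_sets, costs)
print(f"best dev F1 = {best_score:.3f} with features={best_name!r}, C={best_C}")
```

Each candidate is trained only on the training split and scored only on the development split, so the test split stays untouched for the final reported numbers, mirroring the training/development/test partition in the patient-distribution table.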
Baseline Results (ICD-9 codes) on Test Set.
| Test Set | Precision/PPV |
|---|---|
| BCH | 0.273 |
| CCHMC | 0.645 |
| Combined | 0.460 |
BCH–Boston Children’s Hospital; CCHMC–Cincinnati Children’s Hospital and Medical Center; PPV–positive predictive value.
Fig 5. Comparison of the relative prevalence of primary comorbidity categories across clusters of patients identified by the NLP rule-based algorithm at BCH, CCHMC, and VUMC.
BCH–Boston Children’s Hospital; CCHMC–Cincinnati Children’s Hospital and Medical Center; VUMC–Vanderbilt University Medical Center.
Fig 6. Dimensionality reduction using the t-SNE algorithm on PheWAS codes.
Colors indicate clusters identified by the k-means algorithm. Each cluster is labeled with the comorbidity category that has the highest relative prevalence in that cluster; duplicate labels appear when more than one cluster is dominated by the same category. (t-SNE–t-distributed Stochastic Neighbor Embedding; PheWAS–Phenome-Wide Association Study; BCH–Boston Children’s Hospital; CCHMC–Cincinnati Children’s Hospital and Medical Center; VUMC–Vanderbilt University Medical Center; Deve.–Developmental Disorders; Seiz.–Seizure Disorders; Psych.–Psychological Disorders.)
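The caption describes a labeling step applied after t-SNE and k-means: compute each comorbidity category's relative prevalence within a cluster, then label the cluster with the category that scores highest. Below is a minimal pure-Python sketch of that step only. It assumes relative prevalence means within-cluster prevalence divided by overall cohort prevalence (our assumption; the exact definition is not stated here), takes cluster assignments as given rather than running t-SNE or k-means, and uses a hypothetical six-patient toy input, not study data.

```python
from collections import Counter

def relative_prevalence(assignments, categories):
    """Relative prevalence of each comorbidity category in each cluster.

    assignments: cluster id for each patient (e.g. from k-means).
    categories: set of comorbidity categories for each patient.
    Returns {cluster: {category: within-cluster prevalence / overall prevalence}}.
    """
    n = len(assignments)
    overall = Counter(cat for cats in categories for cat in cats)
    sizes = Counter(assignments)
    per_cluster = {c: Counter() for c in sizes}
    for c, cats in zip(assignments, categories):
        per_cluster[c].update(cats)
    return {
        c: {cat: (per_cluster[c][cat] / sizes[c]) / (overall[cat] / n) for cat in overall}
        for c in sizes
    }

def label_clusters(rel_prev):
    """Label each cluster with its highest-relative-prevalence category."""
    return {c: max(scores, key=scores.get) for c, scores in rel_prev.items()}

# Hypothetical toy input: six patients already assigned to three clusters.
assignments = [0, 0, 1, 1, 2, 2]
categories = [
    {"Seiz."}, {"Seiz.", "Deve."},
    {"Psych."}, {"Psych.", "Deve."},
    {"Seiz."}, {"Seiz.", "Deve."},
]
labels = label_clusters(relative_prevalence(assignments, categories))
print(labels)  # {0: 'Seiz.', 1: 'Psych.', 2: 'Seiz.'}
```

Clusters 0 and 2 both receive the `Seiz.` label, which is the duplicate-label situation the caption mentions. Because the ratio divides by overall prevalence, a rare category can dominate a small cluster even when only a few of its patients carry it.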