Vijay S Nori1, Christopher A Hane1, William H Crown1, Rhoda Au2, William J Burke3, Darshak M Sanghavi1, Paul Bleicher1.
Abstract
INTRODUCTION: The study objective was to build a machine learning model to predict incident mild cognitive impairment, Alzheimer's disease, and related dementias from structured data using administrative and electronic health record sources.
Keywords: Alzheimer's disease; Gradient boosting machine; Machine learning; Onset of dementia; Prediction
Year: 2019 PMID: 31879701 PMCID: PMC6920083 DOI: 10.1016/j.trci.2019.10.006
Source DB: PubMed Journal: Alzheimers Dement (N Y) ISSN: 2352-8737
Fig. 1. Attrition of the two-year cohort into the training, validation, and test data.
Data source sample sizes and summary statistics
| Cohort | Subset | N | Age mean (SD) | Encounters mean (SD) | Case prevalence, % | Female, % | Cardiovascular disease prevalence, % | Mood disorder prevalence, % |
|---|---|---|---|---|---|---|---|---|
| Claims | ClaimsOnly | 5,640,637 | 60.0 (10.7) | 10.7 (21.6) | 2.1 | 52.8 | 46.2 | 14.6 |
| SEHR | ClaimsOnly | 4,810,730 | 59.8 (10.6) | 10.6 (21.0) | 2.1 | 52.3 | 45.1 | 13.9 |
| SEHR | Mixed | 609,578 | 61.7 (11.3) | 11.3 (29.4) | 3.5 | 56.2 | 55.1 | 19.3 |
| Open-World | ClaimsOnly | 8,348,496 | 60.4 (10.7) | 10.7 (24.1) | 2.7 | 54.7 | 43.9 | 14.4 |
| Open-World | EHROnly | 7,276,426 | 62.6 (11.4) | 11.4 (17.4) | 3.7 | 59.0 | 34.2 | 14.1 |
| Open-World | Mixed | 1,602,898 | 60.6 (10.7) | 10.7 (27.4) | 4.1 | 57.6 | 47.7 | 19.1 |
Label Learning Model results by age group
| Age group | Sensitivity | AUC | Lift | True positives | False positives | True negatives | False negatives | Case, % | Case count | Total count |
|---|---|---|---|---|---|---|---|---|---|---|
| 45,55 | 0.29 | 0.89 | 94.0 | 403 | 977 | 444,939 | 977 | 0.31 | 1380 | 447,296 |
| 55,60 | 0.34 | 0.90 | 63.8 | 325 | 622 | 175,999 | 622 | 0.53 | 947 | 177,568 |
| 60,64 | 0.39 | 0.90 | 48.8 | 374 | 583 | 117,945 | 583 | 0.80 | 957 | 119,485 |
| 64,70 | 0.38 | 0.88 | 24.0 | 844 | 1396 | 139,148 | 1396 | 1.57 | 2240 | 142,784 |
| 70,75 | 0.43 | 0.85 | 10.7 | 1499 | 1998 | 81,507 | 1998 | 4.02 | 3497 | 87,002 |
| 75,80 | 0.49 | 0.83 | 5.2 | 2722 | 2818 | 49,961 | 2818 | 9.50 | 5540 | 58,319 |
| 80,99 | 0.53 | 0.81 | 2.9 | 5205 | 4634 | 39,639 | 4634 | 18.18 | 9839 | 54,112 |
| Summary | 0.47 | 0.87 | 20.9 | 11,372 | 13,028 | 1,049,138 | 13,028 | 2.25 | 24,400 | 1,086,566 |
Abbreviation: AUC, area-under-the-curve.
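The per-row metrics in the table above can be recomputed from the four confusion counts. The sketch below is illustrative only: the function name `confusion_metrics` is hypothetical, and the definition of lift as positive predictive value divided by case prevalence is an assumption, since the table does not define it. Under that assumption the recomputed values land close to the reported ones, with small differences that may reflect rounding.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive rates from confusion counts (hypothetical helper, not from the paper)."""
    total = tp + fp + tn + fn
    cases = tp + fn                      # all true cases, flagged or not
    sensitivity = tp / cases             # share of cases the model flagged
    ppv = tp / (tp + fp)                 # share of flagged people who are cases
    prevalence = cases / total           # the "Case, %" column as a fraction
    return {
        "sensitivity": sensitivity,
        "ppv": ppv,
        "prevalence": prevalence,
        "lift": ppv / prevalence,        # ASSUMED definition of lift
    }

# Counts copied from the "80,99" and "Summary" rows: (TP, FP, TN, FN)
rows = {
    "80,99": (5205, 4634, 39639, 4634),
    "Summary": (11372, 13028, 1049138, 13028),
}

oldest = confusion_metrics(*rows["80,99"])
summary = confusion_metrics(*rows["Summary"])

for name, m in (("80,99", oldest), ("Summary", summary)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

In every row the false positive and false negative counts are equal, so the number of people flagged equals the number of cases; at that operating point sensitivity and positive predictive value coincide.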
Comparison of onset model quality for original versus learned labels
| Original labels: prediction threshold | Original labels: sensitivity of ADRD | Original labels: specificity of ADRD | Original labels: positive predictive value of ADRD | Original labels: proportion of cohort over threshold | Learned labels: prediction threshold | Learned labels: sensitivity of ADRD | Learned labels: specificity of ADRD | Learned labels: positive predictive value of ADRD | Learned labels: proportion of cohort over threshold |
|---|---|---|---|---|---|---|---|---|---|
| Choosing by threshold greater than | |||||||||
| 0.75 | 0.060 | 1.000 | 0.857 | 0.002 | 0.75 | 0.075 | 1.000 | 1.000 | 0.002 |
| 0.50 | 0.180 | 0.998 | 0.681 | 0.006 | 0.50 | 0.283 | 1.000 | 1.000 | 0.006 |
| 0.20 | 0.388 | 0.987 | 0.405 | 0.021 | 0.20 | 0.619 | 0.991 | 0.604 | 0.021 |
| Choosing by sensitivity | |||||||||
| 0.102 | 0.50 | 0.971 | 0.282 | 0.040 | 0.328 | 0.50 | 0.999 | 0.926 | 0.011 |
| 0.007 | 0.90 | 0.545 | 0.043 | 0.465 | 0.040 | 0.90 | 0.921 | 0.196 | 0.096 |
| 0.004 | 0.95 | 0.325 | 0.031 | 0.681 | 0.031 | 0.95 | 0.893 | 0.160 | 0.124 |
| Choosing by specificity | |||||||||
| 0.064 | 0.572 | 0.95 | 0.209 | 0.061 | 0.061 | 0.830 | 0.95 | 0.262 | 0.066 |
| 0.223 | 0.353 | 0.99 | 0.448 | 0.018 | 0.189 | 0.634 | 0.99 | 0.576 | 0.023 |
| 0.610 | 0.128 | 0.999 | 0.747 | 0.004 | 0.326 | 0.503 | 0.999 | 0.916 | 0.012 |
Abbreviation: ADRD, Alzheimer's disease or related dementias.
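The "choosing by sensitivity" rows fix a target sensitivity and report the threshold and remaining metrics that follow from it. A minimal sketch of that kind of selection is given below, using made-up toy scores rather than study data. The helper names `threshold_for_sensitivity` and `operating_point` are hypothetical, and flagging on `score >= threshold` is an assumption; the table's "greater than" wording may imply a strict inequality.

```python
import math

def threshold_for_sensitivity(scores, labels, target):
    """Highest threshold at which flagging score >= t reaches the target sensitivity."""
    case_scores = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    needed = math.ceil(target * len(case_scores))  # cases that must be flagged
    return case_scores[max(needed, 1) - 1]

def operating_point(scores, labels, threshold):
    """Sensitivity, specificity, PPV, and flagged share at a given threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y == 1 for f, y in zip(flagged, labels))
    fp = sum(f and y == 0 for f, y in zip(flagged, labels))
    fn = sum((not f) and y == 1 for f, y in zip(flagged, labels))
    tn = sum((not f) and y == 0 for f, y in zip(flagged, labels))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),  # assumes at least one person is flagged
        "proportion_over_threshold": (tp + fp) / len(scores),
    }

# Toy data for illustration only (NOT from the study)
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

t = threshold_for_sensitivity(scores, labels, 0.50)
point = operating_point(scores, labels, t)
print(t, point)
```

Choosing by specificity works the same way with the roles reversed: sort the non-case scores and pick the threshold that leaves the desired share of them unflagged.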
Sensitivity (area-under-the-curve) scores over different time windows
| Time window | Outcome label | SEHR | OW-C | OW-E | OW-M | Claims |
|---|---|---|---|---|---|---|
| Label Learning | Original | 0.47 (0.87) | 0.49 (0.87) | 0.41 (0.83) | 0.50 (0.86) | 0.46 (0.87) |
| 3 year | Original | 0.26 (0.70) | 0.29 (0.70) | 0.26 (0.67) | 0.29 (0.68) | 0.23 (0.69) |
| 3 year | Learned | 0.24 (0.71) | 0.28 (0.73) | 0.27 (0.72) | 0.30 (0.72) | 0.24 (0.71) |
| 4 year | Original | 0.27 (0.67) | 0.29 (0.68) | 0.26 (0.66) | 0.29 (0.66) | 0.25 (0.69) |
| 4 year | Learned | 0.21 (0.68) | 0.27 (0.72) | 0.26 (0.72) | 0.29 (0.71) | 0.20 (0.71) |
| 5 year | Original | 0.25 (0.64) | 0.27 (0.63) | 0.24 (0.61) | 0.26 (0.62) | 0.25 (0.67) |
| 5 year | Learned | 0.22 (0.66) | 0.24 (0.71) | 0.21 (0.71) | 0.25 (0.70) | 0.23 (0.68) |
| 6 year | Original | 0.26 (0.68) | 0.27 (0.65) | 0.23 (0.64) | 0.27 (0.64) | 0.25 (0.69) |
| 6 year | Learned | 0.22 (0.67) | 0.24 (0.69) | 0.23 (0.69) | 0.25 (0.69) | 0.23 (0.69) |
| 7 year | Original | 0.25 (0.65) | 0.25 (0.67) | 0.21 (0.64) | 0.26 (0.66) | 0.26 (0.68) |
| 7 year | Learned | 0.22 (0.67) | 0.21 (0.69) | 0.20 (0.67) | 0.22 (0.68) | 0.18 (0.68) |
| 8 year | Original | 0.25 (0.63) | 0.25 (0.63) | 0.22 (0.60) | 0.28 (0.62) | 0.24 (0.59) |
| 8 year | Learned | 0.15 (0.72) | 0.21 (0.65) | 0.21 (0.63) | 0.25 (0.66) | 0.18 (0.70) |
Abbreviations: SEHR, structured electronic health record data; OW-C, Open World claims only data; OW-E, Open World EHR data; OW-M, Open World mixed data.
Top 10 features that explain the model prediction in Label Learning∗
| Type of variable | Code | Time window (days) | Code description | Percent gain | Cumulative gain |
|---|---|---|---|---|---|
| icd9 | 78097 | 730 | Altered Mental Status | 7.2 | 7.2 |
| cpt4 | 70551 | 730 | Magnetic Resonance (e.g., Proton) Imaging, Brain (including Brain Stem); Without Contrast Material | 6.4 | 13.6 |
| etg | 319900 | 60 | Neurological Diseases Signs & Symptoms | 4.4 | 18.0 |
| cpt4 | 70450 | 730 | Computed Tomography, Head Or Brain; Without Contrast Material | 4.2 | 22.3 |
| cpt4 | 70551 | 60 | Magnetic Resonance (e.g., Proton) Imaging, Brain (including Brain Stem); Without Contrast Material | 4.0 | 26.3 |
| cpt4 | 96118 | 730 | Neuropsychological Testing (e.g., Halstead-Reitan Neuropsychological Battery, Wechsler Memory Scales And Wisconsin Card Sorting Test), Per Hour Of The Psychologist's Or Physician's Time, Both Face-to-face Time Administering Tests To The Patient And Time Interpreting These Test Results And Preparing The Report | 3.4 | 29.7 |
| etg | 319900 | 730 | Neurological Diseases Signs & Symptoms | 2.4 | 32.1 |
| cpt4 | 96116 | 730 | Neurobehavioral Status Exam (clinical Assessment Of Thinking, Reasoning And Judgment, e.g., Acquired Knowledge, Attention, Language, Memory, Planning And Problem Solving, And Visual Spatial Abilities), Per Hour Of The Psychologist's Or Physician's Time, Both Face-to-face Time With The Patient And Time Interpreting Test Results And Preparing The Report | 2.4 | 34.5 |
| etg | 239300 | 730 | Psychotic & Schizophrenic Disorders | 2.3 | 36.8 |
| cpt4 | 96118 | 60 | Neuropsychological Testing (e.g., Halstead-Reitan Neuropsychological Battery, Wechsler Memory Scales And Wisconsin Card Sorting Test), Per Hour Of The Psychologist's Or Physician's Time, Both Face-to-face Time Administering Tests To The Patient And Time Interpreting These Test Results And Preparing The Report | 2.3 | 39.1 |
∗Additional features are reported in Supplementary Material D.
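The cumulative gain column appears to be a running total of the percent gain column. The short sketch below checks this using the rounded values from the table; the small gap between the recomputed and reported totals would be consistent with the published figures having been accumulated before rounding, though that is an assumption.

```python
from itertools import accumulate

# Percent gain values copied from the table, in rank order
percent_gain = [7.2, 6.4, 4.4, 4.2, 4.0, 3.4, 2.4, 2.4, 2.3, 2.3]

# Reported cumulative gain values, for comparison
reported = [7.2, 13.6, 18.0, 22.3, 26.3, 29.7, 32.1, 34.5, 36.8, 39.1]

# Running total of the rounded per-feature gains
cumulative = [round(x, 1) for x in accumulate(percent_gain)]

# Largest disagreement between recomputed and reported totals
max_gap = max(abs(c - r) for c, r in zip(cumulative, reported))
print(cumulative, round(max_gap, 1))
```

On this reading, the ten listed features together account for roughly 39% of the model's total gain, with the remainder spread across the features in the supplementary material.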