Vijay S Nori, Christopher A Hane, David C Martin, Alexander D Kravetz, Darshak M Sanghavi.
Abstract
Alzheimer's disease and related dementias (ADRD) are highly prevalent conditions, and prior efforts to develop predictive models have relied on demographic and clinical risk factors using traditional logistic regression methods. We hypothesized that machine-learning algorithms using administrative claims data may represent a novel approach to predicting ADRD. Using a national de-identified dataset of more than 125 million patients including over 10,000 clinical, pharmaceutical, and demographic variables, we developed a cohort to train a machine learning model to predict ADRD 4-5 years in advance. The Lasso algorithm selected a 50-variable model with an area under the curve (AUC) of 0.693. Top diagnosis codes in the model were memory loss (780.93), Parkinson's disease (332.0), mild cognitive impairment (331.83) and bipolar disorder (296.80), and top pharmacy codes were psychoactive drugs. Machine learning algorithms can rapidly develop predictive models for ADRD with massive datasets, without requiring hypothesis-driven feature engineering.
Year: 2019 PMID: 31276468 PMCID: PMC6611655 DOI: 10.1371/journal.pone.0203246
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Cases identified using different rules.
Case 1 shows an individual who has an ADRD diagnosis in an inpatient setting and no previous relevant claims, so the confirmation and index dates both fall on the day of that claim. Case 2 has the same inpatient diagnosis claim and a previous claim in an outpatient setting, so the previous claim is used as the index date, although it is over 730 days prior to the claim in the inpatient setting. Case 3 has two claims in outpatient settings; the second claim is used as the confirmation date and the first as the index date. Case 4 has a pharmacy claim for Memantine Hydrochloride and a diagnosis claim in an outpatient setting within 730 days. This case has a previous diagnosis claim in an outpatient setting, which is used as the index date. Case 5 has multiple claims for Donepezil, Galantamine, Rivastigmine or Tacrine, and the earliest of those is used for the confirmation and index dates.
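The diagnosis-based rules in Cases 1 to 3 can be sketched in a few lines. This is a simplified illustration, not the study's implementation: the `Claim` type and `confirm_and_index` function are hypothetical names, the pharmacy rules of Cases 4 and 5 are omitted, and the sketch assumes that no time window applies between two outpatient claims, since the caption does not state one.

```python
from datetime import date
from typing import List, NamedTuple, Optional, Tuple


class Claim(NamedTuple):
    """Hypothetical ADRD diagnosis claim (illustrative type, not from the paper)."""
    day: date
    setting: str  # "inpatient" or "outpatient"


def confirm_and_index(claims: List[Claim]) -> Optional[Tuple[date, date]]:
    """Return (confirmation_date, index_date), or None if nothing confirms.

    Assumed reading of Cases 1-3: a single inpatient diagnosis confirms,
    as does the second outpatient diagnosis; the index date is the earliest
    relevant claim, however long before the confirmation it occurred.
    """
    ordered = sorted(claims, key=lambda c: c.day)
    outpatient_seen = 0
    for claim in ordered:
        if claim.setting == "outpatient":
            outpatient_seen += 1
        if claim.setting == "inpatient" or outpatient_seen >= 2:
            return claim.day, ordered[0].day
    return None


# Case 2: an outpatient claim more than 730 days before the inpatient claim.
case2 = [Claim(date(2010, 5, 1), "inpatient"), Claim(date(2007, 1, 1), "outpatient")]
print(confirm_and_index(case2))
```

A lone outpatient claim returns `None` under this sketch, because none of the diagnosis-only cases in the figure confirms on a single outpatient diagnosis.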
Fig 2. Study design with training and test cohorts.
Demographic and clinical profiles of cohorts.
| | Training Cases | Training Controls | Validation Cases | Validation Controls |
|---|---|---|---|---|
| Age, years, mean (SD) | 77.24 (6.95) | 77.24 (6.95) | 77.19 (6.99) | 58.71 (11.35) |
| Gender, male, N (%) | 13,794 (38.46) | 68,970 (38.46) | 3,432 (37.80) | 267,801 (46.07) |
| Diagnosis Codes, N (%) | | | | |
| HYPERTENSION (401.9) | 21,580 (60.17) | 98,672 (55.02) | 5,419 (59.69) | 187,144 (32.19) |
| HYPERLIPIDEMIA (272.4) | 17,229 (48.04) | 82,314 (45.90) | 4,357 (47.99) | 191,833 (33.00) |
| HYPERCHOLESTEROLEMIA (272.0) | 10,768 (30.02) | 58,244 (32.48) | 2,814 (30.99) | 116,730 (20.08) |
| PAIN IN SOFT TISSUES OF LIMB (729.5) | 10,112 (28.19) | 42,702 (23.81) | 2,566 (28.26) | 91,039 (15.66) |
| OTHER MALAISE AND FATIGUE (780.79) | 9,540 (26.60) | 34,465 (19.22) | 2,403 (26.47) | 87,362 (15.03) |
| Procedure Codes, N (%) | | | | |
| RADIOLOGIC EXAMINATION, CHEST (71020) | 13,915 (38.80) | 61,303 (34.18) | 3,518 (38.75) | 143,310 (24.65) |
| DIAGNOSTIC RADIOGRAPHIC PROC (76499) | 12,759 (35.57) | 64,037 (35.71) | 3,235 (35.63) | 205,253 (35.31) |
| SCREENING MAMMOGRAPHY (77052) | 8,884 (24.77) | 51,472 (28.70) | 2,287 (25.19) | 171,086 (29.43) |
| BONE DENSITY STUDY (77080) | 6,815 (19.00) | 38,011 (21.20) | 1,746 (19.23) | 85,077 (14.64) |
| CT, HEAD OR BRAIN (70450) | 6,266 (17.47) | 18,047 (10.06) | 1,609 (17.72) | 28,423 (4.89) |
| Drug Codes, N (%) | | | | |
| HYDROCODONE BIT/ACETAMINOPHEN | 4,198 (11.70) | 12,686 (7.07) | 1,046 (11.52) | 72,548 (12.48) |
| SIMVASTATIN | 3,548 (9.89) | 12,584 (7.02) | 968 (10.66) | 37,149 (6.39) |
| AZITHROMYCIN | 3,221 (8.98) | 12,561 (7.00) | 823 (9.06) | 69,553 (11.96) |
| LISINOPRIL | 3,211 (8.95) | 10,926 (6.09) | 824 (9.08) | 31,572 (5.43) |
| LEVOTHYROXINE SODIUM | 2,663 (7.42) | 9,623 (5.37) | 682 (7.51) | 29,602 (5.09) |
Fig 3. Data sparsity in the cohort.
Over 55% of the cases and 80% of the controls have fewer than 3 codes.
Model sensitivity and specificity were computed in two ways: using a single threshold for the entire cohort, based on the overall prevalence of cases, and using a separate threshold for each age range, based on the prevalence of cases in that age range.
Age and gender matching in training yields different prevalence and measures.
| Cohort | Age | Prevalence | Threshold | Sensitivity | Specificity | Lift |
|---|---|---|---|---|---|---|
| Training | all | 16.67% | 0.20 | 31.9% | 86.4% | 1.9 |
| | 15–64 | 16.67% | 0.18 | 41.8% | 88.4% | 2.5 |
| | 65–74 | 16.67% | 0.19 | 39.1% | 87.8% | 2.3 |
| | 75–99 | 16.67% | 0.21 | 29.6% | 85.9% | 1.8 |
| | computed | 16.67% | | 32.1% | 86.4% | 1.9 |
| Test | all | 1.54% | 0.37 | 9.9% | 98.6% | 6.4 |
| | 15–64 | 0.14% | 0.61 | 3.3% | 99.9% | 23.3 |
| | 65–74 | 1.77% | 0.40 | 9.8% | 98.4% | 5.6 |
| | 75–99 | 8.46% | 0.26 | 19.2% | 92.5% | 2.3 |
| | computed | 1.54% | | 16.4% | 98.7% | 10.7 |
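The Lift column is not defined in this record. A common definition is positive predictive value divided by prevalence, and the tabulated values are consistent with it. The sketch below assumes that definition; the function name `lift` is illustrative and the inputs are the rounded figures from the "all" rows.

```python
def lift(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Lift as PPV / prevalence (assumed definition, not stated in the source)."""
    true_pos = sensitivity * prevalence                    # cases flagged, per person
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # controls flagged, per person
    ppv = true_pos / (true_pos + false_pos)
    return ppv / prevalence


training_all = lift(0.319, 0.864, 0.1667)
test_all = lift(0.099, 0.986, 0.0154)
print(f"training: {training_all:.2f}, test: {test_all:.2f}")
```

With these rounded inputs the results come out near 1.9 and 6.5, close to the tabulated 1.9 and 6.4; a small gap is expected because the table's sensitivity and specificity are themselves rounded.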
Fig 4. Distribution of scores for cases and controls for different age ranges.
Figure shows substantial overlap in scores between cases and controls. The vertical lines show a proposed cut point for classification.
Clinical diagnosis, procedure and pharmacy variables included in the model, coefficients and variance inflation factors (VIF).
The intercept has no VIF because it is a constant and does not vary across the observations due to age/gender matching. The specific ICD-9 codes for diagnoses and CPT-4 codes for procedures used to identify these variables are shown in parentheses.
| Type | Variable Description | Coefficient | Pr(>\|Z\|) | VIF |
|---|---|---|---|---|
| | INTERCEPT (MALE, 65–69) | -1.96 | <0.0001 | |
| ICD-9-CM | MEMORY LOSS (780.93) | 1.33 | <0.0001 | 1.02 |
| | PARALYSIS AGITANS (332.0) | 1.17 | <0.0001 | 1.01 |
| | MILD COGNITIVE IMPAIRMENT, SO STATED (331.83) | 1.14 | <0.0001 | 1.01 |
| | BIPOLAR DISORDER, UNSPECIFIED (296.80) | 1.00 | <0.0001 | 1.01 |
| | UNSPECIFIED PSYCHOSIS (298.9) | 0.44 | <0.0001 | 1.06 |
| | LOSS OF WEIGHT (783.21) | 0.42 | <0.0001 | 1.02 |
| | DEPRESSIVE DISORDER, NOT ELSEWHERE CLASSIFIED (311) | 0.37 | <0.0001 | 1.10 |
| | ALTERED MENTAL STATUS (780.97) | 0.33 | <0.0001 | 1.16 |
| | PERSONAL HISTORY OF FALL (V15.88) | 0.32 | <0.0001 | 1.05 |
| | OTHER CONVULSIONS (780.39) | 0.32 | <0.0001 | 1.04 |
| | UNSPECIFIED FALL (E888.9) | 0.27 | <0.0001 | 1.06 |
| | OTHER CHRONIC PAIN (338.29) | 0.28 | <0.0001 | 1.03 |
| | ACUTE, BUT ILL-DEFINED, CEREBROVASCULAR DISEASE (436) | 0.24 | <0.0001 | 1.33 |
| | URGE INCONTINENCE (788.31) | 0.24 | <0.0001 | 1.07 |
| | OTHER ALTERATION OF CONSCIOUSNESS (780.09) | 0.23 | <0.0001 | 1.09 |
| | UNSPECIFIED CONSTIPATION (564.00) | 0.21 | <0.0001 | 1.05 |
| | UNSPECIFIED URINARY INCONTINENCE (788.30) | 0.21 | <0.0001 | 1.08 |
| | ENCOUNTER FOR LONG-TERM (CURRENT) USE OF OTHER MEDICATIONS (V58.69) | 0.17 | <0.0001 | 1.03 |
| | LACK OF COORDINATION (781.3) | 0.15 | <0.0001 | 1.09 |
| | OTHER MALAISE AND FATIGUE (780.79) | 0.11 | <0.0001 | 1.15 |
| | DIABETES MELLITUS (250.02) | 0.12 | <0.0001 | 1.30 |
| | ABNORMALITY OF GAIT (781.2) | 0.10 | <0.0001 | 1.20 |
| | DIZZINESS AND GIDDINESS (780.4) | 0.11 | <0.0001 | 1.13 |
| | UNSPECIFIED CEREBRAL ARTERY OCCLUSION WITH CEREBRAL INFARCTION (434.91) | 0.11 | 0.0083 | 1.35 |
| | DIABETES MELLITUS (250.00) | 0.07 | <0.0001 | 1.42 |
| | EDEMA (782.3) | 0.07 | <0.0001 | 1.10 |
| | MUSCLE WEAKNESS (GENERALIZED) (728.87) | 0.06 | 0.0277 | 1.14 |
| | URINARY TRACT INFECTION, SITE NOT SPECIFIED (599.0) | 0.03 | 0.0526 | 1.12 |
| CPT | SCREENING MAMMOGRAPHY; COMPUTER-AIDED DETECTION (77052) | -0.25 | <0.0001 | 1.03 |
| | COMPUTED TOMOGRAPHY, HEAD OR BRAIN (70450) | 0.16 | <0.0001 | 1.42 |
| | RADIOLOGIC EXAMINATION, CHEST; SINGLE VIEW, FRONTAL (71010) | 0.05 | 0.0011 | 1.28 |
| Medication (HICL Description) | VENLAFAXINE HCL | 0.52 | <0.0001 | 1.02 |
| | DULOXETINE HCL | 0.45 | <0.0001 | 1.05 |
| | TOLTERODINE TARTRATE | 0.32 | <0.0001 | 1.07 |
| | SERTRALINE HCL | 0.29 | <0.0001 | 1.05 |
| | CITALOPRAM HYDROBROMIDE | 0.25 | <0.0001 | 1.06 |
| | POTASSIUM CHLORIDE | 0.24 | <0.0001 | 1.35 |
| | OXYBUTYNIN CHLORIDE | 0.22 | 0.0001 | 1.08 |
| | HYDROCODONE BIT/ACETAMINOPHEN | 0.19 | <0.0001 | 1.27 |
| | PROPOXYPHENE/ACETAMINOPHEN | 0.15 | <0.0001 | 1.09 |
| | SULFAMETHOXAZOLE/TRIMETHOPRIM | 0.15 | <0.0001 | 1.12 |
| | METFORMIN HCL | 0.13 | 0.0001 | 1.40 |
| | BLOOD SUGAR DIAGNOSTIC | 0.12 | 0.0011 | 1.38 |
| | LISINOPRIL | 0.11 | <0.0001 | 1.17 |
| | CEPHALEXIN MONOHYDRATE | 0.10 | 0.0003 | 1.12 |
| | SIMVASTATIN | 0.10 | <0.0001 | 1.17 |
| | CLOPIDOGREL BISULFATE | 0.09 | 0.0054 | 1.11 |
| | TRAMADOL HCL | 0.09 | 0.0097 | 1.13 |
| | GABAPENTIN | 0.07 | 0.0914 | 1.12 |
| | FUROSEMIDE | 0.00 | 0.9396 | 1.46 |
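To make the coefficients concrete, the sketch below turns a set of present codes into a model score. It rests on assumptions not confirmed in this record: that the coefficients belong to a logistic model, that each variable is a 0/1 indicator, and that the person falls in the intercept's reference group (male, 65–69), since the age and gender terms for other groups are not listed. Only four of the tabulated variables are included, and the names `INTERCEPT`, `COEFFICIENTS` and `score` are illustrative.

```python
import math

# Values copied from the coefficient table; keys are shortened labels.
INTERCEPT = -1.96  # reference group: male, 65-69
COEFFICIENTS = {
    "MEMORY LOSS (780.93)": 1.33,
    "PARALYSIS AGITANS (332.0)": 1.17,
    "SCREENING MAMMOGRAPHY (77052)": -0.25,
    "VENLAFAXINE HCL": 0.52,
}


def score(present_codes):
    """Logistic score for a person with the given codes (assumed 0/1 indicators)."""
    linear = INTERCEPT + sum(COEFFICIENTS[code] for code in present_codes)
    return 1.0 / (1.0 + math.exp(-linear))


baseline = score([])
with_memory_loss = score(["MEMORY LOSS (780.93)"])
print(f"baseline: {baseline:.3f}, with memory loss: {with_memory_loss:.3f}")
```

Under these assumptions a memory loss code raises the score from roughly 0.12 to roughly 0.35, which corresponds to an odds ratio of exp(1.33), about 3.8. These scores reflect the matched training cohort, where prevalence is 16.67%, so they are not directly population risks.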