Mark N Warden, Susan Searles Nielsen, Alejandra Camacho-Soto, Roman Garnett, Brad A Racette.
Abstract
Identifying people with Parkinson disease during the prodromal period, including via algorithms in administrative claims data, is an important research and clinical priority. We sought to improve upon an existing penalized logistic regression model, based on diagnosis and procedure codes, by adding prescription medication data or using machine learning. Using Medicare Part D beneficiaries aged 66–90 from a population-based case-control study of incident Parkinson disease, we fit a penalized logistic regression both with and without Part D data. We also built a predictive algorithm using a random forest classifier for comparison. In a combined approach, we introduced the probability of Parkinson disease from the random forest as a predictor in the penalized regression model. We calculated the receiver operator characteristic area under the curve (AUC) for each model. All models performed well, with AUCs ranging from 0.824 (simplest model) to 0.835 (combined approach). We conclude that medication data and random forests improve Parkinson disease prediction, but are not essential.
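The AUC reported for each model can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The sketch below illustrates that definition in plain Python. The function name `roc_auc` and the toy labels and scores are hypothetical; they are not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as P(case score > control score), counting ties as 0.5.

    labels: 1 for a case, 0 for a control (hypothetical toy encoding).
    scores: predicted probabilities from any classifier.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0      # case correctly ranked above control
            elif case_score == control_score:
                wins += 0.5      # tie counts as half a win
    return wins / (len(cases) * len(controls))


# Toy example with made-up values, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.4]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of observations, so it only suits small illustrations; on a dataset the size of the one described here, a rank-based implementation from a statistics library would be the practical choice.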
Year: 2021 PMID: 34437600 PMCID: PMC8389479 DOI: 10.1371/journal.pone.0256592
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Characteristics of Parkinson disease cases and controls with Medicare Part D coverage, U.S. Medicare 2009, %.
| | Cases | Controls |
|---|---|---|
| N | 35,941 | 52,324 |
| Age, years | | |
| 66–69 | 8.1 | 16.7 |
| 70–74 | 19.5 | 28.3 |
| 75–79 | 24.2 | 22.3 |
| 80–84 | 27.3 | 19.2 |
| 85–90 | 21.0 | 13.4 |
| Female | 64.7 | 54.0 |
| Race/ethnicity | | |
| White | 86.3 | 83.7 |
| Black | 6.0 | 7.8 |
| Pacific Islander/other | 1.2 | 1.6 |
| Asian | 2.9 | 3.4 |
| Hispanic | 3.1 | 2.9 |
| Native American | 0.3 | 0.4 |
| Unknown | 0.1 | 0.1 |
| Smoking index ≥ median | 41.1 | 51.5 |
| Age, years, mean (SD) | 78.8 (6.1) | 78.1 (6.2) |
| Number of unique ICD9 codes, mean (SD) | 99.7 (52.4) | 76.3 (46.0) |
a Smoking index: predicted probability of ever smoking divided by the person’s total number of unique diagnosis codes.
Abbreviations: ICD9 = International Classification of Diseases, Ninth Revision, Clinical Modification; SD = standard deviation.
Fig 1. Comparison of distinct and shared predictors between models for predicting Parkinson disease, U.S. Medicare 2009.
Performance of models for predicting Parkinson disease in the test dataset.
| Model | Sensitivity, % (95% CI), at cut point that maximizes percent accurately classified | Specificity, % (95% CI), at cut point that maximizes percent accurately classified | Sensitivity, % (95% CI), at cut point at Youden’s index | Specificity, % (95% CI), at cut point at Youden’s index | Overall performance: AUC (95% CI) | Relative performance vs. penalized regression without Part D | Relative performance vs. penalized regression with Part D |
|---|---|---|---|---|---|---|---|
| Penalized regression without Part D | 65.5 (63.9–67.1) | 83.4 (82.4–84.4) | 78.0 (76.7–79.3) | 73.2 (71.9–74.4) | 0.824 (0.815–0.832) | Reference model | -- |
| Penalized regression with Part D | 67.2 (65.6–68.7) | 82.6 (81.6–83.7) | 78.6 (77.2–79.9) | 73.3 (72.1–74.6) | 0.827 (0.818–0.836) | p = 0.61 | Reference model |
| Random forest (with Part D) | 66.3 (64.7–67.8) | 82.8 (81.8–83.9) | 76.8 (75.4–78.1) | 75.0 (73.9–76.2) | 0.826 (0.818–0.835) | -- | p = 0.90 |
| Combined model (with Part D) | 72.9 (71.5–79.6) | 79.6 (78.4–80.7) | 76.3 (74.9–77.6) | 76.3 (75.0–77.4) | 0.835 (0.826–0.843) | -- | p = 0.23 |
a Percent sensitivity or specificity at selected cut points: the cut point that maximizes the percent accurately classified (data dependent) and the cut point at Youden’s index [29] (not data dependent).
b The AUC is a measure of overall model performance. The p-value, obtained using the method of DeLong et al. [33], assesses the performance of the specified model relative to the stated reference model; a p-value < 0.05 indicates that the two AUCs being compared are significantly different. The first comparison tests whether including Part D prescription medication data in the penalized regression model changes the AUC. The other comparisons test whether the AUC differs across the approaches in which Part D data were included.
c Combined model: the random forest classifier’s case prediction probability was included as a predictor in a new penalized regression model with Part D prescription medication data.
Abbreviations: AUC = area under the receiver operator characteristic curve; CI = confidence interval.
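The two cut points in the table come from different criteria: one maximizes the percent accurately classified, the other maximizes Youden’s index, J = sensitivity + specificity − 1. The following is a minimal sketch of how the two can be selected from predicted scores, assuming that a score at or above the cut point is classified as a case. All function names and the toy data are hypothetical and are not taken from the study.

```python
def sens_spec(labels, scores, cut):
    """Sensitivity and specificity when scores >= cut are called cases."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cut)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cut)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cut)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)


def youden(labels, scores, cut):
    """Youden's index J = sensitivity + specificity - 1."""
    sens, spec = sens_spec(labels, scores, cut)
    return sens + spec - 1


def accuracy(labels, scores, cut):
    """Fraction of observations classified correctly at this cut point."""
    correct = sum(1 for y, s in zip(labels, scores) if (s >= cut) == (y == 1))
    return correct / len(labels)


def best_cut(labels, scores, criterion):
    """Try every observed score as a cut point; keep the best one.

    On ties, the lowest cut point wins (an arbitrary choice for this sketch).
    """
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: criterion(labels, scores, c))


# Toy data with made-up values: 2 cases and 8 controls.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.3, 0.5, 0.45, 0.4, 0.2, 0.15, 0.1, 0.05, 0.02]

youden_cut = best_cut(labels, scores, youden)
accuracy_cut = best_cut(labels, scores, accuracy)
print(youden_cut, sens_spec(labels, scores, youden_cut))
print(accuracy_cut, sens_spec(labels, scores, accuracy_cut))
```

In this toy data controls outnumber cases, so the accuracy-maximizing cut point trades sensitivity for specificity, while the Youden cut point weights the two equally. Applied to the values reported in the table, Youden’s index is 0.780 + 0.732 − 1 = 0.512 for the penalized regression without Part D and 0.763 + 0.763 − 1 = 0.526 for the combined model.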