Ivo D Dinov1,2,3, Ben Heavner4, Ming Tang1, Gustavo Glusman4, Kyle Chard5, Mike Darcy6, Ravi Madduri5, Judy Pa2, Cathie Spino3, Carl Kesselman6, Ian Foster5, Eric W Deutsch4, Nathan D Price4, John D Van Horn2, Joseph Ames2, Kristi Clark2, Leroy Hood4, Benjamin M Hampstead7,8, William Dauer3, Arthur W Toga2.
Abstract
BACKGROUND: A unique archive of Big Data on Parkinson's disease (PD) is collected, managed and disseminated by the Parkinson's Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of PD risk to trauma, genetics, environment, co-morbidities, or lifestyle. The defining characteristics of Big Data (large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources) all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. METHODS AND
Year: 2016 PMID: 27494614 PMCID: PMC4975403 DOI: 10.1371/journal.pone.0157077
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of UPDRS ratings.
| UPDRS ratings | Description | Missing |
|---|---|---|
| Part I | Uses 13 questions to measure non-motor aspects of experiences of daily living. It has two parts: Part IA (neuropsychiatric symptoms) and Part IB (non-motor symptoms) | 47.5% |
| Part II | Captures motor aspects of experiences of daily living (dressing, hygiene, tremor, freezing) | 47.6% |
| Part III | Neurological motor examination (speech, rigidity, movement of extremities, and posture) | 48.2% |
| Part IV | Motor complications (dyskinesia and motor fluctuations) | 80.7% |
Outline of the core study designs and data characteristics.
| Inference Type | Methods | Comments on Data Restrictions | # of Cases |
|---|---|---|---|
| | Change-model (HC vs. PD or PD+SWEDD) GLM, MMRM or classification | Imaging data limited to bilateral surface-area, curvature, shape-index, curvedness and volume for the insular cortex and the cingulate gyrus | 423 |
| | GEE | UPDRS data limited to the following top-level variables: Part_I_Summary, Part_II_Patient_Questionnaire_Summary, Part_III_Summary, X_Assessment_Non.Motor_Epworth_Sleepiness_Scale_Summary, X_Assessment_Non.Motor_Geriatric_Depression_Scale_GDS_Short_Summary | 406 |
| | Various classifiers | (Raw) unbalanced and rebalanced (SMOTE) groups, all data elements with and without UPDRS (to contrast the power of no-UPDRS predictions) | 423 |
Legend: HC = healthy controls; PD+SWEDD = (pooled cohort) Parkinson’s Disease and scans without evidence of dopaminergic deficit; GLM = generalized linear model; MMRM = Mixed-effect Model Repeat Measurement; GEE = generalized estimating equation; UPDRS = unified Parkinson's disease rating scale.
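The table above names SMOTE as the rebalancing method but gives no implementation details. As a rough illustration only, the core idea of SMOTE (synthesizing minority-class points by interpolating between a sample and one of its nearest minority-class neighbours) can be sketched as follows. The function name `smote`, its parameters, and the toy data are hypothetical and are not taken from the study.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.
    Illustrative sketch; requires at least two minority samples."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Hypothetical two-feature minority class (values are illustrative only).
minority = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.3), (1.1, 2.1)]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class.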
Gender Distributions.
| Gender | HC | PD | SWEDD | Total |
|---|---|---|---|---|
| | 84 | 170 | 23 | 277 |
| | 39 | 93 | 14 | 146 |
| Total | 123 | 263 | 37 | 423 |
Gender Differences by Cohort.
| Statistics | df | Value | Prob |
|---|---|---|---|
| | 2 | 0.7 | 0.705 |
| Likelihood Ratio | 2 | 1.1575 | 0.5606 |
| Mantel-Haenszel | 1 | 1.0453 | 0.3066 |
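As a cross-check, a Pearson chi-square test of independence can be recomputed directly from the counts in the Gender Distributions table. The sketch below assumes that the first two rows of that table are the two gender categories and uses only the standard library. It yields a statistic of about 0.69 with a probability of about 0.71 on 2 degrees of freedom, in line with the first row of the table above (0.7, 0.705). Whether that row is the Pearson statistic is an inference, since its label is missing.

```python
import math

# Observed counts from the Gender Distributions table
# (rows: the two gender categories; columns: HC, PD, SWEDD).
observed = [[84, 170, 23],
            [39, 93, 14]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under independence of gender and cohort
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
# For df = 2 the chi-square survival function has the closed form exp(-x / 2).
p_value = math.exp(-chi2 / 2)
```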
Between-cohort differences in some demographic, genetic and clinical variables.
| Variable | Categories/Classes | HC | PD | SWEDD | Exact Fisher's Test (p-value) |
|---|---|---|---|---|---|
| | 1 | 84 | 170 | 23 | 0.711 |
| | 2 | 39 | 93 | 14 | |
| | 0 | 123 | 259 | 37 | 0.523 |
| | 1 | 0 | 4 | 0 | |
| | 0 | 67 | 127 | 15 | 0.189 |
| | 1 | 40 | 107 | 20 | |
| | 2 | 16 | 29 | 2 | |
| | 0 | 84 | 176 | 28 | 0.733 |
| | 1 | 37 | 78 | 8 | |
| | 2 | 2 | 9 | 1 | |
| | 0 | 77 | 160 | 26 | 0.904 |
| | 1 | 40 | 89 | 10 | |
| | 2 | 6 | 14 | 1 | |
| | 0 | 80 | 163 | 26 | 0.906 |
| | 1 | 39 | 88 | 10 | |
| | 2 | 4 | 12 | 1 | |
| | 0 | 82 | 167 | 26 | 0.779 |
| | 1 | 39 | 86 | 10 | |
| | 2 | 2 | 10 | 1 | |
| | Dementia (PDD) | 0 | 5 | 0 | |
| | Mild Cognitive Impairment (PD-MCI) | 2 | 46 | 12 | |
| | Normal Cognition (PD-NC) | 121 | 212 | 25 | |
| | No | 121 | 230 | 31 | |
| | Yes | 2 | 33 | 6 | |
| | No | 121 | 251 | 33 | |
| | Yes | 2 | 12 | 4 | |
| | 10% - 49% | 0 | 5 | 0 | |
| | 50% - 89% | 5 | 45 | 7 | |
| | 90% - 100% | 118 | 213 | 30 | |
Most significant GEE model coefficients for covariates contributing to segregating mean cohort differences.
| | Estimate | Std.err | Wald | Pr(>\|W\|) |
|---|---|---|---|---|
| | -7.2397 | 1.7924 | 16.32 | |
| | -38.8454 | 1.6404 | 560.78 | |
| | 1.1731 | 0.5006 | 5.49 | 0.01911 |
| | -1.0406 | 0.1688 | 38.01 | |
| | -0.6709 | 0.0622 | 116.36 | |
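The Wald column in the table above is consistent with the usual definition, the squared ratio of the estimate to its standard error, referred to a chi-square distribution with 1 degree of freedom. The following minimal sketch reproduces the one row that reports a probability (Estimate 1.1731, Std.err 0.5006). The helper name `wald_test` is hypothetical, and this is not the software the authors used.

```python
import math


def wald_test(estimate, std_err):
    """Return the Wald chi-square statistic and its p-value (1 df)."""
    wald = (estimate / std_err) ** 2
    # Survival function of a chi-square with 1 df: P(X > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2))
    return wald, p_value


# Row with Estimate 1.1731 and Std.err 0.5006 from the table above.
wald, p = wald_test(1.1731, 0.5006)
```

Applying the same function to the other rows gives Wald values that match the table; their probabilities are too small to print at the table's precision, which may explain the empty cells.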
GEE and GLMM predictive model summaries.
| GEE covariate | Estimate | Std.err | Wald | Pr(>\|W\|) |
|---|---|---|---|---|
| (Intercept) | 1.117257 | 0.723184 | 2.39 | 0.12237 |
| L superior parietal gyrus ComputeArea | 1.400402 | 1.497027 | 0.88 | 0.34955 |
| L superior parietal gyrus Volume | -2.525902 | 1.359074 | 3.45 | 0.06309 |
| R superior parietal gyrus ComputeArea | 1.164381 | 0.976521 | 1.42 | 0.23311 |
| R superior parietal gyrus Volume | -0.451386 | 1.007911 | 0.2 | 0.65427 |
| L putamen ComputeArea | 0.496793 | 0.713507 | 0.48 | 0.48626 |
| L putamen Volume | -0.61937 | 0.83991 | 0.54 | 0.46086 |
| R putamen Volume | 0.836431 | 0.946288 | 0.78 | 0.37675 |
| R putamen ShapeIndex | 0.56228 | 0.354731 | 2.51 | 0.11295 |
| L caudate ComputeArea | -0.413876 | 1.746183 | 0.06 | 0.81264 |
| L caudate Volume | 1.580367 | 1.540474 | 1.05 | 0.30494 |
| R caudate ComputeArea | 0.155381 | 1.410379 | 0.01 | 0.91227 |
| R caudate Volume | -1.804119 | 1.502705 | 1.44 | 0.22991 |
| chr17 rs11868035 GT | -0.716443 | 0.407688 | 3.09 | 0.07886 |
| chr17 rs11012 GT | -0.431071 | 0.891313 | 0.23 | 0.62864 |
| chr17 rs199533 GT | -0.883482 | 0.889914 | 0.99 | 0.32082 |
| Weight | 0.473417 | 0.307536 | 2.37 | 0.12371 |
| FID IID | 0.000757 | 0.001309 | 0.33 | 0.56317 |
| COGDXCL | -0.489857 | 1.434844 | 0.12 | 0.7328 |
| EDUCYRS | 0.07841 | 0.099566 | 0.62 | 0.43098 |

| GLMM | AIC | BIC | logLik | Deviance | Df.resid |
|---|---|---|---|---|---|
| | 193.8 | 380.9 | -64.9 | 129.8 | 2530 |

| GLMM covariate | | |
|---|---|---|
| (Intercept) | -7.82E+01 | 0.99904 |
| L_superior_parietal_gyrus_ComputeArea | -1.94E+00 | 0.77698 |
| L_superior_parietal_gyrus_Volume | 2.34E+00 | 0.74594 |
| R_superior_parietal_gyrus_ComputeArea | -1.36E-01 | 0.98267 |
| R_superior_parietal_gyrus_Volume | 1.31E+00 | 0.84431 |
| L_putamen_ComputeArea | -3.63E+00 | 0.52797 |
| L_putamen_Volume | 2.84E+00 | 0.58167 |
| R_putamen_Volume | -2.95E-01 | 0.93113 |
| R_putamen_ShapeIndex | -2.06E-01 | 0.87846 |
| L_caudate_ComputeArea | -6.77E+00 | 0.39352 |
| L_caudate_Volume | 3.38E+00 | 0.65557 |
| R_caudate_ComputeArea | 5.30E+00 | 0.53451 |
| R_caudate_Volume | -2.08E+00 | 0.78918 |
| chr12_rs34637584_GT | -4.51E+00 | 0.99272 |
| chr17_rs11868035_GT | -4.92E-01 | 0.76636 |
| chr17_rs11012_GT | 7.34E-01 | 0.82913 |
| chr17_rs393152_GT | 2.92E+00 | 0.39461 |
| chr17_rs12185268_GT | 3.02E+00 | 0.60026 |
| chr17_rs199533_GT | -2.90E+00 | 0.55589 |
| Sex | 1.09E+00 | 0.68982 |
| Weight | 1.88E+00 | 0.16203 |
| Age | 1.83E+00 | 0.16687 |
| UPDRS_part_I | 3.29E-01 | 0.84611 |
| FID_IID | -1.60E-03 | 0.62892 |
| COGSTATE | 9.42E+00 | 0.23984 |
| COGDECLN | -4.48E+00 | 0.27759 |
| FNCDTCOG | 8.71E+00 | 0.99989 |
| COGDXCL | -4.27E+00 | 0.39832 |
| EDUCYRS | 3.68E-02 | 0.97715 |
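The GLMM fit statistics reported above (AIC 193.8, BIC 380.9, logLik -64.9, Deviance 129.8, Df.resid 2530) can be checked against the standard definitions: deviance = -2·logLik, AIC = 2k + deviance, and BIC = k·ln(n) + deviance. The sketch below is a consistency check under two assumptions that the source does not state: the parameter count k = 32 is back-solved from the reported AIC, and the number of observations is taken as n = Df.resid + k.

```python
import math

log_lik = -64.9   # logLik reported for the GLMM
df_resid = 2530   # Df.resid reported for the GLMM
k = 32            # ASSUMPTION: parameter count implied by AIC = 2k - 2*logLik
n = df_resid + k  # ASSUMPTION: observations = residual df + parameters

deviance = -2 * log_lik
aic = 2 * k + deviance
bic = k * math.log(n) + deviance
```

Under these assumptions the deviance and AIC match the reported values, and the BIC agrees to within about 0.1, a gap that rounding of the reported logLik could account for.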
Ultimate generalized linear logistic regression model (using step-wise AIC selection) illustrates that some UPDRS summaries (Parts II and III) along with Age play roles in explaining the diagnosis of participants.
| | Estimate | Std.Dev. | Z | P(>\|Z\|) |
|---|---|---|---|---|
| | 21.013 | 1796.808 | 0.01 | 0.9907 |
| | 0.993 | 0.627 | 1.58 | 0.1134 |
| | -0.835 | 0.485 | -1.72 | 0.0848 |
| | -8.62 | 1796.809 | 0 | 0.9962 |
| | -13.982 | 1796.807 | -0.01 | 0.9938 |
| | -6.49 | 2.278 | -2.85 | 0.0044 |
Best machine learning based classification results (according to average measures of 5-fold cross-validation).
| Classifier | Cohorts | Balance | FP | TP | TN | FN | Accuracy | Sensitivity | Specificity | PPV | NPV | LOR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | PD vs. HC | balanced | 0.2 | 72.2 | 98.2 | 1.6 | 0.98954704 | 0.97831978 | 0.99796748 | 0.99723757 | 0.98396794 | 10.0058805 |
| | PD+SWEDD vs. HC | balanced | 1.2 | 72 | 97.2 | 1.8 | 0.9825784 | 0.97560976 | 0.98780488 | 0.98360656 | 0.98181818 | 8.08332861 |
| | PD vs. HC | unbalanced | 0.4 | 22.4 | 52.2 | 2.2 | 0.96632124 | 0.91056911 | 0.99239544 | 0.98245614 | 0.95955882 | 7.19197683 |
| | PD vs. HC | balanced | 2 | 69.4 | 96.4 | 4.4 | 0.96283391 | 0.9403794 | 0.9796748 | 0.9719888 | 0.95634921 | 6.63364135 |
| | PD+SWEDD vs. HC | unbalanced | 0.8 | 22.4 | 59.2 | 2.2 | 0.96453901 | 0.91056911 | 0.98666667 | 0.96551724 | 0.96416938 | 6.62466869 |
| | PD+SWEDD vs. HC | balanced | 2.8 | 67.2 | 95.6 | 6.6 | 0.94541231 | 0.91056911 | 0.97154472 | 0.96 | 0.93542074 | 5.851157 |
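The derived columns in the table above follow from the fold-averaged confusion counts (FP, TP, TN, FN) by the standard definitions, with LOR taken as logit(sensitivity) + logit(specificity). The sketch below shows this relationship; the function name `classification_metrics` is hypothetical, and the LOR formula is inferred from the reported numbers rather than stated in the source.

```python
import math


def classification_metrics(fp, tp, tn, fn):
    """Derive summary metrics from (possibly fold-averaged) confusion counts.
    Note: the log odds ratio is undefined when fp or fn is zero."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Log odds ratio: logit(sensitivity) + logit(specificity).
        "lor": math.log(sensitivity / (1 - sensitivity))
               + math.log(specificity / (1 - specificity)),
    }


# First row of the table above (PD vs. HC, balanced).
m = classification_metrics(fp=0.2, tp=72.2, tn=98.2, fn=1.6)
```

For this row the function returns values that agree with the table to the printed precision, for example an accuracy of about 0.9895 and a LOR of about 10.006.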
Example of the impact of including/excluding UPDRS data on the accuracy of the AdaBoost classification.
| Dataset | sensitivity | specificity | accuracy |
|---|---|---|---|
| | 0.871794872 | 0.25 | 0.8203125 |
| | 1.0 | 0.96875 | 0.990024938 |