Rana Zia Ur Rehman, Silvia Del Din, Yu Guan, Alison J Yarnall, Jian Qing Shi, Lynn Rochester.
Abstract
Parkinson's disease (PD) is the second most common neurodegenerative disease; gait impairments are typical and are associated with increased fall risk and poor quality of life. Gait is potentially a useful biomarker to help discriminate PD at an early stage; however, the optimal gait characteristics and their combination are unclear. In this study, we used machine learning (ML) techniques to determine the optimal combination of gait characteristics to discriminate people with PD from healthy controls (HC). In total, 303 participants (119 PD, 184 HC) walked continuously around a circuit for 2 minutes at a self-selected pace. Gait was quantified using an instrumented mat (GAITRite), from which 16 gait characteristics were derived and assessed. Gait characteristics were selected using different ML approaches to determine the optimal method: random forest with information gain, and recursive feature elimination (RFE) with support vector machine (SVM) and with logistic regression. Five clinical gait characteristics that accurately classified PD were identified with RFE-SVM: mean step velocity, mean step length, step length variability, mean step width, and step width variability. Model accuracy for classification of early PD ranged between 73-97%, with 63-100% sensitivity and 79-94% specificity. In conclusion, we identified a subset of gait characteristics for accurate early classification of PD. These findings pave the way for a better understanding of the utility of ML techniques to support informed clinical decision-making.
Year: 2019 PMID: 31754175 PMCID: PMC6872822 DOI: 10.1038/s41598-019-53656-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Layout of gait assessment in lab.
Figure 2. Framework for machine learning modelling for PD classification.
Demographic and clinical characteristics; M: Male; F: Female; BMI: Body mass index; MMSE: Mini-mental state examination; ABC: Activities specific balance confidence scale; UPDRS: Unified Parkinson’s disease rating scale; PIGD: Postural instability and gait disorder phenotype; ID: Indeterminate phenotype; TD: Tremor dominant phenotype; t(df): t-value at degree of freedom; p showing the statistical difference between PD and HC. In bold significant p values (p < 0.05).
| Characteristics | HC (n = 184) | PD (n = 119) | t(df) | p |
|---|---|---|---|---|
| M/F (n) | 78/106 | 79/40 | — | |
| Age (year) | 69.974 ± 7.711 | 66.898 ± 10.488 | t(199.23) = 2.75 | |
| Height (m) | 1.675 ± 0.097 | 1.696 ± 0.083 | t(278.05) = −2.02 | |
| Weight (kg) | 76.544 ± 14.691 | 78.678 ± 15.115 | t(301) = −1.22 | 0.223 |
| BMI (kg/m²) | 27.169 ± 3.913 | 27.233 ± 4.396 | t(300) = −0.13 | 0.896 |
| MMSE (0–30) | 29.29 ± 1.019 | 28.66 ± 1.304 | t(301) = 4.69 | |
| ABC (0–100%) | 91.816 ± 10.902 | 82.597 ± 18.985 | t(301) = 5.36 | |
| Levodopa equivalent daily dose (LEDD, mg/day) | — | 175.893 ± 143.724 | — | — |
| Freezing of gait score (FOG) | — | 0.681 ± 2.718 | — | — |
| Hoehn and Yahr - Median | — | 2 | — | — |
| Hoehn and Yahr (n) - HY I | — | 28 | — | — |
| HY II | — | 70 | — | — |
| HY III | — | 21 | — | — |
| Time from Clinical Diagnosis (months) | — | 6.23 ± 4.89 | — | — |
| MDS-UPDRS III – Item 3.10 | — | 0.571 ± 0.671 | — | — |
| MDS-UPDRS III – overall | — | 25.37 ± 10.399 | — | — |
| MDS-UPDRS III for HY I | — | 16.82 ± 5.604 | — | — |
| MDS-UPDRS III for HY II | — | 27.49 ± 10.61 | — | — |
| MDS-UPDRS III for HY III | — | 29.71 ± 8.307 | — | — |
| Motor Phenotype (n) - PIGD | — | 55 | — | — |
| ID | — | 11 | — | — |
| TD | — | 33 | — | — |
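The fractional degrees of freedom in the t(df) column (for example t(199.23) for age) are what Welch's unequal-variance t-test produces. The source does not name the test, so treating it as Welch's test is an assumption. Under that assumption, a minimal sketch that recomputes the age row from the tabulated means, SDs and group sizes:

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from summary statistics only."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Age row of the table: HC (n = 184) vs PD (n = 119).
t_age, df_age = welch_t(69.974, 7.711, 184, 66.898, 10.488, 119)
print(f"t({df_age:.2f}) = {t_age:.2f}")
```

The result agrees with the tabulated t(199.23) = 2.75 to rounding, which supports (but does not prove) the Welch assumption for rows with non-integer df.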
Significant difference between PD and HC; AUC: Area under the curve; p showing the statistical difference between PD and HC. In bold significant p values (p < 0.05).
| Gait Model Domain | Gait Characteristics | HC (n = 184) | PD (n = 119) | p | AUC |
|---|---|---|---|---|---|
| Pace | Step Velocity (m/s) | 1.264 ± 0.192 | 1.125 ± 0.213 | | 0.695 |
| Pace | Step Length (m) | 0.672 ± 0.083 | 0.623 ± 0.101 | | 0.655 |
| Pace | Swing Time Variability (s) | 0.015 ± 0.005 | 0.018 ± 0.006 | | 0.636 |
| Rhythm | Step Time (s) | 0.537 ± 0.047 | 0.560 ± 0.049 | | 0.628 |
| Rhythm | Swing Time (s) | 0.387 ± 0.030 | 0.392 ± 0.033 | 0.170 | 0.541 |
| Rhythm | Stance Time (s) | 0.688 ± 0.072 | 0.728 ± 0.077 | | 0.646 |
| Variability | Step Velocity Variability (m/s) | 0.053 ± 0.013 | 0.054 ± 0.017 | 0.576 | 0.510 |
| Variability | Step Length Variability (m) | 0.020 ± 0.006 | 0.023 ± 0.008 | | 0.612 |
| Variability | Step Time Variability (s) | 0.016 ± 0.006 | 0.019 ± 0.006 | | 0.624 |
| Variability | Stance Time Variability (s) | 0.019 ± 0.008 | 0.023 ± 0.009 | | 0.611 |
| Asymmetry | Step Time Asymmetry (s) | 0.011 ± 0.010 | 0.023 ± 0.028 | | 0.654 |
| Asymmetry | Swing Time Asymmetry (s) | 0.009 ± 0.009 | 0.017 ± 0.020 | | 0.675 |
| Asymmetry | Stance Time Asymmetry (s) | 0.008 ± 0.009 | 0.017 ± 0.019 | | 0.675 |
| Postural Control | Step Width (m) | 0.089 ± 0.025 | 0.093 ± 0.031 | 0.348 | 0.527 |
| Postural Control | Step Width Variability (m) | 0.022 ± 0.005 | 0.019 ± 0.006 | | 0.682 |
| Postural Control | Step Length Asymmetry (m) | 0.020 ± 0.017 | 0.026 ± 0.022 | | 0.568 |
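To see how a single-characteristic AUC relates to the group means and SDs in the table, one can use the binormal approximation AUC ≈ Φ(|μ₁ − μ₂| / √(σ₁² + σ₂²)). This is only a sketch: it assumes both groups are normally distributed, which the source does not state, and the reported AUCs were presumably computed from the raw data rather than from this formula.

```python
from math import sqrt
from statistics import NormalDist

def binormal_auc(mean_a, sd_a, mean_b, sd_b):
    """AUC of a single feature under the assumption that both groups are
    normally distributed (binormal model)."""
    d = abs(mean_a - mean_b) / sqrt(sd_a ** 2 + sd_b ** 2)
    return NormalDist().cdf(d)

# Step Velocity row: HC 1.264 ± 0.192 m/s, PD 1.125 ± 0.213 m/s.
auc_velocity = binormal_auc(1.264, 0.192, 1.125, 0.213)
print(round(auc_velocity, 2))
```

For step velocity the approximation comes out at roughly 0.69, close to the 0.695 reported in that row.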
Figure 3. Heat map showing the correlation among the 16 gait characteristics.
Spot checking results of the models; RBF: Radial basis function.
| Model | Accuracy % Mean ± SD |
|---|---|
| Logistic Regression (LR) | 79.9 ± 4.6 |
| Linear Discriminant Analysis (LDA) | 79.6 ± 7.9 |
| K-Nearest Neighbour (KNN) | 76.1 ± 5.9 |
| Classification and Regression Tree (CART) | 73.4 ± 4.1 |
| Naïve Bayes (NB) | 78.3 ± 7.4 |
| Support Vector Machine (SVM-RBF) | 83.9 ± 7.1 |
| Random Forest (RF) | 86.0 ± 8.4 |
| Bagged Decision Tree (BDT) | 80.1 ± 8.7 |
| Extra Tree Classifier (ETC) | 83.4 ± 7.1 |
| Adaboost Classifier (AC) | 82.1 ± 7.8 |
| Gradient Boosting Classifier (GBC) | 84.9 ± 7.6 |
| Voting Classifier (LDA, Naïve Bayes, SVM) | 78.7 ± 6.4 |
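Accuracies reported as mean ± SD are the kind of summary that k-fold cross-validation produces. The source does not state the resampling scheme, so the stratified 10-fold split below is an assumption. The nearest-centroid classifier is a deliberately simple stand-in for the listed models, and the data are synthetic, not the study data.

```python
import random
from statistics import mean, stdev

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices round-robin
    return folds

def fit_centroids(X, y):
    """Per-class mean vector: the whole 'model' of a nearest-centroid classifier."""
    return {cls: [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == cls])]
            for cls in set(y)}

def predict(centroids, x):
    """Return the class whose centroid is closest to x (squared Euclidean)."""
    return min(centroids,
               key=lambda cls: sum((a - b) ** 2 for a, b in zip(x, centroids[cls])))

def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean and SD of held-out accuracy across k stratified folds."""
    scores = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return mean(scores), stdev(scores)

# Synthetic two-feature data (hypothetical, for illustration only).
rng = random.Random(42)
X = ([[rng.gauss(1.26, 0.19), rng.gauss(0.67, 0.08)] for _ in range(60)]
     + [[rng.gauss(1.12, 0.21), rng.gauss(0.62, 0.10)] for _ in range(40)])
y = [0] * 60 + [1] * 40

acc_mean, acc_sd = cross_val_accuracy(X, y, k=10)
print(f"{100 * acc_mean:.1f} ± {100 * acc_sd:.1f}")
```

Running the same loop once per candidate model yields one row per model in the format of the table above.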
Checking model performance on training and testing data; SE: Sensitivity, SP: Specificity; RBF: Radial basis function.
| Refined Models | Training Accuracy % (Mean ± SD) | Testing Accuracy % (SE, SP) |
|---|---|---|
| Random Forest (RF) | 87.94 ± 6.88 | 87.14 (94, 79) |
| Gradient Boosting Classifier (GBC) | 85.34 ± 7.38 | 84.28 (89, 79) |
| Support Vector Machine (SVM-RBF) | 83.37 ± 7.35 | 81.42 (78, 85) |
| Linear Discriminant Analysis (LDA) | 75.89 ± 9.56 | 82.85 (63, 85) |
| Logistic Regression (LR) | 79.65 ± 11.22 | 82.85 (68, 79) |
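Each testing entry has the form accuracy (SE, SP). As a reminder of how the three numbers relate, the sketch below derives them from a confusion matrix. The counts are hypothetical and chosen only for illustration; the source does not report the confusion matrices or the test-set composition.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts,
    with PD as the positive class."""
    sensitivity = tp / (tp + fn)  # share of PD cases correctly identified
    specificity = tn / (tn + fp)  # share of controls correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity

# Hypothetical counts, NOT taken from the study.
acc, se, sp = classification_metrics(tp=18, fn=2, tn=40, fp=10)
print(f"{100 * acc:.2f} ({100 * se:.0f}, {100 * sp:.0f})")
```

Accuracy is a count-weighted average of sensitivity and specificity, so it always lies between them and is pulled towards whichever class is larger in the test set.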
Figure 4. Selection of optimal number of gait characteristics with (a) support vector machine, (b) logistic regression, (c) random forest.
Figure 5. Feature selection with (a) support vector machine, (b) logistic regression, (c) random forest.
Optimal classification accuracy on testing and training data; GC: Gait characteristics; SE: Sensitivity; SP: Specificity; RFE: Recursive feature elimination; RF: Random forest; SVM-RBF: Support vector machine with radial basis function kernel; LR: Logistic regression. Test accuracies are given as accuracy (SE, SP).
| Models | Top 10 GC: Test Accuracy % | Common GC: Test Accuracy % | Top 5 GC: Test Accuracy % | SVM top 5 GC: Test Accuracy % | Training F1 Score (%) in RFE with the optimal number of GC |
|---|---|---|---|---|---|
| RF | 94.28 (100, 89) | 91.42 (94, 89) | 94.28 (100, 89) | 97.14 (100, 94) | 96.4 |
| SVM-RBF | 81.92 (71, 89) | 83.67 (72, 94) | 85.71 (79, 94) | 85.71 (79, 94) | 84.5 |
| LR | 82.85 (74, 94) | 82.54 (79, 90) | 84.28 (76, 92) | 84.99 (76, 94) | 87.5 |
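Recursive feature elimination (RFE), named in the abstract and in the table above, repeatedly fits a model, drops the feature with the smallest absolute weight, and refits on the remainder. The sketch below is a dependency-free illustration and not the study's pipeline. It uses a small gradient-descent logistic regression rather than an SVM, and the feature names and toy data are hypothetical.

```python
import math

def standardize(X):
    """Z-score each column so that weight magnitudes are comparable."""
    cols = list(zip(*X))
    mus = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, mus)]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logistic-regression weights and bias via batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - target  # predicted prob minus label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w, b

def rfe(X, y, names, n_keep):
    """Drop the smallest-|weight| feature and refit until n_keep remain."""
    Xs = standardize(X)
    kept = list(range(len(names)))
    while len(kept) > n_keep:
        w, _ = fit_logistic([[row[j] for j in kept] for row in Xs], y)
        weakest = min(range(len(kept)), key=lambda k: abs(w[k]))
        del kept[weakest]
    return [names[j] for j in kept]

# Hypothetical toy data: two features track the label, two do not.
y = [0] * 20 + [1] * 20
X = [[2.0 * label + 0.1 * ((i % 5) - 2),   # informative
      -1.5 * label + 0.1 * ((i % 3) - 1),  # informative
      float(i % 2),                        # unrelated to the label
      float(i % 4)]                        # unrelated to the label
     for i, label in enumerate(y)]
names = ["informative_a", "informative_b", "noise_c", "noise_d"]

selected = rfe(X, y, names, n_keep=2)
print(selected)
```

In practice the number of features to keep is itself tuned, for example by scoring each subset size with cross-validation, which is what the "optimal number of GC" column refers to.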