Ilias Tougui, Abdelilah Jilbab, Jamal El Mhamdi.
Abstract
OBJECTIVES: This study presents PD Predict, a machine learning system for Parkinson disease classification using voice as a biomarker.
Keywords: Diagnosis, Computer-Assisted; Machine Learning; Medical Informatics Applications; Parkinson Disease; Voice Disorders
Year: 2022 PMID: 35982595 PMCID: PMC9388925 DOI: 10.4258/hir.2022.28.3.210
Source DB: PubMed Journal: Healthc Inform Res ISSN: 2093-3681
Final distribution of valid subjects in this study
| | PD group | HC group | Total |
|---|---|---|---|
| Number of recordings | 424 | 424 | 848 |
| Number of participants | 212 | 212 | 424 |
| Sex | | | |
| Male | 161 | 161 | 322 |
| Female | 51 | 51 | 102 |
| Age (yr) | 58.97 ± 8.95 (40–79) | 58.97 ± 8.95 (40–79) | |

Values are presented as mean ± standard deviation (min–max).
PD: Parkinson disease, HC: healthy controls.
Extracted features and structure of the dataset
| Feature Id | Feature/Component | Feature statistics |
|---|---|---|
| 1–78 | MFCCs | Mean |
| 79–84 | F0 Contour | |
| 85–86 | F0 | |
| 87–92 | Intensity | |
| 93 | Log energy | |
| 94–99 | Sliding-window Log energy | |
| 100 | Loudness | |
| 101 | Pitch period entropy | |
| 102–106 | Jitters | |
| 107–111 | Shimmers | |
| 112 | Detrended fluctuation analysis | |
| 113–116 | Formants | |
| 117 | HNR | |
| 118–123 | RMS | |
| 124 | Class (PWP = 1, HC = 0) | |
MFCC: mel-frequency cepstral coefficient, HNR: harmonic to noise ratio, RMS: root mean square, PWP: participants with Parkinson disease, HC: healthy controls.
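The feature layout in the table above can be written down as a small lookup, which makes it easy to check that the ID ranges are contiguous and add up to 123 features plus the class column. This is only an illustrative sketch: the names `FEATURE_GROUPS`, `feature_group`, and `layout_is_contiguous` are hypothetical and do not come from the study.

```python
# Feature ID ranges copied from the table (inclusive, 1-based).
FEATURE_GROUPS = [
    ("MFCCs", 1, 78),
    ("F0 Contour", 79, 84),
    ("F0", 85, 86),
    ("Intensity", 87, 92),
    ("Log energy", 93, 93),
    ("Sliding-window Log energy", 94, 99),
    ("Loudness", 100, 100),
    ("Pitch period entropy", 101, 101),
    ("Jitters", 102, 106),
    ("Shimmers", 107, 111),
    ("Detrended fluctuation analysis", 112, 112),
    ("Formants", 113, 116),
    ("HNR", 117, 117),
    ("RMS", 118, 123),
]
CLASS_COLUMN_ID = 124  # PWP = 1, HC = 0


def feature_group(feature_id):
    """Return the feature/component name for a 1-based feature ID."""
    for name, first, last in FEATURE_GROUPS:
        if first <= feature_id <= last:
            return name
    raise ValueError(f"feature ID {feature_id} is outside 1-123")


def layout_is_contiguous(groups):
    """Check that ranges start at 1 and follow each other without gaps."""
    expected_start = 1
    for _, first, last in groups:
        if first != expected_start or last < first:
            return False
        expected_start = last + 1
    return True


if __name__ == "__main__":
    total = sum(last - first + 1 for _, first, last in FEATURE_GROUPS)
    print(total, layout_is_contiguous(FEATURE_GROUPS), feature_group(101))
```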
Distribution of participants in the training and holdout sets
| | Training set (80%): PD group | Training set (80%): HC group | Holdout set (20%): PD group | Holdout set (20%): HC group | Total |
|---|---|---|---|---|---|
| Number of recordings | 340 | 340 | 84 | 84 | 848 |
| Number of participants | 170 | 170 | 42 | 42 | 424 |
| Sex | | | | | |
| Male | 126 | 131 | 35 | 30 | 322 |
| Female | 44 | 39 | 7 | 12 | 102 |
| Age (yr) | 59.13 ± 9.30 (40–79) | 58.81 ± 8.97 (40–79) | 58.31 ± 7.42 (43–75) | 59.62 ± 8.90 (43–76) | |
Values are presented as mean ± standard deviation (min–max).
PD: Parkinson disease, HC: healthy controls.
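The counts in the table (two recordings per participant, 170 and 42 participants per group) are consistent with a split made at the participant level and stratified by group, so that both recordings of a person land in the same set. The sketch below illustrates that idea under this assumption. The function name, the seed, and the synthetic participant IDs are hypothetical, and the record does not state how the split was actually drawn.

```python
import random


def split_participants(participants, holdout_fraction=0.2, seed=0):
    """Stratified participant-level split.

    participants: iterable of (participant_id, label) pairs.
    Returns (train_ids, holdout_ids) as two disjoint sets.
    """
    rng = random.Random(seed)
    by_label = {}
    for pid, label in participants:
        by_label.setdefault(label, []).append(pid)

    train_ids, holdout_ids = set(), set()
    for label in sorted(by_label):
        ids = sorted(by_label[label])
        rng.shuffle(ids)
        n_holdout = round(len(ids) * holdout_fraction)
        holdout_ids.update(ids[:n_holdout])
        train_ids.update(ids[n_holdout:])
    return train_ids, holdout_ids


# Synthetic stand-in for the cohort: 212 PD and 212 HC participants,
# each contributing two recordings.
participants = [(f"PD{i:03d}", 1) for i in range(212)]
participants += [(f"HC{i:03d}", 0) for i in range(212)]
recordings = [(pid, rec) for pid, _ in participants for rec in (1, 2)]

train_ids, holdout_ids = split_participants(participants)
train_recordings = [r for r in recordings if r[0] in train_ids]
holdout_recordings = [r for r in recordings if r[0] in holdout_ids]

if __name__ == "__main__":
    print(len(train_ids), len(holdout_ids))
    print(len(train_recordings), len(holdout_recordings))
```

With these inputs the split sizes match the table: 340 training and 84 holdout participants, and 680 training and 168 holdout recordings.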
Hyperparameters of the two pipelines
| Pipeline | Stage | Hyperparameter | Value |
|---|---|---|---|
| gbcpl | Imputation | strategy | mean |
| | Standardization | with_mean | true |
| | | with_std | true |
| | Feature selector | estimator | lasso |
| | | alpha | 0.001 |
| | | tol | 0.1 |
| | | max_features | 60 |
| | GBC classifier | n_estimators | 600 |
| | | min_samples_split | 0.8 |
| | | min_samples_leaf | 0.5 |
| | | max_features | 52 |
| | | max_depth | 8.0 |
| | | learning_rate | 0.1 |
| gbcpen | Imputation | strategy | mean |
| | Standardization | with_mean | true |
| | | with_std | true |
| | Feature selector | estimator | elasticnet |
| | | tol | 3.0004 |
| | | max_iter | 1000000 |
| | | l1_ratio | 0.02 |
| | | alpha | 0.52 |
| | | max_features | 60 |
| | GBC classifier | n_estimators | 200 |
| | | min_samples_split | 0.70001 |
| | | min_samples_leaf | 0.30004 |
| | | max_features | 26 |
| | | max_depth | 11.0 |
| | | learning_rate | 0.04 |
GBC: gradient boosting classifier.
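The hyperparameter names in the table match scikit-learn conventions, so the gbcpl pipeline could plausibly be assembled as below. This is a hedged sketch, not the authors' code. The mapping of stages to `SimpleImputer`, `StandardScaler`, `SelectFromModel`, and `GradientBoostingClassifier` is an assumption, as are the step names, `random_state`, and `threshold=-np.inf` (used here so that the selector keeps exactly the top 60 features). The table lists `max_depth` as 8.0, which is passed as the integer 8 because scikit-learn expects an integer there.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_gbcpl(random_state=0):
    """Assemble a gbcpl-like pipeline from the tabulated hyperparameters."""
    selector = SelectFromModel(
        estimator=Lasso(alpha=0.001, tol=0.1),
        max_features=60,
        threshold=-np.inf,  # assumption: select purely by rank, top 60
    )
    classifier = GradientBoostingClassifier(
        n_estimators=600,
        min_samples_split=0.8,  # float: fraction of the training samples
        min_samples_leaf=0.5,   # float: fraction of the training samples
        max_features=52,
        max_depth=8,            # table value 8.0, cast to int
        learning_rate=0.1,
        random_state=random_state,  # assumption, not given in the table
    )
    return Pipeline(
        [
            ("imputer", SimpleImputer(strategy="mean")),
            ("scaler", StandardScaler(with_mean=True, with_std=True)),
            ("selector", selector),
            ("classifier", classifier),
        ]
    )


if __name__ == "__main__":
    pipeline = build_gbcpl()
    print([name for name, _ in pipeline.steps])
```

A gbcpen-like variant would follow the same shape with `sklearn.linear_model.ElasticNet(alpha=0.52, l1_ratio=0.02, tol=3.0004, max_iter=1000000)` as the selector's estimator and the second set of classifier values. Fitting would then be `build_gbcpl().fit(X, y)` with `X` holding the 123 feature columns per recording.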
Figure 1. Schematic illustration of the nested cross-validation approach.
Figure 2. Various screens in the client-side desktop application.
Figure 3. Architecture of PD Predict: (A) client-side application and (B) server-side web application.
Figure 4. (A) Cross-validation performance of gbcpl using different subsets of features and (B) the final 60 chosen features.
Figure 5. (A) Cross-validation performance of gbcpen using different subsets of features and (B) the final 60 chosen features.
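For readers unfamiliar with the nested cross-validation named in Figure 1, the splitting structure can be sketched as follows: an outer loop holds out one fold for evaluation, and an inner loop re-splits only the remaining data for model selection. The fold counts (`outer_k=5`, `inner_k=3`), the seed, and the function names are hypothetical, since this record does not give the configuration used in the study.

```python
import random


def k_folds(items, k, seed):
    """Shuffle items and deal them into k nearly equal folds."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(items, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_train, outer_test, inner_splits) triples.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only
    from outer_train, so the outer test fold never affects tuning.
    """
    outer_folds = k_folds(items, outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [
            x for j, fold in enumerate(outer_folds) if j != i for x in fold
        ]
        inner_folds = k_folds(outer_train, inner_k, seed + 1 + i)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [
                x for n, fold in enumerate(inner_folds) if n != m for x in fold
            ]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


if __name__ == "__main__":
    # 340 stands in for the training participants; IDs are synthetic.
    for outer_train, outer_test, inner_splits in nested_cv_splits(range(340)):
        print(len(outer_train), len(outer_test), len(inner_splits))
```

Splitting over participant IDs rather than individual recordings keeps both recordings of a person in the same fold, which is a design choice assumed here rather than one stated in the source.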
Summary of the performance of the pipelines
| ML pipeline | Accuracy (%) | Recall (%) | F1-score (%) |
|---|---|---|---|
| gbcpl | | | |
| Nested cross-validation | 65.59 (0.0675) | 66.18 (0.1118) | 65.49 (0.076) |
| Training set | 71.76 ± 3.38 | 72.65 ± 3.35 | 72.01 ± 3.37 |
| Holdout set | 71.43 ± 6.83 | 72.62 ± 6.74 | 71.76 ± 6.81 |
| gbcpen | | | |
| Nested cross-validation | 65.00 (0.0587) | 65.00 (0.0837) | 64.88 (0.0611) |
| Training set | 72.65 ± 3.35 | 70.29 ± 3.43 | 71.99 ± 3.38 |
| Holdout set | 67.86 ± 7.06 | 67.86 ± 7.06 | 67.86 ± 7.06 |
Nested cross-validation results are presented as mean (standard deviation); training and holdout set performances are reported with 95% confidence intervals.
ML: machine learning.
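The reported 95% confidence intervals appear consistent with the normal-approximation (Wald) interval for a proportion, with the sample size taken as the number of recordings in each set (680 for training, 168 for holdout). This is inferred from the numbers alone, since the record does not name the method. The sketch below recomputes three of the half-widths under that assumption; `wald_half_width` is a hypothetical helper name.

```python
import math


def wald_half_width(proportion, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion."""
    return z * math.sqrt(proportion * (1.0 - proportion) / n)


# (reported metric in %, number of recordings), taken from the tables.
cases = {
    "gbcpl training accuracy": (71.76, 680),
    "gbcpl holdout accuracy": (71.43, 168),
    "gbcpen holdout accuracy": (67.86, 168),
}

half_widths = {
    name: 100.0 * wald_half_width(percent / 100.0, n)
    for name, (percent, n) in cases.items()
}

if __name__ == "__main__":
    for name, (percent, _) in cases.items():
        print(f"{name}: {percent:.2f} ± {half_widths[name]:.2f}")
```

Rounded to two decimals, these come out as 3.38, 6.83, and 7.06, matching the corresponding table entries.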