Abstract
Parkinson's disease (PD) is the second most common neurodegenerative disorder after Alzheimer's disease, and it progresses slowly. PD patients have multiple motor and non-motor symptoms, among which vocal impairment is one of the main symptoms. The identification of PD based on vocal disorders is at the forefront of research. In this paper, an experimental study is performed on an open-source Kaggle PD speech dataset, and novel comparative techniques are employed to identify PD. We propose an unsupervised autoencoder feature-selection technique and pass the compressed features to supervised machine-learning (ML) algorithms. We also investigate a state-of-the-art deep learning one-dimensional convolutional neural network (1D-CNN) for PD classification. In this study, the proposed algorithms are support vector machine, logistic regression, random forest, naïve Bayes, and 1D-CNN. Classifier performance is evaluated in terms of accuracy score, precision, recall, and F1 score. On the benchmark dataset, the proposed 1D-CNN model achieves the highest F1 score of 0.927, and logistic regression achieves 0.922. The major contribution of the proposed approach is that unsupervised neural-network feature selection has not previously been investigated for Parkinson's detection. Clinicians can use these techniques to analyze the symptoms presented by patients and, based on the results of the above algorithms, diagnose the disease at an early stage, allowing for improved future treatment and care.
Keywords: ML; Parkinson’s disease; dimensionality reduction; linear discriminant analysis; logistic regression; neural network; principal component analysis; random forest; support vector machine
Year: 2022 PMID: 35892507 PMCID: PMC9330613 DOI: 10.3390/diagnostics12081796
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
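The pipeline described in the abstract (an unsupervised autoencoder compresses the features, then a supervised classifier runs on the codes) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the data are synthetic, the 8-unit bottleneck and the `encode` helper are assumptions, and scikit-learn's `MLPRegressor` trained to reconstruct its input stands in for the autoencoder.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(756, 50))               # stand-in for the 754 voice features
y = (X[:, :5].sum(axis=1) > 0).astype(int)   # synthetic binary target (0: control, 1: PD)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Unsupervised step: fit a network to reconstruct its own input through an
# 8-unit bottleneck; solver="lbfgs" mirrors the paper's optimizer choice.
ae = MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs", max_iter=200,
                  random_state=0).fit(X_train, X_train)

def encode(model, X):
    """Forward pass through the encoder (first) layer only; ReLU matches
    MLPRegressor's default hidden activation."""
    return np.maximum(X @ model.coefs_[0] + model.intercepts_[0], 0.0)

# Supervised step: classify on the compressed codes.
clf = LogisticRegression(max_iter=1000).fit(encode(ae, X_train), y_train)
acc = clf.score(encode(ae, X_test), y_test)
```

The same encoded features can be fed to any of the other classifiers in the study (SVM, random forest, naïve Bayes) by swapping the final estimator.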
Review of the literature.
| Study | Techniques Used | Remarks |
|---|---|---|
| Tsanas et al. [ | Relief and local learning-based feature selection (LLBFS), minimum redundancy maximum relevance (mRMR), and least absolute shrinkage and selection operator (LASSO). | Features such as HNR, shimmer, and vocal-fold excitation produce a 98.6% precision rate. |
| Rouzbahani et al. [ | Fisher’s discriminant ratio and correlation rates. | The highest accuracy rate, 94%, is observed with kNN. |
| Parisi et al. [ | Multi-layer perceptron (MLP) with a custom cost function and Lagrangian support vector machine (LSVM) are used for classification. | The proposed algorithm achieves a 100% accuracy rate. |
| Abdullah et al. [ | DNN classifier with softmax layer and stacked-auto-encoder (SAE). | Several different datasets were used to test the proposed model. |
| Timothy et al. [ | DNN classifier. | The study reports 85% accuracy for the DNN model. |
| Mathur et al. [ | kNN, AdaBoost. | The study reports 91.28% classification accuracy for kNN and AdaBoost. |
| Yasar et al. [ | Artificial neural networks (ANN). | The study reports that the proposed model achieves 94.93% accuracy in identifying diseased individuals. |
| Shu Lih et al. [ | A thirteen-layer CNN architecture was used on a dataset of EEG signals (20 normal subjects and 20 PD sufferers). | 88.25% accuracy is reported for the proposed CNN architecture. |
| Laura et al. [ | Logistic regression model on smell identification and Sniffin’ Sticks test. | Model shows 82.8% accuracy for smell identification and 85.3% accuracy for Sniffin’ Sticks test. |
| Daryl et al. [ | SVM and random forest algorithm. | SVM shows AUC of 92.3% and an accuracy of 85.3%. A random forest achieves 76.3% AUC with 75.6% accuracy. |
Dataset description.
| Detail | Source Information |
|---|---|
| Dataset source | UCI ML Repository |
| Dataset name | PD |
| Dataset attributes | 754 |
| Dataset records | 756 |
| Target variable | (0: control, 1: PD). Binary class problem |
| Task | Binary classification |
Figure 1. Flow chart of PD detection.
Figure 2. Structural diagram of the autoencoder model, adopted from [46].
Sparse autoencoder parameters.
| Parameters | Values |
|---|---|
| No. of epochs | 200 |
| Weight decay | 20⁻⁵ |
| Optimization technique | L-BFGS |
| Sparse penalty weight | 3 |
| Sparsity | 0.1 |
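These hyperparameters map onto the standard sparse-autoencoder objective: reconstruction error plus an L2 weight-decay term and a KL-divergence sparsity penalty with target activation ρ = 0.1 and penalty weight β = 3, minimized with L-BFGS. A minimal NumPy/SciPy sketch of that objective follows; the toy data, the 6-unit code size, and the weight-decay value `lam` are illustrative assumptions (the table's exponent appears garbled in extraction), and SciPy falls back to finite-difference gradients because no analytic gradient is supplied.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))        # toy data; the paper's inputs are voice features
n_in, n_hid = X.shape[1], 6           # 6-unit code is an illustrative choice

rho, beta, lam = 0.1, 3.0, 2e-5       # sparsity target, sparse penalty weight, weight decay

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta):
    """Split the flat parameter vector into encoder/decoder weights and biases."""
    i = 0
    W1 = theta[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b1 = theta[i:i + n_hid]; i += n_hid
    W2 = theta[i:i + n_hid * n_in].reshape(n_hid, n_in); i += n_hid * n_in
    b2 = theta[i:]
    return W1, b1, W2, b2

def cost(theta):
    W1, b1, W2, b2 = unpack(theta)
    H = sigmoid(X @ W1 + b1)                    # hidden activations
    Xhat = H @ W2 + b2                          # linear reconstruction
    mse = 0.5 * np.mean(np.sum((Xhat - X) ** 2, axis=1))
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    # KL(rho || rho_hat) pushes average hidden activation toward the target rho.
    rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return mse + decay + beta * kl

theta0 = rng.normal(scale=0.1, size=2 * n_in * n_hid + n_hid + n_in)
# maxiter mirrors the table's 200 epochs.
res = minimize(cost, theta0, method="L-BFGS-B", options={"maxiter": 200})
```

After fitting, the encoder half (`W1`, `b1`) produces the compressed features passed on to the classifiers.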
Figure 3. 1D-CNN architecture.
Classifiers with and without dimensionality reduction.
| Classifiers | Accuracy with Autoencoder | Accuracy without Dimensionality Reduction |
|---|---|---|
| Logistic regression | 0.875 | 0.865 |
| Support vector machine | 0.863 | 0.842 |
| Random forest | 0.828 | 0.814 |
| Naïve Bayes | 0.836 | 0.711 |
Classifiers with dimensionality reduction (autoencoder) versus feature selection (RFE).
| Classifiers | Accuracy with Autoencoder | Accuracy with RFE |
|---|---|---|
| Logistic regression | 0.875 | 0.840 |
| Support vector machine | 0.863 | 0.823 |
| Random forest | 0.828 | 0.822 |
| Naïve Bayes | 0.836 | 0.743 |
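The RFE column refers to recursive feature elimination, which repeatedly fits an estimator and drops the weakest-ranked features until the requested number remains. A minimal scikit-learn sketch on synthetic data (the feature counts and the logistic-regression ranking estimator are illustrative choices, not the paper's configuration):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the speech features: 40 inputs, 10 informative.
X, y = make_classification(n_samples=756, n_features=40, n_informative=10,
                           random_state=0)

# RFE refits the estimator at each step and eliminates the features with the
# smallest coefficient magnitudes until n_features_to_select survive.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
selector.fit(X, y)

X_reduced = selector.transform(X)      # keeps only the surviving columns
```

Unlike the autoencoder, RFE selects a subset of the original columns rather than constructing new compressed features, which keeps the retained features directly interpretable.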
Overall result analysis of the proposed classifiers.
| Classifiers | Accuracy Score | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Logistic regression | 0.875 | 0.918 | 0.926 | 0.922 |
| Support vector machine | 0.863 | 0.867 | 0.971 | 0.916 |
| Random forest | 0.828 | 0.823 | 0.989 | 0.898 |
| Naïve Bayes | 0.836 | 0.893 | 0.901 | 0.897 |
| 1D-CNN | 0.885 | 0.907 | 0.948 | 0.927 |
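As a sanity check, the F1 scores in the table are consistent with their precision and recall columns via the harmonic-mean formula F1 = 2·P·R / (P + R):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) from the table above
reported = {
    "Logistic regression":    (0.918, 0.926, 0.922),
    "Support vector machine": (0.867, 0.971, 0.916),
    "Random forest":          (0.823, 0.989, 0.898),
    "Naïve Bayes":            (0.893, 0.901, 0.897),
    "1D-CNN":                 (0.907, 0.948, 0.927),
}
for name, (p, r, f) in reported.items():
    # Each reported F1 matches the formula to within rounding.
    assert abs(f1(p, r) - f) < 0.002, name
```

Note how the high recall of random forest (0.989) is offset by its lower precision, so its F1 trails the 1D-CNN despite catching nearly every PD case.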
Classifier accuracy and F1 score analysis with various dimensionality reduction techniques.
| ML Model | Accuracy (PCA) | Accuracy (LDA) | Accuracy (Autoencoder) | F1 (PCA) | F1 (LDA) | F1 (Autoencoder) |
|---|---|---|---|---|---|---|
| Logistic regression | 0.737 | 0.618 | 0.875 | 0.829 | 0.707 | 0.922 |
| Support vector machine | 0.842 | 0.625 | 0.863 | 0.910 | 0.716 | 0.916 |
| Random forest | 0.816 | 0.618 | 0.828 | 0.885 | 0.704 | 0.898 |
| Naïve Bayes | 0.770 | 0.625 | 0.836 | 0.856 | 0.714 | 0.897 |
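One structural reason LDA trails PCA and the autoencoder in the table is that LDA can project onto at most n_classes − 1 directions, i.e., a single component for this binary problem, whereas PCA and the autoencoder can retain as many dimensions as desired. A minimal sketch (synthetic data; the 8-component PCA setting is an illustrative choice):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=756, n_features=40, random_state=0)

# PCA: unsupervised; keep as many components as desired.
X_pca = PCA(n_components=8).fit_transform(X)

# LDA: supervised, but capped at n_classes - 1 = 1 component for binary labels.
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)
```

Compressing 754 voice features into a single LDA axis discards far more information than an 8-dimensional PCA or autoencoder code, which is consistent with the accuracy gap in the table.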
Figure 4. Proposed model with autoencoder ROC curve.
Figure 5. Logistic regression.
Figure 6. Support vector machine.
Figure 7. Random forest.
Figure 8. Naïve Bayes.
Figure 9. 1D-CNN.