| Literature DB >> 33584015 |
Yongming Li, Xinyue Zhang, Pin Wang, Xiaoheng Zhang, Yuchuan Liu.
Abstract
Speech-based diagnosis of Parkinson's disease (PD) is particularly worth exploring as a non-invasive and simple diagnostic method. However, the number of PD speech samples is relatively small, and the data distribution differs between subjects. To address these two problems, a novel unsupervised two-step sparse transfer learning method is proposed in this paper to tackle PD speech diagnosis. In the first step, convolution sparse coding with coordinate selection of samples and features is designed to learn speech structure from the source domain and replenish the sample information of the target domain. In the second step, joint local structure distribution alignment is designed to maintain the neighbor relationship between the respective samples of the training and test sets while reducing the distribution difference between the two domains. Two representative public PD speech datasets and one real-world PD speech dataset were used to verify the proposed method on PD speech diagnosis. Experimental results demonstrate that each step of the proposed method has a positive effect on the PD speech classification results, and that the method delivers superior performance over existing related methods.
Keywords: Convolution sparse coding; Domain adaptation; Parkinson’s disease; Speech diagnosis; Two-step sparse transfer learning
Year: 2021 PMID: 33584015 PMCID: PMC7871026 DOI: 10.1007/s00521-021-05741-0
Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.606
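The first step of the proposed method builds on convolution sparse coding. As a minimal, hedged sketch of the underlying sparse-coding idea (plain, non-convolutional ISTA with a fixed random dictionary, not the authors' implementation), the following recovers a sparse code for a synthetic signal:

```python
import numpy as np

def soft_threshold(x, t):
    """Element-wise soft thresholding, the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(D, y, lam=0.05, n_iter=500):
    """ISTA: minimize 0.5 * ||y - D a||^2 + lam * ||a||_1 over the code a."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)         # gradient of the quadratic term
        a = soft_threshold(a - grad / L, lam / L)
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)           # unit-norm dictionary atoms
a_true = np.zeros(128)
a_true[[3, 40, 99]] = [1.5, -2.0, 1.0]   # 3-sparse ground-truth code
y = D @ a_true                           # synthetic "speech feature" signal
a_hat = sparse_code(D, y)
```

After a few hundred iterations, `a_hat` is sparse and reconstructs `y` closely; the paper's step one applies this kind of representation convolutionally, with coordinate selection of samples and features, to transfer structure from the source domain.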
Fig. 1 Illustration of the proposed JLSDA method. a The data distribution of the original source domain (training set) and target domain (testing set); b the relationship between the samples and the domains after aligning only the source and target domain distributions; c the data distribution after aligning the source and target domain distributions while keeping the neighborhood structure relationship
Fig. 2 Confusion matrix for two-class diagnosis of PD
Fig. 3 a Sonograms of the source domain; b feature kernels extracted from the source domain; c original target domain; d target domain after the first-step transfer
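Fig. 2's two-class confusion matrix maps directly onto the three metrics reported in the tables below. A minimal sketch (the counts here are hypothetical, chosen only to reproduce the 92.5/95.0/90.0 pattern reported for FT&SVM (linear)):

```python
import numpy as np

# Hypothetical 2x2 confusion matrix: rows = true class, columns = predicted
#                 pred PD   pred healthy
cm = np.array([[19,        1],    # true PD
               [ 2,       18]])   # true healthy

tp, fn = cm[0, 0], cm[0, 1]
fp, tn = cm[1, 0], cm[1, 1]

acc = (tp + tn) / cm.sum()   # ACC: overall accuracy
tpr = tp / (tp + fn)         # TPR: sensitivity, PD subjects correctly flagged
tnr = tn / (tn + fp)         # TNR: specificity, healthy subjects correctly cleared
print(acc, tpr, tnr)         # 0.925 0.95 0.9
```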
First-step transfer classification accuracy for the Sakar dataset (LOSO)
| Method | ACC (%) | TPR (%) | TNR (%) |
|---|---|---|---|
| KNN | 52.5 | 55.0 | 50.0 |
| SVM (linear) | 50.0 | 50.0 | 50.0 |
| FT&KNN | 90.0 | 85.0 | 95.0 |
| FT&SVM (linear) | 92.5 | 95.0 | 90.0 |
Second-step transfer classification accuracy for the Sakar dataset (LOSO)
| Method | ACC (%) | TPR (%) | TNR (%) |
|---|---|---|---|
| KNN | 52.5 | 55.0 | 50.0 |
| SVM (linear) | 50.0 | 50.0 | 50.0 |
| ST&KNN | 67.5 | 65.0 | 70.0 |
| ST&SVM (linear) | 62.5 | 80.0 | 45.0 |
Two-step sparse transfer classification accuracy for the Sakar dataset (LOSO)
| Method | ACC (%) | TPR (%) | TNR (%) |
|---|---|---|---|
| KNN | 52.5 | 55.0 | 50.0 |
| SVM (linear) | 50.0 | 50.0 | 50.0 |
| TSTL&KNN | 94.5 | 94.5 | 94.5 |
| TSTL&SVM (linear) | 97.5 | 97.5 | 97.5 |
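The "(LOSO)" annotation denotes leave-one-subject-out validation: every recording from one subject is held out per fold, so no subject appears in both training and test sets. A minimal sketch with a 1-NN classifier on synthetic data (the subject IDs, features, and class shift are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, recs_per_subject, n_feat = 10, 3, 5
subjects = np.repeat(np.arange(n_subjects), recs_per_subject)
labels = np.repeat(rng.integers(0, 2, n_subjects), recs_per_subject)
# Class-dependent mean shift so the synthetic problem is learnable
X = rng.standard_normal((len(labels), n_feat)) + 2.0 * labels[:, None]

def nn_predict(X_train, y_train, X_test):
    """1-nearest-neighbour prediction by Euclidean distance."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[d.argmin(axis=1)]

correct = 0
for s in np.unique(subjects):
    test = subjects == s          # hold out every recording of subject s
    pred = nn_predict(X[~test], labels[~test], X[test])
    correct += (pred == labels[test]).sum()
acc = correct / len(labels)       # LOSO accuracy over all held-out recordings
```

Splitting by recording instead of by subject would leak subject identity into training and inflate accuracy, which is why the tables report LOSO throughout.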
Fig. 4 The t-SNE visualizations of TSTL on three PD speech datasets. a Non-TSTL on Sakar; b TSTL on Sakar; c non-TSTL on MaxLittle; d TSTL on MaxLittle; e non-TSTL on DNSH; f TSTL on DNSH
The comparison of UDA classification results of the proposed algorithm on three datasets (all methods evaluated with LOSO)
| Method | Dataset | ACC (%) | TPR (%) | TNR (%) |
|---|---|---|---|---|
| TCA | Sakar | 55.00 | 65.00 | 45.00 |
| TCA | MaxLittle | 75.00 | 100.00 | 0.00 |
| TCA | DNSH | 46.88 | 59.38 | 34.38 |
| CORAL | Sakar | 52.50 | 50.00 | 55.00 |
| CORAL | MaxLittle | 75.00 | 100.00 | 0.00 |
| CORAL | DNSH | 48.44 | 56.52 | 40.63 |
| DAN | Sakar | 62.50 | 65.00 | 60.00 |
| DAN | MaxLittle | 66.88 | 84.17 | 15.00 |
| DAN | DNSH | 45.94 | 45.66 | 46.25 |
| DANN | Sakar | 54.25 | 54.50 | 54.00 |
| DANN | MaxLittle | 72.81 | 93.75 | 10.00 |
| DANN | DNSH | 47.67 | 54.06 | 41.25 |
| TSTL | Sakar | – | – | – |
| TSTL | MaxLittle | – | – | – |
| TSTL | DNSH | – | – | – |
ACC accuracy; TPR true positive rate; TNR true negative rate
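CORAL, one of the UDA baselines in the table above, aligns second-order statistics by whitening the source features and re-colouring them with the target covariance. A minimal sketch on synthetic data (not the implementation used in the paper's experiments):

```python
import numpy as np

def coral(Xs, Xt, eps=1e-3):
    """Align source features to target: Xs @ Cs^{-1/2} @ Ct^{1/2}."""
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])

    def matrix_power(C, p):
        # Symmetric PSD matrix power via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** p) @ V.T

    return Xs @ matrix_power(Cs, -0.5) @ matrix_power(Ct, 0.5)

rng = np.random.default_rng(1)
# Source features with mismatched per-feature scales; target roughly isotropic
Xs = rng.standard_normal((200, 4)) * np.array([1.0, 2.0, 0.5, 3.0])
Xt = rng.standard_normal((300, 4))
Xs_aligned = coral(Xs, Xt)   # covariance of Xs_aligned now matches Xt's
```

This only matches covariances; the paper's joint local structure distribution alignment additionally preserves neighborhood relations between samples, which CORAL ignores.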
Fig. 5 Relationship between the number of convolution kernels and classification accuracy for the Sakar dataset
Fig. 6 Relationship between the number of neighbor samples and classification accuracy on the Sakar dataset
The comparison of classification results of the proposed algorithm with prior studies on the Sakar dataset
| Study | Method | ACC (%) | TPR (%) | TNR (%) |
|---|---|---|---|---|
| Canturk and Karabiber [ | 4 feature selection methods & 6 classifiers | 57.50 | 54.28 | 80.00 |
| Eskidere et al. [ | Random subspace classifier ensemble | 74.17 | – | – |
| Zhang et al. [ | MENN&RF | 81.50 | 92.50 | 70.50 |
| Benba et al. [ | HFCC + SVM | 87.50 | 90.00 | 85.00 |
| Li et al. [ | Hybrid feature learning&SVM | 82.50 | 85.00 | 80.00 |
| Vadovský and Paralič [ | C4.5&C5.0&RF&CART | 66.50 | – | – |
| Zhang [ | LSVM&MSVM&RSVM&CART&KNN&LDA&NB | 94.17 | 50.00 | 94.92 |
| Benba et al. [ | MFCC&SVM | 82.50 | 80.00 | 85.00 |
| Kraipeerapun and Amornsamankul [ | Stacking&CMTNN | 75.00 | – | – |
| Khan et al. [ | Evolutionary neural network ensembles | 90.00 | 93.00 | 97.00 |
| Ali et al. [ | LDA-NN-GA | 95.00 | 95.00 | 95.00 |
| – | DBN | 54.60 | 52.40 | 56.80 |
| – | CNN | 60.00 | 63.00 | 57.00 |
| – | DBN&SVM | 50.50 | 53.00 | 48.00 |
| – | Autoencoder&SVM | 67.50 | 65.00 | 70.00 |
| Proposed algorithm | TSTL&SVM | – | – | – |
ACC accuracy; TPR true positive rate; TNR true negative rate
The comparison of classification results of the proposed algorithm with prior studies on the MaxLittle dataset
| Study | Method | ACC (%) | TPR (%) | TNR (%) |
|---|---|---|---|---|
| Little et al. [ | Preselection filter + exhaustive search + SVM | 91.40 | – | – |
| Shahbaba and Neal [ | Dirichlet process mixtures | 87.70 | – | – |
| Psorakis et al. [ | mRVMs | 89.47 | – | – |
| Guo et al. [ | GA-EM | 93.10 | – | – |
| Sakar and Kursun [ | Mutual information + SVM | 92.75 | – | – |
| Das [ | ANN decision tree | 92.90 | – | – |
| Ozcift and Gulten [ | Correlation-based feature selection-rotation forest | 87.10 | – | – |
| Luukka [ | Fuzzy entropy measures + similarity | 85.03 | – | – |
| Li et al. [ | Fuzzy-based nonlinear transformation + SVM | 93.47 | – | – |
| Spadoto et al. [ | PSO + OPF, harmony search + OPF, gravitational search + OPF | 84.01 | – | – |
| Polat [ | FCMFW + KNN | 97.93 | – | – |
| Chen et al. [ | PCA-fuzzy KNN | 96.07 | – | – |
| Ali et al. [ | DBN | 94.00 | – | – |
| Åström and Koker [ | Parallel ANN | 91.20 | 90.50 | 93.00 |
| Daliri [ | SVM with Chi-square distance kernel | 91.20 | 91.71 | 89.92 |
| Zuo et al. [ | PSO-fuzzy KNN | 97.47 | 98.16 | 96.57 |
| Kadam and Jadhav [ | FESA-DNN | 93.84 | 95.23 | 90.00 |
| Ma et al. [ | SVM-RFE | 96.29 | 95.00 | 97.50 |
| Cai et al. [ | RF-BFO-SVM | 97.42 | 99.29 | 91.50 |
| Dash et al. [ | ECFA-SVM | 97.95 | 97.90 | – |
| Gürüler [ | KMCFW-CVANN | 99.52 | 100.00 | 99.47 |
| – | SVM (linear kernel) | 75.00 | 100.00 | 0.00 |
| – | SVM (RBF kernel) | 75.00 | 100.00 | 0.00 |
| Proposed algorithm | TSTL&SVM | 96.87 | – | 87.50 |
ACC accuracy; TPR true positive rate; TNR true negative rate
The comparison of classification results of the proposed algorithm on the DNSH dataset (LOSO)
| Study | Method | ACC (%) | TPR (%) | TNR (%) |
|---|---|---|---|---|
| – | KNN | 52.5 | 55.0 | 50.0 |
| – | SVM (linear kernel) | 50.0 | 50.0 | 50.0 |
| Proposed algorithm | TSTL&SVM | 90.63 | 90.63 | 90.63 |
The time cost of the proposed algorithm on PD speech datasets
| Dataset | Sakar | MaxLittle | DNSH |
|---|---|---|---|
| Time cost (s) | 25.188 | 3.269 | 18.133 |
Fig. 7 Time cost for different sample sizes on the Sakar dataset