Haihua Jiang1, Bin Hu1, Zhenyu Liu2, Gang Wang3, Lan Zhang4, Xiaoyu Li2, Huanyu Kang2.
Abstract
Early intervention for depression is very important to ease the disease burden, but current diagnostic methods are still limited. This study investigated automatic classification of depressed speech in a sample of 170 native Chinese subjects (85 healthy controls and 85 depressed patients). The classification performance of prosodic, spectral, and glottal speech features was analyzed for the recognition of depression. We proposed an ensemble logistic regression model for detecting depression (ELRDD) in speech. Logistic regression, which was superior in recognizing depression, was selected as the base classifier. The ensemble model drew on speech features from several different aspects, which ensured diversity among the base classifiers. ELRDD provided better classification results than the other classifiers compared. A method for identifying depression based on ELRDD, called ELRDD-E, was then proposed and tested. It gave encouraging results: an accuracy of 75.00% for females and 81.82% for males, with a sensitivity/specificity of 79.25%/70.59% for females and 78.13%/85.29% for males.
Year: 2018 PMID: 30344616 PMCID: PMC6174772 DOI: 10.1155/2018/6508319
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1: Block diagram of the ensemble logistic regression model for detecting depression (ELRDD).
Summary of speech features.
| Main category | Subcategory | Number of features | Functions |
|---|---|---|---|
| MFCC | MFCC (0–14) | 630 | Corresponding delta coefficients appended |
| SPEC | Flux | 42 | 21 functions utilized |
| | Centroid | 42 | maxPos, minPos |
| | Entropy | 42 | Mean, std dev |
| | Roll-off | 168 | Skewness, kurtosis |
| | Band energies | 84 | Quartile 1/2/3 |
| PROS | PCM loudness | 42 | Quartile range (2–1)/(3–2)/(3–1) |
| | Log mel-frequency band (0–7) | 336 | Linear regression error Q/A |
| | LSP frequency (0–7) | 336 | Linear regression coeff. 1/2 |
| | F0 envelope | 42 | Percentile 1/99 |
| | Voicing probability | 42 | Percentile range (99–1) |
| | F0final, ShimmerLocal | 76 | 19 functions, obtained by eliminating the minimum value and the range functions from the 21 abovementioned functions |
| | JitterLocal, JitterDDP | 76 | 19 functions, obtained by eliminating the minimum value and the range functions from the 21 abovementioned functions |
| | Pitch onsets, duration | 2 | No functions |
| GLOT | GLT | 27 | Mean, max, min |
| | GLF | 5 | Mean, max, min |
| Total | | 1992 | |
Subspaces composed of several different feature vectors.
| No. | Subspace | No. | Subspace | No. | Subspace |
|---|---|---|---|---|---|
| 1 | MFCC | 2 | PROS | 3 | SPEC |
| 4 | GLOT | 5 | MFCC + PROS | 6 | MFCC + SPEC |
| 7 | MFCC + GLOT | 8 | PROS + SPEC | 9 | PROS + GLOT |
| 10 | SPEC + GLOT | 11 | MFCC + PROS + SPEC | 12 | MFCC + PROS + GLOT |
| 13 | MFCC + SPEC + GLOT | 14 | PROS + SPEC + GLOT | 15 | MFCC + PROS + SPEC + GLOT |
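The 15 subspaces above are exactly the non-empty combinations of the four feature groups. The short sketch below enumerates them in the same order and adds the dimensionality of each subspace. The per-group feature counts are summed from the feature summary table, on the assumption that each subcategory row belongs to the main category listed above it; the names `GROUP_SIZES` and `enumerate_subspaces` are illustrative only.

```python
from itertools import combinations

# Feature counts per main category, summed from the feature summary table
# (assumption: each subcategory row belongs to the main category above it).
GROUP_SIZES = {
    "MFCC": 630,
    "PROS": 42 + 336 + 336 + 42 + 42 + 76 + 76 + 2,  # 952
    "SPEC": 42 + 42 + 42 + 168 + 84,                  # 378
    "GLOT": 27 + 5,                                   # 32
}


def enumerate_subspaces(groups):
    """Return every non-empty combination of group names, smallest first."""
    names = list(groups)
    return [
        combo
        for size in range(1, len(names) + 1)
        for combo in combinations(names, size)
    ]


subspaces = enumerate_subspaces(GROUP_SIZES)
for number, combo in enumerate(subspaces, start=1):
    dimension = sum(GROUP_SIZES[name] for name in combo)
    print(f"{number:2d}  {' + '.join(combo):<28} {dimension:5d} features")
```

With four groups there are 2^4 − 1 = 15 non-empty combinations, which matches the 15 base classifiers listed for ELRDD in the results tables.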
Algorithm 1: ELRDD.
Algorithm 2: ELRDD-E.
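As a rough illustration of how an ensemble with one base classifier per feature subspace can reach a decision, the sketch below combines 15 per-subspace probabilities by majority vote. Majority voting, the 0.5 threshold, the function name `majority_vote`, and the example probabilities are all assumptions made for illustration; the combination rule actually used by ELRDD is not stated in this text.

```python
def majority_vote(probabilities, threshold=0.5):
    """Combine base-classifier outputs for one speaker (assumed rule).

    probabilities: one P(depressed) value per base classifier.
    Returns (label, votes_for_depressed); label 1 means "depressed".
    """
    if not probabilities:
        raise ValueError("at least one base classifier output is required")
    votes = sum(1 for p in probabilities if p >= threshold)
    # Strict majority; with an odd number of voters (such as 15) no tie occurs.
    label = 1 if 2 * votes > len(probabilities) else 0
    return label, votes


# Hypothetical outputs of 15 base classifiers, one per feature subspace.
example = [0.62, 0.55, 0.71, 0.48, 0.66, 0.59, 0.44, 0.73,
           0.52, 0.49, 0.68, 0.61, 0.47, 0.70, 0.64]
label, votes = majority_vote(example)
print(f"label={label}, votes for depressed={votes} of {len(example)}")
```

Averaging the probabilities before thresholding would be an equally plausible alternative; which rule is appropriate depends on the algorithm definition in the full paper.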
Classification outcomes of each individual classifier for males.
| Features | SVM Sen. (%) | SVM Spe. (%) | SVM Acc. (%) | GMM Sen. (%) | GMM Spe. (%) | GMM Acc. (%) | LR Sen. (%) | LR Spe. (%) | LR Acc. (%) |
|---|---|---|---|---|---|---|---|---|---|
| MFCC | 56.14 | 64.91 | 60.66 | 62.72 | 58.22 | 60.40 | 62.50 | 60.75 | 61.60 |
| PROS | 61.96 | 70.39 | 66.30 | 61.75 | 74.14 | 68.13 | 63.15 | 71.10 | 67.24 |
| SPEC | | 73.94 | 68.81 | 65.84 | 71.60 | 68.81 | | 70.69 | 69.07 |
| GLOT | 36.32 | 60.95 | 49.01 | 47.95 | 54.26 | 51.20 | 44.07 | 54.56 | 49.48 |
| MFCC + PROS | 60.67 | 69.78 | 65.36 | 63.69 | 70.49 | 67.19 | 65.41 | 68.36 | 66.93 |
| MFCC + SPEC | 59.05 | 72.72 | 66.09 | 64.55 | 69.17 | 66.93 | 63.58 | 69.07 | 66.41 |
| MFCC + GLOT | 53.56 | 66.53 | 60.24 | 61.96 | 60.65 | 61.29 | 61.10 | 60.75 | 60.92 |
| PROS + SPEC | 63.25 | 73.83 | 68.70 | 62.72 | 74.14 | 68.60 | 67.13 | | |
| PROS + GLOT | 60.99 | 71.60 | 66.46 | 61.96 | 72.92 | 67.61 | 62.82 | 71.20 | 67.14 |
| SPEC + GLOT | 62.61 | | | 65.19 | 70.89 | 68.13 | 66.70 | 70.99 | 68.91 |
| MFCC + PROS + SPEC | 60.99 | 72.92 | 67.14 | 65.63 | 72.31 | | 64.55 | 70.39 | 67.56 |
| MFCC + PROS + GLOT | 59.59 | 72.62 | 66.30 | 63.36 | 69.27 | 66.41 | 62.82 | 67.24 | 65.10 |
| MFCC + SPEC + GLOT | 60.24 | 72.82 | 66.72 | 65.84 | 69.98 | 67.97 | 64.66 | 68.66 | 66.72 |
| PROS + SPEC + GLOT | 60.02 | 73.83 | 67.14 | 62.82 | | 68.70 | 64.12 | 71.91 | 68.13 |
| MFCC + PROS + SPEC + GLOT | 61.85 | 73.12 | 67.66 | | 71.30 | 68.86 | 64.33 | | 68.39 |
Maximum of sensitivity (sen.), specificity (spe.), and accuracy (acc.) is shown in bold.
Recognition performance of each classifier for males.
| Classifier | Number of base classifiers | Sensitivity (%) | Specificity (%) | Accuracy (%) |
|---|---|---|---|---|
| Adaboost decision tree | 300 | 58.94 | 67.14 | 63.17 |
| Bagging decision tree | 500 | 59.48 | 70.28 | 65.05 |
| Random forest | 400 | 59.05 | 70.99 | 65.20 |
| ELRDD | 15 | 67.35 | 73.94 | 70.64 |
Classification outcomes of each individual classifier for females.
| Features | SVM Sen. (%) | SVM Spe. (%) | SVM Acc. (%) | GMM Sen. (%) | GMM Spe. (%) | GMM Acc. (%) | LR Sen. (%) | LR Spe. (%) | LR Acc. (%) |
|---|---|---|---|---|---|---|---|---|---|
| MFCC | 63.24 | 57.27 | 60.31 | 56.47 | 66.06 | 61.17 | 62.79 | 61.80 | 62.30 |
| PROS | 67.21 | 60.65 | 63.99 | 51.72 | 73.29 | 62.30 | 64.35 | 66.73 | 65.52 |
| SPEC | 60.64 | 63.35 | 61.97 | 52.44 | 73.70 | 62.87 | 63.05 | 64.91 | 63.96 |
| GLOT | 56.60 | 42.53 | 49.70 | 51.33 | 50.44 | 50.90 | 52.70 | 46.11 | 49.47 |
| MFCC + PROS | | 61.06 | 64.36 | 56.86 | 71.54 | 64.06 | | 66.06 | 65.48 |
| MFCC + SPEC | 66.10 | 61.60 | 63.89 | | 69.78 | 63.66 | 63.24 | 65.99 | 64.59 |
| MFCC + GLOT | 63.05 | 57.20 | 60.18 | 55.63 | 64.84 | 60.15 | 62.66 | 60.24 | 61.47 |
| PROS + SPEC | 64.09 | | 64.29 | 51.01 | | 62.20 | 63.63 | | 65.58 |
| PROS + GLOT | 67.47 | 59.16 | 63.40 | 52.31 | 72.08 | 62.00 | 63.37 | 66.73 | 65.02 |
| SPEC + GLOT | 61.09 | 60.31 | 60.71 | 51.79 | 70.99 | 61.21 | 61.87 | 62.75 | 62.30 |
| MFCC + PROS + SPEC | 64.74 | 62.41 | 63.59 | 56.47 | 73.09 | | 64.41 | 67.41 | 65.88 |
| MFCC + PROS + GLOT | 67.08 | 62.27 | | 55.50 | 72.62 | 63.89 | 64.74 | 67.14 | |
| MFCC + SPEC + GLOT | 63.37 | 62.68 | 63.03 | 57.71 | 69.37 | 63.43 | 62.39 | 63.42 | 62.90 |
| PROS + SPEC + GLOT | 64.15 | 63.42 | 63.79 | 51.53 | 73.43 | 62.27 | 63.11 | 67.07 | 65.05 |
| MFCC + PROS + SPEC + GLOT | 65.00 | 63.22 | 64.13 | 56.02 | 72.96 | 64.32 | 63.44 | | 65.48 |
Maximum of sensitivity (sen.), specificity (spe.), and accuracy (acc.) is shown in bold.
Recognition performance of each classifier for females.
| Classifier | Number of base classifiers | Sensitivity (%) | Specificity (%) | Accuracy (%) |
|---|---|---|---|---|
| Adaboost decision tree | 200 | 59.34 | 69.91 | 64.52 |
| Bagging decision tree | 300 | 58.75 | 68.56 | 63.56 |
| Random forest | 300 | 59.66 | 68.56 | 64.03 |
| ELRDD | 15 | 65.71 | 67.68 | 66.68 |
Classification outcomes of ELRDD-E.
| Gender | Classifier | Sensitivity (%) | Specificity (%) | Accuracy (%) |
|---|---|---|---|---|
| Male | ELRDD-E | 78.13 | 85.29 | 81.82 |
| | Adaboost decision tree | 65.63 | 82.35 | 74.24 |
| | Bagging decision tree | 65.63 | 79.41 | 72.73 |
| | Random forest | 62.50 | 79.41 | 71.21 |
| | STEDD | 75.00 | 85.29 | 80.30 |
| Female | ELRDD-E | 79.25 | 70.59 | 75.00 |
| | Adaboost decision tree | 64.15 | 76.47 | 70.19 |
| | Bagging decision tree | 62.26 | 74.51 | 68.27 |
| | Random forest | 66.04 | 76.47 | 71.15 |
| | STEDD | 77.36 | 74.51 | 75.96 |
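The three metrics reported throughout follow the usual confusion-matrix definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and accuracy = (TP + TN) / total. The sketch below applies them to the ELRDD-E rows. The counts are not given in this text; they are integer counts inferred because they reproduce the reported percentages and add up to the 85 depressed patients and 85 healthy controls mentioned in the abstract, so they should be read as an assumption rather than as reported data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as percentages."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Inferred counts (assumption): depressed = tp + fn, controls = tn + fp.
inferred_counts = {
    "Male": dict(tp=25, fn=7, tn=29, fp=5),      # 32 depressed, 34 controls
    "Female": dict(tp=42, fn=11, tn=36, fp=15),  # 53 depressed, 51 controls
}

# ELRDD-E values as reported in the table (Sen., Spe., Acc.).
reported = {
    "Male": (78.13, 85.29, 81.82),
    "Female": (79.25, 70.59, 75.00),
}

for gender, counts in inferred_counts.items():
    computed = classification_metrics(**counts)
    matches = all(abs(c - r) < 0.01 for c, r in zip(computed, reported[gender]))
    values = ", ".join(f"{c:.3f}" for c in computed)
    print(f"{gender}: {values} (matches reported values: {matches})")
```

Because the female group has more depressed subjects than controls under these counts, its accuracy sits closer to the sensitivity than to the specificity; for the male group the weighting is nearly even.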