Bálint Hajduska-Dér1, Gábor Kiss2, Dávid Sztahó2, Klára Vicsi2, Lajos Simon1.
Abstract
Depression is a growing problem worldwide, affecting an increasing number of patients as well as health systems and the global economy. The most common diagnostic rating scales for depression are either self-reported or clinician-administered, and they differ in the symptoms they sample. Speech is a promising biomarker for the diagnostic assessment of depression because it is non-invasive and cost- and time-efficient. In our study, we aim to build a more accurate and more sensitive model for detecting depression based on speech processing. Both regression and classification models were developed using a machine learning method. During the research, we had access to a large speech database that includes speech samples from depressed and healthy subjects. The database contains the Beck Depression Inventory (BDI) score of each subject and the Hamilton Rating Scale for Depression (HAMD) score of 20% of the subjects. This provided an opportunity to compare the usefulness of BDI and HAMD for training models for the automatic recognition of depression based on speech signal processing. We found that the estimates of the acoustic model trained on BDI scores are closer to the HAMD assessment than to the BDI scores, and that partially replacing BDI scores with HAMD scores in training improves the accuracy of automatic recognition of depression.
Keywords: Support Vector Regression; depression; diagnosis; machine learning; speech
Year: 2022 PMID: 35990073 PMCID: PMC9385975 DOI: 10.3389/fpsyt.2022.879896
Source DB: PubMed Journal: Front Psychiatry ISSN: 1664-0640 Impact factor: 5.435
Main descriptive statistics of subjects of the applied Depressed Speech Database (DEPISDA).
| Set | Sex | Count | H2B mean (score) | H2B Std. Dev. | BDI mean (score) | BDI Std. Dev. | Age mean (year) | Age Std. Dev. |
| Set I | Both | 175 | 13.5 | 12.8 | same as for H2B | same as for H2B | 44.3 | 16.0 |
| Set I | Males | 62 | 13.3 | 11.8 | same as for H2B | same as for H2B | 43.2 | 17.9 |
| Set I | Females | 113 | 13.7 | 13.4 | same as for H2B | same as for H2B | 44.9 | 14.9 |
| Set II | Both | 43 | 20.7 | 7.9 | 21.4 | 7.9 | 34.3 | 11.7 |
| Set II | Males | 12 | 18.1 | 7.7 | 23.9 | 5.6 | 35.7 | 10.4 |
| Set II | Females | 31 | 21.6 | 7.8 | 27.4 | 8.5 | 33.8 | 12.4 |
| All | Both | 218 | 15.0 | 12.3 | 16.1 | 13.1 | 42.3 | 15.8 |
| All | Males | 74 | 14.1 | 11.4 | 15.0 | 11.7 | 42.0 | 17.1 |
| All | Females | 144 | 15.4 | 12.8 | 16.7 | 13.7 | 42.4 | 15.1 |
FIGURE 1. Number of subjects of the applied Depressed Speech Database by H2B category in the case of All, Set I, and Set II.
Descriptive features calculated from the low-level descriptors (LLDs).
| Feature | Calculated from | Total feature number |
| Fundamental frequency | Voiced parts | 6 |
| Intensity | Vowels, whole speech | 12 |
| Jitter | E, O, vowels | 18 |
| Shimmer | E, O, vowels | 18 |
| First and second formant frequencies and their bandwidths | E, O, vowels | 72 |
| 13 MFCC | Vowels, whole speech | 156 |
| Articulation rate | Whole speech | 1 |
| Pause ratio | Whole sample | 1 |
| Ratio of transients | Whole speech, whole sample | 2 |
| Total | | 286 |
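The feature counts in the table above can be checked with a little arithmetic. The sketch below assumes that each low-level descriptor contributes 6 descriptive statistics per region it is measured on. That factor is inferred from the counts (for example, one fundamental-frequency track on one region gives 6 features); it is not stated in this record, and the statistics themselves are not listed here.

```python
# Assumption (inferred from the table, not stated in the record):
# every LLD track yields 6 descriptive statistics per measured region.
STATS_PER_TRACK = 6

# (feature group, number of LLD tracks, number of regions it is computed on)
statistical_groups = [
    ("Fundamental frequency", 1, 1),        # voiced parts
    ("Intensity", 1, 2),                    # vowels, whole speech
    ("Jitter", 1, 3),                       # E, O, vowels
    ("Shimmer", 1, 3),                      # E, O, vowels
    ("F1, F2 and their bandwidths", 4, 3),  # 4 tracks on E, O, vowels
    ("13 MFCC", 13, 2),                     # vowels, whole speech
]

# Features reported as single values, with the counts given in the table.
single_value_groups = [
    ("Articulation rate", 1),
    ("Pause ratio", 1),
    ("Ratio of transients", 2),
]

counts = {
    name: tracks * regions * STATS_PER_TRACK
    for name, tracks, regions in statistical_groups
}
counts.update(dict(single_value_groups))

total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n}")
print("Total:", total)
```

Under this assumption every row and the total of 286 are reproduced, which suggests the table is internally consistent.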
Accuracy of depression prediction, when the BDI scores were used for training.
| Set | Target | RMSE | MAE | Pearson | Spearman |
| Set I | BDI | 10.1 | 7.7 | 0.63 | 0.59 |
| Set II | BDI | 11.6 | 8.7 | 0.19 | 0.24 |
| Set II | H2B | 9.3 | 7.1 | 0.21 | 0.27 |
| All | BDI | 10.4 | 7.9 | 0.61 | 0.61 |
| All | H2B | 9.9 | 7.6 | 0.61 | 0.61 |
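The four columns of the table above are standard regression metrics. The following minimal sketch shows how they can be computed from a list of reference scores and a list of predicted scores. The function names and the toy scores are illustrative assumptions, not the authors' code or data.

```python
import math

def rmse(reference, predicted):
    """Root mean squared error."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)

def mae(reference, predicted):
    """Mean absolute error."""
    n = len(reference)
    return sum(abs(p - r) for r, p in zip(reference, predicted)) / n

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical severity scores, for illustration only.
reference = [3, 10, 18, 25, 31]
predicted = [6, 9, 15, 27, 26]

print("RMSE:", round(rmse(reference, predicted), 2))
print("MAE:", round(mae(reference, predicted), 2))
print("Pearson:", round(pearson(reference, predicted), 2))
print("Spearman:", round(spearman(reference, predicted), 2))
```

RMSE penalizes large individual errors more heavily than MAE, while the two correlation coefficients measure how well the predictions follow the ordering and trend of the reference scores regardless of a constant offset.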
FIGURE 2. Comparison of predicted and original depression severity scores, when the BDI score was used for training.
FIGURE 3. Histogram of differences of absolute error in the case of BDI and H2B for Set II.
The accuracy of the depression prediction, when the H2B scores were used for training.
| Set | Target | RMSE | MAE | Pearson Coef. | Spearman Coef. |
| Set I | H2B | 8.2 | 6.3 | 0.77 | 0.72 |
| Set II | H2B | 8.2 | 6.1 | 0.42 | 0.46 |
| All | H2B | 8.2 | 6.3 | 0.75 | 0.73 |
FIGURE 4. Comparison of predicted and original depression severity scores, when the H2B scores were used for training.
Accuracy of the acoustic model and BDI questionnaire.
| | RMSE | MAE | Mean Error | Pearson Coef. | Spearman Coef. |
| BDI Questionnaire | 8.8 | 6.9 | 5.7 | 0.63 | 0.55 |
| Acoustic model | 8.2 | 6.1 | -3.3 | 0.42 | 0.46 |
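The Mean Error column differs from MAE in that it keeps the sign of each error, so it shows systematic over- or underestimation. The sketch below illustrates the difference on made-up scores. It assumes the convention error = estimate minus reference; the sign convention is not stated in this record.

```python
def mean_error(reference, estimate):
    """Signed mean error (bias); assumes error = estimate - reference."""
    diffs = [e - r for r, e in zip(reference, estimate)]
    return sum(diffs) / len(diffs)

def mean_absolute_error(reference, estimate):
    """Mean of the error magnitudes; the sign is discarded."""
    diffs = [abs(e - r) for r, e in zip(reference, estimate)]
    return sum(diffs) / len(diffs)

# Hypothetical scores, for illustration only.
reference = [10, 20, 30, 40]
always_high = [14, 26, 33, 47]   # every estimate is above the reference
mixed = [6, 26, 27, 47]          # same error magnitudes, mixed signs

print(mean_error(reference, always_high), mean_absolute_error(reference, always_high))
print(mean_error(reference, mixed), mean_absolute_error(reference, mixed))
```

Both toy estimators have the same MAE, but only the first has a large mean error. Read this way, a positive mean error in the table indicates scores that tend to lie above the reference, and a negative one indicates scores that tend to lie below it.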
FIGURE 5. Comparison of BDI scores (A) and predicted scores of the acoustic model (B) with the H2B scores.
FIGURE 6. The ROC curve of the acoustic models when models were trained with BDI (left) and with H2B scores (right).
The accuracy of the classification of depressed and healthy subjects, when H2B or BDI scores were used for training.
| Operating point | Training variable | Classification accuracy | Sensitivity | Specificity |
| At maximum classification accuracy | BDI | 76% | 80% | 72% |
| At maximum classification accuracy | H2B | 84% | 79% | 89% |
| At 90% sensitivity | BDI | 75% | 90% | 62% |
| At 90% sensitivity | H2B | 81% | 90% | 73% |
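The two operating points in the table above correspond to two ways of choosing a decision threshold along a ROC curve. The sketch below assumes that a subject is classified as depressed when the predicted severity score reaches a threshold. That decision rule, the function names, and the toy data are illustrative assumptions; the record does not describe how the authors' classifier produces its decision.

```python
def rates_at(labels, scores, threshold):
    """labels: 1 = depressed, 0 = healthy. Predict depressed if score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return {
        "threshold": threshold,
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),   # share of depressed subjects detected
        "specificity": tn / (tn + fp),   # share of healthy subjects cleared
    }

def sweep(labels, scores):
    """Evaluate every distinct score as a candidate threshold."""
    return [rates_at(labels, scores, t) for t in sorted(set(scores))]

def at_max_accuracy(points):
    return max(points, key=lambda p: p["accuracy"])

def at_min_sensitivity(points, target=0.90):
    """Most specific threshold that still reaches the target sensitivity."""
    eligible = [p for p in points if p["sensitivity"] >= target]
    return max(eligible, key=lambda p: p["specificity"])

# Hypothetical predicted scores: 10 depressed and 10 healthy subjects.
labels = [1] * 10 + [0] * 10
scores = [5, 9, 14, 16, 18, 20, 22, 25, 28, 30,
          2, 3, 4, 6, 7, 8, 10, 11, 12, 13]

points = sweep(labels, scores)
print(at_max_accuracy(points))
print(at_min_sensitivity(points))
```

Fixing sensitivity at 90% lowers the threshold, so fewer depressed subjects are missed at the cost of more healthy subjects being flagged. This is the same trade-off visible in the table, where specificity drops at the 90% sensitivity operating point.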