Luisa Weiner, Andrea Guidi, Nadège Doignon-Camus, Anne Giersch, Gilles Bertschy, Nicola Vanello.
Abstract
There is a lack of consensus on the diagnostic thresholds that could improve the detection accuracy of bipolar mixed episodes in clinical settings. Some studies have shown that voice features could be reliable biomarkers of manic and depressive episodes compared to euthymic states, but none thus far has investigated whether they could help distinguish mixed from non-mixed acute bipolar episodes. Here we investigated whether vocal features acquired via verbal fluency tasks could accurately classify mixed states in bipolar disorder using machine learning methods. Fifty-six patients with bipolar disorder were recruited during an acute episode (19 hypomanic, 8 mixed hypomanic, 17 with mixed depression, 12 with depression). Nine different trials belonging to four conditions of verbal fluency tasks (letter, semantic, free word generation, and associational fluency) were administered. Spectral and prosodic features in three conditions were selected for the classification algorithm. Using the leave-one-subject-out (LOSO) strategy to train the classifier, we calculated the accuracy rate, the F1 score, and the Matthews correlation coefficient (MCC). For depression versus mixed depression, the accuracy and F1 scores were high (0.83 and 0.86, respectively), and the MCC was 0.64. For hypomania versus mixed hypomania, accuracy and F1 scores were also high (0.86 and 0.75, respectively), and the MCC was 0.57. Given the high rates of correctly classified subjects, vocal features quickly acquired via verbal fluency tasks seem to be reliable biomarkers that could be easily implemented in clinical settings to improve diagnostic accuracy.
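The leave-one-subject-out strategy mentioned in the abstract can be illustrated with a minimal sketch. It assumes each sample (for example, one verbal fluency trial) carries a subject identifier; the function name `loso_splits` and the toy identifiers are hypothetical, and the abstract does not specify how the authors implemented the split or which classifier they used.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx), one fold per subject.

    All samples from the held-out subject go to the test set, so no
    subject ever contributes to both training and testing in a fold.
    """
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Toy data (hypothetical): six samples recorded from three subjects.
subject_ids = ["s1", "s1", "s1", "s2", "s2", "s3"]
folds = list(loso_splits(subject_ids))

for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Splitting by subject rather than by sample matters when each subject contributes several trials: a sample-level split would let the classifier see the same voice during training and testing.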
Year: 2021 PMID: 34341338 PMCID: PMC8329226 DOI: 10.1038/s41398-021-01535-z
Source DB: PubMed Journal: Transl Psychiatry ISSN: 2158-3188 Impact factor: 6.222
Demographic characteristics of the patient samples.
| | Hypomania | Mixed hypomania | Mixed depression | Depression |
|---|---|---|---|---|
| Ageᵃ | 37.58 (13.52) | 42 (10.3) | 42.12 (12.75) | 44.83 (14.43) |
| Sex (F/M) | 12/7 | 4/4 | 14/3 | 8/4 |
| YMRSᵃ | 12.58 (3.63) | 9.25 (2.37) | 4.37 (1.5) | 0.83 (0.94) |
| QIDS-C16ᵃ | 2.37 (1.5) | 9.75 (3.01) | 12.37 (3.95) | 12.5 (3.58) |
| Lithium (% yes) | 42% | 12.5% | 56% | 50% |
| Anti-epileptics (% yes) | 42% | 50% | 56% | 50% |
| Antipsychotics (% yes) | 63% | 37.5% | 37.5% | 50% |
| Antidepressants (% yes) | 26.5% | 37.5% | 37.5% | 66.5% |
| Benzodiazepines (% yes) | 15.5% | 12.5% | 25% | 25% |
YMRS Young Mania Rating Scale, QIDS-C16 Quick Inventory of Depressive Symptomatology.
ᵃ Mean (standard deviation).
Features used for the analysis of speech signals.
| Feature name | Feature category | Definition | Meaning |
|---|---|---|---|
| MedianF0 | Prosodic | Median of F0 values estimated within each word | Central tendency of voiced sound fundamental frequency |
| MadF0 | Prosodic | Median Absolute Deviation of F0 values estimated within each word | Dispersion index of voiced sound fundamental frequency |
| Amplitude* | Prosodic | Relative size of F0 rising and falling phase amplitudes | |
| Duration* | Prosodic | Relative size of F0 rising and falling phase durations | |
| | Prosodic | Mean of Amplitude* and Duration* features | |
| PosSlope | Prosodic | Steepness of the F0 contour during the rising phase | |
| AbsNegSlope | Prosodic | Steepness of the F0 contour during the falling phase | |
| | Prosodic | Sum of PosSlope and AbsNegSlope | |
| | Prosodic | F0 slope between the first and the final F0 values in each voiced segment | |
| Mean_Pause | Prosodic | The mean across a VFT of pause lengths between two consecutive words | Position index of pause length distribution |
| Std_Pause | Prosodic | The standard deviation across a VFT of pause lengths between two consecutive words | Dispersion index of pause length distribution |
| Mean_Speech | Prosodic | The mean word length across a VFT | Position index of word length distribution |
| Std_Speech | Prosodic | The standard deviation of word length across a VFT | Dispersion index of word length distribution |
| LTAS_F_median | Voice quality | The median frequency of the power spectrum, i.e., the frequency that divides the total power into two halves | LTAS shape feature. Central tendency index of the LTAS spectrum distribution. Relative contribution of high and low frequencies |
| LTAS_A_median | Voice quality | The amplitude of the LTAS spectrum corresponding to LTAS_F_median | |
| LTAS_Max_A | Voice quality | The maximum amplitude of LTAS spectrum | |
| LTAS_Max_A_F | Voice quality | The frequency values corresponding to LTAS_Max_A | LTAS shape feature. Its value is expected to be lower than LTAS_F_median |
| LTAS_slope | Voice quality | LTAS shape feature: it is related to the slope of the LTAS spectrum between the peak and the amplitude corresponding to median frequency. The lower (negative), the smaller the contribution of higher frequencies | |
| LTAS_Ratio_Max | Voice quality | LTAS shape feature: it is related to the slope of the LTAS spectrum between the origin and the LTAS peak. Given that the maximum peak is at low frequencies, it weights amplitude of lower frequencies | |
| LTAS_Ratio_Median | Voice quality | LTAS shape feature: it is related to the slope of the LTAS spectrum between the origin and the value corresponding to the median frequency. | |
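
As an illustration of the simpler prosodic features in the table above, the following sketch computes MedianF0, MadF0, and the pause and word-length statistics. It assumes F0 values in Hz and word boundaries given as (onset, offset) pairs in seconds; the function names, the toy values, and the use of the sample standard deviation are assumptions, since the table does not specify these details.

```python
import statistics


def f0_stats(f0_values):
    """Return (MedianF0, MadF0) for the F0 values of one word."""
    med = statistics.median(f0_values)
    # Median Absolute Deviation: median of absolute deviations from the median.
    mad = statistics.median([abs(v - med) for v in f0_values])
    return med, mad


def timing_stats(words):
    """Return pause and word-length statistics for one verbal fluency task.

    `words` is a list of (onset, offset) pairs in seconds, in spoken order.
    """
    pauses = [nxt[0] - cur[1] for cur, nxt in zip(words, words[1:])]
    lengths = [offset - onset for onset, offset in words]
    return {
        "Mean_Pause": statistics.mean(pauses),
        "Std_Pause": statistics.stdev(pauses),    # sample SD (assumption)
        "Mean_Speech": statistics.mean(lengths),
        "Std_Speech": statistics.stdev(lengths),  # sample SD (assumption)
    }


# Toy values (hypothetical), chosen to be easy to check by hand.
median_f0, mad_f0 = f0_stats([100, 110, 120, 130, 200])
timing = timing_stats([(0.0, 0.5), (1.0, 1.25), (2.25, 3.0)])
print(median_f0, mad_f0)  # 120 10
print(timing)
```

The outlying value of 200 Hz shifts neither the median nor the MAD much, which is the usual reason for preferring these statistics over the mean and standard deviation for F0 tracks.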
Fig. 1. F0 dynamics of the word "courage" and Taylor’s (2000) tilt model.
Upper. The time course of the audio signal for the French word “courage” along with its phonetic transcription. Voiceless fricative sounds are characterized by more rapid changes than voiced sounds (central part of the word). For voiced sounds, the fundamental frequency (F0) can be estimated, and its time contour is shown in red. Lower. Taylor’s (2000) tilt model for a contour in which only a falling phase is present, so that the geometric parameters Duration* and Amplitude* both equal −1.
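The value of −1 in the caption follows from the tilt formulas. The sketch below uses the amplitude and duration tilt as defined in Taylor's tilt model and assumes that Amplitude* and Duration* in this record correspond to those two quantities; the function name and the toy values are hypothetical.

```python
def tilt_parameters(a_rise, a_fall, d_rise, d_fall):
    """Return (amplitude tilt, duration tilt, mean of the two).

    a_rise, a_fall: F0 excursion sizes of the rising and falling phases (Hz).
    d_rise, d_fall: durations of the rising and falling phases (s).
    Both tilts range from -1 (pure fall) to +1 (pure rise).
    """
    amp_total = abs(a_rise) + abs(a_fall)
    dur_total = d_rise + d_fall
    if amp_total == 0 or dur_total == 0:
        raise ValueError("flat or empty contour: tilt is undefined")
    amp_tilt = (abs(a_rise) - abs(a_fall)) / amp_total
    dur_tilt = (d_rise - d_fall) / dur_total
    return amp_tilt, dur_tilt, (amp_tilt + dur_tilt) / 2


# A contour with only a falling phase (hypothetical: 30 Hz drop over 0.2 s).
falling_only = tilt_parameters(a_rise=0.0, a_fall=30.0, d_rise=0.0, d_fall=0.2)
print(falling_only)  # (-1.0, -1.0, -1.0)

# A symmetric rise-fall contour gives tilts of 0.
symmetric = tilt_parameters(a_rise=20.0, a_fall=20.0, d_rise=0.1, d_fall=0.1)
print(symmetric)  # (0.0, 0.0, 0.0)
```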
Fig. 2. Long-term average spectrum (LTAS) estimation strategy and example.
Upper. Long-term average spectrum estimation strategy. Lower. An example of LTAS. The features, provided in Table 2, were identified to parsimoniously describe the LTAS shape.
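Four of the LTAS shape features can be read directly off a spectrum, as the following sketch shows. It assumes the LTAS is already available as linear power per frequency bin (not in dB) on an ascending frequency grid; the function name and the toy spectrum are hypothetical, and the slope and ratio features are left out because their exact formulas are not given here.

```python
from itertools import accumulate


def ltas_summary(freqs_hz, power):
    """Return LTAS_F_median, LTAS_A_median, LTAS_Max_A and LTAS_Max_A_F.

    freqs_hz: bin centre frequencies in ascending order.
    power:    linear power in each bin (same length as freqs_hz).
    """
    half_power = sum(power) / 2
    cumulative = list(accumulate(power))
    # First bin at which the running power reaches half of the total.
    median_idx = next(i for i, c in enumerate(cumulative) if c >= half_power)
    # Bin holding the spectral peak.
    peak_idx = max(range(len(power)), key=power.__getitem__)
    return {
        "LTAS_F_median": freqs_hz[median_idx],
        "LTAS_A_median": power[median_idx],
        "LTAS_Max_A": power[peak_idx],
        "LTAS_Max_A_F": freqs_hz[peak_idx],
    }


# Toy spectrum (hypothetical) with most of its power at low frequencies.
ltas = ltas_summary(
    freqs_hz=[100, 200, 300, 400, 500],
    power=[4.0, 3.0, 2.0, 1.0, 0.0],
)
print(ltas)
```

In this toy case the peak frequency (100 Hz) lies below the median frequency (200 Hz), which is the relationship the feature table describes as expected.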
Classification results in depression groups and descriptive clinical measures (mean and SD).
| Group | Classification | N | YMRS | QIDS-C16 | BAI |
|---|---|---|---|---|---|
| Depression | Correct | 9 | 0.78 (0.97) | 12.67 (3.77) | 18.33 (6.12) |
| Depression | Incorrect | 3 | 1 (1) | 12 (3.60) | 16.67 (13.05) |
| Mixed depression | Correct | 15 | 4.43 (1.60) | 12.14 (4.11) | 29.14 (11.75) |
| Mixed depression | Incorrect | 2 | 4 (0) | 14 (2.83) | 16.50 (13.43) |
YMRS Young Mania Rating Scale, QIDS-C16 Quick Inventory of Depressive Symptomatology, BAI Beck Anxiety Inventory.
Classifier performance measures for depression groups (mixed symptoms as target).
| NPV | PPV | Spec | Sens | Acc | F1 | MCC |
|---|---|---|---|---|---|---|
| 0.82 | 0.83 | 0.75 | 0.88 | 0.83 | 0.86 | 0.64 |
NPV negative predictive value, PPV positive predictive value, Spec specificity, Sens sensitivity, Acc accuracy, F1 F1 score, MCC Matthews correlation coefficient.
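The performance measures for the depression groups can be recomputed from the subject counts in the classification table (9 correct and 3 incorrect for depression, 15 correct and 2 incorrect for mixed depression), taking mixed depression as the positive class. The sketch below uses the standard confusion-matrix formulas; the function name is hypothetical.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard measures derived from a 2x2 confusion matrix."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "NPV": tn / (tn + fn),
        "PPV": tp / (tp + fp),
        "Spec": tn / (tn + fp),
        "Sens": tp / (tp + fn),
        "Acc": (tp + tn) / (tp + fn + tn + fp),
        "F1": 2 * tp / (2 * tp + fp + fn),
        "MCC": (tp * tn - fp * fn) / mcc_denominator,
    }


# Mixed depression is the target: 15 detected (TP), 2 missed (FN).
# Depression is the negative class: 9 correct (TN), 3 misclassified (FP).
metrics = binary_metrics(tp=15, fn=2, tn=9, fp=3)
rounded = {name: round(value, 2) for name, value in metrics.items()}
print(rounded)
# {'NPV': 0.82, 'PPV': 0.83, 'Spec': 0.75, 'Sens': 0.88,
#  'Acc': 0.83, 'F1': 0.86, 'MCC': 0.64}
```

Unlike accuracy and F1, the MCC uses all four cells of the confusion matrix, which makes it informative when the two groups differ in size, as they do here (17 versus 12 subjects).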
Classification results in manic groups and descriptive clinical measures (mean and SD).
| Group | Classification | N | YMRS | QIDS-C16 | BAI |
|---|---|---|---|---|---|
| Mania | Correct | 17 | 12.82 (3.56) | 2.47 (1.54) | 14.94 (9.84) |
| Mania | Incorrect | 2 | 10.50 (4.95) | 1.50 (0.71) | 2 (2.82) |
| Mixed mania | Correct | 6 | 9.50 (2.74) | 9.33 (3.44) | 22.20 (10.42) |
| Mixed mania | Incorrect | 2 | 8.50 (0.71) | 11 (0) | 18.5 (12.01) |
YMRS Young Mania Rating Scale, QIDS-C16 Quick Inventory of Depressive Symptomatology (clinician version), BAI Beck Anxiety Inventory.
Classifier performance measures for manic groups (mixed symptoms as target).
| NPV | PPV | Spec | Sens | Acc | F1 | MCC |
|---|---|---|---|---|---|---|
| 0.89 | 0.75 | 0.89 | 0.75 | 0.86 | 0.75 | 0.57 |
NPV negative predictive value, PPV positive predictive value, Spec specificity, Sens sensitivity, Acc accuracy, F1 F1 score, MCC Matthews correlation coefficient.