Sabah Al-Hameed, Mohammed Benaissa, Heidi Christensen, Bahman Mirheidari, Daniel Blackburn, Markus Reuber.
Abstract
Neurodegenerative diseases causing dementia are known to affect a person's speech and language. Part of the expert assessment in memory clinics therefore routinely focuses on detecting such features. The current outpatient procedures examining patients' verbal and interactional abilities mainly focus on verbal recall, word fluency, and comprehension. By capturing neurodegeneration-associated characteristics in a person's voice, novel methods based on the automatic analysis of speech signals may provide additional information about a person's ability to interact, which could contribute to the diagnostic process. In this proof-of-principle study, we demonstrate that purely acoustic features, extracted from recordings of patients' answers to a neurologist's questions in a specialist memory clinic, can support the initial distinction between patients presenting with cognitive concerns attributable to progressive neurodegenerative disorders (ND) and those with Functional Memory Disorder (FMD, i.e., subjective memory concerns unassociated with objective cognitive deficits or a risk of progression). The study involved 15 FMD and 15 ND patients; a total of 51 acoustic features were extracted from the recordings. Feature selection was used to identify the most discriminating features, which were then used to train five different machine learning classifiers to differentiate between the FMD and ND classes, achieving a mean classification accuracy of 96.2%. Purely acoustic approaches have discriminative power that could be integrated into diagnostic pathways for patients presenting with memory concerns, and they are computationally less demanding than methods focusing on linguistic elements of speech and language that require automatic speech recognition and understanding.
Year: 2019 PMID: 31125389 PMCID: PMC6534304 DOI: 10.1371/journal.pone.0217388
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Participants’ details and test scores.
| Measure | FMD (n = 15) | ND (n = 15) | Cut-off | Max score | P-value |
|---|---|---|---|---|---|
| Age | 57.8 ± 2.0 | 63.7 ± 2.3 | N/A | N/A | p = 0.06 |
| | 60% | 53% | | | ns* |
| ACE-R | 93.0 ± 1.4 | 58 ± 5.21 | 88 | 100 | p < 0.0001 |
| MMSE | 28.9 ± 0.2 | 18.8 ± 2.0 | 26.3 | 30 | p < 0.0001 |
| PHQ9 | 5.6 ± 1.0 | 5.3 ± 2.0 | 5 | 27 | ns |
| GAD-7 | 4.7 ± 1.2 | 4.8 ± 1.5 | 5 | 21 | ns |
| Conversation duration (min) | range (10.1–32.3) | range (7.3–29.0) | | | |
ACE-R: Addenbrooke’s Cognitive Examination-Revised; MMSE: Mini-Mental State Examination; PHQ9: Patient Health Questionnaire-9; GAD-7: Generalised Anxiety Disorder 7-item scale. An unpaired t-test was used. ns* = not significant.
Details of the clinical session times expressed in minutes.
| Statistic | Group | Clinical session (conversation + verbal fluency test) | Conversation part only | Patient contribution to the conversation |
|---|---|---|---|---|
| Mean ± STD | FMD | 34.3 ± 9.9 | 17.9 ± 8.5 | 11.5 ± 6.3 |
| Mean ± STD | ND | 39.2 ± 8.0 | 19.4 ± 7.0 | 6.2 ± 4.5 |
| Range | FMD | 22.3–52.4 | 10.1–32.3 | 5.3–26.5 |
| Range | ND | 24.7–57.0 | 7.3–29.0 | 1.1–15.5 |
| Percentage | FMD | Not applicable | 50.9% | 63.0% |
| Percentage | ND | Not applicable | 49.79% | 32.4% |
STD: Standard deviation
Fig 1. The acoustic-only system for the identification of patients with neurodegenerative cognitive complaints.
Acoustic features.
| Features | Type | Number of features |
|---|---|---|
| Speech and silence statistics | Speech and silence features | 15 |
| Fundamental frequency (F0) | Phonation and voice quality | 3 |
| Harmonic-to-noise ratio (HNR) | Phonation and voice quality | 3 |
| Noise-to-harmonic ratio (NHR) | Phonation and voice quality | 3 |
| Shimmer scales | Phonation and voice quality | 3 |
| Jitter scales | Phonation and voice quality | 3 |
| Number of voice breaks | Phonation and voice quality | 3 |
| Degree of voice breaks | Phonation and voice quality | 3 |
| Mel frequency cepstral coefficients (MFCC) | Spectral features | 5 |
| Filter bank energy coefficient (Fbank) | Spectral features | 5 |
| Spectral Subband Centroid (SSC) | Spectral features | 5 |
| Total | | 51 |
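The speech and silence statistics in the table above are timing measures of each patient turn. As a minimal sketch of how such measures could be derived, the function below computes a few of them from speech segment boundaries. The function name, the dictionary keys, the input format (sorted, non-overlapping segments in seconds), and the use of the sample standard deviation are all assumptions made for illustration; this record does not give the authors' exact definitions or their segmentation method.

```python
import statistics


def turn_timing_features(speech_segments, turn_start, turn_end):
    """Timing features for one turn (hypothetical helper, not the authors' code).

    speech_segments: sorted, non-overlapping (start, end) times in seconds,
    all lying inside [turn_start, turn_end]. At least one segment is expected.
    """
    turn_time = turn_end - turn_start
    speech = [end - start for start, end in speech_segments]

    # Pauses are the gaps before, between, and after the speech segments.
    edges = [turn_start]
    for start, end in speech_segments:
        edges.extend((start, end))
    edges.append(turn_end)
    pauses = [edges[i + 1] - edges[i] for i in range(0, len(edges), 2)]
    pauses = [p for p in pauses if p > 0]

    several = len(speech) > 1  # sample STD/VAR need at least two segments
    return {
        "mean_speech_segment": statistics.mean(speech),
        "std_speech_segment": statistics.stdev(speech) if several else 0.0,
        "var_speech_segment": statistics.variance(speech) if several else 0.0,
        "speech_to_turn_ratio": sum(speech) / turn_time,
        "pause_to_turn_ratio": sum(pauses) / turn_time,
        "max_pause_to_turn_ratio": max(pauses, default=0.0) / turn_time,
        "pauses_per_second": len(pauses) / turn_time,
        # Undefined when the turn contains no pause at all.
        "max_speech_to_max_pause": max(speech) / max(pauses) if pauses else None,
    }
```

Calling `turn_timing_features([(0.5, 2.5), (3.5, 4.5)], 0.0, 5.0)` describes a 5-second turn with 3 seconds of speech and three pauses. Filler-word handling, which some of the ranked features depend on, is left out because it would need a transcript or a filler detector.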
Top 22 features selected by the wrapper and embedded methods, with their Mann-Whitney U-test statistics.
| Rank | Features | U | P | Rank | Features | U | P |
|---|---|---|---|---|---|---|---|
| 1 | Mean time of all speech segments excluding filler words | 17.0 | 0.00007 | 12 | Mean response time | 42.0 | 0.004 |
| 2 | Ratio of max speech segment to the max pause time | 19.0 | 0.0001 | 13 | VAR degree of voice breaks | 44.0 | 0.004 |
| 3 | STD of total speech segments time excluding filler words | 19.0 | 0.0001 | 14 | STD of number of voice breaks | 44.5 | 0.004 |
| 4 | STD of the speech segments time | 20.0 | 0.0001 | 15 | Mean degree of voice breaks | 45.5 | 0.005 |
| 5 | VAR of total speech segments time excluding filler words | 20.0 | 0.0001 | 16 | Mean of Fbank coefficients | 49.0 | 0.008 |
| 6 | Ratio of max pause time to the total turn time | 24.0 | 0.0002 | 17 | Min of Fbank coefficients | 49.5 | 0.009 |
| 7 | Ratio of total pauses time to the total turn time | 24.5 | 0.0002 | 18 | VAR of SSC coefficients | 52.5 | 0.01 |
| 8 | Ratio of total speech segments time to the total turn time | 26.0 | 0.0003 | 19 | STD of SSC coefficients | 59.5 | 0.02 |
| 9 | VAR of number of voice breaks | 26.0 | 0.0003 | 20 | STD of MFCC coefficients | 60.0 | 0.03 |
| 10 | Ratio of total No. of pauses to the total turn time | 27.0 | 0.0003 | 21 | VAR of Fbank coefficients | 60.5 | 0.03 |
| 11 | Mean number of voice breaks | 30.0 | 0.0006 | 22 | Mean of MFCC coefficients | 63.0 | 0.04 |
U: Mann-Whitney U-test statistic; P: p-value.
Sample sizes: n1 = n2 = 15.
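To make the U and P columns easier to interpret, the sketch below computes a Mann-Whitney U statistic for two independent samples and a two-sided p-value from the normal approximation. The function names are hypothetical, and the choice of the normal approximation without tie or continuity correction is an assumption; this record does not state whether the authors used an exact or an approximate test.

```python
import math


def mann_whitney_u(x, y):
    """Return the smaller of the two U statistics; tied pairs count 0.5."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)


def normal_approx_p(u, n1, n2):
    """Two-sided p-value for U under the normal approximation.

    Assumption: no tie correction and no continuity correction.
    """
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2.0))
```

With n1 = n2 = 15, the statistic ranges from 0 (complete separation of the groups) to 112.5 (no separation), so a smaller U indicates a more discriminating feature. Under this approximation, `normal_approx_p(63.0, 15, 15)` gives a p-value of about 0.04, in line with the last entry of the table.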
Fig 2. Nested k-fold cross-validation with k = 5.
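The figure itself is not reproduced in this record, so the following is only a sketch of how nested k-fold index splits could be generated for the 30 recordings. The function names are hypothetical, and using five folds in both the outer and the inner loop, as well as plain rather than class-stratified shuffling, are assumptions.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle the indices and deal them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, k_outer=5, k_inner=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only from
    outer_train, so the outer test fold never influences model selection.
    """
    outer_folds = k_fold_indices(range(n_samples), k_outer, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, k_inner, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for n, fold in enumerate(inner_folds)
                           if n != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits
```

In a typical nested scheme, the inner splits would drive feature selection and hyperparameter choice, and accuracy would be reported only on the outer test folds. With 30 samples, each outer test fold holds 6 recordings and each outer training set holds 24.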
First scenario classification accuracies under different feature subsets and classifiers.
| Classifier | Accuracy using all features (51) | Accuracy with feature selection | No. of selected features | Accuracy using features with a statistically significant P-value | No. of selected features |
|---|---|---|---|---|---|
| 9 | 11 | ||||
| 90.0 ± 0.27 % | 11 | 15 | |||
| 85.0 ± 0.40 % | 93.0 ± 0.16 % | 11 | 90.0 ± 0.24 % | 21 | |
| 20 | 22 | ||||
| 90.0 ± 0.16 % | 14 | 21 | |||
| 13 | 18 |
Classification accuracies for the second scenario (augmented dataset).
| Classifier | Accuracy using all features (51) ± STD | Accuracy with feature selection ± STD |
|---|---|---|
| 84.0 ± 0.23 % | 88.0 ± 0.28 % | |
| 81.0 ± 0.26 % | 87.0 ± 0.29 % | |
| 91.0 ± 0.14 % | ||