Israel Martínez-Nicolás, Thide E Llorente, Francisco Martínez-Sánchez, Juan José G Meilán.
Abstract
Background: The field of voice and speech analysis has become increasingly popular over the last 10 years, and articles on its use in detecting neurodegenerative diseases have proliferated. Many studies have identified characteristic speech features that accurately distinguish healthy older adults from those with mild cognitive impairment or Alzheimer's disease. Speech analysis has been singled out as a cost-effective and reliable method for detecting both conditions. In this research, a systematic review was conducted to determine these features and their diagnostic accuracy.
Keywords: Alzheimer's disease; language impairment; mild cognitive impairment; speech analysis; speech impairment
Year: 2021 PMID: 33833713 PMCID: PMC8021952 DOI: 10.3389/fpsyg.2021.620251
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. PRISMA flowchart of the process followed to select studies for the review.
Descriptive studies on the voice and speech of patients with Alzheimer's disease and mild cognitive impairment.
| Study | Sample (n) | Speech task | Main findings |
| Hoffmann et al. ( | Control (15) and AD (30) | Spontaneous speech: explain why they are at the clinic and recount events and daily activities | Longer speech time and phonation time; higher voice breaks/hesitations >30 ms; lower speech rate and articulation rate |
| Horley et al. ( | Control (20) and AD (20) | Sentence repetition with a given emotional tone and reading | Differences in F0 (only when expressing surprise or happiness), F0 SD, and speech rate (when reading, but not when repeating). |
| Martínez-Sánchez et al. ( | Control (17) and AD (25) | Oral reading | F0, F0 SD, phonation time, proportion of pauses, % voiceless segments, voice breaks |
| Nasrolahzadeh et al. ( | Control (30) and AD (30) | Telling personal stories and conversation | AD patients show fewer variations in the speech signal |
| Nasrolahzadeh et al. ( | Control (30) and AD (30) | Telling personal stories and conversation | Higher-order spectral analysis shows that the spontaneous speech signals of AD patients are less chaotic and non-linear than those of healthy subjects. |
| Beltrami et al. ( | Control (48) and cognitively impaired (48: 16 aMCI, 16 mdMCI, and 16 eDem) | Describing a complex picture, a typical working day, and recalling the last dream remembered | Acoustic and rhythmic features can differentiate between multidomain MCI, early dementia, and controls. Some acoustic features discriminated between controls and aMCI. |
| Meilán et al. ( | AD (21) | Reading | % of voiceless segments explains a significant portion of the variance in the overall scores obtained in the neuropsychological test of patients with AD |
| Meilán et al. ( | Control (102), MCI (38), and AD (42) | Reading | Semantic and phonetic verbal fluency tasks explain 30.1% of the variance of the unvoiced percentage and 26.4% of the variance of the percentage of voice breaks. |
| De Looze et al. ( | Control (36), MCI (16), and AD (18) | Reading sentences of different lengths and syntactic complexity | Changes in speech chunking and speech timing when reading cognitively demanding sentences may be markers of MCI and AD, as a consequence of impairments in working memory and attention. |
| Qiao et al. ( | Control (24), MCI (20), and AD (20) | “Cookie theft” picture description task | The 7 parameters found correlate with cognitive function. Stepwise regression showed that the maximum and average duration of silence segments, the percentage of the duration of silence, and the minimum duration of phrasal segments explain 47.8% of the variation in MMSE scores. |
| Meilán et al. ( | Non-degenerative MCI (73) and MCI preAD (13) | Reading task | Duration and phonation time, pause number, and several frequency and intensity features differentiate people with MCI who develop Alzheimer's disease from those who do not. |
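Many of the descriptive findings above rest on a handful of temporal measures: phonation time, proportion of pauses, and speech rate versus articulation rate. The following is a minimal sketch of how these measures relate, not code from any reviewed study; the frame length, voicing mask, and syllable count are assumed toy inputs.

```python
def temporal_features(voiced, frame_s, n_syllables):
    """Compute phonation time, pause proportion, and speech/articulation
    rates from a frame-level voicing mask (True = voiced frame).

    Illustrative sketch only; the reviewed studies derive the mask from
    an energy/periodicity analysis of the actual recording.
    """
    total_s = len(voiced) * frame_s                 # total task duration (s)
    phonation_s = sum(voiced) * frame_s             # time spent phonating (s)
    pause_prop = 1.0 - phonation_s / total_s        # proportion of pauses
    speech_rate = n_syllables / total_s             # syll/s, pauses included
    articulation_rate = n_syllables / phonation_s   # syll/s, pauses excluded
    return {
        "phonation_time_s": phonation_s,
        "pause_proportion": pause_prop,
        "speech_rate": speech_rate,
        "articulation_rate": articulation_rate,
    }

# Toy example: 100 frames of 10 ms, 70 of them voiced, 3 syllables uttered
mask = [True] * 70 + [False] * 30
feats = temporal_features(mask, 0.01, 3)
```

Note that speech rate divides by the total duration while articulation rate divides only by phonation time, which is why the review treats them as separate markers.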
Predictive studies on the early diagnosis of Alzheimer's disease.
| Study | Sample (n) | Speech task | Features | Accuracy | Classification method |
| López-de-Ipiña et al. ( | Control (20) and AD (20) | Telling stories and conversation | Combination of two feature sets: emotional speech analysis (acoustic, voice quality, and duration features) and emotional temperature (prosodic and paralinguistic features) | AD: 75.2–97.7% | Artificial neural networks |
| López-de-Ipiña et al. ( | Control (20) and AD (20) | Telling stories and conversation | % voiced, % voiceless segments | AD: 83.7–93.79% | Machine learning |
| Martínez-Sánchez et al. ( | Control (35) and AD (35) | Reading task | The cut-off point for speech rate is 3.08 syllables per second, and for articulation rate it is 4.27 syllables per second. | AD: 80% | ROC curve |
| Meilán et al. ( | Control (36) and AD (30) | Reading task | Percentage of voice breaks, number of periods of voice, number of voice breaks, and shimmer (apq3) | AD: 84.4% | Linear discriminant analysis |
| Khodabakhsh et al. ( | Control (27) and AD (27) | Conversation | Combination of 13 features (voice activity, articulation, and rate of speech-related features) | AD: 75.5–94.3% | Machine learning |
| Khodabakhsh and Demiroglu ( | Control (51) and AD (28) | Unstructured conversation | Silence ratio | AD: 78.5–83.5% | Machine learning |
| López-de-Ipiña et al. ( | Control (20) and AD (20) | Telling stories and conversation | Combination of features | AD: 96.89% | Machine learning |
| López-de-Ipiña et al. ( | Control (20) and AD (20) | Telling stories and conversation | Automatic selection of spontaneous speech features and of maximum, minimum, variance, standard deviation, median, and mode average for full signal and voiced signal | AD: 87.30–92.43% | Machine learning |
| Martínez-Sánchez et al. ( | Control (82) and AD (45) | Reading | Standard deviation of the duration of syllabic intervals | AD: 87% | ROC curve |
| Nasrolahzadeh et al. ( | Control (30) and AD (30) | Telling personal stories and conversation | Higher-order spectral analysis | AD: 94.18–97.71% | Machine learning |
| König et al. ( | Control (15), MCI (23), and AD (26) | Counting backward task, sentence repeating task, image description task, and verbal fluency task. | Combination of meaningful vocal features extracted from several tasks | MCI: 79%; | Machine learning |
| López-de-Ipiña et al. ( | Control (187) and MCI (38) | Categorical verbal fluency task (animals) | Several types of features are used to model both linear and non-linear disfluencies and speech. A total of 920 features are obtained. The best results are achieved with the 25-feature set. | MCI: 92–95% | Deep learning |
| López-de-Ipiña et al. ( | Three different samples: VF task (187 control and 38 patients with MCI); picture description (12 control and 6 with AD); spontaneous language (50 control and 20 with AD) | Categorical fluency task (control vs. MCI); picture description task and spontaneous speech (control vs. AD) | Most relevant features automatically extracted for every comparison. | MCI: 73%; | Deep learning |
| Kato et al. ( | Control (91), MCI (91), and AD (91) | Answering a questionnaire on birthplace (T1), the name of his/her elementary school (T2), time orientation (Q2), and repeating 3-digit numbers backward (Q6) | Speech prosody-based cognitive impairment rating (SPCIR; 128 prosody features) | AD: 74.7–89.5% (from Q2); | ROC curve |
| Themistocleous et al. ( | Control (30) and MCI (25) | Reading (vowels were segmented from sentences) | Vowel formants, F0, vowel duration | MCI: 75–83% | ROC curve |
| Toth et al. ( | Control (36) and MCI (48) | Immediate and delayed recall of a short film | Combination of features (duration, speech rate, articulation rate, and pause-related features). | MCI: 75% | Machine learning |
| Fraser et al. ( | Control (97) and AD (167) | “Cookie theft” picture description task | Combination of 35 to 50 acoustic, semantic, and syntactic features. | AD: 78.72–81.92% | Machine learning |
| Fraser et al. ( | Control (29) and MCI (26) | Cookie theft picture description task and reading task | Speech features + eye movement features + language features | MCI: 41–83% | Machine learning |
| Gosztolya et al. ( | Control (25), MCI (25), and AD (25) | Immediate and delayed recall of a short film | Set 1: acoustic features (speech rate and the number and duration of silent and filled pauses) Set 2: acoustic features + linguistic features | Set 1: 74–82% | Machine learning |
| König et al. ( | SCI (56), MCI (44), VD (38), and AD (27) | Fluency, picture description, counting down, and free speech tasks | Different combination of features extracted for every comparison | SCI vs. AD = 92%, | Machine learning |
| Martínez-Sánchez et al. ( | Control (98) and AD (47) | Reading | Age, minimum amplitude, maximum amplitude difference, mean and standard deviation of the NHR; asymmetry; standard deviation in the first formant; formant 3 bandwidth; standard deviation of the Acoustic Voice Quality Index; tone variability; Normalized Pairwise Variability Index | AD: 92.4% | Discriminant analysis |
| Al-Hameed et al. ( | Control with memory complaints (15) and neurodegenerative disorders (15: 10 AD, 2 aMCI, 2 frontotemporal dementia) | Conversation | Different sets of features (best with 9 features) | Neurodegenerative disorders: 81–92% | Machine learning |
| Chien et al. ( | Control (30) and AD (30) | Answers to neuropsychological tests | Feature sequence (a representation of various elements in speech) | AD: AUC 0.838 | Machine learning |
| Nagumo et al. ( | Control (6343), MCI (1601), global cognitive impairment (367), MCI + GCI (468) | Vowel utterances, tongue twister, diadochokinetic rate, short sentences | Set of temporal and acoustic features. | MCI: AUC 0.61 | Machine learning |
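One of the simplest predictive results tabulated above is the pair of ROC-derived cut-off points reported by Martínez-Sánchez et al.: 3.08 syllables/s for speech rate and 4.27 syllables/s for articulation rate. The sketch below shows how such cut-offs could be applied as a screening rule; the function name and toy values are assumptions, and this is a heuristic for illustration, not a diagnostic tool.

```python
SPEECH_RATE_CUTOFF = 3.08       # syllables/s, as reported in the table above
ARTICULATION_CUTOFF = 4.27      # syllables/s, as reported in the table above

def flag_for_assessment(speech_rate, articulation_rate):
    """Flag a speaker whose rates fall below the reported cut-off points.

    A screening heuristic only: in the reviewed studies such cut-offs
    accompany, and never replace, clinical evaluation.
    """
    return (speech_rate < SPEECH_RATE_CUTOFF
            or articulation_rate < ARTICULATION_CUTOFF)

# Toy values: a slow reader is flagged, a typical reader is not
flagged = flag_for_assessment(2.9, 4.0)
not_flagged = flag_for_assessment(3.5, 4.5)
```

The ROC-curve studies in the table choose these thresholds to trade off sensitivity against specificity, which is why the reported accuracies (e.g., 80% for AD) are tied to the specific cut-off values.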
Figure 2. Quality assessment of the descriptive studies using the JBI appraisal checklist, rated as a high, low, or unclear risk of bias for each question.
Figure 3. Proportion of descriptive studies with a low, high, or unclear risk of bias.
Figure 4. Quality assessment of the predictive studies using the QUADAS-2 checklist, rated as a high, low, or unclear risk of bias for each domain, together with applicability concerns.
Figure 5. Proportion of predictive studies with a low, high, or unclear risk of bias.
Review of studies on the distinctive speech markers in older people.
| Speech time and phonation time | Longer in MCI (Toth et al., | Longer in AD than in healthy control (Hoffmann et al., |
| Number and proportion of pauses | Increase in the length of silent pauses (voiceless), producing a lower speech rate (Toth et al., | Increased number and proportion of pauses in AD (Martínez-Sánchez et al., |
| Voice breaks/hesitations >30 ms | Higher in AD (Hoffmann et al., | |
| % voiceless segments | Higher in AD. It explains a significant portion of the variance in the overall scores obtained in the neuropsychological testing of patients with AD (Meilán et al., | |
| Prosodic rate: decrease in speech rate with hesitations, as well as in articulatory rate without hesitations (fewer phonemes per second). | Presence of stammers and articulatory disfluencies that interrupt speech; longer hesitations (López-de-Ipiña et al., | Lower speech rate and articulation rate (Hoffmann et al., |
| Affective prosody | Impairments in affective prosody expression in the AD group when expressing surprise or happiness, but not sadness (Horley et al., | |
| Spectrum features | Changes in spectrum features such as the spectral centroid or mel-frequency cepstral coefficients (MFCCs), and the spectral energy, flux, variance, skewness, kurtosis, and slope (Fraser et al., | |
| Fundamental frequency | Altered (Themistocleous et al., | Lower mean of the fundamental frequency and standard deviation causing a “flat” speech prosody (Horley et al., |
| Autocorrelation: fluctuation of values of autocorrelation of the fundamental frequency in a specific period | Wider variability (Meilán et al., | |
| Phonological planning: formants | Impairments in formant features (Themistocleous et al., | Distortion in the parameters F2 and F3 (Meilán et al., |
| Syllabic variability | Altered mean duration of syllables (Martínez-Sánchez et al., | |
| Noise-to-harmonics ratio (NHR) | Lower ratio in AD than in people with NPS (Meilán et al., | |
| Continuity of harmonic segments | Lower in AD (König et al., | |
| Shimmer | Decreased amplitude perturbation quotient of sound between 3 and 11 vocal pulses (Shimmer_dB apq3; Shimmer_dB apq11) (Meilán et al., | |
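Two of the markers tabulated above, the percentage of voiceless segments and voice breaks longer than 30 ms, reduce to counting frames and runs in a frame-level voicing sequence. The sketch below makes that assumption explicit; real analyses use a pitch tracker's voicing decisions (e.g., a Praat voice report), and the frame length and sequence here are toy values.

```python
def voiceless_markers(voiced, frame_s, break_min_s=0.03):
    """Percentage of voiceless frames and number of voice breaks,
    where a break is a voiceless run longer than break_min_s
    (the >30 ms criterion from the table above).

    Illustrative sketch; `voiced` is a list of booleans, one per frame.
    """
    pct_voiceless = 100.0 * voiced.count(False) / len(voiced)
    breaks = 0
    run = 0                       # length of the current voiceless run
    for v in voiced:
        if not v:
            run += 1
        else:
            if run * frame_s > break_min_s:
                breaks += 1       # close a run long enough to count
            run = 0
    if run * frame_s > break_min_s:
        breaks += 1               # trailing voiceless run
    return pct_voiceless, breaks

# 10 ms frames: one 50 ms voiceless run (a break) and one 20 ms run (not)
seq = [True] * 5 + [False] * 5 + [True] * 5 + [False] * 2 + [True] * 3
pct, n_breaks = voiceless_markers(seq, 0.01)
```

Under this toy input the 50 ms run is counted as a break while the 20 ms run is not, mirroring why the studies above fix a minimum break duration rather than counting every voiceless frame.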