Ulla Petti, Simon Baker, Anna Korhonen.
Abstract
OBJECTIVE: In recent years, numerous studies have achieved promising results in Alzheimer's Disease (AD) detection using automatic language processing. We systematically review these articles to understand the effectiveness of this approach, identify any issues, and report the main findings that can guide further research.
Keywords: Alzheimer’s disease; dementia; language; natural language processing; speech
Year: 2020 PMID: 32929494 PMCID: PMC7671617 DOI: 10.1093/jamia/ocaa174
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. Flow diagram of study selection.
Information extracted from the studies
| No | Study | Impaired group size (s) | Control group size | AD/MCI | Data collection method | Most informative language and speech features | Number of samples used to train the model | Classification algorithm | Classification performance |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Ammer and Ayed 2018 | AD n = 242 | n = 242 | AD | SS | Repetition, word errors, MLU morphemes, POS | – | SVM, NN, DT | Precision = 79% |
| 2 | Beltrami et al 2018 | aMCI n = 16, mdMCI n = 16, early dementia n = 16 | n = 48 | MCI | SS | Acoustic, lexical, syntactic | – | – | – |
| 3 | Boye et al 2014 | AD n = 5 | n = 5 | AD | SS | Lexical and semantic deficit, reduced conversation | – | – | – |
| 4 | Chien et al 2018 | 1) AD n = 15, 2) AD n = 30 | 1) n = 15, 2) n = 30 | AD | SS | Speech length, non-silence tokens | 150 samples from 60 participants | RNN | AUC = 0.956 |
| 5 | Clark et al 2014 | MCI n = 23, AD n = 10 | n = 25 | AD, MCI | SVF, PVF | Semantic similarity of words | – | – | – |
| 6 | Clark et al 2016 | MCI-con n = 24, MCI-non n = 83 | n = 51 | MCI | SVF, PVF | Coherence, lexical frequency, graph theoretical measures | 158 | Random forest, SVM, NB, MLP | Acc = 81–84% |
| 7 | Fang et al 2017 | MCI-con n = 1 | n = 2 | MCI | SS | Unique and specific words, grammatical complexity | – | – | – |
| 8 | Fraser et al 2016 | AD n = 167 | n = 97 | AD | SS | Semantic, acoustic, syntactic, information | 473 samples from 264 participants | Logistic regression | Acc = 81% |
| 9 | Garrard et al 2017 | Probable AD n = 5 | n = 0 | AD | SS | – | – | – | – |
| 10 | Gosztolya et al 2016 | MCI n = 48 | n = 36 | MCI | SS | Filled pauses | 84 | SVM | Acc = 88.1% |
| 11 | Gosztolya et al 2019 | MCI n = 25, early AD n = 25 | n = 25 | AD, MCI | SS | Semantic, morphological, acoustic attributes | 75 | SVM | Acc = 86% |
| 12 | Guinn et al 2014 | AD n = 28 | n = 28 | AD | SS | Ratio, POS, lexical, pauses, fillers | 56 | NB, DT | Precision = 80% |
| 13 | Hernandez-Dominguez et al 2018 | AD n = 169, MCI n = 19 | n = 74 | AD, MCI | SS | Information coverage, auxiliary verbs, hapax legomena | 236 training, 26 testing | SVM, Random Forest | Acc = 87–94% |
| 14 | Khodabakhsh et al 2014a | AD n = 27 | n = 27 | AD | SS | Log voicing ratio, average absolute delta formant and pitch | 54 | SVM, DT | Acc = 88–94% |
| 15 | Khodabakhsh et al 2014b | AD n = 20 | n = 20 | AD | SS | Fillers, unintelligibility, number of words, confusion, pause and no-answer rate | 40 | SVM, DT | Acc = 90% |
| 16 | Khodabakhsh et al 2015 | AD n = 28 | n = 51 | AD | SS | Ratio, POS, speech rate features | 79 | SVM, NN, NB, CTree | Acc = 84% |
| 17 | Konig et al 2015 | MCI n = 23, AD n = 26 | n = 15 | AD, MCI | SS, SVF, OT | Speech continuity, ratio | 64 | SVM | EER = 13–21% |
| 18 | Konig et al 2018 | AD n = 27, mixed dementia n = 38, MCI n = 44, SCI n = 56 | n = 0 | AD, MCI | SS, SVF, PVF, OT | Location of first word, words’ distribution in time | 165 | SVM | Acc = 86% |
| 19 | Lopez-de-Ipina et al 2013a | AD n = 20 | n = 20 | AD | SS | Impoverished vocabulary, limited replies | 40 | MLP | Acc = 75–94.6% |
| 20 | Lopez-de-Ipina et al 2013b | Early AD n = 1, intermediate AD n = 2, advanced AD n = 2 | n = 5 | AD | SS | Fluency, acoustic | 10 | SVM, MLP, kNN, DT, NB | Acc = 93.79% |
| 21 | Lopez-de-Ipina et al 2015 | AD n = 20 | n = 20 | AD | SS | Duration, time, frequency | 40 | MLP, KNN | Acc = 95% |
| 22 | Lopez-de-Ipina et al 2018 | 1) AD n = 6, 2) AD n = 20, 3) MCI n = 38 | 1) n = 12, 2) n = 20, 3) n = 62 | AD, MCI | SS, SVF | Voicing, pauses, F0, harmonicity | 18, 40, 100 | MLP, CNN | Acc = 73–95% |
| 23 | Luz 2018 | AD n = 214 recordings (number of recordings reported) | n = 184 recordings | AD | SS | Vocalisation, speech rate, number of utterances across discourse event | 398 | NB | Acc = 68% |
| 24 | Martinez de Lizarduy et al 2017 | 1) AD n = 6, 2) AD n = 20, 3) MCI n = 38 | 1) n = 12, 2) n = 20, 3) n = 62 | AD, MCI | SS | Voicing, pauses, F0, harmonicity | 18, 40, 100 | kNN, SVM, MLP, CNN | Acc = 80–95% |
| 25 | Martinez-Sanchez et al 2016 | Possible AD n = 45 | n = 82 | AD | OT | Syllable intervals and their variation | – | – | AUC = 0.87 |
| 26 | Mirzaei et al 2018 | Early AD n = 16, MCI n = 16 | n = 16 | AD, MCI | OT | HNR, voice length, silences | 48 | kNN, SVM, DT | – |
| 27 | Rentoumi et al 2017 | AD n = 30 | n = 30 | AD | OT | – | – | NB, SVM | Acc = 89% |
| 28 | Sadeghian et al 2017 | AD n = 26 | n = 46 | AD | SS | Long pauses, pause and speech duration | 65 training, 7 testing | MLP | Acc = 94.4% |
| 29 | Satt et al 2013 | MCI n = 43, AD n = 27 | n = 19 | AD, MCI | SS, OT | Verbal reaction time, voiced segments | 89 | SVM | EER = 15.5–18% |
| 30 | Toth et al 2015 | MCI n = 32 | n = 19 | MCI | SS | Pauses, tempo | 153 samples from 51 participants | SVM, Random Forest | Acc = 82.4% |
| 31 | Toth et al 2018 | MCI n = 48 | n = 36 | MCI | SS | Pausation, tempo and duration | 84 | NB, Random Forest, SVM | Acc = 75% |
| 32 | Warnita et al 2018 | AD n = 169 | n = 98 | AD | SS | Feature set from Interspeech 2010 | 488 samples from 267 participants | GCNN | Acc = 73.6% |
| 33 | Zimmerer et al 2016 | AD n = 48 | n = 38 | AD | SS | Semantic errors, bigram and trigram proportions | – | Logistic regression | – |
Abbreviations: Acc, accuracy; AD, Alzheimer’s disease; aMCI, amnestic mild cognitive impairment; AUC, area under curve; CNN, convolutional neural networks; CTree, classification tree; DT, decision tree; EER, equal error rate; GCNN, gated convolutional neural networks; HNR, harmonics-to-noise ratio; kNN, k-nearest neighbor; MCI, mild cognitive impairment; MCI-con, mild cognitive impairment later converted into AD; MCI-non, mild cognitive impairment not converted into AD; MD, mixed dementia; mdMCI, multiple domain mild cognitive impairment; MLP, multilayer perceptron; MLU, mean length of utterance; NB, Naive Bayes; OT, other tasks; POS, part-of-speech; SCI, subjective cognitive impairment; SS, spontaneous speech; SVF, semantic verbal fluency; SVM, support vector machine.
Participant information
| Participant groups (total number of datasets including the group) | Information variable (number of datasets including this information) | Mean (SD) | Min | Max |
|---|---|---|---|---|
| | | | 2 | 242 |
| | | | 57 | 76 |
| | | | 9 | 18 |
| | | | 1 | 83 |
| | | | 57 | 78 |
| | | | 11 | 16 |
| | | | 1 | 242 |
| | | | 66 | 80 |
| | | | 8 | 15 |
| | | | 16 | 38 |
| | | | 66 | 79 |
| | | | 9 | 9 |
Abbreviations: AD, Alzheimer’s disease; aMCI, amnestic mild cognitive impairment; MCI, mild cognitive impairment; MCI-con, mild cognitive impairment later converted into AD; MCI-non, mild cognitive impairment not converted into AD; MD, mixed dementia; mdMCI, multiple domain mild cognitive impairment; SD, standard deviation.
Figure 2. Division of language tests used to identify different health conditions.
Figure 3. Most informative language and speech features across SS, VF, and OT tasks (AD, Alzheimer’s Disease; MCI, mild cognitive impairment; OT, other tasks; POS, Part-of-Speech; SD, Standard Deviation; SS, spontaneous speech; VF, verbal fluency).
Details of ML methods used and the performance achieved. “Average of all reported outcomes” is the average of every value reported across the studies that used the given ML algorithm and performance measure. “Average of best reported outcomes” is the average of the best value reported in each study (1 value per study) for the given ML algorithm and performance measure.
| Classification algorithm | Outcome average | AD vs healthy control: acc (n) | AD vs healthy control: AUC (n) | AD vs healthy control: precision (n) | AD vs healthy control: EER (n) | CD vs healthy control: acc (n) | CD vs healthy control: AUC (n) | CD vs healthy control: precision (n) | CD vs healthy control: EER (n) |
|---|---|---|---|---|---|---|---|---|---|
| Neural Nets (NNs) (n = 17) | average of all reported outcomes | 86% (17) | – | 0.64 (4) | – | 65% (4) | – | – | – |
| Neural Nets (NNs) (n = 17) | average of best reported outcomes | 88% (6) | 0.96 (1) | 0.69 (1) | – | 69% (2) | – | – | – |
| Support Vector Machines (SVMs) (n = 16) | average of all reported outcomes | 81% (33) | – | 0.68 (4) | 14% (2) | 78% (13) | – | – | 19% (3) |
| Support Vector Machines (SVMs) (n = 16) | average of best reported outcomes | 88% (9) | – | 0.79 (1) | 14% (2) | 77% (6) | – | – | 19% (2) |
| Decision Trees (DTs) (n = 11) | average of all reported outcomes | 82% (24) | – | 0.63 (5) | – | 78% (9) | – | – | – |
| Decision Trees (DTs) (n = 11) | average of best reported outcomes | 90% (4) | – | 0.96 (2) | – | 80% (3) | – | – | – |
| Naïve Bayes (NB) (n = 7) | average of all reported outcomes | 81% (8) | – | 0.81 (1) | – | 67% (1) | – | – | – |
| Naïve Bayes (NB) (n = 7) | average of best reported outcomes | 81% (4) | – | 0.81 (1) | – | 67% (1) | – | – | – |
Abbreviations: acc, accuracy; AD, Alzheimer’s disease; AUC, area under curve; CD, cognitive decline; EER, equal error rate.
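The two averaging schemes described in the table caption can be illustrated with a minimal Python sketch. The study names and accuracy values below are hypothetical placeholders, not data from the reviewed studies, and the helper `summarise` is an illustrative name rather than anything used by the authors. It assumes that "best" means the highest value for measures such as accuracy and the lowest value for error measures such as EER.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical values (in %), NOT taken from the reviewed studies.
# Each tuple: (study, algorithm, reported value of one performance measure).
reported = [
    ("Study A", "SVM", 80.0),
    ("Study A", "SVM", 90.0),
    ("Study A", "SVM", 85.0),
    ("Study B", "SVM", 70.0),
    ("Study B", "SVM", 74.0),
    ("Study C", "NB", 68.0),
]


def summarise(rows, algorithm, higher_is_better=True):
    """Return both averages, with their counts, for one algorithm."""
    by_study = defaultdict(list)
    for study, algo, value in rows:
        if algo == algorithm:
            by_study[study].append(value)

    # Average of all reported outcomes: every value counts once.
    all_values = [v for values in by_study.values() for v in values]

    # Average of best reported outcomes: one value per study.
    # Assumption: "best" is the maximum unless the measure is an error rate.
    pick_best = max if higher_is_better else min
    best_values = [pick_best(values) for values in by_study.values()]

    return {
        "average_of_all": mean(all_values),
        "n_all": len(all_values),
        "average_of_best": mean(best_values),
        "n_best": len(best_values),
    }


svm = summarise(reported, "SVM")
print(svm)
```

With this reading, the number in parentheses after each table entry corresponds to `n_all` for the first scheme and to `n_best` (one value per study) for the second, which is why the counts differ between the two rows for the same algorithm.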
Most effective technologies
| ID | Study | Most effective technologies | Classification performance |
|---|---|---|---|
| 1 | Ammer and Ayed 2018 | feature selection: kNN; classifier: SVM | precision = 79% |
| 2 | Beltrami et al 2018 | Acoustic features | – |
| 3 | Boye et al 2014 | – | – |
| 4 | Chien et al 2018 | bidirectional LSTM RNN | AUC = 0.956 |
| 5 | Clark et al 2014 | Semantic similarity features | – |
| 6 | Clark et al 2016 | Classifiers with novel scores including MRI data | Acc = 81–84% |
| 7 | Fang et al 2017 | length of sentence, unique words, non-specific, and specific words | – |
| 8 | Fraser et al 2015 | Using 35 features | Acc = 82% |
| 9 | Garrard et al 2017 | Certain scripts and motives | – |
| 10 | Gosztolya et al 2016 | automatically selected feature set, correlation-based feature selection technique | Acc = 88.1% |
| 11 | Gosztolya et al 2019 | AD: combination of linguistic and acoustic features; MCI: semantic and acoustic features | Acc = 86% |
| 12 | Guinn et al 2014 | go-ahead utterances and certain fluency measures | precision = 80% |
| 13 | Hernandez-Dominguez et al 2018 | AD detection: RFC with coverage and linguistic features; decline detection: RFC with a combination of features with P-value <.001 when correlating with cognitive impairment | Acc = 87–94% |
| 14 | Khodabakhsh et al 2014a | SVM, logarithm of voicing ratio, average absolute delta feature of the first formant, and average absolute delta pitch feature | Acc = 88–94% |
| 15 | Khodabakhsh et al 2014b | SVM, DT | Acc = 90% |
| 16 | Khodabakhsh et al 2015 | SVM classifier with the silence ratio feature | Acc = 84% |
| 17 | Konig et al 2015 | – | EER = 13–21% |
| 18 | Konig et al 2018 | Fluency tasks | Acc = 86% |
| 19 | Lopez-de-Ipina et al 2013a | Including fractal dimension sets | Acc = 75–94.6% |
| 20 | Lopez-de-Ipina et al 2013b | SVM and features from 3 datasets: spontaneous speech, emotional response and energy features | Acc = 93.79% |
| 21 | Lopez-de-Ipina et al 2015 | MLP for Katz’s and Castiglioni’s algorithm with a window-size of 320 points | Acc = 95% |
| 22 | Lopez-de-Ipina et al 2018 | SS task and AD patients: the recording environment within a relaxing atmosphere; the presence of subtle cognitive changes in the signal due to a more open language; and the use of AD patients instead of MCI subjects. | Acc = 73–95% |
| 23 | Luz 2018 | – | Acc = 68% |
| 24 | Martinez de Lizarduy et al 2017 | spontaneous speech task; CNN | Acc = 80–95% |
| 25 | Martinez-Sanchez et al 2016 | The standard deviation of the duration of ΔS | AUC = 0.87 |
| 26 | Mirzaei et al 2018 | kNN with 18 features | – |
| 27 | Rentoumi et al 2017 | – | Acc = 89% |
| 28 | Sadeghian et al 2017 | using all the potential features, including and choosing the 5 most informative ones: 1) MMSE, 2) race, 3) fraction of pauses greater than 10s, 4) fraction of speech length that was pause, 5) words indicating quantities | Acc = 94.4% |
| 29 | Satt et al 2013 | Using 20 features | EER = 15.5–18% |
| 30 | Toth et al 2015 | SVM with manually extracted features | Acc = 82.4% |
| 31 | Toth et al 2018 | RFC with automatic and significant feature set | Acc = 66.7–75% |
| 32 | Warnita et al 2018 | 10-layer CNN with Interspeech 2010 feature set | Acc = 73.6% |
| 33 | Zimmerer et al 2016 | connectivity, closed-class words, semantic error rate | – |
Abbreviations: Acc, accuracy; AD, Alzheimer’s disease; CNN, convolutional neural networks; DT, decision trees; ET, emotional temperature; kNN, k-nearest neighbor; LSTM RNN, long short-term memory recurrent neural network; MLP, multilayer perceptron; MMSE, mini-mental state examination; MRI, magnetic resonance imaging; RFC, random forest classifier; SVM, support vector machine.