Aparna Balagopalan1,2,3, Benjamin Eyre1, Jessica Robin1, Frank Rudzicz2,3,4, Jekaterina Novikova1.
Abstract
Introduction: Research related to the automatic detection of Alzheimer's disease (AD) is important, given the high prevalence of AD and the high cost of traditional diagnostic methods. Since AD significantly affects the content and acoustics of spontaneous speech, natural language processing and machine learning provide promising techniques for reliably detecting AD. There has been a recent proliferation of classification models for AD, but these vary in the datasets used, model types, and training and testing paradigms. In this study, we compare and contrast the performance of two common approaches for automatic AD detection from speech on the same, well-matched dataset, to determine the advantages of using domain knowledge vs. pre-trained transfer models.
Keywords: Alzheimer's disease; BERT; MMSE regression; dementia detection; feature engineering; transfer learning
Year: 2021 PMID: 33986655 PMCID: PMC8110916 DOI: 10.3389/fnagi.2021.635945
Source DB: PubMed Journal: Front Aging Neurosci ISSN: 1663-4365 Impact factor: 5.750
Basic characteristics of the patients in each group in the ADReSS challenge dataset are more balanced in comparison to DementiaBank.
| Dataset | Split | Sex | AD | Non-AD |
| --- | --- | --- | --- | --- |
| ADReSS | Train | Male | 24 | 24 |
| ADReSS | Train | Female | 30 | 30 |
| ADReSS | Test | Male | 11 | 11 |
| ADReSS | Test | Female | 13 | 13 |
| DementiaBank (Becker et al., 1994) | – | Male | 125 | 83 |
| DementiaBank (Becker et al., 1994) | – | Female | 197 | 146 |
ADReSS test set from Luz et al. (2020): basic characteristics of the patients in each group (M, male; F, female).
| Age bracket | AD M | AD F | AD MMSE (SD) | Non-AD M | Non-AD F | Non-AD MMSE (SD) |
| --- | --- | --- | --- | --- | --- | --- |
| [50, 55) | 1 | 0 | 23.0 (n/a) | 1 | 0 | 28.0 (n/a) |
| [55, 60) | 2 | 2 | 18.7 (1.0) | 2 | 2 | 28.5 (1.2) |
| [60, 65) | 1 | 3 | 14.7 (3.7) | 1 | 3 | 28.7 (0.9) |
| [65, 70) | 3 | 4 | 23.2 (4.0) | 3 | 4 | 29.4 (0.7) |
| [70, 75) | 3 | 3 | 17.3 (6.9) | 3 | 3 | 28.0 (2.4) |
| [75, 80) | 1 | 1 | 21.5 (6.3) | 1 | 1 | 30.0 (0.0) |
| Total | 11 | 13 | 19.5 (5.3) | 11 | 13 | 28.8 (1.5) |
Summary of all lexico-syntactic features extracted.
| Subtype | #Features | Description |
| --- | --- | --- |
| Syntactic complexity | 36 | L2 Syntactic Complexity Analyzer features; utterance length, depth of syntactic parse tree |
| Production rules | 104 | Proportion of each production type |
| Phrasal type ratios | 13 | Proportion, average length, and rate of phrase types |
| Lexical norm-based | 12 | Average lexical norms (e.g., imageability) across words |
| Lexical richness | 6 | Type-token ratios; Brunet's index; Honoré's statistic |
| Word category | 5 | Proportion of demonstratives, function words, light verbs and inflected verbs, and propositions |
| Noun ratio | 3 | Ratios nouns:(nouns+verbs); nouns:verbs; pronouns:(nouns+pronouns) |
| Length measures | 1 | Average word length |
| Universal POS proportions | 18 | Proportions of spaCy universal POS tags |
| POS tag proportions | 53 | Proportions of Penn Treebank POS tags |
| Local coherence | 15 | Similarity between word2vec representations of utterances |
| Utterance distances | 5 | Fraction of pairs of utterances below a similarity threshold (0.5, 0.3, 0); average/minimum distance |
| Speech-graph features | 13 | Words represented as nodes in a graph; density, number of loops, etc. |
| Utterance cohesion | 1 | Number of switches in verb tense across utterances, divided by total number of utterances |
| Rate | 2 | Ratios of number of words to duration of audio, and number of syllables to duration of speech |
| Invalid words | 1 | Proportion of words not in the English dictionary |
| Sentiment norm-based | 9 | Average sentiment norms across all words, nouns, and verbs |
The number of features in each subtype is shown in the second column (titled “#Features”).
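Several of the lexical-richness measures listed above (type-token ratio, Honoré's statistic, average word length) can be computed directly from a tokenized transcript. A minimal sketch in plain Python, assuming lower-cased word tokens; the paper's exact preprocessing may differ:

```python
import math
from collections import Counter

def lexical_richness(tokens):
    """Simple lexical-richness measures of the kind in the table above."""
    n = len(tokens)                                   # total tokens N
    counts = Counter(tokens)
    v = len(counts)                                   # vocabulary size V
    v1 = sum(1 for c in counts.values() if c == 1)    # hapax legomena V1
    ttr = v / n                                       # type-token ratio
    # Honoré's statistic R = 100 * log(N) / (1 - V1/V); undefined when V1 == V
    honore = 100 * math.log(n) / (1 - v1 / v) if v1 < v else float("inf")
    avg_len = sum(len(t) for t in tokens) / n         # average word length
    return {"ttr": ttr, "honore": honore, "avg_word_len": avg_len}

# Toy transcript fragment; real inputs would be full picture-description transcripts
features = lexical_richness("the boy is on the stool the stool is tipping".split())
```

A higher type-token ratio and Honoré's statistic indicate a richer, less repetitive vocabulary, which is typically reduced in AD speech.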
Summary of all semantic features extracted.
| Subtype | #Features | Description |
| --- | --- | --- |
| Word frequency | 10 | Proportions of occurrences of lemmatized words |
| Global coherence | 15 | Cosine distances between word2vec representations of utterances and content units |
The number of features in each subtype is shown in the second column (titled “#Features”).
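The coherence features reduce to cosine distances between averaged word vectors. A minimal sketch with hypothetical two-dimensional toy embeddings (the actual features use pre-trained word2vec vectors):

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def utterance_vector(words, embeddings):
    """Average the word vectors of an utterance (word2vec-style pooling)."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy embeddings; real features would look up pre-trained word2vec vectors
emb = {"boy": [1.0, 0.0], "stool": [0.8, 0.2], "water": [0.0, 1.0]}
u1 = utterance_vector(["boy", "stool"], emb)
u2 = utterance_vector(["water"], emb)
dist = cosine_distance(u1, u2)  # global-coherence-style distance
```

A small distance between an utterance and an expected content-unit vector indicates the speaker stayed on topic; larger distances accumulate in less coherent speech.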
Ten-fold CV results averaged across three runs with different random seeds on the ADReSS train set.
| Model | #Features | Accuracy | Precision | Recall | Specificity | F1 |
| --- | --- | --- | --- | --- | --- | --- |
| SVM | 10 | 0.796 | 0.81 | 0.78 | 0.82 | 0.79 |
| NN | 10 | 0.762 | 0.77 | 0.75 | 0.77 | 0.76 |
| RF | 50 | 0.738 | 0.73 | 0.76 | 0.72 | 0.74 |
| NB | 80 | 0.750 | 0.76 | 0.74 | 0.76 | 0.75 |
| BERT | – | – | – | – | – | – |
Accuracy for BERT is higher, but not significantly so, than for SVM (H = 0.4838, p > 0.05; Kruskal-Wallis H-test). Bold indicates the best result.
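The evaluation protocol behind this table can be reproduced in outline: per-fold accuracies are collected from 10-fold CV over several random seeds, and two classifiers are compared with a Kruskal-Wallis H-test. A sketch using scikit-learn and SciPy on synthetic stand-in data; the feature matrix, the second model, and all hyperparameters here are illustrative, not the paper's:

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 108 samples mirrors the ADReSS train set size; real inputs
# would be the top-10 selected linguistic/acoustic features per transcript
X, y = make_classification(n_samples=108, n_features=10, random_state=0)

def cv_accuracies(model, seeds=(0, 1, 2)):
    """10-fold CV accuracy per fold, repeated over three random seeds."""
    scores = []
    for seed in seeds:
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
        scores.append(cross_val_score(model, X, y, cv=cv, scoring="accuracy"))
    return np.concatenate(scores)

svm_scores = cv_accuracies(make_pipeline(StandardScaler(), SVC()))
# Illustrative second classifier (a weakly regularized SVM) to compare against
other_scores = cv_accuracies(make_pipeline(StandardScaler(), SVC(C=0.01)))
# Kruskal-Wallis H-test, as used in the paper to compare classifiers
h_stat, p_value = kruskal(svm_scores, other_scores)
```

Comparing the full distributions of fold accuracies, rather than single averaged numbers, is what allows the significance claims made in these table notes.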
AD detection results on unseen, held out ADReSS test set averaged over three runs with different random seeds.
| Model | #Features | Accuracy | Precision | Recall | Specificity | Macro F1 | Micro F1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline (Luz et al., 2020) | – | 0.7500 | – | – | – | 0.7800 | – |
| SVM | 10 | 0.8125 | 0.8000 | 0.8333 | 0.7917 | 0.8124 | 0.8125 |
| NN | 10 | 0.7708 | 0.7671 | 0.7778 | 0.7639 | 0.7708 | 0.7708 |
| RF | 50 | 0.7569 | 0.8033 | 0.6806 | 0.8333 | 0.7555 | 0.7500 |
| NB | 80 | 0.7292 | 0.7895 | 0.6250 | 0.8333 | 0.7262 | 0.7292 |
| BERT | – | – | – | – | – | – | – |
Bold indicates the best result.
LOSO-CV MMSE regression results on the ADReSS train and test sets.
| Baseline (Luz et al., 2020) | – | – | 4.38 | 5.20 | – |
| LR | 15 | – | 5.37 | 4.18 | 4.94 |
| LR | 20 | – | 4.94 | 3.72 | – |
| Ridge | 509 | 12 | 6.06 | 4.36 | – |
| Ridge | 35 | 12 | 4.87 | 3.79 | – |
| Ridge | 25 | 10 | – | – | – |
Bold indicates the best result.
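The regression setup behind this table can be sketched in the same way: ridge regression on a reduced feature set, scored by leave-one-subject-out RMSE. Since ADReSS has one transcript per subject, leave-one-out splits coincide with leave-one-subject-out. A sketch with scikit-learn on synthetic stand-in data; the matrix sizes and targets below are illustrative, not the paper's:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
# Stand-in: 54 subjects x 35 features (mirroring the 35-feature Ridge row);
# targets are synthetic but clipped to the 0-30 MMSE range
X = rng.normal(size=(54, 35))
y = np.clip(3.0 * X[:, 0] + 22.0 + rng.normal(scale=2.0, size=54), 0, 30)

# One transcript per subject, so LeaveOneOut == leave-one-subject-out here
pred = cross_val_predict(Ridge(alpha=1.0), X, y, cv=LeaveOneOut())
rmse = mean_squared_error(y, pred) ** 0.5
```

Each held-out prediction comes from a model that never saw that subject, so the RMSE estimates generalization to new speakers.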
Feature differentiation analysis results for the most important features, based on ADReSS train set.
| Feature | Type | μ (AD) | μ (non-AD) | Correlation | Effect size |
| --- | --- | --- | --- | --- | --- |
| Average cosine distance between utterances | Semantic | 0.91 | 0.94 | – | – |
| Fraction of pairs of utterances below a similarity threshold (0.5) | Semantic | 0.03 | 0.01 | – | – |
| Cosine distance between word2vec utterances and content units | Semantic | 0.46 | 0.38 | −0.54 | −1.01 |
| Distinct content units mentioned: total content units | Semantic | 0.27 | 0.45 | 0.63 | 1.78 |
| Distinct action content units mentioned: total content units | Semantic | 0.15 | 0.30 | 0.49 | 1.04 |
| Distinct object content units mentioned: total content units | Semantic | 0.28 | 0.47 | 0.59 | 1.72 |
| Cosine distance between GloVe utterances and content units | Semantic | – | – | −0.42 | −0.03 |
| Average word length (in letters) | Lexico-syntactic | 3.57 | 3.78 | 0.45 | 1.07 |
| Proportion of pronouns | Lexico-syntactic | 0.09 | 0.06 | – | – |
| Ratio (pronouns):(pronouns+nouns) | Lexico-syntactic | 0.35 | 0.23 | – | – |
| Proportion of personal pronouns | Lexico-syntactic | 0.09 | 0.06 | – | – |
| Proportion of adverbs | Lexico-syntactic | 0.06 | 0.04 | −0.41 | −0.41 |
| Proportion of adverbial phrases amongst all rules | Lexico-syntactic | 0.02 | 0.01 | −0.37 | −0.74 |
| Proportion of non-dictionary words | Lexico-syntactic | 0.11 | 0.08 | – | – |
| Proportion of gerund verbs | Lexico-syntactic | – | – | 0.37 | 1.08 |
| Proportion of words in adverb category | Lexico-syntactic | – | – | −0.4 | −0.49 |
μ, group mean. A marker next to a correlation indicates significance at p < 9e-5.
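A differentiation analysis of this kind pairs a between-group test on each feature with a correlation against MMSE under a corrected significance threshold. A sketch with SciPy on synthetic stand-in values; the group means, spreads, and sizes below loosely mirror the tables above but are not the paper's data:

```python
import numpy as np
from scipy.stats import pearsonr, ttest_ind

rng = np.random.default_rng(1)
# Hypothetical stand-ins for one feature (e.g., content-unit ratio) per group
feat_ad = rng.normal(0.27, 0.05, 54)   # AD group
feat_hc = rng.normal(0.45, 0.05, 54)   # non-AD group
mmse = np.concatenate([rng.normal(17, 5, 54), rng.normal(29, 1, 54)])
feature = np.concatenate([feat_ad, feat_hc])

# Between-group difference (Welch's t-test) ...
t_stat, p_diff = ttest_ind(feat_ad, feat_hc, equal_var=False)
# ... and Pearson correlation with MMSE under the corrected threshold above
r, p_corr = pearsonr(feature, mmse)
significant = p_corr < 9e-5
```

The stringent 9e-5 threshold reflects a multiple-comparisons correction: with hundreds of candidate features, a nominal 0.05 cutoff would flag many spurious associations.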
Figure 1. A t-SNE plot showing class separation. Note that only the 13 features that differ significantly between classes (see Table 10) are used in the feature representation for this plot.
Figure 2. An attention visualization plot showing the attention contributions of the embeddings corresponding to each word to the "pooled" representation. This example is a sub-sample (first two utterances) of a speech transcript from a healthy person.
LOSO-CV results averaged across three runs with different random seeds on the ADReSS train set.
| Model | #Features | Accuracy | Precision | Recall | Specificity | F1 |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline (Luz et al., 2020) | – | 0.768 | 0.77 | 0.76 | – | 0.77 |
| SVM | 509 | 0.741 | 0.75 | 0.72 | 0.76 | 0.74 |
| SVM | 10 | – | – | – | – | – |
| NN | 10 | 0.836 | 0.86 | 0.81 | 0.86 | 0.83 |
| RF | 50 | 0.778 | 0.79 | 0.77 | 0.79 | 0.78 |
| NB | 80 | 0.787 | 0.80 | 0.76 | 0.82 | 0.78 |
Accuracy for SVM is significantly higher than for NN (H = 4.50, p = 0.034; Kruskal-Wallis H-test). Bold indicates the best result.
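LOSO-CV itself is straightforward with scikit-learn's grouped splitters; since ADReSS has one transcript per subject, leave-one-subject-out reduces to leave-one-out. A sketch on synthetic stand-in data (features, labels, and the classifier configuration are illustrative):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(108, 10))     # stand-in: top-10 features, 108 subjects
y = (X[:, 0] > 0).astype(int)      # synthetic AD / non-AD labels
subjects = np.arange(108)          # one transcript per subject in ADReSS

# Each LOSO fold holds out every sample belonging to one subject
scores = cross_val_score(SVC(), X, y, cv=LeaveOneGroupOut(), groups=subjects)
loso_accuracy = scores.mean()
```

Grouping by subject matters whenever a speaker contributes multiple samples: it prevents the model from scoring well simply by recognizing a speaker seen during training.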
ADReSS Training set from Luz et al. (2020): basic characteristics of the patients in each group (M, male; F, female).
| Age bracket | AD M | AD F | AD MMSE (SD) | Non-AD M | Non-AD F | Non-AD MMSE (SD) |
| --- | --- | --- | --- | --- | --- | --- |
| [50, 55) | 1 | 0 | 30.0 (n/a) | 1 | 0 | 29.0 (n/a) |
| [55, 60) | 5 | 4 | 16.3 (4.9) | 5 | 4 | 29.0 (1.3) |
| [60, 65) | 3 | 6 | 18.3 (6.1) | 3 | 6 | 29.3 (1.3) |
| [65, 70) | 6 | 10 | 16.9 (5.8) | 6 | 10 | 29.1 (0.9) |
| [70, 75) | 6 | 8 | 15.8 (4.5) | 6 | 8 | 29.1 (0.8) |
| [75, 80) | 3 | 2 | 17.2 (5.4) | 3 | 2 | 28.8 (0.4) |
| Total | 24 | 30 | 17.0 (5.5) | 24 | 30 | 29.1 (1.0) |
Summary of all acoustic/temporal features extracted.
| Subtype | #Features | Description |
| --- | --- | --- |
| Pauses and fillers | 9 | Total and mean duration of pauses; long and short pause counts; pause-to-word ratio; fillers (um, uh); ratio of pause durations to word durations |
| Fundamental frequency | 4 | Average/min/max/median fundamental frequency of audio |
| Duration-related | 2 | Duration of audio and of the spoken segment of audio |
| Zero-crossing rate | 4 | Average/variance/skewness/kurtosis of zero-crossing rate |
| MFCC | 168 | Average/variance/skewness/kurtosis of 42 MFCC coefficients |
The number of features in each subtype is shown in the second column (titled “#Features”).
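The zero-crossing-rate row illustrates the general recipe behind these acoustic features: compute a frame-level descriptor, then summarize it with average, variance, skewness, and kurtosis. A sketch using NumPy and SciPy on a stand-in signal; the frame and hop sizes are illustrative choices, not the paper's settings:

```python
import numpy as np
from scipy.stats import kurtosis, skew

def zcr_features(signal, frame_len=400, hop=160):
    """Frame-level zero-crossing rate, summarized by avg/var/skew/kurtosis
    (the four ZCR features in the table). Frame/hop of 400/160 samples
    correspond to 25 ms / 10 ms at 16 kHz."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # Per frame: fraction of adjacent sample pairs whose sign differs
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f))) > 0) for f in frames])
    return {"avg": zcr.mean(), "var": zcr.var(),
            "skew": skew(zcr), "kurtosis": kurtosis(zcr)}

# Stand-in for 1 s of audio at 16 kHz (white noise, ZCR near 0.5)
rng = np.random.default_rng(0)
feats = zcr_features(rng.normal(size=16000))
```

The MFCC row follows the same pattern: the same four summary statistics applied to each of 42 coefficient trajectories yields the 168 listed features.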