Hali Lindsay1, Johannes Tröger1,2, Alexandra König3,4.
Abstract
Alzheimer's disease (AD) is a pervasive neurodegenerative disease that affects millions worldwide and is most prominently associated with broad cognitive decline, including language impairment. Picture description tasks are routinely used to monitor language impairment in AD. Because an in-depth analysis of the resulting spontaneous speech requires substantial manual effort, advanced natural language processing (NLP) combined with machine learning (ML) represents a promising opportunity. In this applied research field, however, NLP and ML methodology does not necessarily ensure robust, clinically actionable insights into cognitive language impairment in AD, and additional precautions must be taken to ensure clinical validity and generalizability of results. In this study, we add generalizability through multilingual feature statistics to computational approaches for the detection of language impairment in AD. We include 154 participants (78 healthy subjects, 76 patients with AD) speaking one of two languages (106 English-speaking and 47 French-speaking). Each participant completed a picture description task, in addition to a battery of neuropsychological tests. Each response was recorded and manually transcribed. From these responses, task-specific, semantic, syntactic and paralinguistic features are extracted using NLP resources. Using inferential statistics, we determined language features, excluding task-specific features, that are significant in both languages and therefore represent "generalizable" signs for cognitive language impairment in AD. In a second step, we evaluated all features as well as the generalizable ones for English, French and both languages in a binary discrimination ML scenario (AD vs. healthy) using a variety of classifiers. The generalizable language feature set outperforms the set of all language features in the English, French and multilingual scenarios.
Semantic features are the most generalizable, while paralinguistic features show no overlap between languages. The multilingual model shows an equal distribution of error in both English and French. By leveraging multilingual statistics combined with a theory-driven approach, we identify AD-related language impairment that generalizes beyond a single corpus or language to model language impairment as a clinically relevant cognitive symptom. We find a primary impairment in semantics in addition to mild syntactic impairment, possibly confounded by additional impaired cognitive functions.
Keywords: Alzheimer’s disease; dementia; explainability; language impairment; multilingual machine learning; natural language processing; picture description; spontaneous speech
Year: 2021 PMID: 34093165 PMCID: PMC8170097 DOI: 10.3389/fnagi.2021.642033
Source DB: PubMed Journal: Front Aging Neurosci ISSN: 1663-4365 Impact factor: 5.750
Figure 1. A schematic overview of the kinds of features that are typically extracted from spontaneous speech picture descriptions. Some of them involve extensive pre-processing steps such as automatic speech recognition (ASR), part-of-speech tagging or sentence parsing, and additional linguistic resources for calibration; others do not.
Sample characteristics for English and French samples.
| Language | Diagnosis | N (M/F) | Age | Education | MMSE |
|---|---|---|---|---|---|
| English | HC | 52 (23/29) | 66.13 (6.52) | - | 29.10 (1.00) |
| English | AD | 54 (24/30) | 66.76 (6.61) | - | 11.06 (5.49) |
| French | HC | 25 (6/19) | 75.40 (7.00) | 12.80 (2.08) | 28.56 (1.42) |
| French | AD | 22 (9/13) | 81.59 (4.52) | 10.91 (3.94) | 18.36 (4.29) |
Age in years (SD), Education in years (SD) and score on MMSE cognitive screening with a max score of 30 (SD). Abbreviations: HC, Healthy Controls; AD, Alzheimer’s disease; MMSE, Mini Mental State Examination.
Explanation of semantic features.
Example: There is a boy. The boy is a brother. He is stealing a cookie. The sister is watching.
| Feature name | Explanation | Example |
|---|---|---|
| num_unique_IU | The number of unique IU mentioned. Higher means they mentioned more IU in the picture. | 3: boy, cookie, sister |
| num_unique_keywords | The number of unique keywords mentioned. Higher means they either used more IU and/or used more lexical variety to describe the IU. | |
| num_total_IU | Counts all mentions of the IU from the mapped keywords. Higher means they said more overall about the image. | |
| | The number of unique IU (num_unique_IU) mentioned divided by the word count. | num_unique_IU = 3; word count = 18; 3/18 = 0.1667 |
| | Number of total IU (num_total_IU) divided by the word count. | num_total_IU = 5; word count = 18; 5/18 = 0.2778 |
| keyword_to_non_keyword_ratio | | num_total_keywords = 5; word count = 18; 5/(18 − 5) = 0.3846 |
| | The number of unique keywords (num_unique_keywords) divided by the word count. | num_unique_keywords = 4; word count = 18; 4/18 = 0.22 |
| | The number of unique IU (num_unique_IU) mentioned divided by the total count of all IU words available in the image. | num_unique_IU = 3; all_IU_words = 16; 3/16 = 0.1875 |
| | The number of unique keywords (num_unique_keywords) divided by the number of total IU (num_total_IU) mentioned. | num_unique_keywords = 4; num_total_IU = 5; 4/5 = 0.8 |
| | Number of total IU (num_total_IU) divided by the duration in seconds of the participant’s response. | num_total_IU = 5; duration = 15 s; 5/15 = 0.33 |
| | The number of unique IU (num_unique_IU) divided by the duration in seconds of the participant’s response. | num_unique_IU = 3; duration = 15 s; 3/15 = 0.2 |
The feature name column contains the name of the feature as used in the text, with the name used in images in parentheses. The explanation column explains how the feature is calculated. The example sentence at the top of the table is used in the example column to show how each feature is calculated.
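The calculations in the example column can be reproduced with a short Python sketch. The keyword-to-IU mapping below is a hypothetical toy inventory that covers only the example sentence, and the dictionary keys for the ratio features are illustrative labels, not necessarily the feature names used in the study.

```python
import re

# Hypothetical toy mapping from keywords to information units (IU).
# It covers only the example sentence; it is not the study's IU inventory.
KEYWORD_TO_IU = {
    "boy": "boy",
    "brother": "boy",
    "cookie": "cookie",
    "sister": "sister",
}


def semantic_features(transcript, duration_s, all_iu_words):
    """Compute the IU counts and ratios described in the table."""
    words = re.findall(r"[a-z']+", transcript.lower())
    word_count = len(words)
    keywords = [w for w in words if w in KEYWORD_TO_IU]

    num_total_iu = len(keywords)  # every mapped keyword mention counts
    num_unique_keywords = len(set(keywords))
    num_unique_iu = len({KEYWORD_TO_IU[w] for w in keywords})

    # Keys other than the three num_* counts are illustrative labels.
    return {
        "word_count": word_count,
        "num_unique_IU": num_unique_iu,
        "num_unique_keywords": num_unique_keywords,
        "num_total_IU": num_total_iu,
        "unique_iu_per_word": num_unique_iu / word_count,
        "total_iu_per_word": num_total_iu / word_count,
        "keywords_per_non_keyword": num_total_iu / (word_count - num_total_iu),
        "unique_keywords_per_word": num_unique_keywords / word_count,
        "unique_iu_coverage": num_unique_iu / all_iu_words,
        "unique_keywords_per_total_iu": num_unique_keywords / num_total_iu,
        "total_iu_per_second": num_total_iu / duration_s,
        "unique_iu_per_second": num_unique_iu / duration_s,
    }


example = (
    "There is a boy. The boy is a brother. "
    "He is stealing a cookie. The sister is watching."
)
features = semantic_features(example, duration_s=15, all_iu_words=16)
for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

The sketch does not guard against empty transcripts or transcripts without keywords, which would cause a division by zero.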
Statistics as per feature set and language.
| Feature | English: rPB | | | | | | French: rPB | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Semantic features | | | | | | | | | | | | |
| keyword_to_non_keyword_ratio | −0.43 | 0.16 | 0.11 | 19.3 | *** | *** | −0.42 | 0.15 | 0.11 | 8.1 | ** | 0.10 |
| max_word_frequency_IU | −0.27 | 0.0003 | 0.0003 | 7.5 | *** | 0.13 | −0.34 | 0.0003 | 0.0002 | 5.3 | * | 0.47 |
| mean_word_frequency_all | −0.37 | 0.0089 | 0.0072 | 14.7 | *** | *** | −0.38 | 0.0075 | 0.0066 | 6.8 | ** | 0.20 |
| | −0.60 | 10.94 | 6.87 | 37.8 | *** | *** | −0.64 | 10.24 | 5.50 | 18.6 | *** | *** |
| | −0.60 | 11.83 | 7.26 | 37.3 | *** | *** | −0.60 | 11.40 | 5.73 | 16.6 | *** | ** |
| | −0.60 | 0.10 | 0.06 | 37.3 | *** | *** | −0.60 | 0.07 | 0.04 | 16.6 | *** | ** |
| total_IU_density | −0.42 | 0.14 | 0.10 | 18.7 | *** | *** | −0.36 | 0.14 | 0.11 | 5.9 | ** | 0.34 |
| | −0.54 | 0.26 | 0.16 | 30.2 | *** | *** | −0.46 | 0.34 | 0.21 | 9.8 | ** | * |
| | −0.43 | 15.42 | 10.46 | 19.3 | *** | *** | −0.57 | 13.44 | 6.41 | 15.1 | *** | ** |
| unique_IU efficiency | −0.56 | 0.18 | 0.10 | 32.4 | *** | *** | −0.40 | 0.27 | 0.17 | 7.2 | ** | 0.16 |
| unique_IU ratio | −0.45 | 0.10 | 0.07 | 21.2 | *** | *** | −0.34 | 0.11 | 0.09 | 5.5 | * | 0.43 |
| Syntactic features | | | | | | | | | | | | |
| ADP_count | −0.28 | 7.83 | 5.39 | 8.0 | ** | 0.20 | −0.50 | 14.44 | 6.95 | 11.4 | *** | * |
| | −0.34 | 0.06 | 0.05 | 12.3 | *** | * | −0.51 | 0.13 | 0.09 | 11.9 | *** | * |
| AUX_ratio | −0.35 | 0.10 | 0.09 | 12.7 | *** | * | 0.30 | 0.04 | 0.06 | 4.1 | * | 1.00 |
| DET_count | −0.26 | 17.35 | 13.52 | 7.3 | ** | 0.31 | −0.45 | 17.72 | 9.95 | 9.4 | ** | 0.09 |
| DET_ratio | −0.43 | 0.15 | 0.12 | 19.3 | *** | *** | −0.32 | 0.17 | 0.14 | 4.7 | * | 1.00 |
| | −0.34 | 21.12 | 15.59 | 11.9 | *** | * | −0.49 | 20.76 | 11.09 | 11.0 | *** | * |
| NOUN_ratio | −0.48 | 0.18 | 0.14 | 23.8 | *** | *** | −0.38 | 0.19 | 0.15 | 6.7 | ** | 0.42 |
| PRON_ratio | 0.25 | 0.07 | 0.09 | 6.4 | * | 0.51 | 0.51 | 0.11 | 0.18 | 12.1 | *** | * |
| PUNCT_count | 0.21 | 15.15 | 20.69 | 4.7 | * | 1.00 | −0.38 | 1.06 | 0.32 | 6.5 | * | 0.47 |
| PUNCT_ratio | 0.36 | 0.13 | 0.18 | 13.8 | *** | ** | −0.34 | 0.01 | 0.00 | 5.4 | * | 0.91 |
| Paralinguistic features | | | | | | | | | | | | |
| bandwidth_mean | 0.22 | 2,022.85 | 2,153.46 | 5.0 | * | 1.00 | 0.32 | 2,176.25 | 2,323.75 | 4.8 | * | 1.00 |
| energy_skewness | 0.23 | 0.17 | 0.32 | 5.4 | * | 1.00 | 0.47 | −0.26 | 0.49 | 10.4 | ** | 0.26 |
| mfcc1_mean | −0.20 | −1.87 | −4.43 | 4.4 | * | 1.00 | −0.36 | −5.14 | −7.87 | 6.0 | * | 1.00 |
| mfcc1_skewness | 0.23 | 0.19 | 0.48 | 5.8 | * | 1.00 | 0.44 | −0.31 | 0.12 | 8.8 | ** | 0.63 |
| mfcc10_kurtosis | 0.23 | 0.73 | 1.00 | 5.5 | * | 1.00 | 0.37 | 0.40 | 0.67 | 6.4 | * | 1.00 |
| mfcc4_kurtosis | 0.22 | 0.92 | 1.47 | 4.9 | * | 1.00 | 0.41 | 0.51 | 1.02 | 7.7 | ** | 1.00 |
| normalized_loudness_std | −0.34 | 0.20 | 0.18 | 11.8 | *** | 0.12 | −0.56 | 0.23 | 0.20 | 14.4 | *** | * |
| ratio_speaking | −0.27 | 0.46 | 0.37 | 7.6 | ** | 1.00 | −0.57 | 0.64 | 0.48 | 15.2 | *** | * |
| speech_rate | −0.28 | 1.92 | 1.41 | 8.5 | ** | 0.73 | −0.47 | 3.12 | 2.32 | 10.2 | *** | 0.28 |
rPB: point-biserial correlation coefficient.
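As a rough illustration of the two statistics named in this section, the following standard-library sketch computes a point-biserial correlation between a feature and the group label, and a two-group Kruskal–Wallis test. Coding AD as 1 and HC as 0 is an assumption, the toy values are invented, and the p-value uses the usual chi-square approximation with one degree of freedom. It is not the authors' analysis code; in practice `scipy.stats.pointbiserialr` and `scipy.stats.kruskal` provide equivalent functionality.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def point_biserial(values, labels):
    """Point-biserial r between a feature and 0/1 labels (assumed: 1 = AD)."""
    g1 = [v for v, y in zip(values, labels) if y == 1]
    g0 = [v for v, y in zip(values, labels) if y == 0]
    p = len(g1) / len(values)
    return (mean(g1) - mean(g0)) / pstdev(values) * math.sqrt(p * (1 - p))


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_two_groups(values, labels):
    """Kruskal-Wallis H (tie-corrected) and p-value for two groups."""
    n = len(values)
    ranks = average_ranks(values)
    total = 0.0
    for group in (0, 1):
        r = [rk for rk, y in zip(ranks, labels) if y == group]
        total += sum(r) ** 2 / len(r)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(c ** 3 - c for c in Counter(values).values())
    h /= 1 - ties / (n ** 3 - n)  # undefined if all values are identical
    h = max(h, 0.0)
    p = math.erfc(math.sqrt(h / 2))  # chi-square survival function, df = 1
    return h, p


# Invented toy data: the feature is higher in the group coded as 1.
toy_values = [1, 2, 3, 4, 5, 6]
toy_labels = [0, 0, 0, 1, 1, 1]
r_pb = point_biserial(toy_values, toy_labels)
h_stat, p_value = kruskal_two_groups(toy_values, toy_labels)
print(f"rPB = {r_pb:.3f}, H = {h_stat:.3f}, p = {p_value:.4f}")
```

With this coding, a negative rPB means the feature is lower in the AD group than in healthy controls.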
Figure 2. Points are plotted by correlation value (point-biserial correlation coefficient rPB, correlating the feature with the group, AD vs. HC), with French on the Y-axis and English on the X-axis, for each feature subgroup. Significance (uncorrected Kruskal–Wallis test, p < 0.05) is visualized by point color for French and point size for English. Points closer to the dashed line perform equally well in both languages. This figure contains all features that are significant in EITHER French or English, not necessarily both.
Figure 3. Points are plotted by correlation value (point-biserial correlation coefficient rPB, correlating the feature with the group, AD vs. HC), with French on the Y-axis and English on the X-axis, for each feature subgroup. Significance (uncorrected Kruskal–Wallis test, p < 0.05) is visualized by point color for French and point size for English. Points closer to the dashed line perform equally well in both languages. This figure contains all features that are significant in BOTH French AND English. Feature labels are added to each point.
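The difference between the EITHER and BOTH feature sets in the two figures amounts to a set union versus a set intersection over per-language significance results. A minimal sketch, assuming one uncorrected p-value per feature and language; the feature names and p-values below are placeholders, not results from the study:

```python
ALPHA = 0.05  # uncorrected significance threshold named in the captions


def significant(p_values, alpha=ALPHA):
    """Return the set of features whose p-value is below alpha."""
    return {name for name, p in p_values.items() if p < alpha}


def either_language(p_english, p_french, alpha=ALPHA):
    """Features significant in at least one language."""
    return sorted(significant(p_english, alpha) | significant(p_french, alpha))


def both_languages(p_english, p_french, alpha=ALPHA):
    """Features significant in both languages ("generalizable")."""
    return sorted(significant(p_english, alpha) & significant(p_french, alpha))


# Placeholder p-values for three hypothetical features.
p_en = {"feature_a": 0.001, "feature_b": 0.030, "feature_c": 0.400}
p_fr = {"feature_a": 0.004, "feature_b": 0.200, "feature_c": 0.010}

print(either_language(p_en, p_fr))  # feature_a, feature_b, feature_c
print(both_languages(p_en, p_fr))   # feature_a only
```

The sketch does not check whether the correlation has the same sign in both languages; whether the study imposes such a requirement is not stated in this section.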
Figure 4. Area under the curve (AUC) results of the machine learning (ML) experiments. English and French refer to models for the respective samples separately; Multilingual refers to the joint classification. Models use either all features (ALL), only the semantic, syntactic, and paralinguistic features (Language Features), or the features selected by multilingual significance testing (Generalizable). The gray dashed line indicates chance performance of the models. The English (blue), French (orange), and Both (green) models trained with semantic, syntactic, and paralinguistic features are shown with dashed lines. The English, French and Multilingual models trained with the generalizable features, which are significant in both English and French, are shown with solid lines in the same colors.
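For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen AD participant receives a higher classifier score than a randomly chosen healthy control, which is why 0.5 corresponds to chance. The sketch below computes it by pairwise comparison; the scores are invented and the label coding (1 = AD) is an assumption.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented classifier scores for four participants (1 = AD, 0 = HC).
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
auc = roc_auc(toy_scores, toy_labels)
print(f"AUC = {auc:.2f}")
```

Three of the four AD/HC pairs are ordered correctly in this toy example, so the AUC is 0.75.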
Confusion matrices for the final robust classifier without task-specific features using multilingual significance feature selection.
| Classifier | Sample | Classification prediction | True | False |
|---|---|---|---|---|
| LR | Multilingual | AD (positive) | 58 (AD/AD) | 16 (AD/HC) |
| LR | Multilingual | HC (negative) | 61 (HC/HC) | 18 (HC/AD) |
| LR | English | AD (positive) | 43 (AD/AD) | 13 (AD/HC) |
| LR | English | HC (negative) | 39 (HC/HC) | 11 (HC/AD) |
| LR | French | AD (positive) | 15 (AD/AD) | 3 (AD/HC) |
| LR | French | HC (negative) | 22 (HC/HC) | 7 (HC/AD) |
| | Multilingual | AD (positive) | 59 (AD/AD) | 14 (AD/HC) |
| | Multilingual | HC (negative) | 63 (HC/HC) | 17 (HC/AD) |
| | English | AD (positive) | 42 (AD/AD) | 10 (AD/HC) |
| | English | HC (negative) | 42 (HC/HC) | 12 (HC/AD) |
| | French | AD (positive) | 17 (AD/AD) | 4 (AD/HC) |
| | French | HC (negative) | 21 (HC/HC) | 5 (HC/AD) |
| MLP | Multilingual | AD (positive) | 59 (AD/AD) | 16 (AD/HC) |
| MLP | Multilingual | HC (negative) | 61 (HC/HC) | 17 (HC/AD) |
| MLP | English | AD (positive) | 41 (AD/AD) | 13 (AD/HC) |
| MLP | English | HC (negative) | 39 (HC/HC) | 13 (HC/AD) |
| MLP | French | AD (positive) | 14 (AD/AD) | 3 (AD/HC) |
| MLP | French | HC (negative) | 22 (HC/HC) | 8 (HC/AD) |
For each classifier, the first matrix shows the overall classification result of the model trained on the multilingual data. To ensure this model is not favoring one language, results are further broken down by language in the following matrices. Error is indicated by the False column, where a false positive (AD/HC) is a healthy control classified as having AD and a false negative (HC/AD) is a person with AD classified as a healthy control. The error rate is reported as all falsely classified participants divided by all participants.
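The error rate defined above can be checked against the LR counts in the table with a small sketch. Assigning the second and third LR matrices to English and French is an inference from their totals (106 and 47, matching the sample sizes), not a label taken from the source.

```python
# LR counts from the confusion matrices. The "english" and "french" keys
# are inferred from the matrix totals (106 and 47 participants).
LR_MATRICES = {
    "multilingual": {"tp": 58, "fp": 16, "tn": 61, "fn": 18},
    "english": {"tp": 43, "fp": 13, "tn": 39, "fn": 11},
    "french": {"tp": 15, "fp": 3, "tn": 22, "fn": 7},
}


def error_rate(matrix):
    """All falsely classified participants divided by all participants."""
    return (matrix["fp"] + matrix["fn"]) / sum(matrix.values())


lr_error_rates = {name: error_rate(m) for name, m in LR_MATRICES.items()}
for name, rate in lr_error_rates.items():
    print(f"{name}: {rate:.3f}")
```

For LR this gives roughly 0.222 overall, 0.226 for English and 0.213 for French, in line with the statement that error is distributed about equally across the two languages.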