Sunghye Cho, Naomi Nevler, Sharon Ash, Sanjana Shellikeri, David J Irwin, Lauren Massimo, Katya Rascovsky, Christopher Olm, Murray Grossman, Mark Liberman.
Abstract
We implemented an automated analysis of lexical aspects of semi-structured speech produced by healthy elderly controls (n=37) and three patient groups with frontotemporal degeneration (FTD): behavioral variant FTD (bvFTD, n=74), semantic variant primary progressive aphasia (svPPA, n=42), and nonfluent/agrammatic PPA (naPPA, n=22). Based on previous findings, we hypothesized that the three patient groups and controls would differ in the counts of part-of-speech (POS) categories and several lexical measures. With a natural language processing program, we automatically tagged POS categories of all words produced during a picture description task. We further counted the number of wh-words, and we rated nouns for abstractness, ambiguity, frequency, familiarity, and age of acquisition. We also computed the cross-entropy estimation, which is a measure of word predictability, and lexical diversity for each description. We validated a subset of the POS data that were automatically tagged with the Google Universal POS scheme using gold-standard POS data tagged by a linguist, and we found that the POS categories from our automated methods were more than 90% accurate. For svPPA patients, we found fewer unique nouns than in naPPA and more pronouns and wh-words than in the other groups. We also found high abstractness, ambiguity, frequency, and familiarity for nouns and the lowest cross-entropy estimation among all groups. These measures were associated with cortical thinning in the left temporal lobe. In naPPA patients, we found increased speech errors and partial words compared to controls, and these impairments were associated with cortical thinning in the left middle frontal gyrus. bvFTD patients' adjective production was decreased compared to controls and was correlated with their apathy scores. Their adjective production was associated with cortical thinning in the dorsolateral frontal and orbitofrontal gyri. Our results demonstrate distinct language profiles in subgroups of FTD patients and validate our automated method of analyzing FTD patients' speech.
Year: 2020 PMID: 33173922 PMCID: PMC7654918 DOI: 10.1101/2020.09.10.20192054
Source DB: PubMed Journal: medRxiv
Group means (SD) and omnibus test results of clinical and demographic characteristics.
| | control (N=37) | bvFTD (N=74) | naPPA (N=22) | svPPA (N=42) | Group comparisons |
|---|---|---|---|---|---|
| Female (N, percent) | 24 (64.9%) | 26 (35.1%) | 11 (50%) | 23 (54.8%) | χ²=9.9 |
| Male (N, percent) | 13 (35.1%) | 48 (64.9%) | 11 (50%) | 19 (45.2%) | |
| Education (years) | 15.9 (2.5) | 15.8 (2.8) | 15.3 (3.1) | 15.1 (2.8) | F(3,171)=0.9 |
| Age (years) | 68.5 (7.9) | 63.1 (8.7) | 70.4 (9.4) | 63.3 (7) | F(3,171)=7.3 |
| Disease duration (years) | - | 4.4 (3.5) | 3.2 (1.9) | 3.9 (2) | F(2,135)=1.5 |
| | - | 2.2 (1.9) [42] | 1.7 (1.7) [8] | 2.8 (2.6) [26] | F(2,73)=1.1 |
| MMSE (0–30) | 29.2 (1) [31] | 23.6 (5.5) [68] | 22.7 (6) [20] | 22.1 (6.3) [38] | F(3,153)=12.1 |
| BNT (0–30) | 27.9 (2.5) [23] | 23.8 (5.8) [68] | 24.7 (4.6) [16] | 7.5 (6.4) [40] | F(3,143)=99.8 |
| Animals and tools (max 60 secs) | 16.8 (4.6) [23] | 9.2 (5.2) [65] | 8.2 (4.4) [16] | 5.1 (3.8) [39] | F(3,139)=30.8 |
| PPT (0–52) | 50.8 (1.9) [18] | 42.9 (7.9) [35] | 48.4 (2.9) [7] | 39.6 (6.6) [19] | F(3,75)=11.4 |
| Apathy, PBAC (0–4) | 3.3 (0.5) [6] | 2.1 (1.1) [62] | 2.7 (1.2) [14] | 2.5 (1.2) [37] | F(3,115)=2.88 |
ANOVA analyses were used to compare all measures between groups except sex ratio, where a chi-squared test was used. MRI: Magnetic resonance imaging, BNT: Boston Naming Test, PPT: Pyramids and Palm Trees Test, PBAC: The Philadelphia Brief Assessment of Cognition (0=most apathetic, 4=least apathetic). Numbers in square brackets are Ns when less than the total.
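The sex-ratio comparison can be checked directly from the counts reported in the table. The following is a minimal sketch, assuming a standard Pearson chi-squared test of independence without continuity correction (the source does not say which variant was used); the function name `chi_squared_statistic` is illustrative and not from the paper.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and sex.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Counts from the table above; columns are control, bvFTD, naPPA, svPPA.
sex_counts = [
    [24, 26, 11, 23],  # female (the row complementing the male counts)
    [13, 48, 11, 19],  # male
]
chi2 = chi_squared_statistic(sex_counts)
dof = (len(sex_counts) - 1) * (len(sex_counts[0]) - 1)
print(f"chi2({dof}) = {chi2:.1f}")  # prints "chi2(3) = 9.9"
```

The result agrees with the statistic of 9.9 reported in the group-comparison column; obtaining a p-value would additionally require the chi-squared distribution with 3 degrees of freedom.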
List of POS categories and mapping between the Google POS tag set and the Penn Treebank tag set. MD, VBD, VBP, and VBZ in the Penn Treebank tags were used to calculate the number of tense-inflected verbs.
| Google POS | Penn Treebank | Gloss |
|---|---|---|
| NOUN | NN | noun, singular or mass |
| | NNS | noun, plural |
| VERB | MD | verb, modal auxiliary |
| | VB | verb, base form |
| | VBD | verb, past tense |
| | VBG | verb, gerund or present participle |
| | VBN | verb, past participle |
| | VBP | verb, non-3rd person singular present |
| | VBZ | verb, 3rd person singular present |
| ADJ (adjective) | AFX | affix |
| | JJ | adjective |
| | JJR | adjective, comparative |
| | JJS | adjective, superlative |
| | PRP$ | pronoun, possessive |
| | WDT | wh-determiner |
| | WP$ | wh-pronoun, possessive |
| ADV (adverb) | EX | existential there |
| | RB | adverb |
| | RBR | adverb, comparative |
| | RBS | adverb, superlative |
| | WRB | wh-adverb |
| PRON | PRP | pronoun |
| ADP | IN | preposition |
| X | XX | unknown |
| INTJ | UH | interjection, exclamation |
| DET | DT | determiner |
| CONJ | CC | conjunction |
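The mapping above, together with the per-100-word normalization used in the POS tables, can be sketched as follows. This is an illustrative sketch and not the authors' pipeline: it assumes the input is already a list of `(word, Penn tag)` pairs from some tagger, the function name `pos_per_100_words` and the `TENSE_INFLECTED_VERB` key are hypothetical, and sending unmapped tags to `X` is an assumption.

```python
from collections import Counter

# Penn Treebank -> Google POS mapping, copied from the table above.
PENN_TO_GOOGLE = {
    "NN": "NOUN", "NNS": "NOUN",
    "MD": "VERB", "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
    "AFX": "ADJ", "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "PRP$": "ADJ", "WDT": "ADJ", "WP$": "ADJ",
    "EX": "ADV", "RB": "ADV", "RBR": "ADV", "RBS": "ADV", "WRB": "ADV",
    "PRP": "PRON", "IN": "ADP", "XX": "X", "UH": "INTJ",
    "DT": "DET", "CC": "CONJ",
}

# Penn tags counted as tense-inflected verbs, per the table caption.
TENSE_INFLECTED = {"MD", "VBD", "VBP", "VBZ"}


def pos_per_100_words(tagged_tokens):
    """Return counts per 100 words for each Google POS category."""
    total = len(tagged_tokens)
    if total == 0:
        return {}
    # Assumption: any tag missing from the mapping is counted as X (unknown).
    counts = Counter(PENN_TO_GOOGLE.get(tag, "X") for _, tag in tagged_tokens)
    counts["TENSE_INFLECTED_VERB"] = sum(
        1 for _, tag in tagged_tokens if tag in TENSE_INFLECTED
    )
    return {label: 100 * n / total for label, n in counts.items()}


example = [("the", "DT"), ("boy", "NN"), ("is", "VBZ"),
           ("taking", "VBG"), ("cookies", "NNS")]
rates = pos_per_100_words(example)
print(rates["NOUN"], rates["VERB"], rates["TENSE_INFLECTED_VERB"])
```

For the five-word example, nouns and verbs each come to 40 per 100 words, and the single tense-inflected verb (`is`, tagged VBZ) comes to 20 per 100 words.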
Figure 2: Median, 1SD, 25th-75th percentile and outliers of abstractness scores, semantic ambiguity ratings, word frequency, word familiarity, and age of acquisition of nouns; and cross-entropy estimation and lexical diversity across all words.
Group means (SD) and omnibus test results from ANCOVA analyses of the lexical measures. AoA: Age of acquisition.
| | Control | bvFTD | naPPA | svPPA | F | p |
|---|---|---|---|---|---|---|
| Abstractness (noun) | 1.52 (0.76) | 1.55 (0.83) | 1.4 (0.59) | 1.92 (1.14) | F(3,169)=11.68 | <0.001 |
| Ambiguity (noun) | 1.65 (0.25) | 1.64 (0.26) | 1.64 (0.23) | 1.74 (0.28) | F(3,169)=11.01 | <0.001 |
| Frequency (noun) | 3.39 (0.86) | 3.52 (0.91) | 3.44 (0.91) | 3.94 (0.95) | F(3,169)=12.99 | <0.001 |
| Familiarity (noun) | 2.38 (0.14) | 2.38 (0.16) | 2.39 (0.14) | 2.4 (0.16) | F(3,169)=3.81 | 0.011 |
| AoA (noun) | 4.51 (1.42) | 4.36 (1.33) | 4.21 (1.24) | 4.15 (1.13) | F(3,169)=4.27 | 0.005 |
| Cross-entropy | 9.72 (0.49) | 9.61 (0.66) | 9.9 (0.84) | 9.1 (0.79) | F(3,169)=7.7 | <0.001 |
| Lexical diversity | 0.85 (0.03) | 0.79 (0.09) | 0.79 (0.06) | 0.81 (0.09) | F(3,169)=6.21 | <0.001 |
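The last two rows summarize lexical diversity and cross-entropy, a measure of word predictability. The source excerpt does not specify the diversity formula or the language model behind the cross-entropy estimate, so the sketch below uses stand-ins that are assumptions: a plain type-token ratio for diversity, and an add-one-smoothed unigram model estimated from a toy reference text for cross-entropy. All function and variable names are hypothetical.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Unique word types divided by total tokens (one simple diversity measure)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def unigram_cross_entropy(tokens, reference_counts):
    """Average bits per word under an add-one-smoothed unigram model."""
    if not tokens:
        return 0.0
    vocab_size = len(reference_counts) + 1  # one extra slot for unseen words
    total = sum(reference_counts.values())
    bits = 0.0
    for tok in tokens:
        prob = (reference_counts.get(tok, 0) + 1) / (total + vocab_size)
        bits -= math.log2(prob)
    return bits / len(tokens)


# Toy reference text standing in for a large corpus (assumption).
reference = Counter("the boy is on the stool the girl is laughing".split())
description = "the boy and the girl".split()

print(type_token_ratio(description))
print(unigram_cross_entropy(description, reference))
```

Under any such model, frequent and predictable words contribute fewer bits, so a description built from common words yields a lower cross-entropy. This is the direction of the svPPA result in the table, where the mean cross-entropy is lowest (9.1).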
Demographic and clinical characteristics of the subset of patients with MRI data.
| | Controls (n=18) | bvFTD (n=42) | naPPA (n=8) | svPPA (n=26) | Group differences in this subset | Comparison with the full set |
|---|---|---|---|---|---|---|
| Age (years) | 65.9 (6.8) | 63 (8.5) | 65.5 (8.1) | 61.2 (7.1) | F(3,90)=1.53 | |
| Sex | 9 F, 9 M | 15 F, 27 M | 2 F, 6 M | 17 F, 9 M | χ²=7.26 | χ²=0.12 |
| Education (years) | 16.1 (2.9) | 15.9 (2.2) | 17.4 (3) | 15.3 (2.6) | F(3,90)=1.37 | |
| Disease duration (years) | - | 4 (3.4) | 3 (2) | 3.6 (2) | F(2,73)=0.43 | |
| MMSE (0–30) | 28.9 (1.1) | 25.1 (4.3) | 25.1 (3.4) | 23.6 (6.1) | F(3,88)=5.26 | |
| BNT (0–30) | 27.7 (2.7) | 24.5 (4.1) | 24.8 (5.1) | 7.7 (6.3) | F(3,89)=90.81 | |
| PPT (0–52) | 51.3 (1.1) | 45.4 (6.9) | 48.5 (3.7) | 39.1 (7.1) | F(3,48)=8.75 | |
| Animals and tools (max 60 secs) | 16.8 (5) | 10 (4.9) | 9.8 (4.8) | 6 (3.9) | F(3,86)=18.52 | |
The p-values for the group differences in this subset were from ANOVA analyses, except the sex ratio, where a chi-squared test was used. Student’s t-tests (all measures but the sex ratio) and a chi-squared test (sex ratio) were used for the comparisons of this subset with the full dataset. MMSE: Mini Mental State Exam; BNT: Boston Naming Test; PPT: Pyramids and Palm Trees Test; F: females; M: males.
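A one-way ANOVA F statistic can be recovered from the group means, SDs, and sample sizes alone, which gives a quick consistency check on the table. The sketch below assumes the reported SDs are sample standard deviations (n−1 denominator); the function name `anova_from_summary` is illustrative, and the inputs are rounded table values, so small discrepancies from the published statistics are expected.

```python
def anova_from_summary(means, sds, ns):
    """One-way ANOVA F statistic from per-group means, SDs, and sample sizes."""
    k = len(means)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-group sum of squares, rebuilt from each group's sample variance.
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Age row of the table above: controls, bvFTD, naPPA, svPPA.
f_stat, df1, df2 = anova_from_summary(
    means=[65.9, 63.0, 65.5, 61.2],
    sds=[6.8, 8.5, 8.1, 7.1],
    ns=[18, 42, 8, 26],
)
print(f"F({df1},{df2}) = {f_stat:.2f}")  # prints "F(3,90) = 1.53"
```

This matches the age comparison reported for the MRI subset. The same function does not reproduce the ANCOVA results in the lexical tables, since those adjust for covariates.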
POS counts per 100 words and lexical measures of the subset of patients with MRI data.
| | Controls | bvFTD | naPPA | svPPA |
|---|---|---|---|---|
| Nouns | 19.42 (4.67) | 21.67 (6.94) | 23.65 (7.33) | 17.43 (5.12) |
| Unique nouns | 14.4 (3.37) | 16.26 (6.27) | 19.44 (3.84) | 12.24 (4.6) |
| Pronouns | 7.64 (2.33) | 6.21 (3.58) | 5.55 (1.86) | 9.4 (3.89) |
| wh-words | 0.63 (0.35) | 1.3 (3.04) | 1.16 (1.81) | 0.9 (1.32) |
| Tense-inflected verbs | 12.02 (1.56) | 12.26 (3.8) | 11.28 (3.88) | 13.71 (3.2) |
| Verbs | 22.46 (3.1) | 22.67 (4.56) | 20.81 (4.67) | 24.11 (4.68) |
| Speech errors/Partial words | 0.81 (1.16) | 1.17 (1.86) | 3.36 (3.93) | 0.73 (1.12) |
| Adverbs | 5.61 (1.79) | 5.46 (3.31) | 3.51 (2.88) | 7.94 (4.69) |
| Adjectives | 6.01 (1.68) | 3.89 (2.38) | 3.35 (2.28) | 3.3 (2.6) |
| Prepositions | 10.81 (1.52) | 8.34 (4.07) | 5.48 (2.73) | 7.78 (3.96) |
| Total words | 194.22 (75.56) | 112.23 (67.5) | 85.75 (50) | 121.88 (66.49) |
| Determiners | 13.6 (2.16) | 15.6 (3.93) | 16.5 (4.24) | 12.76 (5.3) |
| Conjunctions | 4.38 (1.82) | 5.01 (2.78) | 4.36 (3.24) | 5.02 (3.31) |
| Interjections | 5.02 (2.43) | 5.7 (3.83) | 8.9 (4.78) | 6.45 (5.43) |
| Ratio of content to function words | 1.31 (0.26) | 1.36 (0.35) | 1.28 (0.24) | 1.32 (0.32) |
| Abstractness (noun) | 1.54 (0.24) | 1.48 (0.26) | 1.35 (0.21) | 1.86 (0.51) |
| Ambiguity (noun) | 1.69 (0.05) | 1.66 (0.06) | 1.63 (0.09) | 1.77 (0.13) |
| Frequency (noun) | 3.58 (0.17) | 3.61 (0.28) | 3.49 (0.4) | 4.01 (0.44) |
| Familiarity (noun) | 2.36 (0.03) | 2.35 (0.05) | 2.36 (0.03) | 2.41 (0.07) |
| AoA (noun) | 4.4 (0.38) | 4.21 (0.42) | 4.14 (0.5) | 4.1 (0.46) |
| Cross entropy | 9.75 (0.52) | 9.66 (0.74) | 10.21 (1) | 9.18 (0.58) |
| Lexical diversity | 0.85 (0.04) | 0.79 (0.09) | 0.8 (0.06) | 0.8 (0.1) |
Group means (SD) and omnibus test results from ANCOVA analyses of the POS categories per 100 words, total number of words, and the ratio of content words of all participants.
| | | Control | bvFTD | naPPA | svPPA | F | p |
|---|---|---|---|---|---|---|---|
| Significant group differences | Unique nouns | 14.7 (3.19) | 14.87 (5.93) | 16.73 (5.96) | 12.21 (5.19) | F(3,169)=3.46 | 0.018 |
| | Nouns | 20.32 (4.4) | 20.16 (6.48) | 21.92 (8.7) | 17.49 (5.3) | F(3,169)=2.52 | 0.058 |
| | Pronouns | 7.33 (2.41) | 7.13 (3.77) | 6.46 (3.2) | 9.74 (3.9) | F(3,169)=7.66 | <0.001 |
| | wh-words | 0.34 (0.53) | 0.6 (1.12) | 0.34 (0.99) | 1.61 (1.72) | F(3,169)=9.26 | <0.001 |
| | Tense-inflected verbs | 12.47 (1.83) | 12.94 (3.68) | 11.26 (3.2) | 14.14 (2.98) | F(3,169)=3.92 | 0.01 |
| | Verbs | 22.56 (3.42) | 23.59 (4.86) | 20.22 (4.42) | 24.44 (4.06) | F(3,169)=3.86 | 0.011 |
| | Speech errors/partial words | 0.48 (0.89) | 1.42 (2.26) | 3.67 (3.4) | 0.89 (1.54) | F(3,169)=4.18 | 0.007 |
| | Adverbs | 5.59 (2.07) | 6.04 (4.36) | 4.37 (3.61) | 7.05 (3.36) | F(3,169)=2.82 | 0.041 |
| | Total words | 174.38 (66.38) | 109.99 (62.35) | 91 (55.8) | 127.57 (66.5) | F(3,169)=11.37 | <0.001 |
| | Adjectives | 5.54 (1.82) | 3.98 (3.16) | 3.17 (2.03) | 3.69 (2.04) | F(3,169)=5.87 | <0.001 |
| | Prepositions | 9.96 (1.94) | 7.63 (4.06) | 5.98 (3.19) | 7.24 (3.72) | F(3,169)=7.66 | <0.001 |
| No group differences | Determiners | 14.16 (2.48) | 14.85 (4.33) | 14.34 (5.4) | 13.35 (4.98) | F(3,169)=0.97 | 0.41 |
| | Conjunctions | 4.43 (1.91) | 5.12 (2.69) | 5.9 (4.68) | 4.85 (2.88) | F(3,169)=1.41 | 0.24 |
| | Fillers | 5.5 (2.56) | 5.89 (3.9) | 10.03 (10.3) | 6.27 (4.83) | F(3,169)=1.46 | 0.23 |
| | Ratio of content to function words | 1.32 (0.22) | 1.36 (0.33) | 1.3 (0.6) | 1.32 (0.36) | F(3,169)=0.7 | 0.55 |
Figure 1: Median, 1SD, 25th-75th percentile and outliers in POS categories per 100 words, total number of words and the ratio of content words by phenotype.
Figure 3: Cortical thinning in svPPA (A), naPPA (B) and bvFTD (C) patients, and areas with cortical thinning that were significantly related to linguistic measures (p<0.05, uncorrected) in svPPA (A1–3), naPPA (B1), and bvFTD (C2) patients. Please note that these images are for illustration, and the complete results are summarized in Table 4.
Results of regression analyses with cortical thinning in patients.
| svPPA | Estimate | Std. Error | t-value | p |
|---|---|---|---|---|
| L inferior temporal | 0.059 | 0.021 | 2.85 | 0.01 |
| L middle temporal | 0.054 | 0.026 | 2.09 | 0.049 |
| L superior temporal | 0.045 | 0.021 | 2.18 | 0.041 |
| L insula | 0.035 | 0.016 | 2.19 | 0.04 |
| L inferior temporal | −0.098 | 0.038 | −2.54 | 0.019 |
| L parahippocampal | −0.059 | 0.028 | −2.16 | 0.043 |
| L entorhinal | −0.104 | 0.05 | −2.08 | 0.049 |
| L inferior temporal | −0.219 | 0.08 | −2.6 | 0.021 |
| L middle temporal | −0.244 | 0.11 | −2.14 | 0.044 |
| L superior temporal | −0.19 | 0.087 | −2.2 | 0.039 |
| L fusiform | −0.303 | 0.108 | −2.818 | 0.01 |
| L insula | −0.142 | 0.065 | −2.207 | 0.04 |
| L temporal pole | −0.582 | 0.228 | −2.55 | 0.019 |
| L inferior temporal | −0.531 | 0.218 | −2.42 | 0.025 |
| L middle temporal | −0.652 | 0.225 | −2.89 | 0.011 |
| L superior temporal | −0.51 | 0.189 | −2.69 | 0.019 |
| L fusiform | −0.597 | 0.243 | −2.49 | 0.027 |
| R superior temporal | −0.309 | 0.14 | −2.21 | 0.038 |
| L inferior temporal | −2.609 | 0.833 | −3.11 | 0.007 |
| L middle temporal | −2.617 | 0.896 | −2.96 | 0.011 |
| L bank superior temporal | −1.795 | 0.572 | −3.13 | 0.006 |
| L superior temporal | −1.946 | 0.693 | −2.8 | 0.013 |
| L supramarginal | −1.722 | 0.601 | −2.86 | 0.018 |
| L insula | −0.5 | 0.205 | −2.39 | 0.026 |
| L lateral orbitofrontal | −1.182 | 0.564 | −2.1 | 0.048 |
| L inferior temporal | −0.627 | 0.258 | −2.46 | 0.024 |
| L middle temporal | −0.685 | 0.264 | −2.58 | 0.019 |
| L bank superior temporal | −0.379 | 0.176 | −2.16 | 0.043 |
| L superior temporal | −0.49 | 0.208 | −2.34 | 0.031 |
| L fusiform | −0.593 | 0.267 | −2.22 | 0.037 |
| L inferior temporal | −0.755 | 0.29 | −2.61 | 0.016 |
| L middle temporal | −0.83 | 0.247 | −3.41 | 0.009 |
| L superior temporal | −0.53 | 0.182 | −2.98 | 0.018 |
| L rostral middle frontal | −0.821 | 0.216 | −3.8 | 0.001 |
| R rostral middle frontal | −0.608 | 0.222 | −2.72 | 0.014 |
| L precentral | −0.599 | 0.163 | −3.67 | 0.001 |
| L supramarginal | −0.517 | 0.19 | −2.72 | 0.013 |
| L lateral orbitofrontal | −0.365 | 0.163 | −2.24 | 0.001 |
| R superior frontal | −0.592 | 0.218 | −2.72 | 0.013 |
| R pars opercularis | −0.549 | 0.192 | −2.86 | 0.009 |
| L inferior temporal | 0.451 | 0.187 | 2.4 | 0.027 |
| L middle temporal | 0.419 | 0.199 | 2.1 | 0.048 |
| L bank superior temporal | 0.348 | 0.143 | 2.45 | 0.026 |
| L superior temporal | 0.392 | 0.156 | 2.51 | 0.02 |
| L fusiform | 0.713 | 0.224 | 3.18 | 0.004 |
| naPPA | Estimate | Std. Error | t-value | p |
| L rostral middle frontal | −0.194 | 0.044 | −4.39 | 0.022 |
| bvFTD | Estimate | Std. Error | t-value | p |
| L orbitofrontal | 0.07 | 0.028 | 2.56 | 0.015 |
| L rostral middle frontal | 0.05 | 0.024 | 2.28 | 0.031 |
| L superior frontal | 0.07 | 0.03 | 2.29 | 0.03 |
| L caudal middle frontal | 0.07 | 0.025 | 2.67 | 0.035 |
| L postcentral | 0.06 | 0.023 | 2.49 | 0.018 |
| R precentral | 0.07 | 0.033 | 2.22 | 0.032 |
| R postcentral | 0.07 | 0.025 | 2.74 | 0.009 |
L: left, R: right.