Neuroanatomical structures supporting lexical diversity, sophistication, and phonological word features during discourse.

Janina Wilmskoetter¹, Julius Fridriksson², Ezequiel Gleichgerrcht³, Brielle C Stark⁴, John Delgaizo³, Gregory Hickok⁵, Kenneth I Vaden⁶, Argye E Hillis⁷, Chris Rorden⁸, Leonardo Bonilha³.

Abstract

Deficits in lexical retrieval are commonly observed in individuals with post-stroke aphasia. Successful lexical retrieval is related to lexical diversity, lexical sophistication, and phonological word properties; however, the crucial brain regions supporting these different features are not fully understood. We performed MRI-based lesion symptom mapping in 58 individuals with a chronic left hemisphere stroke to assess how regional damage relates to spoken discourse-extracted measures of lexical diversity, lexical sophistication, and phonological word properties. For discourse transcription and word feature analysis, we used the Computerized Language Analysis (CLAN) program, Stanford Core Natural Language Processing, Irvine Phonotactic Online Dictionary, Lexical Complexity Analyzer, and Gramulator. Lesions involving the left posterior insula and supramarginal gyri and inferior fronto-occipital fasciculus were significant predictors of utterances with, on average, lower lexical diversity. Low lexical sophistication was associated with damage to the left pole of the superior temporal gyrus. Production of words with lower phonological complexity (fewer phonemes, higher phonological similarity) was associated with damage to the left supramarginal gyrus. Our findings indicate that discourse-extracted features of lexical retrieval depend on the integrity of specific brain regions involving insular and peri-Sylvian areas. The identified regions provide insight into potentially underlying mechanisms of lexically diverse, sophisticated and phonologically complex words produced during discourse.

Keywords: Aphasia; Brain lesions; Magnetic resonance imaging; Speech production; Stroke

Year: 2019; PMID: 31398554; PMCID: PMC6699249; DOI: 10.1016/j.nicl.2019.101961

Source DB: PubMed; Journal: Neuroimage Clin; ISSN: 2213-1582; Impact factor: 4.881

Introduction

Deficits in word production are among the most common symptoms of aphasia. Word production is a complex process that requires a series of processing steps ranging from the linking of the concept to the word form, phonological mapping, motor planning and articulation (Dell et al., 1997; Levelt, 1992). Individuals with aphasia can exhibit deficits in any stage of word production (Dell et al., 1997) and consequently can show an overall decrease in production of words (Caramazza and Hillis, 1991; McCarthy and Warrington, 1985) and an increase in spoken word errors (Dell et al., 1997; Howard and Gatehouse, 2006). Specific aspects of word features possibly impact word production, because the likelihood of selecting and correctly producing a given word depends on its features (Gordon, 2002; Graves et al., 2007; Kittredge et al., 2008; Okada et al., 2003; Walker et al., 2018). As such, individuals with aphasia might be limited to only the forms that are most easily accessed, e.g., words with low lexical diversity, low sophistication, shorter length, and high phonological similarity (Gordon, 2002; Graves et al., 2007; Kittredge et al., 2008; Okada et al., 2003; Walker et al., 2018). Word-finding deficits in post-stroke aphasia vary widely across individuals and are linked to lesions in many different left hemisphere brain regions (Baldo et al., 2013; DeLeon et al., 2007; Howard and Gatehouse, 2006; Migliaccio et al., 2016; Nickels and Howard, 1994). Whether specific aspects of word features, such as lexical diversity, lexical sophistication, and lexical-phonological measures, also rely on specific brain regions remains unclear. Most evidence on the relationships between word features and brain regions is from functional neuroimaging studies (e.g., using functional magnetic resonance imaging (fMRI), electroencephalography (EEG), or magnetoencephalography (MEG)).
These studies suggest that different (and in part overlapping) brain regions are involved in processing different word features. For example, an fMRI study of healthy adults performing overt picture naming found that word frequency modulates activity in the left posterior inferior temporal gyrus and temporoparietal cortex, while concept familiarity modulates activity in the occipital cortex and the fusiform gyrus, and word length modulates activity in primary auditory regions, the superior temporal gyrus and sulcus, and the cerebellum (Wilson et al., 2009; see also Graves et al., 2007 and Okada et al., 2003 for similar studies). However, while functional neuroimaging provides information on brain regions that are active during specific tasks, it cannot determine which brain regions are crucial for those tasks. Identifying brain regions that are crucial for performing a task is the goal of lesion symptom mapping. For example, studies using lesion symptom mapping in individuals with post-stroke aphasia identified left hemisphere peri-Sylvian regions to be associated with phonological errors during word production and left hemisphere anterior temporal regions to be associated with semantic errors (Mirman et al., 2015). While prior research has mapped specific word features or stages in speech production to specific brain regions, relatively little is known about the neuroanatomical bases of lexical features in the context of connected speech (discourse), as opposed to confrontation naming (Graves et al., 2007; Wilson et al., 2009). This gap in knowledge motivated this study, in which we aimed to define the crucial cortical regions that, when damaged as in stroke, corresponded to reduced discourse-derived lexical properties. Understanding how lesions to specific brain regions are related to specific word features can help refine the neuroanatomical network that is essential for lexical retrieval.
We assessed lexical diversity, lexical sophistication, and lexical-phonological features of the words in utterances produced during picture description tasks. We selected these features because they 1) are important measures related to lexical retrieval during discourse, and 2) can be assessed objectively and automatically with freely available software. We focused on objective and automated assessments to maximize the reliability of our measures and to ease a potential implementation into clinical practice. Lexical diversity refers to the proportion of unique words produced in relation to the number of all words that an individual produced throughout the description tasks. The production of lexical (content) words, i.e., nouns, verbs, adjectives, and adverbs, is essential to conveying information during communication. Lexical diversity has been linked to communicative competence and vocabulary knowledge of speakers (Avent and Austermann, 2003; Crossley et al., 2012), and speakers' communication effectiveness (Lu, 2012). Lexical sophistication refers to the proportion of sophisticated words among all lexical words (Hyltenstam, 1988; Lu, 2012), where words are classified as sophisticated if they are not included in the 2000 most frequent words listed in the British National Corpus (Leech et al., 2001; Lu, 2012). The use of more sophisticated words has been related to greater and more advanced lexical knowledge (Rabaglia and Salthouse, 2011). Finally, lexical-phonological word features are quantifiable measures for a word. For example, the word “speech” has four phonemes /s.p.i.tʃ/, four phonological neighbors (speak, speed, spiel, peach) that share all but one phoneme, and a word-average biphoneme probability of 0.0179 that describes the relative frequency with which its phoneme pairs (sp, pi, itʃ) occur in English words.
Research in healthy adults and participants with post-stroke aphasia has suggested that these measures impact the likelihood that words are produced correctly (e.g., words with higher biphoneme probability are easier to produce) (Freedman and Barlow, 2013; Gordon, 2002; Laganaro et al., 2013; Vitevitch et al., 2004). The goal of this study was to explore left hemisphere stroke lesion correlates of lexical diversity, lexical sophistication and phonological features of words produced during discourse. To our knowledge, this study is the first to assess these specific correlates in discourse. Thus, instead of a hypothesis-driven study, we conducted an exploratory study by including cortical regions belonging to the left hemisphere speech-language network.

Methods

Participants

This was a retrospective study that analyzed spoken discourse in 58 people with chronic, left hemisphere stroke who had been recruited as part of a larger study at the University of South Carolina and Medical University of South Carolina. Participants underwent speech and language testing, as well as structural brain MRI. Individuals were excluded if they had a previous diagnosis of other neurological or psychiatric diseases, bilateral or brainstem strokes, or were not native English speakers. Further, we excluded participants who produced <50 words across all three picture description tasks, because the “Lexical Complexity Analyzer” (LCA) (Ai and Lu, 2010; Lu, 2012) that we used for the word feature analyses, required a text length of at least 50 words. All participants underwent language assessment with the Western Aphasia Battery - Revised (WAB) (Kertesz, 2007). At time of language and imaging assessments, participants were, on average, 59.8 years old and 4.13 years post-stroke. Further participant information is provided in Table 1. Forty individuals presented with aphasia (i.e. had an Aphasia Quotient from the WAB of <93.7) and 18 tested above the cut-off for aphasia. We included participants with a left hemisphere stroke who did not test as aphasic according to the WAB cut-off value because these participants may still have had language impairments not detectable when using the WAB cut-off (Fromm et al., 2017). Including participants with a wide range of language abilities was essential to identify the crucial brain anatomy – which brain areas lead to impairment if lesioned and which do not – for the production of specific word features.

Table 1

Demographic and medical characteristics of all included stroke participants (N = 58).

Demographic information
Age, mean (SD; range)		59.8 (9.6; 37.1–79.8)
Gender, N (%)	Female	19 (32.8)
Gender, N (%)	Male	39 (67.2)
Race, N (%)	Caucasian	45 (77.6)
	African-American	12 (20.7)
	Unknown	1 (1.7)
Ethnicity, N (%)	Not Hispanic or Latino	57 (99.8)
Ethnicity, N (%)	Hispanic or Latino	1 (0.2)

N = number, SD = standard deviation.

Fig. 1 shows the lesion overlay of all participants, demonstrating lesions most commonly affecting peri-sylvian brain areas.

Fig. 1

Lesion overlay of all included participants (N = 58). Left side = left hemisphere (L), right side = right hemisphere (R). Different colors represent different numbers (#) of participants with lesions in that area, where red indicates voxels where the largest number of participants shared a lesion.

The study was approved by the institutional review boards of the University of South Carolina and Medical University of South Carolina, where participants were recruited and tested. All subjects or caregivers provided informed consent to participate in this study.

Brain image acquisition

Brain images were acquired within two days of the language assessment. For the study presented here, we used T1- and T2-weighted images. We used T2 images to identify lesions because chronic stroke lesions are better visualized in T2 than in T1 images, and then used the T1 image to co-register the T2 image. All participants underwent MRI scanning including T1 and T2 sequences using a 3 T Siemens Trio equipped with a 12-channel head coil. For the T1-weighted imaging sequence, we used an MP-RAGE (TFE) sequence with a voxel size = 1 mm³, FOV = 256 × 256 mm, 192 sagittal slices, 9-degree flip angle, TR = 2250 ms, TI = 925 ms, TE = 4.15 ms, GRAPPA = 2, and 80 reference lines. For the T2-weighted imaging sequence, we used a 3D SPACE (Sampling Perfection with Application optimized Contrasts by using different flip angle Evolutions) protocol with a voxel size = 1 mm³, FOV = 256 × 256 mm, 160 sagittal slices, variable flip angle, TR = 3200 ms, TE = 352 ms, and no slice acceleration; the same slice center and angulation were used as with the T1 sequence.

Brain image processing

MR DICOM image files were converted to NIfTI format using the software dcm2niix (Li et al., 2016). Using the software MRIcron (www.mricron.com), two experts manually drew all stroke lesions on T2-weighted images until consensus about lesion demarcation was achieved. To allow comparisons across participants, we normalized stroke lesions into standard space using SPM12 (version 7487) (Functional Imaging Laboratory, Wellcome Trust Centre for Neuroimaging Institute of Neurology, University College London; http://www.fil.ion.ucl.ac.uk/spm/software/spm12/) and in-house developed open source MATLAB scripts (Rorden et al., 2012). First, we removed uneven edges by smoothing the lesion maps with a 3 mm full-width at half maximum Gaussian kernel. Second, we binarized the smoothed lesion maps with a 0 threshold, i.e., all voxels >0 corresponded to the lesion. Then, we applied an enantiomorphic approach (Nachev et al., 2008) to normalize the T1 image onto standard space (chimeric T1-weighted image with a voxel size = 1 mm³ and with the area corresponding to the stroke lesion being replaced by the mirrored equivalent region in the intact hemisphere) using SPM12's unified segmentation-normalization. Enantiomorphic normalization leverages the individual's own anatomy from the non-affected hemisphere to provide a normalization estimate for the affected and non-affected structures in the lesioned hemisphere. This advanced normalization technique has been validated in individuals with a stroke (Rorden et al., 2012). We re-inspected the smoothed images for anatomical accuracy and quality after the normalization.

Word production analyses

We analyzed spoken discourse samples from three picture description tasks. Participants saw three different picture scenes, one after the other, and were instructed to describe what was happening in the pictures for two minutes each. These scenes were the picnic scene from the WAB (Kertesz, 2007), the cookie theft picture from the Boston Diagnostic Aphasia Examination (BDAE) (Goodglass and Kaplan, 1983), and the circus scene from the Apraxia Battery for Adults (ABA-2) (Dabul, 2000). All scene descriptions were audio- and video-recorded and transcribed by a trained linguist in CHAT format using the Computerized Language Analysis (CLAN) program (MacWhinney, 2000). The linguist was blinded to the participants' demographics, medical information and WAB scores. Using in-house Python scripts, the discourse transcripts were then stripped of tagged errors (i.e., semantic and phonological errors, neologisms, retracings, repetitions, repetitive initiations, false starts, and fillers) to estimate the efficiency of lexical retrieval with our measures (Fergadiotis and Wright, 2011). We spelled out all contracted word forms to avoid misclassifications (e.g., “he's fishing” was corrected to “he is fishing”). We employed the natural language processing (NLP) software “Stanford CoreNLP” (Manning et al., 2014; Toutanova et al., 2003) for part-of-speech tagging of all cleaned transcripts. Stanford CoreNLP has an embedded part-of-speech tagger that uses the Penn Treebank tag set. We went through all transcripts and checked the part-of-speech tagging for accuracy. Clearly misclassified words (e.g., “bowls” in “bowls on the table” was incorrectly classified as a verb instead of a plural noun) were removed. Fewer than 1% of all produced words across all participants were identified as part-of-speech misclassifications. The operators of the language analysis were blinded to results of the MRI analyses and vice versa.

Word feature analysis

We quantified seven variables: three variables for lexical diversity, one variable for lexical sophistication, and three variables for phonological properties. A single value for each of the seven variables was derived for every participant by calculating the average across all transcribed words of all three picture descriptions.

Lexical diversity

We calculated three different indices for lexical diversity using the software “Gramulator” published by McCarthy and colleagues (McCarthy et al., 2012). These indices were the measure of textual lexical diversity (MTLD), HD-D (a hypergeometric distribution of the lexical diversity measure “D”), and Maas (a log correction of the type token ratio developed by Heinz-Dieter Maas). MTLD is a well-recognized measure of lexical diversity (Fergadiotis et al., 2015; Fergadiotis et al., 2013; McCarthy and Jarvis, 2010), and complementing the MTLD with HD-D and Maas is recommended to provide a comprehensive assessment of lexical diversity (McCarthy and Jarvis, 2010). MTLD takes all produced words into account by calculating the mean length of word sequences that maintain a predefined type-token-ratio value. MTLD seems to be most suitable for texts containing at least 100 words (McCarthy and Jarvis, 2010) and has been validated in various studies and recommended as an unbiased measure for lexical diversity (Fergadiotis et al., 2015; Fergadiotis et al., 2013; McCarthy and Jarvis, 2010). HD-D uses a hypergeometric distribution to calculate the probability of how many word tokens of the same word type will occur in a random sample of 42 words of the whole text. The sum of all probabilities for all word types then represents the lexical diversity of the text (McCarthy and Jarvis, 2010). Lastly, Maas measures lexical diversity as the log of the type-token-ratio, which minimizes text length effects on the type-token-ratio (Fergadiotis et al., 2015; Maas, 1972). In contrast to MTLD and HD-D, where higher values reflect higher lexical diversity, higher Maas values reflect lower lexical diversity.
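The three indices can be illustrated with a minimal Python sketch. This is not the Gramulator implementation: the function names are hypothetical, the MTLD threshold of 0.72 and the Maas formula a² = (log N − log V) / (log N)² are the commonly cited defaults and are assumptions here, and HD-D is computed as the plain sum of per-type probabilities of appearing at least once in a 42-word sample (some implementations additionally rescale this sum).

```python
import math

def maas(tokens):
    """Maas a^2 = (log N - log V) / (log N)^2; higher values mean LOWER diversity.
    Needs at least two tokens (log 1 = 0 would divide by zero)."""
    n, v = len(tokens), len(set(tokens))
    return (math.log(n) - math.log(v)) / math.log(n) ** 2

def hdd(tokens, sample_size=42):
    """Sum over word types of P(type occurs at least once in a random sample)."""
    n = len(tokens)
    if n < sample_size:
        raise ValueError("text is shorter than the sample size")
    counts = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    total_draws = math.comb(n, sample_size)
    score = 0.0
    for freq in counts.values():
        # Hypergeometric probability of drawing zero tokens of this type.
        p_absent = math.comb(n - freq, sample_size) / total_draws
        score += 1.0 - p_absent
    return score

def _mtld_pass(tokens, threshold):
    """One directional pass: count 'factors', i.e. stretches whose TTR hits the threshold."""
    factors, types, count = 0.0, set(), 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count <= threshold:
            factors += 1.0
            types, count = set(), 0
    if count:
        # Partial factor for the leftover stretch at the end of the text.
        factors += (1.0 - len(types) / count) / (1.0 - threshold)
    return len(tokens) / factors if factors else float("inf")

def mtld(tokens, threshold=0.72):
    """Mean of a forward and a backward pass (threshold 0.72 is an assumed default)."""
    forward = _mtld_pass(tokens, threshold)
    backward = _mtld_pass(tokens[::-1], threshold)
    return (forward + backward) / 2.0
```

Each function expects a list of lowercased word tokens from a cleaned transcript, for example `mtld(transcript.lower().split())`.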
To define a single representative measure for subsequent statistical analyses, we performed a principal component analysis (PCA) using varimax rotation with Kaiser Normalization for the three measures (MTLD, HD-D and Maas). We considered factor loadings with an absolute value greater than 0.7 as significant (Comrey and Lee, 1992). The PCA confirmed a one-factor solution, which accounted for 78.31% of variance in the participants' performance. All three lexical diversity variables loaded above this threshold on the extracted component (MTLD = 0.925; HD-D = 0.950; Maas = −0.769). We used this component to represent lexical diversity.
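As a rough illustration of how a single component and its loadings can be obtained from three correlated measures, the sketch below extracts the first principal component of the correlation matrix by power iteration. It is not the SPSS procedure used in the study; the function names are hypothetical, and no rotation step is included because rotating a single retained component leaves it unchanged.

```python
import math

def standardize(column):
    """Convert a list of values to z-scores (sample standard deviation)."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (len(column) - 1))
    return [(x - mean) / sd for x in column]

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long columns."""
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def first_component(columns, iterations=500):
    """Return (loadings, explained variance ratio) of the first principal component."""
    r = correlation_matrix(columns)
    p = len(r)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue belonging to the unit vector v.
    eigenvalue = sum(v[i] * sum(r[i][j] * v[j] for j in range(p)) for i in range(p))
    if v[0] < 0:  # the sign of a component is arbitrary; fix it for readability
        v = [-x for x in v]
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return loadings, eigenvalue / p
```

With per-participant lists for MTLD, HD-D and Maas, `first_component([mtld_values, hdd_values, maas_values])` would return three loadings and the share of variance explained; a negative loading for Maas is expected because higher Maas values indicate lower diversity.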

Lexical sophistication

We calculated lexical sophistication using the online program “lexical complexity analyzer (LCA)” published by Lu and colleagues (Ai and Lu, 2010; Lu, 2012). We imported the part-of-speech tagged transcripts into the LCA, and the LCA calculated the lexical sophistication as the ratio of sophisticated lexical words to the number of lexical words (Hyltenstam, 1988; Lu, 2012). Following Lu, words were counted as sophisticated by the LCA if they were not included in the 2000 most frequent words listed in the British National Corpus (Leech et al., 2001; Lu, 2012).
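The ratio described above can be sketched in a few lines of Python. This is an illustration rather than the LCA code: the function name, the tiny stand-in word list, and the rule of treating Penn Treebank tags that start with NN, VB, JJ or RB as lexical words are assumptions, and steps such as lemmatization are omitted.

```python
# Penn Treebank tag prefixes treated as lexical (content) words in this sketch.
LEXICAL_TAG_PREFIXES = ("NN", "VB", "JJ", "RB")

def lexical_sophistication(tagged_tokens, frequent_words):
    """Share of lexical words that are absent from a list of frequent words.

    tagged_tokens: list of (word, part-of-speech tag) pairs.
    frequent_words: set of lowercased words standing in for the
    2000 most frequent words of a reference corpus (hypothetical data).
    """
    lexical = [word.lower() for word, tag in tagged_tokens
               if tag.startswith(LEXICAL_TAG_PREFIXES)]
    if not lexical:
        return 0.0
    sophisticated = [word for word in lexical if word not in frequent_words]
    return len(sophisticated) / len(lexical)

# Toy example: only "juggler" is missing from the stand-in frequency list.
example = [("The", "DT"), ("juggler", "NN"), ("is", "VBZ"), ("happy", "JJ")]
toy_frequent = {"the", "is", "happy"}
ratio = lexical_sophistication(example, toy_frequent)
```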

Phonological word features

We used the Irvine Phonotactic Online Dictionary (IPhOD version 2.0) (Vaden et al., 2009) to perform unstressed (syllable-stress ignored) calculations for the number of phonemes for word length, phonological neighborhood density, and word-average biphoneme probability. Phonological neighborhood density refers to the number of words in the language that share all phonemes with the target word except for one phoneme. Word-average biphoneme probability, also called phonotactic frequency, refers to the relative frequency that phoneme pairs co-occur in the language. As for lexical diversity, we performed a PCA using varimax rotation with Kaiser Normalization for the three phonological measures (word length, phonological neighborhood density, word-average biphoneme probability). The PCA confirmed a one-factor solution, which accounted for 74.95% of variance in the participants' performance. Every phonological word feature variable had a loading with an absolute value larger than 0.7 on the extracted component (word length = 0.928, phonological neighborhood density = −0.860, word-average biphoneme probability = 0.804). We used this component to represent phonological word features.
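To make the three phonological measures concrete, the following Python sketch counts phonemes, finds phonological neighbors, and averages biphoneme probabilities for one word. It does not query IPhOD: the toy lexicon, the function names, and the neighbor rule (one phoneme substituted, added, or deleted) are assumptions chosen to reproduce the "speech" example from the Introduction, and any biphoneme probabilities passed in would be hypothetical values.

```python
def is_neighbor(a, b):
    """True if two phoneme tuples differ by one substitution, addition, or deletion."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    # Deleting exactly one phoneme from the longer word must yield the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))

def neighborhood_density(target, lexicon):
    """Number of lexicon entries that are phonological neighbors of the target."""
    return sum(is_neighbor(target, phonemes) for phonemes in lexicon.values())

def average_biphoneme_probability(target, biphoneme_probability):
    """Mean probability over adjacent phoneme pairs of the target word."""
    pairs = list(zip(target, target[1:]))
    return sum(biphoneme_probability[pair] for pair in pairs) / len(pairs)

# Toy lexicon (hypothetical transcriptions for illustration only).
TOY_LEXICON = {
    "speech": ("s", "p", "i", "tʃ"),
    "speak": ("s", "p", "i", "k"),
    "speed": ("s", "p", "i", "d"),
    "spiel": ("s", "p", "i", "l"),
    "peach": ("p", "i", "tʃ"),
    "spit": ("s", "p", "ɪ", "t"),  # differs in two phonemes, so not a neighbor
}

speech = TOY_LEXICON["speech"]
word_length = len(speech)
density = neighborhood_density(speech, TOY_LEXICON)
```

In this toy lexicon, "speech" has four phonemes and four neighbors (speak, speed, spiel, peach), matching the example given earlier.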

Statistical analysis

Relationship between lexical variables

Based on visual inspection of the data and the Shapiro-Wilk test for normality, we found that the majority of the seven lexical variables were not normally distributed (Shapiro-Wilk test results: MTLD W(58) = 0.963, p = 0.072; HD-D W(58) = 0.931, p = 0.003; Maas W(58) = 0.949, p = 0.017; PCA lexical diversity W(58) = 0.967, p = 0.120; lexical sophistication W(58) = 0.970, p = 0.159; word length W(58) = 0.956, p = 0.035; phonological neighborhood density W(58) = 0.924, p = 0.001; word-average biphoneme probability W(58) = 0.917, p = 0.001; PCA phonological word features W(58) = 0.957, p = 0.038). Thus, we employed Spearman correlation to assess the relationship between variables.
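Spearman's rho is the Pearson correlation of the rank-transformed data, which is why it does not require normally distributed variables. A minimal Python sketch follows, with hypothetical function names; it handles ties by average ranks and leaves out the p-value computation that a statistics package would provide.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of the 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, any strictly monotonic relationship yields a coefficient of 1 or −1 regardless of its shape.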

Lesion symptom mapping

We performed region-based lesion symptom mapping to assess the relationship between post-stroke structural brain damage and lexical variables. We used the Johns Hopkins University (JHU) neuroanatomical atlas that segments the brain into 189 grey and white matter areas and ventricles (Faria et al., 2012) and selected 19 grey matter regions of interest (ROIs), constituting the left hemisphere speech-language network (Fig. 2). The 19 grey matter ROIs were selected based on the results from a recent region-based lesion symptom mapping study revealing these ROIs in univariate and/or multivariable modelling as predictive for language tasks in individuals with post-stroke aphasia (Fridriksson et al., 2018). Further, we selected 8 white matter pathways commonly associated with language production (Kümmerer et al., 2013; Mirman et al., 2015; Saur et al., 2008) from the HCP-842 atlas; a population-averaged atlas based on diffusion MRIs from 842 healthy individuals from the human connectome project (Yeh et al., 2018).
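In region-based lesion symptom mapping, each participant's lesion is summarized per atlas region as the share of that region's voxels that are lesioned. The Python sketch below shows this calculation on flattened voxel lists. The function name and the toy data are hypothetical; real lesion maps and atlases are 3D images in the same standard space, and the label numbers are used only as example region identifiers.

```python
def percent_damage(lesion_mask, atlas_labels, roi_label):
    """Percentage of voxels in one atlas region that overlap the lesion.

    lesion_mask: flat list of voxel values, where values > 0 mark lesioned voxels.
    atlas_labels: flat list of integer region labels, aligned voxel by voxel.
    roi_label: integer label of the region of interest.
    """
    roi_voxels = [lesion for lesion, label in zip(lesion_mask, atlas_labels)
                  if label == roi_label]
    if not roi_voxels:
        raise ValueError("region label not found in atlas")
    lesioned = sum(1 for value in roi_voxels if value > 0)
    return 100.0 * lesioned / len(roi_voxels)

# Toy example with six voxels and two regions (labels 182 and 29).
toy_lesion = [0, 1, 1, 1, 0, 0]
toy_atlas = [182, 182, 182, 182, 29, 29]
damage_182 = percent_damage(toy_lesion, toy_atlas, 182)
damage_29 = percent_damage(toy_lesion, toy_atlas, 29)
```

Applying this to every region of interest gives one row of predictor values per participant for the regression models.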

Fig. 2

All 27 included grey and white matter regions of interest (ROIs) in the left hemisphere. AngG = angular gyrus, GloPal = globus pallidus, ITG = inferior temporal gyrus, Ins = insula, MTG = middle temporal gyrus, oper = pars opercularis, orb = pars orbitalis, pIns = posterior insula, pMTG = posterior middle temporal gyrus, poleMTG = pole of the middle temporal gyrus, poleSTG = pole of the superior temporal gyrus, postcG = postcentral gyrus, precG = precentral gyrus, pSTG = posterior superior temporal gyrus, Put = putamen, SMG = supramarginal gyrus, STG = superior temporal gyrus, tri = pars triangularis. The colors are arbitrary and used for identification of the regions.

We employed the dual stream model as our underlying theoretical model, assuming that language processing mainly depends on a dorsal-phonologic-articulatory and a ventral-semantic language processing stream (Hickok and Poeppel, 2007). We grouped the grey matter ROIs into regions belonging to the dorsal or ventral stream based on previous evidence (Fridriksson et al., 2016) (Table 4).

Table 4

ROI#	Region in left hemisphere	Lexical diversity	Lexical sophistication	Phonological word features
ROI#	Region in left hemisphere	β/q-value	β/q-value	β/q-value
Dorsal Stream Grey Matter (ROIs from JHU atlas)
11	Pars opercularis	−0.165/ 0.572	−0.173/ 0.640	0.100/ 0.816
15	Pars triangularis	−0.160/ 0.554	0.036/ 0.937	0.197/ 0.572
23	Postcentral gyrus	0.249/ 0.354	0.519/ 0.081	0.070/ 0.901
25	Precentral gyrus	0.095/ 0.748	0.040/ 0.937	0.121/ 0.750
29	Supramarginal gyrus	−0.186/ 0.582	0.118/ 0.817	−0.563/ 0.146
71	Anterior insula	−0.228/ 0.368	−0.276/ 0.424	0.000/ 0.999
79	Putamen	−0.018/ 0.937	0.257/ 0.221	0.078/ 0.599
81	Globus pallidus	0.164/ 0.746	0.305/ 0.184	0.092/ 0.578
182	Posterior insula	−0.525⁎/ 0.041	−0.473/ 0.184	−0.423/ 0.238

Ventral Stream Grey Matter (ROIs from JHU atlas)
13	Pars orbitalis	−0.095/ 0.743	0.074/ 0.638	0.284/ 0.208
31	Angular gyrus	0.076/ 0.599	−0.089/ 0.834	−0.215/ 0.572
35	Superior temporal gyrus	−0.389/ 0.198	−0.540/ 0.184	−0.198/ 0.709
37	Pole of superior temporal gyrus	−0.227/ 0.373	−0.500⁎/ 0.040	−0.199/ 0.586
39	Middle temporal gyrus	−0.094/ 0.746	−0.113/ 0.748	0.034/ 0.937
41	Pole of middle temporal gyrus	−0.099/ 0.709	−0.247/ 0.368	−0.191/ 0.554
43	Inferior temporal gyrus	0.016/ 0.937	−0.205/ 0.515	−0.018/ 0.937
51	Middle occipital gyrus	−0.025/ 0.852	−0.145/ 0.937	−0.025/ 0.937
184	Posterior superior temporal gyrus	−0.351/ 0.205	−0.603/ 0.081	−0.312/ 0.453
186	Posterior middle temporal gyrus	−0.086/ 0.750	−0.390/ 0.184	−0.017/ 0.937

White Matter (ROIs from HCP-842 atlas)
NA	Arcuate fasciculus	−0.386/ 0.381	−0.719/ 0.162	−0.469/ 0.439
NA	Corticothalamic pathway	0.169/ 0.710	0.513/ 0.253	0.186/ 0.748
NA	Extreme capsule	0.133/ 0.694	0.433/ 0.184	−0.045/ 0.937
NA	Frontal aslant tract	−0.069/ 0.833	0.127/ 0.746	0.045/ 0.937
NA	Inferior fronto-occipital fasciculus	−0.193/ 0.572	0.060/ 0.937	0.379/ 0.354
NA	Inferior longitudinal fasciculus	0.023/ 0.937	−0.256/ 0.564	0.150/ 0.746
NA	Uncinate fasciculus	−0.070/ 0.833	−0.029/ 0.937	0.306/ 0.368
NA	Superior longitudinal fasciculus	0.016/ 0.937	0.282/ 0.519	−0.061/ 0.936

β = standardized coefficients beta; HCP = human connectome project; JHU = Johns Hopkins University; NA = not applicable; q-value = Benjamini-Hochberg adjusted p-value; ROI = region of interest.

⁎ Parameter estimate is significant using a false discovery rate (Benjamini-Hochberg adjusted p-value; q-value) of 0.05 (two-tailed).

For each of the three word features, we used two different approaches to investigate its relationship with ROI damage: 1) independently of damage to other ROIs, and 2) in combination with damage to other ROIs. In the first approach, we performed linear regressions to assess the effect of damage to one specific ROI (percent lesion in ROI; primary independent variable) on the performance in one specific word feature (dependent variable). Since we assessed 27 different ROIs and 3 different word features, we performed 81 linear regressions in total. In each of these models, we added the variables “lesion volume” and “number of words produced” as secondary independent variables to serve as control variables. Independent variables were kept in the model (no elimination procedure). In the second approach, we developed, for each word feature, multivariable regression models with the least absolute shrinkage and selection operator (LASSO). LASSO is an established statistical modelling approach that is advantageous over conventional modelling approaches in terms of better handling of multicollinearity between predictor variables, better selection of variables, and less biased estimates of parameters, standard errors and p-values (Lee et al., 2014; Tibshirani, 1996; Xu et al., 2012a; Xu et al., 2012b). For the initial variable selection step, all 27 grey and white matter ROIs and two control variables (lesion volume and number of words produced) were candidate factors. LASSO regularization in 0.02 increments (from min = 0 to max = 1) with 10-fold cross-validation was applied to identify a set of prognostic variables for the best parsimonious model (convergence criterion: 0.00001 with max. 100 iterations). For all regression models, we applied two-tailed statistical tests. We did not include interaction terms in any of the regression models and assessed multicollinearity by calculating the variance inflation factor (VIF). VIF values higher than 6 were considered evidence of multicollinearity (Keith, 2006). To correct for multiple comparisons, we controlled the false discovery rate at 0.05 (Benjamini-Hochberg adjusted p-value). IBM SPSS Statistics for Windows (version 24, released 2016, IBM Corp., Armonk, N.Y., USA) was used to conduct all statistical analyses.
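The Benjamini-Hochberg adjustment that produces the q-values reported in Table 4 can be sketched in Python as follows. The function name is hypothetical, and how the p-values were grouped into families for the correction is not specified here, so the example simply adjusts one list of p-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values

# A test is called significant when its q-value is below the chosen rate (0.05 here).
example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in example_q]
```

Each raw p-value is multiplied by the number of tests and divided by its rank, and the running minimum ensures that a smaller p-value never receives a larger q-value than a larger one.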

Results

Word features

On average, participants produced 404 words (SD = 242, range = 70 to 1012) across all three picture descriptions. Table 2 shows the correlation of number of produced words and all word features: MTLD, HD-D and number of phonemes (word length) correlated significantly with the number of produced words. For example, participants who produced a larger number of words also used words with higher lexical diversity (MTLD and HD-D) and more phonemes. As expected, the three measures of lexical diversity (MTLD, HD-D, Maas) correlated significantly with each other, as did the three measures of phonological word properties (number of phonemes, phonological neighborhood density, word-average biphoneme probability) in the speech produced during the picture description tasks. Lexical sophistication correlated with Maas and the three measures of phonological word properties. All three word features correlated significantly with aphasia severity (Aphasia Quotient from the WAB) and with each sub score of the WAB (language, spontaneous speech, auditory verbal comprehension, repetition, naming and word finding, reading, writing), except for lexical sophistication, which did not correlate with WAB writing (Table 3).

Table 2

Correlations (Spearman's rho) between number of produced words and word features (N = 58). The table shows the correlation coefficient and p-value for each pair of variables.

	Lexical diversity				Sophistication	Phonological word features
	MTLD	HD-D	Maas	PCA component (lexical diversity)	Lexical sophistication	Number of phonemes	Phonological neighborhood density	Word-average biphoneme probability	PCA component (phonological word features)
Number of produced words	0.718⁎⁎/ <0.0001	0.772⁎⁎/ <0.0001	−0.008/ 0.950	0.576⁎⁎/ <0.0001	−0.029/ 0.828	0.300⁎/ 0.022	0.026/ 0.026	0.105/ 0.433	0.166/ 0.213
MTLD		0.945⁎⁎/ <0.0001	−0.525⁎⁎/ <0.0001	0.936⁎⁎/ <0.0001	0.073/ 0.587	0.536⁎⁎/ <0.0001	−0.182/ 0.171	−0.012/ 0.931	0.317⁎/ 0.015
HD-D			−0.522⁎⁎/ <0.0001	0.928⁎⁎/ <0.0001	0.102/ 0.448	0.578⁎⁎/ <0.0001	−0.223/ 0.092	0.006/ 0.967	0.353⁎⁎/ 0.007
Maas				−0.748⁎⁎/ <0.0001	−0.419⁎⁎/ 0.001	−0.595⁎⁎/ <0.0001	0.586⁎⁎/ <0.0001	−0.161/ 0.226	−0.543⁎⁎/ <0.0001
PCA component (lexical diversity)					0.213/ 0.108	0.619⁎⁎/ <0.0001	−0.352⁎⁎/ 0.007	0.043/ 0.747	0.437⁎⁎/ 0.001
Lexical sophistication						0.433⁎⁎/ 0.001	−0.436⁎⁎/ 0.001	0.380⁎⁎/ 0.003	0.502⁎⁎/ <0.0001
Number of phonemes							−0.767⁎⁎/ <0.0001	0.469⁎⁎/ <0.0001	0.898⁎⁎/ <0.0001
Phonological neighborhood density								−0.464⁎⁎/ <0.0001	−0.868⁎⁎/ <0.001
Word-average biphoneme probability									0.732⁎⁎/ <0.0001

HD-D = hypergeometric distribution of the lexical diversity measure “D”; MTLD = measure of textual lexical diversity; PCA component = the 1st principal component; SD = standard deviation.

⁎⁎ Correlation is significant at the 0.01 level (two-tailed).

⁎ Correlation is significant at the 0.05 level (two-tailed).

Table 3

Correlations (Spearman's rho) between Western Aphasia Battery Quotients, subtest and word features for all study participants (N = 58). The table shows the correlation coefficient and p-value for each pair of variables.

Western Aphasia Battery - Revised	Lexical diversity	Lexical sophistication	Phonological word features
Aphasia Quotient	0.683⁎⁎/ <0.001	0.536⁎⁎/ <0.001	0.558⁎⁎/ <0.001
Language Quotient	0.756⁎⁎/ <0.001	0.398⁎⁎/ 0.003	0.499⁎⁎/ <0.001
Spontaneous Speech Score	0.651⁎⁎/ <0.001	0.467⁎⁎/ <0.001	0.527⁎⁎/ <0.001
Auditory Verbal Comprehension Score	0.636⁎⁎/ <0.001	0.407⁎⁎/ 0.002	0.446⁎⁎/ <0.001
Repetition Score	0.614⁎⁎/ <0.001	0.549⁎⁎/ <0.001	0.472⁎⁎/ <0.001
Naming and Word Finding Score	0.668⁎⁎/ <0.001	0.571⁎⁎/ <0.001	0.633⁎⁎/ <0.001
Reading Score	0.622⁎⁎/ <0.001	0.315⁎/ 0.016	0.438⁎⁎/ 0.001
Writing Score	0.670⁎⁎/ <0.001	0.208/ 0.121	0.400⁎⁎/ 0.002

⁎⁎ Correlation is significant at the 0.01 level (two-tailed).

⁎ Correlation is significant at the 0.05 level (two-tailed).
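The rank-based correlations reported in the tables above can be illustrated with a minimal Python sketch. It computes Spearman's rho as the Pearson correlation of the ranks, with tied values sharing their mean rank. The function names (`average_ranks`, `spearman_rho`) and the six score pairs are hypothetical and are not taken from the study; the statistical software used by the authors is not stated in this section.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical values for illustration only (not study data).
wab_aq = [35.2, 48.0, 61.5, 72.3, 80.1, 93.4]
diversity = [-1.1, -0.2, -0.6, 0.3, 0.9, 1.4]
rho = spearman_rho(wab_aq, diversity)
```

Because only the ordering of the values enters the calculation, the coefficient is insensitive to the scale of either measure, which suits a comparison between test scores and principal component scores.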


Lesion symptom mapping

The proportion of damage to the left posterior insula was significantly predictive of how lexically diverse a participant's produced speech was. This association was confirmed by both regression modelling approaches, with one ROI and with all ROIs as potential predictors (Tables 4 and 5a). In addition to proportional damage to the posterior insula, proportional damage to the left supramarginal gyrus and the left inferior fronto-occipital fasciculus, as well as the number of produced words, were independent predictors of lexical diversity in LASSO modelling. All four predictors together explained more than half of the variance in lexical diversity (R² = 0.580).

Table 5

Lesion symptom mapping for N = 58 participants using the least absolute shrinkage and selection operator (LASSO) for regression modelling. Predictor candidates were all 27 left hemisphere ROIs, and 2 control variables (lesion volume, number of words produced). Two-tailed statistical tests were applied. Table 5a shows the LASSO model for the dependent variable lexical diversity, Table 5b for lexical sophistication, and Table 5c for phonological word features.

a) Dependent variable: lexical diversity
Independent variables	LASSO coefficient
Supramarginal gyrus	−0.118
Posterior insula	−0.186
Inferior fronto-occipital fasciculus	−0.117
Number of words	0.088
Model summary	Coefficient of determination		Expected prediction error
	R²	Adjusted R²	Estimateᵃ	Std. Error
	0.580	0.511	0.736	0.115

ᵃ Mean squared error (10-fold cross validation).
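The selection behaviour behind Table 5a, where only four of the 29 candidate predictors retain non-zero coefficients, can be illustrated with a minimal Python sketch of LASSO fitted by cyclic coordinate descent. This is not the authors' implementation: the software, penalty value and standardisation they used are not described in this section. The toy data, the function names (`soft_threshold`, `lasso_coordinate_descent`) and the penalty of 0.1 are assumptions chosen for illustration.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values inside the band become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 by coordinate descent.

    X is a list of rows; its columns and y are assumed to be centred already.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X @ coef, since coef starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant (all-zero) column carries no information
            # Covariance of column j with the residual that excludes column j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * coef[j]) for i in range(n)
            ) / n
            new_coef = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_coef - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                coef[j] = new_coef
    return coef


# Hypothetical, centred toy data (not study data): the outcome depends only
# on the first predictor; the second predictor is unrelated to it.
X = [[-2.0, 1.0], [-1.0, -2.0], [0.0, 0.0], [1.0, 2.0], [2.0, -1.0]]
y = [1.0, 0.5, 0.0, -0.5, -1.0]
coef = lasso_coordinate_descent(X, y, penalty=0.1)
```

In this toy example the unrelated second predictor receives a coefficient of exactly zero, while the coefficient of the first is shrunk from its least-squares value of −0.5 to −0.45. The same mechanism lets a LASSO model drop most candidate ROIs and retain a few.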

Table 4

Lesion symptom mapping for N = 58 participants between one region of interest and one word feature. All 81 statistical models (27 ROIs times 3 word features) are based on linear regressions with one dependent variable (word feature) and three independent variables (primary independent variable: percent lesion in region of interest; secondary independent (control) variables: lesion volume and number of words produced). The variance inflation factor (VIF) was <5 for all variables in each listed regression model, indicating no evidence of multicollinearity across the independent variables. Two-tailed statistical tests were applied. β = standardized coefficient beta; HCP = Human Connectome Project; JHU = Johns Hopkins University; NA = not applicable; q-value = Benjamini-Hochberg adjusted p-value; ROI = region of interest. Parameter estimate is significant using a false discovery rate (Benjamini-Hochberg adjusted p-value; q-value) of 0.05 (two-tailed).

Lexical sophistication was significantly predicted by the proportion of damage to the pole of the left superior temporal gyrus, as confirmed by both regression modelling approaches (Tables 4 and 5b). The proportional damage in this region explained about a quarter of the variance in lexical sophistication (R² = 0.262).
In multivariable modelling using LASSO, the proportion of damage to the left supramarginal gyrus was significantly predictive of phonological word features and explained approximately a third of the variance of phonological word features (R² = 0.297) (Tables 4 and 5c). The results are visualized in Fig. 3.

Fig. 3

Lesion symptom mapping results for lexical features revealed by multivariable regression models using least absolute shrinkage and selection operator (LASSO) (Tables 5a, b, c). pIns = posterior insula, poleSTG = pole of the superior temporal gyrus, IFOF = inferior fronto-occipital fasciculus, SMG = supramarginal gyrus.
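The single-ROI regression models above are reported with Benjamini-Hochberg adjusted p-values (q-values) at a false discovery rate of 0.05. A minimal Python sketch of that adjustment follows. The function name `benjamini_hochberg` and the five p-values are hypothetical and are not study results; the software used by the authors is not stated here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values never increase
    # as the rank of the p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for illustration only (not study results).
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
significant = [qi <= 0.05 for qi in q]
```

With these made-up inputs, the two smallest p-values remain significant after adjustment, whereas 0.039 and 0.041, although below 0.05 before correction, do not. This is the kind of filtering a correction across 81 models is meant to provide.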

Discussion

Our study provides novel evidence regarding the crucial neuroanatomy of lexical features produced during discourse. We assessed grey and white matter brain regions and observed that both grey and white matter regions were involved in lexical diversity, lexical sophistication and phonological word features. Our findings emphasize the importance of grey and white matter regions for processing lexical diversity, lexical sophistication and phonological complexity. Our findings also support the view that lexical retrieval during discourse requires interaction between dorsal and ventral structures. The specific findings and their implications are discussed below.

Neuroanatomical structures supporting lexical diversity

Reduced lexical diversity was associated with lesions to the left supramarginal gyrus, posterior insula and inferior fronto-occipital fasciculus. More than half of the variance in lexical diversity was explained by a model combining the proportional damage to these three regions with the total number of words produced. The supramarginal gyrus and insula are considered to be “hub” regions in the language network (Fridriksson et al., 2018) because of their central and overarching role in general language processing. Lexical diversity has been related to global attributes of speech production such as a speaker's competency and communicative effectiveness (Avent and Austermann, 2003; Crossley et al., 2012; Lu, 2012). Low lexical diversity, which implies fewer unique words, may result from deficits at various stages of language processing. A speaker needs to possess implicit vocabulary knowledge to produce lexically diverse speech, and the speaker also needs to be able to access and retrieve target words (Fergadiotis and Wright, 2011). Lexical diversity may also require cognitive functions encompassing planning and memorizing utterances to produce a range of unique words. Lexical diversity may therefore reflect a complex interaction between language-specific integration and its relationship with broader cognitive functions. We speculate that this is one of the reasons why it was dependent on language hub regions. Below, we discuss each region associated with lexical diversity separately.

The supramarginal gyrus is commonly grouped within the dorsal stream (Fridriksson et al., 2018) and is possibly involved in lexical-phonological processing. In our study, the supramarginal gyrus was associated with lexical diversity and phonological word features, suggesting that it plays a role in both aspects of lexical retrieval. The role of the supramarginal gyrus is discussed in more detail in the section on phonological word features below.
It remains speculative why the posterior insula has a crucial role in producing lexically diverse, connected speech. We speculate that two attributes of the posterior insula might explain its role in lexical diversity: 1) the posterior insula is a general hub for supervising and transferring signals between language-relevant regions, and 2) the posterior insula is part of the dorsal-phonologic-articulatory language processing system. The posterior insula has rich cortical and subcortical connections to a widespread network of brain regions (Augustine, 1996; Ghaziri et al., 2015; Ghaziri et al., 2018; Liu et al., 2018). Its central location and high connectivity may position the posterior insula to supervise and coordinate signals between language-relevant areas. The posterior insula might contribute on a domain-general level to language processing (Julayanont et al., 2016; Liu et al., 2018). Lesions to the posterior insula impact speech and articulation rate in individuals with post-stroke aphasia (Fridriksson et al., 2018; Fridriksson et al., 2016), suggesting that the posterior insula is also part of the dorsal-phonologic-articulatory language processing stream. High lexical diversity is intricately tied to the production of many unique words, which in turn requires the correct articulation of these words.

The inferior fronto-occipital fasciculus (IFOF) is a long fiber bundle connecting the frontal cortex with posterior brain regions in the parietal, temporal and occipital cortex. The IFOF consists of different parts (e.g., frontal to temporo-parietal, frontal to occipital) likely serving different roles for language processing (Wu et al., 2016). Due to the segmentation of the white matter atlas employed in this study, we assessed the IFOF as a whole and did not analyse subcomponents. Independently of the IFOF segment, the IFOF is grouped within the ventral stream and is believed to be involved in semantic processing (Almairac et al., 2015; Catani and Mesulam, 2008; Friederici and Gierhan, 2013). Previous research suggests that stimulating the IFOF results in semantic paraphasias (Duffau et al., 2009). Thus, we speculate that the relationship between lesions to the IFOF and speech with low lexical diversity might arise from deficits in semantic selection, leading to the selection of easily accessible semantic entries.

Neuroanatomical structures supporting lexical sophistication

Reduced lexical sophistication was associated with lesions to the pole of the left superior temporal gyrus. The pole of the left superior temporal gyrus is commonly grouped within the ventral processing stream (Fridriksson et al., 2018; Hickok and Poeppel, 2007). The relationship between lexical sophistication and the ventral stream may reflect that sophisticated words have less common lexical-semantic representations than unsophisticated words. Thus, the primary challenge in producing sophisticated words may be selecting and processing their correct lexical-semantic entries, and less so selecting and processing their lexical-phonological entries. Interestingly, previous studies have observed a relationship between word frequency and dorsal stream structures (Graves et al., 2007; Wilson et al., 2009), but the effect of lexical sophistication (words not belonging to the 2000 most frequent words) was not directly taken into account. The use of less sophisticated, high-frequency words may not necessarily require retrieval of their lexical-semantic representations, whereas the use of sophisticated, low-frequency words does. The link between semantic processing and lexical sophistication is further supported by studies exploring the specific role of the pole of the left temporal gyrus, which has been identified as a semantic hub for object naming (Migliaccio et al., 2016; Tsapkini et al., 2011). Semantic errors during naming relate to lesions in the left anterior temporal gyrus (Schwartz et al., 2009). Schwartz and colleagues proposed that the left anterior temporal gyrus is crucial for transmitting fine-grained semantic information to the lexical system (Schwartz et al., 2009). We speculate that damage to the anterior superior temporal gyrus leads to a loss of fine-grained semantic differentiation, which is required for more sophisticated words.

Neuroanatomical structures supporting phonological word features

We observed that the production of words with lower phonological complexity was associated with lesions to the supramarginal gyrus. Fridriksson and colleagues observed that lesions to connections terminating in the supramarginal gyrus had a particularly negative impact on performance in an array of speech and language tests and concluded that the supramarginal gyrus was an important hub for controlling speech and language (Fridriksson et al., 2018). Even though the supramarginal gyrus is considered to be part of the dorsal stream (Fridriksson et al., 2018), it is located at the anatomical intersection between the dorsal and ventral streams and may play a complementary role in semantic and phonological processing. Word length, one of the components of phonological word features, has been linked with activation in area Spt (Hickok et al., 2003; Hickok et al., 2009; Okada et al., 2003), or Heschl's gyrus and mid-superior temporal lobe (Wilson et al., 2009). The supramarginal gyrus is considered to be part of area Spt. Previous research also suggested the involvement of the dorsal stream in the biphoneme probability of produced words by mapping biphoneme probability on phoneme-level neural networks (Vaden et al., 2011a; Vaden et al., 2011b). Moreover, research on connected speech production found that part of the supramarginal gyrus was related to the number of words produced, although other regions (e.g., inferior frontal gyrus and anterior insula) showed a stronger relationship (Borovsky et al., 2007). Our findings on the relationship between lesions in the supramarginal gyrus and the use of phonologically less complex words in connected speech are in accordance with prior evidence. As suggested by previous studies, the supramarginal gyrus stores phonological representations (Abel et al., 2009) and/or somatosensory representations that correspond to phonemes (Hickok, 2012).
In this study, the supramarginal gyrus was related to complex phonological processing beyond single phonemes such as biphoneme probability and phonological neighborhood. The less common phonemes or phoneme combinations are, the more encoding and somatosensory control is required. Because patients with a lesion to the left supramarginal gyrus may have a diminished capacity for encoding and somatosensory control, this may result in an avoidance of uttering phonologically complex words in connected speech.

Limitations

There are several word features that were not assessed in this study (e.g., word frequency, familiarity, age of acquisition). Importantly, it should be noted that any feature of lexical retrieval per se does not necessarily reflect how well an individual with aphasia is communicating. An individual might produce lexically diverse speech but might not be able to get an intended point across. Moreover, information on school degree or number of years of education was not available for all participants and therefore, we could not assess how much of the variability in word features might be explained by differences in education. Region-based symptom mapping as employed in this study has limitations. Identifying crucial brain regions by drawing inferences from lesion locations to function can be impeded by the non-random nature of lesion locations due to constraints of vascular territories (Mah et al., 2014). Thus, statistical power depends on the sample size, the anatomical variation of brain lesions and the behavioral scores (Inoue et al., 2014; Sperber and Karnath, 2018). For example, in our cohort of 58 individuals with stroke, the insula and peri-insular regions were most commonly damaged, whereas superior and middle frontal areas were less often damaged. For this reason, a lack of significant relationship between frontal regions and word features may be related to limited statistical power in those regions. Moreover, region-based symptom mapping may overlook the contribution of specific parts of a region. Consequently, linear relationships between parcel-level damage and a behavioral outcome may not always be valid. It should be emphasized that the validity of lesion-symptom mapping results depends on multiple methodological steps such as lesion delineation, normalization, standardization, brain segmentation, statistical design, and patient selection (Inoue et al., 2014; Sperber and Karnath, 2018).
We explored regional white matter damage and did not evaluate specific pairwise connections or subnetworks. Furthermore, we also did not explore topological properties of subnetworks, which may be the focus of a future dedicated study.

Potential clinical implications

Our study suggests that selected brain regions are associated with impairment in specific discourse-extracted word features in participants with chronic stroke. This information may help clinicians to predict chronic stage deficits and focus on individualized treatment. Performance in the three word features was related to WAB scores. The degree to which an individual with post-stroke aphasia uses lexically diverse, lexically sophisticated or phonologically complex words in discourse remains undefined. However, the WAB is a general measure of language performance that is sensitive to multiple aspects of language processing. Lexical retrieval is likely a core component of speech production and is thus reflected in the WAB. Lexical diversity, lexical sophistication and phonological word features in discourse provide more fine-grained information beyond the WAB. They may exert an important and independent impact on communicative abilities.

Conclusions

Our results suggest that lexical retrieval aspects of discourse depend on the integrity of specific brain regions involving insular and peri-Sylvian areas. These observations may provide insight into potentially underlying mechanisms of lexically diverse, sophisticated and phonologically complex words produced during discourse.

Funding/Acknowledgements

This work was supported by the National Institutes of Health/National Institute on Deafness and Other Communication Disorders (NIDCD) [grant numbers DC014021 (PI: Bonilha), DC011739 (PI: Fridriksson), P50 DC014664 (PI: Fridriksson), DC05375 (PI: Hillis)] and by the American Heart Association [grant number SFDRN26030003 (PI: Bonilha)].

Declaration of Competing Interest

The authors declare no competing financial interests.
