| Literature DB >> 31520744 |
Adam Kenji Yamamoto, Oiwi Parker Jones, Thomas M H Hope, Susan Prejawa, Marion Oberhuber, Philipp Ludersdorfer, Tarek A Yousry, David W Green, Cathy J Price.
Abstract
This fMRI study of 24 healthy human participants investigated whether any part of the auditory cortex was more responsive to self-generated speech sounds than to hearing another person speak. The results demonstrate a double dissociation between two different parts of the auditory cortex. In the right posterior superior temporal sulcus (RpSTS), activation was higher during speech production than while listening to auditory stimuli, whereas in the bilateral superior temporal gyri (STG), activation was higher while listening to auditory stimuli than during speech production. In the second part of the study, we investigated the function of the identified regions by examining how activation changed across a range of listening and speech production tasks that systematically varied the demands on acoustic, semantic, phonological and orthographic processing. In RpSTS, activation during auditory conditions was higher in the absence of semantic cues, plausibly indicating increased attention to the spectral-temporal features of auditory inputs. In addition, RpSTS responded in the absence of any auditory input when participants made one-back matching decisions on visually presented pseudowords. After analysing the influence of visual, phonological, semantic and orthographic processing, we propose that RpSTS (i) contributes to short-term memory for speech sounds, (ii) supports spectral-temporal processing of auditory input, and (iii) may play a role in integrating auditory expectations with auditory input. In contrast, activation in bilateral STG was sensitive to acoustic input and did not respond in the absence of auditory input. The special role of RpSTS during speech production therefore merits further investigation if we are to fully understand the neural mechanisms supporting speech production during speech acquisition, adult life, hearing loss and after brain injury.
Keywords: Auditory feedback; Own speech; Speech production; fMRI
Year: 2019 PMID: 31520744 PMCID: PMC6876272 DOI: 10.1016/j.neuroimage.2019.116184
Source DB: PubMed Journal: Neuroimage ISSN: 1053-8119 Impact factor: 6.556
Fig. 3 Condition-specific responses in left and right pSTS and STG. Activation for each of the four regions in each of the 16 conditions. Going from left to right, conditions 1–8 = speech production, conditions 9–16 = one-back matching. Conditions 1–4 and 9–12 = visual stimuli. Conditions 5–8 and 13–16 = auditory stimuli. O = object naming from pictures (visual) or sounds (auditory), W = words, Ps = pseudowords, C = coloured non-objects, H = male and female humming. Activation is plotted at the voxels, within our regions of interest, showing the peak effect of speech production more than auditory input (contrast c) for RpSTS and LpSTS, and the peak effect of the reverse contrast for RSTG and LSTG. These co-ordinates were: (+45 −33 +3), (−48 −31 +1), (+60 −15 +3) and (−57 −15 0). The plots are colour-coded to help link each plot to the regions shown in Fig. 2. The plot for LpSTS is not coloured because there was no significant effect of own or another’s speech in this region; its peak is included for comparison with RpSTS. Standard errors are marked in white boxes above the mean response for each condition.
Fig. 1 Examples of the visual stimuli.
Table 1 Experimental conditions and statistical contrasts. SP is speech production, OBM is one-back matching. Contrast (a) is the main effect of Auditory > Visual conditions; the reverse of this contrast is the main effect of visual input, which was not of interest. Contrast (b) is the main effect of speech production compared to one-back matching on exactly the same stimuli; the reverse of this contrast is the main effect of one-back matching, which was not of interest. Contrast (c) identified areas where the main effect of speech production (contrast b) was greater than the main effect of auditory input (contrast a), i.e. contrast c = b − a; see the sketch of these weight vectors after the table. This contrast is only reported in auditory processing areas (i.e. significant in contrast (a)) that also showed an effect of speech production (i.e. significant in contrast (b)), so controlling for all other variables. Contrast (c2) is the reverse of contrast (c) and identified areas where the main effect of auditory input (contrast a) was greater than the main effect of speech production (contrast b). Contrast (d) identified the main effect of semantic content (Sem) by comparing pictures, sounds and names of objects to the other conditions; we also tested the reverse of this contrast, (d2). Contrast (e) identified the main effect of sublexical phonological cues to speech production (Phon) by comparing words and pseudowords to all other conditions; we also tested the reverse of this contrast, (e2). Contrast (f) identified whether the effect of words/pseudowords (phonological inputs) was greater in the written (orthographic) domain than in the auditory domain. The reverse of this contrast (phonological content in the auditory > visual domain) tests for activation related to auditory speech sounds.
| Task | Input | Stimulus | (a) Aud | (b) SP | (c) SP−Aud | (c2) | (d) Sem | (d2) | (e) Phon | (e2) | (f) Orth |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SP | Visual | Pictures of objects | −1 | 1 | 2 | −2 | 1 | −1 | −1 | 1 | −1 |
| | | Words | −1 | 1 | 2 | −2 | 1 | −1 | 1 | −1 | 1 |
| | | Pseudowords | −1 | 1 | 2 | −2 | −1 | 1 | 1 | −1 | 1 |
| | | Coloured non-objects | −1 | 1 | 2 | −2 | −1 | 1 | −1 | 1 | −1 |
| | Auditory | Sounds of objects | 1 | 1 | 0 | 0 | 1 | −1 | −1 | 1 | 1 |
| | | Words | 1 | 1 | 0 | 0 | 1 | −1 | 1 | −1 | −1 |
| | | Pseudowords | 1 | 1 | 0 | 0 | −1 | 1 | 1 | −1 | −1 |
| | | Baseline (Humming) | 1 | 1 | 0 | 0 | −1 | 1 | −1 | 1 | 1 |
| OBM | Visual | Pictures of objects | −1 | −1 | 0 | 0 | 1 | −1 | −1 | 1 | −1 |
| | | Words | −1 | −1 | 0 | 0 | 1 | −1 | 1 | −1 | 1 |
| | | Pseudowords | −1 | −1 | 0 | 0 | −1 | 1 | 1 | −1 | 1 |
| | | Coloured non-objects | −1 | −1 | 0 | 0 | −1 | 1 | −1 | 1 | −1 |
| | Auditory | Sounds of objects | 1 | −1 | −2 | 2 | 1 | −1 | −1 | 1 | 1 |
| | | Words | 1 | −1 | −2 | 2 | 1 | −1 | 1 | −1 | −1 |
| | | Pseudowords | 1 | −1 | −2 | 2 | −1 | 1 | 1 | −1 | −1 |
| | | Baseline (Humming) | 1 | −1 | −2 | 2 | −1 | 1 | −1 | 1 | 1 |
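For concreteness, the contrast weights above can be written out in code. The following is a minimal NumPy sketch (not from the paper) that transcribes contrasts (a) and (b) over the 16 conditions in the row order of the table, and verifies the identity c = b − a stated in the caption:

```python
import numpy as np

# 16 conditions in the row order of Table 1:
# SP-visual (4), SP-auditory (4), OBM-visual (4), OBM-auditory (4)
a = np.array([-1] * 4 + [1] * 4 + [-1] * 4 + [1] * 4)  # (a) Auditory > Visual
b = np.array([1] * 8 + [-1] * 8)                       # (b) SP > OBM
c = b - a                                              # (c) per the caption

# Matches the (c) column: +2 for SP-visual, 0 elsewhere, -2 for OBM-auditory
assert np.array_equal(c, np.array([2] * 4 + [0] * 8 + [-2] * 4))
print(c)
```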
In-scanner behavioural results.
| Modality | Stimulus | Duration (ms) | RT, OBM (ms) | Accuracy, OBM (%) | Accuracy, SP (%) |
|---|---|---|---|---|---|
| Visual | Objects (O) | 1500 | 683 (115.7) | 99.7 (0.8) | 96.0 (4.6) |
| | Words (W) | 1500 | 655 (113.1) | 97.7 (5.8) | 99.6 (1.3) |
| | Pseudowords (Ps) | 1500 | 648 (88.4) | 98.6 (4.3) | 85.8 (15.1) |
| | Colours (C) | 1500 | 762 (111.0) | 95.6 (2.9) | 99.0 (1.9) |
| Auditory | Objects (O) | 1470 (120) | 1111 (330.6) | 96.7 (5.9) | 91.8 (7.6) |
| | Words (W) | 640 (100) | 880 (113.7) | 99.1 (3.0) | 99.5 (1.1) |
| | Pseudowords (Ps) | 680 (120) | 959 (136.1) | 99.1 (1.6) | 88.3 (8.7) |
| | Humming (H) | 1040 (430) | 1125 (226.4) | 88.8 (9.7) | 99.1 (2.1) |
SP is speech production, OBM is one-back matching. Duration is the length of stimulus presentation in ms (standard deviation in parentheses). RT is the mean response time in ms (standard deviation), available only for one-back matching. Accuracy is the mean percentage of correct responses (standard deviation).
Results of a repeated-measures ANOVA on OBM response times.
| Effect | F | df | P value | Post hoc analysis |
|---|---|---|---|---|
| Modality (Mod) | 146.6 | 1,20 | <0.001 | Faster for Visual (Vis) than Auditory (Aud) |
| Phonology (Phon) | 35.2 | 1,20 | <0.001 | Faster for W & Ps than Obj & C/H |
| Semantics (Sem) | 4.9 | 1,20 | 0.038 | Faster for W than Ps, and for Obj than C/H |
| Mod x Phon | 8.5 | 1,20 | 0.009 | Phon effect is bigger for Aud than Vis stimuli |
| Mod x Phon x Sem | 7.6 | 1,20 | 0.012 | Sem effect is bigger for Aud phon (W < Ps) and for Vis non-phon (O < C) |
| Mod x Sem | 0.115 | 1,20 | 0.738 | Not significant |
| Phon x Sem | 0.053 | 1,20 | 0.821 | Not significant |
‘x’ denotes the testing of an interaction.
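For readers who want to run this kind of analysis themselves, the sketch below fits a 2 × 2 × 2 repeated-measures ANOVA (Modality × Phonology × Semantics) with statsmodels' AnovaRM. The RT values are simulated placeholders, not the study's data; only the design (21 participants, giving the 1,20 degrees of freedom reported above) mirrors the table.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Simulated RTs for a 2 x 2 x 2 within-subject design (toy data).
rng = np.random.default_rng(0)
rows = []
for subj in range(21):  # 21 participants -> df = 1,20, as in the table
    for mod in ("Visual", "Auditory"):
        for phon in ("Phon", "NonPhon"):
            for sem in ("Sem", "NonSem"):
                rt = 700 + 300 * (mod == "Auditory") + rng.normal(0, 50)
                rows.append({"subject": subj, "Mod": mod,
                             "Phon": phon, "Sem": sem, "RT": rt})
df = pd.DataFrame(rows)

# Repeated-measures ANOVA: Modality x Phonology x Semantics on RT
res = AnovaRM(df, depvar="RT", subject="subject",
              within=["Mod", "Phon", "Sem"]).fit()
print(res)
```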
fMRI activation results in regions of interest. The contrast labels (a to f) in the first column correspond to those detailed in Table 1. ‘x’ denotes the testing of an interaction. The regions of interest are centred on the areas reported in Agnew et al. (2013) for own versus another’s speech in right pSTS (RpSTS) (x = +48, y = −31, z = +1) and for another’s versus own speech in left STG (LSTG) (−60 −13 +4). Effects are also reported in the homologues of these regions: left pSTS (LpSTS) (−48 −31 +1) and right STG (RSTG) (+60 −13 +4). P-values are corrected for multiple comparisons across the whole brain, unless appended with a u (e.g. p < 0.001u), which indicates an uncorrected threshold, or with *, which indicates a small volume correction for multiple comparisons in the regions of interest for the effects of interest (i.e. RpSTS for speech production > auditory processing and LSTG for auditory processing > speech production). ns = not significant. A sketch of how such region-of-interest values can be extracted follows the table.
| Contrast | Effect | RpSTS Z | RpSTS P | LpSTS Z | LpSTS P | RSTG Z | RSTG P | LSTG Z | LSTG P |
|---|---|---|---|---|---|---|---|---|---|
| (a) | Auditory > Visual | 10.3 | | 13.5 | | 21.8 | | 16.0 | |
| (b) | SP > OBM | 11.2 | | 5.3 | | 10.3 | | 7.8 | |
| (c) | SP > Auditory | 3.6 | 0.004* | ∼ | ns | ∼ | ns | ∼ | ns |
| (c2) | Auditory > SP | ∼ | ns | ∼ | ns | 6.0 | | 5.3 | |
| (d2) | Non-Sem > Sem | ∼ | ns | ∼ | ns | ∼ | ns | | |
| (d2) x (a) | Non-Sem > Sem for Auditory > Visual | ∼ | ns | ∼ | ns | ∼ | ns | | |
| (e2) | Non-Phon > Phon | ∼ | ns | 4.0 | | ∼ | ns | | |
| (e2) x (a) | Non-Phon > Phon for Auditory > Visual | 6.3 | | 7.2 | | 4.8 | | | |
| (f) | Phon from orthography > no orthography | ∼ | ns | ∼ | ns | ∼ | ns | | |
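As an illustration of the region-of-interest approach described in the caption, the sketch below extracts mean values around the four seed co-ordinates from a first-level contrast image using nilearn. The 6 mm sphere radius and the image filename are assumptions made for the example; the paper's actual ROI procedure may differ.

```python
from nilearn.maskers import NiftiSpheresMasker

# Seed co-ordinates (MNI, mm) from the table caption, after Agnew et al. (2013)
seeds = [
    (48, -31, 1),   # RpSTS
    (-48, -31, 1),  # LpSTS (left homologue)
    (60, -13, 4),   # RSTG (right homologue)
    (-60, -13, 4),  # LSTG
]

# Mean signal within a sphere around each seed; the 6 mm radius is assumed.
masker = NiftiSpheresMasker(seeds=seeds, radius=6.0)

# 'contrast_map.nii.gz' is a hypothetical first-level contrast image,
# e.g. speech production > one-back matching for one participant.
roi_means = masker.fit_transform("contrast_map.nii.gz")  # shape (1, 4)
print(dict(zip(["RpSTS", "LpSTS", "RSTG", "LSTG"], roi_means[0])))
```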
Fig. 2 Superior temporal lobe activation for processing own and another’s speech. Sagittal (top), coronal (middle) and axial (bottom) brain slices (at MNI co-ordinates: +45 −33 +6) showing regions of interest in the auditory cortices. All coloured regions (yellow, red, orange, blue and green) were activated by the main effect of auditory input and the main effect of speech production (both at p < 0.05, corrected for multiple comparisons across the whole brain). Blue areas show the LSTG and RSTG regions that were more activated by hearing another’s speech than by own speech (contrast c2 in Table 1). The red RpSTS region was more activated by (i) speech production than listening to another’s speech (contrast c in Table 1) and (ii) one-back matching on written pseudowords compared to rest. The orange bilateral regions bordering the ventral surface of the premotor cortex were also more activated for speech production than for listening, but are not discussed because they were not in regions of interest and their activation was explained by motor activity during speech production. Green regions were activated by one-back matching on written pseudowords compared to rest, but are not of interest because they were not more activated by speech production than by listening to another’s speech. Yellow regions show the remaining auditory input areas activated for the main effects of both auditory input and speech production. Blue, red/orange and green areas include all voxels that surpassed a threshold of p < 0.01 uncorrected, to show the full extent of activation around peaks that survived correction for multiple comparisons in regions of interest.
Fig. 4 Contrasting effects in bilateral STG and RpSTS. This figure illustrates the task-by-hemisphere interaction for the word conditions only. Other = another’s speech, when listening to words and performing the one-back matching task. Own = own speech, when the same words were read aloud. These two tasks were selected because (i) they segregate other speech (listening) from own speech (speech production) and (ii) they are matched for phonological and semantic content. The values on the y-axis (parameter estimates) correspond to those shown in Fig. 3 for speaking aloud visual words (W) and one-back matching on auditory words (W).