Abstract
What do we hear when someone speaks and what does auditory cortex (AC) do with that sound? Given how meaningful speech is, it might be hypothesized that AC is most active when other people talk so that their productions get decoded. Here, neuroimaging meta-analyses show the opposite: AC is least active and sometimes deactivated when participants listen to meaningful speech compared to less meaningful sounds. Results are explained by an active hypothesis-and-test mechanism where speech production (SP) regions are neurally re-used to predict auditory objects associated with available context. By this model, more AC activity for less meaningful sounds occurs because predictions from context are less successful, requiring that further hypotheses be tested. This also explains the large overlap of AC co-activity for less meaningful sounds with meta-analyses of SP. An experiment showed a similar pattern of results for non-verbal context. Specifically, words produced less activity in AC and SP regions when preceded by co-speech gestures that visually described those words compared to those words without gestures. Results collectively suggest that what we 'hear' during real-world speech perception may come more from the brain than our ears and that the function of AC is to confirm or deny internal predictions about the identity of sounds.
Keywords: auditory; brain; context; language; multimodal; predictive
Year: 2014 PMID: 25092665 PMCID: PMC4123676 DOI: 10.1098/rstb.2013.0297
Source DB: PubMed Journal: Philos Trans R Soc Lond B Biol Sci ISSN: 0962-8436 Impact factor: 6.237
Figure 1. Regions supporting and visualization of the active hypothesis-and-test mechanism associated with the NOLB model (see §1b for details). (a) Posterior superior temporal (PST) regions (black letters) include the transverse temporal gyrus/sulcus (TTG), posterior aspect of the lateral fissure (LF), planum temporale (PT) and superior temporal gyrus (STG). Though the entire STG is drawn for reference, PST regions include only cortex posterior to the blue line drawn at the anterior aspect of the TTG. Posterior ventral frontal (PVF) regions (red letters) include the pars opercularis (POP) of the inferior frontal gyrus, ventral aspects of the precentral sulcus (VPS), precentral gyrus (VPG) and central sulcus (VCS) and the subcentral gyrus and sulcus (SG). (b) Visualization of hypothesis-and-test processing steps associated with these regions. Hypotheses are formed and tested through bidirectional network interactions between PST and PVF regions. Specifically, context is used to generate hypotheses about associated auditory objects in PST regions (1), a hypothesis is specified as a motor goal and mapped onto motor plans to produce (speak) that goal in PVF regions (1 → 2 → 3), the auditory object associated with those plans is activated through feedback (3 → 1) and compared with acoustic patterns arriving in the TTG (1 → A, represented by the ‘comparator’). A difference results in an error signal, and steps 1–3 and A are repeated until the error signal is suppressed. A strong prediction might result in a small error signal and only one cycle through the hypothesis formation and testing network (dashed lines). A weaker prediction might require multiple hypothesis-and-test cycles and more metabolic expenditure (dashed and solid lines).
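The hypothesis-and-test cycle in panel (b) can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the NOLB model itself: the feature vectors in `TEMPLATES`, the mean-absolute-difference `prediction_error`, the `threshold` value and the function names are all hypothetical and chosen for illustration. The number of cycles stands in loosely for the metabolic expenditure mentioned in the caption.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def prediction_error(predicted: List[float], observed: List[float]) -> float:
    """Toy 'comparator': mean absolute difference between two patterns."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def hypothesis_and_test(
    ranked_hypotheses: Sequence[str],
    templates: Dict[str, List[float]],
    acoustic_input: List[float],
    threshold: float = 0.1,  # hypothetical cut-off for a 'suppressed' error
) -> Tuple[Optional[str], int]:
    """Test context-ranked hypotheses one at a time until the error is small.

    Returns the accepted hypothesis (or None) and the number of cycles used.
    """
    cycles = 0
    for word in ranked_hypotheses:
        cycles += 1
        predicted = templates[word]  # auditory object predicted for this word
        if prediction_error(predicted, acoustic_input) <= threshold:
            return word, cycles  # error suppressed, stop testing
    return None, cycles


# Hypothetical auditory templates for three candidate words.
TEMPLATES = {
    "type": [1.0, 0.0, 1.0],
    "write": [0.0, 1.0, 1.0],
    "read": [0.0, 0.0, 1.0],
}
heard = [1.0, 0.0, 1.0]  # acoustic pattern that matches "type"

# Strong context ranks the correct word first; weak context ranks it last.
strong_result = hypothesis_and_test(["type", "write", "read"], TEMPLATES, heard)
weak_result = hypothesis_and_test(["read", "write", "type"], TEMPLATES, heard)
print(strong_result, weak_result)
```

In this toy setting the strong context resolves the input in one cycle while the weak context needs three, which mirrors the caption's contrast between a single cycle and multiple hypothesis-and-test cycles.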
Figure 2. Neuroimaging meta-analyses results. (a) Less meaningful > meaningful stimuli and tasks: passive listening to non-words > words (yellow), experiments in the behavioural domains of language and speech or phonology > syntax or semantics (red) and their overlap (blue). White outline is the intrinsic connectivity network correlated with passive listening and less meaningful stimuli, thresholded at z ≥ 10 to show the PST distribution of activity. (b) Meaningful > less meaningful stimuli and tasks: the converse of (a), i.e. passive listening to words > non-words and syntax or semantics > speech or phonology (red). White outline is frontal activity for passive listening to words and experiments in the behavioural domains of syntax and semantics prior to contrast analyses. Blue outline is deactivation for auditory presentation of words and experiments in the behavioural domain of syntax and semantics. Filled in blue outline is significantly greater for the latter compared with auditory presentation of non-words, speech and phonology. (c) Transverse temporal gyrus (TTG) co-activation network (red), SP (yellow) and their overlap (blue). White outline is covert SP. Black outlines are PST and ventral frontal (PVF) regions as defined in figure 1a. All p's ≤ 0.05 FDR corrected for multiple comparisons with a cluster size of 160 mm³ (20 voxels).
Table 1. Meta-analyses of activity by region. (Volume of activation in grey matter per region in mm³. Grey and red outlines correspond to the PST and PVF regions in figure 1a, where region abbreviations are defined. LH and RH are left and right hemispheres, respectively. Online version in colour.)
Table 2. Meta-analyses of activity by cluster. (Ten largest clusters only. Region abbreviations are defined in figure 1a. MNI coordinates are centres of mass. The first region is always the location at the centre and the other regions are part of the cluster. With the exception of the ‘anterior insula’, ‘anterior’ and ‘posterior’ are relative to the TTG (see figure 1a, blue line).)
| meta-analyses | hemisphere | regions | voxels (mm³) | MNI x | MNI y | MNI z |
|---|---|---|---|---|---|---|
| (a) a combination of the following contrasts: passive listening to non-words > words; behavioural domain of language: speech > syntax, speech > semantics, phonology > syntax, phonology > semantics | | | | | | |
| | left | fusiform and cerebellum | 27 608 | −36 | −71 | −19 |
| | right | POP and VPG, VPS, VCS | 22 040 | 47 | 13 | 17 |
| | left | PT and TTG, posterior STG, LF | 18 992 | −56 | −31 | 13 |
| | left | VPG and POP, VPS, VCS | 17 880 | −46 | 0 | 42 |
| | left | superior frontal gyrus (SMA) | 17 648 | −2 | 16 | 54 |
| | right | fusiform and cerebellum | 16 848 | 39 | −66 | −23 |
| | right | posterior STG and TTG, PT, LF | 16 736 | 55 | −29 | 3 |
| | left | anterior insula, SG, putamen | 14 432 | −34 | 9 | 3 |
| | left | superior parietal lobule | 6672 | −33 | −63 | 50 |
| | right | thalamus, caudate | 4944 | 2 | −2 | 5 |
| (b) correlation of stimuli with this network: non-vocal sounds, non-verbal vocal sounds, noise, music, tones, pseudo-words, none, words, syllables, false fonts, | | | | | | |
| | left | posterior STG and TTG, PT, LF, PP, middle temporal gyrus | 64 584 | −53 | −24 | 2 |
| | right | posterior STG and TTG, PT, LF, PP, middle temporal gyrus | 62 752 | 55 | −21 | 0 |
| | right | cerebellum | 1360 | 10 | −69 | −34 |
| | right | VPG and VPS, VCS | 1280 | 47 | −8 | 47 |
| | left | superior parietal lobule | 1152 | −14 | −61 | 62 |
| | right | superior parietal lobule | 872 | 14 | −62 | 62 |
| | left | VPG and VCS | 528 | −48 | −13 | 45 |
| | right | paracentral lobule | 496 | 9 | −42 | 59 |
| | right | middle frontal gyrus | 280 | 31 | −2 | 63 |
| | left | middle frontal gyrus | 248 | −25 | −7 | 64 |
| (c) a combination of the following contrasts: passive listening to words > non-words; behavioural domain of language: syntax > speech, syntax > phonology, semantics > speech, semantics > phonology | | | | | | |
| | left | PP and anterior STG, temporal pole, pars triangularis, pars orbitalis | 25 480 | −50 | 16 | −10 |
| | left | posterior middle temporal gyrus | 18 592 | −46 | −47 | −7 |
| | right | anterior middle temporal gyrus and anterior STG, temporal pole, pars orbitalis | 11 328 | 57 | 6 | −16 |
| | right | angular gyrus and posterior middle temporal gyrus | 3104 | 56 | −62 | 17 |
| | left | anterior superior frontal gyrus | 2528 | −6 | 53 | 34 |
| | left | thalamus | 1936 | −7 | −5 | 6 |
| | right | cerebellum and fusiform | 1864 | 34 | −47 | −29 |
| | right | pars triangularis | 1672 | 51 | 28 | 17 |
| | left | occipital pole | 1248 | −6 | −95 | −10 |
| | left | putamen | 1072 | −21 | 8 | 6 |
| (d) | | | | | | |
| | left | anterior insula and POP, VPS, VPG, VCS, SG, thalamus, putamen, TTG, STG, PT, LF | 51 328 | −44 | −2 | 15 |
| | bilateral | superior frontal gyrus (SMA) | 17 952 | 0 | 10 | 52 |
| | left | anterior insula and POP, putamen | 12 224 | 39 | 17 | 2 |
| | right | posterior STG and TTG, PT, LF | 7936 | 58 | −35 | 5 |
| | right | VPG and POP, VPS, VCS | 6032 | 50 | 3 | 38 |
| | left | superior and inferior parietal lobule | 4696 | −37 | −50 | 49 |
| | right | cerebellum | 3784 | 29 | −63 | −33 |
| | left | fusiform and cerebellum | 3496 | −39 | −66 | −27 |
| | right | superior parietal lobule | 1528 | 36 | −55 | 49 |
| | left | superior parietal lobule | 1240 | −22 | −70 | 44 |
Table 3. Per cent overlap of meta-analyses with SP. (See figure 1a for the location of the PST and PVF regions. PST and PVF overlaps were calculated using grey matter only. See table 2a,c for definitions of less and more meaningful stimuli and tasks.)
| meta-analyses | whole brain: overt SP (%) | whole brain: covert SP (%) | PST and PVF regions: overt SP (%) | PST and PVF regions: covert SP (%) |
|---|---|---|---|---|
| TTG co-activity network | 55 | 71 | 30 | 58 |
| less meaningful stimuli and tasks | 35 | 43 | 36 | 54 |
| more meaningful stimuli and tasks | 7 | 8 | 12 | 8 |
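For readers who want to see how a per cent overlap of this kind could be computed, the following is a minimal sketch under stated assumptions. The source does not give the exact formula, so the choice of denominator (the voxels of the meta-analysis map) is an assumption, as are the function name `percent_overlap` and the tiny made-up voxel sets. Restricting both maps to a grey-matter mask before comparing them mimics the "grey matter only" calculation described for the PST and PVF regions.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def percent_overlap(map_voxels: Set[Voxel], reference_voxels: Set[Voxel]) -> float:
    """Percentage of map_voxels that also appear in reference_voxels.

    Assumption: the denominator is the size of the meta-analysis map,
    which the source does not state explicitly.
    """
    if not map_voxels:
        return 0.0
    shared = map_voxels & reference_voxels
    return 100.0 * len(shared) / len(map_voxels)


# Made-up voxel coordinates, for illustration only (not data from the paper).
meta_analysis = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)}
speech_production = {(2, 0, 0), (3, 0, 0), (4, 0, 0)}
grey_matter_mask = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (4, 0, 0)}

# Whole-brain style overlap: no masking applied.
whole_brain = percent_overlap(meta_analysis, speech_production)

# Region-restricted overlap: both maps are limited to the mask first.
masked = percent_overlap(
    meta_analysis & grey_matter_mask,
    speech_production & grey_matter_mask,
)
print(round(whole_brain, 1), round(masked, 1))
```

Because the masked comparison changes both the numerator and the denominator, whole-brain and region-restricted percentages need not move in the same direction, as the table's columns also show.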
Figure 3. Comparison of lexical affiliates in sentences when preceded by iconic co-speech gestures that visually described those words (blue) and when not preceded by those gestures (red). (a) Example frames from ‘I type the poem’ for the no-gesture (top/red) and gesture video clips (bottom/blue). The latter was constructed to show hand and arm motion from the start of the sentence to the end of the lexical affiliate visually described by the gesture (‘type’). (b) Beginning 148 ms after the onset of the lexical affiliate and progressing in 4 ms steps to 184 ms, brain images show significant no-gesture > iconic gesture activity. The inset at 164 ms magnifies the primary AC and PST regions (figure 1a). Brain images at 180 ms are not shown because there were no significant differences. All p's ≤ 0.05 FDR corrected for multiple comparisons with a cluster size of 20 voxels. (c) Brain images show the mean of activity from the onset of the lexical affiliates to 184 ms illustrating more overall activity for the no-gesture condition. Time series below those images are the averaged bilateral primary AC (TTG) response for that time period and the horizontal lines are the means of those time series.