
Hierarchical processing for speech in human auditory cortex and beyond.

Jonathan E Peelle, Ingrid S Johnsrude, Matthew H Davis.

Year:  2010        PMID: 20661456      PMCID: PMC2907234          DOI: 10.3389/fnhum.2010.00051

Source DB:  PubMed          Journal:  Front Hum Neurosci        ISSN: 1662-5161            Impact factor:   3.169


The anatomical connectivity of the primate auditory system suggests that sound perception involves several hierarchical stages of analysis (Kaas et al., 1999), raising the question of how the processes required for human speech comprehension might map onto such a system. One intriguing possibility is that earlier areas of auditory cortex respond to acoustic differences in speech stimuli, but that later areas are insensitive to such features. Providing a consistent neural response to speech content despite variation in the acoustic signal is a critical feature of “higher level” speech processing regions because it indicates they respond to categorical speech information, such as phonemes and words, rather than idiosyncratic acoustic tokens.
In a recent fMRI study, Okada et al. (2010) used multi-voxel pattern analysis (MVPA) to investigate neural responses to spoken sentences in canonical auditory cortex (i.e., superior temporal cortex), using a design modeled after Scott et al. (2000). Okada et al. (2010) used a factorial design that crossed speech clarity (clear speech vs. intelligible noise vocoded speech) with frequency order (normal vs. spectrally rotated). Noise vocoding reduces the amount of spectral detail in the speech signal but faithfully preserves temporal information. Depending on the reduction in spectral resolution (i.e., the number of bands used in vocoding), noise vocoded speech can be highly intelligible, especially following training. By contrast, spectral rotation of the speech signal renders it almost entirely unintelligible without any change in overall level of spectral detail. Thus, the clear and vocoded sentences used by Okada et al. (2010) provided two physically dissimilar presentations of intelligible speech that the authors could use to identify acoustically insensitive neural responses; spectrally rotated stimuli allowed the authors to look for response changes due to intelligibility, independent of reductions in spectral detail.
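The two stimulus manipulations described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the stimulus-generation procedure of Okada et al. (2010) or Scott et al. (2000): the function names, the log-spaced band edges, the 30 Hz envelope cutoff, the 100–5000 Hz range, the 4 kHz rotation limit, and the FFT-based filtering are all assumptions chosen for brevity.

```python
import numpy as np

def band_filter(x, fs, lo, hi):
    """Keep only frequency content in [lo, hi) Hz by zeroing other FFT bins."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def envelope(x, fs, cutoff=30.0):
    """Slow amplitude envelope: rectify, then low-pass below `cutoff` Hz (assumed value)."""
    return np.maximum(band_filter(np.abs(x), fs, 0.0, cutoff), 0.0)

def noise_vocode(x, fs, n_bands=6, f_lo=100.0, f_hi=5000.0, seed=0):
    """Replace the fine structure in each band with noise, keeping the band envelope.

    Fewer bands means less spectral detail; the temporal envelope is preserved.
    """
    rng = np.random.default_rng(seed)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)  # log-spaced edges (assumption)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(band_filter(x, fs, lo, hi), fs)
        carrier = band_filter(rng.standard_normal(len(x)), fs, lo, hi)
        # Re-filter so the modulated noise stays inside its own band.
        out += band_filter(env * carrier, fs, lo, hi)
    return out

def spectrally_rotate(x, fs, f_max=4000.0):
    """Flip the spectrum between 0 and f_max Hz (requires f_max <= fs / 2).

    Low frequencies become high and vice versa; content above f_max is dropped.
    """
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    keep = freqs <= f_max
    rotated = np.zeros_like(spec)
    rotated[keep] = spec[keep][::-1]
    return np.fft.irfft(rotated, n=len(x))
```

Applying `noise_vocode` and `spectrally_rotate` singly and in combination to the same recording would give the four cells of a clarity-by-frequency-order design of the kind described above.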
In a standard whole-brain univariate analysis, Okada et al. (2010) found intelligibility-related responses (i.e., intelligible activity > unintelligible activity) in large portions of the superior temporal lobes bilaterally, as well as smaller activations in left inferior frontal gyrus, posterior fusiform gyrus, and premotor cortex. The authors then chose maxima for each participant within anatomically defined regions (bilateral posterior, middle, and anterior superior temporal sulcus [STS], as well as Heschl's gyrus) and performed MVPA to assess the ability of these regions to discriminate among the four acoustic conditions. They found that Heschl's gyrus could reliably distinguish all conditions, despite showing a similar average hemodynamic response in the traditional mass univariate analysis. Regions of the STS showed varying degrees of sensitivity to acoustic information. In the left hemisphere, posterior STS was the most acoustically insensitive, followed by anterior STS and Heschl's gyrus (no reliable middle STS activation was identified). In the right hemisphere the greatest acoustic insensitivity was observed in middle STS, close to Heschl's gyrus, followed by anterior and posterior STS respectively. The authors interpret these findings as generally consistent with a hierarchical structure for speech processing in the temporal lobe, with regions of STS in both hemispheres playing a critical role in abstract phonological processes as indicated by their high acoustic insensitivity.
With these results, Okada et al. (2010) partially replicate previous univariate fMRI results reported by Davis and Johnsrude (2003). Davis and Johnsrude measured neural activity in response to multiple speech conditions that were equally intelligible but differed acoustically. To achieve this, three different forms of speech degradation were employed: noise vocoded speech, speech segmented by noise bursts, and speech in continuous background noise. Within each type of speech degradation there were three matched levels of intelligibility (confirmed by pre-tests and behavioral ratings collected in the scanner). The authors first identified regions sensitive to speech intelligibility by correlating neural activity with behavioral performance, and then examined the degree to which each of these regions was also sensitive to the acoustic form of the stimuli. Activity in regions close to primary auditory cortex depended on the type of degradation, but other intelligibility-responsive regions were insensitive to this acoustic information. These acoustically insensitive areas included regions located anterior to peri-auditory areas bilaterally, posterior to left peri-auditory cortex, and left inferior frontal gyrus. This arrangement is broadly consistent with the anatomical organization of primate auditory cortex (Kaas et al., 1999) and suggests high levels of acoustic insensitivity in both anterior and posterior regions of left superior temporal cortex, consistent with the univariate analysis reported by Okada et al. (2010) but in contrast to their multivariate results. These conflicting findings regarding acoustic sensitivity in anterior temporal regions could be a result of either (a) the experimental design and specific stimuli used or (b) differing sensitivity of multivariate and univariate analysis methods, a question that requires further investigation.
Moving beyond the temporal lobe, the results of Davis and Johnsrude (2003) highlight the role of left inferior frontal cortex in speech comprehension. Activity in left inferior frontal cortex is common, although not universal, in neuroimaging studies of connected speech (e.g., Humphries et al., 2001; Davis and Johnsrude, 2003; Crinion and Price, 2005; Rodd et al., 2005, 2010; Obleser et al., 2007; Peelle et al., 2010b).
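Returning to the possibility raised above that multivariate and univariate methods differ in sensitivity, a toy simulation may help show how a region can discriminate conditions by pattern while its average response does not differ. Everything here is hypothetical: the data are synthetic, the function names are invented, and a correlation-based nearest-centroid classifier with leave-one-run-out cross-validation stands in for whatever classifier a given study uses. It is not the analysis pipeline of Okada et al. (2010).

```python
import numpy as np

def simulate_roi(n_runs=8, n_voxels=50, noise=1.0, seed=0):
    """Synthetic ROI data: two conditions with equal mean but opposite voxel patterns."""
    rng = np.random.default_rng(seed)
    pattern = rng.standard_normal(n_voxels)
    pattern -= pattern.mean()  # zero-mean pattern: no change in the ROI average
    templates = {0: 1.0 + pattern, 1: 1.0 - pattern}
    X, y, runs = [], [], []
    for run in range(n_runs):
        for label, template in templates.items():
            X.append(template + noise * rng.standard_normal(n_voxels))
            y.append(label)
            runs.append(run)
    return np.array(X), np.array(y), np.array(runs)

def leave_one_run_out_accuracy(X, y, runs):
    """Classify each held-out trial by its correlation with training-set centroids."""
    correct = 0
    for run in np.unique(runs):
        train, test = runs != run, runs == run
        centroids = {c: X[train & (y == c)].mean(axis=0) for c in np.unique(y)}
        for xi, yi in zip(X[test], y[test]):
            pred = max(centroids, key=lambda c: np.corrcoef(xi, centroids[c])[0, 1])
            correct += int(pred == yi)
    return correct / len(y)

X, y, runs = simulate_roi()
# Univariate view: difference in ROI-average response between conditions.
mean_diff = X[y == 0].mean() - X[y == 1].mean()
# Multivariate view: cross-validated pattern classification accuracy.
accuracy = leave_one_run_out_accuracy(X, y, runs)
```

In this simulation `mean_diff` stays near zero while `accuracy` is well above chance, which is the kind of dissociation reported for Heschl's gyrus; the converse case (a univariate effect with poor pattern discrimination) would need a different simulation.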
Regions of prefrontal cortex have extensive anatomical connections to auditory belt and parabelt regions (Hackett et al., 1999; Romanski et al., 1999) and are thus well positioned to modulate the operation of lower-level auditory areas. Davis and Johnsrude (2003) provided evidence linking this fronto-temporal modulation with the recovery of meaning from an impoverished acoustic signal by showing that inferior frontal responses were elevated for distorted-yet-intelligible speech compared to both clear speech and unintelligible noise. This result suggests that activation of inferior frontal regions is a neural correlate of the more effortful listening that is required for the comprehension of degraded speech.
Indeed, the relationship between intelligibility and listening effort also deserves consideration in interpreting temporal lobe responses to degraded speech. Okada et al. (2010) treated clear and vocoded speech as having similar intelligibility. This may be true in the sense that word report is equivalent (at ceiling); however, clear and vocoded conditions differ substantially in whether this intelligibility is achieved effortlessly (as for clear speech), or with considerable effort (for vocoded speech). In the Okada et al. (2010) study this difference in listening effort cannot be distinguished from sensitivity to acoustic differences between stimuli. One way to control for these effects would be to match (below-ceiling) intelligibility across different types of acoustic degradation. Using this approach, Davis and Johnsrude (2003) observed that both inferior frontal and peri-auditory regions of the STS showed elevated signal for intelligible but degraded speech compared to both clear speech and noise. Like intelligibility-sensitive regions, the areas responding to listening effort demonstrated a hierarchical organization (i.e., differential degrees of acoustic sensitivity), and hence it might be that these two effects are confounded in temporal lobe responses observed by Okada et al. (2010).
On a related topic, we also note that Okada et al. (2010) used continuous fMRI, meaning that the auditory stimuli were presented in the midst of considerable background noise. Although one might assume that any such confounds would apply equally to all conditions tested, in fact, vocoded and clear sentences are not equally intelligible in the presence of background noise, even if word report scores are equivalent (at ceiling) when tested in quiet (Faulkner et al., 2001). Furthermore, even if participants are able to hear the sentences, scanner noise introduces significant additional task components related to segregating the auditory stream of interest and downstream effects of listening effort (McCoy et al., 2005; Wingfield et al., 2006). To date, effects of scanner noise have been associated with changes in neural activity using univariate approaches (Seifritz et al., 2006; Gaab et al., 2007; Peelle et al., 2010a); the effect on multivariate results is unknown. Generally, however, the results of auditory fMRI studies employing a standard continuous scanning sequence must be viewed with caution given the additional perceptual processes required.
Despite the caveats discussed above, how might the results of Okada et al. (2010) inform our understanding of speech processing? The authors suggest that their findings are consistent with a hierarchy for intelligible speech processing that starts with Heschl's gyrus, followed by anterior and posterior-going streams that progressively increase in acoustic invariance. In particular, their finding of acoustic sensitivity in anterior temporal regions stands in direct opposition to the original Scott et al. (2000) study and several follow-ups (Narain et al., 2003; Scott et al., 2006), as well as Davis and Johnsrude (2003), which argue that anterior temporal responses are largely acoustically invariant. Because of their focus on canonical regions of auditory cortex, Okada et al. (2010) do not discuss regions outside the superior temporal lobe as part of this hierarchy (Figure 1A).
Figure 1

Hierarchical models of processing intelligible speech. (A) Hierarchical processing in the temporal lobe, showing a posterior-anterior gradient in acoustic insensitivity moving away from primary auditory cortex. Posterior and anterior regions of STS discussed by Okada et al. (2010) are outlined in white. (B) An expanded model of hierarchical processing for speech that includes prefrontal, premotor/motor, and posterior inferotemporal regions.

Expanding on this description, we argue that a hierarchical model of speech comprehension necessarily includes regions of motor, premotor, and prefrontal cortex (Figure 1B) as part of multiple parallel processing pathways that radiate outward from primary auditory areas (Davis and Johnsrude, 2007). Electrophysiological studies in non-human primates demonstrate auditory responses in frontal cortex, suggesting not only strong frontal involvement, but that these regions may indeed be viewed as part of the auditory system (Kaas et al., 1999; Romanski and Goldman-Rakic, 2002). In addition to prefrontal cortex, left posterior inferotemporal cortex is also critically involved in the speech intelligibility network, especially in accessing or integrating semantic representations (Crinion et al., 2003; Rodd et al., 2005). Although anatomical studies in primates have long emphasized the extensive and highly parallel anatomical coupling between auditory and frontal cortices (Seltzer and Pandya, 1989; Hackett et al., 1999; Romanski et al., 1999; Petrides and Pandya, 2009), frontal regions have only recently become a prominent feature of models of speech processing (Hickok and Poeppel, 2007; Rauschecker and Scott, 2009). Unfortunately, discussions of auditory sentence processing often focus almost exclusively on the importance of superior temporal responses, even when frontal or inferotemporal activity is present (Humphries et al., 2001; Obleser et al., 2008; Okada et al., 2010), resulting in an incomplete picture of the neural mechanisms involved in speech comprehension.
In summary, there is now consensus that hierarchical processing is a key organizational aspect of the human cortical auditory system. The results of Okada et al. (2010) uniquely bring into question the degree to which anterior temporal cortex is acoustically insensitive, suggesting a more posterior locus for abstract phonological processing. Challenges for future studies include placing hierarchical organization in the temporal lobe within the broader context of larger networks for auditory and language processing, and clarifying the functional contribution of different parallel auditory processing pathways to comprehension of spoken language under varying degrees of effort.
  28 in total

1.  Role of anterior temporal cortex in auditory sentence comprehension: an fMRI study.

Authors:  C Humphries; K Willard; B Buchsbaum; G Hickok
Journal:  Neuroreport       Date:  2001-06-13       Impact factor: 1.837

2.  Effects of adult aging and hearing loss on comprehension of rapid speech varying in syntactic complexity.

Authors:  Arthur Wingfield; Sandra L McCoy; Jonathan E Peelle; Patricia A Tun; L Clarke Cox
Journal:  J Am Acad Audiol       Date:  2006 Jul-Aug       Impact factor: 1.664

3.  Assessing the influence of scanner background noise on auditory processing. II. An fMRI study comparing auditory processing in the absence and presence of recorded scanner noise using a sparse design.

Authors:  Nadine Gaab; John D E Gabrieli; Gary H Glover
Journal:  Hum Brain Mapp       Date:  2007-08       Impact factor: 5.038

4.  Bilateral speech comprehension reflects differential sensitivity to spectral and temporal features.

Authors:  Jonas Obleser; Frank Eisner; Sonja A Kotz
Journal:  J Neurosci       Date:  2008-08-06       Impact factor: 6.167

5.  Hierarchical organization of human auditory cortex: evidence from acoustic invariance in the response to intelligible speech.

Authors:  Kayoko Okada; Feng Rong; Jon Venezia; William Matchin; I-Hui Hsieh; Kourosh Saberi; John T Serences; Gregory Hickok
Journal:  Cereb Cortex       Date:  2010-01-25       Impact factor: 5.357

6.  Neural processing during older adults' comprehension of spoken sentences: age differences in resource allocation and connectivity.

Authors:  Jonathan E Peelle; Vanessa Troiani; Arthur Wingfield; Murray Grossman
Journal:  Cereb Cortex       Date:  2009-08-07       Impact factor: 5.357

7.  Frontal lobe connections of the superior temporal sulcus in the rhesus monkey.

Authors:  B Seltzer; D N Pandya
Journal:  J Comp Neurol       Date:  1989-03-01       Impact factor: 3.215

8.  Temporal lobe regions engaged during normal speech comprehension.

Authors:  Jennifer T Crinion; Matthew A Lambon-Ralph; Elizabeth A Warburton; David Howard; Richard J S Wise
Journal:  Brain       Date:  2003-05       Impact factor: 13.501

9.  Functional integration across brain regions improves speech perception under adverse listening conditions.

Authors:  Jonas Obleser; Richard J S Wise; M Alex Dresner; Sophie K Scott
Journal:  J Neurosci       Date:  2007-02-28       Impact factor: 6.167

10.  Neural correlates of intelligibility in speech investigated with noise vocoded speech--a positron emission tomography study.

Authors:  Sophie K Scott; Stuart Rosen; Harriet Lang; Richard J S Wise
Journal:  J Acoust Soc Am       Date:  2006-08       Impact factor: 1.840
