
Mammalian behavior and physiology converge to confirm sharper cochlear tuning in humans.

Christian J Sumner, Toby T Wells, Christopher Bergevin, Joseph Sollini, Heather A Kreft, Alan R Palmer, Andrew J Oxenham, Christopher A Shera.

Abstract

Frequency analysis of sound by the cochlea is the most fundamental property of the auditory system. Despite its importance, the resolution of this frequency analysis in humans remains controversial. The controversy persists because the methods used to estimate tuning in humans are indirect and have not all been independently validated in other species. Some data suggest that human cochlear tuning is considerably sharper than that of laboratory animals, while others suggest little or no difference between species. We show here in a single species (ferret) that behavioral estimates of tuning bandwidths obtained using perceptual masking methods, and objective estimates obtained using otoacoustic emissions, both also employed in humans, agree closely with direct physiological measurements from single auditory-nerve fibers. Combined with human behavioral data, this outcome indicates that the frequency analysis performed by the human cochlea is of significantly higher resolution than found in common laboratory animals. This finding raises important questions about the evolutionary origins of human cochlear tuning, its role in the emergence of speech communication, and the mechanisms underlying our ability to separate and process natural sounds in complex acoustic environments.


Keywords: auditory nerve; cochlear tuning; frequency selectivity; otoacoustic emissions; psychoacoustics


Year: 2018 PMID: 30322908 PMCID: PMC6217411 DOI: 10.1073/pnas.1810766115

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205

The cochlea within the inner ear acts like an acoustic prism to decompose sound into its constituent frequency components, creating a frequency-to-place map along its length. This decomposition establishes the tonotopic encoding of sound frequency that remains a fundamental organizing principle of the auditory system from the cochlea to the auditory cortex (1–4). The resolution with which the cochlea performs this frequency analysis influences our ability to perceptually separate different sounds and to communicate in complex acoustic environments. The loss of cochlear frequency resolution, through damage or disease, underlies some of the most troublesome problems associated with hearing impairment, including difficulty understanding speech in noise (5). For many years, a consensus existed that cochlear tuning was similar across a wide range of mammalian species, including humans. That conclusion was based on the relatively good correspondence between indirect behavioral estimates of human tuning (6, 7) and direct measures of cochlear tuning taken from the auditory nerve of smaller laboratory animals (8, 9). Very few physiological human data existed, and those that did were not sufficient in number or did not deviate sufficiently from animal data to suggest any fundamental differences between species (10). However, more recent studies have suggested that human cochlear tuning may be sharper, by a factor of two or more, than cochlear tuning in typical laboratory animals, such as cat and guinea pig. The latest estimates from humans combined more refined behavioral measures and new noninvasive objective measures based on otoacoustic emissions (OAEs)—sounds that are emitted by the cochlea and can be recorded in the ear canal (11). Knowledge of any interspecies differences in the frequency resolution of the cochlea is critical to our understanding of a diverse range of issues (12). 
For example, the claimed disparities in estimates between animal and human tuning are sufficiently large to substantially affect the neural coding and representation of speech and other critical natural sounds (13–15). Quantification of species differences is also important for understanding the mechanisms underlying frequency analysis. For instance, it has been claimed that the cortical representation of frequency results from neural sharpening by the central auditory system from a less sharply tuned representation in the cochlea (16). This claim hinges critically on the assumption that human cochlear tuning is similar to that of small mammals. In large part, claims of sharper tuning in the human cochlea remain controversial (17–19) because of a lack of commensurate measures across species. Direct measures of tuning from single-unit recordings in the auditory nerve [auditory nerve fiber (ANF) in Fig. 1] have been obtained in laboratory animals but are too invasive to be performed in humans. Conversely, the more recent psychophysical methods (PSY) used in humans, involving the masking of a probe tone by spectrally notched noise under PSY forward masking (PSY-F; Fig. 1), have not yet been tested in animals. Estimates based on OAE measurements have been obtained in both humans and smaller mammals and are consistent with the claim of sharper tuning in humans (11, 18). However, uncertainty surrounding the mechanisms by which OAEs are generated, and their relationship to cochlear tuning, leaves room for doubt (20, 21). In summary, three types of measure (behavioral, otoacoustic, and neural) have been used to estimate cochlear tuning, but they have never all been measured and compared in the same species. To resolve this problem, we used ferrets to examine all three measures within the same species. We reasoned that if the two indirect measures (OAE and PSY) provide accurate estimates of cochlear tuning, then they should both agree with the direct neural (ANF) measures. 
By employing all three methods in the same species, our experiments provide the strongest test to date of the validity of the indirect measures used to assess cochlear frequency tuning in humans.

Fig. 1.

Three different ways of estimating cochlear tuning used in ferrets. ANFs, threshold levels (gray line) for a response are fit with a filter model (red line), from which the ERB (dashed gray line) is calculated. OAEs, the mean phase gradient of OAEs (red line) is used to estimate filter sharpness, QERB (=f/ERB), using the approximate species invariance of the tuning ratio. PSY, the behavioral detection of a pure tone in the presence of two bands of noise, separated by varying spectral distances. ERB (blue dashed line) is estimated by fitting a filter model to the detection thresholds. ANF data from ref. 27.

Results

We estimated ferret frequency tuning perceptually using a psychophysical notched-noise masking paradigm (Fig. 1, PSY). This paradigm measures the effectiveness of noises with various spectral shapes at masking a narrowband signal, such as a pure tone. By varying the frequency extent of a spectral notch in the masking noise, the shape and bandwidth of the effective auditory filter can be derived. We applied this method in ferrets performing behavioral detection tasks, and from the results derived the equivalent rectangular bandwidths (ERBs) of the filters, along with a corresponding dimensionless measure of tuning sharpness, QERB (center frequency/ERB). For any filter shape, the ERB is the bandwidth of the rectangular filter with the same peak height that passes the same total power. Because of cochlear nonlinearities, the exact stimulus conditions employed can influence the measured bandwidths. These include whether the masking noise is presented simultaneously with the signal (PSY simultaneous, PSY-S) or directly precedes the signal (PSY-F), thereby avoiding physical interactions between the stimuli within the cochlea (22–24). The estimated bandwidths can also depend on whether the intensity of the signal is kept constant and the threshold is found by varying the intensity of the masker, or vice versa. We estimated filter bandwidths in ferrets using all of these variants. Consistent with results in humans (22–24), we observed that PSY-F produces significantly sharper estimates of tuning than simultaneous masking [QERB(PSY-S) = 0.72 × QERB(PSY-F); P = 0.04; see Figs. 2 and 3]. We found no significant effect of whether thresholds are derived by varying the level of the masker or target tone (P = 0.2), contrary to expectations (19, 25, 26). 
The absence of a significant effect may be partly due to our use of low stimulus levels (<40 dB sound pressure level), which are generally below the onset level of the compressive cochlear nonlinearity in ferrets (27), and partly due to the relatively small number of estimates in each condition (n = 5 for the fixed signal and n = 3 for the fixed masker), providing limited statistical power to detect a difference. Therefore, we only distinguish between forward and simultaneous masking in our further comparisons.
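
As a rough illustration of the ERB and QERB definitions given above, the sketch below integrates a filter's power weighting numerically and divides the area by the peak height. It assumes the simplest symmetric rounded-exponential form, W(g) = (1 + pg)exp(-pg) with g = |f - fc|/fc; the text does not state which rounded-exponential variant was fitted, and the function names and the values fc = 4 kHz and p = 20 are hypothetical choices for illustration, not values from the study.

```python
import math

def roex_weight(f_hz, fc_hz, p):
    """Power weighting of a symmetric roex(p) filter (assumed shape, see lead-in)."""
    g = abs(f_hz - fc_hz) / fc_hz          # normalized distance from center frequency
    return (1.0 + p * g) * math.exp(-p * g)

def erb_numeric(fc_hz, p, n_steps=20_000):
    """ERB = (area under the weighting function) / (peak height).

    The area is computed with the trapezoid rule from 0 Hz to 2*fc.
    """
    f_max = 2.0 * fc_hz
    df = f_max / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        w = roex_weight(i * df, fc_hz, p)
        # end points carry half weight in the trapezoid rule
        total += 0.5 * w if i in (0, n_steps) else w
    area = total * df
    peak = roex_weight(fc_hz, fc_hz, p)    # equals 1 for this filter shape
    return area / peak

fc = 4000.0   # hypothetical center frequency in Hz
p = 20.0      # hypothetical slope parameter
erb = erb_numeric(fc, p)
q_erb = fc / erb                           # QERB = center frequency / ERB
print(f"ERB = {erb:.1f} Hz, QERB = {q_erb:.2f}")
```

For this particular filter shape the integral has the closed form ERB = 4fc/p, so QERB = p/4, which provides a quick check on the numerical result.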

Fig. 2.

Three measures of frequency selectivity agree. (A) Filter sharpness from PSY-F agrees closely with ANF and OAE measurements. Tuning in individual nerve fibers (gray points), psychophysical forward masking (blue points), and a loess trend and its bootstrapped 95% CI for the otoacoustic emissions measurements. Dashed lines indicate bootstrapped 95% CIs for the data. (B) Forward masking (PSY-F; blue points, n = 8) yields a better match to auditory nerve tuning than simultaneous masking (PSY-S; magenta points, n = 22). In B, auditory nerve data are shown as the area within the loess trend 95% CI. ANF data from ref. 27.

Fig. 3.

Comparing different measures of frequency resolution in the ferret, independently of the effect of signal frequency. (A) The different tuning measurements as a fraction of the mean ANF tuning at a given frequency. Dashed red lines show excluded OAE outliers. (B) Statistical comparison of the different measures of tuning. Horizontal bars show the mean of each measure as a fraction of auditory nerve tuning and also as effect size (relative to ANF tuning). Asterisks next to data points indicate significant differences compared with auditory nerve tuning. *P < 0.05; **P < 0.01; ***P < 0.001.

Next, we recorded stimulus-frequency otoacoustic emissions (SFOAEs) from the ears of sedated ferrets and inferred cochlear bandwidths using the emission group delay (Fig. 1; OAE). The OAE-based method estimates the sharpness (QERB) of the cochlear filters using the assumption of approximate species invariance of the “tuning ratio.” The tuning ratio is the empirical relationship between emission-delay and auditory nerve-fiber tuning trends obtained from independent measurements in other species. To estimate the ferret QERB trend from the SFOAE delays, we followed Joris et al. (28) and used a tuning ratio obtained by averaging those previously derived for cats, guinea pigs, and chinchillas, species whose tuning ratios are all similar (18). Fig. 2 shows the trend of auditory filter sharpness inferred from the emission delays.

Finally, we compared the estimates from the two indirect measures with our previously published responses of single auditory-nerve fibers in anesthetized ferrets to short (50 ms) tone pips varying in frequency and sound level (27). The spike counts in response to these tones allowed us to map out the receptive field of each fiber (Fig. 1; ANF) (i.e., the range of stimulus conditions over which the nerve fibers responded). From the lowest (threshold; Fig. 1, ANF; gray line) sound level that produced a response at each frequency, we modeled the shape of the auditory filter in each nerve fiber by fitting a rounded-exponential function (Fig. 1; ANF; brown line, ref. 29), and derived its QERB, in the same manner as was done with the behavioral estimates.

Fig. 2 shows that all three measures of QERB (those derived from auditory-nerve responses (ANF), OAEs, and PSY-F) are in good agreement. The agreement includes both the overall sharpness of tuning as well as its approximate power-law dependence on frequency. The agreement is especially remarkable given the very different natures of the three measures employed. To compare the measurements quantitatively, we fitted the data (log-transformed frequency and QERB) with a linear model. With respect to overall tuning sharpness, the agreement among the different measures is most apparent when the data are expressed relative to the mean auditory-nerve tuning at the same frequency (i.e., residuals of the linear model; Fig. 3). Although the mean OAE-based estimates of QERB are similar to those obtained directly from auditory-nerve tuning curves, their ratio is less than unity [QERB(OAE) = 0.82 × QERB(ANF); Fig. 3], and this difference is statistically significant (sandwich test, P < 0.001), in part due to the very large sample size of the OAE data (n ∼ 1,500). 
The difference in means implies that the tuning ratio in ferrets derived from these data is somewhat larger than the average of those previously obtained for cat, guinea pig, and chinchilla. For comparison, the variation among the tuning ratios for these three species is shown in figure 9B of ref. 16; the approximate “invariance” of the tuning ratio typically holds to within 5–15%, with the largest variations occurring in the apical regions of the cochlea. Consistent with findings in humans, psychophysical estimates of tuning using simultaneous masking (PSY-S) are significantly broader [QERB(PSY-S) = 0.72 × QERB(PSY-F); Fig. 3] than the tuning estimates derived from both auditory-nerve fiber responses and OAEs (sandwich test, P < 0.01; Cohen’s d ∼ 1). To adapt the behavioral experiments to animal use, we necessarily modified some procedures used in previous human experiments. To explore the possible effects of these modifications, we tested a new set of human listeners using methods (stimuli and task) directly comparable to those used in our ferret experiments, with forward masking and a fixed target level. The estimated QERB at 4 kHz obtained using these ferret-based procedures with humans is similar to that found in earlier human studies (22) and is more than a factor of 2 sharper than the behavioral estimates from ferrets (P < 0.001).
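
The comparison described above (a linear model on log-transformed frequency and QERB, with other measures expressed relative to the fitted trend) can be sketched as follows. This is a minimal illustration, not the authors' analysis: the data are synthetic and constructed for the example, the geometric mean is an assumed way of summarizing the ratio, and the sandwich test used in the text is not reproduced. All function and variable names are hypothetical.

```python
import math

def fit_power_law(freqs_hz, q_values):
    """Least-squares line through (log10 f, log10 Q); returns (intercept, slope)."""
    xs = [math.log10(f) for f in freqs_hz]
    ys = [math.log10(q) for q in q_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predicted_q(f_hz, intercept, slope):
    """Q predicted by the fitted power-law trend at frequency f_hz."""
    return 10.0 ** (intercept + slope * math.log10(f_hz))

def mean_ratio_to_trend(freqs_hz, q_values, intercept, slope):
    """Geometric-mean ratio of measured Q to the reference trend (assumed summary)."""
    logs = [math.log10(q / predicted_q(f, intercept, slope))
            for f, q in zip(freqs_hz, q_values)]
    return 10.0 ** (sum(logs) / len(logs))

# Synthetic data for illustration only; none of these values come from the study.
freqs = [500.0, 1000.0, 2000.0, 4000.0, 8000.0, 16000.0]
reference_q = [3.0 * (f / 1000.0) ** 0.3 for f in freqs]  # exact power law
other_q = [0.8 * q for q in reference_q]                   # uniformly 20% broader

intercept, slope = fit_power_law(freqs, reference_q)
ratio = mean_ratio_to_trend(freqs, other_q, intercept, slope)
print(f"slope = {slope:.3f}, ratio to reference trend = {ratio:.3f}")
```

Working in log-log coordinates turns a power-law dependence of QERB on frequency into a straight line, so a constant multiplicative offset between two measures appears as a constant vertical shift in the residuals.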

Discussion

Disparate methods for measuring cochlear tuning were employed in a single animal model. Both psychophysical and otoacoustic methods provided reliable and quantitatively accurate estimates of cochlear frequency selectivity. These direct and indirect measures combined with new human behavioral data, collected using the same methods, provide strong support for the claim that frequency resolution is sharper in humans than in common laboratory mammals. We attribute the close correspondence in tuning measures in large part to the refined methods employed in this study and their application within a single species. However, some modest discrepancies remain that are important to address. Tuning estimates obtained here using simultaneous masking are broader than those from ANF and forward-masked methods, consistent with studies in humans (22) and macaque (28, 30). However, other published data suggest either a closer correspondence of simultaneous masking and auditory nerve tuning (31) or even little difference compared with humans (32). Our data also fail to reveal the expected difference in frequency selectivity depending on whether the signal or masker was varied to determine thresholds (19, 25, 26). These inconsistencies may point to species differences other than tuning bandwidth, such as differences in the nature and extent of cochlear nonlinearities or cognition (33). However, the sizes of any differences are not large in comparison with the variability of the data (for example, of individual nerve fibers or of individual animals). A comprehensive assessment in nonhuman mammals of the effects of iso-level (fixed-masker) vs. iso-response (fixed-signal) measurements, forward vs. simultaneous masking, and overall sound level, with larger numbers of measurements, is required to resolve these issues. 
The agreement of the three tuning measures provides compelling evidence that the limits of perceptual frequency resolution (as measured in our paradigm) are determined primarily in the cochlea, in contrast to previous suggestions (16). This conclusion therefore warrants a fresh evaluation of spectral decomposition in the central auditory system. In some cases, this agreement could obviate the need to postulate additional neural sharpening mechanisms, located between the cochlea and the cortex, to explain previously presumed discrepancies between sharp cortical tuning found in humans and the broad cochlear tuning found in laboratory animals (16) or from earlier estimates in humans using simultaneous masking (6). The tuning bandwidths estimated in human cortical neurons (∼1/12 octave) are in fact remarkably similar to the estimates of human cochlear tuning that we have validated here (∼1/13 octave, ref. 11), indicating that further central processing may not be necessary to account for narrow cortical tuning. Our results also provide data to inform a classical debate in auditory neuroscience on whether the auditory system extracts spectral information from sounds in the form of a rate-place code or a code based on spike timing information, or a combination of the two (34). Proposals involving timing codes have been partly motivated by the poor rate-place coding found in animal studies (13, 14). Indeed, ferret cochlear bandwidths are barely sufficient to resolve adjacent formants [e.g., in the 2- to 3-kHz region the second and third formants can be around 1/3 octave apart (35), close to the bandwidth of ferret auditory filters in this region]. According to the narrower human bandwidths validated here, however, rate-place coding schemes would have considerably more success at representing the formant peaks of human speech in the human auditory system than in other species. 
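
To put the fractional-octave bandwidths quoted above on the same footing as the Q values used elsewhere in the text, a bandwidth in octaves can be converted to a quality factor. The sketch below assumes band edges placed symmetrically on a logarithmic axis around the center frequency and treats the bandwidth generically rather than as an ERB specifically; the function name is hypothetical.

```python
def q_from_octaves(bandwidth_octaves):
    """Quality factor (center frequency / bandwidth) for a band of given octave width.

    Assumes edges at fc * 2**(-b/2) and fc * 2**(+b/2), so that
    Q = 1 / (2**(b/2) - 2**(-b/2)).
    """
    half = bandwidth_octaves / 2.0
    return 1.0 / (2.0 ** half - 2.0 ** (-half))

for label, width in [("1/3 octave", 1.0 / 3.0),
                     ("1/12 octave", 1.0 / 12.0),
                     ("1/13 octave", 1.0 / 13.0)]:
    print(f"{label}: Q = {q_from_octaves(width):.2f}")
```

Under this assumption, 1/12 and 1/13 octave correspond to Q values of roughly 17.3 and 18.8, whereas 1/3 octave corresponds to a Q of about 4.3.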
Although we have confirmed sharp human cochlear tuning using low-intensity sounds similar to those used to measure auditory-nerve tuning curves in other species, tuning is known to change with sound intensity, becoming broader at high intensities. Behavioral measures in humans have also revealed broader tuning at high sound intensities (36), in line with expectations. In addition, the saturation of firing rate in the auditory nerve at higher intensities also leads to effectively broader tuning and poorer resolution in the majority of ANFs at sound levels where human speech recognition remains robust (13). It is possible that tuning under more complex acoustic conditions is sharpened by central auditory processing, beyond what can be explained by firing rate in the auditory nerve, especially at high levels. Such sharpening might occur through mechanisms involving stimulus-driven spike timing, or phase locking, and lateral inhibition based on the rapid phase transitions produced by the basilar-membrane traveling wave (37). The extent to which putative sharpening mechanisms are required to explain behavioral performance at high sound intensities remains to be explored in light of our understanding of human cochlear tuning at low intensities. It is tempting to relate sharp human cochlear tuning to our ability to perceive the subtleties of speech (particularly those involving prosody and pitch) in complex backgrounds, and thus our ability to solve the “cocktail party problem” (38). However, there is evidence for intermediate cochlear tuning in nonhuman primates (28), and one study reported cortical tuning in a nonhuman primate that approached that observed in humans (39). In addition, studies of otoacoustic emissions in another large mammal, the tiger, have also suggested that tuning may approach that found in humans (40). 
These findings imply that the physical size of the cochlea and its associated tonotopic map play a more important role than any human-specific evolution of cochlear tuning (41). Although sharp cochlear tuning may not be a sufficient condition for the emergence of speech as an effective communication mode (42), it may nevertheless have played an important and perhaps necessary role in its development. Given the complexity of this and the other issues discussed, the development of cochlear models that produce realistic sharp tuning and the nonlinear characteristics that impart dependence on stimulus paradigms will provide an important step toward evaluating such claims and consolidating our understanding of frequency selectivity, the cochlea, and their relation to perception.

Experimental Methods

Full details of experimental methods are given in . Briefly, we trained ferrets to detect (43) or lateralize (44) brief tones or narrowband noise, in the presence of masking noise, in a positive reinforcement procedure. Using these behavioral methods in ferrets, we measured perceptual thresholds using different variants of notched-noise maskers (6, 22). We also made measurements using similar stimulus paradigms in humans. We also recorded, in lightly anesthetized ferrets, the otoacoustic emissions elicited by pure-tone stimuli, using the SFOAE method (45). Estimates of frequency selectivity derived from these data were compared with previous recordings from the auditory nerve of anesthetized ferrets (27). In the human studies, all participants provided written informed consent before participating, and all procedures were approved by the Institutional Review Board of the University of Minnesota. All procedures with ferrets were carried out under license from the UK Home Office, in accordance with the Animals (Scientific Procedures) Act 1986.

39 in total

1. Estimates of human cochlear tuning at low levels using forward and simultaneous masking.

Authors: Andrew J Oxenham; Christopher A Shera
Journal: J Assoc Res Otolaryngol Date: 2003-07-10

2. Obtaining reliable phase-gradient delays from otoacoustic emission data.

Authors: Christopher A Shera; Christopher Bergevin
Journal: J Acoust Soc Am Date: 2012-08 Impact factor: 1.840

3. A discontinuous tonotopic organization in the inferior colliculus of the rat.

Authors: Manuel S Malmierca; Marco A Izquierdo; Salvatore Cristaudo; Olga Hernández; David Pérez-González; Ellen Covey; Douglas L Oliver
Journal: J Neurosci Date: 2008-04-30 Impact factor: 6.167

4. The spiral staircase: tonotopic microstructure and cochlear tuning.

Authors: Christopher A Shera
Journal: J Neurosci Date: 2015-03-18 Impact factor: 6.167

Review 5. Cochlear Frequency Tuning and Otoacoustic Emissions.

Authors: Christopher A Shera; Karolina K Charaziak
Journal: Cold Spring Harb Perspect Med Date: 2019-02-01 Impact factor: 6.915

6. Encoding of steady-state vowels in the auditory nerve: representation in terms of discharge rate.

Authors: M B Sachs; E D Young
Journal: J Acoust Soc Am Date: 1979-08 Impact factor: 1.840

7. Speech processing in the auditory system. I: The representation of speech sounds in the responses of the auditory nerve.

Authors: S A Shamma
Journal: J Acoust Soc Am Date: 1985-11 Impact factor: 1.840

Review 8. How We Hear: The Perception and Neural Coding of Sound.

Authors: Andrew J Oxenham
Journal: Annu Rev Psychol Date: 2017-10-16 Impact factor: 24.137

9. AP tuning curves from normal and pathological human and guinea pig cochleas.

Authors: R V Harrison; J M Aran; J P Erre
Journal: J Acoust Soc Am Date: 1981-05 Impact factor: 1.840

10. Relating approach-to-target and detection tasks in animal psychoacoustics.

Authors: Joseph Sollini; Ana Alves-Pinto; Christian J Sumner
Journal: Behav Neurosci Date: 2016-05-19 Impact factor: 1.912
