Stephen M Town, Jennifer K Bizley.
Abstract
Timbre is the attribute that distinguishes sounds of equal pitch, loudness and duration. It contributes to our perception and discrimination of different vowels and consonants in speech, instruments in music and environmental sounds. Here we begin by reviewing human timbre perception and the spectral and temporal acoustic features that give rise to timbre in speech, musical and environmental sounds. We also consider the perception of timbre by animals, both in the case of human vowels and non-human vocalizations. We then explore the neural representation of timbre, first within the peripheral auditory system and later at the level of the auditory cortex. We examine the neural networks that are implicated in timbre perception and the computations that may be performed in auditory cortex to enable listeners to extract information about timbre. We consider whether single neurons in auditory cortex are capable of representing spectral timbre independently of changes in other perceptual attributes and the mechanisms that may shape neural sensitivity to timbre. Finally, we conclude by outlining some of the questions that remain about the role of neural mechanisms in behavior and consider some potentially fruitful avenues for future research.
Keywords: auditory cortex; ferret; neural coding; speech; vowels
Year: 2013 PMID: 24312021 PMCID: PMC3826062 DOI: 10.3389/fnsys.2013.00088
Source DB: PubMed Journal: Front Syst Neurosci ISSN: 1662-5137
Figure 1. Amplitude waveforms (top) and spectrograms (bottom) for (A) a female voice speaking “a” (as in “hard”), (B) an artificial “a” and (C) a male ferret making a series of “dook” calls. Such calls have a harmonic structure. (D–F) Amplitude waveforms and spectrograms for (D) piano, (E) accordion and (F) oboe playing the same note. Although all three have the same fundamental frequency (and therefore pitch), the relative distribution of energy across the harmonics differs, giving each a characteristic timbre. The shape of the temporal envelope is also important: each instrument has a different onset dynamic, and the characteristic vibrato of the accordion is clearly evident in the amplitude waveform.
Figure 2. (A) The position of the F1 and F2 peaks in the spectral envelope of a synthetic vowel /u/. (B) Formant space indicating the position of vowels used to measure the relative contributions of F1 and F2 in vowel identification. Filled circles indicate vowels with which subjects were trained in a two-alternative forced choice (2AFC) task. Open circles indicate mismatch vowels presented as probe trials. (C) Responses of one ferret to training (filled bars) and probe vowels (unfilled bars). (D) Responses of one human to training (filled bars) and probe vowels (unfilled bars).
Figure 3. Estimation of neural responses to vowels based on SRF. The power spectrum of a vowel is multiplied by the SRF of a neuron to produce an estimated response spectrum. The area under this spectrum is taken as the response energy, a measure of the neuron’s response magnitude. (A) When the vowel spectrum (black) and SRF (red) overlap (i), the neuron’s response energy is predicted to be large (ii). In contrast, if the vowel spectrum and SRF are separated (iii), the neuron’s response is predicted to be small (iv). (B) Left: SRF (red) recorded from a multi-unit cluster within auditory cortex of an anesthetized ferret and the spectrum of the vowel /u/. Right: Estimated response energy of the unit to /u/. (C) Comparison of the estimated (grey) and observed (black) responses of the unit in (B) to a series of vowels. Firing rate and response energy are normalized for comparison. Note that the pattern of vowel discrimination by firing rate differs from the pattern estimated from response energy. Observed responses were measured as the mean firing rate across 20 presentations of each vowel.
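The estimate described in Figure 3 (multiply the vowel power spectrum by the SRF, then take the area under the product) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis code: the Gaussian-shaped spectra, the frequency axis, the trapezoidal integration and the names `gaussian` and `response_energy` are all hypothetical choices made for the example.

```python
import math

def gaussian(freqs, center, width):
    """Gaussian bump over frequency (Hz); a toy stand-in for a spectrum or SRF."""
    return [math.exp(-0.5 * ((f - center) / width) ** 2) for f in freqs]

def response_energy(freqs, vowel_power, srf):
    """Multiply the vowel power spectrum by the SRF and integrate the product.

    The area under the estimated response spectrum is computed with the
    trapezoidal rule (an assumption; the caption does not specify a method).
    """
    product = [p * s for p, s in zip(vowel_power, srf)]
    area = 0.0
    for i in range(1, len(freqs)):
        area += 0.5 * (product[i] + product[i - 1]) * (freqs[i] - freqs[i - 1])
    return area

# Hypothetical frequency axis: 0 to 4000 Hz in 10 Hz steps.
freqs = [10.0 * i for i in range(401)]

# Toy vowel spectrum with two formant-like peaks (illustrative values only).
peak1 = gaussian(freqs, 500.0, 100.0)
peak2 = gaussian(freqs, 1500.0, 150.0)
vowel = [a + 0.5 * b for a, b in zip(peak1, peak2)]

# Two toy SRFs: one overlapping the first peak, one far from both peaks.
srf_overlap = gaussian(freqs, 500.0, 200.0)
srf_separate = gaussian(freqs, 3500.0, 200.0)

energy_overlap = response_energy(freqs, vowel, srf_overlap)
energy_separate = response_energy(freqs, vowel, srf_separate)
```

With these toy inputs, the overlapping SRF yields a much larger response energy than the separated one, mirroring the contrast between panels A(i–ii) and A(iii–iv). Comparing such estimates with observed firing rates, as in panel (C), would additionally require normalizing both quantities.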