| Literature DB >> 12558287 |
Powen Ru1, Taishih Chi, Shihab Shamma.
Abstract
Speech intelligibility is known to be relatively unaffected by certain deformations of the acoustic spectrum. These include translations, stretching or contracting dilations, and shearing of the spectrum (represented along the logarithmic frequency axis). It is argued here that such robustness reflects a synergy between vocal production and auditory perception. Thus, on the one hand, it is shown that these spectral distortions are produced by common and unavoidable variations among different speakers pertaining to the length, cross-sectional profile, and losses of their vocal tracts. On the other hand, it is argued that these spectral changes leave the auditory cortical representation of the spectrum largely unchanged except for translations along one of its representational axes. These assertions are supported by analyses of production and perception models. On the production side, a simplified sinusoidal model of the vocal tract is developed which analytically relates a few "articulatory" parameters, such as the extent and location of the vocal tract constriction, to the spectral peaks of the acoustic spectra synthesized from it. The model is evaluated by comparing the identification of synthesized sustained vowels to labeled natural vowels extracted from the TIMIT corpus. On the perception side a "multiscale" model of sound processing is utilized to elucidate the effects of the deformations on the representation of the acoustic spectrum in the primary auditory cortex. Finally, the implications of these results for the perception of generally identifiable classes of sound sources beyond the specific case of speech and the vocal tract are discussed.Mesh:
Year: 2003 PMID: 12558287 DOI: 10.1121/1.1525288
Source DB: PubMed Journal: J Acoust Soc Am ISSN: 0001-4966 Impact factor: 1.840