| Literature DB >> 18681609 |
Sandeep A Phatak1, Andrew Lovitt, Jont B Allen.
Abstract
The classic [MN55] confusion matrix experiment (16 consonants, white noise masker) was repeated by using computerized procedures, similar to those of Phatak and Allen (2007). ["Consonant and vowel confusions in speech-weighted noise," J. Acoust. Soc. Am. 121, 2312-2316]. The consonant scores in white noise can be categorized in three sets: low-error set [/m/, /n/], average-error set [/p/, /t/, /k/, /s/, /[please see text]/, /d/, /g/, /z/, /Z/], and high-error set /f/theta/b/, /v/, /E/,/theta/]. The consonant confusions match those from MN55, except for the highly asymmetric voicing confusions of fricatives, biased in favor of voiced consonants. Masking noise cannot only reduce the recognition of a consonant, but also perceptually morph it into another consonant. There is a significant and systematic variability in the scores and confusion patterns of different utterances of the same consonant, which can be characterized as (a) confusion heterogeneity, where the competitors in the confusion groups of a consonant vary, and (b) threshold variability, where confusion threshold [i.e., signal-to-noise ratio (SNR) and score at which the confusion group is formed] varies. The average consonant error and errors for most of the individual consonants and consonant sets can be approximated as exponential functions of the articulation index (AI). An AI that is based on the peak-to-rms ratios of speech can explain the SNR differences across experiments.Entities:
Mesh:
Year: 2008 PMID: 18681609 DOI: 10.1121/1.2913251
Source DB: PubMed Journal: J Acoust Soc Am ISSN: 0001-4966 Impact factor: 1.840