Towards understanding speaker discrimination abilities in humans and machines for text-independent short utterances of different speech styles.

Soo Jin Park¹, Gary Yeung¹, Neda Vesselinova², Jody Kreiman², Patricia A Keating³, Abeer Alwan¹.

Abstract

Little is known about human and machine speaker discrimination ability when utterances are very short and the speaking style is variable. This study compares text-independent speaker discrimination ability of humans and machines based on utterances shorter than 2 s in two different speaking styles (read sentences and speech directed towards pets, characterized by exaggerated prosody). Recordings of 50 female speakers drawn from the UCLA Speaker Variability Database were used as stimuli. Performance of 65 human listeners was compared to i-vector-based automatic speaker verification systems using mel-frequency cepstral coefficients, voice quality features, which were inspired by a psychoacoustic model of voice perception, or their combination by score-level fusion. Humans always outperformed machines, except in the case of style-mismatched pairs from perceptually-marked speakers. Speaker representations by humans and machines were compared using multi-dimensional scaling (MDS). Canonical correlation analysis showed a weak correlation between machine and human MDS spaces. Multiple regression showed that means of voice quality features could represent the most important human MDS dimension well, but not the dimensions from machines. These results suggest that speaker representations by humans and machines are different, and machine performance might be improved by better understanding how different acoustic features relate to perceived speaker identity.
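The abstract mentions combining the mel-frequency cepstral coefficient system and the voice quality system by score-level fusion, but does not say how the scores were combined. As a rough illustration only, the sketch below assumes a weighted linear fusion of z-normalized trial scores. The function names, the `weight` parameter, and the choice of z-normalization are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale scores to zero mean and unit (population) standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: no spread to normalize, so return zeros.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(mfcc_scores, vq_scores, weight=0.5):
    """Fuse two systems' verification scores trial by trial.

    Assumption (not from the paper): each system's scores are z-normalized,
    then combined as weight * mfcc + (1 - weight) * vq.
    """
    if len(mfcc_scores) != len(vq_scores):
        raise ValueError("both systems must score the same trials")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    mfcc_z = z_normalize(mfcc_scores)
    vq_z = z_normalize(vq_scores)
    return [weight * m + (1.0 - weight) * v for m, v in zip(mfcc_z, vq_z)]


if __name__ == "__main__":
    # Made-up scores for three trials, purely for demonstration.
    print(fuse_scores([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

Normalizing before the weighted sum keeps one system from dominating simply because its raw scores span a larger numeric range. In practice the weight would be tuned on held-out trials.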

Year: 2018 PMID: 30075658 PMCID: PMC6062421 DOI: 10.1121/1.5045323

Source DB: PubMed Journal: J Acoust Soc Am ISSN: 0001-4966 Impact factor: 1.840
