Towards understanding speaker discrimination abilities in humans and machines for text-independent short utterances of different speech styles.

Soo Jin Park¹, Gary Yeung¹, Neda Vesselinova², Jody Kreiman², Patricia A Keating³, Abeer Alwan¹

Abstract

Little is known about human and machine speaker discrimination ability when utterances are very short and the speaking style is variable. This study compares the text-independent speaker discrimination ability of humans and machines on utterances shorter than 2 s in two speaking styles: read sentences and speech directed towards pets, which is characterized by exaggerated prosody. Recordings of 50 female speakers drawn from the UCLA Speaker Variability Database were used as stimuli. The performance of 65 human listeners was compared with that of i-vector-based automatic speaker verification systems using mel-frequency cepstral coefficients, voice quality features inspired by a psychoacoustic model of voice perception, or a score-level fusion of the two. Humans always outperformed machines, except for style-mismatched pairs from perceptually marked speakers. Speaker representations by humans and machines were compared using multidimensional scaling (MDS). Canonical correlation analysis showed a weak correlation between the machine and human MDS spaces. Multiple regression showed that the means of the voice quality features could represent the most important dimension of the human MDS space well, but not the dimensions of the machine MDS spaces. These results suggest that humans and machines represent speakers differently, and that machine performance might be improved by better understanding how different acoustic features relate to perceived speaker identity.
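The abstract mentions combining the MFCC-based and voice-quality-based systems by score-level fusion without describing the fusion rule. The sketch below shows one common, generic form of it: z-normalize each system's trial scores, take a weighted sum, and threshold the result to decide "same" or "different" speaker. The function names, the equal weight of 0.5, the zero threshold, and the example scores are all hypothetical assumptions for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Rescale scores to zero mean and unit population standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: nothing to rescale, map everything to 0.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(mfcc_scores, vq_scores, weight=0.5):
    """Hypothetical linear score-level fusion of two systems' trial scores.

    `weight` is an assumed weight for the MFCC system; in practice such a
    weight would be tuned on held-out development trials.
    """
    if len(mfcc_scores) != len(vq_scores):
        raise ValueError("both systems must score the same trials")
    a = z_normalize(mfcc_scores)
    b = z_normalize(vq_scores)
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


def decide(fused, threshold=0.0):
    """Label each trial 'same' or 'different' speaker by thresholding."""
    return ["same" if s >= threshold else "different" for s in fused]


if __name__ == "__main__":
    # Made-up scores for four trials from two hypothetical systems.
    mfcc = [2.0, -1.0, 0.5, -1.5]
    vq = [1.0, 0.2, 0.8, -2.0]
    print(decide(fuse_scores(mfcc, vq)))  # ['same', 'different', 'same', 'different']
```

Normalizing before summing keeps one system's score range from dominating the other; a trained logistic-regression fusion is a frequent alternative to fixed weights.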

Year: 2018 | PMID: 30075658 | PMCID: PMC6062421 | DOI: 10.1121/1.5045323

Source DB: PubMed | Journal: J Acoust Soc Am | ISSN: 0001-4966 | Impact factor: 1.840
Cited by: 2 in total

1.  Speaker discrimination performance for "easy" versus "hard" voices in style-matched and -mismatched speech.

Authors: Amber Afshan; Jody Kreiman; Abeer Alwan
Journal: J Acoust Soc Am | Date: 2022-02 | Impact factor: 1.840

2.  Short-time speaker verification with different speaking style utterances.

Authors: Hongwei Mao; Yan Shi; Yue Liu; Linqiang Wei; Yijie Li; Yanhua Long
Journal: PLoS One | Date: 2020-11-11 | Impact factor: 3.240