Speech variability: A cross-language study on acoustic variations of speaking versus untrained singing.

John H L Hansen¹, Marigona Bokshi¹, Soheil Khorram¹.

Abstract

Speech production variability introduces significant challenges for existing speech technologies such as speaker identification (SID), speaker diarization, speech recognition, and language identification (ID). There has been limited research analyzing changes in acoustic characteristics for speech produced by untrained singing versus speaking. To better understand changes in speech production of the untrained singing voice, this study presents the first cross-language comparison between normal speaking and untrained karaoke singing of the same text content. Previous studies comparing professional singing versus speaking have shown deviations in both prosodic and spectral features. Some investigations also considered assigning the intrinsic activity of the singing. Motivated by these studies, a series of experiments investigating both prosodic and spectral variations of untrained karaoke singers is considered for three languages: American English, Hindi, and Farsi. A comprehensive comparison of common prosodic features, including phoneme duration, mean fundamental frequency (F0), and formant center frequencies of vowels, was performed. Collective changes in the corresponding overall acoustic spaces were analyzed using the Kullback-Leibler distance between Gaussian probability distribution models trained on spectral features. Finally, these models were used in a Gaussian mixture model with universal background model (GMM-UBM) SID evaluation to quantify speaker changes between speaking and singing when the audio text content is the same. The experiments showed that many acoustic characteristics of untrained singing are considerably different from speaking when the text content is the same. It is suggested that these results would help advance automatic speech production normalization/compensation to improve performance of speech processing applications (e.g., speaker ID, speech recognition, and language ID).
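
The Kullback-Leibler comparison mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the covariance structure, the spectral features, or whether a symmetric form of the distance was used, so the code below assumes diagonal-covariance Gaussians and a symmetrized distance. The function names (`fit_diag_gaussian`, `kl_diag`, `symmetric_kl`) and the toy frames are hypothetical and are not taken from the paper.

```python
import math
from statistics import fmean, pvariance


def fit_diag_gaussian(frames, var_floor=1e-6):
    """Fit a diagonal-covariance Gaussian to a list of feature vectors.

    Returns (means, variances), one value per feature dimension.
    The variance floor is an assumption to avoid division by zero.
    """
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    means = [fmean(d) for d in dims]
    variances = [max(pvariance(d, mu), var_floor) for d, mu in zip(dims, means)]
    return means, variances


def kl_diag(p, q):
    """KL(p || q) for two diagonal Gaussians given as (means, variances)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(p, q):
    """Symmetrized distance: KL(p || q) + KL(q || p)."""
    return kl_diag(p, q) + kl_diag(q, p)


# Toy two-dimensional "spectral" frames (made-up numbers, not study data).
speaking_frames = [[0.0, 1.0], [2.0, 3.0]]
singing_frames = [[1.0, 1.0], [3.0, 3.0]]

speaking_model = fit_diag_gaussian(speaking_frames)
singing_model = fit_diag_gaussian(singing_frames)
distance = symmetric_kl(speaking_model, singing_model)
print(distance)
```

A larger distance indicates that the speaking and singing acoustic spaces differ more under this simplified model; identical models give a distance of zero.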

Year: 2020 PMID: 32873043 PMCID: PMC7438159 DOI: 10.1121/10.0001526

Source DB: PubMed Journal: J Acoust Soc Am ISSN: 0001-4966 Impact factor: 1.840

References

18 in total

1. Formant frequencies in country singers' speech and singing.

Authors: R E Stone; T F Cleveland; J Sundberg
Journal: J Voice Date: 1999-06 Impact factor: 2.009

2. Perceptual and acoustic study of professionally trained versus untrained voices.

Authors: W S Brown; H B Rothman; C M Sapienza
Journal: J Voice Date: 2000-09 Impact factor: 2.009

3. Comparison of singer's formant, speaker's ring, and LTA spectrum among classical singers and untrained normal speakers.

Authors: V M Oliveira Barrichelo; R J Heuer; C M Dean; R T Sataloff
Journal: J Voice Date: 2001-09 Impact factor: 2.009

4. Acoustic hole filling for sparse enrollment data using a cohort universal corpus for speaker recognition.

Authors: Jun-Won Suh; John H L Hansen
Journal: J Acoust Soc Am Date: 2012-02 Impact factor: 1.840

5. The singing power ratio as an objective measure of singing voice quality in untrained talented and nontalented singers.

Authors: Christopher Watts; Kathryn Barnes-Burroughs; Julie Estis; Debra Blanton
Journal: J Voice Date: 2006-03 Impact factor: 2.009

6. The sound level of the singer's formant in professional singing.

7. Articulatory interpretation of the "singing formant".

8. The acoustics of the singing voice.

9. Spectral analysis of sung vowels. I. Variation due to differences between vowels, singers, and modes of singing.

10. Perceptions of Voice Teachers Regarding Students' Vocal Behaviors During Singing and Speaking.