Abstract
Whispered vowels, produced with no vocal fold vibration, lack the periodic temporal fine structure which in voiced vowels underlies the perceptual attribute of pitch (a salient auditory cue to speaker sex). Voiced vowels possess no temporal fine structure at very short durations (below two glottal cycles). The prediction was that speaker-sex discrimination performance for whispered and voiced vowels would be similar for very short durations but that, as stimulus duration increased, voiced vowel performance would improve relative to whispered vowel performance as pitch information became available. This pattern of results was shown for women's but not for men's voices. A whispered vowel needs to be about three times as long as a voiced vowel before listeners can reliably tell whether it was spoken by a man or a woman (∼30 ms vs. ∼10 ms). Listeners were half as sensitive to information about speaker sex when it was carried by whispered vowels as when it was carried by voiced vowels.
Keywords: Speaker-sex discrimination; duration; pitch; speech; vocal-tract length; voiced; whispered
Year: 2016 PMID: 27757218 PMCID: PMC5051627 DOI: 10.1177/2041669516671320
Source DB: PubMed Journal: Iperception ISSN: 2041-6695
Figure 1. Hypothetical speaker-sex discrimination performance as a function of duration for voiced (solid line) and whispered (dashed line) speech. The general form of the psychometric functions is P(t) = γ + (1 − γ − λ)F(t), where P(t) is the probability of correct discrimination of speaker sex at stimulus duration t. The guess rate γ (1/m in an mAFC task, or ½ in our 2AFC task) sets the lower asymptote, representing chance performance, and the lapse rate λ sets the upper asymptote, representing ceiling performance. The function F is for convenience taken to be the logistic function F(x) = [1 + exp(−x)]^(−1), which takes values between 0 and 1 for values of t, −∞ < t < ∞ (see Treutwein & Strasburger, 1999). The bracketed region “formants” indicates durations where VTL-related information (the formants of speech) is the main cue to speaker-sex discrimination, the region “f0” indicates durations where GPR-related information (voice pitch, as determined by f0) is the main cue for discriminating speaker sex, and the region “formants and f0” indicates durations where both formants and f0 could contribute to speaker-sex discrimination. Proportion correct values on the y-axis are for illustrative purposes only, and x-axis durations are purposely left blank.
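The general form in the Figure 1 caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' fitting code: the caption does not say how duration t maps onto the logistic argument x, so the linear mapping x = (t − alpha) / beta is an assumption, and the names alpha, beta, and duration_for, as well as the numerical values used below, are hypothetical.

```python
import math


def logistic(x):
    """Logistic function [1 + exp(-x)]^(-1), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def psychometric(t, alpha, beta, gamma=0.5, lam=0.02):
    """P(t) = gamma + (1 - gamma - lam) * F(t).

    gamma: guess rate (1/2 in a 2AFC task), the lower asymptote.
    lam:   lapse rate; the upper asymptote is 1 - lam.
    alpha, beta: ASSUMED location and scale (ms) mapping duration t
                 onto the logistic argument x = (t - alpha) / beta.
    """
    x = (t - alpha) / beta
    return gamma + (1.0 - gamma - lam) * logistic(x)


def duration_for(p, alpha, beta, gamma=0.5, lam=0.02):
    """Invert P(t): the duration at which proportion correct equals p."""
    f = (p - gamma) / (1.0 - gamma - lam)
    if not 0.0 < f < 1.0:
        raise ValueError("p must lie strictly between gamma and 1 - lam")
    return alpha + beta * math.log(f / (1.0 - f))


# Hypothetical parameters, chosen only to illustrate the shape.
for t in (0, 10, 20, 30, 60):
    print(t, round(psychometric(t, alpha=20.0, beta=5.0), 3))
```

Under these assumptions, performance sits near γ at very short durations, passes through the midpoint between the two asymptotes at t = alpha, and levels off at 1 − λ. A threshold duration for any criterion level of performance can be read off with duration_for.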
Figure 2. Proportion of correct judgments of original speaker sex for voiced (filled circles) and whispered (open circles) vowels as a function of vowel duration. The large circles indicate the main experiment data. The small circles indicate the supplementary experiment data. The solid (fitted to main experiment data) and dashed (fitted to supplementary experiment data) curves are best-fitting psychometric functions obtained by non-parametric local linear regression fitting (Żychaluk & Foster, 2009). Data are collapsed across correct judgments of both men and women speakers and across all five vowels (Figure 2(a), top), and are plotted separately for men speakers (Figure 2(b), middle) and women speakers (Figure 2(c), bottom). For the main experiment (Figure 2(a)), each point shown for each duration is based on 600 trials [(15 Men + 15 Women Speaker Repetitions) × 20 Listeners]. When plotted separately for the main experiment (Figure 2(b) and (c)), each data point is based on 300 trials (15 Speaker Repetitions × 20 Listeners). The supplementary experiment data points are based on 210 trials [(15 Men + 15 Women Speaker Repetitions) × 7 Listeners] for Figure 2(a), and 105 trials (15 Speaker Repetitions × 7 Listeners) for Figure 2(b) and (c). Error bars are standard error of the mean across 20 listeners (main experiment) or 7 listeners (supplementary experiment).
Mean Threshold (ms) and Slope Estimates Derived From the Best-Fitting Psychometric Functions for the Main Experiment.
Note. SD based on 200 bootstrap replicates, with 99% confidence intervals.
p < .01.