Shannon L M Heald, Howard C Nusbaum.
Abstract
A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories, and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening conditions (e.g., noise or distortion) that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions than in multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition than in the audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred.
Keywords: audio-visual speech perception; multisensory integration; speech perception; talker normalization; talker variability
Year: 2014 PMID: 25076919 PMCID: PMC4100456 DOI: 10.3389/fpsyg.2014.00698
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Summary of results from the split plot ANOVA [Talker Variability (Single-Talker vs. Multiple-Talkers) × Modality of Presentation (Audio-only vs. Audio-visual), with Talker Variability as a within-subject factor and Modality of Presentation as a between-subject factor] for the dependent measure of false alarm rates.
| Source | F | p | Estimated means (standard error) |
|---|---|---|---|
| Talker variability | 0.409 | 0.526 | 0.010 (0.001) single-talker |
| | | | 0.009 (0.001) multiple-talkers |
| Talker Variability × Modality of Presentation | 2.670 | 0.110 | 0.009 (0.002) audio-only single-talker |
| | | | 0.010 (0.002) audio-only multiple-talkers |
| | | | 0.011 (0.002) audio-visual single-talker |
| | | | 0.008 (0.002) audio-visual multiple-talkers |
| Modality of presentation | 0.011 | 0.918 | 0.010 (0.002) audio-only |
| | | | 0.010 (0.002) audio-visual |
Summary of results from the split plot ANOVA [Talker Variability (Single-Talker vs. Multiple-Talkers) × Modality of Presentation (Audio-only vs. Audio-visual), with Talker Variability as a within-subject factor and Modality of Presentation as a between-subject factor] for the dependent measure of hit rates.
| Source | F | p | Estimated means (standard error) |
|---|---|---|---|
| Talker variability | 0.199 | 0.658 | 0.964 (0.006) single-talker |
| | | | 0.962 (0.005) multiple-talkers |
| Talker Variability × Modality of Presentation | 0.797 | 0.377 | 0.955 (0.008) audio-only single-talker |
| | | | 0.957 (0.007) audio-only multiple-talkers |
| | | | 0.973 (0.008) audio-visual single-talker |
| | | | 0.967 (0.007) audio-visual multiple-talkers |
| Modality of presentation | 1.897 | 0.176 | 0.956 (0.007) audio-only |
| | | | 0.970 (0.007) audio-visual |
Summary of results from the split plot ANOVA [Talker Variability (Single-Talker vs. Multiple-Talkers) × Modality of Presentation (Audio-only vs. Audio-visual), with Talker Variability as a within-subject factor and Modality of Presentation as a between-subject factor] for the dependent measure of d-primes.
| Source | F | p | Estimated means (standard error) |
|---|---|---|---|
| Talker variability | 0.505 | 0.481 | 4.351 (0.101) single-talker |
| | | | 4.289 (0.089) multiple-talkers |
| Talker Variability × Modality of Presentation | 0.000 | 0.988 | 4.282 (0.143) audio-only single-talker |
| | | | 4.221 (0.125) audio-only multiple-talkers |
| | | | 4.420 (0.143) audio-visual single-talker |
| | | | 4.357 (0.125) audio-visual multiple-talkers |
| Modality of presentation | 0.653 | 0.423 | 4.252 (0.120) audio-only |
| | | | 4.389 (0.120) audio-visual |
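
The d-prime measure in the last table is a signal-detection sensitivity index that combines hit and false alarm rates. This record does not state how the authors computed it, so the following is only a minimal sketch under the common textbook definition, d' = z(hit rate) - z(false alarm rate), where z is the inverse of the standard normal CDF. The function name `d_prime` is hypothetical and not taken from the source.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(hit rate) - z(false alarm rate).

    Assumption: the textbook signal-detection definition; the source
    does not say which formula or correction the authors applied.
    Both rates must lie strictly between 0 and 1, otherwise inv_cdf
    raises statistics.StatisticsError.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(false_alarm_rate)


if __name__ == "__main__":
    # Group-mean single-talker rates from the hit-rate and false-alarm tables.
    print(round(d_prime(0.964, 0.010), 2))
```

Applying the function to group-mean rates, as in the usage line, will generally not reproduce the tabled d-prime means, since those are presumably averages of per-participant values rather than a d-prime computed from averaged rates.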