Jan Heeren, Theresa Nuesse, Matthias Latzel, Inga Holube, Volker Hohmann, Kirsten C Wagener, Michael Schulte.
Abstract
A multi-talker paradigm is introduced that uses different attentional processes to adjust speech-recognition scores with the goal of conducting measurements at high signal-to-noise ratios (SNR). The basic idea is to simulate a group conversation with three talkers. Talkers alternately speak sentences of the German matrix test OLSA. Each time a sentence begins with the name "Kerstin" (call sign), the participant is addressed and instructed to repeat the last words of all sentences from that talker, until another talker begins a sentence with "Kerstin". The alternation of the talkers is implemented with an adjustable overlap time that causes an overlap between the call sign "Kerstin" and the target words to be repeated. Thus, the two tasks of detecting "Kerstin" and repeating target words are to be done at the same time. The paradigm was tested with 22 young normal-hearing participants (YNH) for three overlap times (0.6 s, 0.8 s, 1.0 s). Results for these overlap times show significant differences, with median target word recognition scores of 88%, 82%, and 77%, respectively (including call-sign and dual-task effects). A comparison of the dual task with the corresponding single tasks suggests that the observed effects reflect an increased cognitive load.
Keywords: OLSA; attention; cocktail party; concurrent speech; speech recognition; speech-in-speech
Year: 2022 PMID: 35702051 PMCID: PMC9208053 DOI: 10.1177/23312165221108257
Source DB: PubMed Journal: Trends Hear ISSN: 2331-2165 Impact factor: 3.496
Figure 1. Sketch of the loudspeaker setup, with a participant seated in the center, and showing the positions of the talkers.
Figure 2. Sample sequence of sentences; sentences were timed with a defined temporal overlap and were presented randomly from the three talkers; some sentences started with the name “Kerstin”, which was defined as the call sign.
Figure 3. Attention pattern in the dual task for a sample sequence of sentences. Participants had to repeat the last words of all sentences for the last talker who spoke a sentence starting with the name “Kerstin” (call sign); target words and call signs occurred at the same time due to a defined overlap time between sentences across talkers.
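The attention rule described for Figure 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Sentence` type and the `target_flags` function are hypothetical names, talkers are numbered 0 to 2, and the sentence that carries the call sign is assumed to count as a target sentence itself.

```python
from typing import List, NamedTuple, Optional


class Sentence(NamedTuple):
    talker: int          # hypothetical talker index (0, 1, or 2)
    has_call_sign: bool  # True if the sentence starts with "Kerstin"


def target_flags(sequence: List[Sentence]) -> List[bool]:
    """Mark the sentences whose last word the participant has to repeat."""
    attended: Optional[int] = None  # no talker is attended before the first call sign
    flags = []
    for sentence in sequence:
        if sentence.has_call_sign:
            # Attention switches to the talker who just said "Kerstin".
            attended = sentence.talker
        # Assumption: the call-sign sentence itself already counts as a target.
        flags.append(attended is not None and sentence.talker == attended)
    return flags


example = [
    Sentence(0, True),   # talker 0 addresses the participant
    Sentence(1, False),
    Sentence(0, False),
    Sentence(2, True),   # attention switches to talker 2
    Sentence(0, False),
    Sentence(2, False),
]
print(target_flags(example))  # [True, False, True, True, False, True]
```

The overlap time itself is not modelled here; the sketch only captures which talker is the target at each point in the sequence.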
Figure 4. Boxplot of the call-sign detection scores and target-word recognition scores measured for the CCOLSA dual task and the corresponding single tasks for three overlap times. In the net values of the dual task target-word recognition, only those target words with correct call-sign detection were included. Statistical significance is indicated by asterisks (Wilcoxon test; *p < .05, **p < .01, ***p < .001).
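The difference between the net score and the score that includes missed-call-sign effects can be illustrated as follows. This is a minimal sketch, not the authors' scoring procedure: the function name, the trial representation as `(call_sign_detected, word_correct)` pairs, and the choice to score against all target words in the "including" case are assumptions.

```python
from typing import List, Tuple


def recognition_scores(trials: List[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (score including missed-call-sign effects, net score).

    Each trial is a hypothetical pair (call_sign_detected, word_correct)
    for one target word.
    """
    if not trials:
        raise ValueError("at least one target word is required")
    # Including missed call signs: every target word counts in the denominator.
    including = sum(correct for _, correct in trials) / len(trials)
    # Net: only target words whose call sign was detected are scored.
    detected = [correct for was_detected, correct in trials if was_detected]
    if not detected:
        raise ValueError("no target word with a detected call sign")
    net = sum(detected) / len(detected)
    return including, net


trials = [(True, True), (True, True), (True, False), (False, False)]
including, net = recognition_scores(trials)
print(f"including missed call signs: {including:.2f}, net: {net:.2f}")
```

With these four hypothetical trials, the word that follows an undetected call sign lowers the "including" score but is left out of the net score.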
Results of the Analysis of Performance Differences between Overlaps with the Friedman Test and Post-hoc Comparisons Using Wilcoxon Test; Bold Font Indicates Statistically Significant Values; p-values are Bonferroni-corrected to Account for Multiple Testing.
| | Detection of call sign: single task | Detection of call sign: dual task | Recognition of target words: single task | Recognition of target words: dual task, net | Recognition of target words: dual task, incl. missed-CS effects |
|---|---|---|---|---|---|
| Friedman test, χ2 | 3.38 | 8.02 | | | |
| Friedman test, p | .553 | .054 | | | |
| Paired comparison between overlaps (s), Wilcoxon test: 0.8 vs. 0.6 | | | | | |
| Paired comparison between overlaps (s), Wilcoxon test: 1.0 vs. 0.6 | | | | | |
| Paired comparison between overlaps (s), Wilcoxon test: 1.0 vs. 0.8 | 28.0 | 73.5 | 61.0 | | |
| | 0.198 | 0.719 | 0.299 | | |
| | 0.392 | 0.251 | 0.351 | | |
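The table caption states that p-values are Bonferroni-corrected. As a reminder of what that correction does, here is a minimal sketch; the raw p-values are invented for illustration and are not taken from the study, and the number of comparisons the authors corrected for is not stated here, so the family size of three is an assumption.

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Multiply each raw p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three paired comparisons (not study data).
raw = [0.01, 0.04, 0.50]
for p_raw, p_adj in zip(raw, bonferroni(raw)):
    print(f"raw p = {p_raw:.2f} -> corrected p = {p_adj:.2f}")
```

A corrected value is then compared against the usual significance level, for example .05.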
Figure 5. Influence of head-tracker data on the compensation of target-word recognition scores for call-sign detection errors; boxes 1–3 are identical with boxes 10–12 in Figure 4; statistical significance is indicated by asterisks (Wilcoxon test; *p < .05, **p < .01).