Jan Rennies¹,², Anna Warzybok³, Thomas Brand³, Birger Kollmeier²,³
Abstract
For speech intelligibility in rooms, the temporal integration of speech reflections is typically modeled by separating the room impulse response (RIR) into an early part (assumed beneficial for speech intelligibility) and a late part (assumed detrimental). This study challenged that concept by employing binaural RIRs with systematically varied interaural phase differences (IPDs) and amplitudes of the direct sound and of a variable number of reflections delayed by up to 200 ms. Speech recognition thresholds in stationary noise were measured in normal-hearing listeners for 86 conditions. The data showed that the direct sound and one or several early speech reflections could be perfectly integrated when they had the same IPD. Early reflections with the same IPD as the noise (but not as the direct sound) could not be perfectly integrated with the direct sound. All conditions in which the dominant speech information was within the early RIR components could be well predicted by a binaural speech intelligibility model using the classic early/late separation. In contrast, when amplitude or IPD favored late RIR components, listeners appeared to be capable of focusing on these components rather than on the preceding direct sound. This could not be modeled by an early/late separation window but required a temporal integration window that can be shifted flexibly along the RIR.
Keywords: binaural hearing; reflections; speech intelligibility; temporal integration
Year: 2019 PMID: 31234732 PMCID: PMC6593929 DOI: 10.1177/2331216519854267
Source DB: PubMed Journal: Trends Hear ISSN: 2331-2165 Impact factor: 3.293
Overview of Measurement Conditions.
| Experiment | D-IPD | N-IPD | No. of reflections | Reflection delay Δ (ms) | Reflection amplification factor α | R-IPD |
|---|---|---|---|---|---|---|
| I | 0 | 0 | 1 | 10, 50, 100, 150, 200 | 1 | 0 |
| II | 0 | 0 | 1 | 10, 25, 50, 75, 100, 150, 200 | 1 | π |
| III | π | 0 | 1 | 10, 25, 75, 150, 200 | 1 | 0 |
| IV | 0 | π | 1 | 10, 100, 200 | 1 | π |
| V | π | π | 1 | 10, 100, 200 | 1 | 0 |
| VI | 0 | 0 | 1 | 200 | 0, 0.25, 0.75, 1.0, 1.25, 2.0, 2.5 | 0 |
| VII | 0 | 0 | 1 | 200 | 0, 0.25, 0.5, 1.0, 1.75, 2.5 | π |
| VIII | 0 | 0 | 1 | 10 | 1 | 0 |
| | | | 2 | 10, 25 | | |
| | | | 3 | 10, 25, 50 | | |
| | | | 5 | 10, 25, 50, 75, 100 | | |
| | | | 7 | 10, 25, 50, 75, 100, 125, 150 | | |
| | | | 9 | 10, 25, 50, 75, 100, 125, 150, 175, 200 | | |
| IX | 0 | 0 | 1 | 10 | 1 | π |
| | | | 3 | 10, 25, 50 | | |
| | | | 5 | 10, 25, 50, 75, 100 | | |
| | | | 7 | 10, 25, 50, 75, 100, 125, 150 | | |
| | | | 9 | 10, 25, 50, 75, 100, 125, 150, 175, 200 | | |
| X | Same as IX, but with | | | | | |
| XI | Same as IX, but with | | | | | |
| XII | 0 | π | 3 | 10, 25, 50 | 1 | 0 |
| | | | 5 | 10, 25, 50, 75, 100 | | |
| | | | 9 | 10, 25, 50, 75, 100, 125, 150, 175, 200 | | |
| XIII | 0 | 0 | 3 | 150, 175, 200 | 1 | 0 |
| | | | 5 | 100, 125, 150, 175, 200 | | |
| | | | 9 | 10, 25, 50, 75, 100, 125, 150, 175, 200 | | |
| XIV | 0 | 0 | 1 | 200 | 1 | π |
| | | | 2 | 175, 200 | | |
| | | | 3 | 150, 175, 200 | | |
| | | | 5 | 100, 125, 150, 175, 200 | | |
| | | | 7 | 50, 75, 100, 125, 150, 175, 200 | | |
| | | | 9 | 10, 25, 50, 75, 100, 125, 150, 175, 200 | | |
| XV | π | 0 | 1 | 200 | 1 | 0 |
| | | | 3 | 150, 175, 200 | | |
| | | | 5 | 100, 125, 150, 175, 200 | | |
| | | | 7 | 50, 75, 100, 125, 150, 175, 200 | | |
| | | | 9 | 10, 25, 50, 75, 100, 125, 150, 175, 200 | | |
| XVI | Same as XV, but with | | | | | |
| XVII–XVIII | 0 | 0 | 9 | See Supplementary Material | | |
| XIX–XX | π | 0 | 9 | See Supplementary Material | | |
Note. The second and third columns indicate the IPD of the direct sound (D) and noise (N), respectively, while the remaining columns indicate the properties of the reflection(s). IPD = interaural phase difference.
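To make the table's parameters concrete, the following Python sketch assembles a two-channel (left/right) impulse response from a direct sound and a list of reflections, each given by a delay Δ in ms, an amplification factor α, and an IPD of 0 or π. It assumes a unit-amplitude direct sound at t = 0, realizes an IPD of π as a polarity inversion of the right-ear channel, and uses a freely chosen sampling rate `fs`. The function names are hypothetical, and this is not the authors' stimulus-generation code. The noise IPD (N-IPD) concerns the masker and is therefore not part of the impulse response.

```python
import math


def _right_ear_sign(ipd):
    """Map an IPD of 0 or pi to the sign applied to the right-ear channel (assumption)."""
    if math.isclose(ipd, 0.0, abs_tol=1e-9):
        return 1.0
    if math.isclose(ipd, math.pi, abs_tol=1e-9):
        return -1.0  # broadband phase inversion of one ear
    raise ValueError("this sketch only supports IPDs of 0 or pi")


def build_brir(fs, d_ipd, reflections):
    """Return (left, right) impulse responses as lists of floats.

    fs          -- sampling rate in Hz (freely chosen; not given in the table)
    d_ipd       -- IPD of the direct sound (0 or math.pi)
    reflections -- list of (delay_ms, alpha, r_ipd) tuples
    """
    max_delay_ms = max((d for d, _, _ in reflections), default=0.0)
    n = int(round(max_delay_ms * fs / 1000.0)) + 1
    left = [0.0] * n
    right = [0.0] * n

    def add(index, amplitude, ipd):
        # Left ear is the reference; the IPD is applied to the right ear only.
        left[index] += amplitude
        right[index] += amplitude * _right_ear_sign(ipd)

    add(0, 1.0, d_ipd)  # direct sound at t = 0 with unit amplitude
    for delay_ms, alpha, r_ipd in reflections:
        add(int(round(delay_ms * fs / 1000.0)), alpha, r_ipd)
    return left, right
```

For example, `build_brir(44100, 0.0, [(200, 1.0, math.pi)])` would correspond to the 200-ms condition of Experiment II; convolving a speech signal with each channel would then give a binaural stimulus.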
Exemplary Illustration of the Overall Level Increase With Increasing Reflection Amplification Factor α for the Case of a Single Reflection Delayed by 200 ms Relative to the Level of the Direct Sound Only (Top Part), and of the Overall Level Increase When Successively Adding Reflections With α = 1 (Bottom Part).
| α | 0.00 | 0.25 | 0.50 | 0.75 | 1.00 | 1.25 | 1.50 | 1.75 | 2.00 | 2.50 |
|---|---|---|---|---|---|---|---|---|---|---|
| Level increase (dB) | 0.0 | −0.1 | 0.7 | 1.6 | 2.8 | 3.8 | 4.8 | 5.8 | 6.7 | 8.3 |

| No. of reflections | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Level increase (dB) | 0.0 | 2.8 | 4.9 | 6.4 | 7.4 | 8.3 | 8.9 | 9.4 | 10.0 | 10.4 |
Note. These level increases are equivalent to the attenuation required to restore the same overall level as for stimuli consisting only of direct sound.
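If the energies of the direct sound and the reflections simply added (an idealization that ignores any correlation between overlapping copies of the speech), the level increase would be ΔL = 10 log10(1 + Σ αᵢ²) dB, and the compensating attenuation would be the amplitude factor 10^(−ΔL/20). The Python sketch below computes both; the function names are hypothetical. It reproduces the trend of the table but not its exact values (e.g., 3.0 dB instead of 2.8 dB for a single reflection with α = 1, and 10.0 dB instead of 10.4 dB for nine reflections), which presumably were derived from the actual stimuli.

```python
import math


def ideal_level_increase_db(alphas):
    """Level increase (dB) of direct sound plus reflections relative to the
    direct sound only, assuming the energies of all components simply add."""
    return 10.0 * math.log10(1.0 + sum(a * a for a in alphas))


def compensating_gain(level_increase_db):
    """Linear amplitude factor that undoes a given level increase."""
    return 10.0 ** (-level_increase_db / 20.0)


if __name__ == "__main__":
    # Single reflection with varying alpha (compare top part of the table).
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, round(ideal_level_increase_db([alpha]), 1))
    # prints: 0.5 1.0 / 1.0 3.0 / 2.0 7.0 (one pair per line)
```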
Figure 1. Left panel: illustration of the window shape used to separate the early and late parts of the BRIRs, assuming that BRIR components are fully useful (up to t − 0.5 × DD), fully detrimental (after t + 0.5 × DD), or partially useful (in the interval of width DD between these limits). Right panel: illustration of the window shape used to separate the useful and detrimental parts of the BRIRs, assuming that a window can be shifted flexibly along the BRIR. Different line styles illustrate different RDs. BRIR = binaural room impulse response.
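The two window concepts of Figure 1 can be sketched in Python as follows. `early_late_weight` follows the left panel, assuming a linear transition between t − 0.5 × DD and t + 0.5 × DD (the caption only states that components in this range are partially useful). `best_window_start` illustrates the idea of the right panel with a hypothetical placement rule: a rectangular window of duration RD is aligned with each nonzero BRIR component in turn, and the position capturing the most energy is kept. Neither function is the BSIM implementation, and `split_energy` only sums broadband energies for illustration.

```python
def early_late_weight(time_ms, t_ms, dd_ms):
    """Early/late weight: 1 = fully useful, 0 = fully detrimental.
    Linear transition of width DD centered on t (ramp shape is an assumption)."""
    if time_ms <= t_ms - 0.5 * dd_ms:
        return 1.0
    if time_ms >= t_ms + 0.5 * dd_ms:
        return 0.0
    return (t_ms + 0.5 * dd_ms - time_ms) / dd_ms


def split_energy(h, fs, weight):
    """Split the energy of impulse response h into useful and detrimental parts.
    `weight` maps a time in ms to a value in [0, 1]."""
    useful = detrimental = 0.0
    for n, x in enumerate(h):
        w = weight(1000.0 * n / fs)
        useful += w * x * x
        detrimental += (1.0 - w) * x * x
    return useful, detrimental


def best_window_start(h, fs, rd_ms):
    """Hypothetical flexible-window placement: align a rectangular window of
    duration RD with each nonzero component of h and return the start time (ms)
    that captures the most energy (earliest start wins ties)."""
    length = max(1, int(round(rd_ms * fs / 1000.0)))
    energy = [x * x for x in h]
    best_start, best_energy = 0, -1.0
    for start, x in enumerate(h):
        if x == 0.0:
            continue  # only consider onsets of actual BRIR components
        e = sum(energy[start:start + length])
        if e > best_energy:
            best_start, best_energy = start, e
    return 1000.0 * best_start / fs
```

With t = 100 ms and DD = 200 ms (the BSIM-EL setting given for Figure 6), a reflection at 200 ms counts as fully detrimental, whereas a sufficiently strong reflection at 200 ms pulls the flexible window to that position, loosely mirroring the observation that listeners can focus on dominant late components.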
Figure 2. Mean SRTs across listeners (and standard errors) of Experiments I to V, which comprised a single reflection with varying delay. Denotes conditions with direct sound only. Dashed lines show predictions of BSIM-EL (gray) and BSIM-UD (black). Gray solid lines with circles represent data replotted from Experiment I. SRT = speech recognition threshold; BSIM = binaural speech intelligibility model; SNR = signal-to-noise ratio.
Figure 3. Mean SRTs across listeners (and standard errors) of Experiments VI (black circles) and VII (black squares), which comprised a single reflection delayed by 200 ms and varying reflection amplification factor α. Dashed lines show predictions of BSIM-EL (gray) and BSIM-UD (black). Note that predictions of BSIM-EL are almost identical for both experiments. “inf.” indicates the condition consisting of the reflection only (data copied from α = 0 for Experiment VI, and taken from the direct sound-only condition of Experiment III for Experiment VII). SRT = speech recognition threshold; BSIM = binaural speech intelligibility model; SNR = signal-to-noise ratio.
Figure 4. Mean SRTs across listeners (and standard errors) of Experiments VIII, IX, X, XI (black circles), and XII (black squares), in which reflections were successively added starting from the shortest delay. The number of reflections is given on the abscissae. For comparison, data of Experiment VIII are replotted in each panel as gray solid lines with circles. Dashed lines show predictions of BSIM-EL (gray) and BSIM-UD (black). SRT = speech recognition threshold; BSIM = binaural speech intelligibility model; SNR = signal-to-noise ratio.
Figure 5. Mean SRTs across listeners (and standard errors) of Experiments XIII to XVI (black), in which reflections were successively added starting from the longest delay. Dashed lines show predictions of BSIM-EL (gray) and BSIM-UD (black). Gray solid lines with circles represent data replotted from Experiment XIII. SRT = speech recognition threshold; BSIM = binaural speech intelligibility model; SNR = signal-to-noise ratio.
Figure 6. Scatter plot of measured versus predicted SRTs for BSIM-EL (t = 100 ms, DD = 200 ms, left panel) and BSIM-UD (RD = 100 ms, right panel) for all 20 experiments. The dashed and dotted lines represent perfect agreement and deviations of ±2 dB, respectively. SRT = speech recognition threshold; BSIM = binaural speech intelligibility model; SNR = signal-to-noise ratio; RMSE = root-mean-square error.
Figure 7. Left panels: contour plots of the mean absolute prediction error ε_mean of BSIM-EL for different combinations of the early/late limit (t, ordinates) and DD (abscissae) for Part A (top), Part B (middle), and both parts combined (bottom). Values at the contours indicate the magnitude of ε_mean in dB. The asterisk indicates that not all experiments were included in this analysis (see text). Right panels: different prediction error measures (in dB) for BSIM-UD as a function of RD. Gray horizontal lines indicate optimal performance (solid) and deviations of ±1 dB (dotted). BSIM = binaural speech intelligibility model; RMSE = root-mean-square error.
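The error measures referred to in Figures 6 and 7 can be computed as sketched below in Python: the RMSE, the mean absolute prediction error ε_mean, and (as an extra, hypothetical summary) the share of conditions predicted within ±2 dB. `grid_search` mimics the parameter sweep over t and DD shown in the left panels of Figure 7; `predict` is a hypothetical stand-in for the model, which is not reproduced here. The sign convention (prediction minus measurement) is an assumption and does not affect the absolute-error measures.

```python
import math


def prediction_errors(measured, predicted):
    """Return (RMSE, mean absolute error, share of |error| <= 2 dB)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - m for m, p in zip(measured, predicted)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mean_abs = sum(abs(d) for d in diffs) / len(diffs)
    within_2db = sum(1 for d in diffs if abs(d) <= 2.0) / len(diffs)
    return rmse, mean_abs, within_2db


def grid_search(measured, predict, t_values, dd_values):
    """Evaluate the mean absolute error for every (t, DD) pair.

    `predict(t, dd)` is a hypothetical callable returning predicted SRTs
    (dB SNR) for all conditions, in the same order as `measured`.
    Returns the best (t, DD) pair and the full error table.
    """
    table = {}
    for t in t_values:
        for dd in dd_values:
            table[(t, dd)] = prediction_errors(measured, predict(t, dd))[1]
    best = min(table, key=table.get)
    return best, table
```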