Julian Angermeier1,2, Werner Hemmert2, Stefan Zirn1. 1. Peter Osypka Institute of Medical Engineering, Faculty of Electrical Engineering, Medical Engineering and Computer Sciences, 64369University of Applied Sciences Offenburg. 2. Bio-Inspired Information Processing, Munich Institute of Biomedical Engineering, 9184Technical University of Munich.
Abstract
In asymmetric treatment of hearing loss, processing latencies of the modalities typically differ. This often alters the reference interaural time difference (ITD) (i.e., the ITD at 0° azimuth) by several milliseconds. Such changes in reference ITD have shown to influence sound source localization in bimodal listeners provided with a hearing aid (HA) in one and a cochlear implant (CI) in the contralateral ear. In this study, the effect of changes in reference ITD on speech understanding, especially spatial release from masking (SRM) in normal-hearing subjects was explored. Speech reception thresholds (SRT) were measured in ten normal-hearing subjects for reference ITDs of 0, 1.75, 3.5, 5.25 and 7 ms with spatially collocated (S0N0) and spatially separated (S0N90) sound sources. Further, the cues for separation of target and masker were manipulated to measure the effect of a reference ITD on unmasking by A) ITDs and interaural level differences (ILDs), B) ITDs only and C) ILDs only. A blind equalization-cancellation (EC) model was applied to simulate all measured conditions. SRM decreased significantly in conditions A) and B) when the reference ITD was increased: In condition A) from 8.8 dB SNR on average at 0 ms reference ITD to 4.6 dB at 7 ms, in condition B) from 5.5 dB to 1.1 dB. In condition C) no significant effect was found. These results were accurately predicted by the applied EC-model. The outcomes show that interaural processing latency differences should be considered in asymmetric treatment of hearing loss.
In asymmetric treatment of hearing loss, processing latencies of the modalities typically differ. This often alters the reference interaural time difference (ITD) (i.e., the ITD at 0° azimuth) by several milliseconds. Such changes in reference ITD have shown to influence sound source localization in bimodal listeners provided with a hearing aid (HA) in one and a cochlear implant (CI) in the contralateral ear. In this study, the effect of changes in reference ITD on speech understanding, especially spatial release from masking (SRM) in normal-hearing subjects was explored. Speech reception thresholds (SRT) were measured in ten normal-hearing subjects for reference ITDs of 0, 1.75, 3.5, 5.25 and 7 ms with spatially collocated (S0N0) and spatially separated (S0N90) sound sources. Further, the cues for separation of target and masker were manipulated to measure the effect of a reference ITD on unmasking by A) ITDs and interaural level differences (ILDs), B) ITDs only and C) ILDs only. A blind equalization-cancellation (EC) model was applied to simulate all measured conditions. SRM decreased significantly in conditions A) and B) when the reference ITD was increased: In condition A) from 8.8 dB SNR on average at 0 ms reference ITD to 4.6 dB at 7 ms, in condition B) from 5.5 dB to 1.1 dB. In condition C) no significant effect was found. These results were accurately predicted by the applied EC-model. The outcomes show that interaural processing latency differences should be considered in asymmetric treatment of hearing loss.
Entities:
Keywords:
auditory model; binaural hearing; spatial release from masking; speech in noise
Both sound localization and spatial release from masking (SRM) rely on precise timing of
the signals reaching both ears and on symmetrical frequency-place mapping. However,
listeners with hearing loss who use a hearing device only in one ear or who are provided
with different devices in both ears can suffer from vast asymmetric hearing perceptions. In
this study, we focus mainly on temporal asymmetries. For example, hearing instruments like
hearing aids (HAs) and cochlear implants (CIs) can have large processing delays in the order
of several milliseconds, which depend on their exact implementation. Timing differs due to
processing delays but also due to physiologic latencies of acoustic hearing introduced by
the ear canal, middle ear and mainly the cochlear delays in the inner ear (Ruggero & Temchin, 2007), which
are not present when the auditory nerve is directly excited by electric current in the case
of CI stimulation. The impact of this temporal alteration on subjective disturbance or sound
quality has been studied extensively for provision with HAs (Agnew & Thornton, 2000; Bramsløw, 2010; Groth & Søndergaard, 2004; Stone & Moore, 2003; Stone, Moore, Meisenbacher, & Derleth, 2008).
However, most of the studies were looking at maximal preferable delays and did not take the
possibility of asymmetric timing between both ears into account. In treatment of asymmetric
hearing loss, the ears often must be provided with different technical devices. Examples for
this are unilateral hearing loss which requires provision with a HA on only one ear,
treatment of single-sided deafness with a CI, or bimodal provision with a CI in one ear and
a contralateral HA. Besides interaural differences in signal processing and sound
information (e.g. fine structure and spectral information), a considerable interaural
difference typically is a static temporal mismatch of the ear signals. For bimodal listeners
provided with a MED-EL CI and a contralateral HA, an average interaural temporal difference
of 7 ms in the neural representation of incoming signals at the level of the brainstem was
reported (Zirn, Arndt, Aschendorff,
& Wesarg, 2015). Furthermore, interaural differences in latency in the
millisecond range have recently been found in some hearables (Denk, Schepker, Doclo, & Kollmeier, 2020). This
interaural temporal mismatch severely impairs the ability of bimodal listeners to localize
sounds in the frontal horizontal hemisphere. Equalizing the device delay mismatch by
delaying the CI stimulation according to the measured HA processing delay resulted in highly
significant improvements in the root-mean-square error and in the bias of localization
judgements (Angermeier, Hemmert, &
Zirn, 2021; Zirn, Angermeier,
Arndt, Aschendorff, & Wesarg, 2019). The static interaural temporal mismatch
will further be referred to as reference interaural time difference (ITD), i.e., the ITD at
0° azimuth, which is close to 0 µs when no temporal interaural asymmetry is present. What
has not yet been systematically studied is the influence of large reference ITDs on speech
understanding in noise, especially SRM. For SRM, processing of ITDs and interaural level
differences (ILDs) contribute to unmasking when speech and noise are spatially separated
(Lavandier & Best,
2020).In listeners provided unilaterally with a HA or two different HAs with different processing
latencies, ITDs are conveyed by the HAs and can be used for SRM alongside with ILDs. It has
to be noted that these ITDs can be distorted in case of an open fitting, due to direct sound
through the open earpiece overlapping with the delayed amplified signal (Denk, Ewert, & Kollmeier, 2019).
On the other hand, in HA users where the residual hearing in the lower frequencies is
sufficient and no amplification is needed by the HA, processing latencies will not affect
ITD processing. Bimodal listeners, however have very limited access to ITDs (Francart & McDermott, 2013;
Veugen, Chalupper, Snik, Opstal,
& Mens, 2016). Therefore, bimodal users rely mostly on ILDs for SRM. Dieudonné
& Francart (2020) have
reported that SRM in bimodal listeners is mostly driven by monaural head shadow effects,
i.e., one ear having access to a better signal-to-noise ratio when noise and speech are
spatially separated. This effect should theoretically not be affected by an additional
reference ITD.Differences in the accessibility of cues in the presence of a reference ITD makes it
interesting to examine the cues facilitating SRM separately. This study aimed to investigate
if and how different reference ITDs affect SRM. To get a more complete picture of how the
different binaural mechanisms underlying this process are affected by these asymmetries,
tests were conducted with cues presented separately: A) ITDs and ILDs, B) ITDs only or C)
ILDs only. SRM without a reference ITD but with respect to the available cues has been
measured in prior studies. Bronkhorst
& Plomp (1988) showed a substantial SRM for speech and laterally displaced
noise of 7.8 dB when only ILDs were available, a 5 dB SRM difference when ITDs were
available and 10.1 dB when both cues could be utilized together. Further studies
investigated the role of different cues exploited for SRM with symmetrically placed maskers
to reduce better-ear listening with symmetrically placed speech maskers (Ellinger, Jakien, & Gallun,
2017; Glyde, Buchholz, Dillon,
Cameron, & Hickson, 2013; Kidd, Mason, Best, & Marrone, 2010). These studies found that ITDs and ILDs
alone were sufficient to elicit SRM.Another objective of this study was to determine how well the effect of a reference ITD on
SRM can be predicted by an existing phenomenological computational spatial unmasking model.
Computational binaural models have already been successfully applied in the past to
accurately predict SRM in hearing-impaired listeners (Beutelmann, Brand, & Kollmeier, 2010; Vicente, Buchholz, & Lavandier,
2021; Williges, Dietz,
Hohmann, & Jürgens, 2015; Zedan, Williges, & Jürgens, 2018). If a model can replicate the experimental
results in normal hearing listeners, it might also be a valuable tool to predict outcomes in
unilateral HA users or even in CI users with single-sided deafness (SSD) and bimodal
listeners.
Methods
Test Environment and Stimuli
All tests were conducted in an audiometric booth. Speech reception thresholds (SRT) were
measured using the Oldenburg Sentence Test (OlSa), a German matrix sentence test (Wagener, Brand, & Kollmeier,
1999). To measure SRM, subjects performed speech tests with noise and speech
coming from the front (S0N0) and speech coming from the front and
noise coming from 90° azimuth (S0N90). Olnoise was used as a masker
which is composed by overlaying the jittered speech material of the OlSa sentences 30
times resulting in a broadband noise with the same long-term average speech spectrum as
the speech material. Speech material and noise were presented via Sennheiser HD 280 Pro
headphones with a fixed noise level of 65 dB SPL and the speech level adaptively varied
according to the subjects’ answers. Subjects used a tablet computer displaying the word
material of the OlSa to input their answers. The answers were sent to a computer outside
the audiometric booth via Bluetooth and processed on a computer using MATLAB (The
MathWorks Inc., Natick, MA, USA). In-ear head related impulse responses (HRIRs) were used
to allow for spatial separation of speech and noise (Kayser et al., 2009). A reference ITD was
introduced by delaying the signals on the right ear channel by 0, 1.75, 3.5, 5.25 and 7
ms. To further investigate the effect of the reference ITD on different cues used for
spatial unmasking, the HRIRs were manipulated according to Kulkarni et al. (1999) and Ellinger et al. (2017) to only offer either A)
ITDs and ILDs, or B) ITDs only, or C) ILDs only. In the A) condition the HRIRs for 0° and
90° azimuth were not manipulated. In the B) condition for 90° azimuth the right ear
channel of the HRIR at 0° azimuth was delayed by the delay found in the HRIR at 90°
azimuth, thus not offering any ILDs and purely ITDs. In the C) condition the right ear
channel of the HRIR at 90° azimuth was shifted by the delay such that ILDs were present,
while removing ITDs. A graphical representation can be found in Figure 1.
Figure 1.
Head-related impulse responses at 90° azimuth for the different cue conditions (A:
ITD and ILD; B: only ITD; C: only ILD).
Head-related impulse responses at 90° azimuth for the different cue conditions (A:
ITD and ILD; B: only ITD; C: only ILD).
Experimental Procedure
Prior to the speech tests pure tone audiometry was performed at the frequencies of 0.5,
1, 2 and 4 kHz to ensure normal hearing in the participants according to WHO standards
(Olusanya, Davis, & Hoffman,
2019). For each reference ITD, each condition (A, B, C) and each spatial
scenario, two OlSa lists of 20 sentences were measured. This resulted in a total of 60
measured OlSa test lists per participant. Each combination of interaural cue and reference
ITD was tested in blocks of 4 lists containing two lists in each condition,
S0N0 and S0N90, with both lists being
averaged for the calculation of the SRM. The presentation was randomized within and
between these blocks. To avoid fatigue effects on the results the testing was done in up
to 3 sessions lasting approximately 3 h with breaks after each block or if the subjects
desired a break. Two training lists of 20 OlSa sentences were applied at the beginning of
each session.
Subjects
Ten normal hearing subjects (mean age: 25.6 ± 6; min: 21; max: 42; 2 female; 8 male)
participated in this study. Their hearing threshold at 0.5, 1, 2, and 4 kHz did not exceed
20 dB HL with mean thresholds of 8.4 ± 1 dB HL for the right ear and 7.5 ± 0.7 dB HL for
the left ear. All subjects provided written informed consent prior to their participation.
All testing was conducted in accordance with the Code of Ethics of the World Medical
Association (Declaration of Helsinki) for experiments involving humans and approved by the
Technical University of Munich ethics committee (340/19).
Simulations
Model Structure
For each condition, SRTs were simulated using a blind equalization-cancellation (EC)
model (Hauth, Berning, Kollmeier,
& Brand, 2020) implemented in the auditory modeling toolbox (Majdak, Hollomey, & Baumgartner,
2021). The model uses the mixed speech and noise signals of the right and left
ear channels as inputs and splits them into 30 ERB spaced frequency bands between 150 Hz
and 8500 Hz using a gammatone filterbank (Hohmann, 2002). The frequency bands up to 1500
Hz are then fed into an EC mechanism (Durlach, 1963) to model binaural unmasking. The
cancellation is performed either by subtracting (level-minimization) or adding
(level-maximization) the equalized left and right ear channel to account for negative as
well as positive SNRs. After this step a blind decision stage based on a modulation
analysis by Santos, Senoussaoui,
& Falk (2014) is used to select whether the level-minimization or the
level-maximization yields the better SNR. The selected path is then used for further
processing. The frequency bands above 1500 Hz are only used for a better-ear processing
with a SRMR selector to determine which ear channel has the better SNR. Both paths (EC
and better ear) are then combined via a gammatone synthesis filterbank to be further
processed by the back end of the model. As a back end the speech intelligibility index
(SII; ANSI S3.5-1997) was
used. For more details concerning the model structure, see Hauth et al. (2020).
Modeling Parameters and SRT Calculations
Each measured combination of noise azimuth, utilized cue and reference ITD was modeled.
To further extend the modeling predictions all modeling was additionally done at 10 ms
reference ITD. For each condition the mixed speech and noise were used as an input to
the model at 21 SNRs between 0 and − 20 dB SNR at 1 dB steps. For each SNR, ten
sentences of the OlSa were presented to make sure every word of the OlSa speech material
appeared once. Further, 10 Monte-Carlo simulations were run per sentence to account for
the random jitter in the EC process of the model. This process resulted in 2100
simulations per combination of noise azimuth, cue, and reference ITD, with an overall
number of 69300 simulations for all combinations. The SRT was calculated as described by
Hauth et al. (2020) by
using the intersection of the mean SII over all Monte-Carlo simulations and sentences
and the mean experimental SRT measured at S0N0 without a reference
ITD and with both spatial cues available. The resulting SII was further used as a
reference SII for all other conditions modeled.To investigate the influence of more realistic environments, especially the influence
of reverberation, the model was also applied with an in-ear HRIR set recorded in a
cafeteria within the same HRIR database by Kayser et al. (2009). In this setup the head and
torso simulator (HATS) used to record the HRIRs was seated at a table with the target
speaker at 0° azimuth opposite to it. The distance between HATS and target speaker was
102 cm. The target speaker faced the HATS. The spatially separated speaker was located
at 90° azimuth to the HATS at a distance of 52 cm. In the original HRIR dataset this
speaker is at − 90° but the room was flipped by switching the right-ear and left-ear
channel of the HRIRs. The spatially separated speaker was oriented towards the frontal
speaker. For the given cafeteria setting the reverberation time T60 was 1250
ms. For this setting reference ITDs were to the same as the ones applied in the anechoic
condition. For SRT extraction the experimentally determined reference SII from anechoic
measurements was used.
Statistical Analysis
Non-parametric Friedman tests with an alpha level of 0.05 were used to test for
differences between different reference ITDs in the measured SRTs and SRM results for all
conditions. In the case of significant outcomes post-hoc pairwise testing was applied via
Wilcoxon signed-rank tests with a Bonferroni-Holm correction for multiple comparisons. To
compare the measured data with the modeling results linear regression was performed. We
performed all statistical testing in MATLAB.
Results
Experimental Results
Figure 2 shows the measured SRTs
as boxplots for the respective cue available to the listeners. For each reference ITD the
green boxplots correspond to the S0N0 condition, and the blue
boxplots correspond to the S0N90 condition. In the
S0N0 condition Friedman tests revealed no significant difference
for rising reference ITD for the condition A), with both cues available (χ2(4) = 5.56
p = 0.2), B) (χ2(4) = 7.98 p = 0.09), and C) (χ2(4) = 7.56 p = 0.1). In the spatially
separated condition significant differences for differing reference ITD could be seen in
condition A) (χ2(4) = 34.71 p < 0.001) and in condition B) (χ2(4) = 37.01 p <0.001).
If only ILDs were present (condition C)), a rising reference ITD did not lead to a
significant change in the measured SRTs (χ2(4) = 7.67 p = 0.1). Test-retest
reproducibility was calculated as mean absolute difference between the two measured
testlists per condition, reference ITD and spatial configuration revealing 0.7 ± 0.2 dB in
condition A) 0.7 ± 0.1 dB in condition B) and 0.9 ± 0.2 dB in condition C). No significant
differences were found between the conditions.
Figure 2.
Measured speech reception thresholds (SRT) in ten normal hearing subjects for
reference ITD of 0, 1.75, 3.5, 5.25 and 7 ms as boxplots (red line: median; box:
1st–3rd quartile; whiskers: minimum and maximum without
outliers; outliers in red) for either collocated speech and noise at 0° azimuth (green
boxes) or speech at 0° and noise at 90° azimuth (blue boxes). Modeled SRTs for
collocated speech and noise are denoted by diamonds, and for spatially separated
speech and noise by circles. Modeled SRTs depicted up to 10 ms reference ITD. A) ITDs
& ILDs condition B) ITDs only condition C) ILDs only condition.
Measured speech reception thresholds (SRT) in ten normal hearing subjects for
reference ITD of 0, 1.75, 3.5, 5.25 and 7 ms as boxplots (red line: median; box:
1st–3rd quartile; whiskers: minimum and maximum without
outliers; outliers in red) for either collocated speech and noise at 0° azimuth (green
boxes) or speech at 0° and noise at 90° azimuth (blue boxes). Modeled SRTs for
collocated speech and noise are denoted by diamonds, and for spatially separated
speech and noise by circles. Modeled SRTs depicted up to 10 ms reference ITD. A) ITDs
& ILDs condition B) ITDs only condition C) ILDs only condition.Figure 3 shows the calculated
SRM for each of the cue conditions. In condition A), the mean SRM at 0 ms reference ITD
was 8.82 dB with a standard deviation of 1.12 dB. With a rising reference ITD the SRM
decreased to 4.63 ± 0.89 dB at 7 ms reference ITD. Friedman tests revealed a significant
influence of reference ITD in this condition (χ2(4) = 37.61 p < 0.01). For condition
B), the SRM at 0 ms was 5.48 ± 1.21 dB and decreased to 1.12 ± 0.41 dB at a reference ITD
of 7 ms. This effect also proved highly significant (χ2(4) = 35.04 p < 0.01). In
condition C), the SRM at 0 ms reference ITD was 4.81 ± 0.97 dB and slightly decreased at 7
ms reference ITD to 3.94 ± 0.81 dB. The SRM in this condition was also significantly
affected by the rising temporal asymmetry between both ears (χ2(4) = 10.64 p = 0.031).
Pairwise comparisons showed significant differences between all measured SRM values in
condition A) (p < 0.05). In condition B) only the differences in SRM at 0 ms versus
1.75 ms (p = 0.09) and at 3.5 ms versus 5.25 ms reference ITD (p = 0.09) did not prove
significant. In condition C) none of the tested differences proved to be significant. We
hypothesize this to be due to better-ear listening dominating the SRM, which is not
influenced by a rising reference ITD, given its monaural nature. At a reference ITD of 7
ms, there was almost no more SRM due to ITDs in condition A) and the SRM measured
approached the SRM caused by ILDs seen in condition C) which was not influenced by the
reference ITD.
Figure 3.
Measured spatial release from masking (SRM) in ten normal hearing subjects for
reference ITD of 0, 1.75, 3.5, 5.25 and 7 ms as boxplots (red line: median; box:
1st–3rd quartile; whiskers: minimum and maximum without
outliers; outliers in red). Modeled SRM denoted by diamonds. Modeled SRM depicted up
to 10 ms reference ITD. A) ITDs & ILDs condition B) ITDs only condition C) ILDs
only condition.
Measured spatial release from masking (SRM) in ten normal hearing subjects for
reference ITD of 0, 1.75, 3.5, 5.25 and 7 ms as boxplots (red line: median; box:
1st–3rd quartile; whiskers: minimum and maximum without
outliers; outliers in red). Modeled SRM denoted by diamonds. Modeled SRM depicted up
to 10 ms reference ITD. A) ITDs & ILDs condition B) ITDs only condition C) ILDs
only condition.
Modeling Results
Figure 2 shows the modeled SRTs
for S0N0 as diamonds and for S0N90 plotted as
circles. The model's behavior matched the experimental results qualitatively. For
S0N0 the coefficients of determination (R2) and
root-mean-square errors (RMSE) are reported in Table 1. The low values for R2 in the
conditions B) and C) are not surprising given the shallow slope within the datasets. For
the S0N90 condition the coefficient of determination were generally
much higher as Table 1 shows.
The additional modeled SRTs at a reference ITD of 10 ms show further deterioration in the
S0N90 condition in conditions A) and B) but not in condition
C).
Table 1.
Coefficients of Determination (R2) and Root-Mean-Square Errors (RMSE) for
the Linear Regression Between Modeled and Measured Results. A) ITDs & ILDs
Condition B) ITDs Only Condition C) ILDs Only Condition.
A
B
C
R2
RMSE [dB]
R2
RMSE [dB]
R2
RMSE [dB]
S0N0
0.6
0.03
0.02
0.05
0.07
0.05
S0N90
0.92
0.6
0.93
0.5
0.3
0.1
SRM
0.95
0.5
0.97
0.4
0.38
0.2
Coefficients of Determination (R2) and Root-Mean-Square Errors (RMSE) for
the Linear Regression Between Modeled and Measured Results. A) ITDs & ILDs
Condition B) ITDs Only Condition C) ILDs Only Condition.Figure 3 depicts a comparison
between the modeled SRM and the measured SRM. Linear regression was performed to determine
the accuracy of the model. The corresponding coefficients of determination and RMSEs for
the comparison between modeled and measured SRM can be found in Table 1. The additionally modeled SRM for a
reference ITD of 10 ms show that a reference ITD of 10 ms eliminates SRM based on ITD as
seen in condition B).In Figure 4 the modeled SRTs for
S0N0 and S0N90 condition are depicted for
the anechoic condition in magenta and the reverberant (cafeteria) condition in blue. For
S0N0 condition the difference between anechoic condition and
cafeteria condition was quite low with a mean absolute difference of 0.15 dB for condition
A), 0.14 dB for condition B) and 0.14 dB for condition C). In the
S0N90 condition, the mean absolute difference between the anechoic
condition and the cafeteria condition was 1.1 dB for condition A), 0.9 dB for condition B)
and 1.2 dB in condition C).
Figure 4.
Modeled speech reception thresholds (SRT) for reference ITDs of 0, 1.75, 3.5, 5.25, 7
& 10 ms for spatially collocated and spatially separated target and masker in an
anechoic setting (magenta) and a reverberant cafeteria environment (blue). A) ITDs
& ILDs condition B) ITDs only condition C) ILDs only condition.
Modeled speech reception thresholds (SRT) for reference ITDs of 0, 1.75, 3.5, 5.25, 7
& 10 ms for spatially collocated and spatially separated target and masker in an
anechoic setting (magenta) and a reverberant cafeteria environment (blue). A) ITDs
& ILDs condition B) ITDs only condition C) ILDs only condition.Figure 5 displays the modeled
SRM for the anechoic environment in magenta and the cafeteria environment in blue. For
condition A) the initial SRM at 0 ms reference ITD is 8.3 dB in the anechoic environment
and 6.6 dB in the cafeteria environment. At a reference ITD of 10 ms the model predicted
SRM of 3.5 dB for the anechoic environment and 5.1 dB in the cafeteria environment,
showing a lower influence of reference ITD on SRM in the reverberant environment. When
only ITDs were present in the signal (condition B)), SRM at 0 ms reference ITD is at 5.2
dB in the anechoic environment and at 2.4 dB in the cafeteria environment. At 10 ms
reference ITD both SRM in the anechoic environment and in the cafeteria were at 0 dB in
this condition. With only ILDs present in the signals SRM for the anechoic environment was
4.1 dB at 0 ms reference ITD and slightly deteriorated to 3.7 dB at 10 ms reference ITD.
In the cafeteria environment SRM by ILD was at 5.4 dB for a reference ITD of 0 ms and at
5.1 dB at 10 ms reference ITD.
Figure 5.
Modeled spatial release from masking (SRM) for reference ITDs of 0, 1.75, 3.5, 5.25,
7 & 10 ms in an anechoic setting (magenta) and a reverberant cafeteria environment
(blue). A) ITDs & ILDs condition B) ITDs only condition C) ILDs only
condition.
Modeled spatial release from masking (SRM) for reference ITDs of 0, 1.75, 3.5, 5.25,
7 & 10 ms in an anechoic setting (magenta) and a reverberant cafeteria environment
(blue). A) ITDs & ILDs condition B) ITDs only condition C) ILDs only
condition.
Discussion
In this study, the influence of reference ITDs of several milliseconds on SRM was
investigated. Unilateral delays in this range were found in bimodal HA/CI users (Zirn et al., 2015). To deepen the
understanding of the effect of a reference ITD, all measurements were conducted with the
different binaural cues that enable SRM separately. Further we investigated whether the
measured effects can be reproduced by an EC-model for binaural unmasking. The results showed
that an increasing reference ITD led to a significant decrease in SRM, which affected
unmasking by ITDs more drastically than unmasking though ILDs. When both cues were available
to the subjects, unmasking due to ITDs and ILDs were roughly additive. Furthermore, the
measured results were accurately predicted by an EC-model. When a reverberant environment
was used with the model the decrease in SRM by reference ITDs was less drastic than in an
anechoic condition.In contrast to previous studies investigating SRM with isolated cues, the reported results
showed that with a reference ITD of 0 ms, SRM achieved when ITDs were used to separate
target and masker was bigger than the SRM achieved when only ILDs were present in the
signal. These differences probably originated from different masker types. Glyde et al. (2013) applied
symmetrically placed speech maskers to measure SRM, whereas Bronkhorst & Plomp (1988) used a single source of
masking noise with a speech envelope. Both maskers allow the use of dynamic head shadow in
the quiet parts of the masker, allowing better speech understanding when only ILDs are
present compared to ITDs alone. These methodological differences would require further
experiments to allow for sufficient comparisons between our data and the data reported in
the literature.The presented results reveal a limiting factor to SRM in listeners with asymmetric
treatment of hearing loss, which has not yet been reported. A reference ITD in the range of
a few milliseconds has already significant detrimental effects on the unmasking of speech
when ITDs and ILDs are conveyed sufficiently by the hearing devices. In the ITDs only
condition (B), a reference ITD of 7 ms almost eliminated SRM. In the ILDs only condition (C)
a reference ITD did not influence SRM significantly, processing in the auditory system in
this condition seems more robust against a reference ITD. However, with binaural
measurements it is only possible to differentiate between SRM based on binaural mechanisms
and access to one ear having a better signal to noise ratio, which is a monaural mechanism
to some extent. To disentangle these mechanisms, monaural measurements could be utilized to
measure the effect of head shadow as proposed by Dieudonné & Francart (2019). The present
findings also suggest that, while SRM based on ITDs diminishes for a reference ITD of 7 ms,
considerable SRM remains due to better-ear listening. In bimodal listeners provided with a
CI and a contralateral HA, the device delay mismatch between HA and CI is expected to have
no big impact on SRM, since only ILDs and not ITDs are sufficiently conveyed by the CI
(Francart & McDermott,
2013; Veugen et al.,
2016). Consequently it has been shown that SRM in bimodal listeners is driven by
head shadow (Williges et al.,
2019). However, as soon as more sophisticated CI coding strategies code ITDs with
higher fidelity in the future, the impact of a reference ITD on SRM needs to be considered
as a significant limiting factor, especially since significant deterioration of SRM can
already be seen at a reference ITD of 1.75 ms. This leads to the hypothesis that, when both
cues are available for listeners, a matching accuracy of processing latencies in their
respective devices below this reference ITD of 1.75 ms should be considered favorable.The applied model was able to predict the experimental data with sufficient precision. This
is probably due to the combination of an EC-processing and a monaural SRMR processing, where
the monaural part still offers substantial SRM in the presence of ILDs when the
EC-processing fails to deliver release from masking. The detrimental effect of a reference
ITD on the EC-processing can be easily explained by the processing errors included in this
model, that were originally proposed by Vom Hövel (1984). These processing errors for the estimation of ITDs are dependent
on the actual ITDs in the signal. Thus, as the reference ITD increases, the processing
errors of the ITD equalization within the model increase, making the equalization less
accurate. The level equalization, however, is not affected by the reference ITD since
mathematically, the ILD processing errors are dependent on the ILDs that are found within
the signal. This allows the model to accurately describe the effects that can be measured
experimentally. The low value for R2 in the ILDs only condition can be explained
due to the measured data being explained better by its own mean than by the model, which is
not surprising as to the SRM was only minimally affected by an increasing reference ITD.
However, the low RMSE in this condition still demonstrated a good accuracy of the model.
Overall, the model can predict the measured SRM in the presence of a reference ITD of
several milliseconds with high precision.In future studies the model by Hauth
et al. (2020) could possibly be used to model performance in listeners with
asymmetric hearing loss in which a reference ITD greater than 0 ms occurs due to the
asymmetries in treatment. Williges et
al. (2015) and Zedan et al.
(2018) both used EC-based models to accurately predict SRM in simulated bimodal
listeners, given some additional preprocessing of the signal to simulate the CI and the HA.
These results in combination with the high accuracy the model provides in normal hearing
listeners supports the hypothesis that modeling the effects of a reference ITD in hearing
impaired listeners with EC-based models is possible and that models are a valuable tool for
the prediction of treatment outcomes.When comparing the modeling results in an anechoic environment to a more realistic
reverberant scenario the influence of a rising reference ITD is much lower. This is due to
the SRM based on ITD being vastly diminished in a reverberant environment. The influence of
reverberation on ITD processing has been shown for fine structure ITD (Devore & Delgutte, 2010) and for envelope ITD
(Monaghan, Krumbholz, & Seeber,
2013). In the ILD only condition SRM was better in the reverberant environment than
in the anechoic environment irrespective of the reference ITD, suggesting that ILDs play a
more important role in spatial unmasking in reverberation. However even in the reverberant
environment an overall SRM decay of 1.6 dB between 0 and 10 ms reference ITD can be observed
when both ILD and ITD are present in the signal. To further verify whether these modeling
results can be considered realistic, more experimentation in the same reverberant conditions
with normal hearing listeners should be undertaken in the future.
Conclusion
In normal hearing listeners, a reference ITD has a significant detrimental influence on
SRM. Our results show that this reference ITD mainly impairs the effect of ITDs in the
signal on SRM. When only ILDs were presented, a reference ITD of a few milliseconds showed
no significant effect on SRM. These results may be particularly relevant for bimodal
listeners with different devices in each ear. The measured results can be accurately
predicted by an EC-based model from Hauth et al. (2020) for binaural unmasking of speech. In a reverberant
environment, modeling showed a smaller but still detrimental effect of reference ITD on SRM,
mainly due to the small proportion of SRM based on ITD in reverberation. These results
however should be verified experimentally.