Can Dual Compression Offer Better Mandarin Speech Intelligibility and Sound Quality Than Fast-Acting Compression?

Yuan Chen¹, Lena L N Wong², Volker Kuehnel³, Jinyu Qian⁴,⁵, Solveig Christina Voss⁴, Wang Shangqiguo².

Abstract

The aim of this study was to evaluate the efficacy of dual compression for Mandarin-speaking hearing aid users. Dual compression combines fast and slow compressors operating simultaneously across all frequency channels. The study participants were 31 hearing aid users with symmetrical moderate-to-severe hearing loss, with a mean age of 67 years. A new pair of 20-channel behind-the-ear hearing aids (i.e., Phonak Bolero B90-P) was used during the testing. The results revealed a significant improvement in speech reception thresholds in noise when switching from fast-acting compression to dual compression. The sound quality ratings revealed that most listeners preferred dual compression to fast-acting compression for listening effort, listening comfort, speech clarity, and overall sound quality at +4 dB signal-to-noise ratio. These results are consistent with predictions based on the theoretical understanding of dual and fast-acting compression. However, whether these results can be generalized to other languages or other dual compression systems should be verified by future studies.

Keywords: Chinese; compression; hearing aids; sound quality; speech perception

Year: 2021 PMID: 33710928 PMCID: PMC7958173 DOI: 10.1177/2331216521997610

Source DB: PubMed Journal: Trends Hear ISSN: 2331-2165 Impact factor: 3.293

Compression is typically implemented in hearing aids (HAs) to address the loudness recruitment and reduced dynamic range associated with sensorineural hearing loss by reducing gain for high-level sounds while increasing gain for low-level sounds (Moore, 2008). However, the effects of compression systems on the amplified signal depend on the time constants, namely, the attack time (AT) and the release time (RT). Compression can be categorized as fast (AT: 0.5–20 ms; RT: 10–100 ms) or slow (RT > 500 ms) (Cox & Xu, 2010; Kuk et al., 2019; Moore, 2008). Fast gain recovery, as determined by the RT, improves the audibility of soft sounds and thus may improve speech intelligibility. However, fast-acting compression also amplifies low-level noise, resulting in a reduced signal-to-noise ratio (S/N) (Bor et al., 2008). In addition, fast-acting compression tends to reduce intensity contrasts and the modulation depth of speech, distorting temporal cues (Moore, 2008).

Furthermore, many commercial HAs, like the ones used in this study, have 20 or more channels. The number of compression channels may influence the effect of compression speed on speech intelligibility. Multiple channels provide better frequency shaping to match the prescribed targets. However, as multichannel compression can change gain across frequency as well as time, both spectral and temporal contrast may be reduced when fast-acting compression is applied with multiple channels (Plomp, 1988; Stone & Moore, 2008). Although slow-acting compression may provide lower audibility and less information from temporal dips in the background, it is better at preserving original temporal cues and the S/N than fast-acting compression (Moore, 2008). Therefore, slow-acting compression is often preferred over fast-acting compression for reduced distortion, naturalness of sounds, greater comfort, and better sound quality (Gatehouse et al., 2006; Hansen, 2002; Moore, 2008; Neuman et al., 1995, 1998). Slow-acting compression is also preferred for listeners with poorer sensitivity to temporal fine structure, since these listeners rely more heavily on the temporal envelope, which is better preserved by slow compression (Moore & Sęk, 2016).

Contrasting results regarding the effect of compression time constants on speech intelligibility exist in the literature. Although Davies-Venn et al. (2009), Gatehouse et al. (2006), and Jenstad and Souza (2005) reported better speech intelligibility with fast-acting compression, Reinhart and Souza (2016) and Stone and Moore (2004) reported better speech intelligibility using slow-acting compression. In addition, Salorio-Corbetto et al. (2020) and Novick et al. (2001) did not find a significant difference in the intelligibility of speech in noise between slow- and fast-acting compression. These varied results may be due to the trade-off among the distortion introduced by compression, the audibility produced by compression, and the listener's working memory (WM) (Kuk, 2016). According to the Ease of Language Understanding (ELU) model (Rönnberg et al., 2019), any mismatch between the perceptual input and the phonological representation stored in long-term memory (LTM) disrupts automatic lexical retrieval, leading to explicit, effortful processing mechanisms based on WM (Füllgrabe & Rosen, 2016). As mentioned, fast-acting compression introduces more distortion of temporal information in speech but is likely to increase audibility for HA users. HA users with better WM may be more likely to tolerate these distortions and benefit from the improved audibility, leading to improved speech intelligibility (Kuk, 2016). In contrast, the distortion introduced by fast-acting compression could be more detrimental for those with poor WM. These listeners are more likely to benefit from slow-acting compression (Kuk, 2016). At present, there are no protocols for determining the amount of distortion someone with poor WM may tolerate, or for identifying HA users who have difficulty understanding speech processed with fast-acting compression. Thus, clinicians often make arbitrary adjustments to HA compression parameters based on client feedback.

The limitations of fast- and slow-acting compression were the starting point for the development of a dual compression system that combines both types of compression. Moore and Glasberg (1988) and Moore et al. (1991) described a dual front-end automatic gain control system that included a fast and a slow control voltage generator. The gain was determined by the slow-acting control voltage, while the fast-acting control voltage came into operation when an intense sound occurred, protecting the user from sudden transient loud sounds without affecting the long-term gain.

The HAs used in this study blended fast and slow dynamic range compression (DRC) simultaneously across all frequency channels at an adjustable ratio (Figure 1). Compression was not entirely independent across channels; a coupling across neighboring bands, motivated by the Bark-scale frequency resolution of the cochlea, was applied (Phonak, 2000). The overall DRC was applied as a weighted average of gains that were smoothed with either a fast or a slow set of time constants. The dynamic behavior for fast and dual compression was similar for all frequency channels, and the compression time constants were fixed. More specifically, the fast-acting compressor operated with an AT of 10 ms and an RT of 60 ms (Figure 2). The dual compressor (see Figure 2A and B) combined the fast-acting compressor described earlier with a slow-acting compressor that had an AT of 1 s and an RT of 8 s. In this implementation, the slow-acting compressor dominated the overall system dynamics. The resulting AT and RT of the dual compressor were 1.2 s and 7.1 s, respectively (Figure 2A). Figure 2B, which is an enlarged section of Figure 2A, depicts the recovery process of the output signal. At t = 40 s, the output recovered quickly from 44 dB sound pressure level (SPL) with an RT of 60 ms for the fast compression setting. For the dual compression setting, the slow recovery of the output persisted until t = 47.1 s, where the output returned to 4 dB below the stationary gain (ANSI/ASA S3.22-2014), but a partial recovery with a fast RT of 60 ms can be observed for the dual compression system as well. Both compression paths used identical compression knee points. The theoretical advantage of a dual compressor (such as this one) over a compression system with a fast AT and slow RT is the increased audibility of weaker sounds after strong sounds that trigger the fast attack. This can help avoid the wearers’ perception of devices being “dead” after sudden strong sounds.
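The blending of a fast and a slow gain path can be illustrated with a single-band toy model. The sketch below is not the Phonak implementation: the one-pole smoother, the 1 ms frame step, the 45 dB knee point, and all function names are assumptions made for illustration. The weight w follows the Figure 1 caption (w = 1 gives fast-acting compression only, w = 0 slow-acting only), and the time constants from the text are used here as exponential smoothing constants, which is not the same as the ANSI S3.22 definition of AT and RT.

```python
import math

def static_gain_db(level_db, knee_db, ratio):
    """Gain change (<= 0 dB) relative to the linear gain below the knee point."""
    over = max(0.0, level_db - knee_db)
    return -over * (1.0 - 1.0 / ratio)

def smooth_gain(target_db, attack_s, release_s, frame_s):
    """One-pole smoother: attack when the gain must fall, release when it may rise."""
    c_att = math.exp(-frame_s / attack_s)
    c_rel = math.exp(-frame_s / release_s)
    state = target_db[0]
    out = []
    for g in target_db:
        c = c_att if g < state else c_rel
        state = c * state + (1.0 - c) * g
        out.append(state)
    return out

def dual_gain_db(levels_db, w, knee_db=45.0, ratio=2.7, frame_s=0.001):
    """Weighted average of a fast-smoothed and a slow-smoothed gain (hypothetical model)."""
    target = [static_gain_db(x, knee_db, ratio) for x in levels_db]
    fast = smooth_gain(target, 0.010, 0.060, frame_s)  # assumed fast constants
    slow = smooth_gain(target, 1.0, 8.0, frame_s)      # assumed slow constants
    return [w * f + (1.0 - w) * s for f, s in zip(fast, slow)]

# A 55 dB input that steps up to 80 dB for 100 ms.
levels = [55.0] * 200 + [80.0] * 100
fast_only = dual_gain_db(levels, 1.0)
slow_only = dual_gain_db(levels, 0.0)
dual = dual_gain_db(levels, 0.4)
```

In this toy model, 100 ms after the step the fast path has essentially reached its new static gain while the slow path has barely moved, and the w = 0.4 blend sits between the two.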

Figure 1.

Schematic Diagram of the Signal Processing Applied in This Study. The processing of the gain with fast-acting and slow-acting gain regulation and a noise canceller used the same input signal of a 20-frequency-band short-term spectrum derived from an FFT filter bank. Fast and slow compression and the noise canceller delivered a gain that was combined additively in the logarithmic domain. These gains were applied as filter weights to the filter bank signal before it was converted back to a time-domain signal with an IFFT. The dual compressor had a control parameter, w, which determined the amount of fast-acting and slow-acting compression. A value of w = 1 resulted in fast-acting compression only, whereas w = 0 resulted in slow-acting compression only. In the study, w = 1 was used for fast-acting compression and w = 0.4 for dual compression. In each of the 20 bands, the noise canceller derived a signal-to-noise ratio (S/N) estimate and calculated a gain depending on this S/N. For low S/N values, negative gain was applied, whereas for large S/N values, the gain approached 0 dB. ADC = Analog-to-digital conversion; FFT = fast Fourier transform; IFFT = inverse fast Fourier transform; SNR = signal-to-noise ratio; ST = short-term.

Figure 2.

AT and RT for fast compression and dual compression. A: American National Standards Institute (ANSI) S3.22-2014 step response and derived AT and RT for fast compression (gray line) and dual compression (solid black curve). The fast compression had an AT of 10 ms and RT of 60 ms, whereas the dual compression had an AT of 1.2 s and RT of 7.1 s. The static compression ratio was set to CR = 2.7. The section marked with an ellipse is shown magnified in Panel B. B: An enlarged section of Panel A at t = 40 s showing the release behavior for dual compression (solid line) and fast compression (gray line). The gain for dual compression recovered to 4 dB below the stationary gain at t = 47.1 s (see Panel A), which is outside the range visible in this detailed view. AT = attack time; RT = release time; SPL = sound pressure level.

The dual compression system was matched in root mean square (RMS) output sound pressure level to the fast compression system. This was achieved by choosing a similar ratio between the RT and AT constants for the two compressors (Figure 1). Choosing a similar ratio of AT and RT for the fast and slow compressors results in similar output for soft to loud speech signals without the need to choose different knee points or compression ratios for the two compressors. Table 1 shows the 2 cc coupler gains for the International Speech Test Signal (ISTS) for 50, 65, and 80 dB SPL input levels. The gains differed by less than 2 dB between fast and dual compression. Figure 3 shows a larger separation between the 99th and 30th percentile output levels for the dual compressor than for the fast-acting compressor. This indicates that less distortion is introduced, as the modulation depth of speech is reduced less by the dual compressor. We expected the dual compressor to outperform the fast-acting compressor for the intelligibility of speech in noise as well as for sound quality rating.

Table 1.

Measured 2 cc Coupler Gain and MPO Recorded for the Mean Hearing Loss Configuration of the Participants of the Study (Average for Left and Right Ears).

Frequency (Hz)	250	500	750	1000	1500	2000	3000	4000	6000	8000
MPO (dB SPL 2 cc)	95	97	100	102	104	103	106	106	83	67
2cc gain at 80 dB (dB)	1	2	7	14	23	24	31	28	5	−9
2cc gain at 65 dB (dB)	12	15	19	25	33	35	42	39	15	2
2cc gain at 50 dB (dB)	18	22	26	31	37	39	47	45	20	7
Static CR	1.7	1.7	2	2	2.2	2.5	2.7	2.8	3	2.9
Compression knee point (dB SPL in 1/3 octave band)	47	48	46	43	39	38	34	34	36	38

Note. The gains are shown as 2 cc coupler gains for ISTS signals (IEC 60118-15, 2015) at 80, 65, and 50 dB SPL input levels. The static CRs are indicated as well as the compression knee points expressed as third octave band levels. The measured 2 cc response curves differed by less than 2 dB between fast and dual compression and hence the average across fast and dual compression is presented here. MPO = maximum power output; CR = compression ratio; SPL = sound pressure level.

Figure 3.

Third Octave Output Spectra Recorded on KEMAR (Burkhard & Sachs, 1975) for 20 s of the International Speech Test Signal (ISTS; Holube et al., 2010; International Electrotechnical Commission, 2012) Played at 65 dB SPL. The signals were recorded with a prescribed gain for a flat 60 dB HL hearing loss according to the Adaptive Phonak Digital Tonal (APDT) gain prescription rule. The three solid lines show the 30th percentile, the root mean square (RMS), and the 99th percentile of the recorded third octave spectrum for the dual compression setting. Level percentiles were calculated using short-term spectra of 125 ms duration having 50% overlap. Dashed lines show the same values for fast-acting compression. The RMS levels of the two compression strategies were matched within ±2 dB. The dynamic range for the dual compressor was larger than that for the fast-acting compressor and closer to the dynamic range of the uncompressed ISTS (not shown), which is about 30 dB between the 30th and 99th percentiles (Holube et al., 2010). SPL = sound pressure level.
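The percentile analysis described in this caption can be sketched as follows: the signal is cut into 125 ms frames with 50% overlap, each frame is converted to a level in dB, and the dynamic range is taken as the difference between the 99th and 30th percentile levels. This is a simplified broadband version under stated assumptions; the analysis in the figure was done per third-octave band, and the linear-interpolation percentile rule and the function names are illustrative choices, not taken from the study.

```python
import math

def frame_levels_db(samples, fs, frame_s=0.125, overlap=0.5):
    """Short-term levels (dB re 1.0) of overlapping frames."""
    n = int(round(frame_s * fs))
    hop = max(1, int(round(n * (1.0 - overlap))))
    levels = []
    for start in range(0, len(samples) - n + 1, hop):
        frame = samples[start:start + n]
        power = sum(x * x for x in frame) / n
        levels.append(10.0 * math.log10(max(power, 1e-12)))
    return levels

def percentile(values, p):
    """Percentile with linear interpolation between ranks (p in 0..100)."""
    data = sorted(values)
    pos = (len(data) - 1) * p / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def dynamic_range_db(samples, fs):
    """Difference between the 99th and 30th percentile short-term levels."""
    levels = frame_levels_db(samples, fs)
    return percentile(levels, 99) - percentile(levels, 30)

# Toy signal: one second at amplitude 1.0 followed by one second at 0.1 (20 dB lower).
toy = [1.0] * 1000 + [0.1] * 1000
toy_range = dynamic_range_db(toy, 1000)
```

A compressor that reduces modulation depth would pull the two percentiles closer together, which is what a smaller value of `dynamic_range_db` would indicate.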

In summary, this study compared speech intelligibility and sound-quality preference between fast-acting compression and dual compression using commercially available HAs. 
According to previous studies, better hearing thresholds (Souza & Sirow, 2014), higher education level (Chen et al., 2020), previous experience with a given HA signal processing scheme (Ng & Rönnberg, 2020) and better WM (Cox & Xu, 2010) were significantly correlated with speech intelligibility. The effects of these variables on speech intelligibility and sound-quality preference with different compression schemes were also examined.

Methods

Participants

HA users from the Shenkang Hearing Center (Beijing, China) were recruited via telephone. Patient files were reviewed, and all patients who met the following inclusion criteria were contacted: (a) symmetrical moderate to severe hearing loss, defined as an interaural difference of less than 10 dB HL at octave frequencies from 250 to 8000 Hz; (b) native monolingual standard Mandarin speakers living in Beijing; (c) normal cognitive function and a passing score on the Montreal Cognitive Assessment (MoCA, Chinese version; Yu et al., 2012); and (d) bilateral HA use for at least 3 months with a daily wearing time of at least 5 hr. A total of 31 HA users agreed to participate (23 males and 8 females). Their ages ranged from 33 to 87 years (mean = 67, standard deviation [SD] = 14, median = 69). The mean pure-tone hearing thresholds are shown in Figure 4. The mean education level was 6.1 years (SD = 1.7, median = 6.0). Participants had used HAs for an average of 10.5 years (SD = 8.6, median = 8.0).

Figure 4.

Mean and Standard Deviation of the Pure-Tone Hearing Thresholds of Participants’ Right and Left Ears.

Seventeen participants had used HAs equipped with the same dual compression scheme as in this study for more than 6 months. The other 14 participants had no experience with dual compression HAs and were wearing HAs from Phonak, Oticon, or Resound that did not implement the dual compression scheme used in this study.

Materials and Test Equipment

A sentence recognition test and sound quality paired comparison measures were conducted. In addition, WM was measured.

Sentence Recognition

The Mandarin Hearing in Noise Test (MHINT) was used to measure sentence reception thresholds (SRTs) in quiet and noisy conditions. The MHINT is an adaptive test that measures the presentation level (in quiet) or the S/N (in noise) at which half the keywords are correctly recognized, and it was developed using the same rationale as the English Hearing in Noise Test (Wong, Ho, et al., 2007). The test corpus contains 12 sentence lists, and each list consists of 20 sentences. Distributions of phonemes and lexical tones are balanced between the lists. Each sentence consists of 10 characters, and the background noise is a speech-spectrum-shaped noise that starts 1 s before each sentence and ends 1 s after it (Wong, Ho, et al., 2007). Four test conditions were used (two compression strategies and two listening conditions). Each condition was evaluated using one MHINT list. In the noisy condition, speech and noise were presented from a loudspeaker situated 1 m away from the participants at 0° azimuth, with the noise level fixed at 65 dBA. The level of the speech was adjusted to determine the SRT. Before testing, calibration was conducted using a sound level meter placed at the position of the center of the participant’s head. The order of the test conditions and sentence lists was randomized for each participant. Participants were encouraged to make a guess, even if they did not hear the whole sentence. A practice list was presented so that participants could get used to the test procedure. A 5- to 10-min break was offered upon request or when the participant seemed tired.
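The adaptive principle behind such a test can be sketched with a simple one-down/one-up track: the S/N is lowered after a correct response and raised after an error, so the track converges on the level at which about half the responses are correct. The sketch below is a generic staircase, not the MHINT protocol; the 2 dB step, the number of discarded initial sentences, the sentence-level scoring callback, and the function name are all assumptions.

```python
def run_staircase(is_correct, start_snr_db, n_sentences=20, step_db=2.0, n_discard=4):
    """One-down/one-up adaptive track; returns the mean S/N after the initial sentences.

    is_correct: callback taking the presentation S/N (dB) and returning True/False.
    """
    snr = start_snr_db
    track = []
    for _ in range(n_sentences):
        track.append(snr)
        snr += -step_db if is_correct(snr) else step_db
    track.append(snr)  # level at which the next sentence would have been presented
    used = track[n_discard:]
    return sum(used) / len(used)

# Deterministic toy listener who repeats a sentence correctly whenever S/N >= 1 dB.
srt_estimate = run_staircase(lambda snr: snr >= 1.0, start_snr_db=10.0)
```

With this toy listener the track descends from 10 dB and then alternates between 0 and 2 dB, so the estimate lands within one step of the listener's 1 dB threshold.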

Sound Quality Paired Comparison

Paired comparisons were conducted to evaluate preferences. Participants listened to the same recording under two compression settings. The target speech was presented in noise at +1 dB and +4 dB S/N, and participants were asked to indicate their preference between dual and fast-acting compression in terms of listening effort, listening comfort, speech clarity, and overall sound quality, all in one block. First, participants were asked to indicate their preferred compression setting and then to indicate how much better the preferred compression setting was compared to the other. The order of the two compression settings and test conditions (i.e., +1 dB and +4 dB S/N) was randomized for each participant. Restaurant noise was played at 66 dBA, and the target speech, a recording of a Mandarin translation of “The North Wind and the Sun” (Holube et al., 2010), was played at 70 dBA and at 67 dBA to yield S/Ns of +4 dB and +1 dB, respectively. Before the paired comparison, oral and written instructions were given to ensure that the participants fully understood the five rating questions. The paired comparison rating scale ranged from 1 to 7 for listening effort, listening comfort, speech clarity, and overall quality: 1 = dual compression was much better, 2 = dual compression was better, 3 = dual compression was slightly better, 4 = no difference, 5 = fast-acting compression was slightly better, 6 = fast-acting compression was better, 7 = fast-acting compression was much better. For loudness judgments, the rating scale also ranged from 1 to 7, but participants were asked to judge which compression scheme sounded louder. Letters (A or B) were randomly assigned to the stimuli presented using the dual or fast-acting compression so that the order of the compression conditions was not revealed to participants. The speech and noise were replayed as many times as needed.

Working Memory

WM was measured using the One Back Test (OBK) from the CogState Battery, which has been adapted for use among Chinese speakers and has good reliability and validity (Zhong et al., 2013). In the OBK, a playing card is presented at the center of the screen, and the participant is asked to indicate whether the card is the same as the previous card by pressing keys on a keyboard. The results are used to analyze the speed of performance (mean of the log10 transformed reaction times for correct responses). Lower scores indicate better WM.
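The speed score described above (mean of the log10-transformed reaction times for correct responses) can be computed as in the minimal sketch below. The trial format, the millisecond unit, and the function name are assumptions for illustration; the CogState software computes its own score.

```python
import math

def obk_speed_score(trials):
    """Mean log10 reaction time over correct trials.

    trials: iterable of (reaction_time_ms, correct) pairs (assumed format).
    """
    logs = [math.log10(rt) for rt, correct in trials if correct and rt > 0]
    if not logs:
        raise ValueError("no correct responses to score")
    return sum(logs) / len(logs)

# Two correct trials (1000 ms and 100 ms) and one error, which is ignored.
example_score = obk_speed_score([(1000.0, True), (100.0, True), (5000.0, False)])
```

Because faster responses give smaller logarithms, a lower score corresponds to faster correct responding.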

Procedures

A pair of Phonak Bolero B90-P, 20-channel behind-the-ear HAs was used. Participants’ own custom earmolds were used during testing. If participants wore in-the-ear HAs, disposable universal ear tips were used. The vent size of the earmolds was prescribed by the fitting software; vents were typically smaller than 1 mm. Phonak Target fitting software (version 5.1) was used to fit and fine-tune the HAs. The Adaptive Phonak Digital Tonal (APDT; Wong et al., 2018) prescription rule was used to prescribe HA gain parameters. The APDT rule prescribed more gain for soft low-frequency inputs than an older prescription rule, the Adaptive Phonak Digital (APD; Latzel, 2013), to enhance the audibility of tonal information in speech, as low frequencies carry more information for Mandarin than English speech intelligibility. Dual compression, as described earlier, is implemented in the APDT as the default compression method. A research version of the fitting software was used to fit participants with fast-acting or dual compression based on APDT and to allow manual switching between the two types of compression. The electroacoustic properties of the HAs were checked both for proper functioning and for matching to the prescribed settings prior to and after the study. Table 1 shows measured 2 cc coupler gains for soft, moderate, and loud speech for the average hearing loss of the participants. During initial fitting, a real-ear loop-gain measurement was conducted to determine the maximum stable gain before feedback. The fitted gain was limited to this maximum stable gain.

All functions were turned off except for the feedback management and noise reduction (NR). NR was set at 14, which is a moderate setting activated with an AT of several seconds and RT of several milliseconds. This NR function employs a Wiener filter-type algorithm in all the HA channels. The maximum attenuation applied by the NR was 7 dB for an unmodulated broadband noise. This setting was used since Wong et al. (2018) found it to be the preferred setting among a range of mild to more aggressive NR settings. Thus, it was expected that most HA users in this study would prefer this setting. More information about this NR setting can be found in Wong et al. (2018).

To verify that there was only little interaction between the compression and NR, additional technical measurements were performed using the phase inversion method described by Hagerman and Olofsson (2004). The International Speech Test Signal (ISTS) and the spectrally matched IFnoise (European Hearing Instrument Manufacturers Association, 2016; Holube et al., 2010) were played back simultaneously at 65 dB SPL. An HA was programmed to the average hearing loss of the participants with the gain according to Table 1. The output of the HA was recorded using a 2 cc coupler. To ensure that the phase inversion method was valid, we confirmed that the signal and noise extracted with the phase inversion method had only minimal contributions from the other signal (below −15 dB). The SII (ANSI/ASA S3.5-1997 (R2017)) weighted S/N of the output signal was determined for the conditions NR off or NR on (set to 14), each combined with fully linear gain, fast-acting compression, or dual compression. Figure 5 shows the SII-weighted S/N. With the fast compression, the overall S/N was 3 dB lower than for the linear setting and the S/N improvement with the NR was reduced to 2.4 dB compared to 2.8 dB with the slow compression. This difference of 0.4 dB was markedly smaller than the effect of compression on the S/N, suggesting little interaction between the NR and the compression system.
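The core of the phase inversion method can be sketched in a few lines: the device is recorded twice, once with speech plus noise and once with speech plus phase-inverted noise; half the sum of the two recordings estimates the processed speech and half the difference estimates the processed noise. The sketch below computes a plain broadband S/N rather than the SII-weighted S/N used in the study, and the function names are illustrative assumptions.

```python
import math

def separate_speech_and_noise(out_sum, out_diff):
    """out_sum: device output for speech + noise; out_diff: output for speech - noise."""
    speech = [(a + b) / 2.0 for a, b in zip(out_sum, out_diff)]
    noise = [(a - b) / 2.0 for a, b in zip(out_sum, out_diff)]
    return speech, noise

def snr_db(speech, noise):
    """Broadband S/N in dB (no SII band weighting)."""
    p_speech = sum(x * x for x in speech)
    p_noise = sum(x * x for x in noise)
    return 10.0 * math.log10(p_speech / p_noise)

# Toy check with a purely linear "hearing aid" (gain of 2), where the separation is exact.
s = [1.0, -1.0, 1.0, -1.0]
n = [0.5, 0.5, -0.5, -0.5]
device = lambda x: [2.0 * v for v in x]
out_sum = device([a + b for a, b in zip(s, n)])
out_diff = device([a - b for a, b in zip(s, n)])
speech_est, noise_est = separate_speech_and_noise(out_sum, out_diff)
output_snr = snr_db(speech_est, noise_est)
```

For a nonlinear device such as a compressor the separation is only approximate, which is why the residual contribution of the other signal has to be checked, as described above.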

Figure 5.

SII Weighted Signal to Noise Ratio (S/N) in dB of the Hearing Aid Output for the Average Amplification Setting Used in the Study. NR = noise reduction.

For fine-tuning, the choice of compression type was randomized. The settings for 15/31 randomly chosen participants were fine-tuned using dual compression, while the settings for the remaining participants were fine-tuned using fast-acting compression. A recording of “The North Wind and the Sun” (Holube et al., 2010) was presented at 65 dBA via a loudspeaker 1 meter in front of participants (0° azimuth). Participants were asked to comment on the overall loudness and the loudness balance between ears. The broadband gain for 65 dB input (G65) was adjusted in 1 dB steps until participants were satisfied with the balance and loudness comfort in both ears. Next, to ensure that the loudness of the music was comfortable, a short, lively orchestral piece, which provided greater variations in sound level than average speech, was presented at 70 dBA. Participants were asked to judge the sound quality of the music. To adjust the tonal balance, the gain in the frequency regions above and below 1.5 kHz was increased or decreased in 1 dB steps for all input levels until the participants indicated that the music was neither boomy nor tinny. The two ears were fitted together. Nearly 3/4 of the participants preferred fine-tuning of 1 dB while no changes were made for the others. After the HA fitting and fine-tuning, sentence recognition and sound quality comparisons were conducted. The order of compression settings and measures for sentence recognition was randomized for each participant. All testing was conducted in a sound booth in the Shenkang Hearing Center (Beijing, China) with background noise lower than 35 dBA. All participants received an honorarium of 100 RMB (equal to about 13 USD). Ethical approval was obtained from the University of Hong Kong. Written informed consent was obtained from all participants prior to the study.

Results

Sentence Recognition in Quiet and in Noise

Pearson product–moment correlation coefficients with Bonferroni corrections showed that age, hearing thresholds, education level, previous experience with dual compression, and WM were not significantly related to SRTs for dual compression and fast-acting compression in quiet or in noise. Therefore, data from all participants were combined for further analysis. Paired samples t-tests showed no significant difference between SRTs for dual-compression (Mean = 49.1, SD = 3.7) and fast-acting compression (Mean = 48.7, SD = 4.1) in quiet (p > .05) but a significant difference was found between SRTs for dual compression (Mean = 0.9, SD = 2.2), and for fast-acting compression (Mean = 1.8, SD = 2.1) in noise; t(30) = −4.0, p < .001, Cohen’s d = 0.72.
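The reported effect size is consistent with a common convention for paired samples, in which Cohen's d equals the absolute t statistic divided by the square root of the number of pairs. The paper does not state which formula was used, so the convention below is an assumption, and the function name is illustrative.

```python
import math

def cohens_d_from_paired_t(t_value, df):
    """Cohen's d for paired samples as |t| / sqrt(n), with n = df + 1 pairs."""
    n_pairs = df + 1
    return abs(t_value) / math.sqrt(n_pairs)

# Values reported for the SRTs in noise: t(30) = -4.0.
d_noise = cohens_d_from_paired_t(-4.0, 30)
```

With 31 participants this gives a value that rounds to the reported 0.72.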

Sound-Quality Preferences

Spearman’s rank-order correlation coefficients showed that hearing thresholds, age, education level, WM, and previous experience with dual compression were not significantly related to compression setting preferences. Therefore, data from participants with and without prior experience with dual compression were combined for further analysis. Figure 6 shows sound quality paired comparison results at +1 dB and +4 dB S/N. A rating of 4 indicated no difference between the two compression strategies. A score above 4 indicated a preference for fast-acting compression or that fast-acting compression sounded louder than dual compression, while a score below 4 indicated a preference for dual compression or that dual compression sounded louder than fast compression. Mean ratings at +4 dB S/N were 3.5 (SD = 0.7) for listening effort, 3.5 (SD = 0.7) for listening comfort, 3.3 (SD = 0.9) for speech clarity, 3.3 (SD = 0.9) for overall sound quality, and 3.5 (SD = 0.8) for loudness. Wilcoxon signed-rank tests with Bonferroni correction indicated that paired comparison ratings for all five sound qualities were significantly lower than 4, suggesting slight preferences for dual compression (ps < .05) at +4 dB S/N with effect sizes (r) ranging from −0.52 to −0.60. In other words, participants found that listening to speech with dual compression at +4 dB S/N required less effort and was more comfortable, clearer, louder, and of better overall quality.

Figure 6.

Box-and-Whisker Plots of Sound Quality Paired Comparisons for the +1 dB and +4 dB S/N Conditions. The “+” indicates the mean score, the box represents the quartiles, and the whiskers indicate the range of ratings. The “*” indicates a score significantly lower than 4, suggesting a preference for dual compression.

At +1 dB S/N, only the loudness rating was significantly lower than 4. In other words, participants found that dual compression was louder than fast-acting compression, but there was no overall preference for dual compression regarding listening effort, listening comfort, speech clarity, or overall sound quality. Correlation analyses among preference ratings obtained at +1 dB S/N were not conducted since participants reported extremely low speech intelligibility in this condition. At +4 dB S/N, positive correlations (with Bonferroni correction) were found among four of the five sound quality measures: listening effort, listening comfort, speech clarity, and overall quality (rs = 0.52–0.60). Loudness judgments were not correlated with any sound quality judgments (Table 2). This was confirmed by a principal component analysis with orthogonal rotation (varimax). Only one component had an eigenvalue greater than 1, and it explained 62.5% of the variance. Factor loadings for listening effort, listening comfort, speech clarity, overall quality, and loudness were 0.77, 0.92, 0.88, 0.91, and 0.30, respectively. Thus, listening effort, listening comfort, speech clarity, and overall quality represented a single dimension, with loudness being different from the other four.
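The reported loadings and explained variance can be cross-checked with a short calculation. Assuming the analysis was run on the correlation matrix of the five ratings (an assumption, since the text does not say), the eigenvalue of a component equals the sum of its squared loadings, and the proportion of variance it explains is that eigenvalue divided by the number of variables. The function name is illustrative.

```python
def component_variance(loadings):
    """Eigenvalue and percentage of variance for one component from its loadings."""
    eigenvalue = sum(x * x for x in loadings)
    percent = 100.0 * eigenvalue / len(loadings)
    return eigenvalue, percent

# Loadings reported for effort, comfort, clarity, overall quality, and loudness.
eigenvalue, percent = component_variance([0.77, 0.92, 0.88, 0.91, 0.30])
```

The result is an eigenvalue of about 3.13 and about 62.6% of the variance, which agrees with the reported 62.5% to within the rounding of the loadings. The squared loading for loudness (0.09) shows how little of the loudness rating this component captures.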

Table 2.

Spearman’s Rank Correlation Coefficients Between Paired Comparison Sound Quality Ratings at +4 dB S/N.

	Comfort	Clarity	Overall	Loudness
Effort	0.66**	0.56*	0.52*	0.07
Comfort		0.73**	0.85**	0.15
Clarity			0.78**	0.15
Overall				0.27

*p < .05. **p < .01 after Bonferroni correction.

Discussion

In this study, we evaluated the effects of dual and fast-acting compression on SRTs and sound-quality preferences. In quiet, there was no difference in SRTs for the two compression settings. However, in noise, dual compression yielded better SRTs and higher sound-quality preferences than fast-acting compression.

Sentence Recognition

Slow-acting compression provides less amplification for short-term low-level inputs than fast-acting compression (Moore, 2008). The dual compression investigated in this study was designed to give an RMS output SPL matched with that of the fast-acting compression, while offering larger signal dynamics for speech signals (Figure 3). However, there was no significant difference in SRTs in quiet between dual and fast-acting compression settings, suggesting that the slightly lower low-level gain of the dual-compression system had no material effects in quiet. In noise, SRTs with dual compression were significantly lower (better) than those with fast-acting compression. As mentioned previously, this may have happened because dual compression is better at preserving temporal cues than fast-acting compression. The average improvement in SRTs in noise was 0.9 dB, which corresponds to approximately 10% improvement in speech intelligibility for the MHINT (Wong, Soli, et al., 2007). However, this improvement was small and may not be clinically noticeable by some listeners.
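
The conversion from a threshold shift to an intelligibility change can be made explicit. The sketch below only rearranges the two figures given above (0.9 dB and approximately 10%) to obtain the slope they imply; it assumes the psychometric function is roughly linear near the SRT, and the slope is derived here, not quoted from the MHINT documentation.

```python
# Figures taken from the text.
srt_improvement_db = 0.9          # mean SRT benefit of dual compression in noise
intelligibility_gain_pct = 10.0   # approximate intelligibility change stated for that benefit

# Slope implied by those two numbers, in percentage points per dB.
implied_slope_pct_per_db = intelligibility_gain_pct / srt_improvement_db

def intelligibility_change_pct(delta_srt_db, slope=implied_slope_pct_per_db):
    """Linear approximation, valid only for small shifts around the SRT."""
    return delta_srt_db * slope

print(f"implied slope: {implied_slope_pct_per_db:.1f} percentage points per dB")
```

Because the approximation holds only near the 50% point, it should not be extrapolated to large changes in S/N.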

Sound Quality Comparisons

Paired comparisons were performed at +1 and +4 dB S/N. Based on the SRTs and informal communications, we assumed that most participants could understand the content of the continuous discourse at +4 dB S/N and thus were able to make reliable judgments of sound quality. Slight preferences for dual compression at +4 dB S/N were found for listening effort, listening comfort, speech clarity, and overall sound quality. This may be attributed to the slow-acting compressor incorporated in the dual compression setting in this study. Previous studies reported that slow-acting compression was likely to be preferred over fast-acting compression for listening effort, listening comfort, and overall sound quality (Cox & Xu, 2010; Hansen, 2002). The positive findings in this study add to the literature comparing dual and fast-acting compression and may help clinicians to decide whether to adopt this type of dual compression.

Loudness judgments were not correlated with sound-quality preferences, suggesting that loudness is a dimension different from the sound-quality attributes. This was also found in our previous study (Wong et al., 2018). Although the dual compression used here was matched to the fast compression for RMS level, signals processed with the dual compression were perceived to be slightly louder than those processed with fast compression at +1 dB and +4 dB S/N. This may be explained by the fact that peak levels were somewhat higher with dual compression than with fast-acting compression (Figure 3) and the loudness of dynamic signals is known to be strongly influenced by peak levels (Glasberg & Moore, 2002).

For the paired comparisons performed at +1 dB S/N, there was no preference for dual compression over fast-acting compression in listening effort, listening comfort, speech clarity, or overall sound quality. The lack of sound-quality preferences may have been due to participants not being able to discriminate the speech in noise well enough to make judgments.
Smeds et al. (2015) showed that listening in noise mostly occurs in positive S/N settings; thus, one may question whether measuring sound quality at a low S/N is ecologically valid. The findings of this study suggest that measuring sound quality at a low S/N may not provide additional information for the evaluation of compression in terms of sound-quality preferences. HA users can make judgments of sound quality when they are able to hear the speech in noise (e.g., at +4 dB S/N), but they may not be able to do so at a low S/N (e.g., +1 dB S/N or below). This finding is consistent with those of Preminger and Van Tasell (1995) and Souza et al. (2013), who suggested that low intelligibility would dominate other quality attributes. Therefore, speech quality should only be evaluated when speech intelligibility is acceptable.

Compression is often implemented in combination with NR algorithms to improve the S/N by reducing HA gain for background noises while preserving gain for speech (Wong et al., 2018). The NR algorithm can be viewed as dynamic range expansion, often with smoothing time constants comparable to those of the compression, and hence potentially counteracting the effects of compression. The combined effects, which may vary greatly across commercially available HAs (Brons et al., 2015), must be considered. Kortlang et al. (2018) evaluated three combined implementations of NR and compression. They showed that parallel operation of NR and compression, denoted as (p) in their publication, reduced noise annoyance for listeners with hearing loss more effectively than a serial implementation. This parallel implementation is comparable to the one used in this study (Figure 1), where the NR was applied to the signal independently of the compression, and little interaction between the NR and the compression was expected. In addition, the NR had a short RT (<10 ms), which helped to decouple the NR and the compression.
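
How compression can counteract NR in a serial arrangement, and why a parallel arrangement avoids this, can be illustrated with a static gain model. The sketch below is a simplified reading of the two arrangements, not the implementation of Kortlang et al. (2018) or of the HAs used in this study. All parameter values (compression threshold, compression ratio, linear gain, NR attenuation) are hypothetical, time constants are ignored, and "parallel" is taken to mean that both gains are computed from the unprocessed input level and summed in dB.

```python
def compressor_gain_db(level_db, threshold_db=50.0, ratio=2.0, linear_gain_db=20.0):
    """Static compressor gain: constant below threshold, reduced above it."""
    excess = max(0.0, level_db - threshold_db)
    return linear_gain_db - excess * (1.0 - 1.0 / ratio)

def serial_output_db(level_db, nr_attenuation_db):
    """NR first; the compressor then sees the attenuated level and adds gain back."""
    reduced = level_db - nr_attenuation_db
    return reduced + compressor_gain_db(reduced)

def parallel_output_db(level_db, nr_attenuation_db):
    """Compressor gain is computed from the unprocessed level, so NR is preserved."""
    return level_db + compressor_gain_db(level_db) - nr_attenuation_db

noise_level_db = 70.0   # hypothetical noise-dominated segment
nr_db = 10.0            # hypothetical attenuation requested by the NR

without_nr = noise_level_db + compressor_gain_db(noise_level_db)
serial_reduction = without_nr - serial_output_db(noise_level_db, nr_db)
parallel_reduction = without_nr - parallel_output_db(noise_level_db, nr_db)

print(f"effective noise reduction, serial:   {serial_reduction:.1f} dB")
print(f"effective noise reduction, parallel: {parallel_reduction:.1f} dB")
```

With these assumed values, the serial chain delivers only half of the requested attenuation because the 2:1 compressor restores part of the gain, whereas the parallel chain delivers all of it.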

The Effects of Hearing Thresholds, Education Level, Previous Experience With Dual Compression, and Working Memory

Although previous studies have shown correlations between hearing thresholds and SRTs (Souza & Sirow, 2014), and between education level and SRTs in noise (Chen et al., 2020), such correlations were not found in this study. This may be partially attributed to the relatively narrow ranges of hearing thresholds and education levels, as most participants (n = 27) exhibited moderately severe to severe hearing loss (i.e., 56–82 dB), and 30 out of the 31 participants were graduates of junior middle school or below (i.e., ≤9 years of education). Previous studies suggested that individuals with better WM had better speech perception in noise with fast than with slow compression, while those with low WM performed better with slow compression than with fast compression (Cox & Xu, 2010; Ohlenforst et al., 2016; Souza & Sirow, 2014). This study failed to find a significant relationship between compression type and WM. There may be two reasons for this. First, previous studies tended to categorize participants as having either high or low WM on the basis of the median for the group (Cox & Xu, 2010; Ohlenforst et al., 2016; Souza & Sirow, 2014). This categorization was quite arbitrary and might have led to bias, especially when the sample was small. The WM scores were regarded as a continuous variable in this study as we were interested in comparing the effects of compression type on speech perception and sound-quality preference when controlling the effects of WM. When WM is treated as a continuous variable, a relationship between WM and compression type may not be found (e.g., Kuk et al., 2019). Second, the relationship between WM and speech perception was weaker for participants who had used HAs for a longer period (Ng & Rönnberg, 2020; Rählmann et al., 2017). According to the ELU Model (Rönnberg et al., 2019), WM comes into play when a mismatch exists between the perceptual input (e.g., phonology, prosody, syntax, and semantics) and the representation stored in LTM. 
Although signal processing in HAs could cause such a mismatch, after consistent exposure to distorted information via HAs, newly established and recalibrated internal representations could gradually supplement the existing LTM representations, weakening the WM–SRT relationship (Ng & Rönnberg, 2020). In this study, participants had used HAs for an average of 10.5 years and thus were experienced HA users. Therefore, the nonsignificant WM–SRT relationship is not surprising. This is consistent with the findings of Rählmann et al. (2017). Using a master HA, they found that the relationship between WM and SRT significantly weakened for listeners with more than 7 years of HA experience compared to those with less than 3.5 years of HA experience and became nonsignificant for listeners with the most experience, although the HA signal processing used by Rählmann et al. (2017) was not the same as in the participants’ own HAs. The long-term use of HAs in this study may also explain the nonsignificant relationship between previous experience with dual compression and SRTs, since familiarity with HA signal processing may have alleviated a possible mismatch between the speech input and LTM representations, even when the HA signal processing was not the same as for their own HAs (Rählmann et al., 2017).

Limitations

We did not compare fast-acting and dual compression with slow-acting compression. This makes it difficult to unambiguously attribute the improvements in SRTs and the preference for dual compression to the slow-acting compression component or to the combination of fast- and slow-acting compression.

It is worth noting that the NR function was turned on when the SRTs were measured in noise for better ecological validity. However, we expected little interaction between the NR and the compression, as discussed earlier. The findings of this study may not apply to HAs employing different NR functions or a different interaction between NR and compression.

The speech-shaped noise used for the SRT measurement might not have reflected the potential advantages of the fast compression for backgrounds with amplitude fluctuations. Fast compression is better than slow compression at restoring the audibility of weak sounds rapidly following intense sounds, providing the potential for listening in the dips (Moore, 2008). In backgrounds such as multitalker babble, fast-acting compression allows better “glimpses” of the target sound (Moore et al., 1999). In addition, fast-acting compression improves the ability to detect a weak consonant following a relatively intense vowel (Moore, 2008). Chen et al. (2013) reported a 3:1 intelligibility advantage of vowel-only sentences over consonant-only sentences in Mandarin as compared to an intelligibility advantage of 2:1 in English, suggesting that consonants in Mandarin do not contribute as much to sentence intelligibility as consonants in English (Chen et al., 2017). Therefore, future studies are warranted to examine whether the advantages of slow or dual compression over fast compression are language dependent.
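
The point above about restoring audibility after an intense sound can be illustrated with a minimal level-detector sketch. It assumes a one-pole smoother operating on levels in dB, with the release behavior described by a simple exponential time constant; standardized AT and RT measurements are defined differently, and the levels, tolerance, and time constants below are hypothetical, not those of the HAs in this study.

```python
import math

def recovery_time_ms(release_ms, loud_db=80.0, soft_db=50.0,
                     tolerance_db=3.0, dt_ms=1.0):
    """Time for the level estimate to fall to within tolerance_db of a soft
    sound that immediately follows a sustained loud sound."""
    coefficient = 1.0 - math.exp(-dt_ms / release_ms)  # one-pole smoothing step
    estimate = loud_db       # detector has settled on the loud sound
    elapsed = 0.0
    while estimate - soft_db > tolerance_db:
        estimate += coefficient * (soft_db - estimate)
        elapsed += dt_ms
    return elapsed

fast_recovery = recovery_time_ms(release_ms=50.0)     # hypothetical fast compressor
slow_recovery = recovery_time_ms(release_ms=1000.0)   # hypothetical slow compressor

print(f"fast release: estimate recovers in about {fast_recovery:.0f} ms")
print(f"slow release: estimate recovers in about {slow_recovery:.0f} ms")
```

Under these assumptions, the fast detector tracks the drop within roughly the duration of a consonant, while the slow detector keeps the gain low for seconds, which is why a weak consonant following an intense vowel receives more gain with fast compression.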

Conclusions

Dual compression, as implemented in this study, yielded slightly better SRTs for speech in speech-spectrum-shaped noise than fast-acting compression. In addition, participants slightly preferred dual compression over fast compression for sound quality at +4 dB S/N. Experience with dual compression, age, gender, and degree of hearing loss did not affect the results, suggesting that clinicians need not be concerned about these factors when switching from fast to dual compression. Whether these results predict performance at other S/Ns using different types of noise or in real-life situations and whether they could be applied to other types of fast and dual compression systems should be assessed in future studies.

References (41 in total)

1. Anthropometric manikin for acoustic research.

Authors: M D Burkhard; R M Sachs
Journal: J Acoust Soc Am Date: 1975-07 Impact factor: 1.840

2. Development and analysis of an International Speech Test Signal (ISTS).

Authors: Inga Holube; Stefan Fredelake; Marcel Vlaming; Birger Kollmeier
Journal: Int J Audiol Date: 2010-12 Impact factor: 2.117

3. Linear and nonlinear hearing aid fittings--2. Patterns of candidature.

Authors: Stuart Gatehouse; Graham Naylor; Claus Elberling
Journal: Int J Audiol Date: 2006-03 Impact factor: 2.117

4. The effect of compression ratio and release time on the categorical rating of sound quality.

Authors: A C Neuman; M H Bakke; C Mackersie; S Hellman; H Levitt
Journal: J Acoust Soc Am Date: 1998-05 Impact factor: 1.840

5. The Role of Lexical Tone Information in the Recognition of Mandarin Sentences in Listeners With Hearing Aids.

Authors: Yuan Chen; Lena L N Wong; Jinyu Qian; Volker Kuehnel; Solveig Christina Voss; Fei Chen
Journal: Ear Hear Date: 2020 May/Jun Impact factor: 3.570

6. Assessment of hearing aid algorithms using a master hearing aid: the influence of hearing aid experience on the relationship between speech recognition and cognitive capacity.

Authors: Sebastian Rählmann; Markus Meis; Michael Schulte; Jürgen Kießling; Martin Walger; Hartmut Meister
Journal: Int J Audiol Date: 2017-04-27 Impact factor: 2.117

7. Evaluation of the Efficacy of a Dual Variable Speed Compressor over a Single Fixed Speed Compressor.

Authors: Francis Kuk; Chris Slugocki; Petri Korhonen; Eric Seper; Ole Hau
Journal: J Am Acad Audiol Date: 2018-11-13 Impact factor: 1.664

8. Effects of audibility and multichannel wide dynamic range compression on consonant recognition for listeners with severe hearing loss.

Authors: Evelyn Davies-Venn; Pamela Souza; Marc Brennan; G Christopher Stecker
Journal: Ear Hear Date: 2009-10 Impact factor: 3.570

9. Reliability and validity of the CogState battery Chinese language version in schizophrenia.

Authors: Na Zhong; Haifeng Jiang; Jin Wu; Hong Chen; Shuxing Lin; Yan Zhao; Jiang Du; Xiancang Ma; Ce Chen; Chengge Gao; Kenji Hashimoto; Min Zhao
Journal: PLoS One Date: 2013-09-02 Impact factor: 3.240

10. Preferred Compression Speed for Speech and Music and Its Relationship to Sensitivity to Temporal Fine Structure.

Authors: Brian C J Moore; Aleksander Sęk
Journal: Trends Hear Date: 2016-09-07 Impact factor: 3.293
