Aging may limit speech understanding outcomes in cochlear-implant (CI) users. Here, we examined age-related declines in auditory temporal processing as a potential mechanism that underlies speech understanding deficits associated with aging in CI users. Auditory temporal processing was assessed with a categorization task for the words dish and ditch (i.e., identify each token as the word dish or ditch) on a continuum of speech tokens with varying silence duration (0 to 60 ms) prior to the final fricative. In Experiments 1 and 2, younger CI (YCI), middle-aged CI (MCI), and older CI (OCI) users participated in the categorization task across a range of presentation levels (25 to 85 dB). Relative to YCI, OCI required longer silence durations to identify ditch and exhibited reduced ability to distinguish the words dish and ditch (shallower slopes in the categorization function). Critically, we observed age-related performance differences only at higher presentation levels. This contrasted with findings from normal-hearing listeners in Experiment 3 that demonstrated age-related performance differences independent of presentation level. In summary, aging in CI users appears to degrade the ability to utilize brief temporal cues in word identification, particularly at high levels. Age-specific CI programming may potentially improve clinical outcomes for speech understanding performance by older CI listeners.
Time-varying information contains important cues for speech perception (Rosen, 1992). For example,
the duration of a silent interval can influence the identification of final
fricative-affricate contrasts, such that listeners’ perception of a word can change
from a fricative (e.g., dish) to an affricate (e.g., ditch) as the silence duration
increases (Dorman, Raphael,
& Isenberg, 1980). Even when spectral information is severely
degraded, such as in the case of listening through a cochlear implant (CI), temporal
information can provide relatively robust information to support listeners’
intelligibility of consonants, vowels, words, and sentences (Friesen, Shannon, Baskent, & Wang,
2001; Shannon, Zeng,
Kamath, Wygonski, & Ekelid, 1995).

As people age, their speech understanding abilities tend to decrease, particularly in
adverse conditions such as in the presence of background noise (e.g., Dubno, Dirks, & Morgan,
1984; Frisina &
Frisina, 1997) or when the target speech signals are distorted by time
compression, reverberation, and interruption (e.g., Gordon-Salant & Fitzgibbons, 1993). One
potential mechanism underlying these speech understanding difficulties is a decline
in the ability to process temporal properties of sounds with advancing age (Füllgrabe, Moore, & Stone,
2015; Gordon-Salant,
Fitzgibbons, & Yeni-Komshian, 2011; Schneider & Pichora-Fuller, 2001).
Age-related declines in temporal processing abilities have been documented across a
variety of tasks in individuals with normal hearing (NH) and hearing impairment (HI)
(Fitzgibbons &
Gordon-Salant, 1995; Gordon-Salant & Fitzgibbons, 1999; Gordon-Salant, Yeni-Komshian, & Fitzgibbons,
2008; Gordon-Salant,
Yeni-Komshian, Fitzgibbons, & Barrett, 2006; Humes, Kewley-Port, Fogerty, & Kinney,
2010; Pichora-Fuller
& Souza, 2003; Roque, Gaskins, Goupell, Anderson, & Gordon-Salant, 2019; Snell, 1997; Strouse, Ashmead, Ohde, &
Grantham, 1998). For example, Gordon-Salant et al. (2008) showed that
older adults exhibited reduced ability to identify and discriminate word contrasts
based on a variety of temporal duration cues such as dish–ditch (silence duration)
and buy–pie (voice onset time).

Mechanistically, the temporal processing deficits associated with aging are likely to
result from encoding differences in the auditory system. For example, animal studies
have shown that aging is associated with the loss of spiral ganglion cells in the
auditory periphery (Makary,
Shin, Kujawa, Liberman, & Merchant, 2011; Otte, Schuknecht, & Kerr, 1978; Sergeyenko, Lall, Liberman, &
Kujawa, 2013), which may limit neural synchronization to temporal
features in the acoustic inputs (Lopez-Poveda, 2014; Lopez-Poveda & Barrios, 2013). Both animal and human work have
demonstrated that aging is associated with a deterioration in the encoding of
temporal cues at the midbrain level (e.g., Roque et al., 2019; Walton, Frisina, & O’Neill, 1998) as
well as at the cortical level (e.g., Hughes, Turner, Parrish, & Caspary,
2010; Roque et al.,
2019; Tremblay,
Piskosz, & Souza, 2003; Willott, 1991).

The use of a CI as an intervention for hearing difficulties is increasing among the
aging population. Both retrospective and prospective evidence indicate that aging
may be a factor limiting speech understanding performance in CI users (Blamey et al., 2013; Holden et al., 2013; Kim et al., 2018; Shader
et al., 2019; Sladen &
Zappler, 2015). In this study, we examined age-related auditory temporal
processing deficits as a potential mechanism that underlies the ostensible speech
understanding deficits associated with aging in CI users. The investigation of
auditory temporal processing ability seems an appropriate focus considering that:
(a) The CI transmits mainly temporal envelope cues (Loizou, 2006), (b) temporal modulation
processing ability has been shown to be correlated with speech understanding
outcomes in CI users (Fu,
2002), and (c) CI listeners appear to rely mostly on temporal cues to
categorize speech sounds (Winn,
Chatterjee, & Idsardi, 2012).

To date, there is limited direct evidence regarding age-related temporal processing
differences in CI users. Recently, Goupell et al. (2017) utilized vocoded word
stimuli to simulate listening through CI processors and examined age-related
differences in the ability to use temporal cues to categorize word segments (i.e., a
dish–ditch continuum with varying duration of a silent interval before the fricative
noise). They found that older NH listeners needed longer silence durations to
discriminate the word contrasts compared with younger NH listeners, suggesting
age-related temporal processing deficits with spectrally degraded speech signals.
Similar evidence has demonstrated that aging may decrease older NH listeners’
ability to use temporal envelope cues of vocoded stimuli for tasks such as phoneme
recognition (Schvartz,
Chatterjee, & Gordon-Salant, 2008), fundamental frequency
discrimination (Schvartz-Leyzac
& Chatterjee, 2015), and gender identification (Schvartz & Chatterjee, 2012).

The purpose of this study was to examine age-related changes in temporal processing
across three age groups of adult CI listeners: younger (YCI; <45 years old), middle-aged (MCI;
between 45 and 64 years old), and older (OCI; >64 years old). Temporal
processing ability was quantified as the performance on a speech categorization task
based on silence duration cues (dish–ditch continuum; Gordon-Salant et al., 2006, 2008; Goupell et al., 2017; Roque et al., 2019). The rationale for
investigating speech categorization is that this type of task may target a specific
problem in speech understanding with a CI (Winn, Won, & Moon, 2016). Particularly,
performance on the speech categorization task may be more closely related to speech
understanding performance than traditional psychophysical tasks that target temporal
processing (e.g., temporal modulation detection; Winn et al., 2016). We hypothesized that
the ability to categorize dish versus ditch would diminish with increasing age.
Experiment 1: Processing of Silence Duration Cues in Word Segments at a Fixed
Presentation Level in CI Listeners
Methods
Participants
Three groups of adult CI users participated in this experiment: seven YCI (7
ears; 21.0 to 42.5 years, mean age and standard deviation
[SD] = 30.5 ± 9.3), 19 MCI (19 ears; 45.5 to
63.6 years, mean age and SD = 56.2 ± 5.3), and 12 OCI (12
ears; 65.4 to 81.0 years, mean age and SD = 72.4 ± 5.0).
All participants were native speakers of American English. We screened
participants with the Montreal Cognitive Assessment (MoCA; Nasreddine et al.,
2005) to ensure normal or near-normal cognitive function (≥22 out
of 30 possible points) (Cecato, Martinelli, Izbicki, Yassuda, & Aprahamian, 2016;
Dupuis et al.,
2015). The MoCA data were missing for 1 YCI (CCE), 4 MCI (CBI,
CBM, CCC, and CCH), and 2 OCI (CAN and CBY). Details of their demographic
information are provided in Table 1. The age groups did not differ
significantly in duration of deafness, Kruskal–Wallis
χ²(2) = 1.198, p = .549, or
duration of CI use, Kruskal–Wallis χ²(2) = 5.160,
p = .076. Written informed consent was
obtained from all participants. All materials and procedures were approved
by the Institutional Review Board at the University of Maryland. All
participants received monetary compensation for their participation.
Table 1.
Demographic Information for the CI Participants.

| Experiment no. | Stimulus presentation method | Age-group | Subject code | Sex | Age (years) | CI ear | DoD (years) | CI use duration (years) | CI processor | Etiology |
|---|---|---|---|---|---|---|---|---|---|---|
| 1, 2 | DAI | YCI | CAR | M | 24.0 | Right | 1 | 6.0 | Cochlear-Freedom | Genetic |
| 2 | DAI | YCI | CAT | M | 28.7 | Left | 12 | 6.9 | Cochlear-Freedom | Unknown |
| 1, 2 | DAI | YCI | CAT | M | 27.8 | Right | 8 | 9.6 | Cochlear-Freedom | Genetic |
| 1, 2 | DAI | YCI | CBP | F | 36.1 | Left | 3 | 16.1 | Cochlear-Nucleus 5 | Unknown |
| 1, 2 | Headphone | YCI | CBQ | M | 21.0 | Right | 3 | 18.0 | Advanced Bionics-Harmony | Genetic |
| 1, 2 | Headphone | YCI | CBU | M | 41.2 | Left | 39 | 1.1 | Advanced Bionics-Naida | Unknown |
| 1, 2 | DAI | YCI | CBW | M | 42.5 | Right | 1 | 16.3 | Cochlear-Nucleus 6 | Cogan's Syndrome |
| 1, 2 | DAI | YCI | CCE | F | 21.0 | Right | 2 | 19.0 | Cochlear-Nucleus 6 | Unknown |
| 2 | DAI | YCI | CCM | F | 35.2 | Left | <1 | 2.2 | MED-EL-Opus2 | Ototoxicity |
| 2 | DAI | YCI | CCM | F | 35.2 | Right | <1 | 1.2 | MED-EL-Sonnet | Ototoxicity |
| 2 | DAI | YCI | CCV | M | 31.5 | Right | <1 | 29.5 | Cochlear-Nucleus 6 | Bacterial meningitus |
| 1, 2 | DAI | MCI | CAJ | F | 63.6 | Right | 47 | 16.0 | Cochlear-Nucleus 5 | Genetic |
| 2 | DAI | MCI | CAQ | F | 57.7 | Right | 17 | 0.7 | Cochlear-Nucleus 6 | Ménière disease |
| 1, 2 | DAI | MCI | CAQ | F | 57.7 | Left | 17 | 0.7 | Cochlear-Nucleus 6 | Ménière disease |
| 2 | DAI | MCI | CAW | M | 53.4 | Right | 17 | 4.0 | Cochlear-Nucleus 5 | Unknown |
| 1, 2 | DAI | MCI | CAW | M | 53.4 | Left | 2 | 7.1 | Cochlear-Nucleus 5 | Unknown |
| 1, 2 | DAI | MCI | CAX | M | 53.3 | Left | <1 | 4.3 | Cochlear-Nucleus 5 | Unknown |
| 2 | DAI | MCI | CAY | F | 59.0 | Left | 5 | 6.0 | Cochlear-Nucleus 5 | Unknown |
| 1, 2 | DAI | MCI | CAY | F | 57.6 | Right | <1 | 9.6 | Cochlear-Nucleus 5 | Unknown |
| 1, 2 | DAI | MCI | CBA | F | 54.1 | Left | 31 | 2.3 | Cochlear-Nucleus 5 | Stickler's Syndrome |
| 1, 2 | DAI | MCI | CBF | M | 57.1 | Left | 5 | 5.3 | Cochlear-Nucleus 6 | Hereditary |
| 1, 2 | DAI | MCI | CBG | F | 61.2 | Right | 2 | 3.9 | Cochlear-Nucleus 5 | Rh factor |
| 1 | DAI | MCI | CBH | F | 61.4 | Right | 13 | 3.4 | Cochlear-Nucleus 5 | Nerve Damage |
| 1, 2 | DAI | MCI | CBI | M | 56.1 | Left | <1 | 1.5 | MED-EL-Opus2 | Unknown |
| 1, 2 | DAI | MCI | CBJ | F | 52.7 | Right | 33 | 2.7 | MED-EL-Opus2 | Genetic |
| 1, 2 | DAI | MCI | CBK | F | 56.0 | Left | 5 | 6.0 | Cochlear-Freedom | Unknown |
| 1, 2 | Headphone | MCI | CBM | F | 50.5 | Left | 13 | 4.5 | Advanced Bionics-Harmony | Unknown |
| 1, 2 | DAI | MCI | CBN | M | 50.4 | Right | 2 | 8.4 | Cochlear-Nucleus 5 | Rubella |
| 1, 2 | DAI | MCI | CBR | F | 62.1 | Left | 54 | 2.1 | Cochlear-Nucleus 6 | Premature birth |
| 1, 2 | DAI | MCI | CBV | F | 62.8 | Left | 11 | 3.8 | Cochlear-Nucleus 5 | Enlarged Vestibular Aqueduct Syndrome |
| 1, 2 | Headphone | MCI | CCC | F | 63.0 | Right | 5 | 8.0 | Advanced Bionics-Naida Q70 | Unknown |
| 1 | Headphone | MCI | CCD | M | 45.5 | Right | 24 | 21.6 | Cochlear-Nucleus 6 | Unknown |
| 1 | Headphone | MCI | CCH | M | 48.7 | Right | 0 | <1 | Advanced Bionics-Naida Q90 | Unknown |
| 2 | DAI | MCI | CCL | F | 51.7 | Left | <1 | 7.7 | Cochlear-Nucleus 6 | Ménière disease |
| 2 | DAI | MCI | CCL | F | 51.7 | Right | 2 | 1.7 | Cochlear-Nucleus 6 | Ménière disease |
| 2 | DAI | OCI | CAB | F | 71.3 | Left | 36 | 19.9 | Cochlear-Nucleus 5 | Unknown |
| 2 | DAI | OCI | CAB | F | 71.3 | Right | 44 | 13.1 | Cochlear-Nucleus 5 | Unknown |
| 2 | DAI | OCI | CAD | M | 76.7 | Left | 3 | 13.6 | Cochlear-Nucleus 6 | Unknown |
| 2 | DAI | OCI | CAD | M | 76.7 | Right | 7 | 8.5 | Cochlear-Nucleus 5 | Unknown |
| 2 | Headphone | OCI | CAK | M | 69.9 | Left | 22 | 12.9 | Cochlear-Nucleus 6 | Sinus surgery |
| 1, 2 | DAI | OCI | CAK | M | 68.6 | Right | <1 | 10.5 | Cochlear-Nucleus 6 | Unknown |
| 1, 2 | DAI | OCI | CAM | F | 70.5 | Right | 6 | 4.5 | Cochlear-Nucleus 6 | Unknown |
| 1, 2 | DAI | OCI | CAN | F | 75.4 | Right | ? | 9.8 | Cochlear-Freedom | Unknown |
| 1, 2 | DAI | OCI | CAO | F | 70.1 | Right | 5 | 5.1 | Cochlear-Nucleus 5 | Measles |
| 1, 2 | DAI | OCI | CBB | M | 81.0 | Right | 2 | 2.0 | Cochlear-Nucleus 5 | Sudden SNHL |
| 2 | DAI | OCI | CBC | F | 77.8 | Right | 17 | 1.8 | Cochlear-Nucleus 6 | Hereditary, measles |
| 1, 2 | DAI | OCI | CBC | F | 76.8 | Left | <1 | 2.8 | Cochlear-Nucleus 6 | Hereditary, measles |
| 1, 2 | DAI | OCI | CBD | M | 79.3 | Right | <1 | 5.3 | Cochlear-Freedom | Unknown |
| 1 | Headphone | OCI | CBS | F | 65.8 | Right | 15 | 0.6 | Advanced Bionics-Naida | Unknown |
| 1, 2 | DAI | OCI | CBT | F | 73.9 | Right | 11 | 2.9 | Cochlear-Nucleus 5 | Unknown |
| 1, 2 | Headphone | OCI | CBY | M | 69.5 | Right | 12 | 15.5 | Advanced Bionics-Naida | Unknown |
| 2 | DAI | OCI | CCA | M | 75.4 | Right | 1 | 4.3 | Cochlear-Nucleus 6 | Antibiotics, aging |
| 2 | DAI | OCI | CCA | M | 75.4 | Left | 61 | 2.0 | Cochlear-Nucleus 6 | Measles |
| 2 | DAI | OCI | CCI | F | 65.4 | Right | 8 | 3.4 | Cochlear-Nucleus 6 | Otosclerosis, measles, mumps, chicken pox |
| 1, 2 | DAI | OCI | CCI | F | 65.4 | Left | 2 | 14.4 | Cochlear-Nucleus 6 | Otosclerosis, measles, mumps, chicken pox |
| 2 | Headphone | OCI | CCJ | M | 72.5 | Right | 4 | 19.0 | Advanced Bionics-Harmony | Unknown |
| 1, 2 | Headphone | OCI | CCJ | M | 72.5 | Left | 15 | 7.8 | Advanced Bionics-Harmony | Unknown |
| 2 | DAI | OCI | CDG | M | 73.0 | Left | <1 | 5.0 | Cochlear-Nucleus 6 | Virus |

Note. DoD = duration of deafness; DAI = direct
audio input; YCI = younger cochlear-implant users;
MCI = middle-aged cochlear-implant users; OCI = older
cochlear-implant users; CI = cochlear-implant; ? = Unknown.
Stimuli
Stimuli for the current experiment have been described in previous studies
(Gordon-Salant
et al., 2006; Goupell et al., 2017). The stimuli consisted of a seven-step
continuum of speech tokens that varied the silence duration so that the
endpoints were perceived as the words “dish” and “ditch.” The continuum was
created as follows: An adult American male speaker produced the words dish
and ditch in isolation. A hybrid ditch was then created by replacing the
burst and frication in the ditch token with the frication portion of the
dish token. The silence duration in the hybrid ditch was varied
parametrically over seven equal 10-ms steps from 60 to 0 ms, resulting in a
stimulus set that spanned a perceptual continuum from ditch (60-ms silence
duration) to dish (0-ms silence duration). Figure 1 displays the waveforms and
spectrograms of the endpoint stimuli (i.e., 0- and 60-ms silence duration)
from the continuum.
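The construction of the continuum can be illustrated with a minimal Python sketch. It assumes the hybrid token is available as two sample lists (the portion before the closure and the frication portion); the function name `make_continuum`, the placeholder sample values, and the 44.1-kHz sampling rate are hypothetical and are not taken from the original stimuli.

```python
def make_continuum(pre_closure, frication, fs_hz, step_ms=10, max_ms=60):
    """Return {silence_ms: samples} for silence durations 0..max_ms.

    pre_closure and frication are lists of samples (hypothetical inputs);
    the silent interval is modelled as a run of zero-valued samples.
    """
    continuum = {}
    for silence_ms in range(0, max_ms + step_ms, step_ms):
        n_silent = round(fs_hz * silence_ms / 1000)
        continuum[silence_ms] = (
            list(pre_closure) + [0.0] * n_silent + list(frication)
        )
    return continuum


# Placeholder waveforms; real stimuli would be recorded speech.
fs_hz = 44100  # assumed sampling rate, not stated in the text
tokens = make_continuum([0.1] * 100, [0.2] * 50, fs_hz)
print(sorted(tokens))  # the seven silence durations in ms
```

Only the silent interval differs between tokens, so any change in categorization along the continuum can be attributed to the silence duration cue.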
Figure 1.
Spectrograms (a; top row) and waveforms (b; bottom row) for the
endpoint stimuli from the stimulus continuum. Left: The 0-ms silence
duration stimulus, perceived as dish. Right: The 60-ms silence
duration stimulus, perceived as ditch. Black triangles on the left
panels indicate the onset of the frication portion. Black triangles
on the right panels indicate the silence period before the frication
portion.
Design
This experiment consisted of 280 trials (7 stimuli × 40 repetitions) that
were usually divided into four blocks of 70 trials each (7 stimuli × 10
repetitions). For two participants, the trials were divided into more than
four blocks (5–6 blocks) to reduce fatigue. The order of the trials was
randomized in each block, which was different for each listener.
Participants could take short breaks between blocks. The stimulus
intensities were adjusted for individual participants so that the sound was
set at a level that participants reported to be comfortably soft while
maintaining good discrimination between the endpoint dish–ditch stimuli
(i.e., 0- and 60-ms silence duration). In other words, participants chose a
level at which they thought they could best discriminate the two words.
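The block structure described above can be sketched as follows. This is an illustrative Python reconstruction, not the authors' MATLAB script; the function name `build_blocks` and the use of a per-listener seed are assumptions.

```python
import random


def build_blocks(silence_ms=(0, 10, 20, 30, 40, 50, 60),
                 reps_per_block=10, n_blocks=4, seed=None):
    """Return a list of blocks; each block is a shuffled list of silence durations."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        # Every stimulus appears reps_per_block times within a block.
        block = [s for s in silence_ms for _ in range(reps_per_block)]
        rng.shuffle(block)  # independent random order within each block
        blocks.append(block)
    return blocks


blocks = build_blocks(seed=1)  # a different seed per listener gives a different order
total_trials = sum(len(b) for b in blocks)
print(len(blocks), len(blocks[0]), total_trials)  # 4 70 280
```

Shuffling within blocks, rather than across the whole session, keeps the number of repetitions per stimulus balanced in every block, which matters when a session is split into more blocks to reduce fatigue.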
Procedure
Participants were tested individually in a sound-attenuating booth
(Industrial Acoustics, Inc., Bronx, NY). The participant’s task was to
respond whether they heard “dish” or “ditch” after each stimulus
presentation. Participants self-initiated each trial by clicking a box
reading “Begin Trial” on the screen. The stimulus, randomly selected from
the seven-step continuum, was then presented. Participants responded by
clicking a box on the left or right side of the screen, corresponding to
“dish” and “ditch,” respectively. Participants were encouraged to guess if
they were unsure and had unlimited time to respond. After their response,
the task moved on to the next trial. No feedback was provided.

Before testing, participants received a training task, wherein they were
required to identify the endpoint stimuli, 0- and 60-ms silence duration, as
dish or ditch, respectively. The trial procedure was identical to the main
experiment except that participants were provided with correct answer
feedback after each response. Ten repetitions of each stimulus were
presented. An accuracy of at least 90% was required before proceeding to the
main experiment. All participants established stable performance and
achieved the 90% goal within 15 min, and most achieved near-perfect word
identification performance on the training task.

Stimulus presentation and response collection were controlled with custom
scripts in MATLAB (MathWorks, Natick, MA). The stimuli were presented
through participants’ clinical processors with their everyday settings as
set by their audiologists. We chose direct audio input (DAI) to directly
deliver stimuli to the processor monaurally (see Table 1). Note that newer
generations of CI sound processors are moving away from the inclusion of
DAI. When we encountered participants without the ability to use DAI,
headphones (Sennheiser HD650s) were used to deliver stimuli to the
processor. The testing (including training and breaks) was usually completed
within 1 to 1.5 hr.
Psychometric function analysis
The percentage of dish responses along the dish–ditch continuum was
calculated. A logistic function implemented in the
psignifit toolbox (Wichmann & Hill, 2001) in
MATLAB (MathWorks, Natick, MA) was used to fit the psychometric function for
the percentage data of dish responses from each condition for each
listener.

Two metrics were calculated from the psychometric function: 50% crossover
point and slope. The 50% crossover point was quantified as the silence
duration (in ms) that corresponds to 50% of dish responses. In some cases,
no crossover point was found between 0- and 60-ms silence duration. In such
cases, the curve fit was extrapolated out of the range of 0- to 60-ms and
then the crossover point was estimated. Crossover points were adjusted to be
in the range of 0 to 70 ms, such that values larger than 70 ms or smaller
than 0 ms were set to be 70 ms or 0 ms, respectively. The slope was
quantified as the maximum slope that occurred over the entire function, expressed as
the percentage change in dish responses per unit change in silence duration
(%/ms). The upper limit of slope considered appropriate was set at 7.5%/ms
because that would be a single step change from 100% to 0% dish responses
(Goupell et al.,
2017).

If the slope of the function was backward, such that longer silence duration
was associated with more dish responses, we considered the output metric
values (crossover point and slope) to be inappropriate. In addition, if the
data from a given condition (e.g., uniform 100% or 0% dish responses along
the dish–ditch continuum) failed to be fitted by the logistic function, we
also considered those data inappropriate. For both scenarios, we applied the
following procedures to set the metric values: If the percentage of dish
responses did not exceed 50% at any step along the dish–ditch continuum,
a 50% crossover point may not occur even at the
shortest possible silence duration (i.e., 0 ms). In such cases, we set the
crossover point to be 0 ms. Otherwise, we set it to be 70 ms. The slope was
set to be 0%/ms for both of these patterns of data.

In the current experiment, data from all participants were successfully
fitted by the logistic function and the fitted functions were in the
appropriate direction such that longer silence durations were associated
with fewer dish responses.
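The rules for deriving the two metrics can be summarized in a short Python sketch. It does not reproduce the psignifit fit; instead it assumes a fit has already produced the parameters of a decreasing logistic curve, p(x) = 100 / (1 + exp((x − midpoint) / spread)), for which the 50% point is the midpoint and the steepest slope is 100 / (4 × spread) %/ms. This parameterization, the function name `summarize_fit`, and the example values are assumptions made for illustration.

```python
def summarize_fit(pct_dish, midpoint_ms=None, spread_ms=None,
                  max_crossover_ms=70.0, max_slope=7.5):
    """Return (crossover_ms, slope_pct_per_ms) following the rules in the text.

    pct_dish: percentage of dish responses at each continuum step.
    midpoint_ms, spread_ms: parameters of the assumed fitted curve
        p(x) = 100 / (1 + exp((x - midpoint_ms) / spread_ms));
        pass None when the fit failed.
    """
    fit_failed = midpoint_ms is None or spread_ms is None
    # A non-positive spread means the curve runs backward (or is degenerate).
    backward = (not fit_failed) and spread_ms <= 0
    if fit_failed or backward:
        # Fallback: 0 ms if no step exceeds 50% dish, otherwise 70 ms; slope 0.
        crossover = 0.0 if max(pct_dish) <= 50 else max_crossover_ms
        return crossover, 0.0
    crossover = min(max(midpoint_ms, 0.0), max_crossover_ms)  # clamp to 0-70 ms
    slope = min(100.0 / (4.0 * spread_ms), max_slope)  # steepest point, capped
    return crossover, slope


# Hypothetical listener: well-behaved fit with a 30-ms midpoint.
print(summarize_fit([100, 95, 80, 50, 20, 5, 0], 30.0, 5.0))  # (30.0, 5.0)
# Hypothetical listener who always answered "dish": the fit fails.
print(summarize_fit([100] * 7))  # (70.0, 0.0)
```

Assigning boundary values instead of discarding such data keeps every listener in the group analysis, at the cost of compressing the extremes of the distribution.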
Statistical analysis
We performed separate Kruskal–Wallis tests on the crossover point and slope
data to examine the aging effect, with age group (YCI, MCI, or OCI) as the
between-subjects factor. The crossover point and slope data from one MCI
participant (CAW) consisted of 30 out of 40 repetitions for each stimulus
due to computer errors, but were included in the analysis. We also performed
similar analyses to examine the aging effect on the demographic data
(duration of deafness and duration of CI use). Missing (unknown) values in
the demographic data were excluded and values specified as “<1” were
treated as 1. We applied the same procedures to analyze the demographic data
for Experiment 2.
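For readers who wish to see the computation, the following is a minimal, standard-library Python sketch of a Kruskal–Wallis test with tie correction, together with the demographic preprocessing described above. It is not the authors' analysis code. The function names and the example values are hypothetical, and the closed-form p-value exp(−H/2) is valid only for three groups (two degrees of freedom).

```python
import math


def parse_years(values):
    """Drop unknown entries ("?") and treat "<1" as 1, as described in the text."""
    parsed = []
    for v in values:
        if v == "?":
            continue
        parsed.append(1.0 if v == "<1" else float(v))
    return parsed


def kruskal_wallis(groups):
    """Return (H, p) with tie correction for exactly three groups (df = 2)."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value below assumes df = 2")
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)
    # Average rank for each distinct value (tied values share the mean rank).
    rank_of, tie_term, i = {}, 0.0, 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2.0
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3.0 * (n_total + 1)
    h /= 1.0 - tie_term / (n_total ** 3 - n_total)
    return h, math.exp(-h / 2.0)  # chi-square survival function for df = 2


# Hypothetical duration-of-deafness entries (years) for three groups.
raw = (["1", "<1", "3"], ["17", "?", "5", "2"], ["36", "6", "<1"])
h_stat, p_value = kruskal_wallis([parse_years(v) for v in raw])
print(round(h_stat, 3), round(p_value, 3))
```

The closed form is consistent with the values reported for Experiment 1: for example, exp(−1.198/2) ≈ .549 for duration of deafness.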
Results
Figure 2(a) displays the
percentage of dish responses as a function of silence duration. As shown by
their responses at the endpoints, the three age groups (YCI, MCI, and OCI)
accurately discriminated the words dish and ditch. Their performance was
comparable across the various silence durations. Figure 2(b) and (c) show mean crossover
points and slopes of the performance functions across the three groups,
respectively. The effect of age group was not statistically significant for the
crossover point, Kruskal–Wallis χ²(2) = 2.608,
p = .272, or the slope, Kruskal–Wallis
χ²(2) = 1.853, p = .396.
Therefore, while Experiment 1 showed that CI listeners could discriminate the
words dish and ditch, it failed to reveal age-related temporal processing
deficits.
Figure 2.
Results for YCI (blue/squares), MCI (green/circles), and OCI
(red/triangles) groups at a fixed presentation level in Experiment 1.
(a) Mean percentage of trials that participants reported dish responses
for the dish–ditch continuum. The continuum consisted of seven stimuli
with the silence duration parametrically varied from 0 to 60 ms. Error
bars denote ±1 standard deviation. (b) and (c) Mean crossover point and
slope of the psychometric functions. Error bars denote 95% confidence
intervals. YCI = younger cochlear-implant users; MCI = middle-aged
cochlear-implant users; OCI = older cochlear-implant users.
Discussion
This experiment examined age-related changes in the processing of temporal cues
(silence duration) in word segments in CI users. The results failed to reveal
age-related differences in auditory temporal processing. The null findings from
the current experiment warrant further investigation considering: (a) the vast
literature on age-related temporal processing deficits in NH and HI listeners
(Fitzgibbons &
Gordon-Salant, 1995; Gordon-Salant & Fitzgibbons, 1999;
Gordon-Salant et al.,
2006, 2008; Humes et al.,
2010; Pichora-Fuller & Souza, 2003; Roque et al., 2019; Snell, 1997; Strouse et al., 1998);
and (b) the current experiment may lack statistical power to reveal age group
differences due to a relatively small sample size for YCI
(n = 7) listeners.

The finding of this experiment is in contrast to Goupell et al. (2017) who demonstrated
age-related temporal processing deficits in word segments in NH listeners. They
recruited younger and older NH listeners to complete a categorization task on
the dish–ditch continuum, which was similar to the current experiment. To
simulate hearing through CI processors, the stimuli were vocoded using tonal
carriers. They found later crossover points (i.e., longer silence duration) and
shallower slopes with the vocoded stimuli for older than younger NH
listeners.

Regarding the findings of age-related temporal processing differences, the
discrepancy between the current experiment and prior work may be a result of
stimulus level differences. Here, stimulus intensities were set at a level that
participants reported to be comfortably soft while maintaining good perceived
discrimination between the endpoint dish–ditch stimuli. The levels used in the
current experiment may be lower than those from prior studies (≥ 65 dB SPL) that
revealed age-related temporal processing differences with a similar
categorization task on the dish–ditch continuum (Gordon-Salant et al., 2006, 2008; Goupell et al., 2017;
Roque et al.,
2019). Therefore, in Experiment 2, we manipulated stimulus
presentation levels and further explored age-related changes in the processing
of temporal cues in word segments in CI listeners.
Experiment 2: Processing of Silence Duration Cues in Word Segments as a Function
of Presentation Level in CI Listeners
Nine YCI (11 ears; 21.0 to 42.5 years, mean age and
SD = 31.3 ± 7.5), 17 MCI (21 ears; 50.4 to 63.6 years, mean
age and SD = 56.4 ± 4.3), and 15 OCI (22 ears; 65.4 to
81.0 years, mean age and SD = 73.1 ± 4.2) listeners took
part in the current experiment. Among these participants, 13 participants
were tested separately in both ears (Table 1). While the behavioral
performance of the two ears in one individual cannot be completely
independent, it can nonetheless be distinguishable given differing
characteristics of the electrode–neuron interface, and therefore we assumed
that they could be treated independently. Seven YCI (7 ears), 16 MCI (16
ears), and 11 OCI (11 ears) also participated in Experiment 1. All
participants were native speakers of American English and were screened with
the MoCA, similar to Experiment 1. The MoCA data were missing for 1 YCI
(CCE), 3 MCI (CBI, CBM, and CCC), and 2 OCI (CAN and CBY). Their demographic
information is provided in Table 1. The age groups did not differ
significantly in duration of deafness, Kruskal–Wallis
χ²(2) = 2.825, p = .244, or
duration of CI use, Kruskal–Wallis χ²(2) = 5.296,
p = .071.

Stimuli were identical to those for Experiment 1.

The stimulus presentation levels were parametrically changed between nominal
values of 25 and 85 dB in equal steps of 10 dB. We assumed that these
nominal values approximate the stimulus levels that participants actually received
in dB SPL. However, this approximation could not be confirmed
because, for the stimulation methods adopted here (DAI or
headphone), we were not able to calibrate the stimulus levels received by
the sound processor. We used participants’ clinical sound processors with
their everyday settings and the levels that they normally used. Hence, their
processors determined the actual levels of the electrical stimulation. We
therefore reported level in this experiment without a true reference because
a reference is not possible in our experimental setup.

Because the lower levels (e.g., 25 dB) may have been below participants’
thresholds, we selected the lowest presentation level individually by
determining, before the experiment, the level (among the seven possible levels) at which they could just
hear the sample word dish. Table 2 lists the
proportion of participants/ears per age group at each presentation level. We
conducted 3 (age group: YCI, MCI, and OCI) × 2 (participants/ears that could
hear the sample word dish: Yes or No) χ² tests of independence.
We found that the number of participants/ears that could
versus could not hear the sample word dish did not
differ significantly across age groups at 25 dB, χ²(2) = 3.254,
p = .197, but differed significantly
across groups at 35 dB, χ²(2) = 6.716,
p = .035. Post hoc analysis showed
that significantly fewer MCI than YCI and OCI
could hear the sample word dish at 35 dB
(p = .01, uncorrected).
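As an illustration of this test, the sketch below computes a Pearson χ² statistic for a 3 × 2 table using only the Python standard library. The counts are not reported directly in the text; they are inferred here by multiplying the percentages in Table 2 by the number of ears per group (11 YCI, 21 MCI, and 22 OCI), which is an assumption about how the analysis was tabulated. The function name is hypothetical, and the p-value uses the closed form exp(−χ²/2), which holds only for two degrees of freedom.

```python
import math


def chi_square_independence(could_hear, total):
    """Pearson chi-square for a (3 groups x 2 outcomes) table.

    could_hear: ears per group that could hear the sample word.
    total: ears tested per group.
    Returns (statistic, p); the closed-form p-value assumes df = 2.
    """
    if len(could_hear) != 3 or len(total) != 3:
        raise ValueError("this sketch handles exactly three groups (df = 2)")
    could_not = [n - k for k, n in zip(could_hear, total)]
    grand = sum(total)
    stat = 0.0
    for observed_row in (could_hear, could_not):
        row_sum = sum(observed_row)
        for observed, col_sum in zip(observed_row, total):
            expected = row_sum * col_sum / grand
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2.0)


ears = [11, 21, 22]  # YCI, MCI, OCI ears tested in Experiment 2
# Counts inferred from Table 2 percentages (an assumption, see lead-in).
stat_25, p_25 = chi_square_independence([1, 3, 0], ears)     # 9.09%, 14.29%, 0%
stat_35, p_35 = chi_square_independence([10, 12, 19], ears)  # 90.91%, 57.14%, 86.36%
print(round(stat_25, 3), round(p_25, 3))  # 3.254 0.197
print(round(stat_35, 3), round(p_35, 3))  # 6.716 0.035
```

Under these inferred counts, the sketch reproduces the statistics reported above for both the 25-dB and 35-dB levels.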
Table 2.
Proportion of Participants/Ears Per Age Group at Each Presentation
Level in Experiment 2.

| Age-group | 25 dB | 35 dB | 45 dB | 55 dB | 65 dB | 75 dB | 85 dB |
|---|---|---|---|---|---|---|---|
| YCI | 9.09 | 90.91 (10;10) | 100 | 100 | 100 | 100 | 100 |
| MCI | 14.29 | 57.14 (16.67;0) | 95.24 | 100 (4.76;0) | 100 (9.52;0) | 100 (4.76;0) | 100 (4.76;19.05) |
| OCI | 0 | 86.36 (10.53;0) | 100 | 100 | 100 | 100 (4.55;4.55) | 100 (22.73;0) |

Note. Column headings give the presentation level (dB). Number(s) in parentheses represent the
proportion of participants/ears (relative to the total number of
participants/ears who completed the task at that level for that
age group) that were fitted by a logistic function in a backward
direction (first number) and that failed to be fitted by
the logistic function (second number). Zeros were included in
the parentheses for illustration purposes. YCI = younger
cochlear-implant users; MCI = middle-aged cochlear-implant
users; OCI = older cochlear-implant users.

The following data
were fitted by logistic functions in the backward direction: (a)
35 dB: one YCI (CBQ-right ear), two MCI (CAY-right ear, CBK-left
ear), and two OCI (CAO-right ear, CBB-right ear); (b) 55 dB: one
MCI (CAW-left ear); (c) 65 dB: two MCI (CAW-left ear, CAW-right
ear); (d) 75 dB: one MCI (CBN-right ear) and one OCI (CBT-right
ear); and (e) 85 dB: one MCI (CBN-right ear). The following data
failed to be fitted by the logistic functions: (a) 35 dB: one
YCI (CCM-right ear); (b) 75 dB: one OCI (CBC-left ear); and (c)
85 dB: four MCI (CAQ-right ear, CAY-left ear, CAY-right ear,
CBA-left ear) and five OCI (CAD-right ear, CBB, CBC-left ear,
CBC-right ear, CBT-right ear).
Each presentation level consisted of 140 trials (7 stimuli × 20 repetitions).
Trials from all levels were mixed together and divided into four blocks.
Each block was composed of 35 trials (7 stimuli × 5 repetitions) at each
presentation level. The trial order was randomized for each block in each
listener. For five participants/ears, the trials were divided into more than
four blocks (5, 7, or 10 blocks) to reduce fatigue. Participants could take
short breaks between blocks. The testing (including training and breaks) was
usually completed within 2 hr.
The experimental procedures were identical to those detailed in Experiment 1.
All participants achieved near-perfect word identification performance on
the training task. The training task was administered at 65 dB.
The same psychometric function analysis as detailed in Experiment 1 was
applied to the percentage of dish responses along the dish–ditch continuum
calculated from each condition for each listener. Table 2 shows the proportion of
participants/ears that were fitted by a logistic function in a backward
direction or that failed to be fitted by the logistic function.
To ensure balanced numbers of participants at each condition, we focused the
statistical analysis on presentation levels from 45 to 85 dB. Data from four
ears (CAD-left ear, CBC-right ear, CBU-right ear, and CCJ-left ear)
consisted of 15 to 19 repetitions of each stimulus due to computer errors
but were included in the analysis. We also reanalyzed the data after
converting presentation levels to sensation levels. Please refer to the
Online Appendix A for details. As shown in Table 1, the two ears of 13
participants were tested separately. The results from each ear (see Online
Appendix B) were treated as independent observations consistent with
previous studies (e.g., Bierer & Litvak, 2016; Donaldson, Rogers, Johnson, & Oh,
2015).
Linear mixed-effects modeling implemented via the lme4
package (Bates, Maechler, Bolker, & Walker, 2014) in R version 3.5.1 (R Core Team, 2013)
was used to fit the data of crossover point and slope, respectively. In the
model, age group (YCI, MCI, or OCI) and presentation level (45, 55, 65, 75,
or 85 dB) were included as the fixed effects. By-participant/ear intercept
was included as a random effect to account for baseline performance
differences for all the ears across participants. In that sense, two ears
from the same participant were allowed to have different performance
baselines. Both fixed-effects factors were treated as categorical variables.
We systematically removed fixed effects that did not contribute
significantly to the model (p > .05) to
reduce the risk of overfitting the data by comparing each simpler model to
the more complex model using the likelihood ratio test (Baayen, Davidson, &
Bates, 2008). We present results from the simplest, best-fitting
model in the results section. Significance values for the fixed effects in
the optimal model were computed using the analysis of variance function in
the lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2017). Post hoc
analysis for significant fixed effects, if necessary, was carried out using
the lsmeans function of the lsmeans package (Lenth, 2016). Multiple
comparisons were corrected by controlling the false discovery rate
(Benjamini & Hochberg, 1995). Descriptive statistics, if reported,
represent mean ± SD.
Figure 3(a) displays the
percentage of dish responses as a function of silence duration. While the three
age groups (YCI, MCI, and OCI) were able to discriminate the words dish and
ditch, the MCI and OCI groups had longer crossover durations and shallower
slopes (i.e., shallower curves for the percentage of dish responses as a
function of silence duration) than the YCI listeners at higher presentation
levels but not at lower levels.
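To illustrate how the crossover point and slope shape a categorization function, the following minimal Python sketch evaluates a descending logistic function for the percentage of dish responses. The function name, the parametrization (crossover in ms, slope at the crossover in %/ms), and the equal 10-ms spacing of the seven-step continuum are assumptions made for illustration; the fitting procedure actually used is the one detailed in Experiment 1 and is not reproduced here. The example inputs are the group-mean crossover points and slopes reported for the YCI and OCI groups at 85 dB in Experiment 2.

```python
import math

def percent_dish(silence_ms, crossover_ms, slope_pct_per_ms):
    """Descending logistic: percentage of 'dish' responses at a silence duration.

    crossover_ms: silence duration at which responses are 50% dish.
    slope_pct_per_ms: steepness at the crossover, in percentage points per ms
    (assumed parametrization; positive values mean that dish responses fall
    as the silence duration grows).
    """
    k = 4.0 * slope_pct_per_ms / 100.0  # logistic rate that yields this midpoint slope
    return 100.0 / (1.0 + math.exp(k * (silence_ms - crossover_ms)))

# Seven-step continuum from 0 to 60 ms (equal 10-ms steps are an assumption).
continuum_ms = [10 * i for i in range(7)]

# Group means reported at 85 dB in Experiment 2, used only as example inputs.
yci = [percent_dish(x, 33.4, 2.98) for x in continuum_ms]
oci = [percent_dish(x, 60.6, 0.66) for x in continuum_ms]

for x, y, o in zip(continuum_ms, yci, oci):
    print(f"{x:2d} ms  YCI {y:5.1f}%  OCI {o:5.1f}%")
```

In this sketch, a later crossover shifts the curve toward longer silence durations, and a shallower slope makes the transition from dish to ditch more gradual. Curves built from group-mean parameters need not match the mean curves in the figure exactly.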
Figure 3.
Results for YCI (blue/squares), MCI (green/circles), and OCI
(red/triangles) groups as a function of presentation level (45, 55, 65,
75, and 85 dB) in Experiment 2. (a) Mean percentage of trials that
participants reported dish responses for the dish–ditch continuum. The
continuum consisted of seven stimuli with the silence duration
parametrically varied from 0 to 60 ms. Error bars denote ±1 standard
deviation. (b) and (c) Mean crossover point and slope of the
psychometric functions. Error bars denote 95% confidence intervals.
YCI = younger cochlear-implant users; MCI = middle-aged cochlear-implant
users; OCI = older cochlear-implant users.
Figure 3(b) shows the
mean crossover points of the performance functions for the three groups across
presentation levels 45 to 85 dB. The main effect of age group was not
significant, F(2, 51.1) = 2.436,
p = .098. The main effect of presentation
level was significant, F(4, 203.4) = 11.17,
p < .001. The interaction between age
group and presentation level also was significant, F(8,
203.4) = 2.458, p = .015. Post hoc analysis of
the interaction revealed that the crossover point for the OCI group was
significantly later than that for the YCI group at 85 dB (OCI: 60.6 ms ± 13.5
vs. YCI: 33.4 ms ± 13.4; p < .001). No
other comparisons between the age groups (YCI vs. MCI or MCI vs. OCI) were
statistically significant (p > .13 in all
cases). This suggests that OCI listeners needed longer silence durations than
YCI listeners to change their percept from dish to ditch when the stimuli were
presented at 85 dB.
Figure 3(c) shows the
mean slopes of the performance functions for the three groups across
presentation levels 45 to 85 dB. The main effect of age group was not
statistically significant, F(2, 50.6) = 2.44,
p = .097. The main effect of presentation
level was significant, F(4, 202.7) = 9.878,
p < .001. The interaction between age
group and presentation level was significant, F(8,
202.8) = 2.426, p = .016. Post hoc analysis of
the interaction revealed that the slope was significantly shallower for the OCI
group than that for the YCI group at 75 dB (OCI: 1.71%/ms ± 1.05 vs. YCI:
3.68%/ms ± 2.01; p = .015) and 85 dB (OCI:
0.66%/ms ± 1.17 vs. YCI: 2.98%/ms ± 1.5;
p = .004). No other comparisons between the
age groups (YCI vs. MCI or MCI vs. OCI) were statistically significant
(p > .11 in all cases). This suggests
that OCI listeners found it more difficult to discriminate dish and ditch than
YCI listeners at 75 and 85 dB.
Furthermore, we recoded presentation levels into sensation levels and reanalyzed
the data, the details of which are reported in Online Appendix A. We found
similar patterns of results (comparing Figure 3 for presentation levels and
Figure S1 for sensation levels). There were minor differences regarding the
significance of the age group effects for the metric of crossover point, which
was significant when using the presentation levels but not significant when
using sensation levels.
Experiment 2 demonstrated that OCI listeners, compared with YCI listeners,
required longer silence durations to identify ditch and exhibited reduced
ability to distinguish the words dish and ditch (i.e., shallower slopes).
Interestingly, such age-related performance differences were dependent on the
presentation level, such that the differences emerged only at higher levels
(≥ 75 dB; Figure 3). The
age-related performance differences in CI users found here concur with extensive
prior work using unprocessed stimuli in NH and HI listeners (Fitzgibbons &
Gordon-Salant, 1995; Gordon-Salant & Fitzgibbons, 1999;
Gordon-Salant et al., 2006, 2008; Humes et al., 2010;
Pichora-Fuller & Souza, 2003; Roque et al., 2019; Snell, 1997; Strouse et al., 1998),
as well as with our recent work using vocoded stimuli in NH listeners (Goupell et al., 2017).
The level dependence of age-related performance differences explains the lack of
age-related differences in word identification suggested in Experiment 1,
considering that comfortably soft levels were used in that experiment. In the
General Discussion, we speculate on the mechanisms related to the effect of
presentation level in CI users.
The calibration of stimulus levels received by the sound processor via DAI or
headphone is nontrivial. Multiple factors may affect stimulus levels, including
differences in the CI device programming approach across audiologists, volume
settings on the CI device chosen by individual participants, and variability in
the determination of loudness perception. Unfortunately, many of these factors
were difficult to control. This posed limitations on the current experiment;
specifically, we could not confirm that the actual levels were consistent across
participants. However, these design limitations were inevitable given the
methods of stimulus presentation required for CI users. We developed Experiment
2 as a follow-up to Experiment 1. As a result, we tried to keep the experimental
parameters in Experiment 2 as close as possible to Experiment 1, including the
stimulation choices. Nevertheless, this study is important as an initial step to
characterize the role of sound level on age-related changes in temporal
processing in CI users. Even though we could not precisely manipulate the
stimulus presentation levels, our findings suggest that stimulus level may exert
a large effect on auditory temporal processing in older CI users. This argument
was corroborated by the alternative analysis focusing on sensation levels; in
that analysis, the age-related performance differences in word identification
persisted in older CI users (see Figure S1 in Online Appendix). Together, these
data and the current approach are informative for designing similar types of
experiments that aim to control presentation levels and mapping of the
processors.
In summary, the current experiment showed that the ability to process silence
duration cues in word segments appears to decrease at higher stimulus
presentation levels, particularly for older CI users. The literature on NH
listeners also provides some evidence of a small performance decline
at higher-than-normal sound levels on speech perception tasks (e.g., Liu, 2008; Molis & Summers,
2003; Studebaker, Sherbecoe, McDaniel, & Gwaltney, 1999). Therefore, Experiment 3
aimed to examine the extent to which NH listeners exhibit performance decline
with increasing stimulus presentation level for the dish–ditch contrast.
Experiment 3: Processing of Silence Duration Cues in Sine-Vocoded Word Segments
as a Function of Presentation Level in NH Listeners
Sixteen younger NH (YNH, 18.6 to 26.2 years, mean age and
SD = 21.5 ± 2.1) and 11 older NH (ONH, 65.1 to 78.1 years,
mean age and SD = 68.9 ± 3.6) listeners were tested. NH was
defined as pure tone thresholds ≤ 25 dB HL at octave frequencies from 250 to
4000 Hz. Figure 4
displays the average thresholds for both groups. The threshold data for
8000 Hz were missing for one YNH participant. All listeners were native
speakers of American English. We screened ONH listeners with the MoCA (Nasreddine et al.,
2005) to ensure normal or near-normal cognitive function (Cecato et al., 2016;
Dupuis et al., 2015). Their MoCA scores ranged from 25 to 30, with 9 out of 10
participants scoring above 26. The MoCA data were missing for one ONH
participant.
Figure 4.
Mean pure tone thresholds in dB HL (re: ANSI 2018) for YNH
(blue/square) and ONH (red/triangle) groups in Experiment 3. The
horizontal dashed line indicates 25 dB HL. Error bars denote ±1
standard deviation. YNH = younger normal-hearing listeners;
ONH = older normal-hearing listeners.
Stimuli consisted of the unprocessed, nonvocoded seven-step dish–ditch
continuum used in Experiments 1 and 2, as well as vocoded dish–ditch
continua with two, four, or eight contiguous channels. Details of the
vocoding processes can be found in Goupell et al. (2017). Each
unprocessed stimulus was forward-backward bandpass filtered into the
corresponding number of channels (two, four, or eight) with cut-off
frequencies logarithmically spaced from 200 to 8000 Hz. Note that some of
the stimulus energy lies above 4000 Hz, the highest frequency covered by the
NH criterion. This
upper frequency value was chosen to match the previous study (Goupell et al.,
2017) and to match the frequency range presented to the CI
listeners. Consequently, age-related changes in
performance for the NH listeners in this experiment could be partially a
result of hearing loss above 4 kHz. Temporal envelopes were extracted from
each channel and low-pass filtered at 400 Hz. The extracted envelopes
modulated tonal carriers; these modulated sine tones were then summed to
create the final vocoded stimulus. Thus, there were four stimulus types that
were characterized by varying numbers of channels (two, four, and eight
channels, and unprocessed).
Similar to Experiment 2, the possible presentation levels were parametrically
changed between 25 and 85 dB SPL in equal steps of 10 dB. Before the experiment, we selected the
lowest presentation level individually by determining the lowest level (among the
seven possible levels) at which the listener could just hear the unprocessed
sample word dish. At 25 dB SPL, 13
(81.3%) YNH and 3 (27.3%) ONH participants completed the tasks (Table 3). At 35 dB
SPL and above, all participants completed the tasks. We conducted 2 (age
group: YNH and ONH) × 2 (participants that could hear the sample word
dish: Yes or No) χ² tests of independence. We found that the
number of participants that could versus could not hear the sample word
dish differed significantly across groups at 25 dB SPL,
χ²(1) = 7.867, p = .015, such
that fewer ONH than YNH participants could hear the
sample word dish at 25 dB SPL.
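The test statistic for the 25 dB SPL comparison can be checked with a short calculation. The sketch below computes a Pearson chi-square statistic for a 2 × 2 table from the counts given above (13 of 16 YNH and 3 of 11 ONH participants completed the task at 25 dB SPL). The function name is illustrative, and the absence of a continuity correction is an assumption, adopted because it reproduces the reported statistic.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (df = 1, no continuity correction)
    for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: age group (YNH, ONH); columns: could hear "dish" at 25 dB SPL (yes, no).
ynh_yes, ynh_no = 13, 16 - 13
onh_yes, onh_no = 3, 11 - 3

chi2 = chi_square_2x2(ynh_yes, ynh_no, onh_yes, onh_no)
print(f"chi2(1) = {chi2:.3f}")  # prints: chi2(1) = 7.867
```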
Table 3.
Proportion of Participants Per Age Group at Each Presentation Level
in Experiment 3.
Age group | Number of channels | 25 dB SPL | 35 dB SPL | 45 dB SPL | 55 dB SPL | 65 dB SPL | 75 dB SPL | 85 dB SPL
ONH | Unprocessed | 27.27 (100;0) | 100 (45.45;0) | 100 (18.18;0) | 100 | 100 | 100 | 100
ONH | Eight channels | 27.27 (33.33;0) | 100 (54.55;0) | 100 (9.09;0) | 100 | 100 | 100 | 100
ONH | Four channels | 27.27 (100;0) | 100 (63.64;9.09) | 100 (36.36;18.18) | 100 | 100 (9.09;0) | 100 | 100
ONH | Two channels | 27.27 (66.67;0) | 100 (36.36;18.18) | 100 (27.27;18.18) | 100 (45.45;18.18) | 100 (45.45;18.18) | 100 (18.18;18.18) | 100 (27.27;18.18)
YNH | Unprocessed | 81.25 (7.69;0) | 100 (6.25;0) | 100 | 100 | 100 | 100 | 100
YNH | Eight channels | 81.25 (15.38;0) | 100 (6.25;0) | 100 | 100 | 100 | 100 | 100
YNH | Four channels | 81.25 (23.08;0) | 100 (12.5;0) | 100 (6.25;0) | 100 | 100 (6.25;0) | 100 | 100
YNH | Two channels | 81.25 (23.08;0) | 100 (12.5;6.25) | 100 (18.75;0) | 100 | 100 (12.5;0) | 100 (6.25;0) | 100 (0;6.25)
Note. Number(s) in parentheses represent the
proportion of participants (relative to the total number of
participants who completed the task at that level for that age
group) that were fitted by a logistic function in a backward
direction (first number) and that failed to be fitted by
the logistic function (second number). Zeros were included in
the parentheses for illustration purposes. YNH = younger
normal-hearing listeners; ONH = older normal-hearing
listeners.
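To make the table entries concrete, the sketch below recomputes a few cells from participant counts. The group sizes (16 YNH, 11 ONH) and the numbers completing the task at 25 dB SPL (13 YNH, 3 ONH) are given in the text; the counts of backward fits are back-calculated from the tabled percentages and are therefore assumptions rather than reported values. The helper name is illustrative.

```python
def pct(count, total):
    """Percentage rounded to two decimals, as in the table."""
    return round(100.0 * count / total, 2)

# Main entry: participants who completed the level, relative to group size.
print(pct(13, 16))  # YNH at 25 dB SPL -> 81.25
print(pct(3, 11))   # ONH at 25 dB SPL -> 27.27

# Parenthesized entries: relative to those who completed that level.
# The counts below are inferred from the tabled percentages (assumption).
print(pct(1, 13))   # YNH, unprocessed, 25 dB SPL, backward fits -> 7.69
print(pct(5, 11))   # ONH, unprocessed, 35 dB SPL, backward fits -> 45.45
```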
Each presentation level consisted of 280 trials (7 stimuli × 4 channels × 10
repetitions). Trials from all presentation levels were mixed together and
divided into five blocks. Each block was composed of 56 trials (7
stimuli × 4 channels × 2 repetitions) at each presentation level. The trials
from one YNH participant were divided into four blocks. The trial order was
randomized for each block in individual listeners. Participants could take
short breaks between blocks. The testing (including training and breaks) was
usually completed within 2.5 hr.
The testing procedures were identical to those detailed in Experiment 1
except for two modifications. First, the stimuli were presented monaurally
through one ER2 insert earphone (Etymotic, Elk Grove Village, IL) to the
right ear in YNH listeners and to the better ear in ONH listeners. Better
ear was defined as the ear with better averaged audiometric thresholds
across 500, 1000, 2000, and 4000 Hz. Second, in the training task,
participants were presented with the endpoint stimuli (0- and 60-ms silence
duration) in both unprocessed and vocoded (16 channels) speech modes at
65 dB SPL. All participants achieved an accuracy of at least 90% on the
training task; hence, none were excluded from the experiment.
The same psychometric function analysis as detailed in Experiment 1 was
applied to the percentage of dish responses along the dish–ditch continuum
calculated from each condition for each listener. Table 3 shows the proportion of
participants who were fitted by a logistic function in a backward direction
or who failed to be fitted by the logistic function.
To be consistent with Experiment 2, we focused the statistical analysis on
presentation levels from 45 to 85 dB SPL. Data from one ONH participant
consisted of nine (out of 10) repetitions of each stimulus due to computer
errors but were included in the analysis. We also reanalyzed the data after
converting presentation levels to sensation levels. Please refer to Online
Appendix A for details. Linear mixed-effects modeling implemented via
the lme4 package (Bates et al., 2014) in R version
3.5.1 (R Core Team, 2013) was used to fit the data of 50% crossover points and
slopes. In the model, age group (YNH or ONH), number of channels (two, four,
and eight channels, or unprocessed) and presentation level (45, 55, 65, 75,
or 85 dB SPL) were included as the fixed effects, and by-participant
intercept was included as a random effect to account for baseline
performance differences. All the fixed-effects factors were treated as
categorical variables. We adopted similar approaches to those detailed in
Experiment 2 to determine the significance of fixed effects and to conduct
post hoc analyses. Descriptive statistics, if reported, represent
mean ± SD.
Figure 5(a) displays the
percentage of dish responses as a function of silence duration. While both YNH
and ONH groups were able to discriminate the words dish and ditch, the ONH group
had longer crossover durations and shallower slopes (i.e., shallower curve for
the percentage of dish responses as a function of silence duration) than the YNH
participants across presentation levels and channels.
Figure 5.
Results for YNH (blue/squares) and ONH (red/triangles) groups as a
function of presentation level (45, 55, 65, 75, and 85 dB SPL) in
Experiment 3. (a) Mean percentage of trials that participants reported
dish responses for the dish–ditch continuum. The rows (from top to
bottom) show data for unprocessed and vocoded stimuli (8, 4, and 2
channels). The continuum consisted of seven stimuli with the silence
duration parametrically varied from 0 to 60 ms. Error bars denote ±1
standard deviation. (b) and (c) Mean crossover points (b) and slopes (c)
of the psychometric functions. As a comparison, we displayed results for
YCI (transparent blue/squares) and OCI (transparent red/triangles)
groups from Experiment 2. The columns (from left to right) show data for
unprocessed and vocoded stimuli (8, 4, and 2 channels). Error bars
denote 95% confidence intervals. YNH = younger normal-hearing listeners;
ONH = older normal-hearing listeners; YCI = younger cochlear-implant
users; OCI = older cochlear-implant users.
Figure 5(b) shows the
mean crossover points of the performance functions for the two groups as a
function of presentation level for unprocessed and vocoded stimuli. The main
effect of age group was significant, F(1, 25) = 7.911,
p = .009. The main effect of channel was
significant, F(3, 503) = 5.435,
p = .001. The interaction between channel
and age group also was significant, F(3, 503) = 3.434,
p = .017. Post hoc analysis of the
interaction revealed that the crossover points were significantly later for the
ONH compared with the YNH group for unprocessed (ONH: 43.29 ms ± 15.51 vs. YNH:
29.0 ms ± 12.1; p = .012), eight-channel (ONH:
36.59 ms ± 22.41 vs. YNH: 26.08 ms ± 12.21;
p = .049), and four-channel (ONH:
45.05 ms ± 21.05 vs. YNH: 31.33 ms ± 17.12;
p = .014) stimuli. No significant
age-group difference was observed for two-channel stimuli (ONH:
39.33 ms ± 33.52; YNH: 35.76 ms ± 23.69;
p = .453). These results indicate that ONH
listeners needed longer silence durations to change their percept from dish to
ditch relative to YNH listeners, but such age-related differences disappeared for
the stimuli with the fewest vocoded channels (two).
The main effect of presentation level was significant, F(4,
503) = 49.762, p < .001. Post hoc analysis
showed that the crossover points became significantly earlier as the
presentation level increased up to 75 dB SPL
(p < .01 in all cases). The comparison
between 75 and 85 dB SPL was not statistically significant
(p = .355). These results suggest that
participants needed shorter silence durations to change their percept from dish
to ditch with increasing levels.
Figure 5(c) shows the
mean slope of the performance functions for the two groups as a function of
presentation level for unprocessed and vocoded stimuli. All main effects were
significant: age group, F(1, 25) = 6.773,
p = .015, channel, F(3,
491) = 79.595, p < .001, and presentation
level, F(4, 491) = 24.379,
p < .001. The interaction between
channel and age group was significant, F(3, 491) = 3.786,
p = .010. Post hoc analysis revealed that
the slopes were significantly shallower for the ONH group than those for the YNH
group when listening to eight-channel (ONH: 2.46%/ms ± 1.59 vs. YNH:
4.43%/ms ± 2.31; p = .006) and two-channel
(ONH: 0.12%/ms ± 1.39 vs. YNH: 1.99%/ms ± 1.94;
p = .007) stimuli but not when listening
to unprocessed (ONH: 3.43%/ms ± 1.91 vs. YNH: 4.39%/ms ± 2.18;
p = .149) or four-channel (ONH:
2.57%/ms ± 2.24 vs. YNH: 3.55%/ms ± 2.01;
p = .144) stimuli.
The interaction between channel and presentation level was significant,
F(12, 491) = 2.108,
p = .015. Post hoc analysis revealed that for
the unprocessed stimuli, the slopes became significantly steeper at presentation
levels of 65 to 85 dB SPL compared with 45 dB SPL
(p < .01 in all cases) and at the
presentation level of 85 dB SPL compared with 55 dB SPL
(p = .001). For the eight-channel stimuli,
the slopes were significantly steeper at presentation levels of 65 to 85 dB SPL
compared with presentation levels of 45 to 55 dB SPL
(p < .01 in all cases). For the
four-channel stimuli, the slopes were steepest for presentation levels at 65 to
75 dB SPL, followed by that at 55 dB SPL, and least steep for that at 45 dB SPL
(p < .05 in all cases). The slope at
85 dB SPL was steeper than that at 45 dB SPL
(p < .001). For two-channel stimuli,
the slopes were not significantly different between presentation levels
(p > .2 in all cases). These results
suggest that while it generally became easier to discriminate dish and ditch
with increasing presentation level, the level benefit is dependent on the
spectral resolution of the stimuli.
Furthermore, we recoded presentation levels into sensation levels and reanalyzed
the data, the details of which are reported in the Online Appendix. We found
similar patterns of results (comparing Figure 5 for presentation levels and
Figure S2 for sensation levels). There were minor differences regarding the
significance of the age group × sound level interaction effects for the metric
of slope, which were not significant when using the presentation levels but were
significant when using sensation levels.
Experiment 3 demonstrated age-related temporal processing deficits in NH
listeners with unprocessed and vocoded stimuli, a finding which concurs with
Goupell et al. (2017) using a similar paradigm. Importantly, Experiment 3 extended
this prior study by varying stimulus presentation levels. The current results
suggest that while the ability to utilize temporal cues for word identification
in NH listeners generally improves with increasing presentation levels for both
age groups, there are consistent age-related temporal processing deficits that
may not significantly change across levels. Previous studies have used a similar
paradigm to assess age-related changes in temporal processing at different fixed
levels. There are noted performance differences across these studies, but they
consistently revealed age-related declines in temporal processing (85 dB SPL,
Gordon-Salant et al., 2006, 2008; 65 dB SPL, Goupell et al., 2017; 75 dB SPL, Roque et al., 2019). Our findings of
enhanced temporal processing (i.e., shorter crossover durations and steeper
slopes) with higher presentation levels help explain performance differences
across these prior studies. Importantly, the consistent age-related differences
in temporal processing across presentation levels as demonstrated in this study
further reinforce the hypothesis of age-related temporal processing deficits
that can be inferred from these past investigations.
Findings from this experiment contrast with those from CI users in Experiment 2,
wherein age-related performance differences occurred at higher presentation
levels (≥75 dB) but not at lower levels (<75 dB) (Figure 3). The discrepancy in the level
effects between CI (Experiment 2) and NH (Experiment 3) listeners was further
evidenced by the fact that the ability to distinguish dish and ditch generally
improves with increasing sound levels in NH listeners (Figure 5) but may plateau or become worse
at intermediate sound levels in CI listeners (Figure 3). Note that due to the
approximate calibration of stimuli for the CI processors, the range of sound
levels actually received by the participants was not necessarily comparable
between CI and NH listeners. Nevertheless, these results highlight
the differential sensitivity to level changes between NH and CI listeners. This
argument is consistent with the findings that CI listeners demonstrate a much
smaller dynamic range (Skinner, Holden, Holden, Demorest, & Fourakis, 1997; Zeng et al., 2002) and
abnormal loudness growth (Zhang & Zeng, 1997) compared with NH listeners.
What are the mechanisms underlying the disparities in level effects between NH
listeners and CI users, particularly regarding their influence on the
age-related changes in word identification based on temporal cues? Here, we
offer some plausible explanations. The first explanation may lie in the
differences between acoustic and electric hearing despite the fact that we
simulated CI hearing via vocoding. In electric hearing, higher stimulus
presentation levels could induce larger current spread that reduces spectral
resolution (Eisen & Franck, 2005), which, in turn, may disrupt the processing of temporal
modulation cues (Oxenham & Kreft, 2014). However, such change in spectral resolution with
increasing levels in electric hearing was not systematically accounted for in
the results observed for listeners with acoustic hearing for unprocessed or
vocoded stimuli in this study. Alternatively, speech temporal cues might have
already been lost or been distorted by CI signal processing or preprocessing
(e.g., automatic gain control, AGC) at high presentation levels, which was not
an issue for the NH listeners.
Another explanation is that speech understanding in NH listeners can exhibit
either no change or a small drop in performance with increasing presentation
levels (Miranda & Pichora-Fuller, 2002). We did not observe a decrease in the
perception of temporal cues with increasing levels in our NH data. Unlike in NH
listeners, a CI processor directly excites spiral ganglia and bypasses stimulus
encoding at the cochlear level. The auditory system beyond the cochlea (auditory
nerve and central nervous system) may have undergone significant pathological
changes due to deafness in the CI listeners (e.g., Middlebrooks, 2018; Shepherd & Hardie,
2001). For example, previous studies suggest that hearing impairment
results in the loss of auditory fibers (e.g., Middlebrooks, 2018; Webster & Webster,
1981), which may lead to decreased neural synchrony to auditory
stimuli. Decreased neural synchrony may be one of the mechanisms underlying the
reduced ability to process temporal cues with increasing levels in CI listeners.
For instance, Miranda and Pichora-Fuller (2002) demonstrated that the introduction of temporal
jitter to the speech stimuli, a simulation of neural desynchrony, resulted in a
decline of word recognition performance at high presentation levels in younger
NH listeners. With aging, the retrocochlear neural substrates may be subjected
to further pathological changes (Hughes et al., 2010; Makary et al., 2011;
Otte et al., 1978; Roque et al., 2019; Sergeyenko et al., 2013; Tremblay et al., 2003;
Walton et al., 1998; Willott, 1991), which
might exacerbate performance decline with increasing levels in older CI users.
In later sections, we further discuss the possible mechanisms related to the
effect of presentation level in CI users.
General Discussion
Overview of Results
This study demonstrated that CI listeners can identify words based on a temporal
contrast (silence duration) and that aging in CI users appears to be associated
with decreased ability to utilize brief temporal cues in word segments (Figure 3). These findings
concur with the vast literature on age-related temporal processing deficits with
acoustic hearing in NH and HI listeners (Fitzgibbons & Gordon-Salant, 1995;
Gordon-Salant &
Fitzgibbons, 1999; Gordon-Salant et al., 2006, 2008; Goupell et al., 2017;
Humes et al.,
2010; Pichora-Fuller & Souza, 2003; Roque et al., 2019; Snell, 1997; Strouse et al., 1998).
This study extends these findings to CI users and suggests that age-related
declines in word identification based on temporal cues in CI users are dependent
on stimulus presentation levels, such that older CI users demonstrate reduced
performance in the utilization of brief temporal cues for word identification at
higher levels (Figure
3).
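As an illustration of the two summary measures referred to here (the silence duration needed to identify ditch, and the slope of the categorization function), the following minimal Python sketch fits a logistic function to response counts along the continuum. The logistic form, the Newton fitting routine, the function name fit_logistic, and all response counts are assumptions made for illustration only; they are not the data or the analysis procedure of this study.

```python
import math

def fit_logistic(durations_ms, n_ditch, n_trials, iters=50):
    """Fit P(ditch) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton's method
    on the binomial likelihood, where x is the centered silence duration.
    Returns (crossover_ms, slope): the 50% point and the logistic slope b1."""
    mean_d = sum(durations_ms) / len(durations_ms)
    x = [d - mean_d for d in durations_ms]  # centering improves conditioning
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, k, n in zip(x, n_ditch, n_trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = n * p * (1.0 - p)   # binomial variance weight
            r = k - n * p           # observed minus expected "ditch" count
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break
        # Newton step: solve the 2x2 system H * delta = g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    crossover_ms = mean_d - b0 / b1  # duration at which P(ditch) = 0.5
    return crossover_ms, b1

# Hypothetical counts of "ditch" responses out of 10 trials per step.
durations = [0, 10, 20, 30, 40, 50, 60]
trials = [10] * 7
younger_ditch = [0, 1, 3, 7, 9, 10, 10]  # made-up, steeper function
older_ditch = [1, 2, 3, 4, 6, 7, 9]      # made-up, shallower function

younger_fit = fit_logistic(durations, younger_ditch, trials)
older_fit = fit_logistic(durations, older_ditch, trials)
print("younger (crossover ms, slope):", younger_fit)
print("older   (crossover ms, slope):", older_fit)
```

In this made-up example, the "older" counts yield a later crossover and a shallower slope than the "younger" counts, which is the qualitative pattern described for OCI relative to YCI listeners at higher levels.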
Level Dependency of Age-related Temporal Processing Deficits in CI Users:
Potential Explanations
This study (Experiment 2) revealed that age interacts with sound level to affect
auditory temporal processing for temporally based word contrasts in CI
listeners. Our finding on the processing of silence duration cues in word
segments for older CI users (Figure 3) is reminiscent of a previous study that showed that in
some CI listeners, performance on syllable identification tasks decreased at a
higher stimulus level (Franck, Xu, & Pfingst, 2003).

What are the mechanisms underlying the level dependency of age-related declines
in word identification based on temporal cues in CI users? Following the
discussion of level effects in Franck et al. (2003), we speculate that
for electric hearing with CI, on the one hand, increasing levels may produce
positive effects on the encoding of temporal cues. For example, the number of
auditory nerve fibers responding to the stimulus may increase with higher
intensities, which leads to more faithful encoding of stimulus temporal features
(Lopez-Poveda,
2014; Lopez-Poveda & Barrios, 2013). On the other hand, increasing
levels may cause negative effects on the encoding of temporal cues. For example,
as discussed earlier, higher levels could reduce spectral resolution due to
larger current spread, which, in turn, may interfere with the processing of
temporal modulation cues (Oxenham & Kreft, 2014). In addition, the processing of speech
sounds by the CI sound processor may lead to partial loss or distortion of
speech cues, especially at higher levels. For instance, many CI programming
parameters (e.g., amplitude mapping, AGC, microphone sensitivity, input dynamic
range) may affect the transmission of temporal cues and other cues, such as
intensity.

These competing effects associated with stimulus level may affect younger
and older CI listeners differently, considering the age-related changes at the
level of the spiral ganglia or above. For example, the potential positive
effects of increasing levels (e.g., more responding auditory nerve fibers) may
be diminished in older CI users due to the loss of auditory nerve fibers with
aging. Specifically, the low spontaneous-rate fibers, which are important for
the encoding of temporal cues at higher intensities, may be more affected by
aging (Bharadwaj, Verhulst,
Shaheen, Liberman, & Shinn-Cunningham, 2014; Schmiedt, Mills, &
Boettcher, 1996).

It is also possible that the level dependency of age-related declines in word
identification based on temporal cues occurs because the three age groups (YCI,
MCI, and OCI) are mapped differently. While possible, this explanation does not
seem to adequately account for our findings for the following reasons. First, if the
three groups were mapped differently, we would expect to observe systematic differences
between the groups across levels. Instead, we found age-group performance
differences only at higher levels (≥75 dB; Figure 3). Furthermore, as shown in Table 2, the
proportion of participants who could hear the word dish at 25 or 35 dB did not
significantly differ between the YCI and OCI groups. Second, at the time of this
study, we were unaware of any empirical studies advocating for or implementing
age-customized CI fitting procedures. Third, if such an approach to mapping does
occur, it would have to be implemented across numerous clinicians and clinical
sites, as our CI listeners were recruited from across the DC-Baltimore
metropolitan area and the US. Rather, it would be more likely that clinicians
followed roughly similar procedures to map CI patients of different ages (Wolfe & Schafer,
2014), despite anecdotal evidence that lower stimulation rates could
be used in OCI listeners than are used in YCI listeners. Furthermore, both
analyses with presentation levels and sensation levels pointed to similar
patterns of age-related performance differences at higher levels (Figure 3 and Figure
S1).

To summarize, we posit that the level dependency of age-related declines in word
identification based on temporal cues in CI listeners may be attributed to
limits of the CI device in processing speech cues, declines in temporal processing
with aging, or the interaction between these two factors. Assuming that CI
listeners of different age groups are mapped similarly, it may be reasonable to
propose that age-related changes play an important role in the observed age
effects on word identification. Future studies are needed to elucidate the exact
mechanisms underlying changes in temporal processing of older CI listeners.
Data From Both Ears in Bilateral CI Users: How to Handle?
In Experiment 2, we collected data from both ears in 13 bilateral CI users.
Currently, there is no consensus on how to handle data from the
two ears of a single CI listener. Some studies averaged data from the two ears
and treated the data as from a single listener (e.g., Feng & Oxenham, 2018). This
approach seems reasonable if we assume that temporal processing is predominantly
affected by central factors that are common to both ears. Other studies treated
data from each ear as independent observations (e.g., Bierer & Litvak, 2016; Donaldson et al.,
2015). This approach seems more appropriate if there may also be
significant ear-specific contributions to temporal processing. We adopted the
latter approach based on the assumption that ear-specific factors (e.g.,
auditory nerve survival) may significantly contribute to the age-related
differences in temporal processing. A close inspection of data from these
bilateral CI users (see Online Appendix B) shows that there are indeed
between-ear differences in discriminating the words dish and ditch. Future
research may systematically evaluate between-ear differences in temporal
processing from bilateral CI users to better understand ear-specific peripheral
and central contributions to age-related temporal processing deficits.
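The two ways of handling bilateral data described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the listener labels, the per-ear crossover values, and the helper names average_across_ears and ears_as_observations are all hypothetical and are not taken from this study's data set or analysis code.

```python
from statistics import mean

# Hypothetical per-ear crossover points (ms of silence at the 50% point).
# S01 is a bilateral listener; S02 is a unilateral listener.
ear_data = [
    {"listener": "S01", "ear": "L", "crossover_ms": 28.0},
    {"listener": "S01", "ear": "R", "crossover_ms": 36.0},
    {"listener": "S02", "ear": "L", "crossover_ms": 41.0},
]

def average_across_ears(rows):
    """Approach 1: average the two ears, giving one value per listener."""
    by_listener = {}
    for row in rows:
        by_listener.setdefault(row["listener"], []).append(row["crossover_ms"])
    return {listener: mean(values) for listener, values in by_listener.items()}

def ears_as_observations(rows):
    """Approach 2: keep each ear as a separate observation."""
    return [(row["listener"], row["ear"], row["crossover_ms"]) for row in rows]

print(average_across_ears(ear_data))   # one entry per listener
print(ears_as_observations(ear_data))  # one entry per ear
```

Averaging removes the between-ear difference within a listener (28 vs. 36 ms for the hypothetical S01), whereas keeping each ear preserves it at the cost of contributing two observations from the same listener.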
Limitations of This Study and Future Recommendations
As discussed earlier, our manipulation of sound levels in CI users may not have been as
precise as intended, likely because of the variability introduced by the sound
processors of individual CI participants. Future studies may consider
the following approaches to more rigorously manipulate and examine the sound
level effects. First, the same (research) processor with similar programming
parameters may be used to deliver sounds to the CI across participants. This
could potentially minimize the variability across individual clinical
processors. Second, at least two methods may be used to match sound levels
across CI participants. Loudness judgments may be used to choose sound levels
that are rated as equally loud (as is standard in the field of CI
research; e.g., Bierer &
Litvak, 2016; Donaldson et al., 2015; Feng & Oxenham, 2018; Friesen et al., 2001;
Fu, 2002).
Electrophysiological measures may also be used to facilitate the matching of
sound levels across CI participants. For example, we can match individual levels
by adjusting them to elicit brainstem responses (wave V) with equal amplitudes
(Gordon, Abbasalipour,
& Papsin, 2016). Finally, it may be necessary to estimate the
electrode–neuron interface with available measures such as the electrically
evoked compound action potential and computerized tomography (DeVries, Scheperle, &
Bierer, 2016; Verbist, Frijns, Geleijns, & Van Buchem, 2005). This is because
the electrode–neuron interface is considered a significant contributor to
performance variability in CI listeners (Bierer, 2010; DeVries et al., 2016; Long et al., 2014) that
may mediate the sound level effects.

Indeed, the current approach of measuring temporal processing via clinical sound
processors with everyday settings may not be optimal. As mentioned earlier, many
CI device-related factors (e.g., amplitude mapping, AGC, microphone sensitivity,
input dynamic range) may impair the transmission of speech cues, especially at
varying signal levels. Those device-related factors were not systematically
manipulated in this study. Hence, the limitations of CI processors in preserving
temporal cues in speech may obscure the genuine temporal processing limits of
older CI listeners. Many previous studies have adopted single- or multi-channel
direct stimulation approaches to assess temporal processing abilities, which
have the potential advantage of bypassing the front-end processing that may interact with
temporal cue perception. Further research may utilize both approaches (clinical
sound processors and direct stimulation) to provide converging evidence
regarding limitations in temporal processing associated with aging in CI
listeners.

This study demonstrated age-related temporal processing deficits with only one
temporally based speech contrast (i.e., dish–ditch continuum). Clearly, this may
limit the generalizability of our findings to other speech contrasts based on
temporal cues. Nevertheless, our study and other studies have utilized the
dish–ditch continuum and revealed age-related declines in temporal processing
across diverse populations (NH, HI, and CI listeners; Gordon-Salant et al., 2006, 2008; Goupell et al., 2017;
Roque et al.,
2019). This suggests that this temporal contrast may represent a
highly sensitive paradigm to reveal age-related temporal processing differences.
Here, the dish–ditch continuum was presented as isolated words. Prior work
suggests that the ability to process temporal cues may change when temporal
contrasts are presented in sentential contexts (Gordon-Salant et al., 2008). Given that
temporally based contrasts are typically embedded in sentences during natural
speech processing, future studies could use the dish–ditch contrast embedded in
sentences to reexamine the effects of aging and presentation levels on temporal
processing in CI users.
Implications for Programming in Adult CI Users
For CI programming, an important parameter is stimulus level. Our results suggest
a potential interaction between stimulus level and age to affect word
identification based on temporal cues such as silence duration, such that word
identification performance may be diminished for older CI users at higher
levels. This interaction may be attributed to limitations of the CI device in
preserving temporal cues, age-related changes in temporal processing, or the
combination of these factors. Therefore, clinical CI fitting procedures may need
to include age as a potential variable. Strategies to manipulate CI processor
parameters related to stimulus level (e.g., amplitude mapping function, AGC time
constants and threshold, microphone sensitivity, input dynamic range, electrode
thresholds and comfortable levels) may need to take into account the
preservation of temporal cues.
Conclusions
Older NH and CI listeners, relative to their younger counterparts, appear to exhibit
reduced ability to utilize brief temporal cues in word identification. Importantly,
such age-related performance differences appear to be independent of presentation
level for NH listeners but may emerge only at high presentation levels for CI
listeners. These results suggest that clinicians may want to consider age-specific
CI device settings to improve speech understanding in older CI listeners.

Supplemental material, TIA886688 Supplemental Material, for Age-Related Temporal
Processing Deficits in Word Segments in Adult Cochlear-Implant Users by Zilong
Xie, Casey R. Gaskins, Maureen J. Shader, Sandra Gordon-Salant, Samira Anderson
and Matthew J. Goupell in Trends in Hearing