Anna R Tinnemore, Sandra Gordon-Salant, Matthew J Goupell.
Abstract
Speech recognition in complex environments involves focusing on the most relevant speech signal while ignoring distractions. Difficulties can arise due to the incoming signal's characteristics (e.g., accented pronunciation, background noise, distortion) or the listener's characteristics (e.g., hearing loss, advancing age, cognitive abilities). Listeners who use cochlear implants (CIs) must overcome these difficulties while listening to an impoverished version of the signals available to listeners with normal hearing (NH). In the real world, listeners often attempt tasks concurrent with, but unrelated to, speech recognition. This study sought to reveal the effects of visual distraction and performing a simultaneous visual task on audiovisual speech recognition. Two groups, those with CIs and those with NH listening to vocoded speech, were presented videos of unaccented and accented talkers with and without visual distractions, and with a secondary task. It was hypothesized that, compared with those with NH, listeners with CIs would be less influenced by visual distraction or a secondary visual task because their prolonged reliance on visual cues to aid auditory perception improves the ability to suppress irrelevant information. Results showed that visual distractions alone did not significantly decrease speech recognition performance for either group, but adding a secondary task did. Speech recognition was significantly poorer for accented compared with unaccented speech, and this difference was greater for CI listeners. These results suggest that speech recognition performance is likely more dependent on incoming signal characteristics than a difference in adaptive strategies for managing distractions between those who listen with and without a CI.
Keywords: attention; cochlear implants; perceptual masking; speech perception; visual perception
Year: 2020 PMID: 33054620 PMCID: PMC7575283 DOI: 10.1177/2331216520960601
Source DB: PubMed Journal: Trends Hear ISSN: 2331-2165 Impact factor: 3.293
Demographic Information for Paired Sets of Listeners in the Two Listener Groups Including Age and Baseline Speech Recognition, Duration of Deafness (CI Group), Device Configuration (CI Group), and Number of Vocoder Channels for Matched Performance (NH Group).
| Listener number | Group | Age (years) | Baseline audio-only speech recognition (%) | Duration of deafness (years) | Device configuration |
|---|---|---|---|---|---|
| 1 | CI | 70 | 72 | R: 1 | Unilateral |
| 2 | CI | 73 | 66 | R: <1, L: 22 | Bilateral |
| 3 | CI | 60 | 52 | R: 12, L: 4 | Bilateral |
| 4 | CI | 61 | 42 | R: 11, L: 11 | Bilateral |
| 5 | CI | 54 | 94 | R: 3, L: 6 | Bilateral |
| 6 | CI | 50 | 42 | R: 8, L: 4 | Bilateral |
| 7 | CI | 50 | 56 | R: 20, L: 24 | Bilateral |
| 8 | CI | 76 | 72 | R: 19, L: 17 | Bilateral |
| 9 | CI | 39 | 70 | L: 8 | Unilateral |
| 10 | CI | 21 | 82 | R: 2, L: 3 | Bilateral |
| 11 | CI | 78 | 90 | R: 7, L: 3 | Bilateral |
| 12 | CI | 43 | 40 | L: 37 | Unilateral |
| 13 | CI | 55 | 70 | R: 7 | Unilateral |
| 14 | CI | 56 | 46 | R: 4, L: 2 | Bilateral |
| 15 | CI | 37 | 98 | R: 3, L: 2 | Bilateral |
| 16 | CI | 67 | 86 | R: 8, L: 2 | Bilateral |
| 17 | CI | 70 | 52 | R: 3, L: 1 | Bilateral |
| 18 | CI | 72 | 60 | R: 7, L: 2 | Bilateral |
| 19 | CI | 61 | 88 | R: <1, L: 5 | Bilateral |
| 20 | CI | 50 | 68 | R: 1, L: 5 | Bilateral |

| Listener number | Group | Age (years) | Baseline audio-only speech recognition (%) | Vocoder channels used | Change in vocoded speech recognition (post–pre) (%) |
|---|---|---|---|---|---|
| 1 | NH | 72 | 78 | 6 | 10 |
| 2 | NH | 71 | 68 | 7 | 12 |
| 3 | NH | 60 | 66 | 4 | 2 |
| 4 | NH | 62 | 56 | 5 | –10 |
| 5 | NH | 55 | 92 | 10 | 8 |
| 6 | NH | 52 | 40 | 3 | 8 |
| 7 | NH | 49 | 48 | 4 | –2 |
| 8 | NH | 74 | 66 | 6 | 18 |
| 9 | NH | 40 | 70 | 3 | –6 |
| 10 | NH | 21 | 78 | 7 | 8 |
| 11 | NH | 77 | 94 | 8 | –12 |
| 12 | NH | 44 | 44 | 3 | –8 |
| 13 | NH | 55 | 76 | 4 | –24 |
| 14 | NH | 57 | 42 | 3 | –10 |
| 15 | NH | 37 | 90 | 8 | –4 |
| 16 | NH | 66 | 84 | 8 | 2 |
| 17 | NH | 68 | 42 | 3 | –10 |
| 18 | NH | 70 | 68 | 5 | –4 |
| 19 | NH | 62 | 76 | 7 | 16 |
| 20 | NH | 51 | 60 | 4 | 0 |
Note. CI = cochlear implant; NH = normal hearing.
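The pairing in the demographic table can be checked with a little arithmetic. The sketch below assumes that listener number N in the CI group is paired with listener number N in the NH group (the caption describes "paired sets", but the pairing rule itself is an assumption here). The values are copied from the table; the names `paired_differences` and `mean` are illustrative helpers, not part of the study's analysis.

```python
# Ages (years) and baseline audio-only speech recognition (%) copied from the
# demographic table, ordered by listener number 1-20.
# ASSUMPTION: CI listener N is the matched partner of NH listener N.
AGES_CI = [70, 73, 60, 61, 54, 50, 50, 76, 39, 21,
           78, 43, 55, 56, 37, 67, 70, 72, 61, 50]
AGES_NH = [72, 71, 60, 62, 55, 52, 49, 74, 40, 21,
           77, 44, 55, 57, 37, 66, 68, 70, 62, 51]
BASE_CI = [72, 66, 52, 42, 94, 42, 56, 72, 70, 82,
           90, 40, 70, 46, 98, 86, 52, 60, 88, 68]
BASE_NH = [78, 68, 66, 56, 92, 40, 48, 66, 70, 78,
           94, 44, 76, 42, 90, 84, 42, 68, 76, 60]


def paired_differences(nh, ci):
    """Return NH minus CI for each assumed pair."""
    return [n - c for n, c in zip(nh, ci)]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


age_diff = paired_differences(AGES_NH, AGES_CI)
base_diff = paired_differences(BASE_NH, BASE_CI)

if __name__ == "__main__":
    print("Largest age gap within a pair (years):",
          max(abs(d) for d in age_diff))
    print("Mean age difference, NH - CI (years):", mean(age_diff))
    print("Largest baseline gap within a pair (points):",
          max(abs(d) for d in base_diff))
    print("Mean baseline difference, NH - CI (points):", mean(base_diff))
```

Summaries like these only describe how closely the two groups line up on age and baseline score; they say nothing about how the vocoder channel counts were chosen for each NH listener.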
Generalized Linear Mixed-Effects Model Describing the Effects of Experimental Variables and Other Predictors on Speech Recognition Performance.
| Fixed effects (outcome: speech recognition) | Estimate | SE | z | p |
|---|---|---|---|---|
| (Intercept) | 0.86 | 0.23 | 3.66 | <.001*** |
| TalkerAccented | –1.75 | 0.23 | –7.46 | <.001*** |
| ConditionAV+D | 0.05 | 0.11 | 0.43 | .669 |
| ConditionAV+ST | –0.85 | 0.12 | –7.11 | <.001*** |
| GroupCI | 0.42 | 0.27 | 1.54 | .123 |
| Baseline speech recognition | 0.56 | 0.12 | 4.52 | <.001*** |
| Age | –0.08 | 0.12 | –0.67 | .502 |
| Executive control | 0.35 | 0.13 | 2.75 | .006** |
| TalkerAccented:ConditionAV+D | 0.06 | 0.16 | 0.39 | .699 |
| TalkerAccented:ConditionAV+ST | 0.46 | 0.17 | 2.76 | .006** |
| TalkerAccented:GroupCI | –0.46 | 0.17 | –2.76 | .006** |
| ConditionAV+D:Baseline speech recognition | 0.05 | 0.06 | 0.88 | .382 |
| ConditionAV+ST:Baseline speech recognition | –0.16 | 0.08 | –2.12 | .034* |
Note. Significant fixed effects are marked with asterisks, with p values generated by Wald z scores. The intercept estimate represents the predicted speech recognition performance of the NH vocoder group in the unaccented, AV-only condition. The baseline speech recognition, executive control, and age predictors were standardized and centered prior to model fitting. CI = cochlear implant; AV+D = audiovisual plus visual distractions; AV+ST = audiovisual plus visual secondary task; SD = standard deviation.
Significance codes: *** p < .001; ** p < .01; * p < .05.
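As a rough illustration of how the table's columns relate, the sketch below recomputes a Wald z score as estimate divided by standard error and converts z to a two-sided p value under a standard normal distribution. It rests on two assumptions that the record does not state: that the column after "Estimate" is the standard error, and that the model uses a logit link, so that `inv_logit` turns an estimate into a predicted proportion. The function names are illustrative only.

```python
import math

# (estimate, standard error) pairs copied from the fixed-effects table.
# ASSUMPTION: the column following "Estimate" is the standard error.
FIXED_EFFECTS = {
    "(Intercept)": (0.86, 0.23),
    "TalkerAccented": (-1.75, 0.23),
    "ConditionAV+ST": (-0.85, 0.12),
    "GroupCI": (0.42, 0.27),
}


def wald_z(estimate, se):
    """Wald z score: the estimate expressed in standard-error units."""
    return estimate / se


def two_sided_p(z):
    """Two-sided p value for z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def inv_logit(x):
    """Map a log-odds value to a proportion (ASSUMES a logit link)."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    for name, (est, se) in FIXED_EFFECTS.items():
        z = wald_z(est, se)
        print(f"{name}: z = {z:.2f}, p = {two_sided_p(z):.3f}")

    # Under the logit-link assumption: NH group, AV-only condition,
    # covariates at their centered value of zero.
    intercept = FIXED_EFFECTS["(Intercept)"][0]
    accent = FIXED_EFFECTS["TalkerAccented"][0]
    print(f"Unaccented talker: {inv_logit(intercept):.2f}")
    print(f"Accented talker:   {inv_logit(intercept + accent):.2f}")
```

Because the table reports estimates and standard errors rounded to two decimals, z scores recomputed this way will differ slightly from the printed ones. The predicted proportions are meaningful only if the logit-link assumption holds.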
Figure 1. Speech Recognition Performance for CI and NH (Vocoded) Listeners. Speech recognition performance in percent correct is shown for each of the three conditions and each of the two talkers. Error bars represent ±1 standard error.
CI = cochlear implant; NH = normal hearing; AV = audiovisual; ST = secondary task; AV+D = audiovisual plus visual distractions; AV+ST = audiovisual plus visual secondary task.
Figure 2. ST Performance. Accuracy, in percent correct, on the ST is shown when performed in isolation or while also repeating sentences spoken by an unaccented or an accented talker. Error bars represent ±1 standard error.
CI = cochlear implant; NH = normal hearing; ST = secondary task.