K Cieśla1,2, T Wolak3, A Lorens3, M Mentzel4, H Skarżyński3, A Amedi4.
Abstract
Understanding speech in background noise is challenging, and wearing face masks, as imposed during the COVID-19 pandemic, makes it even harder. We developed a multisensory setup including a sensory substitution device (SSD) that can deliver speech simultaneously through audition and as vibrations on the fingertips. The vibrations correspond to low frequencies extracted from the speech input. We trained two groups of non-native English speakers to understand distorted speech in noise. After a short session (30–45 min) of repeating sentences, with or without concurrent matching vibrations, both groups showed a comparable mean improvement of 14–16 dB in Speech Reception Threshold (SRT) in two test conditions: when participants repeated sentences from hearing alone, and when matching vibrations on the fingertips were also present. This is a very strong effect, considering that a 10 dB difference corresponds to a doubling of perceived loudness. The number of sentence repetitions needed to complete either type of training was comparable. However, the mean group SNR during audio-tactile training (14.7 ± 8.7 dB) was significantly lower (i.e., harder) than during auditory-only training (23.9 ± 11.8 dB), which indicates a potential facilitating effect of the added vibrations. In addition, both before and after training, most participants (70–80%) understood speech in noise better (by 4–6 dB on average) when the audio sentences were accompanied by matching vibrations. This multisensory benefit is of the same magnitude as the one we reported, with no training at all, in our previous study using the same experimental procedures. After training, performance was also best in this test condition in both groups (SRT ≈ 2 dB). Both types of training had the least effect in the third test condition, in which participants repeated sentences accompanied by non-matching tactile vibrations; performance in this condition was also poorest after training. The results indicate that both types of training may remove some level of difficulty in sound perception, which might enable more effective use of speech inputs delivered via vibrotactile stimulation. We discuss the implications of these novel findings with respect to basic science. In particular, we show that even in adulthood, i.e., long after the classical "critical periods" of development have passed, a new pairing between a certain computation (here, speech processing) and an atypical sensory modality (here, touch) can be established and trained, and that this process can be rapid and intuitive. We further present possible applications of our training program and the SSD for auditory rehabilitation in patients with hearing (and sight) deficits, as well as for healthy individuals in suboptimal acoustic conditions.
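The abstract states that the SSD presents low frequencies extracted from the speech input as fingertip vibrations. As a minimal sketch of that preprocessing step, the code below low-pass filters a signal so that only low-frequency content suitable for fingertip stimulation remains; the 500 Hz cutoff, filter order, and function name are illustrative assumptions, not the parameters used in the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass_for_vibrotactile(audio, fs, cutoff_hz=500.0, order=4):
    """Low-pass filter a speech signal, keeping only the low
    frequencies that can plausibly be rendered as fingertip
    vibration. Cutoff and order are illustrative assumptions."""
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, audio)  # zero-phase filtering

# Synthetic test signal: a 200 Hz "voiced" component plus a
# 3 kHz component standing in for high-frequency speech energy.
fs = 16000
t = np.arange(0, 1.0, 1.0 / fs)
signal = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 3000 * t)
filtered = lowpass_for_vibrotactile(signal, fs)

def band_power(x, fs, f0, bw=50.0):
    """Spectral power within +/- bw Hz of frequency f0."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    return spec[(freqs > f0 - bw) & (freqs < f0 + bw)].sum()

p_low = band_power(filtered, fs, 200.0)    # should survive the filter
p_high = band_power(filtered, fs, 3000.0)  # should be strongly attenuated
```

After filtering, essentially all remaining energy sits in the 200 Hz band, while the 3 kHz component is attenuated by many orders of magnitude.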
Year: 2022 PMID: 35217676 PMCID: PMC8881456 DOI: 10.1038/s41598-022-06855-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Group composition and the English language background.
| | Group 1 & group 2 | Group 1: trained with audio-tactile input | Group 2: trained with audio-only input | Between-group comparisons |
|---|---|---|---|---|
| N | 40 | 20 | 20 | N.A. |
| Age | M = 26.67, SD = 3.48 | M = 27.8, SD = 3.53 | M = 25.55, SD = 3.11 | F(1,38) = 0.01, p = 0.897 |
| Female:male | 21:19 | 10:10 | 9:11 | χ2 = 0.05, p = 0.752 |
| Age of initial English language acquisition ("At what age did you start learning English?") | M = 6.85, SD = 2.53 | M = 6.95, SD = 2.78 | M = 6.75, SD = 2.33 | F(1,38) = 0.26, p = 0.612 |
| Years of using/being exposed to the English language | M = 19.82, SD = 4.3 | M = 20.84, SD = 4.1 | M = 18.8, SD = 4.34 | F(1,38) = 0.01, p = 0.911 |
| Self-rated English proficiency (scale 1–5) ("How would you rate your English abilities in …?") | (a) Reading: M = 4.43, SD = 0.59 (b) Writing: M = 4.03, SD = 0.66 (c) Speaking: M = 4.35, SD = 0.66 (d) Understanding from hearing: M = 4.45, SD = 0.67 | (a) Reading: M = 4.45, SD = 0.68 (b) Writing: M = 3.9, SD = 0.85 (c) Speaking: M = 4.35, SD = 0.74 (d) Understanding from hearing: M = 4.45, SD = 0.68 | (a) Reading: M = 4.4, SD = 0.5 (b) Writing: M = 4.15, SD = 0.36 (c) Speaking: M = 4.35, SD = 0.58 (d) Understanding from hearing: M = 4.45, SD = 0.86 | (a) Reading: F(1,38) = 3.26, p = 0.079 (b) Writing: F(1,38) = 8.21, p = 0.07 (c) Speaking: F(1,38) = 1.98, p = 0.16 (d) Understanding from hearing: F(1,38) = 0.00, p = 1.000 |
| Every-day communication in English ("When communicating with others, what percentage of the time, during a typical day, do you communicate in English?") | M = 21.78, SD = 21.61 | M = 24.4, SD = 22.5 | M = 19.15, SD = 20.9 | F(1,38) = 0.58, p = 0.45 |
Figure 1. (A) The vibrating interface of the SSD and (B) the MATLAB GUI.
Figure 2. Timeline of the experiment. AT, audio-tactile; SRT, an individually established speech reception threshold that is then used throughout the training session.
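The caption above refers to an individually established SRT that is reused throughout training. SRTs for speech in noise are commonly estimated with an adaptive up-down procedure that converges on the SNR yielding about 50% correct sentence repetition. Below is a hedged sketch of such a 1-up/1-down staircase; the starting SNR, 2 dB step size, reversal count, and the simulated listener are all illustrative assumptions, not the study's actual protocol.

```python
def estimate_srt(respond, start_snr=25.0, step=2.0, n_reversals=8):
    """Adaptive 1-up/1-down staircase: lower the SNR after a correct
    sentence repetition, raise it after an incorrect one, and average
    the SNR at the last few reversals as the SRT estimate.
    respond(snr) -> True if the sentence was repeated correctly."""
    snr = start_snr
    prev_correct = None
    reversals = []
    while len(reversals) < n_reversals:
        correct = respond(snr)
        if prev_correct is not None and correct != prev_correct:
            reversals.append(snr)  # direction changed: record a reversal
        snr += -step if correct else step
        prev_correct = correct
    tail = reversals[n_reversals // 2:]  # discard early reversals
    return sum(tail) / len(tail)

# Hypothetical deterministic listener who repeats sentences
# correctly whenever the SNR is at least 5 dB.
srt_estimate = estimate_srt(lambda snr: snr >= 5.0)
```

With this idealized listener, the staircase settles into oscillation around the 5 dB threshold, and the averaged reversals land close to it.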
SRT values in subsequent speech-understanding tests in both groups together and in each group separately.
| Session | 1 (before training) | | | 2 (after training) | | |
|---|---|---|---|---|---|---|
| Test | Audio only | Audio-tactile matching | Audio-tactile non-matching | Audio only | Audio-tactile matching | Audio-tactile non-matching |
| Both groups together | M = 22.96 (SD = 10.98) | M = 16.8 (SD = 9.15) | M = 16.67 (SD = 9.3) | M = 6.47 (SD = 6.9) | M = 2.09 (SD = 6) | M = 10.16 (SD = 8.7) |
| Group 1 (audio-tactile training) | M = 21.46 (SD = 10.68) | M = 14.66 (SD = 8.86) | M = 14.6 (SD = 8.96) | M = 6.71 (SD = 7.96) | M = 1.89 (SD = 6.28) | M = 10.44 (SD = 8.62) |
| Group 2 (audio-only training) | M = 24.44 (SD = 11.34) | M = 18.96 (SD = 9.31) | M = 18.73 (SD = 9.42) | M = 6.22 (SD = 5.85) | M = 2.28 (SD = 5.89) | M = 9.88 (SD = 9.02) |
M mean, SD standard deviation.
Figure 3. Speech reception thresholds in the three test conditions before and after training, in each group separately; bars correspond to standard errors of the mean (p values Bonferroni-corrected; *p < 0.017, **p < 0.003, ***p < 0.0003).
Figure 4. Speech reception thresholds in the two sessions separately; bars correspond to standard errors of the mean (p values Bonferroni-corrected; *p < 0.017, **p < 0.003, ***p < 0.0003).
Figure 5. Scatterplots showing a positive relationship between SRT values obtained in the three tests before training and the amount of improvement in each of them; shading corresponds to a 95% confidence interval.