
How Face Masks Interfere With Speech Understanding of Normal-Hearing Individuals: Vision Makes the Difference.

Rasmus Sönnichsen¹, Gerard Llorach Tó², Sabine Hochmuth¹, Volker Hohmann²,³,⁴, Andreas Radeloff¹,³,⁴.

Abstract

OBJECTIVE: To investigate the effects of wearing a simulated mask on speech perception of normal-hearing subjects.
STUDY DESIGN: Prospective cohort study.
SETTING: University hospital.
PATIENTS: Fifteen normal-hearing, native German speakers (8 female, 7 male).
INTERVENTION: Different experimental conditions with and without simulated face masks using the audiovisual version of the female German Matrix test (Oldenburger Satztest, OLSA).
MAIN OUTCOME MEASURES: Signal-to-noise ratio (SNR) at speech intelligibility of 80%.
RESULTS: The SNR at which 80% speech intelligibility was achieved deteriorated by a mean of 4.1 dB SNR when simulating a medical mask and by 5.1 dB SNR when simulating a cloth mask in comparison to the audiovisual condition without mask. Interestingly, the contribution of the visual component alone was 2.6 dB SNR and thus had a larger effect than the acoustic component in the medical mask condition.
CONCLUSIONS: As expected, speech understanding with face masks was significantly worse than under control conditions. Thus, the speaker's use of face masks leads to a significant deterioration of speech understanding by the normal-hearing listener. The data suggest that these effects may play a role in many everyday situations that typically involve noise.


Year: 2022 PMID: 34999618 PMCID: PMC8843397 DOI: 10.1097/MAO.0000000000003458

Source DB: PubMed Journal: Otol Neurotol ISSN: 1531-7129 Impact factor: 2.311

INTRODUCTION

Since December 2019, the COVID-19 pandemic, which originated in Wuhan, China, has spread all over the world (1). After a reduction in new COVID-19 cases over the summer of 2020, several waves have hit many countries since fall 2020 (2,3). With growing knowledge of the new coronavirus and its routes of transmission, wearing a face mask and additional personal protective gear has been shown to reduce the risk of infection (4). In many Asian countries, factors such as air pollution and earlier virus outbreaks had made face masks part of everyday life, to protect oneself and others, even before the COVID-19 pandemic (5–7). This practice has now been adopted worldwide, and wearing a face mask on many occasions has become part of the daily routine. Nevertheless, wearing face masks also has disadvantages. Besides a certain discomfort and minor potential health risks of long-term wearing (e.g., de novo headaches) (8), face masks strongly interfere with communication. Acoustic attenuation (9–12), loss of facial expressions (13), and altered speech memory (14) have been identified as relevant factors. Moreover, the loss of the ability to speechread, that is, to understand speech by using visual cues from the talker, is of great importance (15).

Most of the established speech intelligibility tests use audio-only stimuli to examine the auditory system. A clinically well-established speech intelligibility test is the matrix sentence test, which is available in different languages (16). Recently, an audiovisual version has been introduced that incorporates a speechreading aspect into the German matrix test (Oldenburger Satztest, OLSA) (17). To this end, the original audio-only OLSA was supplemented with video content of the speaker (Supplemental Figure S1). This is a valuable and necessary addition, since it is well known that hearing-impaired individuals are especially reliant on speechreading (18–20).
The importance of this aspect is underlined by the fact that several countries (e.g., GB (21), AUS (22), CAN (23)) have adopted rules for face mask wearing when communicating with hearing-impaired individuals, which include the possibility of removing the mask as long as certain hygiene measures are followed. In everyday life, however, even normal-hearing subjects have difficulties understanding interlocutors who wear a face mask. So far, the acoustic attenuation of the masks has been considered the main mechanism behind this phenomenon (10–12). To understand the repercussions of wearing a mask during communication, we tested speech reception of normal-hearing participants in five different conditions: two control conditions (audio-only and audiovisual), an audiovisual condition with simulated mask and unaltered audio, and two audiovisual conditions with a simulated mask and filtered audio (medical and cloth mask). We show here that the majority of normal-hearing individuals can use speechreading for speech comprehension and that its absence is a major factor in the worsened speech comprehension when listening to individuals wearing face masks.

MATERIALS AND METHODS

The experimental protocol was approved by the institutional review board (Medizinische Ethikkommission) of the University of Oldenburg and all experiments were performed in accordance with the relevant guidelines and regulations. Informed consent was obtained from all participants. Informed consent for publication of videos and video stills was obtained from the female speaker of the audiovisual German Matrix test.

Fifteen normal-hearing, native German speakers (8 female, 7 male) aged between 22 and 42 years (mean age: 30.6 years) participated in the study. Clinical standard audiometric tests (pure-tone thresholds, digits in quiet, speech intelligibility in noise) were performed. Pure-tone averages at 0.5, 1, 2, and 4 kHz (PTA4) did not exceed 10 dB HL in either ear for any participant. Thresholds of 50% intelligibility for digits in quiet, measured with the Freiburg digit test, were 0 dB HL or better. Speech reception thresholds of 50% intelligibility in noise, measured with the male Oldenburg sentence test (24), ranged between −5.6 and −8.6 dB SNR with an average of −6.8 ± 0.9 dB SNR.

The audiovisual version of the female German Matrix test was used as previously described by Llorach et al. (17). This version uses the audio material of the female German Matrix test (25,26) and video recordings of the talker's head (for details see Llorach et al. (17)). An audiovisual mask condition was simulated by superimposing a mask-shaped object over the talker's mouth (Supplemental Figure S1). In addition, the audio signal was filtered according to the attenuation patterns of a handmade two-layer cloth mask of cotton fabric and a medical mask type IIR (EN14683). The filter parameters were determined as follows: A female speaker was recorded speaking 30 sentences under each of the three conditions (without mask, cloth mask, medical mask type IIR).
The recorded sentences covered the complete base word matrix of the German matrix test three times; they were cut sentence by sentence and equalized in RMS level. The filter was built from the difference in third-octave frequency spectra between speech produced uncovered and speech produced with the respective mask type. The average spectral differences of the two masks relative to the no-mask condition of the recorded talker are shown in Figure 1, including the SD on sentence level. The spectral differences of the cloth mask in this study were very similar to those described by Corey et al. (11) (see reprinted curve of Corey et al. in Fig. 1). The spectral attenuation of the medical mask (type II) in Corey et al. (11) was slightly lower at the higher frequencies than that observed in the current study. This might be due to the different types of medical masks used (type IIR in the current study versus type II in the study of Corey et al.).
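The band-wise comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the band range (125 Hz to 8 kHz), the function names, and all level values are hypothetical placeholders, not the measured data or the filter actually used in the study.

```python
def third_octave_centres(n_min=-9, n_max=9):
    """Base-2 third-octave centre frequencies around 1 kHz (Hz).

    With the default (assumed) range this spans 125 Hz to 8 kHz.
    """
    return [1000.0 * 2.0 ** (n / 3.0) for n in range(n_min, n_max + 1)]


def attenuation_db(levels_open_db, levels_mask_db):
    """Per-band level difference; positive values mean the mask attenuates."""
    if len(levels_open_db) != len(levels_mask_db):
        raise ValueError("both spectra need the same number of bands")
    return [o - m for o, m in zip(levels_open_db, levels_mask_db)]


centres = third_octave_centres()

# Placeholder band levels (dB), NOT measured data: flat uncovered speech,
# masked speech losing 1 dB more per band above roughly 2.5 kHz.
open_levels = [60.0] * len(centres)
mask_levels = [60.0 - max(0, i - 13) for i in range(len(centres))]

attenuation = attenuation_db(open_levels, mask_levels)

# A filter that imitates the mask would apply the negative attenuation
# as a gain in each band.
filter_gains_db = [-a for a in attenuation]

for f, g in zip(centres, filter_gains_db):
    print(f"{f:7.0f} Hz: {g:+.1f} dB")
```

In practice the band levels would come from averaging the third-octave spectra of the RMS-equalized sentences for each condition.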

FIG. 1

Third-octave spectral difference of speech spoken with medical (dashed line) and cloth (dotted line) mask compared to uncovered speech (black line). Shaded areas refer to standard deviation of spectral differences on single sentence level (blue: cloth mask, red: medical mask). For comparison, spectral differences of comparable masks from Corey et al. (2020) are replotted (stars: two-layer cotton mask, pentagrams: medical mask).

The listener was seated in a sound-treated examination room in front of a loudspeaker (8030C studio monitor, Genelec, Iisalmi, Finland) and a 23.8” screen (P2419H, DELL GmbH, Frankfurt, Germany). Screen and loudspeaker were placed 80 cm in front of the seated participant. The size of the head on the screen matched the size of a real head at a distance of 1.3 m, representing a general communication distance. The height of the loudspeaker was adjusted to the height of the ears of an average listener. The experiments were programmed in Matlab 2018 (The MathWorks Inc., Natick, Massachusetts, USA), and the videos were played back using VLC media player 3.0.3 (videolan.org, General Public License). The acoustic signal was directed through a sound card (Fireface uc, RME Audio AG, Haimhausen, Germany) to the loudspeaker. Acoustic signals were calibrated to a level of 85 dB SPL using a level meter (322A, PCE Deutschland GmbH, Meschede, Germany) placed at the listener's head position. The video was calibrated for synchrony using an external camera as described by Llorach et al. (17). Participants were instructed using an instruction sheet. The sentences were presented in the stationary test-specific noise. The noise level was fixed at 65 dB SPL. The presentation level of the speech started at 60 dB SPL and was adjusted after each sentence according to the participant's response so as to converge on 80% intelligibility.
The SRT80% was determined instead of the more usual SRT50%, since some individuals are capable of understanding 50% correctly by speechreading only (i.e., independently of the acoustic signal), which would lead to an undeterminable SNR. For each condition, 20 sentences were presented in open-set response format, that is, participants were asked to repeat the words they had understood; guessing was permitted. The number of correct words was then scored by the investigator for each sentence. Participants were trained with two lists of 20 sentences in the audiovisual condition. Afterwards, the following five conditions were measured in random order:

- Audio only
- Audiovisual
- Audiovisual with simulated mask
- Audiovisual with simulated mask and cloth mask audio
- Audiovisual with simulated mask and medical mask audio

A total of 140 sentences were played to each participant: 40 sentences for the training and 100 sentences for the tests.

For statistical analysis, the data were tested for normality with the Kolmogorov–Smirnov test. If the test did not indicate a deviation from normality, significance was tested by one-way ANOVA with Tukey's test for correction for multiple testing, and the data were plotted as mean with standard deviation (SD). If the normality test failed (data of test lists, Fig. 5), Friedman's test with Dunn's correction for multiple testing was used and the data were plotted as median with range. A p value of 0.05 or less was considered statistically significant. Statistical analysis was performed with Prism 9 (GraphPad Software, San Diego, CA).
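The idea of adjusting the speech level after every sentence to home in on 80% intelligibility can be sketched as below. This is a deliberately simplified illustration, not the adaptive rule implemented in the matrix test software: the update formula, the constant `GAIN_DB`, and the example responses are assumptions made for this sketch. Only the noise level, the starting speech level, and the 80% target are taken from the text; the five words per sentence reflect the matrix test format.

```python
NOISE_LEVEL_DB = 65.0          # fixed noise level from the text (dB SPL)
START_SPEECH_LEVEL_DB = 60.0   # starting speech level from the text (dB SPL)
TARGET_FRACTION = 0.8          # tracked intelligibility (SRT80%)
WORDS_PER_SENTENCE = 5         # matrix-test sentences consist of five words
GAIN_DB = 5.0                  # hypothetical step scaling, not from the study


def next_speech_level(level_db, words_correct):
    """Lower the level after better-than-target sentences, raise it otherwise."""
    fraction = words_correct / WORDS_PER_SENTENCE
    return level_db - GAIN_DB * (fraction - TARGET_FRACTION)


def run_track(words_correct_per_sentence):
    """Return the SNR (dB) at which each sentence was presented."""
    level_db = START_SPEECH_LEVEL_DB
    snrs = []
    for words_correct in words_correct_per_sentence:
        snrs.append(level_db - NOISE_LEVEL_DB)
        level_db = next_speech_level(level_db, words_correct)
    return snrs


# Made-up responses for four sentences (words repeated correctly out of five)
track = run_track([5, 4, 2, 5])
print([round(snr, 1) for snr in track])
```

With a rule of this kind the level stays put when exactly four of five words are repeated correctly, so the track settles around the SNR that yields 80% word recognition.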

FIG. 5

Training effect of audiovisual speech reception threshold at 80% word recognition. A significant training effect was found between the first and the second run (Training) but not between the second and third (Trial). Data shown as median and range. ∗∗p < 0.01; ns: not statistically significant.

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

RESULTS

Acoustic Attenuation of Two Types of Masks

Medical and cloth masks led to an acoustic attenuation of the voice predominantly in the middle and high frequencies (Fig. 1). The cloth mask had a detectable effect of more than 1 dB beginning from 1.5 kHz upward with a maximum of about 8 dB attenuation at around 8 kHz (blue line in Fig. 1). The medical mask had more favorable acoustic properties with a detectable effect above 2.5 kHz and a maximum attenuation of about 6 dB at around 8 kHz (red line in Fig. 1).

Speechreading With and Without Mask

There was a large improvement in speech perception of the normal-hearing subjects in this study when visual cues provided by the mouth region were available (Fig. 2): In the audio-only condition the average speech reception threshold at 80% word recognition (SRT80%) was −6.9 dB SNR (SD 1.0 dB). This indicates that 80% of the test words are correctly understood when the noise signal is 6.9 dB louder than the speech signal. In the audiovisual condition (face visible, unaltered audio), however, the average SRT80% was −9.4 dB SNR (SD 1.6 dB) indicating a statistically significant benefit of the visual cues of 2.5 dB (SD 1.5 dB, p < 0.001). In the audiovisual condition with visual mask (mouth region not visible, but unaltered audio) the SRT80% was almost equal to the audio-only condition with −6.8 dB SNR (SD 1.1 dB, p = 0.993; difference not significant). When the acoustic attenuation of the masks was added to the aforementioned condition, speech recognition further deteriorated: with the acoustic filter of the medical mask the SRT80% was −5.3 dB SNR (SD 0.9 dB) and with the acoustic filter of the cloth mask the SRT80% was −4.3 dB SNR (SD 1.0 dB).

FIG. 2

Speech perception given as speech reception threshold at 80% word recognition (SRT80%) in dB signal-to-noise ratio. Negative values indicate that the sound pressure level of the noise was higher than that of the speech signal. Individual values and mean are shown. A, audio only; AV, audiovisual; ∗∗p < 0.01; ns: not statistically significant.

Visual and Acoustic Effects of the Mask

The audiovisual condition with visual mask and unaltered audio was used as the “baseline” condition in Figure 3. Removing the mask improved the SRT80%, whereas adding the acoustic filter of a mask worsened it. The data indicate that the visual aspect of the mask accounted for an SRT80% difference of 2.6 dB SNR (SD 1.4 dB); with the mouth covered, performance was almost equal to that in the audio-only condition (no visual information at all). The acoustic attenuation accounted for 1.6 dB SNR (SD 1.1 dB, medical mask) and 2.5 dB SNR (SD 1.0 dB, cloth mask). Thus, both the visual occlusion and the acoustic attenuation of the mask significantly deteriorated the SRT80%.
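The arithmetic behind this decomposition can be retraced from the mean SRT80% values reported in the previous subsection. The sketch below uses those rounded group means only; the dictionary keys and the helper name are hypothetical. Note that the acoustic shift for the medical mask comes out as 1.5 dB from the rounded means, whereas the text reports 1.6 dB, presumably because the reported value is the mean of per-participant differences.

```python
# Mean SRT80% per condition in dB SNR, as reported in the text
srt80 = {
    "audio_only": -6.9,
    "av": -9.4,
    "av_mask": -6.8,
    "av_mask_medical": -5.3,
    "av_mask_cloth": -4.3,
}


def shift_db(condition, reference):
    """SRT80% shift of `condition` relative to `reference` (positive = worse)."""
    return round(srt80[condition] - srt80[reference], 1)


visual_loss = shift_db("av_mask", "av")                   # visual occlusion
acoustic_medical = shift_db("av_mask_medical", "av_mask")  # medical mask filter
acoustic_cloth = shift_db("av_mask_cloth", "av_mask")      # cloth mask filter
total_medical = shift_db("av_mask_medical", "av")
total_cloth = shift_db("av_mask_cloth", "av")

print(visual_loss, acoustic_medical, acoustic_cloth, total_medical, total_cloth)
```

The totals of 4.1 dB (medical) and 5.1 dB (cloth) match the deterioration relative to the unmasked audiovisual condition stated in the abstract.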

FIG. 3

Contribution of visual and acoustic aspects of masks to the deterioration of speech reception thresholds. Starting from an audiovisual condition with mask but unaltered audio (baseline) adding vision enhances the SRT80%, adding acoustic filters of the masks in contrast deteriorates the SRT80%. ∗p < 0.05; ∗∗p < 0.01.

Individual Audiovisual Benefit of Normal-Hearing Subjects

There were large differences in the SRTs of the audiovisual condition (no mask and unaltered audio) among the normal-hearing subjects. Not surprisingly, almost all subjects performed better in the audiovisual condition without mask than in the audiovisual condition with visual mask and unaltered audio (Fig. 4). Only 2 of 15 subjects did not benefit from having visual cues of the mouth region, with SRT80% differences of −0.3 dB SNR and 0.2 dB SNR, respectively. The benefit of the other 13 subjects lay between 1.3 dB SNR and 4.5 dB SNR.

FIG. 4

Individual speech reading benefit. All but two subjects (red lines on the left; red dots on the right) benefit significantly from unmasking (AV) compared to the visual mask condition, indicating that they can use the visual cues. Black line on the right: mean speech reading benefit.

We found a training effect of about 2 dB SNR, given as the difference between the two training lists of 20 sentences in the audiovisual condition that were applied prior to the actual measurements. The SRT of the second training list did not differ significantly from the SRT measured in the audiovisual condition within the randomized sequence of test conditions (Fig. 5).

DISCUSSION

Understanding speech against background noise is part of everyday life. Even for normal-hearing individuals this poses a challenge, and hearing-impaired individuals are affected even more. Speechreading in combination with auditory information can improve speech intelligibility. In our experiments, normal-hearing subjects showed improved speech understanding in noise when visual cues were available. The observed visual benefit disappeared completely when the mouth region was covered by a face mask, which links this effect to speechreading. As a result, communication with a talker wearing a face mask was only as effective as communication without any visual information at all. It has been shown before that normal-hearing individuals profit from visual cues when speech intelligibility is assessed, especially in noisy environments (27–29). In accordance, we report a 2.5 dB improvement in SRT80% in the audiovisual condition compared to the audio-only condition, which corresponds to a difference in speech intelligibility of about 30% when approximated with the slope of the intelligibility function of the female German matrix test as derived from Kollmeier et al. (16). This effect was smaller than that observed by Llorach et al. (17) for the same test material. One plausible explanation is that the participants in that study completed more training lists and more conditions and were thus able to improve their performance over the lists by getting used to the material and the talker. Other groups have reported no improvement in speech intelligibility in normal-hearing subjects when offered an audiovisual signal compared to an audio-only signal, but this seemed to be due to a signal-to-noise ratio that was not challenging enough, resulting in a ceiling effect (15). Consistent with Llorach et al. (17), our results show a wide range of audiovisual gain between subjects, from −0.3 to 4.5 dB SRT80% improvement.
Factors that influence audiovisual gain include the ability to speechread, the ability to encode auditory information, and the integration of both modalities (30). It is also clear that higher cortical processes and different biological systems are involved in audiovisual integration and that differences in the efficacy of these processes can at least partially explain inter-individual differences in normal-hearing subjects (31). In addition, we evaluated the attenuation properties of two types of face masks (cloth and medical). In general accordance with previous findings, a reduction in high-frequency sound levels was detected for both masks (10–12). For further studies it is important to note that the acoustic attenuation of cloth masks seems to be highly dependent on the material and number of layers used (11,12). In line with these attenuation properties, we showed that filtering the speech signal according to the attenuation pattern of a medical or a cloth mask further deteriorates speech understanding. This effect depends on the type of mask and its attenuation properties in the mid and high frequencies, and for the cloth mask it was about as large as the effect of masking the visual information (2.5 dB for the cloth mask vs. 2.6 dB visual loss). Muzzi et al. (32) investigated the effect of different types of face masks and face shields and found that different face masks had an impact on auditory speech recognition thresholds and the speech intelligibility index in noisy environments by attenuating the acoustic speech signal. They describe a decline of up to 6.4% in speech intelligibility index scores and a more than 20% decline in speech recognition when wearing a medical face mask (32). This is in line with our findings: we found an average decline of 1.6 dB in SRT80% for the medical mask, which would correspond to about 20% intelligibility loss (16). The cloth mask would correspond to about 31% auditory intelligibility loss (16).
In contrast, Magee et al. did not detect significant differences in speech intelligibility between no-mask conditions and different mask conditions in an audio-only analysis in quiet, but they discuss that measuring in noisy environments could reduce speech intelligibility (33). Other groups have shown that the effect of surgical or medical face masks on speech intelligibility in certain assessments is less pronounced than that of N95 masks or air-purifying respirators (32,34–36). A probable cause is the larger attenuation of the latter, especially in the mid-frequency region (11), which is known to be most important for speech intelligibility (37,38). Our study shows that both effects, the acoustic deterioration and the missing visual cues for speechreading, add up to a substantial loss in speech understanding in noise even in normal-hearing subjects by hindering the process of speech encoding (30). This combined effect was up to 5 dB in the worst acoustic condition (cloth mask), corresponding to about 60% intelligibility difference (16). This seems even more relevant since similar masks are commonly used by the public in daily life during the COVID-19 pandemic (39–41). Some studies have evaluated modifications of and alternatives to face masks to overcome the adverse effects of mask wearing on communication. It seems that raising the voice can at least partially compensate for the attenuation of the speech signal and the loss of the possibility to speechread (42). Corey et al. discussed the use of transparent masks (11). Although they seem to have worse acoustic properties than medical masks, speech intelligibility improves with the addition of visual cues, especially in hearing-impaired individuals (15). Further suggestions include amplifying the speech signal with lapel microphones to enhance the signal-to-noise ratio (11).
More suitable nowadays could be the use of smartphones, which alone or in combination with headphones offer very effective noise reduction in everyday life. Using only one speaker with a simulated face mask for the audiovisual German matrix test poses a limitation to this study. For further investigations of speech intelligibility with different masks, ideal conditions would include a speaker actually wearing different types of face masks (11,15). In addition, a more realistic approximation of speech intelligibility in everyday situations could be achieved by using different male and female speakers as described for other speech intelligibility tests before (43). Future studies should include hearing-impaired listeners since it can be assumed that hearing loss has an additional impact on speechreading and audiovisual integration resulting in a greater audiovisual gain compared to normal-hearing subjects (19,44).
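The conversions from SRT80% shifts to intelligibility differences quoted in this discussion can be cross-checked with a simple linear reading of the intelligibility function. The sketch below derives the slope implied by each quoted pair; it does not use the slope published for the test, and the function name and the 100% cap are assumptions of this illustration. A linear approximation of this kind is only reasonable near the steep part of the intelligibility function.

```python
# (SRT80% shift in dB, approximate intelligibility difference in %)
# as quoted in the discussion
reported_pairs = [(2.5, 30.0), (1.6, 20.0), (2.5, 31.0), (5.0, 60.0)]

# Slope implied by each pair under a purely linear reading (% per dB)
implied_slopes = [percent / shift for shift, percent in reported_pairs]
mean_slope = sum(implied_slopes) / len(implied_slopes)


def approx_intelligibility_change(shift_db, slope_percent_per_db):
    """Linear approximation, capped at 100 percentage points."""
    return min(100.0, slope_percent_per_db * shift_db)


for (shift, percent), slope in zip(reported_pairs, implied_slopes):
    print(f"{shift} dB -> {percent}% implies {slope:.1f} %/dB")
print(f"mean implied slope: {mean_slope:.2f} %/dB")
```

All four quoted pairs are consistent with a slope of roughly 12 percentage points per dB.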

CONCLUSION

We demonstrate that audiovisual speech perception is highly affected by face mask wearing. Interestingly, even in normal-hearing subjects, visual aspects play a major role in this phenomenon. Both visual and acoustic effects thus contribute to the explanation of speech comprehension difficulties in the everyday experience of normal-hearing subjects.

34 in total

1. Acoustic properties of naturally produced clear speech at normal speaking rates.

Authors: Jean C Krause; Louis D Braida
Journal: J Acoust Soc Am Date: 2004-01 Impact factor: 1.840

2. Effect of speechreading in presbycusis: Do we have a third ear?

Authors: Luis Roque Reis; Pedro Escada
Journal: Otolaryngol Pol Date: 2017-12-30

3. Speechreading as a communication mediator.

Authors: Letícia Neves de Oliveira; Alexandra Dezani Soares; Brasilia Maria Chiari
Journal: Codas Date: 2014 Jan-Feb

4. The effect of talker- and listener-related factors on intelligibility for a real-word, open-set perception test.

Authors: Duncan Markham; Valerie Hazan
Journal: J Speech Lang Hear Res Date: 2004-08 Impact factor: 2.297

5. Knowledge, attitudes, and practices of Hong Kong population towards human A/H7N9 influenza pandemic preparedness, China, 2014.

Authors: Emily Y Y Chan; Calvin K Y Cheng; Greta Tam; Zhe Huang; Poyi Lee
Journal: BMC Public Health Date: 2015-09-22 Impact factor: 3.295

6. The negative impact of wearing personal protective equipment on communication during coronavirus disease 2019.

Authors: T Hampton; R Crunkhorn; N Lowe; J Bhat; E Hogg; W Afifi; S De; I Street; R Sharma; M Krishnan; R Clarke; S Dasgupta; S Ratnayake; S Sharma
Journal: J Laryngol Otol Date: 2020-07-28 Impact factor: 1.469

7. Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: a systematic review and meta-analysis.

Authors: Derek K Chu; Elie A Akl; Stephanie Duda; Karla Solo; Sally Yaacoub; Holger J Schünemann
Journal: Lancet Date: 2020-06-01 Impact factor: 79.321