Hoyoung Yi¹, Ashly Pingsterhaus¹, Woonyoung Song².
Abstract
The coronavirus pandemic has led to the recommended or required use of face masks in public. Wearing a face mask compromises communication, especially in the presence of competing noise. It is therefore crucial to measure the potential effects of face masks on speech intelligibility in noisy environments, where excessive background noise can create communication challenges. This study evaluated whether wearing transparent face masks and using clear speech facilitate better verbal communication. We evaluated listener word identification scores across four factors: (1) type of mask (no mask, transparent mask, and disposable face mask), (2) presentation mode (auditory only and audiovisual), (3) speaking style (conversational speech and clear speech), and (4) type of background noise (speech-shaped noise and four-talker babble) at a −5 dB signal-to-noise ratio. Results indicate that in the presence of noise, listeners performed less well when the speaker wore a disposable face mask or a transparent mask than when the speaker wore no mask. Listeners correctly identified more words in the audiovisual presentation when listening to clear speech. Results indicate that the combination of face masks and background noise negatively impacts speech intelligibility for listeners. Transparent masks facilitate the ability to understand target sentences by providing visual information. The use of clear speech was shown to alleviate challenging communication situations, including compensating for a lack of visual cues and reduced acoustic signals.
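The four factors are fully crossed (3 × 2 × 2 × 2), which is why the post hoc tables list 24 cells. A minimal Python sketch that enumerates the design; the level labels are our own shorthand, the −5 dB SNR is assumed to apply to both noise types, and nothing here describes how sentences were assigned to listeners.

```python
from itertools import product

# Factor levels as described in the abstract (labels are shorthand, not the authors' coding).
MASKS = ("no mask", "transparent mask", "disposable face mask")
MODES = ("auditory only", "audiovisual")
STYLES = ("conversational", "clear")
NOISES = ("speech-shaped noise", "four-talker babble")
SNR_DB = -5  # assumption: the single reported SNR applies to both noise types


def design_cells():
    """Return one dict per cell of the fully crossed mask x mode x style x noise design."""
    return [
        {"mask": mask, "mode": mode, "style": style, "noise": noise, "snr_db": SNR_DB}
        for mask, mode, style, noise in product(MASKS, MODES, STYLES, NOISES)
    ]


cells = design_cells()
print(len(cells))  # 24 = 3 * 2 * 2 * 2
```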
Keywords: COVID-19; audiovisual perception; clear speech; face masks; speech intelligibility
Year: 2021 PMID: 34295288 PMCID: PMC8292133 DOI: 10.3389/fpsyg.2021.682677
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Face mask conditions: (A) no mask, (B) transparent mask, and (C) disposable face mask.
Acoustic measures of sentence materials as produced with no mask (NO), transparent mask (TM), and disposable face mask (DM), in conversational speech and in clear speech.
| Conversational | NO | 225.03 | 113.35 | 1638.24 |
| Conversational | TM | 223.84 | 115.58 | 1609.41 |
| Conversational | DM | 224.14 | 118.09 | 1584.13 |
| Clear | NO | 221.69 | 132.92 | 3069.72 |
| Clear | TM | 222.55 | 140.20 | 3218.86 |
| Clear | DM | 221.53 | 138.87 | 3174.53 |
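One way to compare the mask conditions within a speaking style is to express each row of the acoustic table as a percent change from the no-mask row of the same style. The sketch below does only that arithmetic. Because the column labels were not preserved in this excerpt, the three measures are indexed 0 to 2 rather than named, and the dictionary layout is our own.

```python
# Rows of the acoustic-measures table: (speaking style, mask) -> three measures.
# Column labels were not preserved in this excerpt, so measures are indexed 0-2.
ACOUSTIC = {
    ("Conversational", "NO"): (225.03, 113.35, 1638.24),
    ("Conversational", "TM"): (223.84, 115.58, 1609.41),
    ("Conversational", "DM"): (224.14, 118.09, 1584.13),
    ("Clear", "NO"): (221.69, 132.92, 3069.72),
    ("Clear", "TM"): (222.55, 140.20, 3218.86),
    ("Clear", "DM"): (221.53, 138.87, 3174.53),
}


def percent_change_from_no_mask(table):
    """Percent change of each measure relative to the no-mask row of the same speaking style."""
    changes = {}
    for (style, mask), values in table.items():
        baseline = table[(style, "NO")]
        changes[(style, mask)] = tuple(
            100.0 * (value - base) / base for value, base in zip(values, baseline)
        )
    return changes


changes = percent_change_from_no_mask(ACOUSTIC)
for (style, mask), pct in changes.items():
    print(style, mask, ", ".join(f"{p:+.2f}%" for p in pct))
```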
Figure 2. Long-term average spectrum (LTAS) for conversational speech (left) and clear speech (right) produced with no mask (NO, red line), transparent mask (TM, green line), and disposable face mask (DM, blue line).
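For readers unfamiliar with the LTAS: one common way to compute it is to split a recording into overlapping windowed frames, average the frames' power spectra over time, and convert to dB. The NumPy sketch below follows that generic recipe; the frame length, hop size, Hann window, and synthetic test tone are illustrative assumptions, not the analysis settings used for Figure 2.

```python
import numpy as np


def ltas_db(signal, fs, frame_len=1024, hop=512):
    """Long-term average spectrum: mean power spectrum of Hann-windowed frames, in dB."""
    x = np.asarray(signal, dtype=float)
    if x.size < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (x.size - frame_len) // hop
    # Stack overlapping frames, window them, and take the one-sided FFT of each frame.
    frames = np.stack([x[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mean_power = power.mean(axis=0)  # average over time (frames)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return freqs, 10.0 * np.log10(mean_power + 1e-12)  # small floor avoids log10(0)


# Usage on a synthetic 1 kHz tone, a stand-in for a recorded sentence.
fs = 16000
t = np.arange(fs) / fs
freqs, spectrum = ltas_db(np.sin(2 * np.pi * 1000 * t), fs)
print(freqs[np.argmax(spectrum)])  # peak near 1000 Hz
```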
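Parameter estimates and odds ratios for the main effects and interaction effects of the mixed-effects logistic regression model. Reference levels: conversational speech, no mask, four-talker babble, and audio only.
| Term | Estimate (log odds) | SE | Odds ratio | z | p |
|---|---|---|---|---|---|
| Intercept | −0.999 | 0.127 | 0.368 | −7.863 | <0.001 |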
| Clear | 0.441 | 0.140 | 1.555 | 3.146 | 0.002 |
| Transparent | −1.431 | 0.186 | 0.239 | −7.688 | <0.001 |
| Disposable | −0.892 | 0.166 | 0.410 | −5.365 | <0.001 |
| Speech-shaped | 0.986 | 0.138 | 2.680 | 7.141 | <0.001 |
| Audiovisual | 1.523 | 0.141 | 4.585 | 10.821 | <0.001 |
| Clear * transparent | −0.705 | 0.272 | 0.494 | −2.591 | 0.010 |
| Clear * disposable | 0.518 | 0.217 | 1.678 | 2.389 | 0.017 |
| Clear * speech-shaped | 0.659 | 0.198 | 1.932 | 3.328 | 0.001 |
| Transparent * speech-shaped | 0.676 | 0.230 | 1.966 | 2.943 | 0.003 |
| Disposable * speech-shaped | 0.468 | 0.212 | 1.597 | 2.206 | 0.027 |
| Clear * Audiovisual | 0.945 | 0.213 | 2.573 | 4.434 | <0.001 |
| Transparent * audiovisual | 0.132 | 0.232 | 1.141 | 0.570 | 0.569 |
| Disposable * audiovisual | −0.739 | 0.218 | 0.478 | −3.394 | 0.001 |
| Speech-shaped * Audiovisual | −0.258 | 0.200 | 0.773 | −1.289 | 0.198 |
| Clear * transparent * speech-shaped | 0.734 | 0.335 | 2.084 | 2.195 | 0.028 |
| Clear * disposable * speech-shaped | −0.386 | 0.292 | 0.680 | −1.324 | 0.186 |
| Clear * transparent * audiovisual | 0.699 | 0.345 | 2.011 | 2.027 | 0.043 |
| Clear * disposable * audiovisual | −1.102 | 0.304 | 0.332 | −3.622 | <0.001 |
| Clear * speech-shaped * audiovisual | −0.837 | 0.319 | 0.433 | −2.628 | 0.009 |
| Transparent * speech-shaped * audiovisual | −0.012 | 0.305 | 0.988 | −0.040 | 0.968 |
| Disposable * speech-shaped * audiovisual | −0.459 | 0.293 | 0.632 | −1.567 | 0.117 |
| Clear * transparent * speech-shaped * audiovisual | 0.117 | 0.481 | 1.124 | 0.242 | 0.809 |
| Clear * disposable * speech-shaped * audiovisual | 1.146 | 0.432 | 3.146 | 2.655 | 0.008 |
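In the regression table, each odds ratio is exp(estimate), each z is estimate / SE, and each p is the two-sided tail probability of z under the standard normal distribution. A short Python check of a few rows (the dictionary is our own transcription; small discrepancies come from the three-decimal rounding of the published estimates):

```python
import math

# (estimate in log odds, standard error) transcribed from a few rows of the table.
FIXED_EFFECTS = {
    "Intercept": (-0.999, 0.127),
    "Clear": (0.441, 0.140),
    "Audiovisual": (1.523, 0.141),
    "Transparent * audiovisual": (0.132, 0.232),
}


def odds_ratio(estimate):
    """Odds ratio = exp(log-odds coefficient)."""
    return math.exp(estimate)


def wald_z(estimate, se):
    """Wald statistic: estimate divided by its standard error."""
    return estimate / se


def two_sided_p(z):
    """Two-sided p-value under the standard normal: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


for term, (est, se) in FIXED_EFFECTS.items():
    z = wald_z(est, se)
    print(f"{term}: OR = {odds_ratio(est):.3f}, z = {z:.3f}, p = {two_sided_p(z):.3f}")
```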
Random effects of the mixed-effects logistic regression model.
| Random effect | Grouping factor | Estimate |
|---|---|---|
| Intercept | Subjects | 0.090 |
| Intercept | Sentence number | 0.257 |
Figure 3. Proportion of correct keywords in sentences, comparing no mask (NO), transparent mask (TM), and disposable face mask (DM) for conversational (CO) and clear (CL) speaking styles, presented in audio-only and audiovisual modes with speech-shaped noise and 4-talker babble.
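The figures plot the proportion of correctly identified keywords per condition. A minimal sketch of that aggregation, assuming trial records of (condition, target keywords, typed response) and a simple whole-word, case-insensitive matching rule. The record layout, helper names, example sentences, and scoring rule are all hypothetical, since the study's scoring procedure is not described in this excerpt.

```python
import re
from collections import defaultdict


def score_trial(keywords, response):
    """Return (hits, total): how many target keywords appear as whole words in the response."""
    words = set(re.findall(r"[a-z']+", response.lower()))
    hits = sum(1 for kw in keywords if kw.lower() in words)
    return hits, len(keywords)


def proportion_correct(trials):
    """Aggregate keyword hits per condition from (condition, keywords, response) records."""
    hits, totals = defaultdict(int), defaultdict(int)
    for condition, keywords, response in trials:
        h, n = score_trial(keywords, response)
        hits[condition] += h
        totals[condition] += n
    return {cond: hits[cond] / totals[cond] for cond in totals}


# Hypothetical records; condition = (mask, style, mode, noise).
trials = [
    (("TM", "CL", "AV", "SSN"), ["boy", "ran", "store"], "The boy ran to the shop."),
    (("TM", "CL", "AV", "SSN"), ["girl", "ate", "apple"], "A girl ate an apple"),
    (("DM", "CO", "AO", "babble"), ["dog", "chased", "ball"], "the dog chased a"),
]
result = proportion_correct(trials)
print(result)
```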
Figure 5. Proportion of correct keywords in sentences, comparing conversational (CO) and clear (CL) speech for each face mask condition (no mask: NO; transparent mask: TM; disposable face mask: DM) in both background noises and presentation modes.
Contrasts between face mask conditions (no mask, NO; transparent mask, TM; disposable face mask, DM) within each speaking style, background noise, and presentation mode.
| Presentation mode | Background noise | Speaking style | Contrast | Estimate | SE | z | p |
|---|---|---|---|---|---|---|---|
| Audio only | 4-T babble | Conversational | NO – TM | 1.431 | 0.186 | 7.688 | <0.001 |
| | | | NO – DM | 0.892 | 0.166 | 5.365 | <0.001 |
| | | | TM – DM | −0.539 | 0.203 | −2.659 | 0.024 |
| | | Clear | NO – TM | 2.136 | 0.199 | 10.740 | <0.001 |
| | | | NO – DM | 0.374 | 0.140 | 2.677 | 0.022 |
| | | | TM – DM | −1.762 | 0.202 | −8.735 | <0.001 |
| | SSN | Conversational | NO – TM | 0.755 | 0.135 | 5.591 | <0.001 |
| | | | NO – DM | 0.424 | 0.132 | 3.205 | 0.0041 |
| | | | TM – DM | −0.331 | 0.136 | −2.434 | 0.045 |
| | | Clear | NO – TM | 0.726 | 0.140 | 5.166 | <0.001 |
| | | | NO – DM | 0.292 | 0.144 | 2.033 | 0.126 |
| | | | TM – DM | −0.433 | 0.136 | −3.176 | 0.005 |
| Audiovisual | 4-T babble | Conversational | NO – TM | 1.299 | 0.138 | 9.413 | <0.001 |
| | | | NO – DM | 1.631 | 0.142 | 11.511 | <0.001 |
| | | | TM – DM | 0.332 | 0.144 | 2.314 | 0.062 |
| | | Clear | NO – TM | 1.305 | 0.161 | 8.084 | <0.001 |
| | | | NO – DM | 2.215 | 0.160 | 13.816 | <0.001 |
| | | | TM – DM | 0.910 | 0.135 | 6.760 | <0.001 |
| | SSN | Conversational | NO – TM | 0.635 | 0.146 | 4.354 | <0.001 |
| | | | NO – DM | 1.622 | 0.144 | 11.231 | <0.001 |
| | | | TM – DM | 0.987 | 0.135 | 7.288 | <0.001 |
| | | Clear | NO – TM | −0.209 | 0.233 | −0.900 | 1 |
| | | | NO – DM | 1.446 | 0.188 | 7.674 | <0.001 |
| | | | TM – DM | 1.656 | 0.199 | 8.319 | <0.001 |
Summary of post hoc analysis for the significant four-way interaction, with Bonferroni correction.
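Each cell of the face-mask table contains three pairwise comparisons (NO – TM, NO – DM, TM – DM). The reported p-values appear to correspond to a Bonferroni adjustment that multiplies each two-sided Wald p by 3 and caps the result at 1; for example, z = −0.900 yields p = 1. A sketch of that adjustment, reproducing three rows of the table (the helper names are our own):

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a Wald z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni(p, n_comparisons):
    """Bonferroni adjustment: multiply by the number of comparisons, capped at 1."""
    return min(1.0, p * n_comparisons)


# z statistics copied from three rows of the face-mask contrast table.
ROWS = {
    "AO, SSN, clear, NO – DM": 2.033,  # reported p = 0.126
    "AV, 4-T babble, conversational, TM – DM": 2.314,  # reported p = 0.062
    "AV, SSN, clear, NO – TM": -0.900,  # reported p = 1
}
N_PAIRWISE = 3  # NO – TM, NO – DM, TM – DM within each cell

for label, z in ROWS.items():
    raw = two_sided_p(z)
    print(f"{label}: raw p = {raw:.4f}, adjusted p = {bonferroni(raw, N_PAIRWISE):.3f}")
```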
Contrasts between background noises (4-T babble and speech-shaped noise, SSN) within each face mask condition, speaking style, and presentation mode.
| Presentation mode | Speaking style | Mask | Estimate (4-T babble − SSN) | SE | z | p |
|---|---|---|---|---|---|---|
| Audio only | Conversational | NO | −0.986 | 0.138 | −7.141 | <0.001 |
| | | TM | −1.661 | 0.183 | −9.062 | <0.001 |
| | | DM | −1.454 | 0.161 | −9.021 | <0.001 |
| | Clear | NO | −1.644 | 0.142 | −11.541 | <0.001 |
| | | TM | −3.054 | 0.198 | −15.405 | <0.001 |
| | | DM | −1.726 | 0.142 | −12.189 | <0.001 |
| Audiovisual | Conversational | NO | −0.728 | 0.145 | −5.015 | <0.001 |
| | | TM | −1.391 | 0.138 | −10.058 | <0.001 |
| | | DM | −0.736 | 0.141 | −5.233 | <0.001 |
| | Clear | NO | −0.550 | 0.204 | −2.691 | 0.007 |
| | | TM | −2.064 | 0.196 | −10.519 | <0.001 |
| | | DM | −1.318 | 0.140 | −9.450 | <0.001 |
Summary of post hoc analysis for the significant four-way interaction, with Bonferroni correction.
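Every post hoc contrast is a difference between two cells' linear predictors, so it can be rebuilt from the fixed effects of the regression model. The sketch keys each coefficient by the set of non-reference levels it involves (reference levels implied by the term names: conversational speech, no mask, 4-T babble, audio only) and sums the coefficients whose levels are all active in a cell, with random effects set to 0. It reproduces, for example, the audio-only, clear, TM row of the noise table (−3.054, up to rounding). The level codes and function names are our own, not the authors' code.

```python
# Fixed effects (log odds) keyed by the set of non-reference levels each term involves.
# Reference levels implied by the term names: conversational, no mask (NO), 4-T babble, audio only (AO).
COEF = {
    frozenset(): -0.999,  # intercept
    frozenset({"clear"}): 0.441,
    frozenset({"TM"}): -1.431,
    frozenset({"DM"}): -0.892,
    frozenset({"SSN"}): 0.986,
    frozenset({"AV"}): 1.523,
    frozenset({"clear", "TM"}): -0.705,
    frozenset({"clear", "DM"}): 0.518,
    frozenset({"clear", "SSN"}): 0.659,
    frozenset({"TM", "SSN"}): 0.676,
    frozenset({"DM", "SSN"}): 0.468,
    frozenset({"clear", "AV"}): 0.945,
    frozenset({"TM", "AV"}): 0.132,
    frozenset({"DM", "AV"}): -0.739,
    frozenset({"SSN", "AV"}): -0.258,
    frozenset({"clear", "TM", "SSN"}): 0.734,
    frozenset({"clear", "DM", "SSN"}): -0.386,
    frozenset({"clear", "TM", "AV"}): 0.699,
    frozenset({"clear", "DM", "AV"}): -1.102,
    frozenset({"clear", "SSN", "AV"}): -0.837,
    frozenset({"TM", "SSN", "AV"}): -0.012,
    frozenset({"DM", "SSN", "AV"}): -0.459,
    frozenset({"clear", "TM", "SSN", "AV"}): 0.117,
    frozenset({"clear", "DM", "SSN", "AV"}): 1.146,
}


def active_levels(mask, style, noise, mode):
    """Map a cell to the set of non-reference level codes that are switched on."""
    if (mask not in ("NO", "TM", "DM") or style not in ("conversational", "clear")
            or noise not in ("babble", "SSN") or mode not in ("AO", "AV")):
        raise ValueError("unknown level code")
    levels = set()
    if mask != "NO":
        levels.add(mask)
    if style == "clear":
        levels.add("clear")
    if noise == "SSN":
        levels.add("SSN")
    if mode == "AV":
        levels.add("AV")
    return levels


def linear_predictor(mask, style, noise, mode):
    """Sum every coefficient whose levels are all active in the cell (random effects at 0)."""
    active = active_levels(mask, style, noise, mode)
    return sum(b for term, b in COEF.items() if term <= active)


def contrast(cell_a, cell_b):
    """Log-odds difference between two cells, each given as (mask, style, noise, mode)."""
    return linear_predictor(*cell_a) - linear_predictor(*cell_b)


# Babble minus SSN for audio-only, clear speech, transparent mask (table value: -3.054).
print(round(contrast(("TM", "clear", "babble", "AO"), ("TM", "clear", "SSN", "AO")), 3))
```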
Figure 4. Proportion of correct keywords in sentences, comparing audio-only (AO) and audiovisual (AV) modes for each face mask condition (no mask: NO; transparent mask: TM; disposable face mask: DM) in both background noises and speaking styles.
Contrasts between speaking styles (conversational, CO, and clear, CL) within each face mask condition, background noise, and presentation mode.
| Presentation mode | Background noise | Mask | Estimate (CO − CL) | SE | z | p |
|---|---|---|---|---|---|---|
| Audio only | 4-T babble | NO | −0.441 | 0.140 | −3.146 | 0.002 |
| | | TM | 0.264 | 0.233 | 1.132 | 0.258 |
| | | DM | −0.959 | 0.165 | −5.807 | <0.001 |
| | SSN | NO | −1.100 | 0.139 | −7.887 | <0.001 |
| | | TM | −1.129 | 0.136 | −8.285 | <0.001 |
| | | DM | −1.232 | 0.137 | −8.986 | <0.001 |
| Audiovisual | 4-T babble | NO | −1.386 | 0.161 | −8.586 | <0.001 |
| | | TM | −1.380 | 0.138 | −9.968 | <0.001 |
| | | DM | −0.802 | 0.139 | −5.756 | <0.001 |
| | SSN | NO | −1.208 | 0.192 | −6.303 | <0.001 |
| | | TM | −2.053 | 0.197 | −10.427 | <0.001 |
| | | DM | −1.384 | 0.140 | −9.914 | <0.001 |
Summary of post hoc analysis for the significant four-way interaction, with Bonferroni correction.
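The speaking-style contrasts are on the log-odds scale, computed as conversational minus clear, so a negative estimate favors clear speech. Flipping the sign and exponentiating gives an odds ratio for clear speech, and a Wald 95% interval (estimate ± 1.96 SE) shows which cells exclude 1. These intervals are not reported in the source; they are computed here only as an illustration and are not Bonferroni-adjusted. The function name is our own.

```python
import math


def clear_speech_or_ci(co_minus_cl, se, z_crit=1.959964):
    """Convert a CO - CL log-odds contrast into a clear-speech odds ratio with a Wald 95% CI."""
    log_or = -co_minus_cl  # flip the sign so OR > 1 means clear speech helps
    lo, hi = log_or - z_crit * se, log_or + z_crit * se
    return math.exp(log_or), math.exp(lo), math.exp(hi)


# (estimate, SE) copied from three rows of the speaking-style contrast table.
ROWS = {
    "AO, 4-T babble, NO": (-0.441, 0.140),
    "AO, 4-T babble, TM": (0.264, 0.233),
    "AV, SSN, TM": (-2.053, 0.197),
}
for label, (est, se) in ROWS.items():
    or_, lo, hi = clear_speech_or_ci(est, se)
    print(f"{label}: OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

In the transparent-mask, audio-only, babble cell the interval spans 1, in line with the non-significant p = 0.258 in the table.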
Contrasts between presentation modes (audio only, AO, and audiovisual, AV) within each face mask condition, background noise, and speaking style.
| Background noise | Speaking style | Mask | Estimate (AO − AV) | SE | z | p |
|---|---|---|---|---|---|---|
| 4-T babble | Conversational | NO | −1.523 | 0.141 | −10.821 | <0.001 |
| | | TM | −1.655 | 0.184 | −9.007 | <0.001 |
| | | DM | −0.784 | 0.166 | −4.711 | <0.001 |
| | Clear | NO | −2.468 | 0.161 | −15.317 | <0.001 |
| | | TM | −3.299 | 0.199 | −16.563 | <0.001 |
| | | DM | −0.627 | 0.138 | −4.529 | <0.001 |
| SSN | Conversational | NO | −1.265 | 0.143 | −8.867 | <0.001 |
| | | TM | −1.385 | 0.138 | −10.023 | <0.001 |
| | | DM | −0.067 | 0.134 | −0.495 | 0.621 |
| | Clear | NO | −1.373 | 0.190 | −7.245 | <0.001 |
| | | TM | −2.308 | 0.195 | −11.819 | <0.001 |
| | | DM | −0.219 | 0.142 | −1.546 | 0.122 |
Summary of post hoc analysis for the significant four-way interaction, with Bonferroni correction.
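To see what an AO − AV contrast means on the probability scale, the fixed effects can be turned into model-predicted proportions correct for a typical listener and sentence (random intercepts set to 0). The sketch below does this for conversational speech in speech-shaped noise, where the audiovisual benefit is large with no mask (−1.265) but not significant with a disposable mask (−0.067). These are model-based predictions under that assumption, not the observed proportions plotted in the figures; the function names are our own.

```python
import math

# Fixed effects (log odds) needed for conversational speech in speech-shaped noise.
B = {
    "Intercept": -0.999,
    "Disposable": -0.892,
    "Speech-shaped": 0.986,
    "Audiovisual": 1.523,
    "Disposable * speech-shaped": 0.468,
    "Disposable * audiovisual": -0.739,
    "Speech-shaped * audiovisual": -0.258,
    "Disposable * speech-shaped * audiovisual": -0.459,
}


def inv_logit(x):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def eta_ssn_conversational(disposable, audiovisual):
    """Linear predictor for SSN + conversational speech; mask is NO or DM, random intercepts at 0."""
    d, a = int(disposable), int(audiovisual)
    return (B["Intercept"] + B["Speech-shaped"]
            + d * (B["Disposable"] + B["Disposable * speech-shaped"])
            + a * (B["Audiovisual"] + B["Speech-shaped * audiovisual"])
            + d * a * (B["Disposable * audiovisual"] + B["Disposable * speech-shaped * audiovisual"]))


for mask, d in (("NO", False), ("DM", True)):
    eta_ao, eta_av = eta_ssn_conversational(d, False), eta_ssn_conversational(d, True)
    print(f"{mask}: AO = {inv_logit(eta_ao):.3f}, AV = {inv_logit(eta_av):.3f}, "
          f"AO - AV (log odds) = {eta_ao - eta_av:.3f}")
```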