Speech Intelligibility Prediction using Spectro-Temporal Modulation Analysis.

Amin Edraki¹, Wai-Yip Chan¹, Jesper Jensen², Daniel Fogerty³.

Abstract

Spectro-temporal modulations are believed to mediate the analysis of speech sounds in the human primary auditory cortex. Inspired by humans' robustness in comprehending speech in challenging acoustic environments, we propose an intrusive speech intelligibility prediction (SIP) algorithm, wSTMI, for normal-hearing listeners based on spectro-temporal modulation analysis (STMA) of the clean and degraded speech signals. In the STMA, each of 55 modulation frequency channels contributes an intermediate intelligibility measure. A sparse linear model, with parameters optimized using Lasso regression, combines the intermediate measures of 8 of the most salient channels for SIP. In comparison with a suite of 10 SIP algorithms, wSTMI performs consistently well across 13 datasets, which together cover degradation conditions including modulated noise, noise reduction processing, reverberation, near-end listening enhancement, and speech interruption. We show that the optimized parameters of wSTMI may be interpreted in terms of modulation transfer functions of the human auditory system. Thus, the proposed approach offers evidence affirming previous studies of the perceptual characteristics underlying speech signal intelligibility.
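The channel-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic random numbers standing in for per-channel intermediate measures, and the salient channel indices, sample count, penalty strength, and all function names are hypothetical choices made only for illustration. Only the channel count of 55 is taken from the abstract. The sketch fits a Lasso model by cyclic coordinate descent and reports which channels keep a nonzero weight.

```python
import random

NUM_CHANNELS = 55                                # channel count from the abstract
TRUE_CHANNELS = [2, 9, 16, 23, 30, 37, 44, 51]   # hypothetical salient channels (toy choice)
NUM_SAMPLES = 300                                # hypothetical number of training conditions
ALPHA = 0.05                                     # hypothetical L1 penalty strength


def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 when |value| <= threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, sweeps=100):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    col_sq = [sum(v * v for v in col) / n for col in cols]
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, and w starts at zero
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            col = cols[j]
            # Correlation of column j with the residual that excludes channel j.
            rho = sum(c * r for c, r in zip(col, residual)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * col[i]
                w[j] = new_w
    return w


def make_toy_data(seed=0):
    """Synthetic stand-in: zero-mean channel measures, score driven by a few channels."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(NUM_CHANNELS)]
         for _ in range(NUM_SAMPLES)]
    true_w = [0.0] * NUM_CHANNELS
    for c in TRUE_CHANNELS:
        true_w[c] = rng.uniform(0.5, 1.0)
    y = [sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0.0, 0.05)
         for row in X]
    return X, y


X, y = make_toy_data()
weights = lasso_coordinate_descent(X, y, ALPHA)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("channels with nonzero weight:", selected)
```

The L1 penalty drives the weights of uninformative channels to exactly zero, which is what makes the fitted model sparse. In practice the penalty strength would be tuned, for example by cross-validation, rather than fixed as in this toy.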

Keywords: spectro-temporal modulation; speech intelligibility; speech quality model

Year: 2020 PMID: 33748329 PMCID: PMC7978234 DOI: 10.1109/taslp.2020.3039929

Source DB: PubMed Journal: IEEE/ACM Trans Audio Speech Lang Process

38 in total

1. Speech-in-noise enhancement using amplification and dynamic range compression controlled by the speech intelligibility index.

Authors: Henning Schepker; Jan Rennies; Simon Doclo
Journal: J Acoust Soc Am Date: 2015-11 Impact factor: 1.840

2. Multiresolution spectrotemporal analysis of complex sounds.

Authors: Taishih Chi; Powen Ru; Shihab A Shamma
Journal: J Acoust Soc Am Date: 2005-08 Impact factor: 1.840

3. The contribution of temporal fine structure to the intelligibility of speech in steady and modulated noise.

Authors: Kathryn Hopkins; Brian C J Moore
Journal: J Acoust Soc Am Date: 2009-01 Impact factor: 1.840

4. A comparative intelligibility study of single-microphone noise reduction algorithms.

Authors: Yi Hu; Philipos C Loizou
Journal: J Acoust Soc Am Date: 2007-09 Impact factor: 1.840

5. Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing.

Authors: Søren Jørgensen; Torsten Dau
Journal: J Acoust Soc Am Date: 2011-09 Impact factor: 1.840

6. Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition.

Authors: Marc René Schädler; Birger Kollmeier
Journal: J Acoust Soc Am Date: 2015-04 Impact factor: 1.840

7. Prediction of Intelligibility of Non-linearly Processed Speech.

Authors: Carl Ludvigsen; Claus Elberling; Gitte Keidser; Torben Poulsen
Journal: Acta Otolaryngol Date: 1990 Impact factor: 1.494

8. The effect of simulated room acoustic parameters on the intelligibility and perceived reverberation of monosyllabic words and sentences.

Authors: Daniel Fogerty; Ahmed Alghamdi; Wai-Yip Chan
Journal: J Acoust Soc Am Date: 2020-05 Impact factor: 1.840

9. Temporal envelope and fine structure cues for speech intelligibility.

10. Effect of temporal envelope smearing on speech reception.

1. Relating Suprathreshold Auditory Processing Abilities to Speech Understanding in Competition.

2. Glimpsing keywords across sentences in noise: A microstructural analysis of acoustic, lexical, and listener factors.