Towards Robust Speech Super-resolution.

Abstract

Speech super-resolution (SR) aims to increase the sampling rate of a given speech signal by generating high-frequency components. This paper proposes a convolutional neural network (CNN) based SR model that takes advantage of information from both time and frequency domains. Specifically, the proposed CNN is a time-domain model that takes the raw waveform of low-resolution speech as the input, and outputs an estimate of the corresponding high-resolution waveform. During the training stage, we employ a cross-domain loss to optimize the network. We compare our model with several deep neural network (DNN) based SR models, and experiments show that our model outperforms existing models. Furthermore, the robustness of DNN-based models is investigated, in particular regarding microphone channels and downsampling schemes, which have a major impact on the performance of DNN-based SR models. By training with proper datasets and preprocessing, we improve the generalization capability for untrained microphone channels and unknown downsampling schemes.
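The cross-domain loss mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the loss is taken to be a mean absolute error on the waveform plus a weighted mean absolute error on frame-wise magnitude spectra, and the names `frame_magnitudes`, `mean_abs_error`, `cross_domain_loss`, `frame_len`, `hop`, and `freq_weight` are hypothetical. A naive DFT from the standard library stands in for the STFT that a real training pipeline would use.

```python
import cmath
import math


def frame_magnitudes(signal, frame_len=8, hop=4):
    """Magnitude spectra of Hann-windowed frames, computed with a naive DFT.

    Assumption: tiny frames keep the sketch readable; real systems use
    much longer frames and an FFT-based STFT.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    mags = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # A real-valued frame only needs bins 0..frame_len/2.
        for k in range(frame_len // 2 + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            mags.append(abs(coeff))
    return mags


def mean_abs_error(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cross_domain_loss(estimate, target, freq_weight=1.0):
    """Time-domain error plus weighted frequency-domain error (assumed form)."""
    time_term = mean_abs_error(estimate, target)
    freq_term = mean_abs_error(frame_magnitudes(estimate),
                               frame_magnitudes(target))
    return time_term + freq_weight * freq_term


# Usage: compare a reference waveform with an attenuated estimate.
target = [math.sin(2 * math.pi * 3 * n / 32) for n in range(32)]
estimate = [0.5 * x for x in target]
print(cross_domain_loss(estimate, target))
```

In this sketch the frequency term penalizes spectral mismatch that a purely sample-wise error may weight poorly, which is one plausible motivation for combining the two domains; the weighting between the terms is a free parameter here, not a value taken from the paper.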

Keywords: Speech super-resolution; bandwidth extension; convolutional neural network; robust speech super-resolution

Year: 2021 PMID: 34458395 PMCID: PMC8386817 DOI: 10.1109/taslp.2021.3054302

Source DB: PubMed Journal: IEEE/ACM Trans Audio Speech Lang Process