
Towards Model Compression for Deep Learning Based Speech Enhancement.

Ke Tan, DeLiang Wang.

Abstract

The use of deep neural networks (DNNs) has dramatically elevated the performance of speech enhancement over the last decade. However, achieving strong enhancement performance typically requires a large DNN, which is costly in both memory and computation, making it difficult to deploy such speech enhancement systems on devices with limited hardware resources or in applications with strict latency requirements. In this study, we propose two compression pipelines to reduce the model size for DNN-based speech enhancement, which incorporate three different techniques: sparse regularization, iterative pruning and clustering-based quantization. We systematically investigate these techniques and evaluate the proposed compression pipelines. Experimental results demonstrate that our approach reduces the sizes of four different models by large margins without significantly sacrificing their enhancement performance. In addition, we find that the proposed approach performs well on speaker separation, which further demonstrates the effectiveness of the approach for compressing speech separation models.
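The three techniques named in the abstract can be illustrated with a minimal, generic sketch. This is not the authors' pipeline: the abstract does not state the regularizer, the pruning criterion, the stopping rule, or the clustering setup, so the L1 penalty, magnitude-based pruning, 1-D k-means, and every function name and parameter below (`l1_penalty`, `magnitude_prune`, `kmeans_1d`, `quantize`, `fraction`, `k`) are assumptions chosen for illustration on a flat list of weights.

```python
import random


def l1_penalty(weights, lam):
    """Sparse regularization (assumed here to be L1): a term added to the training loss."""
    return lam * sum(abs(w) for w in weights)


def magnitude_prune(weights, fraction):
    """Zero out the given fraction of the still-nonzero weights with smallest magnitude."""
    nonzero = sorted((abs(w), i) for i, w in enumerate(weights) if w != 0.0)
    n_drop = int(len(nonzero) * fraction)
    pruned = list(weights)
    for _, i in nonzero[:n_drop]:
        pruned[i] = 0.0
    return pruned


def kmeans_1d(values, k, iters=20):
    """Plain 1-D k-means (requires k >= 2); returns (centroids, assignments)."""
    lo, hi = min(values), max(values)
    # Assumption: centroids are initialized evenly across the value range.
    centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]

    def nearest(v):
        return min(range(k), key=lambda j: abs(v - centroids[j]))

    for _ in range(iters):
        assign = [nearest(v) for v in values]
        for j in range(k):
            members = [v for v, a in zip(values, assign) if a == j]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[j] = sum(members) / len(members)
    return centroids, [nearest(v) for v in values]


def quantize(weights, k):
    """Cluster the surviving (nonzero) weights and replace each by its centroid."""
    idx = [i for i, w in enumerate(weights) if w != 0.0]
    centroids, assign = kmeans_1d([weights[i] for i in idx], k)
    out = list(weights)
    for i, a in zip(idx, assign):
        out[i] = centroids[a]
    return out, centroids


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(200)]  # stand-in for trained weights

# Iterative pruning: prune a little, then (in a real system) retrain and
# re-evaluate before pruning again. Retraining is omitted in this sketch.
for _ in range(3):
    weights = magnitude_prune(weights, 0.5)

# Clustering-based quantization: surviving weights share k codebook values,
# so each can be stored as a small index into the codebook.
compressed, codebook = quantize(weights, k=4)
```

In a real setting the penalty from `l1_penalty` would be added to the enhancement loss during training so that many weights end up near zero before pruning, and each pruning round would be followed by retraining and a check of the enhancement metric.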


Keywords:  Model compression; pruning; quantization; sparse regularization; speech enhancement

Year:  2021        PMID: 34179220      PMCID: PMC8224477          DOI: 10.1109/taslp.2021.3082282

Source DB:  PubMed          Journal:  IEEE/ACM Trans Audio Speech Lang Process


  8 in total

1.  Product quantization for nearest neighbor search.

Authors:  Hervé Jégou; Matthijs Douze; Cordelia Schmid
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2011-01       Impact factor: 6.226

2.  Supervised Speech Separation Based on Deep Learning: An Overview.

Authors:  DeLiang Wang; Jitong Chen
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2018-05-30

3.  Pruning algorithms-a survey.

Authors:  R Reed
Journal:  IEEE Trans Neural Netw       Date:  1993

4.  Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation.

Authors:  Yi Luo; Nima Mesgarani
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2019-05-06

5.  Learning Complex Spectral Mapping with Gated Convolutional Recurrent Networks for Monaural Speech Enhancement.

Authors:  Ke Tan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2019-11-22

6.  Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation.

Authors:  Yuzhou Liu; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2019-09-12

7.  On Training Targets for Supervised Speech Separation.

Authors:  Yuxuan Wang; Arun Narayanan; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2014-12

8.  Approximate nearest neighbor search by residual vector quantization.

Authors:  Yongjian Chen; Tao Guan; Cheng Wang
Journal:  Sensors (Basel)       Date:  2010-12-08       Impact factor: 3.576

  2 in total

1.  Self-attending RNN for Speech Enhancement to Improve Cross-corpus Generalization.

Authors:  Ashutosh Pandey; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2022-03-22

2.  Neural Cascade Architecture with Triple-domain Loss for Speech Enhancement.

Authors:  Heming Wang; DeLiang Wang
Journal:  IEEE/ACM Trans Audio Speech Lang Process       Date:  2021-12-28
