Deep Convolutional Neural Networks for large-scale speech tasks.

Tara N Sainath, Brian Kingsbury, George Saon, Hagen Soltau, Abdel-rahman Mohamed, George Dahl, Bhuvana Ramabhadran.

Abstract

Convolutional Neural Networks (CNNs) are an alternative type of neural network that can be used to reduce spectral variations and model spectral correlations that exist in signals. Since speech signals exhibit both of these properties, we hypothesize that CNNs are a more effective model for speech than Deep Neural Networks (DNNs). In this paper, we explore applying CNNs to large vocabulary continuous speech recognition (LVCSR) tasks. First, we determine the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks. Specifically, we focus on how many convolutional layers are needed, what an appropriate number of hidden units is, and what the best pooling strategy is. Second, we investigate how to incorporate speaker-adapted features, which cannot directly be modeled by CNNs as they do not obey locality in frequency, into the CNN framework. Third, given the importance of sequence training for speech tasks, we introduce a strategy to use ReLU+dropout during Hessian-free sequence training of CNNs. Experiments on 3 LVCSR tasks indicate that a CNN with the proposed speaker-adapted and ReLU+dropout ideas allows for a 12%-14% relative improvement in word error rate (WER) over a strong DNN system, achieving state-of-the-art results in these 3 tasks.
Copyright © 2014 Elsevier Ltd. All rights reserved.
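
The building blocks named in the abstract above (convolution along the frequency axis, pooling, and ReLU+dropout) can be illustrated with a minimal NumPy sketch. This is not the authors' architecture or training procedure: the function names, the 40-band input, the 9-band filter, the pooling size of 3, and the dropout rate of 0.5 are all assumptions chosen for illustration, and Hessian-free sequence training is not shown.

```python
import numpy as np

def conv_freq(x, w, b=0.0):
    """Valid 1-D cross-correlation of one filter along the frequency axis.

    x: (n_bands,) spectral features for a single frame
    w: (k,) filter weights; returns an array of shape (n_bands - k + 1,)
    """
    windows = np.lib.stride_tricks.sliding_window_view(x, w.shape[0])
    return windows @ w + b

def max_pool(h, size):
    """Non-overlapping max pooling; any trailing remainder is dropped."""
    n = (h.shape[0] // size) * size
    return h[:n].reshape(-1, size).max(axis=1)

def relu(h):
    """Rectified linear unit."""
    return np.maximum(h, 0.0)

def dropout(h, rate, rng):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    mask = rng.random(h.shape) >= rate
    return h * mask / (1.0 - rate)

# Illustrative sizes only (assumptions, not taken from the abstract).
rng = np.random.default_rng(0)
x = rng.standard_normal(40)      # hypothetical 40 frequency bands, one frame
w = rng.standard_normal(9)       # hypothetical filter spanning 9 bands
h = max_pool(relu(conv_freq(x, w)), size=3)   # 32 conv outputs -> 10 pooled
h_train = dropout(h, rate=0.5, rng=rng)       # applied at training time only
```

Because the same filter weights are reused at every frequency position and pooling keeps only the local maximum, a small shift of a spectral pattern along frequency tends to change the pooled output less than it would in a fully connected layer, which is the intuition behind using CNNs to reduce spectral variation.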

Keywords:  Deep learning; Neural networks; Speech recognition

Year:  2014        PMID: 25439765     DOI: 10.1016/j.neunet.2014.08.005

Source DB:  PubMed          Journal:  Neural Netw        ISSN: 0893-6080


  26 in total

1.  DeepSqueak: a deep learning-based system for detection and analysis of ultrasonic vocalizations.

Authors:  Kevin R Coffey; Russell G Marx; John F Neumaier
Journal:  Neuropsychopharmacology       Date:  2019-01-04       Impact factor: 7.853

2.  A Weakly Supervised Learning Framework for Detecting Social Anxiety and Depression.

Authors:  Asif Salekin; Jeremy W Eberle; Jeffrey J Glenn; Bethany A Teachman; John A Stankovic
Journal:  Proc ACM Interact Mob Wearable Ubiquitous Technol       Date:  2018-06

3.  Machine Learning for Electronically Excited States of Molecules.

Authors:  Julia Westermayr; Philipp Marquetand
Journal:  Chem Rev       Date:  2020-11-19       Impact factor: 60.622

4.  Recent Machine Learning Progress in Lower Limb Running Biomechanics With Wearable Technology: A Systematic Review.

Authors:  Liangliang Xiang; Alan Wang; Yaodong Gu; Liang Zhao; Vickie Shim; Justin Fernandez
Journal:  Front Neurorobot       Date:  2022-06-02       Impact factor: 3.493

5.  Developing crossmodal expression recognition based on a deep neural model.

Authors:  Pablo Barros; Stefan Wermter
Journal:  Adapt Behav       Date:  2016-10-10       Impact factor: 1.942

6.  Quantum-chemical insights from deep tensor neural networks.

Authors:  Kristof T Schütt; Farhad Arbabzadah; Stefan Chmiela; Klaus R Müller; Alexandre Tkatchenko
Journal:  Nat Commun       Date:  2017-01-09       Impact factor: 14.919

7.  Deep learning with convolutional neural networks for EEG decoding and visualization.

Authors:  Robin Tibor Schirrmeister; Jost Tobias Springenberg; Lukas Dominique Josef Fiederer; Martin Glasstetter; Katharina Eggensperger; Michael Tangermann; Frank Hutter; Wolfram Burgard; Tonio Ball
Journal:  Hum Brain Mapp       Date:  2017-08-07       Impact factor: 5.038

8.  An Intelligent Gear Fault Diagnosis Methodology Using a Complex Wavelet Enhanced Convolutional Neural Network.

Authors:  Weifang Sun; Bin Yao; Nianyin Zeng; Binqiang Chen; Yuchao He; Xincheng Cao; Wangpeng He
Journal:  Materials (Basel)       Date:  2017-07-12       Impact factor: 3.623

9.  Convolutional Neural Networks with 3D Input for P300 Identification in Auditory Brain-Computer Interfaces.

Authors:  Eduardo Carabez; Miho Sugi; Isao Nambu; Yasuhiro Wada
Journal:  Comput Intell Neurosci       Date:  2017-11-07

10.  Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition.

Authors:  Francisco Javier Ordóñez; Daniel Roggen
Journal:  Sensors (Basel)       Date:  2016-01-18       Impact factor: 3.576
