Machine learning based sample extraction for automatic speech recognition using dialectal Assamese speech.

Swapna Agarwalla, Kandarpa Kumar Sarma.

Abstract

Automatic Speech Recognition (ASR) and related issues are continuously evolving as inseparable elements of Human Computer Interaction (HCI). With the assimilation of emerging concepts such as big data and the Internet of Things (IoT) as extended elements of HCI, ASR techniques are found to be undergoing a paradigm shift. Of late, learning-based techniques have received greater attention from ASR research communities, because such techniques possess a natural ability to mimic biological behavior and thereby aid ASR modeling and processing. Current learning-based ASR techniques are evolving further with the incorporation of concepts such as big data and IoT. In this paper, we report certain machine learning (ML) based approaches for extracting relevant samples from a big data space, and we apply them, using certain soft computing techniques, to ASR of Assamese speech with dialectal variations. A class of ML techniques is considered, comprising the basic Artificial Neural Network (ANN) in feedforward (FF) form and the Deep Neural Network (DNN), with raw speech, extracted features and frequency domain representations as inputs. The Multi Layer Perceptron (MLP) is configured with inputs in several forms to learn class information obtained through clustering and manual labeling. DNNs are also used to extract specific sentence types. Initially, relevant samples are selected from a large store and assimilated. Next, a few conventional methods are used to extract features of a few selected types, comprising both spectral and prosodic features. These are applied to Recurrent Neural Network (RNN) and Fully Focused Time Delay Neural Network (FFTDNN) structures to evaluate their performance in recognizing mood, dialect, speaker and gender variations in dialectal Assamese speech. The system is tested under several background noise conditions, using as performance measures the recognition rates (obtained from confusion matrices and manually) and the computation time.
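The abstract states that the extracted features are of both spectral and prosodic types but does not name them. As an illustration only, the sketch below builds a per-frame composite vector from one spectral descriptor (spectral centroid, computed with a naive DFT) and two prosodic cues (short-time energy and an autocorrelation pitch estimate). The choice of these three features, the 256-sample frame, the 128-sample hop, the assumed 8 kHz sampling rate and the 60 to 400 Hz pitch search range are assumptions rather than the paper's configuration, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def split_frames(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut a signal into overlapping frames; a trailing partial frame is dropped."""
    return [list(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    """Prosodic intensity cue: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def spectral_centroid(frame: Sequence[float], sample_rate: float) -> float:
    """Spectral cue: magnitude-weighted mean frequency in Hz.

    Uses a naive O(N^2) DFT over bins 0..N/2 so the sketch needs no third-party packages.
    """
    n = len(frame)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        mag = math.hypot(re, im)
        weighted += mag * k * sample_rate / n  # bin k sits at k * fs / N Hz
        total += mag
    return weighted / total if total > 0 else 0.0


def autocorr_pitch(frame: Sequence[float], sample_rate: float,
                   fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Prosodic pitch cue: lag of the largest autocorrelation within [fmin, fmax].

    Returns the pitch in Hz, or 0.0 when no positive peak is found (e.g. silence).
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0


def composite_features(signal: Sequence[float], sample_rate: float,
                       frame_len: int = 256, hop: int = 128) -> List[List[float]]:
    """One [spectral centroid, energy, pitch] vector per frame.

    Hypothetical feature set and frame settings; not taken from the paper.
    """
    return [[spectral_centroid(f, sample_rate),
             short_time_energy(f),
             autocorr_pitch(f, sample_rate)]
            for f in split_frames(signal, frame_len, hop)]


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    # Synthetic 200 Hz tone standing in for a voiced speech segment.
    tone = [0.5 * math.sin(2 * math.pi * 200 * t / fs) for t in range(1024)]
    for vec in composite_features(tone, fs):
        print([round(v, 3) for v in vec])
```

A library FFT would normally replace the O(N^2) DFT; the pure-Python version only keeps the example dependency-free. A sequence of such vectors, one per frame, is the kind of input that recurrent or time-delay classifiers such as the RNN and FFTDNN named in the abstract consume.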
It is found that the proposed ML-based sentence extraction techniques and the composite feature set, used with an RNN as the classifier, outperform all other approaches considered. The performance of the system is also evaluated with an ANN in FF form as the feature extractor, and a comparison is made. Experimental results show that the use of big data samples enhances the learning of the ASR system. Further, the ANN-based sample and feature extraction techniques are found to be efficient enough to enable the application of ML techniques to big data as part of ASR systems.
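The comparisons summarized here rest on recognition rates, which the abstract says were obtained using confusion matrices (and manually). A minimal sketch, assuming the common definition of the overall recognition rate as the share of test samples on the confusion-matrix diagonal, plus per-class rates from row-normalized counts; the function names and the four-class toy labels are hypothetical and are not data from the paper.

```python
from typing import List, Sequence


def confusion_matrix(true_labels: Sequence[int], predicted: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Count matrix: rows index the true class, columns the predicted class."""
    if len(true_labels) != len(predicted):
        raise ValueError("label sequences must have equal length")
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted):
        matrix[t][p] += 1
    return matrix


def recognition_rate(matrix: List[List[int]]) -> float:
    """Overall recognition rate (%): diagonal (correct) counts over all test samples."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total if total else 0.0


def per_class_rates(matrix: List[List[int]]) -> List[float]:
    """Recognition rate (%) for each true class: diagonal entry over its row sum."""
    return [100.0 * row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


# Hypothetical toy labels for four classes (e.g. four dialect groups); not data from the paper.
true_labels = [0, 0, 1, 1, 2, 2, 3, 3]
predicted = [0, 1, 1, 1, 2, 0, 3, 3]
cm = confusion_matrix(true_labels, predicted, n_classes=4)
print(cm)                    # [[1, 1, 0, 0], [0, 2, 0, 0], [1, 0, 1, 0], [0, 0, 0, 2]]
print(recognition_rate(cm))  # 75.0
print(per_class_rates(cm))   # [50.0, 100.0, 50.0, 100.0]
```

Per-class rates show which mood, dialect, speaker or gender classes are being confused with one another, information that a single overall rate hides.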
Copyright © 2015 Elsevier Ltd. All rights reserved.

Keywords:  Artificial Neural Network (ANN); Automatic Speech Recognition (ASR); Deep Neural Network (DNN); Fully Focused Time Delay Neural Network (FFTDNN); Multi Layer Perceptron (MLP); Recurrent Neural Network (RNN)

Year:  2015        PMID: 26783204     DOI: 10.1016/j.neunet.2015.12.010

Source DB:  PubMed          Journal:  Neural Netw        ISSN: 0893-6080


  5 in total

1.  High Precision Sea Surface Temperature Prediction of Long Period and Large Area in the Indian Ocean Based on the Temporal Convolutional Network and Internet of Things.

Authors:  Tianying Sun; Yuan Feng; Chen Li; Xingzhi Zhang
Journal:  Sensors (Basel)       Date:  2022-02-19       Impact factor: 3.576

2.  (Review) A proposed artificial intelligence-based real-time speech-to-text to sign language translator for South African official languages for the COVID-19 era and beyond: In pursuit of solutions for the hearing impaired.

Authors:  Milka C Madahana; Katijah Khoza-Shangase; Nomfundo Moroe; Daniel Mayombo; Otis Nyandoro; John Ekoru
Journal:  S Afr J Commun Disord       Date:  2022-08-19

3.  Prediction of Drug-Induced Long QT Syndrome Using Machine Learning Applied to Harmonized Electronic Health Record Data.

Authors:  Steven T Simon; Divneet Mandair; Premanand Tiwari; Michael A Rosenberg
Journal:  J Cardiovasc Pharmacol Ther       Date:  2021-03-08       Impact factor: 2.457

4.  Multi-Source Deep Transfer Neural Network Algorithm.

Authors:  Jingmei Li; Weifei Wu; Di Xue; Peng Gao
Journal:  Sensors (Basel)       Date:  2019-09-16       Impact factor: 3.576

5.  Assessment of a Machine Learning Model Applied to Harmonized Electronic Health Record Data for the Prediction of Incident Atrial Fibrillation.

Authors:  Premanand Tiwari; Kathryn L Colborn; Derek E Smith; Fuyong Xing; Debashis Ghosh; Michael A Rosenberg
Journal:  JAMA Netw Open       Date:  2020-01-03