Machine learning based sample extraction for automatic speech recognition using dialectal Assamese speech.


Swapna Agarwalla¹, Kandarpa Kumar Sarma².

Abstract

Automatic Speech Recognition (ASR) and related issues are continuously evolving as inseparable elements of Human Computer Interaction (HCI). With the assimilation of emerging concepts such as big data and the Internet of Things (IoT) as extended elements of HCI, ASR techniques are passing through a paradigm shift. Of late, learning based techniques have started to receive greater attention from ASR research communities because they possess a natural ability to mimic biological behavior, which aids ASR modeling and processing. Current learning based ASR techniques are evolving further with the incorporation of concepts such as big data and IoT. In this paper, we report certain machine learning (ML) based approaches used to extract relevant samples from a big data space and apply them to ASR, using certain soft computing techniques, for Assamese speech with dialectal variations. A class of ML techniques comprising the basic Artificial Neural Network (ANN) in feedforward (FF) and Deep Neural Network (DNN) forms, taking raw speech, extracted features and frequency domain forms as inputs, is considered. The Multi Layer Perceptron (MLP) is configured with inputs in several forms to learn class information obtained using clustering and manual labeling. DNNs are also used to extract specific sentence types. Initially, relevant samples are selected from a large storage and assimilated. Next, a few conventional methods are used to extract features of a few selected types. The features comprise both spectral and prosodic types. These are applied to Recurrent Neural Network (RNN) and Fully Focused Time Delay Neural Network (FFTDNN) structures to evaluate their performance in recognizing mood, dialect, speaker and gender variations in dialectal Assamese speech. The system is tested under several background noise conditions by considering the recognition rates (obtained using confusion matrices and manually) and computation time. It is found that the proposed ML based sentence extraction techniques and the composite feature set, used with the RNN as classifier, outperform all other approaches. The performance of the system is also evaluated with the ANN in FF form as feature extractor, and a comparison is made. Experimental results show that the application of big data samples has enhanced the learning of the ASR system. Further, the ANN based sample and feature extraction techniques are found to be efficient enough to enable the application of ML techniques in big data aspects as part of ASR systems.
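
The abstract states that recognition rates are obtained from confusion matrices but does not give the calculation. The sketch below shows one common way to derive an overall and a per-class recognition rate from such a matrix. It is an illustration only, not the authors' implementation: the function name recognition_rates, the three-class layout and the counts are all hypothetical, and rows are assumed to hold the true class while columns hold the predicted class.

```python
def recognition_rates(confusion):
    """Return (overall_rate, per_class_rates) for a square confusion matrix.

    Assumed convention (not taken from the paper):
    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    if any(len(row) != n for row in confusion):
        raise ValueError("confusion matrix must be square")

    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")

    # Correctly recognized samples lie on the main diagonal.
    correct = sum(confusion[i][i] for i in range(n))

    # Per-class rate: diagonal entry divided by that class's row total.
    per_class = []
    for i in range(n):
        row_total = sum(confusion[i])
        per_class.append(confusion[i][i] / row_total if row_total else float("nan"))

    return correct / total, per_class


# Hypothetical counts for three dialect classes (illustrative, not measured data).
example = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
overall, per_class = recognition_rates(example)
print(f"overall recognition rate: {overall:.3f}")
print("per-class recognition rates:", per_class)
```

The same function could be applied separately to the mood, dialect, speaker and gender tasks, with one matrix per task and per noise condition.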


Keywords: Artificial Neural Network (ANN); Automatic Speech Recognition (ASR); Deep Neural Network (DNN); Fully Focused Time Delay Neural Network (FFTDNN); Multi Layer Perceptron (MLP); Recurrent Neural Network (RNN)


Year: 2015 PMID: 26783204 DOI: 10.1016/j.neunet.2015.12.010

Source DB: PubMed Journal: Neural Netw ISSN: 0893-6080

Cited

5 in total

1. High Precision Sea Surface Temperature Prediction of Long Period and Large Area in the Indian Ocean Based on the Temporal Convolutional Network and Internet of Things.

Authors: Tianying Sun; Yuan Feng; Chen Li; Xingzhi Zhang
Journal: Sensors (Basel) Date: 2022-02-19 Impact factor: 3.576

Review 2. A proposed artificial intelligence-based real-time speech-to-text to sign language translator for South African official languages for the COVID-19 era and beyond: In pursuit of solutions for the hearing impaired.

Authors: Milka C Madahana; Katijah Khoza-Shangase; Nomfundo Moroe; Daniel Mayombo; Otis Nyandoro; John Ekoru
Journal: S Afr J Commun Disord Date: 2022-08-19

3. Prediction of Drug-Induced Long QT Syndrome Using Machine Learning Applied to Harmonized Electronic Health Record Data.

Authors: Steven T Simon; Divneet Mandair; Premanand Tiwari; Michael A Rosenberg
Journal: J Cardiovasc Pharmacol Ther Date: 2021-03-08 Impact factor: 2.457

4. Multi-Source Deep Transfer Neural Network Algorithm.

Authors: Jingmei Li; Weifei Wu; Di Xue; Peng Gao
Journal: Sensors (Basel) Date: 2019-09-16 Impact factor: 3.576

5. Assessment of a Machine Learning Model Applied to Harmonized Electronic Health Record Data for the Prediction of Incident Atrial Fibrillation.

Authors: Premanand Tiwari; Kathryn L Colborn; Derek E Smith; Fuyong Xing; Debashis Ghosh; Michael A Rosenberg
Journal: JAMA Netw Open Date: 2020-01-03
