Speaker-Independent Silent Speech Recognition from Flesh-Point Articulatory Movements Using an LSTM Neural Network.

Myungjong Kim, Beiming Cao, Ted Mau, Jun Wang.

Abstract

Silent speech recognition (SSR) converts non-audio information, such as articulatory movements, into text. SSR has the potential to enable persons with laryngectomy to communicate through natural spoken expression. Current SSR systems have largely relied on speaker-dependent recognition models. The high degree of variability in articulatory patterns across speakers has been a barrier to developing effective speaker-independent SSR approaches. Speaker-independent SSR approaches, however, are critical for reducing the amount of training data required from each speaker. In this paper, we investigate speaker-independent SSR from the movements of flesh points on the tongue and lips, using articulatory normalization methods that reduce inter-speaker variation. To minimize physiological differences in the articulators across speakers, we propose Procrustes matching-based articulatory normalization, which removes locational, rotational, and scaling differences. To further normalize the articulatory data, we apply feature-space maximum likelihood linear regression (fMLLR) and i-vectors. We adopt a bidirectional long short-term memory recurrent neural network (BLSTM) as the articulatory model to capture articulatory movements with long-range articulatory history. A silent speech data set of flesh-point movements was collected using an electromagnetic articulograph (EMA) from twelve healthy and two laryngectomized English speakers. Experimental results showed the effectiveness of our speaker-independent SSR approaches for both healthy and laryngectomized speakers. In addition, the BLSTM outperformed a standard deep neural network. The best performance was obtained by the BLSTM with all three normalization approaches combined.
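
The Procrustes-style normalization described in the abstract (removing locational, rotational, and scaling differences) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of 2D points, the root-mean-square size measure, and rotation toward a reference shape are all assumptions made for illustration, and the paper's own rotation criterion may differ. The sketch also assumes the points do not all coincide, so the size is nonzero.

```python
import math

def center_and_scale(points):
    """Remove location and scale: move the centroid to the origin and
    divide by the root-mean-square distance from it (assumed size measure)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    size = math.sqrt(sum(x * x + y * y for x, y in centered) / n)
    return [(x / size, y / size) for x, y in centered]

def procrustes_normalize(points, reference):
    """Hypothetical helper: align a 2D flesh-point shape to a reference shape.

    Both shapes are centered and scaled first; the shape is then rotated by
    the angle that minimizes the summed squared distance to the reference
    (closed-form least-squares solution for 2D rotation)."""
    a = center_and_scale(points)
    b = center_and_scale(reference)
    num = sum(v * x - u * y for (x, y), (u, v) in zip(a, b))
    den = sum(u * x + v * y for (x, y), (u, v) in zip(a, b))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in a]
```

Under these assumptions, a shape that differs from the reference only by a shift, a uniform scaling, and a rotation maps onto the centered and scaled reference, which is the sense in which such normalization reduces inter-speaker differences in articulator position, orientation, and size.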

Keywords: Articulatory normalization; Procrustes matching; long short-term memory; silent speech recognition

Year: 2017; PMID: 30271809; PMCID: PMC6154510; DOI: 10.1109/TASLP.2017.2758999

Source DB: PubMed; Journal: IEEE/ACM Trans Audio Speech Lang Process


