| Literature DB >> 36119717 |
Jinghan Wu1,2, Yakun Zhang2,3, Liang Xie2,3, Ye Yan1,2,3, Xu Zhang4, Shuang Liu1, Xingwei An1, Erwei Yin1,2,3, Dong Ming1.
Abstract
Silent speech recognition breaks the limitations of automatic speech recognition when acoustic signals cannot be produced or captured clearly, but it still has a long way to go before being ready for real-life applications. To address this issue, we propose a novel silent speech recognition framework based on surface electromyography (sEMG) signals. In our approach, a new deep learning architecture, the Parallel Inception Convolutional Neural Network (PICNN), is proposed and implemented in our silent speech recognition system, with six inception modules processing the six channels of sEMG data separately and simultaneously. Meanwhile, Mel Frequency Spectral Coefficients (MFSCs) are employed to extract speech-related sEMG features for the first time. We further design and generate a 100-class dataset containing daily life assistance demands for elderly and disabled individuals. The experimental results obtained from 28 subjects confirm that our silent speech recognition method outperforms state-of-the-art machine learning algorithms and deep learning architectures, achieving the best recognition accuracy of 90.76%. With sEMG data collected from four new subjects, efficient steps of subject-based transfer learning are conducted to further improve the cross-subject recognition ability of the proposed model. Promising results prove that our sEMG-based silent speech recognition system can achieve high recognition accuracy and steady performance in practical applications.
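The record does not include code. As a rough illustration of the MFSC features the abstract describes, log mel filterbank energies can be computed from framed sEMG windows as sketched below. This is a minimal numpy sketch: the 1 kHz sampling rate, 128-sample frames, 256-point FFT, and 26 mel filters are illustrative assumptions, not the authors' parameters.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular mel filterbank, shape (n_filters, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfsc(frames, fs=1000, n_fft=256, n_filters=26):
    """MFSCs (log mel filterbank energies) for frames of shape (n_frames, frame_len)."""
    windowed = frames * np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, fs).T
    return np.log(np.maximum(energies, 1e-10))  # floor avoids log(0)

# Example: one channel of synthetic "sEMG", framed into 128-sample windows
rng = np.random.default_rng(0)
sig = rng.standard_normal(1024)
frames = sig.reshape(-1, 128)
feat = mfsc(frames)
print(feat.shape)  # (8, 26)
```

Unlike MFCCs, MFSCs stop before the discrete cosine transform, so the filterbank energies keep their local spectral structure, which suits convolutional models such as the PICNN.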
Keywords: Mel frequency spectral coefficient; convolutional neural network; silent speech recognition; subject-based transfer learning; surface electromyography (sEMG)
Year: 2022 PMID: 36119717 PMCID: PMC9478652 DOI: 10.3389/fnbot.2022.971446
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 3.493
Examples of the utterances in the corpus.

| Category | Label | Utterance (Chinese) | Pinyin | English translation |
|---|---|---|---|---|
| Physiology | 8 | 我要上厕所 | wo3yao4shang4ce4suo3 | I'm going to the toilet |
| Safety | 28 | 紧急呼救 | jin3ji2hu1jiu4 | Emergency |
| Social Interaction | 65 | 我要发短信 | wo3yao4fa1duan3xin4 | I want to send a text message |
| Self-respect and Fulfillment | 77 | 我能行的 | wo3neng2xing2de5 | I can do it |
| Entertainment | 89 | 我要看电视 | wo3yao4kan4dian4shi4 | I want to watch TV |
Figure 1. Positions of the paired electrodes adhered to the subject's face and neck for data acquisition.
Figure 2. Data collection experiment timeline for one session.
Figure 3. Architecture of the proposed PICNN model.
Figure 4. Structure of the inception module used in this paper.
Figure 5. Diagram of the proposed sEMG-based silent speech recognition system.
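As Figures 3-5 describe, each of the six sEMG channels is processed by its own inception module before the branch outputs are merged for classification. The parallel, multi-branch idea can be sketched with plain numpy as below; the kernel sizes (1, 3, 5), random weights, and ReLU placement are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def inception_branch(x, rng):
    """Toy inception module: parallel 1-, 3-, and 5-tap convolutions over one
    channel, with their outputs stacked as separate feature maps."""
    outs = []
    for k in (1, 3, 5):
        kernel = rng.standard_normal(k) * 0.1
        outs.append(np.convolve(x, kernel, mode="same"))
    return np.stack(outs)                      # (3, T)

def picnn_forward(channels, rng):
    """Six channels -> six parallel inception branches -> concatenated features."""
    feats = [inception_branch(ch, rng) for ch in channels]
    merged = np.concatenate(feats)             # (6 branches * 3 maps, T)
    return np.maximum(merged, 0.0)             # ReLU before the (omitted) dense head

rng = np.random.default_rng(1)
channels = rng.standard_normal((6, 200))       # six sEMG channels, 200 samples each
features = picnn_forward(channels, rng)
print(features.shape)  # (18, 200)
```

Keeping one branch per electrode channel lets each branch specialize on one muscle site before fusion, which is the motivation the abstract gives for the parallel design.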
Classification accuracy (%) for different feature extraction methods and classifiers.

| Classifier |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|
| LDA | 19.11 | 16.65 | 18.71 | 19.36 | 25.04 | 24.12 | 16.15 | 12.96 | 7.34 |
| RF | 42.31 | 40.44 | 40.87 | 47.35 | 44.27 | 43.41 | 39.17 | 24.41 | 29.90 |
| SVM | 53.22 | 46.51 | 51.46 | 58.73 | 54.10 | 51.67 | 45.42 | 28.54 | 39.42 |
| CNN | 60.36 | 57.14 | 58.56 | 67.29 | 68.29 | 68.87 | 64.33 | 80.93 | 87.34 |
| Inception | 66.83 | 63.03 | 63.66 | 72.08 | 70.32 | 69.89 | 71.31 | 81.36 | 89.80 |
| PICNN | 68.36 | 66.01 | 67.79 | 74.95 | 71.83 | 72.90 | 73.31 | 82.67 | 90.76 |
Recognition rate analysis for different models.

| Statistic |  |  |  |  |
|---|---|---|---|---|
| Minimum | 0.35 | 0.59 | 0.61 | 0.74 |
| Mean | 0.61 | 0.87 | 0.89 | 0.90 |
| Standard deviation | 0.150 | 0.076 | 0.071 | 0.054 |
Specific information of the 10 classes of demands with the lowest recognition rate.

| Label | Misclassified as label (count) | Recognition rate |
|---|---|---|
| 38 | 98 (1), 82 (2), 33 (2), 30 (1), 24 (2), 13 (1), 3 (1), 0 (1) | 0.83 |
| 77 | 82 (1), 48 (1), 31 (1), 13 (1), 11 (1), 0 (1) | 0.81 |
| 79 |  | 0.81 |
| 85 | 93 (1), 80 (1), 79 (2), 77 (1), 30 (1) | 0.81 |
| 92 | 97 (2), 68 (1), 66 (2), 25 (1), 18 (1), 6 (1) | 0.81 |
| 80 | 98 (1), 90 (2), 81 (4), 23 (1), 16 (1) | 0.79 |
| 89 | 97 (2), 95 (2), 92 (2), 88 (1), 83 (1), 77 (1), 69 (1), 68 (2), 66 (4), 29 (1), 6 (1), 4 (1) | 0.79 |
| 69 | 97 (1), 92 (2), 84 (1), 77 (1), 74 (1), 59 (1), 33 (1), 23 (4), 4 (1), 0 (1) | 0.76 |
| 4 | 71 (1), 69 (1), 42 (1), 39 (1), 36 (2), 30 (1), 15 (1), 11 (2) | 0.75 |
| 12 | 92 (1), 91 (1), 82 (1), 81 (1), 77 (1), 69 (1), 60 (1), 38 (1), 32 (1), 31 (1), 13 (1), 11 (1), 5 (1), 3 (2), 1 (1) | 0.74 |
Phonetic information for utterances labeled 79, 85, and 12.

| Label | Utterance (Chinese) | Pinyin | English translation |
|---|---|---|---|
| 79 | 我要剪头发 | wo3yao4jian3tou2fa4 | I want to cut my hair |
| 85 | 我要洗头发 | wo3yao4xi3tou2fa4 | I want to wash my hair |
| 12 | 我有点冷 | wo3you3dian3leng3 | I'm a little bit cold |
Figure 6. Subject-based transfer learning process. The logarithmic trendline of classification accuracy for each subject was drawn based on the PICNN model.
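The record describes subject-based transfer learning, where a model trained on the original 28 subjects is adapted with a small amount of data from each new subject. The exact recipe is not given here; a common variant, sketched below under that assumption, freezes the pretrained feature extractor and retrains only a softmax classifier head by gradient descent on the new subject's labelled windows. All shapes, the random "frozen" projection standing in for the PICNN body, and the 5-class toy labels are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_classes, feat_dim = 5, 18

# Frozen feature extractor standing in for the pretrained PICNN body:
# here just a fixed random projection followed by ReLU.
W_frozen = rng.standard_normal((feat_dim, 200)) * 0.05

def extract(x):                                # x: (n, 200) raw windows
    return np.maximum(x @ W_frozen.T, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Small labelled calibration set from a "new subject" (synthetic)
X_new = rng.standard_normal((50, 200))
y_new = rng.integers(0, n_classes, 50)
F = extract(X_new)                             # features stay fixed

# Fine-tune only the classifier head: cross-entropy, plain gradient descent
W = np.zeros((n_classes, feat_dim))
onehot = np.eye(n_classes)[y_new]
for _ in range(300):
    p = softmax(F @ W.T)
    W -= 0.1 * (p - onehot).T @ F / len(F)     # gradient of mean cross-entropy

acc = (softmax(F @ W.T).argmax(axis=1) == y_new).mean()
print(round(acc, 2))
```

Freezing the body keeps the subject-independent representation learned from the 28 original subjects, so only a few calibration samples per class are needed to adapt the decision layer to a new subject.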