Yu Hu, Yongkang Wong, Wentao Wei, Yu Du, Mohan Kankanhalli, Weidong Geng.
Abstract
Surface electromyography (sEMG)-based gesture recognition with deep learning plays an increasingly important role in human-computer interaction. Existing deep learning architectures are mainly based on Convolutional Neural Networks (CNNs), which capture the spatial information of the electromyogram signal. Motivated by the sequential nature of the electromyogram signal, we propose an attention-based hybrid CNN and RNN (CNN-RNN) architecture to better capture its temporal properties for gesture recognition. Moreover, we present a new sEMG image representation method based on a traditional feature vector, which enables deep learning architectures to extract implicit correlations between different channels of sparse multi-channel electromyogram signals. Extensive experiments on five sEMG benchmark databases show that the proposed method outperforms all reported state-of-the-art methods on both sparse multi-channel and high-density sEMG databases. For comparison with existing work, we set the window length to 200 ms for NinaProDB1 and NinaProDB2, and to 150 ms for the BioPatRec sub-database, the CapgMyo sub-database, and the csl-hdemg database. The recognition accuracies on these benchmark databases are 87.0%, 82.2%, 94.1%, 99.7% and 94.5%, which are 9.2, 3.5, 1.2, 0.2 and 5.2 percentage points higher than the state-of-the-art performance, respectively.
Year: 2018 PMID: 30376567 PMCID: PMC6207326 DOI: 10.1371/journal.pone.0206049
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Proposed attention-based hybrid CNN-RNN architecture for sEMG-based gesture recognition.
Layer configuration of the proposed attention-based hybrid CNN-RNN.
| Layers | Name | Configurations | Modules |
|---|---|---|---|
| 1 | | 64 kernels, kernel size (3 × 3) | CNN |
| 2 | | 64 kernels, kernel size (3 × 3) | |
| 3 | | 64 kernels | |
| 4 | | 64 kernels | |
| 5 | | 512 outputs | |
| 6 | | 512 outputs | |
| 7 | | 128 outputs | |
| 8 | | LSTM, 512 hidden unit outputs | RNN |
| 9 | | | Attention |
| 10 | | | Classification |
| 11 | | | |
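As an illustration of what an attention layer placed after the LSTM can compute, here is a minimal pure-Python sketch of attention pooling: each time step's hidden state receives a scalar score, the scores are normalised with a softmax, and the hidden states are averaged with those weights. The dot-product scoring function and the names `attention_pool` and `score_weights` are assumptions made for illustration; the table does not specify how the attention weights are computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, score_weights):
    """Collapse T hidden-state vectors (each of length H) into one vector.

    score_weights is a hypothetical learned vector of length H; the
    dot-product scoring used here is an assumption, not the paper's formula.
    """
    # One scalar score per time step.
    scores = [
        sum(h_d * w_d for h_d, w_d in zip(h, score_weights))
        for h in hidden_states
    ]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of the hidden states over time.
    context = [
        sum(a * h[d] for a, h in zip(alphas, hidden_states))
        for d in range(dim)
    ]
    return context, alphas


# Toy input: 3 time steps with hidden size 2 (the table lists 512 hidden units).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, alphas = attention_pool(states, score_weights=[0.5, 0.5])
print(alphas)
print(context)
```

The pooled `context` vector has the same size as a single hidden state, so a classification layer can consume it regardless of how many time steps the window was split into.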
Fig 2. Six raw-signal-based sEMG image representation methods.
Comparison of gesture recognition accuracy with various image representation methods on NinaProDB1.
Here, we employ GengNet [26], and the sliding window length is fixed at 200 ms for all experiments.
| sEMG Image | Classification Accuracy |
|---|---|
| raw-image1 | 83.5% |
| raw-image2 | 82.9% |
| signal-image1 | 84.9% |
| signal-image2 | 79.8% |
| activity-image1 | 78.1% |
| activity-image2 | 72.8% |
| feature-signal-image1 | 86.3% |
Details of five sEMG benchmark databases.
| Database | Subjects | Gestures | Sessions | Trials | Number of electrodes | Sampling rate (Hz) |
|---|---|---|---|---|---|---|
| NinaProDB1 | 27 | 52 | 1 | 10 | 10 | 100 |
| NinaProDB2 | 40 | 50 | 1 | 6 | 12 | 2000 |
| BioPatRec26MOV | 17 | 26 | 1 | 3 | 8 | 2000 |
| CapgMyo-DBa | 18 | 8 | 1 | 10 | 128 | 1000 |
| csl-hdemg | 5 | 27 | 5 | 10 | 192 | 2048 |
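To make the window lengths concrete, the sketch below converts each window from milliseconds to a number of samples per channel, using the sampling rates from the table and the window lengths stated in the abstract (200 ms for NinaProDB1 and NinaProDB2, 150 ms for the others). The helper name `window_samples` and the rounding to the nearest whole frame are assumptions for illustration; the source does not say how fractional sample counts are handled.

```python
# (sampling rate in Hz, window length in ms) per database.
DATABASES = {
    "NinaProDB1": (100, 200),
    "NinaProDB2": (2000, 200),
    "BioPatRec26MOV": (2000, 150),
    "CapgMyo-DBa": (1000, 150),
    "csl-hdemg": (2048, 150),
}


def window_samples(rate_hz, window_ms):
    """Exact (possibly fractional) number of samples in one window."""
    return rate_hz * window_ms / 1000


for name, (rate_hz, window_ms) in DATABASES.items():
    exact = window_samples(rate_hz, window_ms)
    # Rounding to whole frames is an assumption, not taken from the source.
    print(f"{name}: {exact:g} samples per channel, about {round(exact)} frames")
```

Only csl-hdemg yields a fractional count (2048 Hz times 0.15 s), which is why some rounding or truncation rule is needed there.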
Classification accuracy of the proposed method and previous works.
| | NinaProDB1, 150 ms | NinaProDB1, 200 ms | NinaProDB1, trial | NinaProDB2, 200 ms | NinaProDB2, trial | BioPatRec26MOV, 50 ms | BioPatRec26MOV, 150 ms | BioPatRec26MOV, trial | CapgMyo-DBa, 40 ms | CapgMyo-DBa, 150 ms | CapgMyo-DBa, trial | csl-hdemg, 150 ms | csl-hdemg, 170 ms | csl-hdemg, trial |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Feature-LDA | - | - | - | - | - | 86.3% | 92.9% | - | - | 99.0% | - | - | - | - |
| Traditional-RF | - | 75.3% | - | - | - | - | - | - | - | - | - | - | - | - |
| AtzoriNet | - | 66.6% | - | - | - | - | - | - | - | - | - | - | - | - |
| GengNet [26] | - | 77.8% | 96.7% | - | - | - | - | - | 99.0% | 99.5% | - | 89.3% | 90.4% | 96.8% |
| ZhaiNet | - | - | - | 78.71% | - | - | - | - | - | - | - | - | - | - |
| RNN Module with raw-signal | 78.1% | 79.8% | 95.0% | Did not converge | Did not converge | 76.4% | 82.3% | 92.6% | 71.8% | 80.4% | 90.4% | 65.3% | 71.1% | 75.8% |
| CNN Module with raw-image1 | 82.6% | 83.5% | 96.5% | 73.4% | 97.6% | 82.1% | 83.9% | 92.2% | 98.0% | 97.7% | 98.9% | 92.0% | 92.1% | 95.2% |
| CNN Module with feature-signal-image1 | 85.4% | 86.3% | 97.2% | 81.4% | 97.5% | 85.2% | 90.0% | 95.8% | - | - | - | - | - | - |
| Hybrid CNN-RNN with raw-image1 | 83.5% | 84.7% | 96.5% | 74.6% | 97.7% | 88.5% | 92.2% | 96.8% | 99.1% | 99.6% | 99.9% | 94.3% | 94.8% | 96.1% |
| Hybrid CNN-RNN with feature-signal-image1 | 86.4% | 86.7% | 97.1% | 82.0% | 97.5% | 89.9% | 93.9% | 97.5% | - | - | - | - | - | - |
| Attention-based hybrid CNN-RNN with raw-image1 | 83.7% | 84.8% | 96.5% | 74.8% | 97.6% | 88.7% | 92.5% | 96.8% | | 99.7% | | 94.5% | | |
| Attention-based hybrid CNN-RNN with feature-signal-image1 | | 87.0% | | 82.2% | | | 94.1% | | - | - | - | - | - | - |
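The improvements quoted in the abstract can be checked against the previous-work rows of this table. The sketch below subtracts the best prior accuracy at the matching window length from the accuracy the abstract reports for the proposed method. The pairing of each database with a specific baseline (GengNet, ZhaiNet or Feature-LDA) is inferred from the window lengths and is an assumption, as are the variable names.

```python
# Accuracies of the proposed method as stated in the abstract (%).
PROPOSED = {
    "NinaProDB1 (200 ms)": 87.0,
    "NinaProDB2 (200 ms)": 82.2,
    "BioPatRec26MOV (150 ms)": 94.1,
    "CapgMyo-DBa (150 ms)": 99.7,
    "csl-hdemg (150 ms)": 94.5,
}

# Best previous-work accuracy in the same column of the table (%).
# Which baseline the authors compared against is inferred, not stated here.
BASELINE = {
    "NinaProDB1 (200 ms)": 77.8,      # GengNet
    "NinaProDB2 (200 ms)": 78.71,     # ZhaiNet
    "BioPatRec26MOV (150 ms)": 92.9,  # Feature-LDA
    "CapgMyo-DBa (150 ms)": 99.5,     # GengNet
    "csl-hdemg (150 ms)": 89.3,       # GengNet
}

# Difference in percentage points, rounded to one decimal place.
gains = {db: round(PROPOSED[db] - BASELINE[db], 1) for db in PROPOSED}

for db, gain in gains.items():
    print(f"{db}: +{gain} percentage points")
```

The differences are absolute percentage points, not relative improvements.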
Recognition time per sample on five benchmark databases with the attention-based hybrid CNN-RNN architecture.
The recognition window length is 200 ms for NinaProDB1 and NinaProDB2, and 150 ms for BioPatRec26MOV, CapgMyo-DBa and csl-hdemg.
| | NinaProDB1 | NinaProDB2 | BioPatRec26MOV | CapgMyo-DBa | csl-hdemg |
|---|---|---|---|---|---|
| GPU | 3.0 ms | 3.6 ms | 4.1 ms | 7.8 ms | 6.0 ms |
| CPU | 106 ms | 140 ms | 107 ms | 258 ms | 327 ms |
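One way to read these timings is to compare each recognition time with the length of the window it classifies. The sketch below computes that ratio and lists the device and database pairs where recognition takes longer than the window itself. Whether this matters in practice depends on the window stride and deployment setup, which are not given here, so treat it as a rough reading of the table; the names `ratio` and `slower_than_window` are illustrative.

```python
# Window length per database in ms, from the table caption.
WINDOW_MS = {
    "NinaProDB1": 200,
    "NinaProDB2": 200,
    "BioPatRec26MOV": 150,
    "CapgMyo-DBa": 150,
    "csl-hdemg": 150,
}

# Recognition time per sample in ms, from the table.
RECOGNITION_MS = {
    "GPU": {"NinaProDB1": 3.0, "NinaProDB2": 3.6, "BioPatRec26MOV": 4.1,
            "CapgMyo-DBa": 7.8, "csl-hdemg": 6.0},
    "CPU": {"NinaProDB1": 106, "NinaProDB2": 140, "BioPatRec26MOV": 107,
            "CapgMyo-DBa": 258, "csl-hdemg": 327},
}


def ratio(device, database):
    """Recognition time divided by window length (above 1 means slower)."""
    return RECOGNITION_MS[device][database] / WINDOW_MS[database]


slower_than_window = [
    (device, database)
    for device, times in RECOGNITION_MS.items()
    for database in times
    if ratio(device, database) > 1
]

for device, database in slower_than_window:
    print(f"{device} on {database}: {ratio(device, database):.2f}x the window length")
```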
Fig 3. Classification accuracy of the RNN module with raw-signal, and of the CNN module, hybrid CNN-RNN and attention-based hybrid CNN-RNN architectures with raw-image1, on five benchmark databases.
Fig 4. Classification accuracy of the attention-based hybrid CNN-RNN architecture with different numbers of subsegments on NinaProDB1.
Fig 5. Classification accuracy of the CNN module, hybrid CNN-RNN and attention-based hybrid CNN-RNN architectures with three sEMG image representation methods on three sparse multi-channel benchmark databases.
Classification accuracy of different image representation methods on NinaProDB1.
We use the same sliding window length (200 ms) for all experiments mentioned below.
| sEMG Image | CNN Module | Hybrid CNN-RNN | Attention-based hybrid CNN-RNN |
|---|---|---|---|
| raw-image1 | 83.5% | 84.7% | 84.8% |
| raw-image2 | 82.9% | 80.8% | 80.9% |
| signal-image1 | 84.9% | 85.6% | 85.9% |
| signal-image2 | 79.8% | 81.6% | 82.0% |
| activity-image1 | 78.1% | 78.8% | 79.1% |
| activity-image2 | 72.8% | 74.1% | 74.5% |
| feature-signal-image1 | 86.3% | 86.7% | 87.0% |
| feature-signal-image2 | 83.1% | 84.5% | 84.7% |