José Jair Alves Mendes Junior, Melissa La Banca Freitas, Daniel Prado Campos, Felipe Adalberto Farinelli, Sergio Luiz Stevan, Sérgio Francisco Pichorim.
Abstract
Sign language recognition systems aid communication among deaf people, hearing-impaired people, and speakers. One type of signal that has received increasing study and can be used as input for these systems is surface electromyography (sEMG). This work presents the recognition of a set of alphabet gestures from Brazilian Sign Language (Libras) using sEMG acquired from an armband. Only sEMG signals were used as input. Signals from 12 subjects were acquired using a Myo™ armband for the 26 signs of the Libras alphabet. Additionally, because sEMG processing involves several parameters, the influence of segmentation, feature extraction, and classification was considered at each step of the pattern recognition. In segmentation, window length and four levels of overlap rate were analyzed, along with the contribution of each feature, feature sets from the literature, and new feature sets proposed for different classifiers. We found that the overlap rate had a high influence on this task. Accuracies on the order of 99% were achieved with the following factors: segments of 1.75 s with a 12.5% overlap rate; the proposed set of four features; and random forest (RF) classifiers.
Keywords: feature extraction; machine learning; pattern recognition; sign language; signal segmentation; surface electromyography; wearable device
Year: 2020 PMID: 32764286 PMCID: PMC7471999 DOI: 10.3390/s20164359
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Alphabet set from Brazilian Sign Language (Libras). The set is composed of 26 letters, including 20 static gestures and six dynamic gestures.
Figure 2. (a) Commercial Myo™ device used for data acquisition with the numbers of its respective channels. (b) Placement of the armband on the subjects’ forearm. The reference (channel 3) was placed on the flexor carpi ulnaris to make all acquisitions uniform.
Figure 3. Sequence for sEMG signal acquisition. When the acquisition started, a light and a buzzer indicated that the subject should perform the gesture. After 1.3 s, the light indicator turned off and the buzzer signaled the subject to rest. After another 1.3 s, both indicators turned on again, and this process repeated until the acquisition was completed.
Table 1. Time-domain and frequency-domain features extracted in this work.
| Domain | Feature | Feature Name | Parameters |
|---|---|---|---|
| Time | AR4 | 4th-order Autoregressive Coefficients | - |
| Time | CEPS | Cepstral Coefficients | 4th-order |
| Time | DASDV | Difference Absolute Standard Deviation Value | - |
| Time | HIST | Histogram | 9 Bins |
| Time | IEMG | Integral of EMG | - |
| Time | LOGDEC | Log Detector | - |
| Time | LS | L-Scale | Two Moment |
| Time | MAV | Mean Absolute Value | - |
| Time | MAV1 and MAV2 | Modified Mean Absolute Value | - |
| Time | MFL | Maximum Fractal Length | - |
| Time | MSR | Mean Square Root | - |
| Time | MYOP | Myopulse percentage rate | Threshold = 10⁻² |
| Time | RMS | Root Mean Square | - |
| Time | SampEn | Sample Entropy | Dimension = 2 |
| Time | SSC | Sign Slope Change | Slope threshold = 10⁻⁴ |
| Time | TM3, TM4, and TM5 | Absolute Value of 3rd, 4th and 5th Moments | - |
| Time | VAREMG | Variance | - |
| Time | VORDER | V-Order | 3 Order |
| Time | WAMP | Willison Amplitude | Threshold = 10⁻² |
| Time | WL | Waveform Length | - |
| Time | ZC | Zero Crossing | Amplitude threshold = 10⁻² |
| Frequency | FR | Frequency Ratio | Low frequencies = 10–50 Hz |
| Frequency | MDF | Median Frequency | - |
| Frequency | MNF | Mean Frequency | - |
| Frequency | MNP | Mean Power Spectrum | - |
| Frequency | PKF | Peak Frequency | - |
| Frequency | SM1, SM2, and SM3 | Spectral Momentum | - |
| Frequency | TTP | Total Power Spectrum | - |
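
To make a few of the time-domain entries concrete, the following is a minimal sketch of six of them (MAV, RMS, WL, ZC, SSC, and WAMP) for a single channel. The source lists only feature names and thresholds, so the formulas below are the definitions commonly used in the sEMG literature and are an assumption here, as are the function names. The default thresholds are taken from the table.

```python
import math


def mav(x):
    """Mean Absolute Value of one window."""
    return sum(abs(v) for v in x) / len(x)


def rms(x):
    """Root Mean Square of one window."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def wl(x):
    """Waveform Length: summed absolute difference between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def zc(x, threshold=1e-2):
    """Zero Crossing: sign changes whose amplitude step reaches the threshold."""
    return sum(1 for a, b in zip(x, x[1:])
               if a * b < 0 and abs(a - b) >= threshold)


def ssc(x, threshold=1e-4):
    """Sign Slope Change: interior samples where the slope changes sign,
    counted when the product of the two slopes reaches the threshold."""
    return sum(1 for a, b, c in zip(x, x[1:], x[2:])
               if (b - a) * (b - c) >= threshold)


def wamp(x, threshold=1e-2):
    """Willison Amplitude: consecutive-sample differences reaching the threshold."""
    return sum(1 for a, b in zip(x, x[1:]) if abs(a - b) >= threshold)
```

Plain Python lists are used to keep the sketch dependency-free; an array library would be the usual choice for full recordings.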
Feature Sets organized and applied in this work.
| Feature Set | Features |
|---|---|
| G1 | Hudgins et al. (1993) set: MAV, WL, ZC, and SSC |
| G2 | Liu et al. (2007) set: AR4 and HIST |
| G3 | Most repeated features in |
| G4 | TD4 set: MFL, MSR, WAMP, and LS |
| G5 | TD9 set: LS, MFL, MSR, WAMP, ZC, RMS, IAV, DASDV, and VAREMG |
| G6 | Best features performed individually |
| G7 | Best time-domain features in G6 |
| G8 | Best frequency-domain features in G6 |
| G9 | Reduced feature set from G6 with relevance in accuracy based on statistical analysis |
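
As an illustration of how a feature set becomes a classifier input, here is a hedged sketch of the four-feature G4 (TD4) set computed per channel and concatenated across channels. The formulas are assumptions based on common definitions (the source names the features but does not define them): MFL as the base-10 logarithm of the root of the summed squared differences, MSR as the mean square root of the absolute amplitudes, and LS as the second sample L-moment. The function names and the channel-major ordering are hypothetical.

```python
import math


def mfl(x):
    """Maximum Fractal Length (assumed definition). A perfectly flat window
    would give log10(0) and needs guarding in real use."""
    return math.log10(math.sqrt(sum((b - a) ** 2 for a, b in zip(x, x[1:]))))


def msr(x):
    """Mean Square Root, taken on absolute amplitudes (assumption)."""
    return sum(math.sqrt(abs(v)) for v in x) / len(x)


def wamp(x, threshold=1e-2):
    """Willison Amplitude with the threshold listed in the feature table."""
    return sum(1 for a, b in zip(x, x[1:]) if abs(a - b) >= threshold)


def l_scale(x):
    """L-Scale: second sample L-moment, lambda2 = 2*b1 - b0."""
    s = sorted(x)
    n = len(s)
    b0 = sum(s) / n
    b1 = sum(i * v for i, v in enumerate(s)) / (n * (n - 1))
    return 2 * b1 - b0


def td4(window):
    """Features of one channel window in the order MFL, MSR, WAMP, LS."""
    return [mfl(window), msr(window), wamp(window), l_scale(window)]


def feature_vector(segment):
    """Concatenate TD4 features over all channels of one segment.
    `segment` is a list of per-channel sample lists."""
    return [value for channel in segment for value in td4(channel)]
```

With the eight channels of the armband, `feature_vector` would return 32 values per segment under these assumptions.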
Classifiers applied in this work and their parameterization.
| Classifier | Parameters |
|---|---|
| k-Nearest Neighbors (KNN) | 1-nearest neighbor |
| Linear Discriminant Analysis (LDA) | - |
| Naïve Bayes (NB) | Normal distribution |
| Multi-Layer Perceptron (MLP) | 30 neurons in hidden layer |
| Quadratic Discriminant Analysis (QDA) | - |
| Random Forest (RF) | 30 trees |
| Extreme Learning Machine (ELM) | 1000 neurons in the hidden layer |
| Support Vector Machine with Linear Discrimination (SVMLin) | C = 100 |
| Support Vector Machine with Radial Basis Function Discrimination (SVMRBF) | C = 10 and Gaussian size = 1 |
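
The simplest classifier in the table, the 1-nearest-neighbor rule, can be sketched in a few lines. This is an illustrative sketch and not the authors' implementation: the Euclidean distance metric and the function name `nearest_neighbor_predict` are assumptions, since the table gives only the number of neighbors.

```python
import math


def nearest_neighbor_predict(train_X, train_y, query):
    """Return the label of the training vector closest to `query`.
    Euclidean distance is assumed; ties go to the earliest training sample."""
    best_index = min(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], query))
    return train_y[best_index]
```

For example, with training vectors `[[0, 0], [1, 1], [5, 5]]` labeled `["A", "B", "C"]`, the query `[0.9, 1.2]` is assigned the label `"B"`.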
Figure 4. Steps in the data processing methodology.
Figure 5. Accuracies for each extracted feature for all the classifiers considered in this work. The feature abbreviations are listed in Table 1 and in the Abbreviations section. For segmentation, a window of 1 s and an overlap fraction of 1 (100%, i.e., disjoint windows without overlapping) were used. The mean for each feature is indicated by the dashed lines. The features with high accuracies were used to create the feature sets G6 to G9.
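
The segmentation convention in this caption (an overlap fraction of 1 meaning no overlap) suggests that the rate expresses how far each window advances relative to its length. The sketch below follows that reading, which is an assumption, as are the function name, the parameter names, and the 200 Hz default sampling rate.

```python
def segment_signal(samples, window_s, step_fraction, fs=200):
    """Split a 1-D sample sequence into fixed-length windows.

    window_s      -- window length in seconds
    step_fraction -- window advance as a fraction of the window length;
                     1.0 gives disjoint windows, 0.125 moves 12.5% per step
    fs            -- sampling rate in Hz (assumed value)
    """
    window = int(round(window_s * fs))
    step = max(1, int(round(window * step_fraction)))
    # Only full windows are kept; a trailing partial window is dropped.
    return [samples[start:start + window]
            for start in range(0, len(samples) - window + 1, step)]
```

Under this reading, a 5 s recording at 200 Hz cut into 1 s windows yields 5 segments at a fraction of 1.0 and 33 segments at 0.125, which is one reason smaller rates provide far more training examples.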
Figure 6. The selection process of the features from G6 to G9. (a) Boxplot of the distribution of accuracies as the number of G6 features increases (ordered from the highest to the lowest hit rate obtained individually) for the best classifiers (ELM, KNN, RF, and SVMRBF). The bars represent the 25th and 75th percentiles, the whiskers represent approximately 99.3% of the coverage of the distribution, and the central marks represent the medians. (b) Distributions from the Tukey post-hoc test following the Friedman test, evaluated at a 95% confidence level (p-value < 0.05), as the number of features in the classification process increases. The highlight marks the distribution with high accuracy obtained from a reduced number of features in (a), which is statistically equivalent to several other distributions, as shown in (b).
Figure 7. (a) Accuracies for the selected feature sets (G1 to G9) for the classifiers that presented the highest accuracies in the previous analysis. (b) Distributions from the Tukey post-hoc test following the Friedman test, evaluated at a 95% confidence level (p-value < 0.05). Two sets of ranks can be observed, separating G1 to G3 from G4 to G9. The feature sets G4, G5, G6, G8, and G9 have similar distributions.
Figure 8. Results for the segmentation analysis. The color bar indicates the accuracy of each classifier as a function of the feature sets and segmentation parameters. High accuracies are concentrated at large window lengths and small overlap rates.
Figure 9. Critical distance (CD) diagram obtained from the Friedman test and the Nemenyi post-hoc test. The test was performed for (a) window length and (b) overlap rate variation. The best average ranks are shown on the left. The lines connecting the distributions denote the critical difference: connected distributions show no statistically significant difference. A window length of 1.75 s presented distributions similar to those of 2.25 and 2 s for several classifiers. Although the 12.5% overlap rate showed the best ranks, some of its distributions were closely related to those of 25%.
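
A critical distance diagram of this kind rests on two quantities: the average rank of each configuration and the Nemenyi critical distance, CD = q_α · sqrt(k(k + 1) / (6N)) for k configurations compared over N blocks. The sketch below computes both under stated assumptions: the function names are hypothetical, the blocks could be subjects or cross-validation folds (the caption does not say), and the q_α values are the standard α = 0.05 constants tabulated for the Nemenyi test.

```python
import math

# Nemenyi critical values at alpha = 0.05, indexed by the number of
# compared configurations k (standard tabulated constants).
Q_ALPHA_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
              7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def average_ranks(scores):
    """scores[i][j] is the accuracy of configuration j on block i.
    Rank 1 is the best accuracy; tied values share the average rank."""
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of 1-based positions
            for j in order[pos:end + 1]:
                totals[j] += shared
            pos = end + 1
    return [t / len(scores) for t in totals]


def critical_distance(k, n_blocks, q_alpha=None):
    """Two configurations differ significantly when their average ranks
    are further apart than this distance."""
    q = Q_ALPHA_05[k] if q_alpha is None else q_alpha
    return q * math.sqrt(k * (k + 1) / (6.0 * n_blocks))
```

Configurations whose average ranks differ by less than the returned distance would be joined by a line in the diagram.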
Accuracies of the classifiers that performed similarly in the segmentation analysis (KNN, RF, and SVMRBF). Window lengths from 1.25 to 2.25 s and overlap rates of 25% and 12.5% were considered because of the similar results reached in the segmentation influence analysis.
| Accuracy (%) | KNN | KNN | RF | RF | SVMRBF | SVMRBF |
|---|---|---|---|---|---|---|
| Window Length | 25% | 12.5% | 25% | 12.5% | 25% | 12.5% |
| G4 | | | | | | |
| 1.25 s | 92.5 ab | 98 ab | 89.3 ab | 94.9 ab | 94.2 ab | 98.3 abc |
| 1.5 s | 95.1 ab | 99 abc | 93.8 abc | 97.5 abc | 96.4 abc | 99.1 abcd |
| 1.75 s | 97.8 abc | 99.7 bcd | 97.2 cd | 99 bcd | 98 cd | 99.5 abcd |
| 2 s | 99.8 abceF | 99.9 cdeF | 98.9 cde | 99.6 de | 99.3 cde | 99.8 bcde |
| 2.25 s | 99.9 ceF | 99.9 cdeF | 99.9 deF | 99.9 deF | 99.8 de | 99.9 de |
| | | | | | | |
| 1.25 s | 93.2 ab | 97 ab | 90.3 ab | 95.3 ab | 93.8 ab | 97.7 ab |
| 1.5 s | 95.8 abc | 98.8 abcd | 94.3 abc | 97.7 abcd | 96.3 abc | 99 abc |
| 1.75 s | 97.5 bcd | 99.4 bcde | 96.9 bcd | 98.7 bcde | 97.9 bcd | 99.4 bcd |
| 2 s | 98.8 cde | 99.5 bcde | 99.2 cde | 98.2 bcde | 98.9 cde | 99.6 cde |
| 2.25 s | 99.9 deF | 99.9 cdeF | 99.9 deF | 99.9 cdeF | 99.7 de | 99.9 de |
a, b, c, d, e: Statistically equivalent distribution to the indicated window length within the same column (same overlap rate), according to the Tukey post-hoc test following the Friedman test (p > 0.05): 1.25 (a), 1.5 (b), 1.75 (c), 2 (d), and 2.25 (e) seconds. F: Statistically equivalent distributions for the 25% and 12.5% overlap rate pair in the same row, according to the Wilcoxon test (p > 0.05).
Figure 10. Distribution of accuracies considering the data for each subject individually in the test and training steps. (a) Distributions concerning overlap rate and (b) distributions concerning classifiers. The bars represent the 25th and 75th percentiles; the whiskers represent approximately 99.3% of the coverage of the distribution, and the crosses (+) are the outliers.
Figure 11. Hit rates obtained for acquisition trial validation. (a) Accuracies obtained for all subjects considering the variation of signal overlap related to classifiers. (b) Distribution of accuracies for an overlap rate of 12.5% (the best value observed in (a) for each subject and classifier). The bars represent the 25th and 75th percentiles; the whiskers represent approximately 99.3% of the coverage of the distribution, and the crosses (+) are the outliers.
Figure 12. Confusion matrices for the random forest classifier over all classes and all subjects using validation by acquisition trials, highlighting the influence of overlap rates of 100% (a), 50% (b), 25% (c), and 12.5% (d). The color bar represents the accuracy obtained in each case, and the white cells represent values of less than 0.3% of misclassified samples.
Sign Language recognition (SLR) systems with sEMG as input and their main characteristics.
| Work | SLR | Signals/Subjects | sEMG Channels/Other Sensors | Window | Features | Classifier | Results |
|---|---|---|---|---|---|---|---|
| [ | ASL | 9/- | 2 | 512 samples/Y | IAV, ZC, DAMV, AR, CEPS, Mean frequency | DA | 97% |
| [ | GSL | 60/3 | 5/Accelerometer | -/- | Sample Entropy | DA | 93% |
| [ | CSL | 72/2 | 5/Accelerometer | 100 ms/N | MAV, AR | DT, HMM | 98% |
| [ | CSL | 223/5 | 4/Accelerometer | 64 points/Y | MAV, AR | DTW, HMM | 92% to 96% |
| [ | ASL | 40/4 | 4/Accelerometer | 128 ms/N | MAV, AR, HIST, RMS, | NB, KNN, DT, SVM | 96% |
| [ | CSL | 121/5 | 4/Accelerometer | 128 ms/Y | MAV, AR, Mean, VAREMG, Linear Prediction Coefficients | RF, HMM | 98% |
| [ | ASL | 80/4 | 4/ | 128 ms/N | MAV, AR, HIST, RMS, Reflection Coefficients, VAREMG, WAMP, MDF, Modify MNF | NB, KNN, DT, SVM | 96% |
| [ | Libras | 20/- | 8 (Myo armband) | -/- | Mean | SVM | 41% |
| [ | KSL | 30/6 | 8 (Myo armband)/ | 200 ms/N | Raw signal | CNN | 98% |
| [ | CSL | 18/8 | 5/ | 176 ms/Y | MAV, ZC, SSC, and WL | LDA | 91% |
| [ | Libras | 20/1 | 8 (Myo armband) | 750 ms/N | IEMG, MAV, RMS, LOGDEC, ZC, SSC, MNF, PKF, MNP, SM0, SM1 | MLP | 81% |
| This | Libras | 26/12 | 8 (Myo armband) | Window Length: | AR4, CEPS, DASDV, HIST, IEMG, LOGDEC, LS, MAV, MAV1, MAV2, MFL, MSR, MYOP, RMS, SampEn, SSC, TM3, TM4, TM5, VAREMG, VORDER, WAMP, WL, ZC, FR, MDF, MNF, PKF, SM1, SM2, SM3, TTP | KNN, | 99% |