Literature DB >> 33037253

Myoelectric digit action decoding with multi-output, multi-class classification: an offline analysis.

Agamemnon Krasoulis1, Kianoush Nazarpour2,3.   

Abstract

The ultimate goal of machine learning-based myoelectric control is simultaneous and independent control of multiple degrees of freedom (DOFs), including wrist and digit artificial joints. For prosthetic finger control, regression-based methods are typically used to reconstruct position/velocity trajectories from surface electromyogram (EMG) signals. Unfortunately, such methods have thus far met with limited success. In this work, we propose action decoding, a paradigm-shifting approach for independent, multi-digit movement intent prediction based on multi-output, multi-class classification. At each moment in time, our algorithm decodes movement intent for each available DOF into one of three classes: open, close, or stall (i.e., no movement). Despite using a classifier as the decoder, arbitrary hand postures are possible with our approach. We analyse a public dataset previously recorded and published by us, comprising measurements from 10 able-bodied and two transradial amputee participants. We demonstrate the feasibility of using our proposed action decoding paradigm to predict movement action for all five digits as well as rotation of the thumb. We perform a systematic offline analysis by investigating the effect of various algorithmic parameters on decoding performance, such as feature selection and choice of classification algorithm and multi-output strategy. The outcomes of the offline analysis presented in this study will be used to inform the real-time implementation of our algorithm. In the future, we will further evaluate its efficacy with real-time control experiments involving upper-limb amputees.

Year:  2020        PMID: 33037253      PMCID: PMC7547112          DOI: 10.1038/s41598-020-72574-7

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Upper-limb loss can negatively impact an affected individual’s ability to perform activities of daily living. To mitigate this effect, prosthetic devices have historically aimed at restoring the appearance and basic functionality of a missing limb using artificial components. The advancement of robotics research in recent decades has led to the advent of upper-limb prostheses with highly-sophisticated mechanical capabilities. Sensing technologies and control algorithms, however, have not kept pace; as a result, they currently impose a bottleneck on the control dexterity enjoyed by prosthesis users[1-3].

Prosthetic hands are typically controlled using muscle activity signals, called electromyograms (EMGs), recorded on the skin surface. Traditionally, clinical solutions have deployed simple, amplitude-based control algorithms which rely on monitoring the activity of a pair of antagonist remnant muscles. The amplitude of each recorded muscle signal is usually linked to the activation of a specific function, for example, wrist rotation or hand opening/closing. To access a different function, the user has to switch between the available modes by using a trigger signal, such as muscle co-contraction[2]. Although this algorithm has proven robust, it results in limited control and can also be non-intuitive and cumbersome for the end-user. Unfortunately, this leads to an increased prosthesis rejection rate[4].

To improve control dexterity, machine learning algorithms can be used to infer movement intent from EMG recordings. Typically, a classifier is used to map features extracted from multiple EMG channels onto a discrete output variable encoding grasp type and/or other prosthesis functions. This paradigm has been highly-successful and, in the last decade, has found its way towards commercial adoption[3]. One caveat of this approach is that it can only support a single function activation at a time.
That is, to achieve outcomes that require activating more than one prosthesis function, for example, wrist rotation and hand closing, the user has to trigger a sequence of commands. This results in sub-optimal and unnatural control. To address this issue, simultaneous (i.e., multi-output) classification has been proposed as a means of decoding multiple hand and wrist functions together, hence resulting in greater flexibility and dexterity[5-10]. Intuitive selection and activation of grasp and wrist functions can be greatly beneficial for the user in performing activities of daily living. Yet, this paradigm results in severe prosthesis under-actuation and can thus offer much less functionality and dexterity than a natural hand.

From a technical perspective, the ultimate goal of the myoelectric control field is to approximate this dexterity via simultaneous and independent control of multiple degrees of freedom (DOFs) in a continuous space. To that end, several groups, including us, have used regression-based methods to reconstruct wrist kinematic trajectories[11-15], finger positions[16-22] and velocities[23, 24], as well as fingertip forces[25-28]. Only a few studies have, however, thus far demonstrated the feasibility of real-time prosthetic finger control in amputee users[17, 21, 22]. This indicates that independent prosthetic digit control is indeed a challenging problem, which calls for new and more efficient methods for tackling it.

In this work, we propose a novel approach for simultaneous and independent control of prosthetic digits. Our algorithm is based on multi-output, multi-class classification. This is in contrast with previous work in this area, which has focused on the use of multi-output regression algorithms to achieve the same goal. At each time step, our algorithm uses surface EMG features to decode movement intent for each available DOF into one of three classes: open, close, or stall (i.e., no movement).
Our motivation is to replace the multi-output regressor with a multi-output classifier, with the aim of simplifying the decoding problem. This simplification is achieved by using discrete, as opposed to real-valued (i.e., continuous), targets. We term our approach action decoding, since it is based on predicting digit actions rather than positions and/or velocities. A schematic of the approach is shown in Fig. 1.
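Because each DOF receives a discrete action at every time step, a controller can integrate these actions into digit positions using a small per-step increment, which makes control approximately continuous. The sketch below illustrates this integration logic; the function name and step size are illustrative assumptions, not values from the study.

```python
import numpy as np

# Per-DOF action codes: -1 = open, 0 = stall, +1 = close.
# Digit positions are normalised to [0, 1] (0 = fully open, 1 = fully closed).
STEP = 0.02  # illustrative action step; smaller steps approximate continuous control

def apply_actions(positions, actions, step=STEP):
    """Advance the six digit positions by one discrete action step."""
    positions = np.asarray(positions, dtype=float)
    actions = np.asarray(actions, dtype=int)
    return np.clip(positions + step * actions, 0.0, 1.0)

pos = np.zeros(6)                              # all DOFs fully open
pos = apply_actions(pos, [1, 1, 0, 0, -1, 1])  # close three DOFs, keep the rest
```

Clipping to the unit interval prevents commands from driving a digit past its fully open or fully closed limit.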
Figure 1

Action decoding paradigm. Multi-channel raw EMG measurements are pre-processed and fed as inputs into a multi-output classifier. The classifier has six outputs corresponding to the following DOFs: thumb rotation and flexion/extension of the thumb, index, middle, ring, and little digits. For each DOF, the algorithm classifies movement intent into one of three actions: open, close, or stall (i.e., no movement). Predictions can then be used to control the digits of a prosthesis using discrete actions. Any type of multi-output, multi-class classifier can be used as the decoder.

We have previously evaluated the proposed action controller in a robotic hand tele-operation task with a data glove and found that it can achieve comparable performance to digit position (i.e., joint angle) control[29]. Here, we provide a first implementation of the method in the context of myoelectric decoding. We demonstrate the feasibility of using surface EMG measurements to decode digit actions. We also perform a systematic offline analysis investigating several aspects of the method, including feature selection and choice of classifier and multi-output strategy. With regard to the latter, we evaluate the efficacy of a state-of-the-art method for multi-output classification, namely, classifier chains (CC), which takes output dependencies into account when making predictions. The outcomes of our analysis are used to inform the real-time implementation of the algorithm, which we present in a separate study[30].
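The independent multi-output strategy described above can be sketched with scikit-learn by wrapping one base classifier per DOF. The feature matrix below is random stand-in data, not the study's EMG features; the dimensions are illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16 * 7))      # stand-in: 16 channels x 7 features
Y = rng.integers(0, 3, size=(300, 6))   # 6 DOFs; classes: 0=open, 1=stall, 2=close

# Independent strategy: one LDA model fitted separately for each DOF.
model = MultiOutputClassifier(LinearDiscriminantAnalysis()).fit(X, Y)
predictions = model.predict(X)          # shape: (n_samples, 6)
```

Any scikit-learn classifier can be substituted as the base estimator, which is what the benchmark analysis in the Results section varies.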

Results

We analysed data from 10 able-bodied and two transradial amputee subjects. We extracted features from 16 surface EMG channels and used them to decode digit movements (i.e., actions) for the following DOFs: thumb rotation and flexion/extension of the thumb, index, middle, ring, and little digits. As a first step, we performed an exhaustive feature analysis including a large number of commonly used time-domain EMG features. For each subject, features were ranked using a forward feature selection algorithm, and mean ranks were then computed across subjects. The results of this analysis are presented in Fig. 2a. The highest-performing feature was Wilson amplitude (WAmp), followed by log-variance (LogVar) and the Hjorth parameters (Hjorth). Fig. 2b shows average classification performance by means of F1-score for an increasing number of added features. We observed a plateau in performance after including six to eight features. Based on this observation, we selected the seven most highly-ranked features for the rest of our analysis: WAmp, LogVar, Hjorth, kurtosis (Kurt), auto-regressive (AR) coefficients, waveform length (WL), and skewness (Skew).
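A few of the top-ranked time-domain features can be computed per window and channel as below. The WAmp threshold value is an assumption for illustration; the study does not report one in this section.

```python
import numpy as np

def wamp(x, threshold=0.01):
    """Wilson amplitude: count of consecutive-sample differences above a threshold."""
    return int(np.sum(np.abs(np.diff(x)) > threshold))

def log_var(x):
    """Log-variance of the window (small constant avoids log(0))."""
    return float(np.log(np.var(x) + 1e-12))

def waveform_length(x):
    """Waveform length: cumulative absolute change over the window."""
    return float(np.sum(np.abs(np.diff(x))))

window = np.sin(np.linspace(0, 4 * np.pi, 256))   # stand-in for one EMG window
feats = [wamp(window), log_var(window), waveform_length(window)]
```

In practice each feature is computed for every channel in every window, and the per-channel values are concatenated into the input vector of the classifier.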
Figure 2

Feature analysis. (a) Average ranking for individual features using the sequential forward selection method. The procedure was run independently for each participant and average rankings were computed. Lower ranks indicate larger feature importance. (b) Performance as a function of the number of features used for decoding. Higher F1-scores indicate better performance. Points, means; error bars, standard errors estimated via bootstrapping (1000 iterations).

Next, we investigated the potential of using CC and ensembles of CC to improve classification performance by exploiting output dependencies. The results of this analysis are presented in Fig. 3a. For each participant, we report the performance of the best- and worst-performing CC, as well as average performance of ensemble CC consisting of 10 chains with random label orders. We compare the performance of CCs and ensemble CC to that of multiple independent classifiers. In all cases, we used linear discriminant analysis (LDA) models as the base classifiers for the CC. We use F1-score (see Methods section) throughout as our main performance measure, unless noted otherwise.
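The classifier-chain idea can be sketched by augmenting each successive DOF's inputs with the labels of the DOFs earlier in the chain. The minimal implementation below uses LDA base classifiers as in the study; the class name and the synthetic data are illustrative, and an ensemble would repeat this with several random orders and combine the predictions (e.g., by majority vote).

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

class ChainLDA:
    """Classifier chain: each per-DOF LDA also sees earlier DOFs' labels."""

    def __init__(self, order):
        self.order = list(order)

    def fit(self, X, Y):
        self.models_ = []
        aug = X
        for k in self.order:
            self.models_.append(LinearDiscriminantAnalysis().fit(aug, Y[:, k]))
            aug = np.column_stack([aug, Y[:, k]])      # true labels at train time
        return self

    def predict(self, X):
        preds = np.zeros((X.shape[0], len(self.order)), dtype=int)
        aug = X
        for model, k in zip(self.models_, self.order):
            preds[:, k] = model.predict(aug)
            aug = np.column_stack([aug, preds[:, k]])  # predicted labels at test time
        return preds

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))
Y = rng.integers(0, 3, size=(300, 6))
chain = ChainLDA(order=rng.permutation(6)).fit(X, Y)
```

The label order matters because a DOF late in the chain conditions on all earlier predictions, which is why best- and worst-performing orders are reported separately.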
Figure 3

Performance comparisons. (a) Comparison of multi-output classification strategies using an LDA base classifier. CC best and CC worst correspond to the best- and worst-performing single CC models, respectively. For each participant, 100 random chains were generated and evaluated. Ensemble CC corresponds to a model with 10 random chains. Straight lines, medians; solid boxes, interquartile ranges; whiskers, overall ranges of non-outlier data; dots, individual data points; asterisk, statistically significant difference; n.s., non-significant difference. (b) Comparison of classification algorithms using F1-score and the independent multi-output strategy. Classifiers are presented in order of decreasing median performance and statistical comparisons are performed only against the highest-performing algorithm. CC, classifier chain; RDA, regularised discriminant analysis; LDA, linear discriminant analysis; QDA, quadratic discriminant analysis; RF, random forest; GNB, Gaussian naive Bayes; KNN, k-nearest neighbours; LR, logistic regression; ET, extra trees; BL, baseline.

We did not observe a difference in F1-score between the best-performing CC (CC-best) and independent output classifiers. Moreover, performance was not statistically different when we used ensembles of 10 CC. F1-scores for the worst-performing CC (CC-worst) were statistically lower than those of the independent classifiers, CC-best, and ensemble CC. This finding indicates that a poor label ordering can in fact decrease performance, as compared to independent classifiers. A performance summary with all evaluation metrics is presented in Table 1 and detailed results are provided in Supplementary Figure S1.
Table 1

Multi-output strategy benchmark.

Multi-output strategy | F1-score (macro-average) | Exact match ratio | Hamming score | Recall (macro-average) | Precision (macro-average)
Independent           | 0.63 (0.54, 0.68)        | 0.52 (0.43, 0.59) | 0.80 (0.71, 0.83) | 0.61 (0.54, 0.68)  | 0.67 (0.53, 0.72)
CC (best)             | 0.63 (0.53, 0.66)        | 0.53 (0.44, 0.60) | 0.80 (0.70, 0.83) | 0.61 (0.54, 0.67)  | 0.66 (0.51, 0.71)
CC (worst)            | 0.61 (0.53, 0.66)        | 0.53 (0.44, 0.60) | 0.80 (0.70, 0.83) | 0.60 (0.54, 0.67)  | 0.65 (0.52, 0.71)
Ensemble CC           | 0.63 (0.55, 0.67)        | 0.53 (0.44, 0.60) | 0.80 (0.71, 0.83) | 0.61 (0.55, 0.68)  | 0.66 (0.53, 0.72)

Median scores and overall ranges are reported for each strategy using the LDA classifier.

Bold values indicate highest average performance for each evaluation metric.

Figure 3b summarises the results of the classifier benchmark analysis using independent output classifiers. The highest median performance was achieved by regularised discriminant analysis (RDA), closely followed by LDA. We performed statistical comparisons between the highest-performing algorithm (i.e., RDA) and all other classifiers and found that RDA significantly outperformed all classifiers except LDA. All classifiers performed better than chance. A performance summary with all evaluation metrics is presented in Table 2.
Table 2

Classifier benchmark.

Classifier | F1-score (macro-average) | Exact match ratio | Hamming score | Recall (macro-average) | Precision (macro-average)
RDA        | 0.64 (0.56, 0.69)        | 0.58 (0.52, 0.62) | 0.82 (0.79, 0.84) | 0.63 (0.54, 0.70) | 0.67 (0.59, 0.70)
LDA        | 0.64 (0.55, 0.70)        | 0.59 (0.53, 0.62) | 0.83 (0.79, 0.85) | 0.62 (0.54, 0.69) | 0.70 (0.58, 0.73)
QDA        | 0.63 (0.39, 0.67)        | 0.53 (0.03, 0.60) | 0.78 (0.40, 0.81) | 0.66 (0.58, 0.73) | 0.61 (0.51, 0.65)
RF         | 0.61 (0.50, 0.68)        | 0.62 (0.57, 0.64) | 0.84 (0.81, 0.85) | 0.57 (0.47, 0.66) | 0.77 (0.65, 0.81)
GNB        | 0.57 (0.49, 0.63)        | 0.54 (0.10, 0.57) | 0.76 (0.61, 0.78) | 0.59 (0.54, 0.69) | 0.58 (0.47, 0.61)
KNN        | 0.56 (0.46, 0.63)        | 0.58 (0.51, 0.62) | 0.82 (0.77, 0.83) | 0.52 (0.44, 0.61) | 0.65 (0.50, 0.70)
LR         | 0.56 (0.45, 0.66)        | 0.58 (0.55, 0.60) | 0.82 (0.80, 0.84) | 0.53 (0.43, 0.65) | 0.69 (0.62, 0.72)
ET         | 0.51 (0.43, 0.62)        | 0.59 (0.56, 0.62) | 0.83 (0.80, 0.84) | 0.47 (0.41, 0.60) | 0.76 (0.68, 0.83)
BL         | 0.33 (0.33, 0.33)        | 0.10 (0.08, 0.15) | 0.61 (0.58, 0.66) | 0.33 (0.33, 0.33) | 0.33 (0.33, 0.33)

Median scores and overall ranges are reported for each classifier using the independent multi-output strategy.

Bold values indicate highest average performance for each evaluation metric.
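A benchmark of this kind can be reproduced in outline with scikit-learn, pairing each candidate classifier with the independent multi-output strategy and scoring with macro-averaged F1. The data below are random stand-ins, so the resulting scores carry no meaning; only the loop structure mirrors the analysis.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 16))
Y = rng.integers(0, 3, size=(400, 6))
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "KNN": KNeighborsClassifier(),
}

scores = {}
for name, clf in classifiers.items():
    Y_pred = MultiOutputClassifier(clf).fit(X_tr, Y_tr).predict(X_te)
    # Macro-average F1 within each DOF, then average across the six DOFs.
    scores[name] = float(np.mean([f1_score(Y_te[:, j], Y_pred[:, j], average="macro")
                                  for j in range(Y.shape[1])]))
```

In a faithful reproduction the random data would be replaced by per-participant EMG features and action labels, and the evaluation would follow the study's train/test splits.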

Figure 4 shows average confusion matrices obtained with the independent multi-output RDA classifier for individual DOFs. The best performance was achieved for the ring digit, followed by the middle, index, and thumb digits. The lowest performance was achieved for the thumb rotation and little digit flexion/extension DOFs.
Figure 4

Confusion matrices. For each DOF, the average confusion matrices obtained with the independent multi-output RDA classifier are shown. Colour bar and annotated scores indicate normalised prediction rates.

Additional results from the benchmark analysis are provided in the supplementary material: Figure S2 shows algorithmic comparisons for both independent classifiers and CC for all evaluation metrics; Figure S3 shows performance of the two highest-performing algorithms (i.e., LDA and RDA) for individual subjects; and Figure S4 summarises performance of all classifiers for individual DOFs using independent classifiers for each output.

Discussion

In this work, we introduced a novel paradigm for upper-limb prosthetic digit control using surface EMG signals. In the proposed approach, EMG features are used to decode discrete digit actions via multi-output classification. At each time step, the algorithm classifies movement intent for each DOF into one of three categories: open, close, or stall (i.e., no movement). We have previously evaluated this type of controller in a robotic hand tele-operation task with a data glove and have found that it can achieve comparable performance to digit position control[29]. The aim of this study was twofold: (1) to demonstrate the feasibility of using surface EMG signals from the forearm to decode digit actions; and (2) to carry out a systematic offline investigation prior to implementing the algorithm in real-time and testing it with upper-limb amputees.

We have shown that it is feasible, in principle, to decode digit actions from surface EMG signals. The median F1-score of the best-performing configuration (i.e., independent multi-output RDA classifiers) was 0.64 (Table 2). The median Hamming score, exact match ratio, and macro-average precision and recall scores were all significantly and substantially higher than chance (supplementary material). We observed the highest performance for the ring and middle digits (Fig. 4). This is in agreement with previous work on regression-based reconstruction of digit position/velocity trajectories[18, 21, 23]. This finding is expected from a physiological perspective, given that these two digits are controlled by extrinsic superficial muscles, as opposed to the thumb, for example, which is controlled by intrinsic and deep extrinsic muscles, neither of which is easily accessible from the surface of the forearm. Our offline investigation scrutinised several important aspects of the method, including feature analysis, choice of decoding algorithm, and evaluation of two multi-output strategies.
It has been previously demonstrated that both feature selection and choice of classifier can substantially influence the performance of myoelectric classification systems[31, 32]. In line with previous reports, which were mainly concerned with upper-limb motion/grip classification[9, 31-33], we found that discriminant analysis-based classifiers, such as LDA and RDA, can achieve the highest level of performance. The results of our feature selection/ranking analysis (Fig. 2a) are also largely in agreement with previous reports from the motion classification literature[32, 34].

In multi-output classification settings, it is often desirable to exploit output dependencies to improve decoding performance. In this regard, we investigated the potential of using the state-of-the-art method of CC to improve classification. The exact match ratio was marginally improved with both CC and ensemble CC (see supplementary material). Nevertheless, we did not observe an increase in F1-score using either method. The improvement in exact match ratio is expected, since it has been theoretically shown that CC maximise this metric exactly by equivalently minimising 0/1 loss[35]. On the other hand, when labels are evaluated independently, as with macro-average F1-score, there is no guarantee that CC will outperform independent classifiers, although this may often happen in practice[36]. We attribute the ineffectiveness of CC in our case to the fact that the dataset comprised both single-digit exercises as well as full-hand grips (five exercises of each type). Including a large number of single-digit motions results in outputs becoming largely independent; thus, there is less structure in the output domain that CC can exploit to improve performance. In myoelectric control, multi-output classification has been previously used only to decode simultaneous wrist and hand motions[5-10].
For control of prosthetic digits, however, previous efforts have focused on using multi-output regression to reconstruct position[16-19, 21], velocity[23, 24] or fingertip force trajectories[26-28]. It is worth noting that despite using a discrete decoder in our approach, intermediate positions are possible. We have previously shown in a hand tele-operation task with a data glove[29] that by using a small action step (i.e., digit position increase/decrease step), control becomes approximately continuous. In other words, despite using a classifier as the decoder, our approach allows for arbitrary hand configurations. From a control theory perspective, our action-based paradigm can be viewed as an extreme, discretised case of velocity decoding; velocity can either be zero, or take a constant value, which is only parametrised by its sign/direction. One study has previously adopted a similar approach to ours[37], but with four main differences: firstly, the labels corresponded to isometric muscle contractions while the hand was kept fixed, as opposed to digit actions during unconstrained finger movement in our case; secondly, labels were binary (i.e., stimuli corresponded to fully open or closed digits), whereas with our approach actions can take three values (i.e., open, close, or stall); thirdly, our approach requires fewer computational resources for both training and inference, due to using a set of linear classifiers as opposed to a convolutional neural network; and finally, our recording setup was simpler. That is, we used sparse EMG electrodes as opposed to a two-dimensional electrode grid, and we did not constrain the participants’ hand or forearm. In comparison with regression-based methods, our approach only requires discrete ground truth labels for training the decoders.
This offers a theoretical advantage over regression, since it can potentially remove the need for using data gloves or motion tracking systems, which are typically required for acquiring real-valued (i.e., continuous) ground truth labels[16-19, 21, 23, 24]. In this study, we analysed a previously collected dataset comprising data glove measurements[21], and we therefore obtained the discrete labels by thresholding the respective velocity profiles. Alternatively, discrete ground truth labels can be acquired by prompting the user to perform imaginary finger movements with a specified direction[30], in a fashion similar to how machine learning-based commercial systems are typically calibrated. This feature renders our approach more suitable than regression in a clinical setting, and also makes it suitable for people with bilateral limb deficiency or amputation.

Our study has three limitations. Firstly, it was limited to offline analyses. It is well-accepted in the myoelectric control community that offline performance measures are not always a good proxy of real-time control performance[10, 14, 21, 38, 39]. Therefore, it is imperative to evaluate control algorithms with real-time implementations and user-in-the-loop experiments. Given that we have introduced a completely novel paradigm for prosthetic digit control, the main purpose of this work was to systematically explore different parameters of the method and lay the groundwork for the subsequent real-time implementation. We report our implementation and evaluation of the proposed algorithm with amputee participants in a separate study[30]. The second limitation of the study was that we did not consider neural networks in our classifier benchmarks. Neural networks can naturally handle multi-output, multi-class classification problems via appropriate design of the output layers.
For our application, it is likely that parameter sharing in the early layers of a network may benefit overall performance by optimising a combination of output loss functions, one for each DOF (i.e., multi-task learning). This is currently seen as a future research direction. The third limitation of the study is that we did not investigate aspects of decoding robustness under non-stationary conditions and/or generalisation to novel finger configurations. It is well-known that machine learning-based myoelectric control algorithms typically suffer from poor generalisation under different limb positions and/or muscle contraction levels[40, 41]. Moreover, generalisation to novel postures is a much desired feature, as it allows extrapolation to movements not present in the training set. In principle, this feature is supported by our framework, given that the motion of each digit is controlled independently. It will be invaluable in the future to systematically investigate all the above aspects of decoding generalisation, ideally outside a lab-controlled environment.

In conclusion, we have proposed a new paradigm for prosthetic digit control based on multi-output, multi-class classification. We have demonstrated the feasibility of decoding actions for all five digits and rotation of the thumb using surface EMG measurements recorded on the forearm in both able-bodied and transradial amputee participants. Our algorithm warrants further investigation with real-time, user-in-the-loop experiments with upper-limb amputees.

Methods

Dataset

The dataset used in the study was previously collected and made publicly available by us[21]. For completeness, we briefly describe here the experimental protocol and refer the reader to the original study for more details. Ten able-bodied and two right-arm transradial (i.e., below-elbow) amputee participants were included in the study. All able-bodied participants were right-hand dominant. All experiments were performed in accordance with the Declaration of Helsinki and were approved by the local Ethics Committees of the School of Informatics, University of Edinburgh (#201507160854) and School of Engineering, Newcastle University (#14-NAZ-056). Prior to the experiments, subjects read a participant information sheet and gave written informed consent. We placed 16 Delsys Trigno surface EMG sensors (Delsys, Inc.) on the participants’ skin below the right elbow, arranged in two rows of eight equidistant electrodes and without targeting specific muscles. Prior to electrode placement, we cleansed participants’ skin using 70% isopropyl alcohol. We used adhesive elastic bandage to secure the locations of the electrodes throughout the sessions. The sampling rate of EMG data acquisition was fixed at 1111 Hz. In addition, we recorded hand kinematic data using an 18-DOF Cyberglove II data glove (CyberGlove Systems, LLC), which we placed on the participants’ left hand. Data glove measurements were calibrated for each participant using a quick calibration routine provided by the manufacturer.
The sampling rate of glove data acquisition was fixed at 25 Hz. Participants sat comfortably on an office chair and rested both arms on a table. They were asked to perform a series of bilateral mirrored hand exercises, with instructions shown on a computer display placed approximately 1 m in front of them. Movements were selected such that both single-finger and full-hand exercises were included. The following nine unique movements were selected: thumb abduction/adduction; thumb, index, middle, and combined ring and little finger flexion/extension; cylindrical, lateral, and tripod grips; and index pointer. Three blocks of exercises were recorded for each participant: datasets A and B comprised 10 repetitions of each exercise and dataset C only two. Consecutive trials were interleaved with 3 s of rest. The experimental protocol and apparatus used are shown in Fig. 5.
Figure 5

Data collection. (a) Sixteen wireless EMG sensors were placed on the surface of the skin and were arranged in two rows of eight equidistant sensors below the elbow. (b–c) Hand kinematic data were collected using an instrumented data glove placed on the contralateral side. Participants were instructed to perform bilateral mirrored movements. Parts of the figure were previously published by us under a Creative Commons Attribution License (CC BY 4.0)[21].

Pre-processing

Myoelectric and glove data were upsampled to 2 kHz and synchronised using linear interpolation. We processed myoelectric data using an overlapping window approach. We set the window length to 128 ms and the increment to 50 ms (i.e., approximately 60% overlap). We filtered the EMG signals using a 4th-order band-pass Butterworth digital filter with lower and upper cutoff frequencies of 10 and 500 Hz, respectively. We used a linear mapping[21] to transform the calibrated glove measurements into joint angles for the following six DOFs: thumb rotation and flexion/extension of the thumb, index, middle, ring, and little digits. Joint angles were then normalised in the range 0–1, with 0 corresponding to a DOF being fully open (or the thumb rotator fully reposed) and 1 to fully closed (thumb rotator fully opposed). Finally, we smoothed the calibrated glove measurements with a low-pass Butterworth filter with a cutoff frequency of 1 Hz.
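The filtering and windowing steps can be sketched with SciPy as follows. The filter type, cutoffs, window length, and increment follow the text; the use of zero-phase filtering (`filtfilt`) is an assumption, since the study specifies only the filter design.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 2000                        # Hz, sampling rate after upsampling
WIN = int(0.128 * FS)            # 128 ms window -> 256 samples
INC = int(0.050 * FS)            # 50 ms increment -> 100 samples

# 4th-order band-pass Butterworth filter, 10-500 Hz
b, a = butter(4, [10, 500], btype="bandpass", fs=FS)

def window_emg(emg):
    """Band-pass filter multi-channel EMG and cut it into overlapping windows."""
    filtered = filtfilt(b, a, emg, axis=0)
    starts = range(0, filtered.shape[0] - WIN + 1, INC)
    return np.stack([filtered[s:s + WIN] for s in starts])

emg = np.random.default_rng(0).normal(size=(2 * FS, 16))  # 2 s of 16-channel noise
windows = window_emg(emg)        # shape: (n_windows, 256, 16)
```

Each returned window would then be reduced to the feature vector fed into the classifier.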

Digit action estimation from joint angle trajectories

To extract digit action labels from calibrated glove data we used the following procedure. Firstly, we estimated joint velocities by computing the first-order difference of normalised joint angle trajectories. We then thresholded the computed differences using a tolerance ε, such that joint velocities larger than ε were assigned the “close” label and velocities smaller than −ε were assigned the “open” label. Velocity values in the range [−ε, ε] were assigned the “stall” label, corresponding to no movement. Finally, for joint angles less than 7.5% away from either boundary (0 or 1), we assumed that the respective actions were “open” and “close”, respectively, regardless of the joint velocity. The digit action estimation procedure was performed independently for each DOF. An illustration is provided in Fig. 6 using data from one participant as an example.
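The labelling procedure can be sketched per DOF as below. The 7.5% boundary rule follows the text, whereas the tolerance value is an illustrative assumption (the study does not state it here).

```python
import numpy as np

def actions_from_angles(theta, tol=0.002, boundary=0.075):
    """Map a normalised joint-angle trajectory (0 = open, 1 = closed) to
    per-sample actions: -1 = open, 0 = stall, +1 = close."""
    theta = np.asarray(theta, dtype=float)
    vel = np.diff(theta, prepend=theta[0])   # first-order difference
    labels = np.zeros(theta.shape, dtype=int)
    labels[vel > tol] = 1                    # closing
    labels[vel < -tol] = -1                  # opening
    labels[theta < boundary] = -1            # near fully open: assume "open"
    labels[theta > 1 - boundary] = 1         # near fully closed: assume "close"
    return labels
```

The boundary overrides are applied last, so they take precedence over the velocity thresholds, matching the "regardless of the joint velocity" rule above.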
Figure 6

Finger action estimation from glove data. The mapping from position to action space is demonstrated using data from one participant. The position trajectory for each DOF (grey traces, left y-axes) is normalised between 0 (i.e., fully open) and 1 (i.e., fully closed). The first-order discrete position difference is computed and transformed into action via thresholding (black traces, right y-axes). The shown excerpt corresponds to two repetitions of the cylindrical grasp exercise.

The digit movement (i.e., target) variables were highly imbalanced: approximately 75% of the samples corresponded to the “stall” class, whereas the remaining 25% were split equally between the “open” and “close” classes.
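The label-extraction rule described above can be written compactly. This is a sketch in NumPy; the tolerance value `eps` is a placeholder of our choosing, since the numerical value of ε is not stated here:

```python
import numpy as np

def digit_actions(theta, eps=1e-3, boundary=0.075):
    """Map a normalised joint-angle trajectory (one DOF, values in [0, 1])
    to per-sample action labels: +1 = "close", -1 = "open", 0 = "stall"."""
    vel = np.diff(theta, prepend=theta[0])   # first-order difference
    labels = np.zeros(len(theta), dtype=int)
    labels[vel > eps] = 1                    # closing faster than tolerance
    labels[vel < -eps] = -1                  # opening faster than tolerance
    # Within 7.5% of either boundary the action is fixed regardless of velocity
    labels[theta < boundary] = -1            # near fully open   -> "open"
    labels[theta > 1.0 - boundary] = 1       # near fully closed -> "close"
    return labels
```

Running this independently over the six DOF trajectories yields the multi-output target matrix used for classification.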

Performance evaluation and metrics

We considered a range of performance metrics to characterise classification performance. In multi-label classification, a special case of multi-output classification in which the outputs are binary, the following metrics are commonly used[42-44]: (1) the exact match ratio, or accuracy, is the percentage of samples that have all of their labels correctly classified; (2) the Hamming score is the fraction of correctly classified labels out of the total number of labels; (3) precision, recall, and F1-score (i.e., the harmonic mean of precision and recall) can be used as in multi-class classification, either on a per-label basis or by averaging across labels with an appropriate method. Macro- and micro-averaging are common choices: macro-averaging computes the metric of interest for each label independently and then averages across labels, whereas micro-averaging aggregates the contributions of all labels to compute an average metric. The exact match ratio is a strict measure, since it requires all labels to be correctly classified for a sample to count as correct. Given that the number of labels in our case was relatively large (i.e., six), we did not adopt this measure as our main evaluation metric. On the other hand, the training/testing datasets for individual participants were highly imbalanced, because the “stall” class dominated the “open” and “close” classes; therefore, the Hamming score was not an appropriate evaluation metric either. Taking the above into consideration, we selected the F1-score as our main performance measure and used macro-averaging to account both for the multiple labels and for the multiple classes within each label. We additionally report the following metrics: exact match ratio (i.e., accuracy), Hamming score, recall (macro-averaged), and precision (macro-averaged).
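The three families of metrics above can be computed directly from the predicted and true label matrices. A minimal sketch (function name ours), using scikit-learn only for the per-output macro F1-score:

```python
import numpy as np
from sklearn.metrics import f1_score

def multi_output_scores(y_true, y_pred):
    """y_true, y_pred: arrays of shape (n_samples, n_labels) holding class ids.
    Returns (exact match ratio, Hamming score, macro-averaged F1-score)."""
    # Exact match: every one of the sample's labels must be correct
    exact_match = np.mean(np.all(y_true == y_pred, axis=1))
    # Hamming score: fraction of individual labels that are correct
    hamming = np.mean(y_true == y_pred)
    # Macro F1 per output (averaging over classes), then averaged over outputs
    f1 = np.mean([f1_score(y_true[:, j], y_pred[:, j], average="macro")
                  for j in range(y_true.shape[1])])
    return exact_match, hamming, f1
```

The exact match ratio is always the strictest of the three, since a single mislabelled digit makes the whole sample count as wrong.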

EMG feature extraction and selection

We experimented with a large group of time-domain EMG features. In the feature analysis investigation, we included the following time-domain features: mean absolute value (MAV)[45], waveform length (WL)[32], rate of zero-crossings (ZC)[46], slope sign changes (SSC)[32], Wilson amplitude (WAmp)[46], root mean square (RMS)[32], integrated EMG (IEMG)[46], variance (VAR)[46], log-variance (LogVar)[12], kurtosis (Kurt)[47], skewness (Skew)[48], auto-regressive (AR) coefficients[46], histogram (Hist) counts[46], and Hjorth parameters (i.e., activity, mobility, and complexity)[49]. For the feature analysis investigation, we used a modified version of the sequential forward feature selection algorithm[50]. The algorithm was initialised with an empty feature set. Within each iteration, each feature not yet in the pool was tentatively added, one at a time, and the respective F1-scores were estimated. The feature that yielded the highest performance was added to the pool, and the procedure was repeated until all features had been included. In this way, each EMG feature was assigned a rank equal to the order in which it was added to the pool. The forward selection procedure was performed independently for each participant, and the respective feature ranks were averaged. For each participant, models were fitted using independent multi-output LDA classifiers on dataset A and were evaluated on dataset B. Based upon the results of the feature selection analysis (Fig. 2), we used the following features in the rest of the study: WAmp, LogVar, Hjorth, Kurt, AR, WL, and Skew.
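The greedy forward-selection loop can be sketched as follows. The `score_fn` callback stands in for the full procedure of fitting independent multi-output LDA classifiers on dataset A with the candidate feature subset and computing the macro-averaged F1-score on dataset B:

```python
def forward_select(features, score_fn):
    """Sequential forward feature selection.
    features: list of feature names.
    score_fn: callable mapping a feature subset to a validation score.
    Returns the features ranked by the order in which they entered the pool."""
    pool, remaining = [], list(features)
    while remaining:
        # Tentatively add each remaining feature; keep the best-scoring one
        best = max(remaining, key=lambda f: score_fn(pool + [f]))
        pool.append(best)
        remaining.remove(best)
    return pool  # pool[k] has rank k + 1
```

Averaging the per-participant ranks then gives the population-level feature ordering reported in Fig. 2.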

Classifier chain analysis

Classifier chains (CC) is a popular machine learning method for multi-label classification (i.e., a special case of multi-output classification whereby the outputs are binary) that takes label dependencies into account. We briefly describe the method here and refer the interested reader to the original paper for more details[36]. Given a set of L labels, a CC model learns L classifiers, which are linked in a chain. First, the label chain (i.e., the order of the labels) needs to be defined, say (y_1, y_2, …, y_L). The first classifier in the chain, f_1, is then fitted using the input features only. The ground-truth data for y_1 are then included as an additional input feature for training the second classifier in the chain, f_2. This process is repeated for all remaining labels in the chain, by including for the j-th label the ground-truth data for all previous labels y_1, …, y_(j−1). For inference, the same procedure is followed, except that at each step the predictions for the previous labels in the chain are used instead of ground truth. A popular variant of the method is the ensemble of CCs, whereby several CC models are trained with random label orders and their predictions are aggregated using a voting scheme. In our application, the number of labels (i.e., outputs) was L = 6, that is, the number of DOFs: thumb opposition/reposition and flexion/extension of the thumb, index, middle, ring, and little digits. For each label, the set of classes was {open, close, stall}, and thus the number of classes was three. In our CC analysis, we tested all possible label orders, that is, a total of 6! = 720 chains. For each participant, we report performance for the best- and worst-performing chains in terms of F1-score. In addition, we report the best performance from a set of 100 ensemble CC models, each trained with a random set of 10 label orders. We implemented the ensemble CC using a soft voting scheme, which predicts class labels based on the predicted probabilities of each chain in the ensemble.
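A classifier chain for the multi-class case can be sketched as follows. This is a minimal sketch, not the authors' released code: the class name `MultiClassChain` and the LDA base classifier are our illustrative choices, and the hard-voting ensemble variant and probability-based soft voting are omitted for brevity:

```python
import numpy as np
from sklearn.base import clone
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

class MultiClassChain:
    """Classifier chain for multi-output, multi-class problems.
    order: sequence of output indices defining the chain (a permutation)."""
    def __init__(self, base=None, order=None):
        self.base = base if base is not None else LinearDiscriminantAnalysis()
        self.order = order

    def fit(self, X, Y):
        self.order_ = list(self.order if self.order is not None
                           else range(Y.shape[1]))
        self.models_ = []
        Xa = X
        for j in self.order_:
            m = clone(self.base).fit(Xa, Y[:, j])
            self.models_.append(m)
            # Ground truth of this label becomes an input for later links
            Xa = np.column_stack([Xa, Y[:, j]])
        return self

    def predict(self, X):
        preds, Xa = {}, X
        for m, j in zip(self.models_, self.order_):
            preds[j] = m.predict(Xa)
            # At inference time, predictions replace the ground truth
            Xa = np.column_stack([Xa, preds[j]])
        return np.column_stack([preds[j] for j in range(len(self.order_))])
```

Sweeping `order` over all 720 permutations of the six outputs reproduces the exhaustive chain analysis described above.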
We compared the performance of CC and ensemble CC to that of independent multi-output classification, whereby an independent classifier is trained for each output. This method is often referred to as binary relevance[42, 43] when dealing with binary multi-output classification problems (i.e., multi-label). Note, however, that this term is not appropriate for our problem, which is multi-output and multi-class. Therefore, we refer to this strategy as independent multi-output classification.

Classifier training and hyper-parameter optimisation

We considered a wide range of classifiers in our classification benchmark analysis. With a few exceptions (e.g., LDA, quadratic discriminant analysis (QDA), and Gaussian naive Bayes (GNB)), most algorithms have hyper-parameters, which we systematically tuned with hold-out validation. For each participant, we fitted models on dataset A and tuned hyper-parameters using randomised search with 50 iterations on dataset B. We report final performance on dataset C. Datasets A and B served as training and validation sets, respectively, and dataset C as the test set. We performed 10 independent runs for each participant/classifier experiment and report performance averaged across runs. Baseline performance was assessed using a dummy classifier that always predicted the “stall” class for each label, that is, the dominant class in the training dataset. The list of algorithms used in the benchmark, along with the hyper-parameters and respective search ranges for each classifier, is provided in Supplementary Table S1. Using 16 EMG electrodes and the optimal feature set identified in the feature selection analysis (i.e., WAmp, LogVar, Hjorth, Kurt, AR coefficients, WL, and Skew), the input dimensionality was the number of channels (16) multiplied by the number of features extracted per channel. For the k-nearest neighbours (KNN) and logistic regression (LR) classifiers, we reduced the input dimensionality with principal component analysis to speed up training. We performed model training and testing in Python 3.7 (https://www.python.org/) using the scikit-learn library v. 0.22 (https://scikit-learn.org/stable/)[51] and custom-written code.
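Randomised search with a fixed train/validation split (A trains, B validates) can be expressed in scikit-learn with `PredefinedSplit`. A sketch under our own assumptions: the logistic-regression base model and its `C` range are placeholders, since the actual search spaces are given in Supplementary Table S1:

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import PredefinedSplit, RandomizedSearchCV

def tune_on_holdout(X_a, y_a, X_b, y_b, n_iter=50, seed=0):
    """Randomised hyper-parameter search where dataset A is always the
    training split and dataset B the validation split."""
    X = np.vstack([X_a, X_b])
    y = np.concatenate([y_a, y_b])
    # test_fold = -1 keeps a sample in training; 0 assigns it to validation
    fold = np.r_[-np.ones(len(X_a)), np.zeros(len(X_b))]
    search = RandomizedSearchCV(
        LogisticRegression(max_iter=1000),
        param_distributions={"C": loguniform(1e-3, 1e3)},
        n_iter=n_iter, cv=PredefinedSplit(fold), scoring="f1_macro",
        refit=False, random_state=seed)
    return search.fit(X, y)
```

With `refit=False`, the winning parameters (`search.best_params_`) would then be used to refit the classifier on dataset A alone before evaluation on dataset C.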

Statistical analysis

We used two-sided Wilcoxon signed-rank tests to compare performance between pairs of classifiers. All comparisons were performed at the population level using participant-average scores; thus, the number of samples was equal to the number of participants (n = 12). To account for multiple comparisons, we used the Holm-Bonferroni correction method. Statistical analysis was performed in Python 3.7 using the Pingouin library v. 0.3.1 (https://pingouin-stats.org/)[52].
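Each pairwise comparison amounts to `scipy.stats.wilcoxon` applied to the two vectors of participant-average scores, followed by the Holm step-down correction over all comparisons. A minimal sketch of the correction (function name ours; Pingouin provides an equivalent via its `padjust` options):

```python
import numpy as np
from scipy.stats import wilcoxon  # e.g. wilcoxon(scores_1, scores_2)

def holm_bonferroni(pvals, alpha=0.05):
    """Holm step-down correction: test p-values in ascending order against
    alpha / (m - k); returns a boolean rejection decision per hypothesis."""
    pvals = np.asarray(pvals)
    order = np.argsort(pvals)
    reject = np.zeros(len(pvals), dtype=bool)
    for k, i in enumerate(order):
        if pvals[i] <= alpha / (len(pvals) - k):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject
```

Holm's procedure is uniformly more powerful than plain Bonferroni while still controlling the family-wise error rate, which is why it is a common default for this kind of classifier benchmark.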
References (32 in total; first 10 shown)

1. Biddiss EA, Chau TT. Upper limb prosthesis use and abandonment: a survey of the last 25 years. Prosthet Orthot Int. 2007.
2. Muceli S, Jiang N, Farina D. Extracting signals robust to electrode number and shift for online simultaneous and proportional myoelectric control by factorization algorithms. IEEE Trans Neural Syst Rehabil Eng. 2013.
3. Nazarpour K, Sharafat AR, Firoozabadi SMP. Application of higher order statistics to surface electromyogram signal classification. IEEE Trans Biomed Eng. 2007.
4. Khushaba RN, Al-Ani A, Al-Jumaily A. Orthogonal fuzzy neighborhood discriminant analysis for multifunction myoelectric hand control. IEEE Trans Biomed Eng. 2010.
5. Ortiz-Catalan M, Håkansson B, Brånemark R. Real-time and simultaneous control of artificial limbs based on pattern recognition algorithms. IEEE Trans Neural Syst Rehabil Eng. 2014.
6. Farina D, Jiang N, Rehbaum H, Holobar A, Graimann B, Dietl H, Aszmann OC. The extraction of neural information from the surface EMG for the control of upper-limb prostheses: emerging avenues and challenges. IEEE Trans Neural Syst Rehabil Eng. 2014.
7. Castellini C, van der Smagt P. Surface EMG in advanced hand prosthetics. Biol Cybern. 2008.
8. Gailey A, Artemiadis P, Santello M. Proof of concept of an online EMG-based decoding of hand postures and individual digit forces for prosthetic hand control. Front Neurol. 2017.
9. Olsson AE, Sager P, Andersson E, Björkman A, Malešević N, Antfolk C. Extraction of multi-labelled movement information from the raw HD-sEMG image with time-domain depth. Sci Rep. 2019.
10. Wurth SM, Hargrove LJ. A real-time comparison between direct control, sequential pattern recognition control and simultaneous pattern recognition control using a Fitts' law style assessment procedure. J Neuroeng Rehabil. 2014.
