Johannes Burdack, Fabian Horst, Sven Giesselbach, Ibrahim Hassan, Sabrina Daffner, Wolfgang I Schöllhorn.
Abstract
Human movements are characterized by highly non-linear and multi-dimensional interactions within the motor system. Therefore, the future of human movement analysis requires procedures that enhance the classification of movement patterns into relevant groups and support practitioners in their decisions. In this regard, data-driven techniques seem particularly suitable for generating classification models. Recently, an increasing emphasis on machine-learning applications has contributed significantly, e.g., by increasing classification performance. To ensure the generalizability of machine-learning models, several data preprocessing steps are usually applied to the measured raw data before classification. In the past, various methods have been used for each of these preprocessing steps. However, there are hardly any standard procedures or systematic comparisons of these methods and their impact on classification performance. Therefore, the aim of this analysis is to compare different combinations of commonly applied data preprocessing steps and to test their effects on the classification performance of gait patterns. A publicly available dataset on intra-individual changes of gait patterns was used for this analysis. Forty-two healthy participants performed 6 sessions of 15 gait trials within 1 day. For each trial, two force plates recorded the three-dimensional ground reaction forces (GRFs). The data were preprocessed with the following steps: GRF filtering, time derivative, time normalization, data reduction, weight normalization, and data scaling. Subsequently, combinations of all methods from each preprocessing step were analyzed by comparing their prediction performance in a six-session classification using Support Vector Machines, Random Forest Classifiers, Multi-Layer Perceptrons, and Convolutional Neural Networks.
The results indicate that filtering GRF data and a supervised data reduction (e.g., using Principal Component Analysis) lead to increased prediction performance of the machine-learning classifiers. Interestingly, the weight normalization and the number of data points (above a certain minimum) in the time normalization do not have a substantial effect. In conclusion, the present results provide first domain-specific recommendations for commonly applied data preprocessing methods and might help to build more comparable and more robust classification models based on machine learning that are suitable for practical application.
Keywords: convolutional neural network; data processing; data selection; gait classification; ground reaction force; multi-layer perceptron; random forest classifier; support vector machine
Year: 2020 PMID: 32351945 PMCID: PMC7174559 DOI: 10.3389/fbioe.2020.00260
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Figure 1. Experimental procedure with the chronological order of the six sessions (S1–S6) and the duration of the rest periods between subsequent sessions.
Figure 2. Combinations of commonly used data preprocessing steps before machine-learning classifications. (1) Data points per foot and dimension. (2) Time-continuous waveforms without reduction (TC), time-discrete gait variables by an unsupervised reduction (TD), and principal components by a supervised reduction using Principal Component Analysis (PCA). (3) Z-transformation combined with scaling to [−1, 1] over single trials (ST) or all trials (AT). fc: individual optimal filter cut-off frequency. (4) ΔtGRF: first time derivative of GRF.
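Several of the preprocessing steps named in Figure 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: linear interpolation for the time normalization, a forward finite difference for ΔtGRF, and a min-max mapping applied after the z-transformation are assumptions, and all function names are hypothetical. GRF filtering is left out because the individually optimized cut-off frequency is not given here.

```python
from statistics import mean, pstdev


def time_normalize(signal, n_points):
    """Resample a 1-D signal to n_points samples (assumed: linear interpolation)."""
    if n_points < 2 or len(signal) < 2:
        raise ValueError("need at least two input samples and two output points")
    last = len(signal) - 1
    resampled = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)  # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        resampled.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return resampled


def time_derivative(signal, dt=1.0):
    """First time derivative (assumed: forward differences, one sample shorter)."""
    return [(b - a) / dt for a, b in zip(signal, signal[1:])]


def weight_normalize(signal, body_weight_newton):
    """Express forces as multiples of body weight."""
    return [value / body_weight_newton for value in signal]


def z_transform(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("constant signal cannot be z-transformed")
    return [(value - mu) / sigma for value in values]


def scale_to_unit_range(values):
    """Linearly map values to the interval [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant signal cannot be scaled")
    return [2 * (value - lo) / (hi - lo) - 1 for value in values]


# Made-up vertical GRF samples in newtons, for illustration only.
raw = [0.0, 300.0, 750.0, 620.0, 700.0, 280.0, 0.0]
resampled = time_normalize(raw, 11)
scaled = scale_to_unit_range(z_transform(resampled))
print(len(resampled), min(scaled), max(scaled))
```

Applying the last two functions to the values of one trial would roughly correspond to the ST variant. Computing the mean, standard deviation, minimum, and maximum over the pooled values of all trials would correspond to AT.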
Length of the resulting input feature vectors depending on different combinations of preprocessing methods.
| Data reduction | Time normalization (data points) | Time derivative | GRF filtering | Length of input feature vector |
|---|---|---|---|---|
| TC | 11 | GRF; ΔtGRF | No; Yes | 66 = 11 * 3 * 2 |
| TC | 101 | GRF; ΔtGRF | No; Yes | 606 = 101 * 3 * 2 |
| TC | 1,001 | GRF; ΔtGRF | No; Yes | 6,006 = 1,001 * 3 * 2 |
| TD | 11; 101; 1,001 | GRF | No; Yes | 28 = 7 * 2 * 2 |
| TD | 11; 101; 1,001 | ΔtGRF | No; Yes | 24 = 6 * 2 * 2 |
| PCA | 11 | GRF | No | 46 (44, 47) |
| PCA | 11 | GRF | Yes | 47 (44, 48) |
| PCA | 11 | ΔtGRF | No | 53 (49, 55) |
| PCA | 11 | ΔtGRF | Yes | 38 (43, 46) |
| PCA | 101 | GRF | No | 78 (73, 83) |
| PCA | 101 | GRF | Yes | 72.5 (69, 79) |
| PCA | 101 | ΔtGRF | No | 239 (210, 268) |
| PCA | 101 | ΔtGRF | Yes | 108 (97, 119) |
| PCA | 1,001 | GRF | No | 79 (73, 84) |
| PCA | 1,001 | GRF | Yes | 72 (68, 79) |
| PCA | 1,001 | ΔtGRF | No | 369 (341, 386) |
| PCA | 1,001 | ΔtGRF | Yes | 108 (97, 120) |
TC, time-continuous waveforms for three dimensions (*3) and two steps (*2); TD, time-discrete gait variables of minima and maxima of the three dimensions (GRF: 7; ΔtGRF: 6) for two steps (*2) and their relative occurrences (*2); PCA, median and interquartile range of the number of principal components.
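The fixed feature-vector lengths for TC and TD follow directly from the factors listed in the footnote. A small sketch of that arithmetic (function and parameter names are hypothetical):

```python
def tc_length(n_points, n_dimensions=3, n_steps=2):
    """Time-continuous waveforms: data points * dimensions * steps."""
    return n_points * n_dimensions * n_steps


def td_length(n_extrema, n_steps=2, values_per_extremum=2):
    """Time-discrete variables: each extremum contributes its value and
    its relative time of occurrence, for each of the two steps."""
    return n_extrema * n_steps * values_per_extremum


for n_points in (11, 101, 1001):
    print("TC", n_points, tc_length(n_points))
print("TD GRF", td_length(7))
print("TD ΔtGRF", td_length(6))
```

The PCA lengths cannot be computed this way, because the number of retained components depends on the data of each participant.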
Mean F1-score for each individual participant depending on each preprocessing method and machine-learning classifier.
| Participant | GRF filtering: No | GRF filtering: Yes | Time derivative: GRF | Time derivative: ΔtGRF | Time normalization: 11 | Time normalization: 101 | Time normalization: 1,001 | Data reduction: TC | Data reduction: TD | Data reduction: PCA | Weight normalization: No | Weight normalization: Yes | SVM | RFC | MLP | CNN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S01 | 38.3 | 42.0 | 41.4 | 39.0 | 34.9 | 42.4 | 43.2 | 40.2 | 25.7 | 54.5 | 39.9 | 40.4 | 46.4 | 45.0 | 36.3 | 32.9 |
| S02 | 24.5 | 29.4 | 27.9 | 26.0 | 23.2 | 28.4 | 29.2 | 24.5 | 20.6 | 35.7 | 27.2 | 26.7 | 30.0 | 30.7 | 23.9 | 23.2 |
| S03 | 36.9 | 43.5 | 40.6 | 39.8 | 36.1 | 44.3 | 40.2 | 40.9 | 27.6 | 52.1 | 40.1 | 40.3 | 44.8 | 43.9 | 38.2 | 33.9 |
| S04 | 42.9 | 50.0 | 48.2 | 45.1 | 41.5 | 48.8 | 49.5 | 48.8 | 36.5 | 53.6 | 45.9 | 47.3 | 51.4 | 56.2 | 42.0 | 36.8 |
| S05 | 49.9 | 50.2 | 52.0 | 48.1 | 47.1 | 51.1 | 52.0 | 50.2 | 36.5 | 63.6 | 49.6 | 50.6 | 56.7 | 56.2 | 45.5 | 41.9 |
| S06 | 38.4 | 39.8 | 39.3 | 38.9 | 32.0 | 42.3 | 43.2 | 38.9 | 28.0 | 49.8 | 39.3 | 38.9 | 42.8 | 44.6 | 38.2 | 30.8 |
| S07 | 31.5 | 40.5 | 35.4 | 36.7 | 30.5 | 40.0 | 37.6 | 34.6 | 28.7 | 44.8 | 36.2 | 35.9 | 39.2 | 41.7 | 32.8 | 30.3 |
| S08 | 42.7 | 49.0 | 46.4 | 45.4 | 41.4 | 49.0 | 47.0 | 47.3 | 38.2 | 51.8 | 45.7 | 46.0 | 49.0 | 52.1 | 44.9 | 37.6 |
| S09 | 43.2 | 47.2 | 46.1 | 44.3 | 39.1 | 49.8 | 46.7 | 43.5 | 34.0 | 58.1 | 45.3 | 45.1 | 51.2 | 49.7 | 41.2 | 38.7 |
| S10 | 41.3 | 40.3 | 41.2 | 40.4 | 34.2 | 44.1 | 44.1 | 44.4 | 27.5 | 50.5 | 40.5 | 41.1 | 43.8 | 43.2 | 42.2 | 33.9 |
| S11 | 38.5 | 40.7 | 42.5 | 36.7 | 35.3 | 42.8 | 40.8 | 42.0 | 27.6 | 49.3 | 39.5 | 39.7 | 44.0 | 45.1 | 35.2 | 34.2 |
| S12 | 34.1 | 31.9 | 36.2 | 29.8 | 27.9 | 35.4 | 35.7 | 35.3 | 22.9 | 40.9 | 33.5 | 32.6 | 36.7 | 34.9 | 34.1 | 26.3 |
| S13 | 31.7 | 34.5 | 34.4 | 31.8 | 28.6 | 36.9 | 33.8 | 32.5 | 27.8 | 39.0 | 32.8 | 33.4 | 36.9 | 36.6 | 31.2 | 27.6 |
| S14 | 33.9 | 34.0 | 38.1 | 29.8 | 28.3 | 37.3 | 36.2 | 35.7 | 24.4 | 41.7 | 34.1 | 33.8 | 36.3 | 35.4 | 35.7 | 28.3 |
| S15 | 39.9 | 45.3 | 46.8 | 38.4 | 36.8 | 46.7 | 44.2 | 42.5 | 31.2 | 54.0 | 42.8 | 42.3 | 48.7 | 46.2 | 39.5 | 35.8 |
| S16 | 32.0 | 32.9 | 32.9 | 31.9 | 27.5 | 34.5 | 35.3 | 33.8 | 23.3 | 40.2 | 32.7 | 32.2 | 34.3 | 34.6 | 34.6 | 26.3 |
| S17 | 29.3 | 30.0 | 31.7 | 27.6 | 22.6 | 33.4 | 32.6 | 30.0 | 21.7 | 36.9 | 29.7 | 29.7 | 33.0 | 31.5 | 29.4 | 24.8 |
| S18 | 24.4 | 26.9 | 25.6 | 25.8 | 22.8 | 27.0 | 27.3 | 28.3 | 17.9 | 30.9 | 25.7 | 25.7 | 27.6 | 26.7 | 27.5 | 21.0 |
| S19 | 27.3 | 28.7 | 31.5 | 24.5 | 25.0 | 29.4 | 29.5 | 26.8 | 22.7 | 34.5 | 27.9 | 28.0 | 31.0 | 30.6 | 25.5 | 24.8 |
| S20 | 29.3 | 34.0 | 32.4 | 31.0 | 26.6 | 33.7 | 34.7 | 31.9 | 25.7 | 37.4 | 31.8 | 31.6 | 34.4 | 36.3 | 30.0 | 26.2 |
| S21 | 27.7 | 29.6 | 30.9 | 26.4 | 25.4 | 30.8 | 29.7 | 28.1 | 22.4 | 35.2 | 28.6 | 28.8 | 31.2 | 33.1 | 26.4 | 24.0 |
| S22 | 32.3 | 33.6 | 36.4 | 29.5 | 28.6 | 34.5 | 35.7 | 34.1 | 24.2 | 40.5 | 33.2 | 32.7 | 33.6 | 35.3 | 35.6 | 27.2 |
| S23 | 31.7 | 35.0 | 34.6 | 32.1 | 28.5 | 35.0 | 36.5 | 33.8 | 25.6 | 40.7 | 33.1 | 33.6 | 34.9 | 39.0 | 32.8 | 26.6 |
| S24 | 35.4 | 43.3 | 40.1 | 38.6 | 33.9 | 41.3 | 42.9 | 39.7 | 32.1 | 46.3 | 39.7 | 39.1 | 42.8 | 43.8 | 39.2 | 31.6 |
| S25 | 34.7 | 41.9 | 39.3 | 37.4 | 34.7 | 41.6 | 38.7 | 37.1 | 33.0 | 44.8 | 38.3 | 38.4 | 40.2 | 43.8 | 36.8 | 32.5 |
| S26 | 47.6 | 49.9 | 53.6 | 43.8 | 42.3 | 51.3 | 52.5 | 52.2 | 41.7 | 52.1 | 48.5 | 48.9 | 52.9 | 56.3 | 47.0 | 38.6 |
| S27 | 31.5 | 31.8 | 30.4 | 32.8 | 26.5 | 34.2 | 34.2 | 33.6 | 24.6 | 36.7 | 31.6 | 31.6 | 32.8 | 35.2 | 31.5 | 27.0 |
| S28 | 35.9 | 45.5 | 41.9 | 39.4 | 33.5 | 45.1 | 43.5 | 43.0 | 29.9 | 49.2 | 40.9 | 40.5 | 43.5 | 43.9 | 42.3 | 33.0 |
| S29 | 32.2 | 36.1 | 33.1 | 35.2 | 30.1 | 36.5 | 35.8 | 36.3 | 22.6 | 43.6 | 34.5 | 33.9 | 36.8 | 35.7 | 36.0 | 28.1 |
| S30 | 31.1 | 33.1 | 35.4 | 28.9 | 28.3 | 32.6 | 35.4 | 35.1 | 21.6 | 39.0 | 31.9 | 32.3 | 33.4 | 37.1 | 32.5 | 25.4 |
| S31 | 51.3 | 53.7 | 54.5 | 50.5 | 44.5 | 56.0 | 57.0 | 58.6 | 36.6 | 62.2 | 52.4 | 52.5 | 56.8 | 58.3 | 53.5 | 41.3 |
| S32 | 43.0 | 45.9 | 47.4 | 41.5 | 38.6 | 46.0 | 48.7 | 49.5 | 31.1 | 52.7 | 44.7 | 44.2 | 47.9 | 50.3 | 44.0 | 35.5 |
| S33 | 35.7 | 41.4 | 39.7 | 37.4 | 32.1 | 40.4 | 43.1 | 41.6 | 23.5 | 50.3 | 38.2 | 38.9 | 42.7 | 41.6 | 39.2 | 30.7 |
| S34 | 49.8 | 51.8 | 53.8 | 47.8 | 44.5 | 53.2 | 54.7 | 52.1 | 39.0 | 61.4 | 50.7 | 50.9 | 54.1 | 57.5 | 51.3 | 40.4 |
| S35 | 38.4 | 45.4 | 45.3 | 38.8 | 35.1 | 45.5 | 45.5 | 45.6 | 25.7 | 53.7 | 42.2 | 41.8 | 45.8 | 47.7 | 42.2 | 32.2 |
| S36 | 36.9 | 39.3 | 41.0 | 35.3 | 32.9 | 40.7 | 40.7 | 39.5 | 29.8 | 45.1 | 37.9 | 38.3 | 41.2 | 43.1 | 36.8 | 31.3 |
| S37 | 30.9 | 33.7 | 35.8 | 28.9 | 27.7 | 33.5 | 35.8 | 33.8 | 20.3 | 42.9 | 32.2 | 32.5 | 35.3 | 33.3 | 34.3 | 26.4 |
| S38 | 35.1 | 38.2 | 39.0 | 34.3 | 30.9 | 38.7 | 40.3 | 37.8 | 26.5 | 45.6 | 36.7 | 36.6 | 39.5 | 40.9 | 37.1 | 29.1 |
| S39 | 41.6 | 43.2 | 46.1 | 38.7 | 39.0 | 43.1 | 45.1 | 47.4 | 28.1 | 51.7 | 42.4 | 42.4 | 44.3 | 48.6 | 42.8 | 33.9 |
| S40 | 41.4 | 48.9 | 48.8 | 41.5 | 37.1 | 47.9 | 50.4 | 48.1 | 30.3 | 56.9 | 45.1 | 45.2 | 48.9 | 50.5 | 46.8 | 34.4 |
| S41 | 38.4 | 43.2 | 43.9 | 37.6 | 34.7 | 42.9 | 44.7 | 44.4 | 28.1 | 49.7 | 40.6 | 41.0 | 42.6 | 48.3 | 40.5 | 31.7 |
| S42 | 27.2 | 29.4 | 31.3 | 25.4 | 25.5 | 28.2 | 31.4 | 29.7 | 21.6 | 33.0 | 28.3 | 28.3 | 29.2 | 31.4 | 28.3 | 24.3 |
The mean precision and mean recall (= accuracy) scores for each individual participant depending on each preprocessing method and machine-learning classifier can be found in .
Each mean value averages over all combinations of preprocessing steps that include the respective preprocessing method (n = 42).
Figure 3. F1-score of each preprocessing step across all participants. The y-axis shows the mean F1-score achieved. The bar charts show the mean value and the standard deviation for the respective preprocessing step. Parentheses indicate a statistically significant effect. Random Baseline = 16.7%; ***p ≤ 0.001.
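The random baseline of 16.7% corresponds to guessing one of six sessions (1/6). The sketch below computes precision, recall, and F1 for a six-class problem. Macro averaging over the classes is an assumption, as the averaging scheme is not stated here, and the labels and predictions are made up. With the same number of trials per session, macro-averaged recall equals accuracy, which is one reading of the "recall (= accuracy)" remark.

```python
def macro_scores(y_true, y_pred, labels):
    """Return macro-averaged (precision, recall, F1) over the given labels."""
    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1_scores) / n


sessions = [1, 2, 3, 4, 5, 6]
random_baseline = 100 / len(sessions)  # percent, for six equally likely sessions

# Made-up example: two trials per session, three misclassified.
y_true = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
y_pred = [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 1]

precision, recall, f1 = macro_scores(y_true, y_pred, sessions)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(random_baseline, 1), round(recall, 3), round(accuracy, 3), round(f1, 3))
```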
Top 30 combinations of preprocessing methods, ranked by the mean F1-score over the 15-fold cross validation (n = 42).
| Rank | GRF filtering | Time derivative | Time normalization | Data reduction | Weight normalization | Classifier | Mean F1 (%) | SD (%) |
|---|---|---|---|---|---|---|---|---|
| 1 | No | GRF | 1,001 | PCA | No | SVM | 54.4 | 9.8 |
| 2 | No | GRF | 101 | PCA | Yes | SVM | 54.2 | 10.3 |
| 3 | No | GRF | 1,001 | PCA | Yes | SVM | 54.1 | 11.2 |
| 4 | Yes | GRF | 1,001 | PCA | No | SVM | 54.0 | 10.3 |
| 5 | Yes | GRF | 101 | PCA | No | SVM | 53.9 | 10.3 |
| 6 | No | GRF | 101 | PCA | No | SVM | 53.8 | 9.8 |
| 7 | Yes | GRF | 1,001 | PCA | Yes | SVM | 53.7 | 11.6 |
| 8 | Yes | GRF | 101 | PCA | Yes | SVM | 53.6 | 11.3 |
| 9 | Yes | ΔtGRF | 1,001 | PCA | No | SVM | 53.5 | 10.6 |
| 10 | Yes | ΔtGRF | 101 | PCA | No | SVM | 53.2 | 10.3 |
| 11 | Yes | ΔtGRF | 101 | PCA | Yes | SVM | 53.2 | 10.8 |
| 12 | Yes | ΔtGRF | 1,001 | PCA | Yes | SVM | 53.2 | 10.6 |
| 13 | No | GRF | 1,001 | PCA | No | MLP | 53.0 | 9.7 |
| 14 | Yes | GRF | 101 | PCA | No | MLP | 52.7 | 9.2 |
| 15 | Yes | GRF | 1,001 | PCA | No | MLP | 52.7 | 10.0 |
| 16 | No | GRF | 101 | PCA | Yes | MLP | 52.6 | 10.2 |
| 17 | Yes | GRF | 1,001 | PCA | Yes | MLP | 52.6 | 10.2 |
| 18 | No | GRF | 1,001 | PCA | Yes | MLP | 52.3 | 9.6 |
| 19 | Yes | ΔtGRF | 101 | TC | Yes | RFC | 52.1 | 10.4 |
| 20 | No | GRF | 101 | PCA | No | MLP | 52.1 | 9.3 |
| 21 | Yes | GRF | 101 | PCA | Yes | MLP | 52.1 | 10.5 |
| 22 | Yes | ΔtGRF | 101 | TC | Yes | MLP | 51.6 | 9.6 |
| 23 | Yes | ΔtGRF | 1,001 | PCA | Yes | MLP | 51.6 | 9.4 |
| 24 | Yes | ΔtGRF | 101 | PCA | No | MLP | 51.6 | 10.6 |
| 25 | Yes | ΔtGRF | 101 | TC | No | RFC | 51.5 | 10.8 |
| 26 | Yes | ΔtGRF | 1,001 | PCA | No | MLP | 51.4 | 9.3 |
| 27 | Yes | ΔtGRF | 1,001 | TC | Yes | RFC | 51.4 | 10.7 |
| 28 | Yes | ΔtGRF | 101 | PCA | Yes | MLP | 51.4 | 10.5 |
| 29 | Yes | ΔtGRF | 101 | TC | No | MLP | 51.1 | 9.9 |
| 30 | Yes | ΔtGRF | 1,001 | TC | No | RFC | 51.1 | 10.5 |
(1) The rounded percentage means and standard deviations of the F1-scores are shown; therefore, identical values may occur in the table. However, there are no pairwise identical values, so the ranking is unique. (2) A table including precision and recall (= accuracy) can be found in .
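The top-ranked combinations pair a PCA reduction with an SVM and are evaluated by 15-fold cross validation. The following is a rough sketch of such an evaluation for one participant with scikit-learn, run on synthetic data. The 98% explained-variance threshold, the linear kernel, the stratified shuffled folds, and the macro-averaged F1 scoring are all assumptions and are not taken from the source. Only the shape of the data (6 sessions, 15 trials each, 606 features for TC with 101 data points) mirrors the description.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_sessions, n_trials, n_features = 6, 15, 606

# Synthetic stand-in for one participant: a session-specific pattern plus noise.
y = np.repeat(np.arange(n_sessions), n_trials)
session_patterns = rng.normal(0.0, 1.0, size=(n_sessions, n_features))
X = session_patterns[y] + rng.normal(0.0, 3.0, size=(y.size, n_features))

# PCA is fitted inside each training fold, so the held-out trials do not leak
# into the reduction. The variance threshold and kernel are assumed values.
model = make_pipeline(
    PCA(n_components=0.98, svd_solver="full"),
    SVC(kernel="linear"),
)

# 15 folds over 90 trials: each test fold holds one trial per session.
cv = StratifiedKFold(n_splits=15, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
print(scores.mean(), scores.std())
```

Because the data are random, the resulting scores say nothing about the values reported in the table.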
Rank scores of all combinations of preprocessing methods depending on their mean F1-score over the 15-fold cross validation (n = 42).
| | GRF filtering: No | GRF filtering: Yes | Time derivative: GRF | Time derivative: ΔtGRF | Time normalization: 11 | Time normalization: 101 | Time normalization: 1,001 | Data reduction: TC | Data reduction: TD | Data reduction: PCA | Weight normalization: No | Weight normalization: Yes | SVM | RFC | MLP | CNN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Score | 18,564 | 22,764 | 22,953 | 18,375 | 10,392 | 15,512 | 15,424 | 14,337 | 6,870 | 20,121 | 20,635 | 20,693 | 11,926 | 12,170 | 10,373 | 6,859 |
| % | 39.9 | 60.1 | 61.0 | 39.0 | 21.1 | 39.6 | 39.3 | 35.4 | 8.4 | 56.3 | 49.9 | 50.1 | 30.1 | 30.9 | 25.1 | 13.8 |
(1) The total rank score for each preprocessing step is 41,328. For GRF filtering, time derivative, and weight normalization the minimum rank score is 10,296 (0.0%) and the maximum rank score is 31,032 (100.0%). For time normalization and data reduction the minimum rank score is 4,560 (0.0%) and the maximum is 22,992 (66.7%). For the classifiers the minimum rank score is 2,556 (0%) and the maximum is 18,108 (50.0%). %max: relative rank score of ranks scaled to the interval between the minimum rank score and the maximum total rank score. (2) The rank scores for precision and recall (= accuracy) can be found in .
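The percentages in the table can be reproduced from the rank scores. The formula below is inferred from the reported numbers, not quoted from the source: the upper end of the interval is taken as the total rank score minus the minimum rank score of every other method in the step. The function name is hypothetical. The total of 41,328 also equals 1 + 2 + … + 287, which is consistent with ranking 287 combinations.

```python
TOTAL_RANK_SCORE = 41_328  # reported total rank score per preprocessing step


def relative_rank_score(score, min_score, n_methods, total=TOTAL_RANK_SCORE):
    """Scale a rank score to percent between the minimum and the assumed maximum."""
    upper = total - (n_methods - 1) * min_score  # inferred upper end of the interval
    return 100 * (score - min_score) / (upper - min_score)


# Minimum rank score and reported rank scores per preprocessing step.
steps = {
    "GRF filtering": (10_296, [18_564, 22_764]),
    "Time derivative": (10_296, [22_953, 18_375]),
    "Time normalization": (4_560, [10_392, 15_512, 15_424]),
    "Data reduction": (4_560, [14_337, 6_870, 20_121]),
    "Weight normalization": (10_296, [20_635, 20_693]),
    "Classifier": (2_556, [11_926, 12_170, 10_373, 6_859]),
}

for name, (min_score, scores) in steps.items():
    percentages = [
        round(relative_rank_score(s, min_score, len(scores)), 1) for s in scores
    ]
    print(name, percentages)
```

The same formula maps the stated maxima of 22,992 and 18,108 to 66.7% and 50.0%, matching the footnote.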