Krzysztof Kamycki, Tomasz Kapuscinski, Mariusz Oszust.
Abstract
In this paper, a novel data augmentation method for time-series classification is proposed. In the introduced method, a new time-series is obtained in warped space between suboptimally aligned input examples of different lengths. Specifically, the alignment is carried out by constraining the warping path and thereby reducing its flexibility. It is shown that the resultant synthetic time-series can form new class boundaries and enrich the training dataset. The proposed augmentation method is compared against related techniques on representative multivariate time-series datasets. The performance of the methods is examined using the nearest neighbor classifier with dynamic time warping (NN-DTW), LogDet divergence-based metric learning with triplet constraints (LDMLT), and the recently introduced time-series cluster kernel (NN-TCK). The impact of the augmentation on classification performance is investigated for entire datasets and for cases with a small number of training examples. The extensive evaluation reveals that the introduced method outperforms related augmentation algorithms in terms of classification accuracy.
Keywords: data augmentation; machine learning; multivariate time-series; time-series classification
Year: 2019; PMID: 31877970; PMCID: PMC6983028; DOI: 10.3390/s20010098
Source DB: PubMed; Journal: Sensors (Basel); ISSN: 1424-8220; Impact factor: 3.576
Figure 1. Augmented time-series from the ECG dataset [25] generated using the proposed method. (a) The first two rows show input time-series of two classes; the third row presents new examples. (b) The plot for the entire ECG dataset highlights the variability of the augmented time-series. Classes are indicated by colors.
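The idea stated in the abstract, generating a new series in the warped space between two suboptimally aligned inputs, can be illustrated with a minimal sketch. This is not the authors' implementation: the corridor-style constraint (`band`), the function names `banded_dtw` and `warped_average`, and the choice to average aligned frames are all assumptions made for illustration only.

```python
import math


def banded_dtw(s, t, band=None):
    """Align two multivariate sequences with DTW.

    s, t: lists of equal-dimension frames (lists of floats); lengths may differ.
    band: half-width of the allowed corridor around the length-normalised
          diagonal (an assumed form of constraint); None means unconstrained.
    Returns (total_cost, path), where path holds 0-based (i, j) index pairs.
    """
    n, m = len(s), len(t)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        if band is None:
            lo, hi = 1, m
        else:
            centre = i * m / n  # diagonal position for row i
            lo = max(1, math.floor(centre - band))
            hi = min(m, math.ceil(centre + band))
        for j in range(lo, hi + 1):
            best = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
            if best < inf:
                cost[i][j] = math.dist(s[i - 1], t[j - 1]) + best
    if cost[n][m] == inf:
        raise ValueError("band too narrow: no admissible warping path")
    # Backtrack from the last cell to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def warped_average(s, t, band=None):
    """Synthesise a series by averaging the frames paired by the warping path."""
    _, path = banded_dtw(s, t, band)
    return [[(a + b) / 2.0 for a, b in zip(s[i], t[j])] for i, j in path]


if __name__ == "__main__":
    # Toy univariate inputs of different lengths (illustrative data only).
    s = [[0.0], [1.0], [2.0], [3.0], [2.0], [1.0]]
    t = [[0.0], [0.0], [1.0], [3.0], [1.0]]
    print(warped_average(s, t, band=1))
```

Because the band only removes admissible paths, the constrained cost can never be lower than the unconstrained one, which is what makes the alignment suboptimal in general.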
Time-series benchmark datasets used in experiments.
| Name | Research Area | Classes | Attributes | Max Length | Train-Test Split | Training Samples | Augmented Samples |
|---|---|---|---|---|---|---|---|
| AUSLAN | Sign Language Recognition | 95 | 22 | 96 | 44-56 | 1140 | 6270 |
| Gesture Phase | Gesture Recognition | 5 | 18 | 214 | 50-50 | 198 | 4142 |
| EEG | EEG Classification | 2 | 13 | 117 | 50-50 | 64 | 996 |
| ECG | ECG Classification | 2 | 2 | 147 | 50-50 | 100 | 2706 |
| Kick vs. Punch | Action Recognition | 2 | 62 | 761 | 62-38 | 16 | 57 |
| AREM | Activity Recognition | 7 | 7 | 480 | 50-50 | 43 | 132 |
| Movement AAL | Movement Classification | 2 | 4 | 119 | 50-50 | 157 | 6084 |
| Occupancy | Occupancy Classification | 2 | 5 | 3758 | 35-65 | 41 | 400 |
| Ozone | Weather Classification | 2 | 72 | 291 | 50-50 | 172 | 7635 |
| LIBRAS | Sign Language Recognition | 15 | 2 | 45 | 38-62 | 360 | 4278 |
Experimental comparison of augmentation methods using three time-series classifiers in terms of the classification accuracy. The greatest value used in the rank is written in bold and rounded to two significant figures.
| Dataset/Aug. Method | None | SPAWNER | WW | WS | DBA |
|---|---|---|---|---|---|
| NN-DTW | | | | | |
| AUSLAN | 0.92 | | 0.92 | 0.92 | 0.95 |
| Gesture Phase | 0.42 | | 0.43 | 0.43 | 0.43 |
| EEG | | 0.57 | 0.67 | 0.68 | 0.55 |
| ECG | 0.82 | 0.82 | 0.81 | 0.80 | |
| Kick vs. Punch | | | | | |
| AREM | 0.72 | 0.72 | 0.72 | 0.72 | |
| Movement AAL | 0.73 | 0.74 | 0.74 | | 0.73 |
| Occupancy | 0.64 | | 0.57 | 0.70 | 0.69 |
| Ozone | 0.79 | | 0.78 | 0.80 | 0.80 |
| LIBRAS | 0.95 | 0.94 | 0.95 | | 0.95 |
| NN-TCK | | | | | |
| AUSLAN | | 0.57 | 0.09 | 0.31 | 0.01 |
| Gesture Phase | 0.17 | | 0.17 | 0.17 | 0.17 |
| EEG | 0.48 | 0.50 | | 0.54 | 0.48 |
| ECG | | 0.84 | 0.77 | 0.80 | 0.80 |
| Kick vs. Punch | | 0.40 | 0.40 | 0.40 | 0.40 |
| AREM | 0.72 | | 0.81 | 0.78 | 0.18 |
| Movement AAL | 0.65 | 0.66 | | 0.67 | 0.64 |
| Occupancy | 0.68 | 0.66 | 0.67 | 0.64 | |
| Ozone | | 0.39 | 0.39 | 0.39 | 0.39 |
| LIBRAS | 0.94 | 0.93 | | 0.94 | 0.94 |
| LDMLT | | | | | |
| AUSLAN | | 0.97 | 0.96 | 0.96 | 0.97 |
| Gesture Phase | 0.31 | | 0.29 | 0.29 | 0.41 |
| EEG | 0.70 | 0.61 | 0.69 | | 0.57 |
| ECG | 0.82 | 0.80 | | 0.81 | 0.82 |
| Kick vs. Punch | | | | | |
| AREM | 0.69 | 0.72 | 0.67 | 0.68 | |
| Movement AAL | 0.66 | 0.69 | 0.66 | | 0.67 |
| Occupancy | 0.57 | | 0.70 | 0.65 | 0.65 |
| Ozone | 0.67 | 0.73 | 0.65 | 0.00 | |
| LIBRAS | 0.94 | | 0.94 | 0.94 | |
| Overall results | | | | | |
| Count best | 8 | | 6 | 6 | 8 |
| Average rank | 2.98 | | 3.30 | 3.02 | 3.10 |
| Geometric average rank | 2.64 | | 2.94 | 2.73 | 2.73 |
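The summary rows (count best, average rank, geometric average rank) can be computed along the following lines. This is a sketch under stated assumptions: methods are ranked per dataset with rank 1 for the highest accuracy, ties share the mean rank, and the geometric average is the geometric mean of the per-dataset ranks. The source does not spell out its exact ranking procedure, and the accuracies in the example are invented toy numbers, not values from the table.

```python
import math


def ranks_desc(values):
    """Rank values so the largest gets rank 1; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda k: -values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def summarise(rows):
    """rows: one list of accuracies per dataset, one entry per method."""
    n_methods = len(rows[0])
    per_row = [ranks_desc(r) for r in rows]
    count_best = [sum(1 for r in rows if r[k] == max(r)) for k in range(n_methods)]
    avg_rank = [sum(r[k] for r in per_row) / len(rows) for k in range(n_methods)]
    geo_rank = [
        math.exp(sum(math.log(r[k]) for r in per_row) / len(rows))
        for k in range(n_methods)
    ]
    return count_best, avg_rank, geo_rank


if __name__ == "__main__":
    # Toy accuracies for three hypothetical methods on two datasets.
    toy = [[0.9, 0.8, 0.9], [0.5, 0.7, 0.6]]
    print(summarise(toy))
```

The geometric mean is less sensitive than the arithmetic mean to a single very poor rank, which may be why both summaries are reported.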
Figure 2. Scatter plots with two-dimensional MDS embedding of DTW dissimilarities between the first 15 (the first two rows) or 25 (the last two rows) training sequences from the AUSLAN dataset, including augmented time-series generated by four methods. In the plots, the classes are denoted by colors, the filled circles denote input time-series, and the circles with white interior indicate synthetic samples.
Figure 3. Accuracy of the NN-DTW classifier when a small number of examples per class is augmented.
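For orientation, a nearest neighbor classifier with DTW assigns a query the label of the training sequence with the smallest DTW distance, so augmentation simply adds more labeled sequences to search over. The following is a minimal sketch, not the evaluation code behind the figure; the function names `dtw_distance` and `nn_dtw_predict` and the Euclidean frame distance are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math


def dtw_distance(s, t):
    """Unconstrained DTW cost between two multivariate sequences.

    s, t: non-empty lists of equal-dimension frames; lengths may differ.
    Only two rows of the cost matrix are kept, since no path is needed.
    """
    inf = float("inf")
    m = len(t)
    prev = [0.0] + [inf] * m
    for i in range(1, len(s) + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            d = math.dist(s[i - 1], t[j - 1])
            curr[j] = d + min(prev[j - 1], prev[j], curr[j - 1])
        prev = curr
    return prev[m]


def nn_dtw_predict(train, labels, query):
    """Return the label of the training sequence closest to the query."""
    best = min(range(len(train)), key=lambda k: dtw_distance(train[k], query))
    return labels[best]


if __name__ == "__main__":
    # Toy training set (illustrative data only); augmented sequences
    # would be appended to `train` and `labels` in the same way.
    train = [[(0.0,), (1.0,), (2.0,)], [(5.0,), (5.0,), (5.0,), (5.0,)]]
    labels = ["up", "flat"]
    print(nn_dtw_predict(train, labels, [(0.0,), (0.0,), (1.0,), (2.0,)]))
```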
Experimental comparison of augmentation methods using the NN-DTW classifier and a small number of examples per class. Average accuracy for a dataset is reported (see Figure 3). The greatest value used in the rank is written in bold and rounded to two significant figures.
| Dataset/Aug. Method | None | SPAWNER | WW | WS | DBA |
|---|---|---|---|---|---|
| AUSLAN | 0.63 | | 0.64 | 0.65 | 0.67 |
| Gesture Phase | 0.30 | | 0.30 | 0.31 | 0.31 |
| EEG | 0.52 | 0.55 | | 0.56 | 0.54 |
| ECG | 0.73 | | 0.74 | 0.73 | 0.74 |
| Kick vs. Punch | 0.65 | | 0.64 | 0.62 | 0.65 |
| AREM | 0.69 | 0.76 | 0.72 | 0.71 | |
| Movement AAL | 0.58 | 0.57 | 0.58 | 0.58 | |
| Occupancy | 0.58 | 0.60 | 0.58 | | 0.58 |
| Ozone | 0.66 | 0.68 | 0.65 | 0.66 | |
| LIBRAS | 0.77 | | 0.78 | 0.77 | 0.77 |
| Overall results | | | | | |
| Count best | 0 | | 1 | 1 | 3 |
| Average rank | 4.15 | | 3.3 | 3.2 | 2.45 |
| Geometric average rank | 4.03 | | 2.99 | 2.96 | 2.09 |
Time and memory consumption of the compared time-series augmentation methods.
| Method | Computation Time (s) | Storage (MB) |
|---|---|---|
| SPAWNER | 3.23 | 1.23 |
| WW | 0.01 | 0.06 |
| WS | 0.01 | 0.07 |
| DBA | 49.6 | 7.18 |
Influence of the introduced constraint on the average performance of the NN-DTW classifier with augmented data generated by SPAWNER. The best result for each benchmark dataset is written in bold and rounded to two significant figures.
| Dataset/Method | Constrained (Suboptimal), Mean | Constrained (Suboptimal), Maximum | Unconstrained (Optimal) |
|---|---|---|---|
| AUSLAN | 0.96 | | 0.96 |
| Gesture Phase | 0.45 | | |
| EEG | 0.57 | | 0.56 |
| ECG | 0.82 | | 0.81 |
| Kick vs. Punch | | | |
| AREM | 0.72 | | |
| Movement AAL | 0.74 | | 0.70 |
| Occupancy | 0.71 | | 0.71 |
| Ozone | 0.80 | | 0.72 |
| LIBRAS | 0.94 | | 0.94 |
Figure 4. Accuracy of the NN-DTW classifier with augmented time-series generated by SPAWNER, considering a different number of neighboring sequences in the selection of input sequence pairs. The number of selected time-series is given in percent.