Dong-Hyuk Yang1, Yong-Shin Kang1.
Abstract
Time-series representation is the most important task in time-series analysis. One of the most widely employed time-series representation methods is symbolic aggregate approximation (SAX), which converts the output of piecewise aggregate approximation (PAA) into a symbol sequence. SAX is simple and effective; however, it considers only the mean value of each segment of the time series. Here, we propose a novel time-series representation method, distance- and momentum-based symbolic aggregate approximation (DM-SAX), which captures the distribution of the time series by calculating the perpendicular distance from the time axis to each data point and accounts for the trend by adding a momentum factor that reflects the direction of previous data points. Experimental results for 29 highly imbalanced classification problems on the UCR datasets showed that DM-SAX achieved the best area under the curve (AUC) among the competing time-series representation methods (SAX, extreme-SAX, overlap-SAX, and distance-based SAX). We statistically verified that the performance improvements produced significant differences in the rankings. In addition, DM-SAX yielded the best AUC on a real-world wire cutting and crimping process dataset. Meaningful data points such as outliers could be identified in a time-series outlier detection framework via the proposed method.
Keywords: highly imbalanced classification; momentum; symbolic aggregate approximation; time-series representation
Year: 2022 PMID: 35890775 PMCID: PMC9315809 DOI: 10.3390/s22145095
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Procedure of (a) PAA (t_size = 5) and (b) SAX (n_bins = 7).
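The two steps named in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration of standard PAA and SAX, not the authors' code: the function names `paa` and `sax` are hypothetical, `t_size` is assumed to mean the number of points averaged per segment, the breakpoints are assumed to be equiprobable standard-normal quantiles, and lowercase letters are an arbitrary choice of symbols.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev
from string import ascii_lowercase


def paa(series, t_size):
    """Average each consecutive window of t_size points.

    Assumption: t_size is the window length. A shorter final window is
    averaged as-is when len(series) is not a multiple of t_size.
    """
    return [mean(series[i:i + t_size]) for i in range(0, len(series), t_size)]


def sax(series, t_size=5, n_bins=7):
    """Z-normalize, apply PAA, then map each segment mean to a letter."""
    mu, sigma = mean(series), pstdev(series)
    # A constant series has zero spread; map it to all zeros.
    z = [0.0 if sigma == 0 else (x - mu) / sigma for x in series]
    # n_bins - 1 cut points that split N(0, 1) into equiprobable regions.
    cuts = [NormalDist().inv_cdf(k / n_bins) for k in range(1, n_bins)]
    return "".join(ascii_lowercase[bisect_right(cuts, v)] for v in paa(z, t_size))


print(sax(list(range(10)), t_size=5, n_bins=7))  # two segments, two symbols
```

Because only the segment mean reaches the symbol step, two segments with the same mean but different spread or direction receive the same letter, which is the limitation the abstract attributes to SAX.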
Lookup table containing the breakpoints.
| Breakpoint | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|
| β1 | −0.43 | −0.67 | −0.84 | −0.97 | −1.07 | −1.15 | −1.22 | −1.28 |
| β2 | 0.43 | 0.00 | −0.25 | −0.43 | −0.57 | −0.67 | −0.76 | −0.84 |
| β3 | | 0.67 | 0.25 | 0.00 | −0.18 | −0.32 | −0.43 | −0.52 |
| β4 | | | 0.84 | 0.43 | 0.18 | 0.00 | −0.14 | −0.25 |
| β5 | | | | 0.97 | 0.57 | 0.32 | 0.14 | 0.00 |
| β6 | | | | | 1.07 | 0.67 | 0.43 | 0.25 |
| β7 | | | | | | 1.15 | 0.76 | 0.52 |
| β8 | | | | | | | 1.22 | 0.84 |
| β9 | | | | | | | | 1.28 |
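The breakpoints in the lookup table agree, to two decimals, with the quantiles that split a standard normal distribution into equally probable regions. The sketch below regenerates them under that assumption; the function name `sax_breakpoints` is hypothetical and only the Python standard library is used.

```python
from statistics import NormalDist


def sax_breakpoints(n_bins):
    """Return the n_bins - 1 cut points that divide N(0, 1) into
    n_bins regions of equal probability."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return [standard_normal.inv_cdf(k / n_bins) for k in range(1, n_bins)]


for n_bins in range(3, 11):
    print(n_bins, [round(b, 2) for b in sax_breakpoints(n_bins)])
```

Computing the cut points this way avoids hard-coding the table and extends to alphabet sizes beyond 10.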
Figure 2. Procedure of (a) D-PAA (t_size = 5) and (b) D-SAX (n_bins = 7).
Figure 3. Procedure of (a) DM-PAA (t_size = 5) and (b) DM-SAX (n_bins = 7).
Dataset descriptions.
| Dataset | #Training | #Test | #Input | Imbalance |
|---|---|---|---|---|
| Adiac | 390 | 391 | 176 | 38.1 |
| CricketX | 390 | 390 | 300 | 11.0 |
| CricketY | 390 | 390 | 300 | 11.0 |
| CricketZ | 390 | 390 | 300 | 11.0 |
| Crop | 7200 | 16,800 | 46 | 23.0 |
| DistalPhalanxOutlineAgeGroup | 400 | 139 | 80 | 11.0 |
| DistalPhalanxTW | 400 | 139 | 80 | 19.7 |
| ECG5000 | 500 | 5000 | 140 | 207.3 |
| ElectricDevices | 8926 | 7711 | 96 | 12.3 |
| EOGHorizontalSignal | 362 | 362 | 1250 | 11.3 |
| EOGVerticalSignal | 362 | 362 | 1250 | 11.3 |
| FaceAll | 560 | 1690 | 131 | 45.9 |
| FacesUCR | 200 | 2050 | 131 | 45.9 |
| FiftyWords | 450 | 455 | 270 | 149.8 |
| Fungi | 18 | 186 | 201 | 24.5 |
| InsectWingbeatSound | 220 | 1980 | 256 | 10.0 |
| MedicalImages | 381 | 760 | 99 | 48.6 |
| MiddlePhalanxTW | 399 | 154 | 80 | 15.3 |
| NonInvasiveFetalECGThorax1 | 1800 | 1965 | 750 | 49.2 |
| NonInvasiveFetalECGThorax2 | 1800 | 1965 | 750 | 49.2 |
| OSULeaf | 200 | 242 | 427 | 10.6 |
| Phoneme | 214 | 1896 | 1024 | 1054.0 |
| PigAirwayPressure | 104 | 208 | 2000 | 51.0 |
| PigArtPressure | 104 | 208 | 2000 | 51.0 |
| PigCVP | 104 | 208 | 2000 | 51.0 |
| ProximalPhalanxTW | 400 | 205 | 80 | 32.6 |
| ShapesAll | 600 | 600 | 512 | 59.0 |
| SwedishLeaf | 500 | 625 | 128 | 14.0 |
| WordSynonyms | 267 | 638 | 270 | 74.4 |
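The text here does not define the Imbalance column. One plausible reading, assumed for this sketch and not confirmed by the source, is the number of samples outside the smallest class divided by the number inside it, which corresponds to treating the minority class as the positive class and all other classes as negative. The function name `imbalance_ratio` and the toy labels are hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Samples outside the smallest class divided by samples inside it.

    Assumption: the minority class is the positive class and every
    other class is merged into a single negative class.
    """
    counts = Counter(labels)
    minority = min(counts.values())
    return (len(labels) - minority) / minority


# Toy labels, not taken from any UCR dataset: 12 equally sized classes.
toy_labels = [c for c in range(12) for _ in range(5)]
print(imbalance_ratio(toy_labels))
```

Under this reading, a dataset with 12 equally sized classes has a ratio of 11.0, since one class is set against the other eleven.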
Performance benchmarks (UCR datasets).
| Dataset | SAX | E-SAX | O-SAX | D-SAX | DM-SAX |
|---|---|---|---|---|---|
| Adiac | 43.10 | | 50.01 | 48.34 | 48.97 |
| CricketX | 55.78 | 58.68 | 60.93 | | 60.23 |
| CricketY | 68.84 | 71.62 | 63.95 | | 72.54 |
| CricketZ | 51.33 | | 51.57 | 52.42 | 52.86 |
| Crop | 99.55 | 99.42 | | 99.64 | 99.64 |
| DistalPhalanxOutlineAgeGroup | 89.31 | | 83.92 | 82.48 | 81.54 |
| DistalPhalanxTW | 50.73 | 54.51 | 57.16 | 56.74 | |
| ECG5000 | 65.72 | 58.09 | 60.87 | 66.54 | |
| ElectricDevices | 80.55 | 78.34 | 80.56 | | 84.02 |
| EOGHorizontalSignal | 72.34 | 76.00 | | 73.75 | 74.37 |
| EOGVerticalSignal | 69.57 | 70.84 | | 69.94 | 69.14 |
| FaceAll | 94.29 | 92.89 | 89.28 | 96.09 | |
| FacesUCR | 61.33 | 55.96 | 59.05 | | 63.88 |
| FiftyWords | 60.54 | | 58.70 | 60.86 | 61.54 |
| Fungi | | 86.14 | 93.89 | 97.77 | 97.89 |
| InsectWingbeatSound | 76.53 | 62.60 | 71.30 | 78.31 | |
| MedicalImages | 77.95 | 87.85 | 90.21 | 95.24 | |
| MiddlePhalanxTW | 63.70 | 69.12 | 68.31 | 66.15 | |
| NonInvasiveFetalECGThorax1 | 85.80 | 87.12 | 67.86 | | 92.31 |
| NonInvasiveFetalECGThorax2 | 82.04 | 81.11 | 67.12 | 87.23 | |
| OSULeaf | 57.59 | 47.28 | 56.51 | 57.78 | |
| Phoneme | 36.33 | 69.76 | | 53.56 | 53.52 |
| PigAirwayPressure | 59.11 | 81.37 | | 64.83 | 66.59 |
| PigArtPressure | 60.70 | 51.64 | 39.82 | 76.63 | |
| PigCVP | 73.18 | 47.11 | 60.54 | | 84.24 |
| ProximalPhalanxTW | 56.00 | 72.25 | 55.87 | 71.74 | |
| ShapesAll | 82.38 | 74.49 | | 85.76 | 84.61 |
| SwedishLeaf | 64.70 | | 59.80 | 65.72 | 63.80 |
| WordSynonyms | 51.43 | | 57.09 | 54.18 | 53.32 |
| Mean AUC (%) | 68.57 | 69.94 | 69.06 | 73.26 | |
| Mean Rank | 3.86 | 3.21 | 3.24 | 2.41 | |
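A Mean Rank row such as the one above is typically obtained by ranking the methods within each dataset and then averaging each method's rank over all datasets. The sketch below assumes that rank 1 goes to the highest AUC and that tied methods share the average of the ranks they occupy; neither convention is stated in the text here. The helper names and the toy scores are hypothetical and are not values from the table.

```python
def descending_ranks(scores):
    """Rank 1 goes to the highest score; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def mean_ranks(table):
    """table: one list of scores per dataset, one score per method."""
    per_dataset = [descending_ranks(row) for row in table]
    return [sum(column) / len(per_dataset) for column in zip(*per_dataset)]


# Toy scores for three methods on two datasets (illustrative only).
toy = [
    [0.60, 0.70, 0.80],
    [0.90, 0.70, 0.90],
]
print(mean_ranks(toy))
```

A lower mean rank indicates a method that is more consistently near the top across datasets, independent of the AUC scale of any single dataset.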
Post-hoc test (Wilcoxon) results (p-value).
| | SAX | E-SAX | O-SAX | D-SAX | DM-SAX |
|---|---|---|---|---|---|
| SAX | - | 0.9573 | 0.6517 | 0.0135 | 0.0022 |
| E-SAX | | - | 0.9222 | 0.0139 | 0.0032 |
| O-SAX | | | - | 0.0251 | 0.0043 |
| D-SAX | | | | - | 0.2692 |
| DM-SAX | | | | | - |
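Each entry in the post-hoc table compares two methods over the same datasets with a Wilcoxon signed-rank test. The sketch below shows the idea using the large-sample normal approximation, with zero differences dropped and no tie or continuity correction; these choices are assumptions, and the tool behind the reported p-values may have used an exact or corrected variant. The function names and the toy scores are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ascending ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def wilcoxon_signed_rank(a, b):
    """Return (W+, two-sided p) for paired samples a and b.

    Normal approximation only; zero differences are discarded.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # identical samples: no evidence of a difference
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = (n * (n + 1) * (2 * n + 1) / 24) ** 0.5
    z = (w_plus - mean_w) / sd_w
    return w_plus, 2 * (1 - NormalDist().cdf(abs(z)))


# Toy paired scores on four datasets (illustrative only).
w, p = wilcoxon_signed_rank([70, 80, 90, 60], [60, 65, 70, 55])
print(w, round(p, 4))
```

With only a handful of pairs the normal approximation is rough; for the 29 datasets used here it is a more reasonable stand-in for an exact test.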
Figure 4. Ratio of each algorithm included in the top-n rank on 29 UCR datasets.
Description of features.
| Features | Description |
|---|---|
| B/S | Bad limit overall/Specification delta conductor |
| RCFA | Results measured from crimp force analyzer |
| MPP | Maximum press power |
Descriptive statistics.
| Features | Min | Median | Mean | Max |
|---|---|---|---|---|
| B/S | −2052.0 | 1.0 | −1.1 | 1674.0 |
| RCFA | 1.0 | 14.0 | 17.4 | 2052.0 |
| MPP | 99.0 | 3457.0 | 3774.8 | 8758.0 |
Similarities and differences between experiments of UCR and real-world datasets.
| | Elements | UCR | Real-World |
|---|---|---|---|
| Similarities | Competing methods | SAX, E-SAX, O-SAX, D-SAX, and DM-SAX | SAX, E-SAX, O-SAX, D-SAX, and DM-SAX |
| | Performance measure | AUC | AUC |
| | Base classifier | Random forest (20 iterations) | Random forest (20 iterations) |
| | | 4, 6, 8, and 10 | 4, 6, 8, and 10 |
| | | 0.9 | 0.9 |
| | | 0.01 | 0.01 |
| Differences | | 3, 5 | 25, 50, 75, 100, and 150 |
| | Training/Test set ratio | Originally split | 0.7/0.3 |
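Both experiments use AUC as the performance measure. As a reminder of what that number means, the sketch below computes AUC directly as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. It is a plain pairwise illustration, not the authors' evaluation code; the function name `auc_score` and the toy inputs are hypothetical.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive (minority) class, 0 for the negative class.
    scores: classifier scores, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy labels and scores (illustrative only).
print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because AUC depends only on how positives are ordered relative to negatives, it does not reward a classifier for simply predicting the majority class, which is why it suits highly imbalanced problems.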
Performance benchmarks (real-world dataset).
| | | SAX | E-SAX | O-SAX | D-SAX | DM-SAX |
|---|---|---|---|---|---|---|
| 25 | 4 | 85.16 | 89.45 | 87.74 | | 99.38 |
| 25 | 6 | 85.00 | 94.18 | 93.58 | 99.73 | |
| 25 | 8 | 80.30 | 93.95 | 93.69 | | |
| 25 | 10 | 83.23 | 93.68 | 92.28 | | 99.33 |
| 50 | 4 | 85.90 | 81.48 | 86.89 | | |
| 50 | 6 | 82.32 | 89.86 | 89.51 | | |
| 50 | 8 | 84.67 | 93.17 | 87.87 | | 99.47 |
| 50 | 10 | 84.72 | 93.39 | 90.14 | | |
| 75 | 4 | 79.29 | 77.62 | 85.28 | | |
| 75 | 6 | 76.83 | 88.39 | 86.97 | | |
| 75 | 8 | 78.56 | 91.91 | 86.79 | | |
| 75 | 10 | 76.23 | 92.53 | 86.79 | 98.95 | |
| 100 | 4 | 84.58 | 76.34 | 85.81 | 98.39 | |
| 100 | 6 | 81.79 | 89.40 | 86.52 | 98.55 | |
| 100 | 8 | 78.48 | 92.79 | 87.77 | 97.79 | |
| 100 | 10 | 76.17 | 93.57 | 86.44 | 98.44 | |
| 150 | 4 | 77.27 | 73.90 | 82.60 | 97.75 | |
| 150 | 6 | 78.45 | 86.71 | 82.03 | 98.06 | |
| 150 | 8 | 82.87 | 91.48 | 80.41 | 98.62 | |
| 150 | 10 | 79.05 | 91.37 | 78.79 | 97.45 | |
| Mean AUC (%) | | 81.04 | 88.76 | 86.90 | 98.81 | |
| Mean Rank | | 4.70 | 3.40 | 3.90 | 1.50 | |