Jiucheng Xu, Qinchen Hou, Kanglin Qu, Yuanhao Sun, Xiangru Meng.
Abstract
The rapid growth of digital information has produced massive amounts of feature-rich time series data. Most of these data are noisy and contain outlier samples, which degrades clustering performance. To efficiently discover the hidden statistical information in such data, this study proposes a fast weighted fuzzy C-medoids clustering algorithm based on P-splines (PS-WFCMdd) for time series datasets. Specifically, the P-spline method is used to fit functional data to the original time series, and the resulting smoothed data serve as the input to the clustering algorithm, improving its ability to handle the dataset during clustering. We then define a new weighting method that further reduces the influence of outlier sample points in the weighted fuzzy C-medoids clustering process and thereby improves the robustness of the algorithm. To further improve clustering efficiency, we propose using the third version of Mueen's algorithm for similarity search (MASS 3) to measure the similarity between time series quickly and accurately. The new algorithm is compared with several other time series clustering algorithms, and its performance is evaluated experimentally on different types of time series examples. The experimental results show that the new method speeds up data processing and performs relatively well on each clustering evaluation index.
Keywords: P-splines; fuzzy C-medoids; similarity measure; time series; weight fuzzy clustering analysis
Year: 2022 PMID: 36015930 PMCID: PMC9414275 DOI: 10.3390/s22166163
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
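To make the fuzzy C-medoids step named in the abstract more concrete, here is a minimal sketch of plain (unweighted) fuzzy C-medoids clustering. It is not the PS-WFCMdd algorithm itself: the P-spline smoothing and the paper's outlier weighting are omitted, squared Euclidean distance stands in for the MASS 3 similarity measure, and the farthest-first initialization is an assumption chosen to keep the example deterministic. All function names are hypothetical.

```python
def sq_euclidean(a, b):
    """Squared Euclidean distance (a stand-in for the MASS 3 measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(dist, c):
    """Assumed initialization: start at index 0, then repeatedly add the
    point farthest from the medoids chosen so far."""
    medoids = [0]
    while len(medoids) < c:
        nxt = max(range(len(dist)),
                  key=lambda j: min(dist[v][j] for v in medoids))
        medoids.append(nxt)
    return medoids


def memberships(dist, medoids, m):
    """Fuzzy membership u[i][j] of point j in the cluster of medoid i."""
    n, c = len(dist), len(medoids)
    u = [[0.0] * n for _ in range(c)]
    power = 1.0 / (m - 1.0)
    for j in range(n):
        d = [dist[v][j] for v in medoids]
        if min(d) == 0.0:
            # Point coincides with a medoid: full membership there.
            u[d.index(0.0)][j] = 1.0
            continue
        inv = [(1.0 / dk) ** power for dk in d]
        total = sum(inv)
        for i in range(c):
            u[i][j] = inv[i] / total
    return u


def fuzzy_c_medoids(series, c, m=2.0, max_iter=50):
    """Alternate membership updates and medoid updates until stable."""
    n = len(series)
    dist = [[sq_euclidean(series[i], series[j]) for j in range(n)]
            for i in range(n)]
    medoids = farthest_first(dist, c)
    for _ in range(max_iter):
        u = memberships(dist, medoids, m)
        # New medoid of cluster i: the sample that minimizes the
        # membership-weighted sum of distances to all samples.
        new_medoids = [
            min(range(n),
                key=lambda q, i=i: sum((u[i][j] ** m) * dist[q][j]
                                       for j in range(n)))
            for i in range(c)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, memberships(dist, medoids, m)
```

Calling `fuzzy_c_medoids(series, c=2)` on a list of equal-length series returns the medoid indices and a membership matrix in which each point's memberships sum to 1. Because medoids are always actual samples, the method needs only pairwise distances, which is why a fast similarity search can be substituted for the distance function.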
Figure 1. Comparison of P-spline fitting processing results.
Figure 2. Flow chart of PS-WFCMdd.
Descriptions of ten common time series datasets.
| Dataset | Number of Classes | Time Series Length | Size of Set |
|---|---|---|---|
| Beef | 5 | 470 | 60 |
| BeetleFly | 2 | 512 | 40 |
| Car | 4 | 577 | 120 |
| ECG200 | 2 | 96 | 200 |
| FaceFour | 4 | 350 | 88 |
| Fish | 7 | 463 | 175 |
| GunPoint | 2 | 150 | 200 |
| Herring | 2 | 512 | 128 |
| Lightning2 | 2 | 637 | 121 |
| Meat | 3 | 448 | 120 |
Descriptions of eight large-scale time series datasets.
| Dataset | Number of Classes | Time Series Length | Size of Set |
|---|---|---|---|
| ChlorineConcentration | 3 | 166 | 3840 |
| ECG5000 | 5 | 140 | 4500 |
| HouseTwenty | 2 | 2000 | 159 |
| Mallat | 8 | 1024 | 2345 |
| StarLightCurves | 3 | 1024 | 8236 |
| Strawberry | 5 | 235 | 983 |
| Rock | 4 | 2844 | 70 |
| Wafer | 2 | 152 | 6174 |
ARI index scores of different clustering algorithms.
| Dataset | FCM | PS-K-Means | K-Shape | K-Means-sD | PS-WFCMdd-sD | K-Medoids-sD | PS-WFCMdd |
|---|---|---|---|---|---|---|---|
| Beef | 0.03229 | 0.06846 | 0.01537 | 0.06846 | 0.05139 | −0.02605 | |
| BeetleFly | 0.11333 | −0.04442 | 0.11333 | −0.04442 | 0.03537 | 0.16197 | |
| Car | 0.08753 | 0.14338 | 0.09126 | 0.13345 | 0.00699 | −0.00475 | |
| ECG200 | 0.00757 | 0.24208 | 0.24919 | 0.28307 | 0.06660 | 0.20403 | |
| FaceFour | 0.31835 | 0.26386 | 0.26957 | 0.12286 | 0.36096 | 0.37089 | |
| Fish | 0.21965 | 0.17013 | 0.20009 | | 0.19032 | 0.17431 | 0.25790 |
| GunPoint | −0.00512 | −0.00512 | −0.00512 | −0.00512 | 0.01594 | 0.00600 | |
| Herring | −0.01435 | −0.01488 | −0.00852 | | 0.03936 | 0.00893 | 0.04904 |
| Lightning2 | 0.01194 | 0.03000 | 0.00000 | 0.01194 | 0.03764 | 0.03737 | |
| Meat | 0.56296 | 0.49599 | 0.56296 | 0.56296 | 0.75734 | 0.58745 | |
FMI index scores of different clustering algorithms.
| Dataset | FCM | PS-K-Means | K-Shape | K-Means-sD | PS-WFCMdd-sD | K-Medoids-sD | PS-WFCMdd |
|---|---|---|---|---|---|---|---|
| Beef | 0.25318 | 0.26423 | 0.24121 | 0.26423 | 0.40805 | 0.20412 | |
| BeetleFly | 0.53333 | 0.45305 | 0.53333 | 0.45305 | 0.73536 | 0.71824 | |
| Car | 0.31788 | 0.37009 | 0.34238 | 0.41842 | | 0.25424 | 0.43560 |
| ECG200 | 0.54056 | 0.68526 | 0.67556 | | 0.48946 | 0.62784 | 0.66386 |
| FaceFour | 0.52085 | 0.48106 | 0.47705 | 0.37907 | | 0.56173 | 0.56499 |
| Fish | 0.36878 | 0.30637 | 0.33106 | 0.41914 | 0.37703 | 0.30304 | |
| GunPoint | 0.49524 | 0.49524 | 0.49524 | 0.49524 | 0.42098 | 0.49529 | |
| Herring | 0.49695 | 0.50080 | 0.51599 | 0.58125 | | 0.51251 | 0.54077 |
| Lightning2 | 0.52741 | 0.60282 | | 0.52741 | 0.52637 | 0.54414 | 0.56615 |
| Meat | 0.76657 | 0.66358 | 0.76657 | 0.76657 | 0.75638 | 0.72042 | |
NMI index scores of different clustering algorithms.
| Dataset | FCM | PS-K-Means | K-Shape | K-Means-sD | PS-WFCMdd-sD | K-Medoids-sD | PS-WFCMdd |
|---|---|---|---|---|---|---|---|
| Beef | 0.25365 | 0.31947 | 0.24797 | 0.31947 | 0.25552 | 0.21094 | |
| BeetleFly | 0.11871 | 0.00733 | 0.11871 | 0.00733 | 0.33175 | 0.35321 | |
| Car | 0.16604 | 0.26708 | 0.22220 | 0.26604 | 0.11472 | 0.06676 | |
| ECG200 | 0.00576 | 0.14024 | 0.15085 | 0.17160 | 0.14998 | 0.17360 | |
| FaceFour | 0.37440 | 0.44330 | 0.34980 | 0.24702 | 0.13612 | 0.50233 | |
| Fish | 0.38036 | 0.34383 | 0.37170 | | 0.23739 | 0.27090 | 0.35590 |
| GunPoint | 0.00111 | 0.00111 | 0.00111 | 0.00111 | 0.08702 | 0.04770 | |
| Herring | 0.00075 | 0.00271 | 0.00128 | 0.18105 | 0.13313 | 0.13073 | |
| Lightning2 | 0.01523 | 0.01026 | 0.00000 | 0.01523 | 0.08381 | 0.07639 | |
| Meat | 0.73368 | 0.59092 | 0.73368 | 0.73368 | 0.71512 | 0.64352 | |
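For readers who want to see how scores like those in the ARI, FMI, and NMI tables are obtained from a clustering result, the following is a minimal pure-Python sketch of the three indices computed from true and predicted label lists. The function names are hypothetical, and the NMI normalization used here (arithmetic mean of the two entropies) is an assumption, since the record does not state which normalization the study used.

```python
from collections import Counter
from math import comb, log, sqrt


def _counts(true_labels, pred_labels):
    """Contingency counts plus row (true) and column (predicted) totals."""
    pairs = Counter(zip(true_labels, pred_labels))
    return pairs, Counter(true_labels), Counter(pred_labels)


def adjusted_rand_index(true_labels, pred_labels):
    """ARI: pair-counting agreement corrected for chance (needs n >= 2)."""
    pairs, rows, cols = _counts(true_labels, pred_labels)
    same_both = sum(comb(v, 2) for v in pairs.values())
    same_true = sum(comb(v, 2) for v in rows.values())
    same_pred = sum(comb(v, 2) for v in cols.values())
    expected = same_true * same_pred / comb(len(true_labels), 2)
    max_index = (same_true + same_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (same_both - expected) / (max_index - expected)


def fowlkes_mallows_index(true_labels, pred_labels):
    """FMI: geometric mean of pairwise precision and recall."""
    pairs, rows, cols = _counts(true_labels, pred_labels)
    same_both = sum(comb(v, 2) for v in pairs.values())
    same_true = sum(comb(v, 2) for v in rows.values())
    same_pred = sum(comb(v, 2) for v in cols.values())
    if same_true == 0 or same_pred == 0:
        return 0.0
    return same_both / sqrt(same_true * same_pred)


def normalized_mutual_info(true_labels, pred_labels):
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    pairs, rows, cols = _counts(true_labels, pred_labels)
    n = len(true_labels)
    mi = sum(v / n * log(n * v / (rows[a] * cols[b]))
             for (a, b), v in pairs.items())
    h_true = -sum(v / n * log(v / n) for v in rows.values())
    h_pred = -sum(v / n * log(v / n) for v in cols.values())
    if h_true + h_pred == 0:
        return 1.0  # both partitions consist of a single cluster
    return mi / ((h_true + h_pred) / 2)
```

All three indices are invariant to how the cluster labels are named, so a prediction that swaps label 0 and label 1 still scores as a perfect match. ARI can be negative when agreement is below chance level, which is consistent with the negative entries in the ARI table.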
Figure 3. ARI of clustering results with different clustering algorithms.
Figure 4. FMI of clustering results with different clustering algorithms.
Figure 5. NMI of clustering results with different clustering algorithms.
Figure 6. Trend of the original data and the cluster centers obtained by PS-WFCMdd.
Figure 7. Trend of the original data and the cluster centers obtained by K-means-sD.
Time complexity comparison.
| Algorithm | Time Complexity |
|---|---|
| FCM | |
| PS-K-means | |
| K-shape | |
| K-means-sD | |
| PS-WFCMdd-sD | |
| K-Medoids-sD | |
| PS-WFCMdd | |