Seyedjamal Zolhavarieh, Saeed Aghabozorgi, Ying Wah Teh.
Abstract
Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews definitions and background related to subsequence time series clustering. The reviewed literature is divided into three groups: the preproof, interproof, and postproof periods. Moreover, various state-of-the-art approaches to subsequence time series clustering are discussed under each of these categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies.
Year: 2014 PMID: 25140332 PMCID: PMC4130317 DOI: 10.1155/2014/312521
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1: A sample of time series data.
Figure 2: A sample of subsequence time series clustering.
Figure 3: Time series clustering taxonomy.
Figure 4: The general skeleton of subsequence time series clustering.
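The general idea behind subsequence time series clustering can be illustrated with a minimal Python sketch: cut one long series into overlapping windows, normalize each window, and cluster the windows. This is not the procedure of any specific paper in the review. The function names (`sliding_windows`, `z_normalize`, `kmeans`), the window width, and the choice of plain k-means are assumptions made for illustration, and the interproof literature summarized in this review questions whether cluster centers obtained this way are meaningful.

```python
import math
import random


def sliding_windows(series, w, step=1):
    """Return every length-w subsequence of `series`, taken every `step` points."""
    return [series[i:i + w] for i in range(0, len(series) - w + 1, step)]


def z_normalize(sub):
    """Rescale a subsequence to zero mean and unit variance (flat windows map to zeros)."""
    mean = sum(sub) / len(sub)
    std = math.sqrt(sum((x - mean) ** 2 for x in sub) / len(sub))
    if std == 0:
        return [0.0] * len(sub)
    return [(x - mean) / std for x in sub]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means (illustrative only); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each window to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical input: a sine wave with period 16, sampled at 128 points.
series = [math.sin(2 * math.pi * t / 16) for t in range(128)]
windows = [z_normalize(s) for s in sliding_windows(series, w=8)]
centroids, labels = kmeans(windows, k=3)
print(len(windows), "windows grouped into", len(centroids), "clusters")
```

The window width `w` and the number of clusters `k` are exactly the kind of predefined constraint values that several of the reviewed approaches try to reduce or eliminate.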
The overview of preproof period dimensions.
| Article | Problem | Method | Algorithm | Goal | Extent |
|---|---|---|---|---|---|
| [ | Reducing the size of the grammar and producing structure as a by-product/the input is not a continuous stream | Hierarchical clustering | SEQUITUR | Abstracting subsequences | No |
| [ | Finding rules relating time series patterns | Pattern discovery | Rule finding algorithms, episode rule, simple rule discovery | Discovery of interesting, interpretable, and useful rules | No |
| [ | Determining what distinguishes time series in that set from other time series obtained from the same source | Pattern discovery | | Identifying shared patterns | No |
| [ | Supervised and unsupervised learning | Pattern discovery | PERUSE | Finding recurring patterns | [ |
| [ | Determining activation and repression of specific genes | Clustering | Cluster-buster | Finding clusters of prespecified motifs in DNA sequences | No |
Figure 5: The chronology of methods in the preproof period.
The overview of interproof period dimensions.
| Article | Problem | Method | Algorithm | Goal | Extent |
|---|---|---|---|---|---|
| [ | Meaningless time series clustering | Hierarchical and partitioning clustering | | Proving the claim of meaningless results | No |
| [ | Specifying uninteresting sequences and their effects | Density-based clustering | Kernel-density-based algorithm | Detecting meaningful patterns | [ |
| [ | Sequential time series clustering is meaningless | Partitioning clustering | | Showing sequential time series clustering is not meaningless | [ |
| [ | Very high noise levels | Density-based clustering | Continuous random-walk noise model | Noise elimination and high quality measure | [ |
| [ | Certain constraint in datasets and clusters, meaningless result | Hierarchical and partitioning clustering | Any clustering algorithm | Showing clustering of time series subsequences is meaningless | No |
| [ | Reliable determination of the produced sequences of cluster centroids | Partitioning clustering | | Results: the claim of the result of | [ |
| [ | Sinusoidal time series clustering | Partitioning clustering | | Explaining sine wave results of subsequence time series clustering | [ |
| [ | Hidden knowledge in time series | Hierarchical clustering, pattern discovery | Adaptive WaveSim transform | Extracting hidden knowledge in time series data | [ |
| [ | Cluster representatives are smoothed and generally do not look at all like any part of the original time series, meaningless results | Hierarchical and partitioning clustering | (Transcription factors) TF-clustering algorithm, TF-minicluster algorithm | Producing useful time series clustering | [ |
| [ | Sequential time series clustering is meaningless | Partitioning clustering | | Showing sequential time series clustering can indeed be meaningful | [ |
| [ | Unspecific results from dataset, meaningless | Pattern discovery | RD algorithm | Creating clusters exclusively from subsequences | [ |
| [ | Time consuming to mine the complete set of frequent subsequences for large sequence databases | Pattern discovery | CONTOUR | Efficiently discovering a set of summarization subsequences | No |
| [ | Categorizing visitors based on their navigation patterns on a website | Pattern discovery | Repetitive closed gapped subsequence | Constructing feature vector of click stream | [ |
| [ | The detection of repeated subsequences, time series motifs | Pattern discovery | Online motif discovery | Useful extensions of the algorithm to deal with arbitrary data rates and to discover multidimensional motifs | [ |
| [ | Identifying frequently accurate patterns or motifs | Pattern discovery | Sequitur | Discovery of approximate, variable-length motifs in streaming data | No |
Figure 6: The chronology of methods in the interproof period.
The summary of postproof period dimensions.
| Article | Problem | Method | Algorithm | Goal | Extent |
|---|---|---|---|---|---|
| [ | The problem of time series clustering from a single stream | Motif discovery | MDL-based clustering | Creating meaningful result | No |
| [ | The problem of time series clustering from a single stream | All methods | | Producing correct results | [ |
| [ | Discovery motif with arbitrary length | Pattern discovery | | Developing the main idea of best motif | [ |
| [ | Length of motifs in finding time series motifs | Pattern discovery | Grammar induction algorithm | Developing a motif visualization system based on grammar induction | [ |
| [ | Meaningless outcomes as outputs based on inputs | Pattern discovery | Selective sequence time series (SSTS) | Achieving meaningful results | [ |
| [ | Predefined constraints values | Pattern discovery | Motif discovery | Eliminate the problem of predefined constraint values such as width of subsequences, by utilizing motif discovery algorithm | [ |
| [ | Extracting and classifying shapes from very noisy real world time series | Pattern discovery | Motif discovery, noise test | A new method for shape extraction from time series | [ |
| [ | The difficulty of scaling a search to large datasets | Pattern discovery | God's algorithm (GOAL), embedded-based search method (EBSM) | Search and mine massive time series for the first time | No |
| [ | Invalid subsequence time series clustering | Partitioning clustering | Phase shift weighted spherical | Clustering unsynchronized time series | [ |
| [ | Difficulty of scaling search to large datasets | Pattern discovery | God's algorithm (GOAL) | Search and mine truly massive time series for the first time | No |
Figure 7: The chronology of methods in the postproof period.
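Several postproof approaches in the table above rely on motif discovery, which looks for the most similar pair of non-overlapping subsequences in a single series. The sketch below is a brute-force illustration of that idea in Python, assuming z-normalized Euclidean distance and a fixed, user-chosen window width `w`. It is not the MDL-based, grammar-induction, or online method of any reviewed paper, and the names `best_motif_pair`, `z_normalize`, and `euclidean` are hypothetical.

```python
import math


def z_normalize(sub):
    """Rescale a subsequence to zero mean and unit variance (flat windows map to zeros)."""
    mean = sum(sub) / len(sub)
    std = math.sqrt(sum((x - mean) ** 2 for x in sub) / len(sub))
    if std == 0:
        return [0.0] * len(sub)
    return [(x - mean) / std for x in sub]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_motif_pair(series, w):
    """Return (distance, i, j) for the closest pair of non-overlapping length-w windows.

    Brute force: compares every admissible pair, so the cost grows
    quadratically with the number of windows.
    """
    windows = [z_normalize(series[i:i + w]) for i in range(len(series) - w + 1)]
    best = (math.inf, -1, -1)
    for i in range(len(windows)):
        # Starting j at i + w skips overlapping windows (trivial matches).
        for j in range(i + w, len(windows)):
            d = euclidean(windows[i], windows[j])
            if d < best[0]:
                best = (d, i, j)
    return best


# Hypothetical series with the pattern [0, 3, -2, 5, 1] planted at offsets 6 and 16.
series = [1, 4, 2, 8, 5, 7, 0, 3, -2, 5, 1, 9, 6, 2, 7, 3, 0, 3, -2, 5, 1, 4, 8]
distance, first, second = best_motif_pair(series, w=5)
print(distance, first, second)
```

On this toy input the planted pattern should be recovered as the best pair. The quadratic cost of the exhaustive comparison is one reason the reviewed work on massive time series looks for faster search strategies.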
Strengths and weaknesses of preproof period researches.
Criteria assessed per article, strengths followed by weaknesses: Pruning the research space; Using long sequence; Determining corresponding clusters; Recognizing resemblance; Memory usage; Unique grammar; Undetected rules; Limited TS; Lack of predictability.
Strengths and weaknesses of interproof period researches.
Criteria assessed per article, strengths followed by weaknesses: Trying to get meaningful results; Successful clustering; Noise elimination; Effective in large window size; Improved BIRCH algorithm; Negative view; Deterministic dynamical system; Large ratio; Unsuccessful clustering; Limited investigation of behavior.
Strengths and weaknesses of postproof period researches.
Criteria assessed per article, strengths followed by weaknesses: Efficiency and successfulness in meaningful results; Parameter-lite clustering; Parameter-free clustering; Find best motif; Complexity; Not clear results; Worse result in large dimensions.