Shiting Ding, Zhiheng Li, Kai Zhang, Feng Mao.
Abstract
Sequential pattern mining (SPM) is a major topic in data mining with a wide range of applications. The continuous and uncertain nature of trajectory data makes it distinctly different from typical transactional data, so additional data transformation is required before SPM can be applied. However, little research has compared the performance of SPM algorithms and their applications in the context of trajectory data. This study selected several representative SPM algorithms and evaluated them with various parameters to understand how those parameters affect their performance. We studied the resulting sequential patterns, runtime, and RAM consumption on a taxi trajectory dataset, the T-Drive dataset. This work demonstrates a method for discretizing trajectory data and applies different SPM algorithms to the resulting trajectory databases. The results were visualized on actual Beijing road maps, reflecting traffic congestion conditions. The results demonstrate that contiguous constraint-based algorithms can provide a concise representation of output sequences and function at low min_sup with balanced RAM consumption and execution time. This study can serve as a guide for academics and professionals choosing the most suitable SPM algorithm for applications involving trajectory data.
Keywords: data mining; sequential pattern mining; traffic congestion; vehicle trajectory
Year: 2022 PMID: 36236703 PMCID: PMC9571407 DOI: 10.3390/s22197608
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
An example of a sequence database.
| ID | Sequence |
|---|---|
| 1 | < |
| 2 | < |
| 3 | < |
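A sequence database like the one above is the input to every SPM algorithm compared here. As a hedged illustration of how relative support is counted, and of what the contiguous (no gap) constraint in the result tables changes, the sketch below assumes each sequence element is a single item such as a grid-cell label. The database, the threshold, and the helper names `contains` and `relative_support` are hypothetical, not the paper's code.

```python
from typing import Sequence


def contains(seq: Sequence[str], pattern: Sequence[str], contiguous: bool = False) -> bool:
    """Return True if `pattern` occurs in `seq` in the same order.

    contiguous=False allows gaps between matched items (ordinary subsequence);
    contiguous=True requires the matched items to be adjacent (no gap).
    """
    if contiguous:
        n = len(pattern)
        return any(list(seq[i:i + n]) == list(pattern) for i in range(len(seq) - n + 1))
    it = iter(seq)
    # `x in it` advances the iterator, so the items must be matched in order.
    return all(x in it for x in pattern)


def relative_support(db: list[list[str]], pattern: Sequence[str], contiguous: bool = False) -> float:
    """Fraction of sequences in `db` that contain `pattern`."""
    return sum(contains(s, pattern, contiguous) for s in db) / len(db)


# Hypothetical database of grid-cell sequences (not the paper's data).
db = [["A", "B", "C", "D"], ["A", "C", "D"], ["B", "A", "B", "C"]]
min_sup = 0.5
for flag in (False, True):
    sup = relative_support(db, ["A", "C"], contiguous=flag)
    print(f"contiguous={flag}: support={sup:.2f}, frequent={sup >= min_sup}")
```

Here the pattern A, C has support 1.0 when gaps are allowed but only 1/3 under the contiguous constraint, so it is frequent at min_sup = 0.5 only in the unconstrained case. Stricter containment makes patterns rarer, which is one reason contiguous algorithms report fewer patterns.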
A snippet of the original Microsoft T-Drive dataset.
| Taxi Id | Date Time | Longitude | Latitude |
|---|---|---|---|
| 1 | 2008-02-02 15:36:08 | 116.51172 | 39.92123 |
| 1 | 2008-02-02 15:46:08 | 116.51135 | 39.93883 |
| ⋯ | ⋯ | ⋯ | ⋯ |
| 10,357 | 2008-02-08 17:26:51 | 116.72877 | 40.01143 |
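The raw records above are continuous GPS fixes, so they must be discretized before SPM. This excerpt does not give the paper's grid parameters; the sketch below is a minimal illustration that assumes a uniform longitude/latitude grid with a hypothetical origin and cell size, maps each fix to a cell label, and collapses consecutive repeats of the same cell. It parses the two sample rows in the column order of the table (taxi ID, date time, longitude, latitude). `ORIGIN_LON`, `ORIGIN_LAT`, `CELL_DEG`, `to_cell`, and `discretize` are all hypothetical.

```python
import csv
import io
import math
from itertools import groupby

# Hypothetical grid parameters (assumptions, not taken from the paper).
ORIGIN_LON, ORIGIN_LAT = 115.5, 39.5  # south-west corner of the grid
CELL_DEG = 0.01                       # side length of one square cell, in degrees


def to_cell(lon: float, lat: float) -> str:
    """Label the grid cell containing (lon, lat) as 'row_col'."""
    col = math.floor((lon - ORIGIN_LON) / CELL_DEG)
    row = math.floor((lat - ORIGIN_LAT) / CELL_DEG)
    return f"{row}_{col}"


def discretize(points: list[tuple[float, float]]) -> list[str]:
    """Map (lon, lat) points to cells, dropping consecutive duplicate cells."""
    return [cell for cell, _ in groupby(to_cell(lon, lat) for lon, lat in points)]


raw = "1,2008-02-02 15:36:08,116.51172,39.92123\n1,2008-02-02 15:46:08,116.51135,39.93883\n"
points = [(float(lon), float(lat)) for _taxi, _time, lon, lat in csv.reader(io.StringIO(raw))]
print(discretize(points))
```

Dropping consecutive duplicates means a taxi idling in one cell contributes a single item, so the sequence records which cells were visited in order rather than how many fixes fell in each.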
Figure 1. Procedure for generating a discrete trajectory database for sequential pattern mining.
Figure 2. Distribution of data after segmentation. (a) Average number of grids traveled per taxi. (b) Average number of trips traveled per taxi.
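Figure 2 counts trips per taxi after segmentation, but this excerpt does not state the segmentation rule. The following is a hedged sketch under one common assumption: a taxi's time-ordered fixes start a new trip whenever the gap between consecutive fixes exceeds a threshold. The 30-minute `MAX_GAP`, the helper `split_trips`, and the third record are hypothetical.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=30)  # hypothetical threshold, not from the paper


def split_trips(records, max_gap=MAX_GAP):
    """Split one taxi's (timestamp, lon, lat) records into trips at large time gaps."""
    trips, current, prev_t = [], [], None
    for t, lon, lat in sorted(records, key=lambda r: r[0]):
        if prev_t is not None and t - prev_t > max_gap:
            trips.append(current)  # close the trip before the gap
            current = []
        current.append((lon, lat))
        prev_t = t
    if current:
        trips.append(current)
    return trips


fmt = "%Y-%m-%d %H:%M:%S"
recs = [
    (datetime.strptime("2008-02-02 15:36:08", fmt), 116.51172, 39.92123),
    (datetime.strptime("2008-02-02 15:46:08", fmt), 116.51135, 39.93883),
    (datetime.strptime("2008-02-02 17:00:00", fmt), 116.52000, 39.94000),  # hypothetical fix
]
trips = split_trips(recs)
print(len(trips))
```

Each resulting trip would then be discretized into one sequence of the trajectory database.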
Runtime (s) of algorithms at different minimum support (min_sup) values, categorized by constraint. Numeric column headings are min_sup values.
| Constraint | Algorithm | 0.7 | 0.6 | 0.5 | 0.4 | 0.3 | 0.2 | 0.1 |
|---|---|---|---|---|---|---|---|---|
| No | GSP | 1.66 | 1.62 | 8.76 | 67.73 | 581.82 | 2986.78 | Failed |
| No | PrefixSpan | 1.17 | 1.15 | 1.81 | 3.69 | 19.78 | 272.52 | Failed |
| No | CM-SPADE | 5.39 | 5.04 | 5.72 | 7.58 | 18.74 | 52.93 | 436.35 |
| Closed | CM-ClaSP | 5.81 | 5.47 | 6.34 | 8.50 | 22.03 | 143.45 | Failed |
| Closed | CloFAST | 24.45 | 22.84 | 23.10 | 26.16 | 42.85 | 162.80 | Failed |
| Max | MaxSP | 1.97 | 2.10 | 7.26 | 28.11 | 310.01 | Failed | Failed |
| Max | VMSP | 4.73 | 4.75 | 6.43 | 11.67 | 38.38 | 275.12 | Failed |
| Contiguous | VMSP (no gap) | 3.11 | 3.11 | 4.55 | 8.30 | 27.45 | 71.933 | 203.05 |
| Contiguous | CM-SPAM (no gap) | 3.41 | 3.16 | 4.65 | 8.74 | 28.78 | 75.26 | 210.26 |
RAM consumption (MB) of algorithms at different minimum support (min_sup) values, categorized by constraint. Numeric column headings are min_sup values.
| Constraint | Algorithm | 0.7 | 0.6 | 0.5 | 0.4 | 0.3 | 0.2 | 0.1 |
|---|---|---|---|---|---|---|---|---|
| No | GSP | 242.95 | 238.47 | 378.50 | 395.95 | 582.18 | 1197.63 | Failed |
| No | PrefixSpan | 147.98 | 178.37 | 374.3 | 365.03 | 430.45 | 596.19 | Failed |
| No | CM-SPADE | 2476.88 | 2465.66 | 1885.88 | 1522.30 | 1286.48 | 2217.165 | 1740.41 |
| Closed | CM-ClaSP | 1591.04 | 1611.44 | 1665.99 | 2158.76 | 2499.13 | 2955.19 | Failed |
| Closed | CloFAST | 2553.58 | 2456.77 | 2939.67 | 2326.90 | 2450.64 | 1880.79 | Failed |
| Max | MaxSP | 416.04 | 459.42 | 970.77 | 946.63 | 1075.76 | Failed | Failed |
| Max | VMSP | 652.35 | 788.27 | 736.09 | 362.55 | 645.86 | 637.58 | Failed |
| Contiguous | VMSP (no gap) | 100.95 | 1196.67 | 219.346 | 303.99 | 284.27 | 417.61 | 630.23 |
| Contiguous | CM-SPAM (no gap) | 112.96 | 112.36 | 205.11 | 284.65 | 533.012 | 756 | 1359.29 |
Number of output patterns of algorithms at different minimum support (min_sup) values, categorized by constraint. Numeric column headings are min_sup values.
| Constraint | Algorithm | 0.7 | 0.6 | 0.5 | 0.4 | 0.3 | 0.2 | 0.1 |
|---|---|---|---|---|---|---|---|---|
| No | GSP | 0 | 3 | 46 | 120 | 305 | 621 | Failed |
| No | PrefixSpan | 0 | 3 | 46 | 122 | 606 | 7967 | Failed |
| No | CM-SPADE | 0 | 3 | 46 | 122 | 606 | 7973 | 384,295 |
| Closed | CM-ClaSP | 0 | 3 | 46 | 122 | 606 | 7973 | Failed |
| Closed | CloFAST | 0 | 3 | 46 | 122 | 610 | 8068 | Failed |
| Max | MaxSP | 0 | 3 | 45 | 99 | 380 | Failed | Failed |
| Max | VMSP | 0 | 3 | 46 | 120 | 554 | 7627 | Failed |
| Contiguous | VMSP (no gap) | 0 | 3 | 46 | 120 | 305 | 620 | 1255 |
| Contiguous | CM-SPAM (no gap) | 0 | 3 | 46 | 120 | 305 | 633 | 1461 |
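The tables group algorithms by the constraint placed on their output. A closed pattern has no proper super-pattern with the same support, and a maximal pattern has no frequent proper super-pattern, so every maximal pattern is also closed. The sketch below is a naive post-filter over an already-mined set of frequent patterns, written only to make these definitions concrete; it is not how CM-ClaSP, CloFAST, MaxSP, or VMSP work, since those algorithms prune during the search. The supports and the helper names are hypothetical, and sequence elements are assumed to be single items.

```python
def is_subseq(a, b):
    """True if sequence a is a subsequence of b (gaps allowed)."""
    it = iter(b)
    return all(x in it for x in a)


def closed_and_maximal(frequent):
    """frequent: dict mapping pattern tuple -> support count."""
    closed, maximal = [], []
    for p, sup in frequent.items():
        # Proper super-patterns of p among the frequent patterns.
        supers = [q for q in frequent if len(q) > len(p) and is_subseq(p, q)]
        if all(frequent[q] != sup for q in supers):
            closed.append(p)
        if not supers:
            maximal.append(p)
    return closed, maximal


# Hypothetical frequent patterns with their support counts.
frequent = {("A",): 4, ("C",): 3, ("A", "C"): 3, ("B",): 2, ("B", "C"): 2}
closed, maximal = closed_and_maximal(frequent)
print(closed)
print(maximal)
```

With these inputs, three of the five frequent patterns are closed and two are maximal: ("A",) is closed because its only super-pattern has lower support, but it is not maximal.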
Figure 3. Comparison of algorithm efficiency: runtime (s) relative to the number of output patterns at different min_sup values.
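Figure 3 relates runtime to output size. One simple way to put the two on a common scale is runtime divided by the number of output patterns. The helper `ms_per_pattern` below is a hypothetical illustration of that ratio, applied to the min_sup = 0.2 values copied from the runtime and pattern-count tables; it is not necessarily the exact metric plotted in the figure.

```python
def ms_per_pattern(runtime_s: float, n_patterns: int) -> float:
    """Runtime in milliseconds per output pattern."""
    if n_patterns <= 0:
        raise ValueError("no output patterns: ratio undefined")
    return 1000 * runtime_s / n_patterns


# (runtime in s, number of output patterns) at min_sup = 0.2, from the tables.
results = {
    "CM-SPADE": (52.93, 7973),
    "PrefixSpan": (272.52, 7967),
    "VMSP (no gap)": (71.933, 620),
    "CM-SPAM (no gap)": (75.26, 633),
}
for name, (seconds, count) in results.items():
    print(f"{name:<17} {ms_per_pattern(seconds, count):7.2f} ms/pattern")
```

Because the ratio divides by output size, algorithms that return many patterns get a small per-pattern cost even when their total runtime is larger, so it complements rather than replaces the total runtimes. At min_sup = 0.7 every algorithm outputs 0 patterns and the ratio is undefined, which the guard reports as an error.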
Figure 4. Output of SPM (min_sup = 0.5) overlaid on the Beijing road map. (a) Without constraint. (b) With constraint. Color changes from yellow to red as support increases. Higher support indicates that more vehicle trajectories overlap on that road segment.
Figure 5. Output of SPM (min_sup = 0.2) overlaid on the Beijing road map. (a) Without constraint. (b) With constraint. Color changes from yellow to red as support increases. Higher support indicates that more vehicle trajectories overlap on that road segment.
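Figures 4 and 5 shade road segments from yellow to red as support increases. The captions do not specify the color scale, so the following is a minimal sketch assuming a linear RGB interpolation between pure yellow and pure red over a chosen support range; `support_to_rgb` and its `lo`/`hi` bounds are hypothetical.

```python
def support_to_rgb(support: float, lo: float, hi: float) -> tuple[int, int, int]:
    """Linearly map support in [lo, hi] from yellow (255, 255, 0) to red (255, 0, 0)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = min(max((support - lo) / (hi - lo), 0.0), 1.0)  # clamp to [0, 1]
    # Red stays at 255; fading the green channel moves yellow toward red.
    return (255, round(255 * (1 - t)), 0)


for s in (0.2, 0.35, 0.5):
    print(s, support_to_rgb(s, lo=0.2, hi=0.5))
```

Setting `lo` to the min_sup used for mining makes the faintest color correspond to a pattern that only just passes the threshold.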
Figure 6. An example of a sequential pattern mined in the blue-boxed region of Figure 5b, with actual trajectories (colored lines) plotted against the roads (grey lines).