Andreas Anastasiou, Piotr Fryzlewicz.
Abstract
We introduce a new approach, called Isolate-Detect (ID), for the consistent estimation of the number and location of multiple generalized change-points in noisy data sequences. Examples of signal changes that ID can handle are changes in the mean of a piecewise-constant signal and changes, continuous or not, in the linear trend. The number of change-points can increase with the sample size. Our method is based on an isolation technique, which prevents the consideration of intervals that contain more than one change-point. This isolation enhances ID's accuracy, as it allows for detection in the presence of frequent changes of possibly small magnitudes. In ID, model selection is carried out via thresholding, an information criterion, SDLL, or a hybrid of the first two. The hybrid model selection leads to a general method with very good practical performance and minimal parameter choice. In the scenarios tested, ID is at least as accurate as the state-of-the-art methods, and in most cases it outperforms them. ID is implemented in the R packages IDetect and breakfast, available from CRAN. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00184-021-00821-6.
Keywords: SDLL; Schwarz information criterion; Segmentation; Symmetric interval expansion; Threshold criterion
Year: 2021 PMID: 34054146 PMCID: PMC8142888 DOI: 10.1007/s00184-021-00821-6
Source DB: PubMed Journal: Metrika ISSN: 0026-1335 Impact factor: 1.057
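The isolation idea described in the abstract can be illustrated with a minimal Python sketch for the piecewise-constant mean case. This is not the IDetect or breakfast implementation: the function names, the expansion step `lam`, the fixed `threshold`, the CUSUM contrast, and the rule for restarting after a detection are all assumptions made for illustration. Intervals grow alternately from the left and right ends of the current segment, so a change-point tends to be detected while it is still the only one inside the interval.

```python
import math


def best_split(prefix, s, e):
    """Return (max |CUSUM|, argmax b) for a mean change in x[s..e], inclusive.

    b is the last index of the left-hand segment; prefix[i] = sum(x[:i]).
    """
    n = e - s + 1
    best_val, best_b = 0.0, s
    for b in range(s, e):
        n_left, n_right = b - s + 1, e - b
        left_sum = prefix[b + 1] - prefix[s]
        right_sum = prefix[e + 1] - prefix[b + 1]
        val = abs(
            math.sqrt(n_right / (n * n_left)) * left_sum
            - math.sqrt(n_left / (n * n_right)) * right_sum
        )
        if val > best_val:
            best_val, best_b = val, b
    return best_val, best_b


def isolate_detect(x, threshold, lam=10):
    """Illustrative isolate-then-detect loop (an assumed simplification)."""
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)

    found = []
    s, e = 0, len(x) - 1
    while e - s >= 1:
        k, detected = 1, False
        while not detected:
            right_end = min(s + k * lam, e)   # right-expanding interval [s, right_end]
            left_start = max(e - k * lam, s)  # left-expanding interval [left_start, e]
            for lo, hi, grows_right in ((s, right_end, True), (left_start, e, False)):
                val, b = best_split(prefix, lo, hi)
                if val > threshold:
                    found.append(b)
                    # Assumed restart rule: continue beyond the interval just used.
                    if grows_right:
                        s = hi + 1
                    else:
                        e = lo - 1
                    detected = True
                    break
            if not detected:
                if right_end == e and left_start == s:
                    return sorted(found)  # whole segment scanned, nothing left
                k += 1
    return sorted(found)


signal = [0.0] * 100 + [5.0] * 100 + [0.0] * 100  # noise-free toy signal
print(isolate_detect(signal, threshold=3.0))  # [99, 199]
```

The threshold of 3.0 is arbitrary and suits only this noise-free toy signal; with noisy data it would need to be tied to the noise level and the sample size.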
Distribution of $\hat{N} - N$ (estimated minus true number of change-points) over 100 simulated data sequences from (S1) and (S2)
| Signal | Method | MSE | |||||
|---|---|---|---|---|---|---|---|
| [5, 15) | |||||||
| (S1) | 0 | 0 | 0 | 0 | |||
| NOT | 5 | 86 | 9 | 0 | 0 | ||
| MARS | 100 | 0 | 0 | 0 | 0 | ||
| (S2) | 0 | 1 | 2 | 0 | |||
| NOT | 100 | 0 | 0 | 0 | 0 | ||
| PELT | 78 | 22 | 0 | 0 | 0 | ||
| WBS | 27 | 71 | 2 | 0 | 0 | ||
The average MSE is also given
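As a rough illustration of how a summary of this kind could be tabulated, the Python sketch below counts, over repeated simulations, the difference between the estimated and the true number of change-points, and averages the mean squared error of the fitted signal. Reading the count columns as that difference is an assumption here, and the function names and the input layout (a list of `(change_points, fitted_signal)` pairs) are hypothetical.

```python
from collections import Counter


def mse(estimate, truth):
    """Mean squared error between a fitted signal and the true signal."""
    if len(estimate) != len(truth):
        raise ValueError("signals must have the same length")
    return sum((a - b) ** 2 for a, b in zip(estimate, truth)) / len(truth)


def summarize(runs, true_count, truth):
    """Summarize simulation runs.

    runs: list of (estimated_change_points, fitted_signal) pairs, one per run.
    Returns the frequency of (estimated count - true count) and the average MSE.
    """
    diffs = Counter(len(cps) - true_count for cps, _ in runs)
    avg_mse = sum(mse(fit, truth) for _, fit in runs) / len(runs)
    return dict(diffs), avg_mse
```

A method that recovers the correct number of change-points in every run would put all of its mass on the difference 0.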
Fig. 1. Results (up to ) on estimated signals obtained by different change-point detection methods. Top row: the true signal (S1) and the data sequence, and the estimated signal using ID. Bottom row: the estimated signals from NOT and MARS
Fig. 2. Results (up to ) on estimated signals obtained by different change-point detection methods. Top row: the true signal (S2), the data sequence, and the estimated signal using ID. Bottom row: the estimated signals from WBS, NOT, and PELT
Fig. 3. An example with two change-points. The dashed line is the interval in which the detection took place in each phase
The competing methods used in the simulation study
| Type of signal | Method notation | Reference | R package |
|---|---|---|---|
| Piecewise-constant | PELT | Killick et al. | changepoint |
| | NP.PELT | Haynes et al. | changepoint.np |
| | S3IB | Rigaill | Segmentor3IsBack |
| | CumSeg | Muggeo and Adelfio | cumSeg |
| | CPM | Ross | cpm |
| | WBS | Fryzlewicz | wbs |
| | WBS2 | Fryzlewicz | breakfast |
| | NOT | Baranowski et al. | not |
| | FDR | Li et al. | FDRSeg |
| | TGUH | Fryzlewicz | breakfast |
| Continuous piecewise-linear | NOT | Baranowski et al. | not |
| | TF | Kim et al. | – |
| | CPOP | Maidstone et al. | – |
| | MARS | Friedman | earth |
| | FKS | Spiriti et al. | freeknotsplines |
Fig. 4. Example of a signal of length 1000 with change-points at 490 and 510 offsetting each other
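A small Python sketch can show why two offsetting change-points are hard to detect on a long interval but easy to detect once one of them is isolated. The jump size of 1, the noise-free signal, the 0-based indexing, the CUSUM contrast, and the choice of the local interval [440, 499] are assumptions made for illustration only.

```python
import math


def max_cusum(x, s, e):
    """Largest absolute CUSUM mean-change contrast over all splits of x[s..e], inclusive."""
    n = e - s + 1
    total = sum(x[s:e + 1])
    best, left_sum = 0.0, 0.0
    for b in range(s, e):
        left_sum += x[b]
        n_left, n_right = b - s + 1, e - b
        val = abs(
            math.sqrt(n_right / (n * n_left)) * left_sum
            - math.sqrt(n_left / (n * n_right)) * (total - left_sum)
        )
        best = max(best, val)
    return best


jump = 1.0  # hypothetical jump size
# Signal of length 1000: up by `jump` at index 490 and back down at index 510.
signal = [jump if 490 <= t < 510 else 0.0 for t in range(1000)]

global_contrast = max_cusum(signal, 0, 999)    # both change-points inside the interval
local_contrast = max_cusum(signal, 440, 499)   # only the change at 490 inside
print(round(global_contrast, 2), round(local_contrast, 2))
```

On the full sequence the two jumps largely cancel in the contrast, whereas the short interval containing a single change-point yields a contrast several times larger for the same jump size.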
Distribution of $\hat{N} - N$ over 100 simulated data sequences of the piecewise-constant signals (M1)–(M4)
| Method | Model | MSE | Time (ms) | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | |||||||||
| PELT | 6 | 32 | 50 | 12 | 0 | 0 | 0 | 3.23 | 0.14 | 3 | |
| NP.PELT | 0 | 2 | 27 | 49 | 15 | 5 | 2 | 2.82 | 0.10 | 211.8 | |
| S3IB | 0 | 7 | 38 | 54 | 1 | 0 | 0 | 2.49 | 0.08 | 343.2 | |
| CumSeg | 39 | 21 | 38 | 2 | 0 | 0 | 0 | 6.37 | 0.20 | 62.3 | |
| CPM. | 0 | 0 | 0 | 3 | 3 | 4 | 90 | 4.45 | 0.44 | 2.3 | |
| CPM. | 0 | 0 | 8 | 41 | 26 | 19 | 6 | 3.03 | 0.19 | 3.3 | |
| WBSC1 | (M1) | 0 | 0 | 11 | 32 | 27 | 19 | 11 | 2.79 | 0.25 | 99.3 |
| WBSIC | 0 | 3 | 37 | 53 | 7 | 0 | 0 | 2.59 | 0.08 | 99.3 | |
| WBS2 | 0 | 3 | 54 | 31 | 8 | 2 | 2 | 2.64 | 0.09 | 623.3 | |
| NOT | 0 | 3 | 51 | 43 | 3 | 0 | 0 | 2.61 | 0.10 | 80.7 | |
| FDR | 0 | 0 | 33 | 54 | 12 | 1 | 0 | 2.51 | 0.09 | – | |
| TGUH | 0 | 5 | 37 | 49 | 7 | 1 | 1 | 3.30 | 0.08 | 127.4 | |
| 0 | 3 | 30 | 5 | 0 | 0 | 2.66 | 0.08 | 23.9 | |||
| ID.SDLL | 1 | 2 | 59 | 28 | 5 | 3 | 2 | 2.80 | 0.10 | 20 | |
| 0 | 9 | 62 | 28 | 1 | 0 | 0 | 2.75 | 0.09 | 22.3 | ||
| PELT | 85 | 6 | 0 | 9 | 0 | 0 | 0 | 181 | 6.62 | 1.1 | |
| NP.PELT | 84 | 12 | 3 | 1 | 0 | 0 | 0 | 165 | 4.26 | 3.1 | |
| S3IB | 41 | 15 | 1 | 43 | 0 | 0 | 0 | 117 | 3.73 | 15.2 | |
| CumSeg | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 251 | – | 3.9 | |
| CPM. | 78 | 4 | 15 | 3 | 0 | 0 | 0 | 145 | 2.96 | 0.4 | |
| (M2) | 1 | 2 | 7 | 12 | 6 | 0 | 53 | 0.33 | 38.2 | ||
| 7 | 8 | 1 | 13 | 3 | 0 | 64 | 1.00 | 38.2 | |||
| 3 | 3 | 4 | 10 | 4 | 5 | 58 | 0.363 | 30.5 | |||
| 9 | 7 | 4 | 6 | 1 | 0 | 65 | 0.97 | 43.4 | |||
| FDR | 14 | 11 | 11 | 55 | 7 | 2 | 0 | 71 | 0.80 | – | |
| 4 | 18 | 3 | 7 | 0 | 0 | 64 | 0.47 | 22.8 | |||
| 7 | 7 | 1 | 11 | 0 | 0 | 60 | 0.87 | 8.8 | |||
| ID.SDLL | 5 | 5 | 6 | 63 | 8 | 4 | 9 | 62 | 0.43 | 3.7 | |
| 28 | 13 | 9 | 47 | 3 | 0 | 0 | 84 | 0.90 | 5.3 | ||
| 0 | 2 | 7 | 1 | 0 | 0 | 23 | 0.15 | 1.1 | |||
| NP.PELT | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 781 | 1.78 | 4.2 | |
| S3IB | 98 | 1 | 1 | 0 | 0 | 0 | 0 | 213 | 0.91 | 20.2 | |
| CumSeg | 0 | 3 | 16 | 72 | 9 | 0 | 0 | 65 | 0.32 | 5.2 | |
| CPM. | 1 | 6 | 87 | 6 | 0 | 0 | 0 | 51 | 0.85 | 0.2 | |
| WBSC1 | (M3) | 0 | 0 | 0 | 66 | 26 | 7 | 1 | 24 | 0.19 | 37.3 |
| WBSIC | 0 | 0 | 0 | 64 | 27 | 9 | 0 | 24 | 0.18 | 37.3 | |
| 0 | 0 | 1 | 8 | 2 | 2 | 25 | 0.17 | 34.7 | |||
| 0 | 0 | 0 | 7 | 0 | 0 | 21 | 0.13 | 118.3 | |||
| FDR | 0 | 0 | 2 | 77 | 15 | 5 | 1 | 23 | 0.17 | – | |
| 0 | 0 | 1 | 6 | 2 | 0 | 25 | 0.15 | 25.2 | |||
| 0 | 0 | 0 | 8 | 1 | 0 | 22 | 0.13 | 9.8 | |||
| 0 | 0 | 1 | 1 | 0 | 1 | 24 | 0.14 | 6.8 | |||
| 0 | 0 | 2 | 3 | 1 | 0 | 23 | 0.15 | 4.4 | |||
| PELT | – | 53 | 0 | 47 | 0 | 0 | 0 | 14 | 0.54 | 6.7 | |
| NP.PELT | – | 0 | 0 | 21 | 3 | 34 | 42 | 14 | 0.47 | 395.2 | |
| – | 12 | 0 | 1 | 0 | 0 | 7 | 0.12 | 292.1 | |||
| CumSeg | – | 100 | 0 | 0 | 0 | 0 | 0 | 23 | – | 84.6 | |
| CPM. | – | 0 | 0 | 0 | 0 | 6 | 94 | 31 | 0.76 | 14 | |
| CPM. | – | 0 | 0 | 35 | 11 | 22 | 32 | 13 | 0.39 | 20.4 | |
| WBSC1 | (M4) | – | 0 | 0 | 23 | 20 | 17 | 40 | 13 | 0.50 | 120.8 |
| – | 4 | 0 | 1 | 0 | 0 | 5 | 0.04 | 119.2 | |||
| WBS2 | – | 0 | 1 | 83 | 10 | 4 | 2 | 5 | 0.09 | 666.4 | |
| – | 8 | 0 | 0 | 0 | 0 | 6 | 0.08 | 61.8 | |||
| FDR | – | 0 | 19 | 70 | 10 | 1 | 0 | 9 | 0.07 | – | |
| TGUH | – | 0 | 51 | 40 | 7 | 2 | 0 | 23 | 0.28 | 169.2 | |
| – | 7 | 0 | 0 | 0 | 0 | 6 | 0.07 | 42.3 | |||
| ID.SDLL | – | 0 | 0 | 81 | 4 | 10 | 5 | 7 | 0.10 | 28.7 | |
| – | 1 | 0 | 1 | 0 | 0 | 5 | 0.05 | 66.4 | |||
The average MSE, scaled Hausdorff distance $d_H$, and computational time are also given
Fig. 5. Examples of data series used in simulations. The true signal is in red
Distribution of $\hat{N} - N$ over 100 simulated data sequences from the piecewise-constant signal (M5)
| Method | MSE | Time (s) | ||||||
|---|---|---|---|---|---|---|---|---|
| PELT | 100 | 0 | 0 | 0 | 0 | 1.97 | 114.92 | 0.033 |
| NP.PELT | 100 | 0 | 0 | 0 | 0 | 2.25 | 551.89 | 8.976 |
| S3IB | 99 | 1 | 0 | 0 | 0 | 2.23 | 1979.95 | 332.841 |
| CumSeg | 100 | 0 | 0 | 0 | 0 | 2.25 | 1999 | 0.551 |
| CPM. | 0 | 45 | 54 | 1 | 0 | 0.19 | 9.00 | 0.002 |
| CPM. | 100 | 0 | 0 | 0 | 0 | 2.23 | 1999 | 1.245 |
| WBSC1 | 100 | 0 | 0 | 0 | 0 | 1.51 | 35.26 | 12.272 |
| WBSIC | 100 | 0 | 0 | 0 | 0 | 2.25 | 1999 | 12.272 |
| WBS2 | 0 | 0 | 0 | 0 | 0.14 | 0.54 | 5.796 | |
| NOT | 100 | 0 | 0 | 0 | 0 | 2.25 | 1999 | 0.484 |
| FDR | 0 | 0 | 0 | 5 | 95 | 0.14 | 0.51 | – |
| 0 | 0 | 0 | 0 | 0.16 | 0.84 | 0.794 | ||
| 0 | 0 | 0 | 0 | 0.14 | 0.99 | 0.785 | ||
| 0 | 0 | 0 | 0 | 0.14 | 0.71 | 120.601 | ||
| 0 | 82 | 18 | 0 | 0 | 0.22 | 2.48 | 1.363 | |
The average MSE, $d_H$, and computational time are also given
Distribution of $\hat{N} - N$ over 100 simulated data sequences from (NC)
| Method | MSE | Time (s) | ||||
|---|---|---|---|---|---|---|
| 0 | 1 | 2 | ||||
| 0 | 0 | 0 | 39 | 0.004 | ||
| NP.PELT | 8 | 1 | 23 | 68 | 999 | 1.077 |
| 0 | 0 | 0 | 39 | 0.715 | ||
| 0 | 0 | 0 | 39 | 0.115 | ||
| CPM. | 0 | 0 | 0 | 100 | 2957 | 0.011 |
| CPM. | 28 | 6 | 39 | 27 | 628 | 0.031 |
| WBSC1 | 15 | 18 | 20 | 47 | 653 | 0.149 |
| 1 | 0 | 0 | 44 | 0.149 | ||
| WBS2 | 89 | 5 | 4 | 2 | 82 | 0.958 |
| 1 | 0 | 0 | 44 | 0.089 | ||
| 4 | 0 | 0 | 47 | – | ||
| 0 | 0 | 0 | 39 | 0.217 | ||
| 0 | 0 | 0 | 39 | 0.172 | ||
| 4 | 0 | 6 | 182 | 0.069 | ||
| 0 | 1 | 0 | 41 | 0.259 | ||
The average MSE and computational time for each method are also given
Distribution of $\hat{N} - N$ over 100 simulated data sequences from the continuous piecewise-linear signals (W1), (W3), and (W4)
| Method | Model | MSE | Time (s) | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | |||||||||
| 0 | 0 | 0 | 1 | 0 | 0 | 0.016 | 0.063 | 0.343 | |||
| TF | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0.029 | 0.451 | 1.125 | |
| 0 | 0 | 0 | 1 | 0 | 0 | 0.013 | 0.055 | 23.190 | |||
| MARS | (W1) | 0 | 0 | 2 | 9 | 42 | 39 | 8 | 0.034 | 0.200 | 0.011 |
| FKS | 0 | 0 | 0 | 72 | 22 | 6 | 0 | 0.015 | 0.109 | 270.385 | |
| 0 | 0 | 0 | 9 | 0 | 0 | 0.030 | 0.104 | 0.036 | |||
| 0 | 0 | 0 | 0 | 1 | 1 | 0.033 | 0.098 | 0.030 | |||
| NOT | 0 | 0 | 27 | 0 | 6 | 18 | 49 | 0.035 | 0.571 | 0.163 | |
| TF | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 606.523 | 0.432 | 0.117 | |
| 0 | 0 | 0 | 6 | 2 | 2 | 0.010 | 0.097 | 0.078 | |||
| MARS | (W3) | 91 | 0 | 7 | 2 | 0 | 0 | 0 | 3.991 | 2.258 | 0.008 |
| 0 | 0 | 0 | 9 | 1 | 0 | 0.010 | 0.097 | 67.582 | |||
| 0 | 0 | 0 | 1 | 0 | 0 | 0.013 | 0.101 | 0.017 | |||
| 0 | 0 | 0 | 4 | 1 | 2 | 0.022 | 0.130 | 0.010 | |||
| NOT | 0 | 1 | 14 | 20 | 16 | 20 | 29 | 0.109 | 0.998 | 0.958 | |
| TF | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 660.399 | 0.465 | 1.349 | |
| (W4) | 0 | 0 | 0 | 8 | 0 | 0 | 0.015 | 0.084 | 1.627 | ||
| MARS | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 22.058 | 1.609 | 0.019 | |
| 0 | 0 | 0 | 8 | 0 | 0 | 0.038 | 0.123 | 0.045 | |||
| 0 | 0 | 0 | 4 | 1 | 3 | 0.062 | 0.120 | 0.025 | |||
The average MSE, $d_H$, and computational time for each method are also given
Distribution of $\hat{N} - N$ over 100 simulated data sequences of the continuous piecewise-linear signal (W2)
| Method | MSE | Time (s) | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | (1, 60] | ||||||||
| NOT | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 4.731 | 99 | 0.869 |
| TF | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 212.547 | 0.387 | 0.863 |
| 0 | 0 | 0 | 3 | 0 | 0 | 0.162 | 0.189 | 1.161 | ||
| MARS | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 4.703 | 98.523 | 0.009 |
| 0 | 0 | 0 | 2 | 0 | 0 | 0.201 | 0.242 | 0.589 | ||
| 0 | 0 | 0 | 2 | 0 | 0 | 0.256 | 0.287 | 0.097 | ||
The average MSE, $d_H$, and computational time for each method are also given
ID results for the distribution of $\hat{N} - N$ for the models (M2)–(M4) and (W1), over 100 simulations where the noise follows a Student-$t$ distribution with 5 or 3 degrees of freedom
| Model | MSE | Time (ms) | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | |||||||||
| 5 | (M2) | 6 | 2 | 2 | 74 | 9 | 5 | 2 | 0.86 | 9.7 | |
| (M3) | 0 | 0 | 0 | 75 | 16 | 5 | 4 | 0.16 | 9.2 | ||
| (W1) | 0 | 0 | 0 | 86 | 12 | 2 | 0 | 0.23 | 32.8 | ||
| 3 | (M2) | 7 | 1 | 2 | 52 | 21 | 8 | 9 | 1.18 | 8.7 | |
| (M3) | 0 | 1 | 0 | 59 | 20 | 13 | 7 | 0.22 | 9.8 | ||
| (W1) | 0 | 0 | 0 | 62 | 28 | 4 | 6 | 0.25 | 22.6 | ||
The average MSE, and computational time are also given
Fig. 6. Top row: the time series and the fitted piecewise-constant mean signals obtained by ID and ID.SDLL for both Tower Hamlets and Hackney. Bottom row: NOT (solid) and TGUH (dashed) estimates for Tower Hamlets and Hackney
Fig. 7. Top row: the transformed data sequence and the fitted continuous and piecewise-linear mean signals obtained by ID and ID.SDLL for both the daily number of cases and the daily number of deaths. Bottom row: NOT (solid) and CPOP (dashed) estimates for the daily number of cases and the daily number of deaths