Williams Rizzi1,2, Chiara Di Francescomarino1, Chiara Ghidini1, Fabrizio Maria Maggi2.
Abstract
Existing, well-investigated Predictive Process Monitoring techniques typically construct a predictive model from past process executions and then use this model to predict the future of new ongoing cases, without the possibility of updating it with new cases once they complete their execution. This can make Predictive Process Monitoring too rigid to deal with the variability of processes working in real environments, which continuously evolve and/or exhibit new variant behaviours over time. As a solution to this problem, we evaluate three different strategies that allow the periodic rediscovery or incremental construction of the predictive model so as to exploit newly available data. The evaluation compares the newly learned predictive models against the original one in terms of accuracy and time, and uses a number of real and synthetic datasets with and without explicit Concept Drift. The results provide evidence of the potential of incremental learning algorithms for Predictive Process Monitoring in real environments.
Keywords: Concept Drift; Incremental Learning; Predictive Process Monitoring; Process Mining
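The difference between never updating a model, rediscovering it from scratch, and updating it incrementally can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the three strategies evaluated in the paper: the tiny `Perceptron` class, the `make_cases` helper and the synthetic drifting data are all assumptions made for illustration.

```python
import random


class Perceptron:
    """Tiny binary perceptron that can be trained further on new data (illustrative only)."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features
        self.b = 0.0

    def predict(self, x):
        score = sum(w * v for w, v in zip(self.w, x)) + self.b
        return 1 if score > 0 else 0

    def update(self, X, y, max_epochs=50):
        # Training continues from the current weights, so calling this again
        # with new cases refines the existing model instead of rebuilding it.
        for _ in range(max_epochs):
            mistakes = 0
            for xi, yi in zip(X, y):
                err = yi - self.predict(xi)
                if err != 0:
                    self.w = [w + err * v for w, v in zip(self.w, xi)]
                    self.b += err
                    mistakes += 1
            if mistakes == 0:
                break
        return self


def accuracy(model, X, y):
    return sum(model.predict(xi) == yi for xi, yi in zip(X, y)) / len(y)


def make_cases(n, drifted, rng):
    # Hypothetical data: the label depends on feature 0 before the drift
    # and on feature 1 after it.
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    y = [int((x[1] if drifted else x[0]) > 0) for x in X]
    return X, y


rng = random.Random(0)
X_old, y_old = make_cases(200, False, rng)   # historical cases
X_new, y_new = make_cases(200, True, rng)    # cases completed after the drift
X_test, y_test = make_cases(200, True, rng)  # ongoing cases to predict

# Original model: trained once and never updated.
original = Perceptron(2).update(X_old, y_old)
# Rediscovery: a fresh model rebuilt on all data available so far.
retrained = Perceptron(2).update(X_old + X_new, y_old + y_new)
# Incremental construction: the existing model is refined with new cases only.
incremental = Perceptron(2).update(X_old, y_old).update(X_new, y_new)

acc_original = accuracy(original, X_test, y_test)
acc_retrained = accuracy(retrained, X_test, y_test)
acc_incremental = accuracy(incremental, X_test, y_test)
print(acc_original, acc_retrained, acc_incremental)
```

In this toy setting the incremental update only touches the new cases, which is why such strategies can be cheaper than rebuilding the model on the full history.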
Year: 2022 PMID: 35340819 PMCID: PMC8935895 DOI: 10.1007/s10115-022-01666-9
Source DB: PubMed Journal: Knowl Inf Syst ISSN: 0219-3116 Impact factor: 2.531
Fig. 1 The general idea
Fig. 2 Four strategies to produce
The outcome formulae
| Dataset | Outcome |
|---|---|
| BPIC11 | |
| BPIC12 | |
| BPIC15 | |
| BPIC18 | |
| DriftRIO1 | |
| DriftRIO2 | |
Fig. 3 The experimental settings used to build the predictive models
Dataset entropy
| Dataset | Trace Entropy (0–100%) | Trace Entropy (0–40%–80–100%) | Trace Entropy (40–80%–80–100%) | Global Block Entropy (0–100%) | Global Block Entropy (0–40%–80–100%) | Global Block Entropy (40–80%–80–100%) |
|---|---|---|---|---|---|---|
| BPIC11 | 9.62 | 9.08 | 8.97 | 24.71 | 23.73 | 24.13 |
| BPIC12 | 11.65 | 11.05 | 11.08 | 16.18 | 15.9 | 16.08 |
| BPIC15 | 10.16 | 9.4 | 9.48 | 18.96 | 18.55 | 18.4 |
| BPIC18 | 12.88 | 12.7 | 11.56 | 20.72 | 20.54 | 19.74 |
| DriftRIO1 | 4.66 | 4.71 | 4.29 | 8.52 | 8.54 | 8.38 |
| DriftRIO2 | 4.27 | 4.29 | 4.1 | 8.41 | 8.41 | 8.37 |
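As a rough illustration of how a trace-level entropy could be computed, the sketch below takes the Shannon entropy (base 2) of the distribution of distinct trace variants in a log. The exact definition behind the values above is not given in this record, so the formula, the function name `trace_entropy` and the toy log are assumptions.

```python
from collections import Counter
from math import log2


def trace_entropy(log):
    """Shannon entropy (in bits) of the trace-variant distribution of an event log.

    `log` is a list of traces; each trace is a sequence of activity names.
    """
    variants = Counter(tuple(trace) for trace in log)
    total = sum(variants.values())
    return -sum((n / total) * log2(n / total) for n in variants.values())


# Hypothetical four-trace log with three distinct variants.
toy_log = [
    ["register", "check", "approve"],
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "reject"],
]
print(trace_entropy(toy_log))  # 1.5
```

Under this assumed definition, a log whose traces are all distinct has entropy log2(n). For the 1140 BPIC11 traces (458 True plus 682 False in the label distribution table) that upper bound is about 10.15, which is compatible with the reported 9.62.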
Dataset label distribution
| Dataset | Formula | 0–10% True | 0–10% False | 0–40% True | 0–40% False | 0–80% True | 0–80% False | 80–100% True | 80–100% False | Total (0–100%) True | Total (0–100%) False |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BPIC11 | | 50 | 64 | 156 | 300 | 359 | 663 | 109 | 119 | 458 | 682 |
| | | 96 | 18 | 391 | 65 | 742 | 170 | 151 | 77 | 893 | 247 |
| | | 20 | 94 | 79 | 377 | 161 | 639 | 180 | 732 | 259 | 881 |
| BPIC12 | | 227 | 241 | 904 | 970 | 1761 | 1987 | 482 | 455 | 2243 | 2442 |
| | | 163 | 305 | 663 | 1211 | 1366 | 2382 | 274 | 663 | 1640 | 3045 |
| | | 78 | 390 | 307 | 1567 | 621 | 3127 | 181 | 756 | 802 | 3883 |
| BPIC15 | | 6 | 90 | 7 | 472 | 59 | 900 | 63 | 178 | 122 | 1078 |
| | | 33 | 63 | 110 | 369 | 223 | 736 | 110 | 131 | 333 | 867 |
| | | 33 | 63 | 124 | 355 | 215 | 744 | 83 | 158 | 298 | 902 |
| BPIC18 | | 1994 | 936 | 7787 | 3933 | 16351 | 7090 | 4384 | 1477 | 20735 | 8567 |
| DriftRIO1 | | 25 | 272 | 132 | 1081 | 989 | 1987 | 582 | 436 | 1571 | 2423 |
| DriftRIO2 | | 83 | 117 | 398 | 402 | 834 | 766 | 229 | 171 | 1063 | 937 |
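Label counts like these are useful for judging how imbalanced each sub-log is, which in turn affects how an accuracy figure should be read. The following sketch is only an illustration: the helper names `positive_rate` and `majority_baseline` are hypothetical, and the example uses the first BPIC15 row for the 0–10% sub-log (6 True, 90 False).

```python
def positive_rate(true_count, false_count):
    """Share of cases labelled True in a sub-log."""
    return true_count / (true_count + false_count)


def majority_baseline(true_count, false_count):
    """Accuracy of a trivial model that always predicts the majority label."""
    return max(true_count, false_count) / (true_count + false_count)


# First BPIC15 row, 0–10% sub-log: 6 True and 90 False.
print(positive_rate(6, 90))      # 0.0625
print(majority_baseline(6, 90))  # 0.9375
```

With only 6.25% positive cases in that sub-log, a model that always answers False would already be right 93.75% of the time on data with the same balance, so accuracy alone says little in such strongly skewed settings.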
Model hyperparameters
| Dataset | Formula | 0–10% # est | 0–10% max_d | 0–10% max_f | 0–40% # est | 0–40% max_d | 0–40% max_f | 0–80% # est | 0–80% max_d | 0–80% max_f |
|---|---|---|---|---|---|---|---|---|---|---|
| BPIC11 | | 163 | 4 | log | 537 | 8 | sqrt(n) | 397 | 15 | auto |
| | | 273 | 9 | log | 155 | 7 | auto | 991 | 7 | n |
| | | 395 | 28 | n | 865 | 12 | n | 203 | 7 | n |
| BPIC12 | | 157 | 16 | sqrt(n) | 427 | 14 | sqrt(n) | 206 | 9 | auto |
| | | 700 | 6 | auto | 558 | 9 | sqrt(n) | 536 | 6 | auto |
| | | 180 | 7 | log | 163 | 20 | auto | 587 | 27 | log |
| BPIC15 | | 725 | 15 | log | 377 | 19 | n | 563 | 16 | sqrt(n) |
| | | 977 | 13 | auto | 937 | 8 | sqrt(n) | 417 | 14 | sqrt(n) |
| | | 374 | 29 | auto | 374 | 29 | auto | 161 | 13 | sqrt(n) |
| BPIC18 | | 990 | 8 | n | 650 | 26 | sqrt(n) | 165 | 24 | n |
| DriftRIO1 | | 913 | 4 | log | 971 | 9 | n | 293 | 4 | log |
| DriftRIO2 | | 998 | 23 | n | 151 | 4 | sqrt(n) | 178 | 5 | auto |
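The column names (`# est`, `max_d`, `max_f`) resemble the usual hyperparameters of a tree ensemble. This record does not name the classifier or the library, so the sketch below rests on an assumption: that the entries correspond to the `n_estimators`, `max_depth` and `max_features` parameters of scikit-learn's `RandomForestClassifier`. The helper `to_forest_kwargs` and the interpretation of `log`, `auto` and `n` are hypothetical.

```python
def to_forest_kwargs(n_est, max_d, max_f):
    """Translate one table entry into keyword arguments (hypothetical helper)."""
    max_features = {
        "sqrt(n)": "sqrt",
        "log": "log2",   # assumption: "log" stands for log2 of the feature count
        "auto": "sqrt",  # assumption: older scikit-learn treated "auto" as sqrt for classifiers
        "n": None,       # assumption: "n" means that all features are considered
    }[max_f]
    return {
        "n_estimators": n_est,
        "max_depth": max_d,
        "max_features": max_features,
    }


# First BPIC11 row, 0–10% column group: # est = 163, max_d = 4, max_f = log.
kwargs = to_forest_kwargs(163, 4, "log")
print(kwargs)
```

If scikit-learn is installed, the resulting dictionary could be passed on as `RandomForestClassifier(**kwargs)`; the sketch itself deliberately avoids the import so that it runs with the standard library alone.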
The accuracy results
| Dataset | |||||
|---|---|---|---|---|---|
| (a) Setting 10–70% | |||||
| BPIC11 | 0.673 | 0.885 | 0.833 | ||
| 0.745 | 0.887 | 0.883 | |||
| 0.843 | |||||
| BPIC12 | 0.576 | 0.648 | 0.671 | ||
| 0.672 | 0.676 | ||||
| 0.517 | 0.504 | 0.514 | |||
| BPIC15 | 0.908 | 0.916 | 0.935 | ||
| 0.928 | 0.911 | 0.923 | |||
| 0.961 | 0.976 | ||||
| BPIC18 | 0.532 | ||||
| DriftRIO1 | 0.603 | ||||
| DriftRIO2 | 0.761 | ||||
Accuracy improvement against
| Dataset | ||||
|---|---|---|---|---|
| (a) Setting 10–70% | ||||
| BPIC11 | 0.31 | 0.36 | 0.23 | |
| 0.19 | 0.29 | 0.18 | ||
| 0.08 | 0.09 | 0.09 | ||
| BPIC12 | 0.12 | 0.21 | 0.16 | |
| 0.09 | 0.10 | 0.00 | ||
| −0.02 | 0.00 | 0.08 | ||
| BPIC15 | 0.01 | 0.03 | 0.09 | |
| −0.01 | 0.00 | 0.04 | ||
| 0.01 | 0.02 | 0.03 | ||
| BPIC18 | 0.87 | 0.86 | 0.87 | |
| DriftRIO1 | 0.59 | 0.59 | 0.59 | |
| DriftRIO2 | 0.12 | 0.12 | 0.12 | |
The time results
| Dataset | |||||
|---|---|---|---|---|---|
| (a) Setting 10–70% | |||||
| BPIC11 | 05:37:18 | 10:12:42 | 05:35:53 | ||
| 06:05:01 | 09:23:21 | 06:04:32 | |||
| 30:43:11 | |||||
| BPIC12 | 00:51:00 | 08:02:32 | |||
| 09:03:06 | |||||
| 11:37:39 | |||||
| BPIC15 | 00:27:31 | ||||
| 00:51:01 | |||||
| 00:52:12 | |||||
| BPIC18 | 27:26:00 | 74:44:03 | 26:51:24 | ||
| DriftRIO1 | 00:16:17 | ||||
| DriftRIO2 | 00:14:22 | ||||
| Average Time | 01:44:10 | 12:52:20 | |||
| Std deviation | 02:14:45 | 02:15:53 | 21:06:04 | 02:15:44 | |
Fig. 4 Inaccuracy versus time plots (10%)
Fig. 5 Inaccuracy versus time plots (40%)
Cost-effectiveness framework instantiation with and experimental setting 40%–40%
| Dataset | |||||
|---|---|---|---|---|---|
| (a) n=1, | |||||
| BPIC11 | 7800.5 | 4601.52 | 4800.5 | ||
| 5001.89 | 3401.89 | 4301.89 | |||
| 4300.52 | 3900.52 | 4100.52 | |||
| BPIC12 | 46600.34 | 36500.35 | 31001.14 | ||
| 14800.37 | 13400.37 | 6601.27 | |||
| 20900.5 | 274000.5 | 20900.5 | |||
| BPIC15 | 5400.05 | 6000.5 | 6200.1 | ||
| 3800.05 | 4100.05 | 3600.14 | |||
| 2900.06 | 3300.06 | 3100.15 | |||
| BPIC18 | 236200.8 | 2600.81 | 20901.08 | ||
| DriftRIO1 | 58200.02 | 11400.05 | |||
| DriftRIO2 | |||||
The accuracy results related to the Perceptron model
| Dataset | |||||
|---|---|---|---|---|---|
| (a) Setting 10–70% | |||||
| BPIC11 | 0.504 | 0.536 | 0.566 | ||
| 0.682 | 0.413 | 0.749 | |||
| 0.676 | 0.863 | ||||
| BPIC12 | 0.633 | 0.655 | 0.68 | ||
| 0.686 | |||||
| 0.483 | 0.5 | ||||
| BPIC15 | 0.675 | 0.725 | 0.732 | ||
| 0.827 | 0.809 | 0.832 | |||
| 0.745 | 0.841 | 0.788 | |||
| BPIC18 | 0.386 | 0.392 | 0.397 | ||
| DriftRIO1 | 0.369 | ||||
| DriftRIO2 | 0.701 | 0.945 | 0.908 | ||