Ozan Ozyegen, Igor Ilic, Mucahit Cevik
Abstract
Being able to interpret a model's predictions is a crucial task in many machine learning applications. Specifically, local interpretability is important in determining why a model makes particular predictions. Despite the recent focus on interpretable Artificial Intelligence (AI), there have been few studies on local interpretability methods for time series forecasting; existing approaches mainly focus on time series classification tasks. In this study, we propose two novel evaluation metrics for time series forecasting: Area Over the Perturbation Curve for Regression (AOPCR) and Ablation Percentage Threshold (APT). These two metrics can measure the local fidelity of local explanation methods. We extend the theoretical foundation to collect experimental results on four popular datasets. Both metrics enable a comprehensive comparison of numerous local explanation methods and offer an intuitive approach to interpreting model predictions. Lastly, we provide heuristical reasoning for this analysis through an extensive numerical study.
Keywords: Interpretable AI; Local explanation; Multivariate; Regression; Time series forecasting
Year: 2021 PMID: 34764613 PMCID: PMC8315500 DOI: 10.1007/s10489-021-02662-2
Source DB: PubMed Journal: Appl Intell (Dordr) ISSN: 0924-669X Impact factor: 5.019
Fig. 1 Local explanation of a random sample from the Rossmann dataset obtained from feature importances, where the positively and negatively contributing features are highlighted in yellow and blue, respectively. This sample shows that the number of customers is the most important feature for the local predictions
Fig. 2 Post-hoc interpretability approach. The explanation method feeds modified input to the trained model, and the model predictions are used along with model internals to reverse engineer the process
Dataset specifications
| | Electricity | Rossmann | Walmart | Ohio |
|---|---|---|---|---|
| # time series | 370 | 1115 | 2660 | 6 |
| Domain | | | | |
| # features | 6 | 10 | 10 | 4 |
| # time covariates | 4 | 3 | 4 | 0 |
| Input window size | 168 | 30 | 30 | 60 |
| Prediction window size | 12 | 12 | 6 | 6 |
| Time granularity | Hourly | Daily | Weekly | 5 minutes |
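The input and prediction window sizes above define a standard sliding-window setup: each training sample pairs a fixed-length history with the forecast horizon that follows it. A minimal sketch of that slicing (the function name and the unit stride are our assumptions; the paper's models additionally consume covariate features):

```python
import numpy as np

def make_windows(series, input_size, pred_size):
    """Slice a 1-D series into (input, target) pairs with stride 1.

    Mirrors the table's setup, e.g. Electricity uses input_size=168
    hourly points and a 12-step prediction window.
    """
    X, y = [], []
    for start in range(len(series) - input_size - pred_size + 1):
        X.append(series[start:start + input_size])
        y.append(series[start + input_size:start + input_size + pred_size])
    return np.array(X), np.array(y)

# Toy example: 200 hourly observations -> 21 windows of 168 in, 12 out
series = np.arange(200, dtype=float)
X, y = make_windows(series, input_size=168, pred_size=12)
```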
The hyperparameter tuning search space
| Model | Search space |
|---|---|
| LSTM/TDNN | |
| GBR | |
The hyperparameter values (LSTM/TDNN: {# of layers, # of units, dropout rate}, GBR: {# of trees, max depth, min samples split})
| Model | Hyperparameters |
|---|---|
| TDNN | |
| LSTM | |
| GBR | |
Comparison of TDNN, LSTM and GBR models on the Electricity, Rossmann, Walmart and Ohio datasets
| Dataset | Model | NRMSE Mean | NRMSE std | NRMSE CI | ND Mean | ND std | ND CI |
|---|---|---|---|---|---|---|---|
| Electricity | TDNN | 0.0469 | 0.0161 | [0.0465, 0.0473] | 0.0868 | 0.0436 | [0.0857, 0.0878] |
| | LSTM | 0.0501 | 0.0147 | [0.0497, 0.0504] | 0.0936 | 0.0429 | [0.0926, 0.0947] |
| | GBR | 0.0426 | 0.0161 | [0.0422, 0.0430] | 0.0752 | 0.0413 | [0.0743, 0.0762] |
| Rossmann | TDNN | 0.1545 | 0.0062 | [0.1542, 0.1549] | 0.2972 | 0.0078 | [0.2968, 0.2976] |
| | LSTM | 0.1361 | 0.0080 | [0.1357, 0.1365] | 0.2500 | 0.0063 | [0.2496, 0.2503] |
| | GBR | 0.1264 | 0.0077 | [0.1260, 0.1269] | 0.2062 | 0.0066 | [0.2059, 0.2066] |
| Walmart | TDNN | 0.0225 | 0.0061 | [0.0221, 0.0228] | 0.1422 | 0.0174 | [0.1412, 0.1432] |
| | LSTM | 0.0192 | 0.0020 | [0.0191, 0.0193] | 0.1393 | 0.0197 | [0.1382, 0.1405] |
| | GBR | 0.0189 | 0.0033 | [0.0187, 0.0191] | 0.1085 | 0.0099 | [0.1080, 0.1091] |
| Ohio | TDNN | 0.0531 | 0.0169 | [0.0528, 0.0534] | 0.0832 | 0.0303 | [0.0828, 0.0837] |
| | LSTM | 0.0377 | 0.0133 | [0.0375, 0.0379] | 0.0547 | 0.0224 | [0.0544, 0.0551] |
| | GBR | 0.0367 | 0.0130 | [0.0365, 0.0369] | 0.0526 | 0.0213 | [0.0522, 0.0529] |
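The NRMSE and ND columns appear to follow the error measures commonly used in the forecasting literature: root mean squared error normalized by the mean absolute target value, and total absolute deviation normalized by the total absolute target value. A sketch under that assumption:

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Root mean squared error normalized by the mean absolute target."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.mean(np.abs(y_true))

def nd(y_true, y_pred):
    """Normalized deviation: total absolute error over total absolute target."""
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))

# A constant 1% error on a flat series of 100s gives 0.01 for both metrics
y_true = np.full(4, 100.0)
y_pred = y_true + np.array([1.0, -1.0, 1.0, -1.0])
```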
Fig. 3 Visualization of models’ predictions on random time series samples. Each background color corresponds to a separate prediction window. Each model generates predictions that can capture the trends for the provided sample
AOPCR scores for the Electricity, Rossmann, Walmart and Ohio datasets
| Method | TDNN Positive | TDNN Negative | LSTM Positive | LSTM Negative | GBR Positive | GBR Negative |
|---|---|---|---|---|---|---|
| (a) Electricity dataset | | | | | | |
| Random | 0.00022 | 0.00002 | 0.00004 | 0.00004 | 0.00017 | 0.00068 |
| Omission (Global) | 0.15613 | 0.14029 | 0.06432 | 0.07810 | 0.05870 | 0.06287 |
| Omission (Local) | 0.13386 | 0.12560 | 0.05460 | 0.06593 | 0.04424 | 0.05008 |
| SHAP | | | | | | |
| (b) Rossmann dataset | | | | | | |
| Random | 0.00056 | 0.00018 | 0.00043 | 0.00069 | 0.00081 | 0.00080 |
| Omission (Global) | 0.05246 | 0.04907 | 0.06415 | 0.09736 | | |
| Omission (Local) | 0.03819 | 0.03595 | 0.03240 | 0.06572 | 0.05992 | 0.09013 |
| SHAP | 0.03066 | 0.06672 | | | | |
| (c) Walmart dataset | | | | | | |
| Random | 0.00614 | 0.00625 | 0.00283 | 0.00303 | 0.00138 | 0.00168 |
| Omission (Global) | 0.17907 | 0.23159 | 0.28617 | 0.24000 | 0.25960 | |
| Omission (Local) | 0.08063 | 0.07464 | 0.13248 | 0.11156 | 0.06275 | 0.03772 |
| SHAP | 0.23484 | | | | | |
| (d) Ohio dataset | | | | | | |
| Random | 0.00609 | 0.00504 | 0.00009 | 0.00052 | 0.00004 | 0.00056 |
| Omission (Global) | 0.07872 | 0.08554 | 0.01110 | 0.01579 | | |
| Omission (Local) | 0.00619 | 0.00739 | 0.02109 | 0.01952 | 0.00591 | 0.01059 |
| SHAP | 0.12235 | 0.07701 | | | | |
Higher values show higher local fidelity. Best method in each column is in bold
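AOPCR can be read as the average change in the model's output as the top-ranked features are perturbed one by one in importance order, so a ranking with high local fidelity produces a large area. A toy sketch with a linear model (cumulative zero-perturbation, a single output, and plain averaging are our simplifying assumptions; the paper separates positively and negatively contributing features):

```python
import numpy as np

def aopcr(model, x, ranking, n_top, perturb_value=0.0):
    """Sketch of AOPCR: average prediction change after cumulatively
    perturbing the n_top highest-ranked features (most important first)."""
    base = model(x)
    x_pert = x.copy()
    drops = []
    for k in range(n_top):
        x_pert[ranking[k]] = perturb_value  # ablate the next feature
        drops.append(base - model(x_pert))
    return float(np.mean(drops))

# Toy linear model whose true importances are its weights
w = np.array([3.0, 1.0, 0.5])
model = lambda x: float(w @ x)
x = np.ones(3)
good = aopcr(model, x, np.argsort(-w), n_top=2)  # faithful ranking
bad = aopcr(model, x, np.argsort(w), n_top=2)    # reversed ranking
```

A faithful ranking removes the most influential inputs first, so `good` (3.5 here) exceeds `bad` (1.0): higher AOPCR indicates higher local fidelity, matching the footnote above.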
APT % scores for the Electricity, Rossmann, Walmart and Ohio datasets
| Method | TDNN Positive | TDNN Negative | LSTM Positive | LSTM Negative | GBR Positive | GBR Negative |
|---|---|---|---|---|---|---|
| (a) Electricity dataset | | | | | | |
| Random | 0.192 | 0.241 | 0.604 | 0.557 | 0.708 | 0.568 |
| Omission (Global) | 0.003 | 0.013 | 0.019 | 0.328 | 0.306 | |
| Omission (Local) | 0.003 | 0.050 | 0.026 | 0.411 | 0.368 | |
| SHAP | | | | | | |
| (b) Rossmann dataset | | | | | | |
| Random | 0.316 | 0.416 | 0.547 | 0.545 | 0.597 | 0.580 |
| Omission (Global) | 0.158 | 0.286 | 0.231 | 0.348 | | |
| Omission (Local) | 0.191 | 0.327 | 0.219 | 0.230 | 0.260 | 0.376 |
| SHAP | 0.184 | 0.224 | | | | |
| (c) Walmart dataset | | | | | | |
| Random | 0.022 | 0.124 | 0.025 | 0.128 | 0.055 | 0.123 |
| Omission (Global) | 0.014 | 0.015 | 0.033 | 0.075 | | |
| Omission (Local) | 0.005 | 0.029 | 0.060 | 0.087 | 0.150 | |
| SHAP | 0.010 | 0.031 | | | | |
| (d) Ohio dataset | | | | | | |
| Random | 0.365 | 0.731 | 0.716 | 0.699 | 0.977 | 0.941 |
| Omission (Global) | 0.275 | 0.949 | 0.870 | | | |
| Omission (Local) | 0.490 | 0.701 | 0.640 | 0.563 | 0.961 | 0.892 |
| SHAP | 0.060 | 0.127 | 0.321 | | | |
Lower percentage shows higher local fidelity. Best method in each column is in bold
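APT, by contrast, measures how small a fraction of the features must be ablated, again in importance order, before the prediction moves past a threshold; a faithful ranking reaches the threshold early, hence lower is better. A sketch under the same simplifying assumptions as before (zero-ablation, a single output, and a relative threshold `alpha` are our reading of the metric):

```python
import numpy as np

def apt(model, x, ranking, alpha, perturb_value=0.0):
    """Sketch of APT: fraction of features ablated (most important
    first) before the prediction changes by at least alpha * |base|."""
    base = model(x)
    x_pert = x.copy()
    for k, idx in enumerate(ranking, start=1):
        x_pert[idx] = perturb_value
        if abs(model(x_pert) - base) >= alpha * abs(base):
            return k / len(x)
    return 1.0  # threshold never reached

w = np.array([3.0, 1.0, 0.5, 0.5])
model = lambda x: float(w @ x)
x = np.ones(4)
good = apt(model, x, np.argsort(-w), alpha=0.5)  # best-first ranking
bad = apt(model, x, np.argsort(w), alpha=0.5)    # worst-first ranking
```

The faithful ranking crosses the 50% threshold after ablating a single feature (`good` = 0.25), while the reversed ranking needs all four (`bad` = 1.0), illustrating why a lower percentage signals higher local fidelity.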
Performance of the GBR model trained with all available features and with only the top 10 most significant features as determined by different feature importance assessment strategies
| Dataset | Model | NRMSE Mean | NRMSE std | ND Mean | ND std | Time (sec) |
|---|---|---|---|---|---|---|
| Electricity | Full | 0.0426 | 0.0161 | 0.0752 | 0.0414 | 3750 |
| | Mutual information | 0.0494 | 0.0183 | 0.0852 | 0.0464 | 78 |
| | F statistic | 0.0498 | 0.0184 | 0.0856 | 0.0465 | 83 |
| | Tree importance | 0.0455 | 0.0166 | 0.0800 | 0.0428 | 79 |
| | SHAP | 0.0449 | 0.0166 | 0.0788 | 0.0423 | 83 |
| Rossmann | Full | 0.1264 | 0.0077 | 0.2062 | 0.0066 | 131 |
| | Mutual information | 0.1361 | 0.0070 | 0.2276 | 0.0061 | 10 |
| | F statistic | 0.1317 | 0.0077 | 0.2151 | 0.0045 | 11 |
| | Tree importance | 0.1402 | 0.0078 | 0.2275 | 0.0055 | 10 |
| | SHAP | 0.1343 | 0.0077 | 0.2196 | 0.0060 | 11 |
| Walmart | Full | 0.0189 | 0.0033 | 0.1086 | 0.0099 | 152 |
| | Mutual information | 0.0304 | 0.0217 | 0.1268 | 0.0215 | 17 |
| | F statistic | 0.0368 | 0.0321 | 0.1396 | 0.0371 | 17 |
| | Tree importance | 0.0220 | 0.0069 | 0.1196 | 0.0106 | 16 |
| | SHAP | 0.0265 | 0.0143 | 0.1219 | 0.0135 | 17 |
| Ohio | Full | 0.0367 | 0.0130 | 0.0526 | 0.0213 | 2455 |
| | Mutual information | 0.0370 | 0.0127 | 0.0527 | 0.0213 | 112 |
| | F statistic | 0.0370 | 0.0126 | 0.0527 | 0.0212 | 109 |
| | Tree importance | 0.0371 | 0.0126 | 0.0527 | 0.0211 | 109 |
| | SHAP | 0.0373 | 0.0129 | 0.0529 | 0.0212 | 112 |
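The table above compares retraining the GBR model on the top 10 features chosen by each ranking strategy. As a self-contained stand-in for the F-statistic ranking (mutual information, tree importance, and SHAP would each be computed with their respective libraries), features can be ranked by absolute Pearson correlation with the target:

```python
import numpy as np

def top_k_by_correlation(X, y, k):
    """Rank features by |Pearson correlation| with the target and keep
    the top k. A cheap proxy for the F-statistic ranking used in the paper."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
    return np.argsort(-np.abs(corr))[:k]

# Synthetic check: only feature 3 drives the target, so it should rank first
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 2.0 * X[:, 3] + 0.1 * rng.normal(size=500)
selected = top_k_by_correlation(X, y, k=2)
```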
Fig. 4 Visualization of predictions for the GBR model trained on all available features and only with the top 10 most significant features determined by the SHAP method
Fig. 5 Normalized global feature importances for the Rossmann dataset obtained by different feature selection strategies and the SHAP method with the GBR model
Fig. 6 Sensitivity of the explanation methods and evaluation metrics with respect to the forecasting model parameters