Roberto Cilli¹, Mario Elia², Marina D'Este², Vincenzo Giannico², Nicola Amoroso³,⁴, Angela Lombardi¹,⁵, Ester Pantaleo¹,⁵, Alfonso Monaco¹,⁵, Giovanni Sanesi², Sabina Tangaro⁵,⁶, Roberto Bellotti¹,⁵, Raffaele Lafortezza²,⁷.
Abstract
The impacts and threats posed by wildfires are increasing dramatically due to climate change. In recent years, the wildfire community has attempted to estimate wildfire occurrence with machine learning models. However, to fully exploit the potential of these models, it is of paramount importance to make their predictions interpretable and intelligible. This study is a first attempt to provide an eXplainable artificial intelligence (XAI) framework for estimating wildfire occurrence using a Random Forest model with Shapley values for interpretation. The model accurately detected regions with a high presence of wildfires (area under the ROC curve 81.3%) and outlined the drivers of occurrence, such as the Fire Weather Index and the Normalized Difference Vegetation Index. Furthermore, our analysis suggests the presence of anomalous hotspots. In contexts where human and natural spheres constantly intermingle and interact, the XAI framework, suitably integrated into decision support systems, could help forest managers prevent and mitigate future wildfire disasters and develop strategies for effective fire management, response, recovery, and resilience.
Year: 2022 PMID: 36175583 PMCID: PMC9523070 DOI: 10.1038/s41598-022-20347-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Map of the Italian peninsula (yellow) in the Mediterranean basin showing the distribution of wildfire occurrences (red) over the study period. The map was produced in QGIS using the "new layout" feature.
Figure 2. Methodological overview. Environmental, biophysical, and human-related variables are used to model wildfire occurrence over the Italian peninsula. The map was produced using the plot.raster function from the "raster" R package.
Figure 3. The Italian peninsula is covered by large cells (5000 km² each) used to perform a hold-out spatial cross-validation (CV) procedure. Training and validation examples were assigned to different cells so that the model would not be trained on grid points adjacent to the validation points (except for points along the cell borders). In addition, training examples were subsampled to ensure a balanced ratio between the two classes. The map was produced using the plot.raster function from the "raster" R package.
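The cell-based hold-out scheme in Figure 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. Points are assumed to carry projected coordinates in kilometres. Cells are assumed to be axis-aligned squares of about 70.7 km per side (√5000 km²). The helper names (`assign_cells`, `spatial_holdout_split`) and the 20% validation fraction are hypothetical. The sketch holds out whole cells, then undersamples the majority class in the training set only.

```python
import math
import random
from collections import defaultdict

# Assumption: square cells of 5000 km^2, i.e. a side of sqrt(5000) ~ 70.7 km.
CELL_SIDE_KM = math.sqrt(5000.0)


def assign_cells(points, cell_side=CELL_SIDE_KM):
    """Group indices of (x_km, y_km, label) points by the square cell containing them."""
    cells = defaultdict(list)
    for i, (x, y, _label) in enumerate(points):
        cells[(math.floor(x / cell_side), math.floor(y / cell_side))].append(i)
    return cells


def spatial_holdout_split(points, val_fraction=0.2, seed=0):
    """Hypothetical helper: hold out whole cells for validation, then balance training classes."""
    rng = random.Random(seed)
    cells = assign_cells(points)
    keys = sorted(cells)
    rng.shuffle(keys)
    n_val = max(1, round(val_fraction * len(keys)))
    # A cell is used either for validation or for training, never for both.
    val_idx = [i for k in keys[:n_val] for i in cells[k]]
    train_idx = [i for k in keys[n_val:] for i in cells[k]]

    # Undersample the majority class so the training set has a 1:1 class ratio.
    pos = [i for i in train_idx if points[i][2] == 1]
    neg = [i for i in train_idx if points[i][2] == 0]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    return sorted(balanced), sorted(val_idx)
```

Validation cells keep their original class ratio because only the training set is rebalanced. Validation metrics therefore reflect the imbalance the model would face on unseen areas.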
Out-of-bag (OOB) performance compared with spatial cross-validation (CV) performance. For each metric, the median and 95% confidence interval are reported.
| Metric (%) | OOB Performance | Spatial CV |
|---|---|---|
| Accuracy (Acc) | 72.9 (72.1, 73.3) | 69.7 (63.6, 75.7) |
| F1 score | 65.1 (64.3, 65.7) | 62.0 (53.2, 69.0) |
| Sensitivity (Sens) | 82.5 (81.8, 83.6) | 78.7 (69.9, 87.8) |
| Precision (Prec) | 53.7 (52.8, 54.5) | 50.9 (43.0, 59.1) |
| Specificity (Spec) | 68.7 (67.6, 69.2) | 69.2 (48.9, 76.0) |
| AUC | 84.1 (83.8, 84.5) | 81.3 (76.0, 84.8) |
We evaluated our machine learning models in terms of accuracy (Acc), F1 score, sensitivity (Sens), precision (Prec), specificity (Spec) and area under the ROC curve (AUC).
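These metrics follow the standard confusion-matrix definitions. The sketch below is an illustrative Python version, not the study's evaluation code, and its function names are hypothetical. `median_and_ci95` assumes the reported intervals are percentile intervals over repeated runs, such as CV folds; the table caption does not specify this. As a rough consistency check, the F1 formula applied to the OOB sensitivity (82.5%) and precision (53.7%) gives about 65.1%, in line with the F1 row.

```python
import statistics


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels coded as 1 = fire, 0 = no fire."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Acc, F1, Sens, Prec and Spec as fractions (multiply by 100 for %).

    Assumes both classes occur and there is at least one true positive,
    so no denominator is zero.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sens = tp / (tp + fn)  # true positive rate (recall)
    spec = tn / (tn + fp)  # true negative rate
    prec = tp / (tp + fp)  # positive predictive value
    return {
        "Acc": (tp + tn) / len(y_true),
        "F1": 2 * prec * sens / (prec + sens),  # harmonic mean of Prec and Sens
        "Sens": sens,
        "Prec": prec,
        "Spec": spec,
    }


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2.

    This pairwise (Mann-Whitney) form equals the area under the ROC curve.
    It is O(n_pos * n_neg), so it is only meant for small examples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def median_and_ci95(values):
    """Median and 2.5th-97.5th percentile interval over repeated runs (e.g. CV folds)."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")  # 2.5%, 5%, ..., 97.5%
    return statistics.median(values), cuts[0], cuts[-1]
```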
Figure 4. Feature importance of the Random Forest model, measured as mean decrease in accuracy.
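Mean decrease in accuracy is a permutation importance. Each feature is shuffled in turn, the model is re-scored, and the drop in accuracy is recorded. In Breiman's Random Forest this is computed on each tree's out-of-bag samples. The sketch below is a simplified, hypothetical version that scores a single data set with an arbitrary `model` callable. In the test, a toy threshold rule stands in for the Random Forest.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where model(row) equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean decrease in accuracy when each feature column is shuffled in turn.

    X is a list of rows (lists); model maps a row to a 0/1 prediction.
    """
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model never uses gets an importance of exactly zero. A feature the model relies on loses accuracy when permuted.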
Figure 5. (Left panel) Shapley (SHAP) values for wildfire occurrence prediction. Features are sorted on the y-axis by decreasing importance. The x-axis shows the distribution of SHAP values and indicates whether a variable decreases or increases the predicted fire probability. The color gradient indicates the original value of the variable, and each point represents a row of the original dataset. (Right panel) Fire Weather Index (FWI) Shapley values. For the sake of readability, SHAP values outside the [−0.2, 0.2] range were assigned the same colors as the endpoints of that interval. The map was produced using the plot.raster function from the "raster" R package.
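A Shapley value averages a feature's marginal contribution over all coalitions of the other features:

φ_i = Σ_{S ⊆ N∖{i}} |S|!(n−|S|−1)!/n! · [v(S ∪ {i}) − v(S)]

The sketch below computes this exactly by enumeration, which is only feasible for a handful of features. It assumes a hypothetical value function in which "absent" features are replaced by a single baseline vector. SHAP implementations for tree ensembles usually rely on faster, tree-specific algorithms and a background dataset instead. The excerpt does not state which variant the study used.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x.

    Assumption: features outside a coalition are set to their baseline values.
    """
    n = len(x)

    def value(subset):
        z = [x[k] if k in subset else baseline[k] for k in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for size in range(n):
            # Probability weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

The values satisfy the efficiency property: they sum to f(x) − f(baseline). This is why each point in the summary plot can be read as a push above or below a reference prediction. In the model f(z) = 2·z₀ + z₁·z₂, evaluated at x = (1, 2, 3) with a zero baseline, the additive term gives φ₀ = 2 and the interaction is split equally, giving φ₁ = φ₂ = 3.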
Figure 6. Explainability and classification results for four notable cases. Po Valley: true negative (top left panel). Campidano Plain near Sanluri, South Sardinia: false negative (top right panel). Parco del Matese, province of Benevento: true positive (bottom left panel). Tuscan-Emilian Apennines: false positive (bottom right panel). Abbreviations: Fire Weather Index (FWI), Normalized Difference Vegetation Index (NDVI), Digital Terrain Model (DTM).
Figure 7. Spatial distribution of wildfire hotspots according to the Getis-Ord G* statistic at the 95% confidence level. The top-right grid shows the confusion matrix between the Random Forest (RF) predictions and the actual binary labels. Its green scale for diagonal terms and red scale for off-diagonal terms emphasize how much each term contributes to the overall ratio. The map was produced in QGIS using the "new layout" feature and the plot.raster function from the "raster" R package.
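The Getis-Ord Gi* statistic compares the weighted sum of values around each location, including the location itself, with what spatial randomness would produce, and returns a z-score. The sketch below implements the standard formula with binary distance-band weights. The weighting scheme, the `radius` parameter, and the function names are assumptions for illustration, since the excerpt does not give the neighbourhood used in the study. At the 95% confidence level, z > 1.96 marks a hot spot and z < −1.96 marks a cold spot.

```python
import math


def getis_ord_gi_star(coords, values, radius):
    """Gi* z-score for each location, using binary weights w_ij = 1 if dist(i, j) <= radius.

    Gi* = (sum_j w_ij x_j - mean * W_i) / (S * sqrt((n * sum_j w_ij^2 - W_i^2) / (n - 1)))
    with W_i = sum_j w_ij and S = sqrt(sum_j x_j^2 / n - mean^2).
    Assumes the values are not all identical (S > 0) and no neighbourhood covers all n points.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for xi, yi in coords:
        # j == i is included (distance 0), which is what distinguishes Gi* from Gi.
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= radius else 0.0 for xj, yj in coords]
        sw = sum(w)
        sw2 = sum(wi * wi for wi in w)
        num = sum(wi * v for wi, v in zip(w, values)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        z_scores.append(num / den)
    return z_scores


def classify_hotspots(z_scores, z_crit=1.96):
    """Label each location +1 (hot spot), -1 (cold spot) or 0 (not significant at 95%)."""
    return [1 if z > z_crit else -1 if z < -z_crit else 0 for z in z_scores]
```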