Selene Cerna, Christophe Guyeux, David Laiymani
Abstract
In some countries, such as France, the number of interventions carried out by firefighters has grown almost linearly over the years, while their resource capacity has not kept pace. For this reason, predicting the number of interventions has become a necessity. Initially, time series models were developed with several types of qualitative and quantitative features, including the alert level of the weather bulletins, to predict the operational load. We found that interventions related to human activities are quite predictable. However, interventions caused by rare events such as storms or floods cannot be identified from quantitative meteorological data alone, since their count is zero almost all of the time. Thus, this work applies natural language processing techniques, namely long short-term memory (LSTM) networks, convolutional neural networks (CNN), FlauBERT, and CamemBERT, to extract features from the texts of weather bulletins in order to recognize periods of peak interventions, when the intense workload of firefighters is caused by rare events. Four categories, Emergency Person Rescue, Total Person Rescue, interventions related to Heating, and Storm/Flood, were the targets of the multilabel classification models developed. The results showed accuracies of 80%, 86%, 92%, and 86% for Emergency Person Rescue, Total Person Rescue, Heating, and Storm/Flood, respectively.
Keywords: BERT; Firemen activity prediction; Natural language processing; Regression
Year: 2022 PMID: 35250179 PMCID: PMC8881897 DOI: 10.1007/s00521-022-06996-x
Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.102
Statistical description of the interventions per hour
| Type of intervention | Mean | Std. | Min. | Max. |
|---|---|---|---|---|
| Emergency rescue of people | 3.56 | 2.46 | 0 | 20 |
| Total rescue of people | 7.25 | 4.53 | 0 | 30 |
| Heating | 0.32 | 0.62 | 0 | 6 |
| Storm and flood | 0.26 | 1.17 | 0 | 82 |
Fig. 1 Interventions for the four types considered, early 2012
Fig. 2 Example of a "possible consequences" section (in French)
Example of multilabel annotation of vigilance texts (texts translated from French)
| Text | Emergency person rescue | Total person rescue | Heating | Storm/flood |
|---|---|---|---|---|
| Temperatures are already negative today, Wednesday. At 3 p.m., the … particular vigilance, notably for sensitive or exposed people. | 1 | 0 | 1 | 1 |
| Period of severe cold, less intense than in 1985 but nevertheless requiring … temperatures observed under shelter range between -1 and -4 degrees. | 1 | 0 | 1 | 1 |
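Each bulletin text is paired with a four-element binary vector, one entry per intervention type, so several labels can be active for the same text. A minimal sketch of this multi-hot encoding, assuming the column order of the table above; the label identifiers and the helper `encode_labels` are hypothetical.

```python
# Hypothetical identifiers, in the column order of the annotation table.
LABELS = ["emergency_person_rescue", "total_person_rescue", "heating", "storm_flood"]

def encode_labels(active: set[str]) -> list[int]:
    """Return a multi-hot vector: 1 if the intervention type is active, else 0."""
    unknown = active - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in LABELS]

# First row of the table: emergency rescue, heating and storm/flood are active.
row = encode_labels({"emergency_person_rescue", "heating", "storm_flood"})
print(row)  # [1, 0, 1, 1]
```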
Fig. 3 Number of samples for each category
Defined architectures for LSTM
| Archi. 1 | Archi. 2 | Archi. 3 |
|---|---|---|
| Input(200) | Input(200) | Input(200) |
| Embedding(100) | Embedding(100) | Embedding(200) |
| LSTM(128) | LSTM(1000) | LSTM(128) |
| Dense(256, ReLU) | Dense(2000, ReLU) | Dense(256, ReLU) |
| Dropout(0.5) | Dropout(0.5) | Dropout(0.2) |
| Dense(4, Sigmoid) | Dense(4, Sigmoid) | LSTM(512) |
| | | Dense(1024, ReLU) |
| | | Dropout(0.2) |
| | | Dense(4, Sigmoid) |
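Each LSTM architecture ends in a `Dense(4, Sigmoid)` layer: the four outputs are independent probabilities, which is what allows one text to carry several labels. The sketch below shows how such outputs could be turned into labels and scored with binary cross-entropy, the loss usually paired with sigmoid outputs in multilabel models. The 0.5 threshold and the choice of loss are assumptions, since the table does not state them, and the logits are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, as applied by a Dense(..., Sigmoid) layer."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_labels(logits: list[float], threshold: float = 0.5) -> list[int]:
    """Squash each logit independently and keep the labels above the threshold."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]

def binary_cross_entropy(y_true: list[int], probs: list[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over the labels of one sample."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

# Hypothetical pre-activation outputs of the final 4-unit layer for one bulletin.
logits = [2.0, -1.5, 0.3, -0.2]
probs = [sigmoid(z) for z in logits]
print(predict_labels(logits))  # [1, 0, 1, 0]
print(round(binary_cross_entropy([1, 0, 1, 1], probs), 3))
```

Because each label is thresholded on its own, the model can predict any of the 16 label combinations, unlike a softmax layer that would force a single class.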
Defined architectures for CNN
| Archi. 1 | Archi. 2 | Archi. 3 |
|---|---|---|
| Input (200) | Input (200) | Input (200) |
| Embedding (200) | Embedding (200) | Embedding (200) |
| Conv1D (128, 3, ReLU) | Conv1D (16, 3, ReLU) | Conv1D (256, 3, ReLU) |
| MaxPooling1D (2) | Dropout (0.2) | Dropout (0.2) |
| Flatten () | MaxPooling1D (2) | MaxPooling1D (4) |
| Dense (4, Sigmoid) | Conv1D (32, 3, ReLU) | Conv1D (300, 4, ReLU) |
| | Dropout (0.5) | Dropout (0.2) |
| | MaxPooling1D (2) | MaxPooling1D (4) |
| | Conv1D (64, 3, ReLU) | Conv1D (360, 4, ReLU) |
| | MaxPooling1D (2) | Dropout (0.5) |
| | Flatten () | MaxPooling1D (4) |
| | Dense (4, Sigmoid) | Flatten () |
| | | Dense (400, ReLU) |
| | | Dropout (0.2) |
| | | Dense (4, Sigmoid) |
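Assuming `Input(200)` is a sequence of 200 token indices, each `Conv1D` and `MaxPooling1D` layer shortens that sequence before `Flatten`. The sketch below traces the sequence length through the three CNN architectures under Keras-style defaults ('valid' padding, stride 1 for convolutions, pooling stride equal to the pool size); these defaults are an assumption. Dropout is omitted because it does not change the length, and the layer encoding is hypothetical.

```python
def conv_len(n: int, kernel: int) -> int:
    """Output length of a 'valid' Conv1D with stride 1."""
    return n - kernel + 1

def pool_len(n: int, pool: int) -> int:
    """Output length of MaxPooling1D when the stride equals the pool size."""
    return n // pool

# (layer kind, kernel or pool size) for each architecture; Dropout is omitted
# because it does not change the sequence length.
ARCHITECTURES = {
    1: [("conv", 3), ("pool", 2)],
    2: [("conv", 3), ("pool", 2), ("conv", 3), ("pool", 2), ("conv", 3), ("pool", 2)],
    3: [("conv", 3), ("pool", 4), ("conv", 4), ("pool", 4), ("conv", 4), ("pool", 4)],
}
LAST_FILTERS = {1: 128, 2: 64, 3: 360}  # filters of the last Conv1D layer

def trace(n: int, layers: list[tuple[str, int]]) -> list[int]:
    """Sequence length after each layer, starting from the input length."""
    lengths = [n]
    for kind, size in layers:
        n = conv_len(n, size) if kind == "conv" else pool_len(n, size)
        lengths.append(n)
    return lengths

for archi, layers in ARCHITECTURES.items():
    lengths = trace(200, layers)
    print(archi, lengths, "Flatten ->", lengths[-1] * LAST_FILTERS[archi])
# 1 [200, 198, 99] Flatten -> 12672
# 2 [200, 198, 99, 97, 48, 46, 23] Flatten -> 1472
# 3 [200, 198, 49, 46, 11, 8, 2] Flatten -> 720
```

Under these assumptions, the deeper architectures hand far fewer features to the final dense layers than Archi. 1 does.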
Fig. 4 Total person rescue interventions and their mean-per-hour series, early 2012
Emergency person rescue prediction scores
| Method | MAE | RMSE |
|---|---|---|
| Mean | 1.9856 | 2.4554 |
| Persistence | 1.3504 | 1.8292 |
| Mean/hour | 1.6485 | 2.1160 |
Total person rescue prediction scores
| Method | MAE | RMSE |
|---|---|---|
| Mean | 3.6939 | 4.5267 |
| Persistence | 2.0571 | 2.7298 |
| Mean/hour | 2.5843 | 3.3901 |
Heating prediction scores
| Method | MAE | RMSE |
|---|---|---|
| Mean | 0.4829 | 0.6240 |
| Persistence | 0.1737 | 0.4490 |
| Mean/hour | 0.4641 | 0.6127 |
Storm and flood prediction scores
| Method | MAE | RMSE |
|---|---|---|
| Mean | 0.4340 | 1.1742 |
| Persistence | 0.1749 | 0.5901 |
| Mean/hour | 0.4225 | 1.1699 |
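The three reference methods in these tables can be computed in a few lines. The sketch below is one plausible reading, not the authors' code: "Mean" predicts the training-set mean for every hour, "Persistence" predicts the value observed one hour earlier, and "Mean/hour" predicts the training mean for the same hour of the day. The function names and the toy series are hypothetical.

```python
import math
from collections import defaultdict

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def mean_baseline(train, n_test):
    """Predict the overall training mean for every test hour."""
    m = sum(train) / len(train)
    return [m] * n_test

def persistence_baseline(train, test):
    """Predict each hour with the value observed one hour earlier."""
    return [train[-1]] + list(test[:-1])

def mean_per_hour_baseline(train, n_test, period=24):
    """Predict the training mean of the same hour of the day.

    Assumes the training series starts at hour 0 and covers every hour of the period.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for i, value in enumerate(train):
        sums[i % period] += value
        counts[i % period] += 1
    start = len(train)  # the test series continues the training clock
    return [sums[(start + j) % period] / counts[(start + j) % period] for j in range(n_test)]

# Toy series with a 2-hour "day" (period=2) so the numbers are easy to check.
train, test = [1, 3, 1, 3], [1, 3]
print(mae(test, mean_baseline(train, len(test))))                     # 1.0
print(mae(test, persistence_baseline(train, test)))                   # 2.0
print(mae(test, mean_per_hour_baseline(train, len(test), period=2)))  # 0.0
```

On this artificial, perfectly periodic series, the hourly mean wins; on the real intervention counts, persistence gives the lowest errors in all four tables.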
Fig. 5 Autocorrelation plot for total person rescue
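The autocorrelation at lag k measures how strongly the hourly series correlates with itself shifted by k hours. Strong short-lag autocorrelation is consistent with the persistence method scoring best in the preceding tables. A minimal sketch of the standard sample autocorrelation estimator, applied to a hypothetical series with a 24-hour cycle:

```python
def autocorrelation(x: list[float], lag: int) -> float:
    """Sample autocorrelation of x at the given lag (standard biased estimator)."""
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must be in [0, len(x))")
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        raise ValueError("a constant series has undefined autocorrelation")
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom

# Hypothetical week of hourly counts: low from midnight to 8 a.m., high otherwise.
series = [2 if h % 24 < 8 else 6 for h in range(24 * 7)]
print(round(autocorrelation(series, 24), 3), round(autocorrelation(series, 12), 3))  # 0.857 -0.464
```

The daily lag (24) gives a strong positive value while the half-day lag (12) is negative, which is the kind of pattern an autocorrelation plot makes visible.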
Prediction scores using XGBoost, emergency person rescue case
| Features | MAE | RMSE |
|---|---|---|
| Calendar | 1.523 | 1.961 |
| Weather | 1.825 | 2.307 |
| Vigilance | 2.021 | 2.493 |
| Weather + calendar | 1.586 | 2.049 |
| Calendar + vigilance | 1.400 | 1.881 |
| Weather + vigilance | 2.186 | 2.809 |
| All | 1.940 | 2.632 |
Prediction scores using XGBoost, total person rescue case
| Features | MAE | RMSE |
|---|---|---|
| Calendar | 2.223 | 2.917 |
| Weather | 3.281 | 4.138 |
| Vigilance | 4.117 | 5.138 |
| Weather + calendar | 2.375 | 3.118 |
| Calendar + vigilance | 2.279 | 2.998 |
| Weather + vigilance | 4.293 | 5.376 |
| All | 3.385 | 4.445 |
Prediction scores using XGBoost, heating case
| Features | MAE | RMSE |
|---|---|---|
| Calendar | 0.413 | 0.558 |
| Weather | 0.502 | 0.663 |
| Vigilance | 0.510 | 0.659 |
| Weather + calendar | 0.475 | 0.636 |
| Calendar + vigilance | 0.319 | 0.495 |
| Weather + vigilance | 0.565 | 0.825 |
| All | 0.533 | 0.804 |
Prediction scores using XGBoost, storm/flood case
| Features | MAE | RMSE |
|---|---|---|
| Calendar | 0.325 | 0.734 |
| Weather | 0.386 | 0.867 |
| Vigilance | 1.729 | 4.959 |
| Weather + calendar | 0.370 | 0.863 |
| Calendar + vigilance | 0.793 | 2.453 |
| Weather + vigilance | 1.212 | 2.565 |
| All | 1.161 | 2.978 |
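The seven rows of each XGBoost table are the non-empty combinations of the three feature groups (calendar, weather, vigilance). A sketch of how such an ablation grid could be enumerated; the group names come from the tables, while the model-fitting step is left out because the individual feature columns are not listed here.

```python
from itertools import combinations

GROUPS = ("calendar", "weather", "vigilance")

def feature_sets(groups=GROUPS):
    """All non-empty combinations of feature groups, smallest first."""
    return [combo for r in range(1, len(groups) + 1) for combo in combinations(groups, r)]

for combo in feature_sets():
    print(" + ".join(combo))
```

The last combination contains all three groups and corresponds to the "All" row.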
Fig. 6 Example of the tabular data used for multilabel classification, considering calendar variables
Prediction results of the multilabel models based on XGBoost and Random Forest techniques
| Technique | Input | F1-score | Accuracy | Balanced accuracy | Precision | Recall |
|---|---|---|---|---|---|---|
| XGBoost | Calendar | 0.80 | 0.48 | 0.82 | 0.80 | 0.80 |
| | Weather | 0.63 | 0.21 | 0.64 | 0.65 | 0.60 |
| | Vigilance | 0.62 | 0.23 | 0.62 | 0.61 | 0.62 |
| | Weather + calendar | 0.71 | 0.30 | 0.71 | 0.73 | 0.69 |
| | Calendar + vigilance | 0.81 | 0.48 | 0.82 | 0.81 | 0.80 |
| | Weather + vigilance | 0.64 | 0.24 | 0.65 | 0.65 | 0.64 |
| | All | 0.71 | 0.35 | 0.71 | 0.74 | 0.68 |
| Random Forest | Calendar | 0.81 | 0.51 | 0.83 | 0.82 | 0.81 |
| | Weather | 0.61 | 0.24 | 0.61 | 0.63 | 0.59 |
| | Vigilance | 0.64 | 0.18 | 0.58 | 0.54 | 0.77 |
| | Weather + calendar | 0.73 | 0.35 | 0.72 | 0.74 | 0.71 |
| | Calendar + vigilance | 0.81 | 0.50 | 0.83 | 0.81 | 0.81 |
| | Weather + vigilance | 0.62 | 0.24 | 0.61 | 0.63 | 0.61 |
| | All | 0.72 | 0.33 | 0.72 | 0.74 | 0.71 |
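The gap between Accuracy (around 0.5) and F1-score (around 0.8) suggests that "Accuracy" here may be the strict subset accuracy, where a sample counts as correct only if all four labels are right, while the other scores are pooled over labels. That reading is an assumption. The sketch below computes the scores under it, using micro-averaging for precision, recall and F1 and averaging balanced accuracy over labels; the data are hypothetical.

```python
def multilabel_scores(y_true: list[list[int]], y_pred: list[list[int]]) -> dict[str, float]:
    """Subset accuracy, mean per-label balanced accuracy, and micro precision/recall/F1."""
    n_samples, n_labels = len(y_true), len(y_true[0])
    # Subset (exact-match) accuracy: all labels of a sample must be right.
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_samples

    tp = fp = fn = 0
    balanced = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp_j, tn_j = pairs.count((1, 1)), pairs.count((0, 0))
        fp_j, fn_j = pairs.count((0, 1)), pairs.count((1, 0))
        tp, fp, fn = tp + tp_j, fp + fp_j, fn + fn_j
        # A label with no positives (or no negatives) gets a rate of 0 here: a simplification.
        tpr = tp_j / (tp_j + fn_j) if tp_j + fn_j else 0.0
        tnr = tn_j / (tn_j + fp_j) if tn_j + fp_j else 0.0
        balanced.append((tpr + tnr) / 2)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "balanced_accuracy": sum(balanced) / n_labels,
            "precision": precision, "recall": recall, "f1": f1}

y_true = [[1, 0, 1, 1], [0, 1, 0, 0]]
y_pred = [[1, 0, 1, 0], [0, 1, 0, 0]]
print(multilabel_scores(y_true, y_pred))
# accuracy 0.5, balanced_accuracy 0.875, precision 1.0, recall 0.75, f1 about 0.857
```

A single missed label in the first sample halves the subset accuracy but only lowers recall by a quarter, which illustrates why the two kinds of score can diverge so much.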
Hyperparameters search space and the best configuration for XGBoost and Random Forest multilabel models
| Method | Search space | Calendar | Weather | Vigilance | Weather + calendar | Calendar + vigilance | Weather + vigilance | All |
|---|---|---|---|---|---|---|---|---|
| XGBoost | n_estimators: [50–200] | 195 | 159 | 193 | 132 | 132 | 86 | 112 |
| | learning_rate: [0.001–0.8] | 0.49 | 0.14 | 0.39 | 0.17 | 0.40 | 0.60 | 0.69 |
| | max_depth: [1–100] | 100 | 6 | 2 | 3 | 20 | 1 | 4 |
| | colsample_bytree: [0.2–1] | 0.99 | 0.5 | 0.56 | 0.42 | 0.87 | 0.22 | 0.46 |
| | objective: multi:softmax | multi:softmax | multi:softmax | multi:softmax | multi:softmax | multi:softmax | multi:softmax | multi:softmax |
| | eval_metric: mlogloss | mlogloss | mlogloss | mlogloss | mlogloss | mlogloss | mlogloss | mlogloss |
| Random Forest | n_estimators: [50–500] | 52 | 328 | 61 | 152 | 411 | 337 | 87 |
| | max_features: [0.2–1] | 0.38 | 0.74 | 0.88 | 0.46 | 0.53 | 0.94 | 0.22 |
| | max_depth: [1–10] | 100 | 1 | 10 | 5 | 20 | 1 | 5 |
| | class_weight: [balanced, balanced_subsample] | balanced_subsample | balanced | balanced | balanced | balanced | balanced | balanced |
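The search spaces are given as ranges, but the table does not say how they were explored. Purely as an illustration, the sketch below draws random configurations from the XGBoost ranges. Random search is an assumption, as is sampling integers for `n_estimators` and `max_depth`; the parameter names are those used by the `xgboost` Python package, but no model is trained here.

```python
import random

# XGBoost search space from the table: inclusive (low, high) bounds.
XGB_SPACE = {
    "n_estimators": (50, 200),
    "learning_rate": (0.001, 0.8),
    "max_depth": (1, 100),
    "colsample_bytree": (0.2, 1.0),
}
INT_PARAMS = {"n_estimators", "max_depth"}

def sample_config(space: dict, rng: random.Random) -> dict:
    """Draw one configuration uniformly from each range (random search)."""
    config = {}
    for name, (low, high) in space.items():
        if name in INT_PARAMS:
            config[name] = rng.randint(low, high)
        else:
            config[name] = round(rng.uniform(low, high), 3)
    # Settings that the table fixes for every input combination.
    config["objective"] = "multi:softmax"
    config["eval_metric"] = "mlogloss"
    return config

rng = random.Random(0)  # seeded so the draws are reproducible
print(sample_config(XGB_SPACE, rng))
```

In a full search, each sampled configuration would be trained and scored on validation data, and the best one kept per input combination.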
Prediction results of the multilabel models based on NLP techniques
| Technique | Input | F1-score | Accuracy | Balanced accuracy | Precision | Recall |
|---|---|---|---|---|---|---|
| CNN | Bulletin text | 0.84 | 0.56 | 0.78 | 0.83 | 0.85 |
| LSTM | Bulletin text | 0.85 | 0.56 | 0.80 | 0.84 | 0.86 |
| FlauBERT | Bulletin text | 0.87 | 0.59 | 0.82 | 0.86 | 0.89 |
| CamemBERT | Bulletin text | 0.89 | 0.65 | 0.84 | 0.87 | 0.90 |
Hyperparameters search space and the best configuration for NLP multilabel models
| Method | Search space | Best configuration |
|---|---|---|
| CNN | Type of architecture: [1, 2, 3] | 3 |
| | Learning rate: [0.00001–0.01] | 0.009 |
| | Batch size: [40–150] | 95 |
| | Epochs: 500 | 105 |
| | Early stopping: 15 | 15 |
| | Restore best weights: True | True |
| LSTM | Type of architecture: [1, 2, 3] | 1 |
| | Learning rate: [0.00001–0.009] | 0.0006 |
| | Batch size: [40–150] | 59 |
| | Epochs: 200 | 200 |
| | Early stopping: 20 | 20 |
| | Restore best weights: True | True |
| FlauBERT | Type of architecture: flaubert-base-cased | flaubert-base-cased |
| | Learning rate: [0.0001, 0.00001] | 0.00001 |
| | Batch size: [16–256] | 128 |
| | Epochs: [10–200] | 150 |
| | Early stopping: 15 | 15 |
| | Restore best weights: True | True |
| CamemBERT | Type of architecture: camembert-base | camembert-base |
| | Learning rate: [0.0001, 0.00001] | 0.00001 |
| | Batch size: [16–256] | 48 |
| | Epochs: [10–200] | 75 |
| | Early stopping: 15 | 15 |
| | Restore best weights: True | True |
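The "Early stopping" and "Restore best weights" rows describe a patience mechanism: training stops once the monitored validation loss has not improved for the given number of epochs, and the weights from the best epoch are kept. This mirrors the `patience` and `restore_best_weights` arguments of the Keras `EarlyStopping` callback, although the table does not name the framework. The framework-free sketch below only tracks epoch indices, on a hypothetical loss curve.

```python
def early_stopping(val_losses: list[float], patience: int, max_epochs: int) -> tuple[int, int]:
    """Return (best_epoch, last_epoch_run), both 1-based.

    Training stops after `patience` consecutive epochs without a strict improvement,
    or when `max_epochs` is reached; the best epoch's weights would then be restored.
    """
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, min(len(val_losses), max_epochs)

# Hypothetical curve: improves for 10 epochs, then plateaus.
losses = [1.0 - 0.05 * e for e in range(10)] + [0.6] * 30
print(early_stopping(losses, patience=15, max_epochs=500))  # (10, 25)
```

With a patience of 15, training here runs 15 epochs past the best one and then rolls back to epoch 10.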
Accuracy results for each type of intervention, considering the models generated with the weather bulletins
| Model | Emergency person rescue | Total person rescue | Heating | Storm/flood |
|---|---|---|---|---|
| CNN | 0.76 | 0.82 | 0.84 | 0.82 |
| LSTM | 0.76 | 0.83 | 0.85 | 0.83 |
| FlauBERT | 0.79 | 0.87 | 0.88 | 0.85 |
| CamemBERT | 0.80 | 0.86 | 0.92 | 0.86 |
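Per-type accuracy treats each intervention type as its own binary problem: the fraction of bulletins for which that single label is predicted correctly. Every sample whose four labels are all correct is also correct for each individual label, so per-label accuracy can never be lower than an all-labels-correct accuracy on the same predictions. A minimal sketch with hypothetical data:

```python
def per_label_accuracy(y_true: list[list[int]], y_pred: list[list[int]]) -> list[float]:
    """Accuracy of each label column taken independently."""
    n = len(y_true)
    return [sum(t[j] == p[j] for t, p in zip(y_true, y_pred)) / n for j in range(len(y_true[0]))]

def subset_accuracy(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Fraction of samples whose whole label vector is correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
y_pred = [[1, 0, 1, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
print(per_label_accuracy(y_true, y_pred))  # [0.75, 1.0, 0.75, 0.75]
print(subset_accuracy(y_true, y_pred))     # 0.25
```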