Lorenzo Gianquintieri, Maria Antonia Brovelli, Andrea Pagliosa, Gabriele Dassi, Piero Maria Brambilla, Rodolfo Bonora, Giuseppe Maria Sechi, Enrico Gianluca Caiani.
Abstract
The COVID-19 pandemic has posed unprecedented threats to healthcare systems worldwide. Great efforts were made to fight the emergency, with widespread use of cutting-edge technologies, especially big data analytics and AI. In this context, the present study proposes a novel combination of geographical filtering and machine learning (ML) to develop and optimize a COVID-19 early alert system based on Emergency Medical Services (EMS) data, enabling the early identification of outbreaks at very high granularity, down to single municipalities. The model, implemented for the region of Lombardy, Italy, showed robust performance, with an overall accuracy of 80% in identifying the active spread of the disease. The output was further post-processed to classify the territory into five risk classes, which effectively anticipated the demand for EMS interventions. This model shows state-of-the-art potential for future applications in the early detection of the burden that COVID-19, or other similar epidemics, places on the healthcare system.
Keywords: COVID-19; emergency medical services; geo-AI; geographic information system; health geomatics; machine learning; resources management; spatial filtering
Year: 2022 PMID: 35897382 PMCID: PMC9330211 DOI: 10.3390/ijerph19159012
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
Main relevant features of similar studies in the scientific literature.
| Study | Target Variable | Data Source | Max Geographic Granularity | Algorithm Selected | Performance |
|---|---|---|---|---|---|
| Mollalo et al., 2020 | Cumulative incidence | Socioeconomic, behavioral, environmental, topographic, and demographic factors | County | Multi-Layer Perceptron (MLP) | RMSE = 0.72 |
| Hussein et al., 2022 | Daily infected cases | Official diagnoses | Country | Time-Delay Neural Network (TDNN) | RMSE = 1.15 |
| Alsayed et al., 2020 | Epidemic peak, infected cases | Official diagnoses | Country | Susceptible–Exposed–Infectious–Recovered (SEIR) model, Adaptive Neuro-Fuzzy Inference System (ANFIS) | Normalized RMSE = 0.041 |
| Singh et al., 2020 | Cumulative cases, deaths, recoveries | Official diagnoses | Country | AutoRegressive Integrated Moving Average (ARIMA) | Akaike information criterion value = 20 |
| Hussein et al., 2021 | Daily infected cases | Official diagnoses | Country | Linear forecast model + custom mathematical equation | RMSE = 2.15 |
| Lynch et al., 2021 | Cumulative cases | Official diagnoses | County | Moving Average (MA) | MdAE = 0.67 |
| Friedman et al., 2021 | Excess out-of-hospital deaths, respiratory complaints, oxygen saturation level of patients | Emergency Medical Services (EMS) data | City | Comparison against Linear Continuous Fixed Effect | Not applicable |
| COVID-19 APHP-Universities-INRIA-INSERM Group, 2020 | Requirements for ICU beds | EMS data, positivity ratio, emergency department visits, hospital admissions | Region | Correlation curve analysis | R² = 0.79–0.99 |
| Levy et al., 2021 | Hospitalizations | EMS data | State | AutoRegressive Integrated Moving Average (ARIMA) | AIC |
| Xie et al., 2021 | EMS demand | Hospitalizations | County | Time series regression | R² = 0.85 |
| Our study | Territorial alert level | EMS data | Municipality | Random Forest (RF) | Accuracy = 80% |
Figure 1. Subdivision of the time series representing the vehicles dispatched by EMS for respiratory and infective issues (normalized by the resident population) in a given territory into periods, with a label (0 = ‘no diffusion’; 1 = ‘active spreading’) assigned to each generated point according to the automated identification of inflection points (changes in the shape of the data; see [34] for more details).
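The inflection-point procedure itself is described in [34] and is not reproduced in this record. As a rough illustration only, the Python sketch below assumes that inflection points are the indices where the discrete second difference of an (already smoothed) series changes sign, and applies a hypothetical labeling rule: every point of a segment between consecutive inflection points is labeled 1 (‘active spreading’) if the series rises across that segment, and 0 (‘no diffusion’) otherwise. The function names and the labeling rule are illustrative assumptions, not the authors' method.

```python
from typing import List, Sequence


def second_differences(series: Sequence[float]) -> List[float]:
    """Discrete second difference x[i+1] - 2*x[i] + x[i-1] at each interior index i."""
    return [series[i + 1] - 2 * series[i] + series[i - 1] for i in range(1, len(series) - 1)]


def inflection_points(series: Sequence[float]) -> List[int]:
    """Indices where the second difference changes sign; zero values are skipped."""
    points: List[int] = []
    prev_sign = 0
    for offset, value in enumerate(second_differences(series)):
        sign = (value > 0) - (value < 0)
        if sign != 0 and prev_sign != 0 and sign != prev_sign:
            points.append(offset + 1)  # second_differences()[k] belongs to series index k + 1
        if sign != 0:
            prev_sign = sign
    return points


def label_points(series: Sequence[float]) -> List[int]:
    """Hypothetical rule: each point of a segment between consecutive inflection points
    gets 1 ('active spreading') if the series rises across the segment, else 0
    ('no diffusion'). A shared boundary point takes the label of the later segment."""
    bounds = [0] + inflection_points(series) + [len(series) - 1]
    labels = [0] * len(series)
    for start, end in zip(bounds, bounds[1:]):
        label = 1 if series[end] > series[start] else 0
        for i in range(start, end + 1):
            labels[i] = label
    return labels
```

On real EMS counts, daily noise would produce many spurious sign changes, so some smoothing before detection is presumably needed; that step is also an assumption here.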
Figure 2. Post-processing of the probability output of a machine learning model, applied to classify the time series of ambulances dispatched and calls received by EMS for respiratory and infective issues (see text for details), distinguishing between a scenario of no epidemic diffusion and a scenario of active spreading by generating five possible alert classes.
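The exact post-processing rule is given in the main text and is not part of this record. A minimal sketch, assuming the five alert classes are obtained by binning the model's probability of ‘active spreading’ with four cut-offs; the evenly spaced values in `DEFAULT_CUTOFFS` are hypothetical placeholders, not the study's thresholds.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical cut-offs on P('active spreading'); the study's actual rule is
# described in the main text and may differ (e.g., it may also use past days).
DEFAULT_CUTOFFS = (0.2, 0.4, 0.6, 0.8)


def alert_class(probability: float, cutoffs: Sequence[float] = DEFAULT_CUTOFFS) -> int:
    """Map a probability in [0, 1] to an alert class from 1 (lowest) to 5 (highest).

    A probability equal to a cut-off falls into the higher class.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return bisect_right(cutoffs, probability) + 1
```

For example, `[alert_class(p) for p in (0.05, 0.35, 0.95)]` yields `[1, 2, 5]`. Passing a different `cutoffs` tuple is the only change needed to try other (equally hypothetical) binnings.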
Weights applied to the different metrics (precision, recall, F1 score), computed for the ‘no diffusion’ label, the ‘active spreading’ label, and the whole dataset, to evaluate the performance of a machine learning algorithm trained to identify these two scenarios under a 10-fold cross-validation protocol. Weights are given for the median, first quartile, third quartile, and lower and upper 95% confidence interval values of each metric's distribution across the 10 folds (e.g., the median recall for the ‘active spreading’ label was weighted 0.1).
| Weights Assigned to | Statistic | Precision | Recall | F1 Score |
|---|---|---|---|---|
| ‘No diffusion’ label | Median | 0.03 | 0.06 | 0.03 |
| | 1st quartile | 0.0075 | 0.015 | 0.0075 |
| | 3rd quartile | 0.0075 | 0.015 | 0.0075 |
| | 95% lower C.I. | 0.0075 | 0.015 | 0.0075 |
| | 95% upper C.I. | 0.0075 | 0.015 | 0.0075 |
| ‘Active spreading’ label | Median | 0.05 | 0.1 | 0.05 |
| | 1st quartile | 0.0125 | 0.025 | 0.0125 |
| | 3rd quartile | 0.0125 | 0.025 | 0.0125 |
| | 95% lower C.I. | 0.0125 | 0.025 | 0.0125 |
| | 95% upper C.I. | 0.0125 | 0.025 | 0.0125 |
| Accuracy | Median | NA | NA | 0.06 |
| | 1st quartile | NA | NA | 0.015 |
| | 3rd quartile | NA | NA | 0.015 |
| | 95% lower C.I. | NA | NA | 0.015 |
| | 95% upper C.I. | NA | NA | 0.015 |
| Macro Average | Median | 0.015 | 0.03 | 0.015 |
| | 1st quartile | 0.0038 | 0.0075 | 0.0038 |
| | 3rd quartile | 0.0038 | 0.0075 | 0.0038 |
| | 95% lower C.I. | 0.0038 | 0.0075 | 0.0038 |
| | 95% upper C.I. | 0.0038 | 0.0075 | 0.0038 |
| Weighted Average | Median | 0.015 | 0.03 | 0.015 |
| | 1st quartile | 0.0038 | 0.0075 | 0.0038 |
| | 3rd quartile | 0.0038 | 0.0075 | 0.0038 |
| | 95% lower C.I. | 0.0038 | 0.0075 | 0.0038 |
| | 95% upper C.I. | 0.0038 | 0.0075 | 0.0038 |
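In the weights table, every non-median indicator receives one quarter of the corresponding median weight, and the weights add up to 1 if the printed 0.0038 is read as a rounded 0.00375. The Python sketch below shows how such weights could collapse the per-fold metrics into one score. It is a sketch under stated assumptions: fold metrics arrive as plain dictionaries keyed by hypothetical `(group, metric)` names, and the 95% confidence interval is taken as the empirical 2.5th and 97.5th percentiles across folds, because the record does not say how the interval was computed.

```python
from statistics import median, quantiles
from typing import Dict, List, Tuple

Key = Tuple[str, str]

# Median weights transcribed from the table; the key names are hypothetical.
# Each other indicator (1st/3rd quartile, lower/upper 95% CI) gets median_weight / 4,
# which reproduces the table (0.0038 read as a rounded 0.00375).
MEDIAN_WEIGHTS: Dict[Key, float] = {
    ("no_diffusion", "precision"): 0.03,
    ("no_diffusion", "recall"): 0.06,
    ("no_diffusion", "f1"): 0.03,
    ("active_spreading", "precision"): 0.05,
    ("active_spreading", "recall"): 0.10,
    ("active_spreading", "f1"): 0.05,
    ("overall", "accuracy"): 0.06,
    ("macro_avg", "precision"): 0.015,
    ("macro_avg", "recall"): 0.03,
    ("macro_avg", "f1"): 0.015,
    ("weighted_avg", "precision"): 0.015,
    ("weighted_avg", "recall"): 0.03,
    ("weighted_avg", "f1"): 0.015,
}


def summarize(values: List[float]) -> Dict[str, float]:
    """Median, quartiles and an assumed percentile-based 95% interval across folds."""
    cuts = quantiles(values, n=40, method="inclusive")  # cut points at k/40, k = 1..39
    return {
        "median": median(values),
        "q1": cuts[9],        # 10/40 = 25th percentile
        "q3": cuts[29],       # 30/40 = 75th percentile
        "ci_low": cuts[0],    # 1/40 = 2.5th percentile
        "ci_high": cuts[38],  # 39/40 = 97.5th percentile
    }


def weighted_score(fold_metrics: List[Dict[Key, float]]) -> float:
    """Weighted sum of every metric's distribution indicators across the folds."""
    score = 0.0
    for key, w_median in MEDIAN_WEIGHTS.items():
        s = summarize([fold[key] for fold in fold_metrics])
        score += w_median * s["median"]
        score += (w_median / 4) * (s["q1"] + s["q3"] + s["ci_low"] + s["ci_high"])
    return score
```

Because the weights sum to 1, a model whose every metric equals 0.8 in every fold scores exactly 0.8, which keeps the final score on the same scale as the individual metrics.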
Optimization of a machine learning algorithm for the daily binary classification of territorial districts as being in a condition of active spreading of the COVID-19 epidemic (‘1’) or no diffusion of the epidemic (‘0’), on the basis of ambulances dispatched and calls received by the EMS department in Lombardy, Italy, between 1 January 2020 and 13 March 2022. The table reports the final scores of different combinations of machine learning algorithms and attribute sets (see Appendix A for a detailed list), computed as the weighted average of the different metrics (and their distribution indicators) over the 10-fold cross-validation protocol, according to the defined weights (see Table 2). In bold, the highest results reached for each algorithm are highlighted, while the overall best result is also underlined.
| Attribute Set | Feature Numbers (see Appendix A) | Random Forest | Support Vector Machine | Logistic Regression |
|---|---|---|---|---|
| All | 1–42 | 0.7967 | 0.7809 | 0.7829 |
| Time-Series (TS) | 1–14 | 0.7887 | 0.7805 | 0.7818 |
| All Derived Attributes | 15–42 | 0.799 | 0.7804 | 0.7826 |
| Ambulances Dispatches | 1–7, 15–28 | 0.7965 | 0.7806 | 0.7827 |
| Emergency Calls | 8–14, 29–42 | 0.7939 | 0.7792 | 0.7811 |
| Max-Min + TS | 1–14, 15–19, 29–33 | 0.7934 | 0.7792 | 0.7819 |
| Max-Min | 15–19, 29–33 | 0.7981 | 0.7786 | 0.7798 |
| Statistics + TS | 1–14, 20–22, 34–36 | 0.7975 | 0.7787 | 0.78 |
| Statistics | 20–22, 34–36 | 0.8017 | 0.7791 | 0.7804 |
| Position and Statistics + TS | 1–14, 15–22, 29–36 | 0.8017 | 0.7791 | 0.7805 |
| | 15–22, 29–36 | | 0.779 | 0.7806 |
| Lin Regression + TS | 1–14, 23–25, 37–39 | 0.8039 | 0.7789 | 0.7815 |
| Lin Regression | 23–25, 37–39 | 0.8032 | 0.7792 | 0.7815 |
| Exp Regression + TS | 1–14, 26–28, 40–42 | 0.8015 | 0.7597 | 0.7481 |
| Exp Regression | 26–28, 40–42 | 0.7996 | 0.7601 | 0.7482 |
| Lin & Exp Regression | 23–28, 37–42 | 0.7993 | 0.7604 | 0.7483 |
| Position + Lin Reg + TS | 1–19, 23–25, 29–33, 37–39 | 0.7983 | 0.7605 | 0.7484 |
| Position + Lin Reg | 15–19, 23–25, 29–33, 37–39 | 0.7994 | 0.7605 | 0.7484 |
| Position + Exp Reg + TS | 1–19, 26–33, 40–42 | 0.799 | 0.7605 | 0.7484 |
| Position + Exp Reg | 15–19, 26–33, 40–42 | 0.7996 | 0.7605 | 0.7483 |
| Position + Lin & Exp Reg + TS | 1–19, 23–33, 37–42 | 0.799 | 0.7606 | 0.7483 |
| Position + Lin & Exp Reg | 15–19, 23–33, 37–42 | 0.7991 | 0.7608 | 0.7484 |
| Statistics + Lin Reg + TS | 1–14, 20–25, 34–39 | 0.7883 | 0.7799 | 0.7837 |
| | 20–25, 34–39 | 0.7974 | | |
| Statistics + Exp Reg + TS | 1–14, 20–22, 26–28, 34–36, 40–42 | 0.8005 | 0.761 | 0.749 |
| Statistics + Exp Reg | 20–22, 26–28, 34–36, 40–42 | 0.8009 | 0.7611 | 0.7489 |
| Statistics + Lin & Exp Reg + TS | 1–14, 20–28, 34–42 | 0.8002 | 0.7611 | 0.7501 |
| Statistics + Lin & Exp Reg | 20–28, 34–42 | 0.8003 | 0.7611 | 0.7503 |
* LEGEND (see Appendix A for the detailed list): TS (time series) = daily calls to the EMS number and daily dispatches of EMS vehicles for respiratory and infective causes, normalized by the resident population; POSITION = max value, min value, max-min, and position of the max and min values in the time window; STATISTICS = mean, median, standard deviation; LIN REG (linear regression) = intercept, slope, and Pearson’s correlation; EXP REG (exponential regression) = base numerical coefficient, exponential coefficient, and Pearson’s correlation.
Figure 3. ROC curves representing the performance of the selected machine learning algorithm (random forest classifier) for the daily binary classification of territorial districts as being in a condition of active spreading of the COVID-19 epidemic (‘1’) or no diffusion of the epidemic (‘0’), on the basis of ambulance dispatches and calls received by the EMS department in Lombardy, Italy, between 1 January 2020 and 13 March 2022. The threshold considered is the label probability output by the algorithm. The three COVID-19 waves (spring 2020; autumn 2020 to spring 2021; winter 2021–2022 to spring 2022) were evaluated separately, each time training the model on data from the other two. The area under the curve was computed, along with the sensitivity and specificity at the optimized [51] working point of each curve.
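The evaluation in Figure 3 combines three steps: holding out one wave while training on the other two, sweeping a threshold over the predicted probability to trace the ROC curve, and choosing a working point. The record does not state which optimization criterion [51] uses, so the stdlib-only Python sketch below assumes Youden's J (sensitivity + specificity − 1). The AUC is computed through its rank interpretation, which equals the area under the empirical ROC curve. All function names are hypothetical, and each sample is assumed to carry a wave identifier.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_wave_out(wave_ids: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_wave, train_indices, test_indices), training on the other waves."""
    for wave in sorted(set(wave_ids)):
        test = [i for i, w in enumerate(wave_ids) if w == wave]
        train = [i for i, w in enumerate(wave_ids) if w != wave]
        yield wave, train, test


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(threshold, sensitivity, specificity) for each distinct score used as threshold.

    A sample is predicted positive when its score >= threshold; both classes must be present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        points.append((threshold, tp / positives, 1 - fp / negatives))
    return points


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of a positive > score of a negative), with ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_working_point(labels: Sequence[int], scores: Sequence[float]) -> Tuple[float, float, float]:
    """ROC point maximizing sensitivity + specificity - 1 (assumed criterion); first wins ties."""
    return max(roc_points(labels, scores), key=lambda p: p[1] + p[2] - 1)
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, `auc` gives 0.75. The quadratic loops are acceptable for a sketch; a production version would sort once instead.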
Median (25th–75th percentile) number of ambulances dispatched for respiratory and infective issues by the EMS department in Lombardy, Italy, for each of the 5 classes representing the level of confidence of being in a situation of active COVID-19 spread, as assigned to each municipality by post-processing the machine learning algorithm output between 1 January 2020 and 13 March 2022 (see text for details). The last column reports the p-values of pairwise Wilcoxon rank-sum tests (after Bonferroni correction) assessing differences in distribution across classes.
| Assigned Class | Ambulances Dispatched/Population in the Following 7 Days: Median [25th–75th Percentile] | Pairwise Wilcoxon Rank-Sum p-Values (Bonferroni-Corrected) |
|---|---|---|
| Class 1 | 1.83 [0.96–2.55] | Class 2: |
| Class 2 | 3.21 [2.57–4.16] | Class 1: |
| Class 3 | 3.73 [2.85–4.69] | Class 1: |
| Class 4 | 3.96 [3.00–5.03] | Class 1: |
| Class 5 | 6.24 [4.63–9.00] | Class 1: |
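The statistical comparison behind the last column can be sketched with the standard library alone. The Python code below assumes the large-sample normal approximation of the two-sided Wilcoxon rank-sum test, with tied values receiving average ranks and no tie correction of the variance, and applies the Bonferroni correction by multiplying each p-value by the number of pairwise comparisons (10 for 5 classes), capped at 1. The record does not specify which implementation the authors used; the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided rank-sum p-value via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def pairwise_bonferroni(groups: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Rank-sum p-value for every pair of groups, times the number of pairs, capped at 1."""
    pairs = list(combinations(sorted(groups), 2))
    return {(a, b): min(1.0, rank_sum_p(groups[a], groups[b]) * len(pairs)) for a, b in pairs}
```

With thousands of municipality-days per class, the normal approximation is the usual choice. For very small samples, an exact test would be more appropriate.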
Figure 4. Map of the output of a machine learning model showing, for every municipality of Lombardy, Italy, the alert level (expressed in five classes) reflecting the confidence of being in a condition of active COVID-19 spreading, obtained by post-processing the machine learning output. Four dates are shown: 15 March 2020 (top left), the peak of the first wave; 5 November 2020 (top right), the first peak of the second wave; 15 March 2021 (bottom left), the second peak of the second wave; and 6 January 2022 (bottom right), the peak of the third wave.