Pier Francesco Caruso, Giovanni Angelotti, Massimiliano Greco, Giorgio Guzzetta, Danilo Cereda, Stefano Merler, Maurizio Cecconi.
Abstract
INTRODUCTION: SARS-CoV-2 was declared a pandemic by the WHO on March 11th, 2020. Public protective measures were enforced in every country to limit the diffusion of SARS-CoV-2. Its transmission, mainly by droplets, has been measured by the effective reproduction number (Rt), which counts the number of secondary cases caused in a population by an average infectious individual at time t. Current strategies to calculate Rt reflect the number of secondary cases only after several days, because of the delay from symptom onset to reporting. We propose a complementary Rt estimation that uses supervised machine learning techniques to predict short-term variations with more timely results.
Keywords: COVID-19; Data science; Environmental data; Epidemiology; Machine learning; Mobility data; Rt prediction
Year: 2022 PMID: 35390590 PMCID: PMC8970608 DOI: 10.1016/j.ijmedinf.2022.104755
Source DB: PubMed Journal: Int J Med Inform ISSN: 1386-5056 Impact factor: 4.730
Classification of models created for our analysis.
| | Outcome as raw Rt | Outcome as Rt differential |
|---|---|---|
| Access to Rt value of the previous day | CRt | CDRt |
| Denied access to Rt value of the previous day | FRt | FDRt |
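The two axes of this classification can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `build_examples`, the covariate layout, and the reading of "Rt differential" as the day-over-day change Rt(t) − Rt(t−1) are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# The four model variants, keyed by the names used in the table above.
MODEL_CONFIGS: Dict[str, Dict[str, bool]] = {
    "CRt":  {"use_previous_rt": True,  "differential": False},
    "CDRt": {"use_previous_rt": True,  "differential": True},
    "FRt":  {"use_previous_rt": False, "differential": False},
    "FDRt": {"use_previous_rt": False, "differential": True},
}


def build_examples(
    rt: List[float],
    covariates: List[List[float]],
    use_previous_rt: bool,
    differential: bool,
) -> Tuple[List[List[float]], List[float]]:
    """Build (features, targets) for one province's daily series.

    rt[t] is the official Rt estimate on day t; covariates[t] holds that
    day's mobility, weather and pollution values (hypothetical layout).
    """
    features: List[List[float]] = []
    targets: List[float] = []
    for t in range(1, len(rt)):
        row = list(covariates[t])
        if use_previous_rt:
            # Models with access to the previous day's Rt get it as a feature.
            row.append(rt[t - 1])
        # Assumption: the differential outcome is the day-over-day change.
        target = rt[t] - rt[t - 1] if differential else rt[t]
        features.append(row)
        targets.append(target)
    return features, targets
```

Calling `build_examples(rt, covariates, **MODEL_CONFIGS["FDRt"])`, for instance, would yield covariate-only features paired with daily Rt changes.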
Descriptive analysis of data from the 12 provinces of the Lombardy region.
| Full dataset (n) | (2916, 50) |
|---|---|
| Unique Lombardy provinces (n) | 12 |
| Unique days (n) - from February 15th, 2020, to October 14th, 2020 | 243 |
| Mobility features (n): | 9 |
| provided by Google (n) | 6 |
| provided by Apple (n) | 3 |
| Weather features provided by ARPA* (n) | 16 |
| Pollution features provided by ARPA (n) | 8 |
| Rt estimates from official algorithms (median [interquartile range]) | 0.99 [0.73–1.32] |
| Rt estimates from official algorithms (mean (Standard Deviation)) | 1.12 (0.59) |
*ARPA, Regional Environmental Protection Agency.
Fig. 1. Comparison between Rt values and some mobility features. The x axis reports the date; the y axis reports Rt as a bold blue line and some macro mobility features as thinner lines in other colors. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Coefficient of determination (R²) performance of the four models after validation. Color legend: green, > 0.9; red, < 0.5; white, between red and green. Color shades are darker for the highest (green) and lowest (red) values.
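For readers unfamiliar with the metric, the coefficient of determination can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses this standard definition; the function name `r_squared` is hypothetical, and the source does not state which implementation the authors used.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Standard coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Squared error of always predicting the mean of the observed values.
    ss_tot = sum((yt - mean_true) ** 2 for yt in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot
```

A value of 1 means the predictions match the reference values exactly, while a value of 0 means the model does no better than predicting the mean.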
Fig. 2. Residual distribution. These graphs show how accurate the models are by comparing the ground truth Rt values with the estimated ones. The x axis reports the Rt estimated by the official algorithms (considered as ground truth), while the y axis reports the difference between that value and the value estimated by our models. The differential-based models (CDRt and FDRt) consistently handle all the considered values of Rt, while the performance of CRt and FRt decreases inversely to the ground truth value of Rt reported.
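A residual analysis of this kind can be approximated by grouping residuals according to the ground truth Rt. The following is a minimal sketch under stated assumptions: the sign convention (ground truth minus model estimate), the bin width of 0.5, and the function name `mean_abs_residual_by_bin` are illustrative choices, not details taken from the paper.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Sequence


def mean_abs_residual_by_bin(
    rt_true: Sequence[float],
    rt_pred: Sequence[float],
    bin_width: float = 0.5,
) -> Dict[float, float]:
    """Mean absolute residual per bin of ground truth Rt.

    Keys are the lower edges of the bins, in increasing order.
    """
    grouped: DefaultDict[float, List[float]] = defaultdict(list)
    for truth, pred in zip(rt_true, rt_pred):
        # Assumed convention: residual = ground truth - model estimate.
        residual = truth - pred
        bin_start = (truth // bin_width) * bin_width
        grouped[bin_start].append(abs(residual))
    return {
        start: sum(values) / len(values)
        for start, values in sorted(grouped.items())
    }
```

A model whose accuracy does not depend on the level of Rt would show roughly equal values across bins, whereas a model that degrades in some range of Rt would show larger values in the corresponding bins.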
Fig. 3. Feature importance bars. Each graph represents a different model: the x axis reports all the features analyzed, while the y axis reports the weight given to each feature by the model. The Rt calculated the day before (Previous Rt) is the most important feature when available, while the models without access to it (FDRt and FRt) spread the feature importance much more widely.
Fig. 4. Forecasts for some provinces* made by the models with the highest coefficient of determination (CDRt and FDRt) from 15th August to 14th October 2020 (60 days). The x axis reports the date, while the y axis reports the values of Rt and the abbreviation of the name of the province. The first column shows the forecasts made by CDRt and the second those made by FDRt. The area in blue indicates the forecast made by the model, while the blue line represents the Rt value computed using traditional methods. *BG: ‘Bergamo’, BS: ‘Brescia’, MI: ‘Milan’, MB: ‘Monza and Brianza’, SO: ‘Sondrio’. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)