Augusto Cerqua, Roberta Di Stefano, Marco Letta, Sara Miccoli.
Abstract
Estimates of the real death toll of the COVID-19 pandemic have proven to be problematic in many countries, Italy being no exception. Mortality estimates at the local level are even more uncertain as they require stringent conditions, such as granularity and accuracy of the data at hand, which are rarely met. The "official" approach adopted by public institutions to estimate the "excess mortality" during the pandemic draws on a comparison between observed all-cause mortality data for 2020 and averages of mortality figures in the past years for the same period. In this paper, we apply the recently developed machine learning control method to build a more realistic counterfactual scenario of mortality in the absence of COVID-19. We demonstrate that supervised machine learning techniques outperform the official method by substantially improving the prediction accuracy of the local mortality in "ordinary" years, especially in small- and medium-sized municipalities. We then apply the best-performing algorithms to derive estimates of local excess mortality for the period between February and September 2020. Such estimates allow us to provide insights about the demographic evolution of the first wave of the pandemic throughout the country. To help improve diagnostic and monitoring efforts, our dataset is freely available to the research community.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00148-021-00857-y.
Keywords: COVID-19; Coronavirus; Counterfactual building; Italy; Local mortality; Machine learning
Year: 2021 PMID: 34177122 PMCID: PMC8214048 DOI: 10.1007/s00148-021-00857-y
Source DB: PubMed Journal: J Popul Econ ISSN: 0933-1433
Descriptive statistics
| Variables (by year) | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
|---|---|---|---|---|---|---|
| Number of deaths from Jan 1 to Feb 20 (per 10,000 inhabitants) | 22.45 | 19.14 | 23.63 | 21.68 | 21.61 | 19.10 |
| Number of deaths in the previous year (per 10,000 inhabitants) | 75.65 | 82.97 | 79.29 | 83.37 | 81.02 | 81.48 |
| Population | 7,685.12 | 7,668.66 | 7,659.04 | 7,645.69 | 7,629.94 | 7,615.37 |
| Population density (inhabitants per square kilometer) | 305.23 | 304.72 | 304.49 | 304.11 | 303.69 | 303.45 |
| Share of those aged 65+ | 23.77% | 24.10% | 24.42% | 24.70% | 25.04% | 25.44% |
| Share of those aged 80+ | 7.72% | 7.84% | 7.97% | 8.09% | 8.28% | 8.49% |
| Share of men | 49.34% | 49.39% | 49.45% | 49.53% | 49.57% | 49.61% |
| Share of men aged 65+ | 10.47% | 10.68% | 10.87% | 11.05% | 11.26% | 11.49% |
| Share of men aged 80+ | 2.79% | 2.86% | 2.93% | 3.00% | 3.11% | 3.22% |
| Number of employees | 2,045.42 | 2,058.19 | 2,108.21 | 2,155.41 | 2,184.38 | 2,184.38 |
| Share of employment in manufacturing | 24.78% | 24.69% | 24.70% | 24.35% | 24.30% | 24.30% |
| PM-10 (μg/m3) | 28.26 | 28.26 | 25.42 | 26.80 | 24.71 | 24.71 |
| Share of municipalities with a hospital | 7.89% | 7.81% | 7.80% | 7.48% | 7.41% | 7.41% |
| Share of municipalities with a hospital in at least a neighboring municipality | 44.69% | 44.39% | 44.40% | 42.68% | 42.49% | 42.49% |
| Number of deaths due to road accidents (per 10,000 inhabitants) | 0.43 | 0.43 | 0.42 | 0.42 | 0.42 | 0.42 |
| Number of deaths from Feb 21 to Mar 31 (per 10,000 inhabitants) | 15.37 | 14.08 | 14.09 | 14.93 | 14.89 | 21.05 |
| Number of deaths from Feb 21 to June 30 (per 10,000 inhabitants) | 45.63 | 43.31 | 44.00 | 44.51 | 45.31 | 55.21 |
| Number of deaths from Feb 21 to Sept 30 (per 10,000 inhabitants) | 75.34 | 71.55 | 73.40 | 73.34 | 73.77 | 84.58 |
In case of missing data for 2020, we use the 2019 value. We also control for the degree of urbanization, which is constant across years (270 municipalities are classified as large urban areas, 2,275 are classified as small urban areas, and 5,353 are classified as rural areas).
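As a rough illustration of the "official" historical-average approach described in the abstract, the sketch below applies it to the municipal averages in the table above for the period February 21 to March 31. The variable names are ours, and the inputs are averages across municipalities rather than municipality-level data, so the result is illustrative only and is not one of the paper's excess mortality estimates.

```python
# Deaths from Feb 21 to Mar 31 per 10,000 inhabitants
# (municipal averages copied from the descriptive statistics table).
deaths_feb21_mar31 = {
    2015: 15.37, 2016: 14.08, 2017: 14.09,
    2018: 14.93, 2019: 14.89, 2020: 21.05,
}

# "Intuitive" counterfactual: the average of the same period in 2015-2019.
baseline_years = [2015, 2016, 2017, 2018, 2019]
baseline = sum(deaths_feb21_mar31[y] for y in baseline_years) / len(baseline_years)

# Excess mortality in 2020 relative to that baseline.
excess = deaths_feb21_mar31[2020] - baseline
excess_pct = 100 * excess / baseline

print(f"Historical-average baseline: {baseline:.2f} per 10,000")
print(f"Excess in 2020: {excess:.2f} per 10,000 ({excess_pct:.1f}%)")
```

The machine learning control method replaces `baseline` with a model prediction for each municipality; the subtraction that yields the excess stays the same.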
A comparison of predictive accuracy across the different methods
| Method | MSE, March 31, 2019 | MAE, March 31, 2019 | MSE, June 30, 2019 | MAE, June 30, 2019 | MSE, September 30, 2019 | MAE, September 30, 2019 |
|---|---|---|---|---|---|---|
| Panel A: performance on all municipalities | ||||||
| Intuitive (historical average) | 216.59 | 8.70 | 665.63 | 15.75 | 1,082.81 | 20.32 |
| Intuitive (past year) | 351.95 | 10.37 | 1,104.24 | 19.46 | 1,679.86 | 24.60 |
| OLS | 179.51 | 8.30 | 558.50 | 14.98 | 932.00 | 19.77 |
| LASSO | | 8.09 | | 14.56 | | 18.86 |
| Random forest | 179.97 | 8.16 | 557.23 | 14.63 | 914.23 | 18.94 |
| Boosting | 179.01 | 8.20 | 555.13 | 14.82 | 902.96 | 19.15 |
| Panel B: performance by population size | ||||||
| < 2,000 inhabitants (3,457 municipalities) | ||||||
| Intuitive (historical average) | 450.36 | 14.31 | 1,379.64 | 26.06 | 2,240.62 | 33.70 |
| Intuitive (past year) | 733.24 | 16.78 | 2,309.04 | 32.26 | 3,488.76 | 40.64 |
| OLS | 370.56 | 13.48 | 1,140.86 | 24.15 | 1,877.86 | 31.15 |
| LASSO | | 13.33 | | 23.93 | 1,846.62 | 30.68 |
| Random forest | 374.36 | 13.52 | 1,152.28 | 24.20 | 1,882.25 | 31.09 |
| Boosting | 370.09 | 13.34 | 1,138.49 | 24.08 | | 30.79 |
| Between 2,000 and 5,000 inhabitants (2,030 municipalities) | ||||||
| Intuitive (historical average) | 56.91 | 5.89 | 180.05 | 10.41 | 297.08 | 13.35 |
| Intuitive (past year) | 91.13 | 7.38 | 275.80 | 12.92 | 446.85 | 16.43 |
| OLS | 48.42 | 5.57 | 159.15 | 9.90 | 285.05 | 13.41 |
| LASSO | | 5.34 | | 9.53 | 257.80 | 12.61 |
| Random forest | 46.36 | 5.35 | 150.95 | 9.57 | | 12.49 |
| Boosting | 47.27 | 5.42 | 155.45 | 9.75 | 264.38 | 12.80 |
| Between 5,000 and 50,000 inhabitants (2,265 municipalities) | ||||||
| Intuitive (historical average) | 16.69 | 3.12 | 53.39 | 5.66 | 88.81 | 7.27 |
| Intuitive (past year) | 26.16 | 3.84 | 78.15 | 6.85 | 130.74 | 8.78 |
| OLS | 16.66 | 3.25 | 61.89 | 6.24 | 124.30 | 8.98 |
| LASSO | 14.73 | 2.99 | 50.86 | 5.50 | 89.91 | 7.37 |
| Random forest | | 2.96 | | 5.35 | | 7.20 |
| Boosting | 16.70 | 3.25 | 57.50 | 5.95 | 101.84 | 7.99 |
| ⩾ 50,000 inhabitants (146 municipalities) | ||||||
| Intuitive (historical average) | 2.56 | 1.30 | 9.09 | 2.44 | | 2.93 |
| Intuitive (past year) | 4.14 | 1.59 | 14.17 | 2.96 | 24.87 | 3.85 |
| OLS | 4.85 | 1.78 | 25.81 | 4.08 | 61.42 | 6.31 |
| LASSO | 2.96 | 1.35 | 14.16 | 2.97 | 26.43 | 4.00 |
| Random forest | | 1.20 | | 2.31 | 16.33 | 3.12 |
| Boosting | 4.58 | 1.73 | 19.48 | 3.50 | 37.20 | 4.79 |
Best-performing method in terms of MSE in bold
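For reference, the two accuracy metrics reported in the table can be computed as in the sketch below. The function names and the toy figures are hypothetical and are not taken from the paper. The last lines use the Panel A values for March 31, 2019 to express the boosting MSE relative to the historical-average MSE.

```python
def mse(observed, predicted):
    """Mean squared error: average of the squared prediction errors."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def mae(observed, predicted):
    """Mean absolute error: average of the absolute prediction errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical mortality rates (per 10,000 inhabitants) for three municipalities.
observed = [10.0, 12.0, 8.0]
predicted = [9.0, 14.0, 8.0]
toy_mse = mse(observed, predicted)  # (1 + 4 + 0) / 3
toy_mae = mae(observed, predicted)  # (1 + 2 + 0) / 3

# Panel A, March 31, 2019: MSE of the historical average versus boosting.
mse_historical, mse_boosting = 216.59, 179.01
relative_reduction = 100 * (mse_historical - mse_boosting) / mse_historical
print(f"MSE reduction of boosting over the historical average: {relative_reduction:.1f}%")
```

Because MSE squares the errors, it penalizes large misses in small municipalities more heavily than MAE, which may explain why the two metrics do not always rank the methods identically.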
Fig. 1 Percentage of municipal excess deaths detected from February 21, 2020, to September 30, 2020, with respect to the counterfactual scenario estimated via ML techniques. A From February 21, 2020, to March 31, 2020 (note: excess mortality estimates for the north of Italy, 23,603; official number of COVID-19 deaths, 11,011. Gap between these estimates on March 31, 12,592). B From February 21, 2020, to June 30, 2020 (note: excess mortality estimates for the north of Italy, 40,001; official number of COVID-19 deaths, 29,752. Gap between these estimates on June 30, 10,249). C From February 21, 2020, to September 30, 2020 (note: excess mortality estimates for the north of Italy, 39,362; official number of COVID-19 deaths, 30,580. Gap between these estimates on September 30, 8,782)
Fig. 2 Percentage of municipal excess deaths detected from February 21, 2019, to June 30, 2019, with respect to predicted deaths estimated via ML techniques. Note: Excess mortality estimates (measurement error) at the country level: 1,365
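The gaps quoted in the notes to Fig. 1 are simple differences between the estimated excess deaths for the north of Italy and the official COVID-19 death counts. The short sketch below recomputes them from the figures in the caption; the dictionary layout and variable names are ours.

```python
# (estimated excess deaths in the north of Italy, official COVID-19 deaths, reported gap)
fig1_notes = {
    "March 31": (23_603, 11_011, 12_592),
    "June 30": (40_001, 29_752, 10_249),
    "September 30": (39_362, 30_580, 8_782),
}

# Recompute each gap as estimated excess deaths minus official COVID-19 deaths.
gaps = {date: est - official for date, (est, official, _) in fig1_notes.items()}

for date, (est, official, reported_gap) in fig1_notes.items():
    gap_share = 100 * gaps[date] / est  # gap as a share of estimated excess deaths
    print(f"{date}: gap {gaps[date]:,} ({gap_share:.1f}% of estimated excess deaths)")
```

The recomputed gaps match the reported ones, and the gap narrows between March 31 and September 30.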
Share of excess deaths “observed” in 2019 and in 2020 by population size
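| Population size | Share of municipalities with excess deaths above 50%, 2019 | Share of municipalities with excess deaths above 50%, 2020 | Share of municipalities with excess deaths above 100%, 2019 | Share of municipalities with excess deaths above 100%, 2020 | Share of municipalities with excess deaths above 300%, 2019 | Share of municipalities with excess deaths above 300%, 2020 |
|---|---|---|---|---|---|---|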
| Overall | 8.81% | 21.64% | 2.32% | 9.50% | 0.05% | 0.37% |
| Less than 2,000 inhabitants | 17.01% | 26.57% | 5.12% | 11.86% | 0.12% | 0.66% |
| Inhabitants ⩾ 2,000 and < 5,000 | 4.53% | 20.98% | 0.30% | 8.71% | 0.00% | 0.15% |
| Inhabitants ⩾ 5,000 and < 50,000 | 0.71% | 15.69% | 0.00% | 7.05% | 0.00% | 0.13% |
| More than 50,000 inhabitants | 0.00% | 5.44% | 0.00% | 2.04% | 0.00% | 0.00% |
Share of excess deaths “observed” in 2019 and in 2020 by geographic area and population size
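| Area and population size | Share of municipalities with excess deaths above 50%, 2019 | Share of municipalities with excess deaths above 50%, 2020 | Share of municipalities with excess deaths above 100%, 2019 | Share of municipalities with excess deaths above 100%, 2020 | Share of municipalities with excess deaths above 300%, 2019 | Share of municipalities with excess deaths above 300%, 2020 |
|---|---|---|---|---|---|---|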
| North | ||||||
| Overall | 9.21% | 32.48% | 2.83% | 16.12% | 0.09% | 0.66% |
| Less than 2,000 inhabitants | 17.38% | 34.75% | 5.94% | 18.61% | 0.20% | 1.14% |
| Inhabitants ⩾ 2,000 and < 5,000 | 3.98% | 33.22% | 0.35% | 14.97% | 0.00% | 0.27% |
| Inhabitants ⩾ 5,000 and < 50,000 | 0.59% | 28.64% | 0.00% | 13.39% | 0.00% | 0.25% |
| More than 50,000 inhabitants | 0.00% | 14.29% | 0.00% | 6.12% | 0.00% | 0.00% |
| Center-south | ||||||
| Overall | 8.32% | 8.15% | 1.68% | 1.25% | 0.00% | 0.00% |
| Less than 2,000 inhabitants | 16.48% | 15.20% | 3.96% | 2.48% | 0.00% | 0.00% |
| Inhabitants ⩾ 2,000 and < 5,000 | 5.22% | 5.49% | 0.22% | 0.78% | 0.00% | 0.00% |
| Inhabitants ⩾ 5,000 and < 50,000 | 0.83% | 1.49% | 0.00% | 0.09% | 0.00% | 0.00% |
| More than 50,000 inhabitants | 0.00% | 1.02% | 0.00% | 0.00% | 0.00% | 0.00% |
Variable importance ranking of the random forest algorithm (2015–2018 training sample for the period February 21 to September 30)
| Variable | Increase in node purity |
|---|---|
| Share of those aged 80+ | 5,633,789 |
| Share of those aged 65+ | 4,576,871 |
| Share of men aged 80+ | 3,579,953 |
| Share of men aged 65+ | 3,367,116 |
| Population density (inhabitants per square kilometer) | 2,521,136 |
| Population | 2,508,496 |
| Number of employees | 2,385,304 |
| Number of deaths from Jan 1 to Feb 20 (per 10,000 inhabitants) | 1,887,305 |
| Share of men | 1,853,599 |
| PM-10 (μg/m3) | 1,601,275 |
| Share of employment in manufacturing | 1,463,918 |
| Number of deaths in the previous year (per 10,000 inhabitants) | 1,449,038 |
| Degree of urbanization | 384,249.6 |
| Share of municipalities with a hospital in at least a neighboring municipality | 220,769.3 |
| Number of deaths due to road accidents (per 10,000 inhabitants) | 140,360.1 |
| Share of municipalities with a hospital | 23,915.52 |
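Raw node-purity increases are hard to compare at a glance because their scale depends on the outcome variable. As an informal reading aid, the sketch below rescales the values in the table so that they sum to 100. This normalization is our own illustration and is not part of the paper's method.

```python
# Increase in node purity, copied from the variable importance table.
importance = {
    "Share of those aged 80+": 5_633_789,
    "Share of those aged 65+": 4_576_871,
    "Share of men aged 80+": 3_579_953,
    "Share of men aged 65+": 3_367_116,
    "Population density": 2_521_136,
    "Population": 2_508_496,
    "Number of employees": 2_385_304,
    "Deaths from Jan 1 to Feb 20": 1_887_305,
    "Share of men": 1_853_599,
    "PM-10": 1_601_275,
    "Share of employment in manufacturing": 1_463_918,
    "Deaths in the previous year": 1_449_038,
    "Degree of urbanization": 384_249.6,
    "Hospital in a neighboring municipality": 220_769.3,
    "Deaths due to road accidents": 140_360.1,
    "Municipality with a hospital": 23_915.52,
}

# Rescale so that the importances sum to 100.
total = sum(importance.values())
shares = {name: 100 * value / total for name, value in importance.items()}

# Combined share of the four age-structure variables.
age_vars = [name for name in importance if "aged" in name]
age_share = sum(shares[name] for name in age_vars)

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
print(f"Age-structure variables combined: {age_share:.1f}%")
```

On this rescaled view, the four age-structure variables together account for roughly half of the total increase in node purity.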