Serafeim Moustakidis1, Christos Kokkotis2, Dimitrios Tsaopoulos3, Petros Sfikakis4, Sotirios Tsiodras5, Vana Sypsa6, Theoklis E Zaoutis7,8, Dimitrios Paraskevis6,8.
Abstract
Coronavirus disease 2019 (COVID-19) has resulted in approximately 5 million deaths around the world, with unprecedented consequences for people's daily routines and for the global economy. Despite vast increases in the time and money spent on COVID-19-related research, there is still limited information about the country-level factors that affected COVID-19 transmission and fatality in the EU. This paper focuses on identifying these risk factors using a machine learning (ML) predictive pipeline and an associated explainability analysis. To achieve this, a hybrid dataset was created from publicly available sources, comprising heterogeneous parameters from the majority of EU countries, e.g., mobility measures, policy responses, vaccinations, and demographics/generic country-level parameters. Data pre-processing and data exploration techniques were first applied to normalize the available data and reduce the feature dimensionality of the problem. Then, a linear ε-Support Vector Machine (ε-SVM) model was employed for the regression task of predicting the number of deaths in each of the first three pandemic waves (with a mean square error of 0.027 for wave 1 and less than 0.02 for waves 2 and 3). Post hoc explainability analysis was finally applied to uncover the rationale behind the decision-making mechanisms of the ML pipeline and thus enhance our understanding of how the selected country-level parameters contribute to the prediction of COVID-19 deaths in the EU.
Keywords: COVID-19; data mining; explainability; machine learning
Year: 2022 PMID: 35337032 PMCID: PMC8955542 DOI: 10.3390/v14030625
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Description of the feature categories in the employed dataset.
| Category | Description |
|---|---|
| Confirmed cases | Demonstrates the new or total confirmed cases of SARS-CoV-2 (F33 in |
| Confirmed deaths | Describes the COVID-19-related deaths (F34 and F40) |
| Hospital and intensive care units (ICU) | Describes variables that consist of data about patients in hospital and patients in intensive care units (F36 and F37) |
| Policy responses | Government Response Stringency Index, which is a composite measure based on 9 response indicators (0 to 100, 100 = strictest response) (F7 and F8) |
| Reproduction number | Real-time estimate of the effective reproduction number (R) of COVID-19 (F35) |
| Tests and positivity | Consists of variables which demonstrate information about the total number of tests per 1000, new tests per 1000, and the tests that are positive given as a rolling 7-day average (F38–39) |
| Vaccinations | Information about the vaccination doses and the booster doses that have been administered (F9–17) |
| Mobility | Includes mobility trends for places such as markets, drug stores, public areas, transport hubs, retail, recreation, places of residence, and workplaces (F1–6) |
| Generic | Includes variables that describe demographic data and quality-of-life indicators (F18–32) |
Government Response Stringency Index is a composite score, which is based on nine response indicators, including workplace closures, school closures, and travel bans.
Figure 1. COVID-19 waves for Belgium.
Description of the features extracted per wave per country.
| # | Category | Description | Aggregation | Current wave | Previous wave |
|---|---|---|---|---|---|
| F1 | Mobility | Grocery and pharmacy percent change from baseline | Mean | ✔ | |
| F2 | Mobility | Parks percent change from baseline | Mean | ✔ | |
| F3 | Mobility | Residential percent change from baseline | Mean | ✔ | |
| F4 | Mobility | Retail and recreation percent change from baseline | Mean | ✔ | |
| F5 | Mobility | Transit stations percent change from baseline | Mean | ✔ | |
| F6 | Mobility | Workplaces percent change from baseline | Mean | ✔ | |
| F7 | Policy responses | Stringency index | Mean | ✔ | ✔ |
| F8 | Policy responses | Response time | See (1) | ✔ | ✔ |
| F9 | Vaccinations | Total vaccinations (cumulative) | Last valid | ✔ | |
| F10 | Vaccinations | People vaccinated (cumulative) | Last valid | ✔ | |
| F11 | Vaccinations | People fully vaccinated (cumulative) | Last valid | ✔ | |
| F12 | Vaccinations | New vaccinations | Mean | ✔ | |
| F13 | Vaccinations | New vaccinations smoothed | Mean | ✔ | |
| F14 | Vaccinations | Total vaccinations per hundred (cumulative) | Last valid | ✔ | ✔ |
| F15 | Vaccinations | People vaccinated per hundred (cumulative) | Last valid | ✔ | ✔ |
| F16 | Vaccinations | People fully vaccinated per hundred (cumulative) | Last valid | ✔ | ✔ |
| F17 | Vaccinations | New vaccinations (smoothed) per million | Mean | ✔ | |
| F18 | Demographics | Population | Mean | ✔ | |
| F19 | Demographics | Population density | Mean | ✔ | |
| F20 | Demographics | Median age | Mean | ✔ | |
| F21 | Demographics | Aged 65 older | Mean | ✔ | |
| F22 | Demographics | Aged 70 older | Mean | ✔ | |
| F23 | Demographics | GDP per capita | Mean | ✔ | |
| F24 | Demographics | Extreme poverty | Mean | ✔ | |
| F25 | Demographics | Cardiovasc death rate | Mean | ✔ | |
| F26 | Demographics | Diabetes prevalence | Mean | ✔ | |
| F27 | Demographics | Female smokers | Mean | ✔ | |
| F28 | Demographics | Male smokers | Mean | ✔ | |
| F29 | Demographics | Handwashing facilities | Mean | ✔ | |
| F30 | Demographics | Hospital beds per thousand | Mean | ✔ | |
| F31 | Demographics | Life expectancy | Mean | ✔ | |
| F32 | Demographics | Human development index | Mean | ✔ | |
| F33 | Cases, deaths, hospitalizations, and positivity | Total cases per million | Last valid | ✔ | |
| F34 | Cases, deaths, hospitalizations, and positivity | Total deaths per million | Last valid | ✔ | |
| F35 | Cases, deaths, hospitalizations, and positivity | Reproduction number | Mean | ✔ | |
| F36 | Cases, deaths, hospitalizations, and positivity | ICU patients per million (cumulative) | Last valid | ✔ | |
| F37 | Cases, deaths, hospitalizations, and positivity | Hospitalized patients per million (cumulative) | Last valid | ✔ | |
| F38 | Cases, deaths, hospitalizations, and positivity | Total tests per thousand (cumulative) | Last valid | ✔ | |
| F39 | Cases, deaths, hospitalizations, and positivity | Positive rate given as a rolling 7-day average | Mean | ✔ | |
| F40 | Cases, deaths, hospitalizations, and positivity | Total deaths per million in the wave (cumulative) | Last valid | ✔ | |
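The two aggregation rules named in the table, "Mean" and "Last valid", can be illustrated with a minimal sketch. The function names, the sample series, and the choice to skip missing days when averaging are assumptions made for illustration; the record does not state how the authors handled missing values.

```python
import math
from statistics import fmean


def _is_valid(x):
    """True for a usable observation (not None and not NaN)."""
    return x is not None and not (isinstance(x, float) and math.isnan(x))


def wave_mean(values):
    """Mean of the valid daily values in a wave (assumption: missing days are skipped)."""
    valid = [v for v in values if _is_valid(v)]
    return fmean(valid) if valid else None


def wave_last_valid(values):
    """Last valid daily value in a wave, suited to cumulative series."""
    for v in reversed(values):
        if _is_valid(v):
            return v
    return None


# Hypothetical daily series for one country within one wave.
stringency = [40.0, 60.0, None, 80.0]
total_vaccinations = [100.0, 150.0, float("nan"), None]

features = {
    "stringency_index_mean": wave_mean(stringency),
    "total_vaccinations_last_valid": wave_last_valid(total_vaccinations),
}
print(features)
```

Taking the last valid value rather than the mean keeps cumulative quantities, such as total vaccinations, at their end-of-wave level.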
Figure 2. Model graph for support vector regression.
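As a rough illustration of what a linear ε-SVM regressor computes, the sketch below evaluates a linear prediction and the standard ε-insensitive loss, which ignores errors smaller than ε. The weights, bias, inputs, and ε value are hypothetical placeholders, not fitted values from the paper.

```python
def linear_predict(weights, bias, x):
    """Linear model f(x) = w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Standard SVR loss: zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Hypothetical parameters and a single normalized feature vector.
weights = [0.5, -0.25]
bias = 0.1
x = [0.4, 0.8]

y_pred = linear_predict(weights, bias, x)
loss_outside_tube = epsilon_insensitive_loss(0.25, y_pred, epsilon=0.1)
loss_inside_tube = epsilon_insensitive_loss(0.15, y_pred, epsilon=0.1)
print(y_pred, loss_outside_tube, loss_inside_tube)
```

Only the observation whose error exceeds ε contributes to the loss, which is what makes the fitted model depend on a subset of the training points.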
Predictive performance achieved by the proposed ML pipeline for each of the three waves.
| Metric | Wave 1 | Wave 2 | Wave 3 |
|---|---|---|---|
| Mean square error ¹ | 0.02707 | 0.01829 | 0.01913 |
¹ MSE was calculated on the normalized data to set a fair basis of comparison between the waves.
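The footnote's point, that MSE on normalized data is comparable across waves while raw MSE depends on the scale of the death counts, can be sketched as follows. Min-max scaling is an assumption here, since the record does not say which normalization was used, and the numbers are made up for illustration.

```python
def min_max_scale(values, lo, hi):
    """Scale values to [0, 1] using the given bounds (assumed normalization)."""
    span = hi - lo
    if span == 0:
        raise ValueError("lo and hi must differ")
    return [(v - lo) / span for v in values]


def mean_square_error(actual, predicted):
    """Average of squared differences between paired values."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical deaths per million for three countries in one wave.
actual = [100.0, 500.0, 900.0]
predicted = [200.0, 500.0, 800.0]

lo, hi = min(actual), max(actual)
raw_mse = mean_square_error(actual, predicted)
normalized_mse = mean_square_error(
    min_max_scale(actual, lo, hi), min_max_scale(predicted, lo, hi)
)
print(raw_mse, normalized_mse)
```

The raw value scales with the square of the target's units, whereas the normalized value stays on a common scale regardless of how severe a wave was.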
Figure 3. Impact of the selected risk factors for wave 1.
Figure 4. Actual versus predicted number of deaths in wave 1.
Figure 5. Impact of the selected risk factors for wave 2.
Figure 6. Actual versus predicted number of deaths in wave 2.
Figure 7. Impact of the selected risk factors for wave 3.
Figure 8. Actual versus predicted number of deaths in wave 3.