Understanding the COVID-19 pandemic prevalence in Africa through optimal feature selection and clustering: evidence from a statistical perspective.

Mohamed Lamine Sidibé1, Roland Yonaba1, Fowé Tazen1, Héla Karoui1, Ousmane Koanda1, Babacar Lèye1, Harinaivo Anderson Andrianisa1, Harouna Karambiri1.   

Abstract

The COVID-19 pandemic, which broke out in Wuhan (China) in December 2019, severely hit almost all sectors of activity in the world as a consequence of the restrictive measures imposed. Two years later, Africa still emerges as the continent least affected by the pandemic. This study analyzed COVID-19 prevalence across African countries through country-level variables prior to clustering. Using Spearman-rank correlation, multicollinearity analysis and univariate filtering, 9 country-level variables were identified from an initial set of 34 variables. These variables relate to socioeconomic status, population structure, healthcare system and environment, and the climatic setting. The 54 African countries were then clustered using the agglomerative hierarchical clustering (AHC) method, which generated 3 distinctive clusters. Cluster 1 (11 countries) is the most affected by COVID-19 (median of 63,508.6 confirmed cases and 946.5 deaths per million) and is composed of countries with the highest socioeconomic status. Cluster 2 (27 countries) is the least affected (median of 4473.7 confirmed cases and 81.2 deaths per million), and mainly features countries with the lowest socioeconomic features and international exposure. Cluster 3 (16 countries) is intermediate in terms of COVID-19 prevalence (median of 2569.3 confirmed cases and 35.7 deaths per million) and features the least urbanized countries, geographically close to the equator, with intermediate international exposure and socioeconomic features. These findings shed light on the main features of COVID-19 prevalence in Africa and might help refine strategies for managing the ongoing pandemic. Supplementary Information: The online version contains supplementary material available at 10.1007/s10668-022-02646-3.
© The Author(s), under exclusive licence to Springer Nature B.V. 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Keywords:  Africa; COVID-19; Cluster analysis; Hierarchical clustering; Pandemic; Transmission factors

Year:  2022        PMID: 36061268      PMCID: PMC9424840          DOI: 10.1007/s10668-022-02646-3

Source DB:  PubMed          Journal:  Environ Dev Sustain        ISSN: 1387-585X            Impact factor:   4.080


Introduction

Disease epidemics and even pandemics are becoming increasingly common (Madhav et al., 2017). In December 2019, the 2019-nCoV acute respiratory disease (hereafter named ‘COVID-19’) emerged. This disease was later found to be caused by the SARS-CoV-2 coronavirus, isolated for the first time in the province of Hubei (China). The World Health Organization (WHO) declared the disease a pandemic two months later (Cucinotta & Vanelli, 2020). As of December 31, 2020, a year later, the global epidemiological situation indicated a cumulative total of 83,559,591 confirmed cases and 1,824,934 deaths, i.e., a global case fatality rate of 2.18% and a mortality rate of 234 deaths per million people (Dong et al., 2020). A study published by the London School of Hygiene & Tropical Medicine (LSHTM) concluded that all African countries would have passed the 10,000-case mark for COVID-19 by early June 2020 (Pearson et al., 2020). Following this study, the vast majority of public health experts, including the WHO, called on Africa to ‘prepare for the worst’ (Nuwagira & Muzoora, 2020). Yet, the observed case count reported for Africa remained about 10 times lower than expected, suggesting that the African continent, as a whole, has remained largely unaffected: America, Europe and Asia reported, respectively, 43.4%, 28.5% and 24.8% of the global count of confirmed cases, while Africa reported only 3% (Dong et al., 2020). It is also worth noting that, even within the African continent, the pandemic is unevenly spread between countries. South Africa, for example, has reported more than 38% of cases (Dong et al., 2020). Likewise, more than 82% of the confirmed cases come from 9 countries alone (Salyer et al., 2021).
The relatively low number of cases and deaths due to COVID-19 is thought to stem largely from the fact that forecasts of the evolution of the pandemic in Africa were made without regard to specificities such as socio-demographic aspects (Zongo et al., 2020). African countries seem to be more resilient to COVID-19 because of the swift adoption of mitigation measures, the low rate of urbanization, the limited transport network and the youth of the population: the median age of the population lies between 31 and 42 years for Europe, America, Oceania and Asia, compared to 18 years for Africa (Adams et al., 2021; Desjardins, 2019; Lulbadda et al., 2021). This might explain the low number of COVID-19 cases and deaths in Africa, since non-communicable diseases (such as cancer, cardiovascular accidents and diabetes), already known as comorbidities in the context of this pandemic, are less likely to be fatal in younger people (Lawal, 2021; Randazzo et al., 2020). The role of climatic and environmental factors has also been highlighted in COVID-19-related studies. Temperature and humidity are the factors most often associated with COVID-19 (Kerr et al., 2021; Şahin, 2020). Wang et al. (2021), for instance, showed that a 1 °C rise in average temperature can be associated with a 3.1% decrease in new COVID-19 infections and a 1.2% decrease in related deaths. According to Baker et al. (2020), in the absence of effective control measures, stronger outbreaks are likely in wetter climates. Luo et al. (2020) found a significant influence of absolute air temperature on transmission rates of COVID-19 in China. A significant correlation between geographical latitude and COVID-19-related deaths and confirmed cases has been reported in earlier studies (Braiman, 2020; Chen et al., 2021; Heneghan et al., 2020; Whittemore, 2020).
Moreover, the concentration of fine particles in the air has been associated with a higher prevalence of COVID-19 (Rizvi et al., 2021; Zhu et al., 2020). Based on the current findings, it appears that the spread of COVID-19 is affected by various factors at different levels, but also that countries exhibit different vulnerabilities to the pandemic. To assess these various sensitivities, clustering has been carried out in earlier studies to identify emerging risk profiles. Gilbert et al. (2020) used bottom-up hierarchical clustering to model transmission between Africa and China and identified three different clusters depending on the severity of the risk of exposure to COVID-19 (high, medium and low). Centroid-based (K-means) clustering was used by Carrillo-Larco and Castillo-Cara (2020) at the global level using country-level variables, which helped identify 5 to 6 clusters of countries. Imtyaz et al. (2020) assessed the effectiveness of measures taken by countries to limit the spread of COVID-19 based on 5 identified clusters and reported a positive association between the mortality rate and the proportion of people over 65 years of age. Sadeghi et al. (2021) used hierarchical clustering to rank and score 180 countries according to COVID-19 cases and fatality in 2020 and to compare existing pandemic vulnerability prediction models and standard epidemiological scoring techniques. In Africa, the African Center for Strategic Studies (ACSS) identified 3 clusters of African countries by assessing their level of vulnerability through the rating of the following 9 socioeconomic factors: international exposure, healthcare system, urban density, urban population, population age, governmental transparency, press freedom, conflict and displacement (ACSS, 2020a). These clusters later served in establishing different risk profiles of exposure to COVID-19 (ACSS, 2020b).
However, the omission of climatic factors might constitute a serious limitation of these results. Despite the large number of publications related to COVID-19, very few have focused on the African continent. This study aims at addressing this critical gap in the available literature by evaluating the relative importance of factors that might be associated with the spread of the COVID-19 pandemic within the African context, using country-level indicators. To the best of our knowledge, this is the first study directly addressing COVID-19 at the scale of the African continent while considering both confirmed cases and reported deaths. Moreover, it uses data acquired over two years (January 1, 2020 to March 31, 2022), which is likely to support conclusions that remain relevant in the long term regarding the spread of COVID-19 in Africa. In addition, this study is motivated by the lack of explicit assessment, in earlier studies, of the effect on COVID-19 of variables related to the physical climate setting (temperature, rainfall, insolation, humidity, wind speed) and the environment (air quality, environmental performance index), especially in the case of Africa. The objectives of this study are twofold: (i) to assess the potential factors that best explain COVID-19 prevalence in African countries (cumulative number of confirmed cases and deaths per million people); (ii) to identify clusters of African countries sharing similar COVID-19 prevalence profiles.

Material and methods

Figure 1 is the flowchart presenting the main steps of the methodology used in this study, which are: (i) the data preparation phase (including data collection, imputation of missing values and removal of potential redundant variables); (ii) the dataset optimization phase (consisting in filtering optimal variables explaining the maximum variance in the data); (iii) the clustering phase and the comparative analysis of the clusters. These phases are further described in detail in the following sections. The complete list of the 54 African countries considered in this study is presented in the Online Resource 2 (ESM2—Table S1).
Fig. 1

Flowchart of the methodology used in this study


Data preparation

Selected country-level variables description

Based on a literature review, a set of factors previously associated with the spread of the COVID-19 pandemic has been identified and included in this study. These variables, presented in Table 1, are grouped into five categories: (i) international exposure and socioeconomic status; (ii) population structure; (iii) healthcare system and environment; (iv) disease prevalence and risk factors; and (v) climatic setting. The detailed dataset for all these variables is given in the Online Resource 2 (ESM2—Table S2).
Table 1

Country-level variables selected for this study

| Category | Variable | Description | Source |
|---|---|---|---|
| COVID-19 prevalence | conf_pm | Cumulative confirmed cases (as of 08/31/21) | Dong et al. (2020) |
| | death_pm | Cumulative confirmed deaths (as of 08/31/21) | Dong et al. (2020) |
| International exposure and socioeconomic status | arriv | International tourism, number of arrivals (thousands) | WorldBank (2021) |
| | hdi | Human development index (HDI) | UNDP (2020) |
| | gini | Gini index (metric for inequalities) | WorldBank (2021) |
| | gdp_cap | Gross domestic product per capita (GDP) ($US) | WorldBank (2021) |
| | alphab | Literacy rate (%) | WorldBank (2021) |
| Population structure | dens_pop | Population density (people/km²) | WorldBank (2021) |
| | urb_pop | Urban population percentage (%) | WorldBank (2021) |
| | median_age | Median age of the population (years old) | WorldBank (2021) |
| | life_exp | Life expectancy (years old) | WorldBank (2021) |
| | p65yrs | Percentage of people aged over 65 years (%) | WorldBank (2021) |
| Healthcare system and environment | lack_hygien | Mortality rate due to lack of hygiene, unsafe water and sanitation (per 100,000 people) | WorldBank (2021) |
| | hous_fossf | Mortality rate due to air pollution from the use of household solid fuels (per 100,000 people) | Yale (2020) |
| | med_1000 | Number of physicians (per 1000 people) | WorldBank (2021) |
| | pm25 | Annual mean concentration of particulate matter of less than 2.5 microns of diameter (PM2.5) [µg/m³] in urban areas | WorldBank (2021) |
| | health_exp | Current health expenditure per capita ($US) | WorldBank (2021) |
| | epi | Environmental performance index | Yale (2020) |
| | immuniz_dtp1 | Immunization coverage / DTP1 (%) | WHO (2020) |
| | immuniz_bcg | Immunization coverage / BCG (%) | WHO (2020) |
| Disease prevalence and risk factors | prev_diab | Diabetes prevalence (number of people) | IHME (2020) |
| | prev_cvlds | Cardiovascular diseases prevalence (number of people concerned) | IHME (2020) |
| | prev_ch.resp | Chronic respiratory diseases prevalence (number of people concerned) | IHME (2020) |
| | prev_malaria | Malaria prevalence (number of people concerned) | IHME (2020) |
| | prev_nutdef | Malnutrition and nutritional deficiencies prevalence (number of people concerned) | IHME (2020) |
| | prev_respdtub | Respiratory infections and tuberculosis prevalence (number of people concerned) | IHME (2020) |
| | alcohol_cons | Total alcohol consumption per capita (liters) | WorldBank (2021) |
| Climatic setting | lat_abs | Absolute latitude (°) | Gelaro et al. (2017) |
| | ws2m_avg | Average daily wind speed (m/s) | Gelaro et al. (2017) |
| | rh2m_avg | Average daily relative humidity (%) | Gelaro et al. (2017) |
| | tmax_avg | Average daily maximum temperature (°C) | Gelaro et al. (2017) |
| | tmin_avg | Average daily minimum temperature (°C) | Gelaro et al. (2017) |
| | tmoy_avg | Average daily temperature (°C) | Gelaro et al. (2017) |
| | insol_avg | Average daily insolation (MJ/m²/day) | Gelaro et al. (2017) |
| | tdew_avg | Average dew point temperature (°C) | Gelaro et al. (2017) |
| | ah_avg | Absolute air humidity (%), calculated (Iribarne & Godson, 1973) | Gelaro et al. (2017) |

The data was collected at the country level for the latest year available (2020 in most cases). Climatic setting variables were collected from the MERRA-2 global reanalysis, a gridded global model operating at an hourly/daily timestep and a spatial resolution of 0.625° × 0.5°, providing data since 1980 (Gelaro et al., 2017). For this study, the climate data was retrieved from the NASA POWER Data Access Viewer (https://power.larc.nasa.gov/data-access-viewer/) using the R package nasapower (Sparks, 2021). The climate data was first collected as daily time series for the period January 2020 to March 2022 and then averaged over the period for each country. Absolute humidity (AH) was calculated using an approximation of the Clausius–Clapeyron equation, presented in Eq. (1) (Iribarne & Godson, 1973):

$$AH = \frac{6.112 \times e^{\frac{17.67\,T}{T + 243.5}} \times RH \times 2.1674}{273.15 + T} \qquad (1)$$

where $AH$ is the absolute humidity, $T$ is the average temperature (°C), $RH$ is the air relative humidity (%), and $e$ is the base of the natural logarithm.

These variables were selected because they have been previously associated with the COVID-19 pandemic. International exposure reflects the number of people entering a country through airports, which denotes an increased probability of importing confirmed cases of COVID-19 into the country (Moosa & Khatatbeh, 2020). Socioeconomic variables describe the level of development of countries and are closely related to healthcare system equipment and the effectiveness of policy management during health crises (Freed et al., 2020). Population structure has been associated with COVID-19 prevalence, especially with deaths (Medford & Trias-Llimós, 2020). Healthcare systems, environment and disease prevalence have also been identified as determinants of COVID-19 prevalence (Aydın et al., 2021; Carrillo-Larco & Castillo-Cara, 2020).
Finally, the relationship between climate variables and COVID-19 has been an active topic of research since the outbreak of the pandemic (Chen et al., 2021; Islam et al., 2021; Rahman et al., 2021; Singh et al., 2021; Zaitchik et al., 2020). The COVID-19 prevalence data used in this study covers cumulative confirmed cases and deaths from the outbreak of the pandemic until March 31, 2022 for all 54 African countries. The data was normalized by current country population estimates (WorldBank, 2021) to enable the comparison of the pandemic prevalence across countries, as suggested by Goldstein and Lee (2020). The counts were thus expressed as confirmed cases per million people (conf_pm) and deaths per million people (death_pm).
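The population normalization described above can be illustrated with a minimal Python sketch. The function name `per_million` and the example figures are hypothetical and serve only to show the arithmetic; they are not taken from the study's dataset.

```python
def per_million(count, population):
    """Express a raw cumulative count as a rate per million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 1_000_000 / population


# Hypothetical country: 1,200 cumulative cases in a population of 2.4 million.
example_conf_pm = per_million(1_200, 2_400_000)
print(example_conf_pm)
```

Multiplying before dividing keeps the intermediate value an exact integer for integer inputs, which avoids small floating-point artifacts in the result.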

Dataset imputation

The data collected for the country-level variables listed in Table 1 initially presented gaps for specific countries such as Eritrea (ERY), Equatorial Guinea (GNQ), Libya (LBY), Somalia (SOM), South Sudan (SSD) and Lesotho (LSO). Missing values were identified for international arrivals (arriv: 4 missing values out of 54), Gini index (gini: 2 out of 54), number of physicians (med_1000: 1 out of 54), current health expenditure per capita (health_exp: 1 out of 54), environmental performance index (epi: 3 out of 54), alcohol consumption (alcohol_cons: 1 out of 54) and exposure to air pollution from household fossil fuels (hous_fossf: 3 out of 54). To form a complete initial dataset, the missing values were imputed using the R package missForest (Stekhoven, 2013), which implements an iterative, nonparametric gap-filling approach based on random forests (Stekhoven & Buhlmann, 2012). The missForest algorithm has been used in previous COVID-19 research (Gangloff et al., 2021) and has proven to be more effective at gap-filling than other nonparametric approaches (Ramosaj & Pauly, 2019). In this study, the random forest model was trained on the matrix formed by the initially selected variables using 100 trees and 3 iterations, yielding a normalized root mean square error (NRMSE) of 0.1480. The detailed dataset for all variables is given in the Online Resource 2 (ESM2, Table S3).
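To make the NRMSE figure easier to interpret, the sketch below computes a normalized root mean square error in the form given by Stekhoven and Buhlmann (2012): the root of the mean squared imputation error divided by the variance of the true values. This is an illustrative Python stand-in, not the missForest implementation: it assumes the true values are known (in practice the package reports an out-of-bag estimate), and the use of the population variance is an assumption of this sketch.

```python
from math import sqrt


def nrmse(true_values, imputed_values):
    """Normalized RMSE: sqrt(mean squared error / variance of the true values)."""
    n = len(true_values)
    mean_true = sum(true_values) / n
    mse = sum((t - i) ** 2 for t, i in zip(true_values, imputed_values)) / n
    variance = sum((t - mean_true) ** 2 for t in true_values) / n  # population variance (assumption)
    return sqrt(mse / variance)


# Hypothetical check: a perfect imputation gives 0, imputing the mean everywhere gives 1.
perfect = nrmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = nrmse([1.0, 3.0], [2.0, 2.0])
print(perfect, mean_only)
```

Under this definition, values well below 1 indicate that the imputation does better than simply filling gaps with the variable mean.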

Identification and removal of redundant factors

The initial dataset contained 34 variables, which were assessed for potential redundancy and multicollinearity. To this end, the correlation matrix (Spearman’s nonparametric coefficient) was computed, and a threshold of 0.90 was used to eliminate country-level variables highly correlated (ρ > 0.90) with a variable already in the dataset, to lessen redundancy. To further avoid potential multicollinearity issues and form a dataset of independent variables, the variance inflation factor (VIF) was evaluated for the remaining country-level variables. The VIF is a measure of multicollinearity in a set of multiple regression variables, and is defined as the ratio of the overall model variance to the variance of a model including a single independent variable (Akinwande et al., 2015). The VIF is defined as in Eq. (2):

$$VIF_i = \frac{1}{1 - R_i^2} \qquad (2)$$

where $VIF_i$ is the VIF for the $i$-th independent variable and $R_i^2$ is the unadjusted coefficient of determination obtained by regressing the $i$-th independent variable on the remaining ones. In this study, only variables presenting a VIF value below 10 were retained, this threshold being commonly advised as a cutoff for high multicollinearity (Kutner et al., 2004).
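The two screening quantities used in this step can be sketched in a few lines of Python. The helper names (`average_ranks`, `spearman`, `vif_from_r2`) and the sample values are hypothetical; the sketch only illustrates how Spearman's coefficient (a Pearson correlation computed on ranks, with ties given average ranks) and the VIF relate to the 0.90 and 10 thresholds, and it is not the code used in the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def spearman(u, v):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(u), average_ranks(v))


def vif_from_r2(r_squared):
    """VIF of a variable given the R^2 of its regression on the other variables."""
    return 1.0 / (1.0 - r_squared)


# Hypothetical variables: y grows monotonically (but nonlinearly) with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]
rho = spearman(x, y)
redundant = rho > 0.90          # redundancy threshold quoted in the text
vif_at_cutoff = vif_from_r2(0.9)  # R^2 = 0.9 corresponds to the VIF cutoff of 10
print(rho, redundant, vif_at_cutoff)
```

The last line shows why a VIF cutoff of 10 is equivalent to requiring that no variable be explained at more than 90% by the others.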

Optimal feature selection

Optimal feature selection is a dimensionality reduction method that retains a subset of relevant variables maximizing the variance of the original dataset, while minimizing the loss of information resulting from the removal of some of the original variables (Friedman, 1997). However, the procedure is likely to be affected by the presence of atypical observations, i.e., outliers. It is therefore critical to identify and remove these outliers from the dataset before looking for optimal variables (Acuña & Rodríguez, 2005). In this study, outliers were identified using the multivariate Cook's distance statistic (Cook & Weisberg, 1982). Cook’s D-statistic is calculated by removing the $i$-th observation from the model and refitting the regression, hence summarizing how much all the fitted values in the regression model change when that observation is removed. The calculation of Cook’s distance is defined by Eq. (3):

$$D_i = \frac{\sum_{j=1}^{n} \left( \hat{y}_j - \hat{y}_{j(i)} \right)^2}{p \times MSE} \qquad (3)$$

where $D_i$ is the Cook distance for the $i$-th observation (for $i = 1, \ldots, n$), $\hat{y}_j$ is the $j$-th regression model response fitted on all observations, $\hat{y}_{j(i)}$ is the $j$-th regression model response fitted on all but the $i$-th observation, $p$ is the number of coefficients in the regression model, and $MSE$ is the mean square error. The D-statistic was calculated through a linear regression model, using a cutoff of 4 times the standard deviation to flag outlier observations (Cook & Weisberg, 1982). The identified outliers were temporarily set aside in order to avoid biasing the subsequent feature selection. The optimal feature selection was performed through a univariate filter using the sbf (selection by filter) function of the R caret package (Kuhn, 2021). Seventy-five percent of the data was used for training and 25% for validation, using repeated tenfold cross-validation. This procedure was conducted separately with confirmed cases (conf_pm) and deaths (death_pm) as response variables.
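As an illustration of the leave-one-out definition of Cook's distance given above, the following Python sketch refits a simple one-predictor linear regression with each observation removed in turn. It is a simplified stand-in (the study used a multivariate model in R); the function names and the sample data are hypothetical, and no outlier cutoff is applied here.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def cooks_distances(xs, ys):
    """Cook's distance of each observation, computed by explicit refitting."""
    n, p = len(xs), 2  # p = number of coefficients (intercept and slope)
    intercept, slope = fit_line(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    mse = sum((y - f) ** 2 for y, f in zip(ys, fitted)) / (n - p)
    distances = []
    for i in range(n):
        # Refit without observation i, then predict all n observations.
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        shift = sum((f - (a_i + b_i * x)) ** 2 for f, x in zip(fitted, xs))
        distances.append(shift / (p * mse))
    return distances


# Hypothetical data: the last point departs strongly from the linear trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
d = cooks_distances(xs, ys)
most_influential = d.index(max(d))
print(most_influential, [round(v, 3) for v in d])
```

In this example the final observation has both a large residual and high leverage, so it receives the largest distance and would be the first candidate to set aside.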
Prior to clustering, the resulting dataset was finally rescaled using min–max normalization to bring all the variables to the same scale between 0 and 1 (Visalakshi & Suguna, 2009).
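Min–max normalization maps each variable linearly so that its minimum becomes 0 and its maximum becomes 1. A minimal Python sketch follows; the function name `min_max` and the handling of constant columns (mapped to 0) are choices made for this illustration.

```python
def min_max(values):
    """Rescale a sequence linearly to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no contrast; map it to 0 (assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical variable measured in three countries.
scaled = min_max([2.0, 4.0, 6.0])
print(scaled)
```

Rescaling matters for distance-based clustering because variables expressed in large units (for example GDP per capita) would otherwise dominate the distances between countries.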

Clustering of countries

Clustering aims at partitioning the set of African countries into homogeneous groups called clusters. These clusters are obtained by maximizing the inertia between clusters and therefore minimizing the inertia within clusters, to obtain well-differentiated groups of observations. Clustering methods include hierarchical clustering, partitioning methods and machine learning-based methods. In this study, bottom-up or agglomerative hierarchical clustering (AHC) was used, as it does not require the number of clusters as an input, unlike partitioning methods such as K-means. Machine learning-based methods are also available and effective; however, their results are harder to interpret than those of AHC. The AHC procedure used in this study provides the analyst with a dendrogram, whose quality can be assessed through the correlation between the cophenetic distances between observations (vertical y-axis on the dendrogram) and the original distances. The closer this correlation coefficient is to 1, the more faithfully the dendrogram reflects the data. Cophenetic correlation coefficients above 0.5 are deemed acceptable (Kassambara, 2017). In this study, the cophenetic correlation coefficient was examined for the clustering schemes produced by combining four distance metrics (Manhattan, Canberra, Minkowski and Euclidean) with four aggregation methods (Average, Complete, Ward.D and Ward.D2), hence a total of 16 combinations. These combinations were ranked by decreasing value of the cophenetic correlation coefficient. For each combination, the optimal number of clusters was evaluated with the R package NbClust (Charrad et al., 2014), which uses an array of indices to select the appropriate number of clusters (Charrad et al., 2014; Milligan & Cooper, 1985).
Using this number of clusters, the AHC is applied and the statistical differences in COVID-19 prevalence between clusters are assessed. The final combination of distance metric and aggregation method selected is the one producing significantly different clusters (in terms of COVID-19 prevalence) with the highest cophenetic correlation coefficient. The significance of differences in COVID-19 prevalence between clusters (confirmed cases and deaths per million) was further evaluated with the nonparametric Kruskal–Wallis test for multiple group comparison (at level α = 5%), associated with the post hoc nonparametric Mann–Whitney U test for pairwise comparison of group medians (at level α = 5%). The differences between clusters in the distribution of the variables used to form them were assessed in the same way.
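The cophenetic correlation coefficient used to rank the clustering schemes can be illustrated with a small pure-Python sketch. It implements only one of the 16 combinations mentioned above (Euclidean distance with average linkage) and is a simplified stand-in for the R workflow, not the study's code; the function names and the four sample points are hypothetical.

```python
from itertools import combinations
from math import sqrt


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def cophenetic_correlation(points):
    """Average-linkage AHC, then correlate merge heights with original distances."""
    n = len(points)
    base = {(i, j): euclidean(points[i], points[j]) for i, j in combinations(range(n), 2)}

    def linkage(c1, c2):
        # Average linkage: mean pairwise distance between members of two clusters.
        total = sum(base[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]
    cophenetic = {}
    while len(clusters) > 1:
        # Merge the two closest clusters (bottom-up agglomeration).
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        height = linkage(clusters[a], clusters[b])
        # The cophenetic distance of two points is the height at which they first join.
        for i in clusters[a]:
            for j in clusters[b]:
                cophenetic[min(i, j), max(i, j)] = height
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]

    pairs = sorted(base)
    return pearson([base[k] for k in pairs], [cophenetic[k] for k in pairs])


# Hypothetical normalized observations forming two well-separated groups.
sample_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
coefficient = cophenetic_correlation(sample_points)
print(round(coefficient, 3))
```

With two clearly separated groups the dendrogram preserves the original distances almost perfectly, so the coefficient is close to 1; noisier data would give lower values, which is what the ranking of the 16 schemes exploits.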

Results

COVID-19 situation in Africa

In this section, several aspects of the epidemiological situation of the pandemic are presented: the chronological onset of COVID-19 in Africa, the evolution of the cumulative number of cases and deaths and the spatial and temporal evolution of COVID-19 within the African continent.

Chronological onset of COVID-19

Africa reported its first COVID-19 case in Egypt (EGY), on February 14, 2020. Other North African countries such as Algeria (DZA), Tunisia (TUN) and Morocco (MAR) reported their first cases a few days later. The first countries to be affected appear to be those with higher international exposure (Online Resource 1, ESM1–Fig. S1). These countries are also those farthest from the equator, such as Tunisia (TUN), Egypt (EGY) and South Africa (ZAF). Less than 2 months after the first case was reported in Egypt, 52 of the 54 countries (96.3%) had reported at least one confirmed case. For most countries, the date of the first reported death follows quite closely the date of the first confirmed case, with a lag of 2 to 125 days. Only Seychelles (SYC) did not report a single death in 2020, despite a first confirmed case being reported on March 14, 2020.

Cumulative number of cases and deaths

Figure 2 shows the cumulative numbers of cases and deaths, along with the daily estimates, in Africa from the beginning of the pandemic up to March 31, 2022.
Fig. 2

COVID-19 prevalence evolution in Africa. a Cumulative confirmed cases and deaths. b Daily confirmed cases and deaths

From February 14, 2020 to March 31, 2022, a total of 11,558,931 COVID-19 confirmed cases and 251,953 related deaths were reported in Africa (roughly 2.37% of the total number of worldwide cases and 4.10% of worldwide deaths). This yields an average of 8848 confirmed cases per million and 193 deaths per million in Africa for this period, compared to 67,977 cases per million and 809.1 deaths per million globally (Dong et al., 2020). These figures translate to a case fatality rate of 2.17% in Africa, nearly twice the global case fatality rate of 1.19% (Dong et al., 2020). This shows that even though Africa is much less affected than the rest of the world, COVID-19 lethality in Africa is more severe (Lawal, 2021). As of March 31, 2022, the 10 hardest-hit countries are: Seychelles (SYC: 414,044 cases per million, 1680 deaths per million), Cameroon (CMR: 217,378 cases per million, 3504 deaths per million), Mauritius (MUS: 166,186 cases per million, 765 deaths per million), Botswana (BWA: 132,624 cases per million, 1166 deaths per million), Tunisia (TUN: 88,577 cases per million, 2422 deaths per million), Libya (LBY: 74,026 cases per million, 947 deaths per million), South Africa (ZAF: 63,509 cases per million, 1708 deaths per million), Namibia (NAM: 63,197 cases per million, 1611 deaths per million), Swaziland (SWZ: 60,769 cases per million, 1214 deaths per million) and Morocco (MAR: 31,895 cases per million, 440 deaths per million).
In terms of raw confirmed case and death counts, the top 10 countries are South Africa (ZAF), Morocco (MAR), Tunisia (TUN), Egypt (EGY), Libya (LBY), Ethiopia (ETH), Kenya (KEN), Zambia (ZMB), Botswana (BWA) and Algeria (DZA). These countries account for 74.5% of confirmed cases and 80.1% of deaths, yet their cumulative population accounts for 33.8% of the continent's population (Dong et al., 2020). It is also worth noting that most of these countries are located in the northernmost (or southernmost) parts of the continent and feature a high standard of living and high international exposure. Over the study period, the African continent experienced four waves of increasing magnitude, both for new daily cases and deaths, as shown in the daily estimates presented in Fig. 2b. The first wave peaked around July–September 2020, the second in January–February 2021, the third in early August 2021 and the fourth (of similar magnitude to the third) in February 2022. A strong periodicity of around 6 months is also observed.
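The continental case fatality rate quoted in this section follows directly from the cumulative totals. The short Python sketch below redoes the arithmetic from the figures given in the text (11,558,931 confirmed cases, 251,953 deaths, global case fatality rate of 1.19%); the function name is hypothetical, and the result may differ from the reported 2.17% in the last digit depending on rounding.

```python
def case_fatality_rate(deaths, confirmed_cases):
    """Case fatality rate in percent: deaths divided by confirmed cases."""
    return 100.0 * deaths / confirmed_cases


africa_cfr = case_fatality_rate(251_953, 11_558_931)
global_cfr = 1.19  # global value quoted in the text (Dong et al., 2020)
ratio_to_global = africa_cfr / global_cfr
print(round(africa_cfr, 2), round(ratio_to_global, 2))
```

The ratio comes out somewhat below 2, which is why the African case fatality rate is best described as nearly, rather than exactly, twice the global one.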

Spatial and temporal evolution of COVID-19

The spatial and temporal spread of COVID-19 in Africa is presented on choropleth maps in Fig. 3 at four dates (September 30, 2020; March 31, 2021; September 30, 2021; March 31, 2022).
Fig. 3

Choropleth map showing the spatial and temporal spread of COVID-19 cumulative cases and deaths in Africa over the period January 2020 to March 2022. a–d Cumulative cases per million people. e–h Cumulative deaths per million people

From the beginning of the pandemic in Africa to March 31, 2022, the countries located at the extremities of the continent (north and south) appear to be the most affected in terms of cumulative confirmed cases and deaths per million, while countries closer to the equator seem less affected. The Southern Africa region in particular is the most affected region on the continent in terms of prevalence. This region, which is home to only 13.52% of the people living in Africa (WorldBank, 2021), accounts for 47.1% of cumulative confirmed cases and 49.8% of deaths. This is in sharp contrast to the findings of Heneghan et al. (2020), who stated that the pandemic had a higher prevalence in the Northern hemisphere of the continent. In contrast, the West Africa region, which is home to 29.67% of the African population, reported only 7.8% of cumulative confirmed cases and 5.2% of deaths. The North Africa region, with 18.69% of the continent’s population, reported 31.1% of confirmed cases and 34.9% of deaths, while the East Africa region reported 11.0% of confirmed cases and 8.3% of deaths. The Central Africa region is the least affected by the pandemic, with only 3.0% of confirmed cases and 1.7% of deaths (Online Resource 1, ESM1–Fig. S2-S5).

Selection of optimal factors for COVID-19 prevalence analysis

Spearman’s correlation rank analysis

Figure 4 shows Spearman’s correlation matrix for the complete dataset of 34 country-level variables initially selected and their association with the two response variables, which are the cumulative confirmed cases per million (conf_pm) and deaths per million (death_pm). The complete correlation matrix values and associated significance (p values) are given in the Online Resource 2 (ESM2, Tables S4 and S5).
Fig. 4

Spearman’s rho correlation coefficient between country-level variables and their association to COVID-19 prevalence in African countries. Blank values show nonsignificant correlation coefficients (at α = 5% level)

The variables highly and positively correlated with COVID-19 prevalence (values given for cumulative confirmed cases and deaths, respectively) are the Human Development Index (hdi: ρ = 0.77, 0.73), health expenditure (health_exp: ρ = 0.76, 0.77), median age (median_age: ρ = 0.74, 0.71), literacy rate (alphab: ρ = 0.72, 0.64), number of physicians per 1000 people (med_1000: ρ = 0.71, 0.70), gross domestic product per capita (gdp_cap: ρ = 0.71, 0.64) and mortality rate due to air pollution from the use of household solid fuels (hous_fossf: ρ = 0.70, 0.72). These results are in line with those of Gilbert et al. (2020) and ACSS (2020a) for variables reflecting the standard of living, and of Lulbadda et al. (2021) for the age of the population. Some variables are also found to be highly and negatively correlated with COVID-19 prevalence, including the mortality related to the lack of hygiene (lack_hygien: ρ = −0.78, −0.79), prevalence of malaria (prev_malaria: ρ = −0.71, −0.79), the lack of hygiene related mortality (lack_hygien: ρ = −0.77, −0.70) and prevalence of malnutrition and nutritional deficiencies (prev_nutdef: ρ = −0.62, −0.53). Such findings are in line with the recent work of Weiss et al. (2021).
Moderate association is found between COVID-19 prevalence (confirmed cases and deaths per million, respectively) and prevalence of nutritional deficiencies (prev_nutdef: ρ = −0.62, −0.53), immunization coverage with BCG (immuniz_bcg: ρ = 0.57, 0.50), life expectancy (life_exp: ρ = 0.55, 0.56), percentage of people aged over 65 years (p65yrs: ρ = 0.54, 0.58), environmental performance index (epi: ρ = 0.54, 0.56), prevalence of respiratory diseases and tuberculosis (prev_respdtub: ρ = −0.51, −0.45), prevalence of chronic respiratory diseases (prev_ch.resp: ρ = −0.46, −0.37), PM2.5 air pollution (pm25: ρ = −0.45, −0.40), daily annual maximum temperature (tmax_avg: ρ = −0.44, −0.36) and urban population (urb_pop: ρ = 0.43, 0.42). Variables such as prevalence of nutritional deficiencies (prev_nutdef), prevalence of respiratory infections and tuberculosis (prev_respdtub), prevalence of cardiovascular diseases (prev_cvlds), prevalence of diabetes (prev_diab) and prevalence of chronic respiratory diseases (prev_ch.resp) were found to be highly correlated with each other (ρ > 0.9). As such, the first four variables were removed from the dataset, leaving only the prev_ch.resp variable, which was found to be less correlated with the remaining country-level variables in the entire dataset. To further avoid potential issues of collinearity, redundant variables in the dataset were screened through the calculation of the VIF index, presented in Table 2.
Table 2

VIF values for all variables

No.  Variable       conf_pm   death_pm
1    alcohol_cons   3.11      3.11
2    dens_pop       3.36      3.36
3    urb_pop        3.59      3.59
4    pm25           3.66      3.66
5    gini           4.99      4.99
6    arriv          6.10      6.10
7    med_1000       7.48      7.48
8    prev_malaria   7.51      7.51
9    ws2m_avg       7.75      7.75
10   epi            7.83      7.83
11   alphab         7.96      7.96
12   immuniz_bcg    8.46      8.46
13   lack_hygien    8.82      8.82
14   immuniz_dtp1   8.92      8.92
15   lat_abs        9.08      9.08
16   prev_ch.resp   9.52      9.52
17   insol_avg      10.57     10.57
18   life_exp       11.48     11.48
19   p65yrs         14.63     14.63
20   hous_fossf     17.20     17.20
21   hdi            18.74     18.74
22   gdp_cap        32.48     32.48
23   health_exp     32.54     32.54
24   median_age     35.61     35.61
25   rh2m_avg       140.79    140.79
26   ah_avg         180.74    180.74
27   tmax_avg       258.91    258.91
28   tmin_avg       575.63    575.63
29   tmoy_avg       1349.89   1349.89
A total of 16 variables have VIF values below 10 and were retained for further analysis. These variables refer to international exposure (arriv), socioeconomic status (gini, alphab), population structure (dens_pop, urb_pop), healthcare systems and environment (pm25, med_1000, epi, lack_hygien, immuniz_dtp1, immuniz_bcg), disease prevalence and risk factors (alcohol_cons, prev_malaria, prev_ch.resp) and climate setting (ws2m_avg, lat_abs). The remaining variables show VIF values over 10, indicating high collinearity, and were therefore excluded.
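The VIF screening can be sketched as follows, assuming the usual definition VIF_j = 1 / (1 − R_j²), where R_j² is obtained by regressing predictor j on all other predictors. The function name `vif_table` and the synthetic columns are illustrative assumptions; only the cutoff of 10 comes from the text. Since the VIF depends on the predictors alone, it is the same whichever response variable is considered, which is consistent with the identical conf_pm and death_pm columns of Table 2.

```python
import numpy as np
import pandas as pd


def vif_table(X: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R_j^2), R_j^2 from regressing x_j on the others.

    (Hypothetical helper, written for illustration only.)
    """
    out = {}
    for col in X.columns:
        y = X[col].to_numpy(dtype=float)
        others = X.drop(columns=col).to_numpy(dtype=float)
        A = np.column_stack([np.ones(len(y)), others])  # add an intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        out[col] = 1.0 / (1.0 - r2)
    return pd.Series(out).sort_values()


# Synthetic stand-in data (NOT the study's dataset): two nearly
# collinear temperature-like columns and one independent column.
rng = np.random.default_rng(1)
n = 54
tmin = rng.normal(20, 4, n)
tmax = tmin + 10 + rng.normal(0, 0.3, n)
gini = rng.normal(40, 8, n)
demo = pd.DataFrame({"tmin_avg": tmin, "tmax_avg": tmax, "gini": gini})

vifs = vif_table(demo)
kept = vifs[vifs < 10].index.tolist()  # cutoff of 10, as in the text
print(vifs.round(2))
print("retained:", kept)
```

In this toy example the two temperature-like columns inflate each other’s VIF well above 10, while the independent column stays close to 1.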

Optimal factors subset selection

Cook’s distance statistic helped in flagging some countries as outliers for cumulative confirmed cases and deaths per million, especially 3 countries: Cabo Verde (CPV), Mauritius (MUS) and Seychelles (SYC). These countries present the highest GDP per capita values (gdp_cap: 11,099.2 $US/capita and 17,448.3 $US/capita for MUS and SYC, respectively), the highest population densities (dens_pop: 620.4 inhabitants/km² for MUS) and the highest health expenditure (health_exp: 653.3 $US/capita and 833.1 $US/capita for MUS and SYC, respectively). These countries also share the highest prevalence estimates (166,185–414,043 confirmed cases per million and 764–3,504 deaths per million). Since such atypical values are likely to affect the feature selection procedure (Online Resource 1, ESM1, Fig. S6), these countries were temporarily removed from the dataset and later re-included in the set of countries. Table 3 shows the optimal variables selected through the feature selection, ranked by order of decreasing importance and evaluated at different time points during the study period.
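For readers unfamiliar with Cook’s distance, the sketch below computes it for an ordinary least squares fit and flags influential observations. The regression model, the 4/n rule-of-thumb threshold and the synthetic data are assumptions made for this example; the text above does not specify the exact model or cutoff used in the study.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's D for an OLS fit of y on X (an intercept is added here).

    D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2
    (Hypothetical helper, written for illustration only.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    A = np.column_stack([np.ones(len(y)), X])
    n, p = A.shape
    H = A @ np.linalg.pinv(A.T @ A) @ A.T  # hat (projection) matrix
    h = np.diag(H)                         # leverages
    resid = y - H @ y
    s2 = resid @ resid / (n - p)           # residual variance estimate
    return (resid ** 2 / (p * s2)) * h / (1.0 - h) ** 2


# Synthetic stand-in data (NOT the study's dataset).
rng = np.random.default_rng(2)
n = 54
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 0.5, n)
x[0], y[0] = 25.0, 40.0  # one deliberately atypical observation

d = cooks_distance(x, y)
flagged = np.flatnonzero(d > 4 / n)  # assumed rule-of-thumb threshold
print("flagged indices:", flagged)
```

Flagged observations can then be set aside during feature selection and re-included afterwards, as described above.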
Table 3

Optimal features explaining variability in COVID-19 prevalence across African countries

Period: January 1, 2020 to September 30, 2020
Rank  Variable     conf_pm (ρ)   Variable     death_pm (ρ)
1     lack_hygien  −0.69 ***     lack_hygien  −0.62 ***
2     med_1000     0.62 ***      med_1000     0.56 ***
3     urb_pop      0.58 ***      urb_pop      0.54 ***
4     alphab       0.51 ***      alphab       0.41 **
5     epi          0.38 **       lat_abs      0.37 **
6     lat_abs      0.3 *         epi          0.33 *
7     arriv        0.18          arriv        0.16
8     gini         0.15          gini         0.02

Period: September 30, 2020 to March 31, 2021
Rank  Variable     conf_pm (ρ)   Variable     death_pm (ρ)
1     lack_hygien  −0.73 ***     lack_hygien  −0.68 ***
2     med_1000     0.68 ***      med_1000     0.63 ***
3     alphab       0.62 ***      lat_abs      0.53 ***
4     epi          0.51 ***      alphab       0.49 ***
5     urb_pop      0.46 ***      epi          0.47 ***
6     lat_abs      0.4 **        urb_pop      0.42 **
7     pm25         −0.39 **      pm25         −0.35 *
8     arriv        0.33 *        arriv        0.34 *
9     gini         0.25          gini         0.14

Period: March 31, 2021 to September 30, 2021
Rank  Variable     conf_pm (ρ)   Variable     death_pm (ρ)
1     lack_hygien  −0.74 ***     lack_hygien  −0.73 ***
2     alphab       0.65 ***      med_1000     0.63 ***
3     med_1000     0.64 ***      alphab       0.55 ***
4     epi          0.49 ***      epi          0.49 ***
5     pm25         −0.44 **      lat_abs      0.49 ***
6     urb_pop      0.43 **       pm25         −0.41 **
7     arriv        0.35 *        arriv        0.40 **
8     lat_abs      0.35 *        urb_pop      0.38 **
9     gini         0.32 *        gini         0.19

Period: September 30, 2021 to March 31, 2022
Rank  Variable     conf_pm (ρ)   Variable     death_pm (ρ)
1     lack_hygien  −0.74 ***     lack_hygien  −0.76 ***
2     alphab       0.67 ***      med_1000     0.66 ***
3     med_1000     0.66 ***      alphab       0.58 ***
4     epi          0.49 ***      epi          0.53 ***
5     pm25         −0.45 **      lat_abs      0.51 ***
6     urb_pop      0.42 **       pm25         −0.41 **
7     arriv        0.37 **       urb_pop      0.39 **
8     lat_abs      0.37 **       arriv        0.39 **
9     gini         0.31 *        gini         0.16

ρ indicates Spearman’s rank correlation coefficient. ‘***’ indicates significance at the 0.001 level. ‘**’ indicates significance at the 0.01 level. ‘*’ indicates significance at the 0.05 level. Variables are ranked in order of decreasing importance

It appears that 8 to 9 optimal features stand out as the most important ones, both for confirmed cases and deaths. The most important of these are mortality attributed to the lack of hygiene (lack_hygien), literacy rate (alphab) and number of physicians per 1000 inhabitants (med_1000). These variables are followed by EPI (epi), air pollution with PM2.5 (pm25) and urban population (urb_pop). The less important ones, with ranks varying with the analysis period, are latitude (lat_abs), international tourism (arriv) and Gini index (gini). From the above results, it appears that variables relating to the healthcare system and environment (lack_hygien, med_1000, epi, pm25), international exposure (arriv) and socioeconomic status (alphab, gini) are closely related to COVID-19 prevalence. These are followed by variables related to population structure (urb_pop) and, to a lesser extent, climatic setting (lat_abs). These 9 variables were finally used for the clustering of countries.

Clustering of African countries

Creation of clusters

The examination of various clustering schemes (as presented in Sect. 0) resulted in an optimal set of 3 clusters, produced through the combination of the Canberra distance and the Ward.D2 aggregation method. The associated cophenetic correlation is 0.585, which is considered acceptable (Kassambara, 2017). Figure 5 shows the resulting dendrogram from the AHC clustering.
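The clustering step can be sketched with SciPy as follows. This is a minimal illustration under stated assumptions: the data are synthetic, and SciPy’s `ward` linkage is used as a stand-in for the Ward.D2 method named above. SciPy documents Ward linkage as strictly defined for Euclidean distances, so applying it to Canberra distances is a heuristic choice here, not a statement about how the study’s software behaves.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist

# Synthetic stand-in: 54 "countries" x 9 positive-valued features drawn
# around three well-separated centres (NOT the study's dataset).
rng = np.random.default_rng(3)
centres = [1.0, 5.0, 25.0]
sizes = [11, 27, 16]
X = np.vstack([
    rng.normal(c, 0.1 * c, size=(m, 9)) for c, m in zip(centres, sizes)
])

d = pdist(X, metric="canberra")   # condensed Canberra distance matrix
Z = linkage(d, method="ward")     # agglomerative clustering, Ward linkage
coph_corr, _ = cophenet(Z, d)     # cophenetic correlation coefficient
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree at 3 clusters

print("cophenetic correlation:", round(float(coph_corr), 3))
print("cluster sizes:", np.bincount(labels)[1:])
```

The cophenetic correlation measures how faithfully the dendrogram preserves the original pairwise distances; values closer to 1 indicate a better fit. Other distance and linkage combinations can be compared by swapping the `metric` and `method` arguments.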
Fig. 5

Dendrogram of observations based on AHC using the optimal subset of 9 variables

The associated map in Fig. 6 shows the spatial configuration of the clusters obtained. The detailed dataset presenting the clusters and their associated features and prevalence is presented in Online Resource 2 (ESM2, Table S6).
Fig. 6

Map of the 3 clusters identified in this study

Cluster 1 is composed of 11 countries, mostly located in the Northern and Southern regions of Africa: Algeria (DZA), Botswana (BWA), Egypt (EGY), Libya (LBY), Mauritius (MUS), Morocco (MAR), Namibia (NAM), Seychelles (SYC), South Africa (ZAF), Tunisia (TUN) and Zambia (ZMB). Most of these countries feature a high socioeconomic status and large international exposure. Cluster 2 is the largest and is composed of 27 countries: Angola (AGO), Burundi (BDI), Cabo Verde (CPV), Central African Republic (CAF), Comoros (COM), Congo Brazzaville (COG), Cote d'Ivoire (CIV), Djibouti (DJI), Equatorial Guinea (GNQ), Eritrea (ERI), Eswatini (SWZ), Ethiopia (ETH), Gabon (GAB), Ghana (GHA), Kenya (KEN), Lesotho (LSO), Madagascar (MDG), Malawi (MWI), Mozambique (MOZ), Rwanda (RWA), Sao Tome and Principe (STP), Somalia (SOM), South Sudan (SSD), Sudan (SDN), Tanzania (TZA), Uganda (UGA) and Zimbabwe (ZWE). Most of these countries feature a middle to low socioeconomic status. Cluster 3 is composed of the 16 remaining countries: Benin (BEN), Burkina Faso (BFA), Cameroon (CMR), Chad (TCD), Congo Kinshasa (COD), Gambia (GMB), Guinea (GIN), Guinea-Bissau (GNB), Liberia (LBR), Mali (MLI), Mauritania (MRT), Niger (NER), Nigeria (NGA), Senegal (SEN), Sierra Leone (SLE) and Togo (TOG). These countries feature an intermediate socioeconomic status, between countries of Cluster 1 and Cluster 2. The map of clusters shows a substantial spatial differentiation. With a few exceptions, the countries in Cluster 1 (11 countries) are located at the Northern and Southern ends of the continent. The majority of countries in Cluster 2 (27 countries) are located in the Central and Eastern regions of the continent. Finally, Cluster 3 (16 countries) is essentially composed of countries located in the western part of the continent.

Statistical analysis of clusters

Cluster 1 is by far the most affected cluster (median of 63,508.6 confirmed cases per million and 946.5 deaths per million), followed by Cluster 2 (median of 4473.7 confirmed cases per million and 81.2 deaths per million) and Cluster 3 (median of 2569.3 confirmed cases per million and 35.7 deaths per million). Clusters 2 and 3 share similar orders of magnitude in terms of COVID-19 prevalence. Table 4 presents the statistical description of the 3 clusters.
Table 4

Statistical description of the 3 clusters of countries

Variable  lack_hygien          alphab               med_1000           epi
Cluster   1      2      3      1      2      3      1     2     3      1      2      3
Min       0.2    11.4   4.1    71.2   34.5   22.3   0.2   0.0   0.0    34.7   26.5   22.6
Q1        0.8    26.1   37.8   80.2   61.4   39.8   0.5   0.1   0.1    41.4   30.2   26.6
Q2        1.9    38.4   45.9   86.7   76.5   46.4   1.2   0.1   0.1    43.3   33.8   29.3
Avg       7.9    39.8   50.5   84.6   70.6   47.6   1.2   0.2   0.2    43.9   33.2   29.0
Q3        12.8   49.8   69.1   89.2   79.7   52.3   1.9   0.2   0.1    45.0   36.0   30.7
Max       34.9   86.6   101.0  95.9   94.4   86.8   2.5   0.7   0.8    58.2   45.8   38.3

‘Min’ is the minimum value, ‘Q1’ the first quartile, ‘Q2’ the second quartile, i.e., the median, ‘Avg’ is the average, ‘Q3’ is the third quartile, ‘Max’ is the maximum value

Figure 7 compares the COVID-19 prevalence between the 3 clusters. The cumulative number of confirmed cases per million (conf_pm) and deaths per million (death_pm), shown in Fig. 7a and Fig. 7b, respectively, differ significantly between Clusters 1–2 and between Clusters 1–3 (p values < 0.01). Likewise, the mortality rate (shown in Fig. 7d) is found to be significantly different for the Clusters 1–2 and 1–3 pairs (p values < 0.05). The case fatality rate, however, shown in Fig. 7c, remains similar across the 3 clusters (p values: 0.648–1.000).
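Pairwise cluster comparisons of this kind can be sketched as below. The choice of test is an assumption: this section does not name the procedure used, so the example relies on two-sided Mann–Whitney U tests with a Bonferroni adjustment (capped at 1), a common nonparametric option for small, skewed samples. The helper name `pairwise_mwu` and the synthetic values are likewise illustrative only.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mannwhitneyu


def pairwise_mwu(groups, adjust=True):
    """Two-sided Mann-Whitney U test for every pair of groups.

    With adjust=True, p values are Bonferroni-adjusted and capped at 1.
    (Hypothetical helper; the test choice is an assumption.)
    """
    pairs = list(combinations(sorted(groups), 2))
    out = {}
    for a, b in pairs:
        _, p = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
        out[(a, b)] = min(1.0, p * len(pairs)) if adjust else p
    return out


# Synthetic stand-in values per cluster (NOT the study's dataset);
# group sizes follow the 11 / 27 / 16 split reported in the text.
rng = np.random.default_rng(4)
groups = {
    1: rng.lognormal(mean=11.0, sigma=0.8, size=11),
    2: rng.lognormal(mean=8.4, sigma=0.8, size=27),
    3: rng.lognormal(mean=8.0, sigma=0.8, size=16),
}

pvals = pairwise_mwu(groups)
for pair, p in pvals.items():
    print(pair, f"{p:.4f}")
```

The Bonferroni cap explains why adjusted p values can be exactly 1.000 when two groups are very similar.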
Fig. 7

Box-plot comparison of COVID-19 prevalence across clusters. a Cumulative cases per million (conf_pm). b Cumulative deaths per million (death_pm). c Case fatality rate (%), calculated as the cumulative number of deaths out of the cumulative number of confirmed cases. d Mortality (per million people), calculated as the cumulative number of deaths out of population estimates (WorldBank, 2021). The vertical axis was transformed to log10 scale for easier visual cross-comparison of clusters

Figure 8 compares the distribution of the 9 features used to form the 3 clusters. Significant differences (at the α = 5% level) are observed between the 3 pairs of clusters for the literacy rate (alphab) and environmental performance index (epi). Variables such as international arrivals (arriv), urban population (urb_pop), number of physicians per 1000 inhabitants (med_1000), mortality attributed to the lack of hygiene (lack_hygien) and absolute latitude (lat_abs) present significant differences only for the Clusters 1–2 and Clusters 1–3 pairs. Differences in Gini index distribution (gini) are found to be significant only between Clusters 2 and 3 (p value = 0.011), while air pollution due to PM2.5 particles (pm25) shows significant differences for the Clusters 1–3 and Clusters 2–3 pairs.
Fig. 8

Cluster comparison by variables. The vertical axis was transformed to log10 scale to enable visual cross-comparison across clusters

Cluster 1 is by far the hardest-hit cluster, with a median of 63,508.6 confirmed cases per million and 946.5 deaths per million. The countries in this cluster have the lowest mortality related to the lack of hygiene (median of 1.9%) and air pollution due to PM2.5 (median of 28.4), and the highest literacy rate (median of 86.7%) and EPI score (median of 43.3). These countries are the most urbanized (median of 63.0%) and are located the farthest from the equator (median absolute latitude of 26.3°). International exposure, measured by the annual number of tourist arrivals, is the highest for this cluster (median of 1,830,000 tourists). Cluster 3 is the least affected by COVID-19, with a median of 2569.3 confirmed cases per million and 35.7 deaths per million. Interestingly, the countries in this cluster have the highest mortality related to the lack of hygiene (median of 45.9%) and the lowest literacy rate (median of 46.4%), EPI score (median of 29.3) and international exposure (median of 277,000 arrivals). Also, air pollution due to PM2.5 particles in countries of Cluster 3 is the highest (median of 57.7). Cluster 2 is intermediate between Clusters 1 and 3 in terms of COVID-19 prevalence, as shown by the median values for confirmed cases (4473.7 cases per million) and deaths (81.2 deaths per million). The share of urban population is the lowest in this cluster (median of 36.5%). Also, these countries are geographically close to the equator (median absolute latitude of 6.9°). International exposure is intermediate for this cluster (median of 812,000 tourists). Figure 9 shows the linear association between the natural logarithm of cumulative confirmed cases and deaths per million and the absolute latitude.
Fig. 9

Scatterplots of the natural logarithm (log) of COVID-19 cases and deaths per million people against absolute latitude (in degrees) for African countries. a COVID-19 cumulative confirmed cases (R² = 0.063, p value = 0.063 > 0.05). b COVID-19 cumulative deaths (R² = 0.198, p value = 0.005 < 0.05)

The coefficients of determination (R²) for this linear association are 0.098 (p value = 0.063, not significant) and 0.198 (p value = 0.005, significant) for cumulative confirmed cases and deaths per million, respectively. This shows that, to some extent, the farther from the equator a country is located, the more deaths from COVID-19 are to be expected, with a semi-elasticity of around a 7.9% increase in deaths per million for each one-degree increase in absolute latitude. Similar observations have been reported in previous studies, which suggested that higher sunlight and heat (near the equator) are likely to hinder the spread of COVID-19 (Braiman, 2020; Chen et al., 2021; Whittemore, 2020).
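The semi-elasticity reported above can be read from a log-linear regression, ln(y) = a + b·x, in which a one-unit increase in x multiplies y by exp(b), i.e., a relative change of exp(b) − 1 (approximately b when b is small). The sketch below illustrates this on synthetic data. The slope of 0.076 is an assumption chosen so that exp(b) − 1 ≈ 7.9%; the text does not state the fitted coefficient, nor whether the percentage was obtained from exp(b) − 1 or from the approximation 100·b.

```python
import numpy as np
from scipy.stats import linregress


def semi_elasticity(x, y):
    """Fit ln(y) = a + b*x; return b, exp(b) - 1, R^2 and the p value.

    (Hypothetical helper, written for illustration only.)
    """
    fit = linregress(x, np.log(y))
    return fit.slope, np.expm1(fit.slope), fit.rvalue ** 2, fit.pvalue


# Synthetic stand-in data (NOT the study's dataset): deaths per million
# grow by an assumed 0.076 log-units per degree of absolute latitude.
rng = np.random.default_rng(5)
lat = rng.uniform(0, 35, 54)
deaths_pm = np.exp(3.0 + 0.076 * lat + rng.normal(0, 1.0, 54))

slope, pct, r2, p = semi_elasticity(lat, deaths_pm)
print(f"slope b = {slope:.3f}, change per degree = {100 * pct:.1f}%")
print(f"R^2 = {r2:.3f}, p value = {p:.4f}")
```

For small slopes the two readings nearly coincide: with b = 0.076, exp(b) − 1 is about 0.079.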

Discussion

On transmission factors and COVID-19 clustering

At a global level, the spread of the pandemic indicates that developed countries (such as USA, Italy, England, France, China and Russia) are also the most affected by the pandemic. A positive correlation between the high socioeconomic status, standard of living and COVID-19 prevalence has been reported in earlier studies (Dong et al., 2020), which is similar, to some extent, to the findings in this study: Cluster 1 in this study, for example, is the most affected by the pandemic and is also the one concentrating the leading countries in Africa, in terms of socioeconomic features (Cash & Patel, 2020). On the other hand, it appears that the prevalence of chronic respiratory infections and diseases is not relevant to COVID-19 prevalence in the context of Africa, which calls into question some of the previous studies (Carrillo-Larco & Castillo-Cara, 2020; Renzaho, 2020). According to Bigna and Noubiap (2019), there is a rising concern regarding the recent increase of non-communicable diseases (cardiovascular and respiratory diseases, cancers, diabetes) in Sub-Saharan Africa, mostly attributed to rapid urbanization and increased risk factors such as unhealthy diets, reduced physical activity, hypertension, obesity and air pollution (Kraef et al., 2020). The countries which reported the first COVID-19 cases and deaths in this study are those found to have a higher international exposure mostly through tourism. This is in line with a recent large-scale genomic analysis, which specifically revealed that COVID-19 in most African countries was triggered by importations, predominantly from Europe. Yet, this spread slowed down following the early introduction of international travel restrictions. Furthermore, ongoing transmission and increasing mobility led to the emergence and spread of many variants within the continent (Wilkinson et al., 2021; Zongo et al., 2020). 
Some variables related to the structure of the population (life expectancy, urban population) in this study are also among those that best explain the spread of the pandemic in Africa. The highest values of life expectancy obtained for Cluster 1 appear mostly as a typical trait of top-ranking countries, for which it is expected to be high because of the better standard of living, amenities, healthcare facilities and management systems found in such countries. Similarly, for such countries, the rate of urbanization is expected to be high, which increases the risk of COVID-19 transmission (Carrillo-Larco & Castillo-Cara, 2020; Rizvi et al., 2021). Zhu et al. (2020) and Rizvi et al. (2021) established that air quality is a positive predictor of COVID-19 confirmed cases. This is supported by the findings in this study, as shown by the prominent variables highlighted here, such as mortality related to air pollution due to PM2.5 and the EPI score. COVID-19 and air pollution are already known to be a hazardous association. Recently emerging evidence suggests that exposure to air pollution worsens the severity of COVID-19 on human health (Bourdrel et al., 2021). Regarding the climatic setting, only latitude was found to be effective at explaining COVID-19 prevalence, with a higher and significant association with COVID-19 deaths. Meo et al. (2020) showed that an increase in relative humidity and temperature is associated with a decrease in the number of daily cases and deaths due to COVID-19 in Africa. Other studies highlighted associations between COVID-19 and various climate parameters, such as rainfall, wind speed and surface pressure (Bashir et al., 2020; Bilal et al., 2021; Hossain et al., 2021; Raza et al., 2021; Rendana, 2020; Ward et al., 2020).
Interestingly, insolation has been reported as a negative predictor of COVID-19 prevalence, which in turn might explain why countries located farther from the equator tend to report more confirmed cases and especially deaths (Braiman, 2020; Chen et al., 2021; Whittemore, 2020). This latter finding is in line with our results. However, an in-depth assessment of the connection between climate and the current pandemic is yet to be carried out (Wang & Crameri, 2014; Lone & Ahmad, 2020). Overall, little previous work has examined factors associated with the COVID-19 pandemic within the context of Africa. In this research, 3 clusters of countries are identified. In comparison, ACSS (2020b) conducted a clustering in Africa and identified 7 country profiles. Yet, our approach differs significantly, as it tries to explain the variability in COVID-19 prevalence (cases and deaths) across countries through country-level variables, which are later used to form clusters. Moreover, environment-related variables are considered here, unlike in ACSS (2020b). Other clustering-related research conducted outside of Africa or at the global level concluded that countries with similar socioeconomic profiles fall within the same cluster (Carrillo-Larco & Castillo-Cara, 2020; Freed et al., 2020; Zarikas et al., 2020). It is, at first glance, surprising to note that countries considered to be ‘developed’ from the viewpoint of socioeconomic status or standard of living are the most severely affected by the COVID-19 pandemic. However, Freed et al. (2020) discuss the clear distinction to be made between the level of socioeconomic achievement of a country on the one hand and, on the other hand, the preparedness of healthcare systems as well as the willingness of populations to cope with restrictive measures promoted by authorities. These features are decisive in achieving a swift and effective response to the ongoing health crisis (Sadeghi et al., 2021; Zhang et al., 2020).

On the lack of hygiene and COVID-19 transmission

Handwashing is considered to be one of the most effective ways to prevent the transmission of diseases, including COVID-19. In this study, the mortality attributed to the lack of hygiene (lack_hygien) is found to be significant for both confirmed cases and deaths. Similarly, it was found to be a determining factor in optimally separating the 3 clusters. The lack_hygien variable is a negative predictor of the pandemic (conf_pm: ρ = −0.74; death_pm: ρ = −0.76). The lack of sanitation associated with poor hygiene practices is already deemed to be responsible for the higher communicable disease burdens, especially in developing countries (James et al., 2018). It is, therefore, reasonable to expect that better hygiene standards, safe sanitation and safe drinking water are likely to be negatively correlated with COVID-19 cases and deaths. Interestingly, the findings in our study suggest quite the opposite. In our understanding, this should not be perceived as causation, but rather as a typical trait of the clusters formed. An explanatory hypothesis can be found in the possibility of ‘immune training,’ as suggested by Chatterjee et al. (2020). In fact, the African context is a unique case where infectious diseases such as HIV, tuberculosis and malaria, as well as other infections, are highly prevalent and are known to influence immune function, which might also, in turn, affect the immune response to COVID-19 (Adams et al., 2021; Tessema & Nkengasong, 2021). Also, along these lines, a lower prevalence of COVID-19 in malaria-endemic areas has been reported, although the reasons are yet to be further investigated (Anjorin et al., 2021; Iesa et al., 2020).

Implications for policies and decision making

Since the onset of the pandemic, Africa has experienced four waves in daily new cases, which seem to display a strong periodicity of approximately 6 months. Such a finding has direct implications for management policies, as it suggests that barrier measures, social distancing and eventually specific measures should be undertaken prior to the occurrence of such predictable peak periods (especially in the months of June-July and December-January). Besides this, coupling some strategies and preventive measures might help to strongly mitigate the spread of the pandemic on the African continent. For example, authorities should focus on good governance regarding health directives and open communication, to foster the willingness of the population to adopt mitigation measures, and also encourage them to get vaccinated. Providing financial support to vulnerable sectors of activity and populations might help in this regard, considering the limited resources of many African countries (James et al., 2018; Sadeghi et al., 2021). Regarding the governance of the health sector, lack of knowledge is still hindering our understanding of the pandemic. Also, the emergence of new variants more virulent in younger populations is likely, which might lead in turn to a reconsideration of Africa’s susceptibility to the COVID-19 pandemic. As such, studies to assess risk factors, including detailed cohort studies with appropriate controls, are needed (Adams et al., 2021). Regarding environment and sustainability aspects, the current figures of the COVID-19 pandemic can be perceived as a late lesson from an early warning. Human-induced environmental degradation increases the risk of pandemics through the complex interplay between ecosystem disturbance, urbanization, international travel and climate change. Therefore, a transition to a sustainable society and economy appears necessary to protect human health.
As such, decision makers (at the institutional level) and societies (at the individual and community level) should start thinking about what to do differently to move toward more sustainable practices (EEA, 2020; Harremoës et al., 2001).

Limitations of this study

It should be fully acknowledged that our study has a few limitations. First, since it is mostly based on statistical analysis, it can highlight evidence on a macroscopic scale, but the framework could perform less well at explaining individual and specific variations among observations, which are typically masked. Moreover, since any model is only as good as the data used, the limitations related to the supporting data used in this study should be considered. The reporting of COVID-19 data follows different strategies depending on the country and might be neither entirely accurate nor up to date, owing to either technical limitations or communication strategies. Similarly, it is well known that only positive test results are considered as confirmed cases: therefore, the less testing is done, the fewer confirmed cases are detected, which might not accurately reflect the actual state of the pandemic. On several occasions, the reliability of the tests has been questioned (Danilova, 2020). These issues and the subsequent uncertainty around the COVID-19 prevalence estimates should be considered, as they might distort our understanding of the spread of the pandemic across different countries. Finally, this study focused on the African continent to explain the variability in COVID-19 prevalence across countries through country-level variables. Such a focus might limit the generalization of our findings to other contexts outside Africa, or to urban areas. However, the methodological framework could still be applied in such cases, with consideration of new potential variables more related to such contexts. Also, the clusters identified might help in COVID-19 modeling studies, since better performance might be achieved by tuning models according to each cluster. The findings in this study open new avenues for research regarding COVID-19 prevalence in Africa.
Future studies should consider forecasting COVID-19 confirmed cases through time series modeling (Takele, 2020). Such modeling efforts might critically help in handling the pandemic effectively, but could also be useful in understanding spatial patterns of evolution of the pandemic and in assessing the effectiveness of mitigation and restrictive measures (Likassa et al., 2021). Also, future studies should consider assessing the potential effects of the pandemic on critical sectors on which most African countries depend, such as agricultural trade (Dugué et al., 2021; Lèye et al., 2021).

Conclusion

The current COVID-19 pandemic took the world by surprise early in 2020. The African continent, which turned out to be the least affected, has outwitted even the most sophisticated prognosticators. In this study, a set of 9 country-level descriptors has been identified as the optimal one for explaining the variability of cumulative confirmed cases (per million) and cumulative deaths (per million) across the 54 African countries. The variables relating to the healthcare system and environment, international exposure and socioeconomic status are found to be closely related to COVID-19 prevalence, followed by variables relating to population structure. To a lesser extent, climate (through the geographical distance to the equator) might explain the current pandemic figures, more specifically in terms of cumulative deaths per million. A negative predictor, which is also the most significant variable, is the mortality related to lack of water, hygiene and sanitation. Based on these optimal features, the African continent is partitioned into 3 epidemiological clusters using the AHC method. Cluster 1 is composed of 11 countries mainly located at the northernmost and southernmost parts of the continent, and characterized by the highest median values for confirmed cases and deaths from COVID-19. It also has the highest socioeconomic and standard of living features. Conversely, the median value for mortality related to lack of water, hygiene and sanitation is the lowest for this cluster. Cluster 2 (27 countries) is the one most spared by the current pandemic. It also has the lowest standard of living and is the part of the continent where mortality due to lack of water, hygiene and sanitation seems to be the highest. Cluster 3 (16 countries) is intermediate between Clusters 1 and 2 in terms of COVID-19 prevalence and mostly features countries with likewise intermediate socioeconomic features.
Overall, in Africa, as in the rest of the world, richer or top-ranking countries seem the most affected by this pandemic, with regard to reported statistics. Some limitations of this study include the reliability of case and death reports, which might not be accurate or timely. Also, it is of utmost importance to keep in mind that these reports are limited by the testing policies applied by the different countries: the less testing is carried out, the fewer active cases or deaths are reported. However, despite these limitations, the clustering produced in this study sheds light on the nature and the exposure level to COVID-19 in African countries and might help foster informed and strategic interventions by public authorities and decision makers for the current COVID-19 crisis and further epidemics or pandemics. The electronic supplementary material comprises Supplementary file 1 (DOCX, 912 kb) and Supplementary file 2 (XLSX, 101 kb).