Understanding the cycles of COVID-19 incidence: Principal Component Analysis and interaction of biological and socio-economic factors.

Pablo Duarte, Efrain Riveros-Perez.

Abstract

The incidence curve of coronavirus disease 19 (COVID-19) shows cyclical patterns over time. We examine the cyclical properties of the incidence curves in various countries and use principal components analysis to shed light on the underlying dynamics that are common to all countries. We find that the cyclical series of 37 countries can be summarized in four principal components which explain over 90% of the variation. We also discuss the influence of complex interactions between biological viral natural history and socio-political reactions and measures adopted by different countries on the cyclical patterns exhibited by COVID-19 around the globe.
© 2021 The Authors.

Keywords:  COVID-19; Epidemiological data; Predictive model; Principal component analysis; Viral spread

Year:  2021        PMID: 34094532      PMCID: PMC8168336          DOI: 10.1016/j.amsu.2021.102437

Source DB:  PubMed          Journal:  Ann Med Surg (Lond)        ISSN: 2049-0801


Introduction

Many infections undergo cycles and present waves of variable duration ranging from one to four years [1]. The two major contributors to the cyclic nature of respiratory viral infections are changes in environmental parameters and in human behavior [2]. The magnitude and severe impact of COVID-19 on individual mortality, social interactions, strain on healthcare systems, and political and economic variables worldwide have led researchers to try to understand the complex interactions between the SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) virus, the individual host and the exposed population, and environmental factors. It is recognized that SARS-CoV-2, as a coronavirus, has a transmission epidemiology similar to that of influenza [3]. Influenza, like COVID-19, has followed wave patterns in which a peak is usually followed by a second wave a few months later [4]. Studying this cyclical pattern provides important insights into the nature of the cycle involving virus, host, community, and environment. Our study uses Principal Component Analysis (PCA) to model incidence patterns in different countries. Fig. 1 shows the cyclical component of the incidence series for Germany, Israel and the United States, normalized such that the first observation corresponds to the peak of the first infection wave. The patterns are similar but not equal. While the cycles seem to move in a congruent way, their amplitude and length tend to vary. In this analysis, we apply frequency domain time series techniques to examine the extent to which common patterns in the cyclical components can be extracted, allowing us to approximate the further movement of the series. We found that the variation of 37 incidence curves, one per country, can be reduced to four principal components that explain over 90% of the variation in the sample.
We also show cycle predictions for countries with shorter incidence series since the peak of the first wave, as well as one-step-ahead and out-of-sample estimations for Germany and the United States as representatives of countries with high overall incidence on two different continents. Both countries are currently in the upswing of a cycle whose turnaround point does not seem to lie within the next couple of weeks.
Fig. 1

New cases per 1 Million (Cyclical Component) in United States, Germany, and Israel.


Cycle extraction and commonalities

To examine the common cyclical properties of the series, we followed two general steps. First, we filtered away the trends and the high-frequency cycles (e.g. weekly fluctuations in the numbers) to solely focus on the cycles that repeat every couple of weeks and months. Second, we extracted the elements that are common to all the filtered series of a sub-sample of 37 countries using principal components analysis.

Time series filtering

Time series in general can be broken down into different constituent factors: trend, cyclical components (weekly, monthly, and other periodicities), and an irregular component [5]. By using time series filters it is possible to separate these components from each other and focus on the periodicities of interest. To filter the range of frequencies for the specific case of COVID-19 incidence series across countries, we used the popular Hodrick-Prescott (HP) filter in its double application [6]. We illustrate the procedure using the German series as an example (Fig. 2). The first application of the HP filter takes away the cycles that repeat very often (high frequency) and leaves everything else (curve HP-1 in Fig. 2A). The second application leaves only the very low frequencies (trend, curve HP-2 in Fig. 2A). By subtracting HP-2 from HP-1 we obtain the frequencies that lie between the very frequent periodicities and the trend (black dashed line in Fig. 2B).
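For reference, the textbook form of the HP filter can be written as a penalized least-squares problem. The notation here is an assumption introduced for illustration and does not come from the article: y_t is the observed series, \tau_t the fitted smooth curve, and \lambda the smoothing parameter. The two values \lambda_1 < \lambda_2 stand for the first and second application; the article does not report the values actually used.

```latex
\hat{\tau}(\lambda) = \arg\min_{\tau_1,\dots,\tau_T}
  \sum_{t=1}^{T} (y_t - \tau_t)^2
  + \lambda \sum_{t=2}^{T-1} \bigl[(\tau_{t+1} - \tau_t) - (\tau_t - \tau_{t-1})\bigr]^2

% Band-pass cycle from the double application (HP-1 minus HP-2):
c_t = \hat{\tau}_t(\lambda_1) - \hat{\tau}_t(\lambda_2), \qquad \lambda_1 < \lambda_2
```

A small \lambda_1 removes only the fastest fluctuations, while a large \lambda_2 leaves only the slow trend, so their difference retains the intermediate periodicities.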
Fig. 2

Germany: Filtering the Incidence Curve. (A) First and second application of HP filter. (B) Original and resulting (filtered) incidence series. HP, Hodrick-Prescott.

The underlying series for the calculation of the cyclical components are the standardized logarithms of the weekly new cases per 1 million inhabitants. The series start at a high level because we used data starting at the peak of the first wave. The reason for omitting the initial observations was that in most countries the test capacity increased at the beginning of the spread of the virus, leading to overestimation of incidence.
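The double application of the HP filter described in this section can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation. The function names (`solve`, `hp_trend`, `double_hp_cycle`), the two smoothing parameters, and the synthetic input series are all assumptions. The sketch also assumes that both filter passes are applied to the same input series, which the text does not state explicitly.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hp_trend(y, lam):
    """Return the HP-smoothed series: the solution of (I + lam * D'D) tau = y,
    where D is the second-difference operator with rows (1, -2, 1)."""
    n = len(y)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    coef = (1.0, -2.0, 1.0)
    for r in range(n - 2):  # add lam * (row r of D)' (row r of D)
        for i, vi in enumerate(coef):
            for j, vj in enumerate(coef):
                a[r + i][r + j] += lam * vi * vj
    return solve(a, list(y))


def double_hp_cycle(y, lam_smooth, lam_trend):
    """Band-pass cycle: lightly smoothed series (HP-1) minus slow trend (HP-2)."""
    hp1 = hp_trend(y, lam_smooth)  # removes only the high-frequency fluctuations
    hp2 = hp_trend(y, lam_trend)   # keeps only the very low frequencies
    return [u - v for u, v in zip(hp1, hp2)]


# Synthetic example (assumed data): linear trend + slow cycle + fast alternation.
series = [0.05 * t + math.sin(2 * math.pi * t / 20) + 0.3 * (-1) ** t for t in range(60)]
cycle = double_hp_cycle(series, lam_smooth=1.0, lam_trend=1000.0)
```

The dense solver is chosen only to keep the sketch self-contained. For long series, a banded solver or a library routine would be the more practical choice.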

Common periodicities

We use the resulting filtered curves in the second step to examine common regularities at different periodicities. A straightforward way of examining commonalities is principal components analysis (PCA). In a nutshell, PCA reduces the dimensionality of a dataset with multiple variables by identifying a smaller number of uncorrelated variables that summarize the common patterns, and therefore most of the variation, of the whole dataset. In our specific case, the variables are the cyclical components of the incidence series of each country. We implemented a PCA over a sub-sample of 37 countries for which we have data covering at least 358 days since the peak of the first wave, for which the data are reasonably reliable, and for which testing capacity is adequate (according to the World Health Organization (WHO) criterion of 10–30 tests per confirmed case) [7]. These criteria leave countries like Cuba, Venezuela and Iran out of the sample. The countries included were Austria, Belgium, Bosnia and Herzegovina, China, Costa Rica, Croatia, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Israel, Italy, Japan, Kosovo, Latvia, Macedonia, Malaysia, Netherlands, Norway, Portugal, Romania, Singapore, Slovenia, South Korea, Spain, Switzerland, Thailand, Turkey, United States, Uruguay, and Uzbekistan. The time span goes from the peak of the first wave in each country until April 6th, 2021. The main result of the principal components analysis is that 96.7% of the cyclical variation of the 37 incidence series can be summarized in six variables, or principal components. Four principal components contain over 90% of the total variation. This means that the infection dynamics, as well as the social and political reactions across countries, have important commonalities, such that they can be reduced to a handful of variables. Fig. 3 shows the first four principal components calculated from the 37 filtered series. Each component exhibits a different cycle length and trajectory. The first component (black line in Fig. 3), which explains 50% of the variation, declines until period 80 and then increases steadily, at a slower rate than the preceding decrease, until period 220. Afterwards, a new declining phase starts and reaches a new trough roughly 100 periods later. Since we do not have longer underlying time series, we cannot know with certainty how long this cycle will last or when a new turning point will be reached. The second component (red line in Fig. 3) explains an additional 15% of the variation. This component suggests a lag length of approximately 60 days. The cycle length of the third component is unclear, as a turnaround after period 300 is not yet foreseeable. The fourth principal component seems to have a somewhat shorter length than the second and is also entering an increasing phase, as are the first and second principal components.
Fig. 3

Principal Components explaining 93% of variability. Each component exhibits a different cycle length and trajectory.
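The following sketch illustrates how principal components and their explained-variance shares can be computed from a set of filtered series. It is an illustration under stated assumptions, not the authors' code. The function names (`jacobi_eigh`, `pca`), the six synthetic "country" series, and the choice to standardize each series before the decomposition are all hypothetical. The eigen-decomposition uses cyclic Jacobi rotations only to keep the example free of external libraries.

```python
import math


def jacobi_eigh(a, sweeps=50, tol=1e-20):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.
    Returns (eigenvalues, eigenvectors as columns), sorted by decreasing eigenvalue."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[r][i] for i in order] for r in range(n)]


def pca(columns):
    """columns: one list per country, all of equal length.
    Returns (explained-variance ratios, component time series)."""
    t_len, n = len(columns[0]), len(columns)
    z = []
    for col in columns:  # standardize each series (an assumption of this sketch)
        mean = sum(col) / t_len
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (t_len - 1))
        z.append([(x - mean) / sd for x in col])
    cov = [[sum(u * w for u, w in zip(z[i], z[j])) / (t_len - 1) for j in range(n)]
           for i in range(n)]
    vals, vecs = jacobi_eigh(cov)
    total = sum(vals)
    ratios = [val / total for val in vals]
    comps = [[sum(vecs[i][k] * z[i][t] for i in range(n)) for t in range(t_len)]
             for k in range(n)]
    return ratios, comps


# Synthetic example (assumed data): six series driven by two shared cycles.
weights = [(1.0, 0.2), (0.8, 0.5), (0.6, -0.4), (0.3, 1.0), (-0.5, 0.9), (0.9, -0.7)]
data = [[a * math.sin(2 * math.pi * t / 50) + b * math.cos(2 * math.pi * t / 30)
         + 0.05 * math.sin(t * (i + 3)) for t in range(120)]
        for i, (a, b) in enumerate(weights)]
ratios, comps = pca(data)
cumulative = [sum(ratios[:k + 1]) for k in range(len(ratios))]
```

Reading off the first index at which `cumulative` exceeds a threshold such as 0.9 mirrors the way the text reports the number of components needed to cover a given share of the variation.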


Estimating cyclical series

We can use the extracted principal components, which are 358 days long, to estimate the hypothetical path of the cycle curves of countries whose incidence series are shorter than those used to calculate the principal components. Predictions were made using a linear regression of each cycle series on the six principal components. Fig. 4 shows the predicted cycles using the six principal components, which explain 97% of the variation of the 37-country sample. The predicted cyclical series (red dashed line in Fig. 4) fits the official data more closely in some countries than in others. The shorter the prediction horizon, as in the UK or Bulgaria, the more accurate the prediction. The discrepancies are more evident when the available data end at or close to a turning point, as in Chile, India, Poland and Russia. Discrepancies can also reflect differences in data quality, as most of the countries with shorter cycles have less testing capacity (e.g. Colombia, Chile) or raise concerns about the transparency of their data reporting (e.g. Russia).
Fig. 4

Predicted Cycles for countries not used to perform the Principal Component Analysis (PCA) in relation to official data.
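The regression step can be sketched as follows: a shorter cycle series is regressed on the matching initial stretch of the principal components, and the fitted coefficients are then applied to the full-length components. This is a minimal illustration, not the authors' code. The function names (`solve`, `ols_fit`, `predict`), the inclusion of an intercept, and the two synthetic components standing in for the six used in the article are all assumptions.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(x_rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in x_rows]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, x_rows):
    """Apply fitted coefficients to rows of component values."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in x_rows]


# Synthetic example (assumed data): two components of length 100 and a
# country whose cycle series covers only the first 60 periods.
components = [[math.sin(2 * math.pi * t / 50), math.cos(2 * math.pi * t / 30)]
              for t in range(100)]
observed = [0.5 + 2.0 * c1 - 1.0 * c2 for c1, c2 in components[:60]]

coef = ols_fit(components[:60], observed)  # fit on the overlapping stretch only
path = predict(coef, components)           # fitted values plus the extrapolated path
```

Entries of `path` beyond index 59 form the extrapolated part of the cycle. As the text notes, such extrapolations are least reliable when the observed data end near a turning point.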

Fig. 5 shows one-step-ahead predictions of the cycle series for Germany and the United States starting at day 150. In other words, we wanted to answer the question: if Germany and the US were now at day 150 of the pandemic, how well would the principal components (from a model trained without either of those two countries) predict the later trajectory of the cycle for both countries? The upward turning point after day 150 was well anticipated by the principal components. The trajectory towards the end of the series is overestimated for Germany and underestimated for the US, and the latest turnaround towards a new increasing phase was captured rather accurately.
Fig. 5

One-step-ahead cycles Germany (A) and the United States (USA) (B).
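One way to read the one-step-ahead exercise is as an expanding-window evaluation: at each period the regression is refitted on all observations available so far, and only the next value is predicted. The sketch below follows that reading, which is an assumption, since the article does not spell out the procedure. The function names, the intercept, and the synthetic data are likewise hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(x_rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in x_rows]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def one_step_ahead(components, y, start):
    """For each t >= start, fit on periods [0, t) and predict period t."""
    preds = []
    for t in range(start, len(y)):
        coef = ols_fit(components[:t], y[:t])
        preds.append(coef[0] + sum(c * v for c, v in zip(coef[1:], components[t])))
    return preds


# Synthetic example (assumed data): a cycle driven by two components plus a
# small deterministic disturbance, evaluated from period 40 onwards.
components = [[math.sin(2 * math.pi * t / 50), math.cos(2 * math.pi * t / 30)]
              for t in range(80)]
y = [0.5 + 2.0 * c1 - 1.0 * c2 + 0.01 * math.sin(1.7 * t)
     for t, (c1, c2) in enumerate(components)]
preds = one_step_ahead(components, y, start=40)
errors = [p - actual for p, actual in zip(preds, y[40:])]
```

Plotting `preds` against `y[40:]` gives a comparison analogous in spirit to Fig. 5, and the sign of `errors` shows where the fitted cycle runs above or below the observed one.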


Forecasts of the cycles

To estimate the trajectory of the cycles for Germany and the United States over the upcoming weeks with the same approach, we used countries that are at least one week ahead in the number of days since their initial peak. For Germany, these countries are Australia, Austria, China, Costa Rica, Italy, Latvia, Norway, South Korea, Thailand, and Uruguay. For the United States, the available countries are Australia, Austria, China, Costa Rica, Croatia, Czech Republic, Estonia, France, Germany, Greece, Italy, Latvia, Norway, Portugal, Slovenia, South Korea, Spain, Switzerland, Thailand, and Uruguay. Fig. 6 shows the estimated trajectory for one week from day 358 onwards. For Germany, the model projects a further upward but less steep trajectory. For the United States, even though the model is less accurate, it predicts a steepening of the cycle series that the official data are not showing at the time of the analysis.
Fig. 6

Trajectory estimation Germany (A) and United States (USA) (B).


Discussion

Modeling of biological phenomena is limited by the presence of randomness and noise. This randomness is the result of incomplete or insufficient knowledge of the nuances of biological variables at a smaller scale [8]. In our case, the effects of individual interactions with SARS-CoV-2 are difficult to incorporate into a model based on large populations of different countries. Understanding both static complexities that do not change over time and extrinsic variations imposed over time by changes in biological aspects of the virus (e.g., new variants), the host (e.g., acquired immunity), and the community (e.g., behavior changes) is critical to comprehending patterns of viral spread. For instance, genetic differences leading to heterogeneous susceptibility to the virus, variation in viral replication from host to host, and behavioral and contact differences between individuals have been identified as important factors determining viral transmission within groups of people [9,10].
A significant body of evidence shows that possible seasonal determinants typical of respiratory viruses, such as temperature, sunlight, and humidity, as well as host factors (e.g., vitamin status and behavior), contribute to the cyclical pattern of these infections [2,11–15]. Environmental conditions such as dry and unventilated air facilitate transmission of respiratory virus particles [16]. Cyclic tightening and loosening of lockdown mandates, or of compliance with the rules, in different countries may be associated with intermittent exposure to indoor conditions that enhance transmission, in patterns compatible with those displayed by the measured and predicted waves presented in our study [17]. On the other hand, despite the general agreement that the dry environments of the winter season stimulate respiratory viral replication and transmission, Luo et al. challenged this notion by examining province-level variability of the basic reproductive number of COVID-19 in China, determining that summer conditions would not protect against viral spread [18]. Wang et al. assessed the impact of humidity and temperature on the transmission of COVID-19, taking into account socioeconomic status, mobility status, and demographics [19]. The authors conclude that changes in humidity and temperature are insufficient to reduce the viral reproductive number. Taken together, these studies underscore the complex contribution of environmental and non-environmental factors to viral spread. Our study shows that the waves are not completely seasonal, and that social and policy factors play a significant role in the pattern of infection across communities.
Mathematical models have been used to predict the effect of measures such as social distancing and lockdowns on COVID-19 propagation patterns [20]. However, discrepancies between predictions and actual incidence and patterns of presentation have been consistently identified [21]. Our study seeks to add to the contribution of statistical learning methods to the exploration of possibilities, rather than to make robust predictions about future contagion dynamics. We present actual, fitted, and predicted incidence data. The information contained in our numbers reflects positive testing and not mortality. In contrast with studies using compartments for exposure, we cannot discriminate cases according to severity. We consider that simplifying the information by displaying only incidence trends provides better interpretability and helps decision makers incorporate data into their learning processes before formulating policies [22]. Indeed, most governments closely follow the incidence and the reproduction number R, which is a function of the incidence, for health and economic policy making.
The fact that our study includes wave patterns from countries with diverse policies facilitates this reflection process based on feedback provided by studies like ours. The introduction of vaccines is expected to change the progression and epidemiological profile of COVID-19. De Leon et al. presented a model showing the effect of the vaccination program in Israel, which covered 80% of the population at the time of the study. The authors report that the shape of the outbreak, measured as new moderate and severe cases, has changed, bringing the decline earlier than expected by their prediction model [23]. Fig. 1 also shows the steep decline in the cyclical component for Israel starting around day 300 after the peak of the first wave. Our study has limitations. We did not use compartments to discriminate between asymptomatic, mild, and severe cases, and we did not analyze mortality rates. Although making predictions based on severity may give a better picture of the virulence of the virus and may help in planning increases in hospital capacity, we believe that our study provides policy makers and the general public with information that describes the overall infection spread and its relationship with control measures in different countries. As most policy makers also do not discriminate between cases when deciding on lockdown restrictions on, say, schools versus retirement homes, our study is useful for describing the cyclical movements that come from social and political decision making. We argue that the comparison between countries might prove useful to distinguish effective policies from ineffective and even damaging ones. Although some control measures have been universal, regional differences are probably playing a role in differential trend patterns [24]. We also acknowledge that our analysis covers a time frame that may not be representative of the total duration of the pandemic.
In this regard, with the emergence of new variants, the predictions of our model may become obsolete [25]. We propose to continue training the model with new observations and reevaluate the prediction accuracy as the pandemic and the effect of new variants evolve. Future research is necessary to continually evaluate the prediction accuracy of this and other models based on data analysis as the COVID-19 progression is fluid and rapidly changing. Experimental designs evaluating specific control measures in population groups may help elucidate the role of such measures as part of public health policy. Finally, population studies targeting vulnerable populations to characterize their unique epidemiological profile in relation to COVID-19 are warranted. Statistical learning is a powerful tool in assisting analysis of growing data in those population subgroups.

Conclusion

The incidence curves of COVID-19, measured as the number of confirmed new cases per 1 million inhabitants, show strong commonalities among countries. After filtering away the high-frequency elements as well as the trends from the incidence curves of 37 countries, 90% of the information in the resulting dataset can be summarized in four variables (principal components). The commonalities are related not only to the periodic nature of viral infections but also to the fact that citizens and governments have reacted to the spread of the virus in a similar fashion. The combination of viral natural history and governmental and individual behavior seems to have so much in common that the incidence cycles of 37 countries can be reduced to a few principal components. One-step-ahead forecasts for Germany and the United States show that the principal components can track the incidence cycles. How well the principal components can predict the trajectories out-of-sample will become evident in the coming weeks and months.

Ethical approval

Mathematical analysis. No IRB approval necessary.

Sources of funding

None

Author contribution

Pablo Duarte: Data collection, analysis and construction of manuscript. Efrain Riveros-Perez: Data collection and analysis. Manuscript construction and final review.

Conflicts of interest

None

Research registration unique identifying number (UIN)

N/A.

Trial registry number – ISRCTN

N/A

Guarantor

Efrain Riveros-Perez

References (17 in total)

Review 1.  Cyclical patterns and predictability in infection.

Authors:  N D Noah
Journal:  Epidemiol Infect       Date:  1989-04       Impact factor: 2.451

Review 2.  Seasonality of viral infections: mechanisms and unknowns.

Authors:  D Fisman
Journal:  Clin Microbiol Infect       Date:  2012-07-20       Impact factor: 8.067

Review 3.  Impact of pollution, climate, and sociodemographic factors on spatiotemporal dynamics of seasonal respiratory viruses.

Authors:  Chantel Sloan; Martin L Moore; Tina Hartert
Journal:  Clin Transl Sci       Date:  2011-02       Impact factor: 4.689

4.  Wrong but Useful - What Covid-19 Epidemiologic Models Can and Cannot Tell Us.

Authors:  Inga Holmdahl; Caroline Buckee
Journal:  N Engl J Med       Date:  2020-05-15       Impact factor: 91.245

5.  Seasonal variation in host susceptibility and cycles of certain infectious diseases.

Authors:  S F Dowell
Journal:  Emerg Infect Dis       Date:  2001 May-Jun       Impact factor: 6.883

6.  Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England.

Authors:  Sam Abbott; Rosanna C Barnard; Christopher I Jarvis; Adam J Kucharski; James D Munday; Carl A B Pearson; Timothy W Russell; Damien C Tully; Alex D Washburne; Tom Wenseleers; Nicholas G Davies; Amy Gimma; William Waites; Kerry L M Wong; Kevin van Zandvoort; Justin D Silverman; Karla Diaz-Ordaz; Ruth Keogh; Rosalind M Eggo; Sebastian Funk; Mark Jit; Katherine E Atkins; W John Edmunds
Journal:  Science       Date:  2021-03-03       Impact factor: 63.714

7.  Modeling the transmission dynamics of COVID-19 epidemic: a systematic review.

Authors:  Jinxing Guan; Yongyue Wei; Yang Zhao; Feng Chen
Journal:  J Biomed Res       Date:  2020-10-30

8.  Single-cell analysis and stochastic modelling unveil large cell-to-cell variability in influenza A virus infection.

Authors:  Frank S Heldt; Sascha Y Kupke; Sebastian Dorl; Udo Reichl; Timo Frensing
Journal:  Nat Commun       Date:  2015-11-20       Impact factor: 14.919

Review 9.  Death from 1918 pandemic influenza during the First World War: a perspective from personal and anecdotal evidence.

Authors:  Peter C Wever; Leo van Bergen
Journal:  Influenza Other Respir Viruses       Date:  2014-06-27       Impact factor: 4.380

