A Bioinformatics Tool for Predicting Future COVID-19 Waves Based on a Retrospective Analysis of the Second Wave in India: Model Development Study.

Ashutosh Kumar, Adil Asghar, Prakhar Dwivedi, Gopichand Kumar, Ravi K Narayan, Rakesh K Jha, Rakesh Parashar, Chetan Sahni, Sada N Pandey.

Abstract

Background: Since the start of the COVID-19 pandemic, health policymakers globally have been attempting to predict an impending wave of COVID-19. India experienced a devastating second wave of COVID-19 that peaked toward the end of the first week of May 2021. We retrospectively analyzed the viral genomic sequences and epidemiological data reflecting the emergence and spread of the second wave of COVID-19 in India to construct a prediction model.
Objective: We aimed to develop a bioinformatics tool that can predict an impending COVID-19 wave.
Methods: We analyzed the time series distribution of genomic sequence data for SARS-CoV-2 and correlated it with epidemiological data for new cases and deaths for the corresponding period of the second wave. In addition, we analyzed the phylodynamics of circulating SARS-CoV-2 variants in the Indian population during the study period.
Results: Our prediction analysis showed that the first signs of the arrival of the second wave could be seen by the end of January 2021, about 3 months before its peak in May 2021. By the end of March 2021, the wave was distinct. B.1.617 lineage variants powered the wave, most notably B.1.617.2 (Delta variant).
Conclusions: Based on the observations of this study, we propose that genomic surveillance of SARS-CoV-2 variants, complemented with epidemiological data, can be a promising tool to predict impending COVID-19 waves.
©Ashutosh Kumar, Adil Asghar, Prakhar Dwivedi, Gopichand Kumar, Ravi K Narayan, Rakesh K Jha, Rakesh Parashar, Chetan Sahni, Sada N Pandey. Originally published in JMIR Bioinformatics and Biotechnology (https://bioinform.jmir.org), 22.09.2022.

Keywords:  COVID-19; SARS-CoV-2; epidemiology; genomic surveillance; second wave

Year:  2022        PMID: 36193192      PMCID: PMC9516867          DOI: 10.2196/36860

Source DB:  PubMed          Journal:  JMIR Bioinform Biotech        ISSN: 2563-3570


Introduction

A SARS-CoV-2–driven outbreak of COVID-19 that began in 2019 soon turned into a pandemic, and to date, the disease has killed about 6.5 million people [1]. Since the pandemic’s start, much policy discussion has centered on whether an impending COVID-19 wave can be predicted [2]. Unfortunately, successful prediction of COVID-19 waves has not yet been achieved. A prediction tool that gives early and reasonably accurate warning of an upcoming COVID-19 wave could minimize the enormous loss of life and other collateral damage.
Multiple waves at a global scale driven by SARS-CoV-2 variants, primarily Alpha, Delta [3], and, most recently, Omicron [4], have followed since the first wave. The successive SARS-CoV-2 variants showed increased transmissibility and virulence compared with the wild-type strain [3]; however, the latest Omicron variant has shown higher transmissibility and immune escape but lower lethality compared with the Delta variant [4]. The Delta variant–driven wave was characterized by a rapid rise in cases, increased oxygen demand, vaccine breakthrough [5], a highly increased proportion of severe cases, and high mortality [6]. Broader coverage of COVID-19 vaccines in the global population is helping to create an immunity barrier against the rise of a new wave. However, an increase in the immune escape potential of emerging variants raises grave concern about vaccine breakthroughs and reinfections [3,4,7]. With the waning of immunity derived from vaccines and previous infections [8], the risk of the emergence of a more lethal variant capable of creating a global wave remains high and therefore demands continued surveillance [9].
The Delta variant–driven wave showed a rapid peak and fall to the baseline, making it ideal for prediction studies. The Delta strain was first reported from India [10]. Of note, India witnessed a devastating second COVID-19 wave that began toward the end of February 2021 [11].
The unexpected arrival of the second COVID-19 wave, accompanied by an exponential increase in infections, brought the country’s epidemic response system and health infrastructure to a standstill [11] and resulted in massive suffering and loss of life [12].
The Delta variant belongs to the SARS-CoV-2 lineage B.1.617, which appeared as a precursor. The first case of the B.1.617 variant was also reported from India as early as October 2020 [13]. The World Health Organization (WHO) recognized the B.1.617 lineage as a global variant of concern (VOC). The strain evolved into 3 sublineages, namely, B.1.617.1-3, of which B.1.617.1 (the Kappa variant) was declared a variant of interest (VOI) and B.1.617.2 was later declared a VOC by the WHO [14]. B.1.617 contained mutations in key spike protein regions involved in host interactions and the induction of neutralizing antibodies (S: L452R, E484Q, D614G, del681, and del1072) [15]. The sublineages contained lineage-defining spike mutations (L452R and D614G) as well as newly developed mutations as follows: B.1.617.1 (S: T95I, G142D, E154K, L452R, E484Q, D614G, P681R, and Q1071H); B.1.617.2 (S: T19R, G142D, 156del, 157del, R158G, L452R, T478K, D614G, P681R, and D950N); and B.1.617.3 (S: T19R, L452R, E484Q, D614G, and P681R) [16]. Contemporary studies suggested that B.1.617 lineage variants were more easily transmissible [13,17-21] and deadlier [18] than the B.1.1.7 lineage (Alpha variant), a globally dominant strain before the second wave [10]. Studies also showed a significant reduction in the neutralization of variants of the B.1.617 lineage by antibodies derived from natural infections and many currently used COVID-19 vaccines, as well as by multiple monoclonal antibodies [18-21]. Notably, B.1.617.2 showed very high transmissibility and immunological escape [10,13,17,22].
Multiple studies worldwide have attempted to predict an impending COVID-19 wave [23-28]. These studies used mathematical modeling of epidemiological data. Unfortunately, none of them could accurately anticipate a COVID-19 wave. The ability to predict an established wave from epidemiological data alone seems severely limited [12,29]. The analysis of SARS-CoV-2 genomic sequences has emerged as an efficient surveillance tool for understanding the emergence of new variants and their spread. Fortunately, millions of SARS-CoV-2 genomic sequences from regions worldwide are being made publicly available as a collaborative effort to contain the pandemic [30]. The easy availability of high-quality viral sequences with patient metadata has opened a new avenue for potential predictions of the COVID-19 pandemic [31]. However, viral genomic sequences alone may not be sufficient for efficient predictions, and their current uses for this purpose are constrained.
In this study, we propose an integrated approach using viral genome surveillance and epidemiological data for the prediction of an impending COVID-19 wave. We retrospectively analyzed viral genomic sequences and epidemiological data reflecting the emergence and spread of the second wave of COVID-19 in India to construct such a model.

Methods

Study Design, Participants, and Data Sources

We analyzed the time series (weekly and monthly) distributions of SARS-CoV-2 variants coupled with epidemiological data for new cases and deaths from COVID-19 in India from December 1, 2020, to July 26, 2021 (34 weeks). In addition, a phylodynamic analysis of individual variants was performed. We downloaded SARS-CoV-2 genomic sequence data and epidemiological data from the EpiCoV database of the Global Initiative on Sharing All Influenza Data (GISAID) [32] and the Worldometer database [33], respectively. A total of 40,359 genomic sequences of SARS-CoV-2 were analyzed. Sequences for each SARS-CoV-2 variant were retrieved using an automated search function that entered lineage and sublineage information into the EpiCoV database. The total number of sequences per week and per month for each variant and the variants’ relative proportions (as percentages) were calculated. The data were tabulated, and each variant’s weekly and monthly distributions were compared with COVID-19 epidemiological data (new cases and deaths) and statistically analyzed. The genomic sequences of SARS-CoV-2 variants in each state and union territory were also examined to check for deviations from the overall patterns in the data.
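
The weekly tabulation described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout (a collection date paired with a Pango lineage), the function names, and the toy records are all assumptions made for illustration, and real EpiCoV metadata would need its own parsing step.

```python
from collections import Counter, defaultdict
from datetime import date

STUDY_START = date(2020, 12, 1)  # first day of study week 1, per the Methods


def study_week(collection_date, start=STUDY_START):
    """Return the 1-based study week that a collection date falls in."""
    return (collection_date - start).days // 7 + 1


def weekly_proportions(records, start=STUDY_START):
    """Map week -> {lineage: percentage of that week's sequences}.

    `records` is an iterable of (collection_date, lineage) pairs. This
    layout is hypothetical; it is not the EpiCoV export format.
    """
    counts = defaultdict(Counter)
    for collected, lineage in records:
        counts[study_week(collected, start)][lineage] += 1

    result = {}
    for week, lineage_counts in sorted(counts.items()):
        total = sum(lineage_counts.values())
        result[week] = {
            lineage: 100.0 * n / total for lineage, n in lineage_counts.items()
        }
    return result


# Toy records, made up for illustration (not study data).
toy_records = [
    (date(2020, 12, 1), "B.1.1.7"),
    (date(2020, 12, 3), "B.1.1.7"),
    (date(2020, 12, 5), "B.1.617.1"),
    (date(2020, 12, 7), "B.1.617.2"),
    (date(2020, 12, 8), "B.1.617.2"),
]
shares = weekly_proportions(toy_records)
```

Counting within fixed 7-day bins anchored at the study start keeps the week numbers aligned with the 34-week study period; monthly proportions would follow the same pattern with a (year, month) key in place of the week number.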

Phylodynamics of SARS-CoV-2 Variants

A phylodynamic analysis of the variants circulating in the Indian population during the study period was performed on GISAID sequences using the bioinformatics tool available at EpiCoV.

Statistical Analysis

XLSTAT (Addinsoft) was used to perform all statistical analyses. Descriptive statistics were calculated for each variable. Levene and Anderson tests were used to assess the homogeneity of variance and the normality of the data, respectively. In addition, a correlation matrix was constructed, and a linear regression analysis was performed between contrasting variables (R values range from −1 to +1). The statistical significance level for each comparison was set at P<.05.
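
For readers without XLSTAT, the correlation and regression step can be sketched in plain Python. This is a stand-in under stated assumptions, not the study's analysis: the function names and the toy series are hypothetical, and the sketch computes only the Pearson coefficient and a least-squares line, without the P values or the Levene and Anderson tests mentioned above.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return (sxy, sxx, syy, mean_x, mean_y) for two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy, mean_x, mean_y


def pearson_r(xs, ys):
    """Pearson correlation coefficient, which lies between -1 and +1."""
    sxy, sxx, syy, _, _ = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Ordinary least-squares (slope, intercept) for y = slope * x + intercept."""
    sxy, sxx, _, mean_x, mean_y = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy series, made up for illustration (not study data).
toy_cases = [1.0, 2.0, 3.0, 4.0]
toy_deaths = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(toy_cases, toy_deaths)
slope, intercept = linear_fit(toy_cases, toy_deaths)
```

A perfectly linear toy series gives a coefficient at the upper end of the range; real weekly case and death averages would produce a value strictly inside it.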

Ethical Considerations

Approval from the institutional ethics committee was not required as the data used in this study were retrieved from publicly available databases.

Results

Our retrospective analysis of the epidemiological data reflected that the second COVID-19 wave started rising by the end of February 2021 and peaked by the end of the first week of May 2021. Based on the distinct epidemiological trends observed (Multimedia Appendix 1), we divided the study period (December 1, 2020, to July 26, 2021; 34 weeks) into prepeak (weeks 1-23) and postpeak (weeks 24-34) periods. The weekly average of new cases and deaths showed a strong correlation in the study period (R=0.98, P<.001), signifying the high statistical validity of the data for further comparisons. Further, we analyzed the distribution of SARS-CoV-2 variants circulating in the Indian population in correlation with new cases and deaths before and after the peak. For description, based on epidemiological trends, the prepeak period was further divided into the following 3 time series intervals: “very early” (weeks 1-8), “early” (weeks 9-16), and “near peak” (weeks 17-23). New cases and deaths showed a downward trend in the “very early” period and maintained a plateau in the “early” period (except toward the end when cases and deaths started increasing, indicating the start of the second wave). In the “near peak” period, a steep rise in new cases and deaths was observed (Figure 1).
Figure 1

Weekly distribution of SARS-CoV-2 variants in genomic sequence data from India and the correlation with daily new COVID-19 cases and deaths from December 1, 2020, to July 26, 2021. The data were analyzed for the period before the peak of the second wave (23rd week) and after that. SARS-CoV-2 genomic sequence data were obtained from the EpiCoV database of the Global Initiative on Sharing All Influenza Data, and epidemiological data were obtained from the Worldometer database.
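
The week numbering and the prepeak and postpeak intervals used in this section can be mapped back to calendar dates with a short Python sketch. The week ranges are taken from the text above; the function names are hypothetical, and the sketch assumes that week 1 starts on December 1, 2020, and that every week spans 7 days.

```python
from datetime import date, timedelta

STUDY_START = date(2020, 12, 1)  # assumed first day of week 1

# (label, first_week, last_week), as described in the Results text.
PERIODS = [
    ("very early", 1, 8),
    ("early", 9, 16),
    ("near peak", 17, 23),
    ("postpeak", 24, 34),
]


def week_bounds(week, start=STUDY_START):
    """Return (first_day, last_day) of a 1-based study week."""
    first_day = start + timedelta(weeks=week - 1)
    return first_day, first_day + timedelta(days=6)


def period_label(week):
    """Return the interval label for a study week."""
    for label, first, last in PERIODS:
        if first <= week <= last:
            return label
    raise ValueError(f"week {week} is outside the 34-week study period")


peak_week_days = week_bounds(23)  # calendar span of the peak week
last_week_days = week_bounds(34)  # calendar span of the final study week
```

Under these assumptions, week 23 covers May 4 to May 10, 2021, which agrees with a peak at the end of the first week of May, and week 34 ends on July 26, 2021.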

The rise and fall of circulating SARS-CoV-2 variants were studied against the observed epidemiological data trends in the respective time series intervals. Observing the composite data trends of epidemiological and SARS-CoV-2 genomic data provides a glimpse of the formation of the second COVID-19 wave, with clear indications of which SARS-CoV-2 strains may have driven it (Figures 1 and 2). By December 2020, 8 SARS-CoV-2 Pango lineages and their multiple sublineages were circulating in the Indian population, including 4 VOCs (B.1.1.7, B.1.351, P.1, and B.1.617.2) and 3 VOIs (B.1.617.1, B.1.427/B.1.429, and B.1.525). B.1.1.7 was the most dominant variant in that period. B.1.617 lineage variants collectively (B.1.617+) showed an upward trend from their emergence, surpassed other VOCs, including B.1.1.7, by the end of January 2021 (weeks 8-9), and subsequently kept rising. In contrast, B.1.1.7 showed a downward trend by the end of March 2021 (weeks 17-18), with B.1.617 lineage variants becoming the dominant variants. By the end of April 2021, B.1.617 lineage variants were detected in 78.5% of SARS-CoV-2 sequences uploaded to the GISAID database, reaching about 83% in the week of the peak.
Figure 2

Origin and spread of B.1.617 lineage SARS-CoV-2 variants in the Indian population. Data were analyzed from December 1, 2020, to July 26, 2021. SARS-CoV-2 genomic sequence data were obtained from the EpiCoV database of the Global Initiative on Sharing All Influenza Data, and epidemiological data were obtained from the Worldometer database.
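
The point at which B.1.617 lineage variants collectively surpassed the other variants can be located programmatically. The Python sketch below is one possible way to do this, not the authors' tool: the function name, the input layout (week mapped to lineage percentages), and the toy shares are assumptions for illustration.

```python
def first_dominant_week(weekly_shares, prefix="B.1.617"):
    """Return the first week in which the lineage `prefix` and its
    sublineages together exceed the largest share held by any single
    other lineage, or None if that never happens.

    `weekly_shares` maps week -> {lineage: percentage}; this layout is
    hypothetical.
    """
    def in_group(lineage):
        return lineage == prefix or lineage.startswith(prefix + ".")

    for week in sorted(weekly_shares):
        shares = weekly_shares[week]
        group_share = sum(p for lin, p in shares.items() if in_group(lin))
        other_shares = [p for lin, p in shares.items() if not in_group(lin)]
        if group_share > max(other_shares, default=0.0):
            return week
    return None


# Toy percentages, made up for illustration (not study data).
toy_shares = {
    7: {"B.1.1.7": 40.0, "B.1.617.1": 20.0, "B.1.617.2": 10.0, "B.1": 30.0},
    8: {"B.1.1.7": 35.0, "B.1.617.1": 20.0, "B.1.617.2": 14.0, "B.1": 31.0},
    9: {"B.1.1.7": 30.0, "B.1.617.1": 25.0, "B.1.617.2": 20.0, "B.1": 25.0},
}
crossover = first_dominant_week(toy_shares)
```

Matching on the prefix followed by a dot groups B.1.617.1, B.1.617.2, and B.1.617.3 together without accidentally including an unrelated lineage whose name merely begins with the same characters.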

The phylodynamic analysis of the circulating variants in the study period strongly corroborated the trends present in the graph data, showing an exclusive increase in the cluster density of B.1.617.2 compared with other variants in the “near peak” period (Figure 3).
Figure 3

Phylodynamics of SARS-CoV-2 variants in the Indian population from December 1, 2020, to July 26, 2021. SARS-CoV-2 genomic sequence data were obtained from the EpiCoV database of the Global Initiative on Sharing All Influenza Data, and epidemiological data were obtained from the Worldometer database. VOC: variant of concern.

To determine whether the rise in the B.1.617.2 variant was localized to specific geographical regions, which may have influenced the collective data trends, we compared the monthly distribution of genomic sequences of SARS-CoV-2 variants for the states and union territories of India individually. A similar increase in the detection of the B.1.617.2 variant was observed in most states and union territories (Multimedia Appendix 2), except Kerala, where different patterns were visible (Figure S15 in Multimedia Appendix 2). In Kerala, the rise of the B.1.617.2 variant was slower in comparison with the rest of the country (55.5% vs 72% of total cases by the end of April 2021), which was further confirmed in the state-wise serosurvey data from the period of the second wave (44.4% vs 67.7% of the national average) [34]. Notably, a sharp rise in B.1.617.2 cases was observed in Kerala in a later period.

Discussion

Principal Findings

The retrospective examination of linked viral genomic sequences and epidemiological data in this study clearly showed that the occurrence of B.1.617 lineage variants, particularly the B.1.617.2 sublineage, was strongly related to the second wave of COVID-19 in India. In late January 2021, when instances of B.1.617.2 surpassed those of all other variants, the first signs of an imminent wave of COVID-19 began to appear. The rise of the wave became distinct by the end of March 2021, when instances of B.1.617.2 showed a sharp increase in line with the total number of new cases.

Comparison With Prior Work

Current prediction models in the COVID-19 pandemic are dominated by purely epidemiological analyses, from which hardly anyone could accurately predict an impending COVID-19 wave [23-27]. The importance of studying viral genomic sequences for the epidemiological surveillance of new SARS-CoV-2 variants is well recognized [31,35-40]. However, its application in developing a predictive model to forecast upcoming virus waves has received little appreciation in the existing literature [41]. Interestingly, strong conceptual validation for the applicability of an integrated approach to predict an impending COVID-19 wave using viral genomic surveillance and epidemiological data came from a recent study by de Hoffer et al [42]. These authors studied the temporal dynamics of emerging SARS-CoV-2 variants using a machine learning algorithm–based analysis of the spike protein sequences of viral samples from England, Scotland, and Wales reported in the GISAID database. Further, they correlated the relative percentage of each variant with the weekly and monthly epidemiological data of active cases from the studied geographical regions. They showed a strong relationship between the genesis of a new emerging variant and the onset of a new wave, with an exponential increase in the number of infections [42]. Moreover, our findings regarding the second wave of COVID-19 in India are corroborated by a previous study by Dhar et al [10]. The authors analyzed viral genomic sequences retrospectively and observed a similar pattern in the rise of the B.1.617 lineage, mainly the B.1.617.2 variant, in Delhi before the second wave [10]. A B.1.617.2-driven second wave was also reflected in the analysis of viral genomic sequences performed by Adiga and Nayak in 2021 [43]. We recently used our prediction model prospectively during the initial rise of cases caused by the Omicron strain in South Africa, which indicated an upcoming wave with very high transmissibility but limited lethality [4]. 
These predictions were later accurately reflected in the studies reporting the Omicron-mediated fourth wave of COVID-19 in South Africa [44,45]. The potential predictability of the second wave of COVID-19 in India in the retrospective data analysis suggests that genomic surveillance of SARS-CoV-2 variants, enriched with epidemiological data, could be a potential tool to predict upcoming COVID-19 waves. Still, the prediction accuracy is largely dependent on population-based viral genomic sequencing and consistency in data upload from all geographic regions, as well as accurate reporting of epidemiological data. The sole increase in the proportion of an emerging SARS-CoV-2 variant, coupled with an associated rise in new cases, might inform the arrival of a new wave of COVID-19. However, consideration of other epidemiological factors, such as previous exposure to related virus strains and the immunization status of the population, will be necessary to determine the magnitude of an impending wave [46]. Notably, the first wave of COVID-19 in India was limited in scope, as evidenced by the serosurvey data [47,48], and only a small part of the population was vaccinated as of early 2021 [49]. With the emergence of a new variant, both these factors may have created an ideal environment for a massive second wave to emerge. In addition, preventive measures, such as blocking or limiting gatherings and using face masks, can also influence the prospects and magnitude of a new wave [29].
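
The idea that a rising variant proportion coupled with rising new cases might signal a new wave can be expressed as a simple rule. The Python sketch below is a hedged illustration only: the function names, the 3-week run length, and the toy numbers are assumptions rather than parameters from this study, and the rule ignores prior exposure, vaccination status, and preventive measures, which the text notes are needed to judge a wave's magnitude.

```python
def rising_run(values, i, run):
    """True if values[i - run] < ... < values[i], i.e. `run` consecutive rises."""
    if i < run:
        return False
    return all(values[j] < values[j + 1] for j in range(i - run, i))


def wave_signal_weeks(weeks, variant_share, new_cases, run=3):
    """Return the weeks in which the variant share has risen for `run`
    consecutive weeks and new cases exceed the previous week's value.

    The default run length is a hypothetical choice, not a study parameter.
    """
    if not (len(weeks) == len(variant_share) == len(new_cases)):
        raise ValueError("all three series must have the same length")
    return [
        week
        for i, week in enumerate(weeks)
        if rising_run(variant_share, i, run) and rising_run(new_cases, i, 1)
    ]


# Toy series, made up for illustration (not study data).
toy_weeks = [1, 2, 3, 4, 5, 6, 7, 8]
toy_share = [5.0, 8.0, 12.0, 20.0, 35.0, 50.0, 65.0, 80.0]
toy_cases = [100, 90, 85, 80, 82, 120, 300, 900]
flagged = wave_signal_weeks(toy_weeks, toy_share, toy_cases)
```

In the toy series, the variant share rises from the start, but no week is flagged until cases also begin to climb, which mirrors the argument that genomic and epidemiological signals are more informative together than either one alone.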

Limitations

There were some limitations in our study that may have influenced the interpretation of the results. First, the samples used in our analyses might not be representative of the population. In many geographical regions, the sample size was grossly disproportionate. Therefore, the genomic sequence data presented in this study might not reflect the exact epidemiological extent of the distribution of the variants in the reported geographical regions but only show their relative proportions in the samples for which genomic sequences were uploaded to the GISAID database. We have assumed that similar proportions exist between variants in the actual population. Second, inconsistent reports and uploads of genomic sequences made it challenging to study a daily trend in the spread of variants. Finally, the scarcity of genomic sequences and inconsistency in uploading to the databases used for some states/union territories made determining variant dominance difficult.

Conclusions

Based on the observations of this study, we propose that genomic surveillance of SARS-CoV-2 variants, complemented with epidemiological data, can be a promising tool to predict upcoming COVID-19 waves.
  36 in total

1.  Emergence of Novel SARS-CoV-2 variants in India: second wave.

Authors:  Rama Adiga; Varun Nayak
Journal:  J Infect Dev Ctries       Date:  2021-11-30       Impact factor: 0.968

Review 2.  SARS-CoV-2 Omicron Variant: Epidemiological Features, Biological Characteristics, and Clinical Significance.

Authors:  Yifei Guo; Jiajia Han; Yao Zhang; Jingjing He; Weien Yu; Xueyun Zhang; Jingwen Wu; Shenyan Zhang; Yide Kong; Yue Guo; Yanxue Lin; Jiming Zhang
Journal:  Front Immunol       Date:  2022-04-29       Impact factor: 8.786

3.  Clinical severity of COVID-19 in patients admitted to hospital during the omicron wave in South Africa: a retrospective observational study.

Authors:  Waasila Jassat; Salim S Abdool Karim; Caroline Mudara; Richard Welch; Lovelyn Ozougwu; Michelle J Groome; Nevashan Govender; Anne von Gottberg; Nicole Wolter; Milani Wolmarans; Petro Rousseau; Lucille Blumberg; Cheryl Cohen
Journal:  Lancet Glob Health       Date:  2022-05-18       Impact factor: 38.927

Review 4.  The current second wave and COVID-19 vaccination status in India.

Authors:  Chiranjib Chakraborty; Ashish Ranjan; Manojit Bhattacharya; Govindasamy Agoramoorthy; Sang-Soo Lee
Journal:  Brain Behav Immun       Date:  2021-05-19       Impact factor: 7.217

5.  India grapples with second wave of COVID-19.

Authors:  Udani Samarasekera
Journal:  Lancet Microbe       Date:  2021-06-02

6.  Genomic characterization and epidemiology of an emerging SARS-CoV-2 variant in Delhi, India.

Authors:  Mahesh S Dhar; Robin Marwal; Radhakrishnan Vs; Kalaiarasan Ponnusamy; Bani Jolly; Rahul C Bhoyar; Viren Sardana; Salwa Naushin; Mercy Rophina; Thomas A Mellan; Swapnil Mishra; Charles Whittaker; Saman Fatihi; Meena Datta; Priyanka Singh; Uma Sharma; Rajat Ujjainiya; Nitin Bhatheja; Mohit Kumar Divakar; Manoj K Singh; Mohamed Imran; Vigneshwar Senthivel; Ranjeet Maurya; Neha Jha; Priyanka Mehta; Vivekanand A; Pooja Sharma; Arvinden Vr; Urmila Chaudhary; Namita Soni; Lipi Thukral; Seth Flaxman; Samir Bhatt; Rajesh Pandey; Debasis Dash; Mohammed Faruq; Hemlata Lall; Hema Gogia; Preeti Madan; Sanket Kulkarni; Himanshu Chauhan; Shantanu Sengupta; Sandhya Kabra; Ravindra K Gupta; Sujeet K Singh; Anurag Agrawal; Partha Rakshit; Vinay Nandicoori; Karthik Bharadwaj Tallapaka; Divya Tej Sowpati; K Thangaraj; Murali Dharan Bashyam; Ashwin Dalal; Sridhar Sivasubbu; Vinod Scaria; Ajay Parida; Sunil K Raghav; Punit Prasad; Apurva Sarin; Satyajit Mayor; Uma Ramakrishnan; Dasaradhi Palakodeti; Aswin Sai Narain Seshasayee; Manoj Bhat; Yogesh Shouche; Ajay Pillai; Tanzin Dikid; Saumitra Das; Arindam Maitra; Sreedhar Chinnaswamy; Nidhan Kumar Biswas; Anita Sudhir Desai; Chitra Pattabiraman; M V Manjunatha; Reeta S Mani; Gautam Arunachal Udupi; Priya Abraham; Potdar Varsha Atul; Sarah S Cherian
Journal:  Science       Date:  2021-10-14       Impact factor: 47.728

7.  Possible future waves of SARS-CoV-2 infection generated by variants of concern with a range of characteristics.

Authors:  Louise Dyson; Edward M Hill; Sam Moore; Jacob Curran-Sebastian; Michael J Tildesley; Katrina A Lythgoe; Thomas House; Lorenzo Pellis; Matt J Keeling
Journal:  Nat Commun       Date:  2021-09-30       Impact factor: 14.919

8.  Waning 2-Dose and 3-Dose Effectiveness of mRNA Vaccines Against COVID-19-Associated Emergency Department and Urgent Care Encounters and Hospitalizations Among Adults During Periods of Delta and Omicron Variant Predominance - VISION Network, 10 States, August 2021-January 2022.

Authors:  Jill M Ferdinands; Suchitra Rao; Brian E Dixon; Patrick K Mitchell; Malini B DeSilva; Stephanie A Irving; Ned Lewis; Karthik Natarajan; Edward Stenehjem; Shaun J Grannis; Jungmi Han; Charlene McEvoy; Toan C Ong; Allison L Naleway; Sarah E Reese; Peter J Embi; Kristin Dascomb; Nicola P Klein; Eric P Griggs; Deepika Konatham; Anupam B Kharbanda; Duck-Hye Yang; William F Fadel; Nancy Grisel; Kristin Goddard; Palak Patel; I-Chia Liao; Rebecca Birch; Nimish R Valvi; Sue Reynolds; Julie Arndorfer; Ousseny Zerbo; Monica Dickerson; Kempapura Murthy; Jeremiah Williams; Catherine H Bozio; Lenee Blanton; Jennifer R Verani; Stephanie J Schrag; Alexandra F Dalton; Mehiret H Wondimu; Ruth Link-Gelles; Eduardo Azziz-Baumgartner; Michelle A Barron; Manjusha Gaglani; Mark G Thompson; Bruce Fireman
Journal:  MMWR Morb Mortal Wkly Rep       Date:  2022-02-18       Impact factor: 17.586

9.  Predicting the mutational drivers of future SARS-CoV-2 variants of concern.

Authors:  M Cyrus Maher; Istvan Bartha; Steven Weaver; Julia di Iulio; Elena Ferri; Leah Soriaga; Florian A Lempp; Brian L Hie; Bryan Bryson; Bonnie Berger; David L Robertson; Gyorgy Snell; Davide Corti; Herbert W Virgin; Sergei L Kosakovsky Pond; Amalio Telenti
Journal:  Sci Transl Med       Date:  2022-02-23       Impact factor: 17.956
