
A Distinct Phylogenetic Cluster of Indian Severe Acute Respiratory Syndrome Coronavirus 2 Isolates.

Sofia Banu¹, Bani Jolly²,³, Payel Mukherjee¹, Priya Singh¹, Shagufta Khan¹, Lamuk Zaveri¹, Sakshi Shambhavi¹,³, Namami Gaur¹, Shashikala Reddy⁴, K Kaveri⁵, Sivasubramanian Srinivasan⁵, Dhinakar Raj Gopal⁶, Archana Bharadwaj Siva¹, Kumarasamy Thangaraj¹, Karthik Bharadwaj Tallapaka¹, Rakesh K Mishra¹, Vinod Scaria², Divya Tej Sowpati¹.

Abstract

BACKGROUND: From an isolated epidemic, coronavirus disease 2019 has now emerged as a global pandemic. The availability of genomes in the public domain after the epidemic provides a unique opportunity to understand the evolution and spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus across the globe.
METHODS: We performed whole-genome sequencing of 303 Indian isolates, and we analyzed them in the context of publicly available data from India.
RESULTS: We describe a distinct phylogenetic cluster (Clade I/A3i) of SARS-CoV-2 genomes from India, which encompasses 22% of all genomes deposited in the public domain from India. Globally, approximately 2% of genomes, which to date could not be mapped to any distinct known cluster, fall within this clade.
CONCLUSIONS: The cluster is characterized by a core set of 4 genetic variants and has a nucleotide substitution rate of 1.1 × 10⁻³ variants per site per year, which is lower than that of the prevalent A2a cluster. Epidemiological assessments suggest that the common ancestor emerged at the end of January 2020 and possibly resulted in an outbreak followed by countrywide spread. To the best of our knowledge, this is the first comprehensive study characterizing this cluster of SARS-CoV-2 in India.


Keywords: COVID-19; Clade I/A3i; India; genetic epidemiology; phylogenomics

Year: 2020 PMID: 33200080 PMCID: PMC7543508 DOI: 10.1093/ofid/ofaa434

Source DB: PubMed Journal: Open Forum Infect Dis ISSN: 2328-8957 Impact factor: 3.835

Since the emergence of the outbreak in the Chinese city of Wuhan in late 2019, the novel coronavirus disease has spread widely to become a global pandemic, with approximately 25 million individuals infected worldwide and >800 000 deaths [1]. The causative virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a member of the genus Betacoronavirus. During its transmission, the virus has differentiated into at least 10 clades globally and is continuously evolving [2]. This has implications for genetic epidemiology, surveillance, contact tracing, and the development of long-term strategies for mitigating the disease [3]. The recent deposition of whole-genome sequences of SARS-CoV-2 from across the world in public databases provides an unprecedented opportunity to understand the dynamics and evolution of the pathogen. Public repositories such as GISAID [4] also provide wider access to these resources and enable researchers across the globe to address pertinent hypotheses. This gave us a unique opportunity to understand the introduction, evolution, and spread of the virus in India in the context of the global clades circulating across the world. In this manuscript, we report the sequences of SARS-CoV-2 isolates sampled predominantly from the states of Telangana and Tamil Nadu. Furthermore, we systematically analyzed the phylogenetic clusters of genomes from India and characterized a unique cluster of sequences (Clade I/A3i) that could not be classified into any of the previously annotated global clades. Isolates forming this cluster were predominant in several states and are characterized by a shared set of 4 genetic variants. The cluster potentially arose from a single outbreak followed by rapid spread across the country.
To the best of our knowledge, this is the first comprehensive report of this novel and predominant cluster of sequences from India; it also points to the cluster's distribution beyond India, in many countries across Asia, Oceania, and the Americas.

MATERIALS AND METHODS

Patient Consent Statement

Written consent was obtained from patients wherever applicable. The design and implementation of this work were approved by a local ethics committee.

Sample Processing and Sequencing

Samples were collected and processed in accordance with the guidelines of the Institutional Ethics Committee. Ribonucleic acid (RNA) was isolated from nasopharyngeal or oropharyngeal swabs collected in viral transport media, as described in Supplementary Methods. Purified RNA was sequenced using either a shotgun approach or the ARTIC v3 protocol, as detailed in Supplementary Methods [5].

Assembly of Sequencing Data

Quality control of the FASTQ files was performed using FastQC v0.11.7, and adapters and poor-quality bases were trimmed using Trimmomatic [6, 7]. Reads were aligned to the reference genome MN908947.3 using HISAT2 [8]. The consensus sequence was derived from the BAM file using seqtk and bcftools [9], and the samtools depth command was used to calculate coverage across the genome [10]. The sequences were deposited in GISAID with the accessions detailed in Supplementary Data 1.
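Coverage statistics of the kind reported in the Results can be summarized from `samtools depth` output, which is tab-separated with one line per position (reference name, 1-based position, depth). The following is a minimal sketch, not the authors' pipeline. It assumes the command was run with `-a` so that zero-depth positions are listed, and it assumes the 29,903-nt length of MN908947.3. The function name and the 10× breadth threshold are illustrative choices.

```python
import io

GENOME_LENGTH = 29903  # length of the MN908947.3 reference, in nucleotides


def coverage_summary(depth_lines, genome_length=GENOME_LENGTH, min_depth=10):
    """Summarize `samtools depth -a` output (chrom, pos, depth per line).

    Returns (mean depth over the genome, fraction of sites with depth >= min_depth).
    Positions missing from the input are implicitly treated as zero depth.
    """
    total = 0
    covered = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        _chrom, _pos, depth = line.split("\t")
        d = int(depth)
        total += d
        if d >= min_depth:
            covered += 1
    return total / genome_length, covered / genome_length


# Toy input standing in for e.g. `samtools depth -a sample.bam` on a 4-nt "genome".
toy_depth = io.StringIO(
    "MN908947.3\t1\t0\nMN908947.3\t2\t20\nMN908947.3\t3\t40\nMN908947.3\t4\t5\n"
)
mean_depth, breadth = coverage_summary(toy_depth, genome_length=4)
print(mean_depth, breadth)
```

Dividing by the full reference length, rather than by the number of lines read, keeps the mean honest if `-a` was omitted and uncovered positions are missing from the output.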

Data Availability

All sequences generated in this study have been submitted to GISAID. The accession names of the samples and associated metadata are outlined in Supplementary Data 1.

Genomic Data Collection and Analysis

Indian SARS-CoV-2 genomes deposited in GISAID up to August 7, 2020 were used for the analysis. In addition, 10 high-quality genomes from each of the 10 clades annotated by Nextstrain were retrieved from GISAID and included in the analysis. The datasets and acknowledgments are listed in Supplementary Data 2. Only high-quality genomes were used to evaluate nucleotide substitution rates, the molecular clock, and phylogenetic clustering, because these analyses are sensitive to genome quality. The criteria used to filter out low-quality genomes are outlined in Supplementary Methods.
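The study's actual filtering criteria are given only in Supplementary Methods. As a purely illustrative sketch of what such a quality filter can look like, the code below drops genomes that are too short or carry too many ambiguous (N) bases. Both thresholds (`MIN_LENGTH`, `MAX_N_FRACTION`) are hypothetical placeholders and are not the values used in this work.

```python
def read_fasta(text):
    """Parse FASTA text into a dict {sequence_id: sequence} (uppercased)."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical thresholds, for illustration only.
MIN_LENGTH = 29000      # minimum genome length in nucleotides
MAX_N_FRACTION = 0.05   # maximum tolerated fraction of N bases


def is_high_quality(seq, min_length=MIN_LENGTH, max_n_fraction=MAX_N_FRACTION):
    """Return True if the sequence passes the length and N-content checks."""
    if len(seq) < min_length:
        return False
    return seq.count("N") / len(seq) <= max_n_fraction


def filter_high_quality(records, **kwargs):
    """Keep only records whose sequence passes is_high_quality."""
    return {name: seq for name, seq in records.items() if is_high_quality(seq, **kwargs)}


toy_fasta = (
    ">good\nACGTACGTAC\nGTACGTACGT\n"
    ">too_short\nACGT\n"
    ">too_many_n\nNNNNNNNNNNACGTACGTAC\n"
)
kept = filter_high_quality(read_fasta(toy_fasta), min_length=10, max_n_fraction=0.05)
print(list(kept))
```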

Phylogenetic Analysis and Divergence Estimation

Phylogenetic analysis of the samples was performed as described previously, following the standard Nextstrain protocol for the analysis of SARS-CoV-2 genomes [11, 12]. The resulting tree was used to infer mutations and identify clades. BEAST v1.10.4 was used to analyze nucleotide substitution rates and to estimate times to the most recent common ancestor. The detailed methodology for phylogenetic tree construction and dating analysis is provided in Supplementary Methods, and the values used for each parameter are given in Supplementary Data 4.

Functional Evaluation of Variants

The Wuhan-Hu-1 genome (NC_045512) was used as the reference wherever applicable. The functional consequences of the variants were evaluated using SIFT [13], with SIFT scores of 0.0 to 0.05 interpreted as deleterious. The functional effects of protein variants identified in the clades were also assessed using the PROVEAN web server with its default threshold of −2.5 [14]. In addition, PhyloP conservation scores and base-wise GERP rejected-substitution scores were computed for the variants [15, 16]. Sites with positive PhyloP scores were predicted to be conserved, and positive GERP scores were considered indicative of a site under evolutionary constraint. The variants were also checked for overlap with the immune epitope predictions available in the UCSC Genome Browser for SARS-CoV-2.
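The decision rules above can be written as a small classifier. This is a minimal sketch that takes the cutoffs from this section and from the Table 2 footnote: SIFT 0 to 0.05 is deleterious, a PROVEAN score below −2.5 is deleterious, and a positive PhyloP or GERP score indicates conservation or constraint. The scores themselves must come from the SIFT and PROVEAN tools. The example values are copied from Table 2, and the function names are illustrative.

```python
SIFT_DELETERIOUS_MAX = 0.05  # SIFT scores of 0.0-0.05 -> deleterious
PROVEAN_THRESHOLD = -2.5     # PROVEAN scores below -2.5 -> deleterious (Table 2 footnote)


def sift_call(score):
    return "Deleterious" if score <= SIFT_DELETERIOUS_MAX else "Tolerated"


def provean_call(score):
    return "Deleterious" if score < PROVEAN_THRESHOLD else "Neutral"


def conservation_call(phylop, gerp):
    """Positive PhyloP -> conserved; positive GERP -> under evolutionary constraint."""
    return {"conserved": phylop > 0, "constrained": gerp > 0}


# (mutation, PROVEAN, SIFT, PhyloP, GERP) taken from Table 2
examples = [
    ("G251V", -8.581, 0.0, 4.256, 1.65),
    ("L3606F", -1.4, 0.01, -1.32286, -3.3),
    ("A3220V", -2.049, 0.04, 3.30935, 1.65),
    ("A97V", -3.611, 0.0, 3.31844, 1.65),
]
for name, provean, sift, phylop, gerp in examples:
    print(name, provean_call(provean), sift_call(sift), conservation_call(phylop, gerp))
```

A3220V illustrates why both tools are reported: PROVEAN calls it neutral while SIFT calls it deleterious.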

RESULTS

Demographics and Quality of Viral Genomes

The 303 sequenced genomes were collected mainly from the states of Telangana and Tamil Nadu. Patient age ranged from 1.5 to 80 years, with >80% (275 of 303) between 20 and 60 years (Supplementary Figure S1A). A total of 294 samples were sequenced using an amplicon-based approach, with a target of ~2 million paired-end reads per sample. An average coverage of >1000× was achieved in all cases, with uniform representation of all amplicons (Supplementary Figure S1B, top, and S1C). Three samples were sequenced using a shotgun approach and had an average coverage of approximately 100×, with uniform coverage across the genome (Supplementary Figure S1B, bottom). The samples and metadata for the isolates sequenced and deposited in the public domain are summarized in Supplementary Data 1.
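The expected mean depth for a given sequencing budget follows from the Lander–Waterman relation C = N·L/G, where N is the number of reads, L is the read length, and G is the genome length. The source does not state the read length, so the sketch below is only an idealized estimate. It assumes 2 × 150 bp reads and reads "~2 million paired-end reads" as 2 million read pairs; both are hypothetical.

```python
def expected_coverage(n_reads, read_length, genome_length):
    """Expected mean depth C = N * L / G (Lander-Waterman)."""
    return n_reads * read_length / genome_length


# Assumptions for illustration: 2 million read pairs of 2 x 150 bp each.
read_pairs = 2_000_000
n_reads = read_pairs * 2
cov = expected_coverage(n_reads, read_length=150, genome_length=29903)
print(f"{cov:.0f}x")  # prints 20065x
```

This is an upper bound. Duplicates, off-target reads, primer trimming, and uneven amplicon yield all reduce the usable depth, so real values fall well below it.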

Phylogenetic Clusters

A total of 2212 SARS-CoV-2 genome sequences from India, including those sequenced by our group, were available for analysis as of August 7, 2020. After removal of low-quality sequences, 1377 genomes submitted by 16 institutions remained (including 275 of our 303 genomes) (Supplementary Data 2). These genomes fell into 7 clusters (Figure 1), 6 of which correspond to known clades identified by Nextstrain: A1a, A2a, A3, B, B1, and B4 [17]. The first and largest cluster, comprising 1143 genomes (83%), fell into the A2a clade. This clade was represented by samples from multiple states across the country, including Gujarat, Maharashtra, Telangana, West Bengal, Odisha, Karnataka, Uttarakhand, Tamil Nadu, and Haryana.

Figure 1.

Phylogenetic clusters and clades generated by Nextstrain for the dataset of 1377 high-quality Indian severe acute respiratory syndrome coronavirus 2 genomes. Indian genomes fell into 7 clusters, with the majority falling under clade A2a. The second largest cluster in India (purple) has been designated clade I/A3i.

The second largest cluster consisted of 160 genomes (11.6%). This cluster could not be classified into any of the 10 clades defined by Nextstrain, and it did not share the nucleotide compositions that define any of these clades [17]. The cluster appears to have diverged from the A1a and A3 clades, and most, but not all, of its sequences share a variant (L3606F in ORF1a) with members of the A3 and A1a clades; in cognizance of this, we call it the A3i clade. To avoid potential conflict with the nomenclature followed by Nextstrain, we define this cluster of sequences as Clade I/A3i, reflecting its unique occurrence as a dominant cluster among SARS-CoV-2 genome sequences from India and the fact that the clade is largely formed by sequences from India (Supplementary Figures S2 and S3). The remaining clusters correspond to the B4, A3, A1a, B, and B1 clades, with 52 and 17 genomes falling into clades B4 and A3, respectively, and 1 genome each in clades A1a, B, and B1.

Molecular Definition of the Cluster

A discriminant analysis was performed on all variants found in any genome of the cluster. Systematic analysis of the cluster members revealed that a set of 4 variants (C6312A, C13730T, C23929T, and C28311T) was shared by the majority of members (Figure 2): 149 of the 160 genomes (93%) in the cluster carried this combination. None of the genomes assigned to any other clade carried this combination of variants.
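Checking a genome against the 4-variant signature is a subset test on its variant set. The sketch below assumes that each genome has already been reduced to a set of nucleotide variants relative to Wuhan-Hu-1, for example from the mutation calls on the Nextstrain tree. The genome names and variant sets in the toy data are hypothetical.

```python
I_A3I_SIGNATURE = frozenset({"C6312A", "C13730T", "C23929T", "C28311T"})


def carries_signature(variants, signature=I_A3I_SIGNATURE):
    """True if every signature variant is present in the genome's variant set."""
    return signature <= set(variants)


def signature_share(genomes, signature=I_A3I_SIGNATURE):
    """Return (count, fraction) of genomes carrying the full signature."""
    n = sum(1 for variants in genomes.values() if carries_signature(variants, signature))
    return n, n / len(genomes)


# Hypothetical genomes and variant sets, for illustration only.
toy_genomes = {
    "genome_1": {"C6312A", "C13730T", "C23929T", "C28311T", "G11083T"},
    "genome_2": {"C6312A", "C13730T", "C23929T", "C28311T"},
    "genome_3": {"C6312A", "C28311T"},          # partial match only
    "genome_4": {"A23403G", "C14408T"},         # A2a-like variants
}
print(signature_share(toy_genomes))
print(f"{149 / 160:.0%}")  # share reported for the cluster: prints 93%
```

The same predicate, applied to global GISAID genomes, gives the kind of search described in the next paragraph.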

Figure 2.

Sharing of cluster-defining variants among the clusters of Indian severe acute respiratory syndrome coronavirus 2 genomes. The size of each circle represents the allele frequency of the respective variant, and clade-defining variants are marked with a red circle.

We further searched the global datasets for genomes carrying all 4 variants that define Clade I/A3i. The search retrieved a total of 362 high-quality genomes (Supplementary Data 5). The largest number of these originated from Singapore (219 genomes), where they constituted 53% of the high-quality genomes. The others originated from several countries including Malaysia, Australia, the United States, Canada, Taiwan, Japan, Thailand, the Philippines, Oman, Guam, and Saudi Arabia, although in these countries members of the clade made up a much smaller proportion of the clades/clusters identified. Of the 362 genomes, 23 were sampled earlier than the earliest Indian sample of this cluster; these came from the United States, Canada, Australia, Thailand, Saudi Arabia, Taiwan, Singapore, Malaysia, Japan, and Brazil (Figure 3B).

Figure 3.

(A) Proportion of clade I/A3i (purple) among the genomes sequenced from different states of India; clade A2a (teal) is shown for comparison, and all other clades are shaded gray. (B) The short tree of clade I/A3i, diverging from a central point, suggests a single point of introduction followed by spread across the different states. The 23 global genomes sampled before the first Indian genome from this cluster are highlighted in gray.

Nucleotide Substitution Rates

Nucleotide substitution rates were estimated for the Indian sequences using BEAST, with the WH1 genome as the root. For all Indian SARS-CoV-2 genomes put together, the estimated substitution rate is 1.76 × 10⁻³ per site per year (95% highest posterior density [HPD], 1.57 × 10⁻³ to 1.99 × 10⁻³), consistent with previous estimates [18]. Genome-wide and gene-wise substitution rates were also computed for the individual clades and major clusters. The I/A3i clade has a nucleotide substitution rate of 1.1 × 10⁻³ variants per site per year, compared with 1.73 × 10⁻³ for the prevalent A2a clade and 1.76 × 10⁻³ for all high-quality genomes from India analyzed. The gene-wise rates suggest that the evolution of the I/A3i clade is largely determined by changes in the structural Nucleocapsid (N) and Membrane (M) genes, whereas that of A2a, the globally predominant clade, is determined by changes in the Spike (S) gene (Table 1).
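BEAST estimates substitution rates within a Bayesian molecular-clock model. A much simpler check of clock-like behavior, used by tools such as TempEst, is root-to-tip regression. This is a different technique from the one used in this study. Genetic distance from the root is regressed on sampling date: the slope approximates the substitution rate, and the x-intercept approximates the root date. The sketch below uses synthetic, hypothetical data constructed to follow a rate of 1.1 × 10⁻³ per site per year. It is not a reanalysis of the study's data.

```python
def least_squares(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def root_to_tip_rate(dates, distances):
    """Return (rate in substitutions/site/year, date at which distance reaches 0)."""
    slope, intercept = least_squares(dates, distances)
    return slope, -intercept / slope


# Synthetic tips: sampling dates in decimal years, root-to-tip distances in
# substitutions per site, generated exactly from a hypothetical clock.
true_rate = 1.1e-3
true_root = 2020.07
dates = [2020.20, 2020.25, 2020.30, 2020.40, 2020.50]
distances = [true_rate * (d - true_root) for d in dates]

rate, root_date = root_to_tip_rate(dates, distances)
print(rate, root_date)
```

Real data are noisy, and tips share ancestry, so the regression is only a diagnostic. Credible intervals such as the HPDs above require a model-based method like BEAST.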

Table 1.

Nucleotide Substitution Rates (per Site per Year) of the Different Structural Protein Genes and Genome-Wide Across the Different Clusters and Clades in India^a

Clade/Cluster	S Gene	E Gene	M Gene	N Gene	Genome
All (N = 1376)	3.55 × 10⁻³	4.57 × 10⁻³	4.69 × 10⁻³	6.94 × 10⁻³	1.76 × 10⁻³
A2a (N = 1143)	3.49 × 10⁻³	5.42 × 10⁻³	3.67 × 10⁻³	5.85 × 10⁻³	1.73 × 10⁻³
I/A3i (N = 149)	0.94 × 10⁻³	3.5 × 10⁻³	2.26 × 10⁻³	1.54 × 10⁻³	1.1 × 10⁻³
B4 (N = 52)	1.18 × 10⁻³	3.9 × 10⁻³	1.15 × 10⁻³	3.33 × 10⁻³	1.12 × 10⁻³
A3 (N = 17)	1.36 × 10⁻³	1.89 × 10⁻³	1.28 × 10⁻³	7.23 × 10⁻³	1.85 × 10⁻³

^a The estimates for A1a, B, and B1 were not computed because these clades encompassed very few genomes from India.


Estimating Time to Most Recent Common Ancestor and Age of the Cluster

The date of the most recent common ancestor for the dataset of all Indian SARS-CoV-2 genomes, with the WH1 genome sequence included, was computed using BEAST. The median time to most recent common ancestor (tMRCA) was December 10, 2019 (95% HPD, November 24 to December 24, 2019), consistent with previous estimates of the origin of the epidemic in Wuhan, China [19]. The tMRCA was also computed for the I/A3i clade and for the A2a clade, which constituted the majority of samples. Clade A2a, the predominant clade in India, had a tMRCA of January 15, 2020 (95% HPD, December 25, 2019 to February 2, 2020), whereas clade I/A3i had a tMRCA of January 26, 2020 (95% HPD, January 1 to February 15, 2020).
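Dated analyses are commonly run with tip dates expressed in decimal years, so tMRCA summaries are often reported as decimal years and must be converted to calendar dates. The sketch below assumes that decimal years count from January 1 and account for leap years, so 2020 has 366 days. Tools differ slightly in this convention, so a converted date may shift by a day. The input 2020.07 is a hypothetical value, not one taken from the study's output.

```python
from datetime import date, timedelta


def decimal_year_to_date(value):
    """Convert a decimal year (e.g. 2020.5) to a calendar date, flooring to whole days."""
    year = int(value)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days  # 366 in leap years
    offset = int((value - year) * days_in_year)          # whole days elapsed since Jan 1
    return start + timedelta(days=offset)


print(decimal_year_to_date(2020.07))
print(decimal_year_to_date(2020.5))
```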

Functional Consequences of the Variants

Most of the variants that define other clades were predicted to be neutral by PROVEAN, with the exception of G251V, which defines the A1a clade. Three variants that define Clade I/A3i (C6312A, C13730T, and C28311T) result in amino acid changes with potentially deleterious functional consequences, as predicted by SIFT, and map to conserved loci in the SARS-CoV-2 genome (Table 2). One of these variants, A97V in the RdRp protein (corresponding to A88V in ORF1b), is located in the NiRAN domain, which has been suggested to be important for RNA binding and nucleotidylation activity [20]. Both SIFT and PROVEAN predict this mutation to be deleterious; however, because alanine and valine are both hydrophobic amino acids, its exact effect needs to be validated experimentally. Of notable significance is the P13L variant (C28311T) in the Nucleocapsid protein, which is required for viral entry into cells. The variant maps to the intrinsically disordered region (IDR) of the N protein; SIFT predicts it to be deleterious, whereas PROVEAN classifies it as neutral.
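The protein-level names in Table 2 follow from the genomic coordinates by simple codon arithmetic within each gene. The sketch below assumes the standard 1-based coding-region coordinates of the Wuhan-Hu-1 reference (NC_045512.2/MN908947.3), with ORF1b numbered from the −1 ribosomal frameshift site at 13468. It handles single genes only. Protein-relative numbering such as A97V in RdRp (nsp12) spans the frameshift and would need an nsp-specific offset, which is not modeled here.

```python
# 1-based (start, end) coordinates of selected coding regions in NC_045512.2,
# assumed from the standard reference annotation.
GENES = {
    "ORF1a": (266, 13483),
    "ORF1b": (13468, 21555),  # numbered from the -1 frameshift site
    "S": (21563, 25384),
    "ORF3a": (25393, 26220),
    "ORF8": (27894, 28259),
    "N": (28274, 29533),
}


def codon_number(genome_pos, gene):
    """Return the 1-based codon (amino acid) number of a genomic position within a gene."""
    start, end = GENES[gene]
    if not start <= genome_pos <= end:
        raise ValueError(f"position {genome_pos} is outside {gene} ({start}-{end})")
    return (genome_pos - start) // 3 + 1


# Sites from Table 2 and the gene each is annotated to.
for pos, gene in [(28311, "N"), (13730, "ORF1b"), (6312, "ORF1a"),
                  (23403, "S"), (26144, "ORF3a"), (28144, "ORF8")]:
    print(pos, gene, codon_number(pos, gene))
```

Run on the Table 2 sites, this recovers P13 in N, A88 in ORF1b, T2016 in ORF1a, D614 in S, G251 in ORF3a, and L84 in ORF8.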

Table 2.

Functional Characteristics of the Four Variants That Define Clade I/A3i and Other Clades Across the World^a

Clade	Gene	Site	Mutation	PROVEAN Score (Prediction)	SIFT Score (Prediction)	PhyloP	GERP
A1a	ORF3a	G26144T	G251V	−8.581 (Deleterious)	0 (Deleterious)	4.256	1.65
A1a, A3, I/A3i	ORF1a	G11083T	L3606F	−1.4 (Neutral)	0.01 (Deleterious)	−1.32286	−3.3
A2	S	A23403G	D614G	0.598 (Neutral)	0.3 (Tolerated)	2.25839	1.65
A2a	ORF1b	C14408T	P314L	−0.914 (Neutral)	0.31 (Tolerated)	3.30748	1.65
A3	ORF1a	G1397A	V378I	−0.199 (Neutral)	0.62 (Tolerated)	0.227575	−1.81
A7	ORF1a	C9924T	A3220V	−2.049 (Neutral)	0.04 (Deleterious)	3.30935	1.65
B, B1, B2, B4	ORF8	T28144C	L84S	2.333 (Neutral)	0.37 (Tolerated)	−1.52089	−0.206
B4	N	G28878A	S202N	−0.404 (Neutral)	0 (Deleterious)	4.256	1.65
I/A3i	N	C28311T	P13L	−1.23 (Neutral)	0 (Deleterious)	3.27687	1.65
I/A3i	RdRp/ORF1b	C13730T	A97V (A88V in ORF1b)	−3.611 (Deleterious)	0 (Deleterious)	3.31844	1.65
I/A3i	ORF1a	C6312A	T2016K	−0.352 (Neutral)	0.03 (Deleterious)	3.29661	1.61

^a PROVEAN scores below −2.5 are considered deleterious. Similarly, SIFT scores of 0 to 0.05 are considered deleterious.

Two of the variants, C6312A in ORF1a and C13730T in ORF1b, also mapped to immune epitope predictions (HLA-A0201-binding peptides) from NetMHC 4.0, as listed in the UCSC Genome Browser for SARS-CoV-2 [21]. The potential consequences of these variants for the immune response could not be ascertained.

Defining the Origin and Spread From the Cluster

The short tree of Clade I/A3i, diverging from a single point, suggests a single point of introduction [22]. The single point of divergence also suggests that the cluster possibly originated and spread from a single outbreak (Figure 3). The clustering of samples around January 2020 suggests rapid spread spanning multiple regions of the country. The first sequence from this cluster in India was GMC-KN443/2020 (Accession ID EPI_ISL_431103, deposited by the Department of Microbiology, Gandhi Medical College and Hospital, Hyderabad, India), sampled on March 16, 2020 from an Indonesian traveler in the state of Telangana. Of the 14 states from which high-quality genomes were available, 11 had representation of the I/A3i clade. Considering all genomic data available from India, the I/A3i clade comprises 446 genomes (22%) and is represented in 17 of the 20 states from which genomes originated. The geographical distribution and proportion of Clade I/A3i isolates are depicted in Figure 3. Delhi, Telangana, Maharashtra, Karnataka, and Tamil Nadu have the highest proportions of this clade, followed by Haryana, Madhya Pradesh, West Bengal, Odisha, Uttar Pradesh, and Bihar (Supplementary Data 6).

Temporal Shifts in the Prevalent Clades

After the initial outbreak of coronavirus disease 2019 (COVID-19) in India, most samples collected in March and April belonged to the I/A3i clade (Figure 4A). In fact, it was the predominant clade in almost all states from which data were collected in March and April, with the exception of West Bengal and Gujarat. By late April and early May, however, a shift in the prevalent clade was observed: all states except Delhi showed increased representation of the A2a clade (Figure 4B). Notably, in Gujarat the predominant clade remained A2a throughout April to July, with meager representation of the I/A3i and B4 clades. Odisha showed a mix of clades in May, with almost equal representation of I/A3i, A2a, and B4, whereas all of the more recent samples collected from Odisha in June belonged to the A2a clade. Further sample collection and sequencing are needed to assess this shift in clade predominance reliably.

Figure 4.

(A) A week-wise stacked bar displaying the proportion of clades across India, starting from the week of March 1, 2020. (B) A bubble plot depicting the change in the predominant clades with time in various states. X-axis indicates the date on which the sample was collected, and color indicates the clade. Only those states with collection data across at least 2 months are plotted.
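The week-wise proportions in panel A can be computed by binning collection dates into 7-day windows that start on March 1, 2020 and normalizing clade counts within each week. This is a minimal sketch. It assumes metadata reduced to (collection date, clade) pairs. The sample records below are hypothetical, and samples collected before the start date would receive negative week indices.

```python
from collections import Counter, defaultdict
from datetime import date

WEEK_ZERO = date(2020, 3, 1)  # first week shown in Figure 4A


def week_index(collected, start=WEEK_ZERO):
    """0 for March 1-7, 1 for March 8-14, and so on."""
    return (collected - start).days // 7


def weekly_clade_proportions(samples, start=WEEK_ZERO):
    """Map week index -> {clade: proportion of that week's samples}."""
    counts = defaultdict(Counter)
    for collected, clade in samples:
        counts[week_index(collected, start)][clade] += 1
    proportions = {}
    for week, clade_counts in sorted(counts.items()):
        total = sum(clade_counts.values())
        proportions[week] = {clade: n / total for clade, n in clade_counts.items()}
    return proportions


# Hypothetical samples, for illustration only.
toy_samples = [
    (date(2020, 3, 2), "I/A3i"),
    (date(2020, 3, 5), "I/A3i"),
    (date(2020, 3, 6), "A2a"),
    (date(2020, 3, 9), "A2a"),
]
props = weekly_clade_proportions(toy_samples)
print(props)
```

Grouping by state before calling the same function yields the per-state view of panel B.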

Demographics of the Patients

Of the members of Clade I/A3i, 112 were male (70%) and 44 were female (27.5%), with a mean age of 35.7 years (confidence interval [CI], 33.2–38.2 years). For the A2a cluster, 761 were male (66.6%) and 353 were female (30.9%), with a mean age of 40.8 years (CI, 39.8–41.8 years). Age and clinical outcomes differed significantly between Clade I/A3i and the other clades (P = .00042 and P = .000075, respectively), whereas sex did not (χ² = 0.68, P > .05). Patient details for the Indian samples as provided by GISAID are available in Supplementary Data 7.
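A sex comparison of this kind is typically a chi-square test on a 2 × 2 table of clade membership by sex. The sketch below implements the uncorrected Pearson statistic with its 1-degree-of-freedom p-value, computed as erfc(√(χ²/2)). It is not necessarily the test the authors ran: the counts for "all other clades" are not reported here, and a Yates continuity correction may have been applied. The final call compares I/A3i with A2a only, to show how the function is used, and is not expected to reproduce the reported χ² = 0.68.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p) using the chi-square distribution with 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # P(X > chi2) for 1 df
    return chi2, p


# Rows: clade (I/A3i, A2a); columns: (male, female), counts from the text above.
chi2, p = chi_square_2x2(112, 44, 761, 353)
print(f"chi2 = {chi2:.2f}, P = {p:.2f}")
```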

DISCUSSION

Genome sequencing, coupled with appropriate analytical tools, provides a unique opportunity to understand the spread and evolution of pathogens [23, 24]. The emergence of COVID-19 as a global pandemic, together with the open availability of SARS-CoV-2 genomes from across the globe through databases such as GenBank and GISAID, has opened up new opportunities to understand the pathogen, its spread, and its evolution at an unprecedented rate [4, 25]. Whole-genome sequencing of SARS-CoV-2 has also been used extensively to understand epidemics at both macro and micro levels, including within hospitals [26]. In this report, we describe a distinct cluster of SARS-CoV-2 genomes sequenced and deposited by multiple laboratories across India, which we classify as the I/A3i clade. This cluster could not be classified into any of the 10 clades described by Nextstrain and is characterized by a unique combination of 4 variants shared by 93% (149 of 160) of the isolates in the cluster. The cluster was found predominantly in genomes from India; although additional members were found among genomes deposited from other countries, they form a minor proportion of the genomes from those countries. As per Nextstrain, Indian genomes constituted more than 30% of the global genomes in this cluster. In-depth analysis of the cluster suggests a nucleotide substitution rate comparable to that of other predominant clades, although gene-wise estimates suggest a distinct mode of evolution, driven by the Nucleocapsid (N) and Membrane (M) genes and sparing the Spike (S) gene, in contrast to the predominant diversity in the Spike (S) gene of A2a, the globally predominant clade [27].
However, it has not escaped our attention that host genetic factors could modulate the evolution of the virus genome, and without large-scale host genomic studies, the causal relationships cannot be conclusively established. The cluster suggests a potential single introduction around February, followed by a countrywide spread, mostly affecting the South Indian states as evidenced by the tMRCA as well as the short cluster. Our analysis suggests that the Clade I/A3i was represented in almost all states from which genomes are available. Members of the Clade I/A3i formed the predominant class of isolates from the states of Delhi, Telangana, Maharashtra, Karnataka, and Tamil Nadu and the second largest in membership in Haryana, Madhya Pradesh, West Bengal, Odisha, Uttar Pradesh, and Bihar.

CONCLUSIONS

Put together, the cluster of genomes (Clade I/A3i) forms a distinct cluster, predominantly found amongst Indian SARS-CoV-2 genomes, with limited representation outside the region. To the best of our knowledge, this is the first comprehensive study characterizing the distinct and predominant cluster of SARS-CoV-2 in India. This report also exemplifies the fact that timely and open access to genomic data can provide unique insights into the genetic epidemiology of pathogens.

Supplementary Data

Supplementary materials are available at Open Forum Infectious Diseases online. Consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author.

Supplementary Data 1. Metadata for the genome sequences submitted by our group.
Supplementary Data 2. List of Indian and global high-quality GISAID submissions used in this study, and acknowledgments of the contributing authors.
Supplementary Data 3. List of sites masked before variant analysis.
Supplementary Data 4. Parameter values used for phylogenetic tree construction and the prior values used in the analysis of nucleotide substitution rates using BEAST.
Supplementary Data 5. List of 362 high-quality global GISAID submissions falling under Clade I/A3i.
Supplementary Data 6. Statewise proportions of genomes belonging to the I/A3i clade.
Supplementary Data 7. Metadata of clinically relevant information associated with the genomes deposited in GISAID from India.
Supplementary Data 8. Online resource. An online and updated resource for SARS-CoV-2 genomes from India, their clade assignments, and their distribution across the country is available at http://clingen.igib.res.in/genepi/phylovis/.

References (20 in total)

1. SIFT missense predictions for genomes.

Authors: Robert Vaser; Swarnaseetha Adusumalli; Sim Ngak Leng; Mile Sikic; Pauline C Ng
Journal: Nat Protoc Date: 2015-12-03 Impact factor: 13.491

2. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.

Authors: Heng Li
Journal: Bioinformatics Date: 2011-09-08 Impact factor: 6.937

3. NetMHCpan-4.0: Improved Peptide-MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data.

Authors: Vanessa Jurtz; Sinu Paul; Massimo Andreatta; Paolo Marcatili; Bjoern Peters; Morten Nielsen
Journal: J Immunol Date: 2017-10-04 Impact factor: 5.422

4. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype.

Authors: Daehwan Kim; Joseph M Paggi; Chanhee Park; Christopher Bennett; Steven L Salzberg
Journal: Nat Biotechnol Date: 2019-08-02 Impact factor: 54.908

5. The Sequence Alignment/Map format and SAMtools.

Authors: Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

6. Tracking virus outbreaks in the twenty-first century. [Review]

Authors: Nathan D Grubaugh; Jason T Ladner; Philippe Lemey; Oliver G Pybus; Andrew Rambaut; Edward C Holmes; Kristian G Andersen
Journal: Nat Microbiol Date: 2018-12-13 Impact factor: 17.745

7. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China.

Authors: Chaolin Huang; Yeming Wang; Xingwang Li; Lili Ren; Jianping Zhao; Yi Hu; Li Zhang; Guohui Fan; Jiuyang Xu; Xiaoying Gu; Zhenshun Cheng; Ting Yu; Jiaan Xia; Yuan Wei; Wenjuan Wu; Xuelei Xie; Wen Yin; Hui Li; Min Liu; Yan Xiao; Hong Gao; Li Guo; Jungang Xie; Guangfa Wang; Rongmeng Jiang; Zhancheng Gao; Qi Jin; Jianwei Wang; Bin Cao
Journal: Lancet Date: 2020-01-24 Impact factor: 79.321

8. Emergence of Drift Variants That May Affect COVID-19 Vaccine Development and Antibody Treatment.

Authors: Takahiko Koyama; Dilhan Weeraratne; Jane L Snowdon; Laxmi Parida
Journal: Pathogens Date: 2020-04-26

9. COVID-19: Epidemiology, Evolution, and Cross-Disciplinary Perspectives. [Review]

Authors: Jiumeng Sun; Wan-Ting He; Lifang Wang; Alexander Lai; Xiang Ji; Xiaofeng Zhai; Gairu Li; Marc A Suchard; Jin Tian; Jiyong Zhou; Michael Veit; Shuo Su
Journal: Trends Mol Med Date: 2020-03-21 Impact factor: 11.951

10. The establishment of reference sequence for SARS-CoV-2 and variation analysis.

Authors: Changtai Wang; Zhongping Liu; Zixiang Chen; Xin Huang; Mengyuan Xu; Tengfei He; Zhenhua Zhang
Journal: J Med Virol Date: 2020-03-20 Impact factor: 20.693
