Rakesh Sarkar1, Suvrotoa Mitra1, Pritam Chandra1, Priyanka Saha1, Anindita Banerjee1, Shanta Dutta1, Mamta Chawla-Sarkar2.
Abstract
Accumulation of mutations within the genome is the primary driving force in viral evolution within an endemic setting. This inherent feature often leads to altered virulence, infectivity and transmissibility, and antigenic shifts to escape host immunity, which might compromise the efficacy of vaccines and antiviral drugs. Therefore, we carried out a genome-wide analysis of circulating SARS-CoV-2 strains to detect the emergence of novel co-existing mutations and trace their geographical distribution within India. Comprehensive analysis of whole genome sequences of 837 Indian SARS-CoV-2 strains revealed the occurrence of 33 different mutations, 18 of which were unique to India. Novel mutations were observed in the S glycoprotein (6/33), NSP3 (5/33), RdRp/NSP12 (4/33), NSP2 (2/33), and N (1/33). Non-synonymous mutations were found to be 3.07 times more prevalent than synonymous mutations. We classified the Indian isolates into 22 groups based on their co-existing mutations. Phylogenetic analysis revealed that the representative strains of each group were divided into various sub-clades within their respective clades, based on the presence of unique co-existing mutations. The A2a clade was found to be dominant in India (71.34%), followed by A3 (23.29%) and B (5.36%), but a heterogeneous distribution was observed among various geographical regions. The A2a clade was highly predominant in East India, Western India, and Central India, whereas the A2a and A3 clades were nearly equal in prevalence in South and North India. This study highlights the divergent evolution of SARS-CoV-2 strains and co-circulation of multiple clades in India. Monitoring of the emerging mutations will pave the way for vaccine formulation and the design of antiviral drugs.
Year: 2021 PMID: 33464421 PMCID: PMC7814186 DOI: 10.1007/s00705-020-04911-0
Source DB: PubMed Journal: Arch Virol ISSN: 0304-8608 Impact factor: 2.574
Fig. 1 (A-B) Identification of various mutations present in the genome of SARS-CoV-2 circulating in India. (A) Pictorial representation of 33 different mutations (at both the nucleotide and amino acid levels) found in different regions (coding and non-coding regions) of the SARS-CoV-2 genome. (B) Relative frequencies of 33 different mutations in India. (C-G) Identification of various mutations present in the genome of SARS-CoV-2 circulating in different geographic regions in India. Relative frequencies of various mutations in (C) East India, (D) Western India, (E) South India, (F) Central India and (G) North India.
Fig. 2 Analysis of synonymous and non-synonymous mutations regarding nucleotide substitutions at different positions in codons. (A) Frequency distribution of SARS-CoV-2 isolates harbouring varying numbers of co-existing mutations. (B) Prevalence of synonymous and non-synonymous mutations in SARS-CoV-2 genomes across India. (C) Frequency distribution of various transitional (C>T, A>G, G>A and T>C) and transversional (G>T, C>A, G>C and A>T) substitution events. (D) Frequency distribution of various types of substitutional events occurring at the first, second, and third nucleotide positions of the codon.
Fig. 3 Grouping of SARS-CoV-2 strains on the basis of co-existing mutations and analysis of their prevalence. (A) Analysis of mutations revealed the presence of the clades (A2a, A3 and B) of SARS-CoV-2 strains in India. The accumulation of novel mutations in addition to clade-specific variations allowed us to classify A2a clade strains into 12 groups, A3 clade strains into eight groups, and B clade strains into two groups. We also show the number of strains belonging to each group. (B) Prevalence of three clade-specific mutations in India. The A2a clade (71.34%) was found to be the most prevalent in India, followed by A3 (23.29%) and B (5.36%).
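The transition/transversion split listed for panel (C) follows the standard purine/pyrimidine rule: a substitution within the same base class (A<->G or C<->T) is a transition, and one across classes is a transversion. A minimal Python sketch of that rule is given below; the function name `substitution_type` is hypothetical and is not taken from the study.

```python
# Minimal sketch: classify a single-nucleotide substitution as a
# transition or a transversion. Names here are illustrative only.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a ref>alt substitution."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


# The eight substitution events named in the caption of panel (C).
events = ["C>T", "A>G", "G>A", "T>C", "G>T", "C>A", "G>C", "A>T"]
classified = {e: substitution_type(*e.split(">")) for e in events}
```

Applied to the eight events named in the caption, this rule labels the first four as transitions and the last four as transversions, in agreement with the caption.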
Accession numbers of representative strains of 22 groups of SARS-CoV-2
| Group | Co-existing mutations defining the group | Sequence accession number | Clade | Sub-cluster/sub-clade |
|---|---|---|---|---|
| 1 | A23403G (D614G)/S, C3037T (F106F)/NSP3, C241T/5’-UTR, C14408T (P323L)/RDRP | EPI_ISL_436455 | A2a | Prototype |
| 2 | A23403G (D614G)/S, C3037T (F106F)/NSP3, C241T/5’-UTR, C14408T (P323L)/RDRP, G25563T (Q57H)/ORF3a | EPI_ISL_455783 | A2a | Sub-cluster a |
| 3 | A23403G (D614G)/S, C3037T (F106F)/NSP3, C241T/5’-UTR, C14408T (P323L)/RDRP, G25563T (Q57H)/ORF3a, C28854T (S194L)/N, C22444T (D294D)/S | EPI_ISL_435069 | A2a | Sub-cluster a |
| 4 | A23403G (D614G)/S, C3037T (F106F)/NSP3, C241T/5’-UTR, C14408T (P323L)/RDRP, G25563T (Q57H)/ORF3a, C28854T (S194L)/N, C22444T (D294D)/S, G16078A (V880I)/RDRP | EPI_ISL_447050 | A2a | Sub-cluster a |
| 5 | A23403G (D614G)/S, C3037T (F106F)/NSP3, C241T/5’-UTR, C14408T (P323L)/RDRP, G25563T (Q57H)/ORF3a, C28854T (S194L)/N, C22444T (D294D)/S, G16078A (V880I)/RDRP, G23311T (E583D)/S | EPI_ISL_447044 | A2a | Sub-cluster a |
| 6 | A23403G (D614G)/S, C3037T (F106F)/NSP3, C241T/5’-UTR, C14408T (P323L)/RDRP, G25563T (Q57H)/ORF3a, C28854T (S194L)/N, C22444T (D294D)/S, G21724T (L54F)/S | EPI_ISL_447033 | A2a | Sub-cluster a |
| 7 | A23403G (D614G)/S, C3037T (F106F)/NSP3, C241T/5’-UTR, C14408T (P323L)/RDRP, G25563T (Q57H)/ORF3a, G21795T (R78M)/S | EPI_ISL_447543 | A2a | Sub-cluster a |
| 8 | A23403G (D614G)/S, C3037T (F106F)/NSP3, C241T/5’-UTR, C14408T (P323L)/RDRP, GGG28881AAC (RG203KR)/N | EPI_ISL_447587 | A2a | Sub-cluster e |
| 9 | A23403G (D614G)/S, C3037T (F106F)/NSP3, C241T/5’-UTR, C14408T (P323L)/RDRP, GGG28881AAC (RG203KR)/N, C5700A (A994D)/NSP3 | EPI_ISL_452198 | A2a | Sub-cluster e |
| 10 | A23403G (D614G)/S, C3037T (F106F)/NSP3, C241T/5’-UTR, C14408T (P323L)/RDRP, G15451A (G671S)/RDRP | EPI_ISL_455670 | A2a | Sub-cluster d |
| 11 | A23403G (D614G)/S, C3037T (F106F)/NSP3, C241T/5’-UTR, C14408T (P323L)/RDRP, C13730T (A97V)/RDRP | EPI_ISL_455676 | A2a | Sub-cluster b |
| 12 | A23403G (D614G)/S, C3037T (F106F)/NSP3, C241T/5’-UTR, C14408T (P323L)/RDRP, G4866T (S716I)/NSP3, C14425A (L329I)/RDRP | EPI_ISL_450788 | A2a | Sub-cluster c |
| 13 | G11083T (L37F)/NSP6 | EPI_ISL_454549 | A3 | Prototype |
| 14 | G11083T (L37F)/NSP6, G1397A (V198I)/NSP2, G8653T (M33I)/NSP4, C884T (R27C)/NSP2 | EPI_ISL_435105 | A3 | Sub-cluster a |
| 15 | G11083T (L37F)/NSP6, C28311T (P13L)/N, C23929T (Y789Y)/S, C13730T (A97V)/RDRP, C6312A (T1198K)/NSP3 | EPI_ISL_447586 | A3 | Sub-cluster b |
| 16 | G11083T (L37F)/NSP6, C28311T (P13L)/N, C23929T (Y789Y)/S, C13730T (A97V)/RDRP, C6312A (T1198K)/NSP3, C6310A (S1197R)/NSP3 | EPI_ISL_447569 | A3 | Sub-cluster b |
| 17 | G11083T (L37F)/NSP6, C28311T (P13L)/N, C23929T (Y789Y)/S, C13730T (A97V)/RDRP, C6312A (T1198K)/NSP3, C6310A (S1197R)/NSP3, C1707T (S301F)/NSP2 | EPI_ISL_447855 | A3 | Sub-cluster b |
| 18 | G11083T (L37F)/NSP6, C28311T (P13L)/N, C23929T (Y789Y)/S, C13730T (A97V)/RDRP, C6312A (T1198K)/NSP3, C6310A (S1197R)/NSP3, G1820A (G339S)/NSP2 | EPI_ISL_447862 | A3 | Sub-cluster b |
| 19 | G11083T (L37F)/NSP6, C28311T (P13L)/N, C23929T (Y789Y)/S, C13730T (A97V)/RDRP, C6312A (T1198K)/NSP3, A6081G (D1121G)/NSP3 | EPI_ISL_447847 | A3 | Sub-cluster b |
| 20 | G11083T (L37F)/NSP6, C28311T (P13L)/N, C23929T (Y789Y)/S, C13730T (A97V)/RDRP, C6312A (T1198K)/NSP3, A21792T (K77M)/S | EPI_ISL_447571 | A3 | Sub-cluster b |
| 21 | T28144C (L84S)/ORF8, C8782T (S76S)/NSP4 | EPI_ISL_455763 | B | Prototype |
| 22 | T28144C (L84S)/ORF8, C8782T (S76S)/NSP4, C4965T (T749I)/NSP3 | EPI_ISL_455764 | B | Sub-cluster |
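
The mutation labels in the table follow a regular pattern: nucleotide change, optional amino acid change in parentheses, then the genomic region after a slash. The Python sketch below parses such labels and assigns a clade from marker mutations. It is an illustration only, not the authors' pipeline: the marker sets are assumed from the prototype rows of the table (groups 1, 13 and 21), and the names `parse_mutation`, `assign_clade` and `CLADE_MARKERS` are hypothetical.

```python
import re

# Label pattern, e.g. "A23403G (D614G)/S" or "C241T/5’-UTR".
MUTATION_RE = re.compile(r"^([ACGT]+)(\d+)([ACGT]+)(?:\s*\(([^)]+)\))?/(.+)$")

# Assumption: clade markers are the mutations of the prototype rows.
CLADE_MARKERS = {
    "A2a": {"A23403G", "C3037T", "C241T", "C14408T"},
    "A3": {"G11083T"},
    "B": {"T28144C", "C8782T"},
}


def parse_mutation(label: str) -> dict:
    """Split one table label into nucleotide change, position, aa change, region."""
    match = MUTATION_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    ref, pos, alt, aa_change, region = match.groups()
    return {
        "nt": f"{ref}{pos}{alt}",
        "position": int(pos),
        "aa": aa_change,  # None for non-coding changes such as C241T
        "region": region,
    }


def assign_clade(mutation_cell: str):
    """Return the first clade whose markers are all present, else None."""
    observed = {parse_mutation(part)["nt"] for part in mutation_cell.split(",")}
    for clade, markers in CLADE_MARKERS.items():
        if markers <= observed:
            return clade
    return None


group_22 = "T28144C (L84S)/ORF8, C8782T (S76S)/NSP4, C4965T (T749I)/NSP3"
clade_of_group_22 = assign_clade(group_22)
```

Requiring every marker of a clade to be present keeps the check strict; a real classifier would also need to handle strains with partial marker sets or missing sequence coverage.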
Fig. 4 Prevalence of three different clades (A2a, A3 and B) and their subgroups in different geographic regions in India. (A-E) Frequency distribution of strains belonging to each group of the three clades in (A) East India, (B) Western India, (C) South India, (D) Central India and (E) North India. (F) Prevalence of the three clades in different geographic regions of India.
Fig. 5 Molecular phylogenetic analysis by the maximum-likelihood method. The phylogenetic dendrogram is based on whole genome sequences of 22 representative strains from the 22 groups, together with known strains representing nine specific clades and the prototype O clade strain (MN908947.3). The 22 representative strains are indicated by an asterisk (*). The scale bar represents 0.00005 nucleotide substitutions per site. Bootstrap values less than 70% are not shown. The best-fit model used for constructing the phylogenetic dendrogram was the general time-reversible model (GTR).