
Phylogenetic classification of the whole-genome sequences of SARS-CoV-2 from India & evolutionary trends.

Varsha Potdar1, Veena Vipat1, Ashwini Ramdasi2, Santosh Jadhav3, Jayashri Pawar-Patil2, Atul Walimbe3, Sucheta S Patil3, Manohar L Choudhury1, Jayanthi Shastri4, Sachee Agrawal4, Shailesh Pawar5, Kavita Lole6, Priya Abraham2, Sarah Cherian3.   

Abstract

BACKGROUND & OBJECTIVES: Several phylogenetic classification systems have been devised to trace the viral lineages of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, inconsistency in the nomenclature limits uniformity in its epidemiological understanding. This study provides an integration of existing classifications and describes evolutionary trends of the SARS-CoV-2 strains circulating in India.
METHODS: The whole genomes of 330 SARS-CoV-2 samples were sequenced using next-generation sequencing (NGS). Phylogenetic and sequence analysis of a total of 3014 Indian SARS-CoV-2 sequences from 20 different States/Union Territories (January to September 2020) from the Global Initiative on Sharing All Influenza Data (GISAID) database was performed to observe the clustering of Nextstrain and Phylogenetic Assignment of Named Global Outbreak LINeages (Pangolin) lineages with the GISAID clades. The identification of mutational sites under selection pressure was performed using Mixed Effects Model of Evolution and Single-Likelihood Ancestor Counting methods available in the Datamonkey server.
RESULTS: Temporal data of the Indian SARS-CoV-2 genomes revealed that except for Uttarakhand, West Bengal and Haryana that showed the circulation of GISAID clade O even after July 2020, the rest of the States showed a complete switch to GR/GH clades. Pangolin lineages B.1.1.8 and B.1.113 identified within GR and GH clades, respectively, were noted to be indigenous evolutions. Sites identified to be under positive selection pressure within these clades were found to occur majorly in the non-structural proteins coded by ORF1a and ORF1b.
INTERPRETATION & CONCLUSIONS: This study interpreted the geographical and temporal dominance of SARS-CoV-2 strains in India over a period of nine months based on the GISAID classification. An integration of the GISAID, Nextstrain and Pangolin classifications is also provided. The emergence of new lineages B.1.1.8 and B.1.113 was indicative of host-specific evolution of the SARS-CoV-2 strains in India. The hotspot mutations such as those driven by positive selection need to be further characterized.

Keywords:  COVID-19; nucleotide substitution; clades; India; SARS-CoV-2; selection pressure; whole genomes

Year:  2021        PMID: 33818474      PMCID: PMC8184080          DOI: 10.4103/ijmr.IJMR_3418_20

Source DB:  PubMed          Journal:  Indian J Med Res        ISSN: 0971-5916            Impact factor:   2.375


Genome sequence analyses of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains aid in understanding the patterns and determinants of the global spread of the pandemic strain causing coronavirus disease 2019 (COVID-19) [1]. The phylogenetic analysis of the genome sequences showed that within a short span from the emergence of the SARS-CoV-2 virus, the genetic diversity expanded [2,3]. This resulted in the delineation of the viral strains into clades, lineages and sub-lineages. The Global Initiative on Sharing All Influenza Data (GISAID) [4] database in its earliest classification divided SARS-CoV-2 into two major lineages/clades 'L' and 'S' based on a mutation L84S in the ORF8 protein. Further, for the purpose of consistent reporting based on marker mutations, it identified three major clades denoted as G, V and O, the last being an unclassified group. These clades evolved from 'L'. Further, the clade G was split into sub-clades GH and GR [5]. The GISAID clades are presently augmented with more detailed lineages assigned by the Phylogenetic Assignment of Named Global Outbreak LINeages (Pangolin) tool [6]. On the other hand, Nextstrain [7] initially classified SARS-CoV-2 into about nine clades referred to as A1a, A2, A2a, A3, A6, B, B1, B2 and B4. These are indicated in the form of ancestral nodes as 19A, 19B, 20A, 20B and 20C. Thus, several phylogenetic classification systems based on different approaches have been devised to trace the viral lineages of SARS-CoV-2 across the globe. Inconsistency in the nomenclature systems limits the uniformity in its epidemiological understanding. In this study, we describe the genetic lineages of the strains circulating in India as retrieved from GISAID and provide an integration of the SARS-CoV-2 classification systems developed by GISAID, Nextstrain and Pangolin.
This study also adds to the whole-genome sequences of SARS-CoV-2, mainly from samples referred from different districts of Maharashtra during the period from March 9 to October 14, 2020. To further understand whether adaptive evolution of the clades is being observed in the Indian context, selection pressure studies were undertaken.

Material & Methods

This study was conducted at the National Influenza Centre, ICMR-National Institute of Virology (NIV), Pune, India. The genomic analysis was based on samples from different States that were referred to the NIV and hence the approval for the study was obtained from the Institutional Ethics Committee.
RNA isolation, RT-PCR of clinical samples and next-generation sequencing (NGS): Throat and nasal swab samples of suspected cases fulfilling the case definition for SARS-CoV-2 were referred by the hospital authorities and COVID collection centers of State Health Services, Maharashtra, India, to ICMR-NIV, Pune, for diagnosis of SARS-CoV-2 during the period from March 9 to September 28, 2020. The detection of SARS-CoV-2 was done using the NIV reverse transcription-polymerase chain reaction kit as per the protocols described earlier [8]. Positive clinical samples representing the geographical districts and the range of disease severity were selected for whole-genome sequencing. In brief, 280 μl of each sample in duplicate was used for RNA extraction by the Qiagen viral RNA extraction protocol. The extracted RNA was quantified using a Qubit® Fluorometer (Invitrogen; Thermo Fisher Scientific, Inc., Waltham, MA, USA). An input of 10 ng of RNA was used for cDNA synthesis using the SuperScript™ VILO™ cDNA Synthesis Kit (Invitrogen, Carlsbad, CA, USA). Further, two-pool RNA panel libraries were prepared manually using the Ion AmpliSeq™ Library Kit Plus as per the manufacturer's instructions (Invitrogen, Carlsbad, CA, USA). The amplicons were partially digested with FuPa reagent and ligated to adaptors using Switch Solution and DNA Ligase. Purified libraries were quantified using the Qubit™ fluorometer or the Agilent™ 2100 Bioanalyzer™ instrument and diluted to 100 pM. The Ion Chef System was used for template preparation. Purified template beads were submitted to meta-transcriptome NGS on the Ion S5 platform (Thermo Fisher Scientific) using an Ion 540™ chip and the Ion Total RNA-Seq kit v2.0, as per the manufacturer's protocol (Thermo Fisher Scientific). The Ion AmpliSeq SARS-CoV-2 Research Panel containing target region information was downloaded from Ion AmpliSeq designer and utilized for analysis. Sequence data were processed using the Torrent Suite Software (TSS) v5.10.1 (Thermo Fisher Scientific, USA). Coverage analysis plugins were utilized to generate a coverage analysis report for each of the samples. Reference-based reads gathering and assembly were performed for all the samples using the Iterative Refinement Meta-Assembler (IRMA) [9] developed by the Centers for Disease Control and Prevention, USA, incorporated within the TSS.
Phylogenetic analysis and classification: The whole-genome sequences from India available in GISAID as of October 14, 2020 with information on the sampling location (State information) (n=3014) were used as the starting dataset for this study. The selected sequences were aligned using MAFFT v.7.450 [10], and phylogenetic analysis was undertaken using MEGA v.6 [11] based on the neighbour-joining approach with the composite likelihood as the substitution model. Further, the classification of the Indian sequences into the new Nextstrain clades and the Pangolin nomenclature for clades/sub-clades was done using the respective tools directly. The GISAID nomenclature, however, was assigned on the basis of the phylogeny and the mutations noted.
Identification of synonymous/non-synonymous substitutions in dominant Pangolin lineages in India: The nucleotide substitutions were identified by comparing the alignment of all the Indian SARS-CoV-2 genomes against the reference human SARS-CoV-2 genome from Wuhan (NC_045512.2) using NUCmer version 3.1 [12]. The resulting list of nucleotide variations was translated into synonymous and non-synonymous amino acid changes using a previously developed R script [13] and the updated list of gene features from the NCBI RefSeq SARS-CoV-2 genome annotation. The substitutions which were present in more than 75 per cent of the sequences of only one lineage with a minimum of 10 representative genomes were considered as the substitutions characterizing the specific lineage.
Selection pressure analysis: Selection pressure analysis was performed using the Datamonkey adaptive evolution server [14]. The sequences were separated into different datasets based on the GISAID clades. For each clade, if the number of sequences was >500, then redundant (100% identical) sequences were removed. If the number of sequences still exceeded 500, then 500 sequences were selected at random. Stop codons were replaced by gaps. The individual codon sites under diversifying selection pressure were identified by employing two methods: the Mixed Effects Model of Evolution [15] method, which detects episodic diversification by employing a mixed-effects maximum likelihood approach, and Single-Likelihood Ancestor Counting [16], which uses a combination of maximum likelihood and counting approaches to infer the non-synonymous and synonymous rates of substitution for each site. The overall pipeline of work undertaken in this study is depicted in Figure 1.
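The per-clade dataset preparation described under selection pressure analysis can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `reduce_clade_dataset` and `mask_stop_codons`, the fixed random seed, and the assumption that each sequence is an in-frame coding string are hypothetical choices made for the example.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def reduce_clade_dataset(sequences, max_size=500, seed=0):
    """Cap one clade's dataset at max_size sequences.

    Follows the order described in the text: identical sequences are
    removed only when the clade has more than max_size sequences, and a
    random subset is drawn only if the de-duplicated set is still too big.
    The seed is an assumption added here for reproducibility.
    """
    sequences = list(sequences)
    if len(sequences) <= max_size:
        return sequences
    unique = list(dict.fromkeys(sequences))  # keeps first occurrences, in order
    if len(unique) <= max_size:
        return unique
    return random.Random(seed).sample(unique, max_size)


def mask_stop_codons(coding_sequence):
    """Replace in-frame stop codons with gaps ('---')."""
    codons = [coding_sequence[i:i + 3] for i in range(0, len(coding_sequence), 3)]
    return "".join("---" if c.upper() in STOP_CODONS else c for c in codons)
```

For instance, `mask_stop_codons("ATGTAAGGG")` returns `ATG---GGG`, and a clade with 600 copies of a single sequence is reduced to that one sequence without any random sampling.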
Fig. 1

Workflow for SARS-CoV-2 data analysis.
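One step of this workflow, the rule for picking substitutions that characterize a lineage (present in more than 75 per cent of the sequences of only one lineage, with at least 10 genomes), might be sketched as below. The function name, the input layout and the toy data are assumptions for illustration. In particular, the sketch reads "only one lineage" as "passes the 75 per cent threshold in exactly one sufficiently sampled lineage", which may differ from how the R script used by the authors applies the rule.

```python
from collections import Counter


def characterizing_substitutions(genomes_by_lineage, min_fraction=0.75, min_genomes=10):
    """Map each lineage to the substitutions that characterize it.

    genomes_by_lineage: {lineage: [set of substitutions per genome, ...]}
    """
    frequent = {}
    for lineage, genomes in genomes_by_lineage.items():
        if len(genomes) < min_genomes:
            continue  # too few representative genomes
        counts = Counter(sub for genome in genomes for sub in set(genome))
        frequent[lineage] = {
            sub for sub, n in counts.items() if n / len(genomes) > min_fraction
        }
    # Number of lineages in which each substitution passes the threshold
    hits = Counter(sub for subs in frequent.values() for sub in subs)
    return {
        lineage: sorted(sub for sub in subs if hits[sub] == 1)
        for lineage, subs in frequent.items()
    }


# Toy input: hypothetical genomes, with substitution names borrowed from Table II
toy = {
    "B.1.113": [{"C28854T", "shared_marker"}] * 9 + [{"shared_marker"}],
    "B.1.1.8": [{"C6573T", "C25528T", "shared_marker"}] * 10,
    "small_lineage": [{"G11083T"}] * 3,
}
result = characterizing_substitutions(toy)
```

In the toy data, `shared_marker` is frequent in two lineages and is therefore dropped, while `small_lineage` is skipped because it has fewer than 10 genomes.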


Results

The whole-genome sequencing for 330 strains from Maharashtra (n=328) and Karnataka (n=2) was undertaken as a part of this study. The details of the study samples and the sequences obtained, including the per cent of reads mapped, total reads and the per cent of genome coverage recovered, are provided in Supplementary Table I. Phylogenetic analysis (Supplementary Fig. 1) revealed that the genomes from different parts of India (n=3014) could be classified under seven clades, viz. S, V, G, GR, GH, L and O, identified by the GISAID on the basis of the marker mutations as shown in Table I. The genetic make-up of the Indian sequences revealed that overall, the proportion of strains in clade G (including GH and GR) was the highest (74.98%), followed by strains in O (unclassified category) (21.53%) (Fig. 2A and Supplementary Table II). Within the G clade, the highest proportion was noted in the GR clade. Fig. 2B represents the equivalence between the GISAID nomenclature and the Pangolin lineages for the Indian SARS-CoV-2 sequences. As per the Pangolin nomenclature, the majority of the Indian sequences belonged to sub-lineages B.1.1.32, B.6, B.1, B.1.1, B.1.113 and B.1.1.8 (Fig. 2, Supplementary Fig. 1 and Supplementary Table II).
Table I

Establishing an equivalence between the Global Initiative on Sharing All Influenza Data (GISAID), Nextstrain and Phylogenetic Assignment of Named Global Outbreak LINeages (Pangolin) nomenclature systems with respect to the genome sequence data from India (n=3014)

GISAID clades | Nextstrain | Dominant Pangolin lineages | Major marker mutations
G | 20A | B.1, B.1.80 | S: D614G
GR | 20B | B.1.1.32, B.1.1, B.1.1.8 | S: D614G + N: G204R
GH | 20C | B.1.113, B.1.36 | S: D614G + nsp3:Q57H
V | 19A | B.2.1 | nsp6:L37F + nsp3:G251V
L (Ref. seq. WIV04) | 19A | B | -
S | 19B | A | ORF8:L84S
O | 19A | B.6, B.4 | ORF1a: L3606F
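The marker mutations in Table I suggest a simple rule-based reading of the GISAID clades, sketched below. This is only an illustration: the study assigned GISAID clades from the phylogeny together with the mutations, and the function name, the mutation labels (copied as printed in Table I), the order in which the rules are tested and the treatment of genomes without any listed marker are assumptions made here.

```python
def assign_gisaid_clade(mutations):
    """Assign a GISAID clade from marker mutations as labelled in Table I.

    mutations: iterable of strings such as "S:D614G" or "ORF8:L84S".
    The rule order below is an assumption (GR is tested before GH).
    """
    muts = set(mutations)
    if "S:D614G" in muts:
        if "N:G204R" in muts:
            return "GR"
        if "nsp3:Q57H" in muts:
            return "GH"
        return "G"
    if "ORF8:L84S" in muts:
        return "S"
    if {"nsp6:L37F", "nsp3:G251V"} <= muts:
        return "V"
    if not muts:
        return "L"  # reference-like genome, no listed mutations
    return "O"      # everything else falls into the unclassified group


examples = {
    "GR": assign_gisaid_clade({"S:D614G", "N:G204R"}),
    "O": assign_gisaid_clade({"ORF1a:L3606F"}),
}
```

A phylogeny-aware assignment can disagree with such a lookup, for example when a marker position is missing from a partial genome.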
Fig. 2

Sunburst diagrams coloured according to Global Initiative on Sharing All Influenza Data (GISAID) clades showing relationship between GISAID and Phylogenetic Assignment of Named Global Outbreak LINeages (Pangolin) annotations on the inner and outer circles, respectively for the Indian SARS-CoV-2 genomes (n=3014). (A) The proportionate chart showing dominant Pangolin corresponding to each of the GISAID clades (The count for individual clades/lineages is shown in Supplementary Table II). (B) The schematic representation of association between the GISAID clades and the Pangolin lineages.

Supplementary Table II

Distribution of the Indian sequences (n=3014) in the Global Initiative on Sharing All Influenza Data, Nextstrain and Phylogenetic Assignment of Named Global Outbreak LINeages clades

GISAID clade | Count of sequences | Nextstrain new clades# | Pangolin
O | 649 | 19A=514; 19B=6; 20A=53; 20B=11; (65) | B.6=501; B.1=52; B.1.1=24; B.4=22; B=18; B.1.113=14; A.7=6; B.1.1.32=4; A=3; B.1.36=2; B.1.80=2; A.3=1
L | 22 | 19A=19; (3) | B=10; B.4=3; B.6=9
V | 3 | 19A=3 | B.2.1=3
S | 80 | 19B=71; 19A=1; 20A=1; (7) | A=51; A.7=17; A.9=6; A.1=2; B.1=2; A.2=1; B.2=1
G | 612 | 20A=540; 19A=11; 20B=11; (50) | B.1=404; B.1.80=130; B.1.113=21; B.1.1=15; B.1.95=11; B.1.5=8; B.1.87=6; B=5; B.1.36=5; B.1.11=2; B.1.102=1; B.1.107=1; B.1.79=1; B.1.86=1; B.6=1
GH | 533 | 20A=460; 20C=18; 19A=6; (49) | B.1.113=337; B.1.36=155; B.1=32; B.1.26=6; B.1.80=2; B.1.1.8=1
GR | 1115 | 20B=877; 20A=4; 19A=2; (232) | B.1.1.32=535; B.1.1=382; B.1.1.8=192; B.1.1.31=3; B.1.1.28=1; B.1=1; B.1.1.10=1
Total | 3014 | 3014 | 3014

Bold fonts indicate the major distribution. #Clade information not available as on October 14, 2020 for the number of strains depicted in bracket. GISAID, Global Initiative on Sharing All Influenza Data; PangoLIN, Phylogenetic Assignment of Named Global Outbreak LINeages
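The clade proportions quoted in the Results can be re-derived from the counts in Supplementary Table II. The short check below is illustrative; the variable and function names are not from the original study.

```python
# Sequence counts per GISAID clade, transcribed from Supplementary Table II
CLADE_COUNTS = {"O": 649, "L": 22, "V": 3, "S": 80, "G": 612, "GH": 533, "GR": 1115}

TOTAL = sum(CLADE_COUNTS.values())


def percent(clades):
    """Percentage of all Indian sequences that fall in the given clades."""
    return 100 * sum(CLADE_COUNTS[c] for c in clades) / TOTAL


g_family = round(percent(["G", "GH", "GR"]), 2)  # clade G including GH and GR
unclassified = round(percent(["O"]), 2)          # clade O
```

With these counts the total is 3014, the G family accounts for 74.98 per cent and clade O for 21.53 per cent, in agreement with the figures stated in the text.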

Other than the major globally circulating clades that possessed the marker mutations as shown in Table I, mutations specific to the dominant Indian Pangolin lineages were identified (Table II). As per the Pangolin lineage summaries, some of the lineages most likely to have evolved in India are B.1.113 (n=372), B.1.1.8 (n=193), A.7 (n=23) and A.9 (n=6). Among these, the major lineage B.1.1.8 was found to possess unique mutations nsp3:S1285F and ORF3a:L46F, while B.1.113 possessed S194L in the N protein (Table II).
Table II

Synonymous and non-synonymous substitutions characterizing the dominant Phylogenetic Assignment of Named Global Outbreak LINeages (Pangolin) in India

Pangolin lineage | Synonymous: nucleotide variation | Synonymous: gene (amino-acid change) | Non-synonymous: nucleotide variation | Non-synonymous: gene (amino-acid change) | Untranslated nucleotide change
B.1.113 | C22444T | S (D294D) | C28854T | N (S194L) | -
B.1.1.32 | C313T | NSP1 (L16L) | C5700A | nsp3 (A994D)/ORF1ab (A1812D) | -
B.1.1.8 | G4354A | NSP3 (E545E)/ORF1ab (E1363E) | C6573T | nsp3 (S1285F)/ORF1ab (S2103F) | -
B.1.1.8 |  |  | C25528T | ORF3a (L46F) |
B.1.80 | C3634T | NSP3 (N305N)/ORF1ab (N1123N) |  |  | -
B.1.80 | C15324T | NSP12b (N619N) |  |  |
B.4 | T28688C | N (A138A) | C884T | nsp2 (R27C)/ORF1ab (R207C) | 3'UTR (G29742T)
B.4 |  |  | G1397A | nsp2 (V198I)/ORF1ab (V378I) |
B.4 |  |  | G8653T | nsp4 (M33I)/ORF1ab (M2796I) |
B.4 |  |  | G11083T | nsp6 (L37F)/ORF1ab (L3606F) |
B.6 |  |  | C13730T | nsp12b (A88V) | -
B.6 |  |  | C28311T | N (P13L) |
B.6 |  |  | C6312A | nsp3 (T1198K)/ORF1ab (T2016K) |
B.6 |  |  | G11083T | nsp6 (L37F)/ORF1ab (L3606F) |
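The amino-acid positions in Table II follow from the nucleotide coordinates once the start of each coding region is known. The sketch below uses the gene start positions of the reference genome NC_045512.2 for ORF1ab, S, ORF3a and N; the function name is hypothetical, and positions in ORF1ab downstream of the ribosomal frameshift are deliberately rejected because they need a frame correction that this simple formula does not apply.

```python
# 1-based start coordinates of coding regions in NC_045512.2
GENE_START = {"ORF1ab": 266, "S": 21563, "ORF3a": 25393, "N": 28274}

# ORF1ab is translated with a -1 ribosomal frameshift at this position
ORF1AB_FRAMESHIFT = 13468


def codon_number(gene, nt_position):
    """Return the 1-based codon (amino-acid) number hit by a genome position."""
    offset = nt_position - GENE_START[gene]
    if offset < 0:
        raise ValueError("position lies upstream of the gene start")
    if gene == "ORF1ab" and nt_position > ORF1AB_FRAMESHIFT:
        raise ValueError("ORF1b positions need a frameshift correction")
    return offset // 3 + 1


checks = {
    "C28854T": codon_number("N", 28854),        # N (S194L)
    "C25528T": codon_number("ORF3a", 25528),    # ORF3a (L46F)
    "C6573T": codon_number("ORF1ab", 6573),     # ORF1ab (S2103F)
    "G11083T": codon_number("ORF1ab", 11083),   # ORF1ab (L3606F)
}
```

The four checks give codons 194, 46, 2103 and 3606, matching the amino-acid positions listed for those substitutions.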
On the basis of the new nomenclature by Nextstrain as per the ancestral nodes, the majority of the sequences fell into the clusters having ancestral nodes 20A and 20B, and the others fell into clusters with nodes 19A, 19B and 20C (Supplementary Table II). The Nextstrain clade assignment was retrieved as on 14 October 2020. Extrapolating to the old Nextstrain nomenclature for classification, it could be seen that the Indian strains could be classified into clades A2a, A1a, A3, B, B4 and O (Supplementary Table III).
Supplementary Table III

Mutations representing the old Nextstrain clade and corresponding major new Nextstrain clade nomenclatures

Nextstrain (old clades) | Mutation(s) defined for the clade | Major Nextstrain new clades (defined mutation)
A1a | ORF3a: G251V and ORF1a: L3606F | 19A (-)
A3 | ORF1a: L3606F and V378I | 19A (-)
A6 | nt: T514C | 19A (-)
A7 | ORF1a: A3220V | 19A (-)
B | ORF8: L84S | 19B (nt. C8782T)
B1 | ORF8: L84S, nt- C18060T | 19B (nt. C8782T)
B2 | ORF8: L84S, nt- C29095T | 19B (nt. C8782T)
B4 | ORF8: L84S, N: S202N | 19B (nt. C8782T)
A2 | S: D614G | 20A (ORF1b/nsp12b: P314L)
A2a | S: D614G, ORF1b: P314L | 20A (ORF1b/nsp12b: P314L)
A2a | S: D614G, ORF1b: P314L | 20B (N: R203K, G204R & ORF14: G50N)
- | - | 20C (ORF1a: T265I)
The State-wise distribution of the SARS-CoV-2 genomes classified as per the different GISAID clades is shown in Fig. 3. A comparison of these genetic variants was done for the Indian States wherein sufficient sequence data were available (Supplementary Table IV). For States where a single clade was predominant, it was noted that clade O predominated in Delhi and Tamil Nadu while G predominated in West Bengal and Madhya Pradesh. Both clades GH and G were predominant in Gujarat. Clades GR and O predominated in Telangana; in Karnataka and Uttarakhand, GR and GH predominated; while in Haryana, O and GH were predominant. Clade S majorly circulated in Odisha along with GR, G, O and GH, and Maharashtra was also noted to have several clades in circulation including GR, G, O and S.
Fig. 3

State-wise distribution of total number of SARS-CoV-2 sequences deposited from India to Global Initiative on Sharing All Influenza Data (GISAID) from January to September 2020. The colours on the graph denote the GISAID clades.

Supplementary Table IV

State-wise list with clade information

CladeNextstrainLineageAndhra PradeshAssamBiharDelhiGujaratHaryanaJammu and KashmirKarnatakaKeralaLadakh
G19AB12
B.123
B.1.1
B.1.1135
B.1.361
B.1.80
B.6
20AB
B.12134213
B.1.121
B.1.1021
B.1.107
B.1.11
B.1.113671
B.1.363
B.1.5
B.1.79
B.1.803421
B.1.86
B.1.872
B.1.95
20BB.1
B.1.11
B.1.1132
BlankB.13
B.1.12
B.1.51
B.1.806
GH20AB.12121
B.1.113112192616
B.1.361854
B.1.802
20CB.1221
B.1.1.8
B.1.26
BlankB.14
B.1.1132
B.1.363
GR20BB.1
B.1.174474
B.1.1.28
B.1.1.311
B.1.1.32713
B.1.1.81
BlankB.1.117
B.1.1.101
B.1.1.323
L19AB1
B.43
B.6112
O19AA.31
B11
B.1916
B.1.122
B.1.1135
B.4196
B.62261951111148
19BA
A.7
B.1
B.1.1
20AB1
B.1131
B.1.113232
B.1.3611
20BB.1
B.1.121
B.1.1.32
BlankB.11
B.1.12
B.1.802
B.611
S19BA1731
A.11
A.21
A.7
A.9
B.111
B.2
V19AB.2.12
Total32626356055225526

CladeMadhya PradeshMaharashtraOdishaPunjabRajasthanTamil NaduTelanganaUttar PradeshUttarakhandWest BengalTotal

G14
38
11
5
1
22
11
11
2224331119106124391
14
1
11
112
14
14
3137
11
23496196122
11
46
1111
22
1158
2
3
2
1
6
GH132223
1271113182335
1141412714152
2
5
11
426
4
2
3
GR11
63402151249203365
11
23
3211616897532
191192
17
1
3
L222310
3
1139
O1
1111111817
2161531
4513
5
5122
8423164208019211499
33
66
11
11
1
25618
119
2
11
1318
44
1
2
2
2
S31411251
12
1
61117
336
2
11
V13
Total455532271063663958702163014
State-wise temporal data (March to August 2020) are shown in Fig. 4 and Supplementary Table V. Several clades were noted to be circulating in many of the States between March and May. Beyond this, a switch to majorly GR/GH was observed. The temporal distribution in Maharashtra was analyzed based on the sequences generated as a part of this study. The clades during March were majorly O, S and G. The proportion of strains of clade O was noted to decrease gradually, and a replacement by GR strains was noted consistently during May to September.
Fig. 4

Temporal distribution of SARS-CoV-2 sequences from different States of India. The number of SARS-CoV-2 sequences belonging to distinct GISAID clades is represented as a percentage plot of the clades for each month.

Supplementary Table V

State-wise clade information with temporal distribution

StateCladeJanuaryFebruaryMarchAprilMayJuneJulyAugustSeptemberNATotal
Andhra PradeshGR11
O112
AssamO22
BiharO336
DelhiG1214118
GH76316
GR3137
L11
O2251855217
S22
V22
GujaratG27451125189
GH337418429320
GR24511
L134
O48618
S81018
HaryanaG134
GH217827
GR1315
O1101719
Jammu and KashmirO22
KarnatakaG141613447
GH1782330
GR313721199
L22
O28358273
S44
KeralaL11
S11
LadakhO66
Madhya PradeshG121628
GH11
L213
O6713
MaharashtraG7123163867
GH3125727
GR15979128232375388
O123294259
S1111
V11
OdishaG1381352
GH138425
GR1262956
L33
O1356363
S2828
PunjabG11
GR22
O527
RajasthanG11
O145
Tamil NaduGH11
GR9615
O154120
TelanganaG1158125
GH12961744
GR1112315947152483
L2215
O1361882
Uttar PradeshG211114
GH29415
GR3519
O312520
UttarakhandG10515
GH513321
GR181129
O314
S11
West BengalG834931213151
GH156
GR3710
L33
O73311731
S10515
Total21304531086868121221107263014

NA, not available

In addition, as the information on the outcome of the infection in terms of fatality was available for Maharashtra (n=41 of 328 sequences, Supplementary Table I), the proportion of fatal cases was estimated in the clade G (including GR, as none of the sequences belonged to the GH clade). It was observed that 14.38 per cent (41 of 285) of cases which possessed the D614G mutation resulted in fatal outcomes, while the rest of the cases that possessed the mutation were mild. Nextstrain inference (Supplementary Fig. 2) of the most likely transmission events revealed that the dominant clade B.6 (GISAID O) that emerged from 19A was introduced into India from China, Europe, South-East Asia and the Middle-East, while B.1 (G) and B.1.36 (GH) that emerged from 20A had their origins in Europe, the Middle-East and Africa. The B.1.1 (GR) clade that emerged from 20B was introduced from Europe, the Middle-East and the Far-East. Selection pressure analysis revealed that site nsp3:994A/D was identified to be under positive selection pressure in both clades G and GR, nsp6:37 L/F and nsp12:323 L/P in both G and GH, and nsp16:298N/L/I in GR and GH (Table III).
Table III

Selection pressure analysis based on the whole-genome sequences using the methods Mixed Effects Model of Evolution and Single-Likelihood Ancestor Counting, available in the Datamonkey server

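Clade | Gene | Site | Variable amino acid residues | P MEME | P SLAC
G | nsp3 | 994 | A/D | 0.01 | 0.042
G | nsp6 | 37 | L/F | 0.01 | 0.036
G | nsp12 | 323 | L/P | 0.01 | 0.008
GR | nsp3 | 994 | D/A | 0.01 | 0.05
GR | nsp3 | 1103 | P/L/S | 0.02 | 0.085
GR | nsp4 | 380 | A/V | 0.05 | 0.088
GR | nsp7 | 54 | S/L/P | 0.07 | 0.066
GR | nsp16 | 298 | N/L/I | 0 | 0.037
GR | ORF3a | 46 | L/F | 0.01 | 0.06
GH | nsp6 | 37 | L/F | 0.02 | 0.04
GH | nsp12 | 323 | L/P | 0.03 | 0.059
GH | nsp14 | 372 | T/I | 0.03 | 0.062
GH | nsp16 | 298 | N/L/H/I | 0 | 0.062
S | - | - | - | - | -
O | nsp3 | 1197 | S/R/K | 0 | 0.021
O | nsp3 | 1198 | K/T | 0.07 | 0.022
O | nsp3 | 1768 | V/G | 0 | 0.004
O | nsp6 | 37 | F/L | 0.02 | 0.009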

Sites were identified as showing evidence of positive selection as per the statistical significance level (P<0.1) by both the methods. MEME, Mixed Effects Model of Evolution; SLAC, Single-Likelihood Ancestor Counting
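The two-method criterion in the footnote, and the sites that recur across clades, can be illustrated with the values of Table III. The snippet is a sketch: the tuple layout and function names are choices made here, and the P values are simply transcribed from the table rather than recomputed with Datamonkey.

```python
from collections import defaultdict

# (clade, gene, site, P_MEME, P_SLAC) transcribed from Table III
TABLE_III = [
    ("G", "nsp3", 994, 0.01, 0.042),
    ("G", "nsp6", 37, 0.01, 0.036),
    ("G", "nsp12", 323, 0.01, 0.008),
    ("GR", "nsp3", 994, 0.01, 0.05),
    ("GR", "nsp3", 1103, 0.02, 0.085),
    ("GR", "nsp4", 380, 0.05, 0.088),
    ("GR", "nsp7", 54, 0.07, 0.066),
    ("GR", "nsp16", 298, 0.0, 0.037),
    ("GR", "ORF3a", 46, 0.01, 0.06),
    ("GH", "nsp6", 37, 0.02, 0.04),
    ("GH", "nsp12", 323, 0.03, 0.059),
    ("GH", "nsp14", 372, 0.03, 0.062),
    ("GH", "nsp16", 298, 0.0, 0.062),
    ("O", "nsp3", 1197, 0.0, 0.021),
    ("O", "nsp3", 1198, 0.07, 0.022),
    ("O", "nsp3", 1768, 0.0, 0.004),
    ("O", "nsp6", 37, 0.02, 0.009),
]


def significant_by_both(rows, alpha=0.1):
    """Keep sites whose P value is below alpha for both MEME and SLAC."""
    return [row for row in rows if row[3] < alpha and row[4] < alpha]


def clades_per_site(rows):
    """Return {(gene, site): [clades]} for sites reported in more than one clade."""
    found = defaultdict(list)
    for clade, gene, site, _p_meme, _p_slac in rows:
        found[(gene, site)].append(clade)
    return {key: clades for key, clades in found.items() if len(clades) > 1}


recurrent = clades_per_site(significant_by_both(TABLE_III))
```

All 17 listed sites pass the P<0.1 threshold for both methods, and four sites recur in more than one clade: nsp3:994, nsp6:37, nsp12:323 and nsp16:298.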


Discussion

A dynamic nomenclature for SARS-CoV-2 proposed by Rambaut et al [5] initially identified two lineages (A and B) at the root of the phylogeny, distinguished by the nucleotides at two positions, 8782 in ORF1ab and 28144 in ORF8 [17,18]. Subsequently, descendent lineages were assigned a numerical value provided these satisfied certain criteria of nucleotide substitutions within and between lineages. Several lineages and sub-lineages were thus identified. On the other hand, Nextstrain is based on a maximum likelihood approach as implemented in TreeTime [19]. Considering temporal dating of ancestral nodes and discrete trait geographic reconstruction based on the SARS-CoV-2 sequences, Nextstrain identified five nodes that were labelled as 19A, 19B, 20A, 20B and 20C. In establishing the equivalence between the GISAID clade nomenclature, the new Nextstrain clades and the Pangolin sub-lineages, it should be noted that the initial Nextstrain clade names were ad hoc letter-number combinations that were never intended to be a permanent naming system. At least ten clades (B, B1, B2, B4, A3, A6, A7, A1a, A2 and A2a) based on specific marker mutations were identified. The marker mutations specific to these clades are shown in Supplementary Table III. The clades A1a, A3, A6 and A7 emerged from the node labelled 19A, while clades B, B1, B2 and B4 emerged from the node 19B. The strains belonging to clade A2 correlated with strains having the ancestral node 20A, while the A2a strains could be traced back to nodes 20A, 20B and 20C. Thus, the old Nextstrain clade nomenclature was not well defined and did not reflect the time scale of evolution. We further analyzed the predominance of the strains in different Indian States based on the Pangolin and GISAID clade nomenclatures (Supplementary Table IV) in association with their emergence times as per the new Nextstrain clade nomenclature.
The earliest Indian cases of SARS-CoV-2 [2,3] were identified through laboratory confirmation of suspected cases among persons with international travel history [20]. Since March 2020, the reported cases saw an increase in different States of the country. Genome sequencing efforts in India resulted in the generation of whole-genome sequence data representing 20 different States/Union Territories (UTs). Good representation was noted from the States of Telangana, Gujarat, Maharashtra, Delhi, Karnataka, Odisha, West Bengal, Uttarakhand, Uttar Pradesh and Haryana (Supplementary Table IV and Fig. 2). In the other 16 States/UTs, though cases of SARS-CoV-2 were reported, no genome data were deposited. The genetic make-up of the Indian sequences revealed that the predominant clades (Pangolin/GISAID) circulating in India are B.1.1.32/GR, B.6/O, B.1/G, B.1.1/GR, B.1.113/GH and B.1.1.8/GR. Thus, as also observed in other studies [21,22], the G clade (including GR and GH) is seen to have established itself in India as well as the world over [13,23] (Supplementary Fig. 3). Temporal data of the Indian SARS-CoV-2 genomes revealed that except for Uttarakhand, West Bengal and Haryana, which showed the circulation of the O clade even after July, other States showed a complete switch to GR/GH. The dominant clades were noted to have emerged from nodes 19A, 20A, 20B and 20C. The same Nextstrain clades/Pangolin lineages were found to occur in multiple GISAID clades. Hence, the GISAID nomenclature system, which is specifically based on amino acid substitutions, can be considered more robust than the other two nomenclatures. The State-wise distribution of the prevalence of the different clades was observed. Within clade GR, a sub-group (Pangolin B.1.1.32 lineage) showed strains from Maharashtra interspersed with strains from Telangana. Another sub-group (B.1.1) showed strains mainly from Telangana along with strains from Karnataka, Odisha and Tamil Nadu.
The lineage B.1.1.8, which was identified as an indigenous lineage of India, could most likely be attributed to evolution within Telangana. On the other hand, within the clade G, groups with mixing of strains from Gujarat, Madhya Pradesh, West Bengal, Odisha and Karnataka, or from Gujarat, Karnataka and Maharashtra, were evident. These may be associated with the inter-State movements of migrant workers, tourists, students and professionals before or following the lockdown imposed in the country. Another indigenous lineage (B.1.113), a major component of the clade GH, was noted to have emerged in Gujarat. Within clade O, two prominent sub-groups were noted. In one of these sub-groups (Pangolin B.6 lineage), Delhi strains were noted to be interspersed with strains from several States all over the country including Odisha, Maharashtra, Karnataka, Telangana, Madhya Pradesh, Andhra Pradesh, Haryana, Uttar Pradesh, Bihar, Tamil Nadu, West Bengal and Rajasthan. The other sub-group (B.4) involved mainly Karnataka, Maharashtra, Ladakh, Telangana and Gujarat. The O clade was prevalent across several States in the country in March and April, suggesting its expansion due to introductions before the lockdown on March 19, 2020 (Fig. 4). It was noted that the sites putatively identified to be under positive selection pressure within the GISAID clades were found to occur majorly in the non-structural proteins coded by ORF1a and ORF1b. A few of the sites were found to be common to clades G/GH/GR. This was a reflection of the evolution within the dominant clade G. It remains to be observed whether these and the other sites would be future hotspots of evolution. Such sites need to be further characterized to understand if the virus is adapting further towards enhanced human transmission [24-27]. The clade G/GH/GR strains possess the mutation D614G in the spike protein S.
It has been demonstrated that this mutation increases infectivity, resulting in potentially more transmissible SARS-CoV-2282930. Insight into the associated mechanism was obtained from cryo-EM studies of the SARS-CoV-2 S protein trimer which revealed that D614G shifted the S conformation toward an ACE2 binding-competent state28. Further, considering that a lower proportion of the clade G cases resulted in fatality, if could be difficult to attribute the outcome of infection solely to the D614G marker mutation. It is necessary to focus on the viral genomic variations arising from rapid local expansions of the GISAID or Pangolin lineages. In summary, this study revealed the genetic variants circulating in India during the period from March to September 2020. The increased prevalence of the GH and GR clades from May 2020 onwards was noted to parallel the global trend. The observation of emergence of new lineages B.1.1.8 and B.1.113 was indicative of host-specific evolution of the SARS-CoV-2 strains within GR and GH clades, respectively, in India. To conclude on the robustness of the existing classification nomenclatures, there would be need to continue observing the global evolutionary trends and delineation of the strains. The study had limitations due to the non-availability or less sequence data at uniform time intervals from many parts of the country and also the lack of clinical information. This would benefit in exploring the establishment of the clades, molecular clock, transmissions within the country and further evidence of indigenous evolution. It may also help infer the potential association of SARS-CoV-2 lineages and mortality, as well as identify possible ethnic and genetic correlations. Strains of this study including details of their sequenced genome Neighbour-joining tree of the Indian genome sequences, with the taxa colour coded state-wise. Global transmissions as captured from Nextstrain analyses. 
Graphical representation of the temporal distribution of severe acute respiratory syndrome coronavirus 2 sequences from different continents of the world (n=140,560).
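The D614G marker discussed above can be checked directly in a genome sequence. The following is a minimal sketch, not the pipeline used in this study. It assumes the genome is already in the coordinate system of the Wuhan-Hu-1 reference (NC_045512.2), with no insertions or deletions upstream of the site, and that the S gene starts at nucleotide 21563 as in that reference annotation. The function name `spike_614_residue` and the toy sequences are hypothetical.

```python
# Hypothetical helper for illustration only; not part of the study's methods.
# Assumption: the input genome uses Wuhan-Hu-1 (NC_045512.2) coordinates,
# where the spike (S) gene begins at nucleotide 21563 (1-based).
SPIKE_START = 21563
CODON_614_START = SPIKE_START + (614 - 1) * 3  # 23402, so A23403G is codon position 2

# Only the codons relevant to the D614G call are listed.
CODON_TO_RESIDUE = {
    "GAT": "D", "GAC": "D",                          # aspartate
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
}


def spike_614_residue(genome: str) -> str:
    """Return 'D', 'G', or '?' for spike residue 614.

    '?' covers ambiguous bases, other residues, and sequences too short
    to contain the codon.
    """
    start = CODON_614_START - 1  # convert to a 0-based index
    codon = genome[start:start + 3].upper()
    return CODON_TO_RESIDUE.get(codon, "?")


if __name__ == "__main__":
    # Toy sequences: filler bases with only the codon of interest set.
    ancestral = "A" * 23401 + "GAT" + "A" * 100
    variant = "A" * 23401 + "GGT" + "A" * 100
    print(spike_614_residue(ancestral))  # D
    print(spike_614_residue(variant))    # G
```

Real consensus sequences would first need to be aligned to the reference, since any upstream indel shifts the codon position and invalidates the fixed offset used here.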
Erratum: Phylogenetic classification of the whole-genome sequences of SARS-CoV-2 from India & evolutionary trends. Indian J Med Res, 2021-04.
