Literature DB >> 35747327

Mapping of SARS-CoV-2 spike protein evolution during first and second waves of COVID-19 infections in India.

Vijay Rani Rajpal1, Shashi Sharma2, Avinash Kumar3, Samantha Vaishnavi4, Apekshita Singh5, Deepmala Sehgal6, Mughdha Tiwari7, Shailendra Goel8, Soom Nath Raina5.   

Abstract

Aim: The aim of this study was to investigate the SARS-CoV-2 spike protein evolution during the first and second wave of COVID-19 infections in India. Materials &
Methods: Detailed mutation analysis was done in 763 samples taken from GISAID for the ten most affected Indian states between March 2020 to August 2021.
Results: The study revealed 242 mutations corresponding to 207 sites. Fifty one novel mutations emerged during the assessment period, including many with higher transmissibility and immune evasion functions. Highest number of mutations per spike protein also rose from 5 (first wave) to 13 (second wave).
Conclusion: The study identified mutation-rich and no mutation regions in the spike protein. The conserved spike regions can be useful for designing future diagnostics, vaccines and therapeutics.
© 2022 Future Medicine Ltd.

Entities:  

Keywords:  SARS-CoV-2; first and second COVID-19 wave of infections; immune evasion; mutation analysis; novel spike mutations; reduced neutralization

Year:  2022        PMID: 35747327      PMCID: PMC9203035          DOI: 10.2217/fvl-2021-0267

Source DB:  PubMed          Journal:  Future Virol        ISSN: 1746-0794            Impact factor:   3.015


Introduction

After the first reports of three cases of COVID-19 from Kerala, India on 30 January 2020, a significant number of positive COVID-19 cases from March 2020 onwards began registering. The first wave of COVID-19 infections peaked in September 2020 in India, with about 100,000 cases reported per day. From February 2021 onwards, India battled a brutal second wave of COVID-19 infections with severe consequences. The daily number of cases rose sharply and crossed the 400,000 mark and 4000 deaths every day in early May 2021 [1]. As compared to the first wave (highest 100,000 cases in September 2020), a sharp increase in positivity rate with a massive surge from 1.62% on 1 March 2021 to approximately 20% on 13 May 2021 was observed [1]. It created an emergency situation in the country with reduced supplies and increased deaths, especially in the young population [2,3]. Multiple factors might have potentially contributed to this sudden spike in cases, such as introduction of highly infectious SARS-CoV-2 lineages, complex interplay between mutant strains and violation of COVID-19 appropriate behavior. The roll out of vaccination programs from 16 January 2021 and the resultant natural selection pressure on the viral genomes might have induced novel mutations and contributed to the further evolution of circulating lineages. The emergence of novel SARS-CoV-2 lineage B.1.617.2 (Delta) in December 2020 also fueled the surge in daily infections and drove the second COVID-19 wave in India [4]. Since the emergence of COVID-19 disease in China in 2019, several variant lineages of SARS-CoV-2 including the current B.1.1.529 Omicron variant have emerged across the world. The variants have been classified as variants of concern (VOCs), variants of interest (VOIs), variants being monitored (VBMs) and variants of high concern (VOHCs) based on their differences in transmissibility, virulence or efficacy of diagnostics, vaccines and therapeutics [5]. Currently, the two listed variants of concern (VOCs) include Delta (B.1.617.2, AY lineages) and Omicron (B.1.1.529, BA lineages). Many other variant lineages including Alpha (B.1.1.7 and Q lineages), Beta (B.1.351 and descendent lineages), Gamma (P.1 and descendent lineages), Lambda (C.37 and descendent lineages), Mu (B.1.621 and B.1.621.1), Epsilon (B.1.427 and B.1.429), Eta (B.1.525). Iota (B.1.526) and Zeta (P.2) are being monitored (VBMs). There are no VOIs and VOHCs listed currently [5]. These novel SARS-CoV-2 lineages are better sustained than the Wuhan strain (clade O) due to fitness provided by the accumulated novel mutations that confer immune surveillance escape, high virulence, pathogenicity and better transmissibility resulting in deadlier resurgent outbreaks of infection globally. Many of these incurred mutations span the spike glycoprotein of SARS-CoV-2, wherein more than 5,000 mutations have already been reported [6,7]. The surface spike (S) protein present as small spikes on the SARS-CoV-2 surface is responsible for host cell receptor recognition and viral entry. It is encoded by the S gene (3821nt, position 21563–25384) and consists of 1273 amino acids [8]. At the N-terminus, S protein has a signal peptide (amino acids 1–13), which is followed by S1 and S2 subunits (Figure 1). The spike protein remains inactive in its native state. It is activated with help of proteases of the host cell membrane, specifically, a trans-membrane protease serine 2 (TMPRSS2) [10], that upon receptor recognition cleaves it into two subunits S1 and S2 at a furin-cleavage site [9,10]. S1 has the receptor binding domain (RBD) which is responsible for host cell angiotensin-converting enzyme 2 (ACE2) receptor recognition, interaction and binding. S2 domain brings about viral and host cell membrane fusion and enables viral entry into the host. S1 and S2 subunits are further divided into various domains and sub-domains (Figure 1).
Figure 1.

Detailed topology of SARS-CoV-2 with respect to nucleotide sequences and ORFs; domains and subdomains of spike protein.

In a rapidly evolving genome like SARS-CoV-2, the characterization of incurred mutations provides significant information for assessing the mechanisms linked to pathogenesis, immune evasion and viral drug resistance and also unravels useful insights into evolutionary patterns and spread of the virus. Furthermore, since the S protein has been an important target for designing anti-covid drugs, diagnostics and vaccines [11], the information about the conserved spike domains and sub-domains, if any, can be especially useful for designing the diagnostics, therapeutics and/or vaccines. In the present study we have undertaken a detailed analysis of Indian SARS-CoV-2 spike protein mutations in 763 randomly selected samples from ten Indian states deposited at Global Initiative on Sharing All Influenza Data (GISAID) [12] data base between March 2020 to August 2021, and compared it to prototype Wuhan sample (GenBank no. MN908947). The assessment period of about one and a half years includes both first and second waves of COVID-19 infections in India. The inferences of this study will be helpful in understanding the evolutionary trajectory of the virus and designing and supporting spike protein-based public health response.

Materials & Methods

The nucleotide sequences of representative SARS-CoV-2 genomes were retrieved from GISAID database [12]. The 763 samples taken from March 2020 to August 2021 analyzed for mutation analysis in the present study included northern, southern, central and western Indian regions including Gujarat (G; n = 314), Maharashtra (M; n = 152), Karnataka (K; n = 67), Telangana (T; n = 43), Chhattisgarh (CH; n = 42), Uttar Pradesh (UP; n = 30), Madhya Pradesh (MP; n = 28), Haryana (H; n = 18), Rajasthan (R; n = 15) and the union territory of Delhi (D; n = 60). These ten states of India showed maximum positivity rate during the second wave of COVID-19 infections in India. The mutations were analyzed on the basis of their presence or absence in different regions of spike protein. The mutation density was calculated as the number of mutations observed in a particular spike protein region divided by its size in base pairs, while frequency of a particular mutation was calculated as the number of samples showing a specific mutation divided by the total number of samples analyzed [13]. Three-dimensional crystal structure of spike protein available in the protein data bank (PDB; code: 6LZG) [14] was used to map the mutations located in the RBD.

Results

Analysis of 763 spike protein mutations from March 2020 to August 2021 in ten Indian states revealed that as compared with lineage B.1 found between March to September 2020, B.1.1.7 lineage was detected from October 2020 to January 2021. During second wave of COVID-19 infections in February 2021 onwards, the number of circulating lineages rose to eight including the SARS-CoV-2 lineages B.1.1.7, B.1.351, B.1, B.1.525, B.1.36.17, B.1.617.1, B.1.617.2 and B.1.617.3. The massive spurt in cases and devastation observed during the second wave could be attributed to the presence of these multiple lineages especially VOC Delta and its sub-lineages. The states of Maharashtra, Kerala, Gujarat, Delhi, Karnataka, Chhattisgarh, Telangana among others included in this analysis reported maximum number of COVID-19 cases in India during the second wave of infections.

Mutation analysis of the prevalent SARS-CoV-2 lineages in ten Indian states

Distribution of mutations in the spike protein

A total of 242 mutations corresponding to 207 sites in the spike protein were recorded during the assessment period (Table 1 & Figure 2). D614G was the only mutation present in all analyzed spike proteins between March 2020 to August 2021 from all the ten states. While, other prevalent mutations like N501Y and T716I (present in seven states), H69del, V70del, Y144del, N440K, P681H (in six states), D215G, K417N, E484K, Q671H and A701V (in five states), L5F, L18F, D80A, Q613H, Q675H (in four states) and Q52R, T95I, W152L, E154K, V382L, L452R, E484Q, A520S, F888L and D1135Y (in three states) showed differential distribution (Table 2 & Figure 2). There were 36 mutations that were present in 2 states each and the remaining mutations were represented individually in various states (Table 2 & Figure 2).
Table 1.

Frequency of observed mutations in various domains of SARS-CoV-2 spike protein.

S no.MutationFrequencyS no.MutationFrequencyS no.MutationFrequency
S1A domain (1-302) 45N87K190L242V1
1V3G146D88H191L242P1
2L5F747S94F292L242del1
3V6A148T95I1693A243S1
4L7S349K97N194A243del1
5S12F650S98F395L244del1
6Q14H251L117I196H245Y1
7L18F1952E132D197D253N1
8T19R153E132Q198W258L1
9T20I154D138Y299G261D1
10R21K155D138H1100G261S1
11R21T256L141F1101A262S1
12Q23R557G142D6102A263V1
13P26L258Y144del27103A263E1
14A27V259Y144H1104P272S1
15A27S360Y145H1105T286I1
16Y28H161W152L5 S1A-S1B linker (303-332)
17G35V162M153I3106F306L1
18S46L163M153K1107T307I1
19V47I164E154K27108P330S1
20H49Y165S155I1 S1B domain (333-527)
21Q52R566E156G1109A344T1
22Q52H167E156Q1110R346T2
23L54F4168F157del1111R365I1
24F59S169R158del1112V367F1
25W64C170N164S1113S373L1
26A67V471Y170L1114D377Y1
27I68del172V171N1115V382L4
28H69del2373S172A1116P384L1
29H69N174S172F1117S399stop1
30V70del2375Q173H1118K417N14
31V70L176G181A2119T430I2
32S71T177G181R1120N439K1
33G75V178F186S1121N440K15
34G75R479F186V1122L452R38
35T76I180S194L1123F456I1
36K77M481P209S1124P463H1
37K77N182V213G1125F464I1
38R78M783R214L3126E465K1
39D80A1484D215G14127R466K1
40D80del385Q218H2128S469T1
41V83F186A222V1129A475G1
42L84I187G232C1130S477I2
43P85T188I233V1131S477N6
44F86I189T240I2132T478K3
133E484Q31 S1-S2 subunit linker (693-710) β-sheet domain (1069-1133)
134E484K24171A694V1205Q1071H8
135E484D1172F698L1206K1073N1
136Q493stop2173A701V16207K1073T1
137S494P4174A701T16208T1077I1
138N501Y75175S704L3209A1078T1
139Y508H2176N709K1210H1083Q17
140L517H1 Central-β strand (711-737) 211D1084Y4
141A520S4177T716I57212A1087S1
142A522S1178M731I2213R1091L1
143A522V1 Downward helix (738-782) 214G1099V1
S1B-S1C linker (528-533) 179A771S2215H1101D9
S1C domain (534-589) S2’ cleavage site (783-815) 216T1116A1
144T547I1180I788V1217D1118H59
145A557S1181Y789F1218V1122L1
146K558N4182K795Q1219G1124V1
147A570D60183P809S1 Heptad repeat region 2 (1134-1213)
148A570S1184P812L33220D1146E1
149T572I5 Fusion peptide (816-828) 221E1150Q1
150D574Y2 Connecting region (829-911) 222D1153Y5
151A575S1185A845S1223P1162L1
152E583D14186R847K1224D1163Y5
153E583Q1187T859I1225I1183V1
S1C-S1D linker (590-593) 188A879S3226N1187K1
S1D domain (594-674) 189S884F1227N1187H2
154Q613H6190F885V1228K1191N5
155D614G697191F888L6229Q1201K3
156D627P1 Heptad repeat region 1 (912-983) Transmembrane region (1214-1236)
157A642I1192V915I1230G1219C1
158A653V1193D936N1231I1225F1
159H655Y3194D936H1232V1228L1
160I670L1 or all195S943P2233M1229I1
Protease cleavage site (675-692) 196S943T3 Cytoplasmic region (1237-1273)
161Q675H11197D950N2234M1237I2
162Q677H41198D950H1235C1248F1
163Q677R2199S982A58236G1251V1
164T678I2 Central helix (984-1034) 237D1259Y2
165P681R33200T1009I1238D1260Y1
166P681H70201A1020S1239P1263L2
167A688V1 β hairpin (1035-1068) 240V1264L1
168A688S3202M1050I2241K1266R1
169S68911203V1060L1242G1267R2
170Q690H5204L1063F3   

Frequency of observed mutations in various domains shown in bold along with their amino acid positions

Figure 2.

Distribution of mutations in human SARS-CoV-2 spike protein regions.

Table 2.

Distribution of mutation types and sites in SARS-CoV-2 spike proteins in Indian states.

S no.StateS proteins (n)Mutations (n)Mutation types and sites
1Gujarat31496L5F (3), L7S (3), L18F (4), Q23R (4), A27V (1), G35V (1), Q52R (1), L54F (41), F59S (1), A67V (1), I68del (1), H69del (14), H69N (1), V70del (13), V70L (1), S71T (1), G75V (1), T76I (1), R78M (6), D80A (4), S94F (2), E132D (1), E132Q (1), D138H (1), Y144DEL (15), W152L (3), S155I (1), S172F (1), Q173H (1), G181A (1), F186V (1), S194L (2), D215G (4), G232C (1), I233V (1), T240I (2), W258L (1), G261S (1), P272S (1), A344T (1), R346T (2), R365I (1), K417N (4), N439K (1), N440K (4), S477N (6), E484K (5), Q493stop (2), S494P (1), N501Y (20), Y508H (2), D574Y (1), A557S (1), K558N (3), A570D (15), A570S (1), T572I (4), E583D (14), Q613H (2), D614G (302), H655Y (2), Q675H (2), Q677H (21), P681H (22), A688S (3), A688V (1), Q690H (6), A701T (16), A701V (5), S704L (3), T716I (15), M731I (1), K795Q (1), P809S (1), P812L (31), A879S (2), F885V (1), F888L (1), D936H (1), D936N (1), S982A (15), M1050I (2), L1063F (2), T1077I (2), H1083Q (17), A1087S (1), R1091L (1), T1116A (1), D1153Y (1), D1163Y (2), I1183V (1), N1187K (1), K1191N (4), Q1201K (1), D1259Y (1), M153I (3)
2Maharashtra152106V3G (1), L5F (3), S12F (5), Q14H (1), L18F (12), T20I (1), R21T (2), P26L (1), A27V (1), A27S (2), S46L (1) V47I (1), H49Y (1), Q52H (1), L54F (2), W64C (1), H69del (2), V70del (2), G75R (1), R78M (1), D80A (4), V83F (1), L84I (1), P85T (1), F86I (1), T95I (12), K97N (1), S98F (1), L141F (1), G142D (1), Y144H (1), Y144del (5), Y145H (1), W152L (1), E154K (20), Y170L (1), V171N (1), S172A (1), F186S (1), P209S (1), R214L (3), D215G (4), Q218H (2), A222V (1), L242del (1), A243del (1), L244del (1), D253N (1), A263V (1), A263E (1), T286I (1), T307I (1) S373L (1), V382L(1), K417N (5), T430I (1), N440K (1), L452R (2), S477I (1), E484K(7), E484Q (21), N501Y (14), L517H (1), T547I (1), A570D (9), E583Q (1), Q613H (2), D614G (148), Q675H (5), Q677H (12), P681H (10), P681R (17), A694V (1), S698L (1), A 701V (4), T716I (9), I788V (1), Y789F (1), A845S (1), T859I (1), A879S (1), V915I (1), D950H (1), S982A (9), T1009I (1), V1060L (1), L1063F (1), G1099V (1), H1101D (8), D1118H (9), V1122L (1), G1124V (1), E1150Q (1), D1153Y (1), P1162L (1), N1187H (2), K1191N (1), I1225F (1), V1228L (1), M1237I (2), C1248F (1), P1263L (1), V1264L (1), K1266R (1), G1267R(1)
3Karnataka6744S12F (1), Q52R (3), H69del (5), V70del (5), G75R (1), K77M (4), D80del (1), D88H (1), T95I (1), D138Y (1), Y144del (5), E154K (4), E156Q (1), D215G (3), V382L (2), K417N (3), N440K (5), L452R (5), E484K (6), E484Q (5), N501Y (12), A570D (7), D574Y (1), D614G (52), D627P (1) A642I (1), A653V (1), H655Y (1), Q677H (3), P681H (11), A701V (3), T716I (9), F888L (3), S982A (7), Q1071H (5), K1073N (1), A1078T (1), D1084Y (4), D1118H (7), D1153Y (2), G1219C (1), D1259Y (1), P1263L (1), G1267R (1)
4Delhi6035R21K (1), P26L (1), A27S (1), Q52R (1), A67V(3), K77N (1), D80A (1), G181R (1), D215G (1), L242P (1), V367F (2), D377Y (1), K417N (1), T430I (1), S477I (1), T478K (1), E484K (4), N501Y (11), A520S (1), A522V (1), K558N (1), A570D (12), T572I (1), Q613H (1), D614G (51), I670L (1), Q677H (1), P681H (8), A701V (1), T716I (7), R847K (1), F888L(2), S943P(1), S982A (10), D1118H (10)
5Chhattisgarh4338T19R (1), H69del (1), V70del (1), N87K (1), T95I (3), G142D (5), Y144del (1), W152L (1), E154K (3), E156G (1), F157del (1), R158del (1), N164S (1), L242V (1), A243S (1), H245Y (1), G261D (1), A262S (1), F306L (1), V382L (1), N440K (2), L452R (7), T478K(2), E484Q (5), N501Y (14), A570D (14), A575S (1), Q613H (1), D614G (41), T678I (2), P681H (15), P681R (7), T716I (14), P812L(2), D950N (2), S982A (15), Q1071H (3), D1118H (14)
6Telangana4233L5F (1), L18F (1), Y28H (1), H69del (2), V70del (2), D80A (2), Y144del (2), D138Y (1), V213G (1), D215G (2), P384L (1), K417N (1), N440K (2), E484D (1), E484K (2), N501Y (5), A520S (3), A570D (3), E583D (1), D614G (35), Q675H (1), P681H (4), A701V (2), N709K (1), T716I (3), M731I (1), A771S (2), S982A (3), S943T (1), K1073T (1), D1118H (3), M1229I (1), G1251V (1)
7Madhya Pradesh3121V6A (1), H69del (1), V70del (1), Y144del (1), S399stop (1), F456I (1), P463H (1), F464I (1), E465K (1), R466K (2), S469T (1), A475G (1), N501Y (1), A570D (1), E583D (1), D614G (26), T716I (1), S884F (1), S943P (1), S982A (1), D1118H (1)
8Uttar Pradesh207L5F (1), L117I (1), A522S (1), D614G (15), Q677H (1), A1020S (1), D1260Y (1)
9Haryana183D614G (15), P330S (1), M1237I (1)
10Rajasthan1612L18F (2), Q23R (1), S98F (2), M153K (1), N440K (1), D614G (12), Q677H (2), Q677R (3), S689I (1), D1146E (1), D1163Y (3), Q1201K (2)

Occurrences are indicated in parenthesis; 5 to 9 occurrences represented in bold; 10 and above are represented as bold and underlined.

Frequency of observed mutations in various domains shown in bold along with their amino acid positions Occurrences are indicated in parenthesis; 5 to 9 occurrences represented in bold; 10 and above are represented as bold and underlined.

Distribution of mutations among domains & subdomains of spike protein

The observed 242 mutations were found scattered through all the domains and subdomains of the spike protein except the S1B–S1C and S1C–S1D linkers, and the fusion peptide domains wherein no mutations were found (Table 1 & Figure 2). Mutations that were observed in just a single sample were termed as unique. The highest number of mutations (105) with 43.4% occurred in S1A domain with 81 mutations listed as unique. It was followed by 35 mutations found in S1B domain (14.5%) with 21 unique mutations and 15 mutations in β-sheet domain (6.9%) with 13 unique mutations (Table 1 & Figure 2).

Frequent spike protein mutations

As many as 170 out of 242 observed mutations were found to be unique. 28 mutations that appeared in more than 10 samples each were listed as frequent mutations. These frequent mutations with number of their occurrences and corresponding mutation frequency given in a decreasing order in parenthesis included: D614G (697; 0.92), N501Y (75; 0.099), P681H (70; 0.092), A570D (60; 0.079), D1118H (59; 0.078 ), S982A (58; 0.077), T716I (57; 0.075), Q677H (41, 0.054), L54F (41, 0.054), L452R (38; 0.05), P812L (33; 0.043), P681R (33; 0.043), E484Q (31; 0.041), E154K (27; 0.036), Y144del (27; 0.036), E484K (24; 0.031), H69del (23; 0.03), H70del (23; 0.03), L18F (19; 0.025), H1083Q (17; 0.022), A701V (16; 0.021), A701T (16; 0.021), S94F (16; 0.021), N440K (15; 0.02), D80A (14; 0.018), D215G (14; 0.018), K417N (14; 0.018) and Q675H (11; 0.014; Table 1). The remaining mutations showed anywhere between two to ten occurrences (Tables 1 & 2). The mutations showing five to nine occurrences are represented as bold, while 10 and above occurrences as bold and underlined in Table 2.

Mutation density in different regions of spike protein

The region-wise mutation density in the human SARS-CoV-2 spike protein is depicted in Figure 3. The highest mutation density (0.55) was observed in protease cleavage site, followed by 0.35 in S1–S2 subunit linker region, 0.34 in S1A domain, 0.24 each in cytoplasmic region and β-sheet domain. The S1B domain that forms the receptor binding domain and the transmembrane region showed a moderate mutation density of 0.17 each in the analyzed samples.
Figure 3.

Mutation density in human SARS-CoV-2 spike regions in the analyzed samples.

Spike protein sites with multiple mutation types

We observed 34 sites in the analyzed spike proteins that harbored two or three different mutations at the same position in the spike protein sequence. Two sites 242 and 484 displayed three mutations each. At the former site, the amino acid leucine (L) was either deleted or changed to amino acid valine (V) or proline (P), while in the latter, glutamic acid (E) changed to either glutamine (Q), lysine (K) or aspartic acid (D). At the remaining 32 sites, two changes each were observed (Figure 4).
Figure 4.

Human SARS-CoV-2 spike protein sites with multiple amino acid substitutions in ten Indian states.

Spike proteins with maximum mutations

Seven hundred thirteen Indian human SARS-CoV-2 spike proteins carried from one to thirteen mutations. The highest number of mutations (13) were observed in spike protein from state of Maharashtra (B.1.351; EPI_ISL_1704062) followed by 11 each in Gujarat (B.1.1.7; EPI_ISL_1544115) and Chhattisgarh (B.1.617.2; EPI_ISL_1731755), 10 each in Telangana (B.1.1.7; EPI_ISL_1672391) and Karnataka (B.1.1.7; EPI_ISL_747244), 9 each in Delhi (B.1.351; EPI_ISL_1716786) and Madhya Pradesh (B.1; EPI_ISL_1251265), respectively. The parenthesis indicate the lineage description and representative GISAID accession number of the analyzed sample.

Mapping of observed mutations in RBD of SARS-CoV-2 spike protein

The RBD of spike protein interacts with host ACE2 receptor to mediate viral entry. The RBD region of spike proteins from ten different states revealed 35 distinct mutation types located at 31 distinct RBD sites as A344T, R346T, R365I, V367F, S373L, D377Y, V382L, P384L, S399stop, K417N, T430I, N439K, N440K, L452R, F456I, P463H, F464I, E465K, R466K, S469T, A475G, S477I, S477N, T478K, E484D, E484K, E484Q, Q493stop, S494P, N501Y, Y508H, L517H, A520S, A522S, and A522V (Table 2 & Figure 5). Mapping of these 31 mutations was done with a three dimensional crystal structure of human SARS-CoV-2 RBD complexed with ACE2 receptor (PDB code : 6LZG) as shown in Figure 6. Six of ten analyzed states except Karnataka, Chhattisgarh, Haryana and Rajasthan showed unique mutations that showed up as A344T, R346T, R365I, N439K, Q493stop, S494P and Y508H in Gujarat; S373L and L517H in Maharashtra; V367F, D377Y and A522V in Delhi; P384L and E484D in Telangana; S399stop, F456I, P463H, F464I, E465K, R466K, S469T, and A475G in Madhya Pradesh and A522S in Uttar Pradesh, respectively (Table 2).
Figure 5.

Mutations at 31 sites in the receptor binding domain of SARS-CoV-2 spike protein.

Figure 6.

Mapping of 35 mutations on the crystal structure of spike protein receptor binding domain complexed with ACE2 receptor.

PDB code: 6LZG; shown in yellow balls.

Mapping of 35 mutations on the crystal structure of spike protein receptor binding domain complexed with ACE2 receptor.

PDB code: 6LZG; shown in yellow balls. The state-wise mutation types and sites of human SARS-CoV-2 spike protein in the RBD according to geographical locations is included in Table 2. The five most frequent mutations sites with occurrences in various states included 501 (G, M, K, D, CH, T and MP), 484 (G, M, K, D, CH, and T), 440 (G, M, K, D, CH and T), 452 (M, K, and CH) and 417 (G, M, K, D and T), respectively. A maximum number of mutants were observed at position 501 in the RBD of spike protein. This site harbored N501Y mutation that was present in 77 spike proteins from seven states (G, M, K, D, CH, T and MP). Site 484 with 56 mutations included mutants wherein amino acid E changed to K in 24 samples, to Q in the rest (Figure 5). We observed 50 spike proteins representing clade O (Wuhan isolate) with no mutations observed in them relative to the human SARS-CoV-2 spike protein reference sequence from Wuhan, China. These samples were present in all the analyzed states except Chhattisgarh. The maximum nine samples each were present in Gujarat and Delhi, while Haryana had the minimum number as three. Interestingly, amidst the presence of so many mutant lineages, clade O was visible in India throughout the assessment period from March 2020 until August 2021.

Samples without D614G

Although mutation D614G was ubiquitously registered in samples from all the ten states, there were 16 samples from five states viz. Gujarat (3; EPI_ISL_483869, EPI_ISL_483870 and EPI_ISL_586542), Karnataka (8; EPI_ISL_486398, EPI_ISL_486409, EPI_ISL_486408, EPI_ISL_486389, EPI_ISL_515961, EPI_ISL_515967, EPI_ISL_1595890 and EPI_ISL_1595875), Chhattisgarh (2; EPI_ISL_1173164 and 1 EPI_ISL_173162), Telangana (2; EPI_ISL_431101 and EPI_ISL_ 850514) and Madhya Pradesh (1; EPI_ISL_476840) which did not possess the most widely occurring mutation D614G. These samples were visible between March 2020 to July 2020 (Table 1). All the samples in remaining five states, and samples from aforementioned states after July 2020 contained D614G mutation either singly or in combination with other mutations.

Preferential site specific positive selection pattern for spike amino acid substitutions

Interestingly, we observed a preferential site specific positive selection pattern for particular amino acid substitutions in the spike protein. For instance, there were 11 sites (95, 240, 286, 307, 430, 547, 572, 678, 716, 859 and 1009), where amino acid threonine (T) always preferentially changed to isoleucine (I; 23 occurrences) across all the analyzed samples. T95I and T716I of these were the most frequent mutation types and sites. Only at three sites viz. 19 (Chhattisgarh), 478 (Chhattisgarh) and 1116 (Gujarat), T was observed to change to arginine (R), K and alanine (A), respectively (Table 2). Likewise, amino acid Q changed to histidine (H) at seven sites (14, 52, 218, 613, 675, 677 and 1071) in 13 occurrences. Q677H was the most frequent mutation type and site followed by Q613H. Except for one instance of Q677R mutation (karnataka), Q always changed to H at the 677 position across the states. Another preferential substitution was amino acid methionine (M) to I, observed at four sites (731, 1050, 1229 and 1237) with M731I being the most frequent mutation type and site in three occurrences. Phenyl alanine (F) to I (456 and 464) and Q to R (52 and 677) substitutions were observed at the mentioned two sites each (Table 2).

State-wise mutation analysis

State-wise compilation of observed spike protein mutations types and sites for 10 Indian states is presented in Table 2. The characteristic signature mutations of these lineages are listed in Table 3 [15]. We observed a significant spike protein evolution and all the signature mutations of a particular lineage were not observed in all its spike proteins and rather 51 novel mutations V3G, R21T, L18F, Q52R, H69N, L84I, P85T, F86I, S94F, T95I, L141F, G142D, Y145H, E154K, E156G, T164S, 168del, Y170L, V171N, S172A, G181A, Q218H, L242del, L244del, H245Y, A263E, F306L, T307I, R365I, V382L, N440K, F456I, P463H, F464I, E465K, R466K, A475G, T572I, E583Q, A653V, H655Y, S698L, I712V, F855V, S943P, H1101D, D1153Y, N1181K, N1187H, P1263L and V1264L were recorded in the analyzed samples in various states. Figures 7 & 8 illustrate the documentation of novel mutation sites and types showing lineage-wise and state-wise evolution of human SARS-CoV-2 spike protein in India.
Table 3.

Newly evolved mutations in the analyzed SARS-CoV-2 lineages circulating in India.

Sl no.PANGO LineageSignature mutationsNewly evolved mutations state wise
1B.1.1.7del69/70, del144/145, N501Y, A 570D, D614G, P681H, T716I, S982A, D1118HM: L84I, Y145HG: H69N, S94F, 168delD: T572IK: A653VCh: H245Y, I712V
2B.1.351D80A, D215G, del241/243, K417N, E484K, N501Y, D614G, A701VM: L18F, P85T, F86I, Y170L, V171N, S172A, L242del, L244delG: L18F, F855VD: G181A, S943PK: L18F, H655Y
3B.1D614GMP: F456I, P463H, F464I, E465K, R466K, A475G
4B.1.36.17D614GM: N440K, P1263L
5B.1.525Q52V, A67V, H69/70del, del144/145, E484K, D614G, Q677H, F888LG: Q52R, R365ID: Q52R
6B.1.617.1L452R, E484Q, D614G, P681R, Q1071HK: T95I, E154K, V382L, D1153YCh: T95I, G142D, E154K, T164S, H1101D
7B.1.617.2T19R, del157/158, L452R, T478K, D614G, P681R, D950NCh: G142D, E156G, F306L
8B.1.617.3T19R, L452R, E484Q, D614G, P681RM: V3G, R21T, T95I, L141F, G142D, E154K, Q218H, A263E, T307I, V382L, E583Q, S698L, H1101D, D1153Y, N1181K, N1187H, V1264L

Ch: Chhattisgarh; D: Delhi; G: Gujarat; K: Karnataka; MP: Madhya Pradesh; M: Maharashtra.

Figure 7.

Newly evolved mutations in human SARS-CoV-2 at various spike protein sites in Indian states.

Figure 8.

Newly evolved mutations in human SARS-CoV-2 at various spike protein sites in Indian states.

Ch: Chhattisgarh; D: Delhi; G: Gujarat; K: Karnataka; MP: Madhya Pradesh; M: Maharashtra.

Gujarat (G)

A total of 96 mutations were observed in 314 samples analyzed from Gujarat state (Table 2). Forty eight mutations were unique, present just in one spike protein. D614G was the most predominant mutation found in 302 samples. Other frequent spike mutations with their occurrences in parenthesis included L54F (41), H69del (14), V70del (13), R78M (6), Y144del (15), S477N (6), N501Y (20), A570D (15), E583D (14) D614G (302), Q677H (21), P681H (22), A701T (16), T716I (15), P812L (31), S982A (15), H1083Q (17). Nine samples represented Clade O (Wuhan genotype). The preferential amino acid substitutions at specific positions for Gujarat state included M to I and T to I substitutions at three and five spike proteins sequence positions, respectively. There was a lone case where T was substituted to A as T1116A. Among the 12 mutations in the RBD region of the spike protein in the state, the most frequent mutations with their occurrences in the parenthesis included K417N (4), N440K (4), S477N (6), E484K (5), and N501Y (20; Table 2). After December 2020, in the second wave of infections, three lineages B.1.1.7, B.1.351 and B.1525 were reported in the analyzed samples from the state. The signature mutations for these lineages are given in Table 3. Ten spike proteins of B.1.1.7 lineage showed 10 signature mutations only, while the remaining five showed convergent evolution. In GISAID accessions EPI_ISL_825054 (eight mutations) and EPI_ISL_825059 (10 mutations), H69del and V70del went missing in the former, while, H69del was replaced by H69N and V70del was replaced by 168del in the latter. In accessions EPI_ISL_1544115, EPI_ISL_1544116, and EPI_ISL_1544117, highest 11 mutations each were recorded. While S94F was added as 11th mutation to the set of 10 signature mutations in the first two, T1116A was added to the third, respectively. H69N, S94F and 168del were the three newly evolved mutations in this lineage in the state, which were absent in the signature mutations that defined the lineage. Lineage B.1.351 was characterized by eight signature mutations (Table 3). In all the analyzed samples in the Gujarat, seven out of eight mutations excluding del241/243 were observed. It was replaced by L18F in GISAID accession EPI_ISL_1704309 and EPI_ISL_1703836. In the latter spike protein, an additional F855V mutation was also noted taking the number of maximum mutations recorded to nine for this lineage. L18F and F855V were the novel mutations that arose in the state in this lineage. In the lineage B.1.525 in Gujarat, the characteristic signature mutation del144/145 was not observed in the analyzed samples, rather a new mutation arose as R365I. Q52V of the signature mutation changed to Q52R (Table 3).

Maharashtra (M)

A higher number of 106 mutations were reported in 156 analyzed spike proteins from the state of Maharashtra that abated the most severe second wave of infections in the country (Table 2). There were 70 unique mutations that were present just in only one spike protein each, while 15 frequently occurring mutations according to their total number of occurrences included S12F (5), L18F (14), T95I (12), E154K (20), E484Q (21), N501Y (14), A570D (9), D614G (152), Q677H (14), P681H (10), P681R (17), T716I (9), S982A (9), H1101D (8) and D1118H (9). Mutation D614G was conspicuously present in 152 samples, while four samples represented clade O (Wuhan clade) and did not show any mutation. Among the 12 mutations in the RBD region of the spike protein in the state (Table 2), the most frequent mutations with their occurrences in the parenthesis included K417N (5), E484K (7), E484Q (21) and N501Y (14). After December 2020, in the second wave of infections, more transmissible and virulent lineages B.1.617, B.1.1.7, B.1.351 and B.1.36.17 became visible in the state. In 19 samples of lineage B.1.617, one of the signature mutations D1071H (Table 3), was not been observed in the state. Additionally, many new substitutions were observed at various sites. E154K and T95I were the most frequent combination visible in 15 of the 19 samples until February 2021. After which, yet another new mutation H1101D started becoming visible. Interestingly, T95I was not seen after February and never in combination with H1101D in our samples. The total number of mutations ranged from six to eight in this lineage. Four of the signature mutations L452R, E484Q, D614G and P681R were observed in all the samples. Additional mutations included V3G, R21T, L141F, G142D, E154K, Q218H, T307I, E583Q, S698L, V1060L, D1153Y, N1187H and V1264L (Table 3, Figures 7 & 8). In the lineage B.1.1.7, nine samples were analyzed. Except del 69/70 that was observed in two analyzed samples remaining seven signature mutations were present in all the samples. The total number of mutations ranged from seven to ten additional mutations included L84I, Y144del and Y145H (Table 3). Lineage B.1.351 is characterized by eight signature mutations. The number of observed mutations in the analyzed samples varied from five to thirteen. Sample EPI_ISL_1704062, showed the highest mutations (13) among all the analyzed samples, across 10 states. Sample 1703656 showed minimum (5) signature mutations, i.e; K417N, E484K, N501Y, D614G and A701Y, which were present in all the samples. Additional substitutions included L18F, P85T, F86I, Y170L, V171N, S172A, L242del and L244del (Table 3). The lineage B.1.36.17, contained two new mutations N440K and P1263L along with the signature mutation D614G (Table 3).

Karnataka (K)

A total of 44 mutations with 18 unique mutations were observed in 61 analyzed spike proteins (Table 2). Clade O with no mutations with respect to the Wuhan reference sample was visible in seven samples. While the frequently occurring mutations included N440K (5), L452R (5), E484K (6), E484Q (5), D614G (52), N501Y (12), A570D (7), P681H (11), T716I (9), S982A (7), Q1071H (5) and D1118H (7), the most frequently occurring mutation D614G was absent in eight analyzed spike proteins. Among the 7 mutations in the RBD region of the spike protein in the state (Table 2), the most frequent mutations with their occurrences in the parenthesis included N440K (5), L452R (5), E484K (6), E484Q (5) and N501Y (12). Four lineages B.1.1.7, B.1.525, B.1.351 and B.1.617 surfaced in the state in December 2020. Sub-lineage B.1.617.1 only of Indian variant was visible in our analyzed samples from the state. The highest total number of observed mutations in Karnataka in the first wave of infections and after introduction of above lineages in the second wave was three and ten per spike protein, respectively. In five out of nine spike proteins belonging to lineage B.1.1.7, all ten signature mutations of the lineage were observed. While in the remaining two spike proteins (accession ID EPI_ISL_159589 and EPI_ISL_1595874) three each (N501Y, P681H and T716I), seven (EPI_ISL_1818595, H69del, V70del and Y144del absent) and eight (EPI_ISL_1818647, H69del, V70del, Y144del absent and A653V additional present) mutations were documented (Figure 7). Only sub-lineage B.1.617.1 of the Indian variant B.1.617 was visible in our analyzed samples in the state of Karnataka. The total number of mutations ranged from five to seven. Five signature mutations L452R, E484Q, D614G, P681R and Q1071H were observed in all the samples. Additional amino acid substitutions included T95I, E154K, V382L, and D1153Y (Table 3 & Figure 8). Lineage B.1.525 uniformly showed five spike protein amino acid substitutions namely Q52R, E484K, D614G, Q677H, and F888L, and no additional substitutions were observed in the analyzed samples. Lineage B.1.351 spike proteins showed either seven or eight amino acid substitutions each. Seven signature mutations namely D80A, D215G, K417N, E484K, N501Y, D614G and A701V were present in all the samples while the additional new substitutions included L18F and H655Y (Table 3 & Figure 7).

Delhi (D)

A total of 35 mutations were observed in 60 analyzed spike proteins in Delhi (Table 2). There were 35 unique mutations and clade O was observed in 9 samples (Table 1). D614G was present in all the remaining 51 samples. The other frequent mutations included A570D (12), N501Y (11), P681H (8), S982A (10) D1118H (10). Among the 10 mutations in the RBD region of the spike protein in the state (Table 2), the most frequent mutations with their occurrences in the parenthesis included V367F (2), E484K (4) and N501Y (11). Three PANGO lineages B.1.525, B.1.1.7 and B.1.351 have been observed in the analyzed Delhi samples. Thirteen samples of lineage B.1.1.7 revealed two (EPI_ISL_1716802) to eight mutations (EPI_ISL_1716813). Seven signature mutations except del69/70 and del144/145 were visible in the state, though not in every sample. Sample EPI_ISL_1716813 contained seven signature mutations and an additional new T572I mutation. In the remaining samples, for instance, four, three and two samples contained seven, six and two of the signature mutations, respectively, while three, four, and five mutations were represented in one sample each only (Tables 2 & 3). In lineage B.1.525 samples, number of mutations recorded ranged from four to five. Five of the signature mutations analyzed were A67V, D614G, E484K, Q677H and F888L. del69/70 and del144/145 were not observed (Table 3). At 52 spike protein sequence site, V (Valine) of the signature mutation was substituted by R (Arginine). Q52R was hence a new substitution (Table 3 & Figure 7). Total number of reported mutations was nine for the South African lineage B.1.351. Seven signature mutations except del241/243 were visible in the analyzed samples. Additionally new substitutions G181A and S943P were also observed (Table 3 & Figure 7).

Chhattisgarh (CH)

Forty-three samples analyzed from the state showed 38 mutations. Most frequent mutations included L452R (7), N501Y (14), A570D (14), D614G (41), P681H (15), T716I (14), S982A (15), and D1118H (14) (Table 2). There were 18 unique mutations and D614G was found to be absent in samples EPI_ISL_1173164, and EPI_ISL_1173162. clade O was not observed in the analyzed samples in Chhattisgarh. Preferential amino acid substitutions from T to I was observed in T95I (3), T678I (2), T716I (14), while T (Threonine) also changed to K (Lysine) and R (Arginine) as T478K (2) and T19R (1), respectively. Likewise, Q (Glutamine) to H (Histidine) substitutions were observed at spike sequence positions Q613H (1), Q1071H (3). Among the six mutations in the RBD region of the spike protein in the state, the most frequent mutations with their occurrences in the parenthesis included L452R (7), E484Q (5), and N501Y (14) (Table 2). During the second wave of infections, two lineages B.1.617 and B.1.1.7 were detected in the analyzed samples. The Indian variant was represented as two sub-lineages B.1.617.1 and B.1.617.2. A total of five to eight mutations with five signature mutations L452R, E484Q, D614G, P681R, and Q1071H were present in all samples of B.1.617.1 sub-lineage. Additionally new amino acid substitutions T95I, G142D, E154K, T164S and H1101D were observed (Table 3). Sub-lineage B.1.617.2 showed six to ten (EPI_ISL_ 1731755) mutations. All the seven signature mutations were observed in the state with additional new amino acid substitutions listed as G142D, E156G and F306L (Table 3 & Figure 8). Fourteen samples of lineage B.1.1.1.7 showed seven to eleven mutations. Eleven samples showed seven of the signature mutations of the lineage, while with an additional new substitution I712V, this number rose to eight in two samples. H69del, H70del, Y144del and a new H245Y were the additional mutations recorded in the sample with highest 11 mutations (EPI_ISL_1731751) in the state. H245Y, I712V were the new amino acid substitutions recorded in the state in this lineage (Table 3 & Figure 7).

Telangana (T)

Forty-two samples revealed 33 amino acid substitutions in the state of Telangana. The total mutations varied from one to ten. Clade O was present in five samples. The most frequent mutation D614G, present in 35 samples was not detected in samples EPI_ISL_850514 and EPI_ISL_495164. Preferential amino acid substitutions were visible as T to I (T716I) and M to I substitutions (M731I, M1229I). The seven mutations in the RBD region of the spike protein in the state with their occurrences in the parenthesis included P384L (1), K417N (1), N440K (2), E484D (1), E484K (2), N501Y (5) and A520S (3) (Table 2).

Madhya Pradesh (MP)

Thirty-one samples showed 21 amino acid substitutions in the spike protein. Nineteen mutations were unique and four samples depicted clade O. D614G was the most frequent mutation observed in 26 samples. Preferential amino acid substitution in terms of change of F to I was observed at two spike sequences F456I and F464I. Lineage B.1 seen in the state encompassed mutations as F456I, P463H, F464I, E465K, R466K, A475G, D614G showing substitution of four successive AAs (463 to 466) in a single sample (EPI_ISL_1251265).The nine mutations in the RBD region of the spike protein in the state with their occurrences in the parenthesis included S399stop (1), F456I (1), P463H (1), F464I (1), E465K (1), R466K(2), S469T (1), A475G (1), N501Y (1) (Table 2).

Rajasthan (R)

Twelve mutations were detected in 16 samples and five mutations were unique. Clade O was found in four samples. D614G was the most common mutation found in 12 samples. N440K was the only mutation located in RBD of the spike protein (Table 2).

Haryana (H)

Three mutations observed in 18 samples with clade O represented in three samples, D614G was the dominant mutation detected in 15 samples. P330S was the only mutation located in RBD of the spike protein (Table 2).

Uttar Pradesh (UP)

The analyzed 20 samples from the state revealed seven mutations. There were six unique mutations and clade O was detected in five samples and D614G was the most frequent mutation that was present in all the remaining samples. A522S was the only mutation located in RBD of the spike protein (Table 2).

Discussion

As the SARS-CoV-2-induced pandemic expanded, research efforts around the world started focusing on whole-genome sequencing, understanding epidemiology, documenting genetic diversity, analyzing the evolving mutations and their effects, monitoring the genomic evolution of the virus and developing diagnostics, therapeutics and vaccines. Detailed mutation analysis of human SARS-CoV-2 spike proteins sampled between March 2020 to August 2021, involving both the first and second waves of COVID-19 infections from ten different Indian states in the present analysis revealed both a rise in number as well as the induction of many novel functionally significant mutations in the SARS-CoV-2 spike protein in India. Introduction of global lineages, the emergence of local variant strains and induction of new mutations in the existing strains with the passage of time and laxity of COIVD-19 appropriate behavior contributed to this upsurge. The number of spike mutations in various states varied from one to five during the first wave of COVID-19 infections, while a sudden spurt in the frequency of spike protein mutation took the maximum number to as high as thirteen per spike protein during the second wave of infections. Since its emergence in 2019, the SARS-CoV-2 virus has been continuously evolving and has resulted in many novel lineages that emerged in different parts of the world. During the first wave of infections in India until September 2020, lineage B.1 with D614G mutation was visible in the analyzed samples. Later, between October and December 2020, lineage B.1.1.7 also became visible with signature mutations del69/70, del144/145, N501Y, A 570D, D614G, P681H, T716I, S982A and D1118H. While, from January 2021 onwards, eight SARS-CoV-2 lineages B.1.1.7, B.1, B.1.351, B.1.525, B.1.36.17, B.1.617.1, B.1.617.2 and B.1.617.3 could be observed in our representative sample. Lineage B.1.617 popularly known as the ‘Indian variant’ with its three main sub-lineages B.1.617.1 (Kappa), B.1.617.2 (Delta) and B.1.617.3 first emerged in India in December 2020. B.1.617.2 sub-lineage with the basic reproductive number (Ro) of 5.08 versus 2.79 for Wuhan-Hu-1 strain outcompeted pre-existing lineages B.1, B.1.1.7 (Alpha) and B.1.617.1 (Kappa). With predominant mutations T19R, del157-158, K417N, L452R, T478K, E484K/Q, P681R and D950N [15], the Delta variant showed higher transmissibility and risk of hospitalization [16-18] and fuelled the massive devastation associated with the second wave of COVID-19 infections in India. The further evolution in the prevalent global and local lineages in the analyzed samples was seen to be propelled by the induction of additional novel mutations which were visible both in terms of increased number and their diverse types. The maximum number of 13 mutations per spike protein was documented in the state of Maharashtra which battled a brutal and long phase of COVID-19 infections. From January to August 2021 (second wave) a total of 51 new mutations besides the listed signature mutations evolved in eight prevalent lineages in 763 samples from ten Indian states included in the present study. The emergence of many new mutations in addition to the signature mutations of specific lineages indicates a convergent evolution in the virus that might be responsible for an increase in transmission and pathogenesis. The selection pressure resulting from the rollout of the vaccination program in January 2021 along with other factors might have triggered the emergence of novel spike mutations in India. Several studies have documented the role of naturally induced mutations in RBD and other regions of spike protein in viral entry, transmission, infectivity, pathogenesis and immune escape [7,19,20,21]. VOCs Alpha and Delta showed higher transmission rate and spread globally and VOCs Alpha and Beta were discovered to be resistant to neutralizing antibodies, thereby, affecting the effectiveness of vaccines [22]. In the presently analyzed dataset, maximum number of mutations (43%) were recorded in SIA domain followed by SIB (14.5%) and β-sheet (6.9%). The SIA spike domain includes the N-terminal domain (NTD), RBD and the receptor binding motif (RBM). These three important regions of spike protein have been the primary targets of vaccines and antibody-based therapeutics including monoclonal antibodies (mAbs), polyclonal antibodies and convalescent plasma. The spike RBD has undergone extensive evolution with the passage of time and as many as 824 RBD mutations have already been documented [23]. In this study, the RBD region including the RMF revealed 35 distinct mutation types located at 31 distinct sites as A344T, R346T, R365I, V367F, S373L, D377Y, V382L, P384L, S399stop, K417N, T430I, N439K, N440K, L452R, F456I, P463H, F464I, E465K, R466K, S469T, A475G, S477I, S477N, T478K, E484D, E484K, E484Q, Q493stop, S494P, N501Y, Y508H, L517H, A520S, A522S and A522V. Many of these mutations have known functions to contribute to increased pathogenesis, transmission and immune evasion. In the context of spike RBD mutations, Delta sub-lineages characteristically contain two mutations each in the spike RBD and are commonly known as double mutants. B.1.617.1 (Kappa) and B.1.617.3 contain L452R and E484Q mutations each, while B.1.617.2 (Delta) has L452R and T478K as the RBD mutations [24]. The L452R mutation is associated with increased infectivity by enhanced interaction between RBD and hACE2 receptor [25]. Co-operatively, T478K and T452R mutations stabilize the RBD–ACE2 complex to increase the rate of virus infectivity and affect the immune response [20]. Another important spike mutation P681R leads to increased furin cleavage leading to greater infectivity and higher viral loads. Together, this cocktail of B.1.617.2 mutations imparts higher transmissibility, infectivity and immune evasion potential [20,25-27]. In the present samples, D614G has been the most widely occurring mutation, present in 697 out of the 763 analyzed viruses. It has been shown to result in significantly higher transmission and host infectivity [28]. The other frequent mutations observed were N501Y, P681H, A570D and D1118H observed in 77, 70, 60 and 59 of the analyzed 763 viruses, respectively. With respect to the state wise distribution of mutations, five most frequent spike mutations sites with occurrences in various states included 501 (G, M, K, D, CH, T and MP), 484 (G, M, K, D, CH and T), 440 (G, M, K, D, CH and T), 452 (M, K, and CH) and 417 (G, M, K, D and T), respectively. Interestingly, we observed state-specific unique set of mutations in six out of ten analyzed states that showed up as A344T, R346T, R365I, N439K, Q493stop, S494P and Y508H in Gujarat; S373L and L517H in Maharashtra; V367F, D377Y and A522V in Delhi; P384L and E484D in Telangana; S399stop, F456I, P463H, F464I, E465K, R466K, S469T and A475G in Madhya Pradesh and A522S in Uttar Pradesh. The differential level of infections, pathogenicity and transmission in different states could possibly be due to these state-specific mutations. Functional characterization of many of the presently observed mutations has been illustrated in many studies [20,21,27-32]. Most of the RBD mutations strengthen the RBD–ACE2 binding supporting the evolution of the virus to more infectious variants [21,27,29]. Mutations N501Y, D614G and others have been found to be associated with reinfection, partial resistance to vaccines and increased transmissibility. Mutation V367F, N354D and T478K in SARS-CoV-2 in RBD has been associated with enhanced hACE2 binding affinity and increased viral infectivity [21,27,29]. Mutations L452R and E484Q found in Indian variants have been shown to disrupt the binding between RBD and many known antibodies leading to vaccine escape [21]. Cherian et al. 2021 [20] attributed the higher pathogenicity, transmission and acute infections of SARS-CoV-2 in the state of Maharashtra to the presence of spike mutations L452R, T478K, E484Q and P681R. Further, Wang et al. [21] identified R403K, K417N/T, L452R, A475S, E484K, F486L, F490S/L, Q493L and S494P as most likely vaccine escape mutations. The S1/S2 cleavage site mutations H655Y, N679K and P681H result in increased S1/S2 furin cleavage and facilitate efficient entry into the host [28,30-32]. Many spike mutations have been shown to affect the neutralization ability of monoclonal and polyclonal therapeutic and convalescent antibodies. While mutation N439K resulting in increased binding affinity with ACE2 receptor was observed to neutralize the monoclonal and polyclonal antibodies in people who recovered from infection [31,33], deletion mutation Y144del was observed to modulate the effects of neutralizing antibodies [30,34]. Likewise, mutations K417N/T, N439K, L452R, Y453F and N501Y have been listed to be the most significant immune escape spike RBD mutations [35]. Further, multiple spike substitutions at 477 (S477G, S477N and S477R) and 484 positions (E484A, E484D and E484K) have characteristically shown resistance toward convalescent sera [28,31,35]. Interestingly, amidst the presence of so many mutant and more transmissible and virulent lineages, clade O (Wuhan isolate) was visible in India throughout the assessment period from March 2020 even until 2021. As many as 50 out of 763 analyzed spike proteins were observed representing clade O with no mutations observed in them relative to the human SARS-CoV-2 spike protein reference sequence from Wuhan, China. These samples were present in all the analyzed states except Chhattisgarh. A maximum of nine samples each were present in Gujarat and Delhi, while Haryana had the minimum number of three. Also, the initial lineage D614G co-existed in the population and remained visible throughout. One of the most significant revelations of the present analysis has been the identification of spike protein regions S1B–S1C and S1C–S1D linkers and fusion peptide domains with no mutations. This information is important amidst reports of a significant reduction in vaccine efficacy and antibody neutralization against the spike protein-based vaccines or monoclonal therapeutic antibodies [30,31,36]. In this context, the above-mentioned conserved spike regions can be important potential targets for developing future diagnostics, therapeutics or vaccines.

Conclusion

The analysis provides useful insights into the distribution of accrued mutations across the spike protein, their density in different regions of spike protein, the frequency and state-wise occurrence(s) of particular mutations, identification of mutation-rich, mutation-poor and mutation-nil regions in the spike protein and most importantly, comparison of spike protein mutations during first and second waves of infections in India to understand the evolutionary trajectory of SARS-CoV-2 in India. The spike protein now has an expanded mutational landscape and its evolution seemingly has facilitated the transmission of the virus by modulating ACE2 receptor binding affinity or immune evasion leading to reduced efficacy of vaccines and therapeutic antibodies. The continuous monitoring of emerging variants through mutation analysis and functional characterization of the induced mutations is important to track the further evolutionary course of the virus and to understand the effects of mutations on viral epidemiology. This information will eventually help in devising intervention strategies. In the context of the characterized spike RBD mutations favoring increased transmission, pathogenesis, immune evasion and reduced neutralization, it is probably time to possibly look at the more conserved parts of the spike protein or other parts of the virus for designing the next-generation therapeutics and vaccines. The present study tracks the evolutionary trajectory of SARS-CoV-2 spike protein during the first and second waves of COVID-19 in India. A detailed mutation analysis of spike protein of 763 virus samples taken from Global Initiative on Sharing All Influenza Data belonging to 10 Indian states between March 2020 to August 2021 revealed the presence of 242 mutations corresponding to 207 spike sites. The mutations were observed to be differentially distributed across states and domains of spike protein. The highest number of mutations occurred in S1A domain. Interestingly, no mutations were detected in S1B–S1C and S1C–S1D linkers and fusion peptide domains. The number of detected mutations rose from five per spike protein during the first COVID-19 wave to thirteen during the devastating second wave of COVID-19 infections in India. The five most frequent mutations included D614G, N501Y, P681H, A570D and D1118H. 51 novel mutations emerged in the circulating lineages during the assessment period in 10 Indian states. Many of the incurred mutations have previously been functionally characterized to show immune evasion and resistance to antibody neutralization. The evolution of spike protein in analyzed Indian states seems to be fuelled by introduction of global lineages, emergence of local lineages (Delta and its sub-lineages) and induction of novel co-occurring mutations. The S1A domain that includes the N-terminal domain, receptor binding domain and receptor binding motif is heavily mutated. This is a matter of concern as these three important regions of spike protein have been the primary targets of vaccines and antibody-based therapeutics including monoclonal antibodies (mAbs), polyclonal antibodies and convalescent plasma. Many of the incurred mutations have been previously characterized to show immune escape and resistance to neutralization by antibodies. In this context, importantly, the study has identified mutation rich and no mutation regions in spike protein. These conserved spike regions can be useful for designing future diagnostics, vaccines and therapeutics.
  31 in total

Review 1.  Roles of the polybasic furin cleavage site of spike protein in SARS-CoV-2 replication, pathogenesis, and host immune responses and vaccination.

Authors:  Md Golzar Hossain; Yan-Dong Tang; Sharmin Akter; Chunfu Zheng
Journal:  J Med Virol       Date:  2021-12-31       Impact factor: 2.327

2.  Clinical and Virological Features of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Variants of Concern: A Retrospective Cohort Study Comparing B.1.1.7 (Alpha), B.1.351 (Beta), and B.1.617.2 (Delta).

Authors:  Sean Wei Xiang Ong; Calvin J Chiew; Li Wei Ang; Tze Minn Mak; Lin Cui; Matthias Paul H S Toh; Yi Ding Lim; Pei Hua Lee; Tau Hong Lee; Po Ying Chia; Sebastian Maurer-Stroh; Raymond T P Lin; Yee Sin Leo; Vernon J Lee; David Chien Lye; Barnaby Edward Young
Journal:  Clin Infect Dis       Date:  2022-08-24       Impact factor: 20.999

Review 3.  Neutralising antibody escape of SARS-CoV-2 spike protein: Risk assessment for antibody-based Covid-19 therapeutics and vaccines.

Authors:  Daniele Focosi; Fabrizio Maggi
Journal:  Rev Med Virol       Date:  2021-03-16       Impact factor: 11.043

4.  The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade.

Authors:  B Coutard; C Valle; X de Lamballerie; B Canard; N G Seidah; E Decroly
Journal:  Antiviral Res       Date:  2020-02-10       Impact factor: 5.970

Review 5.  Viral targets for vaccines against COVID-19.

Authors:  Lianpan Dai; George F Gao
Journal:  Nat Rev Immunol       Date:  2020-12-18       Impact factor: 53.106

6.  Identification of SARS-CoV-2 spike mutations that attenuate monoclonal and serum antibody neutralization.

Authors:  Zhuoming Liu; Laura A VanBlargan; Louis-Marie Bloyet; Paul W Rothlauf; Rita E Chen; Spencer Stumpf; Haiyan Zhao; John M Errico; Elitza S Theel; Mariel J Liebeskind; Brynn Alford; William J Buchser; Ali H Ellebedy; Daved H Fremont; Michael S Diamond; Sean P J Whelan
Journal:  Cell Host Microbe       Date:  2021-01-27       Impact factor: 31.316

7.  Prediction and mitigation of mutation threats to COVID-19 vaccines and antibody therapies.

Authors:  Jiahui Chen; Kaifu Gao; Rui Wang; Guo-Wei Wei
Journal:  Chem Sci       Date:  2021-04-13       Impact factor: 9.825

8.  Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV.

Authors:  Xiuyuan Ou; Yan Liu; Xiaobo Lei; Pei Li; Dan Mi; Lili Ren; Li Guo; Ruixuan Guo; Ting Chen; Jiaxin Hu; Zichun Xiang; Zhixia Mu; Xing Chen; Jieyong Chen; Keping Hu; Qi Jin; Jianwei Wang; Zhaohui Qian
Journal:  Nat Commun       Date:  2020-03-27       Impact factor: 14.919

9.  Effectiveness of Covid-19 Vaccines against the B.1.617.2 (Delta) Variant.

Authors:  Jamie Lopez Bernal; Nick Andrews; Charlotte Gower; Eileen Gallagher; Ruth Simmons; Simon Thelwall; Julia Stowe; Elise Tessier; Natalie Groves; Gavin Dabrera; Richard Myers; Colin N J Campbell; Gayatri Amirthalingam; Matt Edmunds; Maria Zambon; Kevin E Brown; Susan Hopkins; Meera Chand; Mary Ramsay
Journal:  N Engl J Med       Date:  2021-07-21       Impact factor: 91.245

10.  The reproductive number of the Delta variant of SARS-CoV-2 is far higher compared to the ancestral SARS-CoV-2 virus.

Authors:  Ying Liu; Joacim Rocklöv
Journal:  J Travel Med       Date:  2021-10-11       Impact factor: 8.490

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.