
Phylogenetic analysis of 17271 Indian SARS-CoV-2 genomes to identify temporal and spatial hotspot mutations.

Nimisha Ghosh (1, 2), Suman Nandi (3), Indrajit Saha (3)

Abstract

The second wave of SARS-CoV-2 hit India hard, and although the vaccination drive has started, a moderate number of COVID-19 patients is still present in the country, which motivates the analysis of the evolving virus strains. In this regard, multiple sequence alignment of 17271 Indian SARS-CoV-2 sequences is performed using MAFFT, followed by their phylogenetic analysis using Nextstrain. Mutation points (SNPs) are then identified with Nextstrain. Thereafter, temporal and spatial analyses of the aligned sequences identify the top 10 hotspot mutations in the coding regions based on entropy. Finally, to judge the functional characteristics of all the non-synonymous hotspot mutations, PolyPhen-2 predicts the effect of their amino acid changes on protein function from the sequences, while I-Mutant 2.0 evaluates their structural stability. Considering both the temporal and spatial analyses, 21 non-synonymous hotspot mutations are found to be both unstable and damaging.


Year:  2022        PMID: 35344550      PMCID: PMC8959188          DOI: 10.1371/journal.pone.0265579

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

It is now close to two years since the emergence of SARS-CoV-2, the virus behind the deadly COVID-19 disease, and the scientific community is still struggling to end this pandemic. Though India was able to contain the spread in the first wave, the second wave put the entire system in turmoil. In September 2021, around 30,000 cases were being registered daily (https://www.covid19india.org/), whereas in May 2021 this figure surpassed 300,000. Scientists and researchers attributed this surge to the evolution of this contagious virus, which gave rise to the Delta (B.1.617.2) variant. Though the vaccination drive in India is in full swing, doubts regarding the efficacy of the vaccine against such mutations cannot be dismissed. Apart from Delta, the other variants of concern declared by the WHO that are in circulation are Alpha (B.1.1.7) [1], Beta (B.1.351) [2] and Gamma (P.1) [3]. All these variants, especially Delta, led to new spurts of lockdown in the country. Thus, to understand its frequent mutations, a study of the evolution of the SARS-CoV-2 virus is essential [4, 5].

To understand these evolutionary mutations, Tang et al. [6] analysed 103 SARS-CoV-2 sequences and found two major lineages, L and S. These lineages are defined by two tightly linked SNPs at positions 28144 (ORF8: C251T, S84L) and 8782 (orf1ab: T8517C, synonymous) and might influence virus pathogenesis. Raghav et al. [7] used ARTIC primer-based amplicon sequencing to profile 225 Indian SARS-CoV-2 sequences. Their analysis showed that, apart from local transmission, Europe and Southeast Asia were the two major routes by which the disease was introduced into India. Their study also identified D614G in the Spike protein as a very common mutation that increases virus shedding and infectivity. In [8], Wang et al. proposed an h-index-based mutation ratio criterion to evaluate conserved and non-conserved proteins using more than 15,000 sequences. As a result, Nucleocapsid, Spike and Papain-like protease were found to be highly non-conserved, while Envelope, main protease and Endoribonuclease were found to be conserved. They further identified mutations at 40% of the nucleotides in the Nucleocapsid gene, which undermines ongoing efforts to develop COVID-19 diagnostics and treatments that target this gene. A similar analysis by Yuan et al. [9] of 11183 sequences revealed 119 high-frequency substitutions (SNPs) around the globe. Among these nucleotide changes, C-to-T is the most common, indicating adaptation and evolution of the virus in the human host that may pose new challenges. They also found that Nucleocapsid has the highest mutation frequency. Thus, the works of both Wang et al. [8] and Yuan et al. [9] refute the claim by Ascoli [10] that Nucleocapsid can be a possible diagnostic target, which underlines the importance of understanding how SARS-CoV-2 evolves over time.

Cheng et al. [11] identified five major mutation points, namely C28144T, C14408T, A23403G, T8782C and C3037T, in almost all strains for April 2020. Their functional analysis shows that these mutations decrease protein stability and eventually reduce the virulence of SARS-CoV-2, while the A23403G mutation strengthens the Spike-ACE2 interaction, increasing infectivity. The phylogenetic analysis by Maitra et al. [12] of 9 Indian sequences shows that mutations such as C14408T in RdRp and A23403G in Spike largely define the A2a clade. Their work also reports a triplet mutation, 28881–28883 GGG/AAC in the Nucleocapsid gene, which might affect miRNA binding to the original sequence. Guruprasad et al. [13] analysed 10333 Spike protein sequences, of which 8155 carried one or more mutations, giving a total of 9654 mutations at 400 distinct mutation sites. Ranked by total number of occurrences, the top 10 mutation sites in that analysis are D614 (7859), L5 (109), L54 (105), P1263 (61), P681 (51), S477 (47), T859 (30), S221 (28), V483 (28) and A845 (24). Other important works such as [14-17] have also revealed various mutations after analysing several SARS-CoV-2 sequences. Given the variety of mutations reported in all these works, the evolutionary study of SARS-CoV-2 genomes is clearly relevant to the ongoing waves of the pandemic in India.

Motivated by these studies, in this work we have performed multiple sequence alignment (MSA) of 17271 Indian SARS-CoV-2 genomes using Multiple Alignment using Fast Fourier Transform (MAFFT) [18], followed by their phylogenetic analysis using Nextstrain [19], to identify hotspot mutations both month-wise (temporal) and state-wise (spatial). From the aligned sequences, the temporal and spatial analyses identify the top 10 hotspot mutations in the coding regions based on entropy, yielding 190 and 250 hotspot mutations respectively. Finally, to judge the functional characteristics of all the non-synonymous hotspot mutations, PolyPhen-2 predicts the effect of their amino acid changes on protein function from the sequences, while I-Mutant 2.0 evaluates their structural stability. The hotspot mutations that are both unstable and damaging and common to both categories are T77A and V149A in NSP6, T95I and E484Q in Spike, Q57H and T223I in ORF3a, I82S and I82T in Membrane, D119V and F120L in ORF8, and R203K, R203M and G215C in Nucleocapsid. Furthermore, E484K in Spike, identified in the temporal analysis, is another major mutation, recognised by virologists as improving the ability of the virus to escape the host's immune system [20].

Material and methods

This section describes the collection of the 17271 Indian SARS-CoV-2 genomes and the proposed pipeline.

Data acquisition

To perform the multiple sequence alignment and phylogenetic analysis, 17271 Indian SARS-CoV-2 genomes are collected from the Global Initiative on Sharing All Influenza Data (GISAID, https://www.gisaid.org/), and the reference genome (NC_045512.2, https://www.ncbi.nlm.nih.gov/nuccore/1798174254) is collected from the National Center for Biotechnology Information (NCBI). The SARS-CoV-2 sequences mostly span January 2020 to September 2021 and cover the states of India. Moreover, for mapping the protein sequences and the resulting amino acid changes, protein structures (PDB files) are collected from the Zhang Lab (https://zhanglab.ccmb.med.umich.edu/COVID-19/). These PDB files are then used to model and identify the structural changes in the proteins. All these analyses are performed on the High Performance Computing facility of NITTTR, Kolkata, while MATLAB R2019b is used to check the amino acid changes.
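For the month-wise and state-wise analyses, each sequence has to be assigned to a month and to a state. The minimal Python sketch below assumes the metadata has already been reduced to hypothetical (sequence ID, collection date, state) tuples; this layout is an assumption and not GISAID's actual metadata format. Following the note in the Results, January to March 2020 are pooled into a single group.

```python
from collections import defaultdict
from datetime import date

# Month names spelled out explicitly so the labels do not depend on the locale.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def temporal_key(collected: date) -> str:
    """Month label for the temporal (month-wise) analysis."""
    # January to March 2020 are pooled because they hold too few sequences.
    if collected.year == 2020 and collected.month <= 3:
        return "January-March 2020"
    return f"{MONTHS[collected.month - 1]} {collected.year}"

def group_sequences(records):
    """Split (sequence_id, collection_date, state) tuples into month and state groups."""
    by_month = defaultdict(list)
    by_state = defaultdict(list)
    for seq_id, collected, state in records:
        by_month[temporal_key(collected)].append(seq_id)
        by_state[state].append(seq_id)
    return dict(by_month), dict(by_state)

# Hypothetical records, not taken from the dataset.
records = [
    ("seq1", date(2020, 2, 10), "Kerala"),
    ("seq2", date(2020, 3, 28), "Delhi"),
    ("seq3", date(2021, 4, 5), "Maharashtra"),
    ("seq4", date(2021, 4, 19), "Delhi"),
]
by_month, by_state = group_sequences(records)
print(by_month)  # {'January-March 2020': ['seq1', 'seq2'], 'April 2021': ['seq3', 'seq4']}
print(by_state)  # {'Kerala': ['seq1'], 'Delhi': ['seq2', 'seq4'], 'Maharashtra': ['seq3']}
```

Each group can then be handed to the per-month or per-state entropy ranking independently.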

Pipeline of the work

The pipeline of the work is shown in Fig 1. First, multiple sequence alignment (MSA) of the 17271 Indian SARS-CoV-2 genomes is performed using MAFFT, followed by their phylogenetic analysis using Nextstrain, which identifies the mutation points as SNPs. MAFFT is chosen as the MSA tool because it uses the fast Fourier transform to identify homologous regions rapidly, which gives it an advantage over other alignment techniques. Nextstrain, in turn, is used to visualise the evolution and geographic distribution of the SARS-CoV-2 genomes, with the required metadata prepared in our High Performance Computing environment.
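A minimal Python sketch of invoking MAFFT for this step is given below. The source does not state which MAFFT options were used. The command assumes the reference-guided mode (`--6merpair --keeplength --addfragments`) that MAFFT's documentation describes for aligning many closely related viral genomes to a reference, and the file names are hypothetical.

```python
import shutil
import subprocess

def build_mafft_command(sequences_fasta: str, reference_fasta: str, threads: int = 8) -> list[str]:
    """Command line that adds many genomes to a single reference with MAFFT."""
    # Assumed options: --addfragments aligns each genome against the reference,
    # --6merpair uses fast 6-mer distances, and --keeplength deletes insertions
    # relative to the reference so that columns keep NC_045512.2 coordinates.
    return [
        "mafft",
        "--thread", str(threads),
        "--6merpair",
        "--keeplength",
        "--addfragments", sequences_fasta,
        reference_fasta,
    ]

def run_mafft(sequences_fasta: str, reference_fasta: str, output_fasta: str, threads: int = 8) -> None:
    """Run MAFFT and write the alignment to output_fasta."""
    if shutil.which("mafft") is None:
        raise FileNotFoundError("mafft is not installed or not on PATH")
    command = build_mafft_command(sequences_fasta, reference_fasta, threads)
    # MAFFT writes the alignment to standard output.
    with open(output_fasta, "w") as handle:
        subprocess.run(command, stdout=handle, check=True)

print(build_mafft_command("india_genomes.fasta", "NC_045512.2.fasta", threads=16))
```

Keeping the reference coordinates is a design choice that makes the aligned column index equal to the genomic coordinate reported in Tables 1 and 2.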
Fig 1

Pipeline of the work.

Once the alignment and the phylogenetic analyses are completed and the mutation points (SNPs) are identified, temporal (month-wise) and spatial (state-wise) analyses are performed on the aligned sequences to identify the top 10 hotspot mutations for each month and each state. The amino acid changes in the SARS-CoV-2 proteins are also identified using the codon table. The top 10 hotspot mutations of each month and each state are the positions in the coding regions with the highest entropy, computed as

$$H_{\beta} = -\sum_{\alpha=1}^{5} p_{\alpha,\beta} \log p_{\alpha,\beta}$$

where $p_{\alpha,\beta}$ is the frequency of residue $\alpha$ at position $\beta$ (terms with $p_{\alpha,\beta} = 0$ contribute nothing), and the upper limit 5 covers the four nucleotides plus the gap. Subsequently, the amino acid changes for the temporal and spatial non-synonymous hotspot mutations are visualised graphically. Finally, the amino acid changes of the non-synonymous hotspot mutations are used to evaluate their functional characteristics and are also visualised in the respective protein structures.
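The entropy ranking can be sketched in Python as follows. The natural logarithm is an assumption, chosen because ln 2 ≈ 0.69315, the entropy of a column split evenly between two residues, appears repeatedly in Tables 1 and 2. How ambiguous bases such as N were handled is not stated, so skipping them here is also an assumption, as is the `keep_position` hook for restricting the ranking to coding positions.

```python
import math
from collections import Counter

# Four nucleotides plus the gap, matching the upper limit 5 in the formula.
ALPHABET = set("ACGT-")

def column_entropy(column: str) -> float:
    """Shannon entropy (natural logarithm) of one alignment column."""
    # Assumption: ambiguous symbols such as N are skipped.
    counts = Counter(symbol for symbol in column.upper() if symbol in ALPHABET)
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log(p)
    return entropy

def top_hotspots(alignment, k=10, keep_position=lambda pos: True):
    """Return the k (1-based position, entropy) pairs with the highest entropy.

    All sequences in `alignment` must have the same length, as in an MSA.
    """
    scored = []
    for index, column in enumerate(zip(*alignment)):
        position = index + 1
        if keep_position(position):
            scored.append((position, column_entropy("".join(column))))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Toy alignment of four sequences (hypothetical data).
alignment = ["ACGT", "ACGA", "ATGC", "ACGT"]
print(top_hotspots(alignment, k=2))
# two highest-entropy columns: position 4 (about 1.04) and position 2 (about 0.56)
```

Applied once per month group and once per state group with k = 10, this yields one top-10 list per group, which is how 19 months and 25 states give 190 and 250 mutation points.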

Results

All the experiments in this work follow the pipeline in Fig 1. MSA of the 17271 Indian SARS-CoV-2 genomes is first carried out using MAFFT. Their phylogenetic analysis using Nextstrain then reveals five virus clades, namely 19A, 19B, 20A, 20B and 20C, along with the corresponding mutation points (SNPs). Temporal (month-wise) and spatial (state-wise) analyses of the aligned sequences then identify the top 10 hotspot mutations for each month and each state, resulting in 190 and 250 mutation points respectively. The phylogenetic trees in radial and rectangular views for the temporal analysis are shown in Fig 2(a) and 2(b), while Fig 2(c) and 2(d) show the corresponding views for the spatial analysis. The normal and zoomed views of the clade-wise geographical distribution of the sequences are shown in Fig 2(e) and 2(f) respectively. Feature selection in unsupervised learning is a non-trivial task; in this work, the entropy of each position of the aligned sequences is used as the selection criterion. For example, the temporal analysis of January-March 2020 (191 sequences) shows that G11083T in NSP6 has the highest entropy, 0.82391, while in the spatial analysis of Maharashtra (3674 sequences) the highest entropy, 1.02173, is borne by G28881A and G28881T in Nucleocapsid. Tables 1 and 2 report the top 10 hotspot mutations of the temporal analysis and of the spatial analysis (by state of India) respectively, along with the associated details, while S1 and S2 Tables in S1 File list all the temporal and spatial hotspot mutations. The entropy values corresponding to the nucleotide changes are shown in Fig 2(g), while the temporal and spatial changes in entropy are reported in S3 and S4 Tables in S1 File respectively. The evolution of the virus genome in terms of entropy, for both the temporal and spatial analyses, is another key result of this work.
For example, from a temporal perspective, E484Q/K, a widely circulating mutation in India, evolved over time but is now on the wane, while the spatial analysis shows that E484Q is one of the most prevalent mutations in West Bengal. These trends are visualised in Figs 3 and 4 respectively. Note that, because too few sequences are available for those months, the data for January to March 2020 are merged for the analysis. Note also that the non-coding regions of SARS-CoV-2 do not produce any protein that can bind to human proteins, so they are not considered for hotspot mutations. Moreover, since entropy is calculated on the aligned sequences, only coding regions are considered when identifying hotspot mutations: the non-coding regions exhibit high entropy values that would be misleading when selecting hotspot mutation points.
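The two rules used above, ranking only coding positions and classifying each change through the codon table, can be sketched as follows. The CDS coordinates are assumed from the NC_045512.2 annotation, and `locate_codon` handles a single reading frame only: positions in ORF1ab downstream of the programmed -1 ribosomal frameshift need a shifted start, and ORF1ab codon numbers still have to be offset to the individual NSP. As an illustration, G11083T falls on the third base of an ORF1ab codon; since Table 1 reports L>F, that codon must be TTG, the only leucine codon for which a G>T change at the third base yields phenylalanine.

```python
# Assumed CDS coordinates (1-based, inclusive) from the NC_045512.2 annotation.
CODING_REGIONS = {
    "ORF1ab": (266, 21555),
    "Spike": (21563, 25384),
    "ORF3a": (25393, 26220),
    "Envelope": (26245, 26472),
    "Membrane": (26523, 27191),
    "ORF6": (27202, 27387),
    "ORF7a": (27394, 27759),
    "ORF7b": (27756, 27887),
    "ORF8": (27894, 28259),
    "Nucleocapsid": (28274, 29533),
    "ORF10": (29558, 29674),
}

# Standard genetic code, codons enumerated in T, C, A, G order at each base.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def genes_at(position):
    """Coding regions containing a genomic position (empty list means non-coding)."""
    return [gene for gene, (start, end) in CODING_REGIONS.items() if start <= position <= end]

def locate_codon(position, cds_start):
    """1-based codon number and 0-based offset of the base inside that codon."""
    offset = position - cds_start
    return offset // 3 + 1, offset % 3

def amino_acid_change(ref_codon, offset_in_codon, alt_base):
    """Return (reference aa, alternative aa, category) for one base change."""
    ref_aa = CODON_TABLE[ref_codon]
    if alt_base == "-":  # gap in the alignment, reported as a deletion such as F>-
        return ref_aa, "-", "deletion"
    alt_codon = ref_codon[:offset_in_codon] + alt_base + ref_codon[offset_in_codon + 1:]
    alt_aa = CODON_TABLE[alt_codon]
    category = "synonymous" if alt_aa == ref_aa else "non-synonymous"
    return ref_aa, alt_aa, category

print(genes_at(28881), genes_at(100))    # ['Nucleocapsid'] []
print(locate_codon(23403, 21563))        # (614, 1)
print(amino_acid_change("GAT", 1, "G"))  # ('D', 'G', 'non-synonymous')
print(amino_acid_change("TTG", 2, "T"))  # ('L', 'F', 'non-synonymous')
```

A predicate such as `lambda pos: bool(genes_at(pos))` can restrict an entropy ranking to coding positions; note that ORF7a and ORF7b overlap by a few bases, so a position may map to two regions.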
Fig 2

Phylogenetic analysis of 17271 Indian SARS-CoV-2 Genomes where (a) and (b) show the phylogenetic tree in radial and rectangular views for 17271 Indian SARS-CoV-2 genomes for temporal analysis, (c) and (d) show the phylogenetic tree in radial and rectangular views for 17271 Indian SARS-CoV-2 genomes for spatial analysis, (e) and (f) are the geographical distribution in normal and zoomed views and (g) shows the value of entropy for the change in nucleotide.

Table 1

List of top 10 hotspot mutations based on temporal analysis.

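| Month | Number of sequences | Genomic coordinate | Entropy | Nucleotide change | Amino acid change | Protein coordinate | Coding region |
| January-March 2020 | 191 | 11083 | 0.82391 | G>T | L>F | 37 | NSP6 |
| | | 28311 | 0.64212 | C>T | P>L | 13 | Nucleocapsid |
| | | 3037 | 0.63531 | C>T | F>F | 106 | NSP3 |
| | | 14408 | 0.63531 | C>T | P>L | 323 | RdRp |
| | | 23403 | 0.63531 | A>G | D>G | 614 | Spike |
| | | 23929 | 0.63088 | C>T | Y>Y | 789 | Spike |
| | | 6312 | 0.59276 | C>A | T>K | 1198 | NSP3 |
| | | 13730 | 0.58269 | C>T | A>V | 97 | RdRp |
| | | 28688 | 0.57987 | T>C | L>L | 139 | Nucleocapsid |
| | | 1397 | 0.53043 | G>A | V>I | 198 | NSP2 |
| April 2020 | 441 | 11083 | 0.79874 | G>T | L>F | 37 | NSP6 |
| | | 28311 | 0.71328 | C>T | P>L | 13 | Nucleocapsid |
| | | 3037 | 0.70595 | C>T | F>F | 106 | NSP3 |
| | | 23403 | 0.69774 | A>G | D>G | 614 | Spike |
| | | 14408 | 0.6971 | C>T | P>L | 323 | RdRp |
| | | 6312 | 0.6678 | C>A | T>K | 1198 | NSP3 |
| | | 13730 | 0.66587 | C>T | A>V | 97 | RdRp |
| | | 23929 | 0.65279 | C>T | Y>Y | 789 | Spike |
| | | 28881 | 0.53127 | G>A | R>K | 203 | Nucleocapsid |
| | | 28882 | 0.53127 | G>A | R>R | 203 | Nucleocapsid |
| May 2020 | 977 | 28881 | 0.66198 | G>A | R>K | 203 | Nucleocapsid |
| | | 28882 | 0.66198 | G>A | R>R | 203 | Nucleocapsid |
| | | 28883 | 0.66198 | G>C | G>R | 204 | Nucleocapsid |
| | | 25563 | 0.64183 | G>T | Q>H | 57 | ORF3a |
| | | 26735 | 0.56685 | C>T | Y>Y | 71 | Membrane |
| | | 18877 | 0.5533 | C>T | L>L | 280 | Exon |
| | | 313 | 0.54277 | C>T | L>L | 16 | NSP1 |
| | | 14408 | 0.54115 | C>T | P>L | 323 | RdRp |
| | | 5700 | 0.50567 | C>A | A>D | 994 | NSP3 |
| | | 13730 | 0.48254 | C>T | A>V | 97 | RdRp |
| June 2020 | 1062 | 28881 | 0.72623 | G>A | R>K | 203 | Nucleocapsid |
| | | 28883 | 0.71049 | G>C | G>R | 204 | Nucleocapsid |
| | | 28882 | 0.69816 | G>A | R>R | 203 | Nucleocapsid |
| | | 22444 | 0.67332 | C>T | D>D | 294 | Spike |
| | | 25563 | 0.67187 | G>T | Q>H | 57 | ORF3a |
| | | 18877 | 0.66299 | C>T | L>L | 280 | Exon |
| | | 26735 | 0.6606 | C>T | Y>Y | 71 | Membrane |
| | | 28854 | 0.6393 | C>T | S>L | 194 | Nucleocapsid |
| | | 313 | 0.54631 | C>T | L>L | 16 | NSP1 |
| | | 5700 | 0.53036 | C>A | A>D | 994 | NSP3 |
| July 2020 | 683 | 28881 | 0.86601 | G>A | R>K | 203 | Nucleocapsid |
| | | 28882 | 0.85618 | G>A | R>R | 203 | Nucleocapsid |
| | | 28883 | 0.85615 | G>C | G>R | 204 | Nucleocapsid |
| | | 25563 | 0.69252 | G>T | Q>H | 57 | ORF3a |
| | | 313 | 0.66456 | C>T | L>L | 16 | NSP1 |
| | | 18877 | 0.66359 | C>T | L>L | 280 | Exon |
| | | 5700 | 0.65981 | C>A | A>D | 994 | NSP3 |
| | | 26735 | 0.65467 | C>T | Y>Y | 71 | Membrane |
| | | 28854 | 0.61568 | C>T | S>L | 194 | Nucleocapsid |
| | | 22444 | 0.60236 | C>T | D>D | 294 | Spike |
| August 2020 | 632 | 28881 | 0.79095 | G>A | R>K | 203 | Nucleocapsid |
| | | 28883 | 0.78919 | G>C | G>R | 204 | Nucleocapsid |
| | | 28882 | 0.78061 | G>A | R>R | 203 | Nucleocapsid |
| | | 22444 | 0.62652 | C>T | D>D | 294 | Spike |
| | | 25563 | 0.62045 | G>T | Q>H | 57 | ORF3a |
| | | 28854 | 0.61586 | C>T | S>L | 194 | Nucleocapsid |
| | | 26735 | 0.61193 | C>T | Y>Y | 71 | Membrane |
| | | 313 | 0.6079 | C>T | L>L | 16 | NSP1 |
| | | 18877 | 0.6079 | C>T | L>L | 280 | Exon |
| | | 5700 | 0.60235 | C>A | A>D | 994 | NSP3 |
| September 2020 | 629 | 28881 | 0.7396 | G>A | R>K | 203 | Nucleocapsid |
| | | 28882 | 0.68911 | G>A | R>R | 203 | Nucleocapsid |
| | | 28883 | 0.67924 | G>C | G>R | 204 | Nucleocapsid |
| | | 25563 | 0.60326 | G>T | Q>H | 57 | ORF3a |
| | | 313 | 0.59785 | C>T | L>L | 16 | NSP1 |
| | | 5700 | 0.59193 | C>A | A>D | 994 | NSP3 |
| | | 22444 | 0.57955 | C>T | D>D | 294 | Spike |
| | | 28854 | 0.56792 | C>T | S>L | 194 | Nucleocapsid |
| | | 18877 | 0.56622 | C>T | L>L | 280 | Exon |
| | | 26735 | 0.56103 | C>T | Y>Y | 71 | Membrane |
| October 2020 | 380 | 28881 | 0.78752 | G>A | R>K | 203 | Nucleocapsid |
| | | 28882 | 0.70769 | G>A | R>R | 203 | Nucleocapsid |
| | | 28883 | 0.70769 | G>C | G>R | 204 | Nucleocapsid |
| | | 22444 | 0.64744 | C>T | D>D | 294 | Spike |
| | | 18877 | 0.6463 | C>T | L>L | 280 | Exon |
| | | 26735 | 0.6463 | C>T | Y>Y | 71 | Membrane |
| | | 25563 | 0.64465 | G>T | Q>H | 57 | ORF3a |
| | | 28854 | 0.64124 | C>T | S>L | 194 | Nucleocapsid |
| | | 8917 | 0.57761 | C>T | F>F | 121 | NSP4 |
| | | 9389 | 0.55503 | G>A | D>N | 279 | NSP4 |
| November 2020 | 452 | 22444 | 0.75515 | C>T | D>D | 294 | Spike |
| | | 28881 | 0.74527 | G>A | R>K | 203 | Nucleocapsid |
| | | 28854 | 0.69762 | C>T | S>L | 194 | Nucleocapsid |
| | | 18877 | 0.68886 | C>T | L>L | 280 | Exon |
| | | 26735 | 0.68657 | C>T | Y>Y | 71 | Membrane |
| | | 25563 | 0.68439 | G>T | Q>H | 57 | ORF3a |
| | | 1947 | 0.66982 | T>C | V>A | 381 | NSP2 |
| | | 28882 | 0.66551 | G>A | R>R | 203 | Nucleocapsid |
| | | 28883 | 0.66551 | G>C | G>R | 204 | Nucleocapsid |
| | | 3267 | 0.48539 | C>T | T>I | 183 | NSP3 |
| December 2020 | 983 | 28881 | 0.71656 | G>A | R>K | 203 | Nucleocapsid |
| | | 22444 | 0.71598 | C>T | D>D | 294 | Spike |
| | | 1947 | 0.71371 | T>C | V>A | 381 | NSP2 |
| | | 25563 | 0.68512 | G>T | Q>H | 57 | ORF3a |
| | | 18877 | 0.67905 | C>T | L>L | 280 | Exon |
| | | 26735 | 0.67871 | C>T | Y>Y | 71 | Membrane |
| | | 28854 | 0.67728 | C>T | S>L | 194 | Nucleocapsid |
| | | 28883 | 0.67009 | G>C | G>R | 204 | Nucleocapsid |
| | | 28882 | 0.65134 | G>A | R>R | 203 | Nucleocapsid |
| | | 26060 | 0.56206 | C>T | T>I | 223 | ORF3a |
| January 2021 | 500 | 28881 | 0.82738 | G>A | R>K | 203 | Nucleocapsid |
| | | 28882 | 0.71685 | G>A | R>R | 203 | Nucleocapsid |
| | | 18877 | 0.70613 | C>T | L>L | 280 | Exon |
| | | 25563 | 0.70613 | G>T | Q>H | 57 | ORF3a |
| | | 28883 | 0.70225 | G>C | G>R | 204 | Nucleocapsid |
| | | 22444 | 0.69315 | C>T | D>D | 294 | Spike |
| | | 26735 | 0.69315 | C>T | Y>Y | 71 | Membrane |
| | | 28854 | 0.69286 | C>T | S>L | 194 | Nucleocapsid |
| | | 3267 | 0.63605 | C>T | T>I | 183 | NSP3 |
| | | 21034 | 0.61845 | C>T | L>L | 126 | NSP16 |
| February 2021 | 980 | 28881 | 1.13342 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 23604 | 1.02071 | C>A, C>G | P>H, P>R | 681 | Spike |
| | | 23012 | 0.82687 | G>C, G>A | E>Q, E>K | 484 | Spike |
| | | 24775 | 0.69608 | A>T, A>- | Q>H, Q>- | 1071 | Spike |
| | | 28882 | 0.68897 | G>A | R>R | 203 | Nucleocapsid |
| | | 28883 | 0.67724 | G>C | G>R | 204 | Nucleocapsid |
| | | 28280 | 0.66855 | G>T, G>C | D>Y, D>H | 3 | Nucleocapsid |
| | | 25469 | 0.65125 | C>T | S>L | 26 | ORF3a |
| | | 22444 | 0.6458 | C>T | D>D | 294 | Spike |
| | | 29402 | 0.64017 | G>T | D>Y | 377 | Nucleocapsid |
| March 2021 | 1907 | 28881 | 1.03262 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 23604 | 1.01066 | C>A, C>G | P>H, P>R | 681 | Spike |
| | | 28280 | 0.91893 | G>T, G>C | D>Y, D>H | 3 | Nucleocapsid |
| | | 23012 | 0.88114 | G>C, G>A | E>Q, E>K | 484 | Spike |
| | | 26767 | 0.84724 | T>C, T>G | I>T, I>S | 82 | Membrane |
| | | 11296 | 0.82674 | T>G, T>- | F>L, F>- | 108 | NSP6 |
| | | 21987 | 0.80846 | G>A, G>- | G>D, G>- | 142 | Spike |
| | | 24775 | 0.80534 | A>T, A>- | Q>H, Q>- | 1071 | Spike |
| | | 25469 | 0.77293 | C>T | S>L | 26 | ORF3a |
| | | 22022 | 0.76572 | G>A | E>K | 154 | Spike |
| April 2021 | 3054 | 28253 | 1.13895 | C>A, C>T, C>- | F>L, F>F, F>- | 120 | ORF8 |
| | | 22034 | 0.89681 | A>G, A>- | R>G, R>- | 158 | Spike |
| | | 26767 | 0.89284 | T>C, T>G | I>T, I>S | 82 | Membrane |
| | | 21987 | 0.87431 | G>A, G>- | G>D, G>- | 142 | Spike |
| | | 28249 | 0.84388 | A>T, A>- | D>V, D>- | 119 | ORF8 |
| | | 24410 | 0.8167 | G>A | D>N | 950 | Spike |
| | | 22033 | 0.76607 | C>- | F>- | 157 | Spike |
| | | 22032 | 0.756 | T>- | F>- | 157 | Spike |
| | | 28248 | 0.71357 | G>- | D>- | 119 | ORF8 |
| | | 11418 | 0.70573 | T>C | V>A | 149 | NSP6 |
| May 2021 | 2408 | 28253 | 1.08851 | C>A, C>T, C>- | F>L, F>F, F>- | 120 | ORF8 |
| | | 22034 | 0.81429 | A>G, A>- | R>G, R>- | 158 | Spike |
| | | 28249 | 0.81342 | A>T, A>- | D>V, D>- | 119 | ORF8 |
| | | 21987 | 0.76579 | G>A | G>D | 142 | Spike |
| | | 11418 | 0.70413 | T>C | V>A | 149 | NSP6 |
| | | 9891 | 0.69625 | C>T | A>V | 446 | NSP4 |
| | | 22030 | 0.68573 | G>- | E>- | 156 | Spike |
| | | 28251 | 0.6755 | T>- | F>- | 120 | ORF8 |
| | | 5184 | 0.66981 | C>T | P>L | 822 | NSP3 |
| | | 11201 | 0.66818 | A>G | T>A | 77 | NSP6 |
| June 2021 | 1293 | 21987 | 1.0067 | G>A, G>- | G>D, G>- | 142 | Spike |
| | | 28253 | 0.98317 | C>A, C>- | F>L, F>- | 120 | ORF8 |
| | | 28249 | 0.81706 | A>T, A>- | D>V, D>- | 119 | ORF8 |
| | | 22034 | 0.81496 | A>G, A>- | R>G, R>- | 158 | Spike |
| | | 11418 | 0.70538 | T>C | V>A | 149 | NSP6 |
| | | 27874 | 0.70016 | C>T | T>I | 40 | ORF7b |
| | | 9891 | 0.69617 | C>T | A>V | 446 | NSP4 |
| | | 28916 | 0.69472 | G>T | G>C | 215 | Nucleocapsid |
| | | 11201 | 0.69311 | A>G | T>A | 77 | NSP6 |
| | | 9053 | 0.69268 | G>T | V>L | 167 | NSP4 |
| July 2021 | 632 | 21987 | 0.93091 | G>A, G>- | G>D, G>- | 142 | Spike |
| | | 28253 | 0.87833 | C>A, C>- | F>L, F>- | 120 | ORF8 |
| | | 28249 | 0.7564 | A>T, A>- | D>V, D>- | 119 | ORF8 |
| | | 28251 | 0.71349 | T>- | F>- | 120 | ORF8 |
| | | 28250 | 0.711 | T>- | D>- | 119 | ORF8 |
| | | 28252 | 0.70261 | T>- | F>- | 120 | ORF8 |
| | | 4181 | 0.68595 | G>T | A>S | 488 | NSP3 |
| | | 5184 | 0.68595 | C>T | P>L | 822 | NSP3 |
| | | 6402 | 0.68595 | C>T | P>L | 1228 | NSP3 |
| | | 7124 | 0.68595 | C>T | P>S | 1469 | NSP3 |
| August 2021 | 15 | 28253 | 0.70869 | C>A | F>L | 120 | ORF8 |
| | | 4181 | 0.69142 | G>T | A>S | 488 | NSP3 |
| | | 6402 | 0.69142 | C>T | P>L | 1228 | NSP3 |
| | | 7124 | 0.69142 | C>T | P>S | 1469 | NSP3 |
| | | 8986 | 0.69142 | C>T | D>D | 144 | NSP4 |
| | | 9053 | 0.69142 | G>T | V>L | 167 | NSP4 |
| | | 10029 | 0.69142 | C>T | T>I | 492 | NSP4 |
| | | 11201 | 0.69142 | A>G | T>A | 77 | NSP6 |
| | | 11332 | 0.69142 | A>G | V>V | 120 | NSP6 |
| | | 19220 | 0.69142 | C>T | A>V | 394 | Exon |
| September 2021 | 52 | 21846 | 0.69315 | C>T | T>I | 95 | Spike |
| | | 24410 | 0.68696 | G>A | D>N | 950 | Spike |
| | | 5184 | 0.60769 | C>T | P>L | 822 | NSP3 |
| | | 27874 | 0.59084 | C>T | T>I | 40 | ORF7b |
| | | 4181 | 0.57228 | G>T | A>S | 488 | NSP3 |
| | | 6402 | 0.57228 | C>T | P>L | 1228 | NSP3 |
| | | 7124 | 0.57228 | C>T | P>S | 1469 | NSP3 |
| | | 8986 | 0.57228 | C>T | D>D | 144 | NSP4 |
| | | 9053 | 0.57228 | G>T | V>L | 167 | NSP4 |
| | | 10029 | 0.57228 | C>T | T>I | 492 | NSP4 |
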
Table 2

List of top 10 hotspot mutations based on spatial analysis.

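| State | Number of sequences | Genomic coordinate | Entropy | Nucleotide change | Amino acid change | Protein coordinate | Coding region |
| Maharashtra | 3674 | 28881 | 1.02173 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 26767 | 0.92484 | T>C, T>G | I>T, I>S | 82 | Membrane |
| | | 23604 | 0.81242 | C>G | P>R | 681 | Spike |
| | | 28253 | 0.806 | C>- | F>- | 120 | ORF8 |
| | | 21987 | 0.79485 | G>A, G>- | G>D, G>- | 142 | Spike |
| | | 25469 | 0.7663 | C>T | S>L | 26 | ORF3a |
| | | 27638 | 0.70457 | T>C | V>A | 82 | ORF7a |
| | | 29402 | 0.70178 | G>T | D>Y | 377 | Nucleocapsid |
| | | 22917 | 0.69779 | T>G | L>R | 452 | Spike |
| | | 23012 | 0.67477 | G>C | E>Q | 484 | Spike |
| Telangana | 2506 | 28253 | 1.0594 | C>T, C>- | F>F, F>- | 120 | ORF8 |
| | | 28881 | 1.05196 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 22034 | 0.92872 | A>G, A>- | R>G, R>- | 158 | Spike |
| | | 23604 | 0.83122 | C>G | P>R | 681 | Spike |
| | | 26767 | 0.74928 | T>C | I>T | 82 | Membrane |
| | | 24410 | 0.72581 | G>A | D>N | 950 | Spike |
| | | 29402 | 0.71226 | G>T | D>Y | 377 | Nucleocapsid |
| | | 22033 | 0.70621 | C>- | F>- | 157 | Spike |
| | | 27638 | 0.70183 | T>C | V>A | 82 | ORF7a |
| | | 22917 | 0.70114 | T>G | L>R | 452 | Spike |
| Gujarat | 2333 | 28881 | 0.98391 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 28253 | 0.98023 | C>A, C>- | F>L, F>- | 120 | ORF8 |
| | | 23604 | 0.89132 | C>A, C>G | P>H, P>R | 681 | Spike |
| | | 26767 | 0.79834 | T>C | I>T | 82 | Membrane |
| | | 28249 | 0.78731 | A>- | D>- | 119 | ORF8 |
| | | 22034 | 0.76092 | A>- | R>- | 158 | Spike |
| | | 22033 | 0.74274 | C>- | F>- | 157 | Spike |
| | | 22032 | 0.74262 | T>- | F>- | 157 | Spike |
| | | 25469 | 0.71957 | C>T | S>L | 26 | ORF3a |
| | | 22029 | 0.71048 | A>- | E>- | 156 | Spike |
| West Bengal | 1637 | 28881 | 1.03445 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 26767 | 0.99595 | T>G, T>C | I>T, I>S | 82 | Membrane |
| | | 23604 | 0.9359 | C>A, C>G | P>H, P>R | 681 | Spike |
| | | 28253 | 0.88971 | C>A, C>- | F>L, F>- | 120 | ORF8 |
| | | 21987 | 0.81006 | G>A, G>- | G>D, G>- | 142 | Spike |
| | | 22034 | 0.80702 | A>G, A>- | R>G, R>- | 158 | Spike |
| | | 28249 | 0.77084 | A>- | D>- | 119 | ORF8 |
| | | 22917 | 0.70438 | T>G | L>R | 452 | Spike |
| | | 29402 | 0.7006 | G>T | D>Y | 377 | Nucleocapsid |
| | | 27638 | 0.69709 | T>C | V>A | 82 | ORF7a |
| Delhi | 1240 | 28881 | 1.08218 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 23604 | 0.94518 | C>A, C>G | P>H, P>R | 681 | Spike |
| | | 22444 | 0.76965 | C>T | D>D | 294 | Spike |
| | | 25563 | 0.76199 | G>T | Q>H | 57 | ORF3a |
| | | 26735 | 0.72004 | C>T | Y>Y | 71 | Membrane |
| | | 18877 | 0.71311 | C>T | L>L | 280 | Exon |
| | | 28854 | 0.70723 | C>T | S>L | 194 | Nucleocapsid |
| | | 1947 | 0.68719 | T>C | V>A | 381 | NSP2 |
| | | 26767 | 0.65229 | T>C | I>T | 82 | Membrane |
| | | 28883 | 0.63286 | G>C | G>R | 204 | Nucleocapsid |
| Andhra Pradesh | 1077 | 28253 | 1.21902 | C>A, C>T, C>- | F>L, F>F, F>- | 120 | ORF8 |
| | | 22034 | 1.04209 | A>G, A>- | R>G, R>- | 158 | Spike |
| | | 28881 | 0.85363 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 22033 | 0.78715 | C>- | F>- | 157 | Spike |
| | | 26767 | 0.73239 | T>C | I>T | 82 | Membrane |
| | | 23604 | 0.73117 | C>G | P>R | 681 | Spike |
| | | 28249 | 0.71674 | A>- | D>- | 119 | ORF8 |
| | | 22030 | 0.70822 | G>- | E>- | 156 | Spike |
| | | 22029 | 0.70261 | A>- | E>- | 156 | Spike |
| | | 22031 | 0.69313 | T>- | F>- | 157 | Spike |
| Karnataka | 520 | 28881 | 1.23964 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 28253 | 0.98145 | C>A | F>L | 120 | ORF8 |
| | | 23604 | 0.8514 | C>G | P>R | 681 | Spike |
| | | 28882 | 0.81953 | G>A | R>R | 203 | Nucleocapsid |
| | | 28883 | 0.80388 | G>C | G>R | 204 | Nucleocapsid |
| | | 26767 | 0.70691 | T>C | I>T | 82 | Membrane |
| | | 28249 | 0.67368 | A>T, A>- | D>V, D>- | 119 | ORF8 |
| | | 29402 | 0.6736 | G>T | D>Y | 377 | Nucleocapsid |
| | | 22917 | 0.64897 | T>G | L>R | 452 | Spike |
| | | 25469 | 0.64897 | C>T | S>L | 26 | ORF3a |
| Rajasthan | 434 | 28881 | 0.99106 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 28882 | 0.69671 | G>A | R>R | 203 | Nucleocapsid |
| | | 28883 | 0.68481 | G>C | G>R | 204 | Nucleocapsid |
| | | 22444 | 0.6518 | C>T | D>D | 294 | Spike |
| | | 25563 | 0.63888 | G>T | Q>H | 57 | ORF3a |
| | | 28854 | 0.61881 | C>T | S>L | 194 | Nucleocapsid |
| | | 26735 | 0.61318 | C>T | Y>Y | 71 | Membrane |
| | | 18877 | 0.61125 | C>T | L>L | 280 | Exon |
| | | 1947 | 0.59878 | T>C, T>- | V>A, V>- | 381 | NSP2 |
| | | 23604 | 0.53191 | C>G | P>R | 681 | Spike |
| Tamil Nadu | 423 | 28253 | 1.16453 | C>A, C>T | F>L, F>F | 120 | ORF8 |
| | | 28881 | 1.09273 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 23604 | 0.88416 | C>A, C>G | P>H, P>R | 681 | Spike |
| | | 28461 | 0.875 | A>G | D>G | 63 | Nucleocapsid |
| | | 24410 | 0.85053 | G>A | D>N | 950 | Spike |
| | | 26767 | 0.75549 | T>C | I>T | 82 | Membrane |
| | | 21618 | 0.69881 | C>G | T>R | 19 | Spike |
| | | 15451 | 0.68935 | G>A | G>S | 671 | RdRp |
| | | 16466 | 0.68935 | C>T | P>L | 77 | Helicase |
| | | 29402 | 0.67288 | G>T | D>Y | 377 | Nucleocapsid |
| Punjab | 418 | 11296 | 1.06149 | T>G, T>- | F>L, F>- | 108 | NSP6 |
| | | 28095 | 0.89567 | A>T, A>- | K>*, K>- | 68 | ORF8 |
| | | 28881 | 0.77179 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 28280 | 0.76015 | G>C | D>H | 3 | Nucleocapsid |
| | | 23604 | 0.75325 | C>A, C>G | P>H, P>R | 681 | Spike |
| | | 28281 | 0.74341 | A>T | D>V | 3 | Nucleocapsid |
| | | 11291 | 0.69623 | G>- | G>- | 107 | NSP6 |
| | | 11295 | 0.69059 | T>- | F>- | 108 | NSP6 |
| | | 21765 | 0.68075 | T>- | I>- | 68 | Spike |
| | | 11292 | 0.66789 | G>- | G>- | 107 | NSP6 |
| Chhattisgarh | 364 | 28881 | 1.07226 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 23604 | 0.94912 | C>A, C>G | P>H, P>R | 681 | Spike |
| | | 26767 | 0.91621 | T>C, T>G | I>T, I>S | 82 | Membrane |
| | | 24410 | 0.90113 | G>A, G>- | D>N, D>- | 950 | Spike |
| | | 28461 | 0.71677 | A>G | D>G | 63 | Nucleocapsid |
| | | 28253 | 0.70958 | C>- | F>- | 120 | ORF8 |
| | | 15451 | 0.706 | G>A | G>S | 671 | RdRp |
| | | 27638 | 0.70498 | T>C | V>A | 82 | ORF7a |
| | | 21618 | 0.70489 | C>G | T>R | 19 | Spike |
| | | 29402 | 0.70441 | G>T | D>Y | 377 | Nucleocapsid |
| Manipur | 270 | 28253 | 1.02447 | C>A, C>- | F>L, F>- | 120 | ORF8 |
| | | 21987 | 0.87608 | G>A | G>D | 142 | Spike |
| | | 21846 | 0.71297 | C>T | T>I | 95 | Spike |
| | | 28916 | 0.70747 | G>T | G>C | 215 | Nucleocapsid |
| | | 11201 | 0.69044 | A>G | T>A | 77 | NSP6 |
| | | 28250 | 0.69044 | T>- | D>- | 119 | ORF8 |
| | | 28251 | 0.69044 | T>- | F>- | 120 | ORF8 |
| | | 28252 | 0.69044 | T>- | F>- | 120 | ORF8 |
| | | 5184 | 0.68705 | C>T | P>L | 822 | NSP3 |
| | | 6402 | 0.68705 | C>T | P>L | 1228 | NSP3 |
| Odisha | 238 | 28881 | 1.15561 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 28882 | 0.78669 | G>A | R>R | 203 | Nucleocapsid |
| | | 28883 | 0.78669 | G>C | G>R | 204 | Nucleocapsid |
| | | 23604 | 0.73028 | C>G | P>R | 681 | Spike |
| | | 29402 | 0.58678 | G>T | D>Y | 377 | Nucleocapsid |
| | | 8917 | 0.57992 | C>T | F>F | 121 | NSP4 |
| | | 26767 | 0.56936 | T>C | I>T | 82 | Membrane |
| | | 22917 | 0.56881 | T>G | L>R | 452 | Spike |
| | | 24410 | 0.56082 | G>A | D>N | 950 | Spike |
| | | 9389 | 0.55771 | G>A | D>N | 279 | NSP4 |
| Uttar Pradesh | 229 | 26767 | 1.15838 | T>C, T>- | I>T, I>- | 82 | Membrane |
| | | 21618 | 1.07939 | C>G, C>- | T>R, T>- | 19 | Spike |
| | | 27752 | 0.98545 | C>T, C>- | T>I, T>- | 120 | ORF7a |
| | | 27638 | 0.95253 | T>C, T>- | V>A, V>- | 82 | ORF7a |
| | | 21987 | 0.87393 | G>A | G>D | 142 | Spike |
| | | 21872 | 0.7677 | T>- | W>- | 104 | Spike |
| | | 27874 | 0.76694 | C>T | T>I | 40 | ORF7b |
| | | 11418 | 0.75432 | T>C | V>A | 149 | NSP6 |
| | | 9053 | 0.74627 | G>T | V>L | 167 | NSP4 |
| | | 28916 | 0.74627 | G>T | G>C | 215 | Nucleocapsid |
| Haryana | 193 | 28881 | 0.99908 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 23604 | 0.82165 | C>A, C>G | P>H, P>R | 681 | Spike |
| | | 25563 | 0.71135 | G>T | Q>H | 57 | ORF3a |
| | | 22444 | 0.70452 | C>T | D>D | 294 | Spike |
| | | 18877 | 0.67876 | C>T | L>L | 280 | Exon |
| | | 26735 | 0.67876 | C>T | Y>Y | 71 | Membrane |
| | | 28854 | 0.67695 | C>T | S>L | 194 | Nucleocapsid |
| | | 1947 | 0.63651 | T>C | V>A | 381 | NSP2 |
| | | 28882 | 0.62134 | G>A | R>R | 203 | Nucleocapsid |
| | | 28883 | 0.62134 | G>C | G>R | 204 | Nucleocapsid |
| Himachal Pradesh | 184 | 1947 | 1.00628 | T>C, T>- | V>A, V>- | 381 | NSP2 |
| | | 28881 | 0.8515 | G>A | R>K | 203 | Nucleocapsid |
| | | 22444 | 0.74302 | C>T | D>D | 294 | Spike |
| | | 28854 | 0.7196 | C>T | S>L | 194 | Nucleocapsid |
| | | 28882 | 0.69576 | G>A | R>R | 203 | Nucleocapsid |
| | | 28883 | 0.69576 | G>C | G>R | 204 | Nucleocapsid |
| | | 18877 | 0.68944 | C>T | L>L | 280 | Exon |
| | | 25563 | 0.68944 | G>T | Q>H | 57 | ORF3a |
| | | 26735 | 0.68735 | C>T | Y>Y | 71 | Membrane |
| | | 26060 | 0.62056 | C>T | T>I | 223 | ORF3a |
| Sikkim | 165 | 28253 | 1.05282 | C>A, C>- | F>L, F>- | 120 | ORF8 |
| | | 28249 | 0.85603 | A>- | D>- | 119 | ORF8 |
| | | 28881 | 0.82105 | G>T | R>M | 203 | Nucleocapsid |
| | | 21987 | 0.79807 | G>A | G>D | 142 | Spike |
| | | 23604 | 0.7316 | C>G | P>R | 681 | Spike |
| | | 28251 | 0.72301 | T>- | F>- | 120 | ORF8 |
| | | 28252 | 0.72301 | T>- | F>- | 120 | ORF8 |
| | | 26767 | 0.70343 | T>C | I>T | 82 | Membrane |
| | | 22034 | 0.69379 | A>- | R>- | 158 | Spike |
| | | 9891 | 0.6927 | C>T | A>V | 446 | NSP4 |
| Jammu and Kashmir | 164 | 28881 | 1.05025 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 23604 | 1.02063 | C>A, C>G | P>H, P>R | 681 | Spike |
| | | 22444 | 0.81197 | C>T | D>D | 294 | Spike |
| | | 28280 | 0.79577 | G>C | D>H | 3 | Nucleocapsid |
| | | 11296 | 0.76392 | T>- | F>- | 108 | NSP6 |
| | | 21765 | 0.67275 | T>- | I>- | 68 | Spike |
| | | 18877 | 0.66944 | C>T | L>L | 280 | Exon |
| | | 25563 | 0.66944 | G>T | Q>H | 57 | ORF3a |
| | | 26735 | 0.66383 | C>T | Y>Y | 71 | Membrane |
| | | 28854 | 0.66079 | C>T | S>L | 194 | Nucleocapsid |
| Puducherry | 138 | 28253 | 0.97927 | C>A, C>T | F>L, F>F | 120 | ORF8 |
| | | 23604 | 0.76675 | C>G | P>R | 681 | Spike |
| | | 28881 | 0.74111 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 21987 | 0.69501 | G>A | G>D | 142 | Spike |
| | | 15451 | 0.6866 | G>A | G>S | 671 | RdRp |
| | | 16466 | 0.6866 | C>T | P>L | 77 | Helicase |
| | | 5184 | 0.68486 | C>T | P>L | 822 | NSP3 |
| | | 28249 | 0.68291 | A>T | D>- | 119 | ORF8 |
| | | 26767 | 0.6806 | T>C | I>T | 82 | Membrane |
| | | 1191 | 0.62794 | C>T | P>L | 129 | NSP2 |
| Meghalaya | 135 | 28253 | 0.99245 | C>A, C>- | F>L, F>- | 120 | ORF8 |
| | | 21987 | 0.8842 | G>A | G>D | 142 | Spike |
| | | 28249 | 0.84253 | A>T, A>- | D>V, D>- | 119 | ORF8 |
| | | 22034 | 0.78249 | A>- | R>- | 158 | Spike |
| | | 9891 | 0.68543 | C>T | A>V | 446 | NSP4 |
| | | 11418 | 0.68543 | T>C | V>A | 149 | NSP6 |
| | | 5184 | 0.6736 | C>T | P>L | 822 | NSP3 |
| | | 26767 | 0.66499 | T>C | I>T | 82 | Membrane |
| | | 28250 | 0.66015 | T>- | D>- | 119 | ORF8 |
| | | 28251 | 0.66015 | T>- | F>- | 120 | ORF8 |
| Uttarakhand | 126 | 28881 | 1.03137 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 1947 | 0.77067 | T>C | V>A | 381 | NSP2 |
| | | 23604 | 0.76724 | C>G | P>R | 681 | Spike |
| | | 22444 | 0.73219 | C>T | D>D | 294 | Spike |
| | | 25563 | 0.66976 | G>T | Q>H | 57 | ORF3a |
| | | 18877 | 0.62109 | C>T | L>L | 280 | Exon |
| | | 26735 | 0.62109 | C>T | Y>Y | 71 | Membrane |
| | | 28882 | 0.62109 | G>A | R>R | 203 | Nucleocapsid |
| | | 28883 | 0.62109 | G>C | G>R | 204 | Nucleocapsid |
| | | 28854 | 0.61478 | C>T | S>L | 194 | Nucleocapsid |
| Kerala | 106 | 28881 | 0.80484 | G>A | R>K | 203 | Nucleocapsid |
| | | 3037 | 0.69298 | C>T | F>F | 106 | NSP3 |
| | | 14408 | 0.69298 | C>T | P>L | 323 | RdRp |
| | | 23403 | 0.69298 | A>G | D>G | 614 | Spike |
| | | 11083 | 0.6759 | G>T | L>F | 37 | NSP6 |
| | | 1397 | 0.6299 | G>A | V>I | 198 | NSP2 |
| | | 8653 | 0.6299 | G>T | M>I | 33 | NSP4 |
| | | 28688 | 0.6229 | T>C | L>L | 139 | Nucleocapsid |
| | | 884 | 0.6155 | C>T | R>C | 27 | NSP2 |
| | | 28883 | 0.59118 | G>C | G>R | 204 | Nucleocapsid |
| Madhya Pradesh | 109 | 28881 | 0.98373 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 23604 | 0.8576 | C>A, C>G | P>H, P>R | 681 | Spike |
| | | 28280 | 0.54646 | G>C | D>H | 3 | Nucleocapsid |
| | | 28882 | 0.52208 | G>A | R>R | 203 | Nucleocapsid |
| | | 28883 | 0.52208 | G>C | G>R | 204 | Nucleocapsid |
| | | 21895 | 0.51534 | T>C | D>D | 111 | Spike |
| | | 22917 | 0.51023 | T>G | L>R | 452 | Spike |
| | | 25469 | 0.51023 | C>T | S>L | 26 | ORF3a |
| | | 27638 | 0.51023 | T>C | V>A | 82 | ORF7a |
| | | 29402 | 0.51023 | G>T | D>Y | 377 | Nucleocapsid |
| Chandigarh | 102 | 22444 | 0.76942 | C>T | D>D | 294 | Spike |
| | | 11296 | 0.75797 | T>G | F>L | 108 | NSP6 |
| | | 28881 | 0.7328 | G>A | R>K | 203 | Nucleocapsid |
| | | 28882 | 0.68648 | G>A | R>R | 203 | Nucleocapsid |
| | | 28883 | 0.68648 | G>C | G>R | 204 | Nucleocapsid |
| | | 11291 | 0.68145 | G>- | G>- | 107 | NSP6 |
| | | 26735 | 0.65645 | C>T | Y>Y | 71 | Membrane |
| | | 18877 | 0.65095 | C>T | L>L | 280 | Exon |
| | | 25563 | 0.65095 | G>T | Q>H | 57 | ORF3a |
| | | 28854 | 0.63871 | C>T | S>L | 194 | Nucleocapsid |
| Assam | 101 | 28253 | 1.0999 | C>A, C>- | F>L, F>- | 120 | ORF8 |
| | | 28249 | 1.05588 | A>T, A>- | D>V, D>- | 119 | ORF8 |
| | | 28881 | 0.96252 | G>A, G>T | R>K, R>M | 203 | Nucleocapsid |
| | | 21987 | 0.9478 | G>A | G>D | 142 | Spike |
| | | 26767 | 0.91189 | T>C | I>T | 82 | Membrane |
| | | 22034 | 0.78077 | A>- | R>- | 158 | Spike |
| | | 24410 | 0.77309 | G>A | D>N | 950 | Spike |
| | | 23604 | 0.76149 | C>G | P>R | 681 | Spike |
| | | 15451 | 0.73936 | G>A | G>S | 671 | RdRp |
| | | 21618 | 0.73936 | C>G | T>R | 19 | Spike |
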
Fig 3

Month wise (temporal) entropy of Indian SARS-CoV-2 genomes to show the changes in non-synonymous hotspot mutations.

Fig 4

State wise (spatial) entropy of Indian SARS-CoV-2 genomes to show the changes in non-synonymous hotspot mutations.

Once the top 10 temporal and spatial hotspot mutations are identified, the 190 and 250 mutation points reduce to 62 and 65 unique hotspot mutations respectively. For the temporal analysis, the 62 unique mutations include 50 non-synonymous deletions and substitutions, corresponding to 8 and 48 amino acid changes respectively, while for the spatial analysis, the 65 unique mutations include 57 non-synonymous deletions and substitutions, corresponding to 16 and 47 amino acid changes respectively. These non-synonymous mutations and their amino acid changes in protein are visualised in Fig 5. Fig 6 shows, as Venn diagrams, the changes common to and unique to the temporal and spatial analyses: (a) nucleotide changes for all hotspot mutations, (b) nucleotide changes for the non-synonymous hotspot mutations and (c) amino acid changes in protein. Fig 6(a) shows 18 and 21 hotspot mutations unique to the temporal and spatial analyses respectively, with 44 common to both. Fig 6(b) shows 12 and 19 unique non-synonymous hotspot mutations, with 38 common to both. Finally, Fig 6(c) shows 14 and 21 unique amino acid changes for the temporal and spatial analyses, with 42 common to both.
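The Venn counts follow from plain set operations once the repeated hits across months or states are collapsed. The short sketch below uses hypothetical mutation labels, not the paper's data. As an arithmetic check on the figures above, each unique count plus the common count recovers the totals: 18 + 44 = 62 and 21 + 44 = 65 hotspot mutations, 12 + 38 = 50 and 19 + 38 = 57 non-synonymous mutations, and 14 + 42 = 56 and 21 + 42 = 63 amino acid changes.

```python
def venn_counts(temporal_hits, spatial_hits):
    """Counts of items unique to each analysis and shared by both."""
    # Converting to sets collapses a mutation that recurs in several months or states.
    temporal, spatial = set(temporal_hits), set(spatial_hits)
    return {
        "temporal_only": len(temporal - spatial),
        "spatial_only": len(spatial - temporal),
        "common": len(temporal & spatial),
    }

# Hypothetical top-10 hits pooled over a few months and states.
temporal_hits = ["G28881A", "C14408T", "G28881A", "A23403G"]
spatial_hits = ["G28881A", "T26767C", "A23403G"]
print(venn_counts(temporal_hits, spatial_hits))
# {'temporal_only': 1, 'spatial_only': 1, 'common': 2}
```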
All the amino acid changes in protein for the non-synonymous hotspot mutations of the temporal analysis are highlighted in Fig 7, while those of the spatial analysis are shown in Fig 8. Note that although 48 and 47 substitutions are reported for the temporal and spatial analyses in Figs 5 and 6, only 47 and 46 of them are highlighted in Figs 7 and 8 respectively. This is because no structure for ORF7b is available in the literature, so the corresponding hotspot mutation in ORF7b cannot be highlighted in either case.
Fig 5

Illustration of amino acid changes in SARS-CoV-2 proteins for the temporal and spatial non-synonymous hotspot mutations.

Fig 6

Venn diagrams of Indian SARS-CoV-2 Genomes to represent common (a) Nucleotide (b) Non-synonymous mutations and (c) Amino acid changes for the hotspot mutations.

Fig 7

Highlighted amino acid changes in the protein structures for the non-synonymous hotspot mutations based on temporal analysis for (a) NSP2 (b) NSP3 (c) NSP4 (d) NSP6 (e) RdRp (f) Exon (g) Spike (h) ORF3a (i) Membrane (j) ORF8 (k) Nucleocapsid.

Fig 8

Highlighted amino acid changes in the protein structures for the non-synonymous hotspot mutations based on spatial analysis for (a) NSP2 (b) NSP3 (c) NSP4 (d) NSP6 (e) RdRp (f) Helicase (g) Spike (h) ORF3a (i) Membrane (j) ORF7a (k) ORF8 (l) Nucleocapsid.


Discussion

India has gone through the second wave of the SARS-CoV-2 pandemic, and according to experts a third wave is inevitable as the virus is evolving and new strains are being identified. The study of the evolving virus strains is therefore crucial in the current pandemic scenario. In this regard, we have performed temporal and spatial analyses of 17271 SARS-CoV-2 sequences, which have identified hotspot mutation points (SNPs) in each category. Changes in protein translation that can lead to functional instability in proteins are often attributed to structural alterations in amino acid residues. Therefore, to judge the functional characteristics of all the non-synonymous hotspot mutations, the effect of their amino acid changes on protein function is predicted from the sequences using PolyPhen-2 (Polymorphism Phenotyping) [21], while their structural stability is evaluated using I-Mutant 2.0 [22]. The results for the temporal and spatial analyses are reported in Tables 3 and 4 respectively. PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/) bases its prediction on the sequence, structural and phylogenetic information of a SNP, while I-Mutant 2.0 (https://folding.biofold.org/i-mutant/i-mutant2.0.html) uses a support vector machine (SVM) to predict protein stability changes upon single point mutations automatically. The PolyPhen-2 score ranges from 0 to 1; the closer a score is to 1, the more confidently the mutation can be considered damaging. According to the PolyPhen-2 predictions, Table 3 shows that 27 of the 56 unique amino acid changes are damaging for the temporal analysis, while Table 4 shows that 24 of the 63 unique amino acid changes are damaging for the spatial analysis.
It is important to note that, for a protein, "damaging" mostly denotes instability. Such predictions are generally made for human proteins: if a human protein is damaged by mutations, its interactions with other human proteins may occur with higher or lower binding affinity. Similar consequences may apply to a virus: if a viral protein is damaged by mutations, it may likewise interact with human proteins with altered binding affinity. As a result, the virus may acquire characteristics such as increased transmissibility or the ability to escape antibodies [23, 24].
Table 3

Characteristics of non-synonymous hotspot mutations for temporal analysis.

| Change in Nucleotide | Change in Amino Acid | Mapped Coding Region | PolyPhen-2 Prediction | PolyPhen-2 Score | I-Mutant 2.0 Stability | I-Mutant 2.0 DDG (kcal/mol) |
|---|---|---|---|---|---|---|
| G1397A | V198I | NSP2 | Benign | 0.006 | Increase | 0.18 |
| T1947C | V381A | NSP2 | Benign | 0.009 | Decrease | -1.64 |
| C3267T | T183I | NSP3 | NG | NG | Decrease | -0.1 |
| G4181T | A488S | NSP3 | Benign | 0.017 | Decrease | -0.89 |
| C5184T | P822L | NSP3 | Benign | 0.011 | Decrease | -0.54 |
| C5700A | A994D | NSP3 | Possibly Damaging | 0.935 | Decrease | -0.78 |
| C6312A | T1198K | NSP3 | Probably Damaging | 0.998 | Decrease | -1.37 |
| C6402T | P1228L | NSP3 | Benign | 0.001 | Decrease | -0.46 |
| C7124T | P1469S | NSP3 | Probably Damaging | 0.967 | Decrease | -2.17 |
| G9053T | V167L | NSP4 | Benign | 0.406 | Decrease | -2.14 |
| G9389A | D279N | NSP4 | Probably Damaging | 0.999 | Decrease | -1.26 |
| C9891T | A446V | NSP4 | Probably Damaging | 0.999 | Increase | 0.64 |
| C10029T | T492I | NSP4 | Probably Damaging | 0.973 | Decrease | -0.08 |
| G11083T | L37F | NSP6 | Benign | 0.027 | Decrease | -0.05 |
| A11201G | T77A | NSP6 | Possibly Damaging | 0.577 | Decrease | -0.7 |
| T11296G | F108L | NSP6 | Benign | 0.001 | Decrease | -3.31 |
| T11418C | V149A | NSP6 | Possibly Damaging | 0.865 | Decrease | -3.43 |
| C13730T | A97V | RdRp | Probably Damaging | 0.99 | Decrease | -0.53 |
| C14408T | P323L | RdRp | Benign | 0.018 | Decrease | -0.8 |
| C19220T | A394V | Exon | Benign | 0.005 | Decrease | -0.17 |
| C21846T | T95I | Spike | Probably Damaging | 0.999 | Decrease | -1.8 |
| G21987A | G142D | Spike | Benign | 0.061 | Decrease | -1.17 |
| G22022A | E154K | Spike | NG | NG | Decrease | -1.4 |
| A22034G | R158G | Spike | NG | NG | Decrease | -2.63 |
| G23012C | E484Q | Spike | Possibly Damaging | 0.881 | Decrease | -0.48 |
| G23012A | E484K | Spike | Possibly Damaging | 0.601 | Decrease | -0.85 |
| A23403G | D614G | Spike | Benign | 0.004 | Decrease | -1.94 |
| C23604A | P681H | Spike | NG | NG | Decrease | -0.92 |
| C23604G | P681R | Spike | NG | NG | Decrease | -0.79 |
| G24410A | D950N | Spike | Benign | 0.34 | Increase | 0.15 |
| A24775T | Q1071H | Spike | Probably Damaging | 0.997 | Decrease | -1.19 |
| C25469T | S26L | ORF3a | Benign | 0.017 | Increase | 0.92 |
| G25563T | Q57H | ORF3a | Probably Damaging | 0.983 | Decrease | -1.12 |
| C26060T | T223I | ORF3a | Probably Damaging | 0.998 | Decrease | -0.07 |
| T26767G | I82S | Membrane | Possibly Damaging | 0.951 | Decrease | -2 |
| T26767C | I82T | Membrane | Possibly Damaging | 0.889 | Decrease | -2.41 |
| C27874T | T40I | ORF7b | NG | NG | Decrease | -0.22 |
| A28249T | D119V | ORF8 | Possibly Damaging | 0.541 | Decrease | -0.63 |
| C28253A | F120L | ORF8 | Probably Damaging | 0.988 | Decrease | -2.95 |
| G28280T | D3Y | Nucleocapsid | Probably Damaging | 1 | Increase | 0.22 |
| G28280C | D3H | Nucleocapsid | Probably Damaging | 1 | Increase | 0.34 |
| C28311T | P13L | Nucleocapsid | Probably Damaging | 1 | Increase | 0.11 |
| C28854T | S194L | Nucleocapsid | Probably Damaging | 0.994 | Increase | 0.45 |
| G28881A | R203K | Nucleocapsid | Probably Damaging | 0.969 | Decrease | -2.26 |
| G28881T | R203M | Nucleocapsid | Probably Damaging | 0.998 | Decrease | -1.52 |
| G28883C | G204R | Nucleocapsid | Probably Damaging | 1 | No Change | 0 |
| G28916T | G215C | Nucleocapsid | Probably Damaging | 1 | Decrease | -0.49 |
| G29402T | D377Y | Nucleocapsid | Probably Damaging | 1 | Increase | 0.51 |
Table 4

Characteristics of non-synonymous hotspot mutations for spatial analysis.

| Change in Nucleotide | Change in Amino Acid | Mapped Coding Region | PolyPhen-2 Prediction | PolyPhen-2 Score | I-Mutant 2.0 Stability | I-Mutant 2.0 DDG (kcal/mol) |
|---|---|---|---|---|---|---|
| C884T | R27C | NSP2 | Probably Damaging | 1 | Decrease | -0.35 |
| C1191T | P129L | NSP2 | Possibly Damaging | 0.924 | Decrease | -0.53 |
| G1397A | V198I | NSP2 | Benign | 0.006 | Increase | 0.18 |
| T1947C | V381A | NSP2 | Benign | 0.009 | Decrease | -1.64 |
| C5184T | P822L | NSP3 | Benign | 0.011 | Decrease | -0.54 |
| C6402T | P1228L | NSP3 | Benign | 0.001 | Decrease | -0.46 |
| G8653T | M33I | NSP4 | Benign | 0.002 | Decrease | -0.73 |
| G9053T | V167L | NSP4 | Benign | 0.406 | Decrease | -2.14 |
| G9389A | D279N | NSP4 | Probably Damaging | 0.999 | Decrease | -1.26 |
| C9891T | A446V | NSP4 | Probably Damaging | 0.999 | Increase | 0.64 |
| G11083T | L37F | NSP6 | Benign | 0.027 | Decrease | -0.05 |
| A11201G | T77A | NSP6 | Possibly Damaging | 0.577 | Decrease | -0.7 |
| T11296G | F108L | NSP6 | Benign | 0.001 | Decrease | -3.31 |
| T11418C | V149A | NSP6 | Possibly Damaging | 0.865 | Decrease | -3.43 |
| C14408T | P323L | RdRp | Benign | 0.018 | Decrease | -0.8 |
| G15451A | G671S | RdRp | Probably Damaging | 1 | Decrease | -0.29 |
| C16466T | P77L | Helicase | Probably Damaging | 1 | Decrease | -1.03 |
| C21618G | T19R | Spike | Benign | 0.007 | Decrease | -0.12 |
| C21846T | T95I | Spike | Probably Damaging | 0.999 | Decrease | -1.8 |
| G21987A | G142D | Spike | Benign | 0.061 | Decrease | -1.17 |
| A22034G | R158G | Spike | NG | NG | Decrease | -2.63 |
| T22917G | L452R | Spike | Benign | 0.017 | Decrease | -1.4 |
| G23012C | E484Q | Spike | Possibly Damaging | 0.881 | Decrease | -0.48 |
| A23403G | D614G | Spike | Benign | 0.004 | Decrease | -1.94 |
| C23604A | P681H | Spike | NG | NG | Decrease | -0.92 |
| C23604G | P681R | Spike | NG | NG | Decrease | -0.79 |
| G24410A | D950N | Spike | Benign | 0.34 | Increase | 0.15 |
| C25469T | S26L | ORF3a | Benign | 0.017 | Increase | 0.92 |
| G25563T | Q57H | ORF3a | Probably Damaging | 0.983 | Decrease | -1.12 |
| C26060T | T223I | ORF3a | Probably Damaging | 0.998 | Decrease | -0.07 |
| T26767G | I82S | Membrane | Possibly Damaging | 0.951 | Decrease | -2 |
| T26767C | I82T | Membrane | Possibly Damaging | 0.889 | Decrease | -2.41 |
| T27638C | V82A | ORF7a | Possibly Damaging | 0.732 | Decrease | -2.18 |
| C27752T | T120I | ORF7a | Possibly Damaging | 0.915 | Decrease | -0.26 |
| C27874T | T40I | ORF7b | NG | NG | Decrease | -0.22 |
| A28249T | D119V | ORF8 | Possibly Damaging | 0.541 | Decrease | -0.63 |
| C28253A | F120L | ORF8 | Probably Damaging | 0.988 | Decrease | -2.95 |
| G28280C | D3H | Nucleocapsid | Probably Damaging | 1 | Increase | 0.34 |
| A28281T | D3V | Nucleocapsid | Probably Damaging | 1 | Decrease | -0.22 |
| A28461G | D63G | Nucleocapsid | Benign | 0 | Decrease | -0.57 |
| C28854T | S194L | Nucleocapsid | Probably Damaging | 0.994 | Increase | 0.45 |
| G28881A | R203K | Nucleocapsid | Probably Damaging | 0.969 | Decrease | -2.26 |
| G28881T | R203M | Nucleocapsid | Probably Damaging | 0.998 | Decrease | -1.52 |
| G28883C | G204R | Nucleocapsid | Probably Damaging | 1 | No Change | 0 |
| G28916T | G215C | Nucleocapsid | Probably Damaging | 1 | Decrease | -0.49 |
| G29402T | D377Y | Nucleocapsid | Probably Damaging | 1 | Increase | 0.51 |
Stability is another parameter that is crucial for judging the functional and structural activity of a protein. Protein stability dictates the conformational structure of a protein, thereby determining its function, and any change in stability may cause misfolding, degradation or aberrant aggregation of proteins. I-Mutant 2.0 predicts the change in protein stability upon mutation as a free energy change value (DDG, in kcal/mol). A negative DDG indicates that the mutation decreases the stability of the protein, while a positive DDG indicates an increase; a DDG of 0 is reported as No Change in Tables 3 and 4. The I-Mutant 2.0 results show that, of the 27 and 24 unique damaging changes in the temporal and spatial analyses, 21 changes in each analysis decrease the stability of the protein structure. The mutations common to both analyses are T77A and V149A in NSP6, T95I and E484Q in Spike, Q57H and T223I in ORF3a, I82S and I82T in Membrane, D119V and F120L in ORF8, and R203K, R203M and G215C in Nucleocapsid. Apart from these, other important Spike mutations recognised by virologists in variants of concern such as Alpha, Beta and Delta are L452R, E484K, D614G, P681H and P681R. Furthermore, the entropy changes of the hotspot mutations in the Alpha, Beta and Delta variants are shown in Fig 9(a)–9(c), respectively. For example, the hotspot mutation E484K in the Alpha variant (Fig 9(a)), which was dominant from February to April 2021, declined over the following months. Similarly, D614G, a hotspot mutation common to all the variants, has declined over time. Moreover, L452R and P681R, which are part of the Delta variant, are also among the hotspot mutations identified by the analysis. Notably, the Delta variant was responsible for the catastrophic second wave in India. Fig 10(a) and 10(b) show the confirmed and deceased cases in India up to 31 October 2021.
As can be seen from both figures, the western part of India has a very high number of confirmed and deceased cases, which may be attributed to the Delta variant. As shown in Table 2, Maharashtra, which lies in western India, has both L452R and P681R identified as hotspots. All case figures are taken from https://www.covid19india.org/.
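The selection described in this section (PolyPhen-2 damaging and I-Mutant 2.0 DDG below zero, followed by the mutations shared between the temporal and spatial analyses) can be sketched as a filter plus a set intersection. This is an illustrative sketch, not the authors' code: the rows are a small subset copied from Tables 3 and 4, so the output does not reproduce the paper's full counts, and the `Row` type and function name are hypothetical. A DDG of exactly 0 (G204R) is treated as not destabilizing, matching its No Change label.

```python
from typing import NamedTuple


class Row(NamedTuple):
    nucleotide: str
    amino_acid: str
    region: str
    polyphen2: str  # "Benign", "Possibly Damaging", "Probably Damaging" or "NG"
    ddg: float      # I-Mutant 2.0 DDG in kcal/mol; negative means less stable


def damaging_and_destabilizing(rows: list[Row]) -> set[tuple[str, str]]:
    """Return (region, amino acid change) pairs that are damaging and have DDG < 0."""
    return {
        (r.region, r.amino_acid)
        for r in rows
        if r.polyphen2.endswith("Damaging") and r.ddg < 0
    }


TEMPORAL = [  # subset of Table 3
    Row("A11201G", "T77A", "NSP6", "Possibly Damaging", -0.7),
    Row("C9891T", "A446V", "NSP4", "Probably Damaging", 0.64),
    Row("G23012A", "E484K", "Spike", "Possibly Damaging", -0.85),
    Row("A23403G", "D614G", "Spike", "Benign", -1.94),
    Row("G28883C", "G204R", "Nucleocapsid", "Probably Damaging", 0.0),
    Row("G28881A", "R203K", "Nucleocapsid", "Probably Damaging", -2.26),
]
SPATIAL = [  # subset of Table 4
    Row("A11201G", "T77A", "NSP6", "Possibly Damaging", -0.7),
    Row("C9891T", "A446V", "NSP4", "Probably Damaging", 0.64),
    Row("T27638C", "V82A", "ORF7a", "Possibly Damaging", -2.18),
    Row("G28883C", "G204R", "Nucleocapsid", "Probably Damaging", 0.0),
    Row("G28881A", "R203K", "Nucleocapsid", "Probably Damaging", -2.26),
]

# Mutations that are damaging and destabilizing in both analyses.
COMMON = damaging_and_destabilizing(TEMPORAL) & damaging_and_destabilizing(SPATIAL)

if __name__ == "__main__":
    for region, aa in sorted(COMMON):
        print(f"{aa} in {region}")
```

Keying on (region, amino acid change) rather than on the amino acid change alone avoids merging identical labels such as V82A in ORF7a with an unrelated position-82 change in another protein.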
Fig 9

Month-wise evolution of non-synonymous hotspot mutations in the (a) Alpha (B.1.1.7), (b) Beta (B.1.351) and (c) Delta (B.1.617.2) variants.

Fig 10

(a) Confirmed and (b) deceased cases in India, illustrating the effects of SARS-CoV-2 across the different regions of the country.


Conclusion

As the second wave of the COVID-19 pandemic hit India very hard, understanding the evolution of the SARS-CoV-2 virus is crucial. In this regard, temporal (month-wise) and spatial (state-wise) analyses are carried out for 17271 aligned Indian sequences to identify the top 10 hotspot mutation points in the coding regions, based on entropy, for each month and for each state. Additionally, the functional effects of all the non-synonymous hotspot mutations are predicted from the sequences using PolyPhen-2, while their structural stability is evaluated using I-Mutant 2.0. As a result, the damaging and destabilizing mutations common to both the temporal and spatial analyses are T77A and V149A in NSP6, T95I and E484Q in Spike, Q57H and T223I in ORF3a, I82S and I82T in Membrane, D119V and F120L in ORF8, and R203K, R203M and G215C in Nucleocapsid. The effects of these hotspot mutations on human hosts remain to be investigated, which will require the help of virologists; the authors are working in this direction as well.
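The entropy-based selection of hotspots summarized above can be sketched as follows. For each group of aligned sequences (one month or one state), compute the Shannon entropy of the nucleotide distribution at every alignment column, keep only columns inside coding regions, and rank them. This is a simplified stand-in, not Nextstrain's implementation: it assumes natural-log entropy, ignores gaps and ambiguous bases, and uses toy sequences and a hypothetical boolean `coding` mask in place of real genome annotations.

```python
import math
from collections import Counter

VALID_BASES = set("ACGT")


def column_entropy(column: str) -> float:
    """Shannon entropy (natural log) of the A/C/G/T counts in one alignment column.

    Gaps ('-') and ambiguous bases such as 'N' are skipped (an assumption).
    """
    counts = Counter(base for base in column.upper() if base in VALID_BASES)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def top_hotspots(alignment: list[str], coding: list[bool], k: int = 10) -> list[tuple[int, float]]:
    """Return up to k (1-based position, entropy) pairs, highest entropy first.

    Only columns flagged True in `coding` are considered; invariant columns
    (entropy 0) are dropped. Ties are broken by position.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment) or len(coding) != length:
        raise ValueError("sequences and coding mask must share the alignment length")
    scores = []
    for i in range(length):
        if not coding[i]:
            continue
        h = column_entropy("".join(seq[i] for seq in alignment))
        if h > 0:
            scores.append((i + 1, h))
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:k]


if __name__ == "__main__":
    # Toy alignments grouped by month (hypothetical data, not from the paper).
    groups = {
        "2021-04": ["ACGTAC", "ACGTTC", "ATGTTC", "ACGTAC"],
        "2021-05": ["ACGTTC", "ACGTTC", "ACGTTC", "ACCTTC"],
    }
    coding = [False, True, True, True, True, True]  # first column treated as non-coding
    for month, seqs in groups.items():
        print(month, top_hotspots(seqs, coding, k=2))
```

Running the same function per state instead of per month gives the spatial counterpart; only the grouping of sequences changes.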

This file contains four supplementary tables, S1-S4.

(PDF)

27 Aug 2021 PONE-D-21-17506 Phylogenetic Analysis of 5734 Indian SARS-CoV-2 Genomes to Identify Temporal and Spatial Hotspot Mutations PLOS ONE Dear Dr. Ghosh, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Although the manuscript is well-prepared and timely, some important concerns need to be addressed. The authors should account for writing the manuscript clearly and provide appropriate discussion. Please submit your revised manuscript by October 26, 2021. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols . We look forward to receiving your revised manuscript. Kind regards, Arunachalam Ramaiah, PhD Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes.
The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: No Reviewer #3: Partly ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: No Reviewer #3: I Don't Know ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: No ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: SARS-CoV-2 is still spreading around the world very rapidly with a high infection rate that could become a global pandemic. 
In order for researchers, including us, to design a more effective vaccine, we are analyzing the evolving virus strains. You have performed multiple sequence alignments of 5734 sequences of SARS-CoV-2 using MAFFT and phylogenetic analysis using Nextstrain. Then, we identified several SNPs and point mutations. You identified the top 10 hotspot mutation points in the coding region. As a result, 130 hotspot mutations were identified in the temporal analysis and 250 in the spatial analysis. Subsequently, 32 temporally unique and 63 spatially unique hotspot mutations were identified, respectively. In addition, you have identified 21 point mutations in the time series analysis. For example, A97V in RdRp, L126F in NSP16, Q57H in ORF3a, and R203K, R203M, and G204R in Nucleocapsid. You also reaffirmed that the mutations of concern are E484Q and E484K in Spike. minor issues I could not read Tables 1 through 3 because the text was too small. Therefore, you should replace them with complete and readable tables in the text. Please move this table to the supplement. I believe that your paper needs to be released to the world as soon as possible. Reviewer #2: This paper analyzes GISAID data to identify mutational hotspots within SARS-CoV-2 genomes sequenced in India. This analysis identifies several mutations that appear to change over time or vary between regions within India. The authors claim that these locations are mutational hotspots but do not provide compelling evidence to support this. Specific comments: 1. Analysis of mutational hotspots is confounded by the competition between variants. Specifically, when one variant displaces another in a state or over time this will appear to show an enrichment for mutations associated with this variant even if the location of these mutations is not functionally important. At least for the positions identified as mutation “hotspots”, the authors should test whether they are changing in frequency within specific lineages. 
As an example, S:E484Q within the Delta lineage was presumed to be particularly problematic since it resembles the S:E484K mutation found in the Beta variant but it has since declined in frequency within the Delta variant. 2. Please avoid the use of the term “double mutant” as it is scientifically misleading. Please refer to the strains by either their PANGO lineage (e.g. B.1.617.2) or by their WHO designation (e.g. Delta variant). 3. Figure captions need more description. 4. Variants refers to the combination of many mutations rather than any specific mutation. E.g. Page 2, line 93 is incorrect since E484Q is not a variant. Please fix terminology throughout. 5. The protein structure model shown for nucleocapsid indicates a mostly unstructured architecture, but this is misleading. The N-terminal and C-terminal domain structures have been solved by multiple research groups and are known to be well ordered. Please update with a revised structure. 6. The use of PROVEAN and similar tools to detect “damaging” mutations is not explained well and is potentially misleading. These tools were designed to detect the impact of mutations of human proteins rather than viral proteins with the assumption that major changes to human proteins are likely to be deleterious. This cannot be assumed for emerging viruses because they are under a rapidly changing selection conditions and mutations identified by PROVEAN might be beneficial for the virus to avoid immunity or even to enhance function. This needs to be discussed more clearly. It is also unclear how this helps to identify a location as a potential mutation hotspot. Reviewer #3: Comments to the Author: The manuscript give a meaningful view point to the analysis of evolving virus strains of SARS-CoV-2, but the paper is not clearly written. Major points: The paper from introduction to discussion should be simplified; it’s too long and too verbose. 
The manuscript did not give a meaningful view about what authors do this work and what they get conclusion. I recommend author to revise the manuscript as clear as possible. Abstract: I strongly recommend author to re-write the abstract and give a clear abstract. Introduction: It’s too verbose. It present too much description to others research. The author may likely just put others researches together rather than summary and conclude their studies. Line。。。: 300K should be replaced using a formal description, e.g 300,000. The problem is through the entire manuscript. Line 10: The prevalent variant in South African is B.1.351, so “501Y. V2” should change to B.1.351. Line 11: Japanese should be Japan. Brazilian should be Brazil. Line 11: E484K is not a linage, please clarify update the mainly variants name. Line 24: In [8] and In [9] should be replace as more reasonable description, It can be change by author name or change to another description. This problem is present through the whole paper. Line 23-25: what this sentence relationship with previous viewpoint? Line 26-31: The author would like to Line 31-32: what this sentence mean “thereby indicating potential impacts on the ongoing development of various COVID-19 diagnosis and cure”? and what this sentence relationship with these variants? Line 36-37: previous have mentioned that “they have found Nucleocapsid to have the highest mutational changes in frequency”, the author can summary them together rather than describe it again and again. Line 15-72: All citation view were list, please summary and clarify the paragraph. Line 75: the method “multiple alignment using fast fourier transform (MAFFT)” could move to method rather introduction. Line 77-94: The sentence about method should be move to “Material and Methods” part, the sentence about the result details should be move to Results part. The introduction just retain the summary of resuts and meaning of this paper. Line 105: The citation is a lab? 
And the reference list 4 was not contain an author named Zhang. Line 115: “MAFFT which is a progressive alignment technique is used as the multiple sequence alignment (MSA) tool”, please delete “which is a progressive alignment technique”. Please delete the “multiple sequence alignment. the abbreviation “MSA” could be explain for the first time, and then author could use the “MSA” only. Method : It is too verbose, author do not need to explain advantage of every software or tools used. They have no relationship with your paper. Just give clear method. Results: Result was not consists of describe what is figure and table, rather give a results and explain using Figure and Table. Please re-write the results clearly. Line 184-186: what sentence “only coding regions are considered for identification of hotspot mutations as the non-coding regions exhibit high entropy values and can be misleading while selecting such mutation points as hotspot mutations”? Disccussion: Line 21-26: The sentence “multiple sequence alignment of 5734 genomic sequences are carried out using MAFFT” . The author just needs to discuss the results rather than mention the method here. Line 217-218: delete sentence “the details of which are already discussed in the Result Section”. Conclusion: The author does not need to describe the method and results again, just give the summary and discovery of this paper. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. 
Reviewer #1: No Reviewer #2: No Reviewer #3: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 8 Nov 2021 Reviewer #1: SARS-CoV-2 is still spreading around the world very rapidly with a high infection rate that could become a global pandemic. In order for researchers, including us, to design a more effective vaccine, we are analyzing the evolving virus strains. You have performed multiple sequence alignments of 5734 sequences of SARS-CoV-2 using MAFFT and phylogenetic analysis using Nextstrain. Then, we identified several SNPs and point mutations. You identified the top 10 hotspot mutation points in the coding region. As a result, 130 hotspot mutations were identified in the temporal analysis and 250 in the spatial analysis. Subsequently, 32 temporally unique and 63 spatially unique hotspot mutations were identified, respectively. In addition, you have identified 21 point mutations in the time series analysis. For example, A97V in RdRp, L126F in NSP16, Q57H in ORF3a, and R203K, R203M, and G204R in Nucleocapsid. You also reaffirmed that the mutations of concern are E484Q and E484K in Spike. minor issues 1. 
I could not read Tables 1 through 3 because the text was too small. Therefore, you should replace them with complete and readable tables in the text. Please move this table to the supplement. Answer: We would like to apologise for the inconvenience caused. According to the suggestion, the tables have been modified in the revised manuscript. However, since these are very important tables, we have kept them in the main paper of the revised manuscript. I believe that your paper needs to be released to the world as soon as possible. The authors would like to thank the reviewer for the very kind comments. Reviewer #2: This paper analyzes GISAID data to identify mutational hotspots within SARS-CoV-2 genomes sequenced in India. This analysis identifies several mutations that appear to change over time or vary between regions within India. The authors claim that these locations are mutational hotspots but do not provide compelling evidence to support this. Answer: Mutations like L452R and P681R which are part of the Delta variant are also two of the hotspot mutations as identified by the analysis. It is to be noted that Delta variant was responsible for the catastrophic 2nd wave in India. Figures 10 (a) and (b) in the revised manuscript show the plot of confirmed and deceased cases in India till 31st October 2021. As can be seen from both the figures, western part of India has a very high number of confirmed and deceased cases which can be attributed to the Delta variant. As is shown in Table 2, Maharashtra which lies in the western part of India has both of the aforementioned mutations identified as hotspots. Also, some mutational hotspots are part of the Alpha, Beta and Delta variants as shown in Figure 9 in the revised manuscript, thereby confirming that they are indeed qualified to be hotspot mutations. These facts are elaborately discussed in the revised manuscript as well. Specific comments: 1.
Analysis of mutational hotspots is confounded by the competition between variants. Specifically, when one variant displaces another in a state or over time this will appear to show an enrichment for mutations associated with this variant even if the location of these mutations is not functionally important. At least for the positions identified as mutation “hotspots”, the authors should test whether they are changing in frequency within specific lineages. As an example, S:E484Q within the Delta lineage was presumed to be particularly problematic since it resembles the S:E484K mutation found in the Beta variant but it has since declined in frequency within the Delta variant. Answer: According to the suggestion of the reviewer, the entropy change in hotspot mutations in variants like Alpha, Beta and Delta are reported in Figure 9 in the revised manuscript in order to illustrate the point as mentioned in the comment. 2. Please avoid the use of the term “double mutant” as it is scientifically misleading. Please refer to the strains by either their PANGO lineage (e.g. B.1.617.2) or by their WHO designation (e.g. Delta variant). Answer: According to the suggestion of the reviewer, the changes have been made in the revised manuscript. 3. Figure captions need more description. Answer: According to the suggestion of the reviewer, more descriptions have been added to the figures in the revised manuscript. 4. Variants refers to the combination of many mutations rather than any specific mutation. E.g. Page 2, line 93 is incorrect since E484Q is not a variant. Please fix terminology throughout. Answer: According to the suggestion of the reviewer, the terminology has been fixed in the revised manuscript. 5. The protein structure model shown for nucleocapsid indicates a mostly unstructured architecture, but this is misleading. The N-terminal and C-terminal domain structures have been solved by multiple research groups and are known to be well ordered. 
Please update with a revised structure. Answer: According to the suggestion of the reviewer, the structure of Nucleocapsid has been updated in the revised manuscript and the N-terminal and C-terminal have been confirmed using the PDBs 6M3M (range:50-174) and 6YUN (range:249-364) respectively. The following is the structure that has been used in the revised manuscript. 6. The use of PROVEAN and similar tools to detect “damaging” mutations is not explained well and is potentially misleading. These tools were designed to detect the impact of mutations of human proteins rather than viral proteins with the assumption that major changes to human proteins are likely to be deleterious. This cannot be assumed for emerging viruses because they are under a rapidly changing selection conditions and mutations identified by PROVEAN might be beneficial for the virus to avoid immunity or even to enhance function. This needs to be discussed more clearly. It is also unclear how this helps to identify a location as a potential mutation hotspot. Answer: It is important to note that in case of protein, damaging mostly defines instability. Generally, this is used for human proteins. As a consequence, if the human protein is damaging in nature because of mutations, then the human protein-protein interactions may occur with high or low binding affinity. Now in case of virus, similar consequences may happen which means if the virus protein is damaged because of mutations, it may interact with human proteins with similar binding affinity. As a result, the virus may acquire characteristics like transmissibility, escaping antibodies, etc. This is now clearly mentioned in the revised manuscript as well in order to avoid the confusion pertaining to the meaning of ‘damaging’ as concluded by PROVEAN and Polyphen-2. Moreover, we agree that the effects of these characteristics on human hosts are a matter of further investigations. 
Therefore, to draw a clear biological conclusion from the point of view of host, help of virologists is needed and as a future scope we are working in that direction. This is mentioned in the revised manuscript in the conclusion section. Please note that hotspot mutations are characterized by PROVEAN, Polyphen-2 and I-mutant 2.0 after their locations have been identified by Nextstrain. Therefore, there is no relation between locations and the aforementioned tools. It is also to be noted that PROVEAN and Polyphen-2 are developed on more or less same background. Thus, their results are analogous to each other. Therefore, to avoid redundancy only the results from well-known tool Polyphen-2 are kept in the revised manuscript. Reviewer #3: Comments to the Author: The manuscript give a meaningful view point to the analysis of evolving virus strains of SARS-CoV-2, but the paper is not clearly written. Major points: The paper from introduction to discussion should be simplified; it’s too long and too verbose. The manuscript did not give a meaningful view about what authors do this work and what they get conclusion. I recommend author to revise the manuscript as clear as possible. 1. Abstract: I strongly recommend author to re-write the abstract and give a clear abstract. Answer: According to the suggestion of the reviewer, the abstract has been rewritten in the revised manuscript. 2. Introduction: It’s too verbose. It present too much description to others research. The author may likely just put others researches together rather than summary and conclude their studies. Answer: According to the suggestion of the reviewer, the Introduction has been modified in the revised manuscript. Line。。。: 300K should be replaced using a formal description, e.g 300,000. The problem is through the entire manuscript. Answer: According to the suggestion of the reviewer, the changes have been made in the revised manuscript. 
Line 10: The prevalent variant in South Africa is B.1.351, so “501Y.V2” should be changed to B.1.351. Answer: As suggested by the reviewer, the change has been made in the revised manuscript. Line 11: "Japanese" should be "Japan", and "Brazilian" should be "Brazil". Answer: It is to be noted that, in the revised manuscript, the variants of concern are given with the corresponding naming conventions declared by the W.H.O. Line 11: E484K is not a lineage; please clarify and update the names of the main variants. Answer: As suggested by the reviewer, the change has been made in the revised manuscript. Line 24: "In [8]" and "In [9]" should be replaced with a more informative description, e.g. the author names. This problem is present throughout the whole paper. Answer: As suggested by the reviewer, the changes have been made in the revised manuscript. Line 23-25: What is the relationship between this sentence and the previous point? Answer: It is to be noted that this sentence was written inadvertently, and we sincerely apologise for this. The required change has been made in the revised manuscript. Line 26-31: The author would like to Answer: It is not very clear which sentence the reviewer is referring to. Line 31-32: What does the sentence “thereby indicating potential impacts on the ongoing development of various COVID-19 diagnosis and cure” mean, and how does it relate to these variants? Answer: As suggested by the reviewer, this sentence has been modified in the revised manuscript. The sentence indicates that the Nucleocapsid cannot be a reliable diagnostic target, as it exhibits a high number of mutations. Thus, this may undermine ongoing research targeting the Nucleocapsid for COVID-19 diagnosis, vaccines, antibodies and small-molecule drugs.
Line 36-37: It was previously mentioned that “they have found Nucleocapsid to have the highest mutational changes in frequency”; the authors could summarize these points together rather than repeating them again and again. Answer: As suggested by the reviewer, the changes have been made in the revised manuscript. Line 15-72: All cited views are simply listed; please summarize and clarify the paragraph. Answer: As suggested by the reviewer, the change has been made in the revised manuscript. Line 75: The method “multiple alignment using fast fourier transform (MAFFT)” could be moved to the Methods rather than the Introduction. Answer: It is to be noted that only the method name has been mentioned in the Introduction, to give readers an overview. Line 77-94: The sentences about the methods should be moved to the “Material and Methods” section, and the sentences about result details should be moved to the Results section. The Introduction should retain only a summary of the results and the significance of this paper. Answer: As suggested by the reviewer, the changes have been made in the revised manuscript. Line 105: Is this citation a lab? Reference 4 in the reference list does not contain an author named Zhang. Answer: It is to be noted that Zhang Lab is not a citation but a footnote highlighting the website from which the SARS-CoV-2 protein PDBs are collected. That is why reference [4] (which is cited in the last line of the first paragraph of the Introduction) does not contain an author named Zhang. Line 115: In “MAFFT which is a progressive alignment technique is used as the multiple sequence alignment (MSA) tool”, please delete “which is a progressive alignment technique”. Please also delete the repeated “multiple sequence alignment”: the abbreviation “MSA” should be defined at its first use, after which the authors can use “MSA” alone. Answer: As suggested by the reviewer, the changes have been made in the revised manuscript.
Methods: This section is too verbose; the authors do not need to explain the advantages of every software package or tool used, as these details are not relevant to the paper. Just give a clear method. Answer: As suggested by the reviewer, the changes have been made in the revised manuscript. Results: The Results should not merely describe what each figure and table shows; rather, they should present the results and explain them using the figures and tables. Please rewrite the Results clearly. Answer: As suggested by the reviewer, the changes have been made in the revised manuscript to the best of our abilities, keeping the readers' point of view in mind. Line 184-186: What does the sentence “only coding regions are considered for identification of hotspot mutations as the non-coding regions exhibit high entropy values and can be misleading while selecting such mutation points as hotspot mutations” mean? Answer: Non-coding regions do not produce any protein that can bind with human proteins. Thus, they are not considered for hotspot mutations, as we have confined our research to the coding regions only. Moreover, in non-coding regions the entropy value is high for almost all mutation points, even if they may not be very important mutation points for SARS-CoV-2. Thus, instead of considering the mutation points in both coding and non-coding regions, we have only considered the mutation points in coding regions so that they exhibit the true characteristics of hotspot mutations. Discussion: Line 21-26: Regarding the sentence “multiple sequence alignment of 5734 genomic sequences are carried out using MAFFT”, the authors just need to discuss the results rather than mention the method here. Answer: As suggested by the reviewer, the changes have been made in the revised manuscript. It is to be noted that, instead of 5734 Indian SARS-CoV-2 genomes, the revised manuscript contains the analysis of 17271 such genomes, as the number of available genomes has increased over time.
We have repeated all the experiments to provide updated results in the revised manuscript. Line 217-218: Delete the sentence “the details of which are already discussed in the Result Section”. Answer: As suggested by the reviewer, the changes have been made in the revised manuscript. Conclusion: The authors do not need to describe the methods and results again; just give the summary and findings of this paper. Answer: As suggested by the reviewer, the changes have been made in the revised manuscript. Submitted filename: Response to Reviewers.docx

7 Mar 2022 Phylogenetic Analysis of 17271 Indian SARS-CoV-2 Genomes to Identify Temporal and Spatial Hotspot Mutations PONE-D-21-17506R1 Dear Dr. Ghosh, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance.
Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Chandrabose Selvaraj, Ph.D. Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #3: All comments have been addressed Reviewer #4: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #3: Yes Reviewer #4: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #3: Yes Reviewer #4: N/A ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. 
If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified. Reviewer #3: Yes Reviewer #4: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #3: Yes Reviewer #4: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #3: The authors have addressed the issues that I mentioned. I believe that this paper needs to be released to the world as soon as possible. Reviewer #4: The authors answered all questions from reviewers and made all changes to the manuscript, which can be accepted in this format. ********** 7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #3: No Reviewer #4: Yes: Fabrício Souza Campos 18 Mar 2022 PONE-D-21-17506R1 Phylogenetic Analysis of 17271 Indian SARS-CoV-2 Genomes to Identify Temporal and Spatial Hotspot Mutations Dear Dr. Saha: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations!
Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Chandrabose Selvaraj Academic Editor PLOS ONE
References (24 in total)

1.  Could mutations of SARS-CoV-2 suppress diagnostic detection?

Authors:  Carl A Ascoli
Journal:  Nat Biotechnol       Date:  2021-03       Impact factor: 54.908

2.  Nextstrain: real-time tracking of pathogen evolution.

Authors:  James Hadfield; Colin Megill; Sidney M Bell; John Huddleston; Barney Potter; Charlton Callender; Pavel Sagulenko; Trevor Bedford; Richard A Neher
Journal:  Bioinformatics       Date:  2018-12-01       Impact factor: 6.931

3.  Analysis of Indian SARS-CoV-2 Genomes Reveals Prevalence of D614G Mutation in Spike Protein Predicting an Increase in Interaction With TMPRSS2 and Virus Infectivity.

Authors:  Sunil Raghav; Arup Ghosh; Jyotirmayee Turuk; Sugandh Kumar; Atimukta Jha; Swati Madhulika; Manasi Priyadarshini; Viplov K Biswas; P Sushree Shyamli; Bharati Singh; Neha Singh; Deepika Singh; Ankita Datey; Kiran Avula; Shuchi Smita; Jyotsnamayee Sabat; Debdutta Bhattacharya; Jaya Singh Kshatri; Dileep Vasudevan; Amol Suryawanshi; Rupesh Dash; Shantibhushan Senapati; Tushar K Beuria; Rajeeb Swain; Soma Chattopadhyay; Gulam Hussain Syed; Anshuman Dixit; Punit Prasad; Sanghamitra Pati; Ajay Parida
Journal:  Front Microbiol       Date:  2020-11-23       Impact factor: 5.640

4.  South Africa responds to new SARS-CoV-2 variant.

Authors:  Munyaradzi Makoni
Journal:  Lancet       Date:  2021-01-23       Impact factor: 79.321

5.  CovMT: an interactive SARS-CoV-2 mutation tracker, with a focus on critical variants.

Authors:  Intikhab Alam; Aleksandar Radovanovic; Roberto Incitti; Allan A Kamau; Mohammed Alarawi; Esam I Azhar; Takashi Gojobori
Journal:  Lancet Infect Dis       Date:  2021-02-08       Impact factor: 25.071

6.  Comprehensive analysis of genomic diversity of SARS-CoV-2 in different geographic regions of India: an endeavour to classify Indian SARS-CoV-2 strains on the basis of co-existing mutations.

Authors:  Rakesh Sarkar; Suvrotoa Mitra; Pritam Chandra; Priyanka Saha; Anindita Banerjee; Shanta Dutta; Mamta Chawla-Sarkar
Journal:  Arch Virol       Date:  2021-01-19       Impact factor: 2.574

7.  Decoding SARS-CoV-2 Transmission and Evolution and Ramifications for COVID-19 Diagnosis, Vaccine, and Medicine.

Authors:  Rui Wang; Yuta Hozumi; Changchuan Yin; Guo-Wei Wei
Journal:  J Chem Inf Model       Date:  2020-06-25       Impact factor: 4.956

8.  Emerging phylogenetic structure of the SARS-CoV-2 pandemic.

Authors:  Nicholas M Fountain-Jones; Raima Carol Appaw; Scott Carver; Xavier Didelot; Erik Volz; Michael Charleston
Journal:  Virus Evol       Date:  2020-11-10

9.  Introduction of the South African SARS-CoV-2 variant 501Y.V2 into the UK.

Authors:  Julian W Tang; Oliver T R Toovey; Kirsty N Harvey; David D S Hui
Journal:  J Infect       Date:  2021-01-17       Impact factor: 6.072

10.  Unification and extensive diversification of M/Orf3-related ion channel proteins in coronaviruses and other nidoviruses.

Authors:  Yongjun Tan; Theresa Schneider; Prakash K Shukla; Mahesh B Chandrasekharan; L Aravind; Dapeng Zhang
Journal:  Virus Evol       Date:  2021-02-16
