Phylogenetic analysis of 17271 Indian SARS-CoV-2 genomes to identify temporal and spatial hotspot mutations.

Nimisha Ghosh¹,², Suman Nandi³, Indrajit Saha³.

Abstract

The second wave of SARS-CoV-2 has hit India hard, and although the vaccination drive has started, a moderate number of COVID-19 patients remain in the country, which motivates the analysis of the evolving virus strains. In this regard, multiple sequence alignment of 17271 Indian SARS-CoV-2 sequences is performed using MAFFT, followed by their phylogenetic analysis using Nextstrain. Subsequently, mutation points are identified as SNPs by Nextstrain. Thereafter, temporal and spatial analyses of the aligned sequences are carried out to identify the top 10 hotspot mutations in the coding regions based on entropy. Finally, to judge the functional characteristics of all the non-synonymous hotspot mutations, the effect of their amino acid changes on protein function is evaluated using PolyPhen-2, while I-Mutant 2.0 evaluates their effect on structural stability. For both temporal and spatial analysis, 21 non-synonymous hotspot mutations are found to be unstable and damaging.

Year: 2022 PMID: 35344550 PMCID: PMC8959188 DOI: 10.1371/journal.pone.0265579

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

It is now close to two years since the emergence of SARS-CoV-2, the virus behind the deadly COVID-19 disease, and the scientific community is still struggling to put an end to this pandemic. Though India was able to contain the spread in the first wave, the second wave put the entire system in turmoil. In September 2021, around 30,000 cases were being registered daily (https://www.covid19india.org/), whereas in May 2021 this figure surpassed 300,000. Scientists and researchers have attributed this surge to the evolution of this contagious virus, which resulted in the Delta (B.1.617.2) variant. Though the vaccination drive in India is in full swing, doubts regarding the efficacy of the vaccine against such mutations cannot be dismissed. Apart from Delta, the other variants of concern declared by the WHO are Alpha (B.1.1.7) [1], Beta (B.1.351) [2] and Gamma (P.1) [3]. All these variants, especially Delta, resulted in new spells of lockdown in the country. Thus, to understand its frequent mutations, a study of the evolution of the SARS-CoV-2 virus is essential [4, 5].

To understand these evolutionary mutations, Tang et al. [6] analysed 103 SARS-CoV-2 sequences, which revealed two major lineages, L and S. These lineages are defined by two tightly linked SNPs at positions 28144 (ORF8: C251T, S84L) and 8782 (orf1ab: T8517C, synonymous) and might influence virus pathogenesis. Raghav et al. [7] used ARTIC primer-based amplicon sequencing to profile 225 Indian SARS-CoV-2 sequences. Their analysis showed that, apart from local transmission, Europe and Southeast Asia are the two major routes for introduction of the disease in India. Their study also revealed that D614G in the Spike protein is a very common mutation that increases virus shedding and infectivity. Wang et al. [8] proposed an h-index mutation ratio criterion to distinguish non-conserved from conserved proteins with the help of over 15,000 sequences. As a result, Nucleocapsid, Spike and Papain-like protease are found to be highly non-conserved, while Envelope, main protease and Endoribonuclease are considered to be conserved. They further identified mutations on 40% of the nucleotides in the Nucleocapsid gene, which may undermine ongoing efforts to develop COVID-19 diagnostics and treatments that target this gene. A similar analysis conducted by Yuan et al. [9] with 11183 sequences revealed 119 high-frequency substitutions as SNPs around the globe. Among the nucleotide changes in SNPs, C to T is the major one, which indicates adaptation and evolution of the virus in the human host and can pose new challenges. They also found Nucleocapsid to have the highest frequency of mutational changes. Thus, both Wang et al. [8] and Yuan et al. [9] refute the claim by Ascoli [10] that Nucleocapsid can be a possible diagnostic target. It is therefore important to understand the evolution of SARS-CoV-2 over time.

Cheng et al. [11] identified five major mutation points, C28144T, C14408T, A23403G, T8782C and C3037T, in almost all strains for the month of April 2020. Their functional analysis shows that these mutations lead to a decrease in protein stability and eventually a reduction in the virulence of SARS-CoV-2, while the A23403G mutation increases the Spike-ACE2 interaction, leading to an increase in infectivity. Phylogenetic analysis by Maitra et al. [12] shows that mutations such as C14408T in RdRp and A23403G in Spike predominantly characterise the A2a clade in 9 Indian sequences. Moreover, a triplet-based mutation, 28881-28883 GGG/AAC in the Nucleocapsid gene, which might affect the binding of miRNAs to the original sequences, is also reported in their work. Guruprasad et al. [13] analysed 10333 spike protein sequences, of which 8155 contained one or more mutations, giving a total of 9654 mutations that correspond to 400 distinct mutation sites. According to this analysis, the top 10 mutation sites by total number of occurrences are D614 (7859), L5 (109), L54 (105), P1263 (61), P681 (51), S477 (47), T859 (30), S221 (28), V483 (28) and A845 (24). Other important works [14-17] have also revealed different mutations after analysis of several SARS-CoV-2 sequences. Given the varied mutations reported by all the aforementioned works, the evolutionary study of SARS-CoV-2 genomes is highly relevant in the current pandemic scenario of the ongoing waves in India.

Motivated by the aforementioned studies, in this work we have performed multiple sequence alignment (MSA) of 17271 Indian SARS-CoV-2 genomes using multiple alignment using fast Fourier transform (MAFFT) [18], followed by their phylogenetic analysis using Nextstrain [19], to eventually identify hotspot mutations both month-wise (temporal) and state-wise (spatial). Thereafter, temporal and spatial analyses of the aligned sequences are carried out to identify the top 10 hotspot mutations in the coding regions based on entropy, resulting in 190 and 250 hotspot mutations respectively. Finally, to judge the functional characteristics of all the non-synonymous hotspot mutations, the effect of their amino acid changes on protein function is evaluated using PolyPhen-2, while I-Mutant 2.0 evaluates their effect on structural stability. The hotspot mutations which are unstable and damaging and common to both categories are T77A and V149A in NSP6, T95I and E484Q in Spike, Q57H and T223I in ORF3a, I82S and I82T in Membrane, D119V and F120L in ORF8, and R203K, R203M and G215C in Nucleocapsid. Furthermore, as recognised by virologists, E484K in Spike, which is identified in the temporal analysis, is yet another major mutation that improves the ability of the virus to escape the host's immune system [20].

Material and methods

In this section, the collection of the dataset of 17271 Indian SARS-CoV-2 genomes is discussed along with the proposed pipeline.

Data acquisition

To perform the multiple sequence alignment and phylogenetic analysis, 17271 Indian SARS-CoV-2 genomes are collected from the Global Initiative on Sharing All Influenza Data (GISAID, https://www.gisaid.org/) and the reference genome (NC_045512.2, https://www.ncbi.nlm.nih.gov/nuccore/1798174254) is collected from the National Center for Biotechnology Information (NCBI). The SARS-CoV-2 sequences are mostly distributed from January 2020 to September 2021 across the states of India. Moreover, for mapping the protein sequences and the subsequent amino acid changes, protein PDB structures are collected from Zhang Lab (https://zhanglab.ccmb.med.umich.edu/COVID-19/). These structures are then used to model and identify the structural changes in the proteins. All these analyses are performed on the High Performance Computing facility of NITTTR, Kolkata, while MATLAB R2019b is used for checking the amino acid changes.

Pipeline of the work

The pipeline of the work is provided in Fig 1. Initially, multiple sequence alignment (MSA) of 17271 Indian SARS-CoV-2 genomes is performed using MAFFT, followed by their phylogenetic analysis using Nextstrain, leading to the identification of mutation points as SNPs. MAFFT is chosen as the MSA tool because its use of the fast Fourier transform gives it an advantage over other alignment techniques. Nextstrain, in turn, is used to visualise the evolution and geographic distribution of the SARS-CoV-2 genomes, with the required metadata created in our High Performance Computing environment.

Fig 1

Pipeline of the work.

Once the alignment and the phylogenetic analyses are completed and the mutation points are identified as SNPs, temporal (month-wise) and spatial (state-wise) analyses are performed on the aligned sequences to identify the top 10 hotspot mutations for each month and each state. Furthermore, amino acid changes in the SARS-CoV-2 proteins are identified using the codon table. The top 10 hotspot mutations are selected for each month and each state based on their entropy values in the coding regions, which are computed as follows:

$$E_{\beta} = -\sum_{\alpha=1}^{5} f_{\alpha}^{\beta} \ln f_{\alpha}^{\beta}$$

where $f_{\alpha}^{\beta}$ represents the frequency of each residue α occurring at position β and 5 represents the four possible residues as nucleotides plus gap. Subsequently, the amino acid changes for the temporal and spatial non-synonymous hotspot mutations are visualised graphically. Finally, the amino acid changes of the non-synonymous hotspot mutations are used to evaluate their functional characteristics, and they are visualised in the respective protein structures as well.
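
The entropy-based ranking described above can be sketched as follows. This is a minimal illustration, assuming natural-log Shannon entropy over the five symbols A, C, G, T and gap; the function names `column_entropy` and `top_hotspots`, the decision to ignore ambiguous bases such as N, and the toy alignment are assumptions of this sketch and not the authors' implementation.

```python
import math
from collections import Counter

SYMBOLS = "ACGT-"  # four nucleotides plus the gap character


def column_entropy(column):
    """Shannon entropy (natural log) of one alignment column.

    Symbols outside A, C, G, T and '-' (for example N) are ignored;
    this is an assumption of the sketch.
    """
    counts = Counter(ch for ch in column.upper() if ch in SYMBOLS)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(n / total) * math.log(n / total) for n in counts.values())


def top_hotspots(aligned, k=10, coding_positions=None):
    """Return the k (position, entropy) pairs with the highest entropy.

    `aligned` is a list of equal-length strings; positions are 1-based.
    `coding_positions` optionally restricts the ranking to coding regions.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for i in range(length):
        pos = i + 1
        if coding_positions is not None and pos not in coding_positions:
            continue
        column = "".join(seq[i] for seq in aligned)
        scores.append((pos, column_entropy(column)))
    # Highest entropy first; ties are broken by position.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:k]


# Toy alignment of four sequences (illustrative only).
toy = ["ACGT", "ACGA", "ATGA", "ATG-"]
print(top_hotspots(toy, k=2))
```

In the toy alignment, position 4 (three distinct symbols) ranks above position 2 (an even two-way split, entropy ln 2), while the fully conserved positions 1 and 3 have zero entropy.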

Results

All the experiments in this work are carried out according to Fig 1. In this regard, MSA of 17271 Indian SARS-CoV-2 genomes is initially carried out using MAFFT. Thereafter, their phylogenetic analysis using Nextstrain reveals 5 virus clades, viz. 19A, 19B, 20A, 20B and 20C, along with the corresponding mutation points as SNPs. Subsequently, temporal (month-wise) and spatial (state-wise) analyses are performed on the aligned sequences to identify the top 10 hotspot mutations in each category, resulting in 190 and 250 mutation points respectively. The phylogenetic trees in radial and rectangular views for temporal analysis are shown in Fig 2(a) and 2(b), while Fig 2(c) and 2(d) show the views for spatial analysis. The normal and zoomed views of the clade-wise geographical distribution of the sequences are shown in Fig 2(e) and 2(f) respectively. In unsupervised learning, feature selection is a non-trivial task; in this work, the entropy of the aligned sequences is used as the selected feature. For example, temporal analysis of January-March-2020 with 191 sequences shows that G11083T in NSP6 has the highest entropy value of 0.82391, while for spatial analysis of Maharashtra with 3674 sequences, the highest entropy value of 1.02173 is borne by G28881A and G28881T in Nucleocapsid. Tables 1 and 2 report the top 10 hotspot mutations for temporal and spatial analysis respectively, along with the associated details, while S1 and S2 Tables in S1 File list all temporal and spatial hotspot mutations. The entropy values corresponding to the nucleotide changes are shown in Fig 2(g), while the temporal and spatial changes in entropy are reported in S3 and S4 Tables in S1 File respectively. The evolution of the virus genome in terms of entropy for both temporal and spatial analysis is another crucial result reported in this work. For example, from a temporal perspective, E484Q/K, which is a widely circulating variant in India, has evolved over time but is now on the wane, while the spatial analysis shows that E484Q is one of the most prevalent variants in West Bengal. This evolution is visualised in Figs 3 and 4 respectively.

It is to be noted that, due to the lack of a sufficient number of sequences, the temporal data of January to March 2020 have been merged for the analysis. Please also note that the non-coding regions of SARS-CoV-2 do not produce any protein to bind with human proteins; thus, they are not considered for hotspot mutations. Moreover, since the entropy calculation is performed on aligned sequences, only coding regions are considered for identification of hotspot mutations, as the non-coding regions exhibit high entropy values and can be misleading when selecting mutation points as hotspot mutations.
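
The month-wise grouping, including the merged January to March 2020 group, can be sketched as below. This is only an illustration: the function names `month_bucket` and `group_by_month` and the strain identifiers in the example are hypothetical, and the label format simply mirrors the month labels used in Table 1.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def month_bucket(collected: date) -> str:
    """Map a collection date to a temporal group label.

    January to March 2020 are merged into a single group;
    every other month forms its own group.
    """
    if collected.year == 2020 and collected.month <= 3:
        return "January-March-2020"
    return f"{MONTHS[collected.month - 1]}-{collected.year}"


def group_by_month(records):
    """Group (strain, collection_date) pairs by temporal label."""
    groups = {}
    for strain, collected in records:
        groups.setdefault(month_bucket(collected), []).append(strain)
    return groups


# Hypothetical strain identifiers, for illustration only.
records = [
    ("strain-A", date(2020, 1, 27)),
    ("strain-B", date(2020, 3, 15)),
    ("strain-C", date(2021, 4, 2)),
]
print(group_by_month(records))
```

Each resulting group would then be ranked by entropy separately to obtain its own top 10 hotspot mutations.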

Fig 2

Table 1

List of top 10 hotspot mutations based on temporal analysis.

Month	Number of Sequences	Genomic Coordinate	Entropy	Nucleotide Change	Amino Acid Change	Protein Coordinate	Coding Region
January-March-2020	191	11083	0.82391	G>T	L>F	37	NSP6
		28311	0.64212	C>T	P>L	13	Nucleocapsid
		3037	0.63531	C>T	F>F	106	NSP3
		14408	0.63531	C>T	P>L	323	RdRp
		23403	0.63531	A>G	D>G	614	Spike
		23929	0.63088	C>T	Y>Y	789	Spike
		6312	0.59276	C>A	T>K	1198	NSP3
		13730	0.58269	C>T	A>V	97	RdRp
		28688	0.57987	T>C	L>L	139	Nucleocapsid
		1397	0.53043	G>A	V>I	198	NSP2
April-2020	441	11083	0.79874	G>T	L>F	37	NSP6
		28311	0.71328	C>T	P>L	13	Nucleocapsid
		3037	0.70595	C>T	F>F	106	NSP3
		23403	0.69774	A>G	D>G	614	Spike
		14408	0.6971	C>T	P>L	323	RdRp
		6312	0.6678	C>A	T>K	1198	NSP3
		13730	0.66587	C>T	A>V	97	RdRp
		23929	0.65279	C>T	Y>Y	789	Spike
		28881	0.53127	G>A	R>K	203	Nucleocapsid
		28882	0.53127	G>A	R>R	203	Nucleocapsid
May-2020	977	28881	0.66198	G>A	R>K	203	Nucleocapsid
		28882	0.66198	G>A	R>R	203	Nucleocapsid
		28883	0.66198	G>C	G>R	204	Nucleocapsid
		25563	0.64183	G>T	Q>H	57	ORF3a
		26735	0.56685	C>T	Y>Y	71	Membrane
		18877	0.5533	C>T	L>L	280	Exon
		313	0.54277	C>T	L>L	16	NSP1
		14408	0.54115	C>T	P>L	323	RdRp
		5700	0.50567	C>A	A>D	994	NSP3
		13730	0.48254	C>T	A>V	97	RdRp
June-2020	1062	28881	0.72623	G>A	R>K	203	Nucleocapsid
		28883	0.71049	G>C	G>R	204	Nucleocapsid
		28882	0.69816	G>A	R>R	203	Nucleocapsid
		22444	0.67332	C>T	D>D	294	Spike
		25563	0.67187	G>T	Q>H	57	ORF3a
		18877	0.66299	C>T	L>L	280	Exon
		26735	0.6606	C>T	Y>Y	71	Membrane
		28854	0.6393	C>T	S>L	194	Nucleocapsid
		313	0.54631	C>T	L>L	16	NSP1
		5700	0.53036	C>A	A>D	994	NSP3
July-2020	683	28881	0.86601	G>A	R>K	203	Nucleocapsid
		28882	0.85618	G>A	R>R	203	Nucleocapsid
		28883	0.85615	G>C	G>R	204	Nucleocapsid
		25563	0.69252	G>T	Q>H	57	ORF3a
		313	0.66456	C>T	L>L	16	NSP1
		18877	0.66359	C>T	L>L	280	Exon
		5700	0.65981	C>A	A>D	994	NSP3
		26735	0.65467	C>T	Y>Y	71	Membrane
		28854	0.61568	C>T	S>L	194	Nucleocapsid
		22444	0.60236	C>T	D>D	294	Spike
August-2020	632	28881	0.79095	G>A	R>K	203	Nucleocapsid
		28883	0.78919	G>C	G>R	204	Nucleocapsid
		28882	0.78061	G>A	R>R	203	Nucleocapsid
		22444	0.62652	C>T	D>D	294	Spike
		25563	0.62045	G>T	Q>H	57	ORF3a
		28854	0.61586	C>T	S>L	194	Nucleocapsid
		26735	0.61193	C>T	Y>Y	71	Membrane
		313	0.6079	C>T	L>L	16	NSP1
		18877	0.6079	C>T	L>L	280	Exon
		5700	0.60235	C>A	A>D	994	NSP3
September-2020	629	28881	0.7396	G>A	R>K	203	Nucleocapsid
		28882	0.68911	G>A	R>R	203	Nucleocapsid
		28883	0.67924	G>C	G>R	204	Nucleocapsid
		25563	0.60326	G>T	Q>H	57	ORF3a
		313	0.59785	C>T	L>L	16	NSP1
		5700	0.59193	C>A	A>D	994	NSP3
		22444	0.57955	C>T	D>D	294	Spike
		28854	0.56792	C>T	S>L	194	Nucleocapsid
		18877	0.56622	C>T	L>L	280	Exon
		26735	0.56103	C>T	Y>Y	71	Membrane
October-2020	380	28881	0.78752	G>A	R>K	203	Nucleocapsid
		28882	0.70769	G>A	R>R	203	Nucleocapsid
		28883	0.70769	G>C	G>R	204	Nucleocapsid
		22444	0.64744	C>T	D>D	294	Spike
		18877	0.6463	C>T	L>L	280	Exon
		26735	0.6463	C>T	Y>Y	71	Membrane
		25563	0.64465	G>T	Q>H	57	ORF3a
		28854	0.64124	C>T	S>L	194	Nucleocapsid
		8917	0.57761	C>T	F>F	121	NSP4
		9389	0.55503	G>A	D>N	279	NSP4
November-2020	452	22444	0.75515	C>T	D>D	294	Spike
		28881	0.74527	G>A	R>K	203	Nucleocapsid
		28854	0.69762	C>T	S>L	194	Nucleocapsid
		18877	0.68886	C>T	L>L	280	Exon
		26735	0.68657	C>T	Y>Y	71	Membrane
		25563	0.68439	G>T	Q>H	57	ORF3a
		1947	0.66982	T>C	V>A	381	NSP2
		28882	0.66551	G>A	R>R	203	Nucleocapsid
		28883	0.66551	G>C	G>R	204	Nucleocapsid
		3267	0.48539	C>T	T>I	183	NSP3
December-2020	983	28881	0.71656	G>A	R>K	203	Nucleocapsid
		22444	0.71598	C>T	D>D	294	Spike
		1947	0.71371	T>C	V>A	381	NSP2
		25563	0.68512	G>T	Q>H	57	ORF3a
		18877	0.67905	C>T	L>L	280	Exon
		26735	0.67871	C>T	Y>Y	71	Membrane
		28854	0.67728	C>T	S>L	194	Nucleocapsid
		28883	0.67009	G>C	G>R	204	Nucleocapsid
		28882	0.65134	G>A	R>R	203	Nucleocapsid
		26060	0.56206	C>T	T>I	223	ORF3a
January-2021	500	28881	0.82738	G>A	R>K	203	Nucleocapsid
		28882	0.71685	G>A	R>R	203	Nucleocapsid
		18877	0.70613	C>T	L>L	280	Exon
		25563	0.70613	G>T	Q>H	57	ORF3a
		28883	0.70225	G>C	G>R	204	Nucleocapsid
		22444	0.69315	C>T	D>D	294	Spike
		26735	0.69315	C>T	Y>Y	71	Membrane
		28854	0.69286	C>T	S>L	194	Nucleocapsid
		3267	0.63605	C>T	T>I	183	NSP3
		21034	0.61845	C>T	L>L	126	NSP16
February-2021	980	28881	1.13342	G>A, G>T	R>K, R>M	203	Nucleocapsid
		23604	1.02071	C>A, C>G	P>H, P>R	681	Spike
		23012	0.82687	G>C, G>A	E>Q, E>K	484	Spike
		24775	0.69608	A>T, A>-	Q>H, Q>-	1071	Spike
		28882	0.68897	G>A	R>R	203	Nucleocapsid
		28883	0.67724	G>C	G>R	204	Nucleocapsid
		28280	0.66855	G>T, G>C	D>Y, D>H	3	Nucleocapsid
		25469	0.65125	C>T	S>L	26	ORF3a
		22444	0.6458	C>T	D>D	294	Spike
		29402	0.64017	G>T	D>Y	377	Nucleocapsid
March-2021	1907	28881	1.03262	G>A, G>T	R>K, R>M	203	Nucleocapsid
		23604	1.01066	C>A, C>G	P>H, P>R	681	Spike
		28280	0.91893	G>T, G>C	D>Y, D>H	3	Nucleocapsid
		23012	0.88114	G>C, G>A	E>Q, E>K	484	Spike
		26767	0.84724	T>C, T>G	I>T, I>S	82	Membrane
		11296	0.82674	T>G, T>-	F>L, F>-	108	NSP6
		21987	0.80846	G>A, G>-	G>D, G>-	142	Spike
		24775	0.80534	A>T, A>-	Q>H, Q>-	1071	Spike
		25469	0.77293	C>T	S>L	26	ORF3a
		22022	0.76572	G>A	E>K	154	Spike
April-2021	3054	28253	1.13895	C>A, C>T, C>-	F>L, F>F, F>-	120	ORF8
		22034	0.89681	A>G, A>-	R>G, R>-	158	Spike
		26767	0.89284	T>C, T>G	I>T, I>S	82	Membrane
		21987	0.87431	G>A, G>-	G>D, G>-	142	Spike
		28249	0.84388	A>T, A>-	D>V, D>-	119	ORF8
		24410	0.8167	G>A	D>N	950	Spike
		22033	0.76607	C>-	F>-	157	Spike
		22032	0.756	T>-	F>-	157	Spike
		28248	0.71357	G>-	D>-	119	ORF8
		11418	0.70573	T>C	V>A	149	NSP6
May-2021	2408	28253	1.08851	C>A, C>T, C>-	F>L, F>F, F>-	120	ORF8
		22034	0.81429	A>G, A>-	R>G, R>-	158	Spike
		28249	0.81342	A>T, A>-	D>V, D>-	119	ORF8
		21987	0.76579	G>A	G>D	142	Spike
		11418	0.70413	T>C	V>A	149	NSP6
		9891	0.69625	C>T	A>V	446	NSP4
		22030	0.68573	G>-	E>-	156	Spike
		28251	0.6755	T>-	F>-	120	ORF8
		5184	0.66981	C>T	P>L	822	NSP3
		11201	0.66818	A>G	T>A	77	NSP6
June-2021	1293	21987	1.0067	G>A, G>-	G>D, G>-	142	Spike
		28253	0.98317	C>A, C>-	F>L, F>-	120	ORF8
		28249	0.81706	A>T, A>-	D>V, D>-	119	ORF8
		22034	0.81496	A>G, A>-	R>G, R>-	158	Spike
		11418	0.70538	T>C	V>A	149	NSP6
		27874	0.70016	C>T	T>I	40	ORF7b
		9891	0.69617	C>T	A>V	446	NSP4
		28916	0.69472	G>T	G>C	215	Nucleocapsid
		11201	0.69311	A>G	T>A	77	NSP6
		9053	0.69268	G>T	V>L	167	NSP4
July-2021	632	21987	0.93091	G>A, G>-	G>D, G>-	142	Spike
		28253	0.87833	C>A, C>-	F>L, F>-	120	ORF8
		28249	0.7564	A>T, A>-	D>V, D>-	119	ORF8
		28251	0.71349	T>-	F>-	120	ORF8
		28250	0.711	T>-	D>-	119	ORF8
		28252	0.70261	T>-	F>-	120	ORF8
		4181	0.68595	G>T	A>S	488	NSP3
		5184	0.68595	C>T	P>L	822	NSP3
		6402	0.68595	C>T	P>L	1228	NSP3
		7124	0.68595	C>T	P>S	1469	NSP3
August-2021	15	28253	0.70869	C>A	F>L	120	ORF8
		4181	0.69142	G>T	A>S	488	NSP3
		6402	0.69142	C>T	P>L	1228	NSP3
		7124	0.69142	C>T	P>S	1469	NSP3
		8986	0.69142	C>T	D>D	144	NSP4
		9053	0.69142	G>T	V>L	167	NSP4
		10029	0.69142	C>T	T>I	492	NSP4
		11201	0.69142	A>G	T>A	77	NSP6
		11332	0.69142	A>G	V>V	120	NSP6
		19220	0.69142	C>T	A>V	394	Exon
September-2021	52	21846	0.69315	C>T	T>I	95	Spike
		24410	0.68696	G>A	D>N	950	Spike
		5184	0.60769	C>T	P>L	822	NSP3
		27874	0.59084	C>T	T>I	40	ORF7b
		4181	0.57228	G>T	A>S	488	NSP3
		6402	0.57228	C>T	P>L	1228	NSP3
		7124	0.57228	C>T	P>S	1469	NSP3
		8986	0.57228	C>T	D>D	144	NSP4
		9053	0.57228	G>T	V>L	167	NSP4
		10029	0.57228	C>T	T>I	492	NSP4

Table 2

List of top 10 hotspot mutations based on spatial analysis.

State	Number of Sequences	Genomic Coordinate	Entropy	Nucleotide Change	Amino Acid Change	Protein Coordinate	Coding Region
Maharashtra	3674	28881	1.02173	G>A, G>T	R>K, R>M	203	Nucleocapsid
		26767	0.92484	T>C, T>G	I>T, I>S	82	Membrane
		23604	0.81242	C>G	P>R	681	Spike
		28253	0.806	C>-	F>-	120	ORF8
		21987	0.79485	G>A, G>-	G>D, G>-	142	Spike
		25469	0.7663	C>T	S>L	26	ORF3a
		27638	0.70457	T>C	V>A	82	ORF7a
		29402	0.70178	G>T	D>Y	377	Nucleocapsid
		22917	0.69779	T>G	L>R	452	Spike
		23012	0.67477	G>C	E>Q	484	Spike
Telangana	2506	28253	1.0594	C>T, C>-	F>F, F>-	120	ORF8
		28881	1.05196	G>A, G>T	R>K, R>M	203	Nucleocapsid
		22034	0.92872	A>G, A>-	R>G, R>-	158	Spike
		23604	0.83122	C>G	P>R	681	Spike
		26767	0.74928	T>C	I>T	82	Membrane
		24410	0.72581	G>A	D>N	950	Spike
		29402	0.71226	G>T	D>Y	377	Nucleocapsid
		22033	0.70621	C>-	F>-	157	Spike
		27638	0.70183	T>C	V>A	82	ORF7a
		22917	0.70114	T>G	L>R	452	Spike
Gujarat	2333	28881	0.98391	G>A, G>T	R>K, R>M	203	Nucleocapsid
		28253	0.98023	C>A, C>-	F>L, F>-	120	ORF8
		23604	0.89132	C>A, C>G	P>H, P>R	681	Spike
		26767	0.79834	T>C	I>T	82	Membrane
		28249	0.78731	A>-	D>-	119	ORF8
		22034	0.76092	A>-	R>-	158	Spike
		22033	0.74274	C>-	F>-	157	Spike
		22032	0.74262	T>-	F>-	157	Spike
		25469	0.71957	C>T	S>L	26	ORF3a
		22029	0.71048	A>-	E>-	156	Spike
West Bengal	1637	28881	1.03445	G>A, G>T	R>K, R>M	203	Nucleocapsid
		26767	0.99595	T>G, T>C	I>T, I>S	82	Membrane
		23604	0.9359	C>A, C>G	P>H, P>R	681	Spike
		28253	0.88971	C>A, C>-	F>L, F>-	120	ORF8
		21987	0.81006	G>A, G>-	G>D, G>-	142	Spike
		22034	0.80702	A>G, A>-	R>G, R>-	158	Spike
		28249	0.77084	A>-	D>-	119	ORF8
		22917	0.70438	T>G	L>R	452	Spike
		29402	0.7006	G>T	D>Y	377	Nucleocapsid
		27638	0.69709	T>C	V>A	82	ORF7a
Delhi	1240	28881	1.08218	G>A, G>T	R>K, R>M	203	Nucleocapsid
		23604	0.94518	C>A, C>G	P>H, P>R	681	Spike
		22444	0.76965	C>T	D>D	294	Spike
		25563	0.76199	G>T	Q>H	57	ORF3a
		26735	0.72004	C>T	Y>Y	71	Membrane
		18877	0.71311	C>T	L>L	280	Exon
		28854	0.70723	C>T	S>L	194	Nucleocapsid
		1947	0.68719	T>C	V>A	381	NSP2
		26767	0.65229	T>C	I>T	82	Membrane
		28883	0.63286	G>C	G>R	204	Nucleocapsid
Andhra Pradesh	1077	28253	1.21902	C>A, C>T, C>-	F>L, F>F, F>-	120	ORF8
		22034	1.04209	A>G, A>-	R>G, R>-	158	Spike
		28881	0.85363	G>A, G>T	R>K, R>M	203	Nucleocapsid
		22033	0.78715	C>-	F>-	157	Spike
		26767	0.73239	T>C	I>T	82	Membrane
		23604	0.73117	C>G	P>R	681	Spike
		28249	0.71674	A>-	D>-	119	ORF8
		22030	0.70822	G>-	E>-	156	Spike
		22029	0.70261	A>-	E>-	156	Spike
		22031	0.69313	T>-	F>-	157	Spike
Karnataka	520	28881	1.23964	G>A, G>T	R>K, R>M	203	Nucleocapsid
		28253	0.98145	C>A	F>L	120	ORF8
		23604	0.8514	C>G	P>R	681	Spike
		28882	0.81953	G>A	R>R	203	Nucleocapsid
		28883	0.80388	G>C	G>R	204	Nucleocapsid
		26767	0.70691	T>C	I>T	82	Membrane
		28249	0.67368	A>T, A>-	D>V, D>-	119	ORF8
		29402	0.6736	G>T	D>Y	377	Nucleocapsid
		22917	0.64897	T>G	L>R	452	Spike
		25469	0.64897	C>T	S>L	26	ORF3a
Rajasthan	434	28881	0.99106	G>A, G>T	R>K, R>M	203	Nucleocapsid
		28882	0.69671	G>A	R>R	203	Nucleocapsid
		28883	0.68481	G>C	G>R	204	Nucleocapsid
		22444	0.6518	C>T	D>D	294	Spike
		25563	0.63888	G>T	Q>H	57	ORF3a
		28854	0.61881	C>T	S>L	194	Nucleocapsid
		26735	0.61318	C>T	Y>Y	71	Membrane
		18877	0.61125	C>T	L>L	280	Exon
		1947	0.59878	T>C, T>-	V>A, V>-	381	NSP2
		23604	0.53191	C>G	P>R	681	Spike
Tamil Nadu	423	28253	1.16453	C>A, C>T	F>L, F>F	120	ORF8
		28881	1.09273	G>A, G>T	R>K, R>M	203	Nucleocapsid
		23604	0.88416	C>A, C>G	P>H, P>R	681	Spike
		28461	0.875	A>G	D>G	63	Nucleocapsid
		24410	0.85053	G>A	D>N	950	Spike
		26767	0.75549	T>C	I>T	82	Membrane
		21618	0.69881	C>G	T>R	19	Spike
		15451	0.68935	G>A	G>S	671	RdRp
		16466	0.68935	C>T	P>L	77	Helicase
		29402	0.67288	G>T	D>Y	377	Nucleocapsid
Punjab	418	11296	1.06149	T>G, T>-	F>L, F>-	108	NSP6
		28095	0.89567	A>T, A>-	K>*, K>-	68	ORF8
		28881	0.77179	G>A, G>T	R>K, R>M	203	Nucleocapsid
		28280	0.76015	G>C	D>H	3	Nucleocapsid
		23604	0.75325	C>A, C>G	P>H, P>R	681	Spike
		28281	0.74341	A>T	D>V	3	Nucleocapsid
		11291	0.69623	G>-	G>-	107	NSP6
		11295	0.69059	T>-	F>-	108	NSP6
		21765	0.68075	T>-	I>-	68	Spike
		11292	0.66789	G>-	G>-	107	NSP6
Chhattisgarh	364	28881	1.07226	G>A, G>T	R>K, R>M	203	Nucleocapsid
		23604	0.94912	C>A, C>G	P>H, P>R	681	Spike
		26767	0.91621	T>C, T>G	I>T, I>S	82	Membrane
		24410	0.90113	G>A, G>-	D>N, D>-	950	Spike
		28461	0.71677	A>G	D>G	63	Nucleocapsid
		28253	0.70958	C>-	F>-	120	ORF8
		15451	0.706	G>A	G>S	671	RdRp
		27638	0.70498	T>C	V>A	82	ORF7a
		21618	0.70489	C>G	T>R	19	Spike
		29402	0.70441	G>T	D>Y	377	Nucleocapsid
Manipur	270	28253	1.02447	C>A, C>-	F>L, F>-	120	ORF8
		21987	0.87608	G>A	G>D	142	Spike
		21846	0.71297	C>T	T>I	95	Spike
		28916	0.70747	G>T	G>C	215	Nucleocapsid
		11201	0.69044	A>G	T>A	77	NSP6
		28250	0.69044	T>-	D>-	119	ORF8
		28251	0.69044	T>-	F>-	120	ORF8
		28252	0.69044	T>-	F>-	120	ORF8
		5184	0.68705	C>T	P>L	822	NSP3
		6402	0.68705	C>T	P>L	1228	NSP3
Odisha	238	28881	1.15561	G>A, G>T	R>K, R>M	203	Nucleocapsid
		28882	0.78669	G>A	R>R	203	Nucleocapsid
		28883	0.78669	G>C	G>R	204	Nucleocapsid
		23604	0.73028	C>G	P>R	681	Spike
		29402	0.58678	G>T	D>Y	377	Nucleocapsid
		8917	0.57992	C>T	F>F	121	NSP4
		26767	0.56936	T>C	I>T	82	Membrane
		22917	0.56881	T>G	L>R	452	Spike
		24410	0.56082	G>A	D>N	950	Spike
		9389	0.55771	G>A	D>N	279	NSP4
Uttar Pradesh	229	26767	1.15838	T>C, T>-	I>T, I>-	82	Membrane
		21618	1.07939	C>G, C>-	T>R, T>-	19	Spike
		27752	0.98545	C>T, C>-	T>I, T>-	120	ORF7a
		27638	0.95253	T>C, T>-	V>A, V>-	82	ORF7a
		21987	0.87393	G>A	G>D	142	Spike
		21872	0.7677	T>-	W>-	104	Spike
		27874	0.76694	C>T	T>I	40	ORF7b
		11418	0.75432	T>C	V>A	149	NSP6
		9053	0.74627	G>T	V>L	167	NSP4
		28916	0.74627	G>T	G>C	215	Nucleocapsid
Haryana	193	28881	0.99908	G>A, G>T	R>K, R>M	203	Nucleocapsid
		23604	0.82165	C>A, C>G	P>H, P>R	681	Spike
		25563	0.71135	G>T	Q>H	57	ORF3a
		22444	0.70452	C>T	D>D	294	Spike
		18877	0.67876	C>T	L>L	280	Exon
		26735	0.67876	C>T	Y>Y	71	Membrane
		28854	0.67695	C>T	S>L	194	Nucleocapsid
		1947	0.63651	T>C	V>A	381	NSP2
		28882	0.62134	G>A	R>R	203	Nucleocapsid
		28883	0.62134	G>C	G>R	204	Nucleocapsid
Himachal Pradesh	184	1947	1.00628	T>C, T>-	V>A, V>-	381	NSP2
		28881	0.8515	G>A	R>K	203	Nucleocapsid
		22444	0.74302	C>T	D>D	294	Spike
		28854	0.7196	C>T	S>L	194	Nucleocapsid
		28882	0.69576	G>A	R>R	203	Nucleocapsid
		28883	0.69576	G>C	G>R	204	Nucleocapsid
		18877	0.68944	C>T	L>L	280	Exon
		25563	0.68944	G>T	Q>H	57	ORF3a
		26735	0.68735	C>T	Y>Y	71	Membrane
		26060	0.62056	C>T	T>I	223	ORF3a
Sikkim	165	28253	1.05282	C>A, C>-	F>L, F>-	120	ORF8
		28249	0.85603	A>-	D>-	119	ORF8
		28881	0.82105	G>T	R>M	203	Nucleocapsid
		21987	0.79807	G>A	G>D	142	Spike
		23604	0.7316	C>G	P>R	681	Spike
		28251	0.72301	T>-	F>-	120	ORF8
		28252	0.72301	T>-	F>-	120	ORF8
		26767	0.70343	T>C	I>T	82	Membrane
		22034	0.69379	A>-	R>-	158	Spike
		9891	0.6927	C>T	A>V	446	NSP4
Jammu and Kashmir	164	28881	1.05025	G>A, G>T	R>K, R>M	203	Nucleocapsid
		23604	1.02063	C>A, C>G	P>H, P>R	681	Spike
		22444	0.81197	C>T	D>D	294	Spike
		28280	0.79577	G>C	D>H	3	Nucleocapsid
		11296	0.76392	T>-	F>-	108	NSP6
		21765	0.67275	T>-	I>-	68	Spike
		18877	0.66944	C>T	L>L	280	Exon
		25563	0.66944	G>T	Q>H	57	ORF3a
		26735	0.66383	C>T	Y>Y	71	Membrane
		28854	0.66079	C>T	S>L	194	Nucleocapsid
Puducherry	138	28253	0.97927	C>A, C>T	F>L, F>F	120	ORF8
		23604	0.76675	C>G	P>R	681	Spike
		28881	0.74111	G>A, G>T	R>K, R>M	203	Nucleocapsid
		21987	0.69501	G>A	G>D	142	Spike
		15451	0.6866	G>A	G>S	671	RdRp
		16466	0.6866	C>T	P>L	77	Helicase
		5184	0.68486	C>T	P>L	822	NSP3
		28249	0.68291	A>T	D>-	119	ORF8
		26767	0.6806	T>C	I>T	82	Membrane
		1191	0.62794	C>T	P>L	129	NSP2
Meghalaya	135	28253	0.99245	C>A, C>-	F>L, F>-	120	ORF8
		21987	0.8842	G>A	G>D	142	Spike
		28249	0.84253	A>T, A>-	D>V, D>-	119	ORF8
		22034	0.78249	A>-	R>-	158	Spike
		9891	0.68543	C>T	A>V	446	NSP4
		11418	0.68543	T>C	V>A	149	NSP6
		5184	0.6736	C>T	P>L	822	NSP3
		26767	0.66499	T>C	I>T	82	Membrane
		28250	0.66015	T>-	D>-	119	ORF8
		28251	0.66015	T>-	F>-	120	ORF8
Uttarakhand	126	28881	1.03137	G>A, G>T	R>K, R>M	203	Nucleocapsid
		1947	0.77067	T>C	V>A	381	NSP2
		23604	0.76724	C>G	P>R	681	Spike
		22444	0.73219	C>T	D>D	294	Spike
		25563	0.66976	G>T	Q>H	57	ORF3a
		18877	0.62109	C>T	L>L	280	Exon
		26735	0.62109	C>T	Y>Y	71	Membrane
		28882	0.62109	G>A	R>R	203	Nucleocapsid
		28883	0.62109	G>C	G>R	204	Nucleocapsid
		28854	0.61478	C>T	S>L	194	Nucleocapsid
Kerala	106	28881	0.80484	G>A	R>K	203	Nucleocapsid
		3037	0.69298	C>T	F>F	106	NSP3
		14408	0.69298	C>T	P>L	323	RdRp
		23403	0.69298	A>G	D>G	614	Spike
		11083	0.6759	G>T	L>F	37	NSP6
		1397	0.6299	G>A	V>I	198	NSP2
		8653	0.6299	G>T	M>I	33	NSP4
		28688	0.6229	T>C	L>L	139	Nucleocapsid
		884	0.6155	C>T	R>C	27	NSP2
		28883	0.59118	G>C	G>R	204	Nucleocapsid
Madhya Pradesh	109	28881	0.98373	G>A, G>T	R>K, R>M	203	Nucleocapsid
		23604	0.8576	C>A, C>G	P>H, P>R	681	Spike
		28280	0.54646	G>C	D>H	3	Nucleocapsid
		28882	0.52208	G>A	R>R	203	Nucleocapsid
		28883	0.52208	G>C	G>R	204	Nucleocapsid
		21895	0.51534	T>C	D>D	111	Spike
		22917	0.51023	T>G	L>R	452	Spike
		25469	0.51023	C>T	S>L	26	ORF3a
		27638	0.51023	T>C	V>A	82	ORF7a
		29402	0.51023	G>T	D>Y	377	Nucleocapsid
Chandigarh	102	22444	0.76942	C>T	D>D	294	Spike
		11296	0.75797	T>G	F>L	108	NSP6
		28881	0.7328	G>A	R>K	203	Nucleocapsid
		28882	0.68648	G>A	R>R	203	Nucleocapsid
		28883	0.68648	G>C	G>R	204	Nucleocapsid
		11291	0.68145	G>-	G>-	107	NSP6
		26735	0.65645	C>T	Y>Y	71	Membrane
		18877	0.65095	C>T	L>L	280	Exon
		25563	0.65095	G>T	Q>H	57	ORF3a
		28854	0.63871	C>T	S>L	194	Nucleocapsid
Assam	101	28253	1.0999	C>A, C>-	F>L, F>-	120	ORF8
		28249	1.05588	A>T, A>-	D>V, D>-	119	ORF8
		28881	0.96252	G>A, G>T	R>K, R>M	203	Nucleocapsid
		21987	0.9478	G>A	G>D	142	Spike
		26767	0.91189	T>C	I>T	82	Membrane
		22034	0.78077	A>-	R>-	158	Spike
		24410	0.77309	G>A	D>N	950	Spike
		23604	0.76149	C>G	P>R	681	Spike
		15451	0.73936	G>A	G>S	671	RdRp
		21618	0.73936	C>G	T>R	19	Spike
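
The Amino Acid Change column in Tables 1 and 2 follows from applying the standard codon table to each nucleotide change. The sketch below shows one way to do this for a single-base change within a codon; the function name `amino_acid_change` is hypothetical, and the example assumes that the reference codon for Nucleocapsid R203 is AGG with 28881 as its middle base, which is not stated in the text. A gap is reported as a deletion, mirroring the `>-` notation of the tables.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def amino_acid_change(ref_codon, offset, alt_base):
    """Return (ref_aa, alt_aa, kind) for a single-base change in a codon.

    `offset` is the 0-based position of the changed base within the codon;
    `alt_base` may be '-' to denote a deletion in the alignment.
    """
    ref_codon = ref_codon.upper()
    ref_aa = CODON_TABLE[ref_codon]
    if alt_base == "-":
        return ref_aa, "-", "deletion"
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if alt_aa == ref_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


# Assumed reference codon AGG (R); middle base changed G>A gives AAG (K).
print(amino_acid_change("AGG", 1, "A"))  # ('R', 'K', 'non-synonymous')
# Third base changed G>A gives AGA, which still encodes R.
print(amino_acid_change("AGG", 2, "A"))  # ('R', 'R', 'synonymous')
```

Under that assumption, the two calls reproduce the R>K and R>R entries listed for coordinates 28881 and 28882.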

Fig 3

Month-wise (temporal) entropy of Indian SARS-CoV-2 genomes to show the changes in non-synonymous hotspot mutations.

Fig 4

State-wise (spatial) entropy of Indian SARS-CoV-2 genomes to show the changes in non-synonymous hotspot mutations.

Phylogenetic analysis of 17271 Indian SARS-CoV-2 Genomes where (a) and (b) show the phylogenetic tree in radial and rectangular views for 17271 Indian SARS-CoV-2 genomes for temporal analysis, (c) and (d) show the phylogenetic tree in radial and rectangular views for 17271 Indian SARS-CoV-2 genomes for spatial analysis, (e) and (f) are the geographical distribution in normal and zoomed views and (g) shows the value of entropy for the change in nucleotide. Once the top 10 temporal and spatial hotspot mutations are identified, thereafter, 62 and 65 unique hotspot mutations are identified respectively for each category from 190 and 250 mutation points. For temporal analysis, 62 unique mutations result in 50 non-synonymous deletions and substitutions with corresponding 8 and 48 amino acid changes while for spatial analysis 57 non-synonymous deletions and substitutions are identified from 65 unique mutations with corresponding 16 and 47 amino acid changes. These non-synonymous mutations along with their amino acid changes in protein are visualised in Fig 5. Fig 6(a) depicts the common and unique nucleotide changes for all hotspot mutations for temporal and spatial analysis in the form of Venn diagram while Fig 6(b) shows the common and unique nucleotide changes for non-synonymous hotspot mutations and the common and unique amino acid changes in protein for such analysis are visualised in Fig 6(c). Fig 6(a) shows that there are 18 and 21 unique hotspot mutations considering temporal and spatial analysis while the number of such common mutations are 44. Fig 6(b) depicts 12 and 19 unique non-synonymous hotspot mutations while 38 changes are common in both. Finally, Fig 6(c) shows that there are unique 14 and 21 amino acid changes for temporal and spatial analysis with 42 changes common in both. 
All the amino acid changes in the protein for the non-synonymous hotspot mutations for temporal analysis are highlighted in Fig 7 while such mutations for the spatial analysis are shown in Fig 8. Please note that though 48 and 47 substitutions corresponding to temporal and spatial analysis are reported in Figs 5 and 6, only 47 and 46 such changes are highlighted in Figs 7 and 8 respectively. This is because the structure for ORF7b is not found in the literature and thus the corresponding hotspot mutation in the structure of ORF7b cannot be highlighted in either of the cases.
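
The Venn diagram counts can be reproduced with plain set operations. The sketch below is illustrative only: the helper `venn_parts` and the two small subsets are assumptions introduced here, with the example amino acid changes copied from Tables 3 and 4. The last part checks that the unique and common counts quoted for Fig 6 add up to the totals given in the text.

```python
def venn_parts(temporal, spatial):
    """Split two collections into temporal-only, common and spatial-only sets."""
    temporal, spatial = set(temporal), set(spatial)
    return {
        "temporal_only": temporal - spatial,
        "common": temporal & spatial,
        "spatial_only": spatial - temporal,
    }


# Small illustrative subsets (protein:change), not the full hotspot lists.
temporal_subset = {"RdRp:A97V", "Spike:E484K", "Spike:E484Q", "Spike:D614G"}
spatial_subset = {"Spike:T19R", "Spike:L452R", "Spike:E484Q", "Spike:D614G"}
parts = venn_parts(temporal_subset, spatial_subset)

# Counts quoted in the text for Fig 6:
# (temporal-only, common, spatial-only, temporal total, spatial total)
reported = {
    "all hotspot mutations": (18, 44, 21, 62, 65),
    "non-synonymous mutations": (12, 38, 19, 50, 57),
    "amino acid changes": (14, 42, 21, 56, 63),
}
consistent = {
    name: (t_only + common == t_total and s_only + common == s_total)
    for name, (t_only, common, s_only, t_total, s_total) in reported.items()
}
print(parts["common"], consistent)
```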

Fig 5

Illustration of amino acid changes in SARS-CoV-2 proteins for the temporal and spatial non-synonymous hotspot mutations.

Fig 6

Venn diagrams of Indian SARS-CoV-2 Genomes to represent common (a) Nucleotide (b) Non-synonymous mutations and (c) Amino acid changes for the hotspot mutations.

Fig 7

Highlighted amino acid changes in the protein structures for the non-synonymous hotspot mutations based on temporal analysis for (a) NSP2 (b) NSP3 (c) NSP4 (d) NSP6 (e) RdRp (f) Exon (g) Spike (h) ORF3a (i) Membrane (j) ORF8 (k) Nucleocapsid.

Fig 8

Highlighted amino acid changes in the protein structures for the non-synonymous hotspot mutations based on spatial analysis for (a) NSP2 (b) NSP3 (c) NSP4 (d) NSP6 (e) RdRp (f) Helicase (g) Spike (h) ORF3a (i) Membrane (j) ORF7a (k) ORF8 (l) Nucleocapsid.


Discussion

India has gone through the second wave of the SARS-CoV-2 pandemic and, according to experts, a third wave is inevitable as the virus is evolving and new strains are being identified. Thus, the study of the evolving virus strains is crucial in the current pandemic scenario. In this regard, we have performed temporal and spatial analysis of 17271 SARS-CoV-2 sequences, which has resulted in the identification of hotspot mutation points as SNPs in each category. Changes in protein translation that lead to functional instability in proteins are often attributed to structural alterations in amino acid residues. Therefore, to judge the functional characteristics of all the non-synonymous hotspot mutations, their changes in proteins are evaluated as biological functions from the sequences using PolyPhen-2 (Polymorphism Phenotyping) [21], while I-Mutant 2.0 [22] evaluates their structural stability. The results for temporal and spatial analysis are reported in Tables 3 and 4 respectively. PolyPhen-2 http://genetics.bwh.harvard.edu/pph2/ bases its prediction on sequence, structural and phylogenetic information of a SNP, while I-Mutant 2.0 https://folding.biofold.org/i-mutant/i-mutant2.0.html uses a support vector machine (SVM) for the automatic prediction of protein stability changes upon single point mutations. PolyPhen-2 is used to find the damaging non-synonymous hotspot mutations, while protein stability is determined by I-Mutant 2.0. The score generated by PolyPhen-2 lies between 0 and 1; the closer the score is to 1, the more confidently the mutation can be considered damaging. According to the PolyPhen-2 predictions in Table 3, 27 of the 56 unique amino acid changes are damaging for temporal analysis, while Table 4 shows that 24 of the 63 unique amino acid changes are damaging for spatial analysis.
It is important to note that, for a protein, "damaging" mostly denotes instability. This notion is generally applied to human proteins: if a human protein is damaged by mutations, its protein-protein interactions may occur with higher or lower binding affinity. Similar consequences may arise for a virus: if a viral protein is damaged by mutations, its binding affinity with human proteins may change in a similar way. As a result, the virus may acquire characteristics such as transmissibility and the ability to escape antibodies [23, 24].
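
As a minimal illustration of how the PolyPhen-2 labels are read in the tables that follow, the sketch below tallies the prediction labels for a handful of rows copied from Table 3. Treating both "Possibly Damaging" and "Probably Damaging" as damaging, and treating "NG" as a missing prediction, are assumptions made for this example. The names `rows` and `is_damaging` are hypothetical.

```python
from collections import Counter

# (amino acid change, coding region, PolyPhen-2 prediction, PolyPhen-2 score)
# A few rows copied from Table 3; None stands in for the "NG" score entries.
rows = [
    ("A994D", "NSP3", "Possibly Damaging", 0.935),
    ("P323L", "RdRp", "Benign", 0.018),
    ("T95I", "Spike", "Probably Damaging", 0.999),
    ("D614G", "Spike", "Benign", 0.004),
    ("P681R", "Spike", "NG", None),
]


def is_damaging(label):
    """Assumption: both 'Possibly' and 'Probably' labels count as damaging."""
    return label in {"Possibly Damaging", "Probably Damaging"}


label_counts = Counter(label for _, _, label, _ in rows)
damaging = [change for change, _, label, _ in rows if is_damaging(label)]
print(label_counts, damaging)
```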

Table 3

Characteristics of non-synonymous hotspot mutations for temporal analysis.

Change in Nucleotide	Change in Amino Acid	Mapped with Coding Regions	PolyPhen-2 Prediction	PolyPhen-2 Score	I-Mutant 2.0 Stability	I-Mutant 2.0 DDG
G1397A	V198I	NSP2	Benign	0.006	Increase	0.18
T1947C	V381A	NSP2	Benign	0.009	Decrease	-1.64
C3267T	T183I	NSP3	NG	NG	Decrease	-0.1
G4181T	A488S	NSP3	Benign	0.017	Decrease	-0.89
C5184T	P822L	NSP3	Benign	0.011	Decrease	-0.54
C5700A	A994D	NSP3	Possibly Damaging	0.935	Decrease	-0.78
C6312A	T1198K	NSP3	Probably Damaging	0.998	Decrease	-1.37
C6402T	P1228L	NSP3	Benign	0.001	Decrease	-0.46
C7124T	P1469S	NSP3	Probably Damaging	0.967	Decrease	-2.17
G9053T	V167L	NSP4	Benign	0.406	Decrease	-2.14
G9389A	D279N	NSP4	Probably Damaging	0.999	Decrease	-1.26
C9891T	A446V	NSP4	Probably Damaging	0.999	Increase	0.64
C10029T	T492I	NSP4	Probably Damaging	0.973	Decrease	-0.08
G11083T	L37F	NSP6	Benign	0.027	Decrease	-0.05
A11201G	T77A	NSP6	Possibly Damaging	0.577	Decrease	-0.7
T11296G	F108L	NSP6	Benign	0.001	Decrease	-3.31
T11418C	V149A	NSP6	Possibly Damaging	0.865	Decrease	-3.43
C13730T	A97V	RdRp	Probably Damaging	0.99	Decrease	-0.53
C14408T	P323L	RdRp	Benign	0.018	Decrease	-0.8
C19220T	A394V	Exon	Benign	0.005	Decrease	-0.17
C21846T	T95I	Spike	Probably Damaging	0.999	Decrease	-1.8
G21987A	G142D	Spike	Benign	0.061	Decrease	-1.17
G22022A	E154K	Spike	NG	NG	Decrease	-1.4
A22034G	R158G	Spike	NG	NG	Decrease	-2.63
G23012C	E484Q	Spike	Possibly Damaging	0.881	Decrease	-0.48
G23012A	E484K	Spike	Possibly Damaging	0.601	Decrease	-0.85
A23403G	D614G	Spike	Benign	0.004	Decrease	-1.94
C23604A	P681H	Spike	NG	NG	Decrease	-0.92
C23604G	P681R	Spike	NG	NG	Decrease	-0.79
G24410A	D950N	Spike	Benign	0.34	Increase	0.15
A24775T	Q1071H	Spike	Probably Damaging	0.997	Decrease	-1.19
C25469T	S26L	ORF3a	Benign	0.017	Increase	0.92
G25563T	Q57H	ORF3a	Probably Damaging	0.983	Decrease	-1.12
C26060T	T223I	ORF3a	Probably Damaging	0.998	Decrease	-0.07
T26767G	I82S	Membrane	Possibly Damaging	0.951	Decrease	-2
T26767C	I82T	Membrane	Possibly Damaging	0.889	Decrease	-2.41
C27874T	T40I	ORF7b	NG	NG	Decrease	-0.22
A28249T	D119V	ORF8	Possibly Damaging	0.541	Decrease	-0.63
C28253A	F120L	ORF8	Probably Damaging	0.988	Decrease	-2.95
G28280T	D3Y	Nucleocapsid	Probably Damaging	1	Increase	0.22
G28280C	D3H	Nucleocapsid	Probably Damaging	1	Increase	0.34
C28311T	P13L	Nucleocapsid	Probably Damaging	1	Increase	0.11
C28854T	S194L	Nucleocapsid	Probably Damaging	0.994	Increase	0.45
G28881A	R203K	Nucleocapsid	Probably Damaging	0.969	Decrease	-2.26
G28881T	R203M	Nucleocapsid	Probably Damaging	0.998	Decrease	-1.52
G28883C	G204R	Nucleocapsid	Probably Damaging	1	No Change	0
G28916T	G215C	Nucleocapsid	Probably Damaging	1	Decrease	-0.49
G29402T	D377Y	Nucleocapsid	Probably Damaging	1	Increase	0.51

Table 4

Characteristics of non-synonymous hotspot mutations for spatial analysis.

Change in Nucleotide	Change in Amino Acid	Mapped with Coding Regions	PolyPhen-2 Prediction	PolyPhen-2 Score	I-Mutant 2.0 Stability	I-Mutant 2.0 DDG
C884T	R27C	NSP2	Probably Damaging	1	Decrease	-0.35
C1191T	P129L	NSP2	Possibly Damaging	0.924	Decrease	-0.53
G1397A	V198I	NSP2	Benign	0.006	Increase	0.18
T1947C	V381A	NSP2	Benign	0.009	Decrease	-1.64
C5184T	P822L	NSP3	Benign	0.011	Decrease	-0.54
C6402T	P1228L	NSP3	Benign	0.001	Decrease	-0.46
G8653T	M33I	NSP4	Benign	0.002	Decrease	-0.73
G9053T	V167L	NSP4	Benign	0.406	Decrease	-2.14
G9389A	D279N	NSP4	Probably Damaging	0.999	Decrease	-1.26
C9891T	A446V	NSP4	Probably Damaging	0.999	Increase	0.64
G11083T	L37F	NSP6	Benign	0.027	Decrease	-0.05
A11201G	T77A	NSP6	Possibly Damaging	0.577	Decrease	-0.7
T11296G	F108L	NSP6	Benign	0.001	Decrease	-3.31
T11418C	V149A	NSP6	Possibly Damaging	0.865	Decrease	-3.43
C14408T	P323L	RdRp	Benign	0.018	Decrease	-0.8
G15451A	G671S	RdRp	Probably Damaging	1	Decrease	-0.29
C16466T	P77L	Helicase	Probably Damaging	1	Decrease	-1.03
C21618G	T19R	Spike	Benign	0.007	Decrease	-0.12
C21846T	T95I	Spike	Probably Damaging	0.999	Decrease	-1.8
G21987A	G142D	Spike	Benign	0.061	Decrease	-1.17
A22034G	R158G	Spike	NG	NG	Decrease	-2.63
T22917G	L452R	Spike	Benign	0.017	Decrease	-1.4
G23012C	E484Q	Spike	Possibly Damaging	0.881	Decrease	-0.48
A23403G	D614G	Spike	Benign	0.004	Decrease	-1.94
C23604A	P681H	Spike	NG	NG	Decrease	-0.92
C23604G	P681R	Spike	NG	NG	Decrease	-0.79
G24410A	D950N	Spike	Benign	0.34	Increase	0.15
C25469T	S26L	ORF3a	Benign	0.017	Increase	0.92
G25563T	Q57H	ORF3a	Probably Damaging	0.983	Decrease	-1.12
C26060T	T223I	ORF3a	Probably Damaging	0.998	Decrease	-0.07
T26767G	I82S	Membrane	Possibly Damaging	0.951	Decrease	-2
T26767C	I82T	Membrane	Possibly Damaging	0.889	Decrease	-2.41
T27638C	V82A	ORF7a	Possibly Damaging	0.732	Decrease	-2.18
C27752T	T120I	ORF7a	Possibly Damaging	0.915	Decrease	-0.26
C27874T	T40I	ORF7b	NG	NG	Decrease	-0.22
A28249T	D119V	ORF8	Possibly Damaging	0.541	Decrease	-0.63
C28253A	F120L	ORF8	Probably Damaging	0.988	Decrease	-2.95
G28280C	D3H	Nucleocapsid	Probably Damaging	1	Increase	0.34
A28281T	D3V	Nucleocapsid	Probably Damaging	1	Decrease	-0.22
A28461G	D63G	Nucleocapsid	Benign	0	Decrease	-0.57
C28854T	S194L	Nucleocapsid	Probably Damaging	0.994	Increase	0.45
G28881A	R203K	Nucleocapsid	Probably Damaging	0.969	Decrease	-2.26
G28881T	R203M	Nucleocapsid	Probably Damaging	0.998	Decrease	-1.52
G28883C	G204R	Nucleocapsid	Probably Damaging	1	No Change	0
G28916T	G215C	Nucleocapsid	Probably Damaging	1	Decrease	-0.49
G29402T	D377Y	Nucleocapsid	Probably Damaging	1	Increase	0.51

Stability is another parameter that is crucial for judging the functional and structural activity of a protein. Protein stability dictates the conformational structure of the protein, thereby determining its function. Any change in protein stability may cause misfolding, degradation or aberrant conglomeration of proteins. In I-Mutant 2.0, the change in protein stability is predicted using the free energy change value (DDG). A zero or negative value of DDG indicates that the stability of a protein is decreasing. The results from I-Mutant 2.0 show that, of the 27 and 24 unique deleterious or damaging changes for temporal and spatial analysis respectively, 21 changes in each case decrease the stability of the protein structures. The mutations common to both categories are T77A and V149A in NSP6, T95I and E484Q in Spike, Q57H and T223I in ORF3a, I82S and I82T in Membrane, D119V and F120L in ORF8, and R203K, R203M and G215C in Nucleocapsid. It is to be noted that, apart from these mutations, other important mutations recognised by virologists in variants of concern such as Alpha, Beta and Delta are L452R, E484K, D614G, P681H and P681R in Spike. Furthermore, the entropy changes of the hotspot mutations for the Alpha, Beta and Delta variants are shown in Fig 9(a)–9(c) respectively. For example, the hotspot mutation E484K in the Alpha variant in Fig 9(a), which was dominant during February-April 2021, declined over the following months. D614G, a hotspot mutation common to all the variants, has also declined over time. Moreover, L452R and P681R, which are part of the Delta variant, are also two of the hotspot mutations identified by the analysis. The Delta variant was responsible for the catastrophic second wave in India. Fig 10(a) and 10(b) show the confirmed and deceased cases in India till 31st October 2021. As can be seen from both figures, the western part of India has a very high number of confirmed and deceased cases, which can be attributed to the Delta variant. As shown in Table 2, Maharashtra, which lies in the western part of India, has both of the aforementioned mutations identified as hotspots. All these figures are taken from https://www.covid19india.org/.
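
The selection of mutations that are both damaging and unstable, and that are common to the temporal and spatial analyses, can be sketched as a filter followed by a set intersection. The rows below are a small subset copied from Tables 3 and 4. The `Hotspot` type and the function name are hypothetical. The sketch treats only strictly negative DDG as a decrease in stability, matching the "No Change" label that the tables give to a DDG of 0. That is an assumption of this example, since the text above also groups zero with decreasing stability.

```python
from typing import NamedTuple


class Hotspot(NamedTuple):
    protein: str
    change: str
    polyphen: str  # PolyPhen-2 prediction label as printed in the tables
    ddg: float     # I-Mutant 2.0 DDG value


def damaging_and_unstable(rows):
    """Keep (protein, change) pairs labelled damaging with a negative DDG."""
    return {
        (r.protein, r.change)
        for r in rows
        if r.polyphen.endswith("Damaging") and r.ddg < 0
    }


# Illustrative subsets of Table 3 (temporal) and Table 4 (spatial).
temporal_rows = [
    Hotspot("Spike", "T95I", "Probably Damaging", -1.8),
    Hotspot("Spike", "E484K", "Possibly Damaging", -0.85),
    Hotspot("Spike", "D614G", "Benign", -1.94),
    Hotspot("Nucleocapsid", "D3H", "Probably Damaging", 0.34),
    Hotspot("Nucleocapsid", "R203K", "Probably Damaging", -2.26),
    Hotspot("Nucleocapsid", "G204R", "Probably Damaging", 0.0),
]
spatial_rows = [
    Hotspot("Helicase", "P77L", "Probably Damaging", -1.03),
    Hotspot("Spike", "T95I", "Probably Damaging", -1.8),
    Hotspot("Spike", "D614G", "Benign", -1.94),
    Hotspot("Nucleocapsid", "D3H", "Probably Damaging", 0.34),
    Hotspot("Nucleocapsid", "R203K", "Probably Damaging", -2.26),
    Hotspot("Nucleocapsid", "G204R", "Probably Damaging", 0.0),
]

common = damaging_and_unstable(temporal_rows) & damaging_and_unstable(spatial_rows)
print(sorted(common))
```

In this subset, D614G is dropped because it is benign, D3H because its DDG is positive, and E484K and P77L because each appears in only one of the two analyses.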

Fig 9

Month wise evolution of (a) Alpha (B.1.1.7) (b) Beta (B.1.351) and (c) Delta (B.1.617.2) variants for non-synonymous hotspot mutations.

Fig 10

Illustration of (a) Confirmed and (b) Deceased cases of India to show the effects of SARS-CoV-2 in the different regions of the country.


Conclusion

As the second wave of the COVID-19 pandemic hit India hard, understanding the evolution of the SARS-CoV-2 virus is crucial. In this regard, temporal (month-wise) and spatial (state-wise) analyses are carried out for 17271 aligned Indian sequences to identify the top 10 hotspot mutation points in the coding regions based on entropy for each month as well as for each state. Additionally, to judge the functional characteristics of all the non-synonymous hotspot mutations, their changes in proteins are evaluated as biological functions from the sequences using PolyPhen-2, while I-Mutant 2.0 evaluates their structural stability. As a result, for both temporal and spatial analysis, the common damaging and unstable mutations are T77A and V149A in NSP6, T95I and E484Q in Spike, Q57H and T223I in ORF3a, I82S and I82T in Membrane, D119V and F120L in ORF8, and R203K, R203M and G215C in Nucleocapsid. The effects of the characteristics of these hotspot mutations on human hosts can also be investigated with the help of virologists. The authors are working in this direction as well.

This file contains 4 supplementary tables named as S1-S4.

27 Aug 2021

PONE-D-21-17506
Phylogenetic Analysis of 5734 Indian SARS-CoV-2 Genomes to Identify Temporal and Spatial Hotspot Mutations
PLOS ONE

Dear Dr. Ghosh,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Although the manuscript is well-prepared and timely, some important concerns need to be addressed. The authors should account for writing the manuscript clearly and provide appropriate discussion.

Please submit your revised manuscript by October 26, 2021. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols . We look forward to receiving your revised manuscript. Kind regards, Arunachalam Ramaiah, PhD Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. 
The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: No Reviewer #3: Partly ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: No Reviewer #3: I Don't Know ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: No ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: SARS-CoV-2 is still spreading around the world very rapidly with a high infection rate that could become a global pandemic. 
In order for researchers, including us, to design a more effective vaccine, we are analyzing the evolving virus strains. You have performed multiple sequence alignments of 5734 sequences of SARS-CoV-2 using MAFFT and phylogenetic analysis using Nextstrain. Then, we identified several SNPs and point mutations. You identified the top 10 hotspot mutation points in the coding region. As a result, 130 hotspot mutations were identified in the temporal analysis and 250 in the spatial analysis. Subsequently, 32 temporally unique and 63 spatially unique hotspot mutations were identified, respectively. In addition, you have identified 21 point mutations in the time series analysis. For example, A97V in RdRp, L126F in NSP16, Q57H in ORF3a, and R203K, R203M, and G204R in Nucleocapsid. You also reaffirmed that the mutations of concern are E484Q and E484K in Spike. minor issues I could not read Tables 1 through 3 because the text was too small. Therefore, you should replace them with complete and readable tables in the text. Please move this table to the supplement. I believe that your paper needs to be released to the world as soon as possible. Reviewer #2: This paper analyzes GISAID data to identify mutational hotspots within SARS-CoV-2 genomes sequenced in India. This analysis identifies several mutations that appear to change over time or vary between regions within India. The authors claim that these locations are mutational hotspots but do not provide compelling evidence to support this. Specific comments: 1. Analysis of mutational hotspots is confounded by the competition between variants. Specifically, when one variant displaces another in a state or over time this will appear to show an enrichment for mutations associated with this variant even if the location of these mutations is not functionally important. At least for the positions identified as mutation “hotspots”, the authors should test whether they are changing in frequency within specific lineages. 
As an example, S:E484Q within the Delta lineage was presumed to be particularly problematic since it resembles the S:E484K mutation found in the Beta variant but it has since declined in frequency within the Delta variant. 2. Please avoid the use of the term “double mutant” as it is scientifically misleading. Please refer to the strains by either their PANGO lineage (e.g. B.1.617.2) or by their WHO designation (e.g. Delta variant). 3. Figure captions need more description. 4. Variants refers to the combination of many mutations rather than any specific mutation. E.g. Page 2, line 93 is incorrect since E484Q is not a variant. Please fix terminology throughout. 5. The protein structure model shown for nucleocapsid indicates a mostly unstructured architecture, but this is misleading. The N-terminal and C-terminal domain structures have been solved by multiple research groups and are known to be well ordered. Please update with a revised structure. 6. The use of PROVEAN and similar tools to detect “damaging” mutations is not explained well and is potentially misleading. These tools were designed to detect the impact of mutations of human proteins rather than viral proteins with the assumption that major changes to human proteins are likely to be deleterious. This cannot be assumed for emerging viruses because they are under a rapidly changing selection conditions and mutations identified by PROVEAN might be beneficial for the virus to avoid immunity or even to enhance function. This needs to be discussed more clearly. It is also unclear how this helps to identify a location as a potential mutation hotspot. Reviewer #3: Comments to the Author: The manuscript give a meaningful view point to the analysis of evolving virus strains of SARS-CoV-2, but the paper is not clearly written. Major points: The paper from introduction to discussion should be simplified; it’s too long and too verbose. 
The manuscript did not give a meaningful view about what authors do this work and what they get conclusion. I recommend author to revise the manuscript as clear as possible.

Abstract: I strongly recommend author to re-write the abstract and give a clear abstract.

Introduction: It’s too verbose. It present too much description to others research. The author may likely just put others researches together rather than summary and conclude their studies.

Line ...: 300K should be replaced using a formal description, e.g 300,000. The problem is through the entire manuscript.
Line 10: The prevalent variant in South African is B.1.351, so “501Y. V2” should change to B.1.351.
Line 11: Japanese should be Japan. Brazilian should be Brazil.
Line 11: E484K is not a lineage, please clarify update the mainly variants name.
Line 24: In [8] and In [9] should be replace as more reasonable description, It can be change by author name or change to another description. This problem is present through the whole paper.
Line 23-25: what this sentence relationship with previous viewpoint?
Line 26-31: The author would like to
Line 31-32: what this sentence mean “thereby indicating potential impacts on the ongoing development of various COVID-19 diagnosis and cure”? and what this sentence relationship with these variants?
Line 36-37: previous have mentioned that “they have found Nucleocapsid to have the highest mutational changes in frequency”, the author can summary them together rather than describe it again and again.
Line 15-72: All citation view were list, please summary and clarify the paragraph.
Line 75: the method “multiple alignment using fast fourier transform (MAFFT)” could move to method rather introduction.
Line 77-94: The sentence about method should be move to “Material and Methods” part, the sentence about the result details should be move to Results part. The introduction just retain the summary of results and meaning of this paper.
Line 105: The citation is a lab? And the reference list 4 was not contain an author named Zhang.
Line 115: “MAFFT which is a progressive alignment technique is used as the multiple sequence alignment (MSA) tool”, please delete “which is a progressive alignment technique”. Please delete the “multiple sequence alignment. the abbreviation “MSA” could be explain for the first time, and then author could use the “MSA” only.

Method: It is too verbose, author do not need to explain advantage of every software or tools used. They have no relationship with your paper. Just give clear method.

Results: Result was not consists of describe what is figure and table, rather give a results and explain using Figure and Table. Please re-write the results clearly.

Line 184-186: what sentence “only coding regions are considered for identification of hotspot mutations as the non-coding regions exhibit high entropy values and can be misleading while selecting such mutation points as hotspot mutations”?

Discussion:

Line 21-26: The sentence “multiple sequence alignment of 5734 genomic sequences are carried out using MAFFT”. The author just needs to discuss the results rather than mention the method here.
Line 217-218: delete sentence “the details of which are already discussed in the Result Section”.

Conclusion: The author does not need to describe the method and results again, just give the summary and discovery of this paper.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

8 Nov 2021

Reviewer #1: SARS-CoV-2 is still spreading around the world very rapidly with a high infection rate that could become a global pandemic. In order for researchers, including us, to design a more effective vaccine, we are analyzing the evolving virus strains. You have performed multiple sequence alignments of 5734 sequences of SARS-CoV-2 using MAFFT and phylogenetic analysis using Nextstrain. Then, we identified several SNPs and point mutations. You identified the top 10 hotspot mutation points in the coding region. As a result, 130 hotspot mutations were identified in the temporal analysis and 250 in the spatial analysis. Subsequently, 32 temporally unique and 63 spatially unique hotspot mutations were identified, respectively. In addition, you have identified 21 point mutations in the time series analysis. For example, A97V in RdRp, L126F in NSP16, Q57H in ORF3a, and R203K, R203M, and G204R in Nucleocapsid. You also reaffirmed that the mutations of concern are E484Q and E484K in Spike.

minor issues 1.
I could not read Tables 1 through 3 because the text was too small. Therefore, you should replace them with complete and readable tables in the text. Please move this table to the supplement.

Answer: We would like to apologise for the inconvenience caused. According to the suggestion, the tables have been modified in the revised manuscript. However, since these are very important tables, we have kept them in the main paper for the revised manuscript.

I believe that your paper needs to be released to the world as soon as possible.

The authors would like to thank the reviewer for the very kind comments.

Reviewer #2: This paper analyzes GISAID data to identify mutational hotspots within SARS-CoV-2 genomes sequenced in India. This analysis identifies several mutations that appear to change over time or vary between regions within India. The authors claim that these locations are mutational hotspots but do not provide compelling evidence to support this.

Answer: Mutations like L452R and P681R which are part of the Delta variant are also two of the hotspot mutations as identified by the analysis. It is to be noted that Delta variant was responsible for the catastrophic 2nd wave in India. Figures 10 (a) and (b) in the revised manuscript show the plot of confirmed and deceased cases in India till 31st October 2021. As can be seen from both the figures, western part of India has a very high number of confirmed and deceased cases which can be attributed to the Delta variant. As is shown in Table 2, Maharashtra which lies in the western part of India has both of the aforementioned mutations identified as hotspots. Also, some mutational hotspots are part of the Alpha, Beta and Delta variants as shown in Figure 9 in the revised manuscript, thereby confirming that they are indeed qualified to be hotspot mutations. These facts are elaborately discussed in the revised manuscript as well.

Specific comments: 1.
Analysis of mutational hotspots is confounded by the competition between variants. Specifically, when one variant displaces another in a state or over time this will appear to show an enrichment for mutations associated with this variant even if the location of these mutations is not functionally important. At least for the positions identified as mutation “hotspots”, the authors should test whether they are changing in frequency within specific lineages. As an example, S:E484Q within the Delta lineage was presumed to be particularly problematic since it resembles the S:E484K mutation found in the Beta variant but it has since declined in frequency within the Delta variant. Answer: According to the suggestion of the reviewer, the entropy change in hotspot mutations in variants like Alpha, Beta and Delta are reported in Figure 9 in the revised manuscript in order to illustrate the point as mentioned in the comment. 2. Please avoid the use of the term “double mutant” as it is scientifically misleading. Please refer to the strains by either their PANGO lineage (e.g. B.1.617.2) or by their WHO designation (e.g. Delta variant). Answer: According to the suggestion of the reviewer, the changes have been made in the revised manuscript. 3. Figure captions need more description. Answer: According to the suggestion of the reviewer, more descriptions have been added to the figures in the revised manuscript. 4. Variants refers to the combination of many mutations rather than any specific mutation. E.g. Page 2, line 93 is incorrect since E484Q is not a variant. Please fix terminology throughout. Answer: According to the suggestion of the reviewer, the terminology has been fixed in the revised manuscript. 5. The protein structure model shown for nucleocapsid indicates a mostly unstructured architecture, but this is misleading. The N-terminal and C-terminal domain structures have been solved by multiple research groups and are known to be well ordered. 
Please update with a revised structure.

Answer: According to the suggestion of the reviewer, the structure of Nucleocapsid has been updated in the revised manuscript, and the N-terminal and C-terminal domains have been confirmed using the PDB entries 6M3M (range: 50-174) and 6YUN (range: 249-364), respectively. The following is the structure that has been used in the revised manuscript.

6. The use of PROVEAN and similar tools to detect “damaging” mutations is not explained well and is potentially misleading. These tools were designed to detect the impact of mutations of human proteins rather than viral proteins with the assumption that major changes to human proteins are likely to be deleterious. This cannot be assumed for emerging viruses because they are under a rapidly changing selection conditions and mutations identified by PROVEAN might be beneficial for the virus to avoid immunity or even to enhance function. This needs to be discussed more clearly. It is also unclear how this helps to identify a location as a potential mutation hotspot.

Answer: It is important to note that, in the case of proteins, ‘damaging’ mostly denotes instability. Generally, the term is used for human proteins. As a consequence, if a human protein is damaged because of mutations, then human protein-protein interactions may occur with high or low binding affinity. In the case of a virus, similar consequences may follow: if a virus protein is damaged because of mutations, it may interact with human proteins with similar binding affinity. As a result, the virus may acquire characteristics like transmissibility, escaping antibodies, etc. This is now clearly mentioned in the revised manuscript as well, in order to avoid confusion about the meaning of ‘damaging’ as concluded by PROVEAN and PolyPhen-2. Moreover, we agree that the effects of these characteristics on human hosts are a matter for further investigation.
Therefore, to draw a clear biological conclusion from the point of view of the host, the help of virologists is needed, and as future work we are moving in that direction. This is mentioned in the conclusion section of the revised manuscript. Please note that hotspot mutations are characterized by PROVEAN, PolyPhen-2 and I-Mutant 2.0 after their locations have been identified by Nextstrain. Therefore, there is no relation between the locations and the aforementioned tools. It is also to be noted that PROVEAN and PolyPhen-2 are developed on more or less the same background, so their results are analogous to each other. Therefore, to avoid redundancy, only the results from the well-known tool PolyPhen-2 are kept in the revised manuscript.

Reviewer #3: Comments to the Author: The manuscript give a meaningful view point to the analysis of evolving virus strains of SARS-CoV-2, but the paper is not clearly written.

Major points: The paper from introduction to discussion should be simplified; it’s too long and too verbose. The manuscript did not give a meaningful view about what authors do this work and what they get conclusion. I recommend author to revise the manuscript as clear as possible.

1. Abstract: I strongly recommend author to re-write the abstract and give a clear abstract.

Answer: According to the suggestion of the reviewer, the abstract has been rewritten in the revised manuscript.

2. Introduction: It’s too verbose. It present too much description to others research. The author may likely just put others researches together rather than summary and conclude their studies.

Answer: According to the suggestion of the reviewer, the Introduction has been modified in the revised manuscript.

Line ...: 300K should be replaced using a formal description, e.g 300,000. The problem is through the entire manuscript.

Answer: According to the suggestion of the reviewer, the changes have been made in the revised manuscript.
Line 10: The prevalent variant in South African is B.1.351, so “501Y. V2” should change to B.1.351.

Answer: According to the suggestion of the reviewer, the change has been made in the revised manuscript.

Line 11: Japanese should be Japan. Brazilian should be Brazil.

Answer: It is to be noted that in the revised manuscript, the variants of concern are given with the corresponding naming conventions declared by the W.H.O.

Line 11: E484K is not a lineage, please clarify update the mainly variants name.

Answer: According to the suggestion of the reviewer, the change has been made in the revised manuscript.

Line 24: In [8] and In [9] should be replace as more reasonable description, It can be change by author name or change to another description. This problem is present through the whole paper.

Answer: According to the suggestion of the reviewer, the changes have been made in the revised manuscript.

Line 23-25: what this sentence relationship with previous viewpoint?

Answer: It is to be noted that this sentence was written inadvertently. We deeply apologise for this. The required change has been made in the revised manuscript.

Line 26-31: The author would like to

Answer: It is not very clear which sentence the reviewer is referring to.

Line 31-32: what this sentence mean “thereby indicating potential impacts on the ongoing development of various COVID-19 diagnosis and cure”? and what this sentence relationship with these variants?

Answer: According to the suggestion of the reviewer, this sentence has been modified in the revised manuscript. The sentence indicates that Nucleocapsid cannot be a possible diagnostic target as it exhibits quite a high number of mutations. Thus, this may undermine the ongoing research targeting Nucleocapsid for COVID-19 diagnosis, vaccines, antibodies and small-molecule drugs.
Line 36-37: previous have mentioned that “they have found Nucleocapsid to have the highest mutational changes in frequency”, the author can summary them together rather than describe it again and again.

Answer: According to the suggestion of the reviewer, the changes have been made in the revised manuscript.

Line 15-72: All citation view were list, please summary and clarify the paragraph.

Answer: According to the suggestion of the reviewer, the change has been made in the revised manuscript.

Line 75: the method “multiple alignment using fast fourier transform (MAFFT)” could move to method rather introduction.

Answer: It is to be noted that only the method name has been mentioned in the Introduction, to give the readers an overview.

Line 77-94: The sentence about method should be move to “Material and Methods” part, the sentence about the result details should be move to Results part. The introduction just retain the summary of results and meaning of this paper.

Answer: According to the suggestion of the reviewer, the changes have been made in the revised manuscript.

Line 105: The citation is a lab? And the reference list 4 was not contain an author named Zhang.

Answer: It is to be noted that Zhang Lab is not a citation but a footnote to highlight the website from which the SARS-CoV-2 protein PDBs are collected. That is why reference [4] (which is cited in the last line of the first paragraph of the Introduction) does not contain an author named Zhang.

Line 115: “MAFFT which is a progressive alignment technique is used as the multiple sequence alignment (MSA) tool”, please delete “which is a progressive alignment technique”. Please delete the “multiple sequence alignment. the abbreviation “MSA” could be explain for the first time, and then author could use the “MSA” only.

Answer: According to the suggestion of the reviewer, the changes have been made in the revised manuscript.
Method: It is too verbose, author do not need to explain advantage of every software or tools used. They have no relationship with your paper. Just give clear method.

Answer: According to the suggestion of the reviewer, the changes have been made in the revised manuscript.

Results: Result was not consists of describe what is figure and table, rather give a results and explain using Figure and Table. Please re-write the results clearly.

Answer: According to the suggestion of the reviewer, the changes have been made in the revised manuscript to the best of our abilities and from the readers' point of view.

Line 184-186: what sentence “only coding regions are considered for identification of hotspot mutations as the non-coding regions exhibit high entropy values and can be misleading while selecting such mutation points as hotspot mutations”?

Answer: Non-coding regions do not produce any protein to bind with human proteins. Thus, they are not considered for hotspot mutations, as we have confined our research to the coding regions only. Moreover, in non-coding regions, the entropy value is high for almost all mutation points even if they may not be very important mutation points for SARS-CoV-2. Thus, instead of considering the mutation points in both coding and non-coding regions, we have only considered the mutation points in coding regions so that they exhibit the true characteristics of hotspot mutations.

Discussion: Line 21-26: The sentence “multiple sequence alignment of 5734 genomic sequences are carried out using MAFFT” . The author just needs to discuss the results rather than mention the method here.

Answer: According to the suggestion of the reviewer, the changes have been made in the revised manuscript. It is to be noted that instead of 5734 Indian SARS-CoV-2 genomes, the revised manuscript contains the analysis of 17271 such genomes, as the number of genomes has been updated over time.
We have conducted all the experiments again to provide the updated results in the revised manuscript.

Line 217-218: delete sentence “the details of which are already discussed in the Result Section”.

Answer: According to the suggestion of the reviewer, the changes have been made in the revised manuscript.

Conclusion: The author does not need to describe the method and results again, just give the summary and discovery of this paper.

Answer: According to the suggestion of the reviewer, the changes have been made in the revised manuscript.

Submitted filename: Response to Reviewers.docx

7 Mar 2022

Phylogenetic Analysis of 17271 Indian SARS-CoV-2 Genomes to Identify Temporal and Spatial Hotspot Mutations

PONE-D-21-17506R1

Dear Dr. Ghosh,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance.
Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Chandrabose Selvaraj, Ph.D.
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: All comments have been addressed
Reviewer #4: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes
Reviewer #4: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes
Reviewer #4: N/A

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available.
If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #3: Yes
Reviewer #4: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes
Reviewer #4: Yes

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: Author has addressed the issues that I mentioned. I believe that this paper needs to be released to the world as soon as possible.

Reviewer #4: The authors answered all questions from reviewers and made all changes to the manuscript, which can be accepted in this format.

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No
Reviewer #4: Yes: Fabrício Souza Campos

18 Mar 2022

PONE-D-21-17506R1

Phylogenetic Analysis of 17271 Indian SARS-CoV-2 Genomes to Identify Temporal and Spatial Hotspot Mutations

Dear Dr. Saha:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations!
Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Chandrabose Selvaraj
Academic Editor
PLOS ONE
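
The review correspondence above repeatedly refers to selecting hotspot mutations by entropy within coding regions only. The following is a minimal sketch of that idea, not the authors' Nextstrain pipeline: it assumes Shannon entropy over nucleotide frequencies at each aligned position, and the function names, the toy alignment and the coding range are hypothetical illustrations.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; gaps and ambiguous bases are ignored."""
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    total = len(bases)
    counts = Counter(bases)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def top_hotspots(aligned, coding_ranges, k=10):
    """Return the k variable positions (1-based) with the highest entropy.

    aligned       -- equal-length aligned sequences (e.g. an MSA read into memory)
    coding_ranges -- (start, end) pairs, 1-based inclusive; positions outside are skipped
    """
    length = len(aligned[0])
    scored = []
    for start, end in coding_ranges:
        for pos in range(start, min(end, length) + 1):
            column = "".join(seq[pos - 1] for seq in aligned)
            h = column_entropy(column)
            if h > 0:  # keep only positions that actually vary
                scored.append((pos, h))
    # Highest entropy first; ties broken by genomic position.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Hypothetical toy alignment: positions 2-5 stand in for a coding region.
toy_alignment = ["ACGTAC", "ACGTAC", "ATGTAC", "ATGAAC"]
print(top_hotspots(toy_alignment, [(2, 5)], k=2))
```

Restricting the scan to `coding_ranges` mirrors the authors' stated choice to exclude non-coding positions; running the same function separately on sequences grouped by lineage, month or state would be one way to examine the within-lineage, temporal and spatial questions raised by Reviewer #2.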

24 in total

1. Could mutations of SARS-CoV-2 suppress diagnostic detection?

Authors: Carl A Ascoli
Journal: Nat Biotechnol Date: 2021-03 Impact factor: 54.908

2. Nextstrain: real-time tracking of pathogen evolution.

Authors: James Hadfield; Colin Megill; Sidney M Bell; John Huddleston; Barney Potter; Charlton Callender; Pavel Sagulenko; Trevor Bedford; Richard A Neher
Journal: Bioinformatics Date: 2018-12-01 Impact factor: 6.931

3. Analysis of Indian SARS-CoV-2 Genomes Reveals Prevalence of D614G Mutation in Spike Protein Predicting an Increase in Interaction With TMPRSS2 and Virus Infectivity.

Authors: Sunil Raghav; Arup Ghosh; Jyotirmayee Turuk; Sugandh Kumar; Atimukta Jha; Swati Madhulika; Manasi Priyadarshini; Viplov K Biswas; P Sushree Shyamli; Bharati Singh; Neha Singh; Deepika Singh; Ankita Datey; Kiran Avula; Shuchi Smita; Jyotsnamayee Sabat; Debdutta Bhattacharya; Jaya Singh Kshatri; Dileep Vasudevan; Amol Suryawanshi; Rupesh Dash; Shantibhushan Senapati; Tushar K Beuria; Rajeeb Swain; Soma Chattopadhyay; Gulam Hussain Syed; Anshuman Dixit; Punit Prasad; Sanghamitra Pati; Ajay Parida
Journal: Front Microbiol Date: 2020-11-23 Impact factor: 5.640

4. South Africa responds to new SARS-CoV-2 variant.

Authors: Munyaradzi Makoni
Journal: Lancet Date: 2021-01-23 Impact factor: 79.321

5. CovMT: an interactive SARS-CoV-2 mutation tracker, with a focus on critical variants.

Authors: Intikhab Alam; Aleksandar Radovanovic; Roberto Incitti; Allan A Kamau; Mohammed Alarawi; Esam I Azhar; Takashi Gojobori
Journal: Lancet Infect Dis Date: 2021-02-08 Impact factor: 25.071

6. Comprehensive analysis of genomic diversity of SARS-CoV-2 in different geographic regions of India: an endeavour to classify Indian SARS-CoV-2 strains on the basis of co-existing mutations.

Authors: Rakesh Sarkar; Suvrotoa Mitra; Pritam Chandra; Priyanka Saha; Anindita Banerjee; Shanta Dutta; Mamta Chawla-Sarkar
Journal: Arch Virol Date: 2021-01-19 Impact factor: 2.574

10. Unification and extensive diversification of M/Orf3-related ion channel proteins in coronaviruses and other nidoviruses.

Authors: Yongjun Tan; Theresa Schneider; Prakash K Shukla; Mahesh B Chandrasekharan; L Aravind; Dapeng Zhang
Journal: Virus Evol Date: 2021-02-16