Rezwanuzzaman Laskar, Safdar Ali.
Abstract
The mutational status of SARS-CoV-2 genomes from India, along with the impact of these mutations on proteins, was ascertained using multiple tools, including MEGA, Genome Detective, SIFT, PROVEAN and ws-SNPs&GO. Excluding gaps and ambiguous sequences, 493 variable sites (152 parsimony informative and 341 singleton) were observed. NSP3 had the highest incidence with 101 sites, followed by S protein (74), NSP12b (43) and ORF3a (31). The average number of mutations per sample was 2.56 for males and 2.88 for females. The non-uniform geographical distribution of mutations suggests that sequences in some regions are mutating faster than others. There were 281 mutations (198 Neutral and 83 Disease) affecting the amino acid sequence. NSP13 had the maximum of 14 Disease variants, followed by S protein and ORF3a with 13 each. Disease mutations made up a mere 11% in genomes from asymptomatic people but 38% in those from deceased patients, indicating a contribution of these mutations to the pathophysiology of SARS-CoV-2.
Keywords: Asymptomatic; Mutation; Protein; SARS-CoV-2
Year: 2021 PMID: 33549714 PMCID: PMC7860943 DOI: 10.1016/j.gene.2021.145470
Source DB: PubMed Journal: Gene ISSN: 0378-1119 Impact factor: 3.688
Tajima’s Neutrality Test.
| m | S | ps | Θ | π | D |
|---|---|---|---|---|---|
| 612 | 841 | 0.028124268 | 0.004021699 | 0.000412328 | −2.69529519 |
*m = number of sequences, n = total number of sites, S = number of segregating sites, ps = S/n, Θ = ps/a1, π = nucleotide diversity, D = Tajima test statistic.
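The definitions in the footnote can be checked against the reported row. The sketch below is a minimal illustration, not the authors' pipeline: it assumes the six values are m, S, ps, Θ, π and D in that order, recovers n from ps = S/n, and uses the standard Tajima (1989) constants (a2, b1, b2, c1, c2, e1, e2), which the footnote does not spell out.

```python
from math import sqrt

def tajima_d(m, S, k):
    """Tajima's D from m sequences, S segregating sites and
    k = mean number of pairwise differences (standard 1989 formulas)."""
    a1 = sum(1.0 / i for i in range(1, m))
    a2 = sum(1.0 / i**2 for i in range(1, m))
    b1 = (m + 1) / (3.0 * (m - 1))
    b2 = 2.0 * (m * m + m + 3) / (9.0 * m * (m - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (m + 2) / (a1 * m) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

# Values from the table row (column order is an assumption).
m, S, ps, pi = 612, 841, 0.028124268, 0.000412328

n = round(S / ps)                        # number of sites implied by ps = S/n
a1 = sum(1.0 / i for i in range(1, m))   # harmonic-type constant used in Θ = ps/a1
theta = ps / a1
d = tajima_d(m, S, pi * n)               # k = π · n converts per-site diversity to a count

print(n, theta, d)
```

Under these assumptions the implied n matches the length of the SARS-CoV-2 reference genome, and the recomputed Θ and D agree with the tabulated values to the precision shown.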
Fig. 1 Summary of variations observed in SARS-CoV-2 genomes from India. The nature of variations (Singleton/PI), type of mutation, genome localization and impact on protein (SNP, SNP-Silent, SNP-Stop, Extragenic) are represented along with their interlinking. The width of each connecting line represents number: the broader the line, the higher the count for that parameter.
Localization and mutations observed at the multi-variable (MV) sites.
| S No | Nucleotide Position | Category | Nucleotide Variant 1 | Amino Acid Variant 1 | Nucleotide Variant 2 | Amino Acid Variant 2 |
|---|---|---|---|---|---|---|
| 1 | 4893 | Pi | C > A | T725K | C > T | T725I |
| 2 | 5821 | SNP | A > G | L1034L | A > T | L1034F |
| 3 | 23,282 | SNP | G > C | D574H | G > T | D574Y |
| 4 | 23,593 | Pi | G > T | Q677H | G > C | Q677H |
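The last two rows lie in the S gene, and the mapping from nucleotide position to amino acid change can be sketched as follows. This is an illustrative sketch only: the S coding-sequence start (21563, reference NC_045512.2) is an assumption not stated in the source, and the reference codon CAG is inferred from the table (a Gln codon whose third base is G). The helper names `locate` and `mutate` are hypothetical.

```python
S_START = 21563  # assumed start of the S coding sequence (NC_045512.2)

# Subset of the standard genetic code needed for this example.
CODON_TO_AA = {"CAA": "Q", "CAG": "Q", "CAT": "H", "CAC": "H"}

def locate(pos, cds_start=S_START):
    """Return (codon number, 0-based offset within the codon) for a genome position."""
    offset = pos - cds_start
    return offset // 3 + 1, offset % 3

def mutate(codon, offset, alt):
    """Replace the base at `offset` in `codon` with `alt`."""
    return codon[:offset] + alt + codon[offset + 1:]

codon_no, offset = locate(23593)
ref_codon = "CAG"  # inferred: Gln codon with G at the mutated (third) position

for alt in ("T", "C"):
    new_codon = mutate(ref_codon, offset, alt)
    print(f"G>{alt}: {ref_codon}>{new_codon} gives Q{codon_no}{CODON_TO_AA[new_codon]}")
```

Both substitutions hit the third codon position and yield a His codon, which is why the two nucleotide variants at 23,593 share the same amino acid change. With the same assumed start, position 23,282 falls on the first base of codon 574.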
Maximum Composite Likelihood Estimate of Nucleotide Substitution.
| Nt / Substituted | A | T | C | G |
|---|---|---|---|---|
| A | – | | | |
| T | | – | | |
| C | | | – | |
| G | | | | – |
* Rates of transitional substitutions are shown in bold and those of transversional substitutions in italics.
Fig. 2 Prevalence and composition of different nucleotides across reference and substituted positions in SARS-CoV-2 genomes from India.
Fig. 3 Age- and gender-wise distribution of mutations in SARS-CoV-2 genomes from India. Number of male/female samples and the sum of mutation incidence therein, by age group.
Fig. 4 Average number of mutations per sample in different age groups of males/females and the differences therein.
Fig. 5 Number of samples and corresponding sum of mutation incidence of SARS-CoV-2 across different states of India.
Distribution of SNPs across different proteins.
| Gene/Protein | No of Variable Sites | Sum of Mutations Incidence | Mutations affecting Protein (N = N2 + N3; D = D2 + D3) | Neutral by Three (N3) | Disease by One/Neutral by Two (N2) | Disease by Two/Neutral by One (D2) | Disease by Three (D3) |
|---|---|---|---|---|---|---|---|
| | | | 127 | | | | |
| NSP1 | 14 | 19 | 7 | H81Y, G137C, | R24C, V38F | R124C | |
| NSP2 | 29 | 46 | 15 | Y196H, L204F, K338R, | G192D, P626T, I671T, V710F | T592I | |
| NSP3 | 101 | 284 | 63 | A872T, T882I, Y925C, E940D, P971S, G989V, S1029I, P1054L, M1083I, H1141Y, A1268T, S1534I, | D930G, D1036G, | A1306V | G1069E |
| NSP4 | 26 | 109 | 14 | W2769L, M2796I, H2831Y, A2994V, F3031Y, D3042N, A3143V, L3161I | T2777I | L2781P, F3071Y | T3223I |
| NSP5 | 20 | 41 | 9 | T3453A, Q3390R, P3395S | N3405L | S3386P | L3338F |
| NSP6 | 17 | 30 | 7 | Q3826H | I3731T, P3613L, D3681N, I3835T, | I3731F | |
| NSP7 | 2 | 5 | 0 | | | | |
| NSP8 | 11 | 26 | 6 | K4081R | E3962K, K4069T | M4032S, L4033F | R3993C |
| NSP9 | 6 | 11 | 3 | V4181I, V4242I | T4249I | | |
| NSP10 | 4 | 4 | 2 | A4271V | | | |
| NSP12a | 1 | 3 | 1 | S4398L | | | |
| | | | 60 | | | | |
| NSP12b | 43 | 84 | 22 | A4487V, I4593L, S4621G, | K4451N, R4565H, M4588I, E4670D, L4721I | K4483N, D4532G, A4577V, D4676Y, S4710T, M5148I, | D5076G |
| NSP13 | 30 | 49 | 21 | P5377S, S5490A, K5669R, M5798V, V5894L, I5899V, | A5770V | K5364R, E5492Q, | W5830C |
| NSP14 | 15 | 20 | 7 | A5926S, P5971L, M5974I, K6274N | L5952I, T5930A, S6180I | | |
| NSP15 | 8 | 17 | 5 | V6518L, A6533T | G6581D | D6491Y | |
| NSP16 | 7 | 12 | 5 | P6805S, L6909F, A6914S | K6958R | G6837C | |
| | | | 49 | L54F, N148Y, E156D, A243S, S255F, G261S, Q271R, T299I, T323I, | I434K, S494P, D574Y, A892V | T723I, F797C, L828P, T941K, | G857C |
| | | | 22 | V13L, G18V, S74A, V77F, T175I | L41F, S74F, S171L, T190I | I62T, L83F, T176I | I35T, L46F, |
| | | | 1 | V29S | | | |
| | | | 5 | D3G, A68S, H125Y | A69S, V70F | | |
| | | | 2 | I60V | D61L | | |
| | | | 2 | P45L | G38V | | |
| | | | 13 | S37L, L139F, A152S | P6T, P13L, G18V, G30V, S33I, D63N, D144Y, | R92S | |
| | | | 281 | | | | |
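The three unlabeled counts (127, 60 and 281) can be read as subtotals and a grand total of the per-protein counts of mutations affecting protein. The sketch below checks that reading; it is an assumption-based illustration, and the grouping of rows into blocks and the variable names are hypothetical, chosen only to follow the row order of the table.

```python
# Mutations affecting protein, taken row by row from the table.
first_block = {"NSP1": 7, "NSP2": 15, "NSP3": 63, "NSP4": 14, "NSP5": 9,
               "NSP6": 7, "NSP7": 0, "NSP8": 6, "NSP9": 3, "NSP10": 2,
               "NSP12a": 1}
second_block = {"NSP12b": 22, "NSP13": 21, "NSP14": 7, "NSP15": 5, "NSP16": 5}

# Rows whose gene/protein labels are missing in this copy of the table.
unlabeled_rows = [49, 22, 1, 5, 2, 2, 13]

subtotal_first = sum(first_block.values())
subtotal_second = sum(second_block.values())
grand_total = subtotal_first + subtotal_second + sum(unlabeled_rows)

print(subtotal_first, subtotal_second, grand_total)
```

The block sums reproduce the standalone values in the table, and the grand total equals the 281 protein-affecting mutations reported in the abstract.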
Fig. 6 Prevalence of variant sites across studied genomes. a) Most prevalent SNP and SNP_Silent across studied samples. Variants with incidence in more than ten genomes are represented. b) Samples with accumulated variations in genomes.
Fig. 7 Predicting the impact of mutations affecting amino acid sequence on proteins through multiple tools.
Predicting the impact of mutations on proteins from different sets of genomes.
| | Original congregation | Asymptomatic | Deceased |
|---|---|---|---|
| No of Samples | 611 | 30 | 15 |
| Samples with no mutation | 156 | 0 | 1 |
| Total Variants | 281 | 55 | 13 |

| Mutation Type | Protein Prediction by tools | Original congregation: No of variants | Original congregation: p value | Original congregation: Sum of Mutation Incidence | Asymptomatic: No of variants | Asymptomatic: p value | Asymptomatic: Sum of Mutation Incidence | Deceased: No of variants | Deceased: p value | Deceased: Sum of Mutation Incidence |
|---|---|---|---|---|---|---|---|---|---|---|
| Neutral | | 133 | 0.47 | 254 | 31 | 0.56 | 76 | 2 | 0.15 | 4 |
| Neutral | | 65 | 0.23 | 160 | 18 | 0.33 | 60 | 6 | 0.46 | 10 |
| Disease | | 41 | 0.15 | 187 | 2 | 0.04 | 4 | 1 | 0.08 | 2 |
| Disease | | 42 | 0.15 | 70 | 4 | 0.07 | 2 | 4 | 0.31 | 6 |
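The "p value" entries appear to be the share of each prediction class among the total variants of a genome set, and the 11% and 38% quoted in the abstract follow from the last two rows. The sketch below is a minimal check under that reading, not a description of the authors' method: it assumes the four data rows are two Neutral classes followed by two Disease classes, which is consistent with the 198 Neutral and 83 Disease totals in the abstract.

```python
# Number of variants per row, in table order, for each genome set.
counts = {
    "Original congregation": [133, 65, 41, 42],
    "Asymptomatic": [31, 18, 2, 4],
    "Deceased": [2, 6, 1, 4],
}

def shares(variants):
    """Share of each row among all variants of the set, rounded to two decimals."""
    total = sum(variants)
    return [round(v / total, 2) for v in variants]

def disease_share(variants):
    """Fraction of variants in the last two rows (assumed to be the Disease classes)."""
    return sum(variants[2:]) / sum(variants)

for name, variants in counts.items():
    print(name, shares(variants), f"{100 * disease_share(variants):.1f}% Disease")
```

The rounded shares reproduce the tabulated "p value" columns for all three sets, which supports reading them as proportions rather than significance levels.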