Chayan Roy, Santi M Mandal, Suresh K Mondal, Shriparna Mukherjee, Tarunendu Mapder, Wriddhiman Ghosh, Ranadhir Chakraborty.
Abstract
To understand SARS-CoV-2 microevolution, this study explored the genome-wide frequency, gene-wise distribution, and molecular nature of all point-mutations detected across its 71,703 RNA-genomes deposited in GISAID till 21 August 2020. Globally, nsp1/nsp2 and orf7a/orf3a were the most mutation-ridden non-structural and structural genes, respectively. Phylogeny of 4618 spatiotemporally-representative genomes revealed that entities belonging to the early lineages are mostly spread over Asian countries, including India, whereas the recently-derived lineages are more globally distributed. Of the total 20,163 instances of polymorphism detected across global genomes, 12,594 and 7569 involved transitions and transversions, predominated by cytidine-to-uridine and guanosine-to-uridine conversions, respectively. Positive selection of nonsynonymous mutations (dN/dS >1) in most of the structural, but not the non-structural, genes indicated that SARS-CoV-2 has already harmonized its replication/transcription machineries with the host metabolism, while it is still redefining virulence/transmissibility strategies at the molecular level. Mechanistic bases and evolutionary/pathogenicity-related implications are discussed for the predominant mutation-types.
Keywords: Genome-wide mutations; Microevolution; Nonsynonymous and synonymous mutations; SARS-CoV-2; Transition; Transversion
Year: 2020 PMID: 33161087 PMCID: PMC7644180 DOI: 10.1016/j.ygeno.2020.11.003
Source DB: PubMed Journal: Genomics ISSN: 0888-7543 Impact factor: 5.736
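The transition/transversion split quoted in the abstract can be illustrated with a minimal sketch. It assumes only the standard definition (a transition swaps purine for purine or pyrimidine for pyrimidine; anything else is a transversion) and the two totals stated in the abstract. The function name `classify` is illustrative and not taken from the paper.

```python
# Minimal sketch: classify a single-base RNA substitution and
# reproduce the totals quoted in the abstract.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def classify(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a point substitution."""
    bases = {ref, alt}
    if ref == alt or not bases <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid RNA point substitution: {ref}->{alt}")
    # Same chemical class on both sides means a transition.
    same_class = bases <= PURINES or bases <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Totals as stated in the abstract.
n_transitions = 12_594
n_transversions = 7_569
total = n_transitions + n_transversions
ts_tv_ratio = n_transitions / n_transversions

print(classify("C", "U"))     # predominant transition type in the abstract
print(classify("G", "U"))     # predominant transversion type in the abstract
print(total)                  # 20163, matching the reported total
print(round(ts_tv_ratio, 2))  # 1.66
```

The two counts sum to the 20,163 polymorphisms reported, giving roughly 1.66 transitions per transversion.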
Fig. 1. Radial trees representing the phylogenetic relationships among the different SARS-CoV-2 genomes sequenced till 21 August 2020. (A-D) show the phylogeny reconstructed from 4618 global sequences extracted from the universal dataset of 71,703 complete whole-genomes. (A) identifies and labels the clades based on the dynamic clade nomenclature system PANGOLIN [23]. This convention currently defines 62 evolved lineages based on shared mutations, of which 10 initially-described lineages (old Nextstrain clades) are shown. (B) identifies and labels the clades based on Year-Letter naming as per the nomenclature system proposed by Hodcroft et al. (https://nextstrain.org/blog/2020-06-02-SARSCoV2-clade-naming). (C) identifies and labels the clades based on the nomenclature system proposed by Tang et al. (https://academic.oup.com/nsr/article/7/6/1012/5775463), which is also followed by GISAID. (D) labels the entities analyzed based on the geographical region (continent) from which the sequences were obtained. (E-F) show the phylogeny based on 1148 Indian and 4630 global sequences extracted from the universal dataset of 71,703 complete whole-genomes. (E) shows only the Indian sequences, and identifies and labels the clades based on the Year-Letter nomenclature system. (F) also shows only the Indian sequences, and identifies and labels the clades based on the GISAID nomenclature system.
Fig. 2. Gene-wise localization of all the transitions and transversions detected in the 71,703 SARS-CoV-2 genomes analyzed (the graphics are based on the data given in Supplementary File 1, Table S1). Probability density plots (showing the distributions of the mutation-types) are given for all the individual genes in their respective lower panels. Nucleotide positions (with reference to the 5′ to 3′ sequence of NC_045512.2, the earliest-sequenced SARS-CoV-2 genome) covered by each gene are plotted on the X axis. Multiple mutation-types, when detected at a single nucleotide-position, are indicated as multi-color (stacked) vertical bars.
Fig. 3. Gene-wise representation of the synonymous, missense, or stop-codon-generating nature of all point mutations detected in the 71,703 SARS-CoV-2 genomes analyzed (graphics based on the data given in Supplementary File 1, Table S1). Probability density plots (showing the distributions of the mutation-types) are given for all the individual genes in their respective lower panels. Nucleotide positions (with reference to the 5′ to 3′ sequence of NC_045512.2, the earliest-sequenced SARS-CoV-2 genome) covered by each gene are plotted on the X axis. Multiple mutation-types, when detected at a single nucleotide-position, are indicated as multi-color (stacked) vertical bars.
Locus-wise distribution of the total 20,163 instances of polymorphism detected in the SARS-CoV-2 pan-genome based on 71,703 complete whole-genomes sequenced globally until 21 August 2020.
| Locus (length in bp) | Ts: A➔G | Ts: G➔A | Ts: C➔U | Ts: U➔C | ΣTs | Tv: A➔U | Tv: U➔A | Tv: C➔A | Tv: A➔C | Tv: C➔G | Tv: G➔C | Tv: G➔U | Tv: U➔G | ΣTv | Σ (Ts + Tv) | Point mutation frequency | No. of missense mutations | No. of synonymous mutations | dN/dS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5′ UTR (265) | 31 | 33 | 47 | 35 | 146 | 21 | 17 | 15 | 11 | 8 | 10 | 30 | 10 | 122 | 268 | 1.41 × 10⁻⁵ | NA | NA | NA |
| | 57 | 79 | 81 | 68 | 285 | 20 | 25 | 14 | 7 | 4 | 9 | 54 | 15 | 148 | 433 | 1.12 × 10⁻⁵ | 271 | 155 | 0.7398 |
| | 253 | 219 | 294 | 215 | 981 | 52 | 67 | 75 | 82 | 7 | 12 | 156 | 52 | 503 | 1484 | 1.08 × 10⁻⁵ | 996 | 477 | 0.9479 |
| | 707 | 477 | 718 | 632 | 2534 | 154 | 169 | 175 | 200 | 34 | 53 | 388 | 124 | 1297 | 3831 | 9.15 × 10⁻⁶ | 2448 | 1351 | 0.5803 |
| | 146 | 107 | 191 | 178 | 622 | 36 | 45 | 40 | 19 | 4 | 16 | 69 | 37 | 266 | 888 | 8.25 × 10⁻⁶ | 521 | 360 | 0.5126 |
| | 86 | 52 | 112 | 97 | 347 | 16 | 24 | 21 | 23 | 1 | 6 | 50 | 17 | 158 | 505 | 7.67 × 10⁻⁶ | 310 | 190 | 0.6417 |
| | 83 | 67 | 103 | 104 | 357 | 23 | 28 | 24 | 13 | 7 | 15 | 65 | 24 | 199 | 556 | 8.91 × 10⁻⁶ | 337 | 210 | 0.7000 |
| | 29 | 16 | 35 | 23 | 103 | 4 | 8 | 8 | 7 | 3 | 2 | 14 | 8 | 54 | 157 | 8.79 × 10⁻⁶ | 86 | 70 | 0.4999 |
| | 58 | 45 | 71 | 59 | 233 | 12 | 12 | 8 | 15 | 2 | 4 | 33 | 8 | 94 | 327 | 7.67 × 10⁻⁶ | 187 | 132 | 0.4892 |
| | 34 | 30 | 48 | 26 | 138 | 5 | 7 | 13 | 5 | 0 | 2 | 17 | 10 | 59 | 197 | 8.10 × 10⁻⁶ | 108 | 88 | 0.4933 |
| | 31 | 19 | 51 | 45 | 146 | 8 | 7 | 12 | 8 | 3 | 5 | 18 | 8 | 69 | 215 | 7.19 × 10⁻⁶ | 119 | 94 | 0.4187 |
| | 2 | 4 | 5 | 3 | 14 | 1 | 2 | 0 | 1 | 2 | 1 | 3 | 0 | 10 | 24 | 8.58 × 10⁻⁶ | 19 | 5 | 1.132 |
| | 259 | 175 | 319 | 285 | 1038 | 55 | 65 | 54 | 53 | 13 | 20 | 219 | 44 | 523 | 1561 | 7.64 × 10⁻⁶ | 906 | 637 | 0.6057 |
| | 181 | 96 | 200 | 173 | 650 | 37 | 39 | 53 | 33 | 6 | 12 | 133 | 29 | 342 | 992 | 8.08 × 10⁻⁶ | 583 | 405 | 0.4500 |
| | 140 | 107 | 198 | 172 | 617 | 24 | 38 | 32 | 49 | 10 | 15 | 133 | 33 | 334 | 951 | 8.39 × 10⁻⁶ | 568 | 373 | 0.4024 |
| | 149 | 96 | 112 | 110 | 467 | 38 | 30 | 32 | 39 | 4 | 21 | 94 | 21 | 279 | 746 | 1.00 × 10⁻⁵ | 514 | 228 | 0.3937 |
| | 88 | 68 | 90 | 106 | 352 | 29 | 19 | 23 | 20 | 7 | 7 | 70 | 23 | 198 | 550 | 8.58 × 10⁻⁶ | 342 | 200 | 0.4554 |
| | 346 | 246 | 428 | 417 | 1437 | 173 | 107 | 141 | 117 | 47 | 122 | 309 | 103 | 1119 | 2556 | 9.32 × 10⁻⁶ | 1615 | 906 | 0.6193 |
| | 89 | 86 | 137 | 114 | 426 | 42 | 38 | 55 | 40 | 15 | 36 | 117 | 35 | 378 | 804 | 1.35 × 10⁻⁵ | 588 | 195 | 1.5013 |
| gap | 2 | 1 | 2 | 1 | 6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 8 | ND | NA | NA | NA |
| | 15 | 19 | 30 | 32 | 96 | 8 | 10 | 12 | 6 | 8 | 6 | 24 | 6 | 80 | 176 | 1.08 × 10⁻⁵ | 110 | 63 | 1.0206 |
| gap | 4 | 1 | 4 | 9 | 18 | 3 | 1 | 0 | 0 | 2 | 1 | 5 | 1 | 13 | 31 | ND | NA | NA | NA |
| | 50 | 40 | 82 | 60 | 232 | 22 | 14 | 21 | 11 | 7 | 17 | 55 | 18 | 165 | 397 | 8.28 × 10⁻⁶ | 209 | 183 | 0.6548 |
| gap | 1 | 1 | 0 | 0 | 2 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 3 | 5 | ND | NA | NA | NA |
| | 22 | 11 | 19 | 35 | 87 | 15 | 7 | 8 | 4 | 1 | 5 | 19 | 5 | 64 | 151 | 1.13 × 10⁻⁵ | 99 | 46 | 1.3944 |
| gap | 2 | 0 | 1 | 0 | 3 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 2 | 5 | ND | NA | NA | NA |
| | 44 | 30 | 65 | 54 | 193 | 23 | 22 | 20 | 17 | 12 | 13 | 47 | 12 | 166 | 359 | 1.37 × 10⁻⁵ | 241 | 95 | 1.1946 |
| gap | 15 | 11 | 22 | 25 | 73 | 6 | 10 | 1 | 4 | 0 | 6 | 15 | 6 | 48 | 121 | ND | NA | NA | NA |
| | 40 | 32 | 46 | 63 | 181 | 22 | 14 | 22 | 10 | 8 | 19 | 51 | 13 | 159 | 340 | 1.30 × 10⁻⁵ | 228 | 92 | 1.4522 |
| gap | 2 | 0 | 1 | 0 | 3 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 4 | 7 | ND | NA | NA | NA |
| | 160 | 142 | 215 | 102 | 619 | 92 | 33 | 65 | 52 | 30 | 58 | 169 | 24 | 523 | 1142 | 1.26 × 10⁻⁵ | 763 | 366 | 1.2633 |
| gap | 1 | 3 | 6 | 0 | 10 | 1 | 0 | 4 | 0 | 0 | 1 | 4 | 0 | 10 | 20 | ND | NA | NA | NA |
| | 10 | 8 | 16 | 14 | 48 | 6 | 3 | 1 | 3 | 2 | 3 | 9 | 2 | 29 | 77 | 9.18 × 10⁻⁶ | 53 | 19 | 1.2981 |
| 3′ UTR (229) | 37 | 29 | 34 | 30 | 130 | 19 | 15 | 17 | 12 | 9 | 21 | 43 | 13 | 149 | 279 | 1.70 × 10⁻⁵ | NA | NA | NA |
| Pan-genome (29903) | 3174 | 2350 | 3783 | 3287 | 12,594 | 969 | 877 | 967 | 865 | 256 | 520 | 2414 | 701 | 7569 | 20,163 | 9.4 × 10⁻⁶ | 12,221 | 6940 | NA |
ND = not determined.
NA = not applicable.
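The point mutation frequency column can be checked with a short sketch. The formula is an assumption inferred from the tabulated values, since the column header is truncated in this record and the paper's exact definition is not given here: total point mutations at a locus divided by the product of locus length and the number of genomes analyzed (71,703). Only the three rows whose locus name and length survive are used, and the function name is illustrative.

```python
# Sketch under an inferred assumption:
#   frequency = mutations / (locus length in bp * genomes analyzed)
N_GENOMES = 71_703


def point_mutation_frequency(n_mutations: int, locus_length_bp: int,
                             n_genomes: int = N_GENOMES) -> float:
    """Point mutations per nucleotide position per genome."""
    return n_mutations / (locus_length_bp * n_genomes)


# (total Ts + Tv, locus length in bp) copied from the table rows.
rows = {
    "5' UTR": (268, 265),
    "3' UTR": (279, 229),
    "Pan-genome": (20_163, 29_903),
}

for locus, (n_mut, length) in rows.items():
    print(f"{locus}: {point_mutation_frequency(n_mut, length):.2e}")
# 5' UTR: 1.41e-05
# 3' UTR: 1.70e-05
# Pan-genome: 9.40e-06
```

These values agree with the tabulated 1.41 × 10⁻⁵, 1.70 × 10⁻⁵, and 9.4 × 10⁻⁶, which supports the inferred formula but does not confirm it.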
dN = rate of missense (non-synonymous) mutation accumulation (ratio between the number of non-synonymous mutations and non-synonymous sites).
dS = rate of synonymous mutation accumulation (ratio between the number of synonymous mutations and synonymous sites).
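The dN and dS definitions above can be written out as a minimal sketch. The numbers of non-synonymous and synonymous sites per locus are not reported in this record, so the site counts in the example are hypothetical placeholders chosen only to show the arithmetic. The function name `dn_ds` is also illustrative.

```python
# Sketch of the ratio as defined in the footnotes:
#   dN = non-synonymous mutations / non-synonymous sites
#   dS = synonymous mutations / synonymous sites
def dn_ds(n_nonsyn: int, n_syn: int,
          nonsyn_sites: float, syn_sites: float) -> float:
    """Return dN/dS from mutation counts and site counts."""
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if n_syn == 0:
        raise ValueError("dS is zero, so dN/dS is undefined")
    dn = n_nonsyn / nonsyn_sites
    ds = n_syn / syn_sites
    return dn / ds


# HYPOTHETICAL inputs (not from the paper): a locus with 750 non-synonymous
# sites and 250 synonymous sites.
neutral = dn_ds(n_nonsyn=300, n_syn=100, nonsyn_sites=750, syn_sites=250)
elevated = dn_ds(n_nonsyn=450, n_syn=100, nonsyn_sites=750, syn_sites=250)
print(neutral)             # 1.0: both mutation classes accumulate at equal rates
print(round(elevated, 2))  # 1.5: non-synonymous changes over-represented
```

Because non-synonymous sites outnumber synonymous ones, a raw missense-to-synonymous count ratio above 1 does not by itself imply dN/dS > 1. The normalization by site counts is what makes the ratio interpretable.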
Structural domain-wise distribution of the major mutation-types detected across spike protein-encoding genes in 71,703 complete SARS-CoV-2 whole-genomes sequenced globally until 21 August 2020.
| Domains of SARS-CoV-2 spike protein (span in amino acid positions) | No. of mutations detected | No. of missense mutations | No. of synonymous mutations | No. of stops generated due to mutations | No. of C➔U transitions (NCU) | NCU resulting in non-synonymous mutations | No. of G➔U transversions (NGU) | NGU resulting in missense mutations |
|---|---|---|---|---|---|---|---|---|
| Upstream region | 3ᵃ | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Signal peptide (1–13) | 38 | 24 | 14 | 0 | 6 | 4 | 4 | 4 |
| N-terminal Domain or NTD (14–305) | 696 | 468 | 221 | 7 | 101 | 58 | 85 | 77 |
| Peptide linking NTD with Receptor Binding Domain or RBD (306–318) | 31 | 21 | 10 | 0 | 5 | 2 | 1 | 1 |
| RBD (319–541) | 375 | 208 | 157 | 10 | 61 | 24 | 33 | 29 |
| Peptide linking RBD with Fusion Peptide or FP (542–787) | 464 | 295 | 165 | 4 | 98 | 63 | 46 | 41 |
| FP (788–806) | 37 | 26 | 11 | 0 | 4 | 3 | 2 | 2 |
| Peptide linking FP with Heptapeptide Repeat Sequence or HR1 (807–911) | 227 | 140 | 85 | 2 | 41 | 23 | 28 | 25 |
| HR1 (912–984) | 130 | 78 | 51 | 1 | 24 | 11 | 17 | 17 |
| Peptide linking HR1 with Heptapeptide Repeat Sequence or HR2 (985–1162) | 315 | 192 | 118 | 5 | 51 | 32 | 46 | 41 |
| HR2 (1163–1213) | 101 | 72 | 29 | 0 | 14 | 6 | 14 | 14 |
| Transmembrane domain (1213–1237) | 50 | 32 | 18 | 0 | 5 | 1 | 15 | 14 |
| Cytoplasmic domain (1237–1273) | 87 | 59 | 27 | 1 | 17 | 7 | 18 | 17 |
| Splice region | 2ᵇ | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 2556 | 1615 | 906 | 30 | 428 | 234 | 309 | 282 |
ᵃ These three mutations have a non-coding effect.
ᵇ One of these two mutations involved a stop loss and the other a stop retention.
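As a worked reading of the spike-domain table, the sketch below computes what share of C➔U transitions and G➔U transversions changed the encoded amino acid. The counts are copied from the NTD, RBD, and Total rows. The dictionary layout and the helper name `share` are illustrative choices, not part of the paper.

```python
# Counts copied from the spike-domain table:
# (C->U total, C->U non-synonymous, G->U total, G->U missense)
spike_counts = {
    "NTD": (101, 58, 85, 77),
    "RBD": (61, 24, 33, 29),
    "Total": (428, 234, 309, 282),
}


def share(part: int, whole: int) -> float:
    """Fraction of `whole` represented by `part`, rounded to 3 decimals."""
    return round(part / whole, 3)


summary = {
    domain: {"C->U": share(cu_ns, cu), "G->U": share(gu_ms, gu)}
    for domain, (cu, cu_ns, gu, gu_ms) in spike_counts.items()
}

for domain, fractions in summary.items():
    print(domain, fractions)
# NTD {'C->U': 0.574, 'G->U': 0.906}
# RBD {'C->U': 0.393, 'G->U': 0.879}
# Total {'C->U': 0.547, 'G->U': 0.913}
```

Across the whole spike gene, about 55% of C➔U transitions were non-synonymous, against about 91% of G➔U transversions.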