Alessandra Lo Presti, Giovanni Rezza, Paola Stefanelli.
Abstract
BACKGROUND: An outbreak of a febrile respiratory illness due to the newly discovered coronavirus, SARS-CoV-2, was initially detected in mid-December 2019 in the city of Wuhan, Hubei province (China). The virus then spread to most countries in the world. As an RNA virus, SARS-CoV-2 can acquire mutations, some of which may become fixed. The aim of this study was to evaluate the selective pressure acting on SARS-CoV-2 protein coding genes.
Keywords: Bioinformatics; Evolutionary biology; Glycosylation; Infectious disease; Microbiology; Mutation; Public health; Receptor binding domain; SARS-CoV-2; Selective pressure; Virology
Year: 2020 PMID: 32984566 PMCID: PMC7505600 DOI: 10.1016/j.heliyon.2020.e05001
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Selective pressure analysis on SARS-CoV-2 protein coding gene sub-sets.
| Sub-set | Positively selected sites (site ω > 1) | Negatively selected sites (site ω < 1) |
|---|---|---|
| nsp1 | ∖ | 65 (E); 83 (H) |
| nsp2 | 198 (V; I), 248 (S; G), 347 (K; C), 348 (S; V), 559 (I; V) | 287 (F); 488 (A); 565 (E) |
| nsp3 | 1454 (N; Y; D); 1507 (A; E), 1527 (A; V; E) | 106 (F); 152 (Q), 353 (T), 380 (Q), 432 (T), 561 (L), 995 (Q), 1047 (D), 1138 (K), 1303 (T), 1455 (S), 1456 (T), 1502 (A), 1544 (S), 1719 (P) |
| nsp4 | 33 (M; I) | 15 (L), 71 (F), 212 (V), 235 (V) |
| 3C-like proteinase | ∖ | 239 (Y) |
| nsp6 | 37 (L; F) | 222 (T), 289 (V) |
| nsp7 | ∖ | ∖ |
| nsp8 | ∖ | ∖ |
| nsp9 | ∖ | ∖ |
| nsp10 | ∖ | 128 (C) |
| nsp11 | ∖ | ∖ |
| nsp12 | 25 (G; Y); 323 (P; L); 644 (T; M) | 24 (T), 28 (T), 85 (T), 105 (R), 142 (L), 455 (Y), 591 (T), 643 (T), 896 (T) |
| helicase | 504 (P; L); 598 (A; S; V) | 337 (R); 521 (V); 547 (T), 553 (A) |
| 3′-to-5′ exonuclease | ∖ | 7 (L); 490 (E) |
| endoRNAse | ∖ | 73 (N), 127 (V), 216 (L) |
| 2′-O-ribose methyltransferase | ∖ | 4 (A); 36 (L), 138 (N), 163 (L) |
| S (surface glycoprotein) | 943 (S; P) | 348 (A); 669 (G); 681 (P); 795 (K); 853 (Q); 890 (A); 921 (K); 982 (S), 1044 (G), 1100 (T), 1166 (L) |
| ORF3a | 99 (A; S; V) | ∖ |
| E | ∖ | 63 (K) |
| M | ∖ | 69 (A) |
| ORF6 | ∖ | 61 (D) |
| ORF7a | ∖ | 69 (D); 70 (G); 92 (E) |
| ORF8 | 62 (V; L) | ∖ |
| N | 13 (P; L; S); 103 (D; Y) | 173 (A); 274 (F) |
| ORF10 | ∖ | 15 (S); 19 (C) |
Only the sites with a p-value < 0.1 (FEL, SLAC) and with a posterior probability >0.90 (FUBAR) were considered as candidates for selection and statistically supported.
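The thresholds in the note above can be read as a simple per-site filter. The following is a minimal sketch under stated assumptions: the `SiteResult` record, its field names and the example values are hypothetical (they are not results from the study), and the note does not say whether a site had to pass all three methods or only some of them, so that choice is exposed as the `require_all` parameter.

```python
from dataclasses import dataclass

P_VALUE_CUTOFF = 0.1      # FEL and SLAC threshold stated in the table note
POSTERIOR_CUTOFF = 0.90   # FUBAR threshold stated in the table note


@dataclass
class SiteResult:
    """Hypothetical per-site record; the field names are illustrative only."""
    position: int
    fel_p: float            # p-value reported by FEL
    slac_p: float           # p-value reported by SLAC
    fubar_posterior: float  # posterior probability reported by FUBAR


def is_supported(site: SiteResult, require_all: bool = True) -> bool:
    """Apply the stated cutoffs to one site.

    Assumption: with require_all=True every method must pass its cutoff;
    with require_all=False a single passing method is enough.
    """
    checks = [
        site.fel_p < P_VALUE_CUTOFF,
        site.slac_p < P_VALUE_CUTOFF,
        site.fubar_posterior > POSTERIOR_CUTOFF,
    ]
    return all(checks) if require_all else any(checks)


# Made-up values, used only to exercise the filter.
sites = [
    SiteResult(position=1, fel_p=0.03, slac_p=0.08, fubar_posterior=0.95),
    SiteResult(position=2, fel_p=0.03, slac_p=0.20, fubar_posterior=0.95),
    SiteResult(position=3, fel_p=0.50, slac_p=0.60, fubar_posterior=0.40),
]
supported = [s.position for s in sites if is_supported(s)]
```

Under the strict reading only the first made-up site passes; relaxing `require_all` would also admit the second one, which is why the agreement rule matters when comparing site lists.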
Results of the mutation analysis performed on the surface glycoprotein (S) sub-set.
| Amino acid position | Reference Accession Number and residue identified | Accession Id and mutation identified |
|---|---|---|
| 27 | A | EPI_ISL_419885: V |
| 29 | T | EPI_ISL_418869: I |
| 32 | F | EPI_ISL_402132: I |
| 49 | H | EPI_ISL_403936: Y |
| | | EPI_ISL_403937: Y |
| | | EPI_ISL_406531: Y |
| | | EPI_ISL_408010: Y |
| 71 | S | EPI_ISL_417142: F |
| 146 | | EPI_ISL_417977: Y |
| 167 | T | EPI_ISL_408978: F |
| 184 | G | EPI_ISL_422298: D |
| 197 | I | EPI_ISL_418216: V |
| | | EPI_ISL_418265: V |
| 202 | K | EPI_ISL_413023: N |
| 215 | S | EPI_ISL_418409: H |
| 247 | S | EPI_ISL_406844: R |
| 255 | S | EPI_ISL_420877: F |
| 258 | W | EPI_ISL_417976: L |
| 367 | V | EPI_ISL_406596: F |
| | | EPI_ISL_406597: F |
| 458 | K | EPI_ISL_415159: R |
| 477 | S | EPI_ISL_419662: G |
| 483 | V | EPI_ISL_417139: A |
| | | EPI_ISL_413652: A |
| | | EPI_ISL_417076: A |
| 491 | P | EPI_ISL_419737: R |
| 519 | H | EPI_ISL_415159: P |
| 522 | A | EPI_ISL_421654: V |
| 574 | D | EPI_ISL_418421: Y |
| 614 | D | 614G |
| 615 | V | EPI_ISL_412983: L |
| 630 | T | EPI_ISL_417446: S |
| 631 | P | EPI_ISL_419704: S |
| 655 | H | EPI_ISL_413486: Y |
| 675 | Q | EPI_ISL_419709: R |
| 809 | P | EPI_ISL_417408: S |
| 879 | A | EPI_ISL_418401: S |
| 936 | D | EPI_ISL_418432: Y |
| 939 | S | EPI_ISL_420814: F |
| 941 | T | EPI_ISL_415159: A |
| 943 | S | EPI_ISL_415159: P |
| | | EPI_ISL_420335: P |
| 954 | Q | EPI_ISL_417978: K |
| 1132 | I | EPI_ISL_414628: V |
| 1143 | P | EPI_ISL_407896: L |
| 1229 | M | EPI_ISL_417575: I |
| 1247 | C | EPI_ISL_416655: F |
| 1254 | C | EPI_ISL_413594: F |
| 1263 | P | EPI_ISL_415133: L |
The list of sequences harbouring the mutation 614G is reported in Table S2.
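A table of this kind can be produced by comparing each aligned sequence with the reference, residue by residue. The sketch below is an illustration only, not the procedure used in the study: `list_substitutions` is a hypothetical helper, it assumes the two protein sequences are already aligned to the same length with `-` for gaps, and the toy sequences are invented.

```python
def list_substitutions(reference: str, query: str) -> list[tuple[int, str, str]]:
    """Return (reference position, reference residue, query residue) for
    every substitution between two aligned protein sequences.

    Positions are 1-based and count only reference residues, so gap columns
    in the reference (insertions in the query) do not shift the numbering.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")

    substitutions = []
    ref_pos = 0
    for ref_aa, qry_aa in zip(reference.upper(), query.upper()):
        if ref_aa == "-":
            continue  # insertion relative to the reference: no position to report
        ref_pos += 1
        if qry_aa in ("-", "X"):
            continue  # deletion or ambiguous residue: not counted as a substitution
        if ref_aa != qry_aa:
            substitutions.append((ref_pos, ref_aa, qry_aa))
    return substitutions


# Toy aligned sequences (invented): one substitution at reference position 4.
toy_reference = "MF-VFL"
toy_query = "MFAVIL"
toy_result = list_substitutions(toy_reference, toy_query)
```

Counting positions on the reference only keeps the numbering comparable across sequences, which is what allows several accessions to be grouped under one position as in the table.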
Figure 1. Representative alignment comparing the surface glycoprotein of SARS-CoV-2, SARS-CoV and bat SARS-like virus (including the first 20 SARS-CoV-2 sequences, in addition to the SARS-CoV-2 references), with attention to the relevant SARS-CoV positions 472 (amino acid L or P), 479 (amino acid N) and 487 (amino acid T or S).
Figure 2. A. Predicted N-glycosylation sites in the SARS-CoV-2 surface glycoprotein sub-set, obtained with the N-GlycoSite tool. The positions, number and fraction of the predicted N-glycosylation sites are reported. B. Predicted N-glycosylation sites in the SARS-CoV-2 M protein sub-set, obtained with the N-GlycoSite tool. The positions, number and fraction of the predicted N-glycosylation sites are reported.
Figure 3. Predicted N-glycosylation sites in the SARS-CoV-2 E protein sub-set, obtained with the N-GlycoSite tool. The positions, number and fraction of the predicted N-glycosylation sites are reported.
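The kind of output described in these captions (positions, number and fraction of predicted sites) can be approximated with a plain sequon scan. The sketch below is not a reimplementation of N-GlycoSite: it assumes the standard N-linked sequon N-X-[S/T] with X being any residue except proline, the helper names are hypothetical, and the toy sequences are invented and ungapped.

```python
import re
from collections import Counter

# N, then any residue except P, then S or T. The lookahead lets overlapping
# sequons (for example "NNTS") be reported at both positions.
# Assumption: input sequences contain no gap characters.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(sequence: str) -> list[int]:
    """1-based positions of the asparagine in each predicted sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


def sequon_fractions(sequences: list[str]) -> dict[int, float]:
    """Fraction of sequences carrying a sequon at each position."""
    counts = Counter(pos for seq in sequences for pos in sequon_positions(seq))
    return {pos: n / len(sequences) for pos, n in sorted(counts.items())}


# Toy sequences (invented): the second one loses the sequon at position 2.
toy_sequences = ["ANVTQNPSK", "ANVAQNPSK"]
toy_positions = sequon_positions(toy_sequences[0])
toy_fractions = sequon_fractions(toy_sequences)
```

In the toy data the `NPS` motif is skipped because of the proline, and the position-2 sequon is present in half of the sequences, which mirrors the "fraction" reported per site in the figures.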