Patrícia Aline Gröhs Ferrareze1, Vinícius Bonetti Franceschi2, Amanda de Menezes Mayer2, Gabriel Dickin Caldana1, Ricardo Ariel Zimerman3, Claudia Elizabeth Thompson4.
Abstract
The COVID-19 pandemic caused by SARS-CoV-2 has affected millions of people since it began in 2019. The spread of new lineages and the discovery of key mechanisms the virus uses to evade the immune system are central topics for public health policy, research and disease management. Since the second half of 2020, the E484K mutation has been found with increasing frequency in Brazil, appearing in different lineages over time. It has raised multiple concerns about the risk of reinfection and the effectiveness of new preventive and treatment strategies, because it may allow escape from neutralizing antibodies. To better characterize the current scenario, we performed genomic and phylogenetic analyses of the E484K-mutated genomes sequenced from Brazilian samples in 2020. From October 2020 onward, more than 40% of the sequenced genomes carried the E484K mutation, which was identified in three different lineages (P.1, P.2 and B.1.1.33, later renamed N.9) in four Brazilian regions. We also evaluated the presence of E484K-associated mutations and identified selective pressures acting on the spike protein, which offered some insights into the adaptive and purifying selection driving the virus's evolution.
Keywords: COVID-19; E484K; Infectious diseases; Severe acute respiratory syndrome coronavirus 2; Viral evolution
Year: 2021 PMID: 34044192 PMCID: PMC8143912 DOI: 10.1016/j.meegid.2021.104941
Source DB: PubMed Journal: Infect Genet Evol ISSN: 1567-1348 Impact factor: 4.393
Fig. 1 Histogram of frequent mutations observed in the Brazilian SARS-CoV-2 genomes harboring the E484K mutation. Red labels above the bars indicate the absolute nucleotide position, and blue labels indicate the effects of these mutations on the corresponding proteins. Because P.1 is represented by only 19 genomes and carries multiple mutations, only its main mutations of concern are highlighted.
UTR: Untranslated region; Syn: Synonymous substitution; del: deletion; ORF: Open Reading Frame; Nsp: Non-structural protein; S: Spike; E: Envelope; M: Membrane; N: Nucleocapsid.
Lineage-defining mutations of each of the three Brazilian lineages carrying the E484K mutation. These mutations do not necessarily match the defining mutations that pangolin uses for lineage assignment; they were selected because they are present in the majority of sequences of each lineage (https://github.com/cov-lineages/pangolin).
| Lineage/Mutation | N.9 (B.1.1.33 with E484K) | P.1 | P.2 |
|---|---|---|---|
| 5′ UTR | – | – | C100T |
| ORF1ab | G1264T (Nsp2) | T733C (Nsp1) | T10667G (Nsp5: L3468V) |
| S | C21614T (L18F) | | |
| ORF3a | – | T26149C (S253P) | – |
| E | – | – | – |
| M | – | – | – |
| ORF6 | – | – | – |
| ORF7a | – | – | – |
| ORF7b | A27853C (E33A) | – | – |
| ORF8 | – | G28167A (E92K) | C28253T (F120F) |
| N | – | C28512G (P80R) | G28628T (A119S) |
| ORF10 | – | – | – |
| 3′ UTR | C29722T | – | C29754T |
ORF1ab mutations are represented by their amino acid positions relative to ORF1a (Nsp1-Nsp11) and ORF1b (Nsp12-Nsp16). ins: insertion. The E484K mutation in the lineages is indicated in bold.
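The labels in the table pair a nucleotide change with its amino acid effect, for example C21614T (L18F). As a minimal sketch of how the two coordinates relate, the snippet below converts a genomic position into a codon number for a few genes. The gene start coordinates are an assumption taken from the NC_045512.2 reference annotation rather than from this record, and the helper names `parse_mutation` and `codon_number` are hypothetical.

```python
import re

# Assumed 1-based gene start coordinates on NC_045512.2 (not stated in this record).
GENE_START = {"S": 21563, "ORF3a": 25393, "ORF7b": 27756, "ORF8": 27894, "N": 28274}


def parse_mutation(label):
    """Split a nucleotide label such as 'C21614T' into (ref, position, alt)."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label}")
    return match.group(1), int(match.group(2)), match.group(3)


def codon_number(gene, position):
    """Return the 1-based codon index of a genomic position within a gene."""
    return (position - GENE_START[gene]) // 3 + 1


# Nucleotide labels copied from the table above, with the gene they fall in.
examples = [
    ("S", "C21614T"),
    ("ORF3a", "T26149C"),
    ("ORF7b", "A27853C"),
    ("ORF8", "G28167A"),
    ("N", "C28512G"),
]

for gene, label in examples:
    _, position, _ = parse_mutation(label)
    print(f"{gene} {label}: codon {codon_number(gene, position)}")
```

Under these assumed coordinates, the computed codon numbers can be compared with the amino acid positions printed in parentheses in the table (18, 253, 33, 92 and 80 for the five labels used here).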
Fig. 2 Bayesian phylogenetic inference of the 134 Brazilian E484K-mutated genomes. Tips are colored by Brazilian state, and the reference genome NC_045512.2 is shown in black. Branches are highlighted by lineage: green (B.1.1.33), light green (N.9), beige (B.1.1.28), light red (P.2) and blue (P.1). Mutations occurring in all analyzed genomes of each lineage are given next to the respective nodes. Asterisks indicate the SARS-CoV-2 genomes sequenced by our research group.
Fig. 3 Distribution of genomes harboring the E484K mutation across different lineages (A) and Brazilian states (B) from October to December 2020.
Fig. 4 Monthly presence of the E484K mutation considering worldwide available data (A) and Brazilian genomes (B). For clarity, the number of genomes in (A) is shown on a log10 scale.
FUBAR site table for positively selected sites.
| Site | ɑ | β | Prob[ɑ < β] |
|---|---|---|---|
| 5 | 1.880 | 20.275 | 0.9559 |
| 12 | 1.896 | 21.520 | 0.9584 |
| 26 | 1.879 | 20.481 | 0.9564 |
| 138 | 2.300 | 19.109 | 0.9214 |
| 155 | 2.266 | 17.643 | 0.9158 |
| 484 | 3.850 | 25.932 | 0.9109 |
| 677 | 2.931 | 19.933 | 0.9083 |
| 688 | 1.917 | 23.771 | 0.9628 |
FUBAR inferred the sites subject to diversifying positive selection with a posterior probability ≥0.9.
PAML (codeml): Parameter estimates and log-likelihood values under models of variable ω ratios among sites.
| Model | Parameters | lnL | Sites showing indications of positive selection |
|---|---|---|---|
| M0 | ω = 0.40548 | −6617.469118 | None |
| M1 | p0 = 0.64290, p1 = 0.35710 | −6607.96566 | Not allowed |
| M2 | p0 = 0.96461, p1 = 0.00003, p2 = 0.03536 | −6604.579083 | 5, 12, 26, 138, 155, 222, 484, 626, 677, 688, 1263 |
| M3 | p0 = 0.93156, p1 = 0.03308, p2 = 0.03536 | −6604.578953 | 5, 12, 26, 138, 155, 222, 484, 626, 677, 688, 1263 |
| M7 | | −6608.318528 | Not allowed |
| M8 | p0 = 0.91883, | −6605.852845 | 5, 12, 14, 20, 26, 27, 52, 54, 68, 138, 145, 153, 155, 190, 218, 221, 222, 231, 235, 263, 344, 417, 439, 484, 561, 583, 626, 658, 670, 677, 688, 776, 791, 879, 936, 1005, 1065, 1071, 1072, 1076, 1099, 1104, 1118, 1152, 1162, 1238, 1259, 1263, 1264, 1272 |
ω = dN/dS, averaged over sites; p0, p1 and p2 indicate the proportions of groups 0, 1 and 2 in each model, respectively; ω0, ω1 and ω2 indicate the ω values of groups 0, 1 and 2 in each model, respectively. p and q are the parameters of the beta distribution.
PAML (codeml): Likelihood ratio statistics (2Δℓ) for some comparisons between selection models.
| Comparison | 2Δℓ | Probability value |
|---|---|---|
| M3 vs. M0 | 25.78033 | <0.001 |
| M2 vs. M1 | 6.773154 | <0.05 |
| M8 vs. M7 | 4.931366 | <0.1 |
| M8 vs. M8a | 4.225314 | <0.05 |
The degrees of freedom used comparing models M3 vs. M0, M2 vs. M1, M8 vs. M7 and M8a vs. M8 are 4, 2, 2 and 1, respectively. Probability values ≤ 0.05 are considered as statistically significant.
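The likelihood ratio statistics can be re-derived from the lnL values in the codeml parameter table as twice the difference between the alternative and null log-likelihoods. The sketch below does this with the standard library only, using closed-form chi-square tail probabilities for the three degrees of freedom that occur here. The function names are hypothetical, and the plain chi-square reference distribution is an assumption, since the record does not say how the probability values were obtained. The M8a log-likelihood is not listed, so for that comparison only the reported statistic is converted.

```python
import math

# Log-likelihoods copied from the codeml parameter table.
LNL = {
    "M0": -6617.469118,
    "M1": -6607.96566,
    "M2": -6604.579083,
    "M3": -6604.578953,
    "M7": -6608.318528,
    "M8": -6605.852845,
}


def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1, 2 or 4."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 4:
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("this sketch only handles df = 1, 2 or 4")


def lrt(alternative, null, df):
    """Return (2 * delta lnL, p-value) for a pair of nested models."""
    statistic = 2.0 * (LNL[alternative] - LNL[null])
    return statistic, chi2_sf(statistic, df)


for alternative, null, df in [("M3", "M0", 4), ("M2", "M1", 2), ("M8", "M7", 2)]:
    statistic, p_value = lrt(alternative, null, df)
    print(f"{alternative} vs. {null}: 2*dlnL = {statistic:.6f}, df = {df}, p = {p_value:.4g}")

# M8a lnL is not reported, so only the published statistic is converted (df = 1).
print(f"M8 vs. M8a: p = {chi2_sf(4.225314, 1):.4g}")
```

The recomputed statistics should match the tabulated values, and the p-values should fall in the ranges reported in the table.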
PAML (codeml) site table for positively selected sites.
| Site | NEB: Prob (ω > 1) | NEB: post mean for ω | BEB: Prob (ω > 1) | BEB: post mean ± SE for ω |
|---|---|---|---|---|
| 5 L | 0.988* | 4.850 | 0.892 | 2.153 ± 0.516 |
| 12 S | 0.988* | 4.849 | 0.892 | 2.153 ± 0.517 |
| 26 P | 0.989* | 4.854 | 0.894 | 2.156 ± 0.514 |
| 138 D | 0.758 | 3.779 | 0.741 | 1.922 ± 0.719 |
| 155 S | 0.782 | 3.893 | 0.749 | 1.935 ± 0.712 |
| 222 A | 0.775 | 3.857 | 0.746 | 1.930 ± 0.714 |
| 484 E | 0.841 | 4.163 | 0.770 | 1.968 ± 0.692 |
| 626 A | 0.860 | 4.252 | 0.778 | 1.980 ± 0.685 |
| 677 Q | 0.909 | 4.483 | 0.802 | 2.017 ± 0.659 |
| 688 A | 0.985* | 4.835 | 0.886 | 2.145 ± 0.525 |
| 1263 P | 0.880 | 4.348 | 0.787 | 1.994 ± 0.675 |
M2 selection model: Naive Empirical Bayes (NEB) analysis. Positively selected sites (*: P > 95%; **: P > 99%). Bayes Empirical Bayes (BEB).
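To see how far the two approaches agree, the site lists can be compared directly. The following sketch copies the FUBAR sites (posterior probability ≥0.9) and the PAML M2 sites with their NEB probabilities from the tables in this record, then reports the overlap and the sites that pass the 95% NEB threshold marked with an asterisk. The variable names are illustrative only.

```python
# Sites reported by FUBAR with posterior probability >= 0.9.
fubar_sites = {5, 12, 26, 138, 155, 484, 677, 688}

# PAML M2 sites with their NEB Prob(omega > 1), copied from the site table.
paml_m2_neb = {
    5: 0.988, 12: 0.988, 26: 0.989, 138: 0.758, 155: 0.782, 222: 0.775,
    484: 0.841, 626: 0.860, 677: 0.909, 688: 0.985, 1263: 0.880,
}
paml_m2_sites = set(paml_m2_neb)

shared = sorted(fubar_sites & paml_m2_sites)       # found by both methods
paml_only = sorted(paml_m2_sites - fubar_sites)    # found by PAML M2 only
fubar_only = sorted(fubar_sites - paml_m2_sites)   # found by FUBAR only

# Sites whose NEB probability exceeds 0.95 (the '*' threshold in the table).
neb_over_95 = sorted(site for site, prob in paml_m2_neb.items() if prob > 0.95)

print("shared:", shared)
print("PAML M2 only:", paml_only)
print("FUBAR only:", fubar_only)
print("NEB > 0.95:", neb_over_95)
```

With the values as tabulated, every FUBAR site also appears in the M2 list, including site 484, while sites 222, 626 and 1263 are reported by PAML only.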
FEL site table for negatively selected sites.
| Site | ɑ | β | LRT | Prob[ɑ > β] |
|---|---|---|---|---|
| 55 | 18.379 | 0.000 | 4.058 | 0.0440 |
| 91 | 27.071 | 0.000 | 2.792 | 0.0947 |
| 132 | 121.449 | 0.000 | 4.395 | 0.0360 |
| 180 | 121.798 | 0.000 | 4.401 | 0.0359 |
| 189 | 33.258 | 0.000 | 5.432 | 0.0198 |
| 191 | 120.557 | 0.000 | 4.393 | 0.0361 |
| 266 | 27.216 | 0.000 | 2.794 | 0.0946 |
| 324 | 118.682 | 0.000 | 4.370 | 0.0366 |
| 428 | 26.763 | 0.000 | 2.852 | 0.0913 |
| 475 | 16.429 | 0.000 | 3.268 | 0.0707 |
| 564 | 121.798 | 0.000 | 4.159 | 0.0414 |
| 821 | 21.710 | 0.000 | 3.065 | 0.0800 |
| 897 | 42.768 | 0.000 | 4.241 | 0.0395 |
| 910 | 42.680 | 0.000 | 3.305 | 0.0691 |
| 1120 | 43.397 | 0.000 | 3.598 | 0.0578 |
| 1126 | 26.763 | 0.000 | 3.495 | 0.0616 |
| 1215 | 18.327 | 0.000 | 2.861 | 0.0907 |
| 1228 | 42.759 | 0.000 | 4.136 | 0.0420 |
| 1251 | 42.680 | 0.000 | 3.305 | 0.0691 |
Sites identified under negative selection at p ≤ 0.1. Grey rows: significant sites also identified by SLAC.