Matías J Pereson, Laura Mojsiejczuk, Alfredo P Martínez, Diego M Flichman, Gabriel H Garcia, Federico A Di Lello.
Abstract
During the first few months of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) evolution in a new host, contrasting hypotheses have been proposed about the way the virus has evolved and diversified worldwide. The aim of this study was to perform a comprehensive evolutionary analysis to describe the human outbreak and the evolutionary rate of different genomic regions of SARS-CoV-2. The molecular evolution in nine genomic regions of SARS-CoV-2 was analyzed using three different approaches: phylogenetic signal assessment, emergence of amino acid substitutions, and Bayesian evolutionary rate estimation in eight successive fortnights since the virus emergence. All observed phylogenetic signals were very low and tree topologies were in agreement with those signals. However, after 4 months of evolution, it was possible to identify regions revealing an incipient viral lineage formation, despite the low phylogenetic signal since fortnight 3. Finally, the SARS-CoV-2 evolutionary rate for regions nsp3 and S, the ones presenting greater variability, was estimated as 1.37 × 10⁻³ and 2.19 × 10⁻³ substitutions/site/year, respectively. In conclusion, the results of this study on the variable diversity of crucial viral regions and the determination of the evolutionary rate are decisive for understanding essential features of viral emergence. In turn, the findings may allow the first-time characterization of the evolutionary rate of the S protein, which is crucial for vaccine development.
Keywords: SARS-CoV-2; evolution; evolutionary rate; phylogeny
Year: 2020 PMID: 32966646 PMCID: PMC7537150 DOI: 10.1002/jmv.26545
Source DB: PubMed Journal: J Med Virol ISSN: 0146-6615 Impact factor: 20.693
The number of SARS‐CoV‐2 sequences by fortnight (temporal structure)
| Fortnight | Date | Median of analyzed sequences (Q1–Q3) |
|---|---|---|
| FN1 | 12/24/2019–12/31/2019 | 15 |
| FN2 | 01/01/2020–01/15/2020 | 19 |
| FN3 | 01/16/2020–01/31/2020 | 145 (136–145.5) |
| FN4 | 02/01/2020–02/15/2020 | 119 (113–120) |
| FN5 | 02/16/2020–03/02/2020 | 258 (247–259) |
| FN6 | 03/03/2020–03/17/2020 | 403 (390–406) |
| FN7 | 03/18/2020–04/01/2020 | 447 (416–450) |
| FN8 | 04/02/2020–04/17/2020 | 199 (197–201) |
| Total | | 1488–1616 |
Note: The total number of sequences is variable, depending on the analyzed region (nsp1, 1608; nsp3, 1511; nsp14, 1550; S, 1488; Orf3a, 1600; E, 1615; Orf6, 1616; Orf8, 1612; and N, 1610).
Abbreviations: FN, fortnight; Q1, quartile 1; Q3, quartile 3.
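The temporal structure in the table can be reproduced by binning each sequence's collection date into a fortnight. The following is a minimal sketch, assuming the date ranges in the table are inclusive at both ends; the function name `assign_fortnight` is hypothetical and not taken from the study.

```python
from datetime import date

# Fortnight boundaries copied from the table above.
# Assumption: both the start and the end date are inclusive.
FORTNIGHTS = [
    ("FN1", date(2019, 12, 24), date(2019, 12, 31)),
    ("FN2", date(2020, 1, 1), date(2020, 1, 15)),
    ("FN3", date(2020, 1, 16), date(2020, 1, 31)),
    ("FN4", date(2020, 2, 1), date(2020, 2, 15)),
    ("FN5", date(2020, 2, 16), date(2020, 3, 2)),
    ("FN6", date(2020, 3, 3), date(2020, 3, 17)),
    ("FN7", date(2020, 3, 18), date(2020, 4, 1)),
    ("FN8", date(2020, 4, 2), date(2020, 4, 17)),
]


def assign_fortnight(collection_date):
    """Return the fortnight label for a collection date, or None if out of range."""
    for label, start, end in FORTNIGHTS:
        if start <= collection_date <= end:
            return label
    return None
```

For example, `assign_fortnight(date(2020, 3, 2))` falls in FN5, while a date after 04/17/2020 returns `None`.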
Figure 1. The phylogenetic signal for SARS‐CoV‐2 data sets. The presence of the phylogenetic signal was evaluated by likelihood mapping, unresolved quartets (center), and partly resolved quartets (edges) for genomes available on April 17 for the nine analyzed regions: nsp1 (29 sequences), nsp3 (225 sequences), nsp14 (65 sequences), S (183 sequences), Orf3a (74 sequences), E (11 sequences), Orf6 (12 sequences), Orf8 (23 sequences), and N (113 sequences). The presence of a strong phylogenetic signal (<40% unresolved quartets) was not observed for any region.
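To illustrate how the percentage of unresolved quartets in a likelihood map can be tallied, here is a minimal sketch. It assumes that the three posterior weights of each quartet (summing to 1) have already been produced by a likelihood-mapping program, and that each quartet is assigned to the nearest of seven attractor points (three corners, three edge midpoints, one center), as likelihood mapping is commonly described. The source does not state which software or partition was used, so the names and the partition rule here are assumptions.

```python
# Attractor points of the likelihood-mapping triangle in barycentric coordinates.
# Assumption: quartets are assigned to the nearest attractor (Euclidean distance).
ATTRACTORS = {
    "resolved": [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    "partly_resolved": [(0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5)],
    "unresolved": [(1 / 3, 1 / 3, 1 / 3)],
}


def classify_quartet(weights):
    """Classify one quartet from its three posterior weights."""
    best_label, best_dist = None, float("inf")
    for label, points in ATTRACTORS.items():
        for point in points:
            # Squared Euclidean distance is enough for finding the nearest point.
            dist = sum((w - p) ** 2 for w, p in zip(weights, point))
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


def percent_unresolved(quartet_weights):
    """Percentage of quartets that fall in the central (unresolved) area."""
    labels = [classify_quartet(w) for w in quartet_weights]
    return 100.0 * labels.count("unresolved") / len(labels)
```

The value returned by `percent_unresolved` is the figure that would be compared against the 40% threshold quoted in the caption.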
Amino acids selected by region and fortnight. The number indicates the amino acid location in its protein
| Region | Amino acid substitution | FN1 (%) | FN2 (%) | FN3 (%) | FN4 (%) | FN5 (%) | FN6 (%) | FN7 (%) | FN8 (%) |
|---|---|---|---|---|---|---|---|---|---|
| nsp3 | A58T | 0 | 0 | 0 | 1.0 | 6.0 | 3.0 | 3.0 | 2.5 |
| | P135L | 0 | 0 | 0.8 | 0 | 0 | 1.5 | 0.5 | 2.5 |
| S | D614G | 0 | 0 | 1.5 | 1.8 | 37.0 | 64.0 | 75.0 | 88.0 |
| Orf3a | Q75H | 0 | 0 | 0 | 0 | 6.0 | 22.0 | 23.0 | 34.0 |
| | G196V | 0 | 0 | 0 | 0 | 0.8 | 4.0 | 0.9 | 0.5 |
| | G251V | 0 | 0 | 8.0 | 24.0 | 8.0 | 9.0 | 10.0 | 3.0 |
| Orf8 | V62L | 0 | 5.0 | 1.0 | 3.3 | 0.0 | 1.5 | 1.3 | 3.0 |
| | L84S | 0 | 42.0 | 37.0 | 21.0 | 21.0 | 18.0 | 7.0 | 6.0 |
| N | P13L | 0 | 0 | 0 | 0 | 1.0 | 1.0 | 2.5 | 0.5 |
| | S197L | 0 | 0 | 0 | 0 | 1.1 | 5.0 | 0.9 | 0.5 |
| | S202N | 0 | 0 | 3.5 | 4.2 | 0 | 0.5 | 2.2 | 2.5 |
| | R203K | 0 | 0 | 0 | 0 | 17.0 | 19.0 | 24.0 | 23.0 |
| | G204R | 0 | 0 | 0 | 0 | 17.0 | 19.0 | 24.0 | 23.0 |
| | I292T | 0 | 0 | 0 | 0 | 2.0 | 0.2 | 0.2 | 0.5 |
Note: Only regions where amino acid change was selected and remained until the last analyzed fortnight are shown.
Abbreviation: FN, fortnight.
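Per-fortnight percentages like those in the table can be obtained by counting, within each fortnight, the sequences that carry the substituted residue at a given position. The sketch below assumes the residue at the position of interest has already been extracted from an alignment; the function name and the toy records are hypothetical and are not data from the study.

```python
from collections import defaultdict


def substitution_percentage(records, mutant_residue):
    """Percentage of sequences per fortnight carrying `mutant_residue`.

    `records` is an iterable of (fortnight_label, residue_at_position) pairs.
    """
    totals = defaultdict(int)
    mutants = defaultdict(int)
    for fortnight, residue in records:
        totals[fortnight] += 1
        if residue == mutant_residue:
            mutants[fortnight] += 1
    return {fn: round(100.0 * mutants[fn] / totals[fn], 1) for fn in totals}


# Toy input for illustration only (not from the study): residue at one position.
toy_records = [
    ("FN5", "D"), ("FN5", "G"), ("FN5", "D"), ("FN5", "D"),
    ("FN6", "G"), ("FN6", "G"), ("FN6", "D"), ("FN6", "G"),
]
toy_result = substitution_percentage(toy_records, "G")
```

With the toy records, one of four FN5 sequences and three of four FN6 sequences carry the substituted residue.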
The number of variable positions, number of mutations, and number of sequences with mutation by region
| Region | No. of variable aa positions (%) | No. of aa substitutions | No. of sequences with aa substitutions (%) |
|---|---|---|---|
| nsp1 (180aa) | 3 (1.7) | 37 | 37 (2.4) |
| nsp3 (1945aa) | 158 (8.1) | 322 | 294 (19.3) |
| nsp14 (527aa) | 6 (1.4) | 83 | 83 (5.5) |
| S (1273aa) | 76 (5.9) | 1013 | 904 (59.4) |
| Orf3a (275aa) | 11 (4.0) | 491 | 468 (30.7) |
| E (75aa) | 5 (6.7) | 6 | 6 (0.4) |
| Orf6 (60aa) | 7 (11.6) | 9 | 9 (0.6) |
| Orf8 (121aa) | 14 (11.6) | 312 | 288 (18.9) |
| N (419aa) | 53 (12.6) | 760 | 470 (30.9) |
| Total (4875aa) | 333 (6.8) | 3033 | – |
Abbreviation: aa, amino acid.
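The percentage of variable positions in the table is the number of variable amino acid positions divided by the protein length. A minimal sketch of that arithmetic, using counts and lengths taken from the table (the helper name `percent_variable` is hypothetical):

```python
def percent_variable(variable_positions, protein_length_aa):
    """Variable amino acid positions as a percentage of protein length."""
    return round(100.0 * variable_positions / protein_length_aa, 1)


# Counts and lengths copied from the table above.
nsp3_pct = percent_variable(158, 1945)
n_pct = percent_variable(53, 419)
total_pct = percent_variable(333, 4875)
```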
Figure 2. Bayesian trees of 29 sequences of nsp1 (540 nt), 225 sequences of nsp3 (5835 nt), 65 sequences of nsp14 (1581 nt), 183 sequences of S (3822 nt), 74 sequences of Orf3a (828 nt), 11 sequences of E (228 nt), 12 sequences of Orf6 (186 nt), 23 sequences of Orf8 (366 nt), and 113 sequences of N (1260 nt). Scale bar represents substitutions per site.
Figure 3. A comparison of the evolutionary rates estimated using BEAST for the original data set and the date‐randomized data sets (312 sequences). This analysis was performed for regions nsp3 (5835 nt) and S (3822 nt). s.s.y. = substitutions/site/year
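A date-randomization test of this kind can be sketched as follows: sampling dates are shuffled among the sequences, the rate is re-estimated for each shuffled data set (in BEAST, outside this sketch), and the interval estimated from the real dates is compared with those from the randomized sets. The criterion used here, that the real interval must not overlap any randomized interval, is a commonly used one and is an assumption, since the caption does not state the criterion applied. All function names are hypothetical.

```python
import random


def randomize_dates(sampling_dates, n_replicates, seed=0):
    """Return shuffled copies of the sampling dates, one list per replicate.

    Sequence order is kept fixed, so shuffling reassigns dates to sequences.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        shuffled = list(sampling_dates)
        rng.shuffle(shuffled)
        replicates.append(shuffled)
    return replicates


def intervals_overlap(a, b):
    """True if closed intervals a = (low, high) and b = (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def passes_date_randomization(real_interval, randomized_intervals):
    """Assumed criterion: the real interval overlaps no randomized interval."""
    return not any(intervals_overlap(real_interval, r) for r in randomized_intervals)
```

Each list returned by `randomize_dates` would be used as the tip dates of one randomized run; the rate intervals from those runs are then passed to `passes_date_randomization` together with the interval from the original data set.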