Matías Castells, Fernando Lopez-Tort, Rodney Colina, Juan Cristina.
Abstract
On 30th January 2020, an outbreak of atypical pneumonia caused by a novel betacoronavirus, named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was declared a public health emergency of international concern by the World Health Organization. For this reason, a detailed evolutionary analysis of SARS-CoV-2 strains currently circulating in different geographic regions of the world was performed. A compositional analysis as well as a Bayesian coalescent analysis of complete genome sequences of SARS-CoV-2 strains recently isolated in Europe, North America, South America, and Asia was performed. The results of these studies revealed a diversification of SARS-CoV-2 strains into three different genetic clades. Co-circulation of different clades in different countries, as well as different genetic lineages within different clades, were observed. The time of the most recent common ancestor was established to be around 1st November 2019. A mean rate of evolution of 6.57 × 10⁻⁴ substitutions per site per year was found. A significant migration rate per genetic lineage per year from Europe to South America was also observed. The results of these studies revealed an increasing diversification of SARS-CoV-2 strains. High evolutionary rates and fast population growth characterize the population dynamics of SARS-CoV-2 strains.
Keywords: SARS-CoV-2; coalescent; coronavirus; evolution
Year: 2020 PMID: 32410229 PMCID: PMC7273070 DOI: 10.1002/jmv.26018
Source DB: PubMed Journal: J Med Virol ISSN: 0146-6615 Impact factor: 20.693
Bayesian coalescent inference of SARS‐CoV‐2 strains
| Group | Parameter | Value | HPD | ESS |
|---|---|---|---|---|
| SARS‐CoV‐2 full‐length genome sequences | Posterior | −41307.31 | −41323.78 to −41219.28 | 1603.08 |
| | Prior | 171.88 | 118.75 to 218.29 | 248.06 |
| | Likelihood | −41479.20 | −41471.57 to −41427.82 | 56639.35 |
| | tMRCA | 128.11 | 83.11 to 195.64 | 363.69 |
| | tMRCA (date) | 11/01/2019 | 08/26/2019 to 12/20/2019 | |
| | Mean rate | 6.57 × 10⁻⁴ | 9.23 × 10⁻⁴ to 2.47 × 10⁻⁴ | 57979.28 |
| | R0 (Europe) | 1.313 | 0.884 to 1.837 | 316.16 |
| | R0 (North America) | 1.133 | 0.890 to 1.450 | 350.11 |
| | R0 (South America) | 1.226 | 0.425 to 1.842 | 1632.04 |
| | R0 (South East Asia) | 1.136 | 0.998 to 1.334 | 409.58 |
| | Recovery rate | 24.484 | 4.568 to 43.544 | 527.91 |
| | Europe‐North America | 0.550 | 2.04 × 10⁻⁴ to 1.679 | 1787.90 |
| | Europe‐South America | 1.821 | 1.83 × 10⁻³ to 4.344 | 2458.63 |
| | Europe‐South East Asia | 0.870 | 1.11 × 10⁻³ to 2.476 | 1331.36 |
| | North America‐Europe | 0.902 | 1.56 × 10⁻⁴ to 2.746 | 1748.09 |
| | North America‐South America | 0.872 | 3.88 × 10⁻⁵ to 2.675 | 1502.71 |
| | North America‐South East Asia | 0.880 | 1.03 × 10⁻³ to 2.663 | 776.87 |
| | South America‐Europe | 1.196 | 4.28 × 10⁻⁴ to 3.309 | 1138.00 |
| | South America‐North America | 0.719 | 2.84 × 10⁻⁵ to 2.332 | 1071.86 |
| | South America‐South East Asia | 0.838 | 1.28 × 10⁻⁴ to 2.573 | 1649.77 |
| | South East Asia‐Europe | 1.300 | 7.92 × 10⁻⁴ to 3.089 | 687.31 |
| | South East Asia‐North America | 1.319 | 9.01 × 10⁻⁴ to 3.105 | 798.09 |
| | South East Asia‐South America | 0.687 | 1.13 × 10⁻⁵ to 2.220 | 816.90 |
Abbreviations: ESS, effective sample size; HPD, highest posterior density; SARS‐CoV‐2, severe acute respiratory syndrome coronavirus 2; tMRCA, time of the most recent common ancestor.
See the Supplementary Material Table 1 for strains included in this analysis.
In all cases, the mean values are shown.
tMRCA is shown in days. The date estimated for the tMRCA is indicated below it.
Mean rate was calculated in substitutions per site per year.
The basic reproduction numbers for Europe, North America, South America, and South East Asia are shown, respectively.
The rates of recovery for a person with SARS‐CoV‐2 in any of the locations studied, in days.
Migration rate per lineage per year from one region to another.
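The tMRCA and mean-rate notes above involve two small unit conversions, which can be sketched in Python as follows. This is only an illustration under stated assumptions: the last sampling date is not given in this record, so `LAST_SAMPLING_DATE` (8 March 2020) is a hypothetical value picked because it reproduces the reported mean date, and `GENOME_LENGTH` (29,903 nt, the length of the Wuhan-Hu-1 reference genome) is assumed rather than taken from the authors' alignment.

```python
from datetime import date, timedelta

# ASSUMPTION: hypothetical last sampling date; it is not stated in this record.
LAST_SAMPLING_DATE = date(2020, 3, 8)
# ASSUMPTION: Wuhan-Hu-1 reference length; the authors' alignment length may differ.
GENOME_LENGTH = 29_903


def tmrca_to_date(days_before_last_sample, last_sample=LAST_SAMPLING_DATE):
    """Convert a tMRCA given in days before the last sample into a calendar date."""
    return last_sample - timedelta(days=round(days_before_last_sample))


def substitutions_per_genome_per_year(rate_per_site_per_year,
                                      genome_length=GENOME_LENGTH):
    """Scale a per-site substitution rate to a whole-genome rate."""
    return rate_per_site_per_year * genome_length


mean_tmrca_date = tmrca_to_date(128.11)                   # mean tMRCA from the table
genome_rate = substitutions_per_genome_per_year(6.57e-4)  # mean rate from the table

print(mean_tmrca_date.isoformat())                        # 2019-11-01
print(f"{genome_rate:.1f} substitutions per genome per year")  # 19.6
```

Under these assumptions, the mean tMRCA of 128.11 days maps to 1 November 2019, and the mean rate corresponds to roughly 20 substitutions per genome per year.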
Figure 1. PCA of A, U, C, and G nucleotide frequencies in SARS‐CoV‐2 genomes. The position of the strains in the plane formed by the first two major axes of the PCA is shown. SVD was used to calculate principal components and unit variance scaling was applied. The proportion of variance explained by each axis is shown in parentheses. Strain Wuhan/WH01/2019, isolated on 26th December 2019, is indicated by a black arrow. Strains isolated in Europe, North America, South America, and South East Asia are shown in red, blue, green, and violet, respectively. N = 64 datapoints. PCA, principal component analysis; SARS‐CoV‐2, severe acute respiratory syndrome coronavirus 2; SVD, singular value decomposition
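The kind of analysis described in the Figure 1 caption (nucleotide frequencies, unit variance scaling, principal components via SVD) might look roughly like the sketch below. It is not the authors' pipeline: the function names and the five short toy sequences are hypothetical, and the sample standard deviation (`ddof=1`) is an assumed choice for the scaling step.

```python
import numpy as np


def nucleotide_frequencies(genomes):
    """Return an (n_genomes, 4) array of A, U, C, G frequencies per sequence."""
    rows = []
    for seq in genomes:
        seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
        counts = [seq.count(base) for base in "AUCG"]
        total = sum(counts)
        rows.append([c / total for c in counts])
    return np.array(rows)


def pca_unit_variance(X, n_components=2):
    """PCA by SVD after centering each column and scaling it to unit variance."""
    centered = X - X.mean(axis=0)
    scaled = centered / centered.std(axis=0, ddof=1)  # assumes no constant column
    U, S, Vt = np.linalg.svd(scaled, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # coordinates on the PC axes
    explained = S**2 / np.sum(S**2)                   # proportion of variance per axis
    return scores, explained[:n_components]


# Hypothetical toy sequences, for illustration only (not real SARS-CoV-2 genomes).
toy_genomes = [
    "AUGGCUAAUCGGAUCC",
    "ATGGCTTATCGGATCA",
    "AAGGCUAAUAGGAUUU",
    "AUGGCCCAUCGGGUCC",
    "AUUGCUAAUCGUAUAC",
]
freqs = nucleotide_frequencies(toy_genomes)
scores, explained = pca_unit_variance(freqs)
print(scores.shape, explained)
```

Each row of `scores` gives one strain's position in the plane of the first two axes, and `explained` holds the proportions that a plot like Figure 1 would report in parentheses on each axis.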
Figure 2. Marginal probability distribution of R0 values. The marginal probability distributions for Europe, North America, South America, and South East Asia are shown in gray, blue, red, and yellow, respectively
Figure 3. DensiTree analysis of complete genome sequences of SARS‐CoV‐2 strains recently isolated in four different geographic regions of the world. The results obtained using the HKY model, a relaxed exponential clock, and a structured coalescent population model are shown. 5000 trees were drawn, shown in green. The root channel is shown in blue. The scale at the bottom is in units of evolutionary time and represents the years before the last sampling date. Strains in the tree are shown by name, followed by date of isolation (day/month/year). SARS‐CoV‐2, severe acute respiratory syndrome coronavirus 2
Substitutions in parsimony informative sites in SARS‐CoV‐2 genomes
| Genomic region (ORF) | Nucleotide site | No. | Nucleotide substitution | Amino acid site | Amino acid substitution | Geographic location of isolation | Clade No. | Clade name |
|---|---|---|---|---|---|---|---|---|
| 5′ non‐coding region | 216 | 11 | c → t | ⋯ | ⋯ | Netherlands, Luxembourg, Switzerland, France, Portugal, Italy, Chile, Mexico, Taiwan | 1 | |
| | 589 | 2 | g → a | 117 | A → T | USA | | |
| | 1415 | 2 | g → a | 392 | G → D | Germany | | |
| | 2886 | 2 | g → a | 876 | A → T | Germany | | |
| | 3012 | 11 | c → t | ⋯ | ⋯ | Netherlands, Luxembourg, Switzerland, Ireland, France, Portugal, Italy, Chile, Mexico, Taiwan | 1 | |
| | 3021 | 2 | a → g | ⋯ | ⋯ | USA | | |
| 1a | 4377 | 2 | t → c | ⋯ | ⋯ | South Korea | | |
| | 5037 | 2 | g → c | 1599 | L → F | South Korea | | |
| | 5059 | 4 | a → c | 1607 | I → V | Canada, USA | | |
| | 8757 | 22 | c → t | ⋯ | ⋯ | France, Germany, Chile, USA, South Korea, China | 2 | |
| | 9452 | 2 | t → a | 3071 | F → Y | France, Chile | | |
| | 11058 | 6 | g → t | 3606 | L → F | Italy, Brazil, USA, Hong Kong | 3 | |
| | 14383 | 11 | c → t | 314 | P → L | Netherlands, Luxembourg, Switzerland, Ireland, France, Portugal, Italy, Chile, Mexico, Taiwan | | |
| | 14780 | 3 | c → t | ⋯ | ⋯ | France, Brazil, Chile | | |
| | 16442 | 2 | a → g | ⋯ | ⋯ | USA | | |
| 1b | 16950 | 2 | g → t | 1170 | V → F | USA | | |
| | 17445 | 2 | c → t | ⋯ | ⋯ | Chile | | |
| | 17722 | 6 | c → t | 1427 | P → L | USA | | |
| | 17833 | 6 | a → g | 1464 | Y → C | USA | | |
| | 18035 | 8 | c → t | ⋯ | ⋯ | USA | | |
| | 23160 | 2 | c → t | ⋯ | ⋯ | USA | | |
| S | 23378 | 11 | a → g | 614 | D → G | Netherlands, Luxembourg, Switzerland, Ireland, France, Portugal, Italy, Chile, Mexico, Taiwan | 1 | G |
| 3a | 25954 | 2 | g → t | 196 | G → V | France, Chile | | |
| | 26063 | 2 | c → t | ⋯ | ⋯ | Chile | | |
| | 26119 | 6 | g → t | 251 | G → V | Italy, Brazil, Hong Kong, Singapore | 3 | V |
| M | 27021 | 2 | c → t | 175 | T → M | Netherlands | | |
| 8 | 28119 | 24 | t → c | 84 | L → S | France, Germany, Chile, USA, South Korea, China | 2 | S |
| | 28555 | 2 | g → t | 103 | D → Y | Chile | | |
| N | 28829 | 2 | c → t | ⋯ | ⋯ | France, Chile | | |
| | 28838 | 5 | c → t | 194 | S → L | Canada, USA | 3 | |
| N | 28838 | 2 | c → t | 197 | S → L | France, Chile | | |
| | 28856‐8 | 4 | ggg → acc | 203‐204 | RG → KR | Netherlands, Chile, Mexico | | |
Abbreviations: GISAID, Global Initiative on Sharing Avian Influenza Data; ORF, open reading frame; SARS‐CoV‐2, severe acute respiratory syndrome coronavirus 2.
Substitutions found in relation to SARS‐CoV‐2 strain βCov/Wuhan/WH01/2019 genome (accession number GISAID: EPI_ISL_406798).
No. refers to the number of strains carrying that substitution in the alignment.
Clade assignment is indicated when the substitution is present in more than four strains in the alignment. The S, G, and V clade names were assigned by GISAID, according to amino acid substitutions found in ORF 8, S, and 3a, respectively.
A synonymous substitution is shown by a dotted line (⋯).
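For readers unfamiliar with the term used in the table title, a site is conventionally called parsimony informative when at least two different character states each occur in at least two sequences of the alignment, which is consistent with every "No." value in the table being 2 or more. A minimal sketch of that check is given below; the function name and the four-sequence toy alignment are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def parsimony_informative_sites(alignment):
    """Return 1-based column positions that are parsimony informative.

    A column qualifies when at least two distinct nucleotides each appear
    in at least two sequences. Gaps and ambiguity codes are ignored.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences in the alignment must have equal length")
    sites = []
    for position, column in enumerate(zip(*alignment), start=1):
        counts = Counter(base.lower() for base in column if base.lower() in "acgtu")
        states_seen_twice = sum(1 for n in counts.values() if n >= 2)
        if states_seen_twice >= 2:
            sites.append(position)
    return sites


# Hypothetical toy alignment, for illustration only.
toy_alignment = [
    "acgtac",
    "acgtac",
    "atgtaa",
    "atgcac",
]
print(parsimony_informative_sites(toy_alignment))  # [2]
```

In the toy alignment, column 2 (c, c, t, t) qualifies, whereas columns 4 and 6 each carry a substitution in only one sequence and are therefore variable but not parsimony informative.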