Lucy van Dorp, Mislav Acman, Damien Richard, Liam P Shaw, Charlotte E Ford, Louise Ormond, Christopher J Owen, Juanita Pang, Cedric C S Tan, Florencia A T Boshier, Arturo Torres Ortiz, François Balloux.
Abstract
SARS-CoV-2 is a SARS-like coronavirus of likely zoonotic origin first identified in December 2019 in Wuhan, the capital of China's Hubei province. The virus has since spread globally, resulting in the ongoing COVID-19 pandemic. The first whole genome sequence was published on January 5, 2020, and thousands of genomes have been sequenced since. This resource provides unprecedented insight into the past demography of SARS-CoV-2 and also allows monitoring of how the virus is adapting to its novel human host, providing information to direct drug and vaccine design. We curated a dataset of 7666 public genome assemblies and analysed the emergence of genomic diversity over time. Our results are in line with previous estimates and point to all sequences sharing a common ancestor towards the end of 2019, supporting this as the period when SARS-CoV-2 jumped into its human host. Due to extensive transmission, the genetic diversity of the virus in several countries recapitulates a large fraction of its worldwide genetic diversity. We identify regions of the SARS-CoV-2 genome that have remained largely invariant to date, and others that have already accumulated diversity. By focusing on mutations that have emerged independently multiple times (homoplasies), we identify 198 filtered recurrent mutations in the SARS-CoV-2 genome. Nearly 80% of the recurrent mutations produced non-synonymous changes at the protein level, suggesting possible ongoing adaptation of SARS-CoV-2. Three sites in Orf1ab, in the regions encoding Nsp6, Nsp11 and Nsp13, and one in the Spike protein are characterised by a particularly large number of recurrent mutations (>15 events), which may signpost convergent evolution and are of particular interest in the context of adaptation of SARS-CoV-2 to the human host. We additionally provide an interactive, user-friendly web application to query the alignment of the 7666 SARS-CoV-2 genomes.
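The abstract reports that nearly 80% of recurrent mutations were non-synonymous. As an illustration of how such a call can be made, the Python sketch below translates the reference and the mutated codon with the standard genetic code and compares the resulting amino acids. It assumes an in-frame coding sequence with 0-based positions and deliberately ignores complications such as the -1 ribosomal frameshift in Orf1ab and overlapping reading frames. The helpers `classify_snv` and `fraction_non_synonymous` and the toy sequence are hypothetical, not the paper's pipeline.

```python
from typing import Iterable, Tuple

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order for positions 1, 2 and 3; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a single-nucleotide substitution at 0-based position `pos`
    of an in-frame coding sequence `cds` (hypothetical helper)."""
    cds = cds.upper()
    offset = pos % 3                  # position of the site within its codon
    start = pos - offset              # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"           # includes changes to or from a stop codon


def fraction_non_synonymous(cds: str, mutations: Iterable[Tuple[int, str]]) -> float:
    """Fraction of (position, alternative base) pairs that change the amino acid."""
    calls = [classify_snv(cds, pos, alt) for pos, alt in mutations]
    return calls.count("non-synonymous") / len(calls)


# Toy example: ATG TTT GGA encodes Met-Phe-Gly.
toy_cds = "ATGTTTGGA"
print(classify_snv(toy_cds, 5, "C"))  # TTT -> TTC, Phe -> Phe: synonymous
print(classify_snv(toy_cds, 3, "C"))  # TTT -> CTT, Phe -> Leu: non-synonymous
print(fraction_non_synonymous(toy_cds, [(5, "C"), (3, "C"), (7, "C")]))  # 0.6666666666666666
```

In real data the positions would first have to be mapped from genome coordinates (e.g. on NC_045512.2) to coordinates within each coding sequence, which is where the frameshift and overlapping-gene caveats matter.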
Keywords: Betacoronavirus; Homoplasies; Mutation; Phylogenetics
Year: 2020 PMID: 32387564 PMCID: PMC7199730 DOI: 10.1016/j.meegid.2020.104351
Source DB: PubMed Journal: Infect Genet Evol ISSN: 1567-1348 Impact factor: 3.342
Estimates of SARS-CoV-2 time to most recent common ancestor (tMRCA). BCI: Bayesian Credible Interval; HPD: Highest Posterior Density; CI: Confidence Interval. An asterisk (*) denotes a non-peer-reviewed estimate of tMRCA; the numbers 1–3 following it refer to the sources listed below the table. ‘N.’ denotes the number of whole genomes analysed.
| Reference | N. | Substitution rate (substitutions per site per year) | Estimated tMRCA | Method |
|---|---|---|---|---|
| Li et al. 2020 ( | 32 | 1.0 × 10⁻³ (95% BCI 1.854 × 10⁻⁴, 4.0 × 10⁻³) | October 15, 2019 (95% BCI May 2, 2019; January 17, 2020) | Rate-informed strict clock model (BEAST v1.8.4) |
| Li et al. 2020 ( | 32 | 1.8266 × 10⁻³ (95% BCI 7.5813 × 10⁻⁴, 3.0883 × 10⁻³) | December 6, 2019 (95% BCI November 16, 2019; December 21, 2019) | Rate-estimated relaxed clock model (BEAST v1.8.4) |
| Giovanetti et al. 2020 ( | 54 | 6.58 × 10⁻³ (95% HPD 5.2 × 10⁻³, 8.1 × 10⁻³) | November 25, 2019 (95% CI September 28, 2019; December 21, 2019) | Relaxed clock model (BEAST v1.10.4) |
| Hill & Rambaut 2020*1 | 75 | 0.92 × 10⁻³ (95% HPD 0.33 × 10⁻³, 1.46 × 10⁻³) | November 29, 2019 (95% CI October 28, 2019; December 20, 2019) | Unreported clock model (BEAST v1.7.0) |
| Hill & Rambaut 2020*1 | 86 | 0.80 × 10⁻³ (95% HPD 0.14 × 10⁻³, 1.31 × 10⁻³) | November 17, 2019 (95% CI August 27, 2019; December 19, 2019) | Unreported clock model (BEAST v1.7.0) |
| Hill & Rambaut 2020*1 | 116 | 1.04 × 10⁻³ (95% HPD 0.71 × 10⁻³, 1.40 × 10⁻³) | December 3, 2019 (95% CI November 16, 2019; December 17, 2019) | Unreported clock model (BEAST v1.7.0) |
| Lu et al. 2020* (41) | 53 | − | November 29, 2019 (95% HPD November 14, 2019; December 13, 2019) | Strict clock model (BEAST v1.10.0) |
| Duchene et al. 2020*2 | 47 | 1.23 × 10⁻³ (95% HPD 5.63 × 10⁻⁴, 1.98 × 10⁻³) | November 19, 2019 (HPD October 21, 2019; December 11, 2019) | Strict clock model (BEAST v1.10) |
| Duchene et al. 2020*2 | 47 | 1.29 × 10⁻³ (HPD 5.35 × 10⁻⁴, 2.15 × 10⁻³) | November 12, 2019 (HPD September 26, 2019; December 11, 2019) | Relaxed clock model (BEAST v1.10) |
| Volz et al. 2020*3 | 53 | Model constrained between 7 × 10⁻⁴ and 2 × 10⁻³ | December 8, 2019 (95% CI November 21, 2019; December 20, 2019) | Strict clock model (BEAST v2.6.0) |
| Volz et al. 2020*3 | 53 | Model constrained between 5 × 10⁻⁴ and 1.25 × 10⁻³ | December 5, 2019 (95% CI November 6, 2019; December 13, 2019) | Maximum Likelihood regression ( |
1 http://virological.org/t/phylodynamic-analysis-of-sars-cov-2-update-2020-03-06/420
2 http://virological.org/t/temporal-signal-and-the-evolutionary-rate-of-2019-n-cov-using-47-genomes-collected-by-feb-01-2020/379
3 https://doi.org/10.25561/77169
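The rates in the table are expressed per site per year; multiplying by genome length gives a more intuitive per-genome figure, roughly 30 substitutions per genome per year at 1.0 × 10⁻³. The published estimates come from Bayesian molecular clock models in BEAST, which this sketch does not reproduce. Instead it illustrates the simpler least-squares root-to-tip regression idea behind strict-clock reasoning: if root-to-tip distance grows linearly with sampling date, the slope estimates the substitution rate and the date at which the fitted line reaches zero estimates the tMRCA. The genome length is that of the NC_045512.2 reference (29,903 nt); the helper names and the noise-free synthetic data are hypothetical.

```python
from datetime import date, timedelta
from typing import Sequence, Tuple

GENOME_LENGTH = 29_903  # nucleotides in the NC_045512.2 reference genome


def subs_per_genome_per_year(rate_per_site_per_year: float,
                             length: int = GENOME_LENGTH) -> float:
    """Convert a per-site rate into expected substitutions per genome per year."""
    return rate_per_site_per_year * length


def decimal_year(d: date) -> float:
    """Express a calendar date as a decimal year."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days
    return d.year + (d - start).days / days_in_year


def to_date(year_fraction: float) -> date:
    """Inverse of decimal_year, rounded to the nearest day."""
    year = int(year_fraction)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    return start + timedelta(days=round((year_fraction - year) * days_in_year))


def root_to_tip_regression(times: Sequence[float],
                           distances: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of root-to-tip distance (substitutions/site)
    against sampling time (decimal years). Returns (rate, tMRCA): the rate is
    the slope, the tMRCA the time at which the fitted line reaches zero."""
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, distances))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    return rate, -intercept / rate


# Synthetic, noise-free data under a strict clock (hypothetical values).
true_rate, true_tmrca = 1.0e-3, decimal_year(date(2019, 12, 1))
sample_times = [decimal_year(date(2020, m, 15)) for m in (1, 2, 3, 4)]
root_to_tip = [true_rate * (t - true_tmrca) for t in sample_times]

rate, tmrca = root_to_tip_regression(sample_times, root_to_tip)
print(f"{subs_per_genome_per_year(1.0e-3):.1f} substitutions/genome/year")  # 29.9
print(f"rate = {rate:.2e}, tMRCA approx. {to_date(tmrca)}")
```

With real data the root-to-tip distances come from a rooted phylogeny and are noisy, so the fit mainly serves as a check for temporal signal (dedicated tools such as TempEst do this), whereas the BEAST analyses in the table estimate the rate and tMRCA jointly with the tree.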
Fig. 1. Global sequencing efforts have contributed greatly to our understanding of the genomic diversity of SARS-CoV-2. a) Viral assemblies available from global regions as of 19/04/2020. b) Cumulative total of viral assemblies uploaded to GISAID that were included in our analysis. c) Radial Maximum Likelihood phylogeny of 7666 complete SARS-CoV-2 genomes. Colours represent the continents where isolates were collected, according to metadata annotations available on NextStrain (https://github.com/nextstrain/ncov/tree/master/data): green, Asia; red, Europe; purple, North America; orange, Oceania; dark blue, South America. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 2. Genomic diversity of SARS-CoV-2 in the USA, UK, Iceland and China. Strains collected in each of the four countries are highlighted on the global phylogenetic tree. a) Strains collected in the USA, shown in purple. b) Strains collected in the UK, shown in red. c) Strains collected in Iceland, shown in red. d) Strains collected in China, shown in green. Regional colours match those of the global phylogeny shown in Fig. 1c. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 3. Inspection of a major homoplastic site in Orf1ab of the SARS-CoV-2 genome (position 11,083). Panel A shows a colour-coded schematic of the SARS-CoV-2 genome annotated as per NC_045512.2, and a plot of all potential homoplastic sites in Orf1ab, measured as the minimum number of character-state changes on a Maximum Parsimony tree (see Methods). An exemplar homoplasy (denoted with *) is shown on the radial ML phylogenetic tree in panel B. Panel C shows the distribution of cophenetic distances between isolates carrying the identified homoplasy (red) and the distribution for all isolates (grey), showing that isolates with the homoplasy tend to cluster in the phylogeny. Equivalent figures for the other filtered homoplasies are generated as part of the filtering method (see Methods). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
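The Fig. 3 caption measures homoplasy as the minimum number of character-state changes on a Maximum Parsimony tree. For a fixed tree, that count at a single alignment column can be computed with Fitch's small-parsimony algorithm, sketched below in Python. This is an illustration under stated assumptions (a rooted, strictly bifurcating tree written as nested tuples; one unambiguous base per isolate; no missing data), not the paper's filtering pipeline, and the tree, isolate names and helper names are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf is an isolate name, an internal node a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_changes(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes needed at one site (Fitch's algorithm)."""

    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):                  # leaf: its observed base
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:                                 # children can agree: no change
            return common, left_cost + right_cost
        # Children disagree: keep both candidate states and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def is_homoplastic(tree: Tree, states: Dict[str, str]) -> bool:
    """A site is homoplastic on this tree if it needs more changes than the
    minimum possible for its number of observed states (k states -> k - 1)."""
    return fitch_changes(tree, states) > len(set(states.values())) - 1


site = {"A": "G", "B": "T", "C": "G", "D": "T"}
tree_1 = (("A", "B"), ("C", "D"))   # T-carriers fall in different clades
tree_2 = (("A", "C"), ("B", "D"))   # T-carriers form a single clade

print(fitch_changes(tree_1, site), is_homoplastic(tree_1, site))  # 2 True
print(fitch_changes(tree_2, site), is_homoplastic(tree_2, site))  # 1 False
```

The recursion depth grows with the depth of the tree, so for a phylogeny of thousands of isolates an iterative post-order traversal would be safer in Python than this recursive version.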