Spyros Lytras, Joseph Hughes, Darren Martin, Phillip Swanepoel, Arné de Klerk, Rentia Lourens, Sergei L Kosakovsky Pond, Wei Xia, Xiaowei Jiang, David L Robertson.
Abstract
The lack of an identifiable intermediate host species for the proximal animal ancestor of SARS-CoV-2, and the large geographical distance between Wuhan and where the closest evolutionarily related coronaviruses circulating in horseshoe bats (members of the Sarbecovirus subgenus) have been identified, are fueling speculation on the natural origins of SARS-CoV-2. We performed a comprehensive phylogenetic study on SARS-CoV-2 and all the related bat and pangolin sarbecoviruses sampled so far. Determining the likely recombination events reveals a highly reticulate evolutionary history within this group of coronaviruses. Distribution of the inferred recombination events is nonrandom, with evidence that Spike, the main target for humoral immunity, is beside a recombination hotspot likely driving antigenic shift events in the ancestry of bat sarbecoviruses. Coupled with the geographic ranges of their hosts and the sampling locations, across southern China and into Southeast Asia, we confirm that horseshoe bats, Rhinolophus, are the likely reservoir species for the SARS-CoV-2 progenitor. By tracing the recombinant sequence patterns, we conclude that there has been relatively recent geographic movement and cocirculation of these viruses' ancestors, extending across their bat host ranges in China and Southeast Asia over the last 100 years. We confirm that a direct proximal ancestor to SARS-CoV-2 has not yet been sampled, since the closest known relatives collected in Yunnan shared a common ancestor with SARS-CoV-2 approximately 40 years ago. Our analysis highlights the need for dramatically more wildlife sampling to: 1) pinpoint the exact origins of SARS-CoV-2's animal progenitor, 2) identify the intermediate species that facilitated transmission from bats to humans (if there is one), and 3) survey the extent of the diversity in the related sarbecoviruses' phylogeny that present high risk for future spillovers.
Keywords: Rhinolophus; Sarbecoviruses; COVID-19; SARS-CoV-2; bats; coronaviruses; host range; origin; pangolins; recombination
Year: 2022 PMID: 35137080 PMCID: PMC8882382 DOI: 10.1093/gbe/evac018
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1. Recombination-minimized phylogeny and recombination hot-/coldspots. Maximum likelihood phylogeny inferred from a recombination-free whole-genome alignment of the 78 Sarbecoviruses (A), see Materials and Methods. The non-nCoV/SARS-CoV clade is collapsed for clarity. All nodes presented have bootstrap confidence values above 90%. Distribution of recombination hot- and coldspots across the alignment based on the RRT (B) and the BDT (C) methods. For both plots, light and dark gray represent 95% and 99% confidence intervals of expected recombination breakpoint clustering under random recombination. Peaks above the shaded area represent recombination hotspots and drops below represent coldspots, annotated on the corresponding ORF genome schematic above each plot by vertical red and blue lines, respectively. All ORF names and the NTD and RBD encoding regions of Spike are also annotated on the schematics.
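The idea behind the shaded bands in panels (B) and (C), comparing observed breakpoint clustering against what random recombination would produce, can be sketched with a simple permutation test. This is a minimal illustration only, not the RRT or BDT procedure used in the study: it assumes a uniform null model for breakpoint placement, and the function names, window size, and step are hypothetical choices made for the example.

```python
import random


def window_counts(breakpoints, length, window, step):
    """Count breakpoints falling in each sliding window [start, start + window)."""
    starts = range(0, length - window + 1, step)
    return [sum(start <= b < start + window for b in breakpoints) for start in starts]


def null_band(n_breakpoints, length, window, step, n_perm=1000, alpha=0.05, seed=0):
    """Per-window lower/upper percentile band under uniformly random breakpoints.

    ASSUMPTION: breakpoints are placed uniformly at random along the alignment;
    the published tests use a more careful null model.
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(n_perm):
        fake = [rng.randrange(length) for _ in range(n_breakpoints)]
        sims.append(window_counts(fake, length, window, step))
    lo_idx = int((alpha / 2) * n_perm)
    hi_idx = int((1 - alpha / 2) * n_perm) - 1
    lower, upper = [], []
    for column in zip(*sims):  # one column of simulated counts per window
        ordered = sorted(column)
        lower.append(ordered[lo_idx])
        upper.append(ordered[hi_idx])
    return lower, upper


def classify(observed, lower, upper):
    """Label each window as 'hot' (above band), 'cold' (below band), or 'neutral'."""
    labels = []
    for obs, lo, hi in zip(observed, lower, upper):
        labels.append("hot" if obs > hi else "cold" if obs < lo else "neutral")
    return labels


if __name__ == "__main__":
    # Toy data (hypothetical): 30 breakpoints clustered in one region, 10 scattered.
    toy = [200 + 3 * i for i in range(30)] + [10, 110, 310, 410, 510, 610, 710, 810, 910, 950]
    obs = window_counts(toy, 1000, 100, 100)
    low, high = null_band(len(toy), 1000, 100, 100)
    print(classify(obs, low, high))
```

Calling `null_band` with `alpha=0.01` gives the analogue of the darker 99% band. Windows labelled "hot" correspond to peaks above the shaded area, and "cold" windows to drops below it.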
Fig. 3. Recombination analysis and geographic distribution of Sarbecoviruses. Maximum clade credibility (MCC) dated phylogeny of RBP region 5 of 78 Sarbecoviruses (A). All tips are annotated with the geographic region the viruses have been sampled in and notable viruses are annotated with genome schematics separated into the 22 inferred RBP regions, each colored based on phylogenetic distance from SARS-CoV-2 (see scale and Materials and Methods). RBP region 21 has been removed from the schematic due to limited phylogenetic information in the alignment. The GX cluster annotated with an asterisk contains the five pangolin coronaviruses collected in Guangxi. Map of East Asia with geographic regions (provinces within China, countries outside China) colored based on Sarbecoviruses sampling (B): blue for regions with only non-nCoV clade samples, pink for regions where nCoV viruses have been sampled. Shading in the nCoV regions corresponds to phylogenetic distance from SARS-CoV-2 (see scale). Notable nCoV viruses and pangolin trafficking routes (adapted from Xu et al. [2016]) are annotated onto the map.
Fig. 4. Molecular dating and Rhinolophus host geographic distributions. Tip-dated Bayesian phylogeny of RBP region 5 showing the nine closest relatives to SARS-CoV-2 (A). Tree nodes have been adjusted to the mean age estimates and posterior distributions are shown for each node with mean age estimate and 95% HPD confidence intervals presented to their left. Tips are annotated with the host species they were sampled in; bat silhouette colors correspond to panel (B). Geographic ranges of Rhinolophus species the SARS-CoV-2 closest relatives have been sampled in (B). Maps are restricted to East Asia and separated into province-level within China and country-level outside China.
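For readers unfamiliar with the 95% HPD intervals reported for each node, the following is a minimal sketch of how such an interval can be computed from posterior samples of a node age. It assumes a plain list of sampled ages and a unimodal posterior; the function name `hpd_interval` and the toy values are hypothetical, and Bayesian dating software normally reports these intervals directly.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval containing at least `mass` of the samples.

    ASSUMPTION: the posterior is unimodal, so a single contiguous interval
    is a sensible summary.
    """
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]


if __name__ == "__main__":
    # Toy posterior sample of node ages in years before present (hypothetical values).
    ages = [38, 41, 40, 39, 45, 36, 42, 40, 37, 70]
    print(hpd_interval(ages, mass=0.9))
```

Unlike an equal-tailed interval, the HPD interval is the shortest one holding the requested posterior mass, so a skewed posterior gives an interval that is not centered on the mean estimate.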
Fig. 2. Nonrecombinant topologies of SARS-CoV-2 relatives. Zoomed-in regions of selected RBP region maximum likelihood phylogenies (A). Branches within the nCoV clade are colored in red and outside the nCoV clade in green. Genome schematics of close SARS-CoV-2 relatives with recombinant Spike regions (B). RBP regions 15 and 16 are highlighted and the non-nCoV subclades of the maximum likelihood phylogenies containing the relevant viruses are presented. The coloring of nonrecombinant segments indicates patristic distance to SARS-CoV-2 (see fig. 3 legend). Nodes with bootstrap confidence values below 80% have been collapsed.
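Patristic distance, used above to color the nonrecombinant segments, is the sum of branch lengths along the path connecting two tips of a tree. The sketch below illustrates the calculation on a toy rooted tree. The tree encoding (a child-to-parent dictionary), the function names, and the tip labels are assumptions made for illustration and do not reflect the tooling used in the study.

```python
def path_to_root(tip, parent):
    """Map the tip and each of its ancestors to the cumulative branch length from the tip."""
    dist = {tip: 0.0}
    node, total = tip, 0.0
    while node in parent:
        up, length = parent[node]
        total += length
        dist[up] = total
        node = up
    return dist


def patristic_distance(a, b, parent):
    """Sum of branch lengths on the path between tips a and b.

    `parent` maps each non-root node to (parent_node, branch_length).
    """
    from_a = path_to_root(a, parent)
    node, total = b, 0.0
    # Climb from b until reaching a node that is also an ancestor of a
    # (their most recent common ancestor).
    while node not in from_a:
        up, length = parent[node]
        total += length
        node = up
    return total + from_a[node]


if __name__ == "__main__":
    # Toy tree (hypothetical labels and branch lengths): ((A:0.1, B:0.2)n1:0.05, C:0.4)root
    toy_tree = {
        "A": ("n1", 0.1),
        "B": ("n1", 0.2),
        "n1": ("root", 0.05),
        "C": ("root", 0.4),
    }
    print(patristic_distance("A", "B", toy_tree))
    print(patristic_distance("A", "C", toy_tree))
```

Computing this distance from SARS-CoV-2 to every other tip, separately for each nonrecombinant region's tree, would yield one value per virus per region, which is the kind of quantity a per-segment color scale can encode.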