H L Wells, M Letko, G Lasso, B Ssebide, J Nziza, D K Byarugaba, I Navarrete-Macias, E Liang, M Cranfield, B A Han, M W Tingley, M Diuk-Wasser, T Goldstein, C K Johnson, J A K Mazet, K Chandran, V J Munster, K Gilardi, S J Anthony.
Abstract
Severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1) and SARS-CoV-2 are not phylogenetically closely related; however, both use the angiotensin-converting enzyme 2 (ACE2) receptor in humans for cell entry. This is not a universal sarbecovirus trait; for example, many known sarbecoviruses related to SARS-CoV-1 have two deletions in the receptor binding domain of the spike protein that render them incapable of using human ACE2. Here, we report three sequences of a novel sarbecovirus from Rwanda and Uganda that are phylogenetically intermediate to SARS-CoV-1 and SARS-CoV-2 and demonstrate via in vitro studies that they are also unable to utilize human ACE2. Furthermore, we show that the observed pattern of ACE2 usage among sarbecoviruses is best explained by recombination not of SARS-CoV-2, but of SARS-CoV-1 and its relatives. We show that the lineage that includes SARS-CoV-2 is most likely the ancestral ACE2-using lineage, and that recombination with at least one virus from this group conferred ACE2 usage to the lineage including SARS-CoV-1 at some time in the past. We argue that alternative scenarios such as convergent evolution are much less parsimonious; we show that biogeography and patterns of host tropism support the plausibility of a recombination scenario, and we propose a competitive release hypothesis to explain how this recombination event could have occurred and why it is evolutionarily advantageous. The findings provide important insights into the natural history of ACE2 usage for both SARS-CoV-1 and SARS-CoV-2 and a greater understanding of the evolutionary mechanisms that shape zoonotic potential of coronaviruses. This study also underscores the need for increased surveillance for sarbecoviruses in southwestern China, where most ACE2-using viruses have been found to date, as well as other regions such as Africa, where these viruses have only recently been discovered.
Keywords: coronavirus; recombination; viral ecology; virus evolution
Year: 2021 PMID: 33754082 PMCID: PMC7928622 DOI: 10.1093/ve/veab007
Source DB: PubMed Journal: Virus Evol ISSN: 2057-1577
Table 1. Full list of sequences and accession numbers used in this study.
| Accession | Name | Date | Country | Host | ACE2 usage |
|---|---|---|---|---|---|
| AY304486 | SARS coronavirus SZ3 | 2003 | Guangdong, China | | |
| AY304488 | SARS coronavirus SZ16 | 2003 | Hong Kong, China | | |
| AY572034 | SARS coronavirus civet007 | 2004 | Guangdong, China | | |
| DQ022305 | Bat SARS coronavirus HKU3 1 | 2005 | Hong Kong, China | | |
| DQ071615 | Bat SARS coronavirus Rp3 | 2004 | Guangxi, China | | |
| DQ084199 | Bat SARS coronavirus HKU3 2 | 2005 | Hong Kong, China | | |
| DQ084200 | Bat SARS coronavirus HKU3 3 | 2005 | Hong Kong, China | | |
| DQ412042 | Bat SARS coronavirus Rf1 | 2004 | Hubei, China | | |
| DQ412043 | Bat SARS coronavirus Rm1 | 2004 | Hubei, China | | |
| DQ648856 | Bat coronavirus BtCoV/273/2005 | 2004 | Hubei, China | | |
| DQ648857 | Bat coronavirus BtCoV/279/2005 | 2004 | Hubei, China | | |
| EPI_ISL_402125 | BetaCoV/Wuhan Hu 1 | 2019 | Hubei, China | human | |
| EPI_ISL_402131 | BetaCoV/RaTG13 | 2013 | Yunnan, China | | |
| EPI_ISL_412976 | BetaCoV/RmYN01 | 2019 | Yunnan, China | | |
| EPI_ISL_412977 | BetaCoV/RmYN02 | 2019 | Yunnan, China | | |
| EPI_ISL_410538 | BetaCoV/P4L | 2017 | Guangxi, China | | |
| EPI_ISL_410539 | BetaCoV/P1E | 2017 | Guangxi, China | | |
| EPI_ISL_410540 | BetaCoV/P5L | 2017 | Guangxi, China | | |
| EPI_ISL_410541 | BetaCoV/P5E | 2017 | Guangxi, China | | |
| EPI_ISL_410542 | BetaCoV/P2V | 2017 | Guangxi, China | | |
| EPI_ISL_410543 | BetaCoV/P3B | 2017 | Guangxi, China | | |
| EPI_ISL_410544 | BetaCoV/P2S | 2019 | Guangdong, China | | |
| FJ588686 | Bat SARS coronavirus Rs672/2006 | 2006 | Guizhou, China | | |
| GQ153539 | Bat SARS coronavirus HKU3 4 | 2005 | Hong Kong, China | | |
| GQ153540 | Bat SARS coronavirus HKU3 5 | 2005 | Hong Kong, China | | |
| GQ153541 | Bat SARS coronavirus HKU3 6 | 2005 | Hong Kong, China | | |
| GQ153542 | Bat SARS coronavirus HKU3 7 | 2006 | Guangdong, China | | |
| GQ153543 | Bat SARS coronavirus HKU3 8 | 2006 | Guangdong, China | | |
| GQ153544 | Bat SARS coronavirus HKU3 9 | 2006 | Hong Kong, China | | |
| GQ153545 | Bat SARS coronavirus HKU3 10 | 2006 | Hong Kong, China | | |
| GQ153546 | Bat SARS coronavirus HKU3 11 | 2007 | Hong Kong, China | | |
| GQ153547 | Bat SARS coronavirus HKU3 12 | 2007 | Hong Kong, China | | |
| GQ153548 | Bat SARS coronavirus HKU3 13 | 2007 | Hong Kong, China | | |
| GU190215 | Bat coronavirus BM48-31/ BGR/2008 | 2008 | Bulgaria | | |
| JX993987 | Bat coronavirus Rp/Shaanxi2011 | 2011 | Shaanxi, China | | |
| JX993988 | Bat coronavirus Cp/Yunnan2011 | 2011 | Yunnan, China | | |
| KC881005 | Bat SARS-like coronavirus RsSHC014 | 2012 | Yunnan, China | | |
| KC881006 | Bat SARS-like coronavirus Rs3367 | 2012 | Yunnan, China | | |
| KF294457 | SARS related bat coronavirus Longquan 140 | 2012 | Guizhou, China | | |
| KF367457 | Bat SARS-like coronavirus WIV1 | 2012 | Yunnan, China | | |
| KF569996 | Rhinolophus affinis coronavirus LYRa11 | 2011 | Yunnan, China | | |
| KF636752 | Bat Hp betacoronavirus/Zhejiang2013 | 2013 | Zhejiang, China | | |
| KJ473811 | Bat coronavirus BtRf BetaCoV/JL2012 | 2012 | Jilin, China | | |
| KJ473812 | Bat coronavirus BtRf BetaCoV/HeB2013 | 2013 | Hebei, China | | |
| KJ473813 | Bat coronavirus BtRf BetaCoV/SX2013 | 2013 | Shanxi, China | | |
| KJ473814 | Bat coronavirus BtRs BetaCoV/HuB2013 | 2013 | Hubei, China | | |
| KJ473815 | Bat coronavirus BtRs BetaCoV/GX2013 | 2013 | Guangxi, China | | |
| KJ473816 | Bat coronavirus BtRs BetaCoV/YN2013 | 2013 | Yunnan, China | | |
| KP886808 | Bat SARS-like coronavirus YNLF 31C | 2013 | Yunnan, China | | |
| KP886809 | Bat SARS-like coronavirus YNLF 34C | 2013 | Yunnan, China | | |
| KT444582 | SARS-like coronavirus WIV16 | 2013 | Yunnan, China | | |
| KU182964 | Bat coronavirus JTMC15 | 2013 | Yunnan, China | | |
| KU182963 | Bat coronavirus MLHJC35 | 2012 | Jilin, China | | |
| KU973692 | SARS related coronavirus F46 | 2012 | Yunnan, China | | |
| KY352407 | SARS related coronavirus BtKY72 | 2007 | Kenya | | |
| KY417142 | Bat SARS-like coronavirus As6526 | 2014 | Yunnan, China | | |
| KY417143 | Bat SARS-like coronavirus Rs4081 | 2012 | Yunnan, China | | |
| KY417144 | Bat SARS-like coronavirus Rs4084 | 2012 | Yunnan, China | | |
| KY417145 | Bat SARS-like coronavirus Rf4092 | 2012 | Yunnan, China | | |
| KY417146 | Bat SARS-like coronavirus Rs4231 | 2013 | Yunnan, China | | |
| KY417147 | Bat SARS-like coronavirus Rs4237 | 2013 | Yunnan, China | | |
| KY417148 | Bat SARS-like coronavirus Rs4247 | 2013 | Yunnan, China | | |
| KY417149 | Bat SARS-like coronavirus Rs4255 | 2013 | Yunnan, China | | |
| KY417150 | Bat SARS-like coronavirus Rs4874 | 2013 | Yunnan, China | | |
| KY417151 | Bat SARS-like coronavirus Rs7327 | 2014 | Yunnan, China | | |
| KY417152 | Bat SARS-like coronavirus Rs9401 | 2015 | Yunnan, China | | |
| KY770858 | Bat coronavirus Anlong 103 | 2013 | Guizhou, China | | |
| KY770859 | Bat coronavirus Anlong 112 | 2013 | Guizhou, China | | |
| KY770860 | Bat coronavirus Jiyuan 84 | 2012 | Henan, China | | |
| KY938558 | Bat coronavirus 16BO133 | 2016 | South Korea | | |
| MG772933 | Bat SARS-like coronavirus SL CoVZC45 | 2017 | Zhejiang, China | | |
| MG772934 | Bat SARS-like coronavirus SL CoVZXC21 | 2015 | Zhejiang, China | | |
| MK211374 | Bat coronavirus BtRl BetaCoV/SC2018 | 2018 | Sichuan, China | | |
| MK211375 | Bat coronavirus BtRs BetaCoV/YN2018A | 2018 | Yunnan, China | | |
| MK211376 | Bat coronavirus BtRs BetaCoV/YN2018B | 2018 | Yunnan, China | | |
| MK211377 | Bat coronavirus BtRs BetaCoV/YN2018C | 2018 | Yunnan, China | | |
| MK211378 | Bat coronavirus BtRs BetaCoV/YN2018D | 2018 | Yunnan, China | | |
| NC_004718 | SARS coronavirus | 2003 | Canada | human | |
| MT726044 | PREDICT PDF-2370 | 2013 | Uganda | | |
| MT726043 | PREDICT PDF-2386 | 2013 | Uganda | | |
| MT726045 | PREDICT PRD-0038 | 2010 | Rwanda | | |
All accession numbers are from GenBank with the exception of those beginning with EPI_ISL, which are from GISAID. Metadata includes sequencing year, geographic origin, and host species. Citations used to determine hACE2 binding capability are also included.
Viruses that were not cultured but whose spike was shown to enable (or not enable) hACE2-mediated entry using pseudotyped or recombinant viruses.
These sequences were not included in the final phylogenetic reconstruction due to high genetic identity with another sequence in the alignment.
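The accession rule stated in the table note (EPI_ISL identifiers come from GISAID, all others from GenBank) can be sketched as a small helper. The function name `accession_source` is hypothetical and introduced here only for illustration; it encodes nothing beyond the prefix rule in the note.

```python
def accession_source(accession: str) -> str:
    """Return the database an accession is assumed to come from.

    Rule taken from the table note: identifiers beginning with EPI_ISL
    are GISAID records; every other identifier is treated as GenBank.
    """
    return "GISAID" if accession.startswith("EPI_ISL") else "GenBank"


# A few accessions copied from the table above.
examples = ["AY304486", "EPI_ISL_402125", "NC_004718", "MT726044"]
sources = {acc: accession_source(acc) for acc in examples}
```

This is only a prefix check; it does not validate that an accession exists in either database.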
Figure 1. Phylogenetic tree of the RNA-dependent RNA polymerase (RdRp) gene (nsp12) and associated geographic origin and host species. Colors of clade bars represent the different geographic lineages. Lineage 1 is shown in blue, Lineage 2 in green, and Lineage 3 in orange. The clade of viruses from Africa and Europe is putatively named ‘Lineage 4’ and is shown in purple. The phylogeny shows strong posterior support for the branching order presented; however, different models or genes have produced trees with different branching orders placing Lineage 4 outside Lineage 5, so the branch to Lineage 4 is dashed to represent this uncertainty (Supplementary Fig. S1). The putative ‘Lineage 5’ containing SARS-CoV-2 is also shown in blue at the bottom of the tree to demonstrate that the sequences are from the same regions as Lineage 1 viruses. The geographic origin of each virus is indicated by the lines that terminate in the respective country or province with the same color code. The full province and country names for all two- and three-letter codes can be found in Table 1. As human, civet, and pangolin viruses cannot be certain to have naturally originated in the province in which they were first found, their locations are not illustrated, but the natural range of the pangolin (Manis javanica) is denoted with dashed shading and the origins of the SARS-CoV-1 and SARS-CoV-2 human outbreaks are designated with red stars in Guangdong and Hubei, respectively. Hosts are also shown with colored symbols according to the key on the left. The host phylogeny in the key was adapted from Agnarsson et al. (2011). The root of the tree was shortened for clarity.
Figure 2. Phylogenetic trees of RdRp (left) and the RBD (right) demonstrating recombination events between ACE2-users and non-ACE2-users. Names of viruses that have been confirmed to use hACE2 are shown in red font, and those that have been shown to not use hACE2 are shown in blue font (citations can be found in Table 1). Viruses in black font have not yet been tested. The red and blue highlighted clade bars separate viruses with the structure associated with ACE2 usage (highly similar to viruses confirmed to use hACE2 specifically) and the structure with deletions that cannot use ACE2, respectively. Connecting lines indicate recombination events that resulted in a gain of ACE2 usage (red) or a loss of ACE2 usage (blue). The two different groups of RBD sequence within the Lineage 1 recombinants that gained ACE2 usage are distinguished in red (Type 1) and purple (Type 2) highlighting. The distances of the roots have been shortened for clarity. The branch leading to Lineage 4 is dashed to demonstrate uncertainty in its positioning.
Figure 5. The phylogenetic backbone of the RdRp gene alongside the amino acid sequences of the RBM. Amino acid numbering is relative to SARS-CoV-1. Virus names in red font are known hACE2 users, those in blue are known non-users, and those in black have not been tested. Residues within 10 Å of the interface with hACE2 are considered interfacial, and exact distances between each interfacial residue and the closest hACE2 residue (based on structural modeling of SARS-CoV-1 bound with hACE2) are shown along the bottom. Residues that are closer to the interface (3 Å or less) and thus make strong interactions with hACE2 are shown in red, and as distance increases this color transitions to purple, blue, and finally to white. The receptor binding ridge sequences are highlighted in purple and the remaining interfacial segments have been numbered regions 1, 2, and 3 for clarity within the main text. The colors of these regions correspond with the colors in the structural models of Fig. 4. The branch leading to Lineage 4 is dashed to demonstrate uncertainty in its positioning.
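The distance thresholds in this caption (interfacial within 10 Å, strong interaction at 3 Å or less) can be illustrated with a minimal sketch. The function names, the category labels, and the atom coordinates are all hypothetical; the coordinates are not taken from any PDB entry, and the authors' actual distance calculation is not described here.

```python
import math

INTERFACE_CUTOFF = 10.0  # Å; residues within this distance count as interfacial
STRONG_CUTOFF = 3.0      # Å; residues at or below this distance interact strongly


def min_distance(residue_atoms, receptor_atoms):
    """Smallest Euclidean distance between any residue atom and any receptor atom."""
    return min(math.dist(a, b) for a in residue_atoms for b in receptor_atoms)


def classify(distance):
    """Map a distance in Å onto the caption's categories (labels are illustrative)."""
    if distance <= STRONG_CUTOFF:
        return "strong"
    if distance <= INTERFACE_CUTOFF:
        return "interfacial"
    return "non-interfacial"


# Hypothetical coordinates in Å, chosen only to exercise each category.
rbd_residue = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ace2_near = [(3.5, 0.0, 0.0)]    # 2.5 Å from the second residue atom
ace2_far = [(0.0, 30.0, 0.0)]    # well beyond the 10 Å cutoff

near_label = classify(min_distance(rbd_residue, ace2_near))
far_label = classify(min_distance(rbd_residue, ace2_far))
```

In practice the atom coordinates would come from a structure such as the SARS-CoV-1 complex the caption refers to; reading that file is left out of the sketch.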
Figure 3. hACE2 usage of bat sarbecoviruses investigated using a surrogate VSV-pseudotyping system. (A) Schematic showing the structure of chimeric spike proteins. The SARS-CoV-1 spike backbone is used in conjunction with the RBD from the Uganda and Rwanda strains. (B) Incorporation of chimeric SARS-CoV-1 spike proteins into VSV. Western blots show successful expression of chimeric spikes (lysates) and their incorporation into VSV (particles). (C) hACE2 entry assays. Left, wildtype SARS-CoV spike protein is able to mediate entry into BHK cells expressing hACE2. In contrast, recombinant spike proteins containing either the Uganda or Rwanda RBD were unable to mediate entry. Entry is expressed relative to VSV particles with no spike protein. Right, control experiment for entry assay. BHK cells do not express hACE2 and therefore do not permit entry of hACE2-dependent VSV pseudotypes.
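The normalization described in panel C (entry expressed relative to VSV particles with no spike protein) amounts to dividing each reading by the no-spike control. The sketch below assumes raw readings are positive signal values; the function name and all numbers are hypothetical and are not data from the study.

```python
def relative_entry(signal: float, no_spike_signal: float) -> float:
    """Express an entry reading as a fold change over the no-spike control."""
    if no_spike_signal <= 0:
        raise ValueError("no-spike control reading must be positive")
    return signal / no_spike_signal


# Hypothetical raw readings in arbitrary units, for illustration only.
no_spike = 100.0
readings = {"wildtype spike": 5000.0, "chimeric spike": 110.0}
fold_change = {name: relative_entry(value, no_spike) for name, value in readings.items()}
```

A fold change near 1 means the pseudotype enters no better than particles lacking spike, which is how a failure to mediate entry would appear on this scale.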
Figure 4. Structural modeling of sarbecovirus RBDs found in Uganda and Rwanda. (A) Structural superposition of the X-ray structures for the RBDs in SARS-CoV-1 (PDB 2ajf, red) (Li et al. 2005) and SARS-CoV-2 (PDB 6m0j, cyan) (Lan et al. 2020) and homology models for SARS-CoV found in Uganda (PDF-2370 and PDF-2386, magenta) and Rwanda (PRD-0038, yellow). (B) Overview of the X-ray structure of SARS-CoV-1 RBD (red) bound to hACE2 (blue) (PDB 2ajf) (Li et al. 2005). (C) Close-up view of the interface between hACE2 (blue) and RBDs in SARS-CoV-1 (PDB 2ajf, top left) (Li et al. 2005) and SARS-CoV-2 (PDB 6m0j, top right) (Lan et al. 2020) and homology models for viruses found in Uganda (PDF-2370 and PDF-2386, bottom left) and Rwanda (PRD-0038, bottom right). The color of the RBD loops corresponds to the colors of the labeled sequence regions in Fig. 5: region 1 in cyan, region 2 in orange, the receptor binding ridge in purple, and region 3 in green. Labeled RBD residues correspond to interfacial residues whose identities differ between African sarbecoviruses and SARS-CoV-1 or SARS-CoV-2 (labels are included in all four panels to facilitate the identification of counterpart residues in each virus). Asterisks denote residues whose identity is not shared by any ACE2-binding SARS-CoV as dictated by Fig. 5. Labeled hACE2 residues correspond to residues within 5 Å of RBD residues depicted.
Table 2. Recombination breakpoints detected in ACE2-using Lineage 1 viruses by the program 3SEQ.
| Major parent | Minor parent | Child | p-value | Length | Breakpoint estimates |
|---|---|---|---|---|---|
| KU973692 F46 | EPI_ISL_402131 RaTG13 | NC_004718 SARS-CoV-1 | 0 | 952 | 8836–8837 and 10510–10542; 8836–8837 and 10726–10752 |
| MK211374 SC2018 | EPI_ISL_412976 RmYN01 | NC_004718 SARS-CoV-1 | 0 | 1290 | 6497–6519 and 8363–8365; 6401–6406 and 8363–8365; 6440–6472 and 8363–8365 |
| KY417146 Rs4231 | KY417151 Rs7327 | NC_004718 SARS-CoV-1 | 0 | 573 | 9760–9772 and 10702–10704 |
| MG772933 SL-CoVZC45 | KY770860 Jiyuan-84 | NC_004718 SARS-CoV-1 | 1.4775E-07 | 1072 | 11035–11037 and 12610–12624 |
| KY770859 Anlong-112 | KY352407 BtKY72 | AY304486 SARS-SZ3 | 0 | 993 | 8620–8681 and 10732–10771 |
| MK211374 SC2018 | KJ473814 HuB2013 | AY304486 SARS-SZ3 | 1.1774E-07 | 1077 | 6755–6784 and 8397–8431 |
| KY417146 Rs4231 | MK211376 YN2018B | AY304486 SARS-SZ3 | 0 | 558 | 9760–9772 and 10702–10704 |
| MG772933 SL-CoVZC45 | KP886808 YNLF_31C | AY304486 SARS-SZ3 | 1.592E-07 | 791 | 11260–11273 and 12543–12558 |
| EPI_ISL_412976 RmYN01 | NC_004718 SARS-CoV-1 | KF569996 LYRa11 | 0 | 921 | 9107–9113 and 10700–10701; 9027–9043 and 10865–10869; 9077–9095 and 10865–10869; 9107–9113 and 10865–10869; 9027–9043 and 10840–10842; 9077–9095 and 10840–10842; 9107–9113 and 10840–10842; 9027–9043 and 10700–10701; 9077–9095 and 10700–10701 |
| JX993988 Cp/Yunnan2011 | KY770859 Anlong-112 | KF569996 LYRa11 | 0 | 1627 | 1658–1714 and 4151–4199; 1368–1428 and 4229–4240; 1487–1498 and 4229–4240; 1658–1714 and 4229–4240; 1368–1428 and 4151–4199; 1487–1498 and 4151–4199 |
| NC_004718 SARS-CoV-1 | KY417142 As6526 | KC881006 Rs3367 | 0 | 2117 | 0–11 and 9245–9251 |
| KC881005 RsSHC014 | KF569996 LYRa11 | KC881006 Rs3367 | 0 | 168 | 10201–10233 and 10549–10565 |
| KY417151 Rs7327 | KY417142 As6526 | KC881006 Rs3367 | 0 | 3036 | 1853–3932 and 8288–8374 |
| NC_004718 SARS-CoV-1 | KY417142 As6526 | KF367457 WIV1 | 0 | 2116 | 0–11 and 9245–9251 |
| KC881005 RsSHC014 | KF569996 LYRa11 | KF367457 WIV1 | 0 | 168 | 10201–10233 and 10549–10565 |
| KY417151 Rs7327 | KY417142 As6526 | KF367457 WIV1 | 0 | 3036 | 1853–3932 and 8288–8374 |
| KF367457 WIV1 | KY417146 Rs4231 | KC881005 RsSHC014 | 0 | 378 | 9841–9915 and 10549–10572 |
| KY417151 Rs7327 | KY417142 As6526 | KC881005 RsSHC014 | 0 | 3037 | 1853–3932 and 8288–8374 |
| KF367457 WIV1 | KY417146 Rs4231 | KY417144 Rs4084 | 0 | 378 | 9841–9915 and 10549–10572 |
| KY417151 Rs7327 | KY417142 As6526 | KY417144 Rs4084 | 0 | 3034 | 1853–3932 and 8288–8374 |
| NC_004718 SARS-CoV-1 | MK211377 YN2018C | MK211376 YN2018B | 0 | 2417 | 411–551 and 9245–9251 |
| KC881005 RsSHC014 | KF569996 LYRa11 | MK211376 YN2018B | 0 | 122 | 10201–10233 and 10469–10497 |
| KY417151 Rs7327 | MK211378 YN2018D | MK211376 YN2018B | 0 | 2205 | 4541–5578 and 8766–8789 |
| NC_004718 SARS-CoV-1 | KY417142 As6526 | KY417151 Rs7327 | 0 | 2112 | 0–11 and 9245–9251 |
| KC881005 RsSHC014 | KF569996 LYRa11 | KY417151 Rs7327 | 0 | 122 | 10201–10233 and 10469–10497 |
| KY417144 Rs4084 | MK211377 YN2018C | KY417151 Rs7327 | 0 | 3260 | 924–1939 and 8186–8374 |
| NC_004718 SARS-CoV-1 | KY417142 As6526 | KY417152 Rs9401 | 0 | 2112 | 0–11 and 9245–9251 |
| KC881005 RsSHC014 | KF569996 LYRa11 | KY417152 Rs9401 | 0 | 122 | 10201–10233 and 10469–10497 |
| KY417144 Rs4084 | MK211377 YN2018C | KY417152 Rs9401 | 0 | 3260 | 924–1939 and 8186–8374 |
| NC_004718 SARS-CoV-1 | KY417149 Rs4255 | KY417146 Rs4231 | 0 | 2296 | 0–11 and 8838–8840 |
| NC_004718 SARS-CoV-1 | KC881005 RsSHC014 | KY417146 Rs4231 | 0 | 1788 | 9769–9780 and 12448–12793 |
| NC_004718 SARS-CoV-1 | KY417143 Rs4081 | KT444582 WIV16 | 0 | 2293 | 0–32 and 8838–8840 |
| KF367457 WIV1 | KY417146 Rs4231 | KT444582 WIV16 | 0 | 541 | 0–8891 and 9973–10233 |
| KC881005 RsSHC014 | NC_004718 SARS-CoV-1 | KT444582 WIV16 | 0 | 403 | 0–8891 and 9769–9780 |
| KY417143 Rs4081 | KY417146 Rs4231 | KT444582 WIV16 | 4E-12 | 1781 | 5975–6133 and 8727–12793; 3536–5782 and 8727–12793 |
| NC_004718 SARS-CoV-1 | KY417143 Rs4081 | KY417150 Rs4874 | 0 | 2294 | 0–32 and 8838–8840 |
| KF367457 WIV1 | KY417146 Rs4231 | KY417150 Rs4874 | 0 | 541 | 0–8891 and 9973–10233 |
| KC881005 RsSHC014 | NC_004718 SARS-CoV-1 | KY417150 Rs4874 | 0 | 403 | 0–8891 and 9769–9780 |
| KY417143 Rs4081 | KY417146 Rs4231 | KY417150 Rs4874 | 4E-12 | 1782 | 5975–6133 and 8727–12793; 3536–5782 and 8727–12793 |
| EPI_ISL_402125 SARS-CoV-2 | KU182964 JTMC15 | EPI_ISL_412977 RmYN02 | 0 | 1111 | 8957–8957 and 10827–10828; 8938–8941 and 10831–10845; 8957–8957 and 10831–10845; 8938–8941 and 10827–10828 |
| EPI_ISL_410542 P2V | KY770859 Anlong-112 | EPI_ISL_412977 RmYN02 | 0 | 3218 | 1904–1907 and 5126–5128; 1862–1879 and 5126–5128; 1883–1885 and 5126–5128 |
Each recombinant Lineage 1 virus was set as the child sequence, and the parental sequences between the breakpoints identified (minor parent) and on either side (major parent) are listed. The p-value is the significance level reported by 3SEQ. Breakpoint estimates are given as ranges, and the minimum length of the recombinant region between these breakpoints is given. Numbering is relative to the alignment, which begins at SARS-CoV-2 nucleotide 12,681. When 3SEQ identified more than one set of breakpoint estimates, all were included in the table. Each recombinant region was further analyzed separately for more breakpoints within, since 3SEQ identifies only one at a time.
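Because each breakpoint estimate in the table is a pair of ranges written as "start–end and start–end", it can be read into numeric form with a short parser. This is a minimal sketch: the names `parse_estimate` and `range_width` are hypothetical, the string format is assumed from how the estimates appear in the table, and nothing here reproduces 3SEQ's own output format.

```python
import re

# Matches "start–end" with either an en dash or a plain hyphen between the numbers.
RANGE = re.compile(r"(\d+)\s*[–-]\s*(\d+)")


def parse_estimate(text):
    """Parse 'a–b and c–d' into ((a, b), (c, d)) in alignment coordinates."""
    ranges = [(int(start), int(end)) for start, end in RANGE.findall(text)]
    if len(ranges) != 2:
        raise ValueError(f"expected two ranges, found {len(ranges)}: {text!r}")
    return ranges[0], ranges[1]


def range_width(breakpoint_range):
    """Width of the interval within which a breakpoint is estimated to fall."""
    start, end = breakpoint_range
    return end - start


# One estimate copied from the table above.
first, second = parse_estimate("9760–9772 and 10702–10704")
```

The parsed positions remain alignment coordinates. Converting them to SARS-CoV-2 genome positions would need the alignment itself, since a fixed offset from nucleotide 12,681 would be wrong wherever the alignment contains gaps.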
Figure 6. Recombination breakpoints detected in Lineage 1 ACE2-using sequences. The top of this figure illustrates that the recombination suggested by the change in topology in Fig. 2 for 13 Lineage 1 viruses is supported by formal breakpoint analysis. The breakpoints detected for each of the 13 recombinant Lineage 1 sequences with ACE2-using structure (no deletions) are shown. Sequences that are nearly identical are colored the same for simplicity. The bars represent the sequence of genome beginning 750 bp before RdRp spanning through the end of S2 (SARS-CoV-2 nucleotides 12,681 through 25,176) and each box within represents a recombinant section within the sequence. The breakpoints correspond to those identified in Table 2. Numbering is relative to the alignment. The parental sequence is shown within each box. Sequences identified as the minor parent by 3SEQ were labeled within the breakpoint margins and the major parent outside. Six regions where these sequences appear to be free of recombination are labeled A–F and a corresponding phylogeny for each region is shown below. Regions A and E were further tested for recombination breakpoints in all sequences, not just the 13 Lineage 1 viruses, and were found to be breakpoint-free. The topology of regions A and E is not different enough from Fig. 2 to suggest that recombination within RdRp or RBD significantly changed the interpretation of our results. For each region, sequences were tracked with connecting lines of corresponding color to identify where recombination may have occurred between Lineage 1 and Lineage 5 and hypothesized events are specifically marked with dotted lines. This highlights the secondary recombination of Rs4084 and RsSHC014 in region E on top of the primary recombination in regions B through E. Sequence names of Lineage 2 and 3 viruses are greyed out and Lineages 4 and 5 are collapsed and highlighted in darker grey to make the changes in topology between the trees more visible.
Figure 8. Proposed timeline of deletion and recombination events. The timeline demonstrates the sequence of events that led to loss of ACE2 usage in Lineages 2, 3, and 4 and gain of ACE2 usage within Lineage 1, leading to the emergence of SARS-CoV-1. Events are dated with MRCA age estimates; however, the intention is less to provide exact dates and more to suggest a particular order of events, which is strongly supported by the posterior probabilities of the time-calibrated phylogenies. The arrow for the Lineage 4 event is again dashed to demonstrate uncertainty in its positioning. We illustrate two hypotheses for the acquisition and subsequent spread of ACE2 usage in Lineage 1: recombination and persistence. The recombination hypothesis is much more parsimonious, as persistence would require multiple independent deletion events to generate the observed pattern of ACE2 usage.
Figure 7. Time-calibrated phylogenies for recombination-free regions of the genome. Breakpoint-free regions A and E from Fig. 6 were chosen for time calibration since evidence of recombination was found in both RdRp and RBD. Both regions A and E were free of recombination for all sequences included in the tree, ensuring the best possible dating estimates. The MRCA of all Lineage 1 recombinants and its corresponding divergence date are labeled on each tree, demonstrating that the MRCA in region E (within the RBD) is much older than the MRCA in region A (proxy for RdRp, see Fig. 6). This suggests that there would not have been enough time for the RBDs of the recombinants to diversify to the extent shown here if only a single recombination event occurred between Lineage 5 and Lineage 1. The MRCAs of each type are labeled in red (Type 1) and purple (Type 2). Posterior distributions of rate estimates are also shown for each model as well as for a relaxed clock model of region E. For the observed sequence divergence in region E to have accumulated since the MRCA of the 13 recombinants in region A (1852), a clock rate of 5.899e-3 would be required, which is well outside the posterior distributions estimated by both our strict and relaxed clock models.
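The reasoning behind the required clock rate can be sketched under a strict-clock assumption, where divergence equals rate multiplied by elapsed time. If a region's MRCA is dated at some age under an estimated rate, forcing the same divergence into a shorter interval requires a proportionally faster rate. The function name and every input value below are hypothetical placeholders; they are not the estimates from this study, and the authors' exact calculation is not given in the caption.

```python
def required_rate(estimated_rate, estimated_age_years, allowed_age_years):
    """Clock rate needed to fit the same divergence into a shorter time span.

    Strict-clock assumption: divergence (substitutions/site) = rate * time.
    """
    if allowed_age_years <= 0:
        raise ValueError("allowed_age_years must be positive")
    divergence = estimated_rate * estimated_age_years
    return divergence / allowed_age_years


# Hypothetical placeholder inputs, not values from the paper:
# a region dated to 500 years at 1e-3 substitutions/site/year,
# compressed into a 100-year window.
rate = required_rate(estimated_rate=1e-3, estimated_age_years=500, allowed_age_years=100)
```

Applying this to the caption's case would also require choosing a reference year for the tips in order to turn the 1852 date into an elapsed time, which is an additional assumption not specified here.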