Simon Pollett, Matthew A Conte, Mark Sanborn, Richard G Jarman, Grace M Lidl, Kayvon Modjarrad, Irina Maljkovic Berry.
Abstract
The SARS-CoV-2 pandemic prompts evaluation of recombination in human coronavirus (hCoV) evolution. We undertook recombination analyses of 158,118 public seasonal hCoV, SARS-CoV-1, SARS-CoV-2 and MERS-CoV genome sequences using the RDP4 software. We found moderate evidence for 8 SARS-CoV-2 recombination events, two of which involved the spike gene, and low evidence for one SARS-CoV-1 recombination event. Within the MERS-CoV, 229E, OC43, NL63 and HKU1 datasets, we noted 7, 1, 9, 14, and 1 high-confidence recombination events, respectively. There was a propensity for recombination breakpoints in the non-ORF1 region of the genome containing structural genes, and recombination severely skewed the temporal structure of these data, especially for NL63 and OC43. Bayesian time-scaled analyses on recombinant-free data indicated the sampled diversity of seasonal CoVs emerged in the last 70 years, with 229E displaying continuous lineage replacements. These findings emphasize the importance of genomic-based surveillance to detect recombination in SARS-CoV-2, particularly if recombination may lead to immune evasion.
Year: 2021 PMID: 34462471 PMCID: PMC8405798 DOI: 10.1038/s41598-021-96626-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Frequency of recombination events detected in 229E, NL63, OC43, HKU1, MERS-CoV, SARS-CoV-1 and SARS-CoV-2, stratified by level of evidence.
| Coronavirus species | Sequences analyzed (n) | Recombination events detected by any method | Recombination events detected by ≥ 3 methods | Recombination events detected by ≥ 3 methods and without another evolutionary process possibly explaining the recombination signal (high evidence) | Recombination events with high-level evidence seen in multiple genomes |
|---|---|---|---|---|---|
| hCoV-229E | 22 | 4 | 3 | 1 | 1 |
| hCoV-NL63 | 65 | 31 | 24 | 14 | 7 |
| hCoV-OC43 | 138 | 23 | 16 | 9 | 6 |
| hCoV-HKU1 | 37 | 14 | 9 | 1 | 1 |
| MERS-CoV | 365 | 12 | 10 | 7 | 6 |
| SARS-CoV-1 | 49 | 1 | 0 | 0 | 0 |
| SARS-CoV-2 | 100,296ᵃ | 33 | 8 | 0 | 0 |

ᵃ Randomly subsampled into 100 × n = 300 independent datasets.
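The evidence tiers used in the table can be read as a simple decision rule. The sketch below is an illustration only: the function and parameter names are hypothetical, and the "low" and "moderate" labels are inferred from how the abstract describes the SARS-CoV-1 and SARS-CoV-2 counts, not taken from a stated definition.

```python
def evidence_level(n_methods_detecting: int, alternative_process_possible: bool) -> str:
    """Classify a candidate recombination event into an evidence tier.

    n_methods_detecting: number of detection methods that flagged the event.
    alternative_process_possible: True if another evolutionary process could
        plausibly explain the recombination signal.
    Tier names are an assumption inferred from the abstract's wording.
    """
    if n_methods_detecting < 1:
        return "none"
    if n_methods_detecting < 3:
        return "low"       # detected, but by fewer than 3 methods
    if alternative_process_possible:
        return "moderate"  # >= 3 methods, but signal may have another cause
    return "high"          # >= 3 methods and no competing explanation


# Hypothetical candidate events: (methods detecting, alternative explanation?)
candidates = [(1, False), (4, True), (5, False)]
levels = [evidence_level(n, alt) for n, alt in candidates]
```

Under this reading, each column of the table counts events that meet a progressively stricter criterion, so the counts can only stay equal or fall from left to right.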
Figure 1. Estimated recombination breakpoint positions in NL63, OC43 and MERS-CoV whole genomes. p values for the frequency of recombination breakpoints in the non-ORF1 region (containing the structural genes) versus the ORF1 region are derived by the χ² test. Approximate breakpoints are breakpoints that could not be placed with certainty due to overlapping recombination or other reasons.
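One common way to set up such a χ² comparison is a goodness-of-fit test in which, under the null hypothesis, breakpoints fall uniformly along the genome, so the expected count in each region is proportional to its length. The sketch below assumes that setup; the source does not state the exact formulation used, and the counts and region lengths are illustrative round numbers, not data from the study.

```python
import math


def breakpoint_chi2(n_orf1, n_non_orf1, len_orf1, len_non_orf1):
    """Chi-square goodness-of-fit test (1 degree of freedom).

    Null hypothesis (an assumption of this sketch): breakpoints are spread
    uniformly along the genome, so expected counts scale with region length.
    Returns (chi2 statistic, p value).
    """
    total = n_orf1 + n_non_orf1
    genome = len_orf1 + len_non_orf1
    exp_orf1 = total * len_orf1 / genome
    exp_non_orf1 = total * len_non_orf1 / genome
    chi2 = ((n_orf1 - exp_orf1) ** 2 / exp_orf1
            + (n_non_orf1 - exp_non_orf1) ** 2 / exp_non_orf1)
    # For 1 degree of freedom, the upper-tail probability of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical example: ORF1 spans two-thirds of a 30,000-nt genome,
# yet only 10 of 30 breakpoints fall inside it.
chi2, p_value = breakpoint_chi2(n_orf1=10, n_non_orf1=20,
                                len_orf1=20_000, len_non_orf1=10_000)
```

A small p value in this setup indicates that breakpoints are distributed unevenly between the two regions relative to their lengths.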
Figure 2. Maximum likelihood phylogeny of recombinants in NL63. Scale represents nucleotide substitutions per site. Recombination events represented by multiple genomes are marked in blue; singletons are marked in yellow. Phylogeny was rooted with a 229E outgroup (removed for clarity).
Figure 3. Maximum likelihood phylogenies of recombinants in OC43. Scale represents nucleotide substitutions per site. Recombination events represented by multiple genomes are marked in blue; singletons are marked in yellow. Phylogeny was rooted with an HKU1 outgroup (removed for clarity).
Figure 4. Maximum likelihood phylogeny of recombinants in MERS-CoV. Scale represents nucleotide substitutions per site. (a) Taxa colored by host (camel = black, human = green). (b) Colored taxa indicate confirmed recombinant clades.
Root-to-tip regression coefficient and intercept of seasonal hCoV phylogenies with and without recombinants removed.
| Dataset | Lineage | Sequences (n) | Date range (years) | Slope coefficientᵃ | Intercept (TMRCA)ᵇ |
|---|---|---|---|---|---|
| All genomes | hCoV-229E | 22 | 26.36 | 2.69 × 10⁻⁴ | 1990 A.D. |
| All genomes | hCoV-NL63 | 65 | 35.38 | −1.00 × 10⁻⁴ | 2149 A.D. |
| All genomes | hCoV-OC43 | 138 | 33.98 | −0.00 | 27,359 A.D. |
| All genomes | hCoV-HKU1 | 37 | 14.17 | 4.52 × 10⁻⁴ | 1941 A.D. |
| Recombinants removed | hCoV-229E | 19 | 26.36 | 2.65 × 10⁻⁴ | 1990 A.D. |
| Recombinants removed | hCoV-NL63* | 56 | 35.04 | 7.68 × 10⁻⁵ | 1944 A.D. |
| Recombinants removed | hCoV-OC43 | 110 | 34.00 | 2.83 × 10⁻⁴ | 1967 A.D. |
| Recombinants removed | hCoV-HKU1 | 24 | 13.41 | 1.00 × 10⁻³ | 1978 A.D. |

ᵃ Approximates evolutionary rate (substitutions/site/year).
ᵇ Approximates TMRCA.
\* Non-recombinant region was used, with genomes containing breakpoints in this region removed (N = 9).
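The slope and intercept reported in this table come from regressing root-to-tip genetic distance on sampling date: the slope approximates the evolutionary rate, and the date at which the fitted line crosses zero distance approximates the TMRCA. The following is a minimal ordinary least-squares sketch of that idea, using made-up dates and distances rather than the study's data; the function name is hypothetical.

```python
def root_to_tip(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (slope, tmrca), where slope approximates the evolutionary rate
    (substitutions/site/year) and tmrca is the x-intercept of the fitted line.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    if slope == 0:
        raise ValueError("zero slope: the x-intercept is undefined")
    intercept = mean_y - slope * mean_x
    tmrca = -intercept / slope  # date at which fitted distance equals zero
    return slope, tmrca


# Clock-like toy data: rate 3e-4 subs/site/year, root placed at 1990.
dates = [2000, 2005, 2010, 2015]
distances = [3e-4 * (d - 1990) for d in dates]
rate, tmrca = root_to_tip(dates, distances)

# Toy data with no temporal signal: distance falls over time, so the slope
# is negative and the x-intercept lands after the sampling dates.
neg_rate, neg_tmrca = root_to_tip([2000, 2010], [0.010, 0.009])
```

The second call shows why a negative slope, as in the NL63 and OC43 rows with all genomes included, produces an implausible TMRCA in the future: the fitted line only reaches zero distance after the most recent sample.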
Bayesian TMRCA estimates for 229E, HKU1, NL63 and OC43.
| Lineage | TMRCA (A.D.) | Lower 95% HPD | Upper 95% HPD | Nucleotide substitution model | Clock model | Demographic model |
|---|---|---|---|---|---|---|
| 229Eᵃ | 1989 | 1988 | 1990 | SYM + I + G | Strict | Constant |
| HKU1ᵃ | 1951 | 1842 | 1998 | GTR + G + I | UCLN | Bayesian Skyline |
| NL63ᵇ | 1964 | 1945 | 1978 | HKY + I | UCLN | Bayesian Skyline |
| OC43ᵃ | 1970 | 1960 | 1978 | GTR + I | UCLN | Bayesian Skyline |

ᵃ Recombinant genomes removed.
ᵇ Non-recombinant region 13093–20198.
UCLN, uncorrelated lognormal; TMRCA, time to most recent common ancestor.