Arné de Klerk, Phillip Swanepoel, Rentia Lourens, Mpumelelo Zondo, Isaac Abodunran, Spyros Lytras, Oscar A MacLean, David Robertson, Sergei L Kosakovsky Pond, Jordan D Zehr, Venkatesh Kumar, Michael J Stanhope, Gordon Harkins, Ben Murrell, Darren P Martin.
Abstract
Recombination contributes to the genetic diversity found in coronaviruses and is known to be a prominent mechanism whereby they evolve. It is apparent, both from controlled experiments and from genome sequences sampled from nature, that patterns of recombination in coronaviruses are non-random and that this is likely attributable to a combination of sequence features that favour the occurrence of recombination break points at specific genomic sites, and selection that disfavours the survival of recombinants within which favourable intra-genome interactions have been disrupted. Here we leverage available whole-genome sequence data for six coronavirus subgenera to identify specific patterns of recombination that are conserved between multiple subgenera and then identify the likely factors that underlie these conserved patterns. Specifically, we confirm the non-randomness of recombination break points across all six tested coronavirus subgenera, locate conserved recombination hot- and cold-spots, and determine that the locations of transcriptional regulatory sequences are likely major determinants of conserved recombination break-point hotspot locations. We find that while the locations of recombination break points are not uniformly associated with degrees of nucleotide sequence conservation, they display significant tendencies in multiple coronavirus subgenera to occur in low guanine-cytosine content genome regions, in non-coding regions, at the edges of genes, and at sites within the Spike gene that are predicted to be minimally disruptive of Spike protein folding. While it is apparent that sequence features such as transcriptional regulatory sequences are likely major determinants of where the template-switching events that yield recombination break points most commonly occur, it is evident that selection against misfolded recombinant proteins also strongly impacts observable recombination break-point distributions in coronavirus genomes sampled from nature.
Keywords: Coronavirus; Evolution; Phylogenetics; Recombination; Selection
Year: 2022 PMID: 35814334 PMCID: PMC9261289 DOI: 10.1093/ve/veac054
Source DB: PubMed Journal: Virus Evol ISSN: 2057-1577
Figure 1. Variation across coronavirus genomes in the densities of detectable recombination break points. All detected break-point positions are indicated with vertical lines directly above each graph. A gene map is shown as lines beneath the densities. The grey-lined green areas indicate the 99 per cent bounds of the degree of break-point clustering expected under random recombination. Areas where the black lines (number of break points per 200-nucleotide window) rise above the green areas are considered potential recombination hotspots and are marked in bright red. Areas where the black lines drop below the green areas are considered potential recombination cold-spots and are marked in (cold) blue.
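The hotspot/cold-spot logic described for Figure 1 can be illustrated with a simplified sliding-window comparison against randomly placed break points. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes 0-based break-point coordinates, a 1-nt window step, a null model in which the same number of break points falls uniformly at random across the genome, and per-window 0.5th/99.5th percentiles as the "99 per cent bounds". All function names are hypothetical.

```python
import numpy as np


def window_counts(bp_positions, genome_length, window=200):
    """Number of break points in every `window`-nt window (step of 1 nt).

    Positions are 0-based and must lie in [0, genome_length).
    Element k of the result covers positions k .. k + window - 1.
    """
    per_site = np.bincount(np.asarray(bp_positions, dtype=int), minlength=genome_length)
    # A cumulative sum turns each window total into a difference of two entries.
    csum = np.concatenate(([0], np.cumsum(per_site)))
    return csum[window:] - csum[:-window]


def random_envelope(n_bps, genome_length, window=200, n_perm=1000, level=0.99, seed=0):
    """Per-window lower/upper bounds on break-point counts under random placement.

    Simplifying assumption: break points fall uniformly at random across the genome.
    """
    rng = np.random.default_rng(seed)
    sims = np.empty((n_perm, genome_length - window + 1), dtype=np.int32)
    for k in range(n_perm):
        positions = rng.integers(0, genome_length, size=n_bps)
        sims[k] = window_counts(positions, genome_length, window)
    tail = (1.0 - level) / 2.0
    return np.quantile(sims, tail, axis=0), np.quantile(sims, 1.0 - tail, axis=0)


def label_windows(observed, lower, upper):
    """+1 = potential hotspot, -1 = potential cold-spot, 0 = within the bounds."""
    labels = np.zeros(len(observed), dtype=int)
    labels[observed > upper] = 1
    labels[observed < lower] = -1
    return labels
```

For example, `label_windows(window_counts(bps, L), *random_envelope(len(bps), L))` labels every window of a genome of length `L`. The simulation matrix holds `n_perm` rows of roughly `L` counts, so for a ~30-kb coronavirus genome it may be worth lowering `n_perm` or coarsening the window step.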
Figure 2. Recombination region count matrices indicating the genome regions that are most and least commonly transferred during detectable coronavirus recombination events. Unique recombination events for six coronavirus subgenera are mapped onto recombination region count matrices based on the determined break-point positions. Each cell in a matrix represents a pair of genome sites, with the colour (heat) of the cell indicating the number of times recombination events separated the represented pair of sites. Reference sequence gene maps of the most prevalent virus in each subgenus were obtained from the NCBI nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore) and are plotted alongside each matrix. Nucleotide positions are plotted according to the full analysed nucleotide sequence alignment (Supplementary material). Genome maps indicate the coding regions of individual protein products. Non-structural proteins encoded by ORF1ab are indicated in blue (cold) and other genes are indicated in orange (warm).
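The recombination region count matrix in Figure 2 records, for each pair of sites, how many events separated them. A minimal sketch, assuming each unique event is summarised by the half-open fragment `(start, end)` lying between its two break points and that the genome is binned (each bin represented by its central site), is shown below; the function name and the binning are illustrative assumptions, not the published method.

```python
import numpy as np


def region_count_matrix(events, genome_length, bin_size=100):
    """Count how often recombination events separated each pair of genome bins.

    `events` is a list of (start, end) pairs: 0-based, half-open coordinates of
    the fragment transferred between the two break points of one unique event.
    Each bin is represented by its central site (a simplification).
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    centres = np.arange(n_bins) * bin_size + bin_size // 2
    matrix = np.zeros((n_bins, n_bins), dtype=int)
    for start, end in events:
        inside = (centres >= start) & (centres < end)
        # Two sites are separated when exactly one of them lies inside the fragment.
        matrix += inside[:, None] != inside[None, :]
    return matrix
```

The matrix is symmetric with a zero diagonal; plotting it as a heat map with the gene map along both axes gives a figure of the same general kind as Figure 2.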
Comparison of detectable break-point numbers in non-coding regions and coding regions with rows in bold indicating subgenera with significantly more break points in non-coding regions than would be expected under random recombination.
| Subgenus | BPs in non-coding regions | BPs in coding regions | Permutation P-val |
|---|---|---|---|
| | 1 | 66 | 0.660 |
| | 30 | 1683 | 0.650 |
BPs = Break points.
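The "Permutation P-val" column above compares the observed number of break points in non-coding regions with the numbers obtained when break points are placed at random. A minimal sketch, assuming gene coordinates are 0-based half-open intervals and that the null model places the same number of break points uniformly at random across the genome (the published test may constrain the permutations differently), is:

```python
import numpy as np


def noncoding_mask(genome_length, genes):
    """Boolean mask that is True at sites outside every (start, end) coding interval."""
    coding = np.zeros(genome_length, dtype=bool)
    for start, end in genes:  # 0-based, half-open
        coding[start:end] = True
    return ~coding


def enrichment_p_value(bp_positions, site_mask, n_perm=1000, seed=0):
    """One-sided permutation test for more break points in `site_mask` than expected.

    Null model (a simplifying assumption): the same number of break points placed
    uniformly at random. Returns (observed count, proportion of permutations with a
    count at least as large as the observed one).
    """
    rng = np.random.default_rng(seed)
    bps = np.asarray(bp_positions, dtype=int)
    observed = int(site_mask[bps].sum())
    hits = 0
    for _ in range(n_perm):
        simulated = rng.integers(0, len(site_mask), size=len(bps))
        if site_mask[simulated].sum() >= observed:
            hits += 1
    return observed, hits / n_perm
```

With few non-coding sites and few break points (as in the first row of the table), such a test has little power, which is consistent with large P-values like 0.660.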
Break-point densities in the terminal 10 per cent of genes (5 per cent at each end) vs the middle 90 per cent of genes, with rows in bold indicating subgenera with significantly more detectable break points in the terminal 10 per cent of genes than would be expected under random recombination.
| Subgenus | BPs in the terminal 10% of genes | BPs in the middle 90% of genes | Permutation P-val |
|---|---|---|---|
| | 5 | 127 | 0.810 |
| | 12 | 112 | 0.054 |
BPs = Break points.
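The gene-edge comparison above requires classifying each genic site as terminal (within 5 per cent of either gene end) or middle. A minimal sketch follows. It assumes 0-based half-open gene intervals, rounds each 5 per cent edge to the nearest nucleotide (at least 1 nt), counts a site as terminal if it is terminal in any overlapping gene, and uses a null model that draws random sites from within genes only; these choices and the function names are illustrative assumptions.

```python
import numpy as np


def gene_end_masks(genome_length, genes, end_fraction=0.05):
    """Return (terminal, middle) boolean masks over the genome.

    `terminal` marks the first and last `end_fraction` of each gene; `middle` marks
    the remaining genic sites. Sites outside all genes are in neither mask.
    """
    terminal = np.zeros(genome_length, dtype=bool)
    middle = np.zeros(genome_length, dtype=bool)
    for start, end in genes:
        edge = max(1, int(round((end - start) * end_fraction)))
        middle[start:end] = True
        terminal[start:start + edge] = True
        terminal[end - edge:end] = True
    middle &= ~terminal  # a site terminal in any gene counts as terminal
    return terminal, middle


def gene_end_test(bp_positions, terminal, middle, n_perm=1000, seed=0):
    """Permutation test for an excess of genic break points in gene ends.

    Returns (terminal BPs, middle BPs, one-sided permutation P-value).
    """
    rng = np.random.default_rng(seed)
    bps = np.asarray(bp_positions, dtype=int)
    genic = bps[terminal[bps] | middle[bps]]
    observed = int(terminal[genic].sum())
    genic_sites = np.flatnonzero(terminal | middle)
    hits = 0
    for _ in range(n_perm):
        simulated = rng.choice(genic_sites, size=len(genic))
        if terminal[simulated].sum() >= observed:
            hits += 1
    return observed, len(genic) - observed, hits / n_perm
```

Restricting the null model to genic sites keeps the comparison focused on where break points fall within genes, which matches how the table contrasts the two gene partitions only.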
Individual genes and sub-gene regions with significantly lower numbers of detectable break points than would be expected under random recombination.
| Subgenus | Genome region | BPs in region | BPs outside region | Permutation P-val |
|---|---|---|---|---|
| | ORF1a | 114 | 278 | 0.001 |
| | ORF1a | 43 | 130 | 0.001 |
| | ORF1a | 43 | 130 | 0.001 |
| | ORF1a | 14 | 65 | <0.001 |
| | ORF1a | 94 | 213 | <0.001 |
| | ORF1a | 667 | 1016 | 0.024 |
| | plpro (nsp3) | 7 | 72 | 0.039 |
| | plpro (nsp3) | 49 | 258 | 0.031 |
| | plpro (nsp3) | 282 | 1401 | 0.016 |
| | nsp4 | 0 | 66 | 0.035 |
| | nsp4 | 78 | 1605 | 0.002 |
BPs = Break points.
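Cold-spot regions such as those above are identified with the opposite (lower) tail of a permutation test. The sketch below assumes a single region given as a 0-based half-open interval, a uniform random null model, and a raw permutation proportion reported to the resolution allowed by the number of permutations; with 1,000 permutations this would yield values such as "0.001" or "<0.001". The region coordinates you pass in, the permutation count and the function names are all assumptions for illustration.

```python
import numpy as np


def depletion_p_value(bp_positions, region, genome_length, n_perm=1000, seed=0):
    """Lower-tail permutation test: fewer break points in `region` than expected?

    Returns (BPs in region, BPs outside region, permutation P-value).
    """
    rng = np.random.default_rng(seed)
    start, end = region
    bps = np.asarray(bp_positions, dtype=int)
    observed = int(((bps >= start) & (bps < end)).sum())
    hits = 0
    for _ in range(n_perm):
        simulated = rng.integers(0, genome_length, size=len(bps))
        # Count permutations at least as depleted as the observed data.
        if ((simulated >= start) & (simulated < end)).sum() <= observed:
            hits += 1
    return observed, len(bps) - observed, hits / n_perm


def format_p(p_value, n_perm=1000):
    """Report a permutation P-value, using '<1/n_perm' when no permutation was as extreme."""
    decimals = len(str(n_perm)) - 1  # 1000 permutations -> 3 decimal places
    floor = 1.0 / n_perm
    if p_value < floor:
        return f"<{floor:.{decimals}f}"
    return f"{p_value:.{decimals}f}"
```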
Associations between decreased GC content and detected recombination break-point sites with rows in bold indicating subgenera displaying average GC contents in the vicinity of break-point sites that are significantly lower than what would be expected under random recombination.
| Subgenus | P-val (within 20 nt) | Significant (within 20 nt) | P-val (within 10 nt) | Significant (within 10 nt) |
|---|---|---|---|---|
| | 0.322 | No | 0.080 | Marginal |
| | 0.590 | No | 0.051 | Marginal |
| | 0.791 | No | 0.693 | No |
| | 0.948 | No | 0.911 | No |
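The GC-content comparison above asks whether the sequence within 20 or 10 nt of break points is less GC-rich than the sequence around randomly chosen sites. A minimal sketch, assuming a single ungapped reference sequence (the study analysed alignments, so averaging across aligned sequences may be needed in practice), a symmetric flank of `flank` nt on each side of a site, and a uniform random null model, is shown below; the function names are hypothetical.

```python
import numpy as np


def local_gc(seq, positions, flank):
    """GC fraction of seq[pos - flank : pos + flank + 1] around each 0-based position."""
    s = np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)
    is_gc = (s == ord("G")) | (s == ord("C"))
    values = []
    for pos in positions:
        lo = max(0, pos - flank)  # clip windows at the genome ends
        hi = min(len(seq), pos + flank + 1)
        values.append(is_gc[lo:hi].mean())
    return np.array(values)


def gc_depletion_test(seq, bp_positions, flank=20, n_perm=1000, seed=0):
    """Lower-tail test: is mean local GC at break points lower than at random sites?

    Returns (observed mean GC, permutation P-value).
    """
    rng = np.random.default_rng(seed)
    observed = local_gc(seq, bp_positions, flank).mean()
    hits = 0
    for _ in range(n_perm):
        simulated = rng.integers(0, len(seq), size=len(bp_positions))
        if local_gc(seq, simulated, flank).mean() <= observed:
            hits += 1
    return observed, hits / n_perm
```

Running it once with `flank=20` and once with `flank=10` gives the two P-value columns of the table.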
Figure 3. Regional variation in average pairwise sequence similarity (green; top x-axis) and GC content (blue; bottom x-axis) across coronavirus genomes. The plotted values indicate the pairwise sequence similarity and GC proportion within a moving 40-nucleotide window. Also indicated are the locations of the main genes (above each graph), transcriptional regulatory sequences (TRSs; purple stripes at the top, beneath the gene boxes), identified break-point locations (mustard, beneath the TRS locations), potential recombination hotspots (bright red vertical stripes through the graphs) and potential recombination cold-spots (blue vertical stripes through the graphs).
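The moving-window profiles in Figure 3 can be approximated from an alignment as follows. This sketch assumes the alignment is a list of equal-length strings with `-` as the gap character, ignores gapped positions when computing identity and GC content, and smooths with a 40-column window stepped by one column; it is an illustration, not the authors' plotting code.

```python
import itertools

import numpy as np


def moving_window_profiles(alignment, window=40):
    """Mean pairwise identity and GC fraction in each `window`-column window.

    `alignment` is a list of equal-length strings; '-' marks gaps.
    Element k of each result covers columns k .. k + window - 1.
    """
    arr = np.array([list(s.upper()) for s in alignment])
    valid = arr != "-"
    matches = np.zeros(arr.shape[1])
    compared = np.zeros(arr.shape[1])
    # Accumulate, per column, identical and comparable (both ungapped) pairs.
    for i, j in itertools.combinations(range(arr.shape[0]), 2):
        both = valid[i] & valid[j]
        compared += both
        matches += both & (arr[i] == arr[j])
    gc = ((arr == "G") | (arr == "C")).sum(axis=0)
    bases = valid.sum(axis=0)
    kernel = np.ones(window)

    def smooth(x):
        return np.convolve(x, kernel, mode="valid")

    # Summing numerators and denominators separately avoids averaging ratios.
    with np.errstate(invalid="ignore", divide="ignore"):
        similarity = smooth(matches) / smooth(compared)
        gc_content = smooth(gc) / smooth(bases)
    return similarity, gc_content
```

Windows consisting entirely of gaps return `nan`, which plotting libraries generally leave as a break in the line.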
Association of break-point locations with higher/lower degrees of average pairwise sequence similarity with rows in bold indicating significant associations.
| Subgenus | Higher/lower similarity (within 20 nt) | P-val (within 20 nt) | Higher/lower similarity (within 10 nt) | P-val (within 10 nt) |
|---|---|---|---|---|
| | Higher | 0.184 | Higher | 0.192 |
| | Higher | 0.465 | Higher | 0.475 |
Associations between transcription regulatory sequence (TRS) sites and the locations of detected recombination break points with P-values in bold indicating significant associations of TRS sites with higher break-point numbers.
| Subgenus | P-val (within 46 nt) | P-val (within 21 nt) | P-val (within 9 nt) | P-val (within 2 nt) |
|---|---|---|---|---|
| | 0.117 | 0.178 | 0.210 | 0.806 |
| | 0.478 |
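The TRS association test above counts break points lying within a given distance (46, 21, 9 or 2 nt) of a TRS and compares that count with randomly placed break points. A minimal sketch, assuming each TRS is represented by a single 0-based coordinate, distance is measured to the nearest TRS, and the null model places break points uniformly at random, is given below; the function names are hypothetical.

```python
import numpy as np


def distance_to_nearest(positions, trs_positions):
    """Distance from each position to the nearest TRS coordinate (needs >= 1 TRS)."""
    trs = np.sort(np.asarray(trs_positions))
    pos = np.asarray(positions)
    idx = np.searchsorted(trs, pos)
    # Nearest TRS is either just left or just right of each position.
    left = trs[np.clip(idx - 1, 0, len(trs) - 1)]
    right = trs[np.clip(idx, 0, len(trs) - 1)]
    return np.minimum(np.abs(pos - left), np.abs(pos - right))


def trs_proximity_test(bp_positions, trs_positions, genome_length, max_dist,
                       n_perm=1000, seed=0):
    """Upper-tail test for excess break points within `max_dist` nt of a TRS.

    Returns (observed count near TRSs, permutation P-value).
    """
    rng = np.random.default_rng(seed)
    near = distance_to_nearest(bp_positions, trs_positions) <= max_dist
    observed = int(near.sum())
    hits = 0
    for _ in range(n_perm):
        simulated = rng.integers(0, genome_length, size=len(bp_positions))
        if (distance_to_nearest(simulated, trs_positions) <= max_dist).sum() >= observed:
            hits += 1
    return observed, hits / n_perm
```

Looping `max_dist` over `(46, 21, 9, 2)` reproduces the four columns of the table in structure, though not necessarily in value.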
Conserved patterns of recombination across various coronavirus subgenera. Rows each contain the result of either a statistical test or the presence/absence of a particular characteristic of recombination (such as the presence of a hotspot at a specific genome location): BP = break point; blue = significant association or presence of characteristic; light blue = marginally significant association; pink = no significant association or absence of characteristic; white = untested.
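The colour coding of the summary table can be written as a small mapping from each test result to a cell colour. This sketch assumes P < 0.05 counts as significant and 0.05 ≤ P < 0.10 as marginal, which is consistent with the GC-content table above (P = 0.080 and 0.051 are labelled "Marginal", P = 0.322 is not), but the exact cut-offs are an assumption; the function name is hypothetical.

```python
ALPHA = 0.05      # assumed significance cut-off
MARGINAL = 0.10   # assumed upper cut-off for "marginal" results


def pattern_colour(p_value=None, present=None):
    """Colour for one cell of the conserved-patterns summary.

    Pass `p_value` for a statistical test, or `present` (True/False) for a
    characteristic such as a hotspot at a given location; pass neither if untested.
    """
    if present is not None:
        return "blue" if present else "pink"
    if p_value is None:
        return "white"  # untested
    if p_value < ALPHA:
        return "blue"
    if p_value < MARGINAL:
        return "light blue"
    return "pink"
```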