Darren P Martin, Ben Murrell, Michael Golden, Arjun Khoosal, Brejnev Muhire.
Abstract
RDP4 is the latest version of the Recombination Detection Program (RDP), a Windows computer program that implements an extensive array of methods for detecting and visualising recombination in virus genome sequence alignments and for stripping evidence of recombination from them. RDP4 can analyse twice as many sequences (up to 2,500), each up to three times longer (up to 10 Mb), than older versions of the program could. RDP4 is therefore also applicable to the analysis of bacterial full-genome sequence datasets. Other novelties in RDP4 include (1) the capacity to differentiate between recombination and genome segment reassortment, (2) the estimation of recombination breakpoint confidence intervals, (3) a variety of 'recombination aware' phylogenetic tree construction and comparison tools, (4) new matrix-based visualisation tools for examining both individual recombination events and the overall phylogenetic impacts of multiple recombination events, and (5) new tests to detect the influences of gene arrangements, encoded protein structure, nucleic acid secondary structure, nucleotide composition, and nucleotide diversity on recombination breakpoint patterns. The key feature that differentiates RDP4 from other recombination detection tools is its flexibility: it can be run either in fully automated mode from the command line interface or with a graphically rich user interface that enables detailed exploration of both individual recombination events and overall recombination patterns.
Keywords: horizontal gene transfer; lateral gene transfer; reassortment; sequence analysis software
Year: 2015 PMID: 27774277 PMCID: PMC5014473 DOI: 10.1093/ve/vev003
Source DB: PubMed Journal: Virus Evol ISSN: 2057-1577
Figure 1. The main elements of the RDP4 program interface. The interface is split into four main resizable components: (1) a ‘zoomable’ sequence display that serves both as an alignment viewer and as a viewer of colour-coded recombinant and parental sequences; (2) interchangeable tree/matrix/information displays that provide information on individual user-selected recombination events, such as inferred breakpoint locations (and statistically plausible alternative locations), parental sequences (and phylogenetically plausible alternative parents), analysis warnings (for example, when there is a high probability that recombinants and/or recombination breakpoints have been misidentified), and the relative degrees of support by different analysis methods for detected recombination signals; (3) a schematic sequence display depicting colour-coded representations of the analysed sequences and the locations of detected recombination events; and (4) a plot display graphically illustrating the statistical evidence underlying the detection of individual user-selected recombination events.
Figure 2. Examples of tools available in RDP4 for visualising overall patterns of recombination. The dataset examined here is the foot-and-mouth disease virus (FMDV) full-genome dataset analysed in Heath et al. (2006; see the file Example3(FMDV).rdp, which is distributed with RDP4). (a) Population-scaled recombination rate plots indicating variations in basal recombination rates across FMDV genomes and the presence of a likely recombination cold-spot between nucleotide positions ∼2000 and ∼4000. (b) Recombination breakpoint density plots indicating the presence of two recombination breakpoint hotspots at nucleotide positions ∼1900 and ∼4100. (c) Recombination breakpoint pair matrix. The yellow-red spot indicates that whenever a breakpoint occurs at nucleotide position ∼1900, there is a very strong tendency for a second breakpoint to occur at position ∼4100 (i.e., positions 1900 and 4100 are not only breakpoint hotspots: they are a breakpoint hotspot pair). (d) Recombination region count matrix. The top half of the matrix indicates that nucleotide sites bounded by the breakpoint hotspots indicated in (b) tend to be co-inherited from the same parental virus (indicated by the dark blue triangle representing all pairs of sites between nucleotide positions ∼1900 and ∼4100). The bottom half of the matrix indicates site pairs that are significantly more (blue) or less (red) frequently co-inherited during recombination than would be expected under random recombination. (e) Phylogenetic compatibility matrices illustrating the overall phylogenetic impacts of recombination in this FMDV dataset. Both the Shimodaira–Hasegawa (upper half) and Robinson–Foulds (lower half) compatibility matrices show that phylogenetic trees constructed for different parts of the region between positions ∼2000 and ∼4000 (indicated by the large blue-green triangles off the diagonals of both matrices) tend to be more similar to one another than trees constructed from similarly sized portions of sequence sampled from elsewhere along the alignment (indicated by red-orange colours). These matrices therefore support the finding in (a) that there is a recombination cold-spot between nucleotide positions ∼2000 and ∼4000.
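The Robinson–Foulds half of a phylogenetic compatibility matrix like the one in panel (e) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not RDP4's implementation. It assumes that a tree has already been built for each alignment window and that each tree is written as a nested tuple of taxon names, which is a hypothetical representation chosen only for this example. The Robinson–Foulds distance between two trees is the number of non-trivial bipartitions (splits) found in one tree but not in the other. The names `leaves`, `splits`, `robinson_foulds` and `compatibility_matrix` are illustrative. The Shimodaira–Hasegawa half requires likelihood calculations and is not covered here.

```python
from itertools import combinations

# Hypothetical sketch, not RDP4's implementation.
# A tree is a nested tuple of taxon names, e.g. (("A", "B"), ("C", ("D", "E"))).


def leaves(tree):
    """Return the set of taxon names under a (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaves(child)
    return result


def splits(tree):
    """Return the non-trivial bipartitions induced by the tree's internal edges.

    Each split is stored as the side that does NOT contain the alphabetically
    first taxon, so the result does not depend on where the tree is rooted.
    """
    all_leaves = leaves(tree)
    anchor = min(all_leaves)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset()
        for child in node:
            clade |= walk(child)
        side = all_leaves - clade if anchor in clade else clade
        # Non-trivial split: both sides contain at least two taxa.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    walk(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Count splits present in exactly one of the two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must share the same taxon set")
    return len(splits(tree_a) ^ splits(tree_b))


def compatibility_matrix(window_trees):
    """Symmetric matrix of pairwise Robinson-Foulds distances between window trees."""
    n = len(window_trees)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = robinson_foulds(window_trees[i], window_trees[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    t1 = (("A", "B"), ("C", ("D", "E")))  # tree for window 1
    t2 = (("A", "C"), ("B", ("D", "E")))  # tree for window 2
    print(compatibility_matrix([t1, t2, t1]))  # [[0, 2, 0], [2, 0, 2], [0, 2, 0]]
```

In this toy example the first and third windows share a topology (distance 0). The second window groups A with C rather than B, so it differs from each of the others by one split on each side (distance 2). In a plot of such a matrix, a block of adjacent windows with mutually low distances would stand out in the same way as the triangles described for the ∼2000 to ∼4000 region.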