
RDP3: a flexible and fast computer program for analyzing recombination.

Darren P Martin, Philippe Lemey, Martin Lott, Vincent Moulton, David Posada, Pierre Lefeuvre.

Abstract

RDP3 is a new version of the RDP program for characterizing recombination events in DNA-sequence alignments. Among other novelties, this version includes four new recombination analysis methods (3SEQ, VISRD, PHYLPRO and LDHAT), new tests for recombination hot-spots, a range of matrix methods for visualizing overall patterns of recombination within datasets and recombination-aware ancestral sequence reconstruction. Complementary to a high degree of analysis flow automation, RDP3 also has a highly interactive and detailed graphical user interface that enables more focused hands-on cross-checking of results with a wide variety of newly implemented phylogenetic tree construction and matrix-based recombination signal visualization methods. The new RDP3 can accommodate large datasets and is capable of analyzing alignments ranging in size from 1000 × 10 kilobase sequences to 20 × 2 megabase sequences within 48 h on a desktop PC. AVAILABILITY: RDP3 is available for free from its web site http://darwin.uvigo.es/rdp/rdp.html.

Year:  2010        PMID: 20798170      PMCID: PMC2944210          DOI: 10.1093/bioinformatics/btq467

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


RDP3 is a computer program for statistical identification and characterization of historical recombination events. Given a set of aligned nucleotide sequences, RDP3 will rapidly analyze these with a range of powerful non-parametric recombination detection methods (including BOOTSCAN, MAXCHI, CHIMAERA, 3SEQ, GENECONV, SISCAN, PHYLPRO and VISRD; Boni et al., 2007; Gibbs et al., 2000; Lemey et al., 2009; Padidam et al., 1999; Posada and Crandall, 2001; Weiller, 1998). It will provide a detailed breakdown of recombination breakpoint locations and the identities of recombinant and parental sequences. For further downstream analyses, the program enables users to save edited sequence alignments with (i) recombinant sequences removed; (ii) recombinationally derived tracts of sequence removed; or (iii) recombinant sequences split into their constituent parts. An important strength of RDP3 that makes it applicable to a variety of recombination analysis problems is that, unlike many other recombination detection programs such as SIMPLOT (Lole et al., 1999), DUAL BROTHERS (Minin et al., 2005), JPHMM (Schultz et al., 2006) or SCUEAL (Kosakovsky Pond et al., 2009), it does not screen predefined sets of potentially recombinant (or query) sequences against other predefined sets of non-recombinant (or reference) sequences. RDP3 instead treats every sequence within an input alignment as a potential recombinant and systematically screens large numbers of sequence triplets and/or quartets to identify sets of three or four sequences that contain a recombinant and two sequences resembling its parents. Such an approach means that RDP3 can simultaneously detect the entire scope of recombination evident within a dataset (i.e. not just that occurring between the reference strains or species), enabling its use in the characterization of complex recombinants such as those derived through recombination between parental sequences that were themselves recombinant.
The drawback of such a flexible, exploratory framework is that it can often be difficult to assess the uncertainty associated with inferred recombination patterns. However, with its wide range of cross-checking tools, RDP3 is complementary to probabilistic recombination analysis approaches.
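The idea of treating every sequence as a potential recombinant and screening all triplets can be illustrated with a minimal Python sketch. It is not one of the detection methods named above and performs no statistical test: the function names, the single fixed candidate breakpoint and the toy alignment are all assumptions made for illustration. It simply asks, for each sequence in each triplet, whether that sequence is strictly closer to one partner on the left of the breakpoint and strictly closer to the other partner on the right.

```python
from itertools import combinations
from math import comb


def hamming(a, b, start, end):
    """Count mismatches between a and b over alignment columns [start, end)."""
    return sum(1 for x, y in zip(a[start:end], b[start:end]) if x != y)


def screen_triplets(alignment, breakpoint):
    """Yield (candidate, left_parent, right_parent) for each triplet member
    that resembles one partner left of `breakpoint` and the other right of it.

    Toy heuristic (an assumption, not an RDP3 method): one fixed breakpoint,
    raw mismatch counts, no significance testing.
    """
    length = len(next(iter(alignment.values())))
    for triplet in combinations(sorted(alignment), 3):
        for cand in triplet:
            p, q = [name for name in triplet if name != cand]
            c, sp, sq = alignment[cand], alignment[p], alignment[q]
            left_p = hamming(c, sp, 0, breakpoint)
            left_q = hamming(c, sq, 0, breakpoint)
            right_p = hamming(c, sp, breakpoint, length)
            right_q = hamming(c, sq, breakpoint, length)
            if left_p < left_q and right_q < right_p:
                yield cand, p, q
            elif left_q < left_p and right_p < right_q:
                yield cand, q, p


# Hypothetical four-sequence alignment: R is A on the left half, B on the right.
alignment = {
    "A": "A" * 20,
    "B": "C" * 20,
    "D": "G" * 20,
    "R": "A" * 10 + "C" * 10,
}
hits = list(screen_triplets(alignment, breakpoint=10))
print(comb(len(alignment), 3), hits)  # 4 triplets; R flagged with parents A, B
```

Because no query or reference set is predefined, the number of triplets grows as n choose 3 (about 166 million for 1000 sequences), which is one reason speed optimization matters for this kind of exhaustive screen.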

1 NEW FEATURES IN RDP3

Although the graphically intensive and highly interactive RDP3 interface remains superficially unchanged from that of its predecessor, RDP2 (Martin et al., 2005a, b), it includes simple point-and-click access to a multitude of powerful new features. Among these are three new non-parametric recombination detection methods (3SEQ, VISRD and PHYLPRO; Boni et al., 2007; Lemey et al., 2009; Weiller, 1998), a parametric recombination rate estimation method (LDHAT; McVean et al., 2004), two new tree construction methods (maximum likelihood with PHYML and Bayesian with MRBAYES; Guindon and Gascuel, 2003; Ronquist and Huelsenbeck, 2003), two recombination hotspot tests (Heath et al., 2006), a test of recombination-induced protein misfolding (Lefeuvre et al., 2007; Voigt et al., 2002), recombination-aware methods for reconstructing ancestral sequences (Arenas and Posada, 2010) and a range of matrix methods for visualizing overall patterns of recombination within datasets (Jakobsen and Easteal, 1996; Lefeuvre et al., 2009; McVean et al., 2004). In addition to the new methods implemented in RDP3, another important improvement over RDP2 is the way in which RDP3 automatically scans alignments for recombination signals and then infers the minimum numbers of recombination events needed to account for these signals. RDP3 implements a range of heuristic recombinant sequence identification methods based on the PHYLPRO (Weiller, 1998), VISRD (Lemey et al., 2009) and subtree-prune and regraft methods (that identify recombinant sequences as those which 'jump' between the branches of phylogenetic trees constructed from different fragments of the same sequence alignment; Beiko and Hamilton, 2006; Heath et al., 2006). RDP3 also automatically checks detected recombination signals to determine whether they might be better accounted for by sequence misalignment than by recombination. Misalignments introduce homoplasy and are a common cause of false positive recombination signals.
Misalignments are automatically detected in RDP3 by separately realigning recombinant sequences with each of their identified parents (RDP3 uses CLUSTALW to do this; Chenna et al., 2003) and comparing these pair-wise alignments to those of the corresponding sequence pairs in the full multiple sequence alignment. By more accurately identifying recombinant sequences and discounting recombination signals attributable to sequence misalignments, RDP3 significantly outperforms RDP2 for overall quantitative assessments of recombination patterns such as those carried out in the new breakpoint hot-spot and protein folding disruption tests. In addition to streamlined tools for managing, testing and editing information on detected recombination events, RDP3 also provides a range of new tools for users to cross-check how accurately the program has identified (i) groups of recombinants supposedly sharing traces of the same recombination events; (ii) recombinant and parental sequences; and (iii) recombination breakpoint positions. These include heat-plots indicating how closely the recombination patterns in two recombinants resemble one another in relation to their supposed parental sequences, color-coded phylogenetic trees for identifying recombinants and parental sequences, and MAXCHI (Maynard Smith, 1992) and LARD (Holmes et al., 1999) breakpoint matrices for manually identifying breakpoint positions. All of the automated recombination detection methods in RDP3 have been rigorously speed-optimized and as a result the program is able to analyze datasets containing up to 40 million nt within 48 h on a standard 2 GHz processor with 2 GB of RAM. Such large datasets might, for example, consist of 20 full bacterial genome sequences, or 1000 full viral genome sequences. With default program settings, datasets containing 100 sequences of 10 kb each can be analyzed within 10 min.
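The misalignment check described above compares a pairwise realignment of a recombinant and one parent with the alignment of the same two sequences inside the full multiple alignment. The following Python sketch shows one way such a comparison could be scored. It assumes the realigned pair is supplied from elsewhere (the text states that the program uses CLUSTALW for this step), and the function names, the agreement measure and the toy sequences are illustrative assumptions rather than the program's actual procedure.

```python
def induced_pairwise(row_a, row_b, gap="-"):
    """Project two rows of a multiple alignment onto a pairwise alignment
    by dropping columns in which both rows hold a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    cols = [(x, y) for x, y in zip(row_a, row_b) if not (x == gap and y == gap)]
    return "".join(x for x, _ in cols), "".join(y for _, y in cols)


def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue index pairs that share a column."""
    pairs, i, j = set(), 0, 0
    for x, y in zip(row_a, row_b):
        if x != gap and y != gap:
            pairs.add((i, j))
        i += x != gap  # advance the residue counter only on non-gap characters
        j += y != gap
    return pairs


def alignment_agreement(msa_a, msa_b, re_a, re_b):
    """Fraction of residue pairs in the realignment that are also paired
    in the alignment induced by the multiple alignment."""
    msa_pairs = aligned_pairs(*induced_pairwise(msa_a, msa_b))
    re_pairs = aligned_pairs(re_a, re_b)
    if not re_pairs:
        return 1.0
    return len(msa_pairs & re_pairs) / len(re_pairs)


# Hypothetical rows: the multiple alignment staggers one G that a direct
# pairwise realignment places in the same column.
agreement = alignment_agreement(
    "ACG-TACGT-", "AC-GTACGT-",   # rows taken from the multiple alignment
    "ACGTACGT", "ACGTACGT",       # externally produced pairwise realignment
)
print(agreement)  # 0.875
```

A low agreement in the region carrying a recombination signal would suggest that the signal deserves a second look as a possible alignment artifact; where to set the cutoff is left open here.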
Funding: Wellcome Trust (to D.P.M.); Postdoctoral fellowship from the Fund for Scientific Research (FWO) Flanders (to Ph.L.); South African Centre of High Performance Computing bursary (to M.L.); European Research Council (ERC-2007-Stg 203161-PHYGENOM to D.P.); Spanish Ministry of Science and Education (BFU2009-08611 to D.P.); GIS CRVOI (grant NPRAO/AIRD/CRVOI/08/03 to Pi.L.); Wellcome Trust (grant number GR079127MA). Conflict of Interest: none declared.
References: 25 in total

1.  Evaluation of methods for detecting recombination from DNA sequences: computer simulations.

Authors:  D Posada; K A Crandall
Journal:  Proc Natl Acad Sci U S A       Date:  2001-11-20       Impact factor: 11.205

2.  Protein building blocks preserved by recombination.

Authors:  Christopher A Voigt; Carlos Martinez; Zhen-Gang Wang; Stephen L Mayo; Frances H Arnold
Journal:  Nat Struct Biol       Date:  2002-07

3.  MrBayes 3: Bayesian phylogenetic inference under mixed models.

Authors:  Fredrik Ronquist; John P Huelsenbeck
Journal:  Bioinformatics       Date:  2003-08-12       Impact factor: 6.937

4.  A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.

Authors:  Stéphane Guindon; Olivier Gascuel
Journal:  Syst Biol       Date:  2003-10       Impact factor: 15.683

Review 5.  Analyzing the mosaic structure of genes.

Authors:  J M Smith
Journal:  J Mol Evol       Date:  1992-02       Impact factor: 2.395

6.  Phylogenetic profiles: a graphical method for detecting genetic recombinations in homologous sequences.

Authors:  G F Weiller
Journal:  Mol Biol Evol       Date:  1998-03       Impact factor: 16.240

7.  A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences.

Authors:  I B Jakobsen; S Easteal
Journal:  Comput Appl Biosci       Date:  1996-08

8.  Phylogenetic evidence for recombination in dengue virus.

Authors:  E C Holmes; M Worobey; A Rambaut
Journal:  Mol Biol Evol       Date:  1999-03       Impact factor: 16.240

9.  The fine-scale structure of recombination rate variation in the human genome.

Authors:  Gilean A T McVean; Simon R Myers; Sarah Hunt; Panos Deloukas; David R Bentley; Peter Donnelly
Journal:  Science       Date:  2004-04-23       Impact factor: 47.728

10.  Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination.

Authors:  K S Lole; R C Bollinger; R S Paranjape; D Gadkari; S S Kulkarni; N G Novak; R Ingersoll; H W Sheppard; S C Ray
Journal:  J Virol       Date:  1999-01       Impact factor: 5.103

Cited by: 655 in total

1.  Complete nucleotide sequence of a South African isolate of Grapevine fanleaf virus.

Authors:  Renate L Lamprecht; Hans J Maree; Dirk Stephan; Johan T Burger
Journal:  Virus Genes       Date:  2012-06-05       Impact factor: 2.332

2.  Recombination analysis of Maize dwarf mosaic virus (MDMV) in the Sugarcane mosaic virus (SCMV) subgroup of potyviruses.

Authors:  Gyöngyvér Gell; Endre Sebestyén; Ervin Balázs
Journal:  Virus Genes       Date:  2014-11-13       Impact factor: 2.332

3.  Isolation-driven divergence: speciation in a widespread North American songbird (Aves: Certhiidae).

Authors:  Joseph D Manthey; John Klicka; Garth M Spellman
Journal:  Mol Ecol       Date:  2011-09-21       Impact factor: 6.185

4.  Role of multiple hosts in the cross-species transmission and emergence of a pandemic parvovirus.

Authors:  Andrew B Allison; Carole E Harbison; Israel Pagan; Karla M Stucker; Jason T Kaelber; Justin D Brown; Mark G Ruder; M Kevin Keel; Edward J Dubovi; Edward C Holmes; Colin R Parrish
Journal:  J Virol       Date:  2011-11-09       Impact factor: 5.103

5.  Nature and intensity of selection pressure on CRISPR-associated genes.

Authors:  Nobuto Takeuchi; Yuri I Wolf; Kira S Makarova; Eugene V Koonin
Journal:  J Bacteriol       Date:  2011-12-16       Impact factor: 3.490

6.  Human metapneumovirus G protein is highly conserved within but not between genetic lineages.

Authors:  Chin-Fen Yang; Chiaoyin K Wang; Sharon J Tollefson; Linda D Lintao; Alexis Liem; Marla Chu; John V Williams
Journal:  Arch Virol       Date:  2013-02-06       Impact factor: 2.574

7.  Diversity of beet curly top Iran virus isolated from different hosts in Iran.

Authors:  Sara Gharouni Kardani; Jahangir Heydarnejad; Mohammad Zakiaghl; Mohsen Mehrvar; Simona Kraberger; Arvind Varsani
Journal:  Virus Genes       Date:  2013-01-18       Impact factor: 2.332

8.  A new begomovirus-betasatellite complex is associated with chilli leaf curl disease in Sri Lanka.

Authors:  D M J B Senanayake; J E A R M Jayasinghe; S Shilpi; S K Wasala; Bikash Mandal
Journal:  Virus Genes       Date:  2012-10-23       Impact factor: 2.332

9.  Genetic analysis of Iranian population of Potato leafroll virus based on ORF0.

Authors:  Shaheen Nourinejhad Zarghani; Masoud Shams-Bakhsh; Neda Zand; Nemat Sokhandan-Bashir; Maghsoud Pazhouhandeh
Journal:  Virus Genes       Date:  2012-08-18       Impact factor: 2.332

10.  Genetic characterization of feline calicivirus strains associated with varying disease manifestations during an outbreak season in Missouri (1995-1996).

Authors:  Victor G Prikhodko; Carlos Sandoval-Jaime; Eugenio J Abente; Karin Bok; Gabriel I Parra; Igor B Rogozin; Eileen N Ostlund; Kim Y Green; Stanislav V Sosnovtsev
Journal:  Virus Genes       Date:  2013-11-12       Impact factor: 2.332

