Bastien Boussau, Laurent Guéguen, Manolo Gouy.
Abstract
Homologous recombination is a pervasive biological process that affects sequences in all living organisms and viruses. In the presence of recombination, the evolutionary history of an alignment of homologous sequences cannot be properly depicted by a single bifurcating tree: some sites have evolved along one phylogenetic tree, while others have followed another. Methods available to analyse recombination in sequences usually rely on sliding windows along the alignment or are particularly demanding in computational resources, and they are often limited to nucleotide sequences. In this article, we propose and implement a Mixture Model on trees and a phylogenetic Hidden Markov Model to reveal recombination breakpoints while searching for the various evolutionary histories that are present in an alignment known to have undergone homologous recombination. These models are sufficiently efficient to be applied to dozens of sequences on a single desktop computer, and can handle nucleotide and protein sequences equally well. We estimate their accuracy on simulated sequences and test them on real data.
Keywords: PhyML; maximum likelihood; molecular phylogeny; recombination
Year: 2009 PMID: 19812727 PMCID: PMC2747125 DOI: 10.4137/ebo.s2242
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Figure 1. Example rooted tree for likelihood computation.
Figure 2. Client-server architecture to efficiently find a set of topologies that best describe the alignment.
Figure 3. Ability of the Phylo-HMM (left) and the Mixture Model (right) to detect the number of segments in simulated alignments.
Figure 4. Ability of the Phylo-HMM (left) and the Mixture Model (right) to detect the breakpoint position in simulated alignments. The dashed grey line corresponds to values that would be obtained with an ideal method, whose reconstructions are identical to simulations.
Figure 5. Ability of the Phylo-HMM (left) and the Mixture Model (right) to recover topologies from simulated alignments. Robinson-Foulds (RF) distances were computed between simulated and reconstructed trees for each part of the alignments, and are reported with respect to the number of sites the reconstructed trees are based upon.
Figure 6. Trees found by the Phylo-HMM on the Gao et al. data. The trees found by the Mixture Model are nearly identical.
| Topology | Site 1 likelihood | Site 2 likelihood | Site 3 likelihood | Site 4 likelihood |
|---|---|---|---|---|
| Topology 1 | 10⁻² | 10⁻⁴ | 10⁻⁴ | 10⁻⁴ |
| Topology 2 | 10⁻⁴ | 10⁻² | 10⁻³ | 10⁻⁴ |
| Topology 3 | 10⁻⁴ | 10⁻⁴ | 10⁻² | 10⁻⁴ |
| Topology 4 | 10⁻⁴ | 10⁻⁴ | 10⁻⁴ | 10⁻² |
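
To make the table above concrete, the following Python sketch shows how per-site likelihoods of this kind could be combined under the two model families named in the abstract. It is only an illustration under stated assumptions, not the authors' implementation: the function names, the uniform mixture weights, and the HMM parameterisation (uniform starting probabilities, a single `stay_prob` for remaining on the same topology, and uniform switching among the others) are hypothetical choices made for this example.

```python
import math

# Per-site likelihoods copied from the table: rows are topologies, columns are sites.
SITE_LIK = [
    [1e-2, 1e-4, 1e-4, 1e-4],  # Topology 1
    [1e-4, 1e-2, 1e-3, 1e-4],  # Topology 2
    [1e-4, 1e-4, 1e-2, 1e-4],  # Topology 3
    [1e-4, 1e-4, 1e-4, 1e-2],  # Topology 4
]


def mixture_log_likelihood(site_lik, weights):
    """Mixture model: each site's likelihood is a weighted sum over topologies.

    The weights are assumed to be non-negative and to sum to 1.
    """
    n_sites = len(site_lik[0])
    total = 0.0
    for s in range(n_sites):
        site_total = sum(w * row[s] for w, row in zip(weights, site_lik))
        total += math.log(site_total)
    return total


def viterbi_path(site_lik, stay_prob):
    """Most probable topology index per site under a simple HMM.

    Assumed (hypothetical) parameterisation: uniform start, probability
    `stay_prob` of keeping the same topology between adjacent sites, and the
    remaining probability shared equally among the other topologies.
    """
    k = len(site_lik)
    n_sites = len(site_lik[0])
    switch_prob = (1.0 - stay_prob) / (k - 1)

    def log_trans(i, j):
        return math.log(stay_prob if i == j else switch_prob)

    # Log-scores of the best path ending in each topology at the first site.
    score = [math.log(1.0 / k) + math.log(site_lik[t][0]) for t in range(k)]
    backpointers = []
    for s in range(1, n_sites):
        new_score, pointers = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[i] + log_trans(i, j))
            new_score.append(
                score[best_i] + log_trans(best_i, j) + math.log(site_lik[j][s])
            )
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final topology.
    state = max(range(k), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    print(mixture_log_likelihood(SITE_LIK, [0.25] * 4))
    print(viterbi_path(SITE_LIK, stay_prob=0.5))
    print(viterbi_path(SITE_LIK, stay_prob=0.9))
```

With these assumed settings, `stay_prob=0.5` assigns a different topology to each site (indices `[0, 1, 2, 3]`), whereas `stay_prob=0.9` keeps Topology 2 for Site 3 (indices `[0, 1, 1, 3]`), because the tenfold likelihood gain from switching to Topology 3 no longer outweighs the switching penalty. This illustrates why an HMM, unlike a plain mixture, tends to favour contiguous segments separated by breakpoints.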