Literature DB >> 25963975

Species Tree Inference Using a Mixture Model.

Ikram Ullah1, Pekka Parviainen2, Jens Lagergren3.   

Abstract

Species tree reconstruction has been a subject of substantial research due to its central role across biology and medicine. A species tree is often reconstructed using a set of gene trees or by directly using sequence data. In either of these cases, one of the main confounding phenomena is the discordance between a species tree and a gene tree due to evolutionary events such as duplications and losses. Probabilistic methods can resolve the discordance by coestimating gene trees and the species tree but this approach poses a scalability problem for larger data sets. We present MixTreEM-DLRS: A two-phase approach for reconstructing a species tree in the presence of gene duplications and losses. In the first phase, MixTreEM, a novel structural expectation maximization algorithm based on a mixture model is used to reconstruct a set of candidate species trees, given sequence data for monocopy gene families from the genomes under study. In the second phase, PrIME-DLRS, a method based on the DLRS model (Åkerborg O, Sennblad B, Arvestad L, Lagergren J. 2009. Simultaneous Bayesian gene tree reconstruction and reconciliation analysis. Proc Natl Acad Sci U S A. 106(14):5714-5719), is used for selecting the best species tree. PrIME-DLRS can handle multicopy gene families since DLRS, apart from modeling sequence evolution, models gene duplication and loss using a gene evolution model (Arvestad L, Lagergren J, Sennblad B. 2009. The gene evolution model and computing its associated probabilities. J ACM. 56(2):1-44). We evaluate MixTreEM-DLRS using synthetic and biological data, and compare its performance with a recent genome-scale species tree reconstruction method PHYLDOG (Boussau B, Szöllősi GJ, Duret L, Gouy M, Tannier E, Daubin V. 2013. Genome-scale coestimation of species and gene trees. Genome Res. 23(2):323-330) as well as with a fast parsimony-based algorithm Duptree (Wehe A, Bansal MS, Burleigh JG, Eulenstein O. 2008. Duptree: a program for large-scale phylogenetic analyses using gene tree parsimony. Bioinformatics 24(13):1540-1541). Our method is competitive with PHYLDOG in terms of accuracy and runs significantly faster and our method outperforms Duptree in accuracy. The analysis constituted by MixTreEM without DLRS may also be used for selecting the target species tree, yielding a fast and yet accurate algorithm for larger data sets. MixTreEM is freely available at http://prime.scilifelab.se/mixtreem/.
© The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Keywords:  expectation maximization; mammalian phylogeny; mixture model; phylogenetics; species trees

Mesh:

Year:  2015        PMID: 25963975     DOI: 10.1093/molbev/msv115

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


  4 in total

1.  Probabilistic inference of lateral gene transfer events.

Authors:  Mehmood Alam Khan; Owais Mahmudi; Ikram Ullah; Lars Arvestad; Jens Lagergren
Journal:  BMC Bioinformatics       Date:  2016-11-11       Impact factor: 3.169

Review 2.  Recent progress on methods for estimating and updating large phylogenies.

Authors:  Paul Zaharias; Tandy Warnow
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2022-08-22       Impact factor: 6.671

3.  Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer.

Authors:  Ruth Davidson; Pranjal Vachaspati; Siavash Mirarab; Tandy Warnow
Journal:  BMC Genomics       Date:  2015-10-02       Impact factor: 3.969

4.  FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models.

Authors:  Erin K Molloy; Tandy Warnow
Journal:  Bioinformatics       Date:  2020-07-01       Impact factor: 6.937

  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.