Challenges in Species Tree Estimation Under the Multispecies Coalescent Model.

Bo Xu, Ziheng Yang.

Abstract

The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full-likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make efficient use of the information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods, but that they are due to inefficient use of information in the data by those methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data. We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods.
Copyright © 2016 by the Genetics Society of America.
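
The gene tree-species tree conflict mentioned in the abstract can be made concrete for the simplest case of three species. The sketch below is not taken from the article. It uses the standard MSC result for a species tree ((A,B),C) with one sequence sampled per species: the gene tree matches the species tree with probability 1 - (2/3)e^(-T), and each of the two mismatching topologies has probability (1/3)e^(-T), where T is the internal branch length in coalescent units. The function name `gene_tree_probs` and the topology labels are hypothetical, chosen only for illustration.

```python
import math


def gene_tree_probs(T):
    """Gene tree topology probabilities under the MSC for three species.

    Assumes species tree ((A,B),C), one sequence per species, and an
    internal branch of length T in coalescent units (an assumption of
    this sketch, not a value from the abstract).
    """
    if T < 0:
        raise ValueError("T must be non-negative")
    # The A and B lineages fail to coalesce along the internal branch with
    # probability exp(-T); the three lineages then coalesce in random order
    # in the root population, so each topology gets an equal 1/3 share.
    p_mismatch = math.exp(-T) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * p_mismatch,  # matches the species tree
        "((A,C),B)": p_mismatch,
        "((B,C),A)": p_mismatch,
    }


if __name__ == "__main__":
    for T in (0.0, 0.5, 1.0, 2.0):
        probs = gene_tree_probs(T)
        print(T, {topology: round(p, 4) for topology, p in probs.items()})
```

With a very short internal branch the three topologies become nearly equally likely, which is the regime where many loci show gene trees that disagree with the species tree. In this three-species case the matching topology is never less probable than a mismatching one.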

Keywords:  BPP; anomaly zone; concatenation; gene trees; incomplete lineage sorting; maximum likelihood; multispecies coalescent; species trees

Year: 2016
PMID: 27927902
PMCID: PMC5161269
DOI: 10.1534/genetics.116.190173

Source DB: PubMed
Journal: Genetics
ISSN: 0016-6731
Impact factor: 4.562


