
The Impact of Missing Data on Species Tree Estimation.

Zhenxiang Xi, Liang Liu, Charles C Davis.

Abstract

Phylogeneticists are increasingly assembling genome-scale data sets that include hundreds of genes to resolve their focal clades. Although these data sets commonly include a moderate to high amount of missing data, there remains no consensus on their impact on species tree estimation. Here, using several simulated and empirical data sets, we assess the effects of missing data on species tree estimation under varying degrees of incomplete lineage sorting (ILS) and gene rate heterogeneity. We demonstrate that concatenation (RAxML), gene-tree-based coalescent (ASTRAL, MP-EST, and STAR), and supertree (matrix representation with parsimony [MRP]) methods perform reliably, so long as missing data are randomly distributed (by gene and/or by species) and a sufficiently large number of genes is sampled. When data sets are indecisive sensu Sanderson et al. (2010. Phylogenomics with incomplete taxon coverage: the limits to inference. BMC Evol Biol. 10:155) and/or ILS is high, however, high amounts of randomly distributed missing data require exhaustive levels of gene sampling, likely exceeding those of most empirical studies to date. Moreover, missing data become especially problematic when they are nonrandomly distributed. We demonstrate that STAR produces inconsistent results when the amount of nonrandom missing data is high, regardless of the degree of ILS and gene rate heterogeneity. Similarly, concatenation methods using maximum likelihood can be misled by nonrandom missing data in the presence of gene rate heterogeneity, a problem that is further exacerbated when combined with high ILS. In contrast, ASTRAL, MP-EST, and MRP are more robust under all of these scenarios. These results underscore the importance of understanding the influence of missing data in the phylogenomics era.
© The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
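The distinction between randomly and nonrandomly distributed missing data can be illustrated with a species-by-gene occupancy matrix. The following is a minimal sketch, not the authors' simulation design: the function names (`random_mask`, `nonrandom_mask`, `missing_per_species`), the matrix dimensions, and the choice to concentrate all gaps in a fixed subset of species are assumptions made purely for illustration.

```python
import random


def random_mask(n_species, n_genes, missing_frac, seed=0):
    """Occupancy matrix (True = sequence present) with cells dropped uniformly at random."""
    rng = random.Random(seed)
    cells = [(s, g) for s in range(n_species) for g in range(n_genes)]
    dropped = set(rng.sample(cells, round(missing_frac * len(cells))))
    return [[(s, g) not in dropped for g in range(n_genes)]
            for s in range(n_species)]


def nonrandom_mask(n_species, n_genes, missing_frac, affected_species):
    """Same overall missing fraction, but every gap falls in the listed species.

    Hypothetical scheme: each affected species loses its first k genes,
    so the gaps are also clustered by gene.
    """
    n_missing = round(missing_frac * n_species * n_genes)
    per_species, extra = divmod(n_missing, len(affected_species))
    mask = [[True] * n_genes for _ in range(n_species)]
    for i, s in enumerate(affected_species):
        k = per_species + (1 if i < extra else 0)
        if k > n_genes:
            raise ValueError("too few affected species for this missing fraction")
        for g in range(k):
            mask[s][g] = False
    return mask


def missing_per_species(mask):
    """Fraction of genes missing for each species (one value per row)."""
    return [1 - sum(row) / len(row) for row in mask]


# Both matrices are 25% empty overall; only the distribution of gaps differs.
rand = random_mask(8, 100, 0.25)
nonrand = nonrandom_mask(8, 100, 0.25, affected_species=[0, 1, 2, 3])
print(missing_per_species(rand))
print(missing_per_species(nonrand))
```

In the nonrandom case the per-species fractions split into two groups (half the genes missing for the affected species, none for the rest), whereas the random case scatters gaps across all species. Masks like these could be applied to simulated alignments before running any of the methods named in the abstract.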

Keywords:  coalescent methods; concatenation methods; gene rate heterogeneity; incomplete lineage sorting; missing data; species tree estimation

Year: 2015; PMID: 26589995; DOI: 10.1093/molbev/msv266

Source DB: PubMed; Journal: Mol Biol Evol; ISSN: 0737-4038; Impact factor: 16.240


  29 in total

1.  The Multispecies Coalescent Model Outperforms Concatenation Across Diverse Phylogenomic Data Sets.

Authors:  Xiaodong Jiang; Scott V Edwards; Liang Liu
Journal:  Syst Biol       Date:  2020-07-01       Impact factor: 15.683

2.  Loss of color terms not demonstrated.

Authors:  David Nash
Journal:  Proc Natl Acad Sci U S A       Date:  2017-09-14       Impact factor: 11.205

3.  Model Choice, Missing Data, and Taxon Sampling Impact Phylogenomic Inference of Deep Basidiomycota Relationships.

Authors:  Arun N Prasanna; Daniel Gerber; Teeratas Kijpornyongpan; M Catherine Aime; Vinson P Doyle; Laszlo G Nagy
Journal:  Syst Biol       Date:  2020-01-01       Impact factor: 15.683

4.  Mitogenomes provide new insights of evolutionary history of Boreheptagyiini and Diamesini (Diptera: Chironomidae: Diamesinae).

Authors:  Xiao-Long Lin; Zheng Liu; Li-Ping Yan; Xin Duan; Wen-Jun Bu; Xin-Hua Wang; Chen-Guang Zheng
Journal:  Ecol Evol       Date:  2022-05-24       Impact factor: 3.167

5.  New Insights Into the Relationships Within Subtribe Scorzonerinae (Cichorieae, Asteraceae) Using Hybrid Capture Phylogenomics (Hyb-Seq).

Authors:  Elham Hatami; Katy E Jones; Norbert Kilian
Journal:  Front Plant Sci       Date:  2022-07-01       Impact factor: 6.627

6.  Marker Development for Phylogenomics: The Case of Orobanchaceae, a Plant Family with Contrasting Nutritional Modes.

Authors:  Xi Li; Baohai Hao; Da Pan; Gerald M Schneeweiss
Journal:  Front Plant Sci       Date:  2017-11-21       Impact factor: 5.753

7.  OCTAL: Optimal Completion of gene trees in polynomial time.

Authors:  Sarah Christensen; Erin K Molloy; Pranjal Vachaspati; Tandy Warnow
Journal:  Algorithms Mol Biol       Date:  2018-03-15       Impact factor: 1.405

8.  Continued Adaptation of C4 Photosynthesis After an Initial Burst of Changes in the Andropogoneae Grasses.

Authors:  Matheus E Bianconi; Jan Hackel; Maria S Vorontsova; Adriana Alberti; Watchara Arthan; Sean V Burke; Melvin R Duvall; Elizabeth A Kellogg; Sébastien Lavergne; Michael R McKain; Alexandre Meunier; Colin P Osborne; Paweena Traiperm; Pascal-Antoine Christin; Guillaume Besnard
Journal:  Syst Biol       Date:  2020-05-01       Impact factor: 15.683

9.  Large Phylogenomic Data sets Reveal Deep Relationships and Trait Evolution in Chlorophyte Green Algae.

Authors:  Xi Li; Zheng Hou; Chenjie Xu; Xuan Shi; Lingxiao Yang; Louise A Lewis; Bojian Zhong
Journal:  Genome Biol Evol       Date:  2021-07-06       Impact factor: 3.416

10.  Taxon-specific or universal? Using target capture to study the evolutionary history of rapid radiations.

Authors:  Gil Yardeni; Juan Viruel; Margot Paris; Jaqueline Hess; Clara Groot Crego; Marylaure de La Harpe; Norma Rivera; Michael H J Barfuss; Walter Till; Valeria Guzmán-Jacob; Thorsten Krömer; Christian Lexer; Ovidiu Paun; Thibault Leroy
Journal:  Mol Ecol Resour       Date:  2021-10-10       Impact factor: 8.678
