Impact of missing data on phylogenies inferred from empirical phylogenomic data sets.

Béatrice Roure, Denis Baurain, Hervé Philippe.

Abstract

Progress in sequencing technology allows researchers to assemble ever-larger supermatrices for phylogenomic inference. However, current phylogenomic studies often rest on patchy data sets, with some having 80% missing (or ambiguous) data or more. Though early simulations had suggested that missing data per se do not harm phylogenetic inference when using sufficiently large data sets, Lemmon et al. (Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130-145.) have recently cast doubt on this consensus in a study based on the introduction of parsimony-uninformative incomplete characters. In this work, we empirically reassess the issue of missing data in phylogenomics while exploring possible interactions with the model of sequence evolution. First, we note that parsimony-uninformative incomplete characters are actually informative in a probabilistic framework. A reanalysis of Lemmon et al.'s data set with this in mind gives a very different interpretation of their results and shows that some of their conclusions may be unfounded. Second, we investigate the effect of the progressive introduction of missing data in a complete supermatrix (126 genes × 39 species) capable of resolving animal relationships. These analyses demonstrate that missing data perturb phylogenetic inference slightly beyond the expected decrease in resolving power. In particular, they exacerbate systematic errors by reducing the number of species effectively available for the detection of multiple substitutions. Consequently, large sparse supermatrices are more sensitive to phylogenetic artifacts than smaller but less incomplete data sets, which argues for experimental designs aimed at collecting a modest number (~50) of highly covered genes. Our results further confirm that including incomplete yet short-branch taxa (i.e., slowly evolving species or close outgroups) can help to eschew artifacts, as predicted by simulations. Finally, it appears that selecting an adequate model of sequence evolution (e.g., the site-heterogeneous CAT model instead of the site-homogeneous WAG model) is more beneficial to phylogenetic accuracy than reducing the level of missing data.
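To make the idea of progressively introducing missing data concrete, the following is a minimal sketch and not the authors' protocol, which the abstract does not describe. It assumes missing data are introduced by masking randomly chosen whole gene-by-species cells of a presence matrix. The function names (`introduce_missing`, `missing_fraction`, `species_per_gene`) and the uniform random masking scheme are hypothetical choices made for illustration only. The sketch shows how the number of species available per gene shrinks as the matrix becomes sparser, which is the quantity the abstract links to the detection of multiple substitutions.

```python
import random


def introduce_missing(n_genes, n_species, target_fraction, seed=0):
    """Return a gene-by-species presence matrix (True = sequence present)
    in which `target_fraction` of the cells, rounded to a whole number of
    cells, have been masked uniformly at random.

    Assumption: masking is done per gene/species cell; the original study
    may have used a different scheme.
    """
    rng = random.Random(seed)
    present = [[True] * n_species for _ in range(n_genes)]
    cells = [(g, s) for g in range(n_genes) for s in range(n_species)]
    rng.shuffle(cells)
    n_to_mask = round(target_fraction * len(cells))
    for g, s in cells[:n_to_mask]:
        present[g][s] = False
    return present


def missing_fraction(present):
    """Fraction of masked cells in the presence matrix."""
    total = sum(len(row) for row in present)
    missing = sum(1 for row in present for cell in row if not cell)
    return missing / total


def species_per_gene(present):
    """Number of species with data for each gene."""
    return [sum(row) for row in present]


if __name__ == "__main__":
    # Matrix dimensions taken from the abstract: 126 genes x 39 species.
    for frac in (0.0, 0.2, 0.4, 0.6, 0.8):
        matrix = introduce_missing(126, 39, frac)
        counts = species_per_gene(matrix)
        mean_count = sum(counts) / len(counts)
        print(f"{frac:.0%} missing: mean species per gene = {mean_count:.1f}")
```

In a real analysis, each masked cell would correspond to replacing that species' sequence for that gene with missing-data characters in the alignment before tree inference.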

Year:  2012        PMID: 22930702     DOI: 10.1093/molbev/mss208

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240

