
Automated removal of noisy data in phylogenomic analyses.

Vadim V Goremykin, Svetlana V Nikiforova, Olaf R P Bininda-Emonds.

Abstract

Noisy data, especially in combination with misalignment and model misspecification, can have an adverse effect on phylogeny reconstruction; however, effective methods to identify such data are few. One particularly important class of noisy data is saturated positions. To avoid potential errors related to saturation in phylogenomic analyses, we present an automated procedure involving the step-wise removal of the most variable positions in a given data set, coupled with a stopping criterion derived from correlation analyses of pairwise ML distances calculated from the deleted (saturated) and the remaining (conserved) subsets of the alignment. Through a comparison with existing methods, we demonstrate both the effectiveness of our proposed procedure for identifying noisy data and the effect of the removal of such data using a well-publicized case study involving placental mammals. At the least, our procedure will identify data sets requiring greater data exploration, and we recommend its use to investigate the effect on phylogenetic analyses of removing subsets of variable positions exhibiting weak or no correlation to the rest of the alignment. However, we would argue that this procedure, by identifying and removing noisy data, facilitates the construction of more accurate phylogenies by, for example, ameliorating potential long-branch attraction artefacts.
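The step-wise removal described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration: site variability is approximated by the number of distinct states in a column, uncorrected p-distances stand in for the pairwise ML distances, and the stopping criterion is reduced to a fixed Pearson correlation threshold (`r_threshold`). All function and parameter names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def site_variability(column):
    """Assumed proxy for variability: number of distinct states in a column."""
    return len(set(column))


def p_distance(a, b):
    """Proportion of differing sites (a stand-in for an ML distance)."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pairwise_distances(seqs, sites):
    """Distances for every pair of sequences, restricted to the given sites."""
    sub = [[s[i] for i in sites] for s in seqs]
    return [p_distance(a, b) for a, b in combinations(sub, 2)]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def remove_noisy_sites(seqs, step=1, r_threshold=0.5):
    """Move the most variable sites into a 'deleted' subset, `step` at a time.

    Assumed stopping rule: stop as soon as distances from the deleted subset
    correlate with distances from the remaining subset at r >= r_threshold,
    and keep the last partition in which the correlation was still weak.
    """
    n_sites = len(seqs[0])
    # Most variable sites first; ties keep their original column order.
    order = sorted(
        range(n_sites),
        key=lambda i: site_variability([s[i] for s in seqs]),
        reverse=True,
    )
    deleted = []
    # The loop bound guarantees the remaining subset is never empty.
    for start in range(0, n_sites - step, step):
        candidate = order[: start + step]
        remaining = order[start + step:]
        r = pearson(
            pairwise_distances(seqs, candidate),
            pairwise_distances(seqs, remaining),
        )
        if r >= r_threshold:
            break
        deleted = candidate
    kept = sorted(set(range(n_sites)) - set(deleted))
    return kept, sorted(deleted)


# Toy alignment: columns 1 and 3 differ in every taxon, the rest are conserved.
toy = ["AAAAAA", "ACAGAA", "CGGTTA", "CTGCTA"]
kept_sites, deleted_sites = remove_noisy_sites(toy)
```

For this toy alignment, the two columns in which every taxon carries a different state (indices 1 and 3) end up in the deleted subset, and the remaining four columns are kept. A real analysis would need model-based distances and a stopping rule justified by the data rather than a fixed threshold.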

Year:  2010        PMID: 20976444     DOI: 10.1007/s00239-010-9398-z

Source DB:  PubMed          Journal:  J Mol Evol        ISSN: 0022-2844            Impact factor:   2.395


