Obtaining maximal concatenated phylogenetic data sets from large sequence databases.

Michael J Sanderson, Amy C Driskell, Richard H Ree, Oliver Eulenstein, Sasha Langley.

Abstract

To improve the accuracy of tree reconstruction, phylogeneticists are extracting increasingly large multigene data sets from sequence databases. Determining whether a database contains at least k genes sampled from at least m species is an NP-complete problem. However, the skewed distribution of sequences in these databases permits all such data sets to be obtained in reasonable computing times even for large numbers of sequences. We developed an exact algorithm for obtaining the largest multigene data sets from a collection of sequences. The algorithm was then tested on a set of 100,000 protein sequences of green plants and used to identify the largest multigene ortholog data sets having at least 3 genes and 6 species. The distribution of sizes of these data sets forms a hollow curve, and the largest are surprisingly small, ranging from 62 genes by 6 species, to 3 genes by 65 species, with more symmetrical data sets of around 15 taxa by 15 genes. These upper bounds to sequence concatenation have important implications for building the tree of life from large sequence databases.
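The abstract does not spell out the algorithm, so the following is only a minimal sketch of the underlying search problem, not the authors' method. It assumes the database can be summarized as a mapping from each gene to the set of species sampled for it, and it treats a "maximal" data set as a gene set and species set, each sequenced for every member of the other, that cannot be extended on either side. The function name `maximal_data_sets`, the toy gene and species labels, and the exhaustive enumeration are all illustrative assumptions; the enumeration is exponential in the number of genes, which is consistent with the NP-completeness noted above, and is only practical for tiny inputs.

```python
from itertools import combinations


def maximal_data_sets(presence, min_genes, min_species):
    """Enumerate maximal complete gene-by-species blocks (hypothetical helper).

    presence: dict mapping a gene name to the set of species that have a
    sequence for that gene. Returns a set of (genes, species) pairs, each a
    pair of frozensets, with at least min_genes genes and min_species species.
    """
    genes = sorted(presence)
    found = set()
    # Exhaustive search over gene subsets: exponential, for illustration only.
    for size in range(max(1, min_genes), len(genes) + 1):
        for subset in combinations(genes, size):
            # Species sampled for every gene in the subset.
            shared = set.intersection(*(presence[g] for g in subset))
            if len(shared) < min_species:
                continue
            # Extend the gene set to every gene covering all shared species,
            # so the block cannot grow on either side.
            closed = frozenset(g for g in genes if shared <= presence[g])
            found.add((closed, frozenset(shared)))
    return found


# Toy input (made-up labels, not data from the paper).
toy_presence = {
    "geneA": {"sp1", "sp2", "sp3"},
    "geneB": {"sp1", "sp2", "sp3"},
    "geneC": {"sp1", "sp2"},
    "geneD": {"sp3", "sp4"},
}
toy_result = maximal_data_sets(toy_presence, min_genes=2, min_species=2)
for gene_set, species_set in sorted(toy_result, key=lambda p: sorted(p[0])):
    print(sorted(gene_set), sorted(species_set))
```

On the toy input this reports two blocks: three genes by two species, and two genes by three species, a small-scale version of the trade-off between gene count and species count that the abstract describes.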

Year: 2003; PMID: 12777519; DOI: 10.1093/molbev/msg115

Source DB: PubMed; Journal: Mol Biol Evol; ISSN: 0737-4038; Impact factor: 16.240

