Obtaining maximal concatenated phylogenetic data sets from large sequence databases.

Michael J Sanderson, Amy C Driskell, Richard H Ree, Oliver Eulenstein, Sasha Langley.

Abstract

To improve the accuracy of tree reconstruction, phylogeneticists are extracting increasingly large multigene data sets from sequence databases. Determining whether a database contains at least k genes sampled from at least m species is an NP-complete problem. However, the skewed distribution of sequences in these databases permits all such data sets to be obtained in reasonable computing times even for large numbers of sequences. We developed an exact algorithm for obtaining the largest multigene data sets from a collection of sequences. The algorithm was then tested on a set of 100,000 protein sequences of green plants and used to identify the largest multigene ortholog data sets having at least 3 genes and 6 species. The distribution of sizes of these data sets forms a hollow curve, and the largest are surprisingly small, ranging from 62 genes by 6 species, to 3 genes by 65 species, with more symmetrical data sets of around 15 taxa by 15 genes. These upper bounds to sequence concatenation have important implications for building the tree of life from large sequence databases.
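The search problem described in the abstract can be illustrated on a toy scale. The following is a minimal brute-force sketch, not the authors' exact algorithm (the abstract does not give its details): it assumes the database has already been reduced to a mapping from gene name to the set of species sampled for that gene, and it enumerates every maximal complete block of at least `min_genes` genes by at least `min_species` species. The function name, parameter names, and the toy gene and species labels are all hypothetical.

```python
from itertools import combinations


def maximal_data_sets(gene_to_species, min_genes, min_species):
    """Enumerate maximal complete (genes, species) blocks by brute force.

    gene_to_species maps a gene name to the set of species sampled for it
    (an assumed input format). A block is complete when every chosen gene
    is sampled in every chosen species, and maximal when no gene or species
    can be added without breaking completeness.
    """
    genes = sorted(gene_to_species)
    found = set()
    # Try every gene subset of the required size or larger (exponential cost).
    for size in range(max(1, min_genes), len(genes) + 1):
        for subset in combinations(genes, size):
            # Species shared by every gene in the subset.
            shared = set.intersection(*(gene_to_species[g] for g in subset))
            if len(shared) < min_species:
                continue
            # Extend the gene set with every gene sampled in all shared species,
            # which makes the block maximal in both dimensions.
            closed = frozenset(g for g in genes if shared <= gene_to_species[g])
            found.add((closed, frozenset(shared)))
    return found


# Hypothetical toy database: four genes sampled across species A-D.
toy = {
    "rbcL": {"A", "B", "C", "D"},
    "matK": {"A", "B", "C"},
    "atpB": {"A", "B"},
    "ndhF": {"C", "D"},
}

result = maximal_data_sets(toy, min_genes=2, min_species=2)
for block_genes, block_species in sorted(result, key=lambda b: sorted(b[0])):
    print(sorted(block_genes), "x", sorted(block_species))
```

Because this sketch visits every gene subset, its running time grows exponentially with the number of genes, so it is only useful for small illustrations of what a "largest multigene data set" means, not for databases of the size the abstract discusses.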

Year: 2003 PMID: 12777519 DOI: 10.1093/molbev/msg115

Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240

Cited

13 in total

1. Gene and genome trees conflict at many levels.

Authors: Leanne S Haggerty; Fergal J Martin; David A Fitzpatrick; James O McInerney
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2009-08-12 Impact factor: 6.237

2. Phylogenomics with incomplete taxon coverage: the limits to inference.

Authors: Michael J Sanderson; Michelle M McMahon; Mike Steel
Journal: BMC Evol Biol Date: 2010-05-25 Impact factor: 3.260

3. Construction of a Species-Level Tree of Life for the Insects and Utility in Taxonomic Profiling.

Authors: Douglas Chesters
Journal: Syst Biol Date: 2017-05-01 Impact factor: 15.683

4. Universally distributed single-copy genes indicate a constant rate of horizontal transfer.

Authors: Christopher J Creevey; Tobias Doerks; David A Fitzpatrick; Jeroen Raes; Peer Bork
Journal: PLoS One Date: 2011-08-05 Impact factor: 3.240

5. The evolutionary history of protein fold families and proteomes confirms that the archaeal ancestor is more ancient than the ancestors of other superkingdoms.

Authors: Kyung Mo Kim; Gustavo Caetano-Anollés
Journal: BMC Evol Biol Date: 2012-01-27 Impact factor: 3.260

6. STBase: one million species trees for comparative biology.

7. Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches.

8. Phylogenomic analyses support the monophyly of Taphrinomycotina, including Schizosaccharomyces fission yeasts.

9. Inferring angiosperm phylogeny from EST data with widespread gene duplication.

10. SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics.