Identifying optimal incomplete phylogenetic data sets from sequence databases.

Changhui Yan, J Gordon Burleigh, Oliver Eulenstein.

Abstract

We introduce a new method for identifying optimal incomplete data sets from large sequence databases based on the graph theoretic concept of alpha-quasi-bicliques. The quasi-biclique method searches large sequence databases to identify useful phylogenetic data sets with a specified amount of missing data while maintaining the necessary amount of overlap among genes and taxa. The utility of the quasi-biclique method is demonstrated on large simulated sequence databases and on a data set of green plant sequences from GenBank. The quasi-biclique method greatly increases the taxon and gene sampling in the data sets while adding only a limited amount of missing data. Furthermore, under the conditions of the simulation, data sets with a limited amount of missing data often produce topologies nearly as accurate as those built from complete data sets. The quasi-biclique method will be an effective tool for exploiting sequence databases for phylogenetic information and also may help identify critical sequences needed to build large phylogenetic data sets.
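To make the overlap condition in the abstract concrete, here is a minimal sketch of a membership check. It assumes the usual reading of an alpha-quasi-biclique: within the chosen taxa and genes, every taxon lacks sequences for at most a fraction alpha of the genes, and every gene is missing from at most a fraction alpha of the taxa. The function name `is_quasi_biclique` and the `presence` set of (taxon, gene) pairs are hypothetical, introduced only for illustration. The sketch checks a candidate data set only. It is not the authors' search procedure for finding optimal ones.

```python
def is_quasi_biclique(presence, taxa, genes, alpha):
    """Check the assumed alpha-quasi-biclique condition on a candidate data set.

    presence: set of (taxon, gene) pairs for which a sequence exists (hypothetical input format).
    taxa, genes: the candidate subsets to test.
    alpha: tolerated fraction of missing entries per taxon and per gene, between 0 and 1.
    """
    taxa, genes = list(taxa), list(genes)
    if not taxa or not genes:
        return False

    # Each taxon may lack at most alpha * |genes| of the selected genes.
    for t in taxa:
        missing = sum(1 for g in genes if (t, g) not in presence)
        if missing > alpha * len(genes):
            return False

    # Each gene may be absent from at most alpha * |taxa| of the selected taxa.
    for g in genes:
        missing = sum(1 for t in taxa if (t, g) not in presence)
        if missing > alpha * len(taxa):
            return False

    return True


# Toy database: taxon C has no sequence for gene g2.
presence = {
    ("A", "g1"), ("A", "g2"), ("A", "g3"),
    ("B", "g1"), ("B", "g2"), ("B", "g3"),
    ("C", "g1"), ("C", "g3"),
}
```

With alpha = 0 the check reduces to requiring a complete data set (a biclique), so the three-taxon, three-gene candidate above fails. Allowing alpha = 0.34 admits it, which illustrates how tolerating a bounded amount of missing data can enlarge the taxon and gene sampling.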

Year: 2005 PMID: 15878123 DOI: 10.1016/j.ympev.2005.02.008

Source DB: PubMed Journal: Mol Phylogenet Evol ISSN: 1055-7903 Impact factor: 4.286

Cited

6 in total

1. Exploring biological interaction networks with tailored weighted quasi-bicliques.

2. Inferring phylogenies with incomplete data sets: a 5-gene, 567-taxon analysis of angiosperms.

3. OrthoSelect: a protocol for selecting orthologous groups in phylogenomics.

4. Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches.

5. SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics.

6. Selecting informative subsets of sparse supermatrices increases the chance to find correct trees.