
Phylogenetic supermatrix analysis of GenBank sequences from 2228 papilionoid legumes.

Michelle M McMahon, Michael J Sanderson.

Abstract

A comprehensive phylogeny of papilionoid legumes was inferred from sequences of 2228 taxa in GenBank release 147. A semiautomated analysis pipeline was constructed to download, parse, assemble, align, combine, and build trees from a pool of 11,881 sequences. Initial steps included all-against-all BLAST similarity searches coupled with assembly, using a novel strategy for building length-homogeneous primary sequence clusters. This was followed by a combination of global and local alignment protocols to build larger secondary clusters of locally aligned sequences, thus taking into account the dramatic differences in length of the heterogeneous coding and noncoding sequence data present in GenBank. Next, clusters were checked for the presence of duplicate genes and other potentially misleading sequences and examined for combinability with other clusters on the basis of taxon overlap. Finally, two supermatrices were constructed: a "sparse" matrix based on the primary clusters alone (1794 taxa x 53,977 characters), and a somewhat more "dense" matrix based on the secondary clusters (2228 taxa x 33,168 characters). Both matrices were very sparse, with 95% of their cells containing gaps or question marks. These were subjected to extensive heuristic parsimony analyses using deterministic and stochastic heuristics, including bootstrap analyses. A "reduced consensus" bootstrap analysis was also performed to detect cryptic signal in a subtree of the data set corresponding to a "backbone" phylogeny proposed in previous studies. Overall, the dense supermatrix appeared to provide much more satisfying results, indicated by better resolution of the bootstrap tree, excellent agreement with the backbone papilionoid tree in the reduced bootstrap consensus analysis, few problematic large polytomies in the strict consensus, and less fragmentation of conventionally recognized genera. Nevertheless, at lower taxonomic levels several problems were identified and diagnosed. 
A large number of methodological issues in supermatrix construction at this scale are discussed, including detection of annotation errors in GenBank sequences; the shortage of effective algorithms and software for local multiple sequence alignment; the difficulty of overcoming effects of fragmentation of data into nearly disjoint blocks in sparse supermatrices; and the lack of informative tools to assess confidence limits in very large trees.
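
As an illustration of the supermatrix idea described in the abstract, the following is a minimal sketch, not the authors' pipeline. It assumes each cluster is already an alignment held as a dictionary from taxon name to aligned sequence; the function names (`build_supermatrix`, `missing_fraction`, `taxon_overlap`) and the toy sequences are hypothetical. It concatenates clusters, fills taxa absent from a cluster with question marks, and measures the share of cells that are gaps or question marks, which is the kind of sparseness statistic the abstract reports.

```python
from typing import Dict, List

# Assumption: one cluster = one alignment, taxon name -> aligned sequence.
Alignment = Dict[str, str]


def build_supermatrix(clusters: List[Alignment]) -> Dict[str, str]:
    """Concatenate per-cluster alignments, padding absent taxa with '?'."""
    taxa = sorted({taxon for cluster in clusters for taxon in cluster})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for cluster in clusters:
        lengths = {len(seq) for seq in cluster.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in a cluster must share one aligned length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon missing from this cluster contributes a block of '?'.
            rows[taxon].append(cluster.get(taxon, "?" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


def missing_fraction(matrix: Dict[str, str]) -> float:
    """Fraction of cells that are gaps ('-') or question marks ('?')."""
    total = sum(len(row) for row in matrix.values())
    missing = sum(row.count("-") + row.count("?") for row in matrix.values())
    return missing / total if total else 0.0


def taxon_overlap(a: Alignment, b: Alignment) -> int:
    """Number of taxa shared by two clusters (a crude combinability check)."""
    return len(a.keys() & b.keys())


# Toy data, invented for illustration only.
clusters = [
    {"Lupinus_albus": "ACGT", "Pisum_sativum": "AC-T"},
    {"Pisum_sativum": "GGA", "Glycine_max": "GCA"},
]
matrix = build_supermatrix(clusters)
for taxon, row in matrix.items():
    print(taxon, row)
print(round(missing_fraction(matrix), 3))
print(taxon_overlap(clusters[0], clusters[1]))
```

In this toy case only one taxon links the two clusters, so most of the matrix is padding. That is a small-scale version of the "nearly disjoint blocks" problem the abstract raises for sparse supermatrices.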


Year:  2006        PMID: 17060202     DOI: 10.1080/10635150600999150

Source DB:  PubMed          Journal:  Syst Biol        ISSN: 1063-5157            Impact factor:   15.683


  40 in total

1.  Multiple continental radiations and correlates of diversification in Lupinus (Leguminosae): testing for key innovation with incomplete taxon sampling.

Authors:  Christopher S Drummond; Ruth J Eastwood; Silvia T S Miotto; Colin E Hughes
Journal:  Syst Biol       Date:  2012-01-05       Impact factor: 15.683

2.  Efficient computation of the phylogenetic likelihood function on multi-gene alignments and multi-core architectures.

Authors:  Alexandros Stamatakis; Michael Ott
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2008-12-27       Impact factor: 6.237

3.  Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama.

Authors:  W John Kress; David L Erickson; F Andrew Jones; Nathan G Swenson; Rolando Perez; Oris Sanjur; Eldredge Bermingham
Journal:  Proc Natl Acad Sci U S A       Date:  2009-10-19       Impact factor: 11.205

4.  Exploration of Plastid Phylogenomic Conflict Yields New Insights into the Deep Relationships of Leguminosae.

Authors:  Rong Zhang; Yin-Huan Wang; Jian-Jun Jin; Gregory W Stull; Anne Bruneau; Domingos Cardoso; Luciano Paganucci De Queiroz; Michael J Moore; Shu-Dong Zhang; Si-Yun Chen; Jian Wang; De-Zhu Li; Ting-Shuang Yi
Journal:  Syst Biol       Date:  2020-07-01       Impact factor: 15.683

5.  Rapid progress on the vertebrate tree of life.

Authors:  Robert C Thomson; H Bradley Shaffer
Journal:  BMC Biol       Date:  2010-03-08       Impact factor: 7.431

6.  Structural analysis of biodiversity.

Authors:  Lawrence Sirovich; Mark Y Stoeckle; Yu Zhang
Journal:  PLoS One       Date:  2010-02-24       Impact factor: 3.240

7.  Maximum Likelihood Analyses of 3,490 rbcL Sequences: Scalability of Comprehensive Inference versus Group-Specific Taxon Sampling.

Authors:  Alexandros Stamatakis; Markus Göker; Guido W Grimm
Journal:  Evol Bioinform Online       Date:  2010-05-24       Impact factor: 1.625

8.  Phylogenomics with incomplete taxon coverage: the limits to inference.

Authors:  Michael J Sanderson; Michelle M McMahon; Mike Steel
Journal:  BMC Evol Biol       Date:  2010-05-25       Impact factor: 3.260

9.  Efficient tree searches with available algorithms.

Authors:  Gonzalo Giribet
Journal:  Evol Bioinform Online       Date:  2007-11-12       Impact factor: 1.625

10.  Data mining approach identifies research priorities and data requirements for resolving the red algal tree of life.

Authors:  Heroen Verbruggen; Christine A Maggs; Gary W Saunders; Line Le Gall; Hwan Su Yoon; Olivier De Clerck
Journal:  BMC Evol Biol       Date:  2010-01-20       Impact factor: 3.260
