A hierarchical model for incomplete alignments in phylogenetic inference.

Fuxia Cheng, Stefanie Hartmann, Mayetri Gupta, Joseph G Ibrahim, Todd J Vision.

Abstract

MOTIVATION: Full-length DNA and protein sequences that span the entire length of a gene are ideally used for multiple sequence alignments (MSAs) and the subsequent inference of their relationships. Frequently, however, MSAs contain a substantial amount of missing data. For example, expressed sequence tags (ESTs), which are partial sequences of expressed genes, are the predominant source of sequence data for many organisms. The patterns of missing data typical for EST-derived alignments greatly compromise the accuracy of estimated phylogenies.
RESULTS: We present a statistical method for inferring phylogenetic trees from EST-based incomplete MSA data. We propose a class of hierarchical models for modeling pairwise distances between the sequences, and develop a fully Bayesian approach for estimation of the model parameters. Once the distance matrix is estimated, the phylogenetic tree may be constructed by applying neighbor-joining (or any other algorithm of choice). We also show that maximizing the marginal likelihood from the Bayesian approach yields similar results to a profile likelihood estimation. The proposed methods are illustrated using simulated protein families, for which the true phylogeny is known, and one real protein family.
AVAILABILITY: R code for fitting these models is available from: http://people.bu.edu/gupta/software.htm.
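
The two-stage workflow described in the abstract (estimate a pairwise distance matrix, then build a tree from it) can be illustrated with a minimal Python sketch. This is not the authors' hierarchical Bayesian model or their R code. It assumes a plain p-distance computed over columns observed in both sequences, and a textbook neighbor-joining pass that returns topology only. All names (`MISSING`, `pairwise_p_distance`, `distance_matrix`, `neighbor_joining`) are hypothetical. The sketch mainly shows where incomplete alignments cause trouble: two partial sequences with no shared columns yield no direct distance estimate, so the matrix has holes that would have to be filled (for example by a model such as the one proposed) before neighbor-joining can run.

```python
from itertools import combinations

# Assumption: these symbols mark missing or unaligned positions.
MISSING = {"-", "?"}


def pairwise_p_distance(seq_a, seq_b):
    """Return (p-distance, overlap) using only columns observed in both sequences."""
    shared = [(a, b) for a, b in zip(seq_a, seq_b)
              if a not in MISSING and b not in MISSING]
    if not shared:
        # No jointly observed columns: the distance cannot be computed directly.
        return None, 0
    mismatches = sum(a != b for a, b in shared)
    return mismatches / len(shared), len(shared)


def distance_matrix(alignment):
    """Build a nested-dict distance matrix from {name: aligned sequence}."""
    names = list(alignment)
    dist = {n: {n: 0.0} for n in names}
    for a, b in combinations(names, 2):
        d, _ = pairwise_p_distance(alignment[a], alignment[b])
        dist[a][b] = dist[b][a] = d  # may be None for non-overlapping pairs
    return dist


def neighbor_joining(dist):
    """Return an unrooted topology as nested tuples; needs a complete matrix."""
    d = {a: dict(row) for a, row in dist.items()}  # copy so the input is untouched
    nodes = list(d)
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the currently active nodes.
        r = {i: sum(d[i][j] for j in nodes if j != i) for i in nodes}
        # Join the pair that minimises the neighbor-joining Q criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        new = (i, j)
        d[new] = {new: 0.0}
        for k in nodes:
            if k not in (i, j):
                # Distance from the new internal node to every remaining node.
                d[new][k] = d[k][new] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    return tuple(nodes)
```

For instance, `pairwise_p_distance("AC----", "----GT")` returns `(None, 0)` because the two fragments never overlap, and passing a matrix containing such a `None` entry to `neighbor_joining` would fail. Branch lengths are omitted to keep the sketch short.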

Year:  2009        PMID: 19147663      PMCID: PMC2647833          DOI: 10.1093/bioinformatics/btp015

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  34 in total

1.  Expressed sequence tags: alternative or complement to whole genome sequences?

Authors:  Stephen Rudd
Journal:  Trends Plant Sci       Date:  2003-07       Impact factor: 18.313

2.  Prospects for building the tree of life from large sequence databases.

Authors:  Amy C Driskell; Cécile Ané; J Gordon Burleigh; Michelle M McMahon; Brian C O'Meara; Michael J Sanderson
Journal:  Science       Date:  2004-11-12       Impact factor: 47.728

3.  Incorporating gene-specific variation when inferring and evaluating optimal evolutionary tree topologies from multilocus sequence data.

Authors:  Tae-Kun Seo; Hirohisa Kishino; Jeffrey L Thorne
Journal:  Proc Natl Acad Sci U S A       Date:  2005-03-11       Impact factor: 11.205

4.  SDM: a fast distance-based approach for (super) tree building in phylogenomics.

Authors:  Alexis Criscuolo; Vincent Berry; Emmanuel J P Douzery; Olivier Gascuel
Journal:  Syst Biol       Date:  2006-10       Impact factor: 15.683

Review 5.  The molecular ecologist's guide to expressed sequence tags.

Authors:  Amy Bouck; Todd Vision
Journal:  Mol Ecol       Date:  2007-03       Impact factor: 6.185

6.  Combining data in phylogenetic analysis.

Authors:  J P Huelsenbeck; J J Bull; C W Cunningham
Journal:  Trends Ecol Evol       Date:  1996-04       Impact factor: 17.712

Review 7.  Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis.

Authors:  J A Eisen
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

8.  Rose: generating sequence families.

Authors:  J Stoye; D Evers; F Meyer
Journal:  Bioinformatics       Date:  1998       Impact factor: 6.937

9.  Phytome: a platform for plant comparative genomics.

Authors:  Stefanie Hartmann; Dihui Lu; Jason Phillips; Todd J Vision
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

10.  Using ESTs for phylogenomics: can one accurately infer a phylogenetic tree from a gappy alignment?

Authors:  Stefanie Hartmann; Todd J Vision
Journal:  BMC Evol Biol       Date:  2008-03-26       Impact factor: 3.260

  2 in total

1.  PhyloMissForest: a random forest framework to construct phylogenetic trees with missing data.

Authors:  Diogo Pinheiro; Sergio Santander-Jiménez; Aleksandar Ilic
Journal:  BMC Genomics       Date:  2022-05-18       Impact factor: 4.547

2.  Selecting informative subsets of sparse supermatrices increases the chance to find correct trees.

Authors:  Bernhard Misof; Benjamin Meyer; Björn Marcus von Reumont; Patrick Kück; Katharina Misof; Karen Meusemann
Journal:  BMC Bioinformatics       Date:  2013-12-03       Impact factor: 3.169
