
Large multiple organism gene finding by collapsed Gibbs sampling.

Sourav Chatterji, Lior Pachter.

Abstract

The Gibbs sampling method has been widely used for sequence analysis since it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Numerous variants of the original idea have since emerged; however, in all cases the application has been to finding short motifs in collections of short sequences (typically less than 100 nucleotides long). In this paper, we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve the gene predictions, without explicitly aligning the sequences. We have applied our method to the analysis of genomic sequence from 14 genomic regions, totaling roughly 1.8 Mb of sequence in each organism. We show that our approach compares favorably with existing ab initio approaches to gene finding, including pairwise comparison based gene prediction methods which make explicit use of alignments. Furthermore, excellent performance can be obtained with as few as four organisms, and the method overcomes a number of difficulties of previous comparison based gene finding approaches: it is robust with respect to genomic rearrangements, can work with draft sequence, and is fast (linear in the number and length of the sequences). It can also be seamlessly integrated with Gibbs sampling motif detection methods.
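For readers unfamiliar with the short-motif setting the abstract contrasts against, the sketch below shows a minimal Gibbs site sampler for finding one motif occurrence per sequence. It is an illustration of that classic motif-finding idea, not the gene-finding sampler described in this paper. Assumptions: sequences use only `ACGT`, the background is uniform (0.25 per base), each sequence has exactly one motif site, and at least two sequences are given. The motif profile is never sampled explicitly. Instead, each step re-estimates it from the other sequences' current sites with a pseudocount (the Dirichlet posterior predictive), which is what makes the sampler "collapsed". The function names `profile_from_sites`, `site_score`, and `gibbs_motif_sampler` are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"


def profile_from_sites(sites, pseudocount=1.0):
    """Column-wise base frequencies of the aligned sites, smoothed by a pseudocount.

    With a Dirichlet(pseudocount) prior per column this is the posterior
    predictive distribution, so the motif parameters are integrated out.
    """
    width = len(sites[0])
    matrix = []
    for j in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        matrix.append({b: counts[b] / total for b in ALPHABET})
    return matrix


def site_score(profile, kmer, background=0.25):
    """Likelihood ratio of a k-mer under the motif profile vs. a uniform background."""
    return math.prod(profile[j][b] / background for j, b in enumerate(kmer))


def gibbs_motif_sampler(seqs, width, iterations=500, seed=0):
    """Return one sampled motif start position per sequence (hypothetical helper)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    rng = random.Random(seed)
    # Random initial motif start in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # pick one sequence to resample
        # Leave-one-out profile built from every other sequence's current site.
        others = [s[p:p + width] for k, (s, p) in enumerate(zip(seqs, starts)) if k != i]
        prof = profile_from_sites(others)
        # Score every candidate start in sequence i and sample one proportionally.
        seq = seqs[i]
        weights = [site_score(prof, seq[p:p + width]) for p in range(len(seq) - width + 1)]
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Each call returns one candidate start per sequence. Because the sampler can settle in local optima, a common refinement is to run it from several seeds and keep the highest-scoring set of sites. The cost of one full sweep grows with the total length of the sequences, which is the short-sequence regime the abstract says earlier Gibbs methods were limited to.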


Year:  2005        PMID: 16108706     DOI: 10.1089/cmb.2005.12.599

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  3 in total

1.  Reference based annotation with GeneMapper.

Authors:  Sourav Chatterji; Lior Pachter
Journal:  Genome Biol       Date:  2006-04-05       Impact factor: 13.583

2.  EGASP: the human ENCODE Genome Annotation Assessment Project. (Review)

Authors:  Roderic Guigó; Paul Flicek; Josep F Abril; Alexandre Reymond; Julien Lagarde; France Denoeud; Stylianos Antonarakis; Michael Ashburner; Vladimir B Bajic; Ewan Birney; Robert Castelo; Eduardo Eyras; Catherine Ucla; Thomas R Gingeras; Jennifer Harrow; Tim Hubbard; Suzanna E Lewis; Martin G Reese
Journal:  Genome Biol       Date:  2006-08-07       Impact factor: 13.583

3.  Reranking candidate gene models with cross-species comparison for improved gene prediction.

Authors:  Qian Liu; Koby Crammer; Fernando C N Pereira; David S Roos
Journal:  BMC Bioinformatics       Date:  2008-10-14       Impact factor: 3.169
