Integrating database homology in a probabilistic gene structure model.

D Kulp, D Haussler, M G Reese, F H Eeckman.

Abstract

We present an improved stochastic model of genes in DNA, and describe a method for integrating database homology into the probabilistic framework. A generalized hidden Markov model (GHMM) describes the grammar of a legal parse of a DNA sequence. Probabilities are estimated for gene features by using dynamic programming to combine information from multiple sensors. We show how matches to homologous sequences from a database can be integrated into the probability estimation by interpreting the likelihood of a sequence in terms of the bit-cost to encode a sequence given a homology match. We also demonstrate how homology matches in protein databases can be exploited to help identify splice sites. Our experiments show significant improvements in the sensitivity and specificity of gene structure identification when these new features are added to our gene-finding system, Genie. Experimental results in tests using a standard set of annotated genes showed that Genie identified 95% of coding nucleotides correctly with a specificity of 91%, and 77% of exons were identified exactly.
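The accuracy figures quoted above can be made concrete with a small sketch. The abstract does not define its metrics, so this assumes the convention common in gene-prediction evaluation: sensitivity is the fraction of truly coding nucleotides that are predicted as coding, and specificity is the fraction of predicted coding nucleotides that are truly coding. The function name `nucleotide_accuracy` and the toy annotation are hypothetical and are not taken from Genie.

```python
def nucleotide_accuracy(true_coding, predicted_coding):
    """Return (sensitivity, specificity) at the nucleotide level.

    Both arguments are equal-length sequences of 0/1 flags, one per
    nucleotide, where 1 marks a coding position.
    Assumed definitions (not stated in the abstract):
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
    """
    if len(true_coding) != len(predicted_coding):
        raise ValueError("annotations must cover the same positions")

    pairs = list(zip(true_coding, predicted_coding))
    tp = sum(1 for t, p in pairs if t and p)        # coding, predicted coding
    fn = sum(1 for t, p in pairs if t and not p)    # coding, missed
    fp = sum(1 for t, p in pairs if p and not t)    # non-coding, predicted coding

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, specificity


# Toy 10-nucleotide example (made-up data, for illustration only).
true_coding      = [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]
predicted_coding = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0]
sn, sp = nucleotide_accuracy(true_coding, predicted_coding)
print(f"sensitivity={sn:.3f} specificity={sp:.3f}")
```

Under these assumed definitions, the reported result would correspond to a sensitivity of 0.95 and a specificity of 0.91 over coding nucleotides.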

Year:  1997        PMID: 9390295

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


