Optimally parsing a sequence into different classes based on multiple types of evidence.

G D Stormo, D Haussler.

Abstract

We consider the problem of parsing a sequence into different classes of subsequences. Two common examples are finding the exons and introns in genomic sequences and identifying the secondary structure domains of protein sequences. In each case there are various types of evidence that are relevant to the classification, but none is completely reliable, so we expect some weighted average of all the evidence to provide improved classifications. For example, in the problem of identifying coding regions in genomic DNA, the combined use of evidence such as codon bias and splice junction patterns can give more reliable predictions than either type of evidence alone. We show three main results: (1) For a given weighting of the evidence, a dynamic programming algorithm returns the optimal parse and any number of sub-optimal parses. (2) For a given weighting of the evidence, a dynamic programming algorithm determines the probability of the optimal parse and any number of sub-optimal parses under a natural Boltzmann-Gibbs distribution over the set of possible parses. (3) Given a set of sequences with known correct parses, a dynamic programming algorithm allows one to apply gradient descent to obtain the weights that maximize the probability of the correct parses of these sequences.
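
The first two results can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes a simplified model in which every position receives one class label, `site_scores[i][c]` is the already-weighted evidence for class `c` at position `i`, `switch_scores[p][c]` scores a move from class `p` to class `c`, and the Boltzmann-Gibbs distribution has unit temperature, so that P(parse) = exp(score) / Z. All function and variable names are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def best_parse(site_scores, switch_scores):
    """Max-sum dynamic programming: return (score, labels) of the optimal parse."""
    n_classes = len(switch_scores)
    best = list(site_scores[0])      # best score of any prefix ending in each class
    back = []                        # back-pointers, one list per later position
    for row in site_scores[1:]:
        new_best, pointers = [], []
        for c in range(n_classes):
            prev = max(range(n_classes),
                       key=lambda p: best[p] + switch_scores[p][c])
            pointers.append(prev)
            new_best.append(best[prev] + switch_scores[prev][c] + row[c])
        best = new_best
        back.append(pointers)
    # Trace back from the best final class to recover the labels.
    last = max(range(n_classes), key=lambda c: best[c])
    labels = [last]
    for pointers in reversed(back):
        labels.append(pointers[labels[-1]])
    labels.reverse()
    return best[last], labels


def log_partition(site_scores, switch_scores):
    """Sum-product dynamic programming: log of Z = sum over all parses of exp(score)."""
    n_classes = len(switch_scores)
    total = list(site_scores[0])
    for row in site_scores[1:]:
        total = [
            logsumexp([total[p] + switch_scores[p][c] for p in range(n_classes)])
            + row[c]
            for c in range(n_classes)
        ]
    return logsumexp(total)


def parse_probability(score, log_z):
    """Probability of a parse with the given score under exp(score) / Z."""
    return math.exp(score - log_z)


# Toy input: 4 positions, 2 classes (say 0 = intron, 1 = exon); values are made up.
site_scores = [[1.0, 0.0], [0.2, 0.9], [0.0, 1.5], [1.2, 0.1]]
switch_scores = [[0.0, -0.5], [-0.5, 0.0]]

score, labels = best_parse(site_scores, switch_scores)
log_z = log_partition(site_scores, switch_scores)
print(labels, score, parse_probability(score, log_z))
```

The two recursions differ only in replacing the maximum with a log-sum-exp, which is why one table-filling scheme can yield both the optimal parse and its probability. For the third result, under the same assumptions, the gradient of the log-probability of a known correct parse with respect to each weight would be the feature count in that parse minus its expected count under the distribution; that step is not implemented here.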

Year:  1994        PMID: 7584414

Source DB:  PubMed          Journal:  Proc Int Conf Intell Syst Mol Biol        ISSN: 1553-0833


  8 in total

1.  GAZE: a generic framework for the integration of gene-prediction data by dynamic programming.

Authors:  Kevin L Howe; Tom Chothia; Richard Durbin
Journal:  Genome Res       Date:  2002-09       Impact factor: 9.043

2.  Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.

Authors:  Søren Mørk; Ian Holmes
Journal:  Bioinformatics       Date:  2012-01-03       Impact factor: 6.937

3.  Conrad: gene prediction using conditional random fields.

Authors:  David DeCaprio; Jade P Vinson; Matthew D Pearson; Philip Montgomery; Matthew Doherty; James E Galagan
Journal:  Genome Res       Date:  2007-08-09       Impact factor: 9.043

4.  mGene: accurate SVM-based gene finding with an application to nematode genomes.

Authors:  Gabriele Schweikert; Alexander Zien; Georg Zeller; Jonas Behr; Christoph Dieterich; Cheng Soon Ong; Petra Philips; Fabio De Bona; Lisa Hartmann; Anja Bohlen; Nina Krüger; Sören Sonnenburg; Gunnar Rätsch
Journal:  Genome Res       Date:  2009-06-29       Impact factor: 9.043

5.  Genie--gene finding in Drosophila melanogaster.

Authors:  M G Reese; D Kulp; H Tammana; D Haussler
Journal:  Genome Res       Date:  2000-04       Impact factor: 9.043

6.  Ab initio gene finding in Drosophila genomic DNA.

Authors:  A A Salamov; V V Solovyev
Journal:  Genome Res       Date:  2000-04       Impact factor: 9.043

7.  Noncoding RNA gene detection using comparative sequence analysis.

Authors:  E Rivas; S R Eddy
Journal:  BMC Bioinformatics       Date:  2001-10-10       Impact factor: 3.169

8.  EasyGene--a prokaryotic gene finder that ranks ORFs by statistical significance.

Authors:  Thomas Schou Larsen; Anders Krogh
Journal:  BMC Bioinformatics       Date:  2003-06-03       Impact factor: 3.169
