Literature DB >> 7704660

Constructing gene models from accurately predicted exons: an application of dynamic programming.

Y Xu1, R J Mural, E C Uberbacher.   

Abstract

This paper presents a computationally efficient algorithm, the Gene Assembly Program III (GAP III), for constructing gene models from a set of accurately-predicted 'exons'. The input to the algorithm is a set of clusters of exon candidates, generated by a new version of the GRAIL coding region recognition system. The exon candidates of a cluster differ in their presumed edges and occasionally in their reading frames. Each exon candidate has a numerical score representing its 'probability' of being an actual exon. GAP III uses a dynamic programming algorithm to construct a gene model, complete or partial, by optimizing a predefined objective function. The optimal gene models constructed by GAP III correspond very well with the structures of genes which have been determined experimentally and reported in the Genome Sequence Database (GSDB). On a test set of 137 human and mouse DNA sequences consisting of 954 true exons, GAP III constructed 137 gene models using 892 exons, among which 859 (859/954 = 90%) are true exons and 33 (33/892 = 3%) are false positive. Among the 859 true positives, 635 (74%) match the actual exons exactly, and 838 (98%) have at least one edge correct. GAP III is computationally efficient. If we use E and C to represent the total number of exon candidates in all clusters and the number of clusters, respectively, the running time of GAP III is proportional to (E x C).

Entities:  

Mesh:

Year:  1994        PMID: 7704660     DOI: 10.1093/bioinformatics/10.6.613

Source DB:  PubMed          Journal:  Comput Appl Biosci        ISSN: 0266-7061


  5 in total

Review 1.  Current methods of gene prediction, their strengths and weaknesses.

Authors:  Catherine Mathé; Marie-France Sagot; Thomas Schiex; Pierre Rouzé
Journal:  Nucleic Acids Res       Date:  2002-10-01       Impact factor: 16.971

2.  Gene prediction by spectral rotation measure: a new method for identifying protein-coding regions.

Authors:  Daniel Kotlar; Yizhar Lavner
Journal:  Genome Res       Date:  2003-07-17       Impact factor: 9.043

3.  GAZE: a generic framework for the integration of gene-prediction data by dynamic programming.

Authors:  Kevin L Howe; Tom Chothia; Richard Durbin
Journal:  Genome Res       Date:  2002-09       Impact factor: 9.043

Review 4.  Computational methods for exon detection.

Authors:  J M Claverie
Journal:  Mol Biotechnol       Date:  1998-08       Impact factor: 2.695

5.  Classifying coding DNA with nucleotide statistics.

Authors:  Nicolas Carels; Diego Frías
Journal:  Bioinform Biol Insights       Date:  2009-10-28
  5 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.