
Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks.

E E Snyder, G D Stormo.

Abstract

Dynamic programming (DP) is applied to the problem of precisely identifying internal exons and introns in genomic DNA sequences. The program GeneParser first scores the sequence of interest for splice sites and for these intron- and exon-specific content measures: codon usage, local compositional complexity, 6-tuple frequency, length distribution and periodic asymmetry. This information is then organized for interpretation by DP. GeneParser employs the DP algorithm to enforce the constraints that introns and exons must be adjacent and non-overlapping, and finds the highest-scoring combination of introns and exons subject to these constraints. Weights for the various classification procedures are determined by training a simple feed-forward neural network to maximize the number of correct predictions. In a pilot study, the system was trained on a set of 56 human gene fragments containing 150 internal exons in a total of 158,691 bp of genomic sequence. When tested against the training data, GeneParser precisely identifies 75% of the exons and correctly predicts 86% of coding nucleotides as coding, while only 13% of non-exon bp are predicted to be coding. This corresponds to a correlation coefficient for exon prediction of 0.85. Because of the simplicity of the network weighting scheme, generalization performance is nearly as good as with the training set.
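
The adjacency and non-overlap constraints described in the abstract can be illustrated with a small dynamic-programming sketch. This is not GeneParser's code: the function `best_parse`, the callable `score(label, i, j)` and the toy scoring function are hypothetical names introduced here, and the real program derives its segment scores from the weighted content measures and splice-site scores, which the abstract does not specify in detail. The sketch only assumes that every candidate segment can be given a numeric score and that exons and introns must alternate and tile the sequence.

```python
from typing import Callable, List, Tuple

Segment = Tuple[str, int, int]  # (label, start, end) with end exclusive


def best_parse(n: int, score: Callable[[str, int, int], float]) -> Tuple[float, List[Segment]]:
    """Find the highest-scoring tiling of [0, n) by alternating exon/intron segments.

    `score(label, i, j)` is a hypothetical scoring function for the segment [i, j);
    it stands in for whatever combined evidence a real gene finder would use.
    """
    if n < 1:
        raise ValueError("sequence length must be at least 1")
    labels = ("exon", "intron")
    other = {"exon": "intron", "intron": "exon"}
    neg_inf = float("-inf")
    # best[label][j]: best score of a parse of [0, j) whose last segment has `label`
    best = {lab: [neg_inf] * (n + 1) for lab in labels}
    back = {lab: [0] * (n + 1) for lab in labels}

    for j in range(1, n + 1):
        for lab in labels:
            for i in range(j):
                # The segment [i, j) either starts the parse or follows a
                # segment of the opposite label ending exactly at i.
                prev = 0.0 if i == 0 else best[other[lab]][i]
                if prev == neg_inf:
                    continue
                total = prev + score(lab, i, j)
                if total > best[lab][j]:
                    best[lab][j] = total
                    back[lab][j] = i

    # Trace back from the better of the two possible final labels.
    lab = max(labels, key=lambda l: best[l][n])
    total = best[lab][n]
    segments: List[Segment] = []
    j = n
    while j > 0:
        i = back[lab][j]
        segments.append((lab, i, j))
        j, lab = i, other[lab]
    segments.reverse()
    return total, segments


# Toy example (made-up data): +1 per position agreeing with a known labelling, -1 otherwise.
truth = "IIEEEII"


def toy_score(label: str, i: int, j: int) -> float:
    want = "E" if label == "exon" else "I"
    return float(sum(1 if c == want else -1 for c in truth[i:j]))


total, segments = best_parse(len(truth), toy_score)
```

With the toy scores, the parse recovers intron [0, 2), exon [2, 5), intron [5, 7). The triple loop costs O(n²) score evaluations, which is acceptable for a sketch; a practical implementation would presumably restrict segment boundaries to candidate splice sites rather than every position.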

Year: 1993; PMID: 8441672; PMCID: PMC309159; DOI: 10.1093/nar/21.3.607

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971


