Assessment of protein coding measures.

J W Fickett, C S Tung.

Abstract

A number of methods for recognizing protein coding genes in DNA sequence have been published over the last 13 years, and new, more comprehensive algorithms, drawing on the repertoire of existing techniques, continue to be developed. To optimize continued development, it is valuable to systematically review and evaluate published techniques. At the core of most gene recognition algorithms is one or more coding measures: functions that produce, given any sample window of sequence, a number or vector intended to measure the degree to which the sample sequence resembles a window of 'typical' exonic DNA. In this paper we review and synthesize the underlying coding measures from published algorithms. A standardized benchmark is described, and each of the measures is evaluated according to this benchmark. Our main conclusion is that a very simple and obvious measure, counting oligomers, is more effective than any of the more sophisticated measures. Different measures contain different information. However, there is a great deal of redundancy in the current suite of measures. We show that in future development of gene recognition algorithms, attention can probably be limited to six of the twenty or so measures proposed to date.
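To make the idea of a coding measure concrete, the following is a minimal sketch of an oligomer-counting measure applied to a sequence window. It is an illustration only, not the procedure evaluated in the paper: the function names `count_oligomers` and `log_odds_score`, the default oligomer length `k=3`, the `floor` value for unseen oligomers, and the log-odds scoring against two frequency tables are all assumptions introduced here.

```python
from collections import Counter
from math import log


def count_oligomers(window: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer in a sequence window (all frames)."""
    window = window.upper()
    return Counter(window[i:i + k] for i in range(len(window) - k + 1))


def log_odds_score(window, coding_freq, noncoding_freq, k=3, floor=1e-6):
    """Hypothetical scoring: sum of log(coding / noncoding) frequency ratios.

    coding_freq and noncoding_freq are assumed to be dicts mapping k-mers to
    relative frequencies estimated from training sequence. A positive score
    means the window looks more like the coding training set.
    """
    score = 0.0
    for kmer, n in count_oligomers(window, k).items():
        p = coding_freq.get(kmer, floor)      # floor avoids log(0) for unseen k-mers
        q = noncoding_freq.get(kmer, floor)
        score += n * log(p / q)
    return score


if __name__ == "__main__":
    counts = count_oligomers("ATGGCGATGGCG", k=3)
    print(counts.most_common(3))
    # Toy frequency tables, invented for this example only.
    print(log_odds_score("ATGGCG", {"ATG": 0.5, "GCG": 0.5}, {"ATG": 0.25, "GCG": 0.25}))
```

In practice the frequency tables would be estimated from known exons and non-coding sequence, and the window would be slid along the DNA to produce a score profile; those steps are left out of this sketch.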

Year: 1992   PMID: 1480466   PMCID: PMC334555   DOI: 10.1093/nar/20.24.6441

Source DB: PubMed   Journal: Nucleic Acids Res   ISSN: 0305-1048   Impact factor: 16.971


