
Evaluation of gene structure prediction programs.

M Burset, R Guigó.

Abstract

We evaluate a number of computer programs designed to predict the structure of protein coding genes in genomic DNA sequences. Computational gene identification is set to play an increasingly important role in the development of the genome projects, as emphasis turns from mapping to large-scale sequencing. The evaluation presented here serves both to assess the current status of the problem and to identify the most promising approaches to ensure further progress. The programs analyzed were uniformly tested on a large set of vertebrate sequences with simple gene structure, and several measures of predictive accuracy were computed at the nucleotide, exon, and protein product levels. The results indicated that the predictive accuracy of the programs analyzed was lower than originally found. The accuracy was even lower when considering only those sequences that had recently been entered and that did not show any similarity to previously entered sequences. This indicates that the programs are overly dependent on the particularities of the examples they learn from. For most of the programs, accuracy in this test set ranged from 0.60 to 0.70 as measured by the Correlation Coefficient (where 1.0 corresponds to a perfect prediction and 0.0 is the value expected for a random prediction), and the average percentage of exons exactly identified was less than 50%. Only those programs including protein sequence database searches showed substantially greater accuracy. The accuracy of the programs was severely affected by relatively high rates of sequence errors. Since the set on which the programs were tested included only relatively short sequences with simple gene structure, the accuracy of the programs is likely to be even lower when used for large uncharacterized genomic sequences with complex structure. 
While in such cases the programs currently available may still be of great use in pinpointing the regions likely to contain exons, they are far from powerful enough to elucidate the complete genomic structure of these sequences.
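The abstract reports accuracy at the nucleotide and exon levels. As an illustration only, the Python sketch below computes nucleotide-level sensitivity (Sn), specificity (Sp) and correlation coefficient (CC) from counts of true/false positive and negative coding bases, plus the fraction of real exons predicted with both boundaries exactly right. It assumes exons are given as hypothetical inclusive, 1-based (start, end) pairs on a single sequence, and the function names are illustrative, not taken from the paper. Sp is taken here as TP / (TP + FP), the convention common in gene-finding evaluations, and CC is the standard Matthews correlation, which is 1.0 for a perfect prediction and close to 0.0 for a random one.

```python
"""Nucleotide- and exon-level accuracy measures for gene structure prediction.

A minimal sketch, assuming exons are inclusive, 1-based (start, end) pairs
on one sequence of known length (a hypothetical representation).
"""
import math

Exon = tuple[int, int]  # hypothetical: (start, end), inclusive, 1-based


def coding_mask(exons: list[Exon], length: int) -> list[bool]:
    """Mark every nucleotide covered by an exon as coding."""
    mask = [False] * length
    for start, end in exons:
        for pos in range(start - 1, end):  # convert 1-based inclusive to 0-based
            mask[pos] = True
    return mask


def nucleotide_accuracy(real: list[Exon], predicted: list[Exon], length: int) -> dict[str, float]:
    """Sensitivity, specificity and correlation coefficient at the nucleotide level."""
    actual = coding_mask(real, length)
    pred = coding_mask(predicted, length)
    tp = sum(a and p for a, p in zip(actual, pred))          # coding, predicted coding
    tn = sum(not a and not p for a, p in zip(actual, pred))  # non-coding, predicted non-coding
    fp = sum(p and not a for a, p in zip(actual, pred))      # non-coding, predicted coding
    fn = sum(a and not p for a, p in zip(actual, pred))      # coding, missed
    sn = tp / (tp + fn) if tp + fn else 0.0  # share of real coding bases recovered
    sp = tp / (tp + fp) if tp + fp else 0.0  # share of predicted coding bases that are real
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "CC": cc}


def exact_exon_sensitivity(real: list[Exon], predicted: list[Exon]) -> float:
    """Fraction of real exons whose start and end are both predicted exactly."""
    if not real:
        return 0.0
    return len(set(real) & set(predicted)) / len(real)
```

For a 100-bp sequence with real exons at 11-30 and 51-70 and predicted exons at 11-30 and 56-75, this gives Sn = Sp = 0.875 and CC ≈ 0.79, but an exact-exon sensitivity of only 0.5: the shifted second exon still contributes most of its bases at the nucleotide level, yet counts as a miss at the exon level.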

Year:  1996        PMID: 8786136     DOI: 10.1006/geno.1996.0298

Source DB:  PubMed          Journal:  Genomics        ISSN: 0888-7543            Impact factor:   5.736


Cited by: 144 articles in total

1.  Positional characterisation of false positives from computational prediction of human splice sites.

Authors:  T A Thanaraj
Journal:  Nucleic Acids Res       Date:  2000-02-01       Impact factor: 16.971

2.  Is "junk" DNA mostly intron DNA?

Authors:  G K Wong; D A Passey; Y Huang; Z Yang; J Yu
Journal:  Genome Res       Date:  2000-11       Impact factor: 9.043

3.  An assessment of gene prediction accuracy in large DNA sequences.

Authors:  R Guigó; P Agarwal; J F Abril; M Burset; J W Fickett
Journal:  Genome Res       Date:  2000-10       Impact factor: 9.043

4.  Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve.

Authors:  C T Zhang; J Wang
Journal:  Nucleic Acids Res       Date:  2000-07-15       Impact factor: 16.971

5.  Ab initio gene identification: prokaryote genome annotation with GeneScan and GLIMMER.

Authors:  Gautam Aggarwal; Ramakrishna Ramaswamy
Journal:  J Biosci       Date:  2002-02       Impact factor: 1.826

6.  The BioTools Suite. A comprehensive suite of platform-independent bioinformatics tools. (Review)

Authors:  D S Wishart; S Fortin
Journal:  Mol Biotechnol       Date:  2001-09       Impact factor: 2.695

7.  SGP-1: prediction and validation of homologous genes based on sequence alignments.

Authors:  T Wiehe; S Gebauer-Jung; T Mitchell-Olds; R Guigó
Journal:  Genome Res       Date:  2001-09       Impact factor: 9.043

8.  Computational analysis of candidate intron regulatory elements for tissue-specific alternative pre-mRNA splicing.

Authors:  M Brudno; M S Gelfand; S Spengler; M Zorn; I Dubchak; J G Conboy
Journal:  Nucleic Acids Res       Date:  2001-06-01       Impact factor: 16.971

9.  Gene structure prediction and alternative splicing analysis using genomically aligned ESTs.

Authors:  Z Kan; E C Rouchka; W R Gish; D J States
Journal:  Genome Res       Date:  2001-05       Impact factor: 9.043

10.  Evaluation of gene-finding programs on mammalian sequences.

Authors:  S Rogic; A K Mackworth; F B Ouellette
Journal:  Genome Res       Date:  2001-05       Impact factor: 9.043
