
Genome annotation assessment in Drosophila melanogaster.

M G Reese, G Hartzell, N L Harris, U Ohler, J F Abril, S E Lewis.

Abstract

Computational methods for automated genome annotation are critical to our community's ability to make full use of the large volume of genomic sequence being generated and released. To explore the accuracy of these automated feature prediction tools in the genomes of higher organisms, we evaluated their performance on a large, well-characterized sequence contig from the Adh region of Drosophila melanogaster. This experiment, known as the Genome Annotation Assessment Project (GASP), was launched in May 1999. Twelve groups, applying state-of-the-art tools, contributed predictions for features including gene structure, protein homologies, promoter sites, and repeat elements. We evaluated these predictions using two standards, one based on previously unreleased high-quality full-length cDNA sequences and a second based on the set of annotations generated as part of an in-depth study of the region by a group of Drosophila experts. Although these standard sets only approximate the unknown distribution of features in this region, we believe that when taken in context the results of an evaluation based on them are meaningful. The results were presented as a tutorial at the conference on Intelligent Systems in Molecular Biology (ISMB-99) in August 1999. Over 95% of the coding nucleotides in the region were correctly identified by the majority of the gene finders, and the correct intron/exon structures were predicted for >40% of the genes. Homology-based annotation techniques recognized and associated functions with almost half of the genes in the region; the remainder were only identified by the ab initio techniques. This experiment also presents the first assessment of promoter prediction techniques for a significant number of genes in a large contiguous region. We discovered that the promoter predictors' high false-positive rates make their predictions difficult to use. 
Integrating gene finding and cDNA/EST alignments with promoter predictions decreases the number of false-positive classifications but discovers less than one-third of the promoters in the region. We believe that by establishing standards for evaluating genomic annotations and by assessing the performance of existing automated genome annotation tools, this experiment establishes a baseline that contributes to the value of ongoing large-scale annotation projects and should guide further research in genome informatics.
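The nucleotide-level figure quoted above (the share of coding nucleotides correctly identified) can be illustrated with a small sketch. It assumes exons are given as 1-based inclusive (start, end) intervals and uses the convention common in gene-prediction evaluation, where sensitivity is TP / (TP + FN) and "specificity" is TP / (TP + FP). The function names and toy coordinates are hypothetical, and whether the GASP evaluation used exactly these formulas is an assumption not confirmed by the abstract.

```python
def coding_positions(exons):
    """Expand 1-based inclusive (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def base_level_accuracy(standard_exons, predicted_exons):
    """Return (sensitivity, specificity) at the nucleotide level.

    sensitivity = TP / (TP + FN): fraction of standard coding bases that were predicted.
    specificity = TP / (TP + FP): fraction of predicted coding bases found in the standard
    (the gene-finding convention; assumed here, not taken from the abstract).
    """
    truth = coding_positions(standard_exons)
    pred = coding_positions(predicted_exons)
    tp = len(truth & pred)
    sensitivity = tp / len(truth) if truth else 0.0
    specificity = tp / len(pred) if pred else 0.0
    return sensitivity, specificity


# Toy coordinates, invented for illustration only (not data from the Adh region).
standard = [(101, 200), (301, 400)]
predicted = [(101, 200), (311, 420)]
sn, sp = base_level_accuracy(standard, predicted)
print(f"sensitivity={sn:.3f} specificity={sp:.3f}")
```

Expanding intervals into position sets keeps the sketch short; for a multi-megabase contig, interval arithmetic would be the more economical choice.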

Year:  2000        PMID: 10779488      PMCID: PMC310877          DOI: 10.1101/gr.10.4.483

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043


