Recognizing exons in genomic sequence using GRAIL II.

Y Xu, R Mural, M Shah, E Uberbacher.

Abstract

We describe an improved neural network system for recognizing protein coding regions (exons) in human genomic DNA sequences. This coding region recognition system is part of a new version of GRAIL, GRAIL II, and represents a significant improvement over the coding recognition performance of the previous GRAIL system. GRAIL II divides the process of locating exons into four steps. It first generates an exon candidate pool consisting of all possible (translation start-donor), (acceptor-donor), and (acceptor-translation stop) pairs within all open reading frames of the test sequence. The vast majority of these exon candidates are then eliminated from consideration by applying a set of heuristic rules. After reducing the size of the candidate pool, GRAIL II uses three trained neural networks to evaluate the coding potential and the accuracy of the edges of starting exon, internal exon and terminal exon candidates. These networks output a set of overlapping candidates for each exon, which differ in their scores and in the positions of their edges. Multiple candidates for a given exon are grouped into a cluster based on their locations relative to candidates corresponding to other exons, and the highest scoring candidate in each cluster is used as the "best" prediction of the corresponding exon. Unlike the previous GRAIL version, GRAIL II uses variable-length windows to evaluate exon candidates, and its performance is nearly independent of exon length. In addition to several strong indicators of coding potential, the system uses several other types of information, including scores for splice junctions, GC composition, and the properties of the regions adjacent to an exon candidate, to aid in the discrimination process. On a large set of sequences from GenBank (3), GRAIL II located 93% of all exons regardless of size, with a false positive rate of 12%. Among the true positives, 62% match the actual exons exactly (the exon edges are correct to the base), and 93% match at least one edge correctly. These statistics, especially the false positive rate and the accuracy of the edges, are further improved through a process of gene model construction by the Gene Assembly Program (GAP III) (4) module of GRAIL II, which takes the scored exon candidates as input and constructs optimal gene models. The gene modeling system will be described elsewhere.
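
The clustering step and the edge-accuracy statistics in the abstract can be illustrated with a minimal sketch. This is not the GRAIL II implementation. It assumes that candidates are plain (start, end, score) records, that the scores stand in for neural network outputs, and that candidates belong to the same cluster whenever their intervals overlap, which simplifies the location-based grouping the abstract describes. The names `Candidate`, `cluster_candidates`, `best_per_cluster` and `edge_match` are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical exon candidate; not a GRAIL II data structure."""
    start: int    # first base of the candidate (inclusive)
    end: int      # last base of the candidate (inclusive)
    score: float  # stand-in for a network score, higher is better


def cluster_candidates(candidates: List[Candidate]) -> List[List[Candidate]]:
    """Group candidates into clusters of mutually chained overlaps.

    Assumption: overlap alone defines a cluster. The abstract only says
    that grouping is based on candidate locations.
    """
    clusters: List[List[Candidate]] = []
    cluster_end = -1  # right-most base covered by the current cluster
    for cand in sorted(candidates, key=lambda c: (c.start, c.end)):
        if clusters and cand.start <= cluster_end:
            # Overlaps the current cluster: extend it.
            clusters[-1].append(cand)
            cluster_end = max(cluster_end, cand.end)
        else:
            # No overlap: start a new cluster.
            clusters.append([cand])
            cluster_end = cand.end
    return clusters


def best_per_cluster(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the highest scoring candidate of each cluster."""
    return [max(cluster, key=lambda c: c.score)
            for cluster in cluster_candidates(candidates)]


def edge_match(pred: Candidate, true_start: int, true_end: int) -> str:
    """Classify a prediction as 'exact', 'one_edge' or 'none'."""
    left = pred.start == true_start
    right = pred.end == true_end
    if left and right:
        return "exact"
    if left or right:
        return "one_edge"
    return "none"


# Made-up candidates for two exons, with illustrative scores.
pool = [
    Candidate(100, 200, 0.60),
    Candidate(110, 200, 0.90),
    Candidate(150, 220, 0.40),
    Candidate(500, 580, 0.70),
    Candidate(505, 580, 0.65),
]
best = best_per_cluster(pool)
```

With this made-up pool, `best` holds one candidate per cluster, and `edge_match` separates predictions that are correct to the base from those that share only one edge with the true exon, which are the two categories reported in the abstract.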

Year:  1994        PMID: 7765200

Source DB:  PubMed          Journal:  Genet Eng (N Y)        ISSN: 0196-3716


  13 in total

1.  Analysis of 148 kb of genomic DNA around the wnt1 locus of Fugu rubripes.

Authors:  K Gellner; S Brenner
Journal:  Genome Res       Date:  1999-03       Impact factor: 9.043

2.  An effective approach for analyzing "prefinished" genomic sequence data.

Authors:  P M Kuehl; J M Weisemann; J W Touchman; E D Green; M S Boguski
Journal:  Genome Res       Date:  1999-02       Impact factor: 9.043

3.  A complexity reduction algorithm for analysis and annotation of large genomic sequences.

Authors:  Trees-Juen Chuang; Wen-Chang Lin; Hurng-Chun Lee; Chi-Wei Wang; Keh-Lin Hsiao; Zi-Hao Wang; Danny Shieh; Simon C Lin; Lan-Yang Ch'ang
Journal:  Genome Res       Date:  2003-02       Impact factor: 9.043

4.  Rat Genome Database (RGD): mapping disease onto the genome.

Authors:  Simon Twigger; Jian Lu; Mary Shimoyama; Dan Chen; Dean Pasko; Hanping Long; Jessica Ginster; Chin-Fu Chen; Rajni Nigam; Anne Kwitek; Janan Eppig; Lois Maltais; Donna Maglott; Greg Schuler; Howard Jacob; Peter J Tonellato
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

5.  SeqHelp: a program to analyze molecular sequences utilizing common computational resources.

Authors:  M K Lee; E D Lynch; M C King
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

6.  The genomic region encompassing the nephropathic cystinosis gene (CTNS): complete sequencing of a 200-kb segment and discovery of a novel gene within the common cystinosis-causing deletion.

Authors:  J W Touchman; Y Anikster; N L Dietrich; V V Maduro; G McDowell; V Shotelersuk; G G Bouffard; S M Beckstrom-Sternberg; W A Gahl; E D Green
Journal:  Genome Res       Date:  2000-02       Impact factor: 9.043

7.  Chromosome evolution: the junction of mammalian chromosomes in the formation of mouse chromosome 10.

Authors:  M T Pletcher; B A Roe; F Chen; T Do; A Do; E Malaj; R H Reeves
Journal:  Genome Res       Date:  2000-10       Impact factor: 9.043

8.  The mouse Clock locus: sequence and comparative analysis of 204 kb from mouse chromosome 5.

Authors:  L D Wilsbacher; A M Sangoram; M P Antoch; J S Takahashi
Journal:  Genome Res       Date:  2000-12       Impact factor: 9.043

9.  Large-scale sequencing of two regions in human chromosome 7q22: analysis of 650 kb of genomic sequence around the EPO and CUTL1 loci reveals 17 genes.

Authors:  G Glöckner; S Scherer; R Schattevoy; A Boright; J Weber; L C Tsui; A Rosenthal
Journal:  Genome Res       Date:  1998-10       Impact factor: 9.043

10.  Comparative sequence of human and mouse BAC clones from the mnd2 region of chromosome 2p13.

Authors:  W Jang; A Hua; S V Spilson; W Miller; B A Roe; M H Meisler
Journal:  Genome Res       Date:  1999-01       Impact factor: 9.043
