Identification of human gene structure using linear discriminant functions and dynamic programming.

V V Solovyev, A A Salamov, C B Lawrence.

Abstract

Development of advanced techniques to identify gene structure is one of the main challenges of the Human Genome Project. Discriminant analysis was applied to the construction of recognition functions for various components of gene structure. Linear discriminant functions for splice sites, 5'-coding, internal exon, and 3'-coding region recognition have been developed. A gene structure prediction system, FGENE, has been developed based on the exon recognition functions. We compute a graph of mutual compatibility of different exons and present gene structure models as paths of this directed acyclic graph. For optimal model selection we apply a variant of the dynamic programming algorithm to search for the path in the graph with the maximal value of the corresponding discriminant functions. Prediction by FGENE for 185 complete human gene sequences has 81% exact exon recognition accuracy and 91% accuracy at the level of individual exon nucleotides, with a correlation coefficient (C) of 0.90. Testing FGENE on 35 genes not used in the development of the discriminant functions shows 71% accuracy of exact exon prediction and 89% at the nucleotide level (C = 0.86). FGENE compares very favorably with the other programs currently used to predict protein-coding regions. Analysis of uncharacterized human sequences based on our methods for splice site (HSPL, RNASPL), internal exon (HEXON), all types of exons (FEXH), and human (FGENEH) and bacterial (CDSB) gene structure prediction, and for recognition of human and bacterial sequences (HBR) (to test a library for E. coli contamination), is available through the University of Houston, Weizmann Institute of Science network server and a WWW page of the Human Genome Center at Baylor College of Medicine.
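The path search described in the abstract can be illustrated with a minimal sketch. This is not the FGENE implementation: the Exon record, the MIN_INTRON threshold, the compatibility rule (non-overlap plus a minimum intron length, with reading-frame consistency ignored), and the additive scoring are all assumptions made for illustration. Candidate exons are treated as nodes of a directed acyclic graph, an edge joins two exons that may appear consecutively in one gene model, and dynamic programming finds the path with the highest total score.

```python
from typing import List, NamedTuple, Tuple


class Exon(NamedTuple):
    """Hypothetical candidate exon; not a structure taken from FGENE."""
    start: int    # first nucleotide of the exon (inclusive)
    end: int      # last nucleotide of the exon (inclusive)
    score: float  # stand-in for a discriminant-function value


MIN_INTRON = 60  # assumed minimum intron length; not stated in the abstract


def compatible(a: Exon, b: Exon) -> bool:
    """Assumed rule: b may follow a if the gap between them fits an intron."""
    return b.start - a.end - 1 >= MIN_INTRON


def best_gene_model(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Return the highest-scoring chain of mutually compatible exons."""
    if not exons:
        return 0.0, []
    # Sorting by start gives a topological order: every edge a -> b
    # requires b.start > a.end >= a.start.
    order = sorted(exons, key=lambda e: (e.start, e.end))
    n = len(order)
    best = [e.score for e in order]  # best path score ending at each exon
    prev = [-1] * n                  # back-pointers for path recovery
    for j in range(n):
        for i in range(j):
            candidate = best[i] + order[j].score
            if compatible(order[i], order[j]) and candidate > best[j]:
                best[j] = candidate
                prev[j] = i
    # Backtrack from the best-scoring end node.
    k = max(range(n), key=best.__getitem__)
    total = best[k]
    path: List[Exon] = []
    while k != -1:
        path.append(order[k])
        k = prev[k]
    path.reverse()
    return total, path


if __name__ == "__main__":
    candidates = [
        Exon(100, 200, 2.0),
        Exon(150, 260, 3.5),   # overlaps the first candidate
        Exon(400, 500, 1.5),
        Exon(420, 480, -0.5),  # overlaps the previous candidate
        Exon(700, 800, 2.5),
    ]
    total, model = best_gene_model(candidates)
    print(total, [(e.start, e.end) for e in model])
```

The quadratic loop over exon pairs is adequate for a few hundred candidates; a real predictor would also need to enforce reading-frame compatibility and to distinguish first, internal, and last exons, which this sketch leaves out.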

Year:  1995        PMID: 7584460

Source DB:  PubMed          Journal:  Proc Int Conf Intell Syst Mol Biol        ISSN: 1553-0833


  12 in total

Review 1.  Computational gene finding in plants.

Authors:  Mihaela Pertea; Steven L Salzberg
Journal:  Plant Mol Biol       Date:  2002-01       Impact factor: 4.076

2.  Reevaluating human gene annotation: a second-generation analysis of chromosome 22.

Authors:  John E Collins; Melanie E Goward; Charlotte G Cole; Luc J Smink; Elizabeth J Huckle; Sarah Knowles; Jacqueline M Bye; David M Beare; Ian Dunham
Journal:  Genome Res       Date:  2003-01       Impact factor: 9.043

3.  A complexity reduction algorithm for analysis and annotation of large genomic sequences.

Authors:  Trees-Juen Chuang; Wen-Chang Lin; Hurng-Chun Lee; Chi-Wei Wang; Keh-Lin Hsiao; Zi-Hao Wang; Danny Shieh; Simon C Lin; Lan-Yang Ch'ang
Journal:  Genome Res       Date:  2003-02       Impact factor: 9.043

4.  GAZE: a generic framework for the integration of gene-prediction data by dynamic programming.

Authors:  Kevin L Howe; Tom Chothia; Richard Durbin
Journal:  Genome Res       Date:  2002-09       Impact factor: 9.043

5.  The Ensembl automatic gene annotation system.

Authors:  Val Curwen; Eduardo Eyras; T Daniel Andrews; Laura Clarke; Emmanuel Mongin; Steven M J Searle; Michele Clamp
Journal:  Genome Res       Date:  2004-05       Impact factor: 9.043

6.  Random sheared fosmid library as a new genomic tool to accelerate complete finishing of rice (Oryza sativa spp. Nipponbare) genome sequence: sequencing of gap-specific fosmid clones uncovers new euchromatic portions of the genome.

Authors:  Jetty S S Ammiraju; Yeisoo Yu; Meizhong Luo; Dave Kudrna; HyeRan Kim; Jose L Goicoechea; Yuichi Katayose; Takashi Matsumoto; Jianzhong Wu; Takuji Sasaki; Rod A Wing
Journal:  Theor Appl Genet       Date:  2005-11-10       Impact factor: 5.699

7.  Genome annotation assessment in Drosophila melanogaster.

Authors:  M G Reese; G Hartzell; N L Harris; U Ohler; J F Abril; S E Lewis
Journal:  Genome Res       Date:  2000-04       Impact factor: 9.043

8.  Molecular and functional characterization of a Taenia adhesion gene family (TAF) encoding potential protective antigens of Taenia saginata oncospheres.

Authors:  Luis Miguel Gonzalez; Pedro Bonay; Laura Benitez; Elizabeth Ferrer; Leslie J S Harrison; R Michael E Parkhouse; Teresa Garate
Journal:  Parasitol Res       Date:  2006-10-18       Impact factor: 2.289

9.  Spotted leaf11, a negative regulator of plant cell death and defense, encodes a U-box/armadillo repeat protein endowed with E3 ubiquitin ligase activity.

Authors:  Li-Rong Zeng; Shaohong Qu; Alicia Bordeos; Chengwei Yang; Marietta Baraoidan; Hongyan Yan; Qi Xie; Baek Hie Nahm; Hei Leung; Guo-Liang Wang
Journal:  Plant Cell       Date:  2004-09-17       Impact factor: 11.277

10.  The DAWGPAWS pipeline for the annotation of genes and transposable elements in plant genomes.

Authors:  James C Estill; Jeffrey L Bennetzen
Journal:  Plant Methods       Date:  2009-06-19       Impact factor: 4.993
