
Ab initio gene finding in Drosophila genomic DNA.

A A Salamov, V V Solovyev.

Abstract

Ab initio gene identification in the genomic sequence of Drosophila melanogaster was performed using the Fgenes (human gene predictor) and Fgenesh programs, which have organism-specific parameters for human, Drosophila, plants, yeast, and nematode. We did not use cDNA/EST information in most predictions, in order to model the realistic situation of finding new genes, where information about complete cDNA is often absent or based on very small partial fragments. We investigated the accuracy of gene prediction at different levels and designed several schemes to predict an unambiguous set of genes (annotation CGG1), a set of reliable exons (annotation CGG2), and the most complete set of exons (annotation CGG3). For 49 genes whose protein products have clear homologs in protein databases, predictions were recomputed with the Fgenesh+ program. The first annotation serves as the optimal computational description of a new sequence to be presented in a database. Reliable exons from the second annotation serve as good candidates for selecting PCR primers for experimental verification of gene structure. Our results show that we can identify approximately 90% of coding nucleotides with 20% false positives. At the exon level we accurately predicted 65% of exons (89% including overlapping exons) with 49% false positives. Optimizing accuracy of prediction, we designed a gene identification scheme using Fgenesh, which provided sensitivity (Sn) = 98% and specificity (Sp) = 86% at the base level, Sn = 81% (97% including overlapping exons) and Sp = 58% at the exon level, and Sn = 72% and Sp = 39% at the gene level (estimating sensitivity on the std1 set and specificity on the std3 set). In general, these results showed that computational gene prediction can be a reliable tool for annotating new genomic sequences, giving accurate information on 90% of coding sequences with 14% false positives. However, exact gene prediction (especially at the gene level) needs additional improvement of the gene prediction algorithms. The program was also tested for predicting genes of human Chromosome 22 (the latest version of Fgenesh can analyze the whole chromosome sequence). This analysis demonstrated that 88% of manually annotated exons in Chromosome 22 were among the ab initio predicted exons. The suite of gene identification programs is available through the WWW server of the Computational Genomics Group at http://genomic.sanger.ac.uk/gf.html.
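
As an illustration of the accuracy measures quoted in the abstract, the following is a minimal sketch of how base-level and exon-level Sn and Sp could be computed from two lists of exon coordinates. It assumes the convention common in gene-finding evaluations, Sn = TP / (TP + FN) and Sp = TP / (TP + FP); the abstract itself does not spell out the formulas. The function names, the 1-based inclusive coordinates, and the toy exon lists are all hypothetical and are not taken from the paper or from the Fgenesh software.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: an exon is (start, end), 1-based inclusive.
Exon = Tuple[int, int]


def coding_bases(exons: Iterable[Exon]) -> Set[int]:
    """Return the set of nucleotide positions covered by the exons."""
    bases: Set[int] = set()
    for start, end in exons:
        bases.update(range(start, end + 1))
    return bases


def sn_sp(true_pos: int, n_annotated: int, n_predicted: int) -> Tuple[float, float]:
    """Assumed convention: Sn = TP / annotated, Sp = TP / predicted."""
    sn = true_pos / n_annotated if n_annotated else 0.0
    sp = true_pos / n_predicted if n_predicted else 0.0
    return sn, sp


def base_level(annotated: List[Exon], predicted: List[Exon]) -> Tuple[float, float]:
    """Sn and Sp counted over individual coding nucleotides."""
    a, p = coding_bases(annotated), coding_bases(predicted)
    return sn_sp(len(a & p), len(a), len(p))


def exon_level(annotated: List[Exon], predicted: List[Exon]) -> Tuple[float, float]:
    """Sn and Sp counted over exons whose two boundaries both match exactly."""
    a, p = set(annotated), set(predicted)
    return sn_sp(len(a & p), len(a), len(p))


def exon_sn_with_overlaps(annotated: List[Exon], predicted: List[Exon]) -> float:
    """Fraction of annotated exons overlapped by at least one predicted exon."""
    if not annotated:
        return 0.0
    hits = sum(
        1
        for a_start, a_end in annotated
        if any(p_start <= a_end and a_start <= p_end for p_start, p_end in predicted)
    )
    return hits / len(annotated)


# Toy data, invented for illustration only.
annotated = [(100, 199), (300, 399), (500, 599)]
predicted = [(100, 199), (310, 399), (700, 749)]

base_sn, base_sp = base_level(annotated, predicted)
exon_sn, exon_sp = exon_level(annotated, predicted)
overlap_sn = exon_sn_with_overlaps(annotated, predicted)
print(f"base level: Sn={base_sn:.2f} Sp={base_sp:.2f}")
print(f"exon level: Sn={exon_sn:.2f} Sp={exon_sp:.2f} (Sn with overlaps={overlap_sn:.2f})")
```

In this toy case one predicted exon misses a boundary by ten bases, so it counts fully against exon-level accuracy while still contributing most of its length at the base level. This is the same pattern as in the abstract, where base-level figures are much higher than exon-level and gene-level figures.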

Year:  2000        PMID: 10779491      PMCID: PMC310882          DOI: 10.1101/gr.10.4.516

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043


