GeneLook: a novel ab initio gene identification system suitable for automated annotation of prokaryotic sequences.

Tatsunari Nishi, Toshimichi Ikemura, Shigehiko Kanaya.

Abstract

With the rapid increase in sequence data for prokaryotic genomes, it has become important to develop systems for automated and accurate genome annotation. We present herein a novel ab initio gene identification system, GeneLook, that predicts protein-coding open reading frames (ORFs) with high sensitivity and specificity without prior knowledge of the sequence composition. The system predicts protein-coding ORFs in two stages: seed ORF selection and main prediction. When selecting reliable seed ORFs of at least 200 codons, GeneLook predicts translation start sites and operon structures by searching for ribosome-binding sites and by applying a novel operon prediction algorithm. The codon and nucleotide frequencies of the seed ORFs are then used to compute two new coding-potential parameters, which identify protein-coding ORFs of at least 34 codons, and a third parameter that improves prediction accuracy for GC-rich genomes. In the main prediction, GeneLook uses these parameters to identify the most likely genes above a given minimal length. We assessed the performance of GeneLook with two indices, sensitivity and specificity, defined as TP/(TP + FN) and TP/(TP + FP), respectively, where TP, FN, and FP are the numbers of true positives, false negatives, and false positives. The system predicted protein-coding ORFs for Escherichia coli and Bacillus subtilis with sensitivities of 96.5% and 96.2% and specificities of 96.9% and 96.1%, respectively. It also identified 94.1% of the annotated genes in the GC-rich Pseudomonas aeruginosa genome, with a specificity of 97.2%. Furthermore, GeneLook identified protein-coding ORFs with high accuracy in a wide variety of prokaryotic genomes.
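
The two-stage pipeline and the evaluation indices above can be illustrated with a deliberately naive sketch. The helpers below (`find_orfs`, `codon_frequencies`, `sensitivity_specificity`) are hypothetical names and are not GeneLook's implementation: they scan only the forward strand, take the first in-frame ATG as the start instead of searching for ribosome-binding sites, skip operon prediction, and stop at raw codon frequencies rather than computing GeneLook's coding-potential parameters, whose formulas are not given in the abstract. Counting the stop codon toward the 200- and 34-codon thresholds is also an assumption. Only the sensitivity and specificity formulas come directly from the text.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of forward-strand ORFs.

    Naive illustration only: an ORF runs from the first in-frame ATG to the
    next in-frame stop codon (stop included). Assumption: min_codons counts
    the stop codon as well.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # keep the earliest ATG until a stop is reached
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None
    return sorted(orfs)


def codon_frequencies(seq: str, orfs: list[tuple[int, int]]) -> dict[str, float]:
    """Relative codon frequencies pooled over the given (seed) ORFs."""
    seq = seq.upper()
    counts = Counter()
    for start, end in orfs:
        for i in range(start, end, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def sensitivity_specificity(predicted: set, annotated: set) -> tuple[float, float]:
    """Sensitivity = TP/(TP+FN), specificity = TP/(TP+FP), as defined in the abstract."""
    tp = len(predicted & annotated)
    fn = len(annotated - predicted)
    fp = len(predicted - annotated)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tp / (tp + fp) if tp + fp else 0.0
    return sens, spec


if __name__ == "__main__":
    toy = "ATGGCTGCTGCTTAA"  # one 5-codon ORF, stop included
    print(find_orfs(toy, 5))  # [(0, 15)]
    print(codon_frequencies(toy, find_orfs(toy, 5)))
    print(sensitivity_specificity({(0, 15), (20, 50)}, {(0, 15), (60, 90)}))  # (0.5, 0.5)
```

On a real genome one would call `find_orfs(genome, 200)` for seed candidates and `find_orfs(genome, 34)` for the main-prediction candidates, then score the latter with parameters derived from the seeds. That scoring step is exactly what GeneLook contributes and is not reproduced here.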

Year:  2005        PMID: 15716020     DOI: 10.1016/j.gene.2004.10.018

Source DB:  PubMed          Journal:  Gene        ISSN: 0378-1119            Impact factor:   3.688


  6 in total

Review 1.  Ten years of bacterial genome sequencing: comparative-genomics-based discoveries.

Authors:  Tim T Binnewies; Yair Motro; Peter F Hallin; Ole Lund; David Dunn; Tom La; David J Hampson; Matthew Bellgard; Trudy M Wassenaar; David W Ussery
Journal:  Funct Integr Genomics       Date:  2006-05-12       Impact factor: 3.410

2.  Proteome-wide protein interaction measurements of bacterial proteins of unknown function.

Authors:  Matthias Meier; Rene V Sit; Stephen R Quake
Journal:  Proc Natl Acad Sci U S A       Date:  2012-12-24       Impact factor: 11.205

3.  Genome and transcriptome analysis of the food-yeast Candida utilis.

Authors:  Yasuyuki Tomita; Kazuho Ikeo; Hideyuki Tamakawa; Takashi Gojobori; Shigehito Ikushima
Journal:  PLoS One       Date:  2012-05-18       Impact factor: 3.240

4.  MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes.

Authors:  Huaiqiu Zhu; Gang-Qing Hu; Yi-Fan Yang; Jin Wang; Zhen-Su She
Journal:  BMC Bioinformatics       Date:  2007-03-16       Impact factor: 3.169

5.  Draft Genome Sequence of Methanothermobacter sp. Strain EMTCatA1, Reconstructed from the Metagenome of a Thermophilic Electromethanogenesis-Catalyzing Biocathode.

Authors:  Hajime Kobayashi; Xiaohan Sun; Qian Fu; Haruo Maeda; Kozo Sato
Journal:  Genome Announc       Date:  2017-08-31

6.  Draft Genome Sequence of a Novel Coriobacteriaceae sp. Strain, EMTCatB1, Reconstructed from the Metagenome of a Thermophilic Electromethanogenic Biocathode.

Authors:  Hajime Kobayashi; Qian Fu; Haruo Maeda; Kozo Sato
Journal:  Genome Announc       Date:  2017-03-09