Accuracy improvement for identifying translation initiation sites in microbial genomes.

Huai-Qiu Zhu, Gang-Qing Hu, Zheng-Qing Ouyang, Jin Wang, Zhen-Su She.

Abstract

MOTIVATION: Current computational gene-finding methods for microbial genomes predict the translation termination site (3' end) of verified genes with high accuracy, but predict the translation initiation site (TIS, 5' end) with much lower accuracy. The TIS is important for analyzing and understanding the putative protein product of a gene and the regulatory machinery of translation. Improving the accuracy of TIS prediction remains an open problem.
RESULTS: In this paper, we develop a four-component statistical model to describe the TIS of prokaryotic genes. The model incorporates several biologically meaningful features: the correlation between the translation termination site and the TIS of genes; the sequence content around the start codon; the sequence content of the consensus signal related to ribosomal binding sites (RBSs); and the correlation between the TIS and the upstream consensus signal. An entirely unsupervised training system is constructed, which takes as input a set of coding open reading frames (ORFs) annotated by any gene finder and outputs a set of organism-specific parameters (without any prior knowledge or empirical constants and formulas). The novel algorithm, MED-Start, is tested on reliable datasets of genes from Escherichia coli and Bacillus subtilis. MED-Start correctly predicts 95.4% of the start sites of 195 experimentally confirmed E. coli genes and 96.6% of 58 reliable B. subtilis genes. Moreover, the test results indicate that the algorithm gives higher accuracy on more reliable datasets and is robust to variation in gene length. MED-Start may be used as a postprocessor for a gene finder. After processing by our program, gene start prediction by a gene finder improves markedly: for example, the accuracy of TIS prediction by MED 1.0 increases from 61.7% to 91.5% for 854 verified E. coli genes, while that by GLIMMER 2.02 increases from 63.2% to 92.0% for the same dataset. These results show that our algorithm is among the most accurate methods for identifying the TIS in prokaryotic genomes.
AVAILABILITY: The program MED-Start can be accessed through the website of CTB at Peking University: http://ctb.pku.edu.cn/main/SheGroup/MED_Start.htm.
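
The postprocessing idea described in the abstract (take an ORF from a gene finder, keep its stop codon fixed, and re-select the start codon) can be illustrated with a toy sketch. This is not the MED-Start model: it replaces the four-component statistical model with a single hand-written feature, the best match to the Shine-Dalgarno consensus AGGAGG in a short upstream window. The function names, the 20-nt window, and the tie-breaking rule are all hypothetical choices made for illustration only.

```python
# Toy start-site re-selection for one ORF on the forward strand.
# Assumption: a single RBS-like motif score stands in for a trained model.

START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(seq, stop_index):
    """Return 0-based positions of in-frame start codons upstream of the
    stop codon at stop_index, scanning back until an in-frame stop codon
    or the beginning of the sequence is reached."""
    candidates = []
    pos = stop_index - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            candidates.append(pos)
        pos -= 3
    return sorted(candidates)


def rbs_score(seq, start, motif="AGGAGG", window=20):
    """Return the largest number of bases matching the motif at any offset
    within the window immediately upstream of the candidate start."""
    upstream = seq[max(0, start - window):start]
    best = 0
    for i in range(len(upstream) - len(motif) + 1):
        matches = sum(a == b for a, b in zip(upstream[i:i + len(motif)], motif))
        best = max(best, matches)
    return best


def pick_start(seq, stop_index):
    """Pick the candidate with the highest motif score; ties go to the
    most upstream candidate (the longest ORF). Returns None if no
    candidate start exists."""
    cands = candidate_starts(seq, stop_index)
    if not cands:
        return None
    return max(cands, key=lambda s: (rbs_score(seq, s), -s))
```

For a sequence with two in-frame ATG codons, `pick_start` prefers the one preceded by an AGGAGG-like signal even when a longer ORF is available. A real predictor of the kind the abstract describes would instead learn organism-specific parameters from the annotated ORFs rather than rely on a fixed motif.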

Year:  2004        PMID: 15247104     DOI: 10.1093/bioinformatics/bth390

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937
