Identification of protein-coding genes in the genome of Vibrio cholerae with more than 98% accuracy using occurrence frequencies of single nucleotides.

J Wang, C T Zhang.

Abstract

The published sequence of the Vibrio cholerae genome indicates that, in addition to the genes that encode proteins of known and unknown function, there are 1577 ORFs identified as conserved hypothetical or hypothetical gene candidates. Because the annotation is not 100% accurate, it is not known which of the 1577 ORFs are true protein-coding genes. In this paper, an algorithm based on the Z curve method, with sensitivity, specificity and accuracy greater than 98%, is used to solve this problem. Twenty-fold cross-validation tests show that the accuracy of the algorithm is 98.8%. A detailed discussion of the mechanism of the algorithm is also presented. It was found that 172 of the 1577 ORFs are unlikely to be protein-coding genes. The number of protein-coding genes in the V. cholerae genome was re-estimated and found to be approximately 3716. This result should be of use in microarray analysis of gene expression in the genome, because the cost of preparing chips may be somewhat decreased. A computer program was written to calculate a coding score called VCZ for gene identification in the genome. Coding/noncoding is simply determined by VCZ > 0/VCZ < 0. The program is freely available on request for academic use.
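The abstract names the Z curve method and single-nucleotide occurrence frequencies but does not give the formulas. The sketch below shows one common way to turn an in-frame ORF into nine Z-curve components (three per codon position) and to apply a linear decision rule of the form score > 0 / score < 0. It is an illustration under stated assumptions, not the authors' program: the function names `z_curve_features` and `coding_score`, the weights, and the threshold are hypothetical, and the actual VCZ coefficients are not stated in the abstract.

```python
from collections import Counter


def z_curve_features(orf: str) -> list[float]:
    """Return 9 phase-specific Z-curve components for an in-frame ORF.

    For each codon position (1, 2, 3) the base frequencies a, c, g, t
    are mapped to the standard Z-curve coordinates:
        x = (a + g) - (c + t)   purine vs. pyrimidine
        y = (a + c) - (g + t)   amino vs. keto
        z = (a + t) - (g + c)   weak vs. strong hydrogen bonding
    """
    seq = orf.upper()
    features: list[float] = []
    for phase in range(3):
        counts = Counter(seq[phase::3])
        # Ignore ambiguous symbols such as N when normalising.
        total = sum(counts[b] for b in "ACGT")
        if total == 0:
            raise ValueError(f"no A/C/G/T bases at codon position {phase + 1}")
        a, c, g, t = (counts[b] / total for b in "ACGT")
        features.extend([
            (a + g) - (c + t),
            (a + c) - (g + t),
            (a + t) - (g + c),
        ])
    return features


def coding_score(features, weights, threshold):
    """Hypothetical linear score: positive suggests coding, negative noncoding.

    The weights and threshold are placeholders; in practice they would be
    fitted on known coding and noncoding sequences from the genome.
    """
    return sum(w * f for w, f in zip(weights, features)) - threshold


if __name__ == "__main__":
    feats = z_curve_features("ATGATGATG")
    print(feats)
    # Placeholder weights, for illustration only.
    print(coding_score(feats, [1.0] * 9, 0.0))
```

With fitted weights, the sign of the returned score would play the role that VCZ > 0 / VCZ < 0 plays in the abstract; the placeholder weights here carry no biological meaning.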

Year:  2001        PMID: 11488920     DOI: 10.1046/j.1432-1327.2001.02341.x

Source DB:  PubMed          Journal:  Eur J Biochem        ISSN: 0014-2956


  6 in total

1.  ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes.

Authors:  Feng-Biao Guo; Hong-Yu Ou; Chun-Ting Zhang
Journal:  Nucleic Acids Res       Date:  2003-03-15       Impact factor: 16.971

2.  An integrative method for identifying the over-annotated protein-coding genes in microbial genomes.

Authors:  Jia-Feng Yu; Ke Xiao; Dong-Ke Jiang; Jing Guo; Ji-Hua Wang; Xiao Sun
Journal:  DNA Res       Date:  2011-09-08       Impact factor: 4.458

3.  A brief review of computational gene prediction methods. [Review]

Authors:  Zhuo Wang; Yazhu Chen; Yixue Li
Journal:  Genomics Proteomics Bioinformatics       Date:  2004-11       Impact factor: 7.691

4.  Re-annotation of protein-coding genes in 10 complete genomes of Neisseriaceae family by combining similarity-based and composition-based methods.

Authors:  Feng-Biao Guo; Lifeng Xiong; Jade L L Teng; Kwok-Yung Yuen; Susanna K P Lau; Patrick C Y Woo
Journal:  DNA Res       Date:  2013-04-09       Impact factor: 4.458

5.  Recognition of Protein-coding Genes Based on Z-curve Algorithms.

Authors:  Feng-Biao Guo; Yan Lin; Ling-Ling Chen
Journal:  Curr Genomics       Date:  2014-04       Impact factor: 2.236

6.  A Brief Review: The Z-curve Theory and its Application in Genome Analysis.

Authors:  Ren Zhang; Chun-Ting Zhang
Journal:  Curr Genomics       Date:  2014-04       Impact factor: 2.236
