Multivariate entropy distance method for prokaryotic gene identification.

Zhengqing Ouyang, Huaiqiu Zhu, Jin Wang, Zhen-Su She.

Abstract

A simple new method is presented for the efficient and accurate identification of coding sequences in prokaryotic genomes. The method applies a Shannon-type artificial-language description to DNA sequences: a DNA sequence is translated into a pseudo-amino acid sequence built from 20 fundamental words according to the universal genetic code. Using an entropy-density profile (EDP), the method maps a sequence of finite length to a vector and then analyzes its position in the 20-dimensional phase space, which depends on the nature of the sequence. The ratio of the relative distances to an averaged coding EDP and an averaged non-coding EDP, taken over a small number (as few as one) of open reading frames (ORFs), is found to serve as a good coding potential. An iterative algorithm is designed to find a set of "root" sequences using this coding potential. A multivariate entropy distance (MED) algorithm is then proposed for the identification of prokaryotic genes; it combines the coding potential with an EDP-based sequence similarity analysis. The current version of MED is unsupervised, parameter-free and simple to implement. When tested against the NCBI RefSeq database, it detects 95-99% of genes while predicting 10-30% additional genes, and it detects 97.5-99.8% of confirmed genes with known functions. It also finds a set of functionally known genes that are missed by other well-known gene-finding algorithms. All measurements show that the MED algorithm reaches a performance level similar to that of algorithms such as GeneMark and Glimmer for prokaryotic gene prediction.
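The mapping from a DNA sequence to a 20-dimensional vector and the distance-ratio coding potential can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the following are assumptions made only for illustration: the EDP component for amino acid i is taken as -p_i log(p_i) / H, where p_i is the amino acid frequency and H is the Shannon entropy of the frequencies; the distance is Euclidean; and the potential is the distance to the coding center divided by the distance to the non-coding center. The names `translate`, `edp`, `distance` and `coding_potential` are hypothetical and are not taken from the MED software.

```python
import math
from itertools import product

# Standard ("universal") genetic code, in the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
AMINO_ACIDS = sorted(set(AMINO) - {"*"})  # the 20 "fundamental words"


def translate(dna):
    """Translate DNA (frame 0) to a pseudo-amino acid string.

    Stop codons and codons with unknown bases are skipped.
    """
    dna = dna.upper()
    out = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3])
        if aa is not None and aa != "*":
            out.append(aa)
    return "".join(out)


def edp(dna):
    """Assumed EDP: s_i = -p_i * log(p_i) / H, a 20-component vector."""
    protein = translate(dna)
    if not protein:
        raise ValueError("sequence contains no translatable codons")
    n = len(protein)
    terms = []
    for aa in AMINO_ACIDS:
        p = protein.count(aa) / n
        terms.append(-p * math.log(p) if p > 0 else 0.0)
    h = sum(terms)  # Shannon entropy of the amino acid frequencies
    if h == 0.0:    # only one amino acid type present
        return [0.0] * len(AMINO_ACIDS)
    return [t / h for t in terms]


def distance(u, v):
    """Euclidean distance between two EDP vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def coding_potential(vec, coding_center, noncoding_center):
    """Assumed potential: distance to coding center / distance to non-coding center.

    Under this convention, smaller values mean the sequence looks more coding-like.
    """
    d_coding = distance(vec, coding_center)
    d_noncoding = distance(vec, noncoding_center)
    return math.inf if d_noncoding == 0.0 else d_coding / d_noncoding
```

In this sketch the coding and non-coding centers would be averages of EDP vectors over sets of reference sequences; how those sets are chosen (the "root" sequences of the abstract) is not specified here and is left outside the example.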

Year:  2004        PMID: 15297987     DOI: 10.1142/s0219720004000624

Source DB:  PubMed          Journal:  J Bioinform Comput Biol        ISSN: 0219-7200            Impact factor:   1.122


  9 in total

1.  Defining genes in the genome of the hyperthermophilic archaeon Pyrococcus furiosus: implications for all microbial genomes.

Authors:  Farris L Poole; Brian A Gerwe; Robert C Hopkins; Gerrit J Schut; Michael V Weinberg; Francis E Jenney; Michael W W Adams
Journal:  J Bacteriol       Date:  2005-11       Impact factor: 3.490

2.  Identifying bacterial genes and endosymbiont DNA with Glimmer.

Authors:  Arthur L Delcher; Kirsten A Bratke; Edwin C Powers; Steven L Salzberg
Journal:  Bioinformatics       Date:  2007-01-19       Impact factor: 6.937

3.  Exploration of multivariate analysis in microbial coding sequence modeling.

Authors:  Tahir Mehmood; Jon Bohlin; Anja Bråthen Kristoffersen; Solve Sæbø; Jonas Warringer; Lars Snipen
Journal:  BMC Bioinformatics       Date:  2012-05-14       Impact factor: 3.169

4.  Rohlin distance and the evolution of influenza A virus: weak attractors and precursors.

Authors:  Raffaella Burioni; Riccardo Scalco; Mario Casartelli
Journal:  PLoS One       Date:  2011-12-06       Impact factor: 3.240

5.  Presence of extensive Wolbachia symbiont insertions discovered in the genome of its host Glossina morsitans morsitans.

Authors:  Corey Brelsfoard; George Tsiamis; Marco Falchetto; Ludvik M Gomulski; Erich Telleria; Uzma Alam; Vangelis Doudoumis; Francesca Scolari; Joshua B Benoit; Martin Swain; Peter Takac; Anna R Malacrida; Kostas Bourtzis; Serap Aksoy
Journal:  PLoS Negl Trop Dis       Date:  2014-04-24

6.  PHANOTATE: a novel approach to gene identification in phage genomes.

Authors:  Katelyn McNair; Carol Zhou; Elizabeth A Dinsdale; Brian Souza; Robert A Edwards
Journal:  Bioinformatics       Date:  2019-11-01       Impact factor: 6.937

7.  Gene prediction in metagenomic fragments based on the SVM algorithm.

Authors:  Yongchu Liu; Jiangtao Guo; Gangqing Hu; Huaiqiu Zhu
Journal:  BMC Bioinformatics       Date:  2013-04-10       Impact factor: 3.169

8.  The Genome Reverse Compiler: an explorative annotation tool.

Authors:  Andrew S Warren; João Carlos Setubal
Journal:  BMC Bioinformatics       Date:  2009-01-27       Impact factor: 3.169

9.  MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes.

Authors:  Huaiqiu Zhu; Gang-Qing Hu; Yi-Fan Yang; Jin Wang; Zhen-Su She
Journal:  BMC Bioinformatics       Date:  2007-03-16       Impact factor: 3.169
