Multivariate entropy distance method for prokaryotic gene identification.

Zhengqing Ouyang, Huaiqiu Zhu, Jin Wang, Zhen-Su She.

Abstract

A new, simple method is presented for the efficient and accurate identification of coding sequences in prokaryotic genomes. The method employs a Shannon description of artificial language for DNA sequences. It consists of translating a DNA sequence into a pseudo-amino acid sequence with 20 fundamental words according to the universal genetic code. With an entropy-density profile (EDP), the method maps a sequence of finite length to a vector and then analyzes its position in the 20-dimensional phase space, which depends on the nature of the sequence. It is found that the ratio of the relative distance to an averaged coding and non-coding EDP over a small number (up to one) of open reading frames (ORFs) can serve as a good coding potential. An iterative algorithm is designed for finding a set of "root" sequences using this coding potential. A multivariate entropy distance (MED) algorithm is then proposed for the identification of prokaryotic genes; it combines the use of a coding potential with an EDP-based sequence similarity analysis. The current version of MED is unsupervised, parameter-free and simple to implement. It is demonstrated to detect 95-99% of genes, with 10-30% additional genes, when tested against the RefSeq database of NCBI, and to detect 97.5-99.8% of confirmed genes with known functions. It is also shown to find a set of (functionally known) genes that are missed by other well-known gene-finding algorithms. All measurements show that the MED algorithm reaches a performance level similar to that of algorithms such as GeneMark and Glimmer for prokaryotic gene prediction.
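The mapping described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the EDP formula, so the sketch assumes each component is s_i = -(p_i ln p_i) / H, where p_i is the frequency of amino acid i in the translated sequence and H is the Shannon entropy of those frequencies. It also assumes a Euclidean distance for the coding potential. The function names (translate, edp, distance_ratio) and the two "center" vectors are hypothetical, introduced only for illustration; this is not the authors' implementation.

```python
import math
from itertools import product

# Standard (universal) genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
AMINO_ACIDS = sorted(set(AMINO) - {"*"})  # the 20 "fundamental words"


def translate(dna):
    """Translate DNA in frame 0, skipping stop codons and unknown triplets."""
    dna = dna.upper()
    words = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3])
        if aa is not None and aa != "*":
            words.append(aa)
    return "".join(words)


def edp(dna):
    """Return a 20-component vector, assuming s_i = -(p_i ln p_i) / H."""
    protein = translate(dna)
    if not protein:
        raise ValueError("sequence contains no translatable codons")
    n = len(protein)
    terms = []
    for aa in AMINO_ACIDS:
        p = protein.count(aa) / n
        terms.append(-p * math.log(p) if p > 0 else 0.0)
    entropy = sum(terms)
    if entropy == 0:
        raise ValueError("entropy is zero (only one amino acid type present)")
    return [t / entropy for t in terms]


def distance_ratio(vector, coding_center, noncoding_center):
    """Assumed coding potential: distance to the coding center divided by
    distance to the non-coding center (Euclidean; smaller means more coding-like)."""
    d_coding = math.dist(vector, coding_center)
    d_noncoding = math.dist(vector, noncoding_center)
    return math.inf if d_noncoding == 0 else d_coding / d_noncoding
```

Under these assumptions the components of an EDP vector sum to 1, so every sequence lands on the same simplex in the 20-dimensional space regardless of its length. In practice the two centers would be averaged EDP vectors of known coding and non-coding ORFs, which the sketch leaves to the caller.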

Year: 2004 PMID: 15297987 DOI: 10.1142/s0219720004000624

Source DB: PubMed Journal: J Bioinform Comput Biol ISSN: 0219-7200 Impact factor: 1.122

Cited by

9 in total

1. Defining genes in the genome of the hyperthermophilic archaeon Pyrococcus furiosus: implications for all microbial genomes.

Authors: Farris L Poole; Brian A Gerwe; Robert C Hopkins; Gerrit J Schut; Michael V Weinberg; Francis E Jenney; Michael W W Adams
Journal: J Bacteriol Date: 2005-11 Impact factor: 3.490

2. Identifying bacterial genes and endosymbiont DNA with Glimmer.

Authors: Arthur L Delcher; Kirsten A Bratke; Edwin C Powers; Steven L Salzberg
Journal: Bioinformatics Date: 2007-01-19 Impact factor: 6.937

3. Exploration of multivariate analysis in microbial coding sequence modeling.

Authors: Tahir Mehmood; Jon Bohlin; Anja Bråthen Kristoffersen; Solve Sæbø; Jonas Warringer; Lars Snipen
Journal: BMC Bioinformatics Date: 2012-05-14 Impact factor: 3.169

4. Rohlin distance and the evolution of influenza A virus: weak attractors and precursors.

Authors: Raffaella Burioni; Riccardo Scalco; Mario Casartelli
Journal: PLoS One Date: 2011-12-06 Impact factor: 3.240

5. Presence of extensive Wolbachia symbiont insertions discovered in the genome of its host Glossina morsitans morsitans.

Authors: Corey Brelsfoard; George Tsiamis; Marco Falchetto; Ludvik M Gomulski; Erich Telleria; Uzma Alam; Vangelis Doudoumis; Francesca Scolari; Joshua B Benoit; Martin Swain; Peter Takac; Anna R Malacrida; Kostas Bourtzis; Serap Aksoy
Journal: PLoS Negl Trop Dis Date: 2014-04-24

6. PHANOTATE: a novel approach to gene identification in phage genomes.

7. Gene prediction in metagenomic fragments based on the SVM algorithm.

8. The Genome Reverse Compiler: an explorative annotation tool.

9. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes.