Analysis of n-gram based promoter recognition methods and application to whole genome promoter prediction.

T Sobha Rani, Raju S Bapi.

Abstract

Promoter prediction is an important and complex problem. Pattern recognition algorithms typically require features that can capture this complexity. Promoter sequences may show a bias towards certain combinations of base pairs. To detect such biases, n-grams are usually extracted and analyzed. An n-gram is a run of n contiguous characters from a given character stream, here a DNA sequence segment. This work systematically studies the efficacy of n-grams for n = 2, 3, 4, 5 as features for a neural network classifier of E. coli and Drosophila promoters. For E. coli, n = 3 appears to give the best prediction performance, and for Drosophila, n = 4. Using the 3-gram features, promoters are predicted in the E. coli genome sequence. The results are encouraging in the positive identification of promoters in the genome compared to software packages such as BPROM, NNPP, and SAK. Whole genome promoter prediction was also performed for the Drosophila genome, using 4-gram features.
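
The n-gram definition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `ngram_features`, the use of overlapping windows, and the normalization of counts to frequencies are assumptions made for illustration, since the abstract does not say how the features were encoded.

```python
from collections import Counter
from itertools import product


def ngram_features(seq, n):
    """Return a frequency vector over all 4**n DNA n-grams (hypothetical helper).

    Assumptions not stated in the abstract: windows overlap (step of 1),
    counts are normalized to frequencies, and windows containing symbols
    other than A, C, G, T are ignored.
    """
    # Fixed lexicographic ordering of all possible n-grams: AA, AC, AG, ...
    keys = ["".join(p) for p in product("ACGT", repeat=n)]
    seq = seq.upper()
    # Slide a window of length n over the sequence, one base at a time.
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    # Only count windows made up entirely of A, C, G, T.
    total = sum(counts[k] for k in keys)
    return [counts[k] / total if total else 0.0 for k in keys]


# 2-grams of "ACGTAC" are AC, CG, GT, TA, AC, so AC has frequency 2/5.
features = ngram_features("ACGTAC", 2)
```

With this encoding a sequence maps to a vector of length 4^n (64 for 3-grams, 256 for 4-grams), which could then be passed to a classifier such as the neural network mentioned in the abstract.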

Year:  2009        PMID: 19537162

Source DB:  PubMed          Journal:  In Silico Biol        ISSN: 1386-6338


  7 in total

1.  Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction.

Authors:  Meng Zhang; Cangzhi Jia; Fuyi Li; Chen Li; Yan Zhu; Tatsuya Akutsu; Geoffrey I Webb; Quan Zou; Lachlan J M Coin; Jiangning Song
Journal:  Brief Bioinform       Date:  2022-03-10       Impact factor: 11.622

2.  N-gram analysis of 970 microbial organisms reveals presence of biological language models.

Authors:  Hatice Ulku Osmanbeyoglu; Madhavi K Ganapathiraju
Journal:  BMC Bioinformatics       Date:  2011-01-10       Impact factor: 3.169

3.  Recognition of prokaryotic promoters based on a novel variable-window Z-curve method.

Authors:  Kai Song
Journal:  Nucleic Acids Res       Date:  2011-09-27       Impact factor: 16.971

4.  Assessing the effects of data selection and representation on the development of reliable E. coli sigma 70 promoter region predictors.

Authors:  Mostafa M Abbas; Mostafa M Mohie-Eldin; Yasser El-Manzalawy
Journal:  PLoS One       Date:  2015-03-24       Impact factor: 3.240

5.  TSSPlant: a new tool for prediction of plant Pol II promoters.

Authors:  Ilham A Shahmuradov; Ramzan Kh Umarov; Victor V Solovyev
Journal:  Nucleic Acids Res       Date:  2017-05-05       Impact factor: 16.971

6.  Genome Sequence of the Pigment-Producing Bacterium Pseudogulbenkiania ferrooxidans, Isolated from Loktak Lake.

Authors:  Sampada Puranik; Reshma Talkal; Asifa Qureshi; Anshuman Khardenavis; Atya Kapley; Hemant J Purohit
Journal:  Genome Announc       Date:  2013-12-26

7.  bTSSfinder: a novel tool for the prediction of promoters in cyanobacteria and Escherichia coli.

Authors:  Ilham Ayub Shahmuradov; Rozaimi Mohamad Razali; Salim Bougouffa; Aleksandar Radovanovic; Vladimir B Bajic
Journal:  Bioinformatics       Date:  2017-02-01       Impact factor: 6.937
