A new data mining approach for the detection of bacterial promoters combining stochastic and combinatorial methods.

Catherine Eng, Charu Asthana, Bertrand Aigle, Sébastien Hergalant, Jean-François Mari, Pierre Leblond.

Abstract

We present a new data mining method that combines stochastic analysis (Hidden Markov Model [HMM]) with combinatorial methods to discover new transcription factor binding sites in bacterial genome sequences. Sigma factor binding sites (SFBSs) were described as box1-spacer-box2 patterns corresponding to the -35 and -10 DNA motifs of bacterial promoters. We used a high-order HMM in which the hidden process is a second-order Markov chain. When the model was applied to the genome of the model bacterium Streptomyces coelicolor A3(2), the a posteriori state probabilities revealed local maxima, or peaks, whose distribution was enriched in the intergenic sequences ("iPeaks" for intergenic peaks). Short DNA sequences underlying the iPeaks were extracted and clustered by a hierarchical classification algorithm based on Smith-Waterman local similarity. Selected motif consensuses were then used as box1 (-35 motif) to search for a potential neighbouring box2 (-10 motif) using a word enumeration algorithm. Applied to Streptomyces coelicolor, this new SFBS mining methodology retrieved already known SFBSs and suggested new potential transcription factor binding sites (TFBSs). The well-defined SigR regulon (oxidative stress response) was also used as a test set to compare first- and second-order HMMs. Our approach also allowed the preliminary detection of known SFBSs in Bacillus subtilis.
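
The box2 search described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the word enumeration algorithm works, so everything below is an assumption made for illustration: the function name `enumerate_box2_words`, the mismatch tolerance, the spacer range, and the box2 length are all hypothetical, and the toy sequences use the classic E. coli sigma-70 consensus (TTGACA / TATAAT) rather than any motif from the paper.

```python
from collections import Counter


def enumerate_box2_words(sequences, box1, box2_len=6,
                         spacer_range=(16, 19), max_mismatches=1):
    """Count candidate box2 words located downstream of box1 matches.

    Hypothetical sketch: for every approximate occurrence of `box1`
    (Hamming distance <= max_mismatches), collect each word of length
    `box2_len` that starts after a spacer whose length lies in
    `spacer_range` (inclusive). A word is counted once per sequence.
    """
    counts = Counter()
    k = len(box1)
    for seq in sequences:
        words_in_seq = set()
        for i in range(len(seq) - k + 1):
            # Hamming distance between the window and the box1 consensus
            mismatches = sum(a != b for a, b in zip(seq[i:i + k], box1))
            if mismatches > max_mismatches:
                continue
            for spacer in range(spacer_range[0], spacer_range[1] + 1):
                start = i + k + spacer
                word = seq[start:start + box2_len]
                if len(word) == box2_len:  # skip words cut off by the sequence end
                    words_in_seq.add(word)
        counts.update(words_in_seq)
    return counts


# Toy usage: two synthetic sequences with a 17 bp spacer (illustrative data only).
toy_sequences = [
    "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG",
    "A" + "TTGACA" + "G" * 17 + "TATAAT" + "CC",
]
counts = enumerate_box2_words(toy_sequences, "TTGACA",
                              spacer_range=(17, 17), max_mismatches=0)
print(counts.most_common(3))
```

Words that recur across many sequences at a consistent distance from box1 are the candidates worth inspecting; a real analysis would also need a background model to judge whether a count is higher than expected by chance.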

Year:  2009        PMID: 19772433     DOI: 10.1089/cmb.2008.0122

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  2 in total

1.  Parsimonious higher-order hidden Markov models for improved array-CGH analysis with applications to Arabidopsis thaliana.

Authors:  Michael Seifert; André Gohr; Marc Strickert; Ivo Grosse
Journal:  PLoS Comput Biol       Date:  2012-01-12       Impact factor: 4.475

2.  Autoregressive higher-order hidden Markov models: exploiting local chromosomal dependencies in the analysis of tumor expression profiles.

Authors:  Michael Seifert; Khalil Abou-El-Ardat; Betty Friedrich; Barbara Klink; Andreas Deutsch
Journal:  PLoS One       Date:  2014-06-23       Impact factor: 3.240
