A Monte Carlo EM algorithm for de novo motif discovery in biomolecular sequences.

Chengpeng Bi

Abstract

Motif discovery methods play pivotal roles in deciphering the genetic regulatory codes (i.e., motifs) in genomes as well as in locating conserved domains in protein sequences. The Expectation Maximization (EM) algorithm is one of the most popular methods used in de novo motif discovery. Based on the position weight matrix (PWM) updating technique, this paper presents a Monte Carlo version of the EM motif-finding algorithm that carries out stochastic sampling in local alignment space to overcome the conventional EM's main drawback of being trapped in a local optimum. The newly implemented algorithm is named the Monte Carlo EM Motif Discovery Algorithm (MCEMDA). MCEMDA starts from an initial model and then iteratively performs Monte Carlo simulation and parameter updates until convergence. A log-likelihood profiling technique, together with a top-k strategy, is introduced to cope with the phase-shift and multimodality issues in the motif discovery problem. A novel grouping motif alignment (GMA) algorithm is designed to select motifs by clustering a population of candidate local alignments and is successfully applied to subtle motif discovery. MCEMDA compares favorably to other popular PWM-based and word-enumerative motif algorithms when tested on simulated (l, d)-motif cases and on documented prokaryotic and eukaryotic DNA motif sequences. Finally, MCEMDA is applied to detect large blocks of conserved domains in protein benchmarks and shows excellent capacity in comparison with other multiple sequence alignment methods.
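
The loop described in the abstract (start from an initial model, sample alignments, update the PWM, repeat) can be illustrated with a minimal sketch. This is not the published MCEMDA implementation: it assumes exactly one motif occurrence per DNA sequence, a fixed motif width, a fixed iteration count in place of a convergence test, and it omits the log-likelihood profiling, top-k, and GMA steps. All function and parameter names (`build_pwm`, `window_log_odds`, `mcem_motif`, `pseudo`, `n_iter`) are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"

def build_pwm(seqs, starts, w, pseudo=0.5):
    """Parameter update: per-column base frequencies of the aligned
    sites, smoothed with a pseudocount (an assumed value)."""
    pwm = []
    for j in range(w):
        counts = {b: pseudo for b in ALPHABET}
        for s, p in zip(seqs, starts):
            counts[s[p + j]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in ALPHABET})
    return pwm

def window_log_odds(seq, p, pwm, bg):
    """Log-likelihood ratio of the window at p: motif model vs background."""
    return sum(math.log(pwm[j][seq[p + j]] / bg[seq[p + j]])
               for j in range(len(pwm)))

def mcem_motif(seqs, w, n_iter=200, seed=0):
    """Illustrative Monte Carlo EM loop; returns the best-scoring
    alignment seen, its PWM, and its total log-odds score."""
    rng = random.Random(seed)
    n = sum(len(s) for s in seqs)
    # Background base frequencies, with add-one smoothing.
    bg = {b: (sum(s.count(b) for s in seqs) + 1) / (n + 4) for b in ALPHABET}
    # Initial model: one random site per sequence.
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    best_score, best_starts = -math.inf, list(starts)
    for _ in range(n_iter):
        pwm = build_pwm(seqs, starts, w)
        # Monte Carlo step: sample one site per sequence with probability
        # proportional to its likelihood ratio under the current PWM.
        new_starts = []
        for s in seqs:
            scores = [window_log_odds(s, p, pwm, bg)
                      for p in range(len(s) - w + 1)]
            top = max(scores)
            weights = [math.exp(x - top) for x in scores]
            new_starts.append(rng.choices(range(len(scores)), weights=weights)[0])
        starts = new_starts
        # Score the sampled alignment under its own re-estimated PWM.
        pwm = build_pwm(seqs, starts, w)
        score = sum(window_log_odds(s, p, pwm, bg)
                    for s, p in zip(seqs, starts))
        if score > best_score:
            best_score, best_starts = score, list(starts)
    return best_starts, build_pwm(seqs, best_starts, w), best_score
```

Because sites are sampled rather than averaged over, a run can leave a poor alignment that a deterministic EM update would stay in, which is the motivation the abstract gives for the Monte Carlo variant. Keeping the best-scoring alignment seen is a simplification chosen here for the sketch.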

Year: 2009   PMID: 19644166   DOI: 10.1109/TCBB.2008.103

Source DB: PubMed   Journal: IEEE/ACM Trans Comput Biol Bioinform   ISSN: 1545-5963   Impact factor: 3.710


  6 in total

1.  MCOIN: a novel heuristic for determining transcription factor binding site motif width.

Authors:  Alastair M Kilpatrick; Bruce Ward; Stuart Aitken
Journal:  Algorithms Mol Biol       Date:  2013-06-27       Impact factor: 1.405

Review 2.  A Review on Planted (l, d) Motif Discovery Algorithms for Medical Diagnose.

Authors:  Satarupa Mohanty; Prasant Kumar Pattnaik; Ahmed Abdulhakim Al-Absi; Dae-Ki Kang
Journal:  Sensors (Basel)       Date:  2022-02-05       Impact factor: 3.576

3.  PairMotif: A new pattern-driven algorithm for planted (l, d) DNA motif search.

Authors:  Qiang Yu; Hongwei Huo; Yipu Zhang; Hongzhi Guo
Journal:  PLoS One       Date:  2012-10-31       Impact factor: 3.240

4.  An Affinity Propagation-Based DNA Motif Discovery Algorithm.

Authors:  Chunxiao Sun; Hongwei Huo; Qiang Yu; Haitao Guo; Zhigang Sun
Journal:  Biomed Res Int       Date:  2015-08-10       Impact factor: 3.411

5.  Stochastic EM-based TFBS motif discovery with MITSU.

Authors:  Alastair M Kilpatrick; Bruce Ward; Stuart Aitken
Journal:  Bioinformatics       Date:  2014-06-15       Impact factor: 6.937

Review 6.  Review of Different Sequence Motif Finding Algorithms.

Authors:  Fatma A Hashim; Mai S Mabrouk; Walid Al-Atabany
Journal:  Avicenna J Med Biotechnol       Date:  2019 Apr-Jun
