
Simultaneously learning DNA motif along with its position and sequence rank preferences through expectation maximization algorithm.

ZhiZhuo Zhang, Cheng Wei Chang, Willy Hugo, Edwin Cheung, Wing-Kin Sung.

Abstract

Although de novo motifs can be discovered by mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. One way to improve accuracy is to consider additional binding features (i.e., position preference and sequence rank preference), but this information usually has to be supplied by the user. This article presents a de novo motif discovery algorithm called SEME (sampling with expectation maximization for motif elicitation), which uses a pure probabilistic mixture model to model the motif's binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif, position, and sequence rank preferences without asking for any prior knowledge from the user. SEME is both efficient and accurate thanks to two important techniques: variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets, and 164 chromatin immunoprecipitation sequencing (ChIP-Seq) libraries, we demonstrated the superior performance of SEME over existing programs in finding transcription factor (TF) binding sites. SEME is further applied to a more difficult problem of finding the co-regulated TF (coTF) motifs in 15 ChIP-Seq libraries. It identified significantly more correct coTF motifs and, at the same time, predicted coTF motifs that better match the known motifs. Finally, we show that the learned position and sequence rank preferences of each coTF reveal potential interaction mechanisms between the primary TF and the coTF within these sites. Some of these findings were further validated by the ChIP-Seq experiments of the coTFs. The application is available online.
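
To make the idea of learning a motif and its position preference jointly more concrete, the following is a minimal sketch in Python. It is not the SEME implementation. It assumes a much simpler model than the abstract describes: exactly one motif site per sequence, equal-length sequences, a fixed motif width, and a uniform background. It leaves out sequence rank preference, variable motif length extension, and importance sampling. The function name `em_motif_with_position_prior` and its parameters are hypothetical and chosen only for illustration.

```python
import math
import random

ALPHABET = "ACGT"


def em_motif_with_position_prior(seqs, width, n_iter=50, pseudo=0.1, seed=0):
    """Toy EM: exactly one motif site per equal-length sequence (an assumption).

    Returns (pwm, pos_prior, posteriors), where pwm[k][a] is the probability of
    base a at motif column k and pos_prior[j] is the learned probability that
    a site starts at offset j.
    """
    rng = random.Random(seed)
    codes = [[ALPHABET.index(c) for c in s] for s in seqs]
    n_pos = len(codes[0]) - width + 1

    # Random start for the motif matrix, uniform start for the position prior.
    pwm = []
    for _ in range(width):
        row = [rng.random() + 0.5 for _ in range(4)]
        total = sum(row)
        pwm.append([v / total for v in row])
    pos_prior = [1.0 / n_pos] * n_pos
    background = 0.25  # assumed uniform background, not estimated from data

    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior over the site start j in every sequence,
        # combining the position prior with the motif/background log-odds.
        posteriors = []
        for x in codes:
            logp = [
                math.log(pos_prior[j])
                + sum(math.log(pwm[k][x[j + k]] / background) for k in range(width))
                for j in range(n_pos)
            ]
            top = max(logp)  # subtract the maximum for numerical stability
            weights = [math.exp(v - top) for v in logp]
            total = sum(weights)
            posteriors.append([w / total for w in weights])

        # M-step: re-estimate both the motif matrix and the position prior
        # from expected counts, smoothed with a small pseudocount.
        counts = [[pseudo] * 4 for _ in range(width)]
        pos_counts = [pseudo] * n_pos
        for x, post in zip(codes, posteriors):
            for j, r in enumerate(post):
                pos_counts[j] += r
                for k in range(width):
                    counts[k][x[j + k]] += r
        pwm = [[c / sum(row) for c in row] for row in counts]
        total = sum(pos_counts)
        pos_prior = [c / total for c in pos_counts]

    return pwm, pos_prior, posteriors
```

Calling `em_motif_with_position_prior(seqs, width=6)` on a list of equal-length DNA strings returns the estimated motif matrix together with the learned distribution over site start positions. As with any EM procedure, the result depends on the starting point and may be a local optimum.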

Year:  2013        PMID: 23461573     DOI: 10.1089/cmb.2012.0233

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  7 in total

1.  Modular discovery of monomeric and dimeric transcription factor binding motifs for large data sets.

Authors:  Jarkko Toivonen; Teemu Kivioja; Arttu Jolma; Yimeng Yin; Jussi Taipale; Esko Ukkonen
Journal:  Nucleic Acids Res       Date:  2018-05-04       Impact factor: 16.971

2.  The Brm-HDAC3-Erm repressor complex suppresses dedifferentiation in Drosophila type II neuroblast lineages.

Authors:  Chwee Tat Koe; Song Li; Fabrizio Rossi; Jack Jing Lin Wong; Yan Wang; Zhizhuo Zhang; Keng Chen; Sherry Shiying Aw; Helena E Richardson; Paul Robson; Wing-Kin Sung; Fengwei Yu; Cayetano Gonzalez; Hongyan Wang
Journal:  Elife       Date:  2014-03-11       Impact factor: 8.140

3.  A dynamic CTCF chromatin binding landscape promotes DNA hydroxymethylation and transcriptional induction of adipocyte differentiation.

Authors:  Julie Dubois-Chevalier; Frédérik Oger; Hélène Dehondt; François F Firmin; Céline Gheeraert; Bart Staels; Philippe Lefebvre; Jérôme Eeckhoute
Journal:  Nucleic Acids Res       Date:  2014-09-02       Impact factor: 16.971

4.  WSMD: weakly-supervised motif discovery in transcription factor ChIP-seq data.

Authors:  Hongbo Zhang; Lin Zhu; De-Shuang Huang
Journal:  Sci Rep       Date:  2017-06-12       Impact factor: 4.379

5.  Genome-wide mapping and analysis of aryl hydrocarbon receptor (AHR)- and aryl hydrocarbon receptor repressor (AHRR)-binding sites in human breast cancer cells.

Authors:  Sunny Y Yang; Shaimaa Ahmed; Somisetty V Satheesh; Jason Matthews
Journal:  Arch Toxicol       Date:  2017-07-05       Impact factor: 5.153

6.  MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs.

Authors:  Jarkko Toivonen; Pratyush K Das; Jussi Taipale; Esko Ukkonen
Journal:  Bioinformatics       Date:  2020-05-01       Impact factor: 6.937

7.  Peroxisome proliferator-activated receptor γ regulates genes involved in insulin/insulin-like growth factor signaling and lipid metabolism during adipogenesis through functionally distinct enhancer classes.

Authors:  Frédérik Oger; Julie Dubois-Chevalier; Céline Gheeraert; Stéphane Avner; Emmanuelle Durand; Philippe Froguel; Gilles Salbert; Bart Staels; Philippe Lefebvre; Jérôme Eeckhoute
Journal:  J Biol Chem       Date:  2013-11-27       Impact factor: 5.157
