Vu Ngo (1), Mengchi Wang (1), Wei Wang (1,2,3)
(1) Graduate Program of Bioinformatics and Systems Biology, University of California at San Diego, La Jolla, CA, USA. (2) Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA, USA. (3) Department of Cellular and Molecular Medicine, University of California at San Diego, La Jolla, CA, USA.
Abstract
MOTIVATION: Increasing evidence shows that nucleotide modifications such as methylation and hydroxymethylation of cytosine can greatly affect the binding of transcription factors (TFs). However, motif-finding algorithms that can search for motifs containing modified bases are lacking. In this study, we extend our previous motif-finding pipeline, Epigram, to provide systematic de novo motif discovery and performance evaluation for methylated DNA motifs.
RESULTS: mEpigram outperforms both MEME and DREME at finding modified motifs in simulated data that mimic various motif enrichment scenarios. Furthermore, we identified methylated motifs in Arabidopsis DNA affinity purification sequencing (DAP-seq) data that were previously demonstrated to contain such motifs. When applied to TF ChIP-seq and DNA methylome data in H1 and GM12878 cells, our method identified novel methylated motifs that can be recognized by the TFs or their co-factors. We also observed a spacing constraint between the canonical motif of the TF and the newly discovered methylated motifs, which suggests cooperative recognition of these cis-elements by collaborating proteins.
AVAILABILITY AND IMPLEMENTATION: The mEpigram program is available at http://wanglab.ucsd.edu/star/mEpigram.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
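To make the idea of searching for motifs over modified bases concrete, the following is a minimal sketch and not the mEpigram implementation. It assumes that methylated cytosines are encoded with an extra alphabet symbol (here "E", chosen for illustration) and that candidate motif seeds are ranked by a simple pseudocount-smoothed k-mer frequency ratio between foreground and background sequences. The function names, the symbol, and the scoring formula are all hypothetical, and the sketch ignores the reverse strand.

```python
from collections import Counter


def mark_methylation(seq, methylated_positions, symbol="E"):
    """Replace cytosines at the given 0-based positions with `symbol`.

    The symbol "E" is an assumption made for this sketch.
    """
    bases = list(seq.upper())
    for pos in methylated_positions:
        if bases[pos] != "C":
            raise ValueError(f"position {pos} is {bases[pos]!r}, not a cytosine")
        bases[pos] = symbol
    return "".join(bases)


def count_kmers(seqs, k):
    """Count every overlapping k-mer across a collection of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def kmer_enrichment(foreground, background, k, pseudocount=1.0):
    """Score each foreground k-mer by its smoothed frequency ratio.

    Illustrative scoring only: (foreground frequency) / (background
    frequency), with a pseudocount so unseen background k-mers do not
    cause division by zero.
    """
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    fg_total = sum(fg.values()) + pseudocount
    bg_total = sum(bg.values()) + pseudocount
    return {
        kmer: ((fg[kmer] + pseudocount) / fg_total)
        / ((bg[kmer] + pseudocount) / bg_total)
        for kmer in fg
    }


# Toy data: the foreground carries a methylated CpG, the background does not.
foreground = [mark_methylation("ACGT", [1]), mark_methylation("TCGA", [1])]
background = ["ACGT", "TCGA"]
scores = kmer_enrichment(foreground, background, k=2)
top_kmer = max(scores, key=scores.get)
```

In this toy example the k-mer containing the methylated symbol followed by G is the most enriched, because it never occurs in the unmethylated background. A real pipeline would also need to handle both strands and build position weight matrices from the enriched seeds.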