MEME-ChIP: motif analysis of large DNA datasets.

Abstract

MOTIVATION: Advances in high-throughput sequencing have resulted in rapid growth in large, high-quality datasets including those arising from transcription factor (TF) ChIP-seq experiments. While there are many existing tools for discovering TF binding site motifs in such datasets, most web-based tools cannot directly process such large datasets.
RESULTS: The MEME-ChIP web service is designed to analyze ChIP-seq 'peak regions'--short genomic regions surrounding declared ChIP-seq 'peaks'. Given a set of genomic regions, it performs (i) ab initio motif discovery, (ii) motif enrichment analysis, (iii) motif visualization, (iv) binding affinity analysis and (v) motif identification. It runs two complementary motif discovery algorithms on the input data--MEME and DREME--and uses the motifs they discover in subsequent visualization, binding affinity and identification steps. MEME-ChIP also performs motif enrichment analysis using the AME algorithm, which can detect very low levels of enrichment of binding sites for TFs with known DNA-binding motifs. Importantly, unlike with the MEME web service, there is no restriction on the size or number of uploaded sequences, allowing very large ChIP-seq datasets to be analyzed. The analyses performed by MEME-ChIP provide the user with a varied view of the binding and regulatory activity of the ChIP-ed TF, as well as the possible involvement of other DNA-binding TFs.
AVAILABILITY: MEME-ChIP is available as part of the MEME Suite at http://meme.nbcr.net.

Substances:
Transcription Factors
DNA

Year: 2011 PMID: 21486936 PMCID: PMC3106185 DOI: 10.1093/bioinformatics/btr189

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

The genomic regions identified as bound by a transcription factor (TF) in a chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiment are a rich source of information about transcriptional regulation. These regions are defined by mapping the sequence tags to the genome, which identifies ‘peaks’ of (direct or indirect) binding by the ChIP-ed factor, typically to a resolution of about 100 bp. This high resolution is of obvious utility for identifying which genes a TF regulates, but the genomic regions surrounding the peaks are also typically highly enriched for binding sites of the ChIP-ed TF and other TFs. Hence, these regions can be mined computationally to understand the roles, interactions and functions of the ChIP-ed TF and its regulatory partners.
We describe here a web service called MEME-ChIP that automatically performs five types of analysis on ChIP-seq regions. (i) Ab initio motif discovery identifies novel sequence patterns (motifs) in the ChIP-seq regions that may be due to TF binding sites. (ii) Motif enrichment analysis looks for enrichment of known TF DNA-binding motifs in the data. (iii) Motif visualization displays the relative locations and binding strengths of TF binding sites in the input regions. (iv) Motif binding strength analysis computes an estimate of the total DNA-binding affinity of each input region for the TF corresponding to each discovered motif. (v) Motif identification compares the ab initio motifs to known TF DNA-binding motifs. The output of MEME-ChIP is thus a multifaceted view of the identities, prevalence, DNA-binding patterns and potential interactions of the ChIP-ed TF and its regulatory partners.
Ab initio motifs discovered in ChIP-seq data give an unbiased view of the in vivo DNA-binding propensities of TFs binding alone or in protein complexes. MEME-ChIP employs two motif discovery algorithms with complementary characteristics. The MEME (Bailey et al., 2006) algorithm uses expectation maximization (EM) to discover probabilistic models of DNA-binding by single TFs or TF complexes. MEME motifs can provide accurate thermodynamic models of TF binding. MEME is complemented by DREME (Bailey, 2011), which uses a simpler, non-probabilistic model (regular expressions) to describe the short binding motifs characteristic of single eukaryotic TFs. DREME is often able to detect very short motifs that are not found by MEME. MEME-ChIP also attempts to identify the motifs found by MEME and DREME by comparing them to a database of known TF motifs using the TOMTOM (Gupta et al., 2007) algorithm. Motif discovery thus identifies novel binding motifs and TFs that are regulatory partners of the ChIP-ed TF.
Motif enrichment analysis can identify additional regulatory motifs whose enrichment in the ChIP-seq regions is too slight to be detected by ab initio motif discovery. It achieves higher sensitivity by limiting the search for motifs to a set of previously known TF DNA-binding motifs. MEME-ChIP uses the AME (McLeay and Bailey, 2010) algorithm for motif enrichment analysis.
For motif visualization and binding strength analysis, MEME-ChIP utilizes the MAST (Bailey and Gribskov, 1998) and AMA (Buske et al., 2010) algorithms, respectively. MAST uses a threshold-based approach to identify a putative set of non-overlapping binding sites in the ChIP-seq regions for all the motifs discovered by MEME (or DREME). This allows associations among the locations of the different motifs (TF binding sites) to be seen by eye. The AMA algorithm computes a thermodynamic estimate of the average binding affinity of the TF (as described by the motif) for each ChIP-seq sequence region.
MEME-ChIP complements other web-based ChIP-seq motif analysis tools. Like MEME-ChIP, the peak-motifs algorithm (Thomas-Chollier et al., 2008) performs several analyses including a plot of the positional distribution of each motif. Trawler (Ettwiller et al., 2007) performs motif discovery and (optionally) analyzes the conservation of predicted motif sites. Peak-motifs and Trawler both perform word-based motif discovery. MEME-ChIP does as well (via DREME) and complements this with MEME, a non-word-based approach. Unlike MEME-ChIP, peak-motifs and Trawler do not perform motif enrichment analysis or motif binding strength analysis.
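
The regular-expression motif model attributed to DREME above can be illustrated with a minimal Python sketch. It is not DREME itself: it only shows how a word written in the IUPAC nucleotide alphabet can be turned into a regular expression and used to count the sequences that contain a match on either strand. The function names and the example consensus WGATAR (a GATA-like word chosen for illustration) are assumptions, and DREME's actual search strategy and significance statistics are not reproduced here.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of every IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Return the reverse complement of an IUPAC motif string."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile a regex matching the motif on either DNA strand."""
    forward = "".join(IUPAC[c] for c in motif)
    reverse = "".join(IUPAC[c] for c in reverse_complement(motif))
    return re.compile(f"{forward}|{reverse}")


def count_sequences_with_motif(motif, seqs):
    """Count the sequences containing at least one match to the motif."""
    pattern = motif_regex(motif)
    return sum(1 for s in seqs if pattern.search(s.upper()))
```

Comparing such counts between the ChIP-seq regions and a set of control sequences is one simple way to judge whether a word is over-represented; the choice of control set and statistical test is left open in this sketch.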

2 IMPLEMENTATION

The MEME-ChIP web service simplifies the analysis of ChIP-seq data by executing a computational pipeline on a set of genomic regions uploaded by the user. The uploaded regions should be FASTA-formatted sequences of at least 100 bp in length, each centered on a ChIP-seq tag peak. Prior to motif discovery and motif enrichment analysis, MEME-ChIP centers and trims each sequence to 100 bp; the full-length sequences are used in the subsequent motif visualization step. All trimmed sequences are input to the DREME motif discovery algorithm, whereas, due to computational complexity, a maximum of 600 sequences (randomly selected from the input) are input to the MEME algorithm. MEME and DREME output novel motifs as position-specific probability matrices along with a wealth of other information about the motifs discovered. MEME-ChIP runs the AME motif enrichment algorithm on all of the trimmed sequences. AME computes and outputs the statistical enrichment in the sequences of matches to each motif in the JASPAR CORE database (Portales-Casamar ) of TF motifs. AMA computes and outputs the average binding affinity score for each motif MEME finds and for each input sequence. MEME-ChIP uses the MAST algorithm to visualize the locations of (putative) matches to each of the MEME and DREME motifs in the untrimmed input sequences. It also compares each of the MEME and DREME motifs to each of the motifs in the JASPAR CORE database to identify possible TFs binding to each motif.
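
The preprocessing described above (centering and trimming each region to 100 bp, then passing at most 600 randomly chosen sequences to MEME) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the MEME-ChIP source code: the function names, the fixed random seed and the handling of sequences shorter than the trim width are hypothetical choices.

```python
import random


def center_trim(seq, width=100):
    """Return the central `width` bases of seq.

    Sequences no longer than `width` are returned unchanged
    (an assumption; the text only describes inputs of >= 100 bp).
    """
    if len(seq) <= width:
        return seq
    start = (len(seq) - width) // 2
    return seq[start:start + width]


def prepare_inputs(seqs, width=100, meme_max=600, seed=0):
    """Trim all sequences and draw a random subset for MEME.

    Returns (trimmed, meme_input): `trimmed` holds every trimmed
    sequence (the input described for DREME and AME), and `meme_input`
    holds at most `meme_max` of them, sampled without replacement.
    """
    trimmed = [center_trim(s, width) for s in seqs]
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    if len(trimmed) > meme_max:
        meme_input = rng.sample(trimmed, meme_max)
    else:
        meme_input = list(trimmed)
    return trimmed, meme_input
```

The untrimmed sequences would be kept alongside these lists, since the text states that the full-length regions are used again in the visualization step.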

3 EXAMPLE

To demonstrate the functionality of MEME-ChIP, we use it to analyze the ChIP-seq peak regions reported by Kassouf et al. (2010) for SCL (also called Tal1), a key regulator of erythropoiesis. (Complete results are available at http://meme.nbcr.net/meme/doc/examples/memechip_example_output_files.) The two ab initio motif discovery algorithms (MEME and DREME) and the motif enrichment analysis algorithm (AME) all identify a known SCL binding motif. In the case of MEME and AME, the most significant motif found is a composite motif believed to represent binding of a protein complex involving SCL and GATA-1, another transcription factor that plays a central role in erythropoiesis (Fig. 1, column 1, rows 1 and 3). The value of running two types of motif discovery algorithms is illustrated by the fact that although DREME does not discover this composite motif, it finds a better match to the canonical SCL binding motif (Fig. 1, column 2, row 2) than MEME does. Interestingly, DREME reports that the SCL motif is less significant in this ChIP-seq dataset than the canonical GATA-1 motif is (Fig. 1, column 1, row 2), suggesting that SCL binds more frequently in complex with GATA-1 than alone. This is supported by the fact that the motif enrichment analysis by AME also reports that the canonical SCL motif is less enriched than both the SCL-GATA-1 motif and the GATA-1 motif (data not shown). In all, AME reports that the SCL ChIP-seq regions are enriched for 15 known vertebrate motifs. DREME reports nine significant motifs, six of which match known vertebrate TF motifs, and three of which are novel. MEME finds three significant motifs, one of which is novel. Such novel motifs are possible candidates for further study, such as by using Gene Ontology enrichment analysis (Buske et al., 2010) to predict their transcriptional roles. Additional details on the implementation and use of MEME-ChIP are given in the Supplementary materials.

Fig. 1.

Two most significant motifs found by the MEME, DREME and AME algorithms in the SCL ChIP-seq data. For MEME and DREME motifs, the motif Logo (bottom) is shown aligned with the most similar JASPAR motif Logo (top) if the similarity is significant (E ≤ 0.05).

Funding: ARC Centre of Excellence in Bioinformatics (to P.M.); National Institutes of Health grant (R0-1 RR021692-05 to T.L.B.).

Conflict of Interest: none declared.

REFERENCES

1. Genome-wide identification of TAL1's functional targets: insights into its mechanisms of action in primary erythroid cells.

Authors: Mira T Kassouf; Jim R Hughes; Stephen Taylor; Simon J McGowan; Shamit Soneji; Angela L Green; Paresh Vyas; Catherine Porcher
Journal: Genome Res Date: 2010-06-21 Impact factor: 9.043

2. Trawler: de novo regulatory motif discovery pipeline for chromatin immunoprecipitation.

Authors: Laurence Ettwiller; Benedict Paten; Mirana Ramialison; Ewan Birney; Joachim Wittbrodt
Journal: Nat Methods Date: 2007-06-24 Impact factor: 28.547

3. Combining evidence using p-values: application to sequence homology searches.

Authors: T L Bailey; M Gribskov
Journal: Bioinformatics Date: 1998 Impact factor: 6.937

4. Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data.

Authors: Robert C McLeay; Timothy L Bailey
Journal: BMC Bioinformatics Date: 2010-04-01 Impact factor: 3.169

5. Assigning roles to DNA regulatory motifs using comparative genomics.

Authors: Fabian A Buske; Mikael Bodén; Denis C Bauer; Timothy L Bailey
Journal: Bioinformatics Date: 2010-02-10 Impact factor: 6.937

6. DREME: motif discovery in transcription factor ChIP-seq data.

Authors: Timothy L Bailey
Journal: Bioinformatics Date: 2011-05-04 Impact factor: 6.937

7. MEME: discovering and analyzing DNA and protein sequence motifs.

Authors: Timothy L Bailey; Nadya Williams; Chris Misleh; Wilfred W Li
Journal: Nucleic Acids Res Date: 2006-07-01 Impact factor: 16.971

8. Quantifying similarity between motifs.

Authors: Shobhit Gupta; John A Stamatoyannopoulos; Timothy L Bailey; William Stafford Noble
Journal: Genome Biol Date: 2007 Impact factor: 13.583

9. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles.

Authors: Elodie Portales-Casamar; Supat Thongjuea; Andrew T Kwon; David Arenillas; Xiaobei Zhao; Eivind Valen; Dimas Yusuf; Boris Lenhard; Wyeth W Wasserman; Albin Sandelin
Journal: Nucleic Acids Res Date: 2009-11-11 Impact factor: 16.971

10. RSAT: regulatory sequence analysis tools.

Authors: Morgane Thomas-Chollier; Olivier Sand; Jean-Valéry Turatsinze; Rekin's Janky; Matthieu Defrance; Eric Vervisch; Sylvain Brohée; Jacques van Helden
Journal: Nucleic Acids Res Date: 2008-05-21 Impact factor: 16.971
