MeDReaders: a database for transcription factors that bind to methylated DNA.

Guohua Wang1,2, Ximei Luo1, Jianan Wang1, Jun Wan3, Shuli Xia4, Heng Zhu5, Jiang Qian2, Yadong Wang1.   

Abstract

Understanding the molecular principles that govern interactions between transcription factors (TFs) and their DNA targets is a central question in transcriptional regulation. Recently, emerging evidence has demonstrated that some TFs can bind to DNA motifs containing highly methylated CpGs both in vitro and in vivo. Identifying such TFs and elucidating their physiological roles have now become important stepping-stones toward understanding the mechanisms underlying methylation-mediated biological processes, which have crucial implications for human disease and disease development. Hence, we constructed a database, named MeDReaders, to collect information about methylated DNA binding activities. A total of 731 TFs that bind to methylated DNA sequences were manually curated from human and mouse studies reported in the literature. In silico approaches were applied to predict methylated and unmethylated motifs of 292 TFs by integrating whole genome bisulfite sequencing (WGBS) and ChIP-Seq datasets from six human cell lines and one mouse cell line, obtained from the ENCODE and GEO databases. MeDReaders will provide a comprehensive resource for further studies and aid the design of related experiments. The database provides users with unified access to most TFs involved in such methylation-associated binding activities. The website is available at http://medreader.org/.
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year: 2018; PMID: 29145608; PMCID: PMC5753207; DOI: 10.1093/nar/gkx1096

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 19.160


INTRODUCTION

During gene transcription, cooperative interactions between transcription factors (TFs) and DNA methylation play an important role in regulating gene expression. The classical view of TF–DNA interaction is that TFs usually bind to non-methylated DNA motifs in open chromatin regions, whereas high levels of methylation at CpG dinucleotides (mCpG) in cis-regulatory elements prohibit the recruitment of TFs, except for a few proteins with a methyl-CpG-binding domain (MBD), including MeCP2, MBD1, MBD2 and MBD4. These MBD proteins are known to recognize methylated DNA in a sequence-independent manner (1,2). However, earlier sporadic studies found that several TFs without MBDs also interact with methylated DNA. For example, the transcription factors KLF4 (3), Kaiso (4), ZFP57 (5) and CEBPα (6) were found to bind with high affinity to distinct methylated DNA sequences. More recently, systematic efforts using tandem mass spectrometry (7), functional protein microarrays (3), DNA microarrays (8), systematic evolution of ligands by exponential enrichment (SELEX) (9) and ChIP-BS-seq (10) have revealed that hundreds of TFs can specifically bind to methylated DNA. Identifying such TFs and elucidating their functions have become important stepping stones towards understanding the mechanisms underlying these methylation-mediated biological processes, which have crucial implications for human diseases and cancer. Over the past 30 years, many databases have been constructed to archive information on TF binding sites, providing invaluable resources for the transcription community and beyond. For instance, TRANSFAC (11), JASPAR (12) and UniPROBE (13) are the most common open-access databases containing hundreds of transcription factor position weight matrices (PWMs) constructed from DNA binding sequences. These PWMs can be used to search for and predict potential TF binding sites across the whole genome. Meanwhile, TF regulatory activity is known to be species-dependent. 
Hence, many species-specific TF databases have been created, such as PlantTFDB for plants (14), AnimalTFDB for animals (15) and ITFP for human, mouse and rat (16). Some databases, such as TFBSshape (17), not only contain extensive nucleotide sequences of TF binding sites but also calculate DNA structural features from the nucleotide sequences provided by motif databases. Unfortunately, none of these databases records methylated DNA binding sites for TFs. With advances in next-generation sequencing technologies, DNA methylation sites can now be determined at single-base-pair resolution, and a number of systematic DNA methylation databases have been developed for epigenetic studies. MethDB, the first DNA methylation database, stores DNA methylation data and gene expression information (18). NGSMethDB archives DNA methylation profiles generated by bisulfite sequencing (19). MethBank (20), MethyCancer (21) and MENT (22) focus on DNA methylation in specific biological contexts, such as embryonic development and various cancers. MethSMRT hosts DNA N6-methyladenine and N4-methylcytosine methylomes (23). The ENCODE database also contains many whole genome bisulfite sequencing (WGBS) and ChIP-Seq datasets from many cell lines. Together, these databases provide a large number of profiles, including TF binding sequences and the corresponding DNA methylation status. However, none of the existing databases systematically documents the interactions between TFs and methylated DNA sequences. To fill this gap and help researchers better understand the interactions between DNA methylation and TFs, we collected information about methylated DNA–TF interactions from two major public sources: the published literature and the ENCODE database. We developed a database, named MeDReaders, in which 753 methylated DNA–TF interactions involving 731 TFs were manually curated from the literature. 
A total of 292 TFs were predicted to bind to distinct methylated and unmethylated DNA motifs by integrating WGBS and ChIP-Seq data from six human cell lines and one mouse cell line obtained from the ENCODE and GEO databases. MeDReaders can help scientists compare methylated DNA binding activities across species and datasets, and further understand the biological processes mediated by DNA methylation. MeDReaders is publicly available at http://medreader.org/ without use restriction.
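
The PWM-based site search mentioned above can be illustrated with a minimal Python sketch: it turns equal-length, aligned binding sequences into a log2-odds PWM and scores a candidate site by summing per-position scores. The pseudocount of 0.25 per base and the uniform 0.25 background are illustrative assumptions, not the parameters used by TRANSFAC, JASPAR or UniPROBE, and the example sequences are hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.25, background=None):
    """Build a log2-odds PWM from equal-length, aligned binding sequences.

    Returns one dict per motif position mapping each base to its log2-odds score.
    The pseudocount and the uniform background are illustrative assumptions.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform base composition
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all binding sequences must have the same length")
    pwm = []
    for i in range(width):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i].upper()] += 1  # raises KeyError for non-ACGT characters
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background[b]) for b in BASES})
    return pwm


def score_site(pwm, seq):
    """Score a candidate site by summing the per-position log-odds values."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must equal motif width")
    return sum(col[base] for col, base in zip(pwm, seq.upper()))


# Hypothetical CRE-like binding sequences, used only for illustration.
pwm = build_pwm(["TGACGTCA", "TGACGTCA", "TGACGCAA", "TTACGTAA"])
print(round(score_site(pwm, "TGACGTCA"), 2), round(score_site(pwm, "GGGGGGGG"), 2))
```

Scanning a genome then amounts to sliding `score_site` along the sequence and keeping windows above a chosen score threshold.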

MATERIALS AND METHODS

Data sources

To extract experimentally confirmed methylated DNA–TF interactions from the published literature, we first searched PubMed for all relevant papers. CEBPα (3,6), ZFP57/KAP1 (5,24), ZBTB33 (4) and CEBPB/ATF4 (25) were found to interact with methylated DNA using EMSA or ChIP-BS-seq experiments. Hundreds of TFs were identified as preferring CpG-methylated sequences by high-throughput technologies, such as tandem mass spectrometry (MS/MS) (26,27), protein microarrays (3) and methylation-sensitive SELEX (9). In total, we manually curated 753 methylated DNA–TF interactions involving 731 TFs from 4 human cell lines/tissues and 4 mouse cell lines/tissues (Table 1). However, the retrieved records differ in content because the individual experiments used diverse methods. For example, in vitro SELEX experiments yielded only TF binding motifs rather than binding sequences, whereas protein microarrays yielded protein-bound DNA sequences from which methylated binding motif logos could be retrieved for only a few specific TFs.
Table 1.

Transcription factors summarized from published literatures

Species | No. of TFs | No. of cells/tissues
Human | 601 | 4
Mouse | 130 | 4
Another way to assess the interactions between TFs and methylated DNA sequences is to re-analyze datasets from the ENCODE Consortium and NCBI GEO, focusing on the methylation levels of TF binding regions. From ENCODE, we downloaded WGBS data for four human cell lines and ChIP-Seq data for six human cell lines and one mouse cell line; from GEO, we downloaded WGBS data for the ES-E14, IMR-90 and HCT116 cell lines and ChIP-Seq data for the ES-E14 cell line (accession numbers GSM1027571, GSM2210597, GSM1465024 and GSM699165) (Table 2). All datasets were re-processed using the ENCODE standard pipeline. Briefly, Bismark (28) was used to align WGBS sequencing reads and call methylation levels, while the Irreproducible Discovery Rate (IDR) method (29) was used to call TF binding peaks from the ChIP-Seq data.
Table 2.

Transcription factors inferred by WGBS and ChIP-Seq datasets

Species | Cell/tissue | No. of TFs
Human | GM12878 | 44
Human | H1-hESC | 33
Human | HepG2 | 89
Human | HCT116 | 5
Human | IMR-90 | 6
Human | K562 | 110
Mouse | E14 | 5
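
As a rough illustration of the methylation-level calling step, the sketch below reads Bismark's coverage output, in which each tab-separated line gives chromosome, start, end, methylation percentage, count methylated and count unmethylated (1-based positions), and computes each cytosine's methylation level as methylated reads divided by total reads. It then averages those levels inside a BED-style peak. The unweighted mean and the `min_coverage` filter are assumptions; the pipeline behind MeDReaders may aggregate differently.

```python
from statistics import mean


def parse_bismark_cov(lines, min_coverage=1):
    """Parse Bismark coverage (.cov) lines into {(chrom, pos): (level, meth, total)}.

    Expected tab-separated columns: chromosome, start, end, methylation percentage,
    count methylated, count unmethylated (1-based positions). The min_coverage
    read-depth filter is an illustrative assumption.
    """
    calls = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, _end, _pct, meth, unmeth = line.rstrip("\n").split("\t")
        meth, total = int(meth), int(meth) + int(unmeth)
        if total == 0 or total < min_coverage:
            continue  # no reads, or too few reads to trust the call
        # Methylation level = methylated reads / all reads covering the cytosine.
        calls[(chrom, int(start))] = (meth / total, meth, total)
    return calls


def peak_mean_methylation(calls, chrom, peak_start, peak_end):
    """Unweighted mean methylation level of covered cytosines inside a peak.

    peak_start/peak_end are 0-based, half-open (as in BED/narrowPeak files), so a
    1-based position p lies in the peak when peak_start < p <= peak_end.
    Returns None if no covered cytosine falls inside the peak.
    """
    levels = [level for (c, pos), (level, _meth, _total) in calls.items()
              if c == chrom and peak_start < pos <= peak_end]
    return mean(levels) if levels else None


cov_lines = [
    "chr1\t100\t100\t80\t8\t2",
    "chr1\t105\t105\t20\t1\t4",
    "chr1\t500\t500\t100\t5\t0",
]
calls = parse_bismark_cov(cov_lines)
print(peak_mean_methylation(calls, "chr1", 90, 110))
```

Peaks can then be grouped by this average level before motif discovery.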

Sequence motifs containing methylated sites

We adopted the computational method described in our previous paper (30) to predict methylated and unmethylated motifs of TFs by integrating WGBS and ChIP-Seq data. DNA sequences within each ChIP-Seq peak were extracted and grouped according to their average methylation level. The MEME algorithm (31) was used to predict significantly enriched sequence motifs in each group, and each predicted motif was then used to scan the ChIP-Seq peak regions. For each peak, we recorded the DNA segment with the highest match score to the motif and examined the methylation levels of the CpGs within that segment. Finally, high- and low-methylation motifs were reconstructed according to the DNA methylation levels of the CpG sites in the predicted TF binding segments, using a cutoff of 0.6. We introduced a new letter, ‘E’, to represent a highly methylated cytosine within TF binding sequences. These computationally predicted interactions between TFs and methylated DNA provide a starting point for further in vivo characterization of TF binding patterns and high-resolution DNA methylation analyses.
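
The scanning and ‘E’-encoding steps described above can be sketched as follows, under several stated assumptions: the motif is a log-odds PWM (a list of per-position base-to-score dicts), only the forward strand is scanned, a CpG counts as highly methylated when its level is at least 0.6 (the text gives the cutoff but not whether the comparison is strict), and a segment is labelled ‘high’ if any of its covered CpGs passes the cutoff. The input layout `cpg_levels` is hypothetical. This is an illustration, not the published implementation from (30).

```python
HIGH_METHYLATION_CUTOFF = 0.6  # cutoff from the text; using ">=" is an assumption


def best_segment(pwm, peak_seq):
    """Return (offset, segment, score) of the best-scoring forward-strand window."""
    width = len(pwm)
    seq = peak_seq.upper()
    best = None
    for offset in range(len(seq) - width + 1):
        window = seq[offset:offset + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[base] for col, base in zip(pwm, window))
        if best is None or score > best[2]:
            best = (offset, window, score)
    return best  # None if the peak is shorter than the motif


def encode_methylation(segment, offset, cpg_levels, cutoff=HIGH_METHYLATION_CUTOFF):
    """Replace highly methylated CpG cytosines in a segment with 'E'.

    cpg_levels (hypothetical layout) maps the 0-based peak position of a CpG
    cytosine to its methylation level. A CpG whose G lies just outside the
    segment is ignored in this sketch. Returns (encoded_segment, label), where
    label is 'high', 'low', or None when the segment has no covered CpG.
    """
    encoded = list(segment)
    levels = []
    for i in range(len(segment) - 1):
        if segment[i] == "C" and segment[i + 1] == "G" and (offset + i) in cpg_levels:
            level = cpg_levels[offset + i]
            levels.append(level)
            if level >= cutoff:
                encoded[i] = "E"
    if not levels:
        return "".join(encoded), None
    label = "high" if max(levels) >= cutoff else "low"
    return "".join(encoded), label


# Toy PWM favouring "TCGA" (+2 for a match, -1 otherwise), for illustration only.
toy_pwm = [{b: (2.0 if b == m else -1.0) for b in "ACGT"} for m in "TCGA"]
offset, segment, score = best_segment(toy_pwm, "GGGTCGAGGG")
print(offset, segment, score, encode_methylation(segment, offset, {4: 0.9}))
```

Collecting the encoded segments labelled ‘high’ and ‘low’ separately would then give the two sets from which methylated and unmethylated motifs are rebuilt.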

Database implementation

The website was built using the Spring Boot framework. Data are stored in an H2 database and queried through a Hibernate DAO layer. The web pages were constructed using HTML5 and rendered with Thymeleaf templates. The jQuery library was used with the Semantic UI framework to provide a responsive, user-friendly front-end interface.

RESULTS

Usage and access

A user-friendly web interface was developed to allow users to browse, search and download methylated DNA–TF interaction data, and to upload new experimentally verified methylated DNA–TF interactions to the database. Once reviewed and approved by the database managers, newly submitted data will be included in the database and made available to the public in the next release. The main functionality of MeDReaders is shown in Figure 1.
Figure 1.

Functionality of MeDReaders.

Browsing the database

Data in the MeDReaders knowledge base can be browsed by TF gene symbol. Data from the two major sources are browsed on separate pages: literature-curated records on the ‘High-methyl(TFs)’ page, and predictions from integrated WGBS and ChIP-Seq data on the ‘Methylome+CHIP-Seq’ page. For example, to check whether the human TF ‘ATF6B’ is reported in the literature to bind methylated DNA, a user can go to the ‘High-methyl(TFs)’ page and select ‘human’ and ‘ATF6B(CREBL1)’. This page shows basic information about the selected TF, such as its genomic location, strand, UniProt ID, RefSeq Gene ID and Ensembl Gene ID. Depending on the experimental method, some DNA motifs are provided together with their raw binding sequences, while others are not. To explore methylated DNA-binding TFs predicted in silico by integrating WGBS and ChIP-Seq data, a user can go to the ‘Methylome+CHIP-Seq’ page and select a species, a cell line or tissue, and a TF of interest. For example, selecting ‘ATF2(CREB2)’ in the human GM12878 cell line displays ATF2’s motifs for methylated and unmethylated DNA binding sites. Two browsing examples are shown in Figure 2A and B. A link is also provided to visualize TF binding peaks together with the underlying DNA methylation levels as custom tracks in the UCSC Genome Browser.
Figure 2.

Screenshots showing how to browse MeDReaders. (A) Browsing the records retrieved from the published literature. (B) Browsing the methylated DNA–TF interactions predicted by integrating WGBS and ChIP-Seq data, and visualizing DNA methylation and TF binding sites in the UCSC Genome Browser.
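
The UCSC custom-track link described in the browsing section relies on standard track files. As a hypothetical sketch (not the tracks MeDReaders actually generates), per-CpG methylation levels can be written as a bedGraph custom track, converting 1-based CpG positions to the 0-based, half-open coordinates that bedGraph uses. The track name and description are placeholders, and `viewLimits=0:1` simply fixes the y-axis to the 0 to 1 range of methylation levels.

```python
def bedgraph_track(cpg_calls, name, description):
    """Build UCSC bedGraph custom-track text from per-CpG methylation levels.

    cpg_calls is an iterable of (chrom, pos, level) with 1-based positions;
    bedGraph uses 0-based, half-open intervals, hence start = pos - 1, end = pos.
    name and description must not contain double quotes in this simple sketch.
    """
    header = (f'track type=bedGraph name="{name}" description="{description}" '
              "visibility=full autoScale=off viewLimits=0:1")
    rows = [f"{chrom}\t{pos - 1}\t{pos}\t{level:.3f}"
            for chrom, pos, level in sorted(cpg_calls)]
    return "\n".join([header, *rows]) + "\n"


track = bedgraph_track([("chr1", 101, 0.8), ("chr1", 50, 0.25)],
                       name="CpG methylation", description="example track")
print(track)
```

The resulting text can be pasted into, or uploaded through, the UCSC Genome Browser's custom track page alongside a peak track in BED format.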

Searching the database

The MeDReaders ‘Search’ page allows users to search methylated DNA–TF interactions by TF name, Ensembl gene ID, RefSeq gene ID or binding DNA sequence. Results include basic TF information as well as TF binding DNA motifs and sequences. For example, to query the ATF TF subfamily, a user can select a species and type in ‘ATF’; all records for TFs in the ATF subfamily collected from the two sources are then shown. An example of retrieving information about the human ATF subfamily is shown in Figure 3.
Figure 3.

Screenshot of how to search the data.

Submitting and downloading

We expect that more interactions between TFs and methylated DNA will be found in future systematic studies. To accommodate such findings, MeDReaders provides a submission page for users to upload new experimentally verified methylated DNA–TF interactions. After manual curation and computational analysis, the new information about methylated DNA-binding TFs will be added to the database. MeDReaders also provides a download page from which users can download the profiles. Each file for a predicted methylated DNA-binding TF contains information on all peaks and TF binding sites, including CpG site loci, methylation levels, and the numbers of methylated and total reads in the WGBS experiment.

DISCUSSION

MeDReaders is the first resource focusing on the interactions between methylated DNA and TFs. As evidence accumulates for the importance of methylated DNA binding by TFs in physiologically relevant contexts, we foresee that more researchers will focus on elucidating the biological consequences of methylated DNA–TF binding in the near future. With the rapid accumulation of WGBS and ChIP-Seq experiments, more methylated DNA–TF interactions can be predicted in multiple model organisms. Researchers can use the information in this database for further studies of epigenetics-associated TF regulation, and can perform targeted validation of targets of interest based on our summarized predictions. We will therefore continue to expand MeDReaders with newly available public datasets and keep improving the algorithms for in-depth data mining. We believe that our database will become a valuable resource for the community studying methylated DNA-binding TFs. In our previous study, we observed that many TFs bind to both methylated and unmethylated DNA, but the sequences of the methylated binding sites often differ from their canonical unmethylated sequences (3). These observations suggest that DNA methylation alters binding specificity; we therefore considered these cases to be methylation-dependent binding. On the other hand, Taipale and colleagues (9) reported that some TFs can bind to methylated and unmethylated DNA at the same binding sites; in such cases, the TF–DNA interactions are methylation-independent. MeDReaders likely contains both types of interactions, and further experiments are required to distinguish the two situations. We are fully aware that superimposing independent ChIP-seq and methylome data cannot prove that TF binding and methylation occur in the same cells, because both measurements are population-based. 
Ideally, one should perform ChIP followed by bisulfite sequencing to confirm that a given TF indeed binds to methylated DNA. In our previous publication, we tested some methylated sites using this approach (3). However, because this approach does not perform well at genomic scale, no such genome-wide data were available to us. Nonetheless, we believe our ‘predicted’ methylated binding sites are valuable to the community because they provide a starting point for researchers to further investigate methylated DNA–TF interactions. Furthermore, users can apply their own methylation-level cutoffs to the downloadable files to decide which binding sites to consider methylated. For example, if a user requires a methylation level of 1.0 (every WGBS read covering the CpG is methylated), the retained TF ChIP-Seq sites are expected to co-occur with methylated sites in the cells, within the limits of the WGBS read depth.
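
Applying a user-chosen cutoff to a downloaded file might look like the following sketch. The tab-separated column names (`methylation_level`, `total_reads` and so on) are hypothetical placeholders for whatever headers the MeDReaders download actually uses. The optional `min_total_reads` filter reflects the caveat that a level of 1.0 supported by a single read is much weaker evidence than one supported by many reads.

```python
import csv
import io


def methylated_sites(handle, cutoff=1.0, min_total_reads=1):
    """Yield rows whose methylation level is >= cutoff and whose read depth is
    at least min_total_reads.

    The tab-separated column names used here (methylation_level, total_reads)
    are hypothetical placeholders for the real download headers.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        if int(row["total_reads"]) < min_total_reads:
            continue  # too few WGBS reads to trust the methylation level
        if float(row["methylation_level"]) >= cutoff:
            yield row


# Hypothetical file contents, for illustration only.
EXAMPLE_TSV = (
    "chrom\tpos\tmethylation_level\tmethylated_reads\ttotal_reads\n"
    "chr1\t100\t1.0\t12\t12\n"
    "chr1\t200\t0.75\t3\t4\n"
    "chr1\t300\t1.0\t1\t1\n"
)
for row in methylated_sites(io.StringIO(EXAMPLE_TSV), cutoff=1.0, min_total_reads=5):
    print(row["chrom"], row["pos"], row["methylation_level"])
```

With `cutoff=1.0` and `min_total_reads=5`, only CpGs where every read is methylated and at least five reads provide coverage are kept; lowering the cutoff to 0.6 and the depth requirement to 1 keeps all three example rows.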
REFERENCES

1. TRANSFAC: an integrated system for gene expression regulation.
Authors: E Wingender; X Chen; R Hehl; H Karas; I Liebich; V Matys; T Meinhardt; M Prüss; I Reuter; F Schacherer
Journal: Nucleic Acids Res; Date: 2000-01-01

2. The p120 catenin partner Kaiso is a DNA methylation-dependent transcriptional repressor.
Authors: A Prokhortchouk; B Hendrich; H Jørgensen; A Ruzov; M Wilm; G Georgiev; A Bird; E Prokhortchouk
Journal: Genes Dev; Date: 2001-07-01

3. The methyl-CpG binding domain and the evolving role of DNA methylation in animals. (Review)
Authors: Brian Hendrich; Susan Tweedie
Journal: Trends Genet; Date: 2003-05

4. JASPAR: an open-access database for eukaryotic transcription factor binding profiles.
Authors: Albin Sandelin; Wynand Alkema; Pär Engström; Wyeth W Wasserman; Boris Lenhard
Journal: Nucleic Acids Res; Date: 2004-01-01

5. CpG methylation of half-CRE sequences creates C/EBPalpha binding sites that activate some tissue-specific genes.
Authors: Vikas Rishi; Paramita Bhattacharya; Raghunath Chatterjee; Julian Rozenberg; Jianfei Zhao; Kimberly Glass; Peter Fitzgerald; Charles Vinson
Journal: Proc Natl Acad Sci U S A; Date: 2010-11-08

6. Transcription factors as readers and effectors of DNA methylation. (Review)
Authors: Heng Zhu; Guohua Wang; Jiang Qian
Journal: Nat Rev Genet; Date: 2016-08-01

7. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications.
Authors: Felix Krueger; Simon R Andrews
Journal: Bioinformatics; Date: 2011-04-14

8. In embryonic stem cells, ZFP57/KAP1 recognize a methylated hexanucleotide to affect chromatin and DNA methylation of imprinting control regions.
Authors: Simon Quenneville; Gaetano Verde; Andrea Corsinotti; Adamandia Kapopoulou; Johan Jakobsson; Sandra Offner; Ilaria Baglivo; Paolo V Pedone; Giovanna Grimaldi; Andrea Riccio; Didier Trono
Journal: Mol Cell; Date: 2011-11-04

9. MethSMRT: an integrative database for DNA N6-methyladenine and N4-methylcytosine generated by single-molecular real-time sequencing.
Authors: Pohao Ye; Yizhao Luan; Kaining Chen; Yizhi Liu; Chuanle Xiao; Zhi Xie
Journal: Nucleic Acids Res; Date: 2016-10-18

10. UniPROBE: an online database of protein binding microarray data on protein-DNA interactions.
Authors: Daniel E Newburger; Martha L Bulyk
Journal: Nucleic Acids Res; Date: 2008-10-08