
DiseaseMeth: a human disease methylation database.

Jie Lv, Hongbo Liu, Jianzhong Su, Xueting Wu, Hui Liu, Boyan Li, Xue Xiao, Fang Wang, Qiong Wu, Yan Zhang.

Abstract

DNA methylation is an important epigenetic modification for genomic regulation in higher organisms that plays a crucial role in the initiation and progression of diseases. The integration and mining of DNA methylation data produced by methylation-specific PCR and genome-wide profiling technologies could greatly assist the discovery of novel candidate disease biomarkers. However, this is difficult without a comprehensive DNA methylation repository of human diseases. Therefore, we have developed DiseaseMeth, a human disease methylation database (http://bioinfo.hrbmu.edu.cn/diseasemeth). Its focus is the efficient storage and statistical analysis of DNA methylation data sets from various diseases. Experimental information from over 14,000 entries and 175 high-throughput data sets from a wide range of sources have been collected and incorporated into DiseaseMeth. The latest release incorporates the gene-centric methylation data of 72 human diseases from a variety of technologies and platforms. To facilitate data extraction, DiseaseMeth supports multiple search options such as gene ID and disease name. DiseaseMeth provides integrated gene methylation data based on cross-data set analysis for disease and normal samples. These can be used for in-depth identification of differentially methylated genes and the investigation of gene–disease relationships.

Year:  2011        PMID: 22135302      PMCID: PMC3245164          DOI: 10.1093/nar/gkr1169

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

DNA methylation is one of the enzymatic modifications in mammalian genomes (1,2). Methylated cytosines occur almost exclusively in CpG dinucleotides. DNA methyltransferases (DNMTs) are the main enzymes that catalyze CpG methylation. DNA methyltransferase 1 (DNMT1) is responsible for the post-replicative copying of preexisting CpG methylation patterns, while DNMT3A and DNMT3B are responsible for de novo DNA methylation (3). Previous studies suggested that DNA methylation abnormality is one of the most frequent epigenetic events in human diseases, and that DNA methylation patterns in disease tissue differ from those in their normal counterparts (4,5). Aberrant hypomethylation may lead to genome instability, transcriptional activation of oncogenes and loss of imprinting, while hypermethylation in local regions may be related to a selective advantage for cancer cells (6–8). Because of the important roles of promoter methylation in functional regulation, DNA methylation has been studied extensively in diseases such as neurodevelopmental disorders, neurodegenerative and neurological diseases, autoimmune diseases and cancers. Some important cases have been reported, including neprilysin (NEP) in Alzheimer’s disease, frataxin (FXN) in Friedreich's ataxia (9), survival of motor neuron (SMN2) in spinal muscular atrophy (9), methylguanine-DNA methyltransferase (MGMT) in colorectal cancer (5,10), prolactin receptor (PRLR) in breast cancer (11), methyl CpG binding protein (MeCP2) in Rett syndrome (12) and the imprinting controlled region at 15q11–q13 in Prader–Willi and Angelman syndromes (13). Locus-specific approaches, such as methylation-specific PCR, pyrosequencing and bisulfite sequencing, have been widely used in laboratories. Recently, high-throughput approaches based on arrays and next-generation sequencing have been favored for genome-wide analysis (14).
A collection of the disease methylation data produced by these techniques would be useful for exploring potential methylation markers/phenotypes from whole methylomes in human disease states. In general, there are two types of methylation databases: experimental evidence databases and large-scale databases. The earlier methylation databases include MethDB (15), MethPrimerDB (16), MethCancerDB (17), PubMeth (18) and MeInfoText (19). MethDB holds information about the occurrence of methylated cytosines in the DNA assay information obtained from multiorganism CpG methylation analysis. MethPrimerDB stores confirmed primer and experimental information on four PCR methods for CpG methylation analysis. MethCancerDB documents pre-existing information on DNA methylation in various cancers, including study size, type of cancer and method used. PubMeth and MeInfoText are based on text mining of Medline/PubMed abstracts to extract information on methylation in cancer. In contrast to these databases, there are two large-scale methylation databases, MethyCancer (20) and NGSmethDB (21). MethyCancer hosts large-scale methylation reference data, cancer-related genes and cancer information from public data sources. NGSmethDB hosts several sequence-based reference methylation data sets that can be used to obtain gene-specific and differential methylation information. However, there is a need for a database that can integrate the dispersed data and provide a convenient way for in-depth data mining. To this end, we developed DiseaseMeth to combine experimental methylation information from locus-specific technologies with inferred gene-centric methylation states from methylation profiling technology. Various laboratories have profiled the methylomes of a few human diseases [breast cancer (22,23) and leukemia (24,25)], producing data that could be integrated to gain further knowledge.
It should be possible to identify differentially methylated genes in diseases by integrating data sets of the same disease. Cross-data set analysis for specific diseases is useful because it is difficult and costly for experimental biologists to discover potentially novel genes/regions in diseases. Furthermore, results are often contradictory and need further confirmation. To tackle these challenges, DiseaseMeth was developed to store and mine data efficiently through a user-friendly extraction interface. The current release of DiseaseMeth incorporates 72 disease types. Moreover, DiseaseMeth stores many reference methylation data sets derived from normal tissues/cells that can be used to identify aberrantly methylated genes, as well as genomic data such as CpG islands, histone modifications and annotated genes. In addition, DiseaseMeth provides: (i) search options that can be used to statistically identify gene-centric methylation differences, extract detailed information on genes that are differentially methylated in disease compared with normal tissue, and calculate the significance of a specific differentially methylated gene; (ii) tools to calculate the correlation of DNA methylation for the pairwise relationships of gene–gene, gene–disease and disease–disease, which could help in the discovery of disease-specific and disease-consistent genes/markers; and (iii) a genome methylation browser and customized views that display gene-centric disease methylation information combined with genomic information on the genomic scale. In brief, DiseaseMeth hosts comprehensive disease methylation data and provides tools to explore the relationships between diseases and DNA methylation.

OVERVIEW AND DATA PROCESSING

DiseaseMeth includes literature-based experimental information and large-scale methylation data. Over 14 000 entries of experimental information were collected by text mining from more than 25 000 published papers on DNA methylation research in PubMed. DiseaseMeth also holds 175 large-scale methylation data sets for 50 diseases, which were primarily collected from various websites and institutes, such as ArrayExpress, Gene Expression Omnibus (GEO) and UCSC. Detailed and updated statistics on the number of disease types are shown on the DiseaseMeth homepage. A summary of the publications sourced and the links to the sites from which the data were downloaded is maintained and updated on the download page. The download page lists detailed information about the data sets, including name/ID, disease, data analysis, publication link, experimental platform, sample size and download link. For raw high-throughput data, only information about methylation within the promoter regions of RefSeq genes was kept for analysis. We defined the promoter region as the region from 1.5 kb upstream of the transcription start site (TSS) to 500 bp downstream of the TSS of RefSeq genes. For data from assemblies other than the UCSC March 2006 human reference sequence (hg18, NCBI build 36.1), we used the LiftOver tool from UCSC (26) to convert the coordinates to hg18. Normalized and standardized data (0–100%) were used directly. Unnormalized data were transformed according to common procedures. All normalized methylation data were subsequently transformed into a consistent interval [0,100] by percentile normalization, reflecting relative methylation levels, before being finally stored. All data available for download are stored in GFF (General Feature Format) files. The basic operations in the DiseaseMeth database are search, view, download and analyze (Figure 1). A flexible search engine based on a MySQL backend is provided to allow user-friendly data mining and downloading.
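The promoter definition above (1.5 kb upstream to 500 bp downstream of the TSS) can be illustrated with a minimal Python sketch. The paper does not give its implementation, so the strand handling, the 1-based inclusive coordinates, the `Transcript` record and the example coordinates are all assumptions made for illustration.

```python
from typing import NamedTuple, Tuple

UPSTREAM = 1500   # bp upstream of the TSS, as stated in the text
DOWNSTREAM = 500  # bp downstream of the TSS, as stated in the text


class Transcript(NamedTuple):
    """Hypothetical minimal transcript record (1-based, inclusive coordinates)."""
    refseq_id: str
    chrom: str
    tx_start: int
    tx_end: int
    strand: str  # "+" or "-"


def promoter_window(tx: Transcript) -> Tuple[str, int, int]:
    """Return (chrom, start, end) of the promoter window for one transcript.

    Assumption: "upstream" follows the direction of transcription, so the
    window is mirrored for minus-strand genes, whose TSS is at tx_end.
    """
    if tx.strand == "+":
        tss = tx.tx_start
        start, end = tss - UPSTREAM, tss + DOWNSTREAM
    elif tx.strand == "-":
        tss = tx.tx_end
        start, end = tss - DOWNSTREAM, tss + UPSTREAM
    else:
        raise ValueError(f"unknown strand: {tx.strand!r}")
    # Clamp at the chromosome start so the window never has a coordinate < 1.
    return tx.chrom, max(1, start), end


# Usage with made-up coordinates (not taken from any real annotation).
plus_tx = Transcript("NM_EXAMPLE1", "chr1", 10000, 20000, "+")
minus_tx = Transcript("NM_EXAMPLE2", "chr1", 10000, 20000, "-")
print(promoter_window(plus_tx))
print(promoter_window(minus_tx))
```

Coordinates computed this way would still need to be on the hg18 assembly, as described above, before being compared with the stored data.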
The methylation information in DiseaseMeth with a few other annotations can be viewed using the visualization module based on the Perl Bio::Graphics package.
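The percentile normalization mentioned in the data-processing description can be sketched as follows. The exact procedure used by DiseaseMeth is not specified, so this sketch assumes one common reading: each value is replaced by the percentage of values in the same data set that are less than or equal to it (the empirical cumulative distribution), which places every score in (0, 100].

```python
from bisect import bisect_right
from typing import List, Sequence


def percentile_normalize(values: Sequence[float]) -> List[float]:
    """Map raw methylation signals to percentile scores in (0, 100].

    Assumption: the score of a value is the percentage of values in the
    data set that are <= that value. Tied values receive the same score.
    """
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right counts how many sorted values are <= v.
    return [100.0 * bisect_right(ordered, v) / n for v in values]


# Usage with made-up raw signals on an arbitrary scale.
raw = [0.2, 0.8, 0.5, 0.5]
print(percentile_normalize(raw))
```

A rank-based mapping like this makes data sets from different platforms comparable on one scale, at the cost of discarding the absolute signal values.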
Figure 1.

Overview of the structure and workflow of DiseaseMeth. Users can enter a RefSeq ID, gene symbol or genomic position into the query engine to obtain the methylation pattern of these regions in different samples. The terms entered by users are converted into genomic coordinates, which are then used to search the relational database of DiseaseMeth. Users can also restrict the disease type. The query results can be viewed in the gene-centric result viewer and downloaded in flat-file format. A relationship analysis module is provided for users to investigate the relationships among genes and diseases.


DATABASE USAGE AND ACCESS

Using the search tool to retrieve the methylation states of promoters

All methylation states for a given chromosomal region can be retrieved when the start and end chromosome coordinates are provided. The data for a selected disease, tissue, cell line, technology and gene ID can also be retrieved. The more query parameters are provided, the narrower the range of entries retrieved. A valid RefSeq ID (NM*) or an official gene symbol is needed to obtain the gene-centric methylation information. The output is displayed by default as an overview table that summarizes the methylation states of the genes and gives disease and other information. The table contains links to generate gene-centric methylation information panels for disease and normal samples based on the specified search parameters. As an example, we analyzed the promoter region of the gene RASSF1 (Figure 2). The results show that the promoter of the transcript is differentially methylated between the disease and control states. To facilitate viewing genes and other relevant information, gene-specific links to other resources such as the GeneCards database are displayed in the overview panel and in the gene-centric information table. The full results of a query can be downloaded from a link in the overview panel.
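To make the idea of a significance value for a disease versus normal methylation difference concrete, here is a minimal Python sketch. The text does not name the statistical test that DiseaseMeth applies, so the exact two-sided permutation test on the difference in group means used here is an assumption chosen for illustration, and the methylation values are invented.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def exact_permutation_p(disease: Sequence[float], normal: Sequence[float]) -> float:
    """Two-sided exact permutation p-value for a difference in mean methylation.

    Every way of relabelling the pooled samples into groups of the original
    sizes is enumerated, so this is only practical for small sample counts.
    Both groups must contain at least one value.
    """
    pooled = list(disease) + list(normal)
    n_disease = len(disease)
    observed = abs(mean(disease) - mean(normal))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, n_disease):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        diff = abs(mean(group_a) - mean(group_b))
        # Small tolerance guards against floating-point noise in ties.
        if diff >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Usage with invented promoter methylation levels (0-100 scale).
disease_levels = [80, 85, 90, 95]
normal_levels = [10, 15, 20, 25]
print(exact_permutation_p(disease_levels, normal_levels))
```

With four samples per group the smallest attainable p-value is 2/70, which illustrates why pooling several data sets of the same disease, as the database does, increases the power to detect differences.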
Figure 2.

A screenshot of the DiseaseMeth search results for the gene RASSF1. The default view generated by the search tool is shown. Clicking the ‘Fetch gene-centric information of all genes’ button in the toolbar displays the gene-centric results, which show the gene ID, gene name, methylation level (from 0% to 100%), the number of relevant data items in the database, and the significance of the methylation difference between disease and normal data sets for each gene. The relevant reference links are also included in the overview panel. Concurrent searching of multiple genes is supported. In the gene-centric panel, a link (Visualization) is available to display the epigenomic data in the genomic context. A ‘Visualize a selected gene’ button in the toolbar of the default view performs the same task. The complete search results can be downloaded by clicking the ‘Download all’ button in the toolbar.


Genomic methylation viewer

DiseaseMeth includes a user-friendly and configurable genome browser through which multiple genomic and epigenomic resources can be visualized simultaneously (Figure 2). The genomic methylation viewer, which connects to a MySQL backend, shows the methylation landscape for disease methylation information, other genomic annotations [GC content, genes and CpG islands (27)] and epigenomic information (methylation data sets and histone modifications). Features of the viewer include the ability to zoom through the given regions, to enter a region by specifying the genomic coordinates, to change the order of tracks, to show and hide certain feature tracks and to configure the appearance of the displayed information. Currently, a few epigenomic tracks can be configured in the viewer: (i) HAIB RRBS tracks from ENCODE/HudsonAlpha; (ii) RRBS tracks from the BI Human Reference Epigenome Mapping Project; (iii) MeDIP-chip; (iv) MeDIP-seq; and (v) histone modifications. In the viewer, the methylation values do not indicate strand-specific information. A color gradient from white (methylation value = 0) to red (methylation value = 100) is used to display the numeric methylation states of the cytosines/regions. The browser can also link out to other epigenomic databases such as MethyCancer (20) and HHMD (28).
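The white-to-red color scale described above can be sketched in a few lines of Python. The viewer itself is built on Perl Bio::Graphics and its interpolation scheme is not documented in the text, so the linear RGB interpolation and the function name `methylation_to_hex` are assumptions.

```python
def methylation_to_hex(value: float) -> str:
    """Map a methylation value in [0, 100] to a hex color from white to red.

    Assumption: linear interpolation in RGB space, where 0 gives white
    (#ffffff) and 100 gives pure red (#ff0000).
    """
    if not 0 <= value <= 100:
        raise ValueError("methylation value must lie in [0, 100]")
    # The red channel stays at 255; green and blue fade out together.
    fade = round(255 * (1 - value / 100))
    return f"#ff{fade:02x}{fade:02x}"


# Usage: colors for unmethylated, half-methylated and fully methylated regions.
for level in (0, 50, 100):
    print(level, methylation_to_hex(level))
```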

Analysis tools to explore the relationships between genes and diseases

In disease, a few genes, including tumor suppressor genes, are silenced by promoter CpG island methylation (13). However, only a subset of colorectal cancers has been documented as exhibiting promoter methylation; this subset is referred to as the CpG island methylator phenotype (CIMP) (29). In addition, gene–gene and disease–disease relationships have been characterized using available gene expression data (30). Because the availability of DNA methylation data has been low, it has been inefficient to analyze the relationships of genes and diseases using DNA methylation. However, with the data stored in DiseaseMeth, it is easy to determine a preliminary profile of the relationship between genes and diseases using the quantitative tools that have been developed: (i) a disease–gene relationship analysis tool; (ii) a disease–disease relationship analysis tool; and (iii) a gene–gene relationship analysis tool. One of the merits of these tools is that they are highly customizable for analyzing given regions, diseases and so on, thus facilitating specific analyses that focus on continuous regions such as imprinting clusters (31). The three analysis tools are available for download.
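As an illustration of how a gene–gene relationship could be quantified from methylation data, the sketch below computes the Pearson correlation between the methylation profiles of two genes measured across the same set of samples or diseases. The text does not state which measure the DiseaseMeth tools use, so Pearson correlation is an assumption, and the gene names and values are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(
    profiles: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Correlation for every unordered pair of genes in `profiles`."""
    genes = sorted(profiles)
    return {
        (g1, g2): pearson(profiles[g1], profiles[g2])
        for i, g1 in enumerate(genes)
        for g2 in genes[i + 1:]
    }


# Usage: hypothetical promoter methylation (0-100) across four diseases.
profiles = {
    "GENE_A": [10, 20, 30, 40],
    "GENE_B": [15, 25, 35, 45],
    "GENE_C": [40, 30, 20, 10],
}
for pair, r in pairwise_correlations(profiles).items():
    print(pair, round(r, 3))
```

Transposing the input, so that each key is a disease and each profile runs across genes, would give the analogous disease–disease comparison.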

SYSTEM DESIGN AND IMPLEMENTATION

DiseaseMeth consists of three major software components: an Apache HTTP server, a MySQL database and a Perl installation using the Bio::Perl, Bio::Graphics and DBI packages. The backend data analysis programs were written in Perl and deployed as CGI programs. The Perl programs for the analysis tools are available on the website.
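The kind of coordinate-based lookup that such a relational backend supports can be sketched with Python's built-in `sqlite3` module standing in for the MySQL and Perl DBI stack. The table name, column names and rows below are hypothetical and are not the actual DiseaseMeth schema; the sketch only shows an interval-overlap query with an optional disease filter.

```python
import sqlite3
from typing import List, Optional, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical promoter table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE promoter_methylation (
               gene_symbol  TEXT,
               chrom        TEXT,
               region_start INTEGER,
               region_end   INTEGER,
               disease      TEXT,
               methylation  REAL
           )"""
    )
    rows = [
        ("GENE_A", "chr1", 1000, 3000, "disease_x", 82.0),
        ("GENE_A", "chr1", 1000, 3000, "normal", 12.0),
        ("GENE_B", "chr1", 9000, 11000, "disease_x", 40.0),
        ("GENE_C", "chr2", 1000, 3000, "disease_y", 55.0),
    ]
    conn.executemany(
        "INSERT INTO promoter_methylation VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    return conn


def search_region(
    conn: sqlite3.Connection,
    chrom: str,
    start: int,
    end: int,
    disease: Optional[str] = None,
) -> List[Tuple[str, str, float]]:
    """Return promoters overlapping chrom:start-end, optionally for one disease."""
    # Two intervals overlap when each one starts before the other ends.
    sql = (
        "SELECT gene_symbol, disease, methylation FROM promoter_methylation "
        "WHERE chrom = ? AND region_start <= ? AND region_end >= ?"
    )
    params: list = [chrom, end, start]
    if disease is not None:
        sql += " AND disease = ?"
        params.append(disease)
    sql += " ORDER BY gene_symbol, disease"
    return conn.execute(sql, params).fetchall()


# Usage: every extra filter narrows the set of returned entries.
conn = build_demo_db()
print(search_region(conn, "chr1", 2500, 5000))
print(search_region(conn, "chr1", 2500, 5000, disease="normal"))
```

Parameter placeholders (`?`) keep user-supplied search terms out of the SQL string, which matters for any web-facing query form.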

FUTURE DEVELOPMENT

To build a DNA methylation database focusing on human diseases, continued efforts will be made to update the DiseaseMeth data and improve the genomic methylation viewer and database functionality. As new DNA methylation data become available, we will continue to collect the latest disease data sets to keep DiseaseMeth up to date. Because of the usefulness of reference methylation maps in human DNA methylation analysis, we will include more methylation maps of normal cell lines/tissues in DiseaseMeth to help with comparative studies of disease-specific and normal methylomes. We will develop new data processing algorithms to handle the large-scale nature of DNA methylation sequencing data. Because of the importance of integrative analysis, we will regularly collect data from new sources to enhance the analytical depth of DiseaseMeth. We will also encourage direct submission of new data to DiseaseMeth to keep it updated and comprehensive. The genomic methylation viewer will be improved to display more (epi)genomic resources and will be extended to include more configurable functionalities. Finally, novel analysis tools will be developed to provide better integration and to enhance the data mining capabilities. As a resource to study the potential regulatory function of DNA methylation, DiseaseMeth can be extended to include more data sets, and tools can be developed for the identification of disease-related DNA methylation markers for candidate genes using an integrated differential methylation identification algorithm (32). We expect that the continuous efforts to use and improve DiseaseMeth will contribute to our understanding of DNA methylation-driven human diseases.

FUNDING

Funding for open access charge: National Natural Science Foundation of China (61075023 and 30971645); Natural Science Foundation of Heilongjiang Province (C201012); State Key Laboratory of Urban Water Resource and Environment (2010TS05) and Scientific Research Fund of Heilongjiang Provincial Education Department (12511272).

Conflict of interest statement. None declared.
  32 in total

Review 1.  Chromatin modifications and their function.

Authors:  Tony Kouzarides
Journal:  Cell       Date:  2007-02-23       Impact factor: 41.582

Review 2.  Phenotypic plasticity and the epigenetics of human disease.

Authors:  Andrew P Feinberg
Journal:  Nature       Date:  2007-05-24       Impact factor: 49.962

3.  MGMT promoter methylation and field defect in sporadic colorectal cancer.

Authors:  Lanlan Shen; Yutaka Kondo; Gary L Rosner; Lianchun Xiao; Natalie Supunpong Hernandez; Jill Vilaythong; P Scott Houlihan; Robert S Krouse; Anil R Prasad; Janine G Einspahr; Julie Buckmeier; David S Alberts; Stanley R Hamilton; Jean-Pierre J Issa
Journal:  J Natl Cancer Inst       Date:  2005-09-21       Impact factor: 13.506

4.  Differential DNA methylation patterns of small B-cell lymphoma subclasses with different clinical behavior.

Authors:  F B Rahmatpanah; S Carstens; J Guo; O Sjahputera; K H Taylor; D Duff; H Shi; J W Davis; S I Hooshmand; R Chitma-Matsiga; C W Caldwell
Journal:  Leukemia       Date:  2006-08-10       Impact factor: 11.528

Review 5.  Cancer epigenomics: DNA methylomes and histone-modification maps.

Authors:  Manel Esteller
Journal:  Nat Rev Genet       Date:  2007-03-06       Impact factor: 53.242

Review 6.  Genome-epigenome interactions in cancer.

Authors:  Romulo M Brena; Joseph F Costello
Journal:  Hum Mol Genet       Date:  2007-04-15       Impact factor: 6.150

7.  MeInfoText: associated gene methylation and cancer information from text mining.

Authors:  Yu-Ching Fang; Hsuan-Cheng Huang; Hsueh-Fen Juan
Journal:  BMC Bioinformatics       Date:  2008-01-14       Impact factor: 3.169

8.  MethCancerDB--aberrant DNA methylation in human cancer.

Authors:  M Lauss; I Visne; A Weinhaeusel; K Vierlinger; C Noehammer; A Kriegner
Journal:  Br J Cancer       Date:  2008-02-05       Impact factor: 7.640

9.  PubMeth: a cancer methylation database combining text-mining and expert annotation.

Authors:  Maté Ongenaert; Leander Van Neste; Tim De Meyer; Gerben Menschaert; Sofie Bekaert; Wim Van Criekinge
Journal:  Nucleic Acids Res       Date:  2007-10-11       Impact factor: 16.971

10.  MethyCancer: the database of human DNA methylation and cancer.

Authors:  Ximiao He; Suhua Chang; Jiajie Zhang; Qian Zhao; Haizhen Xiang; Kanthida Kusonmano; Liu Yang; Zhong Sheng Sun; Huanming Yang; Jing Wang
Journal:  Nucleic Acids Res       Date:  2007-09-21       Impact factor: 16.971
