NGSmethDB: a database for next-generation sequencing single-cytosine-resolution DNA methylation data.

Michael Hackenberg, Guillermo Barturen, José L Oliver.

Abstract

Next-generation sequencing (NGS) together with bisulphite conversion allows the generation of whole genome methylation maps at single-cytosine resolution. This makes it possible to study the absence of methylation in a particular genome region over a range of tissues, differential methylation between tissues, or the changes that occur in pathological conditions. However, no existing database fully addresses these requirements. We propose here NGSmethDB (http://bioinfo2.ugr.es/NGSmethDB/gbrowse/) for the storage and retrieval of methylation data derived from NGS. Two cytosine methylation contexts (CpG and CAG/CTG) are considered. Through a browser interface coupled to a MySQL backend and several data mining tools, the user can search for methylation states in a set of tissues, retrieve methylation values for a set of tissues in a given chromosomal region, or display the methylation of promoters among different tissues. NGSmethDB is currently populated with human, mouse and Arabidopsis data, but other methylomes will be incorporated through an automatic pipeline as soon as new data become available. Dump downloads for three coverage levels (at least 1, 5 or 10 reads) are available. NGSmethDB will be useful for experimental researchers, as well as for bioinformaticians, who might use the data as input for further research.

Year:  2010        PMID: 20965971      PMCID: PMC3013793          DOI: 10.1093/nar/gkq942

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

DNA methylation is a common epigenetic mark that can be found in eukaryotes exclusively at cytosine residues (5meC). This modification has important roles in embryonic development, as shown by early lethality in mice that lack DNA methyltransferases (DNMTs), the inactivation of the X chromosome in female cells or the establishment and maintenance of allele-specific expression of imprinted genes (1–3). Numerous studies over the past decades suggest that cytosine DNA methylation functions to maintain the repressed chromatin state and therefore stably silence promoter activity (4). In animal genomes, the predominantly methylated sequence context is the dinucleotide CpG, while in plants non-CpG methylation also exists and is targeted to transposable elements by a mechanism that depends upon small interfering RNAs (5). Recently, methylation at sequence contexts CHH and CHG has been detected in human undifferentiated cells (6). Many different techniques have been developed for DNA methylation profiling (7,8). The detection methods can be divided into a methylation-dependent pretreatment and an analytical step. The first step is necessary because 5meC is not readily distinguished from unmethylated cytosine by hybridization-based methods, and PCR amplification erases the DNA methylation information. Basically, three different pretreatments can be distinguished: enzyme digestion, affinity enrichment (immunoprecipitation) and sodium bisulphite conversion. The information on the DNA methylation is finally read out by a gel-based, array-based or sequencing-based analysis. Virtually all combinations of these two steps exist. Depending on the specific combination used, we can distinguish between ‘single cytosine’ and ‘region wide’ profiling of methylation states. The region wide methods normally detect the methylation states of known CpG islands or unmethylated fragments using either enzyme digestion or immunoprecipitation. There are several drawbacks with these methods. 
Apart from the errors introduced by the methylation-dependent pretreatment, only ‘mean values’ of the regions can be detected. Although for many experiments it might be sufficient to know whether a given region is methylated or unmethylated, for others it will not be. For example, it has recently been shown that many CpG islands show internal fluctuations that can be resolved by means of single-cytosine resolution analysis (9,10). Furthermore, single-cytosine resolution data can be critical to resolve the methylation states, and the possible functionality, of very small islands (islets) or even orphan CpG dinucleotides (10,11). However, to exploit the full potential of single-base resolution whole genome methylation maps, a specifically designed database is needed. Given the lack of single-base data in the past, current databases focus only on specific regions and/or on pathological situations (12,13). In the coming years, however, whole genome methylation data will become available for many new tissues, pathological conditions and species, and it will be of critical importance to store and unify this information in an adequate way. We therefore propose here NGSmethDB, a database for single-cytosine resolution methylation data. The database uses a web interface based on GBrowse (14) and coupled to a MySQL backend, which allows the methylation data to be visualized in a genomic context together with many other annotations, and also offers full data downloads. In addition, a set of data mining tools is implemented, so the user can filter, analyze and retrieve data in many different ways. For example, the user can search for unmethylated or differentially methylated cytosines in a selected set of tissues, or display and analyze the promoter methylation of RefSeq genes. 
Finally, the database extends the commonly used focus on CpG dinucleotides to the recently discovered non-CpG targets for DNA methylation in undifferentiated tissues (6).

FEATURES AND SCOPE

The NGSmethDB database can be divided into two parts. First, the content can be visualized, together with many other common annotations, by means of a web interface based on GBrowse (14) coupled to a MySQL backend; and second, several user-friendly data mining tools are provided so that users can easily generate their own data sets. Currently, the database holds information on three species (human, mouse and Arabidopsis) and 52 different tissues (21 unique tissues). Furthermore, two different methylation contexts are considered, CpG and CWG, but other non-CpG contexts, such as CAH or CHH, will soon be available. Currently, the database holds methylation data of 696 599 217 cytosines for human (hg18), 69 459 481 cytosines for mouse (mm8) and 16 321 229 cytosines for Arabidopsis (TAIR8). A detailed and updated database statistical table is maintained on-line: http://bioinfo2.ugr.es/gbrowse2/StatGraphs/datasourcesrpt.php. A summary of the publications in which the data were generated is also maintained and updated on-line: http://bioinfo2.ugr.es/gbrowse2/DataSource/datasourcesrpt.php?start=1. We encourage submissions of new methylation data to help populate NGSmethDB and keep it up to date. For most data, the methylation information for the cytosines is directly available for the three mentioned genome assemblies. In these cases, we populate the database with these processed data. For other cases, we used the LiftOver tool (15) to convert the coordinates from other assemblies, or developed scripts to process the raw data (such as FASTQ files) in order to obtain the methylation information for all covered cytosines. All methylation values for both CpG and CWG contexts are calculated taking into account both strands. The assigned methylation value is therefore a weighted mean between the context in the direct and reverse strands. 
That is, the methylation value is the number of reads that indicate methylation (cytosine not converted to uracil/thymine) mapped to the specific position in the ‘+’ strand plus those mapped to the ‘−’ strand, divided by the total number of reads mapped to the position regardless of the strand.
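The strand-pooled calculation described above can be sketched as follows. This is a minimal illustration, not the NGSmethDB code: the function name `methylation_value` and its argument names are hypothetical, and returning `None` for an uncovered position is an assumption.

```python
def methylation_value(meth_plus, total_plus, meth_minus, total_minus):
    """Pooled methylation ratio of one cytosine context over both strands.

    meth_*  : reads indicating methylation (cytosine not converted) per strand
    total_* : all reads mapped to the position per strand
    """
    total = total_plus + total_minus
    if total == 0:
        # Assumption: a position without reads gets no value at all.
        return None
    return (meth_plus + meth_minus) / total


# Made-up read counts: 8 of 10 reads methylated on '+', 1 of 2 on '-'.
value = methylation_value(meth_plus=8, total_plus=10, meth_minus=1, total_minus=2)

# Same number written as a mean of the per-strand ratios weighted by coverage.
weighted = (10 / 12) * (8 / 10) + (2 / 12) * (1 / 2)
```

Pooling the counts is equivalent to averaging the two per-strand ratios with weights proportional to their read coverage, which is why the text calls the result a weighted mean.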

Genomic browser interface

The GBrowse genome viewer (14) connected to a MySQL backend is used to set up a web browser interface for NGSmethDB. Features of the browser include the ability to scroll and zoom through arbitrary regions of a genome, to enter a region of the genome by searching for a landmark or performing a full text search of features, as well as the ability to enable and disable feature tracks and change their relative order and appearance. The user can also upload private annotations to view them in the context of the existing ones at the NGSmethDB web site. Apart from the methylation data, the following related annotations are currently available on the NGSmethDB browser: (i) CpGcluster CpG islands (16); (ii) Takai-Jones CpG islands (17); (iii) RefSeq genes (18); (iv) HMR conserved TFBSs (19); (v) CisRED regulatory elements (20); and (vi) the chromosome sequence (hg18, mm8 and TAIR8 genome assemblies) and G + C content. The methylation information of a given context is represented by the coordinate of the cytosine on the direct strand. To display the methylation values of the cytosines we use a color gradient from white (methylation value = 0, unmethylated in all reads) to red (methylation value = 1, methylated in all reads). To demonstrate the usefulness of the web interface, we analyzed the promoter region of the gene TIAM1 (Figure 1). It can be seen that this promoter is differentially methylated among the different tissues.
Figure 1.

Visualization of methylation states of CpG dinucleotides in different tissues in the NGSmethDB genome browser. The promoter region of the gene TIAM1 (NM_003253) is shown. The different methylation values are displayed by means of a color gradient from white (unmethylated in all reads) toward red (methylated in all reads).
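As a rough illustration of such a white-to-red gradient, the sketch below maps a methylation value to an RGB triple. The linear interpolation and the function name `methylation_to_rgb` are assumptions; the exact colour mapping used by the browser is not specified in the text.

```python
def methylation_to_rgb(value):
    """Map a methylation value in [0, 1] to an (R, G, B) colour.

    Assumption: a linear gradient in which 0 is white and 1 is pure red.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("methylation value must lie between 0 and 1")
    # Red stays saturated; green and blue fade out as methylation increases.
    fade = round(255 * (1.0 - value))
    return (255, fade, fade)


white = methylation_to_rgb(0.0)  # unmethylated in all reads
red = methylation_to_rgb(1.0)    # methylated in all reads
```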


Data mining tools

Currently, five different ways are implemented to retrieve raw data from the database. For all five possibilities, two different sequence contexts and three coverage levels exist. We determined methylation values not only for CpG dinucleotides but also for cytosines in a CWG (CAG or CTG) context. The methylation value at a given position (cytosine) is calculated as explained before, taking both strands into consideration. We stored three different coverage levels in the database: cytosines covered by at least 1, 5 and 10 reads.

Dump download

This option first shows an overview of the current database content, including a short description of the tissue, the genome coverage in %, a link to PubMed, and raw data files for #read ≥ 1, #read ≥ 5 and #read ≥ 10 coverage. The files show the chromosome, the chromosome start and end coordinates, the sequence methylation context (either CpG or CWG), the number of reads and the cytosine methylation ratio.
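A downloaded dump file could be read along the following lines. This is only a sketch: the text lists the columns but not the delimiter, so a tab-separated layout without a header line is assumed, and the record class, function names and example rows are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DumpRecord:
    chrom: str
    start: int
    end: int
    context: str   # 'CpG' or 'CWG'
    reads: int     # number of reads covering the context
    ratio: float   # cytosine methylation ratio in [0, 1]


def parse_dump_line(line):
    """Parse one line; assumes tab-separated columns in the order listed above."""
    chrom, start, end, context, reads, ratio = line.rstrip("\n").split("\t")
    return DumpRecord(chrom, int(start), int(end), context, int(reads), float(ratio))


def filter_by_coverage(records, min_reads):
    """Keep records covered by at least `min_reads` reads (e.g. 1, 5 or 10)."""
    return [r for r in records if r.reads >= min_reads]


# Made-up example rows, not real NGSmethDB data.
example_lines = [
    "chr21\t1000\t1001\tCpG\t12\t0.75",
    "chr21\t1010\t1012\tCWG\t3\t0.0",
]
records = [parse_dump_line(line) for line in example_lines]
covered = filter_by_coverage(records, min_reads=5)
```

Filtering the lowest-coverage file in this way should, in principle, reproduce the stricter coverage levels, although the database already provides them as separate downloads.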

Retrieve unmethylated contexts

This tool can be used to retrieve all unmethylated cytosines in a given set of tissues. The user has to select the sequence context (CpG or CWG), the read coverage, the threshold for unmethylation (often a threshold of 0.2 is used, i.e. all cytosines with values ≤0.2 are considered to be unmethylated) and the tissues. The tool detects all cytosine contexts whose methylation ratio is at or below the chosen threshold in all selected tissues. The output file holds the chromosome, the chromosome start and end coordinates, and the methylation values in all selected tissues. Note that this tool can also be used to retrieve all CpGs that are present in every analyzed tissue by setting the threshold to one. In doing so, cytosines with methylation data in all tissues are reported regardless of their methylation state, whereas cytosines that fall below the chosen coverage level (1, 5 or 10 reads) in any of the analyzed tissues are not reported.
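The selection logic of this tool can be sketched as below. The in-memory layout (one dictionary per tissue, keyed by chromosome and position and already restricted to the chosen coverage level), the function name `unmethylated_in_all` and the example values are all assumptions made for illustration; the actual tool queries the MySQL backend.

```python
def unmethylated_in_all(tissues, threshold=0.2):
    """Return contexts whose value is <= threshold in every selected tissue.

    tissues: {tissue_name: {(chrom, position): methylation_value}}
    Only contexts with data in all tissues can be reported.
    """
    if not tissues:
        return {}
    names = sorted(tissues)
    # Contexts covered in every tissue at the chosen coverage level.
    shared = set.intersection(*(set(tissues[name]) for name in names))
    result = {}
    for site in sorted(shared):
        values = [tissues[name][site] for name in names]
        if all(v <= threshold for v in values):
            result[site] = dict(zip(names, values))
    return result


# Made-up values for two hypothetical tissues.
tissues = {
    "tissueA": {("chr1", 100): 0.0, ("chr1", 200): 0.1, ("chr1", 300): 0.9},
    "tissueB": {("chr1", 100): 0.15, ("chr1", 200): 0.5, ("chr1", 400): 0.0},
}
unmethylated = unmethylated_in_all(tissues, threshold=0.2)
shared_sites = unmethylated_in_all(tissues, threshold=1.0)
```

With the threshold set to one, every context that has data in all tissues passes the test, which corresponds to the use of the tool for listing shared cytosines.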

Retrieve differentially methylated contexts

By means of this tool, all differentially methylated cytosine contexts in a given set of tissues can be determined. All parameters of ‘Retrieve unmethylated contexts’ (see above) are available here, plus one additional parameter: the methylation threshold at or above which a cytosine is considered to be methylated (often a threshold of 0.8 is used, i.e. all cytosines with values ≥0.8 are considered to be methylated). We define a cytosine as differentially methylated if it is unmethylated in at least one tissue and methylated in at least one other tissue. The tool reports only those differentially methylated cytosine contexts that are either methylated or unmethylated in every analyzed tissue, i.e. contexts that show intermediate methylation in even a single tissue are not reported.
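The reporting rule can be expressed compactly by classifying each tissue and comparing the set of observed states. The sketch below assumes the commonly used thresholds of 0.2 and 0.8 as defaults; the function names and state labels are hypothetical.

```python
def classify(value, unmeth_max=0.2, meth_min=0.8):
    """Label one methylation value as unmethylated, methylated or intermediate."""
    if value <= unmeth_max:
        return "U"
    if value >= meth_min:
        return "M"
    return "I"


def is_differentially_methylated(values_by_tissue, unmeth_max=0.2, meth_min=0.8):
    """True if at least one tissue is 'U', at least one is 'M' and none is 'I'."""
    states = {classify(v, unmeth_max, meth_min) for v in values_by_tissue.values()}
    return states == {"U", "M"}


# Made-up values for hypothetical tissues.
reported = is_differentially_methylated({"tissueA": 0.1, "tissueB": 0.9})
not_reported = is_differentially_methylated(
    {"tissueA": 0.1, "tissueB": 0.9, "tissueC": 0.5}
)
```

Comparing against the exact set `{"U", "M"}` enforces both requirements at once: both extreme states must occur, and no tissue may fall into the intermediate range.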

Get methylation states of promoter regions

This tool depicts the methylation states of all cytosine contexts within the promoter region of RefSeq genes. We define the promoter region as beginning 1.5 kb upstream of the Transcription Start Site (TSS) and ending 500 bp downstream of the TSS. The user needs to provide a valid RefSeq name (NM_*) or a unique TAIR gene id (ATxGxxxxx) and the desired coverage. The output is displayed by default as an overview table that summarizes the fluctuation along the promoter as well as over the different tissues. A detailed table can also be generated (Figure 2).
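The promoter definition translates into genomic coordinates as sketched below. The strand handling (upstream lies at higher coordinates for genes on the ‘−’ strand), the 1-based inclusive coordinates and the function name `promoter_window` are assumptions; the text only gives the 1.5 kb and 500 bp distances.

```python
def promoter_window(tss, strand, upstream=1500, downstream=500):
    """Return (start, end) of the promoter region around a TSS.

    Assumes 1-based, inclusive coordinates on the '+' strand of the assembly.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # For '-' strand genes, upstream points towards higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip at the chromosome start.
    return max(start, 1), end


# Made-up TSS positions for illustration.
plus_window = promoter_window(10000, "+")
minus_window = promoter_window(10000, "-")
```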
Figure 2.

Detailed analysis of the promoter region of the gene TIAM1 (NM_003253) by means of the NGSmethDB data mining tools. The table shows the following columns: the coordinate relative to the point TSS − 1.5 kb, the chromosomal coordinate of the cytosine, the number of tissues for which methylation data exist, the number of tissues where the cytosine was found to be unmethylated, the number of tissues where the cytosine was found to be methylated, the names of the tissues where the cytosine is methylated and unmethylated, respectively, the mean methylation value among all tissues, and the minimum and maximum methylation values over all tissues. By means of the color code, green for unmethylation (value ≤0.2) and red for methylation (value ≥0.8), the situation can be rapidly analyzed. For example, if both minimum and maximum values are green for one cytosine position, this means that this cytosine is unmethylated in all analyzed tissues. On the other hand, if the minimum value is green and the maximum value is red, this indicates differential methylation over the different tissues for the given cytosine.
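One row of such a detailed table could be computed from the per-tissue values roughly as follows. The function names, dictionary keys and example values are hypothetical, and leaving intermediate values without a color is an assumption.

```python
def summarize_cytosine(values_by_tissue, unmeth_max=0.2, meth_min=0.8):
    """Summarize one cytosine over all tissues that have methylation data."""
    if not values_by_tissue:
        raise ValueError("at least one tissue value is required")
    values = list(values_by_tissue.values())
    unmeth = sorted(t for t, v in values_by_tissue.items() if v <= unmeth_max)
    meth = sorted(t for t, v in values_by_tissue.items() if v >= meth_min)
    return {
        "n_tissues": len(values),
        "n_unmethylated": len(unmeth),
        "n_methylated": len(meth),
        "unmethylated_tissues": unmeth,
        "methylated_tissues": meth,
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


def cell_color(value, unmeth_max=0.2, meth_min=0.8):
    """Green for unmethylation, red for methylation, None otherwise (assumed)."""
    if value <= unmeth_max:
        return "green"
    if value >= meth_min:
        return "red"
    return None


# Made-up values for three hypothetical tissues.
row = summarize_cytosine({"tissueA": 0.0, "tissueB": 1.0, "tissueC": 0.5})
colors = (cell_color(row["min"]), cell_color(row["max"]))
```

A green minimum combined with a red maximum, as in this made-up row, is the pattern the caption describes as differential methylation across tissues.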


Retrieve methylation data for chromosome region

All methylation values for a selected set of tissues can be retrieved for a given chromosomal region, once the user provides the start and end chromosome coordinates.

CONCLUSIONS

Over the coming years, methylation data for a growing number of tissues, cell types, pathological conditions and species will become available. In most of the original publications, the authors focus on specific questions, so the full potential of the data is rarely exploited. To get more out of these data, a joint analysis with data from other tissues and/or species is needed. To carry out such an analysis, the data must first be stored in an appropriate way in a database. We propose here NGSmethDB, a new database with a very broad scope to facilitate the analysis of methylation data from different sources. Heterogeneous methylation data can be either visualized simultaneously through a web interface or downloaded selectively by means of the provided data mining tools, which allow the user to design new experiments and retrieve exactly the data needed for them. We are therefore confident that the database will be very useful for both experimental and bioinformatics researchers.

FUNDING

The Spanish Government grant (BIO2008-01353 to J.L.O.); ‘Juan de la Cierva’ (to M.H.); Basque Country ‘Programa de formación de investigadores’ grant (to G.B.). Funding for open access charge: The Spanish Government grant (BIO2008-01353 to J.L.O.). Conflict of interest statement. None declared.
  18 in total

Review 1.  Methylation-induced repression--belts, braces, and chromatin.

Authors:  A P Bird; A P Wolffe
Journal:  Cell       Date:  1999-11-24       Impact factor: 41.582

2.  The generic genome browser: a building block for a model organism system database.

Authors:  Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

Review 3.  Structure and function of eukaryotic DNA methyltransferases.

Authors:  Taiping Chen; En Li
Journal:  Curr Top Dev Biol       Date:  2004       Impact factor: 4.897

Review 4.  Gardening the genome: DNA methylation in Arabidopsis thaliana.

Authors:  Simon W-L Chan; Ian R Henderson; Steven E Jacobsen
Journal:  Nat Rev Genet       Date:  2005-05       Impact factor: 53.242

Review 5.  Principles and challenges of genomewide DNA methylation analysis.

Authors:  Peter W Laird
Journal:  Nat Rev Genet       Date:  2010-03       Impact factor: 53.242

6.  Comprehensive analysis of CpG islands in human chromosomes 21 and 22.

Authors:  Daiya Takai; Peter A Jones
Journal:  Proc Natl Acad Sci U S A       Date:  2002-03-12       Impact factor: 11.205

7.  Prediction of CpG-island function: CpG clustering vs. sliding-window methods.

Authors:  Michael Hackenberg; Guillermo Barturen; Pedro Carpena; Pedro L Luque-Escamilla; Christopher Previti; José L Oliver
Journal:  BMC Genomics       Date:  2010-05-26       Impact factor: 3.969

8.  An improved version of the DNA Methylation database (MethDB).

Authors:  Céline Amoreira; Winfried Hindermann; Christoph Grunau
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

9.  High definition profiling of mammalian DNA methylation by array capture and single molecule bisulfite sequencing.

Authors:  Emily Hodges; Andrew D Smith; Jude Kendall; Zhenyu Xuan; Kandasamy Ravi; Michelle Rooks; Michael Q Zhang; Kenny Ye; Arindam Bhattacharjee; Leonardo Brizuela; W Richard McCombie; Michael Wigler; Gregory J Hannon; James B Hicks
Journal:  Genome Res       Date:  2009-07-06       Impact factor: 9.043

Review 10.  The methylome: approaches for global DNA methylation profiling.

Authors:  Stephan Beck; Vardhman K Rakyan
Journal:  Trends Genet       Date:  2008-03-05       Impact factor: 11.639

