Felipe Albrecht, Markus List, Christoph Bock, Thomas Lengauer.
Abstract
Large amounts of epigenomic data are generated under the umbrella of the International Human Epigenome Consortium, which aims to establish 1000 reference epigenomes within the next few years. These data have the potential to unravel the complexity of epigenomic regulation. However, their effective use is hindered by the lack of flexible and easy-to-use methods for data retrieval. Extracting region sets of interest is a cumbersome task that involves several manual steps: identifying the relevant experiments, downloading the corresponding data files and filtering the region sets of interest. Here we present the DeepBlue Epigenomic Data Server, which streamlines epigenomic data analysis as well as software development. DeepBlue provides a comprehensive programmatic interface for finding, selecting, filtering, summarizing and downloading region sets. It contains data from four major epigenome projects, namely ENCODE, ROADMAP, BLUEPRINT and DEEP. DeepBlue comes with a user manual, examples and a well-documented application programming interface (API). The latter is accessed via the XML-RPC protocol supported by many programming languages. To demonstrate usage of the API and to enable convenient data retrieval for non-programmers, we offer an optional web interface. DeepBlue can be openly accessed at http://deepblue.mpi-inf.mpg.de.
Year: 2016 PMID: 27084938 PMCID: PMC4987868 DOI: 10.1093/nar/gkw211
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
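The abstract notes that the API is accessed via XML-RPC. As a minimal sketch of what such a call looks like, the snippet below serializes a request with Python's standard `xmlrpc.client` module without contacting any server. The endpoint path (`/xmlrpc`), the command name `list_genomes` and the key `anonymous_key` are assumptions made for illustration; the DeepBlue manual is the authority on the actual values.

```python
import xmlrpc.client

# Assumed values for illustration only; verify against the DeepBlue manual.
DEEPBLUE_URL = "http://deepblue.mpi-inf.mpg.de/xmlrpc"
USER_KEY = "anonymous_key"


def build_request(method, *params):
    """Serialize an XML-RPC call into the XML payload a client would POST."""
    return xmlrpc.client.dumps(tuple(params), methodname=method)


def make_client(url=DEEPBLUE_URL):
    """Create a lazy proxy; nothing is sent until a method is invoked on it."""
    return xmlrpc.client.ServerProxy(url, allow_none=True)


# Build the payload for a hypothetical "list all registered genomes" command.
payload = build_request("list_genomes", USER_KEY)
print(payload)
```

Because XML-RPC is plain XML over HTTP, the same payload can be produced by client libraries in most other languages, which is the portability the abstract refers to.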
Comparison between web platforms for accessing and processing epigenomic data
| Tool | ENCODE | ROADMAP | BLUEPRINT | DEEP | Text search | Filter by metadata | Filter by regions content | Filter by overlap | Count and summarize | Visualization |
|---|---|---|---|---|---|---|---|---|---|---|
| UCSC GB | ✓ | ✓ | ✓ | |||||||
| Galaxy | ✓ | ✓ | ✓ | ✓ | ||||||
| UCSC TB | ✓ | ✓ | ✓ | |||||||
| ENCODE portal | ✓ | ✓ | ✓ | ✓ | ||||||
| IHEC portal | ✓ | ✓ | ✓ | ✓ | ||||||
| DeepBlue | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
Abbreviations: UCSC GB: UCSC Genome Browser; UCSC TB: UCSC Table Browser.
Figure 1. DeepBlue data model. Experiment and annotation metadata consist of a name, controlled vocabulary terms and extra metadata. The experiment and annotation data form a set of regions. Each region is linked to the corresponding metadata and contains its start and end, as well as additional attributes found in the data files. The regions are stored in their respective genome and chromosome collections.
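The data model in Figure 1 can be pictured with a small in-memory sketch. The class and field names below (`DatasetMetadata`, `Region`, `RegionStore`) are hypothetical and chosen for this illustration; they are not taken from the DeepBlue implementation, which the caption describes only at the conceptual level.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Metadata of an experiment or annotation (hypothetical field names)."""
    name: str
    vocabulary_terms: dict                      # e.g. epigenetic mark, biosource
    extra_metadata: dict = field(default_factory=dict)


@dataclass
class Region:
    """A genomic region linked back to the dataset it came from."""
    start: int
    end: int
    dataset: str                                # name of the owning dataset
    attributes: dict = field(default_factory=dict)


class RegionStore:
    """Keeps regions in one collection per (genome, chromosome) pair."""

    def __init__(self):
        self._metadata = {}
        self._collections = defaultdict(list)

    def register(self, metadata):
        self._metadata[metadata.name] = metadata

    def add_region(self, genome, chromosome, region):
        if region.dataset not in self._metadata:
            raise KeyError(f"unknown dataset: {region.dataset}")
        self._collections[(genome, chromosome)].append(region)

    def regions(self, genome, chromosome):
        return list(self._collections[(genome, chromosome)])

    def metadata_for(self, region):
        return self._metadata[region.dataset]


store = RegionStore()
store.register(DatasetMetadata("exp_A", {"epigenetic_mark": "H3K27ac"}))
store.add_region("hg19", "chr1", Region(100, 200, "exp_A", {"score": 7.5}))
```

Keeping the metadata separate from the regions means a region only carries a reference to its dataset, so a search over metadata never has to touch the region collections.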
DeepBlue commands to list, search, select, operate and retrieve epigenomic data
| Category | Command | Description |
|---|---|---|
| Information | | Obtain information about an entity |
| List and search | | List all registered genomes |
| | | List all registered biosources |
| | | List all registered samples |
| | | List all registered epigenetic marks |
| | | List all available experiments |
| | | List all available annotations |
| | | Perform a full-text search |
| Selection | | Select regions from experiments |
| | | Select regions from experiments |
| | | Select regions from annotations |
| | | Select genes as regions |
| | | Generate tiling regions |
| | | Upload and use a small region set |
| Operation | | Aggregate and summarize regions |
| | | Filter regions using their attributes |
| | | Generate flanking regions |
| | | Filter overlapping regions |
| | | Merge two region sets |
| Result | | Count selected regions |
| | | Request a score matrix |
| | | Request the selected regions |
| Request | | Obtain the requested data |
The typical workflow starts with listing the experiments, followed by data selection. Optionally, the selected data can be processed by counting or retrieving the selected regions. The results can be downloaded as a formatted table or score matrix.
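The workflow described above (list, select, request, download) can be sketched against a stand-in server object. Everything in this block is an assumption made for illustration: the method names, their argument lists, the `(status, value)` return convention, the `"done"` state and the `hg19` genome label are placeholders, not the documented DeepBlue signatures. `StubServer` replaces the real service so the sketch runs offline.

```python
import time


def unwrap(response):
    """Assumed convention: every command returns a (status, value) pair."""
    status, value = response
    if status != "okay":
        raise RuntimeError(f"command failed: {value}")
    return value


def count_peaks(server, user_key, genome, mark, project, poll_seconds=1.0):
    # 1. List the experiments that match the metadata of interest.
    experiments = unwrap(server.list_experiments(genome, mark, project, user_key))
    names = [name for _id, name in experiments]
    # 2. Select their regions; the server answers with a query identifier.
    query_id = unwrap(server.select_regions(names, genome, user_key))
    # 3. Request a result (here: a count) for that query.
    request_id = unwrap(server.count_regions(query_id, user_key))
    # 4. Wait until the request is processed, then download the result.
    while unwrap(server.info(request_id, user_key))["state"] != "done":
        time.sleep(poll_seconds)
    return unwrap(server.get_request_data(request_id, user_key))


class StubServer:
    """In-memory stand-in for the remote service (hypothetical behavior)."""

    def __init__(self):
        self._polls = 0

    def list_experiments(self, genome, mark, project, user_key):
        return "okay", [["e1", "exp_A"], ["e2", "exp_B"]]

    def select_regions(self, names, genome, user_key):
        return "okay", "q1"

    def count_regions(self, query_id, user_key):
        return "okay", "r1"

    def info(self, request_id, user_key):
        self._polls += 1                      # report "running" on first poll
        return "okay", {"state": "done" if self._polls > 1 else "running"}

    def get_request_data(self, request_id, user_key):
        return "okay", {"count": 42}


result = count_peaks(StubServer(), "anonymous_key", "hg19", "H3K27ac",
                     "BLUEPRINT", poll_seconds=0.0)
```

The point of the sketch is the shape of the interaction: selections and operations return identifiers, and data only moves to the client in the final download step.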
Figure 2. Workflow diagram and source code for the identification of H3K27ac peaks that overlap with promoters in any of the BLUEPRINT datasets and subsequent identification of transcription factor peaks that overlap with these promoters in any of the ENCODE datasets. The different colors represent different types of commands, e.g. green for data selection, red for operations, purple for requests and gray for download.
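The overlap steps in Figure 2 run on the server, and the figure's own source code is not reproduced here. As a rough local illustration of what filtering overlapping regions means, the sketch below keeps the regions that overlap at least one target region. It assumes half-open `(chromosome, start, end)` coordinates and uses made-up toy data; the function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate


def build_index(targets):
    """Per chromosome: sorted start positions and running maximum of ends."""
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_ends = list(accumulate((e for _, e in intervals), max))
        index[chrom] = (starts, max_ends)
    return index


def filter_overlapping(regions, targets):
    """Keep the regions that overlap at least one target (half-open)."""
    index = build_index(targets)
    kept = []
    for chrom, start, end in regions:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        n = bisect_left(starts, end)          # targets starting before `end`
        if n and max_ends[n - 1] > start:     # one of them ends after `start`
            kept.append((chrom, start, end))
    return kept


# Toy data, invented for this example.
promoters = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
h3k27ac_peaks = [("chr1", 150, 250), ("chr2", 80, 90)]
tf_peaks = [("chr1", 190, 210), ("chr1", 550, 560)]

# Promoters marked by H3K27ac, then TF peaks falling on those promoters.
active_promoters = filter_overlapping(promoters, h3k27ac_peaks)
tf_hits = filter_overlapping(tf_peaks, active_promoters)
```

With half-open coordinates, regions that merely touch (one ends where the other starts) do not count as overlapping, which is why the `chr2` promoter is dropped in the toy data.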