DeepBlueR: large-scale epigenomic analysis in R.

Felipe Albrecht^1,2, Markus List^1, Christoph Bock^1,3,4, Thomas Lengauer^1.

Abstract

MOTIVATION: While large amounts of epigenomic data are publicly available, their retrieval in a form suitable for downstream analysis is a bottleneck in current research. The DeepBlue Epigenomic Data Server provides a powerful interface and API for filtering, transforming, aggregating and downloading data from several epigenomic consortia.
RESULTS: To make public epigenomic data conveniently available for analysis in R, we developed an R/Bioconductor package that connects to the DeepBlue Epigenomic Data Server, enabling users to quickly gather and transform epigenomic data from selected experiments for analysis in the Bioconductor ecosystem.
AVAILABILITY AND IMPLEMENTATION: http://deepblue.mpi-inf.mpg.de/R . REQUIREMENTS: R 3.3, Bioconductor 3.4. CONTACT: felipe.albrecht@mpi-inf.mpg.de or markus.list@mpi-inf.mpg.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2017 PMID: 28334349 PMCID: PMC5870546 DOI: 10.1093/bioinformatics/btx099

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Epigenomic mapping consortia such as the BLUEPRINT Epigenome Project (Adams et al., 2012), the German Epigenome Programme (DEEP) (http://www.deutsches-epigenom-programm.de), The Encyclopedia of DNA Elements (ENCODE) (The ENCODE Project Consortium, 2004) and the NIH Roadmap Epigenomics Mapping Consortium (ROADMAP) (Kundaje et al., 2015) have made substantial progress in generating epigenomic data. These individual projects cooperate under the International Human Epigenome Consortium (IHEC) (Stunnenberg and Hirst, 2016) with the goal of defining standards for data quality, metadata content and processing pipelines, as well as making processed data available to the scientific community. For the latter, a number of data portals have been developed (Bujold et al., 2016; Fernández et al., 2016) through which relevant experimental data can be downloaded for local analyses.

However, this approach has certain disadvantages. For instance, huge files that span the entire genome need to be downloaded even if only a small portion is needed, e.g. only promoter regions. Moreover, to answer a specific research question, it is usually necessary to transform, filter and aggregate data of various types across many experimental files. Complex operations on these data are not always feasible on a local computer due to resource limitations. To facilitate the analysis of public epigenomic datasets, we previously developed the DeepBlue epigenomic data server (Albrecht et al., 2016), a platform that provides programmatic access to unaltered epigenomic data provided by the aforementioned consortia and to server-side data operations through a web service.

R (R Core Team, 2016) and the Bioconductor ecosystem (Huber et al., 2015) form one of the most popular environments for downstream analysis and visualization of genomic and epigenomic data. Access to epigenomic data from various sources is already possible, for instance, through the AnnotationHub package (http://bioconductor.org/packages/AnnotationHub/). However, a general solution for extracting only relevant subsets of information, as is possible with the DeepBlue server, is currently missing. Here we present an R/Bioconductor package that provides user-friendly access to DeepBlue and streamlines the workflow from data retrieval to downstream analysis.

2 Overview

In DeepBlueR, various commands can be combined in custom workflows operating on epigenomic data on the DeepBlue server. A list of commands available in DeepBlueR is provided in the Supplementary Information. DeepBlueR has been optimized for speed, which included modifications of the Bioconductor XML-RPC package, use of data compression and local caching of results. Upon import, all data is converted into suitable R data structures such as GenomicRanges (Lawrence et al., 2013). In a typical workflow (Fig. 1), a set of regions is selected from various files. The selected regions are subsequently filtered and finally summarized. Each data operation command returns a Query ID that can either serve as input for the following command or be used to trigger the execution of the workflow. In the latter case, a Request ID is returned, which allows the user to check whether the request has completed and to download the results. DeepBlue incorporates commonly used annotations such as GENCODE (Harrow et al., 2012) or the ENSEMBL regulatory build (Zerbino et al., 2015) to simplify the selection of regions of interest.
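The select, filter and summarize workflow with its Query IDs and Request IDs can be illustrated with a small in-memory sketch. It is written in Python purely for illustration and does not use DeepBlueR or contact the DeepBlue server: the `ToyServer` class, its method names, the ID formats and the example regions are all hypothetical. The only point it makes is that each operation returns an identifier for a deferred step, and that work happens when a request is issued.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int
    end: int
    value: float


class ToyServer:
    """Hypothetical in-memory stand-in for a server with deferred workflows."""

    def __init__(self, experiments):
        self._experiments = experiments  # experiment name -> list of Region
        self._queries = {}               # query ID -> deferred computation
        self._requests = {}              # request ID -> computed result
        self._query_numbers = count(1)
        self._request_numbers = count(1)

    def _register(self, computation):
        # Store the deferred step and hand back only its identifier.
        query_id = f"q{next(self._query_numbers)}"
        self._queries[query_id] = computation
        return query_id

    def select_regions(self, names, chrom):
        def run():
            return [r for n in names for r in self._experiments[n]
                    if r.chrom == chrom]
        return self._register(run)

    def filter_regions(self, query_id, min_value):
        parent = self._queries[query_id]
        return self._register(
            lambda: [r for r in parent() if r.value >= min_value])

    def summarize(self, query_id):
        parent = self._queries[query_id]

        def run():
            regions = parent()
            n = len(regions)
            mean = sum(r.value for r in regions) / n if n else None
            return {"count": n, "mean_value": mean}
        return self._register(run)

    def request(self, query_id):
        # Executing the workflow is what turns a query ID into a request ID.
        request_id = f"r{next(self._request_numbers)}"
        self._requests[request_id] = self._queries[query_id]()
        return request_id

    def is_done(self, request_id):
        return request_id in self._requests

    def download(self, request_id):
        return self._requests[request_id]


# Made-up example data, not taken from any consortium.
server = ToyServer({
    "exp_a": [Region("chr1", 100, 200, 0.9), Region("chr2", 50, 80, 0.4)],
    "exp_b": [Region("chr1", 300, 450, 0.2), Region("chr1", 500, 650, 0.7)],
})

selected = server.select_regions(["exp_a", "exp_b"], chrom="chr1")
filtered = server.filter_regions(selected, min_value=0.5)
summary = server.summarize(filtered)

request_id = server.request(summary)
if server.is_done(request_id):
    result = server.download(request_id)
```

In this toy version the request completes immediately; on a real server the request would run asynchronously, which is why a separate completion check precedes the download.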

Fig. 1.

DeepBlueR facilitates combining data operations into a data processing workflow. Each command returns a query ID, and the final data are accessible through the request ID.

3 Conclusion

Public data portals enable researchers to access terabytes of epigenomic data. This creates a strong demand for data analysis in statistical environments such as R, which is not effective on local computers due to the volume of the data. Here we present a Bioconductor package that enables R users to tap directly into the DeepBlue epigenomic data server to operate on large epigenomic datasets. Results are conveniently transformed to R data structures that can be directly used with R/Bioconductor packages for visualization or analysis. Usage examples and documentation can be found in the Supplementary Information, including an example of a genome-wide cluster analysis of DNA methylation across 212 samples from the BLUEPRINT consortium. In the future, we intend to add new functionality as the DeepBlue API evolves. Moreover, we aim to provide better integration with R packages such as TCGAbiolinks (Colaprico et al., 2015) or LOLA (Sheffield and Bock, 2016).

Funding

This work has been supported by the German Federal Ministry of Education and Research grant no. 01KU1216A (DEEP project) and has been performed in the context of EU FP7 grant no. HEALTH-F5-2011-282510 (BLUEPRINT project).

Conflict of Interest: none declared.

References

1. The International Human Epigenome Consortium Data Portal.

Authors: David Bujold; David Anderson de Lima Morais; Carol Gauthier; Catherine Côté; Maxime Caron; Tony Kwan; Kuang Chung Chen; Jonathan Laperle; Alexei Nordell Markovits; Tomi Pastinen; Bryan Caron; Alain Veilleux; Pierre-Étienne Jacques; Guillaume Bourque
Journal: Cell Syst Date: 2016-11-15 Impact factor: 10.304

2. Orchestrating high-throughput genomic analysis with Bioconductor.

Authors: Wolfgang Huber; Vincent J Carey; Robert Gentleman; Simon Anders; Marc Carlson; Benilton S Carvalho; Hector Corrada Bravo; Sean Davis; Laurent Gatto; Thomas Girke; Raphael Gottardo; Florian Hahne; Kasper D Hansen; Rafael A Irizarry; Michael Lawrence; Michael I Love; James MacDonald; Valerie Obenchain; Andrzej K Oleś; Hervé Pagès; Alejandro Reyes; Paul Shannon; Gordon K Smyth; Dan Tenenbaum; Levi Waldron; Martin Morgan
Journal: Nat Methods Date: 2015-02 Impact factor: 28.547

3. The BLUEPRINT Data Analysis Portal.

Authors: José María Fernández; Victor de la Torre; David Richardson; Romina Royo; Montserrat Puiggròs; Valentí Moncunill; Stamatina Fragkogianni; Laura Clarke; Paul Flicek; Daniel Rico; David Torrents; Enrique Carrillo de Santa Pau; Alfonso Valencia
Journal: Cell Syst Date: 2016-11-15 Impact factor: 10.304

4. The International Human Epigenome Consortium: A Blueprint for Scientific Collaboration and Discovery.

Authors: Hendrik G Stunnenberg; Martin Hirst
Journal: Cell Date: 2016-11-17 Impact factor: 41.582

5. BLUEPRINT to decode the epigenetic signature written in blood.

Authors: David Adams; Lucia Altucci; Stylianos E Antonarakis; Juan Ballesteros; Stephan Beck; Adrian Bird; Christoph Bock; Bernhard Boehm; Elias Campo; Andrea Caricasole; Fredrik Dahl; Emmanouil T Dermitzakis; Tariq Enver; Manel Esteller; Xavier Estivill; Anne Ferguson-Smith; Jude Fitzgibbon; Paul Flicek; Claudia Giehl; Thomas Graf; Frank Grosveld; Roderic Guigo; Ivo Gut; Kristian Helin; Jonas Jarvius; Ralf Küppers; Hans Lehrach; Thomas Lengauer; Åke Lernmark; David Leslie; Markus Loeffler; Elizabeth Macintyre; Antonello Mai; Joost H A Martens; Saverio Minucci; Willem H Ouwehand; Pier Giuseppe Pelicci; Hèléne Pendeville; Bo Porse; Vardhman Rakyan; Wolf Reik; Martin Schrappe; Dirk Schübeler; Martin Seifert; Reiner Siebert; David Simmons; Nicole Soranzo; Salvatore Spicuglia; Michael Stratton; Hendrik G Stunnenberg; Amos Tanay; David Torrents; Alfonso Valencia; Edo Vellenga; Martin Vingron; Jörn Walter; Spike Willcocks
Journal: Nat Biotechnol Date: 2012-03-07 Impact factor: 54.908

6. GENCODE: the reference human genome annotation for The ENCODE Project.

Authors: Jennifer Harrow; Adam Frankish; Jose M Gonzalez; Electra Tapanari; Mark Diekhans; Felix Kokocinski; Bronwen L Aken; Daniel Barrell; Amonida Zadissa; Stephen Searle; If Barnes; Alexandra Bignell; Veronika Boychenko; Toby Hunt; Mike Kay; Gaurab Mukherjee; Jeena Rajan; Gloria Despacio-Reyes; Gary Saunders; Charles Steward; Rachel Harte; Michael Lin; Cédric Howald; Andrea Tanzer; Thomas Derrien; Jacqueline Chrast; Nathalie Walters; Suganthi Balasubramanian; Baikang Pei; Michael Tress; Jose Manuel Rodriguez; Iakes Ezkurdia; Jeltje van Baren; Michael Brent; David Haussler; Manolis Kellis; Alfonso Valencia; Alexandre Reymond; Mark Gerstein; Roderic Guigó; Tim J Hubbard
Journal: Genome Res Date: 2012-09 Impact factor: 9.043

7. The Ensembl regulatory build.

Authors: Daniel R Zerbino; Steven P Wilder; Nathan Johnson; Thomas Juettemann; Paul R Flicek
Journal: Genome Biol Date: 2015-03-24 Impact factor: 13.583

8. DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets.

Authors: Felipe Albrecht; Markus List; Christoph Bock; Thomas Lengauer
Journal: Nucleic Acids Res Date: 2016-04-15 Impact factor: 16.971

9. Integrative analysis of 111 reference human epigenomes.

Authors: Anshul Kundaje; Wouter Meuleman; Jason Ernst; Misha Bilenky; Angela Yen; Alireza Heravi-Moussavi; Pouya Kheradpour; Zhizhuo Zhang; Jianrong Wang; Michael J Ziller; Viren Amin; John W Whitaker; Matthew D Schultz; Lucas D Ward; Abhishek Sarkar; Gerald Quon; Richard S Sandstrom; Matthew L Eaton; Yi-Chieh Wu; Andreas R Pfenning; Xinchen Wang; Melina Claussnitzer; Yaping Liu; Cristian Coarfa; R Alan Harris; Noam Shoresh; Charles B Epstein; Elizabeta Gjoneska; Danny Leung; Wei Xie; R David Hawkins; Ryan Lister; Chibo Hong; Philippe Gascard; Andrew J Mungall; Richard Moore; Eric Chuah; Angela Tam; Theresa K Canfield; R Scott Hansen; Rajinder Kaul; Peter J Sabo; Mukul S Bansal; Annaick Carles; Jesse R Dixon; Kai-How Farh; Soheil Feizi; Rosa Karlic; Ah-Ram Kim; Ashwinikumar Kulkarni; Daofeng Li; Rebecca Lowdon; GiNell Elliott; Tim R Mercer; Shane J Neph; Vitor Onuchic; Paz Polak; Nisha Rajagopal; Pradipta Ray; Richard C Sallari; Kyle T Siebenthall; Nicholas A Sinnott-Armstrong; Michael Stevens; Robert E Thurman; Jie Wu; Bo Zhang; Xin Zhou; Arthur E Beaudet; Laurie A Boyer; Philip L De Jager; Peggy J Farnham; Susan J Fisher; David Haussler; Steven J M Jones; Wei Li; Marco A Marra; Michael T McManus; Shamil Sunyaev; James A Thomson; Thea D Tlsty; Li-Huei Tsai; Wei Wang; Robert A Waterland; Michael Q Zhang; Lisa H Chadwick; Bradley E Bernstein; Joseph F Costello; Joseph R Ecker; Martin Hirst; Alexander Meissner; Aleksandar Milosavljevic; Bing Ren; John A Stamatoyannopoulos; Ting Wang; Manolis Kellis
Journal: Nature Date: 2015-02-19 Impact factor: 69.504

10. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data.

Authors: Antonio Colaprico; Tiago C Silva; Catharina Olsen; Luciano Garofano; Claudia Cava; Davide Garolini; Thais S Sabedot; Tathiane M Malta; Stefano M Pagnotta; Isabella Castiglioni; Michele Ceccarelli; Gianluca Bontempi; Houtan Noushmehr
Journal: Nucleic Acids Res Date: 2015-12-23 Impact factor: 16.971
