Yoann Pageaud (1), Christoph Plass (1), Yassen Assenov (1,2)
(1) Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
(2) German Centre for Cardiovascular Research (DZHK), Partner Site Heidelberg/Mannheim, 69120 Heidelberg, Germany
Abstract
Motivation: Deciphering relevant biological insights from epigenomic data can be a challenging task. One commonly used approach is to perform enrichment analysis. However, finding, downloading and using the publicly available functional annotations require time, programming skills and IT infrastructure. Here we describe the online tool EpiAnnotator for performing enrichment analyses on epigenomic data in a fast and user-friendly way.
Results: EpiAnnotator is an R package accompanied by a web interface. It contains regularly updated annotations from 4 public databases: Blueprint, RoadMap, GENCODE and the UCSC Genome Browser. Annotations are hosted locally or in a server environment and automatically updated by scripts of our own design. Thousands of tracks are available, reflecting data on a variety of tissues, cell types and cell lines from the human and mouse genomes. Users need to upload sets of selected and background regions. Results are displayed in customizable and easily interpretable figures.
Availability and implementation: The R package and Shiny app are open source and available under the GPL v3 license. EpiAnnotator's web interface is accessible at http://computational-epigenomics.com/en/epiannotator.
Contact: epiannotator@computational-epigenomics.com.
1 Introduction
Interpretation of large epigenomic datasets is usually context-dependent and tied to a genome assembly of interest. Unravelling relevant biological insights from such datasets can be a burdensome and time-consuming task. One common approach to overcome some of these difficulties is to perform enrichment analysis. The recent increase in the use of new technologies designed for profiling epigenetic marks gives access to large amounts of methylation data from different repositories: ENCODE (ENCODE Project Consortium, 2004), the UCSC Genome Browser (Karolchik et al., 2003), the International Human Epigenome Consortium (Bujold et al., 2016), Roadmap Epigenomics (Bernstein et al., 2010), etc. (Fig. 1A). However, the proliferation of sources for genomic and epigenomic datasets can complicate both the analysis and the interpretation of results.
Fig. 1. EpiAnnotator workflow example for an enrichment analysis performed with user’s selected and background genomic regions and annotations from EpiAnnotator’s databanks
2 Implementation
To address these potential difficulties, we developed the EpiAnnotator web service as an all-encompassing enrichment analysis tool that centralizes annotations from large web resources and updates them regularly (Fig. 1B). Thousands of annotations are accessible to researchers, enabling them to conveniently conduct comparative enrichment analyses and rapidly generate their own results in the form of comprehensive, publication-quality figures. EpiAnnotator builds upon an extensive set of bioinformatics tools dedicated to enrichment analysis of genetic and epigenetic data. The R package LOLA (Sheffield and Bock, 2016) provides enrichment analysis but lacks visualization. The widely used DAVID service (Huang et al., 2009) focuses on genes only and is not integrated with epigenomic repositories. DeepBlue (Albrecht ) provides a programmatic interface for accessing such repositories but does not perform enrichment analysis. The Genomation toolkit (Akalin ) and the EpiExplorer web service (Halachev et al., 2012) are suitable for summarization and visualization of genomic intervals but lack functionalities for enrichment analysis. Moreover, most of the tools described above require programming expertise from their users, whereas EpiAnnotator provides a user-friendly web interface built as a Shiny app. Its functionalities are implemented in a back-end R package, which uses the robust IRanges package from Bioconductor (Lawrence et al., 2013) to optimize the computation of region overlap.
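The region-overlap computation mentioned above can be illustrated with a small, self-contained sketch. It is written in Python with the standard library only and is not EpiAnnotator's implementation, which relies on IRanges in R. The function names `build_index`, `overlaps` and `count_overlapping` are hypothetical, and regions are assumed to be BED-style half-open intervals given as `(chrom, start, end)` tuples. The idea is to merge the annotation regions once per chromosome, so that each query region can be tested with a single binary search.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(annotation):
    """Merge annotation regions per chromosome into sorted, disjoint intervals.

    `annotation` is an iterable of (chrom, start, end) tuples using
    BED-style half-open coordinates (assumption of this sketch).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in annotation:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts, ends = [], []
        for start, end in intervals:
            if ends and start <= ends[-1]:
                # Overlapping or adjacent: extend the current merged interval.
                ends[-1] = max(ends[-1], end)
            else:
                starts.append(start)
                ends.append(end)
        index[chrom] = (starts, ends)
    return index


def overlaps(index, region):
    """Return True if `region` overlaps at least one annotation interval."""
    chrom, start, end = region
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Merged intervals are disjoint, so `ends` is sorted as well.
    # Find the first merged interval that ends after the query start.
    i = bisect_right(ends, start)
    return i < len(starts) and starts[i] < end


def count_overlapping(regions, index):
    """Count how many regions overlap the indexed annotation."""
    return sum(overlaps(index, region) for region in regions)


annotation = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
index = build_index(annotation)
selected = [("chr1", 50, 101), ("chr2", 60, 70), ("chr1", 299, 305)]
n_overlapping = count_overlapping(selected, index)
```

Building the index costs one sort per chromosome, after which each query takes logarithmic time. This is the kind of saving that makes it practical to test one set of uploaded regions against many annotation tracks.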
3 Workflow
An EpiAnnotator enrichment analysis typically requires three sources of data: a BED file containing a set of selected genomic regions of interest, another BED file containing a set of background genomic regions (Fig. 1C), and annotations, i.e. reference sets of genomic regions. Both the selected and background genomic regions are uploaded by the user (Fig. 1D). Annotations are selected from the EpiAnnotator interface after specifying the databank to be used (Fig. 1E). In addition to the default collection of annotations, we provide access to the LOLA core and extended databases. Conveniently, users who focus on studies using the HumanMethylation450 or MethylationEPIC assay can upload sets of probe identifiers instead of their targeted CpG sites. Annotations are analyzed for overlap with the uploaded regions (Fig. 1F). Two options are available to the user: an enrichment analysis using Fisher’s exact test (Fig. 1G) or an overview of the data. The result of an enrichment analysis is a table listing the number of overlapping regions, as well as fold changes and P-values. EpiAnnotator provides multiple visualizations through easily interpretable plots. As an example, Figure 1H shows a summary plot displaying the results of the enrichment analysis performed with selected and background genomic regions from Taylor and annotation tracks from Taberlay et al. (2014). The border and size of the circles denote the significance level of the overlap; the degree of enrichment or depletion is represented by the fill color. EpiAnnotator’s interface has been designed to be compatible with both computer screen and smartphone displays.
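To make the enrichment step concrete, the following sketch derives a P-value and a fold change from overlap counts for a single annotation. It is an illustration under stated assumptions rather than EpiAnnotator's code. The function name `enrichment` is hypothetical, the selected regions are assumed to be a subset of the background, the P-value is the one-sided (enrichment) tail of Fisher's exact test, and the fold change is taken to be the ratio of overlap fractions in the selected and background sets. The source does not specify which test sidedness or fold-change definition the tool uses.

```python
from fractions import Fraction
from math import comb


def enrichment(n_selected, k_selected, n_background, k_background):
    """One-sided Fisher's exact test for enrichment of an annotation.

    n_selected   -- number of selected regions (assumed subset of background)
    k_selected   -- selected regions overlapping the annotation
    n_background -- number of background regions
    k_background -- background regions overlapping the annotation
    """
    valid = (
        0 <= k_selected <= n_selected <= n_background
        and k_selected <= k_background <= n_background
        and k_background - k_selected <= n_background - n_selected
    )
    if not valid:
        raise ValueError("counts are inconsistent with selected being a subset of background")

    # Hypergeometric tail: probability of observing at least k_selected
    # overlapping regions when drawing n_selected regions from the background.
    total = comb(n_background, n_selected)
    upper = min(n_selected, k_background)
    tail = sum(
        comb(k_background, x) * comb(n_background - k_background, n_selected - x)
        for x in range(k_selected, upper + 1)
    )
    p_value = float(Fraction(tail, total))

    # Fold change as the ratio of overlap fractions (assumed definition).
    if n_selected == 0 or k_background == 0:
        fold_change = float("nan")
    else:
        fold_change = float(
            Fraction(k_selected * n_background, n_selected * k_background)
        )

    return {
        "overlap_selected": k_selected,
        "overlap_background": k_background,
        "fold_change": fold_change,
        "p_value": p_value,
    }


# Example: 4 of 5 selected regions overlap the annotation,
# compared with 5 of 20 background regions.
result = enrichment(n_selected=5, k_selected=4, n_background=20, k_background=5)
```

Exact integer arithmetic keeps the tail sum free of rounding error for moderate counts. With very large region sets, a log-space implementation or a statistics library would be the more practical choice.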
4 Conclusion
A key element that allows EpiAnnotator to reduce the otherwise long computation time to a few seconds is the use of pre-computed distances for the reference sets of genomic regions hosted in EpiAnnotator’s databanks. The databases are updated every two months to provide users with the latest annotations. Using EpiAnnotator does not require any coding skills; it gives access to thousands of annotations through a web interface and provides enrichment analysis results along with high-quality figures.
Authors: D Karolchik; R Baertsch; M Diekhans; T S Furey; A Hinrichs; Y T Lu; K M Roskin; M Schwartz; C W Sugnet; D J Thomas; R J Weber; D Haussler; W J Kent Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971
Authors: David Bujold; David Anderson de Lima Morais; Carol Gauthier; Catherine Côté; Maxime Caron; Tony Kwan; Kuang Chung Chen; Jonathan Laperle; Alexei Nordell Markovits; Tomi Pastinen; Bryan Caron; Alain Veilleux; Pierre-Étienne Jacques; Guillaume Bourque Journal: Cell Syst Date: 2016-11-15 Impact factor: 10.304
Authors: Bradley E Bernstein; John A Stamatoyannopoulos; Joseph F Costello; Bing Ren; Aleksandar Milosavljevic; Alexander Meissner; Manolis Kellis; Marco A Marra; Arthur L Beaudet; Joseph R Ecker; Peggy J Farnham; Martin Hirst; Eric S Lander; Tarjei S Mikkelsen; James A Thomson Journal: Nat Biotechnol Date: 2010-10 Impact factor: 54.908
Authors: Michael Lawrence; Wolfgang Huber; Hervé Pagès; Patrick Aboyoun; Marc Carlson; Robert Gentleman; Martin T Morgan; Vincent J Carey Journal: PLoS Comput Biol Date: 2013-08-08 Impact factor: 4.475
Authors: Konstantin Halachev; Hannah Bast; Felipe Albrecht; Thomas Lengauer; Christoph Bock Journal: Genome Biol Date: 2012-10-03 Impact factor: 13.583
Authors: Phillippa C Taberlay; Aaron L Statham; Theresa K Kelly; Susan J Clark; Peter A Jones Journal: Genome Res Date: 2014-06-10 Impact factor: 9.043
Authors: Clarissa Gerhauser; Francesco Favero; Thomas Risch; Ronald Simon; Lars Feuerbach; Yassen Assenov; Doreen Heckmann; Nikos Sidiropoulos; Sebastian M Waszak; Daniel Hübschmann; Alfonso Urbanucci; Etsehiwot G Girma; Vladimir Kuryshev; Leszek J Klimczak; Natalie Saini; Adrian M Stütz; Dieter Weichenhan; Lisa-Marie Böttcher; Reka Toth; Josephine D Hendriksen; Christina Koop; Pavlo Lutsik; Sören Matzk; Hans-Jörg Warnatz; Vyacheslav Amstislavskiy; Clarissa Feuerstein; Benjamin Raeder; Olga Bogatyrova; Eva-Maria Schmitz; Claudia Hube-Magg; Martina Kluth; Hartwig Huland; Markus Graefen; Chris Lawerenz; Gervaise H Henry; Takafumi N Yamaguchi; Alicia Malewska; Jan Meiners; Daniela Schilling; Eva Reisinger; Roland Eils; Matthias Schlesner; Douglas W Strand; Robert G Bristow; Paul C Boutros; Christof von Kalle; Dmitry Gordenin; Holger Sültmann; Benedikt Brors; Guido Sauter; Christoph Plass; Marie-Laure Yaspo; Jan O Korbel; Thorsten Schlomm; Joachim Weischenfeldt Journal: Cancer Cell Date: 2018-12-10 Impact factor: 31.743