ALTRE: workflow for defining ALTered Regulatory Elements using chromatin accessibility data.

Elizabeth Baskin, Rick Farouni, Ewy A Mathé.   

Abstract

Summary: Regulatory elements regulate gene transcription, and their location and accessibility are cell-type specific, particularly for enhancers. Mapping and comparing chromatin accessibility between different cell types may identify mechanisms involved in cellular development and disease progression. To streamline and simplify differential analysis of regulatory elements genome-wide using chromatin accessibility data, such as DNase-seq and ATAC-seq, we developed ALTRE (ALTered Regulatory Elements), an R package and associated R Shiny web app. ALTRE makes such analysis accessible to a wide range of users, from novice to practiced computational biologists.

Availability and Implementation: https://github.com/Mathelab/ALTRE.

Contact: ewy.mathe@osumc.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.

Year:  2017        PMID: 28011773      PMCID: PMC5408819          DOI: 10.1093/bioinformatics/btw688

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Assays that measure chromatin accessibility genome-wide, such as FAIRE-seq (Giresi), DNase-seq (Crawford; John; Thurman), and ATAC-seq (Buenrostro), enable global mapping of regulatory elements (REs), including promoters and enhancers. Organization of these REs, particularly enhancers, is cell-type specific (Kieffer-Kwon; Rendeiro; Stergachis) and is a strong determinant of disease mutational landscapes, including those of cancer (Polak). Thus, identifying REs that differ in accessibility between cell types, such as cancerous and non-cancerous cell lines and tissues, holds promise for pinpointing mechanisms involved in disease progression. Furthermore, REs that control disease-related genes and pathways can be investigated as putative therapeutic targets, or may even be such targets themselves (Heinz; Lam).

To the best of our knowledge, no comprehensive and user-friendly workflow for downstream analysis of chromatin accessibility data is available. Downstream analysis here means taking chromatin accessibility alignment and peak data through to interpretable results, namely REs and pathways of interest, yet there are no standardized approaches or guidelines for doing so. Typically, individual data analysis pipelines must be created from scratch in-house, which makes reproducible, shareable data analysis difficult. ALTRE provides a workflow with which users can identify altered REs between two different cell types or conditions, and includes a Shiny (RStudio shiny: Easy web applications in R. 2014) web interface for users who are less fluent in the R statistical language.

2 Implementation

2.1 Data preparation and set-up

As is typical for high-throughput sequencing, chromatin accessibility data are delivered as FASTQ files. Quality control, alignment and peak calling of the FASTQ reads, described in detail elsewhere (Baek; Boyle; Jalili; Rashid; Zhang), must be performed before using ALTRE. To start the ALTRE workflow, users need to generate a comma-separated values (CSV) file with four columns describing each sample to be analyzed: (1) name of the alignment (BAM) file; (2) name of the peak (BED) file; (3) sample name; (4) replicate number. All files should be placed in the same folder, and the software will detect their location when reading in the CSV. A minimum of 2 replicates per sample is required to run the workflow. ALTRE requires R (≥3.2.0) to be installed.
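As an illustration of the input described above, the following R sketch builds such a sample sheet for two samples with two replicates each and checks the replicate requirement before writing the file. The column headers (bamfiles, peakfiles, sample, replicate) and all file names are placeholders chosen for this example, not names confirmed by the text; the exact header ALTRE expects should be taken from the package vignette.

```r
# Hypothetical sample sheet: column headers and file names are placeholders,
# not ALTRE's required header (see the package vignette for the real one).
samples <- data.frame(
  bamfiles  = c("A549_rep1.bam", "A549_rep2.bam",
                "SAEC_rep1.bam", "SAEC_rep2.bam"),
  peakfiles = c("A549_rep1.bed", "A549_rep2.bed",
                "SAEC_rep1.bed", "SAEC_rep2.bed"),
  sample    = c("A549", "A549", "SAEC", "SAEC"),
  replicate = c(1, 2, 1, 2),
  stringsAsFactors = FALSE
)

# The workflow requires at least 2 replicates per sample, so check first.
reps_per_sample <- table(samples$sample)
stopifnot(all(reps_per_sample >= 2))

# Write the sheet to the folder that holds the BAM and BED files
# (a temporary folder stands in for it here).
csv_path <- file.path(tempdir(), "samples.csv")
write.csv(samples, csv_path, row.names = FALSE)

# Read it back to confirm the layout: four columns, one row per replicate.
check <- read.csv(csv_path, stringsAsFactors = FALSE)
print(check)
```

Keeping the replicate check next to the code that writes the file means that a sheet violating the two-replicate minimum is never produced in the first place.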

2.2 General aspects and design

ALTRE was designed to be user-friendly and to streamline differential analysis of REs genome-wide. The steps of the workflow analysis are delineated in Figure 1 and include loading data, defining consensus peaks (found in multiple replicates), annotating (e.g. Transcription Start Site (TSS)-distal and TSS-proximal) and optionally merging peaks, identifying significantly altered REs based on quantitative data using DESeq2 (Love), creating tracks for visualizing categorized REs in a genome browser, comparing altered REs with those defined based on binary (peak present/absent) data only, and finally, defining pathways that are enriched in cell-type- or condition-specific or shared REs using GREAT (Gu, Z. rGREAT: Client for GREAT Analysis. R package version 1.4.2. 2016; McLean).
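To make the notion of a consensus peak concrete, the sketch below keeps only the genomic stretches covered by peaks from at least min_reps replicates on a single chromosome. It is a simplified coverage sweep in base R written for illustration; it is not ALTRE's implementation, and the function name consensus_regions, the min_reps parameter and the toy coordinates are all hypothetical. It assumes half-open BED-style intervals and that peaks within one replicate do not overlap each other.

```r
# Hypothetical helper (not part of ALTRE): regions covered by peaks from at
# least `min_reps` replicates. Assumes one chromosome, half-open [start, end)
# intervals, and no overlapping peaks within a single replicate.
consensus_regions <- function(replicates, min_reps = 2) {
  starts <- unlist(lapply(replicates, function(p) p$start), use.names = FALSE)
  ends   <- unlist(lapply(replicates, function(p) p$end), use.names = FALSE)

  # Sweep events: +1 where a peak opens, -1 where a peak closes.
  pos   <- c(starts, ends)
  delta <- c(rep(1, length(starts)), rep(-1, length(ends)))

  # Sort by position; on ties, closings (-1) come before openings (+1), so
  # peaks that merely touch are not counted as overlapping.
  ord   <- order(pos, delta)
  pos   <- pos[ord]
  depth <- cumsum(delta[ord])  # number of open peaks after each event

  # Each run of consecutive events with enough coverage forms one region.
  runs      <- rle(depth >= min_reps)
  run_end   <- cumsum(runs$lengths)
  run_start <- run_end - runs$lengths + 1
  keep      <- which(runs$values)

  # A region opens at the first event of its run and closes at the event
  # right after the run, where coverage drops below `min_reps`.
  data.frame(start = pos[run_start[keep]], end = pos[run_end[keep] + 1])
}

# Toy peak calls for three replicates of one sample.
rep1 <- data.frame(start = c(100, 300), end = c(200, 400))
rep2 <- data.frame(start = c(150, 500), end = c(250, 600))
rep3 <- data.frame(start = c(180, 350), end = c(220, 380))

consensus <- consensus_regions(list(rep1, rep2, rep3), min_reps = 2)
print(consensus)
```

With min_reps = 2 the toy data yield two regions, 150 to 220 and 350 to 380; the peak at 500 to 600 is dropped because only one replicate supports it. Raising min_reps makes the consensus set stricter, which is the trade-off the replicate threshold in the workflow controls.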
Fig. 1. Snapshot of ALTRE Shiny web application showing workflow steps

ALTRE’s embedded Shiny app takes alignment files (BAM format) and hotspot/peak files (BED format) as input. The workflow guides users through the steps described above and delineated in Figure 1. At each step, users can define thresholds, such as number of replicate samples required to define a peak as consensus, and fold changes and p-value cutoffs for definition of cell type specific or shared REs. Users can then quickly retrieve summary statistics and visualization plots (heatmaps, barplots) to ensure the appropriateness of their parameters. For ease of use, default options are provided at each step for guidance.

Of note, while tools for differential binding and annotation of sequencing data exist (Bailey; Chabbert; Ross-Innes; Yu; Zhu, 2013; Zhu; Stark and Brown, ‘DiffBind: differential binding analysis of ChIP-Seq peak data’ 2011), ALTRE supports peak merging and annotation, differential analysis and pathway enrichment analysis in one streamlined tool.
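The role of the fold-change and P-value cutoffs can be illustrated with a small base-R sketch that labels each RE from a table of differential accessibility results. The column names log2FoldChange and padj follow the DESeq2 results table; the function name classify_res, the category labels, the rule used to call a region shared, and the default cutoffs are illustrative assumptions rather than ALTRE's actual defaults.

```r
# Hypothetical classifier (not ALTRE's implementation). Assumes a positive
# log2 fold change means higher accessibility in the experimental sample.
# All cutoffs are placeholder values on the log2 fold change and adjusted P.
classify_res <- function(res, lfc_specific = 1.5, lfc_shared = 1.2,
                         p_specific = 0.01, p_shared = 0.05) {
  lfc  <- res$log2FoldChange
  padj <- res$padj

  # Anything that meets neither rule below stays "ambiguous".
  category <- rep("ambiguous", nrow(res))

  # Specific: large change in either direction with a small adjusted P-value.
  # which() drops rows where lfc or padj is NA.
  category[which(padj <= p_specific & lfc >=  lfc_specific)] <- "experiment-specific"
  category[which(padj <= p_specific & lfc <= -lfc_specific)] <- "reference-specific"

  # Shared: small change and no evidence of a difference.
  category[which(padj > p_shared & abs(lfc) <= lfc_shared)] <- "shared"

  res$category <- category
  res
}

# Toy results table with made-up values.
toy <- data.frame(
  region         = c("RE1", "RE2", "RE3", "RE4"),
  log2FoldChange = c(2.4, -3.1, 0.2, 1.8),
  padj           = c(0.001, 0.0004, 0.60, 0.03),
  stringsAsFactors = FALSE
)

out <- classify_res(toy)
print(table(out$category))
```

Tightening p_specific or raising lfc_specific moves regions from the specific categories into the ambiguous one, which is the kind of effect the summary tables and plots in the app let users inspect as they adjust their parameters.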

3 Results and discussion

Users can install ALTRE with the function install_github() from the devtools R package (Wickham, H. and Chang, W. 2016. devtools: Tools to Make Developing R Packages Easier). Full installation instructions are found at https://github.com/Mathelab/ALTRE. Users can then run the workflow either in the R console or by launching the embedded web application by typing runShinyApp() in the R console. A detailed vignette (https://mathelab.github.io/ALTRE/vignette.html) walks users through an example workflow analysis step-by-step.

A sample dataset is provided on GitHub and can be accessed at https://mathelab.github.io/ALTREsampledata/. This sample dataset includes ENCODE data for a cancerous lung cell line (A549) and an associated non-cancerous lung cell line (SAEC). On a machine with 16 GB memory and a 2.5 GHz Intel Core i7 processor, the workflow takes ∼334 s to complete for the example dataset using all chromosomes.

For real-time analysis of results, the ALTRE Shiny app enables users to change their parameters and directly visualize the effect of those changes through summary statistics tables and plots. For example, users can readily visualize the number of REs that are sample-type specific or shared based on their input fold change and adjusted P-value thresholds through a volcano plot and an associated statistics table. In addition, processed data can be saved after key steps in the analysis, and all plots can be modified (e.g. colors) and saved as high-resolution images.

With the increasing interest in researching REs to better understand transcriptional regulation and diseases, and improvements in techniques to assess these regions (Buenrostro), chromatin accessibility data are increasingly being generated. With this in mind, ALTRE provides a user-friendly workflow that guides the analysis and interpretation of these data.
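The installation and launch steps described above amount to a few lines at the R console. The sketch below uses only the calls named in the text (install_github() from devtools and runShinyApp()); the repository slug Mathelab/ALTRE is read off the GitHub URL, and any additional dependencies the package may need are covered by the full installation instructions at that URL rather than here.

```r
# ALTRE requires R >= 3.2.0; stop early on older installations.
stopifnot(getRversion() >= "3.2.0")

# Install devtools from CRAN if it is not available yet.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install ALTRE from its GitHub repository (requires network access).
devtools::install_github("Mathelab/ALTRE")

# Load the package and launch the embedded Shiny web application.
library(ALTRE)
runShinyApp()
```

Users who prefer the console over the web interface can stop after library(ALTRE) and follow the vignette instead.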

Funding

This work was supported by The Ohio State University and the Translational Data Analytics Initiative.

Conflict of Interest: none declared.
References (24 in total)

1.  Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS).

Authors:  Gregory E Crawford; Ingeborg E Holt; James Whittle; Bryn D Webb; Denise Tai; Sean Davis; Elliott H Margulies; YiDong Chen; John A Bernat; David Ginsburg; Daixing Zhou; Shujun Luo; Thomas J Vasicek; Mark J Daly; Tyra G Wolfsberg; Francis S Collins
Journal:  Genome Res       Date:  2005-12-12       Impact factor: 9.043

2.  Integrative analysis of ChIP-chip and ChIP-seq dataset.

Authors:  Lihua Julie Zhu
Journal:  Methods Mol Biol       Date:  2013

3.  FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin.

Authors:  Paul G Giresi; Jonghwan Kim; Ryan M McDaniell; Vishwanath R Iyer; Jason D Lieb
Journal:  Genome Res       Date:  2006-12-19       Impact factor: 9.043

4.  Quantitative analysis of genome-wide chromatin remodeling.

Authors:  Songjoon Baek; Myong-Hee Sung; Gordon L Hager
Journal:  Methods Mol Biol       Date:  2012

5.  Genome-scale mapping of DNase I hypersensitivity.

Authors:  Sam John; Peter J Sabo; Theresa K Canfield; Kristen Lee; Shinny Vong; Molly Weaver; Hao Wang; Jeff Vierstra; Alex P Reynolds; Robert E Thurman; John A Stamatoyannopoulos
Journal:  Curr Protoc Mol Biol       Date:  2013-07

6.  F-Seq: a feature density estimator for high-throughput sequence tags.

Authors:  Alan P Boyle; Justin Guinney; Gregory E Crawford; Terrence S Furey
Journal:  Bioinformatics       Date:  2008-09-10       Impact factor: 6.937

7.  ZINBA integrates local covariates with DNA-seq data to identify broad and narrow regions of enrichment, even within amplified genomic regions.

Authors:  Naim U Rashid; Paul G Giresi; Joseph G Ibrahim; Wei Sun; Jason D Lieb
Journal:  Genome Biol       Date:  2011-07-25       Impact factor: 13.583

8.  Cell-of-origin chromatin organization shapes the mutational landscape of cancer.

Authors:  Paz Polak; Rosa Karlić; Amnon Koren; Robert Thurman; Richard Sandstrom; Michael Lawrence; Alex Reynolds; Eric Rynes; Kristian Vlahoviček; John A Stamatoyannopoulos; Shamil R Sunyaev
Journal:  Nature       Date:  2015-02-19       Impact factor: 49.962

9.  The accessible chromatin landscape of the human genome.

Authors:  Robert E Thurman; Eric Rynes; Richard Humbert; Jeff Vierstra; Matthew T Maurano; Eric Haugen; Nathan C Sheffield; Andrew B Stergachis; Hao Wang; Benjamin Vernot; Kavita Garg; Sam John; Richard Sandstrom; Daniel Bates; Lisa Boatman; Theresa K Canfield; Morgan Diegel; Douglas Dunn; Abigail K Ebersol; Tristan Frum; Erika Giste; Audra K Johnson; Ericka M Johnson; Tanya Kutyavin; Bryan Lajoie; Bum-Kyu Lee; Kristen Lee; Darin London; Dimitra Lotakis; Shane Neph; Fidencio Neri; Eric D Nguyen; Hongzhu Qu; Alex P Reynolds; Vaughn Roach; Alexias Safi; Minerva E Sanchez; Amartya Sanyal; Anthony Shafer; Jeremy M Simon; Lingyun Song; Shinny Vong; Molly Weaver; Yongqi Yan; Zhancheng Zhang; Zhuzhu Zhang; Boris Lenhard; Muneesh Tewari; Michael O Dorschner; R Scott Hansen; Patrick A Navas; George Stamatoyannopoulos; Vishwanath R Iyer; Jason D Lieb; Shamil R Sunyaev; Joshua M Akey; Peter J Sabo; Rajinder Kaul; Terrence S Furey; Job Dekker; Gregory E Crawford; John A Stamatoyannopoulos
Journal:  Nature       Date:  2012-09-06       Impact factor: 49.962

10.  Rev-Erbs repress macrophage gene expression by inhibiting enhancer-directed transcription.

Authors:  Michael T Y Lam; Han Cho; Hanna P Lesch; David Gosselin; Sven Heinz; Yumiko Tanaka-Oishi; Christopher Benner; Minna U Kaikkonen; Aneeza S Kim; Mika Kosaka; Cindy Y Lee; Andy Watt; Tamar R Grossman; Michael G Rosenfeld; Ronald M Evans; Christopher K Glass
Journal:  Nature       Date:  2013-06-02       Impact factor: 49.962

Cited by (2 in total)

1.  Altered regulation of DPF3, a member of the SWI/SNF complexes, underlies the 14q24 renal cancer susceptibility locus.

Authors:  Leandro M Colli; Lea Jessop; Timothy A Myers; Sabrina Y Camp; Mitchell J Machiela; Jiyeon Choi; Renato Cunha; Olusegun Onabajo; Grace C Mills; Virginia Schmid; Seth A Brodie; Olivier Delattre; David R Mole; Mark P Purdue; Kai Yu; Kevin M Brown; Stephen J Chanock
Journal:  Am J Hum Genet       Date:  2021-08-13       Impact factor: 11.025

Review 2.  Bibliometric review of ATAC-Seq and its application in gene expression.

Authors:  Liheng Luo; Michael Gribskov; Sufang Wang
Journal:  Brief Bioinform       Date:  2022-05-13       Impact factor: 13.994
