esATAC: an easy-to-use systematic pipeline for ATAC-seq data analysis.

Zheng Wei, Wei Zhang, Huan Fang, Yanda Li, Xiaowo Wang.

Abstract

Summary: ATAC-seq is rapidly emerging as one of the major experimental approaches to probe chromatin accessibility genome-wide. Here, we present 'esATAC', a highly integrated, easy-to-use R/Bioconductor package for systematic ATAC-seq data analysis. It covers the essential steps of the full analysis procedure, including raw data processing, quality control and downstream statistical analysis such as peak calling, enrichment analysis and transcription factor footprinting. esATAC supports one-command execution of preset pipelines and provides flexible interfaces for building customized pipelines.

Availability and implementation: The esATAC package is open source under the GPL-3.0 license. It is implemented in R and C++. Source code and binaries for Linux, Mac OS X and Windows are available through Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/esATAC.html).

Supplementary information: Supplementary data are available at Bioinformatics online.

Year:  2018        PMID: 29522192      PMCID: PMC6061683          DOI: 10.1093/bioinformatics/bty141

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) is a sensitive method to probe chromatin accessibility genome-wide (Buenrostro et al.). Library preparation is fast, easy to perform and requires only a small amount of biological sample. These advantages have made ATAC-seq a popular way to study open chromatin, nucleosome positioning and transcription factor (TF) footprinting in cell lines and primary tissues, and a growing number of laboratories use it. In contrast to the simple experiment, ATAC-seq data analysis can take much more time and effort. Highly integrated cross-platform software for processing ATAC-seq data is still lacking. Researchers need to set up their own local pipelines from multiple tools, each of which provides only part of the data analysis workflow. Installing those tools from diverse sources, learning their manuals, testing their functions and integrating them is tedious and time-consuming. To fill this gap, we developed an easy-to-use R/Bioconductor package named ‘esATAC’. esATAC systematically integrates state-of-the-art software for the full ATAC-seq data analysis procedure, covering raw data processing, downstream statistical analysis and multiple quality control (QC) functions. For ease of use, esATAC provides preset pipelines that can be executed with one command in the R/Bioconductor environment on different platforms. Advanced users can easily create customized pipelines through flexible interfaces in esATAC. Multi-core and memory control mechanisms have been implemented to optimize hardware utilization.

2 Design and implementation

The flowchart of esATAC is shown in Figure 1a.
Fig. 1.

(a) esATAC workflow. The esATAC pipeline is divided into two main parts: raw data processing and statistical analysis. QC functions are provided at multiple levels, including sequencing QC, library QC and functional annotation QC. (b) and (c) Examples of analyzing ATAC-seq data (GEO accession number GSE47753, see Supplementary Material). (b) CTCF footprinting. (c) Fragment length distribution. The fast Fourier transform in the upper right corner shows a periodicity of approximately 200 base pairs (bp) for nucleosome protection and 10.4 bp for the pitch of the DNA helix.

2.1 Data analysis workflow

The workflow is divided into two main parts: raw data processing and statistical analysis. In the raw data processing part, esATAC directly handles ATAC-seq raw data in FASTQ format. It wraps AdapterRemoval (Schubert et al.) for adapter trimming and Bowtie2 (Langmead and Salzberg) for read alignment. esATAC then sorts the mapped reads, removes duplicates, shifts reads to account for Tn5 insertion (Buenrostro et al.) and generates an intensity profile in BigWig format for genome browser visualization. In the statistical analysis part, esATAC provides a comprehensive analysis procedure for mapped ATAC-seq reads. It identifies open chromatin peak regions using F-seq (Boyle et al.), which is designed for genome-wide profiling of open chromatin regions and offers high sensitivity (Koohy et al.). The peaks are annotated and related gene ontology terms are reported (see Supplementary Material). esATAC integrates known TF motifs from the JASPAR database (Mathelier et al.) to find potential TF binding sites in the peak regions and to generate TF footprinting plots (Fig. 1b).
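The read-shifting step can be illustrated with a minimal Python sketch. It is not esATAC code. It assumes the commonly used offsets from the ATAC-seq protocol of Buenrostro et al. (+4 bp for reads on the plus strand, -5 bp for reads on the minus strand) and 0-based, half-open coordinates; the `Read` type and the `shift_for_tn5` function are hypothetical names introduced only for this example.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal representation of an aligned read."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


# Offsets assumed from the ATAC-seq protocol (Buenrostro et al.).
PLUS_OFFSET = 4
MINUS_OFFSET = -5


def shift_for_tn5(read: Read) -> Read:
    """Offset an aligned read to account for the Tn5 insertion position."""
    if read.strand == "+":
        offset = PLUS_OFFSET
    elif read.strand == "-":
        offset = MINUS_OFFSET
    else:
        raise ValueError(f"unknown strand: {read.strand!r}")
    # Clamp the start so a read near the chromosome start never gets a
    # negative coordinate.
    return read._replace(start=max(0, read.start + offset),
                         end=read.end + offset)


shifted_plus = shift_for_tn5(Read("chr1", 100, 150, "+"))
shifted_minus = shift_for_tn5(Read("chr1", 100, 150, "-"))
```

A production pipeline would apply the same shift to alignment records in a BAM or BED file rather than to in-memory tuples.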

2.2 Quality control

esATAC provides QC functions at multiple levels. A quality report for the raw sequencing reads is generated (Gaidatzis et al.). esATAC performs fragment length QC analysis, based on the observation that a typical ATAC-seq fragment length distribution shows a clear periodicity caused by nucleosome protection and the pitch of the DNA helix (Fig. 1c). Other QC methods adopted by the ENCODE consortium have been integrated (see Supplementary Material), and concordance between replicates can be reported.
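The idea behind the fragment length periodicity check can be sketched in Python as follows. This is an illustrative sketch, not the esATAC implementation: it uses a naive discrete Fourier transform from the standard library, and the histogram is synthetic, built with a 200 bp component and a 10 bp component (10 bp rather than 10.4 bp so that the period falls exactly on the transform grid). The function name `dominant_period` is hypothetical.

```python
import cmath
import math


def dominant_period(counts, min_period, max_period):
    """Return the period (in bp) with the largest DFT magnitude
    among periods within [min_period, max_period]."""
    n = len(counts)
    mean = sum(counts) / n
    centred = [c - mean for c in counts]  # remove the constant component
    best_period, best_mag = None, -1.0
    for k in range(1, n // 2 + 1):
        period = n / k
        if not (min_period <= period <= max_period):
            continue
        # Naive DFT coefficient at frequency index k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        if abs(coeff) > best_mag:
            best_period, best_mag = period, abs(coeff)
    return best_period


# Synthetic fragment length histogram for lengths 0..999 bp (assumed data).
counts = [100
          + 40 * math.cos(2 * math.pi * x / 200)   # nucleosome-scale component
          + 10 * math.cos(2 * math.pi * x / 10)    # helix-pitch-scale component
          for x in range(1000)]

nucleosome_period = dominant_period(counts, 100, 400)
helix_period = dominant_period(counts, 5, 20)
```

Real fragment length distributions also decay with length, so a practical analysis would typically detrend the histogram before looking for spectral peaks.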

2.3 Implementation

For user convenience, we provide preset pipelines for analyzing a single sample and case-control paired samples for human and mouse. Users only need to provide the raw sequencing files and can execute the entire pipeline with one command in R. Dependent data, such as annotation files and the Bowtie2 index, can be downloaded and built automatically. An HTML summary report covering comprehensive QC and statistical analysis is generated. The package is managed as a dataflow graph, so users can easily understand and trace the pipeline processing modules (see Supplementary Material). Mechanisms in esATAC such as input validity checking allow sophisticated users to customize the pipeline or integrate other tools at any intermediate stage. esATAC provides memory control and parallel computing options to maximize computing efficiency. Breakpoint detection has been implemented so that users do not have to redo finished steps if the program is interrupted.
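The breakpoint detection idea, skipping steps that already finished when a run is restarted, can be sketched in Python. How esATAC implements this is not described here, so the following is only a generic checkpointing sketch under stated assumptions: each step writes its result to a JSON file, and a step is skipped when that file already exists. The function `run_pipeline` and the toy steps are hypothetical.

```python
import json
import os
import tempfile


def run_pipeline(steps, workdir):
    """Run steps in order, skipping any whose checkpoint file exists.

    steps: list of (name, function) pairs; each function receives the
    previous step's result and returns a JSON-serialisable result.
    Returns (final_result, names_of_steps_actually_executed).
    """
    executed = []
    result = None
    for name, func in steps:
        checkpoint = os.path.join(workdir, name + ".json")
        if os.path.exists(checkpoint):
            # Finished in an earlier run: reload the stored result.
            with open(checkpoint) as fh:
                result = json.load(fh)
            continue
        result = func(result)
        # Write to a temporary file first, then rename, so an interrupted
        # write never leaves a checkpoint that looks complete.
        tmp = checkpoint + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(result, fh)
        os.replace(tmp, checkpoint)
        executed.append(name)
    return result, executed


# Toy stand-ins for real processing steps (hypothetical).
steps = [
    ("trim", lambda _: ["ACGT", "GGCC"]),
    ("align", lambda reads: [len(r) for r in reads]),
    ("count", lambda lengths: sum(lengths)),
]

with tempfile.TemporaryDirectory() as workdir:
    first_run = run_pipeline(steps, workdir)   # executes every step
    second_run = run_pipeline(steps, workdir)  # reloads checkpoints only
```

In this sketch the second call executes nothing, because every step finds its checkpoint. Deleting one checkpoint file would cause only that step to be recomputed.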

3 Conclusion

We developed esATAC to make ATAC-seq data analysis easy for a wide range of users. esATAC covers the whole procedure for ATAC-seq data processing. It can be installed on different platforms and performs ‘one command line for result’ analysis. Users without sophisticated programming skills can get started easily. At the same time, all sub-functions are modular, which makes esATAC a flexible platform for advanced users to build pipelines for specialized applications.

Funding

This work was supported by the National Science Foundation of China [grant nos. 31371341, 61773230 and 61721003], Tsinghua University Initiative Scientific Research Program [no. 20141081175] and the Open Research Fund of State Key Laboratory of Bioelectronics, Southeast University.

Conflict of Interest: none declared.
References

1.  Ben Langmead; Steven L Salzberg. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012-03-04.

2.  Jason D Buenrostro; Paul G Giresi; Lisa C Zaba; Howard Y Chang; William J Greenleaf. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013-10-06.

3.  Alan P Boyle; Justin Guinney; Gregory E Crawford; Terrence S Furey. F-Seq: a feature density estimator for high-throughput sequence tags. Bioinformatics. 2008-09-10.

4.  Dimos Gaidatzis; Anita Lerch; Florian Hahne; Michael B Stadler. QuasR: quantification and annotation of short reads in R. Bioinformatics. 2014-11-21.

5.  Mikkel Schubert; Stinus Lindgreen; Ludovic Orlando. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res Notes. 2016-02-12.

6.  Hashem Koohy; Thomas A Down; Mikhail Spivakov; Tim Hubbard. A comparison of peak callers used for DNase-Seq data. PLoS One. 2014-05-08.

7.  Anthony Mathelier; Oriol Fornes; David J Arenillas; Chih-Yu Chen; Grégoire Denay; Jessica Lee; Wenqiang Shi; Casper Shyr; Ge Tan; Rebecca Worsley-Hunt; Allen W Zhang; François Parcy; Boris Lenhard; Albin Sandelin; Wyeth W Wasserman. JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2015-11-03.
