Jianhong Ou, Haibo Liu, Jun Yu, Michelle A Kelliher, Lucio H Castilla, Nathan D Lawson, Lihua Julie Zhu.
Abstract
BACKGROUND: ATAC-seq (Assays for Transposase-Accessible Chromatin using sequencing) is a recently developed technique for genome-wide analysis of chromatin accessibility. Compared to earlier methods for assaying chromatin accessibility, ATAC-seq is faster and easier to perform, does not require cross-linking, has a higher signal-to-noise ratio, and can be performed on small numbers of cells. However, to ensure a successful ATAC-seq experiment, step-by-step quality assurance processes, including both wet lab quality control and in silico quality assessment, are essential. While several tools have been developed or adopted for assessing read quality and for identifying nucleosome occupancy and accessible regions from ATAC-seq data, none of them provides a comprehensive set of functionalities for preprocessing and quality assessment of aligned ATAC-seq datasets.
Keywords: ATAC-seq; ATACseqQC; Chromatin accessibility; Quality control
Year: 2018 PMID: 29490630 PMCID: PMC5831847 DOI: 10.1186/s12864-018-4559-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Functions implemented in the ATACseqQC package
| Function Name | Usage Description |
|---|---|
| | Read BAM files into R leveraging Rsamtools and create a GAlignments object |
| | Perform quality assessment on alignments and filter BAM files to remove duplicates, mitochondrial reads and low-quality or discordant alignments |
| | Plot size distribution of sequenced fragments in ATAC-seq libraries |
| | Streamline the visualization of read distribution along genomic regions of interest, such as those containing housekeeping genes |
| | Shift 5′ end of aligned reads in GAlignments object |
| | Split the shifted BAM files based on ranges of fragment sizes into nucleosome-free, mono-, di-, tri-nucleosome bins and so on |
| | Shift 5′ end of aligned reads and split the updated BAM files in one step |
| | Export lists of GAlignments objects back into BAM files |
| | Get enrichment signals for nucleosome-free and nucleosome-bound signals |
| | Calculate the maximal similarity score for each given sequence against a PWM of a TF binding motif |
| | Discover and visualize footprints of a given transcription factor |
| | Estimate library complexity, available for version 1.3.12 or later |
| | Plot saturation curves based on the total number or width of significant peaks detected for a series of subsamples, available for version 1.3.12 or later |
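
Two of the operations listed above, shifting the 5′ ends of aligned reads and binning fragments by size, can be sketched in a few lines of Python. This is a minimal illustration, not the package's implementation. The +4/−5 bp offsets follow the common ATAC-seq convention for locating Tn5 insertion sites and are an assumption here, since the table does not state them. The bin edges and labels are likewise illustrative values, and the names `shift_five_prime` and `classify_fragment` are hypothetical.

```python
from bisect import bisect_left

# Assumed offsets (common ATAC-seq convention, not stated in the table):
# reads on the + strand move 4 bp downstream, reads on the - strand 5 bp upstream.
PLUS_SHIFT = 4
MINUS_SHIFT = -5

# Illustrative (hypothetical) fragment-length bin edges in bp; upper edges are inclusive.
BIN_EDGES = [100, 180, 247, 315, 473, 558, 615]
BIN_LABELS = [
    "nucleosome_free", "inter1", "mononucleosome", "inter2",
    "dinucleosome", "inter3", "trinucleosome", "other",
]


def shift_five_prime(position: int, strand: str) -> int:
    """Return the shifted coordinate of a read's 5' end."""
    if strand == "+":
        return position + PLUS_SHIFT
    if strand == "-":
        return position + MINUS_SHIFT
    raise ValueError(f"unknown strand: {strand!r}")


def classify_fragment(length: int) -> str:
    """Assign a fragment length (bp) to a size bin."""
    if length <= 0:
        raise ValueError("fragment length must be positive")
    # bisect_left finds the first edge >= length, i.e. the right-closed bin.
    return BIN_LABELS[bisect_left(BIN_EDGES, length)]


if __name__ == "__main__":
    for pos, strand in [(100, "+"), (200, "-")]:
        print(pos, strand, "->", shift_five_prime(pos, strand))
    for length in (80, 200, 400, 600, 1000):
        print(length, "bp ->", classify_fragment(length))
```

Keeping the offsets and bin edges as module-level constants makes it easy to swap in the values actually used for a given analysis.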
ATAC-seq datasets used for the ATACseqQC case studies. The four datasets chosen for detailed quality control are highlighted in bold
| SRA Run Accession | Condition | Comment | Study Description | Reference |
|---|---|---|---|---|
| SRR891269 | EBV-transformed lymphoblastoid cell line | GM12878, 50 k cells | Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position | [ |
| **SRR891270** | EBV-transformed lymphoblastoid cell line | GM12878, 50 k cells | ||
| SRR891271 | EBV-transformed lymphoblastoid cell line | GM12878, 50 k cells | ||
| SRR891272 | EBV-transformed lymphoblastoid cell line | GM12878, 500 cells | ||
| SRR891274 | EBV-transformed lymphoblastoid cell line | GM12878, 500 cells | ||
| SRR891275 | CD4+ T-cells purified using negative selection | CD4+ T cells, day 1 | ||
| SRR891276 | CD4+ T-cells purified using negative selection | CD4+ T cells, day 1 | ||
| SRR891277 | CD4+ T-cells purified using negative selection | CD4+ T cells, day 2 | ||
| SRR891278 | CD4+ T-cells purified using negative selection | CD4+ T cells, day 2 | ||
| **SRR3295017** | Uninfected | HFF_uninfected | Toxoplasma gondii remodels the cis-regulatory landscape of infected human host cells | [ |
| SRR3295018 | HFF cells, uninfected | HFF_uninfected | ||
| SRR3295019 | HFF cells, uninfected | HFF_uninfected | ||
| SRR3295020 | HFF cells infected with | HFF_infected | ||
| SRR3295021 | HFF cells infected with | HFF_infected | ||
| SRR3295022 | HFF cells infected with | HFF_infected | ||
| **SRR5720369** | J-Lat A72 cells treated with DMSO | Replicate 1 | The Short Isoform of BRD4 Promotes HIV-1 Latency by Engaging Repressive SWI/SNF Chromatin Remodeling Complexes | [ |
| SRR5720370 | J-Lat A72 cells treated with JQ1 | Replicate 1 | ||
| SRR5720371 | J-Lat A72 cells treated with DMSO | Replicate 2 | ||
| SRR5720372 | J-Lat A72 cells treated with JQ1 | Replicate 2 | ||
| SRR5800797 | Breast cancer cell line T47D, multiH1sh Control | Replicate 1, 75 k cells | Analysis of the DNA accessibility upon knocking-down multiple histone H1 variants by ATAC-seq | Vallés AJ and Izquierd-Bouldstridge A., unpublished |
| SRR5800798 | Breast cancer cell line T47D, multiH1sh Control | Replicate 2, 75 k cells | ||
| SRR5800799 | Breast cancer cell line T47D, multiH1sh Dox | Replicate 1, 75 k cells | ||
| SRR5800800 | Breast cancer cell line T47D, multiH1sh Dox | Replicate 2, 75 k cells | ||
| SRR5800801 | Breast cancer cell line T47D, RDsh control | Replicate 1, 75 k cells | ||
| **SRR5800802** | Breast cancer cell line T47D, RDsh Dox Control | Replicate 1, 75 k cells |
Fig. 1 Diagnostic plots for four representative ATAC-seq datasets: SRR891270, SRR3295017, SRR5720369 and SRR5800802. (a, d, g and j) Size distributions of sequenced fragments from reads that passed the filtering criteria for each library. (b, e, h and k) Heatmaps showing the distributions of signals around transcription start sites (TSSs), resulting from inferred nucleosome-free fragments and nucleosome-bound (mono-, di- and tri-nucleosome) fragments. To plot TSS-associated signals arising from nucleosome-bound fragments, fragments associated with di- and tri-nucleosomes were split into two and three sub-reads in silico, respectively. (c, f, i and l) Smoothed histograms of the signals shown in b, e, h and k. The sample corresponding to SRR891270 was optimally transposed by Tn5 preloaded with sequencing adapters, while the sample resulting in SRR5800802 was over-transposed. The other two datasets resulted from sub-optimal transposition. Biased size selection could have occurred during library preparation for SRR5720369. Shown here are signals around TSSs on human chromosomes 1 and 2.
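
The in silico splitting of di- and tri-nucleosome fragments into two and three sub-reads can be illustrated with a short Python sketch. The caption does not say how the split points are chosen, so equal-width splitting is an assumption here, and `split_fragment` is a hypothetical helper rather than a function from the package.

```python
from typing import List, Tuple


def split_fragment(start: int, end: int, n_parts: int) -> List[Tuple[int, int]]:
    """Split the half-open interval [start, end) into n_parts contiguous sub-reads.

    Assumption: sub-reads are of (near-)equal width; any remainder from the
    integer division ends up in the later sub-reads.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    if end <= start:
        raise ValueError("end must be greater than start")
    length = end - start
    # Boundaries are computed with integer arithmetic so sub-reads tile the fragment exactly.
    bounds = [start + (length * i) // n_parts for i in range(n_parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # A di-nucleosome-sized fragment becomes two sub-reads,
    # a tri-nucleosome-sized fragment becomes three.
    print(split_fragment(1000, 1400, 2))
    print(split_fragment(0, 600, 3))
```

Each sub-read can then be treated like a mono-nucleosome fragment when signals around TSSs are aggregated.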
Fig. 2 Read distribution along genomic regions containing housekeeping genes for the optimal (SRR891270), near optimal (SRR3295017 and SRR5720369) and over-transposed (SRR5800802) ATAC-seq libraries. (a) C1orf43; (b) CHMP2A; (c) EMC7; (d) GPI; and (e) PSMB2
Fig. 3 CTCF footprints inferred from the four representative ATAC-seq datasets: SRR891270 (a), SRR3295017 (b), SRR5720369 (c) and SRR5800802 (d). Shown here are aggregated CTCF footprints on human chromosomes 1 and 2
Fig. 4 Sequencing depth analysis and library complexity evaluation. Note that it is not meaningful to perform saturation analysis of sequencing depth or library complexity for over-transposed ATAC-seq assays. (a) Total peak number-based saturation analysis of sequencing depth for SRR891270. Sequenced fragments in the filtered BAM file (called effective fragments here) are subsampled to get 10%, 20%, 30%, …, 80% and 90% of the total effective fragments. Broad peaks were called for each subsample and for the full dataset using MACS2. The numbers of significant peaks (FDR ≤ 0.05) are plotted against the corresponding numbers of effective fragments. A smooth curve is fitted by loess smoothing using the ggplot2 package. The gray band shows the 95% confidence interval of the predicted peak numbers. (b) Total peak width-based saturation analysis of sequencing depth for SRR891270. The same procedure is used to fit the saturation curve, except that the total width of significant peaks (FDR ≤ 0.05) for each subsample and the full dataset is used. (c) Library complexity analysis results for SRR891269-SRR891271, three biological replicates using 50 k cells, and for SRR891272 and SRR891274, two biological replicates using 500 cells. The number of distinct fragments was estimated for each given number of putative sequenced fragments free of mitochondrial reads.
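
The subsampling scheme described for panels a and b can be sketched as follows. This is a simplified illustration under stated assumptions, not the published workflow: peak calling with MACS2 is replaced by a user-supplied `metric` callback, the example metric (the count of distinct fragments) is only a stand-in, and `saturation_curve` is a hypothetical name.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Fragment = Tuple[int, int]  # (start, end) of one sequenced fragment


def saturation_curve(
    fragments: Sequence[Fragment],
    metric: Callable[[List[Fragment]], float],
    fractions: Optional[Sequence[float]] = None,
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Subsample fragments at each fraction and evaluate a metric on each subsample.

    Returns (number of fragments, metric value) pairs, the raw points
    through which a saturation curve would be fitted.
    """
    if fractions is None:
        # 10%, 20%, ..., 90% plus the full dataset, as in the caption.
        fractions = [i / 10 for i in range(1, 11)]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pool = list(fragments)
    points = []
    for fraction in fractions:
        n = round(len(pool) * fraction)
        subsample = rng.sample(pool, n)  # sampling without replacement
        points.append((n, metric(subsample)))
    return points


if __name__ == "__main__":
    # Toy data: 100 fragments drawn from 40 distinct coordinates (so duplicates exist).
    toy = [(i % 40, i % 40 + 150) for i in range(100)]
    # Stand-in metric: number of distinct fragments in the subsample.
    for n, value in saturation_curve(toy, lambda fs: len(set(fs))):
        print(n, value)
```

In a real analysis the callback would wrap peak calling and return either the number or the total width of significant peaks. A curve that flattens as the number of fragments grows suggests that additional sequencing would add little.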