
Reducing mitochondrial reads in ATAC-seq using CRISPR/Cas9.

Lindsey Montefiori1, Liana Hernandez1, Zijie Zhang1, Yoav Gilad1, Carole Ober1, Gregory Crawford2, Marcelo Nobrega3, Noboru Jo Sakabe4.   

Abstract

ATAC-seq is a high-throughput sequencing technique that identifies open chromatin. Depending on the cell type, ATAC-seq samples may contain ~20-80% of mitochondrial sequencing reads. As the regions of open chromatin of interest are usually located in the nuclear genome, mitochondrial reads are typically discarded from the analysis. We tested two approaches to decrease wasted sequencing in ATAC-seq libraries generated from lymphoblastoid cell lines: targeted cleavage of mitochondrial DNA fragments using CRISPR technology and removal of detergent from the cell lysis buffer. We analyzed the effects of these treatments on the number of usable (unique, non-mitochondrial) reads and the number and quality of peaks called, including peaks identified in enhancers and transcription start sites. Both treatments resulted in considerable reduction of mitochondrial reads (1.7 and 3-fold, respectively). The removal of detergent, however, resulted in increased background and fewer peaks. The highest number of peaks and highest quality data was obtained by preparing samples with the original ATAC-seq protocol (using detergent) and treating them with CRISPR. This strategy reduced the amount of sequencing required to call a high number of peaks, which could lead to cost reduction when performing ATAC-seq on large numbers of samples and in cell types that contain a large amount of mitochondria.

Year:  2017        PMID: 28550296      PMCID: PMC5446398          DOI: 10.1038/s41598-017-02547-w

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

ATAC-seq aims at identifying DNA sequences located in open chromatin, i.e., genomic regions whose chromatin is not densely packaged and that can be more easily accessed by proteins than closed chromatin. The ATAC-seq technique makes use of the Tn5 transposase, an optimized hyperactive transposase that fragments and tags the genome with sequencing adapters in regions of open chromatin[1]. The output of the experiment is millions of DNA fragments that can be sequenced and mapped to the genome of origin for identification of regions where sequencing reads concentrate and form “peaks”. While ATAC-seq often generates high-quality data with low background, certain cell types and tissues yield an enormous fraction (typically 20–80%) of unusable sequences of mitochondrial origin. In order to reduce the amount of wasted sequencing reads, targeted cleavage of DNA fragments has recently been used to deplete mitochondrial ribosomal RNA-derived fragments in RNA-sequencing libraries[2]. In another study, Wu et al. targeted the mitochondrial genome in ATAC-seq experiments using 114 guide RNAs (gRNAs) and observed a ~50% decrease in mitochondrial reads and no adverse modification of the read enrichment pattern[3]. To analyze the effect of this approach on the quality of the data, we designed 100 gRNAs targeting the human mitochondrial chromosome every ~250 base pairs and treated lymphoblastoid cell line (LCL) ATAC-seq sequencing libraries with these gRNAs and Cas9 enzyme[4], hereafter referred to as anti-mt CRISPR. We compared this method to a modified ATAC-seq protocol that also aims at reducing mitochondrial reads by removing detergent from the cell lysis step, which is believed to prevent lysis of the mitochondrial membrane[5]. We observed that while both methods considerably reduced the number of mitochondrial reads sequenced, each method displayed different effects on the number of peaks called. 
Whereas the removal of detergent from the lysis buffer had the largest effect in reducing mitochondrial reads, it resulted in decreased quality of the ATAC-seq libraries, as measured by the number of peaks called at a given sequencing depth, the total number of reads in peaks, and the fraction of transcription start sites (TSS’s) and enhancers identified. Conversely, in addition to decreasing the number of mitochondrial reads, the anti-mt CRISPR treatment also resulted in a greater number of peaks, a greater number of reads in peaks, and higher overlap of peaks with TSS’s and enhancers. Performing anti-mt CRISPR requires the one-time purchase of gRNA template oligos, as well as purchase of the Cas9 enzyme. However, the gRNAs can be generated from template DNA oligos indefinitely and shared as a community resource, potentially trivializing the up-front cost. Laboratories generating large numbers of ATAC-seq experiments on cell types that yield a high fraction of mitochondrial reads could benefit from mitochondrial depletion to decrease the cost of sequencing.

Results

The anti-mt CRISPR treatment consisted of 100 gRNAs targeting the human mitochondrial genome at regular intervals. In ATAC-seq libraries generated from LCLs, this genome is usually densely covered by reads, as shown in Fig. 1. The rationale was to cleave targeted DNA fragments in the sequencing library, rendering them unable to bind and amplify on the Illumina HiSeq flow cell. Similarly to Gu et al.[2] and Wu et al.[3], we chose to treat the final (PCR amplified) sequencing library with the gRNA/Cas9 mix instead of the unamplified tagmented DNA because of the small amount of DNA present in the sample at this earlier step. Although treating the samples before PCR amplification might result in lower fractions of mitochondrial reads, we chose the conservative approach of treating larger amounts of DNA to reduce technical variability.
Figure 1

ATAC-seq read densities in the mitochondrial chromosome and one nuclear genome region. Top: The mitochondrial chromosome (chrM) is densely covered by uniquely mapped reads. Genomic location of the 100 mitochondrial guide RNAs (red tick marks) designed to target the human mitochondrial chromosome (top). Bottom: compare chrM to a 16.5 kb region of the nuclear genome (hg38, chr9:90,791,567–90,808,137). The chrM and chr9 tracks are shown in different height scales for easier visualization. Seven samples were pooled and 227 M reads were sampled.

To develop and analyze the anti-mt CRISPR treatment, we used 50,000 human LCL cells per sample and generated a total of 27 pairs of ATAC-seq libraries for Illumina high-throughput sequencing according to the protocol of Buenrostro et al.[6] (ArrayExpress accession number E-MTAB-5205 and Supplemental File 2). We split each of the 27 libraries into two equal parts, leaving one half untreated and treating the other half with 100 mitochondrial gRNAs and Cas9. Due to the single turn-over nature of Cas9, Gu et al.[2] used an excess of enzyme and of gRNA to deplete mitochondrial ribosomal DNA. Based on this notion, we used 100X Cas9 and 100X gRNA excess. We assumed 50% mtDNA fragments in the PCR-amplified ATAC-seq library to calculate exact amounts to be used in the treatment (see Methods). We obtained between 9.8 M and 108.6 M reads per sample in four batches of experiments. Because different numbers of reads were sequenced from each sample due to imprecision in DNA quantification and the number of multiplexed samples, we randomly sampled a fixed number of sequenced reads from each library in order to compare across samples. This approach allowed us to assess which library preparation method yielded the best results regardless of how it affected the number of aligned or usable (unique, non-mitochondrial) reads.
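The fixed-depth random sampling described above can be illustrated with a minimal Python sketch. The function name `subsample_reads`, the seed, and the toy read identifiers are hypothetical and are not taken from the study; the authors do not state which tool they used for sampling.

```python
import random


def subsample_reads(read_ids, depth, seed=0):
    """Draw `depth` reads uniformly at random, without replacement."""
    if depth > len(read_ids):
        raise ValueError("library has fewer reads than the requested depth")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(read_ids, depth)


# Two toy libraries of unequal size (hypothetical read identifiers).
library_a = [f"A{i}" for i in range(1000)]
library_b = [f"B{i}" for i in range(250)]

# Sampling both to the same depth makes them directly comparable.
depth = 200
sub_a = subsample_reads(library_a, depth)
sub_b = subsample_reads(library_b, depth)
```

A library with fewer reads than the chosen depth cannot be included at that depth, which is consistent with the smaller number of samples reported at the higher read depths.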
After aligning reads to the human genome, we removed mitochondrial reads and reads aligned to identical coordinates and called peaks using HOMER[7] and MACS2[8]. Qualitatively similar results were obtained with both peak callers at three read depths (9.8 M (54 samples), 17 M (52 samples) and 21.9 M (47 samples)) and using different parameters to call peaks (Supplemental File 1). The results reported in the figures were obtained with MACS2 using custom parameters and 21.9 M sequenced reads. Results for all other read depths and parameters are presented in Supplemental Figs S1 and S2 and Supplemental Tables S1 and S2 in Supplemental File 1. Figure 2 shows the comparison between 14 ATAC-seq samples before and after treatment with anti-mt CRISPR, using the original ATAC-seq protocol that includes detergent (DT). Visual inspection of the data showed that the untreated and treated samples were similar (Fig. 2a), indicating that the treatment did not damage the samples. As expected, the anti-mt CRISPR treatment resulted in depletion of mitochondrial reads, while the number of reads in the nuclear genome increased (Fig. 2b).
Figure 2

ATAC-seq was performed on human lymphoblastoid cells and half of each sample was left untreated (green) and the other half was treated with anti-mt CRISPR (purple). (a) Representative genomic region (hg38, chr2:74,425,417–74,586,546) showing read counts (usable reads) in 5 replicate pairs (DT) at the same sequencing depth of 21.9 M reads. Differences between treated and untreated samples were minimal, indicating that the treatment did not damage the samples. (b) ATAC-seq reads in the mitochondrial chromosome and in a 16.5 kb region of chromosome 9 around the SYK promoter (same as Fig. 1). For each condition, all samples were pooled together and 227 M reads were sampled. (c) Treated samples yielded 1.7-fold fewer mitochondrial reads compared to untreated samples. (d) Accordingly, the number of unique, non-mitochondrial (usable) reads was 1.7-fold higher in treated samples than in their untreated counterparts. (e) At the same sequencing depth, 1.6-fold more peaks were called in the treated samples. Only 6 data points are shown because the treated halves of samples 18 and 19 (same batch) had only 14.5 M and 9.8 M reads each and were combined for improved peak calling. (f) Anti-mt CRISPR-treated samples shared a similar number of peaks with treated replicates and untreated samples. The top 20,000 peaks of each sample were used in this analysis. Comparison of peaks at the read count level also supports that peaks from treated samples do not substantially differ from untreated samples. Fold-differences were calculated on the medians. (c–e): all samples normalized to 21.9 M sequenced reads.

At the same sequencing depth, the anti-mt CRISPR-treated samples yielded considerably fewer mitochondrial reads (Fig. 2c). This result is similar to the ~50% reduction reported by Wu et al.[3]. Consequently, more usable reads (non-mitochondrial reads with unique coordinates) were generated (Fig. 2d and Supplementary Fig. S1).
The increased number of usable reads resulted in 60% more peaks (1.6-fold) in the treated halves of all samples (Fig. 2e and Supplementary Fig. S2), demonstrating the importance of removing excess mitochondrial reads from ATAC-seq samples. One concern when treating samples with CRISPR/Cas9 was whether off-target gRNA/Cas9 activity would affect the data to a significant extent. To address this issue, we compared the percentage of peaks common across replicates (1 bp overlap) and across anti-mt CRISPR-treated and untreated samples. Figure 2f shows that the degree of overlap between untreated and treated samples was not smaller than the degree of overlap between replicates of the same condition, indicating that the anti-mt CRISPR treatment did not cause loss of peaks or create artefactual peaks. This observation is in accordance with a previous report that CRISPR treatment of sequencing libraries did not modify the read enrichment pattern[3]. We also found evidence that the anti-mt CRISPR-treated samples identified more transcription start sites and enhancers than untreated samples (see below), indicating that mtDNA cleavage did not negatively affect the data. Analysis of samples normalized by the number of usable reads instead of the total number of reads sequenced (see below) corroborates the idea that the anti-mt CRISPR treatment does not damage ATAC-seq samples.
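The peak-sharing comparison (peaks counted as common when they overlap by at least 1 bp) can be sketched as follows. This is a minimal illustration under stated assumptions: peaks are represented as half-open `(chrom, start, end)` tuples, the function names are hypothetical, and the toy coordinates are invented rather than drawn from the data.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_shared(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap any peak in peaks_b."""
    if not peaks_a:
        return 0.0
    shared = sum(any(overlaps(p, q) for q in peaks_b) for p in peaks_a)
    return shared / len(peaks_a)


# Toy peak sets (invented coordinates).
untreated = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
treated = [("chr1", 199, 300), ("chr2", 40, 60), ("chr3", 10, 20)]

shared_fraction = fraction_shared(untreated, treated)
```

The quadratic scan is adequate for a toy example; for sets of 20,000 peaks a sorted-interval approach or a dedicated interval tool would be the more practical choice.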

Effect of removing detergent from the original ATAC-seq protocol

Another method that has been used to reduce the fraction of mitochondrial reads is the removal of detergent from the cell lysis step of the ATAC-seq protocol[5]. We generated seven ATAC-seq samples with no detergent (ND) and observed several differences compared to the original protocol with detergent (DT). Interestingly, the fraction of unique reads was considerably higher in ND samples compared to DT samples (56.5% vs. 32.5%, respectively), which could reflect a lower fraction of mitochondrial fragments before PCR amplification of the sequencing library. In addition, the fraction of reads uniquely aligned to the genome was slightly higher in ND samples, compared to samples prepared with the original protocol (83.6% vs. 74.6%, respectively). This difference is due to discarding reads that map to both mitochondrial and nuclear genomes (6% of ND reads and 17% of DT reads) in order to retain only uniquely aligned reads. Because we started our analyses with the same number of sequenced reads, these differences in mappability were accounted for in our comparisons. Figure 3 shows that the removal of detergent had a pronounced depletive effect on mitochondrial reads compared to untreated DT libraries (Fig. 3a) and consequently increased the fraction of usable reads (Fig. 3b). Despite this 2.4-fold increase of ND usable sequences (0.45/0.19), the number of peaks called was higher by only 1.04-fold compared to untreated DT samples (Fig. 3c, 26,651/25,664). The mean fold-difference using other parameters to call peaks and read depths was higher at 1.2-fold, but still substantially lower than the increase in usable reads (Supplemental Table S1). This difference could be due to the increased background in ND samples (Fig. 3d and 3e), as suggested previously[5]. We considered background reads as reads that were not mitochondrial and were not in any ATAC-seq peak identified in any DT or ND sample. The lower signal/noise ratio in ND samples (Fig. 3d) provides an explanation for why fewer peaks were identified in ND samples. Thus, although removing detergent from the lysis buffer increased the overall number of non-mitochondrial reads, the background read coverage also increased, resulting in fewer peaks called at the same sequencing depth compared to the original protocol.
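The fold-differences quoted above and the background definition (non-mitochondrial reads outside any peak) can be reproduced with a short sketch. The two ratios use the numbers given in the text; the read and peak coordinates are invented toy values, and the function name `signal_and_background` is hypothetical.

```python
def signal_and_background(reads, peaks, mito="chrM"):
    """Count non-mitochondrial reads inside and outside a set of peaks.

    reads: iterable of (chrom, position)
    peaks: iterable of half-open (chrom, start, end) intervals
    """
    in_peaks = background = 0
    for chrom, pos in reads:
        if chrom == mito:
            continue  # mitochondrial reads count as neither signal nor background
        if any(c == chrom and s <= pos < e for c, s, e in peaks):
            in_peaks += 1
        else:
            background += 1
    return in_peaks, background


# Fold-differences from the text: usable-read fraction and peak counts (ND vs. DT).
usable_fold = 0.45 / 0.19      # about 2.4
peak_fold = 26651 / 25664      # about 1.04

# Toy example of the signal/noise ratio (invented coordinates).
toy_reads = [("chr1", 150), ("chr1", 160), ("chr1", 900), ("chrM", 5), ("chr2", 10)]
toy_peaks = [("chr1", 100, 200)]
signal, noise = signal_and_background(toy_reads, toy_peaks)
signal_to_noise = signal / noise
```

The gap between the two ratios is the point of the paragraph: usable reads rose roughly 2.4-fold while peak counts barely moved, which is what a higher background would produce.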
Figure 3

Effect of detergent removal from the ATAC-seq protocol. (a) The fraction of mitochondrial reads in samples prepared without detergent was considerably smaller than those prepared with the original protocol. Treatment with anti-mt DNA CRISPR led to further decrease of mitochondrial reads (3.1-fold). (b) The fraction of unique, non-mitochondrial reads was considerably higher when detergent was not used. Surprisingly, the anti-mt DNA CRISPR treatment had only a marginal effect on the fraction of usable reads (1.1-fold increase). (c) At the same sequencing depth, only 1.1-fold more peaks were called in the ND treated samples. DT samples 18 and 19 were combined as in Fig. 2e. (d) ND samples displayed higher background (the number of non-mitochondrial reads outside peaks identified in any DT or ND sample). Numbers in blue above the bars are the ratio between number of reads in peaks and the number of background reads (signal/noise). (e) An example illustrating the higher background in ND samples, highlighted by the dashed boxes (chr6:420,146–448,555). Fold-differences calculated on medians.

Treating the ND samples with anti-mt CRISPR, i.e. combining anti-mt CRISPR treatment with the detergent-free lysis buffer, led to a 3.1-fold decrease in the fraction of mitochondrial reads compared to untreated ND samples (Fig. 3a, 0.16/0.05). However, unlike DT samples, the fraction of unique, non-mitochondrial reads increased only slightly (Fig. 3b, median fold-change: 1.1), probably because the fraction of mitochondrial reads was already small. Additionally, the effect of the anti-mt CRISPR treatment was inconsistent, with three samples showing a decrease in the fraction of usable reads and four showing an increase (Fig. 3b, dashed lines).
When calling peaks in anti-mt CRISPR-treated ND samples, this inconsistency was also observed in some of the comparisons performed with different peak calling parameters and read depths: some samples showed an increase in the number of peaks over their untreated counterparts, while others showed a decrease (Supplemental Fig. S1). When comparing the effect of the anti-mt CRISPR treatment between ND and DT samples, treated ND samples yielded 1.3-fold fewer peaks than treated DT samples (median of 30,182 vs. 40,682, respectively). Therefore, combining the anti-mt CRISPR treatment with removal of detergent from the lysis buffer did not provide substantial gains over the original protocol with detergent that was treated with anti-mt CRISPR.
In addition to the number of peaks called in the different treatments, we evaluated the quality of peaks using two other parameters: (i) the fraction of Gencode[9] transcription start sites (TSS’s) overlapping peaks (Fig. 4a) and (ii) the fraction of Epigenome Roadmap[10] annotated enhancers overlapping peaks (Fig. 4b and Supplemental Fig. S2). The highest fraction of TSS’s and enhancers was identified in samples treated with anti-mt CRISPR, regardless of whether they were generated with or without detergent. Whereas both anti-mt CRISPR-treated DT and ND samples identified similar numbers of TSS’s, enhancers were identified at a higher rate using detergent in conjunction with the anti-mt CRISPR treatment. This difference could be explained by the notion that chromatin tends to be more open in promoters to allow transcription, while enhancers, due to their dynamic nature, would be less accessible. In this scenario, finding enhancers requires lower background and higher quality data, which we have shown is best represented by the detergent anti-mt CRISPR samples.
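The TSS-overlap metric can be sketched in a few lines. The Fig. 4 legend gives a +/− 1 kb window; this sketch assumes that means a TSS counts as identified when any peak falls within 1 kb of it, which is one plausible reading rather than the authors' stated procedure. The function name and the toy coordinates are hypothetical.

```python
def fraction_tss_near_peaks(tss_sites, peaks, window=1000):
    """Fraction of TSSs with at least one peak within `window` bp.

    tss_sites: list of (chrom, position)
    peaks: list of (chrom, start, end)
    """
    if not tss_sites:
        return 0.0
    hits = 0
    for chrom, pos in tss_sites:
        lo, hi = pos - window, pos + window
        # A peak counts if it intersects the [lo, hi] window around the TSS.
        if any(c == chrom and start <= hi and end >= lo for c, start, end in peaks):
            hits += 1
    return hits / len(tss_sites)


# Toy annotation and peak set (invented coordinates).
toy_tss = [("chr1", 5000), ("chr1", 20000), ("chr2", 3000)]
toy_peaks = [("chr1", 5800, 6100), ("chr2", 10000, 10500)]
tss_fraction = fraction_tss_near_peaks(toy_tss, toy_peaks)
```

The same function applies to enhancer annotations if each enhancer is reduced to a representative position, although interval-to-interval overlap would be the more natural test there.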
Figure 4

Comparison of the fraction of functional regions overlapping ATAC-seq peaks. (a) The fraction of transcription start sites (TSS’s) overlapping an ATAC-seq peak (+/− 1 kb) was slightly higher in the DT samples than in the ND samples (1.05-fold). (b) Treated DT samples identified a greater number of Epigenome Roadmap GM12878 lymphoblastoid cell active enhancers (1.9-fold) than anti-mt CRISPR untreated ND samples. Fold-differences calculated on medians.

To further investigate differences caused by the anti-mt CRISPR treatment and by removing detergent, we normalized samples by the number of usable reads (Fig. 5), instead of total sequenced reads (Figs 2, 3 and 4). Figure 5 shows that at 6.2 M usable reads, ND samples clearly underperformed DT samples. It also shows that the anti-mt CRISPR treatment removed mitochondrial reads without altering the samples in other ways, since the number of peaks identified, the fraction of reads in peaks, and the fractions of TSS’s and enhancers identified are the same. Notice that 34 M reads from DT samples (median) were necessary to obtain 6.2 M usable reads, while only 17 M reads from DT samples treated with anti-mt CRISPR were necessary to obtain the same number.
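The sequencing requirement quoted above follows from simple arithmetic on the usable-read fraction, sketched here. The usable fractions are back-calculated from the medians in the text (6.2 M usable reads out of 34 M and 17 M sequenced reads); the function name `reads_required` is hypothetical.

```python
def reads_required(target_usable, usable_fraction):
    """Sequenced reads needed to reach a target number of usable reads."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return target_usable / usable_fraction


TARGET_USABLE = 6.2e6

# Usable fractions implied by the medians reported in the text.
dt_untreated_fraction = 6.2 / 34   # about 18% of sequenced reads are usable
dt_treated_fraction = 6.2 / 17     # about 36% after anti-mt CRISPR

untreated_reads = reads_required(TARGET_USABLE, dt_untreated_fraction)
treated_reads = reads_required(TARGET_USABLE, dt_treated_fraction)

# Fraction of sequencing saved per sample by the treatment.
saving = 1 - treated_reads / untreated_reads
```

With these medians the treatment halves the sequencing needed per sample, which is the quantity to weigh against the cost of gRNA oligos and Cas9.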
Figure 5

Comparison of ATAC-seq samples normalized by total number of usable reads instead of total number of sequenced reads. (a) The number of peaks is higher in DT than in ND samples. (b) The fraction of TSS and (c) enhancers identified by ATAC-seq peaks is higher in DT than in ND samples. (d) DT samples have more reads in peaks than ND samples. The differences between samples treated with anti-mt CRISPR and left untreated are not statistically significant, showing that the anti-mt CRISPR treatment does not damage the samples.

We conclude that reducing mitochondrial reads by cleavage of DNA sequencing fragments using an anti-mt CRISPR strategy yielded the best results in terms of numbers of peaks identified and their quality at the same sequencing depth.

Other treatments

Given the success of using CRISPR/Cas9 to reduce the amount of mitochondrial reads, we tested modifications of the treatment to enhance the degree of depletion of mitochondrial reads (Fig. 6). We tested (i) a longer Cas9 incubation of 2 hours instead of 1 hour, (ii) addition of Cas9 for an additional 1 hour after the initial 1 hour treatment (Cas9 boost) and (iii) adding 40X, 200X and 400X Cas9 instead of 100X. None of the treatments led to enhanced depletion of mitochondrial reads and, intriguingly, the 200X and 400X Cas9 treatments performed worse than the 100X treatment (Fig. 6).
Figure 6

Modifications of the anti-mt CRISPR treatment. Compared to the treatment shown in Fig. 2 (100X gRNA, 100X Cas9, 1 h incubation), labeled “standard”, modifications in the treatment did not show improvement. The number of peaks is comparable or even lower in the modified treatments, compared to the standard treatment. Due to the low number of reads in 6 samples, the results presented were obtained with 9.8 M reads randomly sampled. See also Supplemental Fig. S3.

We did not test a larger number of gRNAs targeting the mitochondrial genome, but it is likely that using 200 gRNAs instead of 100, for example, could further reduce the fraction of mitochondrial reads. However, as guide RNAs are priced per unit, the cost of the treatment increases linearly with the number of targets.

Discussion

Our CRISPR/Cas9 treatment targeting 100 loci of the human mitochondrial chromosome successfully reduced the number of mitochondrial reads in LCLs by 1.7-fold, similarly to Wu et al.[3], and increased the number of usable reads by 1.7-fold. Consequently, at the same read depth, samples generated with the original ATAC-seq protocol (DT) and treated with CRISPR/Cas9 and anti-mt gRNAs resulted in 1.6-fold more peaks than the untreated controls. More TSS’s and enhancers were identified by peaks called in the treated samples, showing that the treatment increases the signal and does not induce unwanted changes in the data. Removing detergent from the cell lysis step (ND) resulted in an even lower number of mitochondrial reads (3.1-fold), but the peaks called were fewer and of lower quality. While the anti-mt CRISPR treatment improved ND samples, resulting in an increased number of peaks called, it did not improve over DT samples. We observed more variability in treated ND samples than DT samples, as well as higher background, a lower number of peaks and lower overlap with LCL enhancers. In conclusion, our data show that treating samples prepared using detergent with gRNAs/Cas9 targeting mtDNA was the best way to reduce mtDNA contamination in LCLs, increase the number of peaks, and improve identification of features such as TSS’s and enhancers. Given the cost of gRNA oligos, sacrificing sequencing reads may be more economical than depleting mitochondrial reads if only a few samples are generated. In Supplemental File S3 we provide a cost calculator based on the numbers obtained in this study and the cost of one lane of sequencing at the University of Chicago Functional Genomic Core Facility. As we have not tested the anti-mt CRISPR treatment and detergent removal in other cell types and cell lines, it is possible that different results may be obtained in other systems, which will affect the cost.
Caution should be taken when multiplexing anti-mt CRISPR-treated samples with samples that have not been treated. Treated samples will yield fewer sequencing reads unless a higher library concentration is used relative to other untreated samples. This is because the cleaved mitochondrial fragments will remain in the library but will not be sequenced since they cannot be amplified by bridge amplification. Sequencing a full lane of samples treated the same way does not require any adjustments. During the execution of this project, an improved ATAC-seq method was published, termed Fast-ATAC[11], which uses a milder detergent in the cell lysis buffer. This treatment was reported to decrease the fraction of mitochondrial reads from 50% to 11%, while increasing the enrichment of reads in peaks over background and yielding more fragments per cell. The authors noted that cells that are more resistant to lysis may require a stronger detergent, i.e., the original ATAC-seq protocol, in which case, using the CRISPR treatment we analyzed here will remain useful. Since the cost of gRNAs is fixed and can be distributed among multiple laboratories, reducing mtDNA contamination using an anti-mt CRISPR treatment could still lead to significant savings if large numbers of samples are generated.

Methods

Human lymphoblastoid cell line growth and harvesting

Human lymphoblastoid cell line NA19193 was obtained from Coriell Cell Repository. Cells were grown in RPMI 1640 medium lacking L-Glutamine (Corning), supplemented with 15% fetal bovine serum, 1% GlutaMAX (ThermoFisher) and 1% penicillin-streptomycin solution (ThermoFisher) at a density of 0.5 × 10^6 to 1.0 × 10^6 cells/mL. Cells were passaged every 2–3 days to maintain this density. Cells were harvested for ATAC-seq by centrifugation at 500 × g for 5 minutes at 4 °C and resuspended in PBS. Cells were counted using a hemocytometer and 50,000 cells were immediately placed into a 1.5 mL Eppendorf tube for ATAC-seq.

Preparation of ATAC-seq libraries

ATAC-seq libraries were generated according to the protocol of Buenrostro et al.[6] with minor changes. Instead of NEBNext High-Fidelity 2X PCR Master Mix, we used Q5 Hot Start High-Fidelity 2X Master Mix (New England Biolabs). Following PCR-amplification, instead of using a column to clean the reaction, we used a 0.8X AMPure bead purification and eluted the library with 20 μL nuclease-free water. For the ND samples, Igepal-CA630 was removed from the lysis buffer and replaced with water. One microliter of the library was used to run a high sensitivity Bioanalyzer to determine fragment size distribution and concentration.

Anti-mitochondrial CRISPR/Cas9 treatment

To deplete ATAC-seq libraries of DNA fragments derived from the human (hg38) mitochondrial genome, 100 high-quality guide RNAs that specifically targeted the mitochondrial genome roughly every 250 base pairs were chosen using the gRNA design tool at http://crispr.mit.edu (full list of guide sequences is in Supplemental File 2). Full-length guide RNAs were designed according to Gu et al.[2] and generated from single-stranded oligo templates (Integrated DNA Technologies) according to Lin et al.[12]. Briefly, each oligo consisted of the sequence 5′-TAATACGACTCACTATAG(N20)GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT-3′ where N20 corresponds to the 20 nucleotide guide RNA seed sequence. The PAM sequence would occur at the 3′ end of the N20 sequence. Oligos were purchased as a 200 picomole plate from Integrated DNA Technologies and received as a lyophilized pool. They were resuspended in 1 mL TE 1.0 buffer (10 mM Tris-HCl, 0.1 mM EDTA). A Nanodrop was used to determine the concentration of the oligos and 8 ng were used as template for PCR to make them double-stranded. The PCR reaction consisted of 4 μL 5X HF Buffer (New England Biolabs), 0.4 μL 10 mM dNTPs, 1 μL of each 10 μM primer (For: 5′-TAATACGACTCACTATAG, Rev: 5′-AAAAAAAGCACCGACTCGGTGC), 0.2 μL Phusion High-Fidelity DNA Polymerase (New England Biolabs) and nuclease-free water to a final volume of 20 μL. Thermocycler conditions were 98 °C for 30 s, followed by 30 cycles of 98 °C for 10 s, 56 °C for 10 s, 72 °C for 10 s, and then a final extension of 72 °C for 5 minutes. The reaction was cleaned using a Qiagen MinElute Purification kit and eluted in 10 μL of nuclease-free water. Enough PCR reactions were performed to obtain 1 μg of double-stranded template (should be in a volume of less than 8 μL).
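The oligo template layout described above (5′ leader, 20 nt guide, constant scaffold) can be assembled programmatically. In this minimal sketch the leader, scaffold and primer sequences are copied from the text, while the function names and the example N20 are hypothetical; the example guide is a placeholder and not one of the 100 guides used in the study.

```python
LEADER = "TAATACGACTCACTATAG"  # 5' leader, identical to the forward primer
SCAFFOLD = (
    "GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTG"
    "AAAAAGTGGCACCGAGTCGGTGCTTTTTTT"
)
REV_PRIMER = "AAAAAAAGCACCGACTCGGTGC"


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def build_template(n20):
    """Assemble the single-stranded oligo template for one guide."""
    n20 = n20.upper()
    if len(n20) != 20 or set(n20) - set("ACGT"):
        raise ValueError("guide must be exactly 20 nt of A/C/G/T")
    return LEADER + n20 + SCAFFOLD


# Placeholder guide sequence for illustration only.
template = build_template("ACGTACGTACGTACGTACGT")

# The reverse primer anneals to the 3' end of the template.
primer_matches = reverse_complement(template[-len(REV_PRIMER):]) == REV_PRIMER
```

Checking that the reverse primer is the reverse complement of the template's 3′ end is a cheap sanity test before ordering oligos, since a single transcription error in the scaffold would prevent amplification.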
Transcription was carried out on 1 μg of template using the MEGAshortscript T7 Transcription kit (Thermo Fisher) following the manufacturer’s instructions and then cleaned with the MEGAclear Transcription Clean-Up kit (Thermo Fisher). gRNAs were eluted from the column with 50 μL of RNase-free water, their concentration was determined using a Nanodrop, and they were aliquoted and stored at −80 °C. We estimated that half of the DNA in each library was of mitochondrial origin, thus a 20 nM ATAC-seq library contained a mtDNA target concentration of 10 nM. Based on this value, a 40-, 100-, 200-, or 400-fold molar excess of Cas9 enzyme was used (New England Biolabs #M0386M) along with a 100-fold molar excess of gRNAs in a 30 μL reaction. The reaction was set up according to the protocol for Cas9 from S. pyogenes (New England Biolabs #M0386M). Briefly, the appropriate amounts of Cas9 enzyme and gRNAs were mixed with 3 μL of 10X Cas9 Buffer and water to a final volume of 22 μL. This was incubated at 25 °C for 10 minutes and then 8 μL of the ATAC-seq library was added and the reaction was incubated at 37 °C for one hour. For the two-hour treatment, the incubation was extended an additional hour; for the “Cas9 boost” treatment, the same amount of Cas9 enzyme was added after 1 hour of incubation and left for an additional hour. Reactions were subsequently treated with 1 μL of 20 mg/mL proteinase K for 15 minutes and purified using a Qiagen MinElute kit followed by elution in 10 μL nuclease-free water. Treated libraries were run on a high sensitivity Bioanalyzer chip to assess fragment size distribution and concentration (Supplemental Fig. S4). Because the multiplexing barcodes are added before treatment, for each batch of experiments, samples were sequenced on two lanes of an Illumina Hi-Seq 4000 instrument, separating anti-mt-CRISPR untreated and treated samples.
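
As a rough worked example of the molar-excess arithmetic, the following Python sketch converts a library concentration into Cas9 and gRNA amounts. It rests on an assumption the text does not spell out: that the excess is computed relative to the moles of mtDNA target contained in the 8 μL of library added to the reaction. The function names are hypothetical.

```python
def mt_target_fmol(library_nM: float, library_uL: float = 8.0,
                   mito_fraction: float = 0.5) -> float:
    """Femtomoles of mtDNA target in the library volume added.

    nM x uL = fmol, so a 20 nM library at 50% mitochondrial content
    contributes 10 nM x 8 uL of target.
    """
    return library_nM * mito_fraction * library_uL


def reagent_pmol(target_fmol: float, fold_excess: float) -> float:
    """Picomoles of Cas9 or gRNA for a given fold molar excess over target."""
    return target_fmol * fold_excess / 1000.0


if __name__ == "__main__":
    target = mt_target_fmol(20.0)  # 20 nM library, as in the text
    print(f"mtDNA target: {target:.0f} fmol")
    print(f"gRNA at 100x: {reagent_pmol(target, 100):.1f} pmol")
    for excess in (40, 100, 200, 400):  # Cas9 excesses tested in the text
        print(f"Cas9 at {excess}x: {reagent_pmol(target, excess):.1f} pmol")
```

If the excess were instead defined against the final 30 μL reaction concentration, the absolute amounts would be the same, since the moles of target in the tube do not change with dilution.
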

Peak calling

Illumina reads were trimmed using cutadapt[13] and aligned to hg38 with Bowtie 2 version 2.2.3[14] with default parameters. Reads with mapping quality lower than 10 were discarded. Mitochondrial reads and reads aligned to the same coordinates were removed. HOMER version 4.8.3 was run with 3 sets of parameters: (i) “default”: -style dnase -gsize 2.5e9, (ii) “ENCODE”: -localSize 50000 -size 150 -minDist 50 -fragLength 0 (https://www.encodeproject.org/pipelines/ENCPL035XIO/), (iii) “custom”: -gsize 2.5e9 -F 2 -L 2 -fdr 0.005 -region. MACS2 version 2.1.0 was run with 2 sets of parameters: (i) “default”: --nomodel --shift -100 --extsize 200 -q 0.01, (ii) “custom”: --nomodel --llocal 20000 --shift -100 --extsize 200.
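
The read-filtering logic can be illustrated with a minimal Python sketch operating on simplified alignment records. It is not the pipeline used in the study: the `Read` record and `usable_reads` function are hypothetical, the mitochondrial contig is assumed to be named `chrM` (as in UCSC hg38), and duplicates are assumed to be collapsed to one copy per coordinate and strand, which the text does not state explicitly.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Simplified alignment record (hypothetical, for illustration)."""
    chrom: str
    start: int
    strand: str
    mapq: int


def usable_reads(reads: Iterable[Read], min_mapq: int = 10,
                 mito_name: str = "chrM") -> List[Read]:
    """Keep unique, non-mitochondrial reads with MAPQ >= min_mapq."""
    seen = set()
    kept = []
    for read in reads:
        if read.mapq < min_mapq:      # low mapping quality
            continue
        if read.chrom == mito_name:   # mitochondrial read
            continue
        key = (read.chrom, read.start, read.strand)
        if key in seen:               # same coordinates: duplicate
            continue
        seen.add(key)
        kept.append(read)
    return kept


if __name__ == "__main__":
    example = [
        Read("chr1", 100, "+", 30),
        Read("chr1", 100, "+", 42),   # duplicate of the first read
        Read("chrM", 5, "+", 60),     # mitochondrial
        Read("chr2", 7, "-", 3),      # MAPQ below 10
    ]
    print(usable_reads(example))
```
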

Fraction of TSS and enhancers intersecting peaks

Transcription start sites were obtained from the Gencode[9] GRCh38 basic set (ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_24/gencode.v24.basic.annotation.gtf.gz), totaling 106,926 2 kb intervals centered on the TSS, and intersected with ATAC-seq peaks using bedtools intersect with the -u option[15]. Epigenome Roadmap[10] 15-state ChromHMM coordinates were obtained from http://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/all.mnemonics.bedFiles.tgz. Coordinates were converted to hg38 using the UCSC Genome Browser liftOver tool and active enhancer (Enh7) states were intersected with peaks using bedtools with the -u option.
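
The overlap statistic can be sketched in plain Python as follows. This is an illustration of the idea behind `bedtools intersect -u` (each query interval is counted once if it overlaps at least one peak), not a replacement for it; the function names are hypothetical and intervals are assumed to be half-open, BED-style `(chrom, start, end)` tuples.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def tss_window(chrom: str, tss: int, half_width: int = 1000) -> Interval:
    """Return a 2 kb interval centered on a TSS position."""
    return (chrom, max(0, tss - half_width), tss + half_width)


def fraction_overlapping(intervals: List[Interval],
                         peaks: List[Interval]) -> float:
    """Fraction of intervals overlapping at least one peak."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in intervals:
        # Half-open intervals overlap when each starts before the other ends.
        if any(p_start < end and start < p_end
               for p_start, p_end in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(intervals) if intervals else 0.0


if __name__ == "__main__":
    windows = [tss_window("chr1", 5000), tss_window("chr1", 11000)]
    peaks = [("chr1", 5900, 6100)]
    print(fraction_overlapping(windows, peaks))
```

The linear scan over peaks per interval is adequate for a small example; for genome-scale inputs a sorted sweep or an interval tree, as bedtools uses internally, would be preferable.
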

Mean fraction of common peaks and mean Pearson’s R2 of read counts

We ranked peaks called by MACS2 by −log10(q-value) and used bedtools intersect to count the number of peaks common between the top 20,000 peaks of each sample. The fraction presented in Fig. 2f is the mean fraction of peaks common between samples of a given group (e.g. treated vs. treated). To calculate the degree of similarity of read counts in ATAC-seq peaks, we merged all peaks from all samples and counted the number of reads in each peak in each sample. We then calculated the R2 of the read counts per peak in pairs of samples in each group (e.g. treated vs. treated) and obtained the mean per group presented in Fig. 2f.
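
The pairwise similarity measure can be illustrated with a short Python sketch. It assumes that each sample is represented by a list of read counts over the same merged peak set, in the same order; the function names and the sample labels in the example are hypothetical, and the counts are made up for illustration.

```python
from itertools import combinations
from typing import Dict, Sequence


def pearson_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def mean_pairwise_r2(counts: Dict[str, Sequence[float]]) -> float:
    """Mean R2 over all pairs of samples within one group."""
    pairs = list(combinations(counts.values(), 2))
    return sum(pearson_r2(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up read counts per merged peak for three samples of one group.
    treated = {
        "rep1": [10, 50, 200, 5],
        "rep2": [12, 45, 210, 8],
        "rep3": [9, 60, 180, 4],
    }
    print(mean_pairwise_r2(treated))
```
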

Statistical tests

Due to the small number of replicates, we chose the more conservative Wilcoxon rank sum test to compare treatments in the boxplots shown (R statistical package version 3.3.1)[16]. Student’s t-tests were in agreement with the results presented, yielding smaller P-values. Paired tests were used to compare treated/untreated pairs and unpaired tests were used to compare samples prepared with and without detergent. One-sided P-values are presented, since we were interested in specific directions of change. Two-tailed P-values do not change our conclusions. Fold-differences of DT samples were calculated pairwise and the median was reported. Fold-differences of DT versus ND samples were calculated on the median of each group.
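
The two ways of summarizing fold-differences differ in the order of operations, which the following Python sketch makes explicit. The function names are hypothetical and the numbers in the example are invented for illustration; they are not data from this study.

```python
from statistics import median
from typing import Sequence


def paired_fold_difference(first: Sequence[float],
                           second: Sequence[float]) -> float:
    """Median of per-pair ratios, for matched samples
    (e.g. the same library with and without treatment)."""
    return median(a / b for a, b in zip(first, second))


def group_fold_difference(group_a: Sequence[float],
                          group_b: Sequence[float]) -> float:
    """Ratio of group medians, for unmatched samples
    (e.g. libraries prepared with versus without detergent)."""
    return median(group_a) / median(group_b)


if __name__ == "__main__":
    # Invented percentages of mitochondrial reads, for illustration only.
    untreated = [60.0, 50.0, 40.0]
    treated = [30.0, 20.0, 25.0]
    print(paired_fold_difference(untreated, treated))
    print(group_fold_difference(untreated, treated))
```

The paired version uses the matching between samples, so it is only meaningful when both lists are in the same sample order; the group version ignores pairing and is the appropriate choice when the two groups come from independent preparations.
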

Ethics Statement

The NA19193 cell line was purchased from Coriell Cell Repository. The original samples were collected by the HapMap project between 2001 and 2005. All of the samples were collected with extensive community engagement, including discussions with members of the donor communities about the ethical and social implications of human genetic variation research. Donors gave broad consent to future uses of the samples, including their use for extensive genotyping and sequencing, gene expression and proteomics studies, and all other types of genetic variation research, with the data publicly released. All methods were carried out in accordance with known guidelines and regulations.
  13 in total

1.  Fast gapped-read alignment with Bowtie 2.

Authors:  Ben Langmead; Steven L Salzberg
Journal:  Nat Methods       Date:  2012-03-04       Impact factor: 28.547

2.  The landscape of accessible chromatin in mammalian preimplantation embryos.

Authors:  Jingyi Wu; Bo Huang; He Chen; Qiangzong Yin; Yang Liu; Yunlong Xiang; Bingjie Zhang; Bofeng Liu; Qiujun Wang; Weikun Xia; Wenzhi Li; Yuanyuan Li; Jing Ma; Xu Peng; Hui Zheng; Jia Ming; Wenhao Zhang; Jing Zhang; Geng Tian; Feng Xu; Zai Chang; Jie Na; Xuerui Yang; Wei Xie
Journal:  Nature       Date:  2016-06-15       Impact factor: 49.962

3.  Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities.

Authors:  Sven Heinz; Christopher Benner; Nathanael Spann; Eric Bertolino; Yin C Lin; Peter Laslo; Jason X Cheng; Cornelis Murre; Harinder Singh; Christopher K Glass
Journal:  Mol Cell       Date:  2010-05-28       Impact factor: 17.970

4.  BEDTools: a flexible suite of utilities for comparing genomic features.

Authors:  Aaron R Quinlan; Ira M Hall
Journal:  Bioinformatics       Date:  2010-01-28       Impact factor: 6.937

5.  ATAC-seq: A Method for Assaying Chromatin Accessibility Genome-Wide.

Authors:  Jason D Buenrostro; Beijing Wu; Howard Y Chang; William J Greenleaf
Journal:  Curr Protoc Mol Biol       Date:  2015-01-05

6.  GENCODE: the reference human genome annotation for The ENCODE Project.

Authors:  Jennifer Harrow; Adam Frankish; Jose M Gonzalez; Electra Tapanari; Mark Diekhans; Felix Kokocinski; Bronwen L Aken; Daniel Barrell; Amonida Zadissa; Stephen Searle; If Barnes; Alexandra Bignell; Veronika Boychenko; Toby Hunt; Mike Kay; Gaurab Mukherjee; Jeena Rajan; Gloria Despacio-Reyes; Gary Saunders; Charles Steward; Rachel Harte; Michael Lin; Cédric Howald; Andrea Tanzer; Thomas Derrien; Jacqueline Chrast; Nathalie Walters; Suganthi Balasubramanian; Baikang Pei; Michael Tress; Jose Manuel Rodriguez; Iakes Ezkurdia; Jeltje van Baren; Michael Brent; David Haussler; Manolis Kellis; Alfonso Valencia; Alexandre Reymond; Mark Gerstein; Roderic Guigó; Tim J Hubbard
Journal:  Genome Res       Date:  2012-09       Impact factor: 9.043

7.  Model-based analysis of ChIP-Seq (MACS).

Authors:  Yong Zhang; Tao Liu; Clifford A Meyer; Jérôme Eeckhoute; David S Johnson; Bradley E Bernstein; Chad Nusbaum; Richard M Myers; Myles Brown; Wei Li; X Shirley Liu
Journal:  Genome Biol       Date:  2008-09-17       Impact factor: 13.583

8.  Integrative analysis of 111 reference human epigenomes.

Authors:  Anshul Kundaje; Wouter Meuleman; Jason Ernst; Misha Bilenky; Angela Yen; Alireza Heravi-Moussavi; Pouya Kheradpour; Zhizhuo Zhang; Jianrong Wang; Michael J Ziller; Viren Amin; John W Whitaker; Matthew D Schultz; Lucas D Ward; Abhishek Sarkar; Gerald Quon; Richard S Sandstrom; Matthew L Eaton; Yi-Chieh Wu; Andreas R Pfenning; Xinchen Wang; Melina Claussnitzer; Yaping Liu; Cristian Coarfa; R Alan Harris; Noam Shoresh; Charles B Epstein; Elizabeta Gjoneska; Danny Leung; Wei Xie; R David Hawkins; Ryan Lister; Chibo Hong; Philippe Gascard; Andrew J Mungall; Richard Moore; Eric Chuah; Angela Tam; Theresa K Canfield; R Scott Hansen; Rajinder Kaul; Peter J Sabo; Mukul S Bansal; Annaick Carles; Jesse R Dixon; Kai-How Farh; Soheil Feizi; Rosa Karlic; Ah-Ram Kim; Ashwinikumar Kulkarni; Daofeng Li; Rebecca Lowdon; GiNell Elliott; Tim R Mercer; Shane J Neph; Vitor Onuchic; Paz Polak; Nisha Rajagopal; Pradipta Ray; Richard C Sallari; Kyle T Siebenthall; Nicholas A Sinnott-Armstrong; Michael Stevens; Robert E Thurman; Jie Wu; Bo Zhang; Xin Zhou; Arthur E Beaudet; Laurie A Boyer; Philip L De Jager; Peggy J Farnham; Susan J Fisher; David Haussler; Steven J M Jones; Wei Li; Marco A Marra; Michael T McManus; Shamil Sunyaev; James A Thomson; Thea D Tlsty; Li-Huei Tsai; Wei Wang; Robert A Waterland; Michael Q Zhang; Lisa H Chadwick; Bradley E Bernstein; Joseph F Costello; Joseph R Ecker; Martin Hirst; Alexander Meissner; Aleksandar Milosavljevic; Bing Ren; John A Stamatoyannopoulos; Ting Wang; Manolis Kellis
Journal:  Nature       Date:  2015-02-19       Impact factor: 69.504

9.  Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery.

Authors:  Steven Lin; Brett T Staahl; Ravi K Alla; Jennifer A Doudna
Journal:  Elife       Date:  2014-12-15       Impact factor: 8.140

10.  Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution.

Authors:  M Ryan Corces; Jason D Buenrostro; Beijing Wu; Peyton G Greenside; Steven M Chan; Julie L Koenig; Michael P Snyder; Jonathan K Pritchard; Anshul Kundaje; William J Greenleaf; Ravindra Majeti; Howard Y Chang
Journal:  Nat Genet       Date:  2016-08-15       Impact factor: 38.330
