An optimized FAIRE procedure for low cell numbers in yeast.

David Segorbe¹, Derek Wilkinson¹, Alexandru Mizeranschi¹, Timothy Hughes², Ragnhild Aaløkken², Libuše Váchová³, Zdena Palková¹, Gregor D Gilfillan².

Abstract

We report an optimized low-input FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements-sequencing) procedure to assay chromatin accessibility from limited amounts of yeast cells. We demonstrate that the method performs well on as little as 4 mg of cells scraped directly from a few colonies. Sensitivity, specificity and reproducibility of the scaled-down method are comparable with those of regular, higher input amounts, and allow the use of 100-fold fewer cells than existing procedures. The method enables epigenetic analysis of chromatin structure without the need for cell multiplication of exponentially growing cells in liquid culture, thus opening the possibility of studying colony cell subpopulations, or those that can be isolated directly from environmental samples.

Keywords: HTS; NGS; Saccharomyces cerevisiae; chromatin-accessibility; epigenetics

Substances:
Chromatin
Formaldehyde

Year: 2018 PMID: 29577419 PMCID: PMC6099244 DOI: 10.1002/yea.3316

Source DB: PubMed Journal: Yeast ISSN: 0749-503X Impact factor: 3.239

INTRODUCTION

Eukaryotic genomic regions that are accessed by regulatory proteins or the transcription apparatus must be partially released from their chromatin packaging. Accordingly, chromatin accessibility has been recognized as a proxy for actively transcribed genomic regions and regulatory elements (Tsompana & Buck, 2014). A number of genome‐wide assays of chromatin accessibility exist, including MNase‐seq, DNase‐seq, Sono‐seq, FAIRE‐seq and ATAC‐seq (Auerbach et al., 2009; Boyle et al., 2008; Buenrostro, Giresi, Zaba, Chang, & Greenleaf, 2013; Gaulton et al., 2010; Schones et al., 2008). These assays employ physical, chemical or enzymatic methods to release the least tightly packed chromatin fragments (accessible chromatin), or in the case of MNase‐seq, digest the accessible chromatin, leaving the remainder. Libraries of released and isolated DNA fragments can then be prepared for high‐throughput sequencing (the exception being ATAC‐seq, in which library preparation by transposase insertion is itself the step that releases accessible chromatin; Buenrostro et al., 2013). Mapping of sequencing reads then allows determination of the location and extent of open chromatin. For an informative comparison of the above methods, readers are directed to the review by Tsompana and Buck (2014).

FAIRE relies on the differential affinity of nucleosomal vs. free DNA for organic solvent (Hogan, Lee, & Lieb, 2006; Nagy, Cleary, Brown, & Lieb, 2003). Specifically, formaldehyde cross‐linking is employed to fix in vivo protein–DNA interactions, and subsequent sonication is employed to fragment and solubilize the chromatin. Histone–DNA interactions are particularly stable, so nucleosomal DNA is cross‐linked to histone proteins with greater efficiency than other protein–DNA interactions. Fixed chromatin fragments are then subjected to repeated phenol‐chloroform extraction, with nucleosomal DNA partitioning to the organic–aqueous interface. In contrast, free DNA molecules (open chromatin) associate with the aqueous phase and are preferentially recovered. FAIRE typically requires fewer input cells than the MNase and DNase I methods, and has the added advantage that less optimization is required (in particular, it avoids enzyme titration; Tsompana & Buck, 2014). However, FAIRE suffers from a lower signal‐to‐noise ratio than other methods, and has largely been supplanted by ATAC‐seq (Buenrostro et al., 2013), which offers an improved signal‐to‐noise ratio, is faster to perform and requires even fewer cells. Nonetheless, FAIRE retains one key advantage of particular relevance in yeast: ATAC‐seq in yeast requires the creation of spheroplasts to permeabilize cells to the Tn5 transposase (Schep et al., 2015). The time and temperatures required to incubate cells for spheroplast creation may allow epigenetic changes to occur, which is not a concern with FAIRE. Furthermore, the efficiency of spheroplast formation varies between cell types and ages (Klis, Mol, Hellingwerf, & Brul, 2002). We therefore chose to use the FAIRE assay to study yeast chromatin accessibility.

The FAIRE assay was devised and first tested in yeast cells (Giresi, Kim, McDaniell, Iyer, & Lieb, 2007; Nagy et al., 2003). However, as in the initial study, those performing the technique have continued to use relatively large numbers of cells from liquid cultures (Berchowitz, Hanlon, Lieb, & Copenhaver, 2009; Connelly, Wakefield, & Akey, 2014; Hogan et al., 2006; Simon, Giresi, Davis, & Lieb, 2012). We present here an optimized FAIRE procedure that works on low‐input amounts of cells (down to 4 mg of wet biomass, equivalent to a few colonies), removing the need for amplification in liquid culture. We estimate that this represents an ~100‐fold reduction in input relative to previously published procedures.

MATERIALS AND METHODS

Yeast strain and culture

The laboratory strain BY4742 (MATα, his3∆, leu2∆, lys2∆, ura3∆) was obtained from Euroscarf (http://www.euroscarf.de). For FAIRE experiments, 3‐day‐old colonies grown at 28°C on GMA plates (3% glycerol, 1% yeast extract, 2% agar) were collected in sterile 50 mL or 1.5 mL tubes for further processing.

FAIRE assay

Experiments were performed following a modified protocol based on Simon, Giresi, Davis, and Lieb (2012). For detailed instructions on performing the procedure, see the protocol provided as Supporting Information file 1. For the experiments reported here, yeast cells (three independent biological replicates each of 500, 100, 20 and 4 mg) were harvested by scraping multiple colonies with a spatula and fixed in 200 vols (w/v relative to initial weight harvested) of formaldehyde fixing buffer (1% final concentration formaldehyde, 0.05 M HEPES pH 7.4, 0.1 M NaCl, 10 mM EDTA) for 10 min at 28°C with gentle shaking. Cross‐linking was quenched by adding glycine to 125 mM and continuing incubation for 5 min at room temperature. Cells were collected by centrifugation for 10 min at 4000 at room temperature, and cell pellets were washed three times with 10 mL of phosphate‐buffered saline, pH 7.4. Cells were then resuspended in 17 vols of lysis buffer (10 mM Tris–HCl pH 8.0, 1 mM EDTA, 100 mM NaCl, 2% Triton X‐100 and 1% SDS) relative to original weights of cell pellets, and transferred to 1.5 mL screw‐cap tubes (up to 250 μL per tube). Chilled acid‐washed beads (Sigma Aldrich, Germany) were added and cells were disrupted with a FastPrep FP120 cell disruptor (MP Biomedicals, Santa Ana, CA, USA), using six cycles of 20 s shaking, with a 2 min pause on ice between cycles. Beads were washed with a further 550 μL lysis buffer, centrifuged for 5 min at 3000 at 4°C, and supernatant recovered. Supernatants were adjusted to 600 μL with additional lysis buffer and divided into two 1.5 mL Bioruptor microtubes (Diagenode, Liege, Belgium). Chromatin was sheared in a Bioruptor Pico sonicator (Diagenode), using three cycles of 30 s on and 30 s off at 4°C to achieve fragments of 300–400 bp, then divided into two aliquots for further processing as input or FAIRE samples.

Input DNA samples were de‐cross‐linked by adding RNase A to 0.1 mg/mL and incubating at 37°C for 30 min, followed by addition of Proteinase K to 0.2 mg/mL with overnight incubation at 65°C. Input DNA samples were then purified by three rounds of phenol‐chloroform extraction followed by ethanol precipitation. FAIRE DNA samples were first subjected to three rounds of phenol‐chloroform extraction, followed by ethanol precipitation. Chromatin was then de‐cross‐linked with RNase A and Proteinase K as above. Finally, FAIRE and input DNA samples were purified using ChIP DNA clean and concentrator columns (Zymo Research, Irvine, CA, USA) according to the manufacturer's instructions.

Sequencing

FAIRE and input DNA samples were prepared for sequencing using ThruPLEX reagents (Rubicon Genomics, Ann Arbor, MI, USA) according to the manufacturer's instructions. An aliquot of 2–5 ng FAIRE or input control DNA was used as starting material for library preparation. Sequencing was performed on a NextSeq 500 instrument (Illumina, San Diego, CA, USA) with 75 bp single‐end reads.

Read mapping

The bcbio‐nextgen pipeline v. 1.0.2 (bcbio, n.d.) was used to automate the read mapping and peak calling stages of the analysis. Raw sequence reads were mapped to the reference genome (SacCer3) using the Bowtie2 read mapper, version 2.2.8 (Langmead & Salzberg, 2012), using default settings. Read mapping assessment and quality checking were performed with the following tools: Samtools version 1.4 (H. Li et al., 2009), Sambamba v. 0.6.6 (Tarasov, Vilella, Cuppen, Nijman, & Prins, 2015), BEDTools v. 2.26.0 (Quinlan & Hall, 2010) and FastQC v. 0.11.5 (FastQC, n.d.). Mapped sets of reads were down‐sampled to a fixed level of 10 million reads per sample in order to collect metrics for unmapped, multi‐mapping, single‐mapping and duplicate reads. Unique (non‐duplicate) reads that map to a single genomic location were retained from the original full sets of reads and a further down‐sampling to 3 million reads was performed, in order to enter peak calling with the same coverage level for all samples.
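The down-sampling step can be illustrated with a minimal sketch. In the study this was handled inside the bcbio-nextgen pipeline; the code below is not that implementation. The function name `downsample_reads`, the fixed seed and the toy read identifiers are assumptions made purely for illustration. The idea is simply to draw a fixed-size random subset without replacement, so that every sample enters the next step at the same depth.

```python
import random

def downsample_reads(read_ids, target, seed=0):
    """Return `target` reads drawn uniformly without replacement.

    Samples that already hold `target` reads or fewer are returned unchanged.
    `seed` makes the draw reproducible (an illustrative choice, not from the paper).
    """
    reads = list(read_ids)
    if len(reads) <= target:
        return reads
    rng = random.Random(seed)
    return rng.sample(reads, target)

# Toy example: three samples of unequal depth, all reduced to 5 reads.
samples = {
    "500mg_rep1": [f"read{i}" for i in range(30)],
    "20mg_rep1": [f"read{i}" for i in range(12)],
    "4mg_rep1": [f"read{i}" for i in range(10)],
}
equalized = {name: downsample_reads(ids, target=5) for name, ids in samples.items()}
depths = {name: len(ids) for name, ids in equalized.items()}
print(depths)
```

In the analysis described above the same principle is applied twice: once to 10 million mapped reads for collecting mapping metrics, and once to 3 million unique single-location reads before peak calling.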

Peak calling

Peak calling was carried out with bcbio, using MACS2 version 2.1.1.20160309 (Zhang et al., 2008) with the following settings: -q 0.1, -m 2 50, --bw 186, -g 12000000. The bandwidth parameter was set to the modal value of the size of sheared DNA after sonication. Each FAIRE sample was analysed together with its corresponding input control. Overlapping peaks between each sample group (500, 100, 20 or 4 mg cell starting amounts) were identified using the intersect tool from the BEDTools package (Quinlan & Hall, 2010).
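For readers who want to reproduce the settings outside bcbio, the following sketch assembles an equivalent `macs2 callpeak` argument list. Only the four settings quoted above come from the text; the file names, the sample name and the output directory are hypothetical, and the exact invocation that bcbio generated may have included further options.

```python
import shlex

def macs2_callpeak_command(faire_bam, input_bam, name, outdir="peaks"):
    """Assemble a `macs2 callpeak` argument list using the settings quoted in the text."""
    return [
        "macs2", "callpeak",
        "-t", faire_bam,     # treatment: FAIRE sample
        "-c", input_bam,     # control: the matching input sample
        "-n", name,          # prefix for output files (hypothetical)
        "--outdir", outdir,  # output directory (hypothetical)
        "-q", "0.1",         # q-value cutoff
        "-m", "2", "50",     # mfold range used for model building
        "--bw", "186",       # bandwidth: modal sheared fragment size
        "-g", "12000000",    # effective genome size
    ]

cmd = macs2_callpeak_command(
    "faire_4mg_rep1.bam", "input_4mg_rep1.bam", "faire_4mg_rep1"
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each sample paired explicitly with its own input control, which mirrors the per-sample design described above.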

IDR (irreproducible discovery rate) analysis

To perform IDR analysis (Q. Li, Brown, Huang, & Bickel, 2011), peaks called with MACS2 were analysed by the IDR protocol (IDR, n.d.). The number of significant peaks across the replicates for IDR rates of 0–30% was calculated in 0.1% increments.
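The counting step in this analysis can be sketched as follows. The per-peak IDR values are assumed to have been produced already by the IDR protocol; the values below are invented for illustration, and the function name `peaks_passing_idr` is hypothetical. The sketch only shows how a curve of peak counts is built for cutoffs from 0 to 30% in 0.1% steps.

```python
def peaks_passing_idr(idr_values, max_permille=300):
    """Count peaks whose IDR is at or below each cutoff.

    Cutoffs run from 0 to max_permille/1000 in steps of 0.001 (0.1%).
    Returns a list of (cutoff, number_of_peaks) pairs.
    """
    cutoffs = [i / 1000 for i in range(max_permille + 1)]
    return [(c, sum(1 for v in idr_values if v <= c)) for c in cutoffs]

# Invented per-peak IDR values, standing in for real IDR output.
idr_values = [0.0004, 0.002, 0.01, 0.05, 0.12, 0.29, 0.45]
curve = peaks_passing_idr(idr_values)

# Show a few points along the curve.
for cutoff, n_peaks in (curve[0], curve[10], curve[50], curve[300]):
    print(f"IDR <= {cutoff:.3f}: {n_peaks} peaks")
```

Plotting the second element of each pair against the first gives a curve of the kind used to compare reproducibility between input amounts.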

Mapping of genomic features

A summary of peak distribution over different chromosomal features was prepared using the assignChromosomeRegions function of the ChIPpeakAnno Bioconductor package (Zhu, 2013).
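As a rough illustration of this kind of feature assignment, the sketch below classifies a single peak relative to a single gene. It is not the ChIPpeakAnno implementation. The 1 kb flank follows the region definitions given with Figure 2, while the "gene body" label, the order in which categories are tested and the use of inclusive coordinates are assumptions.

```python
def classify_peak(peak_start, peak_end, gene_start, gene_end, strand, flank=1000):
    """Classify one peak relative to one gene (inclusive coordinates, start <= end)."""
    if strand == "+":
        tss, three_prime = gene_start, gene_end
        promoter = (tss - flank, tss)                        # TSS plus 1 kb upstream
        downstream = (three_prime + 1, three_prime + flank)  # 1 kb past the 3' end
    else:
        tss, three_prime = gene_end, gene_start
        promoter = (tss, tss + flank)
        downstream = (three_prime - flank, three_prime - 1)

    def overlaps(a_start, a_end, b_start, b_end):
        return a_start <= b_end and b_start <= a_end

    # Assumed precedence: promoter, then gene body, then downstream.
    if overlaps(peak_start, peak_end, *promoter):
        return "promoter"
    if overlaps(peak_start, peak_end, gene_start, gene_end):
        return "gene body"
    if overlaps(peak_start, peak_end, *downstream):
        return "downstream"
    return "intergenic"

# Hypothetical gene spanning 5000-7000 on the forward strand.
for peak in [(4500, 4700), (5500, 5600), (6900, 7100), (7100, 7300), (9000, 9100)]:
    print(peak, classify_peak(*peak, 5000, 7000, "+"))
```

Testing the gene body before the downstream window ensures that a peak overlapping the 3′ end of the gene is not counted as downstream.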

RESULTS AND DISCUSSION

FAIRE‐seq was performed on a series of diminishing starting amounts of yeast cells from 3‐day‐old colonies. The highest input amount, 500 mg cells (wet weight), was chosen as a generous amount from which a high‐quality dataset was anticipated. It is approximately equivalent to the amount of cells expected from the 50 mL liquid cultures used in studies published to date (Berchowitz et al., 2009; Connelly et al., 2014), estimated to be ~1 × 10⁹ cells. Thereafter, 5‐fold decreasing amounts of cells were used to test the performance limits of the method over a broad range of inputs. The lowest amount tested here, 4 mg, thus lowers the required input cell numbers by ~100‐fold relative to previously published FAIRE procedures. FAIRE was performed on three replicates at each starting cell amount, and the resulting DNA fragments, enriched for open chromatin regions, were analysed by high‐throughput sequencing. For each replicate sample, an input DNA sample (i.e. cross‐linked, then de‐cross‐linked and subjected to DNA isolation) was prepared and sequenced in parallel to control for DNA amplification artefacts that could otherwise be mistaken for FAIRE regions of open chromatin.

Sequencing generated 10–30 million reads per sample. To ensure a fair comparison of read yields between samples, all datasets were therefore down‐sampled to 10 million raw reads before further processing. Read mapping revealed a similar distribution of uniquely mapping, duplicate and non‐mapping reads in data from all starting cell amounts (Figure 1). Previous studies of scaled‐down method performance have revealed that lower inputs can result in an abundance of un‐mapped and duplicate reads (Brind'Amour et al., 2015; Gilfillan et al., 2012). However, within the ranges tested here, the input cell number had no effect on read mapping, and all samples produced highly satisfactory numbers of uniquely mapping (non‐duplicate) reads.

Figure 1

Genomic mapping of sequence reads. The proportions of unmapped reads (red), those mapping to single genomic positions (green), and those mapping to multiple locations (repeats, in blue) are shown. Reads mapped to single genomic positions are divided into those present as a unique copy and those present in two or more identical copies (duplicates). Results shown are the mean of three replicates for each method, using 10 million raw reads per replicate. Error bars show the standard deviation from the mean.

We therefore proceeded to call peaks on the datasets using MACS2 software (Zhang et al., 2008). The corresponding input sample for each FAIRE sample was used as a control when calling peaks. Between 4200 and 6400 peaks were called in the individual samples, which corresponds to approximately one peak per gene in the yeast genome. Peaks were primarily found in promoter regions, consistent with expectations that FAIRE preferentially isolates regulatory DNA elements (Figure 2). Although the fewest peaks were called in the 4 mg samples, there was no consistent drop in peak number as input amounts were decreased (Table 1). There was also no indication in the data that peak calling was less reproducible as cell numbers decreased, as the standard deviation of peaks called attests. Visual browsing of the data in the Integrative Genomics Viewer (Robinson et al., 2011; Figure 3) confirmed the quality of all datasets, and clearly demonstrated a good signal‐to‐noise ratio in FAIRE datasets and scarcity of amplification artefacts in input controls.

Figure 2

Table 1

Peak calling, sensitivity and specificity. Peaks were called using MACS2, using 3 million uniquely mapping non‐duplicate reads per sample

Input amount (mg)	Mean number of peaks called	Mean number of peaks overlapping 500 mg dataset peaks	Mean number of peaks not found in 500 mg dataset	Mean sensitivity (percentage reference peaks detected)	Mean specificity (percentage method peaks found in reference dataset)
500	5096 ± 126	4574 ± 0	522 ± 126	100	90
100	5008 ± 812	3759 ± 333	1249 ± 630	82	75
20	6292 ± 128	4284 ± 140	2008 ± 262	94	68
4	4896 ± 283	3945 ± 354	951 ± 251	86	81

Peak regions present in all three 500 mg datasets (n = 4574) were used as the reference dataset to which all other samples were compared to measure sensitivity and specificity. Data presented are mean ± standard deviation of three replicates.
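The sensitivity and specificity columns of Table 1 can be checked with a short calculation. The sketch below uses the mean peak counts from the table. Treating the ratio of mean counts as the mean percentage is an approximation (the published values are presumably averaged per replicate), but it reproduces the rounded figures. The function and variable names are illustrative.

```python
REFERENCE_PEAKS = 4574  # peaks present in all three 500 mg datasets

# input amount (mg): (mean peaks called, mean peaks overlapping the reference)
table1 = {
    500: (5096, 4574),
    100: (5008, 3759),
    20: (6292, 4284),
    4: (4896, 3945),
}

def sensitivity_specificity(called, overlapping, reference=REFERENCE_PEAKS):
    """Return (sensitivity %, specificity %) rounded to whole percentages.

    sensitivity = share of reference peaks that were detected
    specificity = share of called peaks that are present in the reference
    """
    sensitivity = 100 * overlapping / reference
    specificity = 100 * overlapping / called
    return round(sensitivity), round(specificity)

results = {mg: sensitivity_specificity(*counts) for mg, counts in table1.items()}
for mg, (sens, spec) in results.items():
    print(f"{mg} mg: sensitivity {sens}%, specificity {spec}%")
```

Note that "specificity" as defined in the table is the fraction of called peaks confirmed by the reference, which differs from the true-negative-rate definition used in classification statistics.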

Figure 3

FAIRE‐seq and input data in a 5 kb genomic region visualized in the Integrative Genomics Viewer. (a) A single example of FAIRE‐seq and corresponding input sample is shown for each input amount. Peaks called by MACS2 are shown as black bars. Genes are shown at the bottom of the figure. (b) Data for all replicates shown as heat maps of the same genomic region. y‐Axis scale in all cases is read depth 0–100

Figure 2. Genomic locations of FAIRE peaks. y‐Axis shows percentage of peaks for each given starting amount of yeast cells. Promoter regions were defined as overlapping or within 1 kb upstream of the transcription start site. Downstream regions are those within 1 kb downstream of the gene but not overlapping the 3′ end of the gene. Intergenic regions were those that were > 1 kb removed from transcription start sites or gene 3′ ends. No significant differences were detected between the different starting amounts (two‐tailed Student's t‐test).

To assess sensitivity (false negatives) and specificity (false positives) as input amounts were decreased, we designated those peaks common to all three 500 mg datasets as a ‘reference’ to which the remaining datasets could be compared (Table 1). All three lower input amounts faithfully detected 82–94% of the peaks present in the 500 mg reference dataset. Despite having the lowest number of overall peaks called, the 4 mg dataset did not detect fewer reference peaks. Specificity (a low value indicates the presence of additional peaks not present in the reference dataset) was also high in all samples, and once again was not lowest in the 4 mg dataset.
To assess reproducibility, we examined the differences in peak calls within each input amount (Figure 4a) and between the input amounts (Figure 4b). The least variation was seen between replicates of the 500 mg input samples (Figure 4a). However, variation did not consistently increase with decreasing input amount, and individual replicate samples prepared from 4 mg input appeared superior to those prepared from 100 mg input. When limiting the comparison to only peaks present in all three replicates from each input amount (Figure 4b), the majority of peaks were detected at all input amounts. To more objectively measure reproducibility, we performed IDR analysis (Q. Li et al., 2011) of the peak calling data. The number of significant peaks across the replicates for different IDR rates (0–30% in 0.1% increments) was calculated (Figure 4c). The results show that the 500 mg dataset has the highest reproducibility, but all three other input amounts also had good reproducibility, and there was no clear difference in performance between the lower input amounts.
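The notion of peaks "present in all three replicates" can be made concrete with a small interval-overlap sketch. The study used the BEDTools intersect tool for this; the plain-Python stand-in below is only illustrative. The toy coordinates, the half-open BED-style intervals and the choice to report peaks from the first replicate are assumptions.

```python
def overlaps_any(peak, others):
    """True if `peak` overlaps at least one interval in `others` (half-open coordinates)."""
    chrom, start, end = peak
    return any(c == chrom and start < e and s < end for c, s, e in others)

def common_peaks(rep1, rep2, rep3):
    """Peaks from rep1 that overlap a peak in both rep2 and rep3."""
    return [p for p in rep1 if overlaps_any(p, rep2) and overlaps_any(p, rep3)]

# Toy peak sets as (chromosome, start, end) tuples.
rep1 = [("chrI", 100, 200), ("chrI", 500, 600), ("chrII", 50, 150)]
rep2 = [("chrI", 150, 250), ("chrII", 100, 180)]
rep3 = [("chrI", 120, 180), ("chrI", 550, 650), ("chrII", 140, 160)]

shared = common_peaks(rep1, rep2, rep3)
print(len(shared), "of", len(rep1), "replicate-1 peaks are found in all three replicates")
```

With real data the count depends slightly on which replicate's peaks are reported, because one peak in a replicate can overlap several peaks in another.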

Figure 4

Reproducibility of peak calls. (a) Venn diagrams showing the overlapping peak calls in the triplicate samples within each input amount. (b) Venn diagram of inter‐group peak overlaps, using only peaks common to all three replicates at each input amount. (c) Reproducibility measured by the irreproducible discovery rate (IDR) at different numbers of selected peaks, plotted at various IDR cutoffs. High reproducibility produces a curve with a late transition to high IDR values.

The conclusion of the above analyses is that no consistent drop in performance, measured in terms of read mapping, sensitivity, specificity or reproducibility, could be attributed to decreasing input cell numbers over a ~100‐fold range, and the method performs well with as little as 4 mg cells input. Indeed, the FAIRE assay appears surprisingly robust to input cell numbers over the range tested here, although it undeniably worked best at the 500 mg input amount. Attempts to further increase the reproducibility of the data by adding yet more replicates did not improve the outcome (data not shown), suggesting that experimental variation has more effect than input cell number at these amounts.

To the best of our knowledge, the method presented here represents an ~100‐fold decrease in the input cell amount compared with earlier yeast FAIRE methods (Berchowitz et al., 2009; Connelly et al., 2014). In addition, it is the first protocol to demonstrate FAIRE on cells taken directly from colonies cultured on agar plates, rather than from liquid culture. We underline that the performance of the method has not been tested on logarithmically growing liquid cultured cells, which have typically been used for chromatin accessibility studies to date. Our goal in developing this method was to enable comparison of cells from aged and biofilm‐model colony‐forming yeasts (manuscript in preparation). However, the advantages of a procedure that caters to low‐input amounts, does not require liquid culture, and does not rely on incubation to generate spheroplasts extend beyond this goal: with a requirement of only 4 mg cells, this procedure should be amenable to samples of wild yeast collected from environmental samples. Furthermore, epigenetic states are sensitive to changes in environment and culture conditions, so the ability to analyse chromatin accessibility in yeast samples fixed in formaldehyde immediately after isolation is particularly important for the study of epigenetic states. Finally, as there was no dramatic deterioration in data quality observed with decreasing cell numbers in this study, it appears we have not approached the lower limits of the method, so further decreases in input cell number may be possible.

Availability of data

The raw data presented in this study have been archived in the NCBI's Gene Expression Omnibus repository under accession number GSE104124 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE104124).

FUNDING

The research leading to these results has received funding from the Norway Grants Norwegian Financial Mechanism 2009–2014 project no. 7F14083 under contract no. MSMT‐28477/2014. Part of the research was performed in BIOCEV supported by CZ.1.05/1.1.00/02.0109 BIOCEV provided by ERDF and MEYS.

Supporting Information: Data S1.

REFERENCES (7 of 25 listed)

1. Using formaldehyde-assisted isolation of regulatory elements (FAIRE) to isolate active regulatory DNA.

Authors: Jeremy M Simon; Paul G Giresi; Ian J Davis; Jason D Lieb
Journal: Nat Protoc Date: 2012-01-19 Impact factor: 13.491

2. Integrative analysis of ChIP-chip and ChIP-seq dataset.

Authors: Lihua Julie Zhu
Journal: Methods Mol Biol Date: 2013

3. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin.

Authors: Paul G Giresi; Jonghwan Kim; Ryan M McDaniell; Vishwanath R Iyer; Jason D Lieb
Journal: Genome Res Date: 2006-12-19 Impact factor: 9.043

4. An ultra-low-input native ChIP-seq protocol for genome-wide profiling of rare cell populations.

Authors: Julie Brind'Amour; Sheng Liu; Matthew Hudson; Carol Chen; Mohammad M Karimi; Matthew C Lorincz
Journal: Nat Commun Date: 2015-01-21 Impact factor: 14.919

5. Sambamba: fast processing of NGS alignment formats.

Authors: Artem Tarasov; Albert J Vilella; Edwin Cuppen; Isaac J Nijman; Pjotr Prins
Journal: Bioinformatics Date: 2015-02-19 Impact factor: 6.937

6. Dynamics of cell wall structure in Saccharomyces cerevisiae. (Review)

Authors: Frans M Klis; Pieternella Mol; Klaas Hellingwerf; Stanley Brul
Journal: FEMS Microbiol Rev Date: 2002-08 Impact factor: 16.408

7. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position.

Authors: Jason D Buenrostro; Paul G Giresi; Lisa C Zaba; Howard Y Chang; William J Greenleaf
Journal: Nat Methods Date: 2013-10-06 Impact factor: 28.547