Discovery of Chromatin-Associated Proteins via Sequence-Specific Capture and Mass Spectrometric Protein Identification in Saccharomyces cerevisiae.

Julia Kennedy-Darling, Hector Guillen-Ahlers¹, Michael R Shortreed, Mark Scalf, Brian L Frey, Christina Kendziorski, Michael Olivier¹, Audrey P Gasch, Lloyd M Smith.

Abstract

DNA-protein interactions play critical roles in the control of genome expression and other fundamental processes. An essential element in understanding how these systems function is to identify their molecular components. We present here a novel strategy, Hybridization Capture of Chromatin Associated Proteins for Proteomics (HyCCAPP), to identify proteins that are interacting with any given region of the genome. This technology identifies and quantifies the proteins that are specifically interacting with a genomic region of interest by sequence-specific hybridization capture of the target region from in vivo cross-linked chromatin, followed by mass spectrometric identification and quantification of associated proteins. We demonstrate the utility of HyCCAPP by identifying proteins associated with three multicopy and one single-copy loci in yeast. In each case, a locus-specific pattern of target-associated proteins was revealed. The binding of previously unknown proteins was confirmed by ChIP in 11 of 17 cases. The identification of many previously known proteins at each locus provides strong support for the ability of HyCCAPP to correctly identify DNA-associated proteins in a sequence-specific manner, while the discovery of previously unknown proteins provides new biological insights into transcriptional and regulatory processes at the target locus.

Keywords: ChIP; DNA-binding proteins; DNA−protein interactions; GENECAPP; HyCCAPP; X-element; chromatin; chromatin immunoprecipitation; hybridization; mass spectrometry; proteomics; rDNA; ribosome biogenesis; transcription factors; transcriptional regulation

Year: 2014; PMID: 24999558; PMCID: PMC4123949; DOI: 10.1021/pr5004938

Source DB: PubMed; Journal: J Proteome Res; ISSN: 1535-3893; Impact factor: 4.466

Introduction

DNA–protein interactions are fundamental to the control of genome expression and play critical roles in mediating DNA replication,[1] chromatin organization/segregation,[2] and the transcription of genes and noncoding RNAs.[3] Some proteins associate with DNA in a site- and sequence-specific manner, such as transcription factors that recognize specific cis regulatory elements to modulate the transcription of nearby genes. Other proteins, such as histones and cohesins, bind much of the genome, often with periodic binding patterns.[4] Numerous technologies exist to study these interactions, including DNase footprinting,[5] formaldehyde-assisted isolation of regulatory elements (FAIRE),[6] and chromatin immunoprecipitation (ChIP).[7] DNase footprinting and FAIRE provide information on protein occupancy by revealing sites of the genome protected by or depleted of bound proteins.[5,6] ChIP-seq is used extensively to examine sites across the genome that are bound by DNA-associated proteins.[8] However, a major limitation is that ChIP is protein-centric, in that a candidate protein must be chosen as a potential DNA binder. Therefore, while ChIP-seq reveals the specific binding patterns of a given protein within the genome, it does not provide information about other proteins bound to those DNA regions. In the absence of a means to discover proteins bound to specific chromosomal loci in living cells, much of how the genome is replicated, protected, and expressed will remain obscure. To address these limitations, we describe here a corollary technology to ChIP that is DNA-centric; that is, rather than isolating DNA–protein complexes through capture of the protein, we capture a DNA locus and identify the proteins interacting with that DNA sequence. Knowledge of these DNA-associated proteins is essential to understanding the physiological role and functional properties of the locus. 
Importantly, this technology is not only able to identify already known DNA interactors, such as those that are studied using ChIP, but also is able to reveal new and previously unsuspected proteins. Conceptually this is very similar to ChIP analysis, differing only in the nature of the affinity capture step (DNA hybridization as opposed to antibody binding). In practice, however, it is much more difficult. First, in contrast to ChIP, where captured DNA is amplified by the polymerase chain reaction (PCR) for subsequent analysis, no amplification is possible for captured protein, and thus much less material is available for mass spectrometric (MS) analysis. Second, the expression levels of different proteins within cells vary by many orders of magnitude, and in some cases important proteins are present at exceedingly low levels. This dramatically complicates their separation and MS analysis. Despite these difficulties there have been two reports of DNA-centric capture approaches. Déjardin and Kingston described the “Proteomics of Isolated Chromatin segments” (PICh) strategy to identify proteins bound to human or Drosophila telomeric regions using locked nucleic acid (LNA) probes.[9,10]
Byrum et al. developed an approach referred to as “Chromatin Affinity Purification with Mass Spectrometry” (ChAP-MS), in which a transcription factor (LexA) binding site is engineered into the yeast genome at one specific locus and affinity captured via binding of a LexA–Protein A conjugate.[11] They employed this methodology to identify proteins associated with the yeast GAL1 gene promoter.[11] While constituting an important early proof-of-principle, the Kingston approach has thus far been limited to the capture of very high copy number short repetitive sequences,[9,10] and the ChAP-MS approach requires either engineering of the target genome to introduce a protein binding site[11] or introduction of a transcription activator-like (TAL) fusion protein.[12] In the present work we address these limitations, with development of a new technology, Hybridization Capture of Chromatin Associated Proteins for Proteomics (HyCCAPP), which combines (i) sequence-specific hybridization capture of DNA fragments of interest directly from a cleared yeast lysate, (ii) state-of-the-art mass spectrometric analysis, and (iii) a bioinformatics analysis pipeline to statistically differentiate between real and background signal (Figure 1). We validated HyCCAPP by capturing and interrogating four genomic regions in Saccharomyces cerevisiae: two regions within the rDNA locus, 25S and 5S (∼150–200 copies/cell), the X-element adjacent to the telomeres (∼35 copies/cell) and the GAL1-10 promoter (single copy/cell) (Figure 2). The different loci have different functions, which are reflected in the proteins that interact with them. These differences were evident in the results from HyCCAPP analysis, which produced distinct sets of proteins for each of the four loci, validating the specificity of the technology. 
These locus-specific protein lists include many, although not all, previously identified protein interactors, as well as numerous previously unknown interactors that expand our understanding of the biology at these loci. Eleven of 17 proteins identified in our proteomic analysis to interact with the various loci were validated by chromatin-immunoprecipitation from TAP-tagged yeast strains, confirming the ability of the technology to discover new interactors. HyCCAPP is a robust technology that can be used to study any DNA fragment of interest in the yeast genome.

Figure 1

HyCCAPP is a multistep process that uses sequence-specific hybridization to capture DNA loci of interest from formaldehyde cross-linked cells. The overall procedure involves (1) cross-linking cells with formaldehyde followed by cell lysis, (2) hybridization capture from sonicated and RNaseA-treated lysate using desthiobiotin oligonucleotide capture probes and streptavidin-coated magnetic beads, (3) mass spectrometric analysis of captured proteins using LC–MS/MS, and (4) bioinformatic analysis of mass spectrometric data to determine the proteins enriched at the captured DNA locus of interest relative to non-enriched yeast lysate.

Figure 2

Four genomic loci analyzed by HyCCAPP. The four DNA regions studied are depicted by the pink ovals. These include the 25S and 5S regions of the rDNA locus (a 150 copy 9 kb tandem repeat on chromosome XII), the X-element (a 35 copy 462 bp sequence near the telomeric repeats), and the single copy GAL1-10 promoter region. The 25S rDNA locus is transcribed as part of the 35S rRNA by RNA polymerase I, while the 5S rDNA locus is transcribed by RNA polymerase III. The X-element is located either directly adjacent to the telomeric repeats or separated by one or more copies of the Y′ element. The GAL1-10 promoter is a 660 bp region containing the upstream activating sequence (UAS) for divergent control of both the GAL1 and GAL10 genes.

Materials and Methods

Cell Growth and Cross-Linking for HyCCAPP

Saccharomyces cerevisiae strain Y1788 cells were grown in yeast extract peptone dextrose (YPD) purchased from Sigma-Aldrich (Y1375). Small scale inoculations of 5 mL were shaken overnight at 30 °C at 200 rpm in an Amerex 747 shaker incubator until saturation. Four large flasks (4 L) containing 1.5 L of YPD were each inoculated with 5 mL of saturated culture and shaken at 30 °C at 200 rpm until the OD600 was between 1.75 and 2.00 measured with an Agilent 8453 UV–vis spectrometer. This cell density is greater than that typically employed in ChIP experiments (OD600 of 0.5–1), in order to reduce the volumes of yeast culture and amounts of formaldehyde required. Flasks were removed from the shaker incubator and placed on stir plates with a 2-in. stir bar added to each flask. Formaldehyde (37% solution, Sigma-Aldrich F38775) was added to each flask to a final concentration of just under 3% (122 mL). The cells were stirred at room temperature for 30 min. Excess formaldehyde was then quenched with the addition of 5 M Tris pH = 8 (Teknova T5581) to a final concentration of 714 mM, or 250 mL per 1.5 L of cells. The quenching reaction was stirred for an additional 10 min at room temperature. The cross-linked cells were then centrifuged at 4 °C at 5,000g for 20 min using an Avanti J-25I centrifuge. The cell pellet was washed once with 1X PBS (10X PBS stock, Teknova P0191) and centrifuged again at 4 °C at 3,000g for 30 min using an Allegra 6KR centrifuge. Cell pellets were then stored at −80 °C and were stable for at least 2 months.
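
The reagent additions above follow ordinary dilution arithmetic (C1·V1 = C2·V2). The following is a minimal sketch for checking the stated final concentrations; the function name `final_concentration` is illustrative and not part of the protocol. Note that the 714 mM Tris figure is reproduced only when the previously added formaldehyde volume is left out of the total, which is treated here as an assumption about how the value was calculated.

```python
def final_concentration(stock_conc, stock_volume_ml, receiving_volume_ml):
    """Concentration after adding a stock to a receiving volume (C1*V1 = C2*V2)."""
    return stock_conc * stock_volume_ml / (stock_volume_ml + receiving_volume_ml)

# 122 mL of 37% formaldehyde added to 1.5 L of culture.
formaldehyde_pct = final_concentration(37.0, 122.0, 1500.0)

# 250 mL of 5 M (5000 mM) Tris added to 1.5 L of culture. The formaldehyde
# volume is ignored here (assumption), which reproduces the stated 714 mM.
tris_mm = final_concentration(5000.0, 250.0, 1500.0)

print(f"formaldehyde: {formaldehyde_pct:.2f}%")  # just under 3%
print(f"Tris: {tris_mm:.0f} mM")
```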

Cell Lysis and Chromatin Sonication/Solubilization

Cell pellets were resuspended in lysis buffer made fresh each time containing 200 mM NaCl (Teknova S0250), 20 mM EDTA (Teknova E0307-06), 50 mM Tris pH = 7 (Teknova T1070) and protease inhibitors diluted 200X from the Sigma-Aldrich cocktail in DMSO (P8215). A standard HyCCAPP experiment requires 1 L of lysis buffer. Cell pellets from 3 L of cross-linked cells were resuspended in 200 mL of lysis buffer. The cell suspension was lysed using a Constant Systems TS Series Cell Disruptor at 30 kpsi. After lysis, SDS was added to the cell lysate to a final concentration of 1% from a 20% solution (Bio-Rad 161-0418) and incubated in a 60 °C water bath for 8 min. The cell lysate was then sonicated in 50 mL aliquots using a Misonix Ultrasonic Processor S4000 at 20 V for 3 min 30 s with 4 s on/off intervals. The solution was kept in an ice bath for the duration of the sonication. It is critical that the solution does not heat up during the sonication step, as this will reverse the formaldehyde cross-links. Insoluble cellular debris was cleared from solution with a 12 min, 4 °C, 8,000g centrifugation. The pellet was then removed, and the supernatant was diluted into lysis buffer such that the final concentration of SDS was 0.2% (5-fold dilution). RNaseA (Life Technologies 12091-039) was then added to a final concentration of 60 μg/mL. For 1 L of cell lysate, this corresponds to 3 mL from the 20 mg/mL stock. The lysate was incubated at 37 °C for 1 h at 150 rpm in the Amerex shaker. After RNaseA treatment, the lysate was centrifuged at 15,000g for 15 min at room temperature to remove insoluble particulates. Sera-Mag streptavidin particles (Thermo Fisher Scientific 30152104010350) were washed once with wash buffer (200 mM NaCl, 0.2% SDS, 50 mM Tris pH = 8), resuspended in wash buffer, and added to the lysate to remove endogenously biotinylated moieties. For each 1 L of cell lysate from 3 L of cross-linked cells, 4 mL of Sera-Mag particles was used for this step. 
After washing, the beads were resuspended in ∼5 mL of wash buffer. The solution was shaken at 150 rpm at room temperature for 1 h. The streptavidin particles were then removed using the DynaMag-50 magnet (Life Technologies 12302D) in 50 mL aliquots. The removal of magnetic particles from each 50 mL aliquot of lysate should take between 2 and 4 min. It is important to use a magnet of this strength in order to limit the overall preparation time.
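
The SDS dilution and RNaseA addition described in this section can be checked with a short calculation. This is only a sketch; the helper names `diluted_volume_ml` and `stock_volume_ml` are hypothetical, and the 200 mL starting volume is taken from the resuspension step.

```python
def diluted_volume_ml(initial_conc, target_conc, initial_volume_ml):
    """Final volume needed to bring initial_conc down to target_conc."""
    return initial_volume_ml * initial_conc / target_conc

def stock_volume_ml(target_conc, final_volume_ml, stock_conc):
    """Volume of stock that gives target_conc in final_volume_ml (same units)."""
    return target_conc * final_volume_ml / stock_conc

# 200 mL of lysate at 1% SDS diluted to 0.2% SDS (5-fold).
lysate_volume_ml = diluted_volume_ml(1.0, 0.2, 200.0)

# RNaseA to 60 ug/mL from a 20 mg/mL (20000 ug/mL) stock.
rnase_ml = stock_volume_ml(60.0, lysate_volume_ml, 20000.0)

print(f"lysate volume: {lysate_volume_ml:.0f} mL, RNaseA stock: {rnase_ml:.1f} mL")
```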

Hybridization/Capture/Washing/Elution

The amount of DNA in the lysate was measured using PicoGreen (Life Technologies P7581). From a 3 L cell lysate preparation, typically between 1 and 3 pmol of chromatin was present. For hybridization, the amount of capture oligonucleotide added depends on copy number of the target region and number of capture oligonucleotides used. For each of the multicopy loci, 100 pmol of capture oligonucleotide was added to about 300 mL of lysate from a 3 L cell preparation. For the single-copy locus, 20 pmol of each of the 10 capture probes was added to a total of 1 L of lysate from a 3 L cell preparation. The volume of capture oligonucleotide added to each lysate sample was negligible relative to the volume of the lysate (1–5 μL into ∼300–1000 mL). The lysate samples with added capture oligonucleotides were shaken at 150 rpm and incubated at 37 °C for 3 h. After this time, the sample was removed from heat and cooled to room temperature (∼15 min). Sera-Mag streptavidin particles were washed and resuspended in a volume of wash buffer equal to their volume prior to washing. For each 100 pmol of capture oligonucleotide, 1 mL of bead slurry was added. The bead slurry was then added to the lysate sample and shaken at 150 rpm at room temperature for 1 h. The beads were then isolated in a 50 mL conical tube held against a DynaMag-50 magnet, by repeatedly adding aliquots of the lysate sample to the tube and allowing the beads to be drawn to the magnet. For the single-copy locus, the beads were collected into two 50 mL conical tubes. After removal of the final aliquot of lysate material, the beads were washed with wash buffer four times. A 5 mL aliquot of the lysate material was set aside at this step and stored at 4 °C. For each washing step, the wash buffer was gently added to the beads to minimize the formation of bubbles from the detergent. 
To resuspend the beads, the 50 mL conical centrifuge tube was gently inverted multiple times and then placed on a rocker to gently and continually mix the solution but prevent formation of large amounts of bubbles. The samples were rocked at room temperature for 5 min in between each washing step. After the fourth wash, the beads were concentrated into 3 mL of wash buffer and collected against a smaller magnet, the Magna-Sep (Life Technologies K1585-01), into low retention 1.7 mL tubes. For each multicopy locus, the entire amount of beads was concentrated into a single tube. For the single-copy locus, because twice as many beads were added, the beads were split between two tubes. The beads were washed twice in this smaller volume and incubated gently on a rocker for 1 h at room temperature. After the final wash, the wash buffer was replaced with 1.2 mL of release buffer for each tube (200 mM NaCl, 0.2% SDS, 50 mM Tris pH = 8, 10 mM d-Biotin (Life Technologies B-20656)). The samples were gently rocked for 2 h at room temperature to release all of the captured material from the beads. At the end of this incubation, the beads were collected against the magnet, and the solution was removed from each tube, leaving the beads behind. Trichloroacetic acid (350 μL) was then added to each of the samples, which were gently vortexed and placed on ice for 10 min. At this stage, the same amount of TCA was added to two 1.2 mL aliquots of cell lysate, which were then also placed on ice. The samples were then centrifuged at maximum speed at 4 °C for 20 min using an Eppendorf 5417R centrifuge. The supernatant was removed, and the pellet was washed twice with chilled acetone with 5 min centrifugations after each wash. The pellet was then heated at 95 °C to remove excess acetone. At this step, the samples can be stored at −20 °C for future processing. 
To solubilize the DNA–protein complexes in the pellet and reverse the formaldehyde cross-links, 100 μL of 200 mM Tris pH = 8 containing 1 mg of RapiGest[13] (commercially available from Waters Corporation, Milford, MA), kindly provided by Professor Neil Kelleher, was added to each tube. The sample was gently pipetted, centrifuged briefly, and then heated at 95 °C for 25 min. The sample was gently vortexed every 5 min to aid in solubilization of the pellet. After heating, the samples were allowed to cool and were processed for either qPCR analysis or mass spectrometry analysis.
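
The bead volumes in this section follow from the stated rule of 1 mL of bead slurry per 100 pmol of capture oligonucleotide. A minimal sketch (the function name `bead_slurry_ml` is illustrative) shows why the single-copy locus, with 10 probes at 20 pmol each, uses twice as many beads as a multicopy locus:

```python
PMOL_PER_ML_SLURRY = 100.0  # from the protocol: 1 mL slurry per 100 pmol oligo

def bead_slurry_ml(n_probes, pmol_per_probe):
    """Bead slurry volume for the total amount of capture oligonucleotide."""
    total_pmol = n_probes * pmol_per_probe
    return total_pmol / PMOL_PER_ML_SLURRY

multicopy_ml = bead_slurry_ml(1, 100.0)     # one probe at 100 pmol
single_copy_ml = bead_slurry_ml(10, 20.0)   # ten probes at 20 pmol each

print(multicopy_ml, single_copy_ml)
```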

qPCR Analysis

Samples for qPCR analysis were diluted at least 10-fold from the cross-link reversal sample into 1X TE buffer (10 mM Tris pH = 7, 1 mM EDTA). This step is necessary to dilute the SDS. Taqman assays were designed for each of the loci studied and ordered from IDT. A standard curve was made using dilutions of purified yeast genomic DNA. Each sample was analyzed in duplicate on a 96-well plate (Roche 04729692001). Each well contained 5 μL of sample, 10 μL of LightCycler 480 probes master (Roche 04707494001), 4.5 μL water, and 0.5 μL 40X primer probe mix. Each plate was centrifuged for 2 min at 2,000g after pipetting and analyzed using the Roche 480 LightCycler. Each qPCR run included a 5 min preincubation step at 95 °C, amplification cycles, and a 2 min cooling step at 40 °C. Each amplification cycle included a 10 s 95 °C incubation with a temperature ramp of 4.4 °C/s, a 30 s incubation at 60 °C with a temperature ramp of 2.2 °C/s, and a third 1 s incubation at 72 °C with a temperature ramp of 4.4 °C/s. Detection of the FAM fluorophore was performed during the 72 °C incubation using a 483–533 filter set. Analysis of the resultant qPCR curves and calculation of Cp values was performed using the Roche 480 LightCycler software and the 2nd quant/2nd derivative function.
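
Quantification against a genomic DNA standard curve is typically done by fitting Cp against log10 of the input amount and interpolating unknowns. The sketch below illustrates that general approach in plain Python; it is not the LightCycler software's algorithm, and the standard amounts and Cp values are hypothetical numbers chosen for illustration.

```python
import math

def fit_standard_curve(amounts, cp_values):
    """Least-squares fit of Cp = slope * log10(amount) + intercept."""
    xs = [math.log10(a) for a in amounts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cp_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cp_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def amount_from_cp(cp, slope, intercept):
    """Invert the standard curve to estimate the input amount."""
    return 10 ** ((cp - intercept) / slope)

# Hypothetical 10-fold dilution series of genomic DNA (arbitrary units).
standard_amounts = [1000.0, 100.0, 10.0, 1.0]
standard_cps = [20.04, 23.36, 26.68, 30.00]
slope, intercept = fit_standard_curve(standard_amounts, standard_cps)

# Each sample is run in duplicate; average the two Cp values first.
duplicate_cps = (25.00, 25.04)
mean_cp = sum(duplicate_cps) / len(duplicate_cps)
sample_amount = amount_from_cp(mean_cp, slope, intercept)

print(f"slope={slope:.2f}, intercept={intercept:.2f}, amount={sample_amount:.1f}")
```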

Mass Spectrometry Analysis

After the reverse cross-linking step, 10 μL of 300 mM iodoacetamide (Sigma-Aldrich I1149) was added to each sample and incubated for 30 min at room temperature. Next, 15 μL of 300 mM DL-dithiothreitol (Sigma-Aldrich D9779) was added, and the samples were incubated again for 30 min at room temperature. The sample was diluted with 900 μL of 25 mM ammonium bicarbonate (Teknova A2012), and trypsin (Promega V5111) was added to a ratio of 5–10 μg sample per μg trypsin. The sample was incubated at 37 °C overnight with gentle rotation. Trifluoroacetic acid (TFA) (Sigma-Aldrich T6508) was added to a final concentration of 0.5% TFA in each of the trypsin digest tubes (50 μL in 1000 μL solution). The tubes were incubated at 37 °C for 30 min and then centrifuged at 13,000g for 10 min. The RapiGest byproducts are water-immiscible, so some precipitation may be observed. The supernatant was transferred to another tube. The samples were then desalted using a C18 solid-phase extraction pipet tip (OMIX C18, 100 μL, Agilent Technologies). The tip was conditioned in 70% ACN, 0.1% TFA, and equilibrated in 0.1% TFA. The peptides were bound by pipetting the entire volume over to a low-binding tube (in 200 μL increments). The sample was then pipetted back-and-forth between the two tubes three times. The OMIX tip was then rinsed in 0.1% TFA five times. The bound peptides were eluted into 150 μL of 70% acetonitrile (ACN), 0.1% TFA. The samples were then dried in a Savant SVC-100H SpeedVac Concentrator (about 30 min). After removing all liquid, the samples were reconstituted in 25 μL of 95:5 H2O/ACN, 0.1% formic acid and then vortexed several times. The samples were then spun in a benchtop centrifuge for 2 min, and 24 μL of the supernatant was loaded into HPLC sample vials. Care should be taken to avoid bubbles. This sample allows for 3 technical replicate injections of 8 μL each. 
Samples were analyzed by HPLC-ESI-MS/MS using a system consisting of a high performance liquid chromatograph (nanoAcquity, Waters) connected to an electrospray ionization (ESI) Orbitrap mass spectrometer (LTQ Velos, Thermo Fisher Scientific). HPLC separation employed a 100 × 365 μm fused silica capillary microcolumn packed with 20 cm of 3 μm diameter, 100 Å pore size, C18 beads (Magic C18, Bruker), with an emitter tip pulled to approximately 2 μm using a laser puller (Sutter Instruments). Peptides were loaded on-column at a flow rate of 500 nL/min for 30 min and then eluted over 120 min at a flow rate of 300 nL/min with a gradient of 2% to 30% ACN, in 0.1% formic acid. Full-mass scans were performed in the FT orbitrap between 300 and 1500 m/z at a resolution of 60,000, followed by 10 MS/MS HCD scans of the 10 highest intensity parent ions at 42% relative collision energy and 7,500 resolution, with a mass range starting at 100 m/z. Dynamic exclusion was enabled with a repeat count of two over the duration of 30 s and an exclusion window of 120 s.

Mass Spectrometry Data Analysis

The acquired precursor MS and MS/MS spectra were searched against an S. cerevisiae protein database (Uniprot reviewed database, containing 6,883 sequences) using SEQUEST, within the Proteome Discoverer 1.3.0.339 software package (Thermo Fisher Scientific). Masses for both precursor and fragment ions were treated as monoisotopic. Oxidized methionine (+15.995 Da) and carbamidomethylated cysteines (+57.021 Da) were allowed as dynamic modifications. The database search permitted up to two missed trypsin cleavages, and ion masses were matched with a mass tolerance of 10 ppm for precursor masses and 0.1 Da for HCD fragments. The output from the SEQUEST search algorithm was validated using the Percolator algorithm. The data were filtered using a 5% false discovery rate, based on q-values. All raw data files and searched data files are available at PeptideAtlas.org under the name “HyCCAPPyeast”.
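
To make the search tolerances concrete, the following sketch shows what a 10 ppm precursor tolerance, a 0.1 Da fragment tolerance, and a 5% q-value cutoff mean numerically. It does not reproduce SEQUEST or Percolator; the function names, the `q_value` key, and the example masses are all hypothetical.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6

def precursor_matches(observed_mass, theoretical_mass, tol_ppm=10.0):
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm

def fragment_matches(observed_mass, theoretical_mass, tol_da=0.1):
    return abs(observed_mass - theoretical_mass) <= tol_da

def filter_by_q_value(psms, max_q=0.05):
    """Keep peptide spectral matches whose q-value is at or below the cutoff."""
    return [psm for psm in psms if psm["q_value"] <= max_q]

# Hypothetical PSM records for illustration only.
example_psms = [
    {"peptide": "PEPTIDEK", "q_value": 0.01},
    {"peptide": "SAMPLER", "q_value": 0.20},
]
kept = filter_by_q_value(example_psms)

print(precursor_matches(1000.008, 1000.0))  # 8 ppm error
print(precursor_matches(1000.012, 1000.0))  # 12 ppm error
print([psm["peptide"] for psm in kept])
```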

Bioinformatics Analysis

Delta-Rank Distribution

The result of the mass spectrometric data searching was exported into Microsoft Excel. The numbers of peptide spectral matches (PSMs) were summed across the three technical replicates from each biological replicate. Here, biological replicates are HyCCAPP experiments from different cell growths, while technical replicates are different MS runs from the same biological replicate sample. The total number of PSMs per technical replicate was normalized across all runs to account for variability in protein amount. The total number of normalized PSMs was then summed across the three technical replicates for all HyCCAPP experiments and yeast lysate controls. Proteins not identified in a given mass spectrometric run, but identified in at least one of the other mass spectrometric runs from the HyCCAPP experiments or yeast lysate samples, were given a PSM value of 0.0417 (1/total number of biological replicates, i.e., 1/24) in place of the zero. Although the ensuing analysis involves rank order, imputing a small number in place of zeros is necessary in order to approximate fold change. The four biological replicates for the HyCCAPP experiments against each of the four loci were averaged. There were eight biological replicates of the yeast lysate control sample run on the mass spectrometer: four of the samples were from each of the four multicopy HyCCAPP experiments, and four corresponded to the GAL1-10 HyCCAPP experiment. To compare the proteins identified in the HyCCAPP experiments to those in the yeast lysate, we used the set of four corresponding yeast lysate samples for each of the loci studied. Each of the lists of proteins was then sorted relative to the number of PSMs, and the proteins were ranked; the protein with the most PSMs was given a rank of 1, the second most abundant protein a rank of 2, etc. The lists from each of the four loci studied were then compared with corresponding yeast lysate samples. 
Differences in rank were calculated for each protein identified between the HyCCAPP sample and the yeast lysate sample, where a positive delta-rank indicated enrichment in the HyCCAPP experiment. The resultant delta-rank distribution was used to filter true hits from background contribution.
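
The delta-rank calculation can be sketched as follows. This is an illustration rather than the authors' Excel workflow: it assumes the PSM counts are already normalized and averaged, takes delta-rank as lysate rank minus HyCCAPP rank (so that a positive value means enrichment, consistent with the text), and breaks ties alphabetically, which the text does not specify. Protein names and counts are hypothetical.

```python
IMPUTED_PSM = 1 / 24  # value the text substitutes for zero counts (0.0417)

def rank_by_psms(psms):
    """Rank 1 = most PSMs. Ties are broken alphabetically (an assumption)."""
    ordered = sorted(psms, key=lambda protein: (-psms[protein], protein))
    return {protein: i + 1 for i, protein in enumerate(ordered)}

def delta_ranks(capture_psms, lysate_psms):
    """Lysate rank minus capture rank for every protein seen in either sample."""
    proteins = set(capture_psms) | set(lysate_psms)
    # Missing or zero counts are replaced by the small imputed value.
    capture = {p: capture_psms.get(p) or IMPUTED_PSM for p in proteins}
    lysate = {p: lysate_psms.get(p) or IMPUTED_PSM for p in proteins}
    capture_rank = rank_by_psms(capture)
    lysate_rank = rank_by_psms(lysate)
    return {p: lysate_rank[p] - capture_rank[p] for p in proteins}

# Hypothetical averaged, normalized PSM counts.
capture_example = {"ProtA": 10.0, "ProtB": 50.0, "ProtC": 5.0}
lysate_example = {"ProtA": 100.0, "ProtB": 2.0, "ProtD": 30.0}
example = delta_ranks(capture_example, lysate_example)

for protein in sorted(example, key=example.get, reverse=True):
    print(protein, example[protein])
```

In this toy example ProtB has the largest positive delta-rank because it is the top-ranked protein in the capture sample but ranks low in the lysate.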

Delta-Rank Threshold Calculation

To determine the delta-rank threshold to filter out true hits from the remainder of identified proteins for each of the four loci, we compared different biological replicates of yeast lysate samples to each other. In the HyCCAPP analysis, the most enriched proteins captured on loci of interest have the largest delta-rank. Comparing different biological replicates of yeast lysate to one another will provide a delta-rank distribution that results from a sample where only biological and technical fluctuations are responsible for differences across samples. Among the eight biological replicates of yeast lysate, there are 70 different possible comparisons when averaging four biological replicates and comparing two such averages (since 70 = 8 choose 4 = 8!/(4!4!)). We combined the yeast lysate samples in all 70 combinations and determined the delta-rank corresponding to different thresholds. For each yeast lysate delta-rank comparison, the number of proteins corresponding to 1% of the sample, 2.5%, 5%, and 10% were determined, and the corresponding delta-rank of each of these proteins was extracted. These ranks were then averaged and used as the threshold levels for the HyCCAPP delta-rank distributions for each of the four loci. False discovery rates (FDRs) were calculated for each of the loci for the four thresholds by dividing the threshold level (e.g., 10%) by the number of proteins above the corresponding delta-rank in the gene:yeast lysate delta-rank distribution normalized to the total number of proteins in each distribution. For example, there were 1831 total delta-ranks from the 25S rDNA gene:yeast lysate comparison. From this list, 393 proteins were above the 10% threshold. Therefore, the FDR was determined to be 10%/(393/1831) or 46.6%.
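
The count of 70 lysate comparisons and the worked FDR example can be reproduced with a few lines. This is a sketch under the assumption that each group of four replicates is compared with the remaining four; the replicate indices and the function name `fdr` are illustrative.

```python
from itertools import combinations
from math import comb

replicates = range(8)  # eight yeast lysate biological replicates

# Every way to pick four replicates, paired with the four left over.
splits = [
    (group, tuple(r for r in replicates if r not in group))
    for group in combinations(replicates, 4)
]
n_comparisons = len(splits)

def fdr(threshold_fraction, n_above_threshold, n_total):
    """Threshold level divided by the observed fraction of proteins above it."""
    return threshold_fraction / (n_above_threshold / n_total)

# Worked example from the text: 25S rDNA, 10% threshold.
fdr_25s = fdr(0.10, 393, 1831)

print(n_comparisons, comb(8, 4))
print(f"FDR = {fdr_25s:.1%}")
```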

Chromatin Immunoprecipitation

TAP-tagged strains were obtained from the Thermo Scientific Yeast TAP Tagged ORF library (YSC1177). Small scale (5 mL) inoculations were grown in YPD overnight to saturation and diluted into 500 mL of YPD. Once cells reached an OD of 0.75, formaldehyde was added to a final concentration of 1% and cross-linked for 20 min at room temperature with a stir bar added and stirred at medium speed. Tris was then added to a final concentration of 500 mM to inactivate unreacted formaldehyde,[14] and the sample was stirred at room temperature for an additional 10 min. Cells were isolated through centrifugation at 5,000g at 4 °C for 20 min. The pellet was washed once with 1X PBS, centrifuged at 3,000g for 30 min at 4 °C, and stored at −80 °C. Cells from 125 mL of cell culture were resuspended in about 1.5 mL of lysis buffer (50 mM Tris pH = 7, 140 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% Na-deoxycholate with 1X protease inhibitor cocktail (Halt Protease inhibitor cocktail, Thermo Fisher Scientific 78429)). About 400 μL of glass beads (Sigma-Aldrich G8772) was added to a 2 mL screw-top tube (Sarstedt, 72.693), and the cell suspension was pipetted over the beads. The cells were then disrupted using a bead beater (BioSpec 3110BX) at standard settings for 4 × 50 s cycles. In between cycles, the samples were kept on ice. The solution was then pipetted to a new tube, leaving behind the beads, and sonicated using a microtip sonicator (Fisher Scientific, 550 Sonic Dismembrator) at setting 5 for 3 min total time with 4 s on/off cycles. During sonication, the samples were kept in an ice bath. The samples were then centrifuged at 14,000g for 5 min to remove insoluble debris. The supernatant was then removed and split in half into two tubes. A 50 μL aliquot of this input solution was diluted into 300 μL of 1X TE and stored at −20 °C. Five microliters of the TAP Tag antibody (Thermo Fisher Scientific CAB1001) was added to one tube of each sample. 
The other half of the lysate sample served as the no-antibody control. The tubes were then incubated overnight at 4 °C with constant rotation. The samples were then equilibrated to room temperature, and 50 μL of Dynabeads Protein G beads (Life Technologies 100.04D) was added to each tube and rotated at room temperature for 1 h. The beads were then collected against a magnet and washed seven times: twice with lysis buffer, twice with lysis buffer containing 500 mM NaCl, twice with wash buffer (10 mM Tris pH = 7, 250 mM LiCl, 0.25% NP-40, 0.25% Na-deoxycholate and 1 mM EDTA), and once with stringent wash buffer (10 mM Tris pH = 7, 1 mM EDTA, 140 mM NaCl, 0.5% SDS). The samples were then eluted into 95 μL of elution buffer (50 mM Tris pH = 7, 1 mM EDTA and 0.5% SDS) by heating for 30 min at 65 °C. The beads were then isolated against a magnet, and the solution was diluted into 300 μL of 1X TE. Proteinase K (2 μL, New England BioLabs P8107S) was then added to each tube, including the input material samples, and incubated at 42 °C for 1 h. The samples were then transferred to 65 °C and incubated for 4–5 h. Phenol/chloroform/isoamyl alcohol mixture (0.4 mL, Sigma-Aldrich 77617) was then added to each sample, followed by vortexing and centrifugation to separate the immiscible layers. The aqueous layer from each sample was isolated and diluted into 1.3 mL of 100% ethanol and incubated at −20 °C overnight. The DNA was then isolated by centrifugation (20 min, 4 °C, 15,000g), and the pellet was washed once with 70% ethanol. The concentration of DNA was measured using a NanoDrop (Thermo Scientific NanoDrop 2000c UV–vis spectrophotometer). Equal amounts of DNA were used for qPCR analysis. Each sample was diluted in 1X TE for qPCR analysis, as described above.

Results

HyCCAPP Procedure

Each step in the HyCCAPP process is critical to success, and was explored and optimized in order to arrive at the overall protocol employed here. Brief descriptions of issues and solutions are provided below, and a detailed overall protocol is provided in Materials and Methods.

Cell Number, Cross-Linking, and Lysis

A typical detection limit for the mass spectrometric identification of peptides in a complex background is in the low femtomole (1 fmol = 10⁻¹⁵ mol = 6 × 10⁸ molecules) range. Assuming a 1% overall efficiency of the HyCCAPP process (see below), 6 × 10¹¹ cells are needed to obtain 10 fmol of a single copy/cell protein. We performed HyCCAPP at a scale sufficient to see most proteins (10¹¹ yeast cells per experiment). Because many DNA–protein interactions are dynamic and transient in nature, formaldehyde was utilized to stabilize DNA–protein interactions. Formaldehyde is a widely used zero-length cross-linking reagent that reacts rapidly with proteins and DNA, and the resultant cross-links can be reversed if desired.[15] The concentration (3%) and reaction time (30 min at room temperature) were optimized to maximize DNA-related protein IDs (see Materials and Methods). After testing multiple lysis methods (including bead beating, conventional French press, lysozyme digestion, and high-pressure cell disruptors), we obtained the best reproducibility and efficiency (90% cell lysis, based upon total DNA and protein recovery) with a commercially available cell disruptor system (unpublished).
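
The cell-number estimate above is a direct application of Avogadro's number. A minimal sketch (function names are illustrative) reproduces the roughly 6 × 10¹¹ cells needed for 10 fmol of a single copy/cell protein at 1% overall efficiency, and shows what 10¹¹ cells would yield under the same assumptions:

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def cells_required(target_mol, copies_per_cell, efficiency):
    """Cells needed to recover target_mol of a protein at a given overall yield."""
    return target_mol * AVOGADRO / (copies_per_cell * efficiency)

def recovered_mol(n_cells, copies_per_cell, efficiency):
    """Moles of protein recovered from n_cells at a given overall yield."""
    return n_cells * copies_per_cell * efficiency / AVOGADRO

# 10 fmol of a protein bound at one copy per cell, 1% overall efficiency.
cells_for_10_fmol = cells_required(10e-15, 1, 0.01)

# Yield from the 1e11 cells used per experiment, same assumptions.
fmol_from_1e11_cells = recovered_mol(1e11, 1, 0.01) * 1e15

print(f"{cells_for_10_fmol:.1e} cells")
print(f"{fmol_from_1e11_cells:.2f} fmol")
```

Under these assumptions a single-copy locus yields on the order of 1–2 fmol per bound protein, while multicopy loci scale up in proportion to copy number.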

Chromatin Solubilization and Fragmentation

Hybridization capture requires that the cross-linked chromatin be effectively solubilized and efficiently fragmented. We employed high concentrations of detergent (1% sodium dodecyl sulfate (SDS)) and a short 60 °C incubation for chromatin solubilization. Including this step in the procedure greatly improved overall protein and DNA recovery. The size distribution of chromatin fragments is also a critical variable in HyCCAPP. Longer DNA fragments carry more associated protein molecules, increasing MS signal intensities, but the binding location of those proteins is necessarily less well-defined, compromising spatial resolution. After exploring both enzymatic (restriction enzymes and micrococcal nuclease) and physical fragmentation of the cross-linked chromatin, we found that sonication provided the most reproducible results, generating chromatin fragments ∼1 kb on average. In ChIP procedures the chromatin is generally fragmented more extensively than this, in order to produce shorter DNA fragments, which in turn yield greater resolution. However, in HyCCAPP, there are two significant advantages to employing longer chromatin fragments: first, as mentioned above, the longer fragments carry more associated proteins, facilitating MS detection, and second, the probability that DNA molecules are fragmented within the capture sequence (thereby reducing capture yield) is lower for longer fragments.

Capture Oligonucleotide Design and Hybridization

We designed 30 nt long capture oligonucleotides, with calculated melting temperatures from 50 to 70 °C (Supplementary Table 1). Binding to streptavidin-conjugated beads is a useful approach to capture cross-linked chromatin that has been hybridized to biotinylated oligonucleotides; however, we found that elution of the captured material with heat produced unacceptably high levels of background proteins, compromising our ability to identify low-abundance captured proteins by MS. We therefore employed capture oligonucleotides that included a 5′-desthiobiotin moiety.[16] Desthiobiotin is a biotin analogue that binds with 100-fold lower affinity to streptavidin than biotin, allowing elution by competitive displacement with free biotin under mild conditions. The selective elution of the capture oligonucleotides/hybridized chromatin from support particles greatly reduced background levels from nonspecifically bound proteins, as revealed by MS analysis of the eluted material. For each of the multicopy loci, only a single capture oligonucleotide was used; however, for the single-copy GAL1-10 locus, 10 capture oligonucleotides were utilized in parallel in order to increase overall capture efficiency. The optimum molar ratio of capture oligonucleotide to targeted chromatin locus was 100 for the multicopy loci and 20 for each of the 10 capture oligonucleotides for the single-copy GAL1-10 locus. Hybridization of capture oligonucleotides to the cross-linked and fragmented chromatin was carried out in RNase A-treated lysate at 37 °C for 3 h, followed by capture of the resultant complexes on magnetic beads. The beads were washed and then incubated in a biotin-containing buffer to elute bound material.

Efficiency and Specificity of Hybridization Capture for Multicopy Loci

Hybridization capture efficiency and specificity for multicopy loci were evaluated using qPCR to determine the number of copies of target DNA captured, and PicoGreen binding was used to determine the total amount of DNA captured (target region plus nonspecifically captured DNA).[17] See Supplementary Table 2 for qPCR primer sequences. qPCR showed a capture efficiency (copies of target DNA captured divided by input copies) of 0.6–1.2% for each of the three multicopy loci. The capture specificity (copies of target DNA captured using the target capture oligonucleotide divided by that captured using an off-target oligonucleotide) ranged from 77- to 812-fold relative to the other multicopy loci (Figure 3). This specificity measurement reveals the amount of the targeted DNA captured relative to the two other regions; however, it does not reveal the total amount of off-target DNA captured. To measure this, we used PicoGreen binding to compare the total DNA obtained in specific capture experiments with the total DNA obtained in control capture experiments using a scrambled sequence oligonucleotide that is not significantly complementary to any region within the yeast genome. The difference in total DNA obtained in the specific and control capture experiments is an approximation of the amount of DNA captured through off-target hybridization. For each of the three multicopy loci, the ratio of specifically captured to nonspecifically captured total DNA is at least 2 (Supplementary Figure 1), meaning that at least 50% of the total DNA captured in each HyCCAPP experiment corresponds to the targeted region. Although this value may seem low, it is important to note that, assuming the contaminating DNA represents sequences distributed widely across the genome, the captured fragment greatly outnumbers any other specific sequence, and thus nonspecific capture is not expected to recover particular associated proteins with high enrichment.
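The three quantities used in this section (capture efficiency, capture specificity, and the fraction of captured DNA attributable to the target) are simple ratios. The sketch below restates them in Python; the function names and the example numbers are hypothetical, and the on-target fraction assumes that the scrambled-oligonucleotide control approximates the nonspecific DNA present in a specific capture.

```python
def capture_efficiency(captured_copies, input_copies):
    """Copies of target DNA captured divided by input copies (qPCR)."""
    return captured_copies / input_copies


def capture_specificity(copies_with_target_oligo, copies_with_off_target_oligo):
    """Fold difference in target copies recovered with the target capture
    oligonucleotide versus an off-target oligonucleotide (qPCR)."""
    return copies_with_target_oligo / copies_with_off_target_oligo


def min_target_fraction(total_dna_specific, total_dna_scrambled):
    """Approximate fraction of total captured DNA (PicoGreen) that is on-target.

    Assumption: DNA recovered with the scrambled control equals the
    nonspecific DNA present in the specific capture.
    """
    return 1.0 - total_dna_scrambled / total_dna_specific


if __name__ == "__main__":
    # Made-up illustrative inputs, not measurements from this study
    print(capture_efficiency(1.2e6, 1.0e8))   # 1.2% efficiency
    print(capture_specificity(1.2e6, 1.5e4))  # 80-fold specificity
    print(min_target_fraction(2.0, 1.0))      # ratio of 2 -> 50% on-target
```

With a specific-to-control total DNA ratio of 2, the on-target fraction evaluates to 50%, matching the lower bound stated above.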

Figure 3

Multicopy locus hybridization capture efficiency and specificity. The fractions of target DNA captured by HyCCAPP from the 25S rDNA locus, 5S rDNA locus, and X-element, as measured by locus-specific qPCR assays (Supporting Information) are shown in panels A, B, and C, respectively. Error bars represent the standard deviation of duplicate measurements. The first, second, and third bars in each plot show the fractions of DNA captured using the 25S rDNA capture oligonucleotides, the 5S rDNA capture oligonucleotides, and the X-element capture oligonucleotides, respectively. Numerical values in each case are shown.

Efficiency and Specificity of Hybridization Capture for the Single Copy GAL1-10 Promoter

In order to increase hybridization capture efficiency for the single-copy divergent GAL1-10 promoter, we employed 10 capture oligonucleotides across a ∼1400 bp region (Figure 4a). Due to the greater length of this region, compared to the multicopy loci examined, we designed two qPCR assays targeting regions approximately 1 kb apart. These assays were employed to measure the capture efficiency, capture specificity (by comparison to capture with a scrambled oligonucleotide control), and the approximate length of the captured fragments. We combined capture oligonucleotides 1–5 to target the GAL10 end of the capture region and capture oligonucleotides 6–10 to target the GAL1 end of the capture region. These combinations were used in separate hybridization capture experiments. As shown in Figure 4b, oligonucleotides 1–5 captured ∼0.4% of the GAL10 region and ∼0.1% of the GAL1 region, while oligonucleotides 6–10 captured ∼0.6% of the GAL1 region and ∼0.1% of the GAL10 region. This corresponds to a total capture efficiency of ∼1.2%, comparable to the results obtained for the multicopy loci using only a single capture sequence. The capture specificity compared to a scrambled oligonucleotide control was approximately 100-fold. The differences between the two qPCR assay signals for each set of capture oligonucleotides are consistent with the distribution of chromatin fragment sizes produced by the sonication (∼1 kb average length, data not shown); more signal is obtained from the qPCR assay closest to the capture oligonucleotide sequences.

Figure 4

GAL1-10 locus hybridization capture efficiency and specificity. Ten capture oligonucleotides (30 nt) were designed for the GAL1-10 promoter region spanning a total length of about 1400 bp (a). The capture probes alternated spacing of 100 bp and 200 bp to avoid synchronicity with nucleosome repetition. Two qPCR assays were designed, one targeting a region near the start of the GAL10 gene (dark gray) and one targeting a region near the start of the GAL1 gene (light gray). HyCCAPP was performed using either capture oligonucleotides 1–5, capture oligonucleotides 6–10, or a scrambled oligonucleotide control (b). Error bars represent the standard deviation of duplicate measurements.

MS Analysis

MS analysis of protein samples was performed by standard methods (see Materials and Methods). The raw MS data were processed using rank order statistics and false discovery rate (FDR) criteria to distinguish true binders from false positive binders. This process is presented in detail below.

Validation of HyCCAPP through Analysis of Four Genomic Loci in Yeast

To develop and test the HyCCAPP procedure, we analyzed four genomic loci in yeast, each with distinct expectations in terms of bound proteins (see below): the 25S rDNA (∼150 copies/cell), 5S rDNA (∼150–200 copies/cell), X-element (∼35 copies/cell), and the GAL1-10 promoter (single copy/cell) (Figure 2). Capture oligonucleotides were designed as described in Materials and Methods, and the sequences and calculated melting temperatures are provided in Supplementary Table 1. Each experimental run employed 10^11 yeast cells. This experimental scale was used for either a single HyCCAPP experiment on the GAL1-10 region or for three HyCCAPP experiments on any of the multicopy loci. An equal volume of yeast lysate was removed from each cell preparation for separate MS analysis and comparison to captured material. Four biological replicates were performed for each of the four loci.

Bioinformatic Analysis

More than 1000 proteins were identified in HyCCAPP experiments for each of the four loci (with four biological replicates each). Although these numbers are large, they are comparable to the numbers of proteins identified in affinity purification and mass spectrometry (AP-MS) experiments,[18] which are conceptually similar to the DNA capture experiments presented here. As has been shown in AP-MS studies, these large lists include both true protein–protein interactions and nonspecific protein binders, and the challenge is designing proper control experiments and accurately identifying the true binders.[19] Similarly, the protein lists obtained in HyCCAPP experiments include both the desired proteins that truly interact with the target region (henceforth “true hits”) and false positive proteins that are not actually specifically interacting with the target region (henceforth “false positives”). False positives often correspond to the most highly abundant proteins present; alternatively, characteristics such as a general but nonspecific affinity for nucleic acids (e.g., any basic and thus positively charged protein) or for the capture beads or associated streptavidin can also lead to artifactual binding. The goal of the bioinformatics data analysis is to reduce the incidence of false positives as much as possible, without also eliminating the true hits. We implemented a rank order methodology to identify the proteins that were most enriched in the captured samples relative to yeast lysate. Relative protein abundances were estimated from their corresponding numbers of peptide spectral matches (PSMs). Because we are operating close to the limit of protein detection, many true positives may not be reproducibly identified in the MS analysis (even if they are reproducibly captured). We therefore opted to combine data from biological replicates, to maximize the breadth of our analysis, as follows. 
For both yeast lysate and captured samples, proteins were sorted by abundance and ranked correspondingly. For each captured sample (four biological replicates for each of four loci), the average difference in rank between the sample and its paired lysate was calculated for all proteins, yielding a “delta-rank” distribution for each of the four loci (see Figure 5 and Materials and Methods). The proteins that were most enriched in the captured samples have large positive delta-ranks and thus correspond to the right end of the delta-rank distributions; conversely, the proteins that were most abundant in the yeast lysate and were not captured in the HyCCAPP process have large negative delta-ranks and are thus represented at the left end of the delta-rank distributions. The distributions are centered around a delta-rank value of zero, consistent with substantial recovery of abundant, nonspecifically bound proteins. Importantly, each of the distributions shows a significant tail on the right side (large positive delta-rank). Many of these proteins are likely to be the true hits.
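The delta-rank calculation described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function names are hypothetical, rank 1 is assigned to the protein with the most PSMs, tied proteins share a rank, and proteins not observed in a sample are assumed to share the rank just after the last observed protein (the actual treatment of missing proteins is described in Materials and Methods and may differ).

```python
def rank_by_abundance(psm_counts, universe):
    """Rank proteins by PSM count (rank 1 = most PSMs).

    Ties share the better rank. Proteins in `universe` that are absent from
    `psm_counts` all receive the rank after the last observed protein
    (an assumption made for this sketch).
    """
    observed = sorted(psm_counts.items(), key=lambda item: -item[1])
    ranks = {}
    for position, (protein, count) in enumerate(observed, start=1):
        previous = observed[position - 2] if position > 1 else None
        if previous is not None and previous[1] == count:
            ranks[protein] = ranks[previous[0]]
        else:
            ranks[protein] = position
    missing_rank = len(observed) + 1
    return {protein: ranks.get(protein, missing_rank) for protein in universe}


def delta_ranks(capture_replicates, lysate_replicates):
    """Average (lysate rank - capture rank) over paired replicates.

    Positive values indicate enrichment in the captured sample.
    """
    universe = set()
    for sample in capture_replicates + lysate_replicates:
        universe.update(sample)
    totals = {protein: 0.0 for protein in universe}
    for capture, lysate in zip(capture_replicates, lysate_replicates):
        capture_rank = rank_by_abundance(capture, universe)
        lysate_rank = rank_by_abundance(lysate, universe)
        for protein in universe:
            totals[protein] += lysate_rank[protein] - capture_rank[protein]
    n_pairs = len(capture_replicates)
    return {protein: total / n_pairs for protein, total in totals.items()}


if __name__ == "__main__":
    # Made-up PSM counts for one capture/lysate pair; names are placeholders
    capture = [{"Orc4": 5, "AbundantA": 2}]
    lysate = [{"AbundantA": 50, "AbundantB": 40, "Orc4": 1}]
    print(delta_ranks(capture, lysate))
```

In this toy input the protein that is rare in the lysate but prominent in the capture receives a positive delta-rank, while the abundant lysate proteins receive negative values.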

Figure 5

Delta-rank histograms obtained from bioinformatic analysis of HyCCAPP MS results for each of the four loci studied. Delta-rank values are plotted along the x-axis, with positive delta-ranks indicating enrichment in the HyCCAPP sample and negative delta-ranks indicating enrichment in the yeast lysate sample. The y-axis values are the numbers of observations of each delta-rank, divided by the total number of delta-rank observations (density). Delta-rank values obtained in gene vs lysate comparisons are shown in red (fit to an area density plot), and delta-rank values obtained from lysate vs lysate comparisons are shown in blue. The vertical dashed black line on each plot indicates the 10% threshold value employed (see text).

We sought to identify a threshold of delta-rank values that would maximize true hits while minimizing false positives, essentially setting a false discovery rate (FDR). To determine the threshold differentiating the most enriched proteins from background proteins, we repeated the delta-rank analysis for 70 pairs of yeast lysate samples, generating a background distribution of delta-ranks (Supplementary Figure 2). This background distribution captures the natural fluctuations in delta-rank between lysate samples, in the absence of any capture enrichment. We then identified delta-rank thresholds for the top 1%, 5%, or 10% of lysate delta-ranks, with associated conservative FDR values (false discovery corresponds here to nonspecifically captured proteins) of <30–50% (see Materials and Methods). The true FDR is likely lower than these values, because of how missing proteins not observed in the lysate are treated. We settled on a threshold (10%) that identified the greatest fraction of known binding proteins in our analysis (vertical lines, Figure 5).
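One way to picture the thresholding step is as a quantile cut on the lysate-versus-lysate background, followed by a comparison of the fractions of background and capture delta-ranks that exceed the cut. The sketch below is an assumption about how such an estimate could be formed, not the procedure from Materials and Methods; the function names and toy values are hypothetical.

```python
def threshold_from_background(background_deltas, top_fraction=0.10):
    """Delta-rank value at or above which the top `top_fraction` of the
    lysate-vs-lysate (background) delta-ranks fall."""
    ordered = sorted(background_deltas)
    n_top = max(1, round(len(ordered) * top_fraction))
    return ordered[len(ordered) - n_top]


def conservative_fdr(capture_deltas, background_deltas, threshold):
    """Fraction of background delta-ranks passing the threshold divided by
    the fraction of capture delta-ranks passing it (capped at 1).

    This ratio is one simple, conservative FDR estimate assumed here for
    illustration.
    """
    capture_fraction = sum(d >= threshold for d in capture_deltas) / len(capture_deltas)
    background_fraction = sum(d >= threshold for d in background_deltas) / len(background_deltas)
    if capture_fraction == 0:
        return float("nan")
    return min(1.0, background_fraction / capture_fraction)


if __name__ == "__main__":
    # Toy distributions: the capture distribution has a heavier right tail
    background = list(range(-50, 50))
    capture = list(range(-40, 60))
    cut = threshold_from_background(background, top_fraction=0.10)
    print(cut, conservative_fdr(capture, background, cut))
```

A heavier right tail in the capture distribution than in the background lowers this ratio, which is the behavior the threshold is meant to exploit.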

Expected Proteins Are Enriched via HyCCAPP

Common Proteins Found at All Loci

The filtered lists of proteins at the chosen threshold for each locus contain from 393 to 610 proteins (see Supporting Information). These lists contain proteins that are unique to each locus, proteins that are shared by some of the four loci, and proteins common to all four loci. There were 117 proteins found at all loci. These proteins would be expected to include both ubiquitous DNA-binding proteins and proteins that bind nonspecifically to chromatin. We evaluated these proteins by GO analysis using the web-based tool FunSpec.[20] GO term enrichment was found for ubiquitous DNA binding functionalities, including DNA helicases (p = 8.1 × 10^–6), proteins modulating DNA topology (p = 9.0 × 10^–4), and proteins localized to the nucleus (p = 6.8 × 10^–3). Other enriched GO terms include spindle pole components (p = 2.3 × 10^–6), COPI vesicle (p = 1.3 × 10^–10), ER to Golgi transport (p = 1.8 × 10^–4), and cell-wall proteins (p = 2.2 × 10^–4). While some of these proteins may indeed correspond to true hits (for example, spindle pole bodies that are likely attached to chromatin through protein–protein interactions[21]), others likely result from nonspecific protein binding. For further analysis of locus-specific binders, we therefore removed from the lists the proteins found at all four loci; while this removes commonly bound DNA binders, it is also likely to remove contaminating proteins that have high affinity for DNA.

Locus-Specific Protein Lists

We sought to compile protein lists specific for each of the four loci, to capture the unique physiology of each region. It is known that some proteins bind at both the telomeric and rDNA regions (e.g., silencing proteins[22]); we therefore included proteins that were found at one or more of these loci (5S rDNA, 25S rDNA, X-element) but excluded those that were either present at all four loci or present at three of the four if one of the three was GAL1-10. For the single-copy GAL1-10 locus we included proteins if they were also identified at no more than one of the multicopy loci. These four locus-specific protein lists are provided in the Supporting Information. We expect these locus-specific protein lists to contain proteins that reflect the function of the gene and largely exclude contaminant proteins. To confirm this, we first combined the locus-specific lists to verify that the majority of proteins are in fact DNA-associated. This analysis revealed a striking enrichment for DNA-binding processes. The six GO terms enriched with a Bonferroni-corrected p-value of less than 10^–4 include DNA binding, transcriptional control, and both nucleus and nucleolus localization (as expected due to the representation of nucleolar-localized rDNA loci, Table 1). The group was also enriched for genes that produce benomyl sensitivity when deleted, as expected for proteins involved in chromosome segregation since microtubule-disrupting benomyl interferes with chromosome dynamics.[23]
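The inclusion and exclusion rules for the locus-specific lists can be written as set operations. The following Python sketch is one reading of those rules, with hypothetical locus keys and placeholder protein names; it is not the code used to generate the lists in the Supporting Information.

```python
SINGLE_COPY = "GAL1-10"  # the remaining keys are treated as multicopy loci


def locus_specific_lists(hits):
    """Apply the filtering rules to per-locus sets of thresholded proteins.

    `hits` maps each of the four loci to the set of proteins passing the
    delta-rank threshold. A protein is dropped everywhere if it is found at
    all four loci, or at GAL1-10 plus two or more multicopy loci.
    """
    all_proteins = set().union(*hits.values())
    specific = {locus: set() for locus in hits}
    for protein in all_proteins:
        found_at = {locus for locus, proteins in hits.items() if protein in proteins}
        n_multicopy = len(found_at - {SINGLE_COPY})
        if SINGLE_COPY in found_at and n_multicopy >= 2:
            continue  # all four loci, or three of four including GAL1-10
        for locus in found_at:
            specific[locus].add(protein)
    return specific


if __name__ == "__main__":
    # Placeholder proteins P1..P5, chosen to exercise each rule
    example = {
        "25S": {"P1", "P2", "P3", "P4"},
        "5S": {"P1", "P2", "P3"},
        "X": {"P1", "P2", "P5"},
        "GAL1-10": {"P1", "P3", "P5"},
    }
    for locus, proteins in locus_specific_lists(example).items():
        print(locus, sorted(proteins))
```

Here P1 (all four loci) and P3 (two multicopy loci plus GAL1-10) are removed, P2 (three multicopy loci) is retained at each of them, and P5 (X-element plus GAL1-10) is retained at both.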

Table 1

GO Term Enrichment of Locus-Specific Proteins^a

GO term	p-value
Nucleus	8.66 × 10^–13
DNA-dependent transcription	3.88 × 10^–7
Transcriptional control	7.12 × 10^–6
Nucleolus	1.30 × 10^–5
DNA conformation modification	1.66 × 10^–5
Benomyl sensitivity	4.54 × 10^–5

^a The locus-specific protein lists obtained for each of the four loci were combined and analyzed for GO enrichment using FunSpec. The results with a Bonferroni-corrected p-value of <10^–4 are shown.
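Enrichment p-values of this kind are commonly computed as the upper tail of a hypergeometric distribution and then Bonferroni-corrected for the number of categories tested. The sketch below illustrates that general calculation under those assumptions; it is not a reimplementation of FunSpec, and the function names and inputs are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(genome_size, category_size, list_size, overlap):
    """Probability of observing at least `overlap` category members in a
    list of `list_size` proteins drawn at random from `genome_size` proteins,
    of which `category_size` belong to the category."""
    upper = min(category_size, list_size)
    total = comb(genome_size, list_size)
    tail = sum(
        comb(category_size, k) * comb(genome_size - category_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Small toy case that can be checked by hand: (36 + 4) / 120 = 1/3
    p = hypergeom_enrichment_p(genome_size=10, category_size=4, list_size=3, overlap=2)
    print(p, bonferroni(p, n_tests=5))
```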

Functional Groups Missing from Capture Experiments

For comparison, we also performed GO term enrichment analysis on the pooled set of proteins from the opposite end of the delta-rank distributions (left side of the delta-rank distributions of Figure 5). As mentioned above, these proteins are likely to correspond to abundant proteins present in the lysate that were not captured in the HyCCAPP experiments. Table 2 shows the over-represented GO terms, which are very different from those obtained for the captured proteins and correspond primarily to cytosolic processes such as enzymatic activities, metabolic processes, and biosynthetic and degradative pathways. The striking differences in GO term enrichment for proteins from the two ends of the distributions provide a strong indication that the HyCCAPP technology is truly isolating DNA–protein complexes representative of normal interactions in vivo.

Table 2

GO Term Enrichment of Yeast Lysate Enriched Proteins^a

GO term	p-value
Oxidoreductase activity	<1 × 10^–14
Catalytic activity	<1 × 10^–14
Oxidation–reduction process	<1 × 10^–14
Cytoplasm	<1 × 10^–14
Mitochondria	<1 × 10^–14
Metabolic process	1.84 × 10^–14
Cellular amino acid biosynthetic process	3.9 × 10^–12
Proteasome complex	4.76 × 10^–12
Electron transport	4.78 × 10^–12
ER	4.5 × 10^–11
Nucleotide binding	1.14 × 10^–9
Protein processing (proteolytic)	1.24 × 10^–9
Proteasome storage granule	5.47 × 10^–9
ATP binding	2.71 × 10^–8
Binding	8.82 × 10^–8
Transferase activity	9.42 × 10^–8
ER membrane	2.33 × 10^–7
Pyridoxal phosphate binding	3.15 × 10^–7

^a The locus-specific protein lists for each of the four loci were partially selected on the basis of a delta-rank threshold of 314. To generate a similar list of proteins that are most enriched in the yeast lysate relative to each of the HyCCAPP samples, proteins with a delta-rank of −314 or below were combined from each of the four HyCCAPP experiments. This list was then analyzed for GO enrichment. There were nearly 40 GO terms enriched with a Bonferroni-corrected p-value of <10^–4. About half of the terms are listed here; they are representative of the total list and exclude redundant GO terms (see Supplementary Table 3 for the complete list).

Validation of HyCCAPP Capture by Chromatin Immunoprecipitation

As discussed in detail below, we chose 17 proteins bound to various loci to validate by traditional chromatin immunoprecipitation (ChIP) and qPCR (Figure 6). We chose proteins for which there was some corroborating evidence that they may represent new binders (see below). Eleven of the 17 proteins (65%) were enriched by ChIP at their respective loci, but not at the control INO1 promoter (with the exception of some chromatin factors; see Supplementary Table 4). These results highlight our ability to identify new binding proteins at these regions, validating the HyCCAPP procedure. Below we discuss each locus specifically in terms of known binders and new proteins implicated as bound to the regions.

Figure 6

ChIP-qPCR confirmation assays for selected proteins identified in HyCCAPP. ChIP-qPCR was performed for 17 TAP-tagged yeast strains expressing proteins found in HyCCAPP experiments at the GAL1-10 promoter (a) and the 25S rDNA locus (b–d). Recovered DNA for immunoprecipitation experiments and no-antibody controls was quantified by qPCR using the primers developed against the GAL10 promoter region (see Figure 4a and Supporting Information) and 25S rDNA region (Supporting Information). The qPCR signal from the ChIP samples is an average of two biological replicates and is depicted as the left (dark gray) bar in each chart, while the qPCR signal from the no antibody control sample is shown as the right (light gray) bar in each chart. For each TAP-tagged strain, the qPCR signal was normalized to input DNA. ChIP enrichment was considered significant if both replicates showed at least 2-fold enrichment compared to the no-antibody control (indicated with *), since the t test has lower power on duplicates (see Supplementary Table 4).
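The enrichment criterion stated in the caption (both replicates at least 2-fold above the no-antibody control) can be expressed as a short check. The sketch below assumes that each signal has already been normalized to input DNA; the function name and example values are hypothetical.

```python
def chip_enriched(ip_signals, no_antibody_signals, min_fold=2.0):
    """Return (call, folds) for paired replicate qPCR signals.

    `ip_signals` and `no_antibody_signals` are input-normalized values for
    the same replicates, in the same order. The call is True only if every
    replicate reaches `min_fold` enrichment over its no-antibody control.
    """
    folds = [ip / control for ip, control in zip(ip_signals, no_antibody_signals)]
    return all(fold >= min_fold for fold in folds), folds


if __name__ == "__main__":
    # Made-up values: the first protein passes in both replicates,
    # the second fails in one replicate
    print(chip_enriched([0.8, 0.6], [0.2, 0.25]))
    print(chip_enriched([0.8, 0.3], [0.2, 0.25]))
```

Requiring both replicates to pass is stricter than thresholding the mean, which guards against a single high replicate driving the call.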

Known Proteins and New Biology Captured by HyCCAPP

X-Element

Chromosome ends in yeast are composed of telomeric and subtelomeric regions, including X-elements and Y′-elements that occur in varying combinations.[24] It is well-known that telomeric repeats are involved in genome stability, affect chromosome maintenance, and play a role in transcriptional silencing.[24] The subtelomeric regions are less well characterized but are also thought to play a role in chromosome stability.[24] The 462 bp X-element core is the only sequence motif found on every chromosome end. This region contains an autonomously replicating sequence (ARS), which is bound by the origin recognition complex (ORC).[1,24] The proximity of the X-element to the telomeric region varies depending upon the presence of one or more adjacent Y′-elements. In HyCCAPP experiments for the X-element, we thus expect to identify proteins not only associated with the X-element but also associated with telomeric repeats and Y′ elements. After applying the bioinformatics filtering and removal of common proteins, there were 232 locus-specific proteins found at the X-element. The list was enriched for both single-stranded and double-stranded telomeric DNA binding proteins (p = 2.1 × 10^–3, 3.6 × 10^–3), nuclear proteins (p = 3.0 × 10^–4) and chromosome-associated proteins (p = 2.0 × 10^–3), all functions expected for this DNA locus based on its physiological role. Included on the list were several expected proteins, including Orc4 that binds to the ARS,[1] Sir4 involved in telomeric silencing,[25] and Cdc13,[26] Stm1,[27] Top1,[28] Dot5,[29] and Ebs1[30] that are all known to bind to the telomeres. We also identified over 20 other proteins that are involved in DNA and chromosome maintenance and nearly 50 other proteins that are known to localize to the nucleus. To garner support for other possible telomere-associated proteins, we compared the X-element locus-specific list to proteins identified in screens for mutants with defects in telomere maintenance. 
Eleven proteins were identified that had previously been shown to affect telomere length when mutated;[31] several of these proteins were not previously known to associate directly with the X-element. We also assessed if novel proteins identified at this locus interact genetically or physically with other captured proteins, under the assumption that true binders on our list may be more likely to interact than random. Indeed, we observed 430 genetic interactions[32] (GIs) among genes encoding the X-element-bound proteins, more than expected by chance (p = 0.002). We also observed 157 protein–protein interactions[32] (PPIs) (p = 0.138), which although not statistically significant still demonstrates that many of the captured proteins are known to be physically associated with each other. Organizing the X-element captured proteins based on these interactions identified specific complexes or subgroups of functionally enriched proteins that further highlight the physiological processes at or near the X-elements (Figure 7). In all, nearly 40% of the proteins captured at the X-element are known to directly associate with telomeres, aid in their maintenance, bind DNA, or reside in the nucleus. Together these results provide further strong validation that HyCCAPP is indeed identifying locus-specific protein binders.

Figure 7

Interaction network and GO Slim enrichment for locus-specific proteins at X-element locus. Among the 232 locus-specific proteins at the X-element, we observed 430 GIs and 157 PPIs, represented by the blue and gray lines, respectively, in the network. The protein interaction network was organized and analyzed for GO-slim enrichment using the GOlorize plugin for Cytoscape 2.4.[62] Seven GO categories were enriched with a Benjamini & Hochberg False Discovery Rate correction of less than 0.05, with proteins in significantly enriched categories colored according to the key. Refined functional enrichment was performed for each protein cluster (encircled) using the program FunSpec.[20]

Given the confidence garnered for X-element captured proteins, we looked for new functions implicated in telomere biology. Of note is the capture of Pfa4, a palmitoyltransferase required for protein secretion and reported to reside in the endoplasmic reticulum (ER) membrane.[33] Pfa4 was identified in a screen for proteins required for heterochromatin silencing[34] and later shown to palmitoylate the telomere binding protein Rif1.[35] Pfa4 is required for Rif1 localization to attachment sites of telomeres to the nuclear membrane (which is contiguous with the ER membrane). The gene encoding Pfa4 has interactions with genes encoding several other X-element captured proteins, including the nuclear envelope protein Mps3 required for telomere organization during cell division.[36] Our results strongly suggest that Pfa4 is physically bound to telomeres, likely at the sites of their attachment to the nuclear membrane (this hypothesis has not yet been tested due to lack of availability of either a TAP-tagged Pfa4 strain or a ChIP-compatible anti-Pfa4 antibody).

GAL1-10

The GAL1-10 promoter region is a single-copy locus in the yeast genome that has been extensively studied.[37] As a single-copy region, its analysis by HyCCAPP is more challenging than the analysis of multicopy loci, as less DNA and thus less associated protein for MS analysis will be captured (see Discussion). This 660 bp intergenic region contains an upstream activating sequence (UAS) that controls expression of the divergent GAL1 and GAL10 genes, as well as short noncoding RNAs that may regulate expression of the flanking genes.[38] After applying the bioinformatics filtering and removal of common proteins, there were 415 locus-specific proteins found at the GAL1-10 locus. Numerous proteins known to be associated with this region were identified as well as many DNA-binding proteins and other DNA-related proteins. These include Rsc3 and Sth1, both members of the RSC complex[39] that is known to bind at the GAL UAS. Additionally, we identified three members of the Kin28 complex (Kin28, Cdc28, and Ccl1), which is also known to bind to the GAL1-10 region.[40] The capture of these proteins is especially interesting, since the downstream open reading frames should be transcriptionally silenced under our growth conditions. Also surprising was the capture of RNA Pol II subunits, which is expected for the locus under inducing conditions. Together, the capture of these proteins may reflect a poised transcription complex or could indicate the expression of noncoding regulatory RNAs in the region.[38] In all, over 40 known DNA-related proteins were present on the GAL1-10 locus-specific list. Interestingly, we also found a significant enrichment of proteins required for normal growth on galactose (including transcriptional regulator Bdf1, Spt20, RSC subunit Sth1, and Zeo1, a protein reported to reside at the plasma membrane, p = 9.1 × 10^–3), supporting their association with the GAL1-10 promoter. 
Using a traditional ChIP-qPCR assay with strains expressing TAP-tagged proteins, we confirmed that three of the four proteins (Bdf1, Sth1, and possibly Zeo1) bind the GAL1-10 locus under these conditions (Figure 6a).

25S and 5S rDNA

The rDNA locus in yeast is a 9 kb tandem repeat that physically localizes to the nucleolus and encodes rRNA along with an autonomously replicating sequence (ARS) near the 5S gene.[41] Roughly half of the rDNA repeats are transcribed at any given moment, while the others are transcriptionally silenced.[42] Each rDNA unit encodes two transcripts: the 35S transcribed by RNA polymerase (Pol) I and the 5S transcribed by RNA Pol III,[43] separated by intergenic spacers that can be conditionally transcribed by Pol II.[44] The 35S transcript is processed into 18S, 5.8S, and 25S fragments that are chemically modified via methylation and pseudouridylation, folded around ribosomal proteins (RPs), and exported to the cytosol where they mature into functional complexes.[42] Errors in the complicated assembly process are monitored by a surveillance system that may be coupled to nuclear export and triggers rapid degradation of misfolded complexes.[42] Proper ribosome assembly requires precise stoichiometry of rRNA and RPs, although how their synthesis is coordinated remains unclear. Both rRNA and RP production are tightly controlled according to the translational demands of the cell and regulated in an unknown manner via the growth-regulating signaling pathways Ras/PKA and TOR.[45] Due to its repetitive nature, the rDNA locus is subject to high levels of DNA breaks and is therefore a hotspot for DNA damage repair.[46] Through the HyCCAPP procedure, we identified 216 locus-specific proteins enriched at the region encoding the 25S rRNA (referred to as the “25S locus”) and 316 proteins at the 5S region, with 78 proteins shared by both loci. Both lists were independently enriched (p < 1 × 10^–3) for proteins involved in DNA/nucleotide binding and proteins localized to the nucleus. 
The 25S list was also weakly enriched for expected functions including chromatin regulators (p = 1 × 10^–3), proteins involved in transcription-coupled repair (p = 4 × 10^–4), nucleolar proteins (p = 1 × 10^–4), and RNA helicases (p = 5 × 10^–3), while the 5S list was enriched for RNA polymerase subunits (p = 8 × 10^–6) and kinases (p = 1 × 10^–4). We found 46 and 37 proteins on the 25S and 5S lists, respectively, that participate in processes expected to occur at these loci (including chromatin regulators, DNA damage-repair proteins, proteins annotated as involved in rRNA synthesis[41] or known to be required for the ribosome biogenesis[42]). We validated four of the DNA damage proteins found at the 25S locus by ChIP-qPCR (Figure 6b). Our pipeline identified an additional 24 and 26 proteins at the 25S and 5S loci, respectively, that were identified in screens for rRNA processing/ribosome biogenesis factors,[47,48] for a total of 39 proteins across the two lists. Both lists were enriched above chance for proteins that display genetic interactions (GIs) (p = 0.002 for the 5S list and p = 0.015 for the 25S list), consistent with the functional relationships of proteins on the lists (Figure 8). In all we identified 111 proteins with support for binding at one or both loci and an additional 156 proteins that interacted genetically or physically with the supported binders, together representing 60% of the proteins enriched at these loci.

Figure 8

Interaction network and GO Slim enrichment for locus-specific proteins at 25S and 5S rDNA. As shown in Figure 7 but for the combined 215 and 315 locus-specific proteins captured at the 25S and/or 5S regions.

We therefore delved deeper into the striking diversity of biological processes captured by HyCCAPP of the rDNA locus (Figure 8). Nearly 50 proteins were involved in chromatin regulation, chromosome segregation/recombination, or DNA damage repair, consistent with the high frequency of DNA breaks and recombination errors at the rDNA repeats. We also captured two replication proteins (Mcm2, Orc5) at the 5S locus, consistent with the known ARS in the region. Both rDNA loci were bound by polymerase subunits, including subunits specific to Pol III (Tfc6 at 5S) and Pol II (at both regions), as well as subunits common to all three polymerases (at both loci). One component of the UAF transcription complex, Rrn5, that works with Pol I to transcribe the 35S transcript[49] was also captured at the 25S region. The extensive processing of the 35S rRNA was long thought to occur post-transcriptionally, but recent evidence suggests that processing can occur on the nascent transcript as transcription is occurring.[50] We found a striking number of rRNA processing enzymes and helicases, several of which we validated by ChIP (Figure 6c). A number of additional proteins identified at the rDNA locus are not characterized in terms of ribosome biogenesis but were identified in a screen for genes required for rRNA processing;[47] our finding that they are localized to the rDNA locus supports a role in rRNA processing. 
In addition to rRNA processing factors, we captured several proteins involved in rRNA methylation (Bmt2 and Rcm1), also in agreement with the recent realization of cotranscriptional rRNA modification.[50,51] Preribosome complexes must be exported to the cytosol for maturation; it has been proposed that export could serve as a surveillance mechanism for proper assembly, yet how this occurs is not known.[52] HyCCAPP identified several proteins involved in nuclear transport, including nuclear pore subunit Nup145,[53] preribosome export chaperones Ltv1[54] and Srp40,[55] and members of the THO (Tho2, Hrp1) and TREX-2 (Thp1, Sac3, Tex2) complexes. The THO/TREX-2 complex is thought to provide surveillance for proper pre-mRNA production, by coupling transcription, processing, and nuclear export.[56] Capture of the THO/TREX-2 complex is especially intriguing, since the complex is not known to monitor rRNA complexes. TREX-2 subunit Sac3 is required for normal ribosome biogenesis,[48] further supporting the role of this complex in rRNA surveillance and/or export. Somewhat surprisingly, HyCCAPP identified at the rDNA loci a number of signaling proteins that are central to growth-rate regulation and RP synthesis. Ifh1, a transcription factor controlling RP expression,[45b,57] was captured at the 25S region and validated by ChIP (Figure 6d), raising the possibility that Ifh1 plays a role in coordinating rRNA transcription and RP synthesis, perhaps via transcriptional regulation or through its known association with processing factors.[58] We identified several upstream regulators of cellular growth, including members of the TOR pathway (including Tor-complex subunits Lst8, Tsc11 and downstream kinase Sch9), the RAS/PKA pathway (RAS guanine-nucleotide exchange factor Cdc25, heterotrimeric G-protein Gpg1, and stress-activated cAMP phosphodiesterase, Pde2), Sak1 (a kinase regulating the AMPK kinase, Snf1[59]), and others. 
We validated the rDNA binding of Pde2, Cdc25, and Sak1 by ChIP (Figure 6d). Binding of these regulators to the rDNA locus is reminiscent of the known binding of Tor1 to this region, which is required for proper rDNA transcriptional regulation in response to nutrient cues.[60] While further dissection will be required, our results suggest that other regulators of yeast growth control reside at the rDNA locus and may coordinate ribosome production with translational requirements.

Discussion

We describe here a new technology, HyCCAPP, for the identification of proteins bound to specific genomic loci in yeast. The approach utilizes in vivo formaldehyde cross-linking to covalently attach DNA-associated proteins to the genomic DNA, chromatin fragmentation and sequence-specific DNA hybridization to capture the region of interest, mass spectrometry to identify the associated proteins, and a rank order-based bioinformatic filtering process to enrich true binding proteins over nonspecific background binding proteins. The technology was demonstrated in the analysis of four genomic regions in yeast: the multicopy 5S rRNA, 25S rRNA, and X-element loci, and the single-copy GAL1-10 promoter locus. In each case a locus-specific pattern of target-associated proteins was revealed, which includes both previously known and previously unknown target-associated proteins. The binding of the previously unknown proteins was confirmed by immunoprecipitation from TAP-tagged yeast strains and qPCR in 11 of 17 cases, an ∼65% rate of validation. The identification of many previously known proteins at each locus provides strong support for the ability of HyCCAPP to correctly identify DNA-associated proteins in a sequence-specific manner, while the discovery of previously unknown proteins provides new biological insights into transcriptional and regulatory processes at the target locus. We present below a brief discussion of several interesting aspects of the HyCCAPP technology.

Proteins Expected but Not Observed

While it is gratifying that many proteins previously identified at the target regions are observed in our analysis, there are also many binding proteins identified in the literature that were not identified in this HyCCAPP analysis. For example, we initially chose the GAL1-10 region for analysis because it is one of the best-characterized promoters of the yeast genome, with many known protein interactors. However, while we did detect Rsc3, Sth1, and Kin28, among other known binders,[39,40] we did not observe promoter binders Gal4, Gal80,[37b] Mig1,[37e] or Reb1.[37a] Interestingly, these known binding proteins were also missed in the recent ChAP-MS study of the same region,[11] which employed an engineered construct of the GAL1-10 promoter that enabled capture using antibody binding. The absence of some known binding proteins in the captured material was also observed in the work of Déjardin and Kingston on telomeric regions.[9] Although further work will be required to understand the basis for such discrepancies, several possibilities are worthy of mention. First, it is important to understand that HyCCAPP is fundamentally limited by the sensitivity of the MS analysis. MS detection limits for different peptides and proteins vary widely, depending upon factors such as their ionization efficiency, solubility, and polarity. If none of the tryptic peptides corresponding to a target protein are present at levels greater than their detection limit, the protein will not be observed. Typical peptide MS detection limits for discovery proteomics in complex mixtures are in the 1–10 fmol (6 × 10⁸ to 6 × 10⁹ copies) range, but in many cases they may lie outside that range. Thus, if for any reason the amount of the peptides from a given protein lies below the detection limit for those peptides, the peptides will not be observed. 
Second, it is possible that the occupancy of the promoter region is low; in other words, if only a fraction of the genomes in the cell culture are bound at the moment of cross-linking, that may cause the MS signal obtained for that protein to be below the detection limit of the analysis. This limitation does not affect standard ChIP protocols, because of the very high levels of PCR amplification they employ, which can reveal even just a few copies of the target DNA region. Third, HyCCAPP employs formaldehyde as a cross-linker. Advantages of formaldehyde for this application include that it is inexpensive, rapidly crosses the cell wall and cell membrane, reacts rapidly, produces reversible cross-links, and has been very widely used and validated in ChIP protocols. However, it is a very short “zero-length” cross-linker, which means that if the formaldehyde-reactive sites on the DNA and protein are not physically close enough to one another, they will not be cross-linked. It is thus possible that the efficiency of cross-linking is extremely low, which is not a problem in ChIP studies due to the high levels of amplification employed but may be too low for HyCCAPP, as the protein that is captured must be directly analyzed without any amplification possible.
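The sensitivity argument above can be made concrete with a back-of-the-envelope calculation. The sketch below converts femtomoles to molecule counts with Avogadro's constant, which reproduces the 6 × 10⁸ to 6 × 10⁹ copies range for 1–10 fmol, and then estimates how many cells would be needed for a captured protein to reach a given detection limit. The model (one protein molecule per occupied locus copy, fractional yields multiplying independently) and the occupancy and cross-linking values in the usage lines are illustrative assumptions; only the ∼1% capture efficiency is taken from the text (it is reported later in this Discussion), and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def fmol_to_copies(femtomoles: float) -> float:
    """Convert an amount in femtomoles to a number of molecules."""
    return femtomoles * 1e-15 * AVOGADRO


def cells_needed(
    detection_limit_fmol: float,
    locus_copies_per_cell: float,
    occupancy: float,
    crosslink_efficiency: float,
    capture_efficiency: float,
) -> float:
    """Cells required so that the captured protein reaches the detection limit.

    Assumes one protein molecule per occupied locus copy and that the
    three fractional yields multiply independently (a simplification).
    """
    yield_per_cell = (
        locus_copies_per_cell * occupancy * crosslink_efficiency * capture_efficiency
    )
    if yield_per_cell <= 0:
        raise ValueError("all yields must be positive")
    return fmol_to_copies(detection_limit_fmol) / yield_per_cell


print(f"1 fmol  = {fmol_to_copies(1):.1e} copies")
print(f"10 fmol = {fmol_to_copies(10):.1e} copies")

# Illustrative scenario: single-copy locus, assumed 50% occupancy and 10%
# cross-linking yield, with the ~1% capture efficiency mentioned in the text.
n = cells_needed(1.0, 1, 0.5, 0.1, 0.01)
print(f"cells needed for 1 fmol: {n:.1e}")
```

In this simple model the required cell number is inversely proportional to each yield, so a tenfold drop in occupancy or cross-linking efficiency raises the amount of starting material needed tenfold. This is the scaling behind the second and third points above, and it also suggests why multicopy loci are easier targets than single-copy ones.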

Single-Stranded DNA Regions Are Accessible in Cross-Linked Chromatin

A central issue in the development of HyCCAPP was the ability to hybridize capture probes effectively to the double-stranded genomic DNA present in chromatin. Déjardin and Kingston employed LNA probes to address this issue, with the idea that the increased thermodynamic stability of LNA:DNA duplexes would enable strand invasion and displacement, allowing hybridization capture.[9] The HyCCAPP procedure described here is an evolution of a strategy we described previously, referred to as GENECAPP.[61] In GENECAPP, which was employed successfully for in vitro studies on a model system, restriction enzyme digestion was used to fragment chromatin at defined sites, followed by exonuclease digestion to produce single-stranded regions suitable for hybridization capture with ordinary capture oligonucleotides.[61] However, control experiments in which no restriction enzyme or exonuclease digestions were employed revealed, to our surprise, that hybridization occurred equally well in their absence. This observation led to the development of HyCCAPP, in which neither restriction digestion nor exonuclease digestion is employed, greatly reducing the expense and complexity of the process. Interestingly, if the same protocol is applied to yeast genomic DNA, rather than chromatin, either with or without cross-linking, no DNA is captured. It is likely that this has bearing upon the observed capture efficiency of ∼1% in HyCCAPP, as only accessible single-stranded regions can be captured. Further exploration of the mechanism underlying this behavior will be necessary to fully understand this aspect of HyCCAPP.

Utility of HyCCAPP in Discovering New Biology

Our initial results here highlight the power of HyCCAPP to discover new biology, through the identification of novel proteins bound to any particular region of a genome. In this sense, the diverse processes that occur at the rDNA locus provide a perfect showcase for the procedure. We captured proteins involved in chromosome functions (including DNA replication, repair, and segregation), transcription (spanning RNA Pol subunits, Mediator components, and site-specific transcription factors), and cotranscriptional processes (such as rRNA processing, modification, export, and surveillance). Several of the novel captured proteins were reported to reside in other subcellular compartments but were identified in screens affecting rRNA biogenesis (or telomere silencing in the case of Pfa4), strongly supporting a real functional connection. Furthermore, the capture and ChIP validation of upstream signaling proteins bound to the rDNA locus underscores the power to formulate new hypotheses and seed future study.

Conclusion and Future Directions

In conclusion, HyCCAPP offers a powerful new approach to the study of locus-specific chromatin-associated proteins in yeast. We are currently in the process of extending the strategy to the analysis of mammalian genomes, which are ∼300-fold more complex than the yeast genome, and implementing a multiplex strategy to permit many loci to be captured in parallel, thereby increasing throughput while reducing cost and labor. As these and other improvements to the technology are made, the capabilities of HyCCAPP will continue to develop, providing an increasingly powerful new tool for the study of genomic processes.
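The ∼300-fold figure can be checked roughly against approximate genome sizes. The sketch below uses round values of about 12.1 Mb for the haploid S. cerevisiae genome and about 3.1 Gb for the haploid human genome; both numbers are assumptions supplied here for illustration (they are not given in the text), human is used only as a representative mammal, and the function name is hypothetical.

```python
YEAST_GENOME_BP = 12.1e6  # assumed haploid S. cerevisiae genome size, in base pairs
HUMAN_GENOME_BP = 3.1e9   # assumed haploid human genome size, in base pairs


def fold_complexity(target_bp: float, reference_bp: float) -> float:
    """Ratio of genome sizes, used here as a crude proxy for complexity."""
    if reference_bp <= 0:
        raise ValueError("reference genome size must be positive")
    return target_bp / reference_bp


fold = fold_complexity(HUMAN_GENOME_BP, YEAST_GENOME_BP)
print(f"human/yeast genome size ratio: about {fold:.0f}-fold")
```

With these assumed sizes the ratio is of the same order as the figure quoted above. For a single-copy locus, a given mass of chromatin then contains correspondingly fewer copies of the target, which is one reason the extension to mammalian genomes is more demanding.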

73 in total

1. Proper metaphase spindle length is determined by centromere proteins Mis12 and Mis6 required for faithful chromosome segregation.

Authors: G Goshima; S Saitoh; M Yanagida
Journal: Genes Dev Date: 1999-07-01 Impact factor: 11.361

2. A genome-wide screen for Saccharomyces cerevisiae deletion mutants that affect telomere length.

Authors: Syed H Askree; Tal Yehuda; Sarit Smolikov; Raya Gurevich; Joshua Hawk; Carrie Coker; Anat Krauskopf; Martin Kupiec; Michael J McEachern
Journal: Proc Natl Acad Sci U S A Date: 2004-05-25 Impact factor: 11.205

Review 3. New clues to understand the role of THO and other functionally related factors in mRNP biogenesis.

Authors: Rosa Luna; Ana G Rondón; Andrés Aguilera
Journal: Biochim Biophys Acta Date: 2011-12-20

Review 4. A structural perspective on the where, how, why, and what of nucleosome positioning.

Authors: Gaurav Arya; Arijit Maitra; Sergei A Grigoryev
Journal: J Biomol Struct Dyn Date: 2010-06

5. GAL1-GAL10 divergent promoter region of Saccharomyces cerevisiae contains negative control elements in addition to functionally separate and possibly overlapping upstream activating sequences.

Authors: R W West; S M Chen; H Putz; G Butler; M Banerjee
Journal: Genes Dev Date: 1987-12 Impact factor: 11.361

6. Multiprotein transcription factor UAF interacts with the upstream element of the yeast RNA polymerase I promoter and forms a stable preinitiation complex.

Authors: D A Keys; B S Lee; J A Dodd; T T Nguyen; L Vu; E Fantino; L M Burson; Y Nogi; M Nomura
Journal: Genes Dev Date: 1996-04-01 Impact factor: 11.361

Review 7. Ribosome biogenesis in the yeast Saccharomyces cerevisiae.

Authors: John L Woolford; Susan J Baserga
Journal: Genetics Date: 2013-11 Impact factor: 4.562

8. Purification of proteins associated with specific genomic Loci.

Authors: Jérôme Déjardin; Robert E Kingston
Journal: Cell Date: 2009-01-09 Impact factor: 41.582

9. The Ku complex in silencing the cryptic mating-type loci of Saccharomyces cerevisiae.

Authors: Erin E Patterson; Catherine A Fox
Journal: Genetics Date: 2008-08-20 Impact factor: 4.562

10. Components of the yeast spindle and spindle pole body.

Authors: M P Rout; J V Kilmartin
Journal: J Cell Biol Date: 1990-11 Impact factor: 10.539
