Dual DNA and protein tagging of open chromatin unveils dynamics of epigenomic landscapes in leukemia.

Jonathan D Lee^{1,2,3,4}, Joao A Paulo^{5}, Ryan R Posey^{6,7,8}, Vera Mugoni^{6,7,8}, Nikki R Kong^{9}, Giulia Cheloni^{6,7,8}, Yu-Ru Lee^{6,7,8,10}, Frank J Slack^{11}, Daniel G Tenen^{9,12}, John G Clohessy^{6,7,8,13}, Steven P Gygi^{5}, Pier Paolo Pandolfi^{14,15,16,17,18}.

Abstract

The architecture of chromatin regulates eukaryotic cell states by controlling transcription factor access to sites of gene regulation. Here we describe a dual transposase-peroxidase approach, integrative DNA and protein tagging (iDAPT), which detects both DNA (iDAPT-seq) and protein (iDAPT-MS) associated with accessible regions of chromatin. In addition to direct identification of bound transcription factors, iDAPT enables the inference of their gene regulatory networks, protein interactors and regulation of chromatin accessibility. We applied iDAPT to profile the epigenomic consequences of granulocytic differentiation of acute promyelocytic leukemia, yielding previously undescribed mechanistic insights. Our findings demonstrate the power of iDAPT as a platform for studying the dynamic epigenomic landscapes and their transcription factor components associated with biological phenomena and disease.

Year: 2021 PMID: 33649590 PMCID: PMC8272231 DOI: 10.1038/s41592-021-01077-8

Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547

Introduction

In the eukaryotic cell, DNA and protein intertwine as chromatin, forming a dynamic landscape composed of genes, their regulatory sequence elements, and the transcription factor complexes modulating gene expression[1-3]. To perform their regulatory activities, transcription factor components require access to these encoded DNA elements, which is otherwise impeded by nucleosomal occupancy or higher-order steric hindrance[4,5]. These regions of open chromatin are continuously remodeled to control access of the transcriptional machinery and to modulate gene expression[4,6]. Thus, profiles of accessible genomic regions and their corresponding proteomes would provide a comprehensive framework for understanding genome-wide transcriptional regulation, especially as it applies to cellular identity or disease. While sequence-based profiling methods of open chromatin, such as DNase hypersensitivity[6,7] and the assay for transposase-accessible chromatin using sequencing (ATAC-seq)[8], have expanded our understanding of the interplay between chromatin states and transcription, the transcription factor components associated with these accessible chromatin regions can only be inferred from these datasets[9]. Specifically, these bioinformatic “footprinting” approaches are limited to sequence-specific transcription factors with long residence times on chromatin, despite known binding of a number of transcription factors with undetectable footprints[9,10]. On the other hand, mass spectrometry-based methods have emerged to directly characterize proteins associated with open chromatin, for example through chromatin fractionation[11-14], yet these approaches neither specify differentially bound genomic loci nor provide insight into their transcriptional regulatory activity.
To bridge these two approaches, we developed an integrative DNA And Protein Tagging (iDAPT) platform, combining biochemical enrichment via a bifunctional transposase/peroxidase probe and bioinformatic analysis of both genomic and proteomic profiles of open chromatin from a single nuclear lysate preparation (Fig. 1a).

Fig. 1.

Transposase/peroxidase fusion probes tag DNA at regions of open chromatin.

(a) Schematic of integrative DNA And Protein Tagging (iDAPT). TP, transposase/peroxidase fusion protein. (b) Integrative Genomics Viewer (IGV) genome track view of ATAC-seq (Nextera Tn5, Tn5-F) and iDAPT-seq (TP3, TP5) libraries at a ubiquitously accessible control region. Libraries were generated from the GM12878 cell line. (c) Scatterplots comparing genome-wide transposon insertion frequencies of Nextera Tn5 (ATAC-seq) with either in-house Tn5-F (ATAC-seq) or the transposase/peroxidase fusion TP3 (iDAPT-seq) in the GM12878 cell line. Pearson correlation coefficients are displayed inline. (d) Distribution of Pearson correlation coefficients between TP3 or Tn5-F ATAC-see and co-immunostaining of markers of active chromatin (RNA Pol II S2P, H3K27Ac) or repressive chromatin (H3K9me3) per nucleus in the HT1080 cell line. Numbers of nuclei assessed per marker are displayed inline, with images obtained from a single experiment. Center line, median value; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, outliers. p-values, two-sided Wilcoxon rank-sum test with Bonferroni correction. (e) Representative images of co-immunofluorescence staining of chromatin state markers with TP3 ATAC-see in the HT1080 cell line. Similar results were visually confirmed for more than ten nuclei for each chromatin marker and are quantified in (d). Scale bars, 5 μm.

Results

Tn5 transposase preferentially tags and fragments (tagments) sterically accessible DNA in native chromatin[8]. Because Tn5 transposase remains physically bound to its DNA substrate after insertion of its transposon payload[15], we hypothesized that Tn5 transposase may also serve as an anchor for proximal labeling of proteins associated with open chromatin. The APEX2 peroxidase represents an attractive choice for iDAPT due to its widespread use as a genetic tag for spatially restricted proteomic enrichment, its short labeling timeframe of one minute, and its previously described peroxidase activity as a purified protein[16,17]. For these reasons, we fused APEX2 with Tn5 transposase for peroxidase-mediated biotin labeling and sequential transposition. We cloned and purified a series of transposase/peroxidase fusion probes consisting of APEX2 peroxidase fused either N- or C-terminal to Tn5 transposase (peroxidase/transposase [PT] and transposase/peroxidase [TP], respectively), adjoined via several linkers (L1-L5) (Extended Data Fig. 1a–b). C-terminal peroxidase (TP1-TP5) fusions yielded ATAC-seq library quantifications similar to commercial (Nextera) Tn5 transposase and in-house purified untagged or FLAG-tagged Tn5 transposases (C-terminal FLAG [Tn5-F] and N-terminal FLAG [F-Tn5]), whereas N-terminal peroxidase (PT1-PT5) fusions exhibited decreased transposase activity (Extended Data Fig. 1c). DNA fragment size analysis of ATAC-seq libraries generated from all TP fusions yielded distributions corresponding to ~200 base pair-wide nucleosomal periods typically observed with open chromatin enrichment[8] (Extended Data Fig. 1d). 
Furthermore, we observed an expected gel shift of linearized DNA in the presence of transposase domain-containing enzymes but not in the presence of FLAG-tagged APEX2 domain alone (APEX2-F)[15], with corresponding DNA fragmentation profiles dependent on both transposase-DNA association and absence of the divalent cation chelator EDTA[18] (Extended Data Fig. 1e–f).

Extended Data Fig. 1

Optimization of transposase/peroxidase fusion probes for transposase activity.

(a) Schematic of recombinant fusion protein linear sequence. PT, peroxidase/transposase; TP, transposase/peroxidase; F, FLAG; L, linker. (b) Sequences of protein linkers tested for fusion protein activity. (c) Quantitative PCR assessment of pre-amplified GM12878 ATAC-seq libraries generated with the corresponding enzymes (n = 1 independent experiment). (d) TapeStation DNA HS 5000 assessment of fragment size distributions of GM12878 ATAC-seq libraries. Nucleosomal fragmentation is marked inline. (e and f) Gel shift assay (e) and DNA fragment distributions (f) of tagmentation reactions of linearized pSMART plasmid with the corresponding enzymes. Gel shift and DNA fragments were measured on a 1% agarose gel. Images are representative of two independent experiments. MEDS, Mosaic End double-stranded transposon.

Next, we generated ATAC-seq/iDAPT-seq libraries from GM12878 cells with Nextera Tn5, in-house purified Tn5-F, and the representative fusion probes TP3 and TP5, using the recently developed OmniATAC protocol, which improves signal-to-noise ratios, decreases mitochondrial read proportions, and increases assay reproducibility compared to the original ATAC-seq protocol[19]. Here we distinguish iDAPT-seq from ATAC-seq by the use of TP fusion enzymes for tagmentation, which allows subsequent proteomic labeling and enrichment (Fig. 1a). ATAC-seq and iDAPT-seq libraries exhibited similar nucleosomal periodicities in their fragment size distributions, high signal-to-noise ratios, and broad decreases in mitochondrial read proportions relative to published GM12878 ATAC-seq libraries generated via the original ATAC-seq protocol[8,18-20] (Extended Data Fig. 2a–c). Furthermore, TP3 and TP5 iDAPT-seq libraries exhibit high correlations with Tn5 transposase-generated ATAC-seq libraries (Fig. 1b–c, Extended Data Fig. 2d). Thus, TP3 and TP5 fusion enzymes yield high-quality iDAPT-seq libraries, akin to ATAC-seq libraries generated with Tn5 transposase lacking a peroxidase domain.
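The library-to-library comparison described above rests on Pearson correlation of genome-wide insertion frequencies. The following is a minimal sketch of that calculation, not the authors' pipeline: the counts are made up, and the counts-per-million scaling with a log2 transform and pseudocount is an assumed normalization.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def log_cpm(counts, pseudocount=1.0):
    """Scale to counts per million, then log2-transform (assumed normalization)."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]

# Hypothetical insertion counts per genomic window for two libraries.
tn5_counts = [120, 15, 300, 8, 95, 40]
tp3_counts = [110, 20, 280, 10, 100, 35]

r = pearson(log_cpm(tn5_counts), log_cpm(tp3_counts))
print(f"Pearson r = {r:.3f}")
```

Scaling each library by its own total before correlating keeps differences in sequencing depth from dominating the comparison.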

Extended Data Fig. 2

Assessment of transposase activity on native chromatin.

(a) Fragment size distributions of GM12878 ATAC-seq/iDAPT-seq libraries. (b) Ratio of transposon insertions at Ensembl v94 transcription start sites (TSS) relative to background from in-house ATAC-seq/iDAPT-seq and published ATAC-seq libraries from refs. [8,18–20] generated from the GM12878 cell line (n = 1). (c) Proportion of non-mitochondrial reads from GM12878 ATAC-seq/iDAPT-seq libraries. (d) Heatmap of pairwise Pearson correlation coefficients of genome-wide transposon insertion frequencies for the indicated GM12878 ATAC-seq/iDAPT-seq libraries.
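Panel (b) reports transposon insertions at transcription start sites relative to background. A minimal sketch of one way such a ratio could be computed from an aggregate insertion profile is given here; the function name, the window sizes, and the choice of the outer flanks as background are assumptions for illustration and are not taken from the paper's methods.

```python
def tss_enrichment(profile, flank=100, center=50):
    """Ratio of mean insertions near the TSS to mean insertions in the outer flanks.

    profile: list of aggregate insertion counts at each position of a window
             centered on the TSS, summed over all TSSs.
    flank:   number of positions at each edge treated as background (assumed).
    center:  half-width of the central window around the TSS (assumed).
    """
    n = len(profile)
    if n < 2 * flank + 2 * center:
        raise ValueError("profile too short for the chosen windows")
    mid = n // 2
    background = profile[:flank] + profile[-flank:]
    bg_mean = sum(background) / len(background)
    if bg_mean == 0:
        raise ValueError("background has no insertions")
    core = profile[mid - center: mid + center]
    return (sum(core) / len(core)) / bg_mean

# Hypothetical profile: 2 insertions per position in the flanks, 20 near the TSS.
profile = [2] * 150 + [20] * 100 + [2] * 150
print(tss_enrichment(profile))  # 10.0
```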

As further assessment of TP localization to open chromatin, we performed ATAC-see, an assay of in situ transposase activity and localization[18], with co-immunofluorescence of various markers of chromatin state. TP3 and Tn5-F exhibit similarly positive correlations with histone H3 lysine 27 acetylation (H3K27Ac) and RNA polymerase II serine-2 phosphorylation (RNAPII S2P) immunofluorescence signals, markers of transcriptionally active chromatin, and similarly poor correlations with H3 lysine 9 trimethylation (H3K9me3) immunofluorescence, a marker of transcriptionally inactive chromatin, albeit with slight differences in colocalization patterns between the two probes (Fig. 1d–e). These data indicate that our TP fusion probes retain native Tn5 transposase activity and preferentially tag open chromatin.

Having confirmed TP fusion tagging of and localization to open chromatin, we next assessed APEX2 peroxidase functionality when fused with Tn5 transposase. First, we added 1 mM hydrogen peroxide to purified proteins alone and detected peroxidase activity from the fusion proteins via resorufin fluorescence after one minute (Supplementary Fig. 1a–b). Interestingly, all TP fusions exhibit higher peroxidase activities than APEX2-F alone, possibly due to increased thermal stability or heme binding resulting from APEX2 dimer formation induced by the proximity of the two C-termini of dimeric Tn5 transposase[16,21-23] (Supplementary Fig. 1c). Next, in extracted HEK293T nuclei, we observed strong peroxidase-dependent biotin signal in the presence of the TP3 fusion probe and low signal in the presence of the negative control probes Tn5-F and APEX2-F (Supplementary Fig. 2). Residual APEX2-F-mediated signal further decreased with additional washing and blocking steps, while TP3-mediated biotin signal remained strong (Supplementary Fig. 2).
In line with our hypothesis that Tn5 transposase remains physically bound to native chromatin, Tn5 transposase and TP3 fusion enzyme are found in the nuclear lysate, whereas APEX2 is mostly lost despite equimolar addition of recombinant protein to the tagmentation buffer (Supplementary Fig. 1a, 2b–c). Indeed, we found all TP fusion enzymes to promote strong biotin labeling in K562 nuclei, with TP5 and TP3 enzymes exhibiting the highest levels of labeling (Extended Data Fig. 3a). Finally, we confirmed that this labeling is dependent on the presence of both hydrogen peroxide and biotin-phenol (Extended Data Fig. 3b). Thus, our findings indicate that TP probes label transposase-accessible chromatin in a peroxidase-dependent manner.

Extended Data Fig. 3

Assessment of iDAPT protein labeling in the K562 cell line.

(a) Western blot of labeled nuclear lysates with negative (Tn5-F, APEX2-F) and fusion (TP1–5) probes. Images are representative of two independent experiments. Ratios, relative total streptavidin intensities normalized by corresponding PCNA intensities. (b) Western blot of labeled nuclear lysates with either single enzymatic domains (T, Tn5-F; A, APEX2-F) or the TP3 fusion probe with or without either biotin-phenol or hydrogen peroxide (H2O2). Images are representative of two independent experiments. Ratios, relative total streptavidin intensities normalized by corresponding PCNA intensities. (c) Heatmap of pairwise Pearson correlation coefficients of K562 iDAPT-MS profiles for the indicated probes. (d) Venn diagram of significant proteins (log2 fold change > 0 and false discovery rate < 5%) identified by TP5 or TP3 versus negative control probes by iDAPT-MS.

With our optimized iDAPT protocol, we performed quantitative mass spectrometry on the iDAPT-enriched proteome (iDAPT-MS) from K562 nuclei[24] (Fig. 2a, Supplementary Table 1). As negative control probes enrich for nonspecific background signal, akin to an IgG negative control for an immunoprecipitation assay, we interpreted the substantial proteomic content enriched by TP over negative control probes as bona fide proteins proximal to Tn5 transposase localization in isolated nuclei (Fig. 2b). By hierarchical clustering and correlation analyses, nuclear lysates labeled via TP3 and TP5 segregate from lysates labeled via single enzymatic domains, with substantial overlap between TP3- and TP5-enriched proteomes (Extended Data Fig. 3c–d). We observed a similarly substantial iDAPT-MS enrichment pattern from TP3 versus negative control probes from the NB4 cell line, incorporating an additional wash step to block endogenous peroxidase activity prior to tagmentation and biotin labeling (Extended Data Fig. 4, Supplementary Table 2).
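The enrichment criterion used for these comparisons (log2 fold change > 0 and false discovery rate < 5%) can be illustrated with a short sketch. It assumes Benjamini-Hochberg adjustment of per-protein p-values. The protein names and statistics are placeholders, and the statistical modeling that produces the p-values is not shown.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def enriched(proteins, fdr_cutoff=0.05):
    """Select proteins with log2 fold change > 0 and FDR below the cutoff."""
    fdr = benjamini_hochberg([p["pvalue"] for p in proteins])
    return [p["name"] for p, q in zip(proteins, fdr)
            if p["lfc"] > 0 and q < fdr_cutoff]

# Placeholder fusion-versus-control statistics; values are invented.
proteins = [
    {"name": "P1", "lfc": 2.1, "pvalue": 0.0004},
    {"name": "P2", "lfc": 1.3, "pvalue": 0.012},
    {"name": "P3", "lfc": -0.8, "pvalue": 0.001},
    {"name": "P4", "lfc": 0.2, "pvalue": 0.40},
]
print(enriched(proteins))  # ['P1', 'P2']
```

P3 passes the FDR cutoff but is excluded because its fold change is negative, which is why the criterion combines both conditions.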

Fig. 2.

iDAPT-MS reveals the open chromatin-associated proteome.

(a) Schematic of iDAPT-MS experimental design and SL-TMT sample labeling for K562 profiling. (b) Volcano plot of proteins enriched by fusion (TP3 and TP5) versus negative control (Tn5-F and APEX2-F) probes in K562 nuclei. Blue points, log2 fold change > 0 and false discovery rate (FDR) < 5%; red points, CisBP sequence-specific transcription factors; black points, points with corresponding gene symbol labels. (c) IGV genome track view of iDAPT-seq (TP3) libraries generated from either intact nuclei or genomic DNA from K562 cells and CUT&RUN libraries from K562 nuclei using ERH, WBP11, or normal rabbit IgG antibodies. (d) Representative images of co-immunofluorescence staining of the SC35 nuclear speckle marker with Tn5-F ATAC-see in the HT1080 cell line. Similar results were visually confirmed for more than ten nuclei for each chromatin marker and are quantified in Extended Data Fig. 6c. Scale bars, 5 μm. (e and f) Mediator (e) and BAF (f) CORUM complex enrichment by iDAPT-MS with fusion probes in both K562 and NB4 cell lines. NES (normalized enrichment score) and p-value, gene set enrichment analysis. Legend, individual protein-level iDAPT-MS enrichment. (g) MAX BioGrid first-order protein interaction network enrichment by iDAPT-MS with fusion probes in the K562 cell line. NES (normalized enrichment score) and p-value, gene set enrichment analysis. Legend, individual protein-level iDAPT-MS enrichment. (h) Distribution of Jaccard indices between MAX ChIP-seq peaks and ChIP-seq peaks of first-order protein interactors within regions of open chromatin in the K562 cell line. MAX ChIP 1, ENCFF618VMC. MAX ChIP 2, ENCFF900NVQ. BG, background ChIP-seq epitopes, collated from ENCODE K562 ChIP-seq datasets of proteins not annotated to interact with MAX by BioGrid. Center line, median value; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; black points, outliers. Red point, replicate MAX ChIP-seq epitope. p-values, two-sided Wilcoxon rank-sum test. 
n, number of represented ChIP-seq epitopes.

Extended Data Fig. 4

Assessment of iDAPT protein labeling in the NB4 cell line.

(a) Western blot of labeled nuclear lysates with Tn5-F or TP3 probes and with or without pre-transposition blocking of endogenous peroxidase activity with 0.1% sodium azide and 0.03% hydrogen peroxide. Images are of a single experiment. Ratios, relative total streptavidin intensities normalized by corresponding PCNA intensities. (b) Schematic of iDAPT-MS experimental design and SL-TMT sample labeling for NB4 cell line profiling. (c) Volcano plot of proteins enriched by fusion (TP3) versus negative control (Tn5-F and APEX2-F) probes in NB4 nuclei. Blue points, log2 fold change > 0 and false discovery rate (FDR) < 5%; red points, CisBP sequence-specific transcription factors; black points, points with corresponding gene symbol labels. (d) Heatmap of pairwise Pearson correlation coefficients of NB4 iDAPT-MS profiles for the indicated probes and treatment conditions.

To validate highly enriched proteins by iDAPT-MS, we performed CUT&RUN (ERH and WBP11) and analyzed published ENCODE ChIP-seq datasets from the K562 cell line[25,26] (Supplementary Table 3). We found substantial enrichment of protein binding at sites of open chromatin (Fig. 2c, Extended Data Fig. 5). These results demonstrate the ability of iDAPT-MS to discover proteins associated with open chromatin.

Extended Data Fig. 5

Analysis of open chromatin protein localization by ChIP-seq and CUT&RUN.

(a) Scatterplot of protein enrichment profiles by iDAPT-MS from both K562 and NB4 cell lines. (b and c) CUT&RUN (top) and immunoprecipitation (bottom) enrichment of ERH (b) and WBP11 (c) in K562 cells relative to normal rabbit IgG antibody. Western blotting images are of a single experiment. Red lines, CUT&RUN enrichment of target epitopes across K562 iDAPT-seq peaks. Black lines, CUT&RUN enrichment of normal rabbit IgG antibody across K562 iDAPT-seq peaks. Solid and dashed lines, duplicate CUT&RUN analyses. (d) Distribution of CUT&RUN peaks overlapping K562 iDAPT-seq peaks. CUT&RUN peaks were determined using a 1% false discovery rate cutoff from MACS2. (e) Number of iDAPT-seq peaks overlapping ChIP-seq peaks in K562 cells. Listed proteins are profiled in K562 cells by the ENCODE consortium (Supplementary Table 3) and are enriched by K562 iDAPT-MS (5% FDR).

Next, we performed enrichment analyses of our iDAPT-MS datasets. Subcellular enrichment analysis identified nuclear speckles and nucleoplasm in both K562 and NB4 iDAPT-MS datasets[27] (Extended Data Fig. 6a–b). Indeed, ATAC-see signal of Tn5-F colocalizes with the nuclear speckle marker SC35 in multiple cell lines, in agreement with recent reports of nuclear speckle localization at active promoters[28,29] (Fig. 2d, Extended Data Fig. 6c–e). We further identified significant enrichment of protein complexes such as Mediator, which regulates communication from enhancer- and promoter-bound transcription factors to RNA polymerase II[30], and BAF, which remodels chromatin accessibility[31], in both K562 and NB4 cell lines[32] (Fig. 2e–f). Chromatin remodelers and RNA-binding proteins were highly represented (>50% of annotated proteins) among enriched proteins, whereas transcription factors and histone variants were not as well represented (<25% of annotated proteins) (Extended Data Fig. 6f). While histone protein H2AX/H2AFX was highly enriched in both NB4 and K562 iDAPT-MS proteomes, other detected histone proteins were weakly enriched over negative control probes or not detected, suggesting that histone proteins as a class are not predominantly enriched by iDAPT-MS (Fig. 2b, Extended Data Fig. 4c, 6f–g).

Extended Data Fig. 6

Analysis of subcellular enrichment by iDAPT-MS.

(a and b) Subcellular enrichment of K562 (a) and NB4 (b) iDAPT-MS profiles, using annotations from the Human Protein Atlas. NES (normalized enrichment score) and FDR (false discovery rate), gene set enrichment analysis. (c) Distribution of Pearson correlation coefficients between Tn5-F ATAC-see and co-immunostaining of the SC35 nuclear speckle marker or chromatin state markers (RNA Pol II S2P, H3K27Ac) per nucleus in three cancer cell lines. Numbers of nuclei assessed per marker are displayed inline, with images drawn from two independent experiments. Center line, median value; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, outliers. (d and e) Representative images of co-immunofluorescence staining of the SC35 nuclear speckle marker with Tn5-F ATAC-see in the MDA-MB-231 (d) and the DU145 (e) cancer cell lines. Similar results were visually confirmed for more than ten nuclei for each cell line and are quantified in (c). Scale bars, 5 μm. (f) Proportion of annotated proteins detected and significantly enriched (log2 fold change > 0 and FDR < 0.05) by iDAPT-MS for the given protein families. n, total number of proteins annotated in each protein family. (g) Distribution of iDAPT-MS log2 fold changes of detected histone and non-histone proteins. Center line, median value; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; black points, outliers. n, number of quantified proteins by iDAPT-MS per group. p-value, two-sided Wilcoxon rank-sum test with Bonferroni correction.

Despite low background peroxidase signal, APEX2-F yields some proteomic enrichment over Tn5-F, although not as strongly as signal generated by TP3/TP5 (Supplementary Fig. 3a–f). To assess whether APEX2-F has a different labeling propensity over TP3/TP5 fusion probes in K562 nuclei, we used quantile normalization as a proxy for normalizing APEX2-F peroxidase activity with TP3 and TP5 activities (Supplementary Fig. 3g). We found this quantile normalization scheme to yield similar subcellular enrichment patterns, albeit with increased mitochondrial enrichment, as with our primary streptavidin/trypsin peptide normalization scheme (Extended Data Fig. 6a, Supplementary Fig. 3h). Taken together, these data suggest that TP fusion proteins exhibit different labeling patterns from diffusely nuclear APEX2. Next, we compared iDAPT-MS enrichment relative to other techniques used to assess protein abundance on chromatin. First, we collated sets of detected proteins from K562 RNA-seq (protein-coding transcripts)[25], whole cell proteome[33], and nuclear proteome[34] datasets and then assessed the proportions of proteins detected across subcellular compartments in each of these datasets to normalize for proteome complexity. While we observed mild subcellular enrichment differences between RNA-seq and whole cell proteome datasets, we found increased enrichment of nucleoli, nucleoplasm, and nucleus localization terms from iDAPT-MS and nuclear proteome datasets (Supplementary Fig. 4a–b). The K562 iDAPT-MS-enriched proteome exhibits increased enrichment of nuclear speckles, nucleoplasm, and nuclear body localization terms and decreased cytosolic, plasma membrane, and Golgi apparatus localization terms over the nuclear proteome (Supplementary Fig. 4b). 
Second, we assessed how iDAPT-MS enrichment compares with incremental salt extractions from K562 nuclei, which partition euchromatic and heterochromatic proteins by disrupting electrostatic protein-protein and protein-DNA interactions[34] (Supplementary Fig. 4c–d). After converting protein sets to subcellular enrichment scores and performing principal component analysis, we found that K562 iDAPT-MS coincides with proteins identified by both isotonic and 250 mM salt extractions along the first principal component, largely representing euchromatic proteins. Third, we compared iDAPT-MS enrichment with additional published salt extraction- and micrococcal nuclease (MNase) fragmentation-based chromatin proteomic datasets in a similar manner[12-14] (Supplementary Fig. 4e–f). Indeed, iDAPT-MS enrichment corresponds with chromatin proteomes enriched by light MNase digestion and salt extraction along the first principal component. Together, these findings demonstrate that iDAPT-MS enriches for the open chromatin proteome.

An advantage of iDAPT-MS over ATAC-seq/iDAPT-seq or chromatin immunoprecipitation (ChIP)-based approaches is its ability to capture numerous transcription co-factors associated with open chromatin in a single assay. As proof of principle, we found the MAX protein interaction network to be significantly enriched on open chromatin by K562 iDAPT-MS[35] (Fig. 2g). In support of this finding, ChIP-seq analysis suggests that protein interactors of MAX colocalize more tightly with MAX across the open chromatin landscape than do non-interacting proteins (Fig. 2h, Supplementary Table 3). Therefore, iDAPT-MS together with protein interaction annotations facilitates the identification of active transcription factor protein complexes on open chromatin, expanding the inference of cis-regulatory transcription factor networks.
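The colocalization comparison for MAX and its interactors is summarized with Jaccard indices (Fig. 2h). As a minimal sketch, each ChIP-seq dataset can be reduced to the set of open-chromatin peaks it overlaps, and the index computed on those sets. The peak identifiers below are hypothetical, and the overlap step that would produce them from genomic intervals is not shown.

```python
def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Hypothetical IDs of open-chromatin peaks overlapped by each ChIP-seq dataset.
max_peaks = {1, 2, 3, 4, 5, 6}
interactor_peaks = {2, 3, 4, 5, 7}
background_peaks = {6, 8, 9, 10}

print(jaccard(max_peaks, interactor_peaks))  # 4 shared / 7 total
print(jaccard(max_peaks, background_peaks))  # 1 shared / 9 total
```

A higher index for annotated interactors than for background epitopes is the pattern the text describes.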
Transcription factors regulate gene expression by binding to DNA in a sequence-specific manner and recruiting transcriptional activators and/or repressors to their target genes. Most transcription factors are found within regions of open chromatin, a pattern we also observed in our iDAPT-MS data[3,6,36] (Fig. 3a, Extended Data Fig. 7a). As iDAPT enables profiling of both genomic and proteomic content of the open chromatin landscape, we sought to compare transcription factor enrichment profiles obtained from iDAPT-MS and iDAPT-seq approaches. To assess the enrichment of transcription factors obtained via iDAPT-seq, we profiled both nuclei and “naked” genomic DNA from both K562 and NB4 cell lines. iDAPT-seq analysis confirms loss of both nucleosomal enrichment and promoter insertion preference in naked DNA; furthermore, insertion profiles segregate along the first principal component and exhibit skewed statistical significance towards chromatinized peaks in both datasets (Extended Data Fig. 7b–h).

Fig. 3.

Integrative analysis of iDAPT-MS and iDAPT-seq classifies transcription factor activities on open chromatin at steady state.

(a) Enrichment of CisBP sequence-specific transcription factors via K562 iDAPT-MS. Normalized enrichment score (NES) and p-value, gene set enrichment analysis. (b) Schematic of bivariate footprinting analysis of iDAPT-seq data. FPD, footprint depth. FA, flanking accessibility. (c) Bivariate footprinting analysis of native chromatin versus naked genomic DNA from the K562 cell line. Red, class A transcription factors; blue, class B transcription factors; gray, class C transcription factors. (d-f) K562 genome-wide footprint of CTCF (d, class A), RELA/p65 (e, class B), and IKZF1 (f, class C) from native chromatin (red) and naked DNA (black). iDAPT-MS LFC, log2 fold change; FDR, limma false discovery rate. ChIP-seq NES, normalized enrichment score; p, gene set enrichment analysis p-value. (g) Comparison of CisBP sequence-specific transcription factors enriched by iDAPT-MS versus iDAPT-seq footprinting analysis in the K562 cell line.

Extended Data Fig. 7

Assessment of TP3 iDAPT-seq from native chromatin versus naked genomic DNA.

(a) Enrichment of CisBP sequence-specific transcription factors via NB4 iDAPT-MS. Normalized enrichment score (NES) and p-value, gene set enrichment analysis. (b) Fragment size distributions of iDAPT-seq libraries generated from K562 and NB4 native chromatin and naked genomic DNA. (c and d) Ratio of transposon insertions at Ensembl v94 transcription start sites (TSS) relative to background from K562 (c) and NB4 (d) iDAPT-seq datasets. (e and f) Principal component analysis of genome-wide transposon insertion frequencies from K562 (e) and NB4 (f) iDAPT-seq libraries. (g and h) Volcano plot of K562 (g) and NB4 (h) iDAPT-seq profiles analyzed with DESeq2. Peak statistics are listed below. FDR, false discovery rate; LFC, log2 fold change.

With these iDAPT-seq profiles, we performed footprinting analysis to infer transcription factor activities at their cognate motifs. By a genome-wide bivariate footprinting approach, accounting for both transcription factor footprint depth (FPD) and flanking chromatin accessibility (FA) near the transcription factor motif, we observed significant enrichment of most CisBP transcription factor motifs in iDAPT-seq profiles from native chromatin[10,36] (Fig. 3b–c, Extended Data Fig. 8a–c). We categorized motifs emerging from our footprint analysis into three classes: strong footprinting (class A), weak footprinting (class B), and no or negative footprinting (class C) (Extended Data Fig. 8d). In line with previous reports, transcription factors with longer residence times on chromatin exhibit stronger footprints: for instance, CTCF, an insulator protein with a long retention time on DNA, exhibits a strong footprint (class A) and is detected by both iDAPT-MS and ChIP-seq[9,37] (Fig. 3d). RELA/NF-κB complexes (class B) have short DNA residence times and substantially weaker footprinting potential, despite being detected by both iDAPT-MS and ChIP-seq[38] (Fig. 3e). While class C motifs such as IKZF1 exhibit nonsignificant or even significantly negative footprinting activity, several of these transcription factors are nonetheless found on open chromatin by both iDAPT-MS and ChIP-seq (Fig. 3f–g, Extended Data Fig. 8e). Broadly, we observed no clear relationship between inferred transcription factor footprint activity by iDAPT-seq and magnitude of transcription factor abundance by iDAPT-MS (Fig. 3g, Extended Data Fig. 8f). Indeed, ChIP-seq and iDAPT-MS both directly identify transcription factors spanning all three classes of footprint activities (Extended Data Fig. 8e, Supplementary Table 3), yet neither assay alone can inform how transcription factor binding might affect chromatin accessibility. 
Conversely, footprinting analysis of iDAPT-seq can detect changes in chromatin accessibility, but these changes may be independent of whether a transcription factor is bound. Thus, we posit that, for the analysis of transcription factors with annotated motifs, iDAPT-seq and iDAPT-MS together identify transcription factors bound to open chromatin and reveal their activity on chromatin accessibility as a consequence of their abundance, providing greater insight into transcription factor mechanisms than either assay alone.
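The two quantities behind the bivariate footprinting analysis, footprint depth (FPD) and flanking accessibility (FA), can be sketched on an aggregate insertion profile centered on a motif. The definitions below are one plausible formulation chosen for illustration, and the paper's methods may define them differently. The window sizes, function names, and profile are likewise assumptions. Under this convention, a larger positive FPD means a deeper footprint.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def footprint_stats(profile, core_half=10, flank_width=50, bg_width=100, pseudo=1e-9):
    """Return (FPD, FA) for an aggregate insertion profile centered on a motif.

    Assumed definitions (illustrative, not taken from the paper):
      FPD = log2(mean flank insertions / mean motif-core insertions)
      FA  = log2(mean flank insertions / mean distal background insertions)
    """
    n = len(profile)
    if n < 2 * (core_half + flank_width + bg_width):
        raise ValueError("profile too short for the chosen windows")
    mid = n // 2
    core = profile[mid - core_half: mid + core_half]
    flank = (profile[mid - core_half - flank_width: mid - core_half]
             + profile[mid + core_half: mid + core_half + flank_width])
    background = profile[:bg_width] + profile[-bg_width:]
    flank_mean = _mean(flank) + pseudo
    fpd = math.log2(flank_mean / (_mean(core) + pseudo))
    fa = math.log2(flank_mean / (_mean(background) + pseudo))
    return fpd, fa

# Hypothetical profile: low background, accessible flanks, protected motif core.
profile = [1] * 100 + [4] * 50 + [2] * 20 + [4] * 50 + [1] * 100
fpd, fa = footprint_stats(profile)
print(f"FPD = {fpd:.2f}, FA = {fa:.2f}")  # FPD = 1.00, FA = 2.00
```

Computing both values for native chromatin and for naked genomic DNA, then comparing them, would mirror the chromatin-versus-naked-DNA contrast described in the text.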

Extended Data Fig. 8

Classification of transcription factors by footprinting activity.

(a and b) Classification scheme of transcription factor motifs by composite footprinting score from K562 (a) or NB4 (b) iDAPT-seq datasets. Separation of class A and B motifs was determined by a two-state Gaussian mixture model; separation of class B and C motifs was demarcated by either a false discovery rate > 5% or footprinting score < 0. (c) Bivariate footprinting analysis of native chromatin versus naked genomic DNA from the NB4 cell line. Red, class A transcription factors; blue, class B transcription factors; gray, class C transcription factors. (d) Tabulation of transcription factor footprinting classifications for those transcription factors significantly enriched by both K562 and NB4 iDAPT-MS. (e) Number of significant CisBP transcription factors in each footprinting class as determined by iDAPT-MS or ENCODE ChIP-seq, with corresponding numbers of associated transcription factor motifs per class as determined by iDAPT-seq. (f) Comparison of CisBP sequence-specific transcription factors enriched by fusion probe iDAPT-MS versus iDAPT-seq footprinting analysis in the NB4 cell line.
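The class assignment rule in panels (a) and (b) can be written as a short decision function. This is a sketch only: the boundary between classes A and B is passed in as a fixed number (`ab_boundary`, a hypothetical parameter) that stands in for the split a two-state Gaussian mixture model would produce, and the mixture model itself is not fitted here.

```python
def classify_motif(score, fdr, ab_boundary, fdr_cutoff=0.05):
    """Assign a footprinting class from a composite score and its FDR.

    Class C: FDR above the cutoff or a negative footprinting score.
    Class A: significant, with a score at or above the A/B boundary.
    Class B: significant, with a non-negative score below the boundary.
    ab_boundary is a stand-in for the Gaussian mixture model split (assumed).
    """
    if fdr > fdr_cutoff or score < 0:
        return "C"
    return "A" if score >= ab_boundary else "B"

# Hypothetical motifs as (composite score, FDR) pairs with an assumed boundary of 2.0.
examples = [(3.0, 0.01), (1.0, 0.01), (1.0, 0.20), (-0.5, 0.001)]
print([classify_motif(s, q, ab_boundary=2.0) for s, q in examples])
# ['A', 'B', 'C', 'C']
```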

Next, we assessed how transcription factor abundances and chromatin accessibility states correlate upon granulocytic differentiation of the NB4 acute promyelocytic leukemia (APL) cell line. Differentiation of NB4 cells via all-trans retinoic acid (ATRA) leads to degradation of the PML-RARA oncogenic fusion protein, decreased proliferation, and granulocytic differentiation of the leukemia[39] (Fig. 4a–b, Extended Data Fig. 9a–c). iDAPT-MS reveals a dramatic shift in the open chromatin proteome, with profiles clustering by treatment (Extended Data Fig. 4b, d). In line with previous reports, we observed negative enrichment of RARA, degraded upon ATRA treatment[40,41], and positive enrichment of PU.1/SPI1, CEBPB, and CEBPE, upregulated in response to ATRA[42-44] (Extended Data Fig. 9d). Pathway enrichment analysis reveals positive associations with MAPK signaling, neutrophil differentiation, and the innate immune response (Extended Data Fig. 9e). On the other hand, loss of histone deacetylase enrichment, the most significantly negative pathway, may explain the previously described decrease in histone acetylation states and sensitivity to histone deacetylase inhibitors in APL[45,46]. These observations validate the ability of iDAPT-MS to capture both specific proteins and proteomic signatures as they dynamically shift upon changes in cell identity.

Fig. 4.

iDAPT profiling of the NB4 acute promyelocytic leukemia cell line upon all-trans retinoic acid (ATRA) treatment reveals dynamics of transcription factor activity.

(a) Schematic of the consequences of PML-RARA fusion oncogene on hematopoiesis and relief of its differentiation blockade by ATRA treatment. (b) Representative flow cytometry plots of NB4 cells treated with or without ATRA after 48 hrs. (c) Comparison of CisBP sequence-specific transcription factor enrichment by TP3 iDAPT-MS (log2 fold change) versus iDAPT-seq footprinting analysis (composite footprinting score) in the NB4 cell line upon treatment with either ATRA or DMSO. Roman numerals, transcription factor classification as described in Extended Data Fig. 10a. (d-g) PU.1/SPI1 and BCL11A BioGrid first-order protein interaction networks (d and f) and corresponding genome-wide motif footprints (e and g) upon treatment with either ATRA (red) or DMSO (black) in the NB4 cell line. NES (normalized enrichment score) and p-value, gene set enrichment analysis. Legend, individual protein-level iDAPT-MS enrichment. (h) Assessment of NB4 cell line-specific genetic dependencies versus NB4 iDAPT-MS negative enrichment upon ATRA treatment. Dependency scores are as reported from the CRISPR (Avana) 19Q3 dataset.

Extended Data Fig. 9

Analysis of NB4 iDAPT-MS profiles upon treatment with ATRA.

(a) Representative gating strategy for flow cytometry analyses as in Fig. 4b. (b) Western blotting analysis of the PML epitope from the NB4 cell line upon 48 hr ATRA treatment versus DMSO vehicle treatment (0.01%). Images are representative of two independent experiments. PCNA, loading control. (c) NB4 cell counts after 48 hrs of treatment with either 1 μM ATRA or vehicle (0.01% DMSO), as measured by CellTiter-Glo (n = 5 independent wells). p-value, Welch two-tailed t-test. (d) Volcano plot of proteins enriched by the TP3 fusion probe in NB4 nuclei treated with either ATRA or DMSO. Blue points, log2 fold change > 0 and false discovery rate (FDR) < 5%; red points, log2 fold change < 0 and false discovery rate (FDR) < 5%; black points, points with corresponding gene symbol labels. (e) ReactomeDB pathway enrichment analysis from iDAPT-MS of NB4 ATRA versus DMSO treatment. FDR, gene set enrichment analysis false discovery rate.

Given the different transcription factor classes captured by iDAPT at steady state, we explored how transcription factor activities and abundances change on open chromatin upon ATRA-mediated cellular differentiation. By iDAPT-seq, we observed both increased and decreased regions of open chromatin and motif footprinting activity upon ATRA treatment, with footprinting parameters FPD and FA correlating strongly with composite footprinting scores (Supplementary Fig. 5). Intriguingly, both concordant and discordant enrichment patterns between iDAPT-seq and iDAPT-MS transcription factor enrichment profiles were observed (Fig. 4c). Furthermore, some transcription factors exhibit only one of either differential footprinting or protein abundance, discrepancies that have been observed previously between chromatin accessibility and chromatin immunoprecipitation-based assays[9,10]. To corroborate our findings, we replaced our iDAPT-seq footprinting and iDAPT-MS analyses with either motif enrichment analysis via ChromVAR or RNA-seq analysis, which correlates well with our iDAPT-MS protein analysis, both yielding similar transcription factor patterns[47-49] (Supplementary Fig. 6–7). Hence, iDAPT reveals nine distinct classes (classes I-IX) arising as a consequence of integrating both iDAPT-seq, a readout of transcription factor activity, and iDAPT-MS, a readout of transcription factor protein abundance at open chromatin (Fig. 4c, Extended Data Fig. 10a). Furthermore, we interpreted concordance (classes III, VII) as chromatin activating activity by the transcription factor of interest and discordance (classes I, IX) as chromatin repression (Fig. 4c, Extended Data Fig. 10a). 
In support of this functional classification scheme, among transcription factors decreasing in abundance upon ATRA treatment, those classified as activating (class VII), which should be easier to tag by TP fusion proteins in the vehicle-treated setting, are generally more enriched by TP3 over negative control probes than repressive transcription factors (class I) (Extended Data Fig. 10b). Thus, iDAPT-MS and iDAPT-seq together uncover functional relationships between transcription factor binding dynamics and chromatin accessibility, which neither assay can elucidate alone.

Extended Data Fig. 10

Integrative analysis of iDAPT-MS and iDAPT-seq transcription factor abundance and activities.

(a) Schematic outlining the nine classes emerging from the changes in transcription factor abundances and activities on open chromatin upon ATRA treatment. Concordant or discordant changes in abundance and activities suggest activating or repressive activities on chromatin, respectively. (b) Distribution of log2 fold changes of transcription factor abundances as enriched by TP3 versus negative control iDAPT-MS profiles from untreated NB4 cells, separated by repressive (class I, increasing chromatin accessibility, decreasing protein abundance) or activating (class VII, decreasing chromatin accessibility, decreasing protein abundance) transcription factors as classified upon NB4 treatment with ATRA (mean ± s.e.m.). n, number of represented proteins from NB4 iDAPT-MS. p-value, two-sided Wilcoxon rank-sum test.
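
The concordance logic behind this scheme can be sketched as a lookup on two directions of change. The sketch covers only the four classes whose definitions are stated in the text (I, III, VII, IX); combinations involving an unchanged value are left unnumbered because their numerals are not given here. The function names and the 5% significance cutoff used to call a change are assumptions made for illustration.

```python
# Classes named explicitly in the text, keyed by
# (change in protein abundance, change in chromatin accessibility).
NAMED_CLASSES = {
    ("down", "up"): "I",
    ("up", "up"): "III",
    ("down", "down"): "VII",
    ("up", "down"): "IX",
}


def direction(effect, fdr, alpha=0.05):
    """Return 'up', 'down', or 'unchanged' for one signed effect size.

    alpha is an assumed significance cutoff, not a value from the study.
    """
    if fdr >= alpha or effect == 0:
        return "unchanged"
    return "up" if effect > 0 else "down"


def interpret(abundance_dir, accessibility_dir):
    """Return (class numeral or None, interpretation) for one factor."""
    if "unchanged" in (abundance_dir, accessibility_dir):
        return None, "unassigned"
    label = NAMED_CLASSES[(abundance_dir, accessibility_dir)]
    # Concordant changes suggest activation; discordant changes suggest repression.
    kind = "activating" if abundance_dir == accessibility_dir else "repressive"
    return label, kind
```

For example, a factor whose abundance falls while accessibility at its motif rises is returned as `("I", "repressive")`.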

As iDAPT-MS reveals abundance changes of proteins beyond transcription factors, we assessed how proteins interacting with transcription factors may cooperate to regulate chromatin accessibility states. For a given transcription factor, we superimposed iDAPT-MS protein abundance changes onto its first-order protein interaction network from BioGrid[35]. Of these putative transcription factor complex profiles, we found the PU.1/SPI1 protein interaction network to be the most significantly decreased complex upon ATRA treatment (Fig. 4d). Intriguingly, while many of its protein interactors such as the transcriptional corepressor SIN3A decrease in abundance, PU.1/SPI1 itself increases in abundance to promote chromatin accessibility at its cognate motif (class III)[42,50] (Fig. 4d–e). Furthermore, the decrease in RARA protein abundance, also an interactor of PU.1/SPI1, leads to increased chromatin accessibility at its binding motif due to its ATRA-mediated degradation, implicating its transcriptional repressive activity (class I)[51] (Supplementary Fig. 8a). Thus, in the APL setting, transcriptional repressors bind to PU.1/SPI1 to repress chromatin accessibility at PU.1/SPI1 motifs; this repressive binding is relieved upon ATRA treatment, enabling PU.1/SPI1 to activate transcription at its motifs. This analysis may be extended to other transcription factors and their protein complexes: BCL11A, together with many of its annotated protein interactors, decreases in abundance while increasing chromatin accessibility upon ATRA treatment (class I), suggestive of a coordinated downregulation of this repressive transcription factor and its protein complex components[52] (Fig. 4f–g). 
While JUNB[53-55], CEBPB[56], and CEBPE[57] have both activating and repressive behaviors reported, we observed class VII activating behavior from the JUNB transcription factor and class IX repressive behavior from the CEBPB and CEBPE transcription factors upon ATRA treatment, with their dynamic protein complex components providing potential context-specific insights into their regulatory activities on chromatin state (Supplementary Fig. 8b–c). In this manner, integrating protein interaction information with iDAPT-MS and iDAPT-seq profiles reveals the interplay between transcription factors, their activities on chromatin accessibility, and their putative protein complexes as these components change during ATRA treatment of NB4 cells. Given the numerous transcription factors and associated components differentially bound at open chromatin upon ATRA treatment, some of these newly identified proteins may have functional roles in APL differentiation. We superimposed our iDAPT-MS results with NB4 genetic dependencies and identified both PML and RARA, corroborating our analysis[58] (Fig. 4h). After filtering out essential genes across hematopoietic cell lines, we identified a number of candidate transcription factor effectors, including CEBPA, EBF3, and ZEB2, which may act downstream or independently of PML-RARA (Fig. 4h, Supplementary Fig. 9). In agreement with previous reports, our transcription factor classification scheme assigns ZEB2 as repressive[59] (class I) and EBF3[60-62] and CEBPA[63] as activating (class VII) (Fig. 4c, Supplementary Fig. 9c–d). This analysis reifies the power of combining forward genetic screens with iDAPT-MS to identify critical transcription factors and their regulators for a given biological phenotype. Finally, we assessed how our interpretations of transcription factor dynamics would change between iDAPT-MS, measuring protein abundances directly, and RNA-seq profiles. 
While we observed a positive correlation between iDAPT-MS and RNA-seq profiles upon ATRA treatment, several discordant cases emerged, including JUNB/JUND and RARA, with their RNA-seq effect sizes opposite in direction to their corresponding iDAPT-MS effects (Fig. 4c, Supplementary Fig. 7b–c). Indeed, ATRA binds to RARA, and prolonged ligand binding and transcriptional activity lead to RARA protein degradation[40] (Supplementary Fig. 8a). Furthermore, as transcript levels of RARA and several other protein interactors of PU.1/SPI1 do not fully match iDAPT-MS enrichment trends, the significantly negative enrichment of the PU.1/SPI1 protein complex observed upon ATRA treatment by iDAPT-MS is lost by RNA-seq (Supplementary Fig. 10). Thus, although bulk RNA-seq may broadly provide patterns similar to those of iDAPT-MS, the different levels of gene expression captured by the two techniques limit the ability of RNA-seq to replace proteomic analysis of open chromatin-associated proteins.
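
A comparison of this kind amounts to checking whether two effect sizes for the same gene share a sign. The following minimal sketch shows one way to flag discordant genes; the function name and all numeric values are hypothetical, and only the signs are chosen to follow the text.

```python
def discordant_genes(ms_log2fc, rna_log2fc):
    """Return genes whose iDAPT-MS and RNA-seq changes have opposite signs.

    Both arguments map gene symbol -> log2 fold change (ATRA vs DMSO).
    Genes missing from either table, or with a zero change, are skipped.
    """
    shared = sorted(set(ms_log2fc) & set(rna_log2fc))
    return [g for g in shared if ms_log2fc[g] * rna_log2fc[g] < 0]


# Hypothetical numbers for illustration only; the signs follow the text
# (RARA protein decreases on open chromatin while its RNA-seq effect is opposite).
ms = {"RARA": -1.5, "SPI1": 1.0, "CEBPE": 2.0}
rna = {"RARA": 0.7, "SPI1": 1.2, "CEBPE": 1.8}
flagged = discordant_genes(ms, rna)
```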

Discussion

In summary, we have developed iDAPT to capture both the genomic and proteomic contents of open chromatin, realized via a recombinant transposase/peroxidase probe. Integrative analysis of iDAPT-seq and iDAPT-MS profiles reveals nine transcription factor classes based on both changes in protein abundance on open chromatin (decreased, unchanged, or increased) and transcription factor activity (closed, unchanged, open). Furthermore, iDAPT-MS together with protein interaction annotations implicates changes in transcription factor complex compositions that may explain the corresponding changes in chromatin accessibility. Identification of such relationships between transcription factors, their protein complex components, and their functional outputs on chromatin accessibility may be informative for mechanistic and therapeutic study, especially in conjunction with genetic screening approaches. Indeed, in the context of APL, our analyses suggest targets for which approved therapies already exist, such as histone deacetylases, and those which may warrant further investigation, such as EBF3 and ZEB2.

From our transcription factor classification scheme, we are able to assign activating or repressive activities to sequence-specific transcription factors based on their concordance or discordance between iDAPT-MS and iDAPT-seq profiles. At the heart of this finding is the question, if repressive factors close chromatin at their cognate binding sites, how are they still detected by iDAPT-MS? Due to chromatin “breathing” or stochastic transposition, Tn5 transposase may insert proximal to repressive transcription factors on chromatin, albeit at a decreased frequency as compared to activating transcription factors, enabling the tagging of such repressive factors for mass spectrometry detection. In support of this explanation, as in Fig. 4g and Supplementary Figs. 8a and c, repressive transcription factors (classes I and IX) exhibit detectable transposase activity proximal to their cognate binding motifs above background in both ATRA- and control-treated cells. On the other hand, the inference of transcription factor activity via genome-wide footprinting from iDAPT-seq/ATAC-seq datasets may be partially artifactual, leading to misleading classifications of transcription factor activity. First, footprinting analysis relies on the quality of curated DNA binding motifs, whereas actual transcription factor localization to open chromatin may not be restricted to such motif-containing chromatin regions. Second, genome-wide footprinting analysis in bulk may mask locus-specific or cell-specific transcription factor activities, a consequence of broadly enriching for transposase-accessible chromatin, only one of many regulatory features of gene expression. Thus, the combination of iDAPT-MS and iDAPT-seq provides a powerful opportunity to identify such key relationships between transcription factor abundance and genome-wide regulation of chromatin accessibility.

In addition to chromatin accessibility state, additional factors such as histone and DNA modifications may modulate transcription factor activity at a given genetic locus[25]. To explore these relationships further, complementary methods to identify the transcription factors and associated proteins at these specific chromatin states include ChIP-based enrichment[64] and proximity labeling via chromatin reader domains[65]. At a finer genetic resolution are locus-specific enrichment methods, including recently developed CRISPR/Cas9-based proximity labeling approaches[11,66]. Integrating these methods with assays of the accessible genome such as ATAC-seq may reveal context-specific transcription factor activities and protein complex compositions that iDAPT would not reveal.
On the other hand, classification of global transcription factor activities via iDAPT may better inform their regulation of cellular phenotypes, encompassing mechanistic information across all of their binding sites. Furthermore, as iDAPT does not require genetic manipulation of biological samples of interest as with traditional APEX2 or biotin ligase genetic tagging[16,17,66], our approach may be readily applied to numerous biological systems to uncover novel chromatin-level molecular correlates and mechanistic insights. Thus, our findings substantiate the unprecedented capability of iDAPT to unravel epigenomic landscapes as they change during development and disease.

Online Methods

Additional information may be found in the Life Sciences Reporting Summary.

Cell lines and culture conditions.

GM12878 cells (Coriell) were cultured in RPMI-1640 supplemented with L-glutamine (Gibco), 15% heat-inactivated fetal bovine serum (FBS) (Gibco), and 1% penicillin/streptomycin (Thermo Fisher Scientific). HT1080 cells (American Type Culture Collection, ATCC) were cultured in EMEM (ATCC) supplemented with 10% FBS and 1% penicillin/streptomycin. MDA-MB-231 (ATCC) and HEK293T (ATCC) cells were maintained in DMEM (Gibco) supplemented with 10% FBS, 1% L-glutamine, and 1% penicillin/streptomycin. DU145 (ATCC) and K562 (ATCC) cells were cultured in RPMI-1640 supplemented with 10% FBS and 1% penicillin/streptomycin. NB4 cells (DSMZ) were cultured in RPMI-1640 supplemented with 10% charcoal-stripped FBS (Gibco) and 1% penicillin/streptomycin. All-trans retinoic acid (ATRA, Sigma) was dissolved in DMSO at a concentration of 10 mM. Cells were incubated at 37 °C and 5% CO2. Genomic DNA was extracted from K562 and NB4 cells using the Quick-DNA MiniPrep kit (Zymo).

Cloning and purification of recombinant proteins.

Expression plasmids were acquired (pTXB1-Tn5, Addgene #60240) or cloned (APEX2 ORF from pTRC-APEX2, Addgene #72558) into the pTXB1 vector (NEB). Fusion constructs with different peptide linkers[67] were generated by site-directed mutagenesis (NEB). Plasmids containing C-terminally tagged gene constructs as described in this study are deposited to Addgene (#160081, #160083–160088). All enzymes were expressed and purified similarly as previously described[68]. In brief, plasmids were transformed into the Rosetta2 E. coli strain (EMD Millipore) and streaked out on an LB agar plate containing ampicillin and chloramphenicol. A single bacterial colony was inoculated into 10 mL LB with antibiotics and incubated overnight; this culture was then inoculated into 500 mL LB medium. Cultures were incubated at 37 °C until the optical density at 600 nm (OD600) reached ~0.9. Isopropyl β-D-1-thiogalactopyranoside (IPTG) was added to a final concentration of 250 μM, cultures were incubated for 2 h at 30 °C, and bacteria were pelleted and frozen at −80 °C. Bacterial pellets were resuspended in 40 mL HEGX lysis buffer (20 mM HEPES-KOH pH 7.2, 1 M NaCl, 1 mM EDTA, 10% glycerol, 0.2% Triton X-100, 20 μM PMSF) and sonicated with a Sonic Dismembrator 100 (Fisher Scientific) at setting 7, with 5 pulses of 30 s on/off on ice. Lysate was spun at 15,000 x g in a Beckman centrifuge (JA-10 rotor) for 30 min at 4 °C. 1 mL 10% PEI was then added to the supernatant with agitation and clarified by centrifugation (15,000 x g, 15 min, 4 °C). Supernatant was then applied to 5 mL chitin resin (NEB) prewashed with HEGX buffer and incubated for 1 h at 4 °C with agitation. Chitin slurry was applied to an Econo-Pak column (Bio-Rad) to remove unbound protein, washed with 20 column volumes of HEGX buffer and 1 column volume of HEGX with 50 mM DTT, and then incubated with 1 column volume of HEGX with 50 mM DTT for 48 h at 4 °C. 
After elution, the column was washed with 1 column volume of 2x dialysis buffer (2xDB: 100 mM HEPES-KOH pH 7.2, 0.2 M NaCl, 0.2 mM EDTA, 20% glycerol, 0.2% Triton X-100, 2 mM DTT). Eluates were combined, concentrated with a 10 kDa MWCO centrifugal filter (EMD Millipore), and subjected to buffer exchange with 2xDB using PD-10 desalting columns (GE Healthcare). Proteins were quantified via detergent-compatible Bradford assay (Thermo Fisher Scientific), snap frozen with liquid nitrogen, and stored at −80 °C.

Transposome adaptor preparation.

All transposome adaptors were synthesized at Thermo Fisher Scientific. The oligonucleotide sequences were similar to those previously described[18,68]: Tn5MErev, 5’-[phos]CTGTCTCTTATACACATCT-3’; Tn5ME-A, 5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3’; Tn5ME-B, 5’-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3’; Tn5ME-A-AF647, 5’-/AlexaFluor647/TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3’; Tn5ME-B-AF647, 5’-/AlexaFluor647/GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3’. All oligos were resuspended in water to a final concentration of 200 μM each. Equimolar amounts of Tn5MErev/Tn5ME-A, Tn5MErev/Tn5ME-B, Tn5MErev/Tn5ME-A-AF647, and Tn5MErev/Tn5ME-B-AF647 were combined in separate tubes, denatured at 95 °C for 10 min, and cooled slowly to room temperature by removing the heat block. The annealed duplexes Tn5MEDS-A/Tn5MEDS-B and Tn5MEDS-A-AF647/Tn5MEDS-B-AF647 were combined at equimolar amounts to form 100 μM stocks of Tn5MEDS-A/B and Tn5MEDS-A/B-AF647, aliquoted, and stored at −20 °C.
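
As a quick consistency check on the adaptor design, the short oligo should be the reverse complement of the 19-nt mosaic end at the 3’ end of each long oligo, which is what allows the two strands to anneal. The sketch below verifies this for the sequences listed above; the variable and function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequences copied from the adaptor list above (5' to 3').
TN5ME_REV = "CTGTCTCTTATACACATCT"
TN5ME_A = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
TN5ME_B = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

# The last 19 nt of each long oligo is the mosaic end that pairs with Tn5MErev.
mosaic_end = TN5ME_A[-len(TN5ME_REV):]
pairs_with_a = reverse_complement(TN5ME_REV) == mosaic_end
pairs_with_b = TN5ME_B.endswith(reverse_complement(TN5ME_REV))
```

Both checks evaluate to true, so each annealed product is a 19-bp duplex carrying a single-stranded 5’ overhang from the long oligo.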

Electrophoretic mobility shift assay and DNA fragmentation analysis.

pSMART HCAmp plasmid (Lucigen) was linearized with EcoRV-HF (NEB) and column-purified. DNA:protein complexes were assembled by incubating 12 pmol enzyme in 2xDB with 15 pmol MEDS-A/B in water for 1 h at room temperature. 200 ng of linearized plasmid was then added to the enzyme mix and brought to a final volume of 20 μL containing 20% dimethylformamide, 20 mM Tris-HCl pH 7.5, and 10 mM MgCl2, with or without 50 mM EDTA. Tagmentation reactions were then incubated for 30 min at 37 °C. For gel shift analysis, reactions were subjected to electrophoresis on a 1% agarose gel in Tris-acetate-EDTA (TAE) buffer using gel loading dye without SDS (NEB). DNA fragmentation was assessed by adding SDS to a final concentration of 0.2% to the reaction mix after tagmentation and heating at 55 °C for 15 min. Reactions were then subjected to electrophoresis on a 1% agarose gel cast in TAE and ethidium bromide using gel loading dye with SDS (NEB). Images were acquired on a Gel Doc (Bio-Rad) with the Quantity One v4.2.1 software.

ATAC-seq/iDAPT-seq sample preparation.

The OmniATAC sample preparation protocol was used as previously described with modifications where indicated below[19]. 10 pmol enzyme (2 μL in 2xDB) was mixed with 12.5 pmol MEDS-A/B (1.25 μL in water) and incubated at room temperature for 1 h. In the meantime, 50,000 cells were centrifuged at 500 x g for 5 min at 4°C. Cells were resuspended in 50 μL lysis buffer 1 (LB1: 10 mM Tris-HCl pH 7.5, 10 mM NaCl, 3 mM MgCl2, 0.01% digitonin, 0.1% Tween-20, and 0.1% NP-40) with trituration, incubated on ice for 3 min, and then further supplemented with 1 mL lysis buffer 2 (LB2: 10 mM Tris-HCl pH 7.5, 10 mM NaCl, 3 mM MgCl2, and 0.1% Tween-20). Nuclei were pelleted (500 x g, 10 min, 4 °C), resuspended with 50 μL tagmentation reaction mixture (20% dimethylformamide, 10 mM MgCl2, 20 mM Tris-HCl pH 7.5, 33% 1xPBS, 0.01% digitonin, 0.1% Tween-20, and either 10 pmol enzyme equivalent of enzyme:DNA complex or 2.5 μL Nextera Tn5 [Illumina, TDE1 from FC-121–1030] in 50 μL total volume), and incubated at 37 °C for 30 min with agitation on a thermomixer (1,000 rpm). For iDAPT-seq libraries generated from K562 or NB4 cells or genomic DNA, bovine serum albumin (BSA) was added at a final concentration of 1% to lysis (LB1 and LB2) and tagmentation buffers. Tagmentation with naked genomic DNA was performed using 50 ng genomic DNA as substrate. After tagmentation, DNA libraries were extracted with DNA Clean and Concentrator-5 (Zymo) and eluted with 21 μL water. To determine optimal PCR cycle number for library amplification, quantitative PCR was performed similarly as previously reported on a StepOnePlus Real-Time PCR (Applied Biosystems) with the StepOne v2.3 software[8]. 
2 μL of each ATAC-seq or iDAPT-seq library was added to 2x NEBNext Master Mix (NEB) and 0.4x SYBR Green (Thermo Fisher) with 1.25 μM of each primer (Primer 1: 5’-AATGATACGGCGACCACCGAGATCTACACTCGTCGGCAGCGTCAGATGTG-3’; Primer 2.1: 5’-CAAGCAGAAGACGGCATACGAGATTCGCCTTAGTCTCGTGGGCTCGGAGATGT-3’) in a final volume of 15 μL, and quantification was assessed using the following conditions: 72 °C for 5 min; 98 °C for 30 s; and thermocycling at 98 °C for 10 s, 63 °C for 30 s and 72 °C for 1 min. Optimal PCR cycle number was determined as the qPCR cycle yielding fluorescence between 1/4 and 1/3 of the maximum fluorescence. The remaining DNA library was then amplified accordingly by PCR using previously reported barcoded primers for library multiplexing[8], purified with DNA Clean and Concentrator-5 (Zymo), and eluted into 20 μL final volume with water. Libraries were then subject to TapeStation 2200 High Sensitivity D1000 or D5000 fragment size analysis (Agilent) and NextSeq 500 High Output paired-end sequencing (2×75 bp, Illumina) as indicated.
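
The cycle-selection rule can be expressed as a short search over the qPCR trace. This is a minimal sketch under stated assumptions: the function name is illustrative, the readings are hypothetical and assumed to be baseline-subtracted, and choosing the first cycle inside the window is one possible reading of the rule.

```python
def optimal_cycle(fluorescence, low=1/4, high=1/3):
    """Return the first qPCR cycle (1-based) whose fluorescence lies between
    `low` and `high` of the maximum observed fluorescence, or None."""
    peak = max(fluorescence)
    for cycle, value in enumerate(fluorescence, start=1):
        if low * peak <= value <= high * peak:
            return cycle
    return None


# Hypothetical baseline-subtracted readings, one per cycle.
readings = [0, 1, 2, 5, 12, 28, 60, 85, 96, 100]
cycles = optimal_cycle(readings)
```

Because readings are taken once per cycle, no cycle may fall inside the window; the function then returns `None` so that the choice can be made by inspection.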

ATAC-seq/iDAPT-seq data preprocessing.

Paired-end sequencing reads were trimmed with TrimGalore v0.4.5 to remove adaptor sequence CTGTCTCTTATACACATCT, which arises at the 3’ end due to sequenced DNA fragments being shorter than the sequencing length (75 bp). Reads were aligned to the hg38 reference genome using bowtie2 v2.2.9 with options “--no-unal --no-discordant --no-mixed -X 2000”. Reads mapping to the mitochondrial genome were subsequently removed, and duplicate reads were removed with Picard v2.8.0. For insert size distribution, transcription start site (TSS) enrichment, and genome track visualization analyses, reads were downsampled to approximately 5 million paired-end fragments. Insert size distributions were determined by counting inferred fragment sizes from read alignments. TSS enrichment was performed by first shifting insert positions aligned to the reverse strand by −5 bp and the forward strand by +4 bp as previously described[8] and then determining the distance of each insertion to the closest Ensembl v94 transcription start site with Homer v4.9. Visualization was performed by mapping insertions to a genome-wide sliding 150 bp window with 20 bp offsets with bedops v2.4.30, followed by conversion to bigwig format with wigToBigWig from UCSC tools v363. Genome tracks were visualized with Integrative Genomics Viewer v2.5.0. Peaks were called with MACS2 v2.1.1 using options “callpeak --nomodel --shift -100 --extsize 200 --nolambda -q 0.01 --keep-dup all”, generating either individual peak sets from each library (GM12878 analysis) or a consensus peak set after consolidating all reads (K562, NB4 analyses). For GM12878 analysis, a union of all analyzed peaks was taken as a consensus peak set, and counts of insertions within peaks (downsampled to 5 million reads) were assessed using bedtools v2.26.0 with the multicov function. Correlation analysis was performed with log2 read counts + 1 and visualized using the pheatmap function in R v3.5.0. 
For K562 and NB4 analyses, consensus peaks overlapping with hg38 blacklist regions were removed (https://www.encodeproject.org/annotations/ENCSR636HFF/), and counts of insertions within peaks were assessed using the bedtools multicov function. Count matrices were processed with DESeq2 for differential insertions with shrunken log2 fold changes, and principal component analyses were performed with counts transformed by the varianceStabilizingTransformation function from DESeq2. Figures were generated with ggplot2 v3.1.1.
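
The strand-specific shift used before TSS enrichment can be illustrated with a few lines of code. The sketch below is not the Homer-based procedure used in the study: the function names are hypothetical, positions are treated as plain integers on a single chromosome, and the distance calculation ignores gene orientation.

```python
def tn5_insertion(five_prime_pos, strand):
    """Shift a read's 5' end to the inferred Tn5 insertion position.

    Forward-strand reads are shifted by +4 bp and reverse-strand reads
    by -5 bp, as stated in the text.
    """
    if strand == "+":
        return five_prime_pos + 4
    if strand == "-":
        return five_prime_pos - 5
    raise ValueError("strand must be '+' or '-'")


def distance_to_nearest_tss(position, tss_positions):
    """Signed distance (bp) from an insertion to the closest TSS on one chromosome."""
    nearest = min(tss_positions, key=lambda tss: abs(position - tss))
    return position - nearest
```

For instance, a reverse-strand read whose 5’ end lies at position 200 gives an insertion at 195, which is 15 bp from a TSS at position 180.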

Co-immunofluorescence/ATAC-see analysis.

ATAC-see was performed similarly as previously described with slight modifications[18]. Enzyme and transposon DNA were mixed at a 1:1.25 enzyme:MEDS-A/B-AF647 molar ratio and incubated at room temperature for 1 h. Adherent cells were grown on glass coverslips (Fisher Scientific, 12–540A) until 80–90% confluent, washed with 1xPBS, fixed with 1% formaldehyde (Electron Microscopy Services) in 1xPBS for 10 min, and washed twice with ice-cold 1xPBS. Immobilized cells were lysed by incubation with LB1 for 3 min followed by LB2 for 10 min at room temperature. Cells were then subject to tagmentation (20% dimethylformamide, 10 mM MgCl2, 20 mM Tris-HCl pH 7.5, 33% 1xPBS, 0.01% digitonin, 0.1% Tween-20, and 80 pmol enzyme equivalent of enzyme:DNA complex in a total volume of 100 μL) for 30 min at 37 °C in a humidified chamber. Subsequently, cells were washed with 50 mM EDTA and 0.01% SDS in 1xPBS three times for 15 min each at 55 °C, lysed for 10 min with 0.5% Triton X-100 in 1xPBS at room temperature, and blocked with 1% BSA and 10% goat serum in PBS-T (1xPBS and 0.1% Tween-20) for 1 h in a humidified chamber. Primary antibody was added to slides in 1% BSA/PBS-T and incubated at 4 °C overnight; slides were then washed and subjected to secondary antibody staining for 1 h. Slides were washed with PBS-T three times for 15 min each, stained with DAPI (Sigma, 1 μg/mL) for 1 min, washed with PBS for 10 min, and mounted with Fluorescence Mounting Medium (Dako). Confocal microscopy images were taken with an LSM 880 Axio Imager 2 or an LSM 880 Axio Observer at 63x magnification (Zeiss). Images were processed with Fiji/ImageJ v2.0.0. Primary antibodies used were anti-RNA polymerase II CTD repeat YSPTSPS (phospho S2) (rabbit, Abcam ab5095, 1:500), anti-H3K27Ac (rabbit, Abcam ab4729, 1:500), anti-H3K9me3 (rabbit, Abcam ab8898, 1:500), anti-SC35 (mouse, SC-35, Abcam ab11826, 1:1000). 
Secondary antibodies used were Goat anti-Rabbit IgG (H+L) Secondary Antibody, Alexa Fluor 488 conjugate (Thermo Fisher Scientific A11008, 1:1000) and Goat anti-Mouse IgG (H+L) Cross-Adsorbed Secondary Antibody, Alexa Fluor 488 conjugate (Thermo Fisher Scientific A11001, 1:1000). Quantitative image analyses were performed with CellProfiler v3.1.5. Regions of interest (ROIs) were identified from DAPI channel intensity values using minimum cross entropy thresholding, with each ROI corresponding to an individual nucleus. Pearson correlation coefficients were determined by comparing ATAC-see pixel intensities with corresponding immunofluorescence intensity values within each ROI to assess the nucleus-to-nucleus variation in colocalization.
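
The per-nucleus colocalization measure is a Pearson correlation computed over the pixels of each ROI. The following sketch shows the calculation in plain Python rather than CellProfiler; the function name and the pixel intensities are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / sqrt(var_x * var_y)


# Hypothetical pixel intensities for two nuclei (ROIs): (ATAC-see, immunofluorescence).
rois = {
    "nucleus_1": ([10, 20, 30, 40], [12, 22, 31, 45]),
    "nucleus_2": ([10, 20, 30, 40], [40, 30, 20, 10]),
}
per_roi_r = {name: pearson(a, b) for name, (a, b) in rois.items()}
```

Computing one coefficient per ROI, rather than pooling all pixels, is what exposes nucleus-to-nucleus variation.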

Peroxidase activity assay.

5 pmol enzyme was incubated with 2.5 pmol hemin chloride (Cayman Chemical, dissolved in DMSO) for 1 h at room temperature. This molar ratio was selected given reports of APEX2 maximal heme occupancy of 40–57%. Heme:protein complexes were then subjected to 50 μM Amplex UltraRed (Thermo Fisher Scientific) and 1 mM hydrogen peroxide for 1 min at room temperature in a total volume of 100 μL with 1xPBS. Reactions were then quenched with 100 μL 2x quenching solution (10 mM Trolox, 20 mM sodium ascorbate, and 20 mM NaN3 in 1xPBS), and fluorescence intensities were measured on a SpectraMax iD3 plate reader with the SoftMax Pro v7.0.3 software, with excitation at 530 nm and emission at 590 nm.

DNA and protein tagging by iDAPT.

All iDAPT proteomic labeling assays were performed as described below unless indicated otherwise. 2.5 μmol MEDS-A/B, 2 μmol enzyme, and 1 μmol hemin chloride per channel were incubated at room temperature for 1 h. 1e7 cells per sample were washed (500 x g, 5 min, 4 °C), lysed and triturated in 100 μL LB1 (10 mM Tris-HCl pH 7.5, 10 mM NaCl, 3 mM MgCl2, 1% BSA, 0.01% digitonin, 0.1% Tween-20, 0.1% NP-40, and 1x cOmplete EDTA-free protease inhibitor cocktail [Roche]) for 3 min, and subsequently supplemented with an additional 1 mL of LB2 (10 mM Tris-HCl pH 7.5, 10 mM NaCl, 3 mM MgCl2, 1% BSA, 0.1% Tween-20, and 1x protease inhibitor). Nuclei were pelleted (500 x g, 10 min, 4 °C), resuspended with tagmentation reaction mixture (20% dimethylformamide, 10 mM MgCl2, 20 mM Tris-HCl pH 7.5, 33% 1xPBS, 1% BSA, 0.01% digitonin, 0.1% Tween-20, 500 μM biotin-phenol, 1x protease inhibitor, and 2 μmol enzyme equivalent of enzyme:DNA:heme complex in a total volume of 500 μL), and incubated at 37 °C for 30 min with agitation on a thermomixer (1,000 rpm). 5 μL of tagmentation mix was saved for quality assessment as described above for ATAC-seq/iDAPT-seq sample preparation. The remaining nuclear suspension was then washed 2x with 1xPBS supplemented with 500 μM biotin-phenol, 1% BSA, 0.1% Tween-20, and 1x protease inhibitor (3000 x g, 5 min, 4 °C) and labeled with 1 mM hydrogen peroxide and 500 μM biotin-phenol for 1 min in 1xPBS with 1x protease inhibitor in a volume of 500 μL. Peroxidation reactions were quenched with 500 μL 2x quenching buffer (10 mM Trolox, 20 mM sodium ascorbate, 20 mM NaN3, and 1x protease inhibitor in 1xPBS). Labeled nuclei were then pelleted, washed with 1x quenching buffer, resuspended in 500 μL RIPA containing protease inhibitors, and frozen at −80 °C. Lysates were thawed on ice, sonicated via a Sonic Dismembrator 100 (Fisher Scientific, setting 3, 15 s, 4 pulses), and incubated on ice for 30 min after the addition of 1 μL benzonase (EMD Millipore). 
Lysates were clarified by centrifugation (15,000 x g, 20 min, 4 °C), quantified via the detergent-compatible Bradford assay (Thermo Fisher Scientific), and subjected to either Western blotting or quantitative mass spectrometry analyses as described below. For NB4 cell analysis, an additional endogenous peroxidase blocking step was added after nuclear extraction and before tagmentation: nuclei were resuspended in 500 μL 1xPBS containing 1% BSA, 0.03% hydrogen peroxide, and 0.1% NaN3 and incubated on ice for 30 min. Nuclei were pelleted and washed 4x with 1xPBS/1% BSA (3000 x g, 5 min, 4 °C). Residual hydrogen peroxide was monitored by colorimetric assessment of supernatant via Quantofix peroxides test stick (Sigma).

Western blotting analysis.

Whole cell or nuclear lysates were generated by resuspending cells or nuclei in RIPA (Boston BioProducts) supplemented with 1x cOmplete EDTA-free protease inhibitor cocktail (Roche). Lysates were incubated on ice for 30 min, sonicated via a Sonic Dismembrator 100 (Fisher Scientific) at setting 3 with 3–4 pulses of 15 s on/off on ice, and treated with benzonase for an additional 30 min on ice. Lysates were clarified by centrifugation (15,000 x g, 20 min, 4 °C) and their concentrations quantified via the detergent-compatible Bradford assay (Thermo Fisher Scientific). All Western blots were run on NuPAGE 4–12% Bis-Tris protein gels (Thermo Fisher Scientific) and transferred to 0.2 μm nitrocellulose membranes (GE Healthcare). Membranes were blocked with 3% milk in PBS-T and incubated overnight with primary antibody and subsequently with secondary antibody after brief washing with PBS-T. Chemiluminescence was determined by applying ECL Western Blotting detection reagent (GE Healthcare) to membranes and imaging on an Amersham Imager 600 (GE Healthcare). Membranes were stripped with Restore PLUS Stripping Buffer (Thermo Fisher Scientific). Primary antibodies used were anti-FLAG M2 (mouse, Sigma-Aldrich, F1804, 1:2000), anti-PCNA (mouse, PC10, Santa Cruz Biotechnology sc-56, 1:1000), and anti-PML (rabbit, Bethyl A301–167A, 1:1000). Secondary antibodies used were Rabbit IgG, HRP-linked F(ab’)2 fragment (GE Healthcare NA9340, from donkey, 1:5000) and Mouse IgG, HRP-linked whole Ab (GE Healthcare NA931, from sheep, 1:5000). Streptavidin-HRP (Cell Signaling Technology #3999S, 1:1000) was also used for probing.

Streptavidin enrichment and tandem mass tag labeling.

250 μg (K562) or 150 μg (NB4) lysate was reduced with 5 mM DTT and then added to 60 μL (K562) or 90 μL (NB4) Pierce streptavidin bead slurry equilibrated 2x with RIPA buffer. Lysate/bead mixture was incubated with end-to-end rotation overnight at 4 °C. Beads were washed 3x with RIPA, 2x with 200 mM EPPS pH 8.5, and resuspended with 100 μL 200 mM EPPS pH 8.5, with beads resuspended and incubated with end-to-end rotation for 5 min per wash. 1 μL mass spectrometry-grade LysC (Wako) was added to each tube and incubated at 37 °C for 3 h with mixing, and an additional 1 μL mass spectrometry-grade trypsin (Thermo Fisher Scientific) was added, followed by overnight incubation at 37 °C with mixing. Beads were magnetized, and eluate was collected and subjected to downstream TMT labeling. Peptides were processed using the SL-TMT method[24]. TMT reagents (0.8 mg) were dissolved in anhydrous acetonitrile (40 μL), of which 10 μL was added to each peptide suspension (100 μL) with 30 μL of acetonitrile to achieve a final acetonitrile concentration of approximately 30% (v/v). Following incubation at room temperature for 1 h, the reaction was quenched with hydroxylamine to a final concentration of 0.3% (v/v). The TMT-labeled samples were pooled at a 1:1 ratio across all samples. The pooled sample was vacuum centrifuged to near dryness and subjected to C18 solid-phase extraction (SPE) (Sep-Pak, Waters).
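As a quick check of the volumes stated above, the final acetonitrile fraction can be computed directly. This is a minimal sketch that assumes the peptide suspension itself contributes no acetonitrile and that volumes are additive; the function name is illustrative and not taken from the authors' code.

```python
def final_acetonitrile_fraction(tmt_in_acn_ul, extra_acn_ul, peptide_ul):
    """Return the acetonitrile volume fraction after mixing (assumes additive volumes)."""
    acn_ul = tmt_in_acn_ul + extra_acn_ul
    total_ul = acn_ul + peptide_ul
    return acn_ul / total_ul


# 10 uL TMT reagent in acetonitrile + 30 uL acetonitrile + 100 uL peptide suspension
fraction = final_acetonitrile_fraction(10.0, 30.0, 100.0)
print(f"{fraction:.1%}")  # 28.6%, consistent with "approximately 30%"
```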

Off-line basic pH reversed-phase (BPRP) fractionation.

We fractionated the pooled TMT-labeled peptide sample using BPRP HPLC[69]. We used an Agilent 1200 pump equipped with a degasser and a photodiode array (PDA) detector (set at 220 and 280 nm) from Thermo Fisher Scientific (Waltham, MA). Peptides were subjected to a 50-min linear gradient from 9% to 35% acetonitrile in 10 mM ammonium bicarbonate pH 8 at a flow rate of 600 μL/min over an Agilent 300Extend C18 column (3.5 μm particles, 4.6 mm ID and 220 mm in length). The peptide mixture was fractionated into a total of 96 fractions, which were consolidated into 24 super-fractions[70]. Samples were subsequently acidified with 1% formic acid and vacuum centrifuged to near dryness. Each consolidated fraction was desalted via StageTip, dried again via vacuum centrifugation, and reconstituted in 5% acetonitrile, 5% formic acid for LC-MS/MS processing.

LC-MS/MS proteomic analysis.

Samples were analyzed on an Orbitrap Fusion mass spectrometer (Thermo Fisher Scientific, San Jose, CA) coupled to a Proxeon EASY-nLC 1200 liquid chromatography (LC) pump (Thermo Fisher Scientific). Peptides were separated on a 100 μm inner diameter microcapillary column packed with 35 cm of Accucore C18 resin (2.6 μm, 150 Å, Thermo Fisher Scientific). For each analysis, approximately 2 μg of peptides were separated using a 150 min gradient of 8 to 28% acetonitrile in 0.125% formic acid at a flow rate of 450–500 nL/min. Each analysis used an MS3-based TMT method[71,72], which has been shown to reduce ion interference compared to MS2 quantification[73]. The data were collected as described previously using an SPS-MS3 method[74].

Proteomic data analysis.

Mass spectra were processed using a Sequest-based pipeline[75], as described previously[76]. Database searching included all entries from the human UniProt database, which was concatenated with one composed of all protein sequences in the reversed order. Oxidation of methionine residues (+15.995 Da) was set as a variable modification, and TMT tags on lysine residues and peptide N-termini (+229.163 Da) and carbamidomethylation of cysteine residues (+57.021 Da) were set as static modifications. Peptide-spectrum matches (PSMs) were adjusted to a 1% false discovery rate (FDR)[77,78] using a linear discriminant analysis (LDA), as described previously[75]. For quantitation, we extracted the summed signal-to-noise (S:N) ratio for each TMT channel and omitted PSMs with poor quality, MS3 spectra with TMT reporter summed signal-to-noise of less than 100, or isolation specificity < 0.7[79]. PSM intensities were normalized by taking the median intensity of streptavidin and trypsin PSMs per sample as a normalization factor, as these proteins are added to each sample in equal amounts post-enrichment. Normalized PSMs were then log2-transformed and collapsed to proteins by arithmetic average, with priority given to uniquely mapping peptides. Hierarchical clustering, Pearson correlation, and principal component analyses were performed at the protein level. The limma package in R was used to determine differential protein abundances.
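The spike-in normalization and protein-level collapse described above can be sketched as follows. This is an illustrative simplification, not the authors' R code: it assumes each PSM is a record with a protein label and one positive intensity per TMT channel, it omits the quality filters and the priority given to uniquely mapping peptides, and all names (`normalize_and_collapse`, `PROT_A`) are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def normalize_and_collapse(psms, spike_in_proteins):
    """Normalize PSM intensities per channel, log2-transform, and average per protein.

    psms: list of dicts with keys 'protein' and 'intensities' (one value per channel).
    spike_in_proteins: proteins assumed to be present in equal amounts in every sample.
    """
    n_channels = len(psms[0]["intensities"])
    # Per-channel normalization factor: median intensity of the spike-in PSMs.
    factors = []
    for ch in range(n_channels):
        spike = [p["intensities"][ch] for p in psms if p["protein"] in spike_in_proteins]
        factors.append(median(spike))
    # Normalize and log2-transform each PSM, grouping rows by protein.
    by_protein = defaultdict(list)
    for p in psms:
        logged = [math.log2(v / f) for v, f in zip(p["intensities"], factors)]
        by_protein[p["protein"]].append(logged)
    # Collapse PSMs to proteins by the arithmetic mean of the log2 values.
    return {
        prot: [sum(col) / len(col) for col in zip(*rows)]
        for prot, rows in by_protein.items()
    }


example_psms = [
    {"protein": "streptavidin", "intensities": [100.0, 200.0]},
    {"protein": "trypsin", "intensities": [300.0, 600.0]},
    {"protein": "PROT_A", "intensities": [400.0, 400.0]},
    {"protein": "PROT_A", "intensities": [100.0, 100.0]},
]
protein_log2 = normalize_and_collapse(example_psms, {"streptavidin", "trypsin"})
print(protein_log2["PROT_A"])
```

In this toy input the second channel carries twice as much spike-in signal as the first, so the same raw intensity for `PROT_A` maps to a lower normalized value in that channel.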

Protein enrichment analyses.

Gene set enrichment analyses of iDAPT-MS datasets were performed with the fgsea package (10,000 permutations) in R, using UniProt protein identifications ranked by their log2 fold changes from limma[80]. Gene sets used for analyses: CORUM (v3.0) protein complex annotations[32], Human Protein Atlas (v19) subcellular localization annotations with reliability demarcated as “Enhanced” or “Supported”[27], BioGrid (v3.5.178) multi-validated protein interaction annotations[35], ReactomeDB (v70) pathway to gene mappings from fgsea via the “reactomePathways” function[81], and CisBP transcription factors from the “human_pwms_v2” dataset curated as in the chromVARmotifs package in R[36,47]. All gene identities were converted to UniProt prior to analysis via biomaRt in R. Protein interaction networks were visualized with igraph v1.2.4. Four classes of nuclear proteins were collated: histones, chromatin remodelers, transcription factors, and RNA-binding proteins. Histone UniProt IDs were collated from HistoneDB 2.0[82] and UniProt with search query “Nucleosome core”[83]. Chromatin remodeler proteins were obtained from UniProt IDs associated with “GO:0006338” (“chromatin remodeling”)[84] and CORUM protein complex components associated with the five primary chromatin remodelers[32]: NuRD, SWI, ISWI, INO80, SWR1. High-confidence RNA-binding proteins were obtained from hRBPome[85], and transcription factors were obtained from Lambert et al.[3]. K562 RNA-seq[25] (ENCFF664LYH and ENCFF855OAF), whole cell proteome[33], and nuclear proteome[34] datasets were downloaded and converted to UniProt IDs. RNA-seq genes were filtered for those with nonzero read counts (transcripts per million) in both replicates[25]. The whole cell proteomic dataset was filtered by removing peptides with missing quantitations[33]. The nuclear proteome dataset was preprocessed by removing peptides with multiple UniProt IDs and collating remaining UniProt IDs across all salt extraction conditions[34]. 
For determination of proteins associated with specific extraction conditions, we followed a procedure as reported by Federation et al.: peptide intensities were normalized by total intensities for a given sample, collapsed to protein intensities by arithmetic mean, scaled to maximum intensities of 1, and subjected to k-means clustering analysis using k = 8 for clustering[34]. Protein annotations from Alajem et al. were converted from mouse to human homologs via biomaRt in R, and gene sets (1000U, 45U, 3U) were compiled taking the sets of protein IDs with scores greater than 95 in either ES or NPC sample types[13]. Additional publicly available open chromatin proteome datasets were downloaded, and gene identities were converted to UniProt IDs[12,14]. Because published datasets differ in their analytical depths from our iDAPT-MS datasets, we converted gene identifiers to Human Protein Atlas subcellular enrichment proportions for better comparison. Specifically, the proportion for each subcellular localization term and for each dataset was calculated as the (number of proteins overlapping between the subcellular term and the dataset) / (number of proteins overlapping between all annotated Human Protein Atlas proteins and the dataset). These proportions were used as features for principal component analysis.
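The subcellular enrichment proportion defined above reduces to a ratio of set overlaps. The following is a minimal sketch under the assumption that annotations are available as a mapping from localization term to a set of protein IDs; the function name, the toy terms, and the single-letter IDs are placeholders for illustration.

```python
def localization_proportions(dataset_proteins, hpa_annotations):
    """Proportion of a dataset's annotated proteins that fall in each localization term.

    hpa_annotations: dict mapping localization term -> set of protein IDs.
    """
    dataset = set(dataset_proteins)
    all_annotated = set().union(*hpa_annotations.values())
    # Denominator: dataset proteins that carry any annotation at all.
    denominator = len(dataset & all_annotated)
    if denominator == 0:
        return {term: 0.0 for term in hpa_annotations}
    return {
        term: len(dataset & members) / denominator
        for term, members in hpa_annotations.items()
    }


toy_annotations = {"Nucleoplasm": {"A", "B", "C"}, "Cytosol": {"C", "D"}}
proportions = localization_proportions(["A", "B", "D", "X"], toy_annotations)
print(proportions)
```

Because a protein can carry more than one localization term, the proportions for a dataset need not sum to 1. Each dataset's vector of proportions then serves as one row of the feature matrix for principal component analysis.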

CUT&RUN sample preparation.

pAG/MNase (Addgene #123461) was expressed in Rosetta2 cells (EMD Millipore), purified with the Pierce His Protein Interaction Pull-Down kit (Thermo), and stored at either −80 °C for long-term storage or −20 °C for working stocks[86]. CUT&RUN was performed essentially as previously reported[26]. 500,000 K562 cells per assay were washed three times (room temperature, 3 min, 600 x g) in wash buffer (20 mM HEPES pH 7.5, 150 mM NaCl, 0.5 mM spermidine, and 1x cOmplete EDTA-free protease inhibitor cocktail [Roche]). Concanavalin A beads were activated by washing beads in binding buffer (20 μM HEPES pH 7.5, 10 mM KCl, 1 mM CaCl2, 1 mM MnCl2). 10 μL activated Concanavalin A beads were added to 100 μL cell suspension and incubated with rotation for 10 min at room temperature. Supernatant was removed, and 100 μL wash buffer containing 0.01% digitonin (dig-wash buffer) was added. Antibodies were added at a 1:50 dilution, and tubes were incubated with rotation overnight at 4 °C. Beads were washed with dig-wash buffer, pAG/MNase was added at a final concentration of 2 μg/mL, and suspensions were incubated for 1 h at 4 °C. Beads were further washed with wash buffer, resuspended in 100 μL wash buffer, and chilled to 0 °C in an ice-water bath. 2 μL 0.1 M CaCl2 was added to each tube, and tubes were incubated for 1 h at 0 °C. 100 μL stop buffer (340 mM NaCl, 20 mM EDTA, 4 mM EGTA, 0.05% digitonin, 100 μg/mL RNase A, 50 μg/mL GlycoBlue) was added, and tubes were incubated for 15 min at 37 °C to release DNA fragments. Supernatant was collected, SDS (0.1% final) and proteinase K (250 μg/mL final) were added to each 200 μL sample, and tubes were incubated for 1 h at 50 °C. DNA was isolated by phenol/chloroform extraction, and libraries were constructed using the NEBNext Ultra kit (NEB) as previously described[52]. Libraries were then subjected to TapeStation 2200 High Sensitivity D1000 fragment size analysis (Agilent) and NextSeq 500 High Output paired-end sequencing (2×42 bp, Illumina). 
Primary antibodies used for CUT&RUN were: ERH (Bethyl, A305–402A; 1:50), WBP11 (Bethyl, A304–855A; 1:50), and normal rabbit IgG (EMD Millipore, #12–370; 1:50). Antibodies used for CUT&RUN were validated by immunoprecipitation followed by Western blotting analysis. K562 cells were lysed in RIPA, and 1.5 μL antibody was added to 500 μg protein lysate and incubated overnight at 4 °C. The next day, lysates were incubated with 20 μL Pierce protein A magnetic beads (Thermo) for 2 h at 4 °C, beads were washed in RIPA buffer, and bound protein was boiled in 2x LDS sample buffer for 10 min. Resulting protein lysates were subjected to Western blotting analysis as described above. Primary antibodies used for Western blotting were: ERH (Atlas Antibodies, HPA002567; 1:1,000) and WBP11 (Bethyl, A304–857A; 1:1,000).

CUT&RUN analysis.

Paired-end sequencing reads were trimmed with TrimGalore v0.4.5 to remove adaptor sequence GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT with additional removal of fragments smaller than 25 bp. Reads were aligned to the hg38 reference genome using bowtie2 v2.2.9 with options “--no-unal --no-discordant --no-mixed --dovetail -I 25 -X 700”. Reads mapping to the mitochondrial genome were subsequently removed, and duplicate reads were removed with Picard v2.8.0. Reads smaller than 120 bp were retained for subsequent analysis. Visualization was performed by mapping insertions to a genome-wide sliding 150 bp window with 20 bp offsets with bedops v2.4.30, followed by conversion to bigwig format with wigToBigWig from UCSC tools v363. Genome tracks were visualized with Integrative Genomics Viewer v2.5.0. Open chromatin regions were defined as 1% FDR-thresholded MACS2 peaks obtained from K562 iDAPT-seq relative to genomic DNA input as described above. CUT&RUN signal was determined relative to these peak regions and normalized by the signal intensity between +1950 and +2000 bp distal to the peak summit, representing background enrichment. CUT&RUN peaks were called by MACS2 v2.1.1 using options “callpeak -q 0.01 --keep-dup all”. CUT&RUN and ChIP-seq peak overlap analyses were performed with bedtools v2.26.0 using the intersect function.
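The sliding-window signal used for visualization (150 bp windows at 20 bp offsets) can be illustrated with a small counting routine. This sketch is not the bedops workflow itself: it assumes insertion positions on a single chromosome, half-open windows, and a hypothetical function name.

```python
from bisect import bisect_left


def sliding_window_counts(positions, chrom_length, width=150, step=20):
    """Count insertion positions in half-open windows [start, start + width)."""
    positions = sorted(positions)
    counts = []
    for start in range(0, chrom_length - width + 1, step):
        # Binary search gives the index range of positions inside the window.
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, start + width)
        counts.append((start, start + width, hi - lo))
    return counts


windows = sliding_window_counts([10, 30, 160, 165], chrom_length=200)
for start, end, count in windows:
    print(start, end, count)
```

Overlapping windows smooth the per-base insertion signal, so each insertion contributes to several adjacent windows.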

ATAC-seq/iDAPT-seq transcription factor analysis.

Motif enrichment analysis was performed with ChromVAR as previously described using the human_pwms_v2 set of curated CisBP transcription factor motifs[36,47]. ChromVAR motif deviations from the computeDeviations function were used for principal component analysis, and FDR-adjusted p-values were obtained with the differentialDeviations function with default settings. Bivariate footprinting analysis was performed similarly as previously described with slight modifications[10,87]. CisBP motifs curated from the ChromVAR human_pwms_v2 dataset[36,47] or motifs for ZEB2[88] and EBF3[89] were matched within peaks using matchMotifs from motifmatchr in R. Motif alignments were extended by 250 bp on each side, and adjusted transposon insertions were mapped to the corresponding regions. Motif flank height was determined by the average insertion rate between positions +1 to +50 bp, immediately flanking the motif. Background insertions were determined by the average insertion rate between positions +200 to +250 bp, distal to the positioned motif. Footprint height was determined by the 10% trimmed mean of the insertion rate within the 10–11 bp positioned around the center of the motif. Footprint depth (FPD) was determined as the log2 count ratio of footprint height over flank height; flanking accessibility (FA) was determined as the log2 count ratio of flank height over background. The norm of the orthogonal projection of FA and FPD scores onto the −45° line was used as a raw footprinting score. A linear regression model was implemented (footprinting score ~ transcription factor + transcription factor:treatment), from which the t-statistic of the interaction term per transcription factor motif (transcription factor:treatment) was used as the composite footprinting score, and the corresponding p-value, adjusted to false discovery rate with the Benjamini-Hochberg method, was used to assess significance. 
For analysis of transcription factor activity at steady-state, composite footprinting scores were modeled by a two-state Gaussian mixture model with mixtools in R, and class A footprinted motifs (strong footprinting) were determined to be those with greater than 50% probability of being in the Gaussian distribution further away from the origin. Class C footprinted motifs (no/negative footprinting) were determined as those with weak statistical significance (FDR > 5%) or negative enrichment (composite footprinting score < 0). Positive and significant footprinted motifs not in class A were demarcated as class B footprinted motifs (weak footprinting). Consensus transcription factor classifications were determined by concordance between K562 and NB4 steady-state footprinting analyses, limited to those transcription factors exhibiting positive significant enrichment from both corresponding iDAPT-MS datasets. For classification of transcription factors upon ATRA treatment, FDR < 5% thresholds of iDAPT-MS abundance and iDAPT-seq footprinting profiles were used to discriminate between classes.
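The per-motif footprinting quantities described above can be written out explicitly. The sketch below takes the three average insertion rates as given inputs and makes one labeled assumption: FA is placed on the x-axis and FPD on the y-axis, so that the unit vector along the −45° line is (1, −1)/√2. The source does not state the axis assignment, and the function name is illustrative.

```python
import math


def footprint_scores(footprint_height, flank_height, background):
    """Return (FPD, FA, signed projection onto the -45 degree direction).

    FPD = log2(footprint height / flank height)
    FA  = log2(flank height / background)
    """
    fpd = math.log2(footprint_height / flank_height)
    fa = math.log2(flank_height / background)
    # Assumption: FA on x, FPD on y; projecting (FA, FPD) onto (1, -1)/sqrt(2).
    projection = (fa - fpd) / math.sqrt(2.0)
    return fpd, fa, projection


fpd, fa, projection = footprint_scores(footprint_height=1.0, flank_height=4.0, background=2.0)
print(fpd, fa, round(projection, 4))
```

Under this convention, a motif with accessible flanks (positive FA) and a protected core (negative FPD) receives a large positive projection. The norm referred to in the text corresponds to the absolute value of this signed quantity.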

ChIP-seq analysis.

ENCODE ChIP-seq transcription factor datasets were downloaded from the ENCODE data portal[25] (https://www.encodeproject.org/). ENCODE K562 ChIP-seq datasets are listed in Supplementary Table 3. In brief, ChIP-seq bed files aligned to hg38 and annotated as “optimal IDR peaks” were downloaded, and iDAPT-seq peaks overlapping with ChIP-seq peaks were collated. ChIP-seq enrichment within open chromatin was determined by gene set enrichment analysis using iDAPT-seq differential peaks ranked by log2 fold change using the fgsea package in R. Colocalization of ChIP-seq epitopes on open chromatin was determined using the Jaccard similarity coefficient, with colocalization determined if ChIP-seq peaks from different epitopes overlap a given iDAPT-seq peak.
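The colocalization measure can be illustrated with the standard Jaccard similarity coefficient computed over sets of iDAPT-seq peaks. This is a minimal sketch assuming that, for each ChIP-seq epitope, the set of overlapped iDAPT-seq peak identifiers has already been collated; the peak names are placeholders.

```python
def jaccard(peaks_a, peaks_b):
    """Jaccard similarity between two sets of open chromatin peak identifiers."""
    a, b = set(peaks_a), set(peaks_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Open chromatin peaks overlapped by ChIP-seq peaks for two hypothetical epitopes.
epitope_1_peaks = {"peak1", "peak2", "peak3"}
epitope_2_peaks = {"peak2", "peak3", "peak4"}
similarity = jaccard(epitope_1_peaks, epitope_2_peaks)
print(similarity)  # 0.5
```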

Granulocytic differentiation analysis.

NB4 cells treated either with DMSO or 1 μM ATRA were washed with 2% fetal bovine serum prior to staining. Anti-human CD11b-PE-Cy7 antibody conjugate (Clone: ICRF44, Biolegend Catalog #301321; 1:100) and anti-human CD11c-APC antibody conjugate (Clone: B-ly6, BD Pharmingen #559877; 1:100) were incubated with samples for 20 min and then washed to remove excess antibody. Stained samples were analyzed on a Beckman Coulter CytoFLEX LX flow cytometer with the CytoExpert v2.3.1.22 software. Data were analyzed with FlowJo v10.0.7.

Cell proliferation assay.

NB4 cells were seeded at a density of 5e5 cells/mL and treated with either DMSO or 1 μM ATRA. After 48 h, 50 μL cell suspension was added to 50 μL CellTiter-Glo reagent, incubated for 10 min at room temperature, and assayed for luminescence with a SpectraMax iD3 plate reader.

Genetic dependency analysis.

Genetic dependency map (DepMap) scores generated from CRISPR/Cas9 pooled screening (Avana) were downloaded (19Q3, https://depmap.org/portal/). DepMap scores from hematopoietic cancer cell lines were collated, and the distribution of dependency scores was modeled as a two-state Gaussian mixture model with mixtools in R. Gene dependency was determined as the threshold corresponding to 50% probability of being in either distribution. Essential genes across hematopoietic cell lines were those genes representing dependencies across at least 50% of profiled hematopoietic cell lines.
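The thresholding logic described above can be sketched with a small, self-contained expectation-maximization fit. This stands in for the mixtools fit in R and is not the authors' code. It assumes that more negative scores indicate stronger dependency, uses simulated scores rather than DepMap data, and locates the 50% posterior point by bisection between the two fitted means. All function and gene names are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_first(x, weights, mus, sigmas):
    """Posterior probability that x belongs to component 0."""
    p0 = weights[0] * normal_pdf(x, mus[0], sigmas[0])
    p1 = weights[1] * normal_pdf(x, mus[1], sigmas[1])
    total = p0 + p1
    return p0 / total if total > 0 else 0.5


def fit_two_gaussians(values, n_iter=200):
    """Fit a two-component 1D Gaussian mixture by expectation-maximization.

    Component 0 is initialized at the minimum and component 1 at the maximum.
    """
    mus = [min(values), max(values)]
    spread = (mus[1] - mus[0]) / 4.0 or 1.0
    sigmas = [spread, spread]
    weights = [0.5, 0.5]
    n = len(values)
    for _ in range(n_iter):
        # E-step: responsibilities of component 0 (component 1 is the complement).
        resp0 = [posterior_first(x, weights, mus, sigmas) for x in values]
        # M-step: update mean, standard deviation, and weight of each component.
        for k, resp in enumerate((resp0, [1.0 - r for r in resp0])):
            total = max(sum(resp), 1e-12)
            mus[k] = sum(r * x for r, x in zip(resp, values)) / total
            var = sum(r * (x - mus[k]) ** 2 for r, x in zip(resp, values)) / total
            sigmas[k] = max(math.sqrt(var), 1e-6)
            weights[k] = total / n
    return weights, mus, sigmas


def half_probability_threshold(weights, mus, sigmas, tol=1e-9):
    """Score between the two means at which both components are equally likely."""
    lo, hi = mus[0], mus[1]  # component 0 is the lower (dependent) distribution
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if posterior_first(mid, weights, mus, sigmas) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def essential_genes(scores_by_gene, threshold, min_fraction=0.5):
    """Genes scored below the threshold in at least min_fraction of cell lines."""
    return sorted(
        gene for gene, scores in scores_by_gene.items()
        if sum(s < threshold for s in scores) / len(scores) >= min_fraction
    )


# Simulated scores: a smaller dependent population near -1 and a larger one near 0.
random.seed(0)
scores = [random.gauss(-1.0, 0.15) for _ in range(200)]
scores += [random.gauss(0.0, 0.15) for _ in range(800)]
weights, mus, sigmas = fit_two_gaussians(scores)
threshold = half_probability_threshold(weights, mus, sigmas)
calls = essential_genes(
    {"GENE_A": [-1.1, -0.9, 0.0], "GENE_B": [0.1, -0.9, 0.0]}, threshold
)
print(round(threshold, 2), calls)
```

The bisection assumes a single posterior crossing between the two means, which holds when the components are well separated, as in this simulated example.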

RNA-seq analysis.

Raw sequencing reads (GSM1288651, GSM1288652, GSM1288653, GSM1288654, GSM1288659, GSM1288660, GSM1288661, GSM1288662, GSM2464389, GSM2464392) were aligned to a reference transcriptome generated from the Ensembl v94 database with salmon v0.14.1 using options “--seqBias --useVBOpt --gcBias --posBias --numBootstraps 30 --validateMappings”. Length-scaled transcripts per million were acquired using the tximport function, and log2 fold changes and false discovery rates were determined by DESeq2 in R, with batch as a covariate. Principal component analysis was performed with counts transformed by the varianceStabilizingTransformation function from DESeq2, and shrunken log2 fold changes were determined with DESeq2, which were used to rank genes for gene set enrichment analysis. For comparison of RNA-seq and mass spectrometry datasets, gene symbols and Ensembl gene IDs were matched to UniProt IDs via biomaRt.

Statistical analysis.

No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment. All statistical analyses were performed in R[90]. Two-tailed statistical tests were used unless stated otherwise. Multiple comparison adjustments were performed as noted.

Data availability.

iDAPT-seq/ATAC-seq and CUT&RUN datasets are deposited in GEO (GSE158350). iDAPT-MS proteomics data are deposited in the ProteomeXchange Consortium via the PRIDE partner repository (PXD022252). Raw confocal image files (.czi) are deposited in the Dryad repository at https://doi.org/10.5061/dryad.4xgxd257p. Raw iDAPT-seq/ATAC-seq sequencing data (GSE158350, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE158350) are associated with the following figures: Fig 1b–c and Extended Data Fig 2 (GM12878 ATAC-seq, iDAPT-seq); Fig 2g–h, Fig 3, and Extended Data Figs 5, 7–8 (K562 iDAPT-seq); Fig 4g, Extended Data Figs 7–8, and Supplementary Figs 5–9 (NB4 iDAPT-seq). Raw CUT&RUN sequencing data (GSE158350, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE158350) are associated with the following figures: Fig 2c and Extended Data Fig 5. Raw mass spectrometry data (PXD022252, https://www.ebi.ac.uk/pride/archive/projects/PXD022252) are associated with the following figures: Fig 2, Fig 3, Extended Data Figs 3, 6, 8, and Supplementary Figs 3–4 (K562 iDAPT-MS); Fig 4, Extended Data Figs 4, 6, 8–10, and Supplementary Figs 4, 6–10 (NB4 iDAPT-MS). Preprocessed mass spectrometry data are available as supplementary tables (Supplementary Tables 1–2). Raw confocal microscopy image data (https://doi.org/10.5061/dryad.4xgxd257p) are associated with the following figures: Fig 1d–e, Fig 2d, and Extended Data Fig 6d–e. 
Publicly available sequencing datasets used are as follows: GM12878 ATAC-seq: https://www.ncbi.nlm.nih.gov//geo/query/acc.cgi?acc=GSE47753 (SRR891268, SRR891269, SRR891270, SRR891271), https://www.ncbi.nlm.nih.gov/bioproject/PRJNA482539 (SRR7586167, SRR7586168), https://www.ncbi.nlm.nih.gov/bioproject/PRJNA305986 (SRR2999312, SRR2999313, SRR2999314, SRR2999315), https://www.ncbi.nlm.nih.gov/bioproject/PRJNA380283 (SRR5427884, SRR5427885, SRR5427886, SRR5427887); ENCODE K562 ChIP-seq: https://www.encodeproject.org/, with unique identifiers listed in Supplementary Table 3; ENCODE K562 RNA-seq: https://www.encodeproject.org/files/ENCFF664LYH/@@download/ENCFF664LYH.tsv and https://www.encodeproject.org/files/ENCFF855OAF/@@download/ENCFF855OAF.tsv; NB4 +/− ATRA RNA-seq: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53258 (GSM1288651, GSM1288652, GSM1288653, GSM1288654), https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53259 (GSM1288659, GSM1288660, GSM1288661, GSM1288662), and https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE93877 (GSM2464389, GSM2464392). Publicly available proteome datasets used are as follows: whole cell proteome: https://gygi.med.harvard.edu/sites/gygi.med.harvard.edu/files/documents/protein_quant_current_normalized.csv.gz; nuclear proteome and differential salt fractionation: https://ars.els-cdn.com/content/image/1-s2.0-S2211124720301303-mmc2.xlsx, Alajem et al.: https://www.cell.com/cms/10.1016/j.celrep.2015.02.064/attachment/daebc867–0c82–45ef-837b-b408682c76cf/mmc2.xlsx; Torrente et al.: https://doi.org/10.1371/journal.pone.0024747.s004 and https://doi.org/10.1371/journal.pone.0024747.s006; Kulej et al.: https://www.mcponline.org/highwire/filestream/35613/field_highwire_adjunct_files/5/TABLE_S5_Host_chromatin_bound_proteome.xlsx. 
Additional public reference datasets are as follows: hg38 reference genome: ftp://ftp.ensembl.org/pub/release-94/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz; hg38 blacklist regions: https://www.encodeproject.org/files/ENCFF356LFX/@@download/ENCFF356LFX.bed.gz; CORUM v3.0 complexes: http://mips.helmholtz-muenchen.de/corum/download/allComplexes.txt.zip; Human Protein Atlas v19: https://www.proteinatlas.org/download/subcellular_location.tsv.zip; BioGrid v3.5.178: https://downloads.thebiogrid.org/File/BioGRID/Release-Archive/BIOGRID-3.5.178/BIOGRID-MV-Physical-3.5.178.tab2.zip; Lambert et al. transcription factors: https://www.cell.com/cms/10.1016/j.cell.2018.01.029/attachment/ede37821-fd6f-41b7–9a0e-9d5410855ae6/mmc2.xlsx; HistoneDB 2.0: https://www.ncbi.nlm.nih.gov/research/HistoneDB2.0/HistoneDB/static/browse/dumps/seqs.txt; hRBPome: http://caps.ncbs.res.in/hrbpome/downloads/high_confidence_proteins.fasta; DepMap 19Q3: https://ndownloader.figshare.com/files/16757666. CisBP transcription factors (http://cisbp.ccbr.utoronto.ca/) were obtained via the command data(“human_pwms_v2”) in R package “chromVARmotifs”: https://github.com/GreenleafLab/chromVARmotifs. ReactomeDB v70 pathway annotations (https://reactome.org/) were obtained via the “reactomePathways” command in R package “fgsea”: https://bioconductor.org/packages/release/bioc/html/fgsea.html. Gene Ontology (http://geneontology.org/) was queried from org.Hs.eg.db using the “select” function from AnnotationDbi in R. UniProt IDs (https://www.uniprot.org/) were either downloaded from the UniProt website or collated via biomaRt in R (https://www.bioconductor.org/packages/release/bioc/html/biomaRt.html).

Code availability.

R code used in this manuscript is deposited at https://github.com/jonathandlee12/iDAPT-MS.

Optimization of transposase/peroxidase fusion probes for transposase activity.

Assessment of transposase activity on native chromatin.

(a) Fragment size distributions of GM12878 ATAC-seq/iDAPT-seq libraries. (b) Ratio of transposon insertions at Ensembl v94 transcription start sites (TSS) relative to background from in-house ATAC-seq/iDAPT-seq and published ATAC-seq libraries from refs. [8,18-20] generated from the GM12878 cell line (n = 1). (c) Proportion of non-mitochondrial reads from GM12878 ATAC-seq/iDAPT-seq libraries. (d) Heatmap of pairwise Pearson correlation coefficients of genome-wide transposon insertion frequencies for the indicated GM12878 ATAC-seq/iDAPT-seq libraries.

Assessment of iDAPT protein labeling in the K562 cell line.

Assessment of iDAPT protein labeling in the NB4 cell line.

Analysis of open chromatin protein localization by ChIP-seq and CUT&RUN.

Analysis of subcellular enrichment by iDAPT-MS.

Assessment of TP3 iDAPT-seq from native chromatin versus naked genomic DNA.

Classification of transcription factors by footprinting activity.

Analysis of NB4 iDAPT-MS profiles upon treatment with ATRA.

Integrative analysis of iDAPT-MS and iDAPT-seq transcription factor abundance and activities.

88 in total

1. Streamlined Tandem Mass Tag (SL-TMT) Protocol: An Efficient Strategy for Quantitative (Phospho)proteome Profiling Using Tandem Mass Tag-Synchronous Precursor Selection-MS3.

Authors: José Navarrete-Perea; Qing Yu; Steven P Gygi; Joao A Paulo
Journal: J Proteome Res Date: 2018-05-16 Impact factor: 4.466

2. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry.

Authors: Joshua E Elias; Steven P Gygi
Journal: Nat Methods Date: 2007-03 Impact factor: 28.547

3. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues.

Authors: M Ryan Corces; Alexandro E Trevino; Emily G Hamilton; Peyton G Greenside; Nicholas A Sinnott-Armstrong; Sam Vesuna; Ansuman T Satpathy; Adam J Rubin; Kathleen S Montine; Beijing Wu; Arwa Kathiria; Seung Woo Cho; Maxwell R Mumbach; Ava C Carter; Maya Kasowski; Lisa A Orloff; Viviana I Risca; Anshul Kundaje; Paul A Khavari; Thomas J Montine; William J Greenleaf; Howard Y Chang
Journal: Nat Methods Date: 2017-08-28 Impact factor: 28.547

4. Purification and enrichment of specific chromatin loci (review).

Authors: Mathilde Gauchier; Guido van Mierlo; Michiel Vermeulen; Jérôme Déjardin
Journal: Nat Methods Date: 2020-03-09 Impact factor: 28.547

5. Time-resolved Global and Chromatin Proteomics during Herpes Simplex Virus Type 1 (HSV-1) Infection.

Authors: Katarzyna Kulej; Daphne C Avgousti; Simone Sidoli; Christin Herrmann; Ashley N Della Fera; Eui Tae Kim; Benjamin A Garcia; Matthew D Weitzman
Journal: Mol Cell Proteomics Date: 2017-02-08 Impact factor: 5.911

6. Pervasive Chromatin-RNA Binding Protein Interactions Enable RNA-Based Regulation of Transcription.

Authors: Rui Xiao; Jia-Yu Chen; Zhengyu Liang; Daji Luo; Geng Chen; Zhi John Lu; Yang Chen; Bing Zhou; Hairi Li; Xian Du; Yang Yang; Mingkui San; Xintao Wei; Wen Liu; Eric Lécuyer; Brenton R Graveley; Gene W Yeo; Christopher B Burge; Michael Q Zhang; Yu Zhou; Xiang-Dong Fu
Journal: Cell Date: 2019-06-27 Impact factor: 41.582

7. jun-B inhibits and c-fos stimulates the transforming and trans-activating activities of c-jun.

Authors: J Schütte; J Viallet; M Nau; S Segal; J Fedorko; J Minna
Journal: Cell Date: 1989-12-22 Impact factor: 41.582

8. Jun-B differs in its biological properties from, and is a negative regulator of, c-Jun.

Authors: R Chiu; P Angel; M Karin
Journal: Cell Date: 1989-12-22 Impact factor: 41.582

9. Pol II phosphorylation regulates a switch between transcriptional and splicing condensates.

Authors: Yang Eric Guo; John C Manteiga; Jonathan E Henninger; Benjamin R Sabari; Alessandra Dall'Agnese; Nancy M Hannett; Jan-Hendrik Spille; Lena K Afeyan; Alicia V Zamudio; Krishna Shrinivas; Brian J Abraham; Ann Boija; Tim-Michael Decker; Jenna K Rimel; Charli B Fant; Tong Ihn Lee; Ibrahim I Cisse; Phillip A Sharp; Dylan J Taatjes; Richard A Young
Journal: Nature Date: 2019-08-07 Impact factor: 49.962

10. CORUM: the comprehensive resource of mammalian protein complexes.

Authors: Andreas Ruepp; Barbara Brauner; Irmtraud Dunger-Kaltenbach; Goar Frishman; Corinna Montrone; Michael Stransky; Brigitte Waegele; Thorsten Schmidt; Octave Noubibou Doudieu; Volker Stümpflen; H Werner Mewes
Journal: Nucleic Acids Res Date: 2007-10-26 Impact factor: 16.971