H Courtney Hodges1, Benjamin Z Stanton2,3,4, Katerina Cermakova1,5, Chiung-Ying Chang2, Erik L Miller2, Jacob G Kirkland2, Wai Lim Ku3, Vaclav Veverka5, Keji Zhao6, Gerald R Crabtree7,8. 1. Department of Molecular & Cellular Biology, Center for Precision Environmental Health, and Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA. 2. Departments of Pathology, Genetics, and Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA. 3. Systems Biology Center, Laboratory of Epigenome Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA. 4. Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, USA. 5. Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic. 6. Systems Biology Center, Laboratory of Epigenome Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA. zhaok@nhlbi.nih.gov. 7. Departments of Pathology, Genetics, and Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA. crabtree@stanford.edu. 8. Howard Hughes Medical Institute, Chevy Chase, MD, USA. crabtree@stanford.edu.
Abstract
Mutation of SMARCA4 (BRG1), the ATPase of BAF (mSWI/SNF) and PBAF complexes, contributes to a range of malignancies and neurologic disorders. Unfortunately, the effects of SMARCA4 missense mutations have remained uncertain. Here we show that SMARCA4 cancer missense mutations target conserved ATPase surfaces and disrupt the mechanochemical cycle of remodeling. We find that heterozygous expression of mutants alters the open chromatin landscape at thousands of sites across the genome. Loss of DNA accessibility does not directly overlap with Polycomb accumulation, but is enriched in 'A compartments' at active enhancers, which lose H3K27ac but not H3K4me1. Affected positions include hundreds of sites identified as superenhancers in many tissues. Dominant-negative mutation induces pro-oncogenic expression changes, including increased expression of Myc and its target genes. Together, our data suggest that disruption of enhancer accessibility represents a key source of altered function in disorders with SMARCA4 mutations in a wide variety of tissues.
Epigenetic deregulation is widely regarded as a hallmark of cancer and
certain neurologic disorders. Among the most frequently mutated chromatin regulatory
complexes are polymorphic BAF (mSWI/SNF) and PBAF complexes[1,2]. These ATP-dependent chromatin remodelers are mutated in
~20% of cancers[3,4], with mutations in specific subunits
contributing to specific cancers[1].
For cancers such as malignant rhabdoid tumors[5,6] and synovial
sarcoma[7], tissue-specific
alterations of individual BAF subunits arise in ~100% of cases, and
repair of these defects in cell-culture models leads to loss of both transformation
and proliferation. Moreover, animal models and epigenomic studies of human patients
have provided extensive evidence that these complexes are major tumor
suppressors[8] whose
mutations contribute to transformation in many cell types[9].
In malignant rhabdoid tumors (MRTs) and synovial sarcoma (SS), the ability of
BAF complexes to oppose Polycomb repression proximal to transcription start sites
(TSSs) is central to their role in these malignancies. In MRTs, biallelic loss of
SMARCB1 induces abnormal subunit composition[10,11], and impairs opposition to Polycomb silencing at the
tumor-suppressor gene p16 (INK4A, refs. [12-14]). Silencing of p16 in turn impairs
cell-cycle control, thereby inducing rapid transformation and proliferation without
genomic instability[15,16]. In SS, the protein product of the
SS18-SSX gene fusion[17] incorporates into BAF complexes, leading to reduction of
the Polycomb-associated silencing mark H3K27me3 near the TSSs of oncogenic genes
such as EGR1 (ref. [18]) and SOX2 (ref. [7]). For both MRTs and SS, changes of Polycomb
occupancy at TSSs across the genome coincide with transformation, and in both cases,
these changes are reversible by exogenous expression or repair of the underlying BAF
defect[7,19,20].
Recent studies in BAF-deficient cancer models have revealed alterations at
enhancers distal to TSSs, which contain transcription factor binding sites. In MRT
cells, loss of SMARCB1 induces characteristic changes of the
permissive mark H3K27ac at tissue-restricted enhancers, but largely preserves the
function of superenhancers[11].
Moreover, in animal models of colon cancer, loss of the BAF subunit Arid1a leads to
impaired enhancer function, with deregulation of APC–beta-catenin signaling
pathways[21]. Similar
results have been observed in mouse embryonic fibroblasts[22]. In mouse embryonic stem cells (mESCs),
topoisomerase inhibition leads to impairment of BAF’s ability to generate
accessible sites over the genome, indicating these factors work in tandem to
generate accessibility[23]. Thus,
emerging evidence suggests that altered function at enhancers arises upon loss of
BAF activity, in addition to Polycomb deregulation.
SMARCA4 (BRG1), a central ATPase of BAF complexes, is the
most frequently mutated Snf2-like gene in human cancer[1], and has been identified as a major tumor suppressor
in pan-cancer studies[1,8]. Unlike the tissue-restricted
alterations of SMARCB1 and SS18, loss-of-function
mutations of SMARCA4 are enriched in diverse cancer types,
contributing to a range of cancers including those of the lung[24-26], ovaries[27-29],
skin[4,30], thoracic sarcomas[31], and lymphomas[32,33].
SMARCA4 inactivation is especially common in small cell
carcinoma of the ovary, hypercalcemic type (SCCOHT)[27,28]. Dual
biallelic inactivation of SMARCA4 and the related ATPase
SMARCA2 is highly specific to SCCOHT[29], indicating that complete loss of BAF and PBAF
ATP-dependent chromatin remodeling activity is an essential feature of this
malignancy.
In contrast to the precise, reproducible genetic changes observed in SCCOHT,
the impacts of diverse heterozygous SMARCA4 mutations distributed
across many cancer types have remained uncertain and controversial. Recently, we
showed that deletion and heterozygous cancer mutations of the SMARCA4 ATPase induce
genome-wide accumulation of Polycomb Repressive Complexes 1 and 2 (PRC1 and PRC2) at
bivalent TSSs across the genome[34].
However, it has remained uncertain how these mutations impair SMARCA4’s
fundamental mechanisms to promote malignancy. Here we show that these mutations
induce divergent effects on SMARCA4 dynamics in living cells, but the impairment of
normal function through distinct mechanisms results in convergent changes of the DNA
accessibility landscape at enhancers and superenhancers used in many cell types.
Results
Heterozygous mutation of SMARCA4 is a common feature of many
cancers
To assess the frequency of heterozygous SMARCA4 missense
mutations in cancer, we performed a pan-cancer analysis of annotated human tumor
sequencing studies, including The Cancer Genome Atlas (TCGA) whole-exome
studies[35] and
MSK-IMPACT[36] targeted
sequencing studies where SMARCA4 mutations and deletions were
observed. We eliminated samples with only silent mutations and no other changes,
and furthermore restricted our analysis to samples where copy number analysis
had also been performed, leaving 927 human tumor samples. Analysis of the
overall genetic state of SMARCA4 in these samples (Figure 1a) shows that most samples
(56.5%, 524/927) contain a single allele affected by missense mutations
without other aberrations. In contrast, other major tumor suppressors, such as
ARID1A (20.9%, 370/1773, p<2.2e-16) and
RB1 (14.5%, 155/1066, p<2.2e-16), have
significantly lower fractions of samples with only single missense mutations
(Figure 1b).
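The comparison of proportions above can be reproduced approximately from the counts quoted in the text. The following is a minimal sketch of a two-sided Fisher exact test written with the Python standard library only; the function names are illustrative and this is not necessarily the software used for the published analysis. It assumes each 2x2 table contrasts missense-only samples against all other samples for SMARCA4 versus one other gene.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    k_min = max(0, row1 + col1 - total)
    k_max = min(row1, col1)

    def log_pmf(k):
        return (log_comb(col1, k)
                + log_comb(total - col1, row1 - k)
                - log_comb(total, row1))

    log_observed = log_pmf(a)
    p_value = 0.0
    for k in range(k_min, k_max + 1):
        lp = log_pmf(k)
        if lp <= log_observed + 1e-7:  # small tolerance for rounding error
            p_value += math.exp(lp)
    return min(p_value, 1.0)


# Counts quoted in the text: (missense-only samples, all samples analyzed).
counts = {"SMARCA4": (524, 927), "ARID1A": (370, 1773), "RB1": (155, 1066)}

ref_hits, ref_total = counts["SMARCA4"]
p_values = {}
for gene in ("ARID1A", "RB1"):
    hits, total = counts[gene]
    p_values[gene] = fisher_exact_two_sided(
        ref_hits, ref_total - ref_hits, hits, total - hits
    )
    print(gene, f"{hits / total:.1%}", p_values[gene])
```

Working in log space keeps the very small tail probabilities representable; both comparisons fall far below the 2.2e-16 reporting floor quoted in the text.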
Figure 1
Heterozygous SMARCA4 mutations contribute to many cancers
(a) Heat map illustrating the genomic states of 927 human
tumor samples from different tissues with SMARCA4 mutations and deletions.
Samples without genetic changes to SMARCA4 or with only silent mutations are not
shown. (b)
SMARCA4 has disproportionately more missense mutations than
other tumor suppressors such as ARID1A and RB1. Pie chart showing the proportion of
tumor samples with each pattern of genetic aberration. Fisher exact test
of the proportion of missense mutations relative to SMARCA4,
*** p<2.2e-16. (c) Tumor samples bearing missense mutations
show comparable levels of SMARCA4 expression,
consistent with heterozygous expression.
To assess whether samples with missense mutations had levels of expression
comparable to those of non-mutated samples, we analyzed RNA-seq data from TCGA
studies. We found that samples with single missense mutations had comparable
levels of SMARCA4 expression (Figure 1c), consistent with heterozygous mutation. Our analysis
shows that heterozygous mutation is the most common mode of SMARCA4 disruption
across the many different cancers in which SMARCA4 is mutated.
Cancer mutations of SMARCA4 induce distinct dynamic defects
of remodeling in living cells
Frequent cancer mutations within the SMARCA4 ATPase domain have provided
a rich source of alleles to probe the mechanisms of these proteins. To gain
insight into their mechanisms, we generated a homology model of SMARCA4 bound to
a nucleosome based on the high degree of sequence homology with structures of
related Snf2 homologues from diverse organisms, including Saccharomyces
cerevisiae Snf2 (PDB accession code 5X0X)[37], Thermothelomyces thermophila
Snf2 (PDB 5HZR)[38], and
Saccharomyces cerevisiae Chd1 (PDB 3MWY)[39]. Full details of the
construction of this homology model are provided in the Methods section, and
three-dimensional coordinates of mouse and human SMARCA4 homology models are
provided in PDB format as supplemental data. Our model revealed that frequently mutated
positions of SMARCA4 lie in functionally important surfaces, such as the DNA
groove between the N- and C-terminal ATPase domains, and the ATP cleft (Figure 2a, Figure S1).
Figure 2
SMARCA4 cancer mutations affect functional surfaces and induce distinct
dynamic defects
(a) Homology model of SMARCA4 ATPase domains bound to a
nucleosome, derived from Snf2 and Snf2-like family member structures from other
organisms (details in main text and Methods). Three-dimensional coordinates of
the SMARCA4 homology models are provided in PDB format as supplemental data.
(b) Distribution of missense mutations of
SMARCA4 across all cancers compiled by cBioPortal.
56% of mutations target the N- and C-terminal ATPase domains.
(c) Solvent accessible surface area (SASA) for frequently
mutated residue positions of SMARCA4 in human cancers. Both buried and surface
positions are frequent sites of mutation, including the ATP-binding cleft,
DNA-binding groove, and other sites with unknown function. (d) FRAP
recovery curves for SMARCA4-GFP mutants of the ATP-binding cleft show slow
photobleaching recovery. (e) FRAP recovery curves for SMARCA4-GFP
mutants of the DNA-binding groove show fast photobleaching recovery. Curves for
WT controls (dotted) are the same for all plots in panels (d) and (e).
(f) Positions found within the ATP-binding cleft or DNA-binding
groove show similar FRAP recovery kinetics. Two-sample KS test compared to WT, *
p<0.05; ** p<0.01; *** p<0.001. Plotted values are mean
t1/2 recovery times, error bars are SEM from n=10 cells.
(g) Model of how SMARCA4 surface mutations alter dynamic
engagement of chromatin. DNA-binding groove mutants impair engagement with DNA,
while ATP-binding cleft mutants impair ATP hydrolysis, which may be required for
transition to a mobile state.
We analyzed the positions of all cancer missense mutations on
cBioPortal[40]; we found
that most (56%, N=575) SMARCA4 mutations affected the
bipartite ATPase domain (Figure 2b). By
calculating the solvent-accessible surface area for each position, we found that
frequently mutated positions in cancer are distributed in structurally diverse
locations of the domain. Some mutation hotspots, such as T910, are expected to
affect residues that are buried or have little exposed surface area (Figure 2c). As disruption of folding or
thermodynamic stability would be expected to impair function in a non-specific
manner, we turned our focus to residues that were frequently mutated at
important surfaces, specifically, the ATP cleft and DNA groove (Figure S1).
We hypothesized that mutations of these positions might induce distinct
dynamic defects, since they occur at surfaces with unique molecular functions.
We therefore generated a targeted library of mutations at these sites, including
p.G782S, p.G784E, p.K785R, p.T786I, p.T786N, p.P811R, p.L815P, p.Y860H, p.E861K,
p.D881G, p.E882D, p.E882K, and p.H884N (numbered according to the human protein
product; UniProt, P51532). We expressed these as C-terminal GFP fusions, and
performed a screen of these for dynamic defects using fluorescence recovery
after photobleaching (FRAP) following lentiviral transduction in 293T cells
(Figure 2d–f). Compared to
wild-type SMARCA4-GFP, we found that mutants located at the ATP cleft
consistently resulted in slower FRAP recovery (Figure 2f). In contrast, mutants located at the DNA groove
consistently resulted in faster recovery compared to wild-type (Figure 2f). These data are consistent with a
model whereby DNA-groove mutants disrupt engagement of immobile chromatin, but
ATP-cleft mutants disrupt ATP-dependent release from immobile chromatin (Figure 2g). This model is supported by
previous observations that the ATPase activity of SMARCA4 and orthologs is
DNA-stimulated[41,42], as ATP hydrolysis requires
prior interaction with nucleic acid substrate.
To validate the observed dynamics from this screen, we generated a
transgenic mouse expressing SMARCA4 fused to the fluorescent protein Dendra2
(ref. [43]), comparable to GFP
(Figure S2). This
fusion did not affect protein function, and homozygous SMARCA4-Dendra2 mice have
been bred and deposited at Jackson Laboratories for cryopreservation (stock
#27901). In mESCs, SMARCA4-Dendra2 displays a high degree of stability
(with stability half-life comparable to the cell doubling time), and shows FRAP
dynamic properties identical to SMARCA4-GFP expressed via lentiviral
transduction (Figure
S3), confirming the validity of our approach. Interestingly, we find that
the delayed FRAP recovery of ATPase mutants arises during interphase but not
mitosis, where BAF is largely excluded from condensed mitotic chromosomes (Figure S3–S4), consistent with
previous observations that SMARCA4 is phosphorylated during mitosis[44]. Our results reveal a
convergence between tumor sequencing studies, molecular structure, and live-cell
dynamics.
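The FRAP comparisons in this section rest on recovery half-times (t1/2) summarized as a mean and SEM across cells. As a rough illustration of how such a value can be read from a recovery curve, the sketch below finds the first time at which the signal reaches half of its final plateau. The plateau definition, the synthetic single-exponential curves, and all function names are assumptions made for this example; the fitting procedure actually used is described in the Methods and may differ.

```python
import math
import statistics


def half_recovery_time(times, intensities, plateau_points=3):
    """Time at which the signal first reaches half of its recovery.

    intensities[0] is taken as the post-bleach minimum, and the plateau is
    the mean of the last `plateau_points` values (both are assumptions).
    """
    start = intensities[0]
    plateau = statistics.mean(intensities[-plateau_points:])
    half = start + 0.5 * (plateau - start)
    for i in range(1, len(times)):
        lo, hi = intensities[i - 1], intensities[i]
        if lo < half <= hi:
            # Linear interpolation between the two bracketing samples.
            frac = (half - lo) / (hi - lo)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("curve never reaches half recovery")


def mean_and_sem(values):
    """Mean and standard error of the mean across cells."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


def synthetic_curve(tau, t_max=60.0, step=0.5):
    """Noise-free single-exponential recovery, used only as toy input."""
    times = [i * step for i in range(int(t_max / step) + 1)]
    return times, [1.0 - math.exp(-t / tau) for t in times]


# Toy comparison of a fast and a slow recovery (time constants are made up).
t_fast = half_recovery_time(*synthetic_curve(tau=2.0))
t_slow = half_recovery_time(*synthetic_curve(tau=10.0))
print(f"fast t1/2 = {t_fast:.2f}, slow t1/2 = {t_slow:.2f}")
```

For a single-exponential recovery the half-time equals tau times ln 2, so the interpolated values can be checked against that closed form; real curves are noisy and are usually fitted rather than interpolated.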
Cancer mutants induce convergent effects on the chromatin landscape
Because ATP-cleft and DNA-groove mutants induced opposing dynamic
effects, we anticipated that they might give rise to distinct effects on the
chromatin landscape, even in a heterozygous setting frequently observed in
cancer. To probe the functional impacts of commonly mutated positions, we used
mESCs as a convenient model to investigate the conserved effects following
SMARCA4 disruption. These cells are advantageous for
fundamental studies due to the clonal nature of their growth, lack of genetic
heterogeneity, negligible mutation load, and the availability of genetic tools
such as conditional knockouts. ES cells also do not express the alternate BAF
ATPase SMARCA2 (BRM), permitting us to exclude confounding effects from this
homologous gene. Using a conditional SMARCA4 knockout mESC
line[45], we rescued
SMARCA4 deletion with lentiviral expression of wild-type
SMARCA4-GFP fusion and either wild-type or mutant SMARCA4 tagged with V5
(SMARCA4-V5). Resulting cells had expression levels of SMARCA4
equivalent to those of endogenous protein, as previously described (Figure S4 in
ref. [34]). As a control, we
compared results from these cells to those from similarly prepared cells
expressing wild-type SMARCA4-V5 instead of a mutant. This strategy allowed us to
observe the effects of mutation in a setting that mimics heterozygous mutation
present in human cancers.
We performed ATAC-seq[46] to identify locations where SMARCA4
mutants altered genomic DNA accessibility. To our surprise, we discovered that
heterozygous mutants gave rise to highly correlated changes across the genome
compared to wild-type cells (Figure
3a–c). Accessibility losses associated with mutants were
generally stronger and more frequent than sites that gained accessibility. We
assigned sites into one of three categories: decreased, unchanged, and
increased, using the definitions found in the Methods section. We analyzed the
genomic annotations of sites that decreased accessibility and discovered that
relative to unchanged sites, these were highly enriched at non-coding regions,
particularly introns and intergenic regions (Figure 3d).
Figure 3
Convergent effects of SMARCA4 cancer mutants on the DNA accessibility
landscape
(a) Genome tracks showing loss of accessibility at an
annotated enhancer downstream of Bcl11b. Heterozygous mutant
SMARCA4 cells but not wild-type cells show similar loss of enhancer
accessibility. (b) Heat map of Pearson correlation coefficients
(PCC) of read densities in peaks shows correlated genome-wide changes induced
by heterozygous SMARCA4 mutants. Changes are uncorrelated with changes between
wild-type cell-culture replicates. (c) Heat map showing the 2,000
genome-wide sites with the highest variance across datasets. All mutants show a
similar trend of altered accessibility across these sites. (d)
Genomic annotation enrichment for decreased sites across mutant
ATAC-seq datasets. Decreased sites are enriched at non-coding regions such as introns and
intergenic regions. (e) Lasso multivariate regression identifies a
consistent set of features associated with positive or negative changes of
accessibility in mutants compared to wild-type. Features associated with altered
accessibility include several factors and marks associated with enhancers.
(f) Characterization of ATAC-seq in sites with decreased
accessibility in mutant cells, along with ChIP-seq of Smarcc1, Lsd1, and H3K4me1
in wild-type cells. (g) Characterization of ATAC-seq in sites with
unchanged accessibility in mutant cells, along with ChIP-seq of Smarcc1, Lsd1,
and H3K4me1 in wild-type cells. (h) Characterization of ATAC-seq in
sites with increased accessibility in mutant cells, along with ChIP-seq of
Smarcc1, Lsd1, and H3K4me1 in wild-type cells.
We furthermore analyzed a dataset of 111 public ChIP-seq studies in
wild-type mESCs, which permitted us to investigate the association of factors
with gains or losses of accessibility in our current study. Lasso multivariate
regression for all our mutant ATAC-seq datasets revealed a robust and consistent
association with several factors related to enhancers in mESCs, including Lsd1,
Smc1, H3K27ac, Oct4, and Brd4 (Figure
3e).
Because these changes were highly correlated across all mutants, we
conclude that these mutants surprisingly have convergent effects on the DNA
accessibility landscape. We therefore focused the remainder of our analysis on
G784E, which is located at the ATP-binding cleft in the Walker A motif and makes
direct contact with ATP (Figure 2a). We
find that ATAC-seq accessibility sites were enriched for Smarcc1, a core subunit of
BAF complexes, as measured by ChIP density in wild-type cells (Figure 3f–h). Furthermore, sites with decreased
accessibility following expression of G784E SMARCA4 (n=8,805) were highly
enriched with H3K4me1 in wild-type cells (Figure
3f), a mark frequently found at enhancers. Sites that increased
accessibility (n=2,887) typically had low levels of Lsd1 (Figure 3h), an H3K4 demethylase involved in enhancer
decommissioning[47].
These results suggest that the gains of accessibility following expression of
mutant SMARCA4 depend on the absence of Lsd1. Unchanged sites (n=24,656)
generally contained mixed amounts of these factors, and include many regulatory
sites such as housekeeping TSSs.
Polycomb accumulation does not reduce local accessibility
Polycomb repression is thought to be mediated by altering DNA
accessibility[48]. We
investigated the relationship between changes of accessibility and Polycomb
accumulation that occurs over the genome. We examined direct overlap between
changes of Polycomb occupancy (reflected by ChIP-seq of Ring1b, a core catalytic
subunit of PRC1), and DNA accessibility (reflected by ATAC-seq), in two
settings: SMARCA4 conditional biallelic deletion[23,34], and heterozygous ATPase mutation (G784E). Although
both conditional knockout and heterozygous mutation result in strong and
characteristic accumulation of PRC1 and PRC2 at bivalent TSSs over the
genome[34], we observed
no direct relationship between increases of Polycomb and altered accessibility
(Figure 4a–d). This observation
did not arise due to non-overlap of peaks, as ATAC-seq and Ring1b ChIP peaks had
high enrichment at both TSSs and enhancers (Figure S5), and changes
at individual sites were highly reproducible (Figure S6).
Figure 4
Accessibility losses and PRC1 changes do not directly overlap
(a) ATAC accessibility sites ranked by altered ATAC-seq
read density in SMARCA4 knockout (left) and in heterozygous
G784E SMARCA4 (right). Genomic sites lacking both ATAC and Ring1b changes are
not shown. Corresponding changes of Ring1b ChIP-seq read density at these same
sites show that Ring1b is broadly unchanged at these sites, and shows no clear
relationship to accessibility changes measured by ATAC. (b) Example
genome track showing representative accessibility but not Ring1b changes.
(c) Ring1b ChIP sites ranked by altered ATAC-seq read density
in SMARCA4 knockout (left) and in heterozygous G784E SMARCA4
(right). Corresponding changes of Ring1b ChIP-seq read density at these same
sites show that Ring1b is broadly unchanged at these sites, and shows no clear
relationship to accessibility changes measured by ATAC. Genomic sites lacking
both ATAC and Ring1b changes are not shown. (d) Example genome
track showing representative Ring1b but not accessibility changes.
(e) Genomic fold changes of enhancers (Enh) and transcription
start sites (TSS) using ATAC-seq upon conditional Ring1b knockout. TSSs show few
changes while enhancers show significantly greater variability (KS test
p<2.2e-16). Center line of box plot is mean, box limits are 1st and 3rd
quartiles, whiskers are limits + 1.5*IQR (inter-quartile range), points are
actual values of outliers.
We confirmed this result by performing ATAC-seq following dual
conditional knockout of Ring1a and Ring1b
(ref. [49]), both catalytic
subunits of PRC1, and found that global accessibility at TSSs showed few
changes. Instead, altered accessibility at enhancers was far more apparent
(Figure 4e, p<2.2e-16).
Therefore, the increased occupancy of PRC1 we previously described at TSSs
following SMARCA4 inactivation[34] does not directly coincide with altered
accessibility. We conclude that DNA accessibility changes are distinct from
accumulation of Polycomb complexes at bivalent TSSs. Our results challenge
longstanding in vitro models of the repressive mechanisms associated with
Polycomb factors.
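The enhancer versus TSS comparison in Figure 4e relies on a two-sample Kolmogorov-Smirnov (KS) test. The sketch below computes the KS statistic and a large-sample approximation of its p-value using only the Python standard library. The fold-change values are invented for illustration, the function names are hypothetical, and the asymptotic formula is only rough for samples this small; it is not the exact routine used for the published p-values.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_asymptotic_p(d, n_a, n_b, terms=100):
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    lam = math.sqrt(n_a * n_b / (n_a + n_b)) * d
    if lam < 0.05:
        return 1.0  # the series converges poorly here; p is essentially 1
    series = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, terms + 1)
    )
    return min(max(2.0 * series, 0.0), 1.0)


# Made-up log2 fold changes: TSSs barely move, enhancers vary widely.
tss_changes = [-0.10, -0.05, 0.00, 0.02, 0.05, 0.08, -0.02, 0.01]
enhancer_changes = [-1.20, -0.90, -0.60, -0.40, 0.30, 0.70, 1.10, -0.80]

d_stat = ks_statistic(tss_changes, enhancer_changes)
p_approx = ks_asymptotic_p(d_stat, len(tss_changes), len(enhancer_changes))
print(f"D = {d_stat:.3f}, approximate p = {p_approx:.3g}")
```

Because the KS test compares whole distributions, it is sensitive to a difference in spread even when the two groups have similar central values, which matches the kind of difference described for enhancers and TSSs.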
Epigenomic alterations to enhancer identity
We next sought to compare the effects of heterozygous SMARCA4 mutation
on accessibility at enhancers and TSSs. Accessibility was broadly unaffected at
TSSs, but was reduced at annotated mESC enhancers[50] (Figure
5a–b, additional examples in Figure S7).
Accessibility at TSSs and enhancers was also broadly mirrored by altered ChIP
read density of RNAP2 (Figure
S8), suggesting that altered accessibility has functional importance
for regulation of transcription.
Figure 5
Heterozygous SMARCA4 mutation induces loss of accessibility and H3K27ac at
active enhancers and superenhancers from many tissues
(a) Mean ATAC-seq read density at TSSs and enhancers in
wild-type and heterozygous mutant cells. (b) Example genome track
showing representative loss of accessibility and H3K27ac but preservation of
H3K4me1 at an enhancer. (c) Relationship between H3K4me1 ChIP and
ATAC fold changes from wild-type and mutant cells shows significant correlation
with a small effect size (m). (d) Relationship between RNAP2 ChIP
and ATAC fold changes between wild-type and mutant cells shows significant
correlation with a small effect size. (e) Relationship between
H3K27ac ChIP and ATAC fold changes between wild-type and mutant cells shows
significant correlation with a large effect size. (f) Abundance of
H3K4me1 and H3K27ac in wild-type cells, classified by whether the site increased
in accessibility, decreased in accessibility, or was unchanged. Sites with decreased
accessibility have elevated levels of H3K4me1 and H3K27ac, consistent with active
enhancers. (g) Mean accessibility changes at the
Oct4-Sox2-Tcf-Nanog combination transcription factor motif. (h)
Heat map of 18 transcription factor motifs associated with reduced accessibility
across cell-culture replicates in heterozygous G784E SMARCA4 cells.
(i) Cumulative distribution of ATAC-seq fold change, classified
by genomic annotation. TSSs show few changes, while enhancers and superenhancers
show biased losses. Superenhancers show distributions of fold changes comparable
to enhancers. (j) Analysis of sites with decreased accessibility
that overlap with superenhancers from multiple tissue types. Color in heat map
shows fractional overlap with known annotated superenhancers from tissues
obtained from dbSUPER. (k) Distribution of the number of tissue
types associated with each superenhancer site analyzed in (j). (l)
Pie chart of the proportion of tissue-restricted superenhancers compared with
superenhancers identified in >1 tissue type. RPM, reads per
million; Enh, enhancer; SE, superenhancer.
To identify other chromatin features accompanying accessibility changes,
we focused on three marks and factors: H3K4me1, RNAP2, and H3K27ac. We performed
ChIP-seq for these factors in both WT/WT and WT/G784E mESCs to measure their
changes at sites with altered accessibility. Despite significant correlation
(R=0.334, p<2.22e-16), the effect size of altered H3K4me1 was low
(m=0.104), showing that sites with decreased accessibility generally maintained
high levels of H3K4me1 (Figure 5c). RNAP2
also showed significant correlation (R=0.230, p<2.2e-16), with a modest
effect size (m=0.114). However, RNAP2 ChIP revealed that a small number of
decreased sites also showed losses of RNAP2, suggesting that RNAP2 loading may
only be affected when certain combinations of other changes occur (Figure 5d). On the other hand, H3K27ac showed
robust effects (R=0.358, p<2.2e-16) with a prominent effect size
(m=0.370), indicating a stronger relationship between H3K27ac and DNA
accessibility (Figure 5e). Hence, these
sites transition from being “active” enhancers (with high
H3K27ac and high accessibility) to “poised” enhancers (with
maintained H3K4me1, but reduced H3K27ac and reduced accessibility).
Consistently, sites that lost accessibility (N=2,692) had the highest
levels of H3K27ac in wild-type cells (Figure
5f). In contrast, the sites that showed increased accessibility
(N=1,040) generally had lower levels of both H3K4me1 and H3K27ac in wild-type
cells. Since most of these did not take on new elevated levels of H3K4me1 in
mutant cells (Figure 5c), we speculate that
these sites may have taken on other unknown regulatory roles in mutant
cells.
By exploring average accessibility changes over transcription factor
sequence motifs, ATAC-seq permitted us to identify a set of transcription
factors whose cognate sequences show reduced accessibility. One of these was the
combination motif for Oct4-Sox2-Tcf-Nanog, which undergoes a ~2-fold
reduction of nucleosome-free ATAC-seq reads over the motif (Figure 5g). By scanning for reproducible changes, we
obtained a list of 18 transcription factor motifs, with recurrent losses of
accessibility following expression of G784E SMARCA4 (Figure 5h). Our results show that mutant SMARCA4 induces
alteration of the enhancer landscape affecting a diverse set of factors, rather
than targeting a small subset of transcription factors. These changes in the
enhancer network arise without expression of SMARCA2 (BRM, see Figure S9), allowing us
to attribute our observations to SMARCA4 mutants.
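The distinction drawn earlier in this section between a significant correlation (R) and the size of the effect (m) can be made concrete with a small calculation. The sketch below assumes that m denotes the slope of a least-squares line relating the change in a ChIP signal to the change in ATAC-seq signal at each site; that reading, the choice of axes, the toy numbers, and the function names are assumptions for illustration rather than details confirmed by the text.

```python
import math


def _centered_sums(xs, ys):
    """Return (Sxy, Sxx, Syy) of the data about their means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient: how tightly the points follow a line."""
    sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(xs, ys):
    """Least-squares slope: how much y changes per unit change in x."""
    sxy, sxx, _ = _centered_sums(xs, ys)
    return sxy / sxx


# Toy log2 fold changes (invented): the same ATAC changes paired with a
# weakly responding mark and a strongly responding mark.
atac_change = [-2.0, -1.0, 0.0, 1.0, 2.0]
weak_mark_change = [-0.1, -0.3, 0.2, 0.0, 0.3]
strong_mark_change = [-0.9, -0.3, 0.1, 0.5, 0.7]

for name, ys in (("weak", weak_mark_change), ("strong", strong_mark_change)):
    print(name, round(pearson_r(atac_change, ys), 3),
          round(ols_slope(atac_change, ys), 3))
```

A mark can correlate with accessibility changes while shifting very little per unit of accessibility lost, which is the pattern described for H3K4me1 in contrast to H3K27ac.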
Sites with lost accessibility in ES cells include hundreds of superenhancers
from many tissue types
In malignant rhabdoid tumors, loss of SMARCB1 affects
active enhancers, but leaves superenhancers relatively protected[11]. We therefore sought to assess
whether heterozygous SMARCA4 mutations would have similar
effects. By classifying sites into genomic annotations for mESC transcription
start sites (TSS), overall enhancers (Enh), or superenhancers (SE), we find that
superenhancers show distributions of accessibility changes comparable with
overall enhancers (Figure 5i), both of
which are significantly tilted towards losses compared to TSSs
(p<2.2e-16). Therefore, superenhancers are significantly impacted by
heterozygous SMARCA4 mutation, unlike biallelic loss of
SMARCB1.
Chromatin regulatory interactions in embryonic stem cells anticipate
cell-type-specific architecture that arises upon lineage commitment[51]. We therefore hypothesized
that many sites we observed with reduced accessibility may correspond to sites
with regulatory roles in differentiated cells. To assess whether affected sites
correspond to important regulatory sites in multiple cell types, we examined
overlap with an ensemble of annotated superenhancers across a variety of
differentiated tissues, including those from lung, intestines, hematological,
and neuronal lineages. By comparing to sites in the dbSUPER database of
superenhancers[52], we
found that 940 sites (34.9% of decreased sites overall) directly overlap
previously annotated superenhancer sites from
at least one tissue, a significant enrichment (Figure 5j–k,
p<2.2e-16). Plotting the distribution of associated tissue types for
each superenhancer (Figure 5l) reveals
that most of these sites (51.5%, 484/940) are superenhancers in more
than a single tissue type. Therefore, heterozygous mutations of
SMARCA4 induce accessibility losses at sites identified to
have essential roles in maintaining cell identity across multiple tissue types.
Our findings furthermore show that BAF-dependent accessibility anticipates
lineage specification in several lineages.
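The overlap analysis described in this section amounts to asking, for each site with decreased accessibility, which tissues have an annotated superenhancer covering it. The sketch below shows that bookkeeping on invented coordinates. The interval convention (half-open, same genome assembly for both sets), the tissue names, and the function names are assumptions for illustration; it does not query dbSUPER and does not reproduce the enrichment statistics.

```python
def overlaps(site, region):
    """True if two half-open intervals (chrom, start, end) share a base."""
    return (site[0] == region[0]
            and site[1] < region[2]
            and region[1] < site[2])


def tissues_per_site(sites, superenhancers_by_tissue):
    """For each site, the set of tissues with an overlapping superenhancer."""
    result = []
    for site in sites:
        hit_tissues = {
            tissue
            for tissue, regions in superenhancers_by_tissue.items()
            if any(overlaps(site, region) for region in regions)
        }
        result.append(hit_tissues)
    return result


# Toy coordinates, invented for illustration only.
decreased_sites = [
    ("chr1", 100, 200),
    ("chr1", 500, 600),
    ("chr2", 50, 150),
    ("chr3", 10, 20),
]
superenhancers = {
    "lung": [("chr1", 150, 400), ("chr2", 0, 100)],
    "brain": [("chr1", 180, 300)],
    "blood": [("chr1", 590, 900)],
}

hits = tissues_per_site(decreased_sites, superenhancers)
n_overlapping = sum(1 for tissues in hits if tissues)
n_multi_tissue = sum(1 for tissues in hits if len(tissues) > 1)
print(f"{n_overlapping}/{len(decreased_sites)} sites overlap a superenhancer; "
      f"{n_multi_tissue} of those in more than one tissue")
```

The fractions quoted in the text follow the same arithmetic: 940 overlapping sites out of the 2,692 decreased sites is 34.9%, and 484 of those 940 is 51.5%.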
Effects on genomic accessibility are dominant negative
We next sought to investigate whether the observed changes reflected
SMARCA4 haploinsufficiency, or were instead a
dominant-negative effect[53]. To
measure the impact of loss from a single allele, we derived a line of mESCs for
conditional deletion of only one allele of SMARCA4. This
actin::CreER SMARCA4(+/fl) line expresses Cre recombinase fused
to the estrogen receptor (CreER), and encodes a single floxed allele of
SMARCA4, with the remaining allele wild-type. This
permitted us to conditionally delete a single allele in the presence of
4-hydroxytamoxifen (Tam), and compare against mock treatment with ethanol
(EtOH). Western blotting showed that SMARCA4 protein levels were reduced by about half
(Figure 6a–b) after
tamoxifen-induced deletion. However, we did not observe reproducible changes in
chromatin accessibility using ATAC-seq following loss of the single allele, and
only a single site met the criteria for a decreased site out of 41,267 ATAC-seq
sites across the genome (Figure 6c).
Notably, we did not detect changes at enhancers or TSSs as we observed for the
ATPase mutants (Figure 6d–e).
Importantly, our results leave open the possibility of changes at sites of lower
accessibility or in other cell types, or of other haploinsufficient roles (e.g.,
in DNA repair).
Figure 6
Chromatin organization influences dominant-negative effects of SMARCA4
mutants on the open chromatin landscape
(a) Demonstration of a line of heterozygous conditional
deletion SMARCA4 cells. A single allele of
SMARCA4 is converted to a null allele in the presence of
4-hydroxytamoxifen (Tam) but not ethanol control (EtOH). (b)
Quantification of bands in (a), showing loss of half the SMARCA4 present caused
by incubation with Tam. (c) Plot of genome-wide fold changes for
all ATAC-seq sites following acute deletion of one allele of
SMARCA4. (d) Mean profiles of ATAC-seq read
density at TSSs and (e) enhancers following acute deletion of one
allele of SMARCA4. The absence of changes demonstrates that the
changes we observe in Figure 4 are not due
to haploinsufficiency, but are a dominant-negative effect. (f)
Heterozygous G784E SMARCA4 but not heterozygous null cells show clusters of
reduced accessibility in chromosomal A compartments (described in more detail in
the main text and Methods). (g) Quantification of genome-wide
ATAC-seq site changes based on presence in A or B compartments in heterozygous
null cells. (h) In heterozygous G784E SMARCA4 cells, ATAC-seq sites
in A compartments have reduced accessibility compared to sites in B
compartments. CDF, cumulative distribution function.
In contrast to loss of a single allele, G784E SMARCA4 resulted in
accessibility losses that were correlated in position over the body of the
chromosome (Figure 6f). Analysis of Hi-C
data (see Methods section) permitted us to partition each chromosome into large
multi-megabase-scale compartments termed “A” and
“B” compartments[54,55]. We found
that the more transcriptionally active A compartments tended to have more acute
accessibility losses than the more repressed B compartments (Figure 6f). Focal accessibility changes
within A compartments were significantly tilted towards losses compared to B
compartments (Figure 6g–h,
p<2.2e-16). Thus, enhancers in A compartments are differentially
sensitive to the expression of ATPase mutant SMARCA4. However, these same sites
are unaffected by loss of expression from a single allele of
SMARCA4. Therefore, the strong effects following
heterozygous SMARCA4 ATPase mutation reflect genetic dominance rather than
haploinsufficiency.
Pro-oncogenic changes of gene expression include Polycomb and non-Polycomb
targets
To discover the effects of the altered enhancer landscape on
transcription, we compared wild-type to heterozygous G784E SMARCA4 cells using
RNA-seq. Despite the thousands of sites of altered accessibility described
above, we found a modest set of changes to the transcriptional profiles of
cells, with a small number of genes showing increased (n=49) or decreased
(n=191) expression, using the criteria in the Methods section (Figure 7a). Genes with changes included
previously annotated Polycomb targets[56] (n=86), which were significantly enriched among all
altered genes (odds ratio=1.74, p=7.7e-5). We also observed altered expression
of genes that are not Polycomb targets (n=154; Figure 7b).
Figure 7
SMARCA4 ATPase mutants can induce pro-oncogenic gene expression
changes
(a) Summary of genome-wide RNA expression changes by
RNA-seq. Each point is a gene, colored by whether it is increased, decreased, or
unchanged, using the criteria described in the Methods section. (b)
Heat map of genes with altered expression based on whether they are annotated to
be promoters targeted by Polycomb Repressive Complexes (PRCs) in mESCs[56], or are not annotated to have
PRC-target promoters. (c) Expression of pluripotency factors
remains elevated in heterozygous G784E SMARCA4 cells. Myc is
expressed ~2-fold higher in mutant cells compared to wild-type cells.
Plotted values are mean normalized expression values (a.u.), error bars are
95% confidence intervals from independent cell-culture replicates (n=2).
(d) Gene set enrichment analysis shows significant enrichment
of downstream targets of Myc (p<2.2e-16). (e) Model of
dominant-negative effects following inactivation of SMARCA4 ATP hydrolysis or
DNA/nucleosome binding. (f) Effect of SMARCA4 heterozygous mutation
on the epigenetic landscape. Mutation and the ensuing compensatory changes
shifts the epigenetic-phenotypic landscape, resulting in stabilization of a new
phenotypic state that may adopt pathogenic patterns of gene expression.
Interestingly, despite reduced accessibility of their target sites, we
do not observe loss of expression of the core pluripotency network, including
Oct4 (Pou5f1), Sox2, Nanog, or Klf4.
Expression of these factors is maintained or even enhanced in heterozygous
mutant cells, as previously reported for complete knockout[45]. In contrast, we observe a
significant ~2-fold increase in the expression of the proto-oncogene
Myc (Figure 7a,c; log2
fold change=0.98, FDR-adjusted p=0.01). Gene set enrichment analysis also
revealed a significant enrichment of Myc target genes (FDR-adjusted
p<2.2e-16), one of the hallmarks of many malignancies (Figure 7d). Altogether, our results reveal
that cell identity may be maintained through compensating mechanisms, despite a
dramatic reshuffling of the enhancer and superenhancer landscape. In ES cells,
these compensating mechanisms include upregulation of Myc, a
hallmark of cancer that may be considered a pro-oncogenic response.
Interestingly, biallelic deletion of SMARCA4 in the same cells
leads to cell death[45], thus
dominant-negative mutations in cancer settings may similarly confer changes
distinct from biallelic loss.
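As a worked conversion, the reported log2 fold change for Myc can be translated back to a linear fold change, and the 1.5-fold cutoff from the Methods can be expressed on the log2 scale. The numbers are those stated in the text; the rounding is ours.

```latex
\text{fold change} = 2^{\log_2 \mathrm{FC}} = 2^{0.98} \approx 1.97,
\qquad
\log_2(1.5) \approx 0.585 < 0.98
```

The Myc change is therefore close to the ~2-fold increase quoted above and lies beyond the 1.5-fold threshold used for differential calls.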
Discussion
Here we analyzed human tumor sequencing data to identify the major role
played by heterozygous mutation of SMARCA4 (Figure 1). We have analyzed the available mutation frequencies
obtained through tumor sequencing studies in the context of structural models of
SMARCA4 we derived from recent discoveries in structural biology[37-39]. By doing so, we discovered that frequently mutated
positions of SMARCA4 lie on biologically significant surfaces (Figure 2a). This clustering is consistent with earlier
observations that mutations of key bioactive surfaces are significantly enriched
across several cancer types[57].
These mutations induce characteristic dynamic defects, consistent with a model where
binding of SMARCA4 to DNA/nucleosomes promotes immobilization, and release from this
immobile state is catalyzed by ATP hydrolysis (Figure
2g).
Despite divergent dynamic effects, mutations in either the ATP cleft or DNA
groove give rise to similar changes across the chromatin landscape (Figure 3). The convergence of these defects
indicates that cancer positively selects for functional inactivation rather than
disruption of individual surfaces per se. Our results show that
frequently occurring disease mutations inactivate SMARCA4 by diverse modes, for
example, by impairing ATP hydrolysis, by disrupting DNA binding, and likely through
other mechanisms. Our ATAC-seq results confirm that the effects of these mutations
converge on the open chromatin landscape; we find consistent changes to DNA
accessibility at thousands of sites over the genome, including at a large number of
superenhancers from diverse cell types, regardless of how SMARCA4 is disrupted. In
contrast to biallelic loss of other BAF subunits, which largely arise in
lineage-restricted settings, heterozygous mutations of SMARCA4
contribute to malignancies from diverse tissue types. We propose that this broad
pattern largely reflects disruption of the enhancer accessibility landscape,
including superenhancer sites used by many tissue types.
We find that a set of active enhancers in wild-type mESCs is particularly
predisposed to lose accessibility (Figure 5).
These changes lead to losses of accessibility at an array of transcription factor
motifs, and concomitant losses of H3K27ac modification, while in most cases
preserving H3K4me1 levels. Hence, enhancer identity (as defined by H3K4me1 marking)
is largely preserved, but active enhancers with high H3K27ac are particularly
sensitive to SMARCA4 ATPase mutation. These effects are
particularly pronounced in active A compartments, but do not arise upon loss of a
single allele of SMARCA4 (Figure
5), indicating dominant-negative effects for the missense mutants. Hence,
we find that chromosomal organization heavily influences the dominant-negative
effects of SMARCA4 disease mutants. Our data strongly argue that in
addition to its direct effects on PRC1 at bivalent genes (ref. [34]), SMARCA4
dynamically acts to generate accessibility in transcriptionally active compartments
of the genome, notably at existing superenhancers and latent/primordial
superenhancers.
Despite the thousands of changes in the open chromatin landscape, these
alterations induce a relatively small number of significant transcriptional changes
(Figure 7a). Altered genes include both
Polycomb-target genes and non-Polycomb-targets, indicating that regulation of the
open chromatin landscape at enhancers constitutes a separate layer of regulation in
addition to the mechanisms previously described for PRC1 (ref. [34]). Of particular interest is the
up-regulation of the proto-oncogene Myc as well as its downstream
targets (Figure 7). Myc enhances the stability
of the pluripotency factors at their cognate sites on chromatin[58], and relieves transcriptional
pausing[59,60]. Therefore, it is appealing to speculate that
heterozygous expression of mutant SMARCA4 in mESCs leads to loss of
transcription factor accessibility or paused polymerases, and increased expression
of Myc may compensate by stabilizing the interaction of
transcription factors at their target sites or by relieving polymerase pausing
following SMARCA4 mutation.
Altogether, our findings show that heterozygous SMARCA4
mutations induce loss of H3K27ac and accessibility at enhancers and superenhancers.
Because ATP-cleft and DNA-groove mutants both induce these effects, we propose that
SMARCA4’s dominant-negative role at enhancers arises via sequestration of
binding partners at unproductive transient binding sites, or in the nucleoplasm
(Figure 7e). Our results reveal that unlike
biallelic loss of SMARCB1 or SMARCA4, heterozygous
SMARCA4 missense mutations bias the epigenetic landscape by
altering the accessibility of superenhancers used in a variety of different tissues.
These changes permit exploration of other sites in phenotypic space, which may
promote pro-oncogenic changes of gene expression (Figure 7f). Given our findings, additional studies in human models of
cancer and neurologic disorders are warranted to fully test and examine the
mechanisms we describe here. Taken together, the distinct effects of
dominant-negative mutations suggest that therapeutic strategies targeting the large
number of missense SMARCA4 mutations in cancer and neurologic
disorders may require distinct approaches from those targeting biallelic
inactivation.
Methods
Culture of animal and human cells
Mouse ES cells used in this study were derived from pregnant female mice
in the Crabtree lab, and authenticated by examination of morphology and
expression of embryonic stem cell markers. These cells were cultured using
standard conditions. Briefly, ES culture media containing Dulbecco’s
Modified Eagle’s Medium (Cat# 10829018; Life Technologies),
15% FBS (Cat# ASM-5007; Applied StemCell),
Penicillin-Streptomycin (Cat# 15140122; Life Technologies), Glutamax
(Cat# 35050061; Life Technologies), HEPES buffer (Cat# 15630080;
Life Technologies), 2-mercaptoethanol (Cat# 21985023; Life
Technologies), MEM-NEA (Cat# 11140050; Life Technologies), and LIF
supplement[61], was
replaced daily, and ES cells were passaged every 48 hours. For inducible
deletions, Smarca4(flox/flox) actin–CreER ES cells[45] and Smarca4(+/flox)
actin–CreER ES cells, which previously tested negative for mycoplasma
contamination using PCR testing, were plated onto irradiated feeder mouse
embryonic fibroblasts, treated with 0.8 µM 4-hydroxytamoxifen (Tam) or ethanol
(EtOH) for 48 h, and harvested for further experiments after trypsin
dissociation at 72 h.
Western blotting
Cells were lysed for at least 30 min at 4 °C in RIPA buffer (50
mM Tris-HCl, pH 8.0, 150 mM NaCl, 0.1% SDS, 0.5% sodium
deoxycholate, 1% NP-40). Lysates were centrifuged for 30 min at
14,000g, and the supernatant was flash-frozen. The total
protein concentration was measured by Bradford assay (Bio-Rad). We boiled 10
µg of protein in gel loading buffer (Life Technologies) with 50 mM of
dithiothreitol and loaded it onto 4–10% BisTris NuPage gels
(Life Technologies). Bands were transferred to PVDF membranes (Bio-Rad), and
then membranes were blocked in 5% BSA/TBST. Blots were then incubated
with primary antibodies for 1 h at room temperature. Proteins were detected with
the LI-COR detection system.
Construction and breeding of Smarca4-Dendra2 transgenic mice
Mutation was obtained using a targeting sequence derived from BAC clone
bMQ-242m22 (Source Bioscience). The Dendra2 sequence was inserted using
homologous recombination with a floxed neomycin resistance cassette. This
resistance cassette was later excised by breeding to a mouse expressing
ubiquitous Cre recombinase, leaving behind a single loxP site downstream of
Dendra2. Validation of successful construction was performed with Southern and
Western blots, as well as fluorescence microscopy. Homozygous mice were obtained
with no apparent phenotype, and bred through filial crossing.
Genotyping of Smarca4-Dendra2 mice
Standard PCR conditions were used to genotype Smarca4-Dendra2 transgenic
mice. PCR amplification conditions were as follows: 55 °C annealing
temperature, 72 °C extension temperature for 30 seconds, 28 cycles.
Wild-type alleles are specifically amplified using forward primer: CCC TAC CAT
GGT GCA GGG CA, and reverse primer: AAT GTC TGG TTC AGT CTT CCT CA, to yield a
429-bp amplicon product. Smarca4-Dendra2 alleles are specifically amplified
using forward primer: GCC CAG CCA GGT GTG GTG AGC CCA ATT CCG ATC ATA TTC A, and
reverse primer: GGG CCT GTG AGC CTC CTG GA, to yield a 469-bp amplicon product.
Homozygous mice amplify only the wild-type or only the Smarca4-Dendra2 product,
while heterozygous mice amplify both products.
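The genotype interpretation above can be sketched as a small lookup on the observed band sizes. This is a minimal illustration only: the amplicon sizes come from the text, whereas the function name `call_genotype`, the pooling of bands from both primer pairs into one list, and the `tolerance_bp` allowance for gel sizing error are hypothetical.

```python
# Expected amplicon sizes (bp) as stated in the genotyping description.
WT_AMPLICON_BP = 429
DENDRA2_AMPLICON_BP = 469


def call_genotype(band_sizes_bp, tolerance_bp=10):
    """Classify a mouse from the band sizes observed for both primer pairs.

    tolerance_bp is a hypothetical allowance for sizing error on a gel; it
    must stay below half the 40-bp difference between the two amplicons.
    """
    has_wt = any(abs(b - WT_AMPLICON_BP) <= tolerance_bp for b in band_sizes_bp)
    has_dendra2 = any(
        abs(b - DENDRA2_AMPLICON_BP) <= tolerance_bp for b in band_sizes_bp
    )
    if has_wt and has_dendra2:
        return "heterozygous"
    if has_wt:
        return "homozygous wild-type"
    if has_dendra2:
        return "homozygous Smarca4-Dendra2"
    return "no call"
```

For example, `call_genotype([430, 468])` falls into the heterozygous branch because both products are present.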
Lentiviral preparation and infection
Lentivirus was produced in Lenti-X 293T cells (Clontech) via
polyethylenimine transfection. Media were changed 24 h after transfection, and
after another 48 h media were collected and centrifuged for 2 h at 20,000 rpm.
Viral pellets were resuspended in PBS and used to infect cells. Cells were
selected with 2 µg/ml puromycin, as appropriate beginning 48 h after
infection.
Fluorescence recovery after photobleaching (FRAP)
Cells were maintained at 37 °C under humidified 5%
CO2 using chambered µ-Dish 35mm coverslips (Cat#
81156, Ibidi) in an enclosed inverted Leica SP2 laser scanning confocal
microscope. Using Leica software, half of the nucleus of each cell was
photobleached using 100 mW 405-nm laser excitation. Nuclear fluorescence was
monitored as a time course, and each image saved as a TIFF file. The degree of
fluorescence recovery within the region of interest (ROI, the photobleached
portion of the nucleus) was monitored within each image. Image masks were made
using MATLAB to identify the full nucleus (visualized prior to photobleaching),
and the ROI only. The ratio of fluorescence intensity within the ROI over the
intensity throughout the entire nucleus was plotted and fit to exponential
curves using non-linear least square fitting, to obtain the recovery rate.
Recovery half-lives derived from this fit were collected for each condition.
P-values to compare the populations of half-lives were obtained using the
two-sample Kolmogorov-Smirnov (KS) test.
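The curve-fitting step can be illustrated with a short sketch. It assumes a single-exponential recovery of the form y(t) = a + b·exp(−k·t), which is one plausible reading of "exponential curves" and not necessarily the exact model used here. For simplicity it replaces a general non-linear least-squares solver with a grid scan over the rate k followed by a golden-section refinement, solving for a and b in closed form at each k. All function and parameter names are hypothetical.

```python
import math


def _fit_at_rate(ts, ys, k):
    """For a fixed rate k, least-squares fit of y = a + b*exp(-k*t).

    Returns (sum of squared errors, a, b).
    """
    es = [math.exp(-k * t) for t in ts]
    n = len(ts)
    mean_e, mean_y = sum(es) / n, sum(ys) / n
    var_e = sum((e - mean_e) ** 2 for e in es)
    cov_ey = sum((e - mean_e) * (y - mean_y) for e, y in zip(es, ys))
    b = cov_ey / var_e
    a = mean_y - b * mean_e
    sse = sum((y - (a + b * e)) ** 2 for e, y in zip(es, ys))
    return sse, a, b


def fit_recovery(ts, ys, k_min=1e-4, k_max=10.0, grid=200, refine=60):
    """Fit the recovery curve; returns (k, a, b, half_life)."""

    def sse(log_k):
        return _fit_at_rate(ts, ys, math.exp(log_k))[0]

    # Coarse scan over log-spaced rates to bracket the best fit.
    lo, hi = math.log(k_min), math.log(k_max)
    step = (hi - lo) / grid
    best = min(range(grid + 1), key=lambda i: sse(lo + i * step))
    left = lo + max(best - 1, 0) * step
    right = lo + min(best + 1, grid) * step

    # Golden-section refinement inside the bracket.
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(refine):
        c = right - phi * (right - left)
        d = left + phi * (right - left)
        if sse(c) < sse(d):
            right = d
        else:
            left = c

    k = math.exp((left + right) / 2)
    _, a, b = _fit_at_rate(ts, ys, k)
    return k, a, b, math.log(2) / k
```

Here `ts` would hold the acquisition times and `ys` the ROI-to-nucleus intensity ratios for one cell. The returned half-life is ln(2)/k, and the half-lives from each condition would then be compared with a two-sample KS test.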
Deconvolution fluorescence microscopy
Deconvolution microscopy was performed on fixed cells using the OMX
Blaze V4 microscope platform.
Assay of transposase-accessible chromatin (ATAC) library preparation
ATAC libraries were independently prepared from separately cultured
samples in duplicate, using methods previously described[46]. Briefly, we obtained nuclei
by resuspending cells in 0.5 ml of lysis buffer (0.1% Tween-20 in RSB
buffer) and incubated them for 10 min on ice. Nuclei were pelleted by
centrifugation for 10 min at 500g, resuspended in 50 µl of transposition
mix [1× Tagmentation DNA buffer, 2.5 µl Tagment DNA enzyme
(Illumina)], and incubated for 30 min at 37 °C. DNA was purified with a
MinElute PCR purification kit (Qiagen), and libraries were amplified by PCR with
barcoded Nextera primers (Illumina). For sequencing, libraries were
size-selected with Agencourt AMPure XP (Beckman Coulter) for fragments between
~50 and 1,000 bp in length according to the manufacturer's
instructions. Sequencing was performed using paired-end reads on the Illumina
HiSeq2000 sequencer.
Processing of ATAC-seq data
Analysis of high-throughput sequencing data was not performed blindly,
but the following techniques were applied uniformly to all datasets. Paired-end
reads were processed by mapping to the mm9 reference mouse genome using Bowtie
2.1.0 (ref. [62]). Duplicate
fragments and reads with mapping quality < 10 were discarded, leaving
only high-quality unique reads. For each fragment, two pseudofragments were
generated, which occupied 200 bp centered at the true fragment ends. Peak
calling was performed by MACS 2.1.1 (ref. [63]) based on the density of pseudofragments.
For differential analysis, all peaks from control and treatment datasets
within ±1 kb were merged, and peaks below a threshold of 10 RPM in at
least one dataset were discarded to remove low-quality peak calls. Background
density of ATAC-seq signal was obtained by calculating the mean read density in
the 100-bp window 8-kb away from each peak. This background density was
subtracted from the overall read density within each peak in each dataset. For
each dataset, the total number of background-adjusted reads overlapping each of
the resulting peaks was compared for differential peak calling. Differential
peak calls were made using DESeq2 (ref. [64]), using the summed number of background-adjusted reads
at the top 5% of highest read-density sites as size factors. DESeq2
accounts for individual site variances across all replicates to make
differential peak calls. Log2 fold changes were calculated by the default use of
maximum a posteriori estimation using a zero-mean normal prior
(Tikhonov-Ridge regularization). FDR-corrected P-values were calculated using
the Benjamini-Hochberg procedure. Differential calls were made by requiring fold
changes of >1.5-fold in either direction and FDR-corrected p <
0.10. RPM values in genome tracks are the mean values across both replicates
from each condition. Mean density profiles were computed using bwtool[65] by calculating the mean
basepair coverage across all replicates for a given condition. Overlap of peaks
was performed using Bedtools[66].
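The pseudofragment construction described above can be sketched as follows. This is an illustrative reading, not the original pipeline code: it assumes 0-based, half-open (BED-style) coordinates and clips intervals at the chromosome start, and the function name is hypothetical.

```python
def pseudofragments(chrom, frag_start, frag_end, width=200):
    """Return two intervals of `width` bp centered on the two fragment ends.

    Coordinates are assumed to be 0-based and half-open; intervals that
    would extend past the chromosome start are clipped at 0.
    """
    half = width // 2
    intervals = []
    for end_pos in (frag_start, frag_end):
        intervals.append((chrom, max(end_pos - half, 0), end_pos + half))
    return intervals
```

Each paired-end fragment thus contributes two 200-bp intervals, one around each transposase insertion site, and peak calling operates on the density of these intervals.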
ChIP-seq library preparation
ChIP libraries were independently prepared from separately cultured
samples in duplicate, according to previously described protocols[67-69]. Briefly, we fixed 10–30 million cells
in suspension for 15 min in 1% formaldehyde. Excess formaldehyde was
quenched by the addition of glycine to 100 mM. Fixed cells were washed,
pelleted, and flash-frozen. Pellets were thawed, resuspended in NP Rinse 1
buffer (50 mM HEPES-KOH, pH 8.0, 140 mM NaCl, 1 mM EDTA, pH 8.0, 10%
glycerol, 0.5% NP-40, 0.25% Triton X-100) and incubated for 10
min on ice to isolate nuclei. Nuclei were washed once with NP Rinse 2 buffer (10
mM Tris-HCl, pH 8.0, 1 mM EDTA, pH 8.0, 0.5 mM EGTA, pH 8.0, 200 mM NaCl) and
then twice with shearing buffer (0.1% SDS, 1 mM EDTA, pH 8.0, 10 mM
Tris-HCl, pH 8.0). Pellets were resuspended in 900 µl of shearing buffer
with protease inhibitors and sonicated in a Covaris E220 sonicator for
10–12 min to generate DNA fragments between approximately 200 and 1,000
bp in length. Chromatin was then immunoprecipitated overnight at 4 °C
with antibodies bound to Protein G Dynabeads (Life Technologies) in ChIP buffer
(50 mM HEPES-KOH, pH 7.5, 300 mM NaCl, 1 mM EDTA, pH 8.0, 1% Triton
X-100, 0.1% DOC, 0.1% SDS). The chromatin-bead slurry was washed
with a magnetic rack four times with ChIP buffer, once with DOC buffer (10 mM
Tris-HCl, pH 8.0, 250 mM LiCl, 0.5% NP-40, 0.5% DOC, 1 mM EDTA,
pH 8.0), and once with TE buffer. Chromatin was eluted with Q7 elution solution
(0.1 M NaHCO3, 1% SDS). After RNase A (Life Technologies) and proteinase
K (New England BioLabs) digestion and de-crosslinking at 65 °C
overnight, DNA was extracted with phenol-chloroform and precipitated with
ethanol. Size selection was performed by extracting 200–400 bp DNA
fragments on a 2% agarose E-gel (Invitrogen) before PCR amplification,
then extracted using MinElute cleanup kits (Qiagen). PCR amplification was
performed using ≤ 14 cycles, and the resulting DNA quantified by Qubit
fluorometric quantitation. Sequencing was performed using single-end reads on
the Illumina HiSeq2000 sequencer. Antibodies used in these studies include
rabbit anti-H3K4me1 (Abcam, Cat# ab8895, Lot number: GR251663-1), rabbit
anti-H3K27ac (Abcam, Cat# ab4729, Lot number: GR104852-1), and mouse
monoclonal anti-RNAP2 (Covance clone 8WG16, Cat# MMS-126R).
Processing of ChIP-seq data
Analysis of high-throughput sequencing data was not performed blindly,
but the following techniques were applied uniformly to all datasets. Single-end
ChIP seq reads were processed by mapping to the mm9 reference mouse genome using
Bowtie 2.1.0 (ref. [62]),
rejecting reads that contain more than a single mismatch. Duplicate reads were
discarded, leaving only unique reads. For all analyses, peak calling was
performed by MACS 2.1.1 (ref. [63]) by comparing to input samples for each cell type.
All peaks from control and treatment datasets within ±1 kb were
merged, and peaks below a threshold of 30 RPM (in at least one dataset) were
discarded to remove low-quality peak calls. For each dataset, the total number
of reads overlapping each of the resulting peaks was compared for differential
peak calling. Differential peak calls were made using DESeq2 (ref. [64]), using the summed number of
reads at all sites above the 95th-percentile as size factors to avoid the
influence of background ChIP-seq reads. DESeq2 accounts for individual site
variances across all replicates to make differential peak calls. Log2 fold
changes were calculated by the default use of maximum a
posteriori estimation using a zero-mean normal prior
(Tikhonov-Ridge regularization). FDR-corrected P-values were calculated using
the Benjamini-Hochberg procedure. Differential calls were made by requiring fold
changes of >1.5-fold in either direction and FDR-corrected p <
0.10. RPM values in genome tracks are the mean values across both replicates
from each condition. Calculation of mean genome track densities was performed
using bwtool[65] and browser
tracks were prepared using Gviz[70]. For presentation in heat maps, ChIP-seq data were ordered
by the amount of signal in the middle 5% of an 8-kb window centered
around each peak.
RNA-seq library preparation
RNA was isolated with QIAzol and extracted with phenol-chloroform. Using
the Ovation V2 RNA-seq system, RNA-DNA hybrids were formed using primers with
conjugated poly(dT) sequences fused to random DNA-hybridization sequences.
Following first-strand synthesis and cDNA amplification, cDNA was sheared with
Diagenode sonication (25-s pulse, followed by 30-s recovery at 4 °C for
2×10 cycles). Library preparation was performed as previously
reported[69] and
sequencing was performed using single-end reads on the Illumina HiSeq2000
sequencer.
Processing of RNA-seq data
Reads within coding regions were counted using htseq[71], with gene definitions
obtained from the refGene table of RefSeq genes in the UCSC Browser. The table of all counts
was processed using DESeq2, using default parameters. Log2 fold changes were
calculated using maximum a posteriori estimation using a
zero-mean normal prior (Tikhonov-Ridge regularization). FDR-corrected P-values
were calculated using the Benjamini-Hochberg procedure. Differential calls were
made by requiring fold changes of >1.5-fold in either direction and
FDR-corrected p < 0.10.
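The differential-call criteria shared by the ATAC-seq, ChIP-seq, and RNA-seq analyses can be illustrated with a small sketch. It reimplements the Benjamini-Hochberg adjustment and the fold-change and FDR thresholds in plain Python for clarity. DESeq2 performs these steps internally (together with additional steps such as independent filtering), so this is an approximation of the logic, not a substitute for it, and the function names are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_changes(log2_fold_changes, pvalues, min_fold=1.5, max_fdr=0.10):
    """Label each feature as increased, decreased, or unchanged."""
    threshold = math.log2(min_fold)
    calls = []
    for lfc, q in zip(log2_fold_changes, benjamini_hochberg(pvalues)):
        if q < max_fdr and lfc > threshold:
            calls.append("increased")
        elif q < max_fdr and lfc < -threshold:
            calls.append("decreased")
        else:
            calls.append("unchanged")
    return calls
```

A feature is called only when it passes both the >1.5-fold criterion (|log2 fold change| > log2(1.5) ≈ 0.585) and the FDR-corrected p < 0.10 criterion.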
Public datasets analyzed in this study
Read densities were obtained from publicly available NCBI GEO
datasets (Table S1).
For each of these chromatin feature datasets, read fragments were extended to
200 bp from the 3’ end, and basepair coverage was determined using
Bedtools[66].
Gene set enrichment analysis (GSEA)
Gene set enrichment analysis was performed by preparing a list of RefSeq
transcripts based on the output of DESeq2 from RNA-seq data. RefSeq genes were
ranked by log2 fold change and this pre-ranked set was imported into the GSEA
application (Broad Institute). Enrichments and p-values were obtained directly
from the GSEA application.
Genome annotation enrichment
Enrichment for overlap with genomic annotations was performed by
comparing how frequently each peak fell into basic genomic annotations,
separately determined for each class of differential peak call described above.
Enriched GO terms and log enrichment of genomic annotation overlap were
calculated for each class of site using HOMER[72].
Statistics
All differential calls for genomic datasets (ChIP-seq and RNA-seq) were
made using DESeq2, where P-values are calculated using the Wald test, as
described in the DESeq2 documentation[64]. FDR-corrected P-values were calculated using the
Benjamini-Hochberg procedure. Analysis of correlation was performed using the
Pearson correlation test. Two-sample Kolmogorov-Smirnov (KS) tests were
performed as two-sided tests. All the above statistical tests were performed
using R. The hypergeometric test for genomic annotation overrepresentation was
performed as a one-sided test using HOMER[72]. When calculated p-values are smaller than the 64-bit
double precision machine epsilon (2^-52 ≈ 2.22e-16), p-values are
reported as p<2.2e-16.
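The reporting floor described above corresponds to the machine epsilon of IEEE 754 double-precision numbers. A minimal sketch of the convention, using Python's `sys.float_info.epsilon` in place of R's `.Machine$double.eps` (the helper name `format_p` is hypothetical):

```python
import sys


def format_p(p):
    """Format a p-value, flooring the report at double-precision epsilon."""
    eps = sys.float_info.epsilon  # 2**-52 for IEEE 754 doubles
    if p < eps:
        return "p<2.2e-16"
    return f"p={p:.2g}"
```

Values below the floor are not reported numerically because they cannot be distinguished reliably at this precision.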
Lasso multivariate regression
Lasso multivariate regression[73] was performed essentially the same as previously
described[34]. Briefly,
we related the fold-change in ATAC-seq read density induced by Smarca4 ATPase
mutants (fc) to a linear combination of 111 individual
chromatin features:
$$\mathrm{fc} = \beta_0 + \sum_{i=1}^{111} \beta_i x_i$$
At each ATAC peak, the number of reads within ±3 kb from the
center of the peak was summed for each of 111 features, using previously
published datasets. The summed read count at each site was log10 transformed,
and scaled to unit variance across all sites (xi).
Lasso regression was performed using the R package “glmnet”
[74], with the default
mixing penalty parameter α=1. Values for the restricted parameter
were obtained for each Smarca4 mutant by 10-fold
cross-validation, with the minimal value selected that provided the lowest mean
cross-validated error.
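For reference, the penalized least-squares objective minimized by glmnet for a Gaussian response can be written as below. The notation is ours: N is the number of ATAC peaks, fc_j and x_ij are the fold change and the i-th scaled feature at peak j, β_0 is the intercept that glmnet fits by default, and λ is the penalty strength, which we take to be the cross-validated parameter referred to above.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{j=1}^{N}\Big(\mathrm{fc}_j - \beta_0 - \sum_{i=1}^{111}\beta_i x_{ij}\Big)^{2}
+ \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2} + \alpha\,\lVert\beta\rVert_1\Big]
```

With α=1 the ridge term vanishes and the penalty reduces to λ‖β‖₁, the lasso, which drives the coefficients of uninformative features to exactly zero.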
Homology modeling
Models of the three-dimensional structures of murine and human SMARCA4
(aa 754–1335) were created in the YASARA v16.7.22.L.64 (ref. [75]) homology modeling module. The
target sequence was PSI-BLASTed against UniRef90 to build a position-specific
scoring matrix from related sequences. This profile was used to search the PDB
for potential modeling templates. The identified templates were ranked based on
the alignment score, the structural quality according to WHAT_CHECK (ref.
[76]) obtained from the
PDBFinder2 database[77]. The
template search resulted in identification of possible templates out of which 2
templates (PDB IDs 5HZR and 5X0X) were further used for homology modeling
purposes.
Multiple sequence alignments anchored on structural alignments were
created by aligning the target sequence against UniRef90 and YASARA's
PSSP database using SSALN scoring matrices[78]. This approach provided 10 different homology modeling
templates. An indexed version of the PDB was used to determine the optimal loop
anchor points and collect possible loop conformations for the unassigned parts
of the sequence in each template. The side-chain rotamers of these sequences
were optimized in implicit solvent so that water molecules did not block the
search. The side-chain rotamers of the whole molecule were then fine-tuned
considering electrostatic and knowledge-based packing interactions as well as
solvation effects. To relieve steric clashes and to fix disordered geometry, the
models were subjected to a high-resolution energy minimization with a shell of
explicit solvent molecules, using the knowledge-based YASARA force field. The
stereochemical properties of each model were evaluated based on the per-residue
quality Z-score. To increase the accuracy, YASARA combined the highest scoring
parts of the 10 models to obtain a hybrid model (overall Z-score ~
−0.81).
The structure of SMARCA4 in the hybrid model resembles the structure of
the Swi2/Snf2 core from Saccharomyces cerevisiae (PDB ID 5X0X)
and represents the conformation in a nucleosome/DNA-bound state. The interface
region of SMARCA4 and ATP was defined based on the crystal structure of
ATPγS in complex with Chd1 (PDB ID 3MWY), a Snf2-like ATPase with a high
degree of homology to mammalian SMARCA4. The model of full complex was subjected
to side-chain rotamer optimization and refined in an explicit solvent
knowledge-based YASARA force field. Three-dimensional coordinates of mouse and
human SMARCA4 homology models are provided in PDB format as supplemental data.
Molecular visualization
The Smarca4 homology model was rendered using The PyMOL Molecular
Graphics System, Version 1.8 (Schrödinger, LLC).
Calculation of solvent accessible surface area
Solvent accessible surface area was calculated based on the resulting
homology model for each residue position using PyMOL (v1.8). Values were
calculated using the get_area() function with parameters dot_solvent=1, and
dot_density=3.
Analysis of public tumor datasets
Figures and results reviewed here are based in part upon data generated
by the TCGA Research Network and hosted on cBioPortal[40]. Briefly, data was downloaded using the
cBioPortal web API, then processed to characterize the abundance of mutations at
each residue position of Smarca4.
Calculation of A/B compartments
Determination of chromatin compartments was performed essentially as
previously reported[54].
Briefly, publicly available Hi-C data from mESCs[79] was downsampled to 200-kb resolution, and
expected contact frequency between contact pairs (i,j) was
calculated based on the distance from position i to
j for all pairs. Observed-over-expected ratios for each
contact pair (i,j) were calculated by normalizing pair read
counts to expected pair read counts calculated as described above. The matrix of
observed-over-expected ratios constituted an apparent two-state
“checkerboard” matrix, which was clarified further by
calculating the pairwise correlation matrix of the pairwise
observed-over-expected matrix. The top principal component of the correlation
matrix permitted straightforward assignment into A and B compartments based on
whether the principal component value at each position lay above or below the mean, and on the
transcriptional output of the contained genes obtained from RNA-seq.
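The steps above (distance-based expectation, observed-over-expected ratios, pairwise correlation, top principal component, orientation by expression) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the pipeline used in the study: the leading eigenvector of the correlation matrix is obtained by power iteration as a simple stand-in for a PCA routine, and the function names, the toy contact matrix, and the per-bin expression values are all hypothetical.

```python
import math

def observed_over_expected(contacts):
    """Divide each contact count by the mean count at the same bin distance."""
    n = len(contacts)
    expected = []
    for d in range(n):
        diagonal = [contacts[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return [[contacts[i][j] / expected[abs(i - j)] if expected[abs(i - j)] else 0.0
             for j in range(n)] for i in range(n)]

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if undefined)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0

def leading_eigenvector(matrix, iterations=500):
    """Power iteration; assumed stand-in for the top principal component."""
    n = len(matrix)
    v = [float(i + 1) for i in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def assign_compartments(contacts, expression):
    """Label each bin 'A' or 'B'; the more highly expressed group is called A."""
    oe = observed_over_expected(contacts)
    n = len(oe)
    corr = [[pearson(oe[i], oe[j]) for j in range(n)] for i in range(n)]
    pc = leading_eigenvector(corr)
    mean_pc = sum(pc) / n
    above = [value > mean_pc for value in pc]

    def mean_expression(flag):
        values = [e for e, a in zip(expression, above) if a == flag]
        return sum(values) / len(values) if values else float("-inf")

    a_flag = mean_expression(True) >= mean_expression(False)
    return ["A" if a == a_flag else "B" for a in above]

# Toy 4-bin matrix: contacts decay with distance and are enriched
# within the same (made-up) compartment.
truth = ["A", "A", "B", "B"]
toy_contacts = [[(100.0 / (abs(i - j) + 1)) * (2.0 if truth[i] == truth[j] else 0.5)
                 for j in range(4)] for i in range(4)]
toy_expression = [5.0, 6.0, 1.0, 1.0]  # hypothetical per-bin RNA-seq signal
labels = assign_compartments(toy_contacts, toy_expression)
```

Because the sign of an eigenvector is arbitrary, the expression values are what decide which group is called A, mirroring the use of RNA-seq output in the text.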
Densitometry and analysis
All densitometry of Western blot bands was performed by calculating
integrated band intensities with LiCor ImageStudio 5.2.5.
Data presentation
All browser tracks were prepared using Gviz[70] based on bigWig files. Heatmaps were prepared
using heatmap.2() from the ‘gplots’ R package, and the
‘pheatmap’ R package. MA plots were prepared using heatscatter()
from the ‘LSD’ R package. Color combinations were heavily
influenced by the “Paired” and “Set1” palettes
from the ‘RColorBrewer’ R package.
Analysis of overlap with superenhancers in different cell types
Direct overlap with superenhancers from multiple cell types was
assessed using the current version of the dbSUPER database of
superenhancers[52].
Briefly, the database of superenhancers was downloaded in BED format for each
cell type and fractional overlap was assessed for each ATAC-seq peak with
decreased accessibility using Bedtools. Homology to sequences in the human genome was
determined using the liftOver tool from the UCSC Genome Browser to map sequence
positions from the mouse mm9 genome build to the human hg19 genome build.
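The fractional-overlap measure can be illustrated with a small pure-Python function. This is a sketch of the quantity only, not the Bedtools invocation used in the study: it assumes BED-style half-open intervals given as (chromosome, start, end) tuples, and the function name and example coordinates are hypothetical.

```python
def fractional_overlap(peak, regions):
    """Fraction of a peak's length covered by any region on the same chromosome.

    Intervals are (chrom, start, end) with BED-style half-open coordinates.
    Overlapping regions are merged so no base is counted twice.
    """
    chrom, start, end = peak
    # Keep only regions that intersect the peak, clipped to the peak bounds.
    clipped = sorted((max(start, s), min(end, e))
                     for c, s, e in regions
                     if c == chrom and s < end and e > start)
    covered, cursor = 0, start
    for s, e in clipped:
        s = max(s, cursor)  # skip bases already counted
        if e > s:
            covered += e - s
            cursor = e
    return covered / (end - start)

# Made-up coordinates for illustration only.
peak = ("chr1", 100, 200)
superenhancers = [("chr1", 150, 300), ("chr1", 90, 120),
                  ("chr2", 100, 200), ("chr1", 110, 130)]
fraction = fractional_overlap(peak, superenhancers)
```

A peak could then be called overlapping when this fraction exceeds a chosen threshold; the threshold itself is not specified here.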
Data availability
Sequencing data sets generated and analyzed in this study are available in
the Gene Expression Omnibus (GEO) repository under accession numbers GSE88968,
GSE94041, and GSE98605. Source data for Figure
1 are available with the paper online. Three-dimensional coordinates of
the mouse SMARCA4 homology model are provided in PDB format as Supplementary Dataset 1.
Three-dimensional coordinates of the human SMARCA4 homology model are provided in
PDB format as Supplementary
Dataset 2. Other data and materials are available from the authors upon
request. A Life Sciences Reporting Summary for this article is available.