Swneke D Bailey1,2, Xiaoyang Zhang3,4, Kinjal Desai3, Malika Aid5, Olivia Corradin6, Richard Cowper-Sal Lari1,7, Batool Akhtar-Zaidi6,8,9, Peter C Scacheri6,8, Benjamin Haibe-Kains1,2,5, Mathieu Lupien10,11,12. 1. The Princess Margaret Cancer Centre-University Health Network, Toronto, M5G 1L7, Ontario, Canada. 2. Department of Medical Biophysics, University of Toronto, Toronto, M5G 1L7, Ontario, Canada. 3. Department of Genetics, Norris Cotton Cancer Center, Dartmouth Medical School, Lebanon, 03755, New Hampshire, USA. 4. Present address: Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA. 5. Bioinformatics and Computational Genomics Laboratory, Institut de Recherches Cliniques de Montréal (IRCM), Montreal, H2W 1R7, Quebec, Canada. 6. Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, 44106, Ohio, USA. 7. Present address: The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02139, USA. 8. Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, 44106, Ohio, USA. 9. Present address: Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts 02142, USA. 10. The Princess Margaret Cancer Centre-University Health Network, Toronto, M5G 1L7, Ontario, Canada. mlupien@uhnres.utoronto.ca. 11. Department of Medical Biophysics, University of Toronto, Toronto, M5G 1L7, Ontario, Canada. mlupien@uhnres.utoronto.ca. 12. Ontario Institute for Cancer Research, Toronto, M5G 1L7, Ontario, Canada. mlupien@uhnres.utoronto.ca.
Abstract
Chromatin interactions connect distal regulatory elements to target gene promoters guiding stimulus- and lineage-specific transcription. Few factors securing chromatin interactions have so far been identified. Here, by integrating chromatin interaction maps with the large collection of transcription factor-binding profiles provided by the ENCODE project, we demonstrate that the zinc-finger protein ZNF143 preferentially occupies anchors of chromatin interactions connecting promoters with distal regulatory elements. It binds directly to promoters and associates with lineage-specific chromatin interactions and gene expression. Silencing ZNF143 or modulating its DNA-binding affinity using single-nucleotide polymorphisms (SNPs) as a surrogate of site-directed mutagenesis reveals the sequence dependency of chromatin interactions at gene promoters. We also find that chromatin interactions alone do not regulate gene expression. Together, our results identify ZNF143 as a novel chromatin-looping factor that contributes to the architectural foundation of the genome by providing sequence specificity at promoters connected with distal regulatory elements.
Cell fate determination relies on lineage-specific transcription programs set by master
transcription factors acting on distal regulatory elements, such as enhancers, and
proximal gene promoters (ref. 1). Distal regulatory elements can be separated
from their target promoter(s) over large genomic distances. They are brought in close
proximity to one another through chromatin interactions/loops, defining the chromatin
architecture of the genome (ref. 1). Close to 60% of chromatin interactions are
cell type specific (refs 2, 3) and significantly correlate with lineage-specific
transcriptional programs (ref. 2). These chromatin interactions form during
cellular differentiation (refs 4, 5) and set the stage for stimulus-specific
transcriptional responses (ref. 6). Although a role for non-coding RNA was
proposed, recent findings suggest that chromatin interactions rely on DNA sequences. For
instance, a single-nucleotide polymorphism (SNP) associated with pigmentation modulates
a chromatin interaction between a distal enhancer and the promoter of the oculocutaneous albinism II (OCA2) gene (ref. 7). Similarly,
mutations in the DNA recognition sequence for the CCCTC-binding factor (CTCF) impinge on the formation of chromatin interactions (ref. 8).
CTCF is known to directly regulate the
formation of chromatin interactions in partnership with the cohesin and/or mediator
complexes (ref. 9). It occupies distal regulatory elements located close to
enhancers (refs 5, 10, 11) and defines the boundaries of topological domains
when paired with the cohesin complex (refs 10, 12). Genomic regions bound by the
mediator and cohesin complexes anchor interactions regulating lineage-specific gene
expression found within topological domains (ref. 10). Although the mediator and
cohesin complexes lack DNA-binding domains, their recruitment to the chromatin commonly
coincides with CTCF (refs 13, 14) or other transcription factors such as the oestrogen receptor alpha (ref. 15). However, CTCF and oestrogen receptor alpha bind chromatin far from promoter
regions (refs 15, 16) and cohesin-binding sites found at promoters relate to
tissue-specific transcription (ref. 15). This suggests the existence of a
yet-to-be-identified promoter-bound DNA recognition factor(s) capable of specifying the
target gene promoter(s) of distal regulatory elements.
Here we report the enrichment of the zinc-finger protein ZNF143 at anchors of chromatin interactions
connecting promoters with distal regulatory elements. Our results indicate that
ZNF143 is directly recruited to
the promoter of genes engaged in chromatin interactions, where it binds to its DNA
recognition sequence. We also show that modulating ZNF143 binding by SNPs directly impacts chromatin interaction
frequencies. This reveals the dependency of chromatin interactions on DNA sequence and
implies that chromatin interactions can be affected by genetic alterations (genetic
variants or mutations) associated with inherited traits and diseases. Overall, our
results demonstrate that ZNF143 is a
new factor controlling the formation of chromatin interactions.
Results
ZNF143 binds promoters and forms distal phantom events
CTCF and cohesin complex
proteins form a cluster distinct from other transcription factors, especially
those bound at gene promoters. To identify the transcription factor(s) involved
in securing chromatin interactions between promoters and distal regulatory
elements, we first looked for factors that bridge promoter factors with the
CTCF-cohesin cluster. For
this, we correlated the chromatin immunoprecipitation (ChIP)-seq signal
intensities of more than 70 transcription factors profiled by the Encyclopedia
of DNA Elements (ENCODE) project (ref. 17) across all regions of open
chromatin (see Methods) in GM12878 or K562 cells. In agreement with previous
reports, we find that ZNF143
is unique because it associates with the ‘CTCF-cohesin’ cluster (ref. 18) in both cell lines (Supplementary Fig. 1). However, we show that its genome-wide binding
profile is most similar to promoter-bound factors (Supplementary Fig. 1). In agreement,
ZNF143’s
correlation with the CTCF-cohesin cluster relies on its weakest binding sites (Fig. 1a), found primarily at distal regulatory elements
defined by the ‘CTCF-rich’ chromatin state (ref. 19) (Fig. 1b). The strongest ZNF143-binding sites map to promoters (Fig.
1b) bound by RNA polymerase II (POL2; Fig. 1a)
and other promoter-associated factors, such as the TATA-binding protein (TBP) and the TBP-associated protein,
together forming a ‘promoter’ cluster (Supplementary Fig. 1). This agrees with the
reported enrichment of ZNF143’s DNA recognition motif at promoters (ref. 20). These same strongest ZNF143-binding sites associate with weak CTCF and cohesin binding (Fig. 1a). Of all the transcription factors profiled using ChIP-seq
by the ENCODE project, ZNF143
is the only one correlated with the ‘CTCF-cohesin’ and the ‘promoter’
clusters in both GM12878 and K562 cells (Supplementary Fig. 1) indicating its potential role in mediating
chromatin interactions involving gene promoters.
Figure 1
ZNF143 binds promoters and occupies CTCF and cohesin bound distal regulatory elements.
(a) A heatmap of the signal intensities from ChIP-seq assays against
ZNF143, CTCF, SMC3 and POL2 across all
ZNF143-binding sites
(± 5 kilobases (kb)) called in GM12878 cells. (b) Violin plots
of the signal intensities from ChIP-seq assays against ZNF143, CTCF, SMC3 and POL2 at their respective
binding sites and the distributions of these sites across chromatin states
defined by epigenetic modifications. The violin plots are split to show the
distribution of the top decile of each factor separately. Enh, Enhancer;
Ins, Insulator; Pro, Promoter; Tx, Transcription. (c) A bar plot
revealing the fraction of ZNF143 chromatin-binding sites in GM12878 cells that
harbour its DNA recognition sequence. (d) The average binding
intensity of ZNF143,
CTCF, SMC3, and POL2 at POL2-bound
promoters (top) and CTCF-binding sites (bottom).
Motif enrichment analysis reveals that more than 80% of the strongest ZNF143-binding sites harbour its DNA
recognition motif, while it is found in less than 30% of weak binding sites
(Fig. 1c). The presence of the motif suggests that ZNF143 is recruited
directly to promoters where it binds next to POL2 (Fig. 1d). These results agree with its role as a promoter-bound
transcriptional activator (refs 20–24). The fact that weak ZNF143-bound sites rarely
harbour its DNA recognition motif and align with CTCF and the cohesin complex (Fig. 1b,d) suggests that ZNF143 indirectly binds distal regulatory elements. Although
tethering mechanisms allow indirect protein binding to the chromatin (ref. 25), phantom binding events (refs 26, 27), resulting from the
use of crosslinked cells in ChIP-seq assays where chromatin interactions are
stabilized, were recently proposed to account for indirect transcription factor
binding to the chromatin. Strong ZNF143 binding at sites devoid of its recognition motif
may arise from chromatin interactions from a single enhancer, such as the locus
control region (LCR) at the β-globin gene cluster (see below), to multiple
gene promoters. Together, our results support the direct binding of ZNF143 at promoters and indirect binding
to distal regulatory elements bound by CTCF and the cohesin
complex, which may arise due to chromatin
interactions.
ZNF143 occupies the anchors of chromatin interactions
A central feature of chromatin-looping factors is that they occupy anchors of
chromatin interactions (ref. 3). High-resolution (~4 kilobases
(kb)) genome-scale maps of chromatin interactions generated from carbon-copy
chromatin conformation capture (5C) assays reveal 1,187 and 1,726
intrachromosomal pairs of chromatin interaction anchors in GM12878 and K562
cells, respectively (ref. 3). Importantly, these chromatin interactions
are specific to promoters looping with distal regulatory elements (ref. 3). Using these data, we determined the proportion of ZNF143-binding sites at chromatin
interaction anchors in comparison with the expected overlap calculated using
1,000 random-matched binding sets (RMBSs; see Methods). This revealed the
significant enrichment of ZNF143 at chromatin interaction anchors in both GM12878
(P=1.93 × 10⁻¹³) and K562 cells
(P=3.7 × 10⁻¹²; Fig. 2a and Supplementary Fig. 2a). This
analysis also reveals the enrichment of CTCF and SMC3 at chromatin interaction anchors in GM12878 and K562
cells (P=3.28 × 10⁻¹² and P=3.91 × 10⁻¹² for GM12878 cells and P=6.09 × 10⁻¹⁴ and P=2.50 × 10⁻⁶
for K562 cells, respectively; Fig. 2a and Supplementary Fig. 2a). Nominal significance
was detected for RAD21 (P=0.0475 and P=2.01 × 10⁻⁴ for
GM12878 and K562 cells, respectively; Fig. 2a and Supplementary Fig. 2a). None of the
seven additional factors (P300, ZNF384, BHLHE40, MAZ, MXI1, TBP and COREST) significantly
enriched at chromatin interaction anchors (Fig. 2a and Supplementary Fig. 2a) share
the symmetrical nucleosome positioning or correlation with the
‘CTCF-cohesin’ cluster reported for ZNF143 (refs 17, 18, 28) (Supplementary Fig. 1a). These results agree with recent reports of ZNF143 enrichment within the anchors of
chromatin interactions called by chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) and
Hi-C (refs 29, 30).
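The comparison of an observed overlap with the overlap obtained from random-matched binding sets can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' custom Perl script: the function names, the toy coordinates and the add-one empirical P value are hypothetical, and the toy null sets are placed uniformly on a single made-up chromosome, whereas the RMBSs described in the Methods are matched on chromosome distribution, number and size of the binding sites.

```python
import random
import statistics

def overlaps(anchor, sites):
    """True if the anchor (start, end) overlaps any site (start, end)."""
    a_start, a_end = anchor
    return any(s_start < a_end and a_start < s_end for s_start, s_end in sites)

def fraction_both_anchors_bound(interactions, sites):
    """Fraction of interactions whose two anchors both overlap a binding site."""
    hits = sum(overlaps(a, sites) and overlaps(b, sites) for a, b in interactions)
    return hits / len(interactions)

def random_matched_set(sites, chrom_length, rng):
    """Toy null set: same number and sizes of sites, placed uniformly at random."""
    out = []
    for start, end in sites:
        size = end - start
        new_start = rng.randrange(0, chrom_length - size)
        out.append((new_start, new_start + size))
    return out

def enrichment(interactions, sites, chrom_length, n_sets=1000, seed=0):
    """Return the observed fraction, its z score and a one-sided empirical P value."""
    rng = random.Random(seed)
    observed = fraction_both_anchors_bound(interactions, sites)
    null = [
        fraction_both_anchors_bound(
            interactions, random_matched_set(sites, chrom_length, rng))
        for _ in range(n_sets)
    ]
    mean, sd = statistics.mean(null), statistics.pstdev(null)
    z = (observed - mean) / sd if sd > 0 else float("inf")
    # Add-one correction keeps the empirical P value above zero.
    p_emp = (1 + sum(x >= observed for x in null)) / (n_sets + 1)
    return observed, z, p_emp

if __name__ == "__main__":
    # Hypothetical 1 Mb chromosome, 4 kb anchors and 200 bp binding sites.
    interactions = [((10_000 * i, 10_000 * i + 4_000),
                     (500_000 + 10_000 * i, 500_000 + 10_000 * i + 4_000))
                    for i in range(20)]
    sites = [(a[0] + 1_000, a[0] + 1_200) for pair in interactions for a in pair]
    observed, z, p_emp = enrichment(interactions, sites, 1_000_000)
    print(f"observed={observed:.2f} z={z:.1f} empirical P={p_emp:.4f}")
```

The z score expresses the observed overlap in units of the spread of the null distribution, which is how the red dots and box plots of Fig. 2a are related in this sketch.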
Figure 2
ZNF143 preferentially binds at chromatin interaction anchors.
(a) ZNF143-binding
sites across the genome are enriched within the anchors of chromatin
interactions reported in 5C assays. The normalized enrichment of
ZNF143 and other
transcription factors at both ends (anchors) of chromatin loops identified
by 5C assays in GM12878 cells is shown. Box plots represent the normalized
null distribution derived from the comparison between chromatin interactions
and 1,000 RMBSs. Red dots indicate the observed per cent overlap of
transcription factor-binding sites with both anchors of the 5C interactions,
expressed as z scores relative to the generated null distribution.
(b) Venn diagram depicting shared versus cell type-specific
ZNF143-binding sites
identified by ChIP-seq assays in GM12878, K562 and HelaS3 cells. (c)
The top panel shows the percentage of cell type-specific chromatin
interactions defined by 5C assays that harbour a DNaseI hypersensitivity
site (DHS) bound by ZNF143 specifically in GM12878, K562 or HelaS3 cells.
The bottom panel represents the proportion of promoters (± 2.5
kilobases (kb) from the transcription start site) of genes uniquely
expressed in GM12878, K562 or HelaS3 bound by ZNF143 specific to one of these
cell lines (G: GM12878, K: K562, H: HelaS3). The P value is derived
from a χ²-test; NS, not significant;
*P≤0.05; ***P≤0.001. (d) Signal
intensities for 10 different epigenetic modifications profiled by ChIP-seq
in GM12878 (red), K562 (blue) and HelaS3 (green) cells across the unique top
decile ZNF143-binding
sites reported in GM12878 (top panel), K562 (middle panel) and HelaS3
(bottom panel) cells. The shaded area represents the s.e.m.
Considering that different cell types have distinct chromatin architectures, we
assessed whether ZNF143-binding events correspond with cell type-specific
chromatin interactions and gene expression. First, we compared the ZNF143-binding sites called in GM12878,
K562 and HelaS3 cells. This revealed thousands of cell type-specific sites
(Fig. 2b) and is similar to what is observed for
CTCF and cohesin (refs 31, 32). Comparing cell type-specific ZNF143-binding sites with chromatin
interactions unique to GM12878, K562 or HelaS3 cells revealed that ZNF143 binding directly relates to cell
type-specific chromatin interactions (Fig. 2c and Supplementary Fig. 2b). Epigenetic
modifications, such as the mono- and dimethylation of lysine 4 on histone 3
(H3K4me1 and H3K4me2, respectively) may contribute to the cell type specificity
of ZNF143, since these
modifications can assist transcription factor binding and relate to cell
type-specific binding profiles (refs 33–35). In agreement, the
strongest cell type-specific ZNF143-binding sites harbour epigenetic modifications
typical of active chromatin (refs 19, 36), namely histone 3 lysine 4
monomethylation (H3K4me1), H3K4me2, histone 3 lysine 27 acetylation (H3K27ac)
and the histone variant H2A.Z
(Fig. 2d). Focusing on genes uniquely expressed in
GM12878, K562 or HelaS3 cells reveals that cell type-specific ZNF143 binding correlates with
differential gene expression (Fig. 2c and Supplementary Fig. 2c).
The cell type-specific association between ZNF143 binding, chromatin interactions and gene expression
is exemplified by the LCR found ~50 kb upstream of the
β-globin gene cluster. The promoters of the β-globin genes
(haemoglobin delta
(HBD) and
haemoglobin gamma A
(HBG1)) are
bound by ZNF143 only in K562
cells (Supplementary Fig. 2e). The
LCR harbours a single ZNF143-binding site shared between GM12878, K562 and HelaS3
cells (Supplementary Fig. 2e).
Using an intercellular feature correlation (IFC) tool (see Methods), we
predicted interactions between the LCR and the promoter of the HBD and HBG1 genes in K562 but not in
GM12878 or HelaS3 cells (Supplementary
Fig. 2d). Chromatin conformation capture (3C) assays confirmed that
chromatin interactions connect the LCR and the promoter of the HBD and HBG1 genes only in K562 cells
(Supplementary Fig. 2d). This
agrees with these genes being expressed exclusively in K562 cells (Supplementary Table 1 and Supplementary Data 1). Chromatin interactions
predicted in all three cell lines for a ubiquitously expressed gene, such as the
one connecting the TBL1XR1 promoter to an ~160 kb upstream
regulatory element, were validated by 3C assays in all cell lines (Supplementary Fig. 2e). These results support
the preferential binding of ZNF143 at chromatin interaction anchors, including cell
type-specific anchors related to lineage-specific transcriptional programs.
ZNF143 is required for chromatin interactions
To directly assess the requirement of ZNF143 for the formation of chromatin interactions between
promoter and distal regulatory elements, we determined the impact of modulating
ZNF143 binding to the
chromatin on the frequency of chromatin interactions. We first focused on the
chromatin interactions predicted by IFC in HelaS3 cells between distal
regulatory elements and the promoter of the transducin beta-like 1 X-linked receptor 1 (TBL1XR1) or the eukaryotic translation elongation factor
1-alpha (EEF1A1) genes (Fig. 3). Using
3C assays anchored at the promoters of the TBL1XR1 or EEF1A1 genes, we validated a series of predicted
chromatin interactions (Fig. 3a,b,e,f). Depletion of
ZNF143 using
small-interfering RNA (siRNA)-based silencing in HelaS3 cells significantly
decreased the frequency of these chromatin interactions (Fig.
3b,f). Consistently, a reduction in ZNF143 binding at the distal regulatory
elements and promoters of the TBL1XR1 and EEF1A1 genes was observed (Fig.
3c,g), as was a decrease in the expression of both the
TBL1XR1 and
EEF1A1 genes
(Fig. 3d,h). Overall, these results support a role for
ZNF143 in chromatin loop
formation.
Figure 3
ZNF143 is required for the formation of chromatin interactions.
(a) Chromatin interactions predicted by the IFC analysis anchored on
the TBL1XR1 gene
promoter are represented by Bezier curves. Signal and peak files for
ZNF143, SMC3, RAD21 and CTCF defined by ChIP-seq assays in
HelaS3 are presented. Test (t1 and t2) regions (black boxes) and negative
control (nc1–5) regions (grey boxes) are shown. (b) 3C assays
anchored at the TBL1XR1
gene promoter reveal the interaction frequencies at a number of predicted
chromatin interactions in HelaS3 cells transfected with the siRNA control
(green bars). These interactions are diminished on silencing ZNF143 (grey bars). (c)
ChIP-qPCR assays against ZNF143 at the TBL1XR1 gene promoter (proximal) and distal site
(t1) mapping to the chromatin interactions are presented in HelaS3 cells
transfected with the siRNA control (green bars). The ChIP signal is
diminished on silencing ZNF143 (grey bars). (d) RT–qPCR assays
reveal the expression of the TBL1XR1 gene in HelaS3 cells transfected with
the siRNA control (green bars) and on silencing ZNF143 (grey bars).
(e–h) Similar to a–d but for
the EEF1A1 gene
locus. The P value is derived from a t-test;
*P≤0.05; **P≤0.01. t1=test region (black boxes);
nc1–4=negative control regions (grey boxes). Error bars indicate the
s.e.m. Experiments were performed in triplicate. rel., relative.
The global depletion of ZNF143
induced by silencing its expression using siRNAs can indirectly impact chromatin
interactions. To bypass this limitation, we identified SNPs inducing
allele-specific binding of ZNF143 to the chromatin and determined their impact on
chromatin interactions. We first identified SNPs heterozygous in GM12878 cells
found at ZNF143-bound sites
using the genotype data provided by the 1,000 Genomes Project (ref. 37).
Using our allele-specific binding from ChIP-seq (ABC) tool (see Methods), we
then identified 28 SNPs displaying an allele-specific bias in the ZNF143 ChIP-seq reads from GM12878 cells
(P<0.005). Two SNPs, rs2232015 and rs13228237, located within the
promoter of the protein arginine
methyltransferase 6 (PRMT6) and the first intron of the zinc-finger CCCH-type antiviral 1
(ZC3HAV1)
genes, respectively (Fig. 4), were in close proximity
(~300 bp) to restriction sites for HindIII (enzyme used in the 3C
assay). The rs2232015 SNP maps to the fourth position of the ZNF143 DNA recognition sequence (motif
1; Fig. 4a), the most prominent motif, found within
~85% of the top 500 sites. The rs13228237 SNP changes the fourteenth
position of a reported extension of a ZNF143 DNA recognition sequence (refs 22, 38) (motif
2; Fig. 4b), which is found within ~25% of the top
500 sites. Consistent with the observation that the actual ZNF143-binding sites are located at
gene promoters, ~43% and ~76% of gene promoters
(±2.5 kb of the transcription start site) bound by ZNF143 were found to contain motif 1 and
motif 2 (motif P values <1 × 10⁻⁴), respectively, in
GM12878 cells. Interestingly, motif 2 appears to be the most
prominent ZNF143 motif found
at gene promoters and most closely resembles the ZNF143 motif characterized using in
vitro methods (refs 22, 39). The imposed changes to the DNA
sequence based on the position-weighted matrix predict preferential binding of
ZNF143 to the reference A
and the variant C allele of the rs2232015 and rs13228237 SNPs, respectively,
compared with the other alleles (Fig. 4a,b). In agreement,
242 reads from the ZNF143
ChIP-seq data, mapping to the rs2232015 SNP, contain the reference A allele and
136 reads contain the variant T allele (P=5.47 × 10⁻⁸; Fig. 4c). Likewise, of the
25 reads mapping to the rs13228237 SNP, five contain the reference G allele and
20 contain the variant C allele (P=4.08 × 10⁻³; Fig. 4d). Importantly, the signal intensity of the
ZNF143-binding site
containing the rs13228237 SNP is high (n=175) indicating that this SNP
falls within the centre of the inferred ZNF143-binding site and between the positive and negative
strand peaks of the unprocessed ChIP-Seq reads (Fig. 4d).
Allele-specific ChIP-quantitative PCR (qPCR) assays against ZNF143 in GM12878 cells validated the
predicted allelic imbalance for both SNPs (Fig. 4e,f and
Supplementary Fig. 3).
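The allelic read counts quoted above lend themselves to a simple worked check. The sketch below assumes a two-sided exact binomial test against an expected allele ratio of 0.5; whether the ABC tool uses exactly this test is an assumption, since the tool is only referred to the Methods here, and the function name is hypothetical. With 5 of 25 reads carrying the reference allele at rs13228237, this test gives about 4.08 × 10⁻³, in line with the reported value.

```python
from math import comb

def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial P value.

    Sums the probability of every outcome that is no more likely than
    observing k successes in n trials (assumed test, see lead-in).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= threshold))

if __name__ == "__main__":
    # rs13228237: 5 reference-allele reads out of 25 (counts from the text).
    print(f"{binomial_two_sided(5, 25):.2e}")   # prints 4.08e-03
    # rs2232015: 136 variant-allele reads out of 242 + 136 = 378.
    print(f"{binomial_two_sided(136, 378):.2e}")
```

Because the expected ratio is 0.5, the test is symmetric, so counting reference or variant reads as successes gives the same P value.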
Consistent with ZNF143 being
directly responsible for chromatin loop formation, the decreased binding of
ZNF143 to the chromatin
caused by the variant allele at the rs2232015 SNP leads to a corresponding
allele-specific reduction of the chromatin interaction frequency measured by 3C
assays between the PRMT6 promoter and a distal regulatory element
~85 kb away (Fig. 4e and Supplementary Fig. 3). Interestingly, the
rs2232015 SNP modulates a portion of the ZNF143 recognition motif that is shared with THAP11 and was recently shown in
vitro to be dispensable for ZNF143 binding (ref. 22). These results, while
revealing that ZNF143 is
required, may indicate that a complex of factors specifies chromatin interactions.
Similarly, the increased binding of ZNF143 to the chromatin caused by the variant C allele of
the rs13228237 SNP leads to an increase in the chromatin interaction frequency
between the first intron of the ZC3HAV1 gene and two distal regulatory elements
located ~200 kb away (Fig. 4f and Supplementary Fig. 3).
Interestingly, this ZNF143-binding site is located ~14 kb from the
transcription start site of the ZC3HAV1 gene and may correspond to the promoter of an unknown isoform of
the ZC3HAV1 gene.
Consistently, a transcription start site was predicted from 5′ cap
analysis of gene expression data 89 bp from the rs13228237 SNP in GM12878 cells by
the ENCODE project (Supplementary Fig.
4). Expression quantitative trait loci (eQTL) analysis of the
rs2232015 and rs13228237 SNPs using RNA-Seq data from lymphoblastoid cells
(n=373) (ref. 40) genotyped as part of
the 1,000 Genomes Project (ref. 41) reveals that ZC3HAV1 expression is modulated
by the rs13228237 SNP in lymphoblastoid cells (P=1.73 × 10⁻³; Fig. 4f). However, the
rs2232015 SNP is not significantly associated with the expression of the
PRMT6 gene
(P=0.063; Fig. 4e). This coincides with a
repressed element and poised promoter chromatin state at the distal regulatory
element looping to the PRMT6 promoter in the GM12878 cells (Supplementary Fig. 5), which contrasts with
the active state at regulatory elements looping to the ZC3HAV1 promoter (Supplementary Fig. 5). Interestingly, the
rs2232015 SNP is in strong linkage disequilibrium
(r² ≥ 0.95) with two reported eQTLs captured by the
rs1762509 and rs9435441 SNPs (refs 42, 43). The rs1762509 and rs9435441
SNPs lead to allele-specific expression of the PRMT6 gene within liver
cells and monocytes, respectively (refs 42, 43). Consistently, the
interacting distal regulatory element looping to the PRMT6 promoter is in an active
state within liver cells (Supplementary
Fig. 5). This suggests that chromatin interactions are not sufficient
to impact gene expression, as recently reported at the β-globin locus (ref. 44), and that the role of ZNF143 in loop formation is not dependent on gene
transcription.
Figure 4
Genetic variants modulate ZNF143 binding to the chromatin, changing the frequency of chromatin interactions.
(a) Position of the rs2232015 SNP with regard to one of the
ZNF143 DNA
recognition sequences (motif 1). (b) Position of the rs13228237 SNP with
regard to the second ZNF143 DNA recognition sequence (motif 2). (c)
Location of the rs2232015 SNP with respect to the binding profiles of
ZNF143, SMC3, RAD21 and CTCF in GM12878 cells (left panel).
Allele-specific bias in the ZNF143 ChIP-seq reads at the rs2232015 SNP is shown
(right panel). The number of reads mapping to the positive strand (solid
grey) and negative strand (dashed grey) are also shown for both the
reference and variant allele. (d) Same as for c but for the
rs13228237 SNP. (e) Allele-specific ChIP-qPCR against ZNF143 and 3C-qPCR results at the
rs2232015 SNP in GM12878 cells are presented. Bar charts illustrate the bias
in the allele ratio for the rs2232015 SNP in both assays. Error bars
indicate the s.e.m. Experiments were performed in triplicate. Results from
the eQTL analysis in lymphoblastoid cells are presented for the
PRMT6 gene.
Expression values are plotted by genotype. Expression values are presented
as probabilistic estimation of gene expression residuals (PEER) normalized
reads per kilobase of transcript per million mapped reads (RPKM). (f)
Same as in e but relevant to the rs13228237 SNP. Error bars indicate
the s.e.m.
Discussion
Cellular identity is dependent on lineage-specific transcriptional programmes set by
master transcription factors acting at regulatory elements that communicate with one
another through chromatin interactions (ref. 1). Recently, the ENCODE
project (ref. 17) observed well-positioned and symmetrical nucleosomes
flanking the binding sites of CTCF, RAD21
and SMC3, which contrasted with the
variability observed surrounding the binding sites of other transcription factors
with the exception of ZNF143
(ref. 17). In agreement with this observation
representing a unique feature of chromatin-looping factors, we demonstrate that
ZNF143 is required at
promoters to stimulate the formation of chromatin interactions with distal
regulatory elements (Fig. 5). This aligns with its reported
role favouring POL2 occupancy at gene promoters (ref. 22) and in the assembly
of the pre-initiation complex (ref. 23). The fact that ZNF143 is ubiquitously expressed (ref. 21) suggests that ZNF143 may be a regulator of the architectural foundations of
cell identity. Although the mechanisms accounting for cell type-specific
ZNF143-binding profiles are
unknown, chromatin interactions were recently reported to be set early during
lineage commitment (ref. 6). In agreement, ZNF143 is required for zebrafish embryo
development (ref. 45), for stem cell identity and for the self-renewal
ability of human embryonic stem cells (refs 46, 47). Altogether, our results
reveal that ZNF143 directly binds
promoters to secure chromatin interactions with distal regulatory elements.
ZNF143 provides a
sequence-dependent mechanism for the formation of chromatin interactions that can be
modulated by genetic variants underlying inherited traits and diseases.
Figure 5
Schematic representation of chromatin interactions involving gene promoters.
ZNF143 contributes to the
formation of chromatin interactions by directly binding the promoter of
genes, establishing looping with distal elements bound by CTCF.
Methods
Co-localization of transcription factor binding
Focusing on DHS sites identified by the Hotspots algorithm (ref. 48) in
either GM12878 or K562 cells, we extracted the ChIP-seq signal from model
shifted wiggle files (ref. 49) for ≥75 transcription factors
profiled in these cell lines. All these data sets are available through the
ENCODE project (refs 17, 50). Transcription factor ChIP-seq files were
downloaded from the ENCODE Data Coordination Center website (http://genome.ucsc.edu/ENCODE/), specifically from the
ENCODE/Stanford/Yale/USC/Harvard/ (GEO accession numbers: GSM935277; GSM935283;
GSM935294; GSM935301; GSM935309 to
GSM935311; GSM935316; GSM935319;
GSM935330 to GSM935332; GSM935336 to
GSM935338; GSM935340; GSM935343 to
GSM935345; GSM935349; GSM935355;
GSM935356; GSM935358; GSM935361;
GSM935363; GSM935368; GSM935371 to
GSM935378; GSM935385; GSM935386;
GSM935388; GSM935390 to GSM935394;
GSM935401; GSM935402; GSM935407;
GSM935409 to GSM935415; GSM935417;
GSM935419; GSM935420; GSM935422;
GSM935425; GSM935427 to GSM935431;
GSM935433; GSM935439; GSM935442;
GSM935450; GSM935464; GSM935466 to
GSM935475; GSM935478 to GSM935483;
GSM935487; GSM935488; GSM935490;
GSM935492; GSM935494 to GSM935497;
GSM935499 to GSM935507; GSM935516;
GSM935518; GSM935520; GSM935521;
GSM935524; GSM935532; GSM935539 to
GSM935541; GSM935544; GSM935546 to
GSM935549; GSM935556 to GSM935559;
GSM935562; GSM935564; GSM935565;
GSM935568; GSM935569; GSM935573 to
GSM935576; GSM935583; GSM935594;
GSM935595; GSM935597 to GSM935602;
GSM935608; GSM935611 to GSM935613;
GSM935616; GSM935618; GSM935631 to
GSM935634; GSM935642; GSM935645;
GSM935651 to GSM935653; GSM1003602 to
GSM1003605; GSM1003608 to GSM1003611;
GSM1003616; GSM1003617; GSM1003620 to
GSM1003622; GSM1003625; GSM1003634;
GSM803338; GSM803341; GSM803342;
GSM803346; GSM803347; GSM803349 to
GSM803352; GSM803355; GSM803356;
GSM803362; GSM803363; GSM803378 to
GSM803380; GSM803383; GSM803384;
GSM803386 to GSM803392; GSM803401;
GSM803402; GSM803406 to GSM803408;
GSM803410; GSM803411; GSM803413;
GSM803414; GSM803416; GSM803420;
GSM803431; GSM803434; GSM803436;
GSM803439; GSM803440 to GSM803443;
GSM803446; GSM803447; GSM803468 to
GSM803471; GSM803473; GSM803477;
GSM803485; GSM803494; GSM803496;
GSM803504; GSM803505; GSM803508 to
GSM803511; GSM803515; GSM803520;
GSM803523 to GSM803525; GSM803531 to
GSM803534; GSM803537; GSM803538;
GSM803540; GSM1010721; GSM1010722;
GSM1010729 to GSM1010732; GSM1010744;
GSM1010745; GSM1010760; GSM1010771;
GSM1010779; GSM1010780; GSM1010782;
GSM1010820; GSM1010850; GSM1010867;
GSM1010877; GSM1010878; GSM1010881;
GSM1010890; GSM1010893 to GSM1010895;
GSM1010906) production laboratories. We
converted all.bam files to wiggle files (.wig files) using MACs 1.4 (ref.
49). DHS sites called by the Hotspots algorithm
for each cell line were downloaded from the ENCODE Data Coordination Center
website (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwDnase/).The maximum signal intensity value of each transcription factor across all DHS
sites created the vectors used for the Pearson correlation (r)
calculation. Hierarchical clustering was then performed on the resulting
correlation matrix using average linkage and 1−r as the distance
metric. The input control was included in the analysis as a negative control. All
transcription factors with binding profiles that clustered with the control were
dismissed from the final figure. Since we correlated the binding profiles across
regions of open chromatin, this analysis removes not only failed experiments but also
factors that bind to heterochromatin. This analysis was performed using the
first replicate for all transcription factors.
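The clustering step described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names and the toy signal vectors are hypothetical, and each vector is assumed to hold the maximum signal of one factor at each DHS site, in the same site order.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length vectors.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles):
    """Agglomerative clustering using average linkage and 1 - r as distance.

    profiles: dict mapping a factor name to its signal vector across DHS sites.
    Returns the list of merges as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of factor names.
    """
    # Pairwise distance between individual profiles: 1 - Pearson r.
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Toy example with hypothetical values: two similar factors and an input control.
example = {
    "TF_A": [1, 2, 3, 4],
    "TF_B": [2, 4, 6, 9],
    "input": [4, 1, 3, 1],
}
for left, right, distance in average_linkage(example):
    print(left, right, round(distance, 3))
```

In this toy example the two similar profiles merge first and the input control joins last, which mirrors how factors clustering with the control can be recognized and set aside.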
Transcription factor-binding sites across chromatin states
The genomic annotations of chromatin states derived by ChromHMM (ref. 51) in
GM12878 and K562 cell lines were downloaded from the UCSC genome browser website
(http://genome.ucsc.edu).
The intersection between the genomic annotations and the summits of the binding sites
for transcription factors was determined using the BEDTools software
package (ref. 52).
Enrichment of transcription factor binding at looping sites
Carbon-copy chromatin conformation capture (5C) data sets generated in GM12878,
K562 and HelaS3 cell lines were downloaded from the ENCODE Data Coordination
Center website (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUmassDekker5C/)
(GEO Accession numbers: GSM970499, GSM970500, GSM970497).
The proportion of paired-end tags (PET) where both interacting anchors overlap
transcription factor-binding sites (peak files) was determined using a custom
Perl script. The significance of this overlap was assessed against 1,000
simulated random-matched binding sets (RMBSs) for each transcription factor.
Each simulated RMBS matched the experimental set in chromosome distribution,
absolute number, and size of the binding sites. We randomly selected binding
sites of equal or greater size, trimming larger sites, from the complete set of
all possible binding sites defined by the union of all reported binding sites
for all transcription factors in a given cell line provided by the ENCODE
project. Therefore, the probability of selecting a given binding site was equal
to its observed frequency in all of the profiled transcription factors.
Two-tailed P values were calculated from z scores using the
generated null distributions.
This analysis was performed using the first replicate for all transcription
factors; when multiple groups profiled the same factor, the first replicate
from the larger data set was used.
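The enrichment test described above can be sketched as follows. This is a simplified illustration written in Python rather than the custom Perl script used in the study: all function names, the seed and the toy coordinates are hypothetical, and the simulated sets here match only the number of binding sites, whereas the RMBSs described above also match chromosome distribution and site size.

```python
import math
import random
import statistics


def overlaps(anchor, sites):
    """True if the (chrom, start, end) anchor overlaps any binding site."""
    chrom, start, end = anchor
    return any(c == chrom and s < end and start < e for c, s, e in sites)


def paired_overlap_fraction(interactions, sites):
    """Fraction of interactions whose two anchors both overlap a binding site."""
    hits = sum(overlaps(a, sites) and overlaps(b, sites) for a, b in interactions)
    return hits / len(interactions)


def z_to_two_tailed_p(z):
    """Two-tailed P value of a z score under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def rmbs_test(interactions, observed_sites, site_pool, n_sim=1000, seed=0):
    """Compare the observed overlap with a null built from random site sets.

    Simplification (assumption): each simulated set is a random sample of
    len(observed_sites) sites drawn from site_pool, the union of all sites.
    """
    rng = random.Random(seed)
    observed = paired_overlap_fraction(interactions, observed_sites)
    null = [
        paired_overlap_fraction(interactions, rng.sample(site_pool, len(observed_sites)))
        for _ in range(n_sim)
    ]
    mu, sd = statistics.mean(null), statistics.pstdev(null)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    z = (observed - mu) / sd
    return observed, z, z_to_two_tailed_p(z)


# Toy data with hypothetical coordinates: two interactions, four sites on anchors.
interactions = [
    (("chr1", 100, 200), ("chr1", 1000, 1100)),
    (("chr1", 300, 400), ("chr1", 2000, 2100)),
]
observed_sites = [
    ("chr1", 150, 160), ("chr1", 1050, 1060),
    ("chr1", 350, 360), ("chr1", 2050, 2060),
]
background = [("chr1", 10000 + 100 * i, 10010 + 100 * i) for i in range(16)]
site_pool = observed_sites + background

observed, z, p = rmbs_test(interactions, observed_sites, site_pool)
print(observed, z, p)
```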
Identification of uniquely expressed genes
RNA-Seq data for the three cell lines, in four replicates, were downloaded from
NCBI Gene Expression Omnibus (GEO accession numbers: GSM591661; GSM591673;
GSM591664; GSM591664; GSM958728;
GSM958730; GSM591670; GSM591671;
GSM591682; GSM591659; GSM765402;
GSM767848; GSM883635; GSM672833;
GSM591666; GSM591668; GSM591679;
GSM591660; GSM958729, Supplementary
Table 1). Reads were aligned to the human genome hg19 using the TopHat
software tool version 2 (ref. 53). To identify
genes that are uniquely expressed in each of the three cell lines, we used the
Cufflinks software tool version 2.1.1 (ref. 53).
First, we filtered all genes that have an FPKM (fragments per kilobase of exon
per million fragments mapped) value equal to 0 (no expression) in all three cell
lines. Next we identified genes that are unique to each cell line (expressed in
one cell line and not in the others) and genes found to be expressed in more
than one cell line (commonly expressed genes). To identify differentially
expressed genes between the three cell lines, we first performed pairwise
comparisons (K562-HelaS3, K562-GM12878 and HelaS3-GM12878). We then performed
one-versus-two comparisons to identify genes differentially expressed in one specific
cell line compared with the others.
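The FPKM-based filtering described above can be sketched as follows. This is a minimal illustration that covers only the classification into unexpressed, unique and commonly expressed genes, not the differential expression testing; the function name, gene identifiers and FPKM values are hypothetical.

```python
def classify_genes(fpkm):
    """Split genes into cell line-unique and commonly expressed sets.

    fpkm: dict mapping gene -> dict mapping cell line -> FPKM value.
    Returns (unique, common), where unique maps each cell line to the genes
    expressed only in that cell line, and common lists genes expressed in
    more than one cell line. Genes with FPKM equal to 0 everywhere are dropped.
    """
    unique = {}
    common = []
    for gene, values in fpkm.items():
        expressed_in = [cell for cell, value in values.items() if value > 0]
        if not expressed_in:
            continue  # no expression in any cell line: filtered out
        if len(expressed_in) == 1:
            unique.setdefault(expressed_in[0], []).append(gene)
        else:
            common.append(gene)
    return unique, common


# Toy FPKM table with hypothetical gene identifiers and values.
fpkm = {
    "gene_a": {"K562": 12.0, "GM12878": 0.0, "HelaS3": 0.0},
    "gene_b": {"K562": 3.1, "GM12878": 4.0, "HelaS3": 0.0},
    "gene_c": {"K562": 0.0, "GM12878": 0.0, "HelaS3": 0.0},
    "gene_d": {"K562": 0.0, "GM12878": 0.0, "HelaS3": 7.5},
}
unique, common = classify_genes(fpkm)
print(unique, common)
```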
Predicting chromatin interactions
We predicted chromatin interactions using an intercellular feature correlation
(IFC) approach similar to PreSTIGE (http://prestige.case.edu; ref. 54) and others (refs 19, 55, 56) to calculate the Pearson correlation coefficient
(r) between two DHS sites based on the DNaseI hypersensitivity
signals generated by DNase-seq across all cell lines available from the ENCODE
project (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwDnase/;
GEO accession numbers: GSM736491 to GSM736639). To provide cell type specificity to the
correlation analysis, we calculated the correlation coefficient, using DNase-seq
data sets from all available cell lines, only at DHS sites identified by the
hotspots algorithm (ref. 48) for K562, GM12878 or HelaS3 independently.
This yields correlated DHS sites for K562, GM12878 or HelaS3 cells, respectively.
We also restricted our analysis to ±500 kb surrounding the DHS
anchor site that contained our region of interest (promoter or ZNF143-binding site).
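The IFC calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, site identifiers and signal values are hypothetical, each site is represented by a single midpoint coordinate, and the correlation cut-off `min_r` is an illustrative parameter that is not taken from the text.

```python
import math


def pearson(x, y):
    """Pearson r between two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def predict_interactions(anchor, sites, window=500_000, min_r=0.7):
    """Correlate the anchor DHS signal with nearby DHS sites across cell lines.

    sites: dict mapping site id -> (chrom, midpoint, signal per cell line).
    Only sites on the same chromosome and within +/- window bp are tested.
    min_r is a hypothetical threshold used for illustration only.
    Returns a list of (site id, r) sorted by decreasing r.
    """
    chrom, position, signal = sites[anchor]
    predicted = []
    for name, (c, pos, other_signal) in sites.items():
        if name == anchor or c != chrom or abs(pos - position) > window:
            continue
        r = pearson(signal, other_signal)
        if r >= min_r:
            predicted.append((name, r))
    return sorted(predicted, key=lambda item: item[1], reverse=True)


# Toy data: DNase-seq signal at each site across four hypothetical cell lines.
sites = {
    "promoter": ("chr1", 1_000_000, [1, 5, 2, 8]),
    "dhs_near": ("chr1", 1_200_000, [2, 10, 4, 16]),
    "dhs_anti": ("chr1", 900_000, [8, 2, 5, 1]),
    "dhs_far": ("chr1", 2_000_000, [1, 5, 2, 8]),
    "dhs_other_chrom": ("chr2", 1_000_100, [1, 5, 2, 8]),
}
print(predict_interactions("promoter", sites))
```

Restricting the test to the ±500 kb window keeps the number of correlations per anchor small and excludes distant sites whose signals correlate for unrelated reasons.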
Allele-specific transcription factor binding
To call SNPs displaying an allele-specific bias in transcription factor binding,
we developed a software tool, which we refer to as allele-specific binding from
ChIP-seq (ABC). ABC directly compares differences in read abundance between
reference and variant alleles using a binomial probability test at heterozygous
SNPs identified by genotyping. The genotype information for the GM12878
(NA12878) cell line was downloaded from the 1000 Genomes Project’s
website (www.1000genomes.org). The ABC approach relies on the number of
aligned reads and has the highest power to detect an allelic imbalance at
the edges of an identified binding site, or at the maxima of each strand-specific
peak, obtained from single-end reads, because of the technical biases created by
short-read sequencing of the ends of ChIP fragments. Thus, ABC also aims to
determine the location of a particular genetic variant within a given binding
site by assessing the strand distribution of reads containing the two alleles.
This is not to be confused with the strand bias test applied by genotyping
algorithms (ref. 57): unlike for genomic DNA, the null expectation
of equal coverage of a particular genetic variant by reads in both orientations
does not hold for reads derived from ChIP-seq assays. In addition, a position bias
where the alleles of a genetic variant are not equally distributed along the
length of the reads spanning it can be used to identify potential false-positive
allele-specific binding or potential transcription factor repositioning events.
ABC currently applies the Mann–Whitney U-test to assess a
potential position bias. SNPs violating the position test are dismissed.
We restricted our analysis to heterozygous SNPs reported in GM12878 by the 1000
Genomes Project (ref. 58). We prioritized SNPs mapping to the
ZNF143 DNA recognition
sequence. ABC was then employed to identify heterozygous SNPs leading to
observable allele-specific biases in the sequencing reads obtained from ChIP-seq
assays against ZNF143 within
GM12878 cells. Finally, we filtered out SNPs found within repetitive regions and
known segmental duplications because these variables can confound
allele-specific analyses. The ABC code used to identify SNPs causing
allele-specific binding can be accessed via GitHub (https://github.com/mlupien/ABC).
For the eQTL analysis, RNA-seq data from lymphoblastoid cell lines from the
1000 Genomes Project (ref. 41) were obtained from Lappalainen et
al. (ref. 40) (EBI ArrayExpress accessions: E-GEUV-1, E-GEUV-2,
E-GEUV-3). A total of 373 European individuals from four populations (CEPH (CEU), Finns
(FIN), British (GBR) and Toscani (TSI)) were stratified by SNP genotype.
Reads per kilobase of transcript per million mapped reads (RPKMs), normalized by probabilistic estimation of gene expression residuals (PEER; ref. 59), were
associated with SNP genotype using linear regression.
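The binomial comparison of reference and variant allele read counts that ABC applies at heterozygous SNPs can be sketched as follows. This is not the ABC implementation (see the GitHub repository above for that); the function name is hypothetical, and the two-sided form of the test and the null probability of 0.5 for each allele are assumptions made for illustration.

```python
from math import comb


def binomial_two_sided_p(ref_reads, alt_reads, p_ref=0.5):
    """Exact two-sided binomial P value for an allelic read imbalance.

    Assumptions: under the null hypothesis each read carries the reference
    allele with probability p_ref (0.5 at a heterozygous SNP), and the
    P value sums the probabilities of all outcomes that are no more likely
    than the observed one.
    """
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p_ref ** k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance so that ties survive floating-point rounding.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical read counts at three heterozygous SNPs: (reference, variant).
for ref_reads, alt_reads in [(5, 5), (2, 8), (10, 0)]:
    print(ref_reads, alt_reads, binomial_two_sided_p(ref_reads, alt_reads))
```

An exact test is used in this sketch because read counts at individual SNPs are often small, where normal approximations are unreliable.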
Cell culture and transfection
GM12878, K562 and HelaS3 cells were grown in RPMI (15% FBS), RPMI (10% FBS)
and DMEM (10% FBS) media, respectively. For siRNA transfection, HelaS3 cells
were transfected with scrambled siRNA (siNC) or siZNF143 using Lipofectamine RNAiMAX (Life
Technologies, 13378). RNA was extracted 72 h after
transfection using the Qiagen RNeasy Kit
(Qiagen, 74104). Pre-verified Silencer Select
siRNAs (Life Technologies, s15192 and s15194) targeting ZNF143 were used:
5′-GCAGAUUGUUUUACAAGGA-3′ and 5′-CGGUCGGUCCUUUACAACA-3′.
The GM12878 cells were obtained from the Coriell Institute for Medical Research
(www.coriell.org; Catalogue
ID GM12878). The K562 and HelaS3 cells were obtained from the American Type
Culture Collection (ATCC) (www.atcc.org; ATCC number CCL-243 and ATCC number CCL-2.2,
respectively).
Chromatin conformation capture (3C) assay
Chromosome conformation capture (3C) assays were performed as we previously
described (ref. 60). In brief, cells were counted and balanced to the
same number (six million) before the 3C experiments to allow for comparison
between different cell types or treatments. Cells were crosslinked and lysed.
Chromatin was digested using 400 units of HindIII, followed by ligation with
4,000 units of T4 DNA ligase (NEB M0202S). Crosslinks were reversed by Qiagen
proteinase K digestion. 3C products were purified by phenol–chloroform
extraction, followed by qPCR. To control for random digestion, ligation and
different primer efficiencies, randomly ligated DNA fragments within the tested
loci were generated as previously described (refs 61–68). A standard curve for the Ct
value of each 3C primer pair, anchor and bait, was generated from these
randomly ligated DNA fragments. The 3C frequency of each primer pair was
normalized to its corresponding standard curve and was further normalized to
a loading control, primers hybridizing to the genomic region of the RHO
gene. Primers used are listed in Supplementary Table 2.
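The two normalization steps described above can be sketched as follows. This is one common way to implement standard-curve quantification and is not necessarily the exact procedure used in the study: the function names, the dilution amounts and the Ct values are hypothetical, and the standard curve is assumed to be a least-squares line relating Ct to the log10 of the template amount.

```python
import math


def fit_standard_curve(amounts, ct_values):
    """Least-squares fit of Ct = intercept + slope * log10(amount).

    amounts: relative amounts of randomly ligated control template
    (a hypothetical dilution series); ct_values: the measured Ct values.
    Returns (slope, intercept).
    """
    xs = [math.log10(a) for a in amounts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ct_values) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ct_values)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to convert a Ct value into a relative amount."""
    return 10 ** ((ct - intercept) / slope)


def normalized_3c_frequency(ct_3c, curve_3c, ct_control, curve_control):
    """3C product amount divided by the loading-control amount."""
    return quantity_from_ct(ct_3c, *curve_3c) / quantity_from_ct(ct_control, *curve_control)


# Hypothetical tenfold dilution series and Ct values for one primer pair.
curve = fit_standard_curve([1, 10, 100], [30.0, 27.0, 24.0])
print(curve)
# The same curve is reused for the loading control purely to keep the example short.
print(normalized_3c_frequency(27.0, curve, 24.0, curve))
```

Fitting a separate curve for each primer pair, as described above, corrects for differences in primer efficiency before the loading control adjusts for differences in input DNA.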
Chromatin immunoprecipitation (ChIP)
ChIP followed by qPCR was performed as we previously described (ref. 33).
In brief, cells were crosslinked and lysed. Chromatin was sonicated and
immunoprecipitated with anti-ZNF143 (Novus Biologicals H00007702-M01), followed by
reverse crosslinking and DNA extraction. Four μg of anti-ZNF143 was used per five million cells
in each experiment. For ChIP assays after siRNA treatment, cells were harvested
72 h after transfection. The number of cells was counted and balanced
before ChIP. Primers used are listed in Supplementary Table 2.
Gene expression
RNA was isolated from HelaS3 cells using the QIAGEN RNeasy mini kit according to
manufacturer’s recommendations. The purified RNA was treated with DNaseI
to remove any possible DNA contamination. Reverse transcription (RT) was
performed to convert RNA into cDNA using an ABI high-capacity cDNA reverse
transcription kit. The expression level of the queried genes was quantified by
qPCR (RT–qPCR), as previously described (ref. 60). Primers used
are listed in Supplementary Table 2.
In vivo allele-specific ChIP assay
In vivo allele-specific ChIP assays were performed as we previously
described (ref. 69). In brief, anti-ZNF143-immunoprecipitated DNA and genomic
input DNA were amplified by qPCR using allele-specific mismatch amplification
mutation assay primers (ref. 70) to reveal the relative level of
enrichment for each allele. To confirm the allele specificity, the PCR products
from anti-ZNF143-immunoprecipitated and genomic input DNA were sequenced by Sanger sequencing.
Primers used are listed in Supplementary Table 2.
In vivo allele-specific 3C assay
In vivo allele-specific 3C assays were performed as we previously
described (ref. 69). A forward primer hybridizing to a sequence
outside of each SNP and its closest HindIII restriction enzyme site was used to
target each SNP region. A reverse primer hybridizing to a sequence close to the
HindIII site from the distal site was used to target the distal interacting
region. Each primer pair was used to amplify the HindIII 3C product from GM12878
cells. The amplified 3C products were assessed by qPCR, using allele-specific
mismatch amplification mutation assay primers, to determine the relative level
of each allele of the SNP involved in the chromatin loop. Allele specificity was
further demonstrated through Sanger sequencing of the amplified 3C products.
Additional information
How to cite this article: Bailey, S. D. et al. ZNF143 provides sequence specificity to
secure chromatin interactions at gene promoters. Nat. Commun. 6:6186 doi:
10.1038/ncomms7186 (2015).