Sreejana Ray1, Desiree Tillo1,2, Nima Assad1, Aniekanabasi Ufot1, Aleksey Porollo3,4, Stewart R Durell5, Charles Vinson1. 1. Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Room 5000, Building 37, Bethesda, Maryland 20892, United States. 2. Cancer Genetics Branch, National Cancer Institute, National Institutes of Health, Building 37, Bethesda, Maryland 20892, United States. 3. Center for Autoimmune Genomics and Etiology, Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio 45229, United States. 4. Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio 45267, United States. 5. Laboratory of Cell Biology, National Cancer Institute, National Institutes of Health, Building 37, Bethesda, Maryland 20892, United States.
Abstract
Zta, the Epstein-Barr virus bZIP transcription factor (TF), binds both unmethylated and methylated double-stranded DNA (dsDNA) in a sequence-specific manner. We studied the contribution of a conserved asparagine (N182) to sequence-specific dsDNA binding to four types of dsDNA: (i) dsDNA with cytosine in both strands (DNA(C|C)), (ii, iii) dsDNA with 5-methylcytosine (5mC, M) or 5-hydroxymethylcytosine (5hmC, H) in one strand and cytosine in the second strand (DNA(5mC|C) and DNA(5hmC|C)), and (iv) dsDNA with methylated cytosine in both strands in all CG dinucleotides (DNA(5mCG)). We replaced asparagine with five similarly sized amino acids (glutamine (Q), serine (S), threonine (T), isoleucine (I), or valine (V)) and used protein binding microarrays to evaluate sequence-specific dsDNA binding. Zta preferentially binds the pseudo-palindrome TRE (AP1) motif (T-4G-3A-2G/C0T2C3A4). Zta(N182Q) changes binding to A3 in only one half-site. Zta(N182S) changes binding to G3 in one or both halves of the motif. Zta(N182S) and Zta(N182Q) have 34- and 17-fold weaker median dsDNA binding, respectively. Zta(N182V) and Zta(N182I) have increased binding to dsDNA(5mC|C). Molecular dynamics simulations rationalize some of these results, identifying hydrogen bonds between glutamine and A3, but do not reveal why serine preferentially binds G3, suggesting that entropic interactions may mediate this new binding specificity.
The bZIP motif occurs in a family of eukaryotic proteins with sequence-specific
dsDNA binding properties.[1,2] The bZIP domain is a
long bipartite alpha helix.[1] The C-terminal portion
is a leucine zipper coiled-coil region that homodimerizes and/or heterodimerizes.[3−7] The N-terminal portion interacts with the major groove of dsDNA in a sequence-specific
manner. Five conserved amino acids in the bZIP dsDNA-binding region
interact with the nucleotides (NXXASXXCR).[8−10] We will focus
on the invariant asparagine (N). Asparagine can form two hydrogen
bonds with adenine.[11,12] In the bZIP motif, the asparagine[2] forms two hydrogen bonds with adjacent nucleotides
on opposite strands of dsDNA.

In all X-ray structures where
the bZIP dimer binds the T trinucleotide, including
the CREB complex with the CRE dsDNA palindrome (T),[13] and GCN4,[3] Fos-Jun,[4] and Zta[10] binding
the shorter pseudo-palindrome TRE (T), the asparagine side-chain
carbonyl oxygen accepts a hydrogen from N4 of C and the amide nitrogen donates a hydrogen to O4 of T on the opposite strand.[14]

Asparagine in the bZIP domain of CEBPA binding to dsDNA recognizes
the A dinucleotide in the motif T.[15] Asparagine has a hydrogen bond
interaction with O4 of T as observed
for all the CRE and TRE structures, but now, the asparagine side-chain
carbonyl oxygen accepts a hydrogen from the N6 amino group of A that is stereochemically similar to N4 of C.[15] The Pap1-dsDNA
complex binds the T dinucleotide
(TAACGTT),[8] and
the asparagine does not contact any of the nucleotides. Figure S1 describes the interaction of these
bZIP domains with their preferred DNA motifs and illustrates the differences
among them.

Mutagenesis experiments using a few dsDNA sequences
have determined
that the conserved asparagine contributes to sequence-specific dsDNA
binding.[16] In addition, recent studies
show that Zta binds dsDNA with modified cytosine. Zta binds many TRE
variants with methylated cytosine (M) replacing thymine
at two positions (M and M).[10,17] Zta also binds dsDNA with the methylated C/EBP half-site (GM) or its oxidative product 5-hydroxymethylcytosine
(5hmC, H)[10,18] or 5-formylcytosine,[19] similar to CREB1.[20] The impact of the conserved asparagine on binding to methylated cytosine
is unknown.

We have extended these studies and examined five
mutants of the
conserved asparagine (Zta(N182Q), Zta(N182S), Zta(N182I), Zta(N182V),
and Zta(N182T)) binding four types of dsDNA including DNAs containing
modified cytosines using protein binding microarrays (PBMs).[18,21]
Results
PBMs with Four Types of dsDNA and Data Analysis
Protein binding microarray (PBM)
experiments used Agilent DNA microarrays
containing 40,330 different single-stranded DNA 60-mers[22] in 16 sectors on a glass microarray slide.[18] We examined the sequence-specific dsDNA binding
of six GST-Zta chimeric constructs, Zta and five single amino acid
mutants (Q, S, I, V, or T) of Zta(N182) to four types of dsDNA. T7
DNA polymerase was used to generate three types of dsDNAs: (i) dsDNA
with cytosine on both strands (DNA(C|C)) and (ii, iii) dsDNA with
5mC or 5hmC in one strand and cytosine in the second strand ((DNA(5mC|C)
and DNA(5hmC|C)).[18,20,23] The fourth type of dsDNA was generated by enzymatic methylation
of both cytosines in all CG dinucleotides ((DNA(5mCG)).[24] Equal amounts of in vitro synthesized
chimeric protein containing GST and bZIP domains (Figure S2) were bound to these four types of dsDNA.[18] Binding was detected using a Cy5 conjugated
antibody to the GST epitope followed by the measurement of fluorescence
intensities at each of the array features.

We evaluated the
PBM data in five ways. First, for an overall quantitative measure
of binding, we examined the fluorescence intensities of the 40,330
features in each of the 16 sectors on the glass slide. Second, we
examined the dependence of binding on nucleotide and dinucleotide
composition. Third, we determined a standardized score (Z-score)[25] for binding 8 bp long dsDNA
sequences (8-mers). The Z-score for a given 8-mer
is the number of standard deviations by which the intensity of that 8-mer
(computed from all features containing the 8-mer) differs from the
global median intensity (computed from all array features). This measure
is sensitive to changes in median binding. Fourth, we generated dsDNA
logos[18] using the 50 strongest bound 8-mers.
Fifth, we examined the effects of single nucleotide variants (SNVs)
on binding the canonical TRE (T) motif 7-mer and related 7-mers containing
one or no cytosines. This allows us to determine the contribution
to binding of all nucleotides, including modified cytosines, at each
position in the motif.[18] We first
describe the results of each of the five analyses for binding to unmodified
dsDNA and then for binding to the three dsDNAs containing modified cytosine.
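The 8-mer Z-score described above can be sketched in Python as follows. This is a minimal illustration and not the authors' pipeline: the function name `kmer_z_scores`, the use of the per-8-mer median intensity, and the use of the standard deviation of all feature intensities as the scale are assumptions made for the example, and merging of reverse complements is omitted.

```python
from collections import defaultdict
from statistics import median, pstdev


def kmer_z_scores(features, k=8):
    """Return {k-mer: Z-score} from (sequence, intensity) pairs.

    Assumed definition: (median intensity of features containing the
    k-mer - global median intensity) / standard deviation of all intensities.
    """
    by_kmer = defaultdict(list)
    for seq, intensity in features:
        # Use a set so each k-mer is counted once per feature.
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            by_kmer[kmer].append(intensity)

    all_intensities = [intensity for _, intensity in features]
    global_median = median(all_intensities)
    spread = pstdev(all_intensities)
    if spread == 0:
        raise ValueError("all features have the same intensity")

    return {
        kmer: (median(values) - global_median) / spread
        for kmer, values in by_kmer.items()
    }


# Toy data (hypothetical intensities, not measurements from the study).
toy_features = [("AAAAAAAAA", 10.0), ("CCCCCCCCC", 2.0), ("GGGGGGGGG", 2.0)]
toy_scores = kmer_z_scores(toy_features)
```

Because the score is centered on the global median, a shift in median (non-specific) binding changes every Z-score, which is why the measure is described as sensitive to median binding.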
Zta(N182) Mutants Binding with Unmodified dsDNA(C|C)
Binding to 40,330 Features Containing dsDNA(C|C)
We compared previously published data for Zta binding to unmodified
dsDNA[18] and newly generated data for Zta
and five Zta(N182) mutants. Figure S8A presents
newly determined data for Zta (x axis) and Zta(N182Q)
(y axis) binding the 40,330 dsDNA features on the
array, showing that maximal Zta(N182Q) binding is about half that of Zta,
with median binding (a measure of non-specific binding) being 17-fold
weaker (Table S3). Zta(N182Q) binds some
features more strongly than Zta, indicative of new sequence-specific dsDNA
binding (e.g., GTGAGTAA, TGAGTAAT, etc.)
(Figure S8A and Figure A). Zta(N182S) binding is about half that
of Zta, with median binding being 34-fold weaker (Figure S8B and Table S3). Again,
some features are more strongly bound by Zta(N182S) (e.g., TCACTCAT, ATCACTCA, ATGAGTGA, ATCACTGA, TCAGTGAC, etc.) than Zta, indicating new sequence-specific
dsDNA binding (Figure S8B and Figure B). Zta(N182T) binds
unmodified dsDNA features 3.3-fold weaker than Zta but with similar
specificity (Figure S8C and Figure C). Median binding to unmodified dsDNA
by the non-polar mutants Zta(N182I) and Zta(N182V) is similar to and
2.6-fold stronger than that of Zta, respectively (Table S3). However, they do not bind with specificity (Figure S11A–D), indicative of non-specific
binding and consistent with the importance of hydrogen bonds or other
electrically polarized interactions for affinity. The decreased median
binding of Zta(N182Q) and Zta(N182S) contributes to the increased
skew in the distribution of binding intensities, indicating increased
specific dsDNA binding (Figure S3 and Table S4).
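The two summary quantities used in this paragraph, the fold change in median binding and the skew of the intensity distribution, can be sketched as follows. The function names are hypothetical, and the moment-based skewness shown here is an assumption, since the text does not state which skew statistic was used.

```python
from statistics import mean, median, pstdev


def median_fold_change(reference, mutant):
    """Ratio of median intensities; values > 1 mean the mutant binds more weakly."""
    return median(reference) / median(mutant)


def skewness(values):
    """Moment-based skewness: mean of cubed standardized deviations."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        raise ValueError("skewness is undefined for constant data")
    return sum(((v - center) / spread) ** 3 for v in values) / len(values)


# Toy intensities (hypothetical): a low background with one strongly bound feature
# gives a positive skew, as expected when a few features stand out from the median.
background_with_outlier = [1.0, 1.0, 1.0, 1.0, 10.0]
toy_skew = skewness(background_with_outlier)
```

Lowering the median while keeping a few strongly bound features lengthens the right tail of the distribution, which is the effect the skew statistic captures.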
Figure 1
Zta, Zta(N182Q), Zta(N182S), and Zta(N182T) mutants binding to
DNA(C|C). (A) Scatter graph comparison of PBM 8-mer Z-scores of Zta (x axis) with Zta(N182Q) (y axis) for dsDNA features containing cytosine in both strands
(DNA(C|C)). dsDNA features and 8-mers are colored in several groups:
those containing the TRE (TGA) and TRE variants changing C to G in one or both half-sites. Specific 8-mers that are strongly
bound by the Zta(N182Q) mutant and those that are preferred by wild-type
Zta are highlighted, and their sequences are indicated with arrowheads.
(B) Same as in panel (A) but for Zta(N182S) (y axis). Features and 8-mers are colored in several groups: those
containing the TRE and variants changing C to A. Similarly, some 8-mers
that are preferred by the Zta(N182S) mutant and some that are
preferred by the wild type are highlighted. (C) Same as in panel (A) but for Zta(N182T)
(y axis). Features and 8-mers are colored in several
groups.
Binding to Mono- and Dinucleotides
An intriguing aspect of this analysis
is that median dsDNA binding
is dramatically weakened for Zta(N182Q) and Zta(N182S). Table S3 highlights that the weakening of median
binding is only observed for Zta(N182Q) and Zta(N182S) binding DNA(C|C)
and DNA(5mCG). To evaluate the reduction in median binding of Zta(N182Q)
and Zta(N182S), we examined binding to mono- and dinucleotides, hypothesizing
that asparagine may bind the adenine mononucleotide or the CA dinucleotide as observed in several protein-dsDNA structures.[12,26,27] Each feature contains a 60-mer
where 24 nucleotides near the glass surface are common to each feature,
and the remaining 36 nucleotides are variable. We counted the occurrences
of each mono- and dinucleotide in the variable 36-mer for the weakest-bound
90% of features (Figures S12 and S13). For Zta, the more A:T base pairs, the stronger the binding,
saturating at approximately 20 adenines in the random 36-mer. This
is not observed for Zta(N182Q) and Zta(N182S). Dinucleotides containing
adenine except AG have similar shapes to the adenine
mononucleotide (Figure S13). Again, this
is not observed for Zta(N182Q) and Zta(N182S) helping to explain their
lower median binding. For Zta(N182V) and Zta(N182I), binding
strengthens with increased adenine, potentially reflecting the hydrophobic
amino acid side chain interacting with the methyl group of thymine, the
complement of adenine.
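The composition analysis described above (counting mono- and dinucleotides in the variable 36-mer of the weakest-bound 90% of features) can be sketched as follows. The function names are hypothetical, and which end of the 60-mer holds the variable region is treated as a parameter because the text does not specify it.

```python
from collections import Counter


def variable_region(seq60, variable_len=36, variable_first=True):
    """Return the variable part of a probe; its position is an assumption."""
    return seq60[:variable_len] if variable_first else seq60[-variable_len:]


def composition(region):
    """Count mononucleotides and overlapping dinucleotides in a sequence."""
    mono = Counter(region)
    di = Counter(region[i:i + 2] for i in range(len(region) - 1))
    return mono, di


def weakest_fraction(features, fraction=0.9):
    """Keep the given fraction of (sequence, intensity) pairs with the lowest intensity."""
    ranked = sorted(features, key=lambda feature: feature[1])
    return ranked[:int(len(ranked) * fraction)]


# Toy probe (hypothetical): 36 variable adenines followed by a 24-nt constant part.
toy_probe = "A" * 36 + "C" * 24
toy_mono, toy_di = composition(variable_region(toy_probe))
```

Plotting intensity against the adenine count returned by `composition` for each retained feature would give the kind of dependence summarized in Figures S12 and S13.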
Binding to 8-mers (Z-Scores)
We next compared Z-scores
for dsDNA 8-mers, the
approximate length of sequence-specific binding for the bZIP motif.
Zta(N182Q) has higher Z-scores than Zta and binds
variants of the TRE where C is changed to A in one half-site with G being preferred (TGAG) (Figure S8A and Figure A). Zta(N182S) also has higher Z-scores than Zta (Figure B). Zta(N182S) changes binding to motifs where C is changed to G in one or
both halves of the pseudo-palindromic TRE motif (T)[10,18,28] (Figure B). Zta(N182T) has little change in binding specificity
(Figure C).
Sequence Logos
Sequence logos were
derived from the 50 strongest bound 8-mers (Figure ). Logos for binding to DNA(C|C) highlight
that both A and T are important for binding for both Zta and mutants.
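A sequence logo is drawn from a position frequency matrix. The sketch below shows one way to build such a matrix from the strongest bound 8-mers; it assumes the 8-mers are already aligned to a common register, which the text does not describe, and the function names are hypothetical. The alphabet can be extended (for example to "ACGTMH") for arrays with modified cytosines.

```python
def top_kmers(z_scores, n=50):
    """Return the n k-mers with the highest Z-scores, strongest first."""
    ranked = sorted(z_scores.items(), key=lambda item: item[1], reverse=True)
    return [kmer for kmer, _ in ranked[:n]]


def position_frequency_matrix(kmers, alphabet="ACGT"):
    """Per-position base frequencies for equal-length, pre-aligned k-mers."""
    if not kmers:
        raise ValueError("at least one k-mer is required")
    length = len(kmers[0])
    counts = [{base: 0 for base in alphabet} for _ in range(length)]
    for kmer in kmers:
        if len(kmer) != length:
            raise ValueError("all k-mers must have the same length")
        for position, base in enumerate(kmer):
            counts[position][base] += 1
    total = len(kmers)
    return [{base: column[base] / total for base in alphabet} for column in counts]


# Toy example with two 4-mers (hypothetical sequences).
toy_pfm = position_frequency_matrix(["ACGT", "ACGA"])
```

Positions where one base dominates the matrix appear as tall letters in the logo, while positions near the motif center with mixed bases appear short.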
Figure 2
Sequence logos
generated for well-bound 8-mers of Zta and the Zta(N182)
mutants for each DNA type. Logos were generated using the top 50 8-mers
(by Z-score) for each PBM experiment.
Zta, Zta(N182Q), and Zta(N182S) Binding Single Nucleotide Variants (SNVs) of Five 7-mers
SNVs of
five 7-mers were examined, the TRE (TGAG) and four 7-mers where C is changed
to either G or A in one half of the motif with either G or C at the center of the motif (Table A and Table S5). For the TRE that is palindromic except for the central
nucleotide, if one replaces the central G with a C, then one simply switches the
values in the two half-sites. Replacing T or A is catastrophic for binding.
Zta binding SNVs of the TRE highlights that binding the two half-sites
is different. With G, there is a modest preference
for C (Z-score = 97) compared
to A (Z-score = 60), while
on the other half of the motif, there is much greater specificity
with G being preferred (Z-score = 97) compared to T (Z-score = 23). The four variant 7-mers highlight how G or C affects binding
to 7-mers with either G or A. The central base pair has little effect on binding TGA, while G is preferred in binding TGA (Z-score = 60) compared to C (Z-score = 23).
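The SNV analysis can be sketched as enumerating every single-base change of a 7-mer and looking up a score for each resulting sequence. The function names below are hypothetical, and the toy score dictionary uses only the two Z-scores quoted in the text (97 for the TRE and 60 for the C-to-A variant); it is not a reproduction of Table 1.

```python
def single_nucleotide_variants(motif, alphabet="ACGT"):
    """Map (position, new_base) to the variant sequence, for every single change."""
    variants = {}
    for position, reference in enumerate(motif):
        for base in alphabet:
            if base != reference:
                variants[(position, base)] = motif[:position] + base + motif[position + 1:]
    return variants


def snv_table(motif, scores, alphabet="ACGT"):
    """Rows are bases, columns are motif positions; missing scores are None."""
    table = {base: [] for base in alphabet}
    for position in range(len(motif)):
        for base in alphabet:
            sequence = motif[:position] + base + motif[position + 1:]
            table[base].append(scores.get(sequence))
    return table


# TRE with G at the central position, written 5' to 3'.
tre = "TGAGTCA"
toy_scores = {"TGAGTCA": 97, "TGAGTAA": 60}  # values quoted in the text
toy_table = snv_table(tre, toy_scores)
```

Each column of the table then shows how the score changes when that position alone is varied; an alphabet such as "ACGTMH" would extend the same layout to modified cytosines.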
Table 1
SNV Tables of Z-Scores for (A) Zta Binding to the Canonical TRE Motif (TGAG), (B, C) Zta(N182Q) Binding to Two Variants of the TRE Motif, and (D, E) Zta(N182S) Binding to Two Other Variants of the TRE Motif
For Zta(N182Q)
(Table B,C and Table S6), we highlight
two 7-mers, TGAG and TGAC, which change C to A with either G or C. The preferential binding
of Zta(N182Q) for A only occurs with G. This change in binding specificity only occurs
on one side of the motif. Starting with TGAG, G (Z-score = 61) is preferred to T (Z-score = 24).

For Zta(N182S) (Table D,E and Table S7), we highlight
two 7-mers, TGAG and TGAC. These 7-mers change the TRE C to G with either G or C. For both 7-mers, G is preferred. For both 7-mers, C is also preferred, indicating that Zta(N182S) changes binding
specificity in both halves of the motif. If one half of the TRE motif
changes to G, then G (Z-score = 138) is preferred to C (Z-score = 44).
Zta(N182) Mutants Binding Three Modified dsDNAs
For
dsDNA containing 5mC (M) in one strand, Zta(N182Q),
like Zta, binds most strongly to 8-mers containing the C/EBP half-site GC[10,18] (Table S3 and Figure S16C and Figure A). A few 8-mers
better bound by Zta(N182Q) than Zta include AM and M, which
are similar to the oriLyt motif, T[29] (Figure A). Strong binding to 8-mers containing M has also been observed for C/EBP|ATF4 heterodimers.[24] Zta(N182Q) binding to dsDNA with 5hmC|C (H) in one strand or dsDNA(5mCG) is more variable (Figure S9B,C). Zta(N182Q) binds DNA(5hmC|C) less
well than Zta (Figure S9B and Table S3)
with several exceptions, including the 8-mer with the CG dinucleotide AH and the 8-mer
without the CG dinucleotide ATH (Figure B) being preferentially bound. Zta(N182Q) binding to DNA(5mCG) is
weaker than Zta (Figure S9C and Table S3). meZRE2 sequences[17] are poorly bound
by Zta(N182Q), and there are few 8-mers (e.g., GM and ATCAGM)
that are well bound by Zta(N182Q) (Figure C and Figure S18C). Similar results occur with Zta(N182S) (Table S3 and Figures S10A–C and S16B–S18B and Figure A–C).
Figure 3
Zta and Zta(N182Q) mutant binding modified
dsDNA. (A) Scatter graph
comparison of PBM 8-mer Z-scores for wild-type Zta
(x axis) with mutant Zta(N182Q) (y axis) binding to DNA(5mC|C). (B) Same as in panel (A) but for
binding to DNA(5hmC|C). dsDNA features and 8-mers are color coded
in several groups: those containing the methylated or hydroxymethylated
C/EBP half-site GC, those containing the TRE (TGA), all remaining 8-mers with
cytosine, and 8-mers without cytosines. (C) Same as in panel (A) but
for binding to DNA(5mCG). dsDNA features and 8-mers are color coded
in three groups: those containing meZRE2 (TGAGMGA), those with a
methylated CG dinucleotide, and 8-mers without a CG dinucleotide (black).
All arrays were scanned with identical laser settings (100PMT), with
the exception of the DNA(5mC|C) and DNA(5hmC|C) measurements (50 and 10PMT,
respectively). A selection of 8-mers is highlighted, and their sequences
are provided.
Figure 4
Zta and Zta(N182S) mutant binding modified dsDNA.
(A) Scatter graph
comparison of PBM 8-mer Z-scores for wild-type Zta
(x axis) with mutant Zta(N182S) (y axis) binding to DNA(5mC|C). (B) Same as in panel (A) but for
binding to DNA(5hmC|C). dsDNA features and 8-mers are color coded
in several groups: those containing the methylated or hydroxymethylated
C/EBP half-site GC, those containing the TRE (TGA), all remaining 8-mers with
cytosine, and 8-mers without cytosines. (C) Same as in panel (A) but
for binding to DNA(5mCG). dsDNA features and 8-mers are color coded
in three groups: those containing meZRE2 (TGAGMGA), those with a
methylated CG dinucleotide, and 8-mers without a CG dinucleotide (black).
All arrays were scanned with identical laser settings (100PMT), with
the exception of the DNA(5mC|C) and DNA(5hmC|C) measurements (50 and 10PMT,
respectively). A selection of 8-mers is highlighted, and their sequences
are provided.
Zta(N182I) and Zta(N182V) bind
DNA(5mC|C) with similar specificity
but more strongly than Zta (Figures S14A,D and S15A,D). Like Zta, these two mutants strongly bind 8-mers containing
the methylated C/EBP half-site GC (e.g., AMGTGM and MGTGM) (Figure S16D,E). They do not bind the
other modified dsDNAs strongly (Figures S14B,C,E,F, S15B,C,E,F, S17D,E, and S18D,E). A similar result was observed
for Zta(N182T), except for DNA(5mCG), where some 8-mers are bound
similarly to Zta (Figure S18F).

Sequence
logos derived from the 50 strongest bound 8-mers highlight
the change in binding specificity for the mutants (Figure ). Logos for binding to DNA(C|C)
highlight that both A and T are important for binding for Zta mutants similar to Zta.
Nucleotides closer to the center of the motif are less conserved.
For both 5mC and 5hmC, the tetranucleotide GC is preferred for both Zta and the mutants. For DNA(5mCG), GC is dominant for both Zta and the polar mutants.
For the hydrophobic mutants, there is no CG dinucleotide
in the logos.
Similarities and Differences of Zta and Mutants Binding Four Different dsDNAs
We next examined the similarities
and differences for Zta and Zta(N182) mutations binding four types
of dsDNAs using heatmaps summarizing all pairwise correlations between
mutants and dsDNAs (Figures S19 and S20). Binding differs most between WT Zta and the mutants for
unmodified dsDNA, DNA(C|C) (Figure S19A).
DNA(5mC|C) is bound similarly by Zta and all five mutants as exhibited
by the high positive correlations obtained across all pairwise comparisons
(Figure S19B). This can be rationalized
because binding to DNA(5mC|C) is dominated by the methylation of C in 8-mers containing GC. Position 182 in Zta is not near methyl C, and changes in amino acids would not be expected to change binding
to methylated GC. For DNA(5hmC|C) and DNA(5mCG)
8-mers, binding specificity aligns with the amino acid properties
of the N182 mutant: the polar amino acids are similar, and the two
hydrophobic amino acids are similar (Figure S19D).

When we examine each protein binding the four dsDNAs (Figure S20), DNA(5mC|C) and DNA(5hmC|C) do not
change the binding specificity of the polar side-chain mutants (Figure S20C,D) while this is not the case for
the hydrophobic mutants as exhibited by the low correlation between
these DNA types (Figure S20F). DNA(5mCG)
changes the binding specificity of the mutants with polar
side chains (no correlation between DNA(5mCG) and all other DNA types; Figure S20C), while it does not for the hydrophobic
side-chain mutants (high correlation between DNA types for hydrophobic
mutants; Figure S20F).

Binding comparison
across the two polar and two non-polar substitutions
of Zta(N182) reveals that the two polar mutants Zta(N182S) and Zta(N182Q)
bind similarly with dsDNA with modified cytosines, while with DNA(C|C),
their binding specificity varies (Figure S21A–D). Zta(N182V) and Zta(N182I) bind with similar specificity to all
four types of dsDNA (Figure S21E–H).
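The pairwise comparisons summarized in the heatmaps can be sketched as a correlation matrix over 8-mer Z-scores. The text does not state which correlation coefficient was used, so the Pearson coefficient below is an assumption, as are the function names and the choice to compare only 8-mers present in every experiment.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(experiments):
    """experiments: {name: {kmer: z_score}} -> {(name_a, name_b): r}."""
    names = sorted(experiments)
    shared = sorted(set.intersection(*(set(experiments[name]) for name in names)))
    matrix = {}
    for a in names:
        for b in names:
            matrix[(a, b)] = pearson(
                [experiments[a][kmer] for kmer in shared],
                [experiments[b][kmer] for kmer in shared],
            )
    return matrix


# Toy Z-scores (hypothetical) for two experiments with opposite preferences.
toy = {
    "Zta": {"AAAA": 1.0, "CCCC": 2.0, "GGGG": 3.0},
    "N182Q": {"AAAA": 3.0, "CCCC": 2.0, "GGGG": 1.0, "TTTT": 9.0},
}
toy_matrix = correlation_matrix(toy)
```

A high positive entry indicates that two protein or DNA-type combinations rank 8-mers similarly, while an entry near zero indicates unrelated specificities.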
Structural Analysis
We performed
a series of all-atom molecular dynamics (MD) simulations to place
these binding data in a structural framework. Initially, nine simulations
were performed, three proteins (Zta, Zta(N182Q), and Zta(N182S)) each
binding three dsDNA sequences. We started with the X-ray structure
of Zta bound to dsDNA containing the TRE (T) (PDB ID: 2C9N).[9] Then, we changed the dsDNA sequence to (TT) and (TC)
where C is replaced with A or G in both halves of the pseudo-palindrome
motif. The frequencies of hydrogen bonds between the N182, N182Q,
and N182S side chains and the three dsDNA sequences over the microsecond
trajectories are shown in Table . The rows are delimited by the three amino acid side
chains and the columns by C, A, and G. Mostly, only hydrogen
bonds with at least a 10% frequency in one of the three trajectories
are considered.
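The hydrogen bond frequencies reported here are the fraction of trajectory frames in which a given bond is present, filtered at 10% in at least one trajectory. A minimal sketch of that bookkeeping follows. It assumes hydrogen bonds have already been detected per frame by an MD analysis tool and are supplied as labels; the function names and label strings are hypothetical.

```python
from collections import Counter


def hbond_frequencies(frames):
    """frames: iterable of per-frame collections of hydrogen bond labels.

    Returns {label: fraction of frames in which the bond is present}.
    """
    counts = Counter()
    n_frames = 0
    for frame in frames:
        n_frames += 1
        counts.update(set(frame))  # count each bond at most once per frame
    if n_frames == 0:
        return {}
    return {label: count / n_frames for label, count in counts.items()}


def frequent_bonds(freq_by_trajectory, threshold=0.10):
    """Labels reaching the threshold in at least one trajectory, sorted."""
    labels = set()
    for frequencies in freq_by_trajectory.values():
        labels.update(frequencies)
    return sorted(
        label
        for label in labels
        if any(freqs.get(label, 0.0) >= threshold for freqs in freq_by_trajectory.values())
    )


# Toy trajectory of four frames (hypothetical labels).
toy_frames = [{"OD1-H4"}, {"OD1-H4", "HD2-O4"}, set(), {"OD1-H4"}]
toy_freq = hbond_frequencies(toy_frames)
```

Applying `frequent_bonds` across the three DNA forms keeps a bond in the report even when it is rare in two trajectories, provided it is frequent in the third, which matches the way low-frequency entries appear alongside high-frequency ones in the table.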
Table 2
Hydrogen Bond Frequencies Among the N182, N182Q, and N182S Zta Proteins with C, A, and G Forms of TRE dsDNA over 1 μs Molecular Dynamics Trajectories^a
DNA
protein
C3
A3
G3
Zta(N182)
Y OD1 – A C3 H4 0.95
Y OD1 – A A3 H6 0.53
Y HD2 – A G3 N7 0.14
Y OD1 – B C3 H4 0.68
Z OD1 – B A3 H6 0.44
Y HD2 – B G3 N7 0.15
Y HD2 – A A3 N7 0.65
Y HD2 – B A3 N7 0.64
Y HD2 – B T–4 O4 0.88
Y HD2 – B T–4 O4 0.10
Z HD2 – A G–4 O4 0.57
Y HD2 – A T–4 O4 0.01
Zta(N182Q)
Y OE1 – A C3 H4 0.19
Y OE1 – A A3 H6 0.04
Y HE2 – A G3 O6 0.13
Z OE1 – B C3 H4 0.16
Z OE1 – B A3 H6 0.004
Z HE2 – B G3 O6 0.19
Y HE2 – A A3 N7 0.32
Y HE2 – A G3 N7 0.30
Z HE2 – B A3 N7 0.28
Z HE2 – B G3 N7 0.41
Y HE2 – A C–4 O4 0.47
Y HE2 – B T–4 O4 0.43
Y HE2 – B T–4 O4 0.53
Z HE2 – A T–4 O4 0.14
Z HE2 – A T–4 O4 0.35
Z HE2 – A T–4 O4 0.67
Y OE1 – B A–5 H6 0.29
Y OE1 – B A–5 H6 0.33
Y OE1 – B A–5 H6 0.36
Y OE1 – A C–5 H4 0.14
Z OE1 – A C–5 H4 0.26
Z OE1 – A C–5 H4 0.55
Y HE2 – B A–5 N7 0.24
Zta(N182S)
Y OG – A C3 H4 0.20
Y OG – A A3 H6 0.03
Y OG – B C3 H4 0.15
Y OG – B A3 H6 0.61
Y HG1 – B A3 N7 0.57
In accordance with the starting
crystal structure (PDB ID: 2C9N) notation, the two protein strands are indicated by
Y and Z and the two DNA strands by A and B. Except in a few cases
to show the asymmetry, only hydrogen bonds with a minimum frequency
of 0.1 are shown.
Following
the format of the crystal structure, the two protein
segments are designated by Y and Z and the two single-stranded DNA
(ssDNA) chains by A and B; protein Y primarily binds ssDNA chain A,
and Z primarily binds B. As a control, two additional simulations
were run that started from the end of the trajectories of Zta(N182Q) and
Zta(N182S) bound to the TRE. Despite different starting side-chain
configurations from the crystal structure, the asparagine side chains
of both Zta(N182Q) and Zta(N182S) quickly reorientated and regained
the hydrogen bonding profile of the wild-type system.

The upper
left-hand portion of Table shows that asparagine in both monomers of
Zta makes two hydrogen bonds: (i) the asparagine side-chain oxygen
(OD1) with one of the N4-bound hydrogens (H4) of C and (ii) one of the side chain ND2-bound hydrogens (HD2)
of asparagine with the O4 oxygen of the T base on the opposite strand. These are the same hydrogen
bonds that occur in the starting crystal structure, which attests
to the stability of the simulation. They have also been described
in an independent X-ray study of cocrystals.[10] A typical snapshot of this groove-spanning configuration is shown
in Figure S22A. These two pairs of hydrogen
bonds persist throughout most of the trajectory (with frequencies
of 0.95 and 0.88 for one half-site and 0.68 and 0.57 for the other).
An interesting observation is that the frequencies are greater for
one half-site than the other. This may indirectly reflect the asymmetry
at the center of the motif, with C in one
DNA strand and G in the other strand.

As indicated in Figure and Table A, the
next most stable 8-mer bound by Zta is TGAGTA, with a Z-score of 60 vs 97 for binding
the TRE. As seen in Table , this replacement of C with A causes a change in the hydrogen bonding pattern
of the asparagine from groove-spanning to a bidentate interaction
with A. The OD1 and HD2 atoms of the asparagine
side chain now bind with the H6 and N7 atoms of the adenine, respectively.
A typical snapshot of this configuration is shown in Figure S22B. As seen in the top row right column of Table , the asparagine does
not form any long-lasting hydrogen bonds to dsDNA containing G, consistent with the relatively low Z-scores of 12 and 10 for the two half-site complexes (Table A).

The second
row in Table shows
the results of the simulations of Zta(N182Q) bound
to three dsDNA sequences. Glutamine in Zta(N182Q) bound to the TRE
(first column) forms weaker hydrogen bonds with C and T than asparagine. Glutamine
in the Zta(N182Q)/DNA-A complex has stronger
interactions with T and a strong
interaction with A helping to explain the
change in binding specificity. Interestingly, the strength of the
Zta(N182Q)/DNA-A complex is also found to
be dependent on whether the central base of the motif is G or C (with Z-scores of 61 vs 19, respectively). Like asparagine, the one methylene
longer glutamine can form bidentate hydrogen bonds with adenine,[12,26,27] but in these simulations, glutamine
like asparagine is forming hydrogen bonds, though less frequently,
with both C and T.

We conducted an additional series of MD simulations
of Zta(N182Q)
in the complex with the asymmetric motifs TGAG and TGAC that
are strongly bound. Taking into account the complementary sequences,
this provided four unique half-site complexes: Zta(N182Q) with GA, GA, C, and C. As seen in Table S8,
the results are consistent with the experimental data. In particular,
the greatest hydrogen bonding residence frequencies occur for Zta(N182Q)
binding the G half-site (Y
HE2-B T O4 0.57 and Y HE2-A with A, N7 0.46), even greater than the symmetric sequence
containing A in both half-sites (strongest
monomer data: Y HE2-B T O4 0.43 and
Y HE2-A with A, N7 0.32). Replacing the central G base with C results
in a reduction in hydrogen bonds for both C and C.

The third row of Table presents the simulation results for Zta(N182S) binding three
dsDNA sequences. Experimentally, Zta(N182S) binds strongest to dsDNA
with G and G at
the center (i.e., the G half-site)
with a Z-score of 138. This is not clearly reflected
in the hydrogen bonding profiles. Rather, the only significantly long-lasting
interactions are two bidentate hydrogen bonds between the serine
and A in one half-site. Two infrequent, single
hydrogen bonds occur with C in both half-sites,
and no significantly resident hydrogen bonds are observed with the G DNA. Given that the serine side chain is shorter
than asparagine, it is not surprising that it does not reach as far
into the major groove to interact with the nucleotide bases. This
lack of bonding could explain why the alpha helical basic region of
the Zta(N182S) protein moved away from the DNA over the course of
the trajectory. This raises the possibility that the specificity is
derived not from hydrogen bonds with the base pair containing G but from entropy. It should be noted that the
paucity of interactions seen here is in contrast to several examples
of serine residues interacting with guanine and other nucleotides
in the X-ray structures of other protein/DNA complexes.[12,26,27]
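As a minimal sketch of how a hydrogen bond residence frequency such as those quoted above could be computed, the Python snippet below takes the fraction of trajectory snapshots in which a hydrogen-to-acceptor distance falls within a cutoff. The function name, the 2.4 Å distance-only criterion, and the example distances are illustrative assumptions, not the criteria or data used in this study.

```python
def residence_frequency(distances, cutoff=2.4):
    """Return the fraction of frames whose H...acceptor distance (in Å)
    is at or below the cutoff.

    The 2.4 Å distance-only criterion is an assumption made for this
    sketch; the analysis in the text may use different criteria.
    """
    if not distances:
        raise ValueError("need at least one frame")
    bonded = sum(1 for d in distances if d <= cutoff)
    return bonded / len(distances)


# Hypothetical per-snapshot distances (Å) for one donor/acceptor pair.
example_distances = [1.9, 2.1, 3.5, 2.0, 4.2, 2.3, 3.1, 2.2, 5.0, 1.8]

if __name__ == "__main__":
    freq = residence_frequency(example_distances)
    print(f"residence frequency: {freq:.2f}")
```

Under this definition, a value of 0.57 would mean that the bond is present in 57% of the sampled snapshots.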
Discussion
We used a PBM platform[25] to measure
binding of the bZIP domain of Zta and its N182 mutants (Q, S, T, V, and I)
to four types of dsDNA: cytosine in both strands, 5mC or 5hmC in one
dsDNA strand and cytosine in the second strand,[20] and 5mC in both cytosines of all CG dinucleotides.[24] The mutants were selected to have side chains
with a similar size to asparagine with either polar (serine, glutamine,
and threonine) or hydrophobic (isoleucine and valine) properties.
The greatest change in sequence-specific dsDNA binding was to unmodified
dsDNA by Zta(N182Q) and Zta(N182S). These two mutants also had weak
median binding, suggestive of increased binding specificity. We observe
similarities and differences between mutants in binding DNAs containing
modified cytosines.
A molecular dynamics evaluation of Zta binding
dsDNA indicates
that asparagine 182 forms two hydrogen bonds with nucleotides in the
major groove of dsDNA, as suggested by X-ray structures.[10] Zta(N182Q) preferentially binds A, and the molecular dynamics simulations indicate that
glutamine forms hydrogen bonds with A. This
type of interaction has been described for both asparagine and glutamine
interacting with adenine.[11] Zta(N182S)
preferentially binds G, but no interactions
are identified in our molecular dynamics simulations. Entropic forces
may be dominating the preferential interaction with G. Calorimetric experiments indicate that the bZIP domain of
GCN4 binding to the TRE (TCA) is enthalpic, while binding to the CRE TCACGTGA has a stronger entropic effect,[30−32] highlighting the complexity
of sequence-specific dsDNA binding.
Potentially, the new dsDNA binding specificity arises from
avoiding interactions with the nucleotides.
Negative design is a term that captures this avoidance property.[33,34] Negative design is a concept in protein folding in which an unfavorable
interaction in the disordered state drives folding.[35,36] The use of negative design to create binding specificity can also
explain some protein–protein interactions. Both attractive
and repulsive interactions are observed between charged amino acids
in the e and g positions of the heptads
in the leucine zipper coiled coil that lie over the hydrophobic core.[5] Repulsive energies are stronger than attractive
energies (coupling energy).[37] Negative
design is also observed in the hydrophobic core of the leucine zipper
coiled coil.[38] These two negative design
parameters help create the distinct interaction properties of bZIP
proteins in Drosophila[39] and Arabidopsis.[40]
Sequence-specific dsDNA binding of proteins was initially suggested
to involve hydrogen bonds between amino acid side chains and the nucleotides
in the major groove,[41] creating a direct
readout or code.[42] Subsequently, the X-ray
co-crystal structure of the Trp repressor/operator did not reveal
direct interactions between the amino acid side chains and nucleotides
but instead interactions between the protein and the backbone of DNA
that indirectly drove sequence-specific dsDNA binding.[43] More recent work has shown that enthalpic
or entropic forces can drive sequence-specific dsDNA binding.[44] The specificity of protein-dsDNA interactions
has been examined for many protein domains.[45−47] These studies
typically mutate the protein and probe for new dsDNA binding activity.
The PBM platform allows direct examination of dsDNA binding to thousands
of sequences, but repulsive interactions are hard to identify. A potential
example of negative design in protein-dsDNA interactions is the
expansion of C2H2-ZF TFs in metazoans.[48] The more ancient zinc fingers have more interactions with the nucleotides,
while more recent zinc fingers have more interactions with the DNA
backbone. This allows base-contacting specificity residues to mutate
without a catastrophic loss of DNA binding. This overdesign in binding
affinity to the backbone allows interactions with the nucleotides
to be repulsive, potentially generating a kaleidoscope of dsDNA binding
specificity.
In summary, changing asparagine 182 of Zta to either
glutamine or
serine creates new dsDNA binding specificity to unmodified dsDNA.
These Zta(N182) mutants could be interesting biological probes to
explore how different sequences in the genome are accessed.
Experimental Procedures
Cloning, Expression, and Single Amino Acid Mutations of the Zta bZIP DNA-Binding Domain
The N-terminal
GST fusion construct of the wild-type DNA binding domain of viral
bZIP protein Zta with approximately 50 flanking amino acids was obtained
in a modified pDEST15 MAGIC vector.[49] The
cloned amino acid sequence of the Zta bZIP domain is shown here with
the alpha helical DNA binding region in bold and N182 underlined.
Zta: STVQTAAAVVFACPGANQGQQLADIGVPQPAPVAAPARRTRKPQQPESLEECDSELEIKRYKHYREVAAAKSSENDRLRLLLKQMCPSLDVDSIIPRTPDVLHEDLLNF.
We chose replacement amino acids that were of comparable size to N182
(https://www.genome.jp/dbget-bin/www_bget?aaindex:CHOC760101). Mutant constructs of Zta (N182Q, N182S, N182I, N182V, and N182T)
were generated by site-directed mutagenesis of the GST-fused wild-type
Zta plasmid construct (GenScript, USA). Zta constructs were expressed
using a PURExpress In Vitro Protein Synthesis Kit
(NEB) as previously described.[24] Similar amounts
of protein synthesis were confirmed by western blot using an anti-GST-HRP
conjugated antibody (Figure S2).
PBM Experiments
We used the “HK”
array design (NCBI Gene Expression Omnibus (GEO) platform, GPL11260)
for all PBM experiments. Single-stranded DNA 60-mers on the array
were made double-stranded with T7 DNA polymerase using either cytosine,
5mC (NEB), or 5hmC (Zymo Research).[20] Enzymatic
methylation of CG dinucleotides on PBMs (DNA(5mCG))[24] and protein binding experiments[18,19,50−52] were performed as described
previously.
Fluorescence Extraction and Data Processing
For each PBM, a microarray image was
generated using an Agilent
Sure Scan II scanner and analyzed using the ImaGene extraction software
(BioDiscovery Inc.). Data quantification and Z-score
calculations were performed as previously described.[18,53] All mutant proteins bound dsDNA strongly (Figure S3). Each protein was assayed with replicates in good agreement
(R > 0.8) (Table S1 and Figures S4–S7). Arrays with
the fewest
saturated spots were used for further analysis. Z-scores for 5mC, 5mCG, and 5hmC PBM data were rescaled relative to
unmodified cytosine using the slope of the line of best fit computed
from the Z-scores using 8-mers without cytosine (Table S2). Data (raw probe intensities and 8-mer Z-scores) are available at the NCBI GEO database under accession
numbers GSE115351 and GSE115352.
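The rescaling step described above can be sketched in Python as follows. This is an illustrative reading only: it assumes an ordinary least-squares line with an intercept, fitted with the unmodified-cytosine Z-scores on the x-axis and the modified-array Z-scores on the y-axis over the cytosine-free 8-mers, and it assumes that rescaling means dividing the modified-array Z-scores by that slope. The function names and example values are hypothetical.

```python
def best_fit_slope(x, y):
    """Ordinary least-squares slope of y regressed on x (with intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def rescale(modified_z, slope):
    """Divide modified-array Z-scores by the fitted slope (assumed convention)."""
    return [z / slope for z in modified_z]


# Hypothetical Z-scores for 8-mers that contain no cytosine.
unmodified_no_c = [1.0, 2.0, 3.0, 4.0]
modified_no_c = [2.0, 4.0, 6.0, 8.0]

if __name__ == "__main__":
    slope = best_fit_slope(unmodified_no_c, modified_no_c)
    print(slope)
    print(rescale([10.0, 20.0], slope))
```

Because 8-mers without cytosine are unaffected by the modification, they should bind identically on both arrays, so their fitted slope captures array-to-array intensity differences only.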
Molecular Dynamics Simulations
All
simulations were based on the PDB ID: 2C9N crystal structure.[9] Mutations of the N182 amino acid side chain and C3 nitrogenous
base were done with UCSF-Chimera 1.13.1.[54] Molecular dynamics trajectories were generated with NAMD 2.13b2[55] using the CHARMM36 all-hydrogen topologies and
parameters.[56] The protein/DNA complexes
were centered in a periodic-boundary cell of 94 × 60 × 49
Å initial dimensions, which allowed for a minimum 12 Å gap
to any wall. VMD-1.9.3[57] was used to fill
the surrounding space with TIPS3P-model waters and sodium and chloride
ions necessary to neutralize the system and provide a salt concentration
of 150 mM. Electrostatic and VDW energy functions were calculated
with a CUTOFF of 12 Å, a SWITCHDIST of 10 Å, and a PAIRLISTDIST
of 14 Å. The Langevin and Langevin–Piston algorithms were
used to maintain the system temperature and pressure at 300.0 K
and 1.0 atm, respectively, with the defaults for the other parameters.
Rigid bonds were used to allow for an integration timestep of 2 fs.
Trajectories were run for a minimum of 1000 ns, with snapshots collected
every 10 ps. Hydrogen bonds and other interactions were studied with
CHARMM.[58]
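As a rough consistency check on the setup described above, the following Python sketch works out the number of integration steps and snapshots implied by a 1000 ns trajectory, and estimates how many NaCl ion pairs correspond to 150 mM in the initial cell. The helper names are hypothetical, and the ion estimate assumes the whole 94 × 60 × 49 Å cell volume is solvent; it ignores the volume occupied by the protein/DNA complex and the extra ions added for neutralization.

```python
AVOGADRO = 6.02214076e23      # particles per mole
ANGSTROM3_TO_LITER = 1e-27    # 1 Å^3 = 1e-30 m^3 = 1e-27 L


def n_steps(total_ns, timestep_fs):
    """Integration steps needed to cover total_ns at the given timestep."""
    return round(total_ns * 1e6 / timestep_fs)   # 1 ns = 1e6 fs


def n_snapshots(total_ns, interval_ps):
    """Snapshots saved when writing one frame every interval_ps."""
    return round(total_ns * 1e3 / interval_ps)   # 1 ns = 1e3 ps


def ion_pairs(box_angstrom, molar):
    """Approximate salt ion pairs for a concentration (mol/L) in a box,
    treating the entire box volume as solvent (a simplifying assumption)."""
    x, y, z = box_angstrom
    volume_liters = x * y * z * ANGSTROM3_TO_LITER
    return molar * AVOGADRO * volume_liters


if __name__ == "__main__":
    print(n_steps(1000, 2))         # steps for 1000 ns at 2 fs
    print(n_snapshots(1000, 10))    # frames for 1000 ns at 10 ps
    print(round(ion_pairs((94, 60, 49), 0.150), 1))
```

With these numbers, each 1000 ns trajectory corresponds to 5 × 10^8 steps and 10^5 snapshots, and 150 mM corresponds to roughly 25 ion pairs in the initial cell.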