
Altering the Double-Stranded DNA Specificity of the bZIP Domain of Zta with Site-Directed Mutagenesis at N182.

Sreejana Ray¹, Desiree Tillo¹,², Nima Assad¹, Aniekanabasi Ufot¹, Aleksey Porollo³,⁴, Stewart R Durell⁵, Charles Vinson¹.

Abstract

Zta, the Epstein-Barr virus bZIP transcription factor (TF), binds both unmethylated and methylated double-stranded DNA (dsDNA) in a sequence-specific manner. We studied the contribution of a conserved asparagine (N182) to sequence-specific binding to four types of dsDNA: (i) dsDNA with cytosine in both strands (DNA(C|C)), (ii, iii) dsDNA with 5-methylcytosine (5mC, M) or 5-hydroxymethylcytosine (5hmC, H) in one strand and cytosine in the second strand (DNA(5mC|C) and DNA(5hmC|C)), and (iv) dsDNA with methylated cytosine in both strands in all CG dinucleotides (DNA(5mCG)). We replaced asparagine with five similarly sized amino acids (glutamine (Q), serine (S), threonine (T), isoleucine (I), or valine (V)) and used protein binding microarrays to evaluate sequence-specific dsDNA binding. Zta preferentially binds the pseudo-palindrome TRE (AP1) motif (T⁻⁴G⁻³A⁻²G/C⁰T²C³A⁴). Zta(N182Q) changes binding to A³ in only one half-site. Zta(N182S) changes binding to G³ in one or both halves of the motif. Zta(N182S) and Zta(N182Q) have 34- and 17-fold weaker median dsDNA binding, respectively. Zta(N182V) and Zta(N182I) have increased binding to DNA(5mC|C). Molecular dynamics simulations rationalize some of these results, identifying hydrogen bonds between glutamine and A³, but do not reveal why serine preferentially binds G³, suggesting that entropic interactions may mediate this new binding specificity.


Year: 2021 PMID: 35036684 PMCID: PMC8756438 DOI: 10.1021/acsomega.1c04148

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

The bZIP motif occurs in a family of eukaryotic proteins with sequence-specific dsDNA binding properties.[1,2] The bZIP domain is a long bipartite alpha helix.[1] The C-terminal is a leucine zipper coiled-coil region that homodimerizes and/or heterodimerizes.[3−7] The N-terminal interacts with the major groove of dsDNA in a sequence-specific manner. Five conserved amino acids in the bZIP dsDNA-binding region interact with the nucleotides (NXXASXXCR).[8−10] We will focus on the invariant asparagine (N). Asparagine can form two hydrogen bonds with adenine.[11,12] In the bZIP motif, the asparagine[2] forms two hydrogen bonds with adjacent nucleotides on opposite strands of dsDNA. In all X-ray structures where the bZIP dimer binds the T trinucleotide, including the CREB complex with the CRE dsDNA palindrome (T),[13] and GCN4,[3] Fos-Jun,[4] and Zta[10] binding the shorter pseudo-palindrome TRE (T), the asparagine side-chain carbonyl oxygen accepts a hydrogen from N4 of C and the amide nitrogen donates a hydrogen to O4 of T on the opposite strand.[14] Asparagine in the bZIP domain of CEBPA binding to dsDNA recognizes the A dinucleotide in the motif T.[15] Asparagine has a hydrogen bond interaction with O4 of T as observed for all the CRE and TRE structures, but now, the asparagine side-chain carbonyl oxygen accepts a hydrogen from the N6 amino group of A that is stereochemically similar to N4 of C.[15] The Pap1-dsDNA complex binds the T dinucleotide (TAACGTT),[8] and the asparagine does not contact any of the nucleotides. Figure S1 describes the interaction of these bZIP domains with their preferred DNA motifs and illustrates the differences among them. Mutagenesis experiments using a few dsDNA sequences have determined that the conserved asparagine contributes to sequence-specific dsDNA binding.[16] In addition, recent studies show that Zta binds dsDNA with modified cytosine. 
Zta binds many TRE variants with methylated cytosine (M) replacing thymine at two positions (M and M).[10,17] Zta also binds dsDNA with the methylated C/EBP half-site (GM) or its oxidative product 5-hydroxymethylcytosine (5hmC, H)[10,18] or 5-formylcytosine,[19] similar to CREB1.[20] The impact of the conserved asparagine on binding to methylated cytosine is unknown. We have extended these studies and examined five mutants of the conserved asparagine (Zta(N182Q), Zta(N182S), Zta(N182I), Zta(N182V), and Zta(N182T)) binding four types of dsDNA, including DNAs containing modified cytosines, using protein binding microarrays (PBMs).[18,21]

Results

PBMs with Four Types of dsDNA and Data Analysis

Protein binding microarray (PBM) experiments used Agilent DNA microarrays containing 40,330 different single-stranded DNA 60-mers[22] in 16 sectors on a glass microarray slide.[18] We examined the sequence-specific dsDNA binding of six GST-Zta chimeric constructs, Zta and five single amino acid mutants (Q, S, I, V, or T) of Zta(N182), to four types of dsDNA. T7 DNA polymerase was used to generate three types of dsDNA: (i) dsDNA with cytosine on both strands (DNA(C|C)) and (ii, iii) dsDNA with 5mC or 5hmC in one strand and cytosine in the second strand (DNA(5mC|C) and DNA(5hmC|C)).[18,20,23] The fourth type of dsDNA was generated by enzymatic methylation of both cytosines in all CG dinucleotides (DNA(5mCG)).[24] Equal amounts of in vitro synthesized chimeric protein containing GST and bZIP domains (Figure S2) were bound to these four types of dsDNA.[18] Binding was detected using a Cy5-conjugated antibody to the GST epitope, followed by measurement of the fluorescence intensity at each array feature. We evaluated the PBM data in five ways. First, for an overall quantitative measure of binding, we examined the fluorescence intensities of the 40,330 features in each of the 16 sectors on the glass slide. Second, we examined the dependence of binding on nucleotide and dinucleotide composition. Third, we determined a standardized score (Z-score)[25] for binding 8 bp long dsDNA sequences (8-mers). The Z-score for a given 8-mer is the number of standard deviations by which the intensity of that 8-mer (computed from all features containing the 8-mer) differs from the global median intensity (computed from all array features). This measure is sensitive to changes in median binding. Fourth, we generated dsDNA logos[18] using the 50 strongest bound 8-mers. Fifth, we examined the effects of single nucleotide variants (SNVs) on binding the canonical TRE (T) motif 7-mer and related 7-mers containing one or no cytosines. This allows us to determine the contribution to binding of all nucleotides, including modified cytosines, at each position in the motif.[18] We first describe the results of each of the five analyses for binding to unmodified dsDNA and then for binding to the three dsDNAs containing modified cytosine.
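The Z-score definition above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the use of the per-8-mer median, the population standard deviation of all feature intensities, and the merging of each 8-mer with its reverse complement are all assumptions made here for illustration, and only the unmodified A/C/G/T alphabet is handled.

```python
from collections import defaultdict
from statistics import median, pstdev

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_z_scores(features, k=8):
    """Score each k-mer against the global median intensity.

    features: list of (sequence, intensity) pairs, one per array feature.
    Assumption: the k-mer intensity is the median over all features that
    contain it, and the spread is the standard deviation of all features.
    """
    intensities_by_kmer = defaultdict(list)
    for sequence, intensity in features:
        # Count each k-mer once per feature, even if it occurs repeatedly.
        kmers_in_feature = {
            canonical(sequence[i:i + k]) for i in range(len(sequence) - k + 1)
        }
        for kmer in kmers_in_feature:
            intensities_by_kmer[kmer].append(intensity)

    all_intensities = [intensity for _, intensity in features]
    global_median = median(all_intensities)
    spread = pstdev(all_intensities)
    return {
        kmer: (median(values) - global_median) / spread
        for kmer, values in intensities_by_kmer.items()
    }
```

Because the score is centered on the global median, a protein whose median (non-specific) binding drops will show larger Z-scores for its best 8-mers even if their absolute intensities are unchanged, which is the sensitivity to median binding noted above.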

Zta(N182) Mutants Binding with Unmodified dsDNA(C|C)

Binding to 40,330 Features Containing dsDNA(C|C)

We compared previously published data for Zta binding to unmodified dsDNA[18] with newly generated data for Zta and five Zta(N182) mutants. Figure S8A presents newly determined data for Zta (x axis) and Zta(N182Q) (y axis) binding the 40,330 dsDNA features on the array, showing that maximal Zta(N182Q) binding is about half that of Zta, with median binding (a measure of non-specific binding) being 17-fold weaker (Table S3). Zta(N182Q) binds some features more strongly than Zta, indicative of new sequence-specific dsDNA binding (e.g., GTGAGTAA, TGAGTAAT, etc.) (Figure S8A and Figure A). Zta(N182S) binding is about half that of Zta, with median binding being 34-fold weaker (Figure S8B and Table S3). Again, some features are more strongly bound by Zta(N182S) (e.g., TCACTCAT, ATCACTCA, ATGAGTGA, ATCACTGA, TCAGTGAC, etc.) than by Zta, indicating new sequence-specific dsDNA binding (Figure S8B and Figure B). Zta(N182T) binds unmodified dsDNA features 3.3-fold weaker than Zta but with similar specificity (Figure S8C and Figure C). Median binding of the non-polar mutants Zta(N182I) and Zta(N182V) to unmodified dsDNA is similar to and 2.6-fold stronger than that of Zta, respectively (Table S3). However, they do not bind with specificity (Figure S11A–D), indicating non-specific binding and consistent with the importance of hydrogen bonds or other electrically polarized interactions for affinity. The decreased median binding of Zta(N182Q) and Zta(N182S) contributes to the increased skew in the distribution of binding intensities, indicating increased specific dsDNA binding (Figure S3 and Table S4).
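The two summary statistics used in this comparison, the fold change in median binding and the skew of the intensity distribution, can be sketched as follows. The function names are hypothetical, and the moment-based skewness used here is one common definition; the source does not state which estimator was used for Table S4.

```python
from statistics import mean, median


def fold_weaker(reference_intensities, mutant_intensities):
    """Ratio of median intensities; a value of 17 means 17-fold weaker median binding."""
    return median(reference_intensities) / median(mutant_intensities)


def skewness(intensities):
    """Moment-based skewness: third central moment over variance to the power 3/2."""
    center = mean(intensities)
    n = len(intensities)
    second_moment = sum((x - center) ** 2 for x in intensities) / n
    third_moment = sum((x - center) ** 3 for x in intensities) / n
    return third_moment / second_moment ** 1.5
```

A distribution in which most features sit near a low median while a few specific features remain bright has a long right tail and therefore a large positive skew.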

Figure 1

Zta, Zta(N182Q), Zta(N182S), and Zta(N182T) binding to DNA(C|C). (A) Scatter graph comparison of PBM 8-mer Z-scores of Zta (x axis) with Zta(N182Q) (y axis) for dsDNA features containing cytosine in both strands (DNA(C|C)). dsDNA features and 8-mers are colored in several groups: those containing the TRE (TGA) and TRE variants changing C to G in one or both half-sites. Specific 8-mers that are strongly bound by the Zta(N182Q) mutant or preferred by wild-type Zta are highlighted, and their sequences are indicated with an arrowhead. (B) Same as in panel (A) but for Zta(N182S) (y axis). Features and 8-mers are colored in several groups: those containing the TRE and variants changing C to A. Similarly, some 8-mers that are preferred by the Zta(N182S) mutant or by the wild type are highlighted. (C) Same as in panel (A) but for Zta(N182T) (y axis). Features and 8-mers are colored in several groups.

Binding to Mono- and Dinucleotides

An intriguing aspect of this analysis is that median dsDNA binding is dramatically weakened for Zta(N182Q) and Zta(N182S). Table S3 highlights that the weakening of median binding is only observed for Zta(N182Q) and Zta(N182S) binding DNA(C|C) and DNA(5mCG). To evaluate the reduction in median binding of Zta(N182Q) and Zta(N182S), we examined binding to mono- and dinucleotides, hypothesizing that asparagine may bind the adenine mononucleotide or the CA dinucleotide as observed in several protein-dsDNA structures.[12,26,27] Each feature contains a 60-mer in which the 24 nucleotides near the glass surface are common to all features and the remaining 36 nucleotides are variable. We counted the occurrences of each mono- and dinucleotide in the variable 36-mer for the 90% of features that were most weakly bound (Figures S12 and S13). For Zta, the more A:T base pairs, the stronger the binding, saturating at approximately 20 adenines in the random 36-mer. This is not observed for Zta(N182Q) and Zta(N182S). Dinucleotides containing adenine, except AG, have curves of similar shape to that of the adenine mononucleotide (Figure S13). Again, this is not observed for Zta(N182Q) and Zta(N182S), helping to explain their lower median binding. For Zta(N182V) and Zta(N182I), binding strengthens with increased adenine, potentially reflecting the hydrophobic amino acid side chain interacting with the methyl group of thymine, the complement of adenine.
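The composition analysis described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the assumption that the constant 24 nucleotides sit at the end of the stored 60-mer string is a guess that is exposed as a parameter.

```python
from collections import Counter


def weakest_fraction(features, fraction=0.9):
    """Return the most weakly bound features, given (sequence, intensity) pairs."""
    ranked = sorted(features, key=lambda feature: feature[1])
    return ranked[: int(len(ranked) * fraction)]


def composition(probe, constant_len=24, constant_at_end=True):
    """Count mono- and dinucleotides in the variable part of a 60-mer probe.

    Assumption: the constant region near the glass surface is the last
    `constant_len` characters unless constant_at_end is False.
    """
    variable = probe[:-constant_len] if constant_at_end else probe[constant_len:]
    mononucleotides = Counter(variable)
    # Overlapping dinucleotides: a 36-mer contributes 35 of them.
    dinucleotides = Counter(variable[i:i + 2] for i in range(len(variable) - 1))
    return mononucleotides, dinucleotides
```

Plotting the intensity of each weakly bound feature against, for example, `mononucleotides["A"]` would give the kind of composition curve described for Figures S12 and S13.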

Binding to 8-mers (Z-Scores)

We next compared Z-scores for dsDNA 8-mers, the approximate length of sequence-specific binding for the bZIP motif. Zta(N182Q) has higher Z-scores than Zta and binds variants of the TRE where C is changed to A in one half-site with G being preferred (TGAG) (Figure S8A and Figure A). Zta(N182S) also has higher Z-scores than Zta (Figure B). Zta(N182S) changes binding to motifs where C is changed to G in one or both halves of the pseudo-palindromic TRE motif (T)[10,18,28] (Figure B). Zta(N182T) has little change in binding specificity (Figure C).

Sequence Logos

Sequence logos were derived from the 50 strongest bound 8-mers (Figure ). Logos for binding to DNA(C|C) highlight that both A and T are important for binding for both Zta and mutants.

Figure 2

Sequence logos generated for well-bound 8-mers of Zta and the Zta(N182) mutant for each DNA type. Logos were generated using the top 50 8-mers (by Z-score) for each PBM experiment.
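As a rough illustration of how a logo summarizes the top 50 8-mers, the sketch below selects the highest-scoring 8-mers and computes the per-position information content that sets the letter heights in a logo. It assumes the selected 8-mers are already aligned to one another, which real data generally require an explicit alignment step to achieve; the function names are hypothetical and this is not the logo software used in the study.

```python
import math
from collections import Counter


def top_kmers(z_scores, n=50):
    """Return the n k-mers with the highest Z-scores, strongest first."""
    ranked = sorted(z_scores.items(), key=lambda item: item[1], reverse=True)
    return [kmer for kmer, _ in ranked[:n]]


def information_content(aligned_kmers, alphabet="ACGT"):
    """Per-position information content in bits for equally long, aligned k-mers."""
    max_bits = math.log2(len(alphabet))
    bits = []
    for position in range(len(aligned_kmers[0])):
        counts = Counter(kmer[position] for kmer in aligned_kmers)
        total = sum(counts.values())
        entropy = -sum(
            (count / total) * math.log2(count / total) for count in counts.values()
        )
        bits.append(max_bits - entropy)
    return bits
```

An invariant position scores 2 bits with a four-letter alphabet, while a position where all four nucleotides are equally frequent scores 0 bits. For the modified DNAs, the alphabet would need to include M or H.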

Zta, Zta(N182Q), and Zta(N182S) Binding Single Nucleotide Variants (SNVs) of Five 7-mers

SNVs of five 7-mers were examined, the TRE (TGAG) and four 7-mers where C is changed to either G or A in one half of the motif with either G or C at the center of the motif (Table A and Table S5). For the TRE that is palindromic except for the central nucleotide, if one replaces the central G with a C, then one simply switches the values in the two half-sites. Replacing T or A is catastrophic for binding. Zta binding SNVs of the TRE highlights that binding the two half-sites is different. With G, there is a modest preference for C (Z-score = 97) compared to A (Z-score = 60), while on the other half of the motif, there is much greater specificity with G being preferred (Z-score = 97) compared to T (Z-score = 23). The four variant 7-mers highlight how G or C affects binding to 7-mers with either G or A. The central base pair has little effect on binding TGA, while G is preferred in binding TGA (Z-score = 60) compared to C (Z-score = 23).
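The SNV tables can be thought of as enumerating every single-base substitution of a 7-mer. The sketch below does this for the TRE written out in full as TGAGTCA (following the T-G-A-G/C-T-C-A motif given in the abstract) and also shows why swapping the central G for C simply exchanges the two half-sites: the two sequences are reverse complements of each other. Function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


def single_nucleotide_variants(motif, alphabet="ACGT"):
    """List (position, new_base, variant) for every single-base substitution.

    Positions are 0-based string indices, not the motif numbering of the paper.
    """
    variants = []
    for position, reference_base in enumerate(motif):
        for new_base in alphabet:
            if new_base != reference_base:
                variant = motif[:position] + new_base + motif[position + 1:]
                variants.append((position, new_base, variant))
    return variants


tre = "TGAGTCA"
tre_variants = single_nucleotide_variants(tre)
```

Passing a larger alphabet such as `"ACGTM"` would add the 5mC substitutions examined for the modified dsDNAs, under the assumption that M is treated as a fifth letter.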

Table 1

SNV Tables of Z-Scores for (A) Zta Binding to the Canonical TRE Motif (TGAG), (B, C) Zta(N182Q) Binding to Two Variants of the TRE Motif, and (D, E) Zta(N182S) Binding to Two Other Variants of the TRE Motif

For Zta(N182Q) (Table B,C and Table S6), we highlight two 7-mers, TGAG and TGAC, which change C to A with either G or C. The preferential binding of Zta(N182Q) for A only occurs with G. This change in binding specificity only occurs on one side of the motif. Starting with TGAG, G (Z-score = 61) is preferred to T (Z-score = 24). For Zta(N182S) (Table D,E and Table S7), we highlight two 7-mers, TGAG and TGAC. These 7-mers change the TRE C to G with either G or C. For both 7-mers, G is preferred. For both 7-mers, C is also preferred, indicating that Zta(N182S) changes binding specificity in both halves of the motif. If one half of the TRE motif changes to G, then G (Z-score = 138) is preferred to C (Z-score = 44).

Zta(N182) Mutants Binding Three Modified dsDNAs

For dsDNA containing 5mC (M) in one strand, Zta(N182Q), like Zta, binds strongest to 8-mers containing the C/EBP half-site GC[10,18] (Table S3 and Figure S16C and Figure A). A few 8-mers better bound by Zta(N182Q) than Zta include AM and M, which are similar to the oriLyt motif, T[29] (Figure A). Strong binding to 8-mers containing M has also been observed for C/EBP|ATF4 heterodimers.[24] Zta(N182Q) binding to dsDNA with 5hmC (H) in one strand (DNA(5hmC|C)) or to DNA(5mCG) is more variable (Figure S9B,C). Zta(N182Q) binds DNA(5hmC|C) less well than Zta (Figure S9B and Table S3) with several exceptions, including the 8-mer with the CG dinucleotide AH and the 8-mer without the CG dinucleotide ATH (Figure B), which are preferentially bound. Zta(N182Q) binding to DNA(5mCG) is weaker than that of Zta (Figure S9C and Table S3). meZRE2 sequences[17] are poorly bound by Zta(N182Q), and there are few 8-mers (e.g., GM and ATCAGM) that are well bound by Zta(N182Q) (Figure C and Figure S18C). Similar results occur with Zta(N182S) (Table S3 and Figures S10A–C and S16B–S18B and Figure A–C).

Figure 3

Zta and Zta(N182Q) mutant binding modified dsDNA. (A) Scatter graph comparison of PBM 8-mer Z-scores for wild-type Zta (x axis) with mutant Zta(N182Q) (y axis) binding to DNA(5mC|C). (B) Same as in panel (A) but for binding to DNA(5hmC|C). dsDNA features and 8-mers are color coded in several groups: those containing the methylated or hydroxymethylated C/EBP half-site GC, those containing the TRE (TGA), all remaining 8-mers with cytosine, and 8-mers without cytosine. (C) Same as in panel (A) but for binding to DNA(5mCG). dsDNA features and 8-mers are color coded in three groups: those containing meZRE2 (TGAGMGA), those with a methylated CG dinucleotide, and 8-mers without a CG dinucleotide (black). All arrays were scanned with identical laser settings (100PMT), with the exception of DNA(5mC|C) and DNA(5hmC|C) measurements (50 and 10PMT, respectively). A selection of 8-mers is highlighted, and their sequences are provided.

Figure 4

Zta and Zta(N182S) mutant binding modified dsDNA. (A) Scatter graph comparison of PBM 8-mer Z-scores for wild-type Zta (x axis) with mutant Zta(N182S) (y axis) binding to DNA(5mC|C). (B) Same as in panel (A) but for binding to DNA(5hmC|C). dsDNA features and 8-mers are color coded in several groups: those containing the methylated or hydroxymethylated C/EBP half-site GC, those containing the TRE (TGA), all remaining 8-mers with cytosine, and 8-mers without cytosine. (C) Same as in panel (A) but for binding to DNA(5mCG). dsDNA features and 8-mers are color coded in three groups: those containing meZRE2 (TGAGMGA), those with a methylated CG dinucleotide, and 8-mers without a CG dinucleotide (black). All arrays were scanned with identical laser settings (100PMT), with the exception of DNA(5mC|C) and DNA(5hmC|C) measurements (50 and 10PMT, respectively). A selection of 8-mers is highlighted, and their sequences are provided.

Zta(N182I) and Zta(N182V) bind DNA(5mC|C) with similar specificity but more strongly than Zta (Figures S14A,D and S15A,D). Like Zta, these mutants strongly bind 8-mers containing the methylated C/EBP half-site GC (e.g., AMGTGM and MGTGM) (Figure S16D,E). They do not bind the other modified dsDNAs strongly (Figures S14B,C,E,F, S15B,C,E,F, S17D,E, and S18D,E). A similar result was observed for Zta(N182T), except for DNA(5mCG), where some 8-mers are bound similarly to Zta (Figure S18F). Sequence logos derived from the 50 strongest bound 8-mers highlight the change in binding specificity for the mutants (Figure ). Logos for binding to DNA(C|C) highlight that both A and T are important for binding for the Zta mutants, as for Zta. Nucleotides closer to the center of the motif are less conserved. For both 5mC and 5hmC, the tetranucleotide GC is preferred for both Zta and the mutants. For DNA(5mCG), GC is dominant for both Zta and the polar mutants. For the hydrophobic mutants, there is no CG dinucleotide in the logos.

Similarities and Differences of Zta and Mutants Binding Four Different dsDNAs

We next examined the similarities and differences for Zta and the Zta(N182) mutants binding four types of dsDNA using heatmaps summarizing all pairwise correlations between mutants and dsDNAs (Figures S19 and S20). Binding by WT Zta and the mutants differs most for unmodified dsDNA, DNA(C|C) (Figure S19A). DNA(5mC|C) is bound similarly by Zta and all five mutants, as shown by the high positive correlations obtained across all pairwise comparisons (Figure S19B). This can be rationalized because binding to DNA(5mC|C) is dominated by the methylation of C in 8-mers containing GC. Position 182 in Zta is not near the methylated C, and changes in the amino acid would not be expected to change binding to methylated GC. For DNA(5hmC|C) and DNA(5mCG) 8-mers, binding specificity aligns with the amino acid properties of the N182 mutant: the polar amino acids are similar, and the two hydrophobic amino acids are similar (Figure S19D). When we examine each protein binding the four dsDNAs (Figure S20), DNA(5mC|C) and DNA(5hmC|C) do not change the binding specificity of the polar side-chain mutants (Figure S20C,D), while this is not the case for the hydrophobic mutants, as shown by the low correlation between these DNA types (Figure S20F). DNA(5mCG) changes the binding specificity of the mutants with polar side chains (no correlation between DNA(5mCG) and all other DNA types; Figure S20C), while it does not for the hydrophobic side-chain mutants (high correlation between DNA types for hydrophobic mutants; Figure S20F). Comparison across the two polar and two non-polar substitutions of Zta(N182) reveals that the two polar mutants Zta(N182S) and Zta(N182Q) bind dsDNA with modified cytosines similarly, while their binding specificity for DNA(C|C) differs (Figure S21A–D). Zta(N182V) and Zta(N182I) bind with similar specificity to all four types of dsDNA (Figure S21E–H).
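The values behind such correlation heatmaps can be sketched as pairwise correlations between 8-mer Z-score profiles. The source does not state which correlation coefficient was used, so the Pearson coefficient below is an assumption, as are the function names and the restriction to 8-mers shared by both profiles.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def pairwise_correlations(profiles):
    """Correlate every pair of Z-score profiles over their shared k-mers.

    profiles: dict mapping a label (protein or DNA type) to {kmer: z_score}.
    Returns a dict keyed by (label_a, label_b) in sorted label order.
    """
    correlations = {}
    for label_a, label_b in combinations(sorted(profiles), 2):
        shared = sorted(set(profiles[label_a]) & set(profiles[label_b]))
        correlations[(label_a, label_b)] = pearson(
            [profiles[label_a][kmer] for kmer in shared],
            [profiles[label_b][kmer] for kmer in shared],
        )
    return correlations
```

Arranging the resulting values in a square matrix, with proteins or DNA types on both axes, gives the input for a heatmap.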

Structural Analysis

We performed a series of all-atom molecular dynamics (MD) simulations to place these binding data in a structural framework. Initially, nine simulations were performed, three proteins (Zta, Zta(N182Q), and Zta(N182S)) each binding three dsDNA sequences. We started with the X-ray structure of Zta bound to dsDNA containing the TRE (T) (PDB ID: 2C9N).[9] Then, we changed the dsDNA sequence to (TT) and (TC) where C is replaced with A or G in both halves of the pseudo-palindrome motif. The frequencies of hydrogen bonds between the N182, N182Q, and N182S side chains and the three dsDNA sequences over the microsecond trajectories are shown in Table . The rows are delimited by the three amino acid side chains and the columns by C, A, and G. Mostly, only hydrogen bonds with at least a 10% frequency in one of the three trajectories are considered.
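A hydrogen bond frequency of the kind reported here is the fraction of trajectory frames in which the bond is present. The sketch below assumes a simple geometric criterion (donor-acceptor distance and donor-hydrogen-acceptor angle); the cutoff values and function names are illustrative assumptions, since the criteria used in the simulations are not stated in this section.

```python
def hbond_frequency(frames, max_distance=3.5, min_angle=135.0):
    """Fraction of frames in which a hydrogen bond is geometrically present.

    frames: iterable of (donor_acceptor_distance_in_angstrom,
    donor_hydrogen_acceptor_angle_in_degrees), one pair per frame.
    The default cutoffs are assumed values, not taken from the source.
    """
    frames = list(frames)
    present = sum(
        1 for distance, angle in frames
        if distance <= max_distance and angle >= min_angle
    )
    return present / len(frames)


def persistent_bonds(frequencies, threshold=0.1):
    """Keep only bonds whose frequency reaches the reporting threshold."""
    return {bond: freq for bond, freq in frequencies.items() if freq >= threshold}
```

With the default threshold of 0.1, `persistent_bonds` mirrors the 10% reporting cutoff mentioned above, although the table also lists a few weaker bonds to show asymmetry.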

Table 2

Hydrogen Bond Frequencies Among the N182, N182Q, and N182S Zta Proteins with C, A, and G Forms of TRE dsDNA over 1 μs Molecular Dynamics Trajectories^a

	DNA
protein	C³	A³	G³
Zta(N182)	Y OD1 – A C³ H4 0.95	Y OD1 – A A³ H6 0.53	Y HD2 – A G³ N7 0.14
	Y OD1 – B C³ H4 0.68	Z OD1 – B A³ H6 0.44	Y HD2 – B G³ N7 0.15
		Y HD2 – A A³ N7 0.65
		Y HD2 – B A³ N7 0.64
	Y HD2 – B T^–4 O4 0.88	Y HD2 – B T^–4 O4 0.10
	Z HD2 – A G^–4 O4 0.57	Y HD2 – A T^–4 O4 0.01
Zta(N182Q)	Y OE1 – A C³ H4 0.19	Y OE1 – A A³ H6 0.04	Y HE2 – A G³ O6 0.13
	Z OE1 – B C³ H4 0.16	Z OE1 – B A³ H6 0.004	Z HE2 – B G³ O6 0.19
		Y HE2 – A A³ N7 0.32	Y HE2 – A G³ N7 0.30
		Z HE2 – B A³ N7 0.28	Z HE2 – B G³ N7 0.41
	Y HE2 – A C^–4 O4 0.47	Y HE2 – B T^–4 O4 0.43	Y HE2 – B T^–4 O4 0.53
	Z HE2 – A T^–4 O4 0.14	Z HE2 – A T^–4 O4 0.35	Z HE2 – A T^–4 O4 0.67
	Y OE1 – B A^–5 H6 0.29	Y OE1 – B A^–5 H6 0.33	Y OE1 – B A^–5 H6 0.36
	Y OE1 – A C^–5 H4 0.14	Z OE1 – A C^–5 H4 0.26	Z OE1 – A C^–5 H4 0.55
	Y HE2 – B A^–5 N7 0.24
Zta(N182S)	Y OG – A C³ H4 0.20	Y OG – A A³ H6 0.03
	Y OG – B C³ H4 0.15	Y OG – B A³ H6 0.61
		Y HG1 – B A³ N7 0.57

^a In accordance with the notation of the starting crystal structure (PDB ID: 2C9N), the two protein strands are indicated by Y and Z and the two DNA strands by A and B. Except in a few cases included to show the asymmetry, only hydrogen bonds with a minimum frequency of 0.1 are shown.

Following the format of the crystal structure, the two protein segments are designated by Y and Z and the two single-stranded DNA (ssDNA) chains by A and B; protein Y primarily binds ssDNA chain A, and Z primarily binds B. As a control, two additional simulations were run, starting from the end of the trajectories of Zta(N182Q) and Zta(N182S) bound to the TRE. Despite different starting side-chain configurations from the crystal structure, the asparagine side chains of both Zta(N182Q) and Zta(N182S) quickly reoriented and regained the hydrogen bonding profile of the wild-type system. The upper left-hand portion of Table shows that asparagine in both monomers of Zta makes two hydrogen bonds: (i) the asparagine side-chain oxygen (OD1) with one of the N4-bound hydrogens (H4) of C and (ii) one of the side-chain ND2-bound hydrogens (HD2) of asparagine with the O4 oxygen of the T base on the opposite strand. These are the same hydrogen bonds that occur in the starting crystal structure, which attests to the stability of the simulation. They have also been described in an independent X-ray study of cocrystals.[10] A typical snapshot of this groove-spanning configuration is shown in Figure S22A. These two pairs of hydrogen bonds persist throughout most of the trajectory (with frequencies of 0.95 and 0.88 for one half-site and 0.68 and 0.57 for the other). An interesting observation is that the frequencies are greater for one half-site than the other. This may indirectly reflect the asymmetry at the center of the motif, with C in one DNA strand and G in the other strand. As indicated in Figure and Table A, the next most stable 8-mer bound by Zta is TGAGTA, with a Z-score of 60 vs 97 for binding the TRE. 
As seen in Table , this substitution of C for A causes a change in the hydrogen bonding pattern of the asparagine from groove-spanning to a bidentate interaction with A. The OD1 and HD2 atoms of the asparagine side chain now bind with the H6 and N7 atoms of the adenine, respectively. A typical snapshot of this configuration is shown in Figure S22B. As seen in the top row right column of Table , the asparagine does not form any long-lasting hydrogen bonds to dsDNA containing G, consistent with the relatively low Z-scores of 12 and 10 for the two half-site complexes (Table A). The second row in Table shows the results of the simulations of Zta(N182Q) bound to three dsDNA sequences. Glutamine in Zta(N182Q) bound to the TRE (first column) forms weaker hydrogen bonds with C and T than asparagine. Glutamine in the Zta(N182Q)/DNA-A complex has stronger interactions with T and a strong interaction with A helping to explain the change in binding specificity. Interestingly, the strength of the Zta(N182Q)/DNA-A complex is also found to be dependent on whether the central base of the motif is G or C (with Z-scores of 61 vs 19, respectively). Like asparagine, the one methylene longer glutamine can form bidentate hydrogen bonds with adenine,[12,26,27] but in these simulations, glutamine like asparagine is forming hydrogen bonds, though less frequently, with both C and T. We conducted an additional series of MD simulations of Zta(N182Q) in the complex with the asymmetric motifs TGAG and TGAC that are strongly bound. Taking into account the complementary sequences, this provided four unique half-site complexes: Zta(N182Q) with GA, GA, C, and C. As seen in Table S8, the results are consistent with the experimental data. 
In particular, the greatest hydrogen bonding residence frequencies occur for Zta(N182Q) binding the G half-site (Y HE2-B T O4 0.57 and Y HE2-A with A, N7 0.46), even greater than the symmetric sequence containing A in both half-sites (strongest monomer data: Y HE2-B T O4 0.43 and Y HE2-A with A, N7 0.32). Replacing the central G base with C results in a reduction in hydrogen bonds for both C and C. The third row of Table presents the simulation results for Zta(N182S) binding three dsDNA sequences. Experimentally, Zta(N182S) binds strongest to dsDNA with G and G at the center (i.e., the G half-site) with a Z-score of 138. This is not clearly reflected in the hydrogen bonding profiles. Rather, the only significantly long-lasting interactions are two, bidentate hydrogen bonds between the serine and A in one half-site. Two infrequent, single hydrogen bonds occur with C in both half-sites, and no significantly resident hydrogen bonds are observed with the G DNA. Given that the serine side chain is shorter than asparagine, it is not surprising that it does not reach as far into the major groove to interact with the nucleotide bases. This lack of bonding could explain why the alpha helical basic region of the Zta(N182S) protein moved away from the DNA over the course of the trajectory. This raises the possibility that the specificity is derived not from hydrogen bonds with the base pair containing G but from entropy. It should be noted that the paucity of interactions seen here is in contrast to several examples of serine residues interacting with guanine and other nucleotides in the X-ray structures of other protein/DNA complexes.[12,26,27]

Discussion

We used a PBM platform[25] to measure the bZIP domain of Zta and N182 mutants (Q, S, T, V, and I) binding to four types of dsDNA: cytosine in both strands, 5mC or 5hmC in one dsDNA strand and cytosine in the second strand,[20] and 5mC in both cytosines of all CG dinucleotides.[24] The mutants were selected to have side chains with a similar size to asparagine with either polar (serine, glutamine, and threonine) or hydrophobic (isoleucine and valine) properties. The greatest change in sequence-specific dsDNA binding was to unmodified dsDNA by Zta(N182Q) and Zta(N182S). These two mutants also had weak median binding, suggestive of increased binding specificity. We observe similarities and differences between mutants in binding DNAs containing modified cytosines. A molecular dynamics evaluation of Zta binding dsDNA indicates that asparagine 182 forms two hydrogen bonds with nucleotides in the major groove of dsDNA, as suggested by X-ray structures.[10] Zta(N182Q) preferentially binds A, and the molecular dynamics simulations indicate that glutamine forms hydrogen bonds with A. This type of interaction has been described for both asparagine and glutamine interacting with adenine.[11] Zta(N182S) preferentially binds G, but no interactions are identified in our molecular dynamics simulations. Entropic forces may be dominating the preferential interaction with G. Calorimetric experiments indicate that the bZIP domain of GCN4 binding to the TRE (TCA) is enthalpic, while binding to the CRE TCACGTGA has a stronger entropic effect,[30−32] highlighting the complexity of the issue of sequence-specific dsDNA binding. Potentially, the new dsDNA binding specificity is the result of avoiding interactions with the nucleotides, thus driving binding specificity. 
Negative design is a term that captures this avoidance property.[33,34] Negative design is a concept in protein folding in which an unfavorable interaction in the disordered state drives folding.[35,36] The use of negative design to create binding specificity can also explain some protein–protein interactions. Both attractive and repulsive interactions are observed between charged amino acids in the e and g positions of the heptads in the leucine zipper coiled coil that lie over the hydrophobic core.[5] Repulsive energies are stronger than attractive energies (coupling energy).[37] Negative design is also observed in the hydrophobic core of the leucine zipper coiled coil.[38] These two negative design parameters help create the distinct interaction properties of bZIP proteins in Drosophila[39] and Arabidopsis.[40] Sequence-specific dsDNA binding of proteins was initially suggested to involve hydrogen bonds between amino acid side chains and the nucleotides in the major groove,[41] creating a direct readout or code.[42] Subsequently, the X-ray co-crystal structure of the Trp repressor/operator did not reveal direct interactions between the amino acid side chains and nucleotides but instead interactions between the protein and the backbone of DNA that indirectly drove sequence-specific dsDNA binding.[43] More recent work has shown that enthalpic or entropic forces can drive sequence-specific dsDNA binding.[44] The specificity of protein-dsDNA interactions has been examined for many protein domains.[45−47] These studies typically mutate the protein and probe for new dsDNA binding activity. The PBM platform allows direct examination of dsDNA binding to thousands of sequences, but repulsive interactions are hard to identify. 
A potential example of negative design in protein-dsDNA interactions is the expansion of C2H2-ZF TFs in metazoans.[48] The more ancient zinc fingers have more interactions with the nucleotides, while more recent zinc fingers have more interactions with the DNA backbone. This allows base-contacting specificity residues to mutate without a catastrophic loss of DNA binding. This overdesign in binding affinity to the backbone allows interactions with the nucleotides to be repulsive, potentially generating a kaleidoscope of dsDNA binding specificity. In summary, changing asparagine 182 of Zta to either glutamine or serine creates new binding specificity for unmodified dsDNA. These Zta(N182) mutants could be interesting biological probes to explore how different sequences in the genome are accessed.

Experimental Procedures

Cloning, Expression, and Single Amino Acid Mutations of the Zta bZIP DNA-Binding Domain

The N-terminal GST fusion construct of the wild-type DNA binding domain of the viral bZIP protein Zta with approximately 50 flanking amino acids was obtained in a modified pDEST15 MAGIC vector.[49] The cloned amino acid sequence of the Zta bZIP domain is shown here with the alpha helical DNA binding region in bold and N182 underlined. Zta: STVQTAAAVVFACPGANQGQQLADIGVPQPAPVAAPARRTRKPQQPESLEECDSELEIKRYKHYREVAAAKSSENDRLRLLLKQMCPSLDVDSIIPRTPDVLHEDLLNF. We chose replacement amino acids that were of comparable size to N182 (https://www.genome.jp/dbget-bin/www_bget?aaindex:CHOC760101). Mutant constructs of Zta (N182Q, N182S, N182I, N182V, and N182T) were generated by site-directed mutagenesis of the GST-fused wild-type Zta plasmid construct (GenScript, USA). Zta constructs were expressed using a PURExpress in vitro protein synthesis kit (NEB) as previously described.[24] Similar amounts of protein synthesis were confirmed by western blot using an anti-GST-HRP conjugated antibody (Figure S2).

PBM Experiments

We used the “HK” array design (NCBI Gene Expression Omnibus (GEO) platform, GPL11260) for all PBM experiments. Single-stranded DNA 60-mers on the array were made double-stranded with T7 DNA polymerase, incorporating either cytosine, 5mC (NEB), or 5hmC (Zymo Research).[20] Enzymatic methylation of CG dinucleotides on PBMs (DNA(5mCG))[24] and protein binding experiments[18,19,50−52] were performed as described previously.

Fluorescence Extraction and Data Processing

For each PBM, a microarray image was generated using an Agilent Sure Scan II scanner and analyzed using the ImaGene extraction software (BioDiscovery Inc.). Data quantification and Z-score calculations were performed as previously described.[18,53] All mutant proteins bound dsDNA strongly (Figure S3). Each protein was assayed in replicate, and replicates were in good agreement (R > 0.8) (Table S1 and Figures S4–S7). Arrays with the fewest saturated spots were used for further analysis. Z-scores for 5mC, 5mCG, and 5hmC PBM data were rescaled relative to unmodified cytosine using the slope of the line of best fit computed from the Z-scores of 8-mers without cytosine (Table S2). Data (raw probe intensities and 8-mer Z-scores) are available at the NCBI GEO database under accession numbers GSE115351 and GSE115352.
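The rescaling step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the dictionary layout, the choice to regress modified on unmodified Z-scores with an intercept, and the rule of selecting reference 8-mers by the absence of "C" in the 8-mer string are all assumptions made for illustration.

```python
def fit_slope(x, y):
    """Ordinary least-squares slope of y on x (fit includes an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def rescale_modified_z(unmod_z, mod_z):
    """Rescale Z-scores from a modified-cytosine array onto the unmodified scale.

    unmod_z, mod_z: dicts mapping an 8-mer string to its Z-score
    (hypothetical layout, not the GEO file format).
    Returns (rescaled dict, slope).
    """
    # Assumption: reference 8-mers are those with no "C" in the 8-mer string,
    # so their binding should not depend on the cytosine modification.
    ref = [k for k in unmod_z if k in mod_z and "C" not in k]
    slope = fit_slope([unmod_z[k] for k in ref], [mod_z[k] for k in ref])
    # Assumption: dividing by the slope puts both arrays on a common scale.
    return {k: z / slope for k, z in mod_z.items()}, slope


# Toy values, invented for illustration only.
unmod = {"AATTAATT": 1.0, "TTTTAAAA": 2.0, "GATTGATT": 3.0, "TGACTCAT": 10.0}
mod = {"AATTAATT": 0.5, "TTTTAAAA": 1.0, "GATTGATT": 1.5, "TGACTCAT": 8.0}
rescaled, slope = rescale_modified_z(unmod, mod)
```

In this toy case the cytosine-free 8-mers read half as strongly on the modified array, so every modified Z-score is divided by that slope before the arrays are compared.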

Molecular Dynamics Simulations

All simulations were based on the PDB ID: 2C9N crystal structure.[9] Mutations of the N182 amino acid side chain and C3 nitrogenous base were done with UCSF Chimera 1.13.1.[54] Molecular dynamics trajectories were generated with NAMD 2.13b2[55] using the CHARMM36 all-hydrogen topologies and parameters.[56] The protein/DNA complexes were centered in a periodic-boundary cell of 94 × 60 × 49 Å initial dimensions, which allowed for a minimum 12 Å gap to any wall. VMD-1.9.3[57] was used to fill the surrounding space with TIPS3P-model waters and the sodium and chloride ions necessary to neutralize the system and provide a salt concentration of 150 mM. Electrostatic and van der Waals energy functions were calculated with a CUTOFF of 12 Å, a SWITCHDIST of 10 Å, and a PAIRLISTDIST of 14 Å. The Langevin and Langevin–Piston algorithms were used to maintain the system temperature and pressure at 300.0 K and 1.0 atm, respectively, with the defaults for the other parameters. Rigid bonds were used to allow for an integration timestep of 2 fs. Trajectories were run for a minimum of 1000 ns, with snapshots collected every 10 ps. Hydrogen bonds and other interactions were studied with CHARMM.[58]
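The stated timestep, trajectory length, and snapshot interval imply the following bookkeeping, shown here as a small Python sketch. The function and variable names are hypothetical; only the three input values come from the text, and the calculation uses the stated minimum trajectory length.

```python
def md_bookkeeping(traj_ns, timestep_fs, snapshot_ps):
    """Return (total steps, steps between snapshots, number of snapshots)."""
    total_steps = round(traj_ns * 1e6 / timestep_fs)           # 1 ns = 1e6 fs
    steps_per_snapshot = round(snapshot_ps * 1e3 / timestep_fs)  # 1 ps = 1e3 fs
    n_snapshots = total_steps // steps_per_snapshot
    return total_steps, steps_per_snapshot, n_snapshots


# Values from the text: 1000 ns minimum, 2 fs timestep, snapshot every 10 ps.
total_steps, steps_per_snapshot, n_snapshots = md_bookkeeping(1000.0, 2.0, 10.0)
```

Under these values each trajectory spans at least 5 × 10^8 integration steps, with a snapshot every 5000 steps, giving at least 100,000 snapshots per trajectory for the hydrogen-bond analysis.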

58 in total

1. The GCN4 basic region leucine zipper binds DNA as a dimer of uninterrupted alpha helices: crystal structure of the protein-DNA complex.

Authors: T E Ellenberger; C J Brandl; K Struhl; S C Harrison
Journal: Cell Date: 1992-12-24 Impact factor: 41.582

Review 2. Origins of specificity in protein-DNA recognition.

Authors: Remo Rohs; Xiangshu Jin; Sean M West; Rohit Joshi; Barry Honig; Richard S Mann
Journal: Annu Rev Biochem Date: 2010 Impact factor: 23.643

3. 5-Hydroxymethylcytosine in E-box motifs ACAT|GTG and ACAC|GTG increases DNA-binding of the B-HLH transcription factor TCF4.

Authors: Syed Khund-Sayeed; Ximiao He; Timothy Holzberg; Jun Wang; Divya Rajagopal; Shriyash Upadhyay; Stewart R Durell; Sanjit Mukherjee; Matthew T Weirauch; Robert Rose; Charles Vinson
Journal: Integr Biol (Camb) Date: 2016-08-03 Impact factor: 2.192

4. Replacement of invariant bZip residues within the basic region of the yeast transcriptional activator GCN4 can change its DNA binding specificity.

Authors: M Suckow; K Schwamborn; B Kisters-Woike; B von Wilcken-Bergmann; B Müller-Hill
Journal: Nucleic Acids Res Date: 1994-10-25 Impact factor: 16.971

5. The structure of a CREB bZIP.somatostatin CRE complex reveals the basis for selective dimerization and divalent cation-enhanced DNA binding.

Authors: M A Schumacher; R H Goodman; R G Brennan
Journal: J Biol Chem Date: 2000-11-10 Impact factor: 5.157

6. C/EBPβ (CEBPB) protein binding to the C/EBP|CRE DNA 8-mer TTGC|GTCA is inhibited by 5hmC and enhanced by 5mC, 5fC, and 5caC in the CG dinucleotide.

Authors: Syed Khund Sayeed; Jianfei Zhao; Bangalore K Sathyanarayana; Jaya Prakash Golla; Charles Vinson
Journal: Biochim Biophys Acta Date: 2015-03-13

7. Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone φ, ψ and side-chain χ(1) and χ(2) dihedral angles.

Authors: Robert B Best; Xiao Zhu; Jihyun Shim; Pedro E M Lopes; Jeetain Mittal; Michael Feig; Alexander D Mackerell
Journal: J Chem Theory Comput Date: 2012-07-18 Impact factor: 6.006