WRKY proteins, defined by the conserved WRKYGQK sequence, comprise a large superfamily of transcription factors identified specifically from the plant kingdom. This superfamily plays important roles in plant disease resistance, abiotic stress and senescence, as well as in some developmental processes. In this study, Arabidopsis WRKY1 was shown to be involved in the salicylic acid signaling pathway and to be partially dependent on NPR1; a C-terminal domain of WRKY1, AtWRKY1-C, was constructed for structural studies. Previous investigations showed that DNA binding of the WRKY proteins is localized to the WRKY domains and that these domains may define novel zinc-binding motifs. The crystal structure of AtWRKY1-C, determined at 1.6 Å resolution, reveals that this domain is a globular structure with five β-strands forming an antiparallel β-sheet. A novel zinc-binding site is situated at one end of the β-sheet, between strands β4 and β5. Based on this high-resolution crystal structure and site-directed mutagenesis, we have defined and confirmed that the DNA-binding residues of AtWRKY1-C are located on strands β2 and β3. These results provide structural information for understanding the mechanism of transcriptional control and signal transduction events of the WRKY proteins.
WRKY proteins comprise a large group of transcription factors identified specifically from the plant kingdom, ranging from lower plants such as mosses to higher plants (1,2). Members of the WRKY superfamily play important roles in a variety of developmental and physiological processes in plants. The best-documented function of this superfamily of genes is their involvement in salicylic acid (SA) signaling and disease responses (3–8). SA is an important signal molecule in both local defenses and systemic acquired resistance (SAR). Endogenous accumulation or exogenous application of SA activates SAR and pathogenesis-related (PR) gene expression. In Arabidopsis, 49 out of 72 tested WRKY genes can be induced by pathogen infection or SA treatment (7), suggesting that a broad range of WRKY members could play regulatory roles in disease resistance. This role in disease resistance was further supported by the finding of an over-representation of the cis-element activated by the WRKY proteins, the W-box, within the promoters of a number of genes that are co-expressed in SAR (9,10). Arabidopsis NON-EXPRESSOR OF PR1 (NPR1) has been demonstrated to be a key protein in the SAR signaling pathway (11). SAR and expression of PR1 are prevented in Arabidopsis npr1 mutants even after SA treatment (12). Previous reports also showed that over-expression of WRKY18 and WRKY70 in Arabidopsis resulted in increased resistance to pathogens and enhanced expression of PR genes (3,13).
In addition, WRKY genes are involved in responses to abiotic stresses such as drought (14–17), cold (14,17,18), heat (15), salinity (17) and wounding (19), and in the biosynthesis of anthocyanin (20) and starch (21). Some WRKY genes also regulate developmental processes such as embryogenesis (22), senescence (4,23), trichome development (20) and seed size (24).
There are more than 70 WRKY family members in Arabidopsis (1) and more than 100 in rice (25).
The name WRKY is derived from the conserved WRKY sequence motif, which is present in all WRKY members. The WRKY domain is a stretch of about 60 amino acids with the strictly conserved WRKYGQK sequence at its N-terminus, followed by a putative novel zinc-binding motif with features of C–C–H–H or C–C–H–C (26–28). The presence of Zn2+ is crucial for the DNA-binding activity, which implies the importance of the putative zinc-binding motif (29,30). WRKY proteins can be classified into three groups according to the number of WRKY domains and the zinc-binding pattern (26). Group I WRKY proteins are distinct from groups II and III in containing two WRKY domains. The zinc-binding motif of group III proteins is C–X7–C–X23–H–X–C, different from the common pattern of C–X4–5–C–X22–23–H–X–H of group I and II proteins. It has been reported that substitutions of the invariant WRKYGQK residues in the WRKY domain decreased the DNA-binding affinity, and that any mutation of the conserved cysteines and histidines of the zinc-binding motif abolished the protein–DNA interaction (29).
The two WRKY domains in group I proteins play different roles in DNA binding (27). It has been shown that specific binding to the W-box is mediated mainly by the C-terminal WRKY domain (27,30), whereas the function of the N-terminal WRKY domain remains unclear. Arabidopsis thaliana WRKY1 protein (also known as ZAP1), the first identified WRKY transcription factor from Arabidopsis, is encoded by a single-copy gene on chromosome 2 and belongs to the group I WRKY family (30). The recognition of the W-box by AtWRKY1 depends mainly on the C-terminal WRKY domain, while the N-terminal WRKY domain only slightly affects the protein–DNA interaction (30). Structural information on WRKY proteins and on how they interact with the W-box was lacking until recently, when an NMR structure of the C-terminal WRKY domain of AtWRKY4, AtWRKY4-C, was determined (31).
However, the domain boundaries of the NMR sample seemed to be too short, so the sample may not represent the structure of the whole domain. After the genome-wide cloning of the transcription factors of A. thaliana (32), we have carried out functional and structural studies on AtWRKY1.
In this report, we show that AtWRKY1 can be induced by SA treatment and that the induction is partially dependent on NPR1, suggesting that this protein may be involved in the defense response. Most importantly, we present the crystal structure of AtWRKY1-C at 1.6 Å and a model of the protein–DNA complex based on site-directed mutagenesis studies.
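As an illustration of the zinc-binding spacing patterns discussed in this introduction, the following minimal sketch expresses them as regular expressions. It assumes that the group I/II pattern ends in H–X–H, as observed for AtWRKY1-C in this work; the function name and the test strings are hypothetical, and the test strings are synthetic fillers rather than real WRKY sequences.

```python
import re

# Spacing patterns written as regular expressions ("." stands for any residue X).
GROUP_I_II = re.compile(r"C.{4,5}C.{22,23}H.H")  # C-X4-5-C-X22-23-H-X-H
GROUP_III = re.compile(r"C.{7}C.{23}H.C")        # C-X7-C-X23-H-X-C


def classify_zinc_motif(sequence: str) -> str:
    """Report which spacing pattern, if any, occurs in the sequence."""
    if GROUP_I_II.search(sequence):
        return "group I/II pattern"
    if GROUP_III.search(sequence):
        return "group III pattern"
    return "no match"


# Synthetic strings (NOT real proteins): alanine filler with Cys/His
# placed at the required spacings after the WRKYGQK signature.
toy_c2h2 = "WRKYGQK" + "A" * 10 + "C" + "A" * 4 + "C" + "A" * 23 + "HAH"
toy_c2hc = "WRKYGQK" + "A" * 10 + "C" + "A" * 7 + "C" + "A" * 23 + "HAC"

print(classify_zinc_motif(toy_c2h2))
print(classify_zinc_motif(toy_c2hc))
```

A pattern match of this kind only tests residue spacing; it says nothing about whether the residues actually coordinate zinc, which is what the crystal structure addresses.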
MATERIALS AND METHODS
Plant material and growth conditions
Wild-type Arabidopsis plants (ecotype Col-0) were grown in soil in a growth room (23/20°C, 12-h-light/12-h-dark cycle). All seeds were vernalized at 4°C for at least 2 days before placement in the growth environment. The non-functional NPR1 mutant npr1-3 (12) and the transgenic line overexpressing the NPR1 gene (33) were kindly provided by Dr Frederick M. Ausubel (Harvard Medical School, USA) and Dr Xin-Nian Dong (Duke University, USA), respectively.
SA treatment and northern blotting
Arabidopsis plants were sprayed with 2 mM SA diluted from a stock solution (100 mM, adjusted to pH 6.5 with KOH). Leaves from different 4-week-old plants were harvested at the indicated time points for total RNA preparation with RNAzol (Vigrous, Beijing) according to the manufacturer's instructions. Ten micrograms of total RNA was separated on 1.5% agarose–formaldehyde gels and transferred to a nylon membrane for hybridization with digoxigenin (DIG)-labeled DNA probes. DIG DNA probe labeling, hybridization and detection were performed according to the DIG Application Manual for Filter Hybridization (Roche, Penzberg, Germany). The following primer combinations were used to amplify DNA probes by PCR: PR1: forward 5′-ATGAATTTTACTGGCTATTCTCG-3′, reverse 5′-TTAGTATGGCTTCTCGTTCAC-3′; AtWRKY1-C: forward 5′-ATGGCTGAGGTGGGAAAAGTTCTG-3′, reverse 5′-GCTTTGGGCAGGCTCTGTCTTGGG-3′. Equal loading was confirmed by staining the gel with ethidium bromide (EB).
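The dilution of the SA stock to the working concentration follows C1·V1 = C2·V2. The sketch below works this through for the 100 mM stock and the 2 mM spray solution; the final volume of 100 ml is a hypothetical value chosen only for illustration and is not stated in the protocol.

```python
def stock_volume_needed(stock_mM: float, final_mM: float, final_volume_ml: float) -> float:
    """Volume of stock (ml) required for the final solution, from C1*V1 = C2*V2."""
    if final_mM > stock_mM:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_mM * final_volume_ml / stock_mM


FINAL_VOLUME_ML = 100.0  # hypothetical spray volume, for illustration only

stock_ml = stock_volume_needed(stock_mM=100.0, final_mM=2.0, final_volume_ml=FINAL_VOLUME_ML)
dilution_factor = 100.0 / 2.0

print(f"{stock_ml:.1f} ml stock made up to {FINAL_VOLUME_ML:.0f} ml "
      f"({dilution_factor:.0f}-fold dilution)")
```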
Protein preparation and crystallization
The preparation and crystallization of the AtWRKY1-C protein were described previously (34). Briefly, the cDNA fragment covering the C-terminal WRKY domain of AtWRKY1 was amplified by polymerase chain reaction (PCR) from a full-length cDNA clone of AtWRKY1 obtained from Arabidopsis (32) and cloned into the expression vector pET21aDEST. The protein was expressed in E. coli strain Rosetta and was purified to homogeneity in a two-step procedure of Ni2+ chelating and size exclusion chromatography. The mutant proteins were prepared similarly. The protein sample was kept in 20 mM Tris, 200 mM NaCl, pH 7.5 prior to crystallization. Crystals of AtWRKY1-C were obtained at 277 K from a condition containing 1.2 M succinic acid, 0.1 M Tris–HCl pH 7.0 and 1% w/v PEG MME 2000, using the hanging-drop vapor diffusion method.
Data collection and structure determination
Two crystal forms were obtained from the same growth condition. One belongs to the space group P21 and diffracted to 2.5 Å resolution on a home X-ray source, as published before (34), whereas the other crystal form belongs to the space group P3221 and diffracted to better than 1.6 Å at beamline 3W1A of the Beijing Synchrotron Radiation Facility (BSRF). Diffraction data were collected on a MAR165 CCD camera (MARresearch GmbH, Hamburg). The wavelength was set to 1.24 Å, on the higher energy side of the absorption peak of zinc. The data with anomalous signals were processed using the programs DENZO and SCALEPACK (35). The statistics for the X-ray data collection and processing of the P3221 crystal form are summarized in Table 1.
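The statement that 1.24 Å lies on the high-energy side of the zinc absorption edge can be checked with the relation E = hc/λ. The sketch below is illustrative only; the value used for the zinc K edge (about 9.659 keV) is taken from standard X-ray tables rather than from this work, and the function name is hypothetical.

```python
# Photon energy from wavelength: E [keV] = hc / lambda, with hc ~ 12.398 keV*Angstrom.
HC_KEV_ANGSTROM = 12.398

# Approximate zinc K absorption edge (assumption: value from standard
# X-ray tables, not reported in this paper).
ZN_K_EDGE_KEV = 9.659


def wavelength_to_energy_kev(wavelength_angstrom: float) -> float:
    """Convert an X-ray wavelength in angstroms to photon energy in keV."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


energy = wavelength_to_energy_kev(1.24)
print(f"{energy:.2f} keV; above Zn K edge: {energy > ZN_K_EDGE_KEV}")
```

At roughly 10.0 keV the beam is a few hundred eV above the edge, which is where the anomalous scattering of zinc is usable for SAD phasing.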
Table 1.
Data collection, phasing and refinement statistics (SAD)
| Data collection | | Refinement | |
|---|---|---|---|
| Space group | P3221 | Resolution (Å) | 30–1.6 |
| Cell dimensions | | No. of reflections | 10 536 |
| a, b, c (Å) | 45.55, 45.55, 68.96 | Rwork/Rfree (%) | 17.9/20.4 |
| α, β, γ (°) | 90, 90, 120 | No. of atoms | 727 |
| Resolution (Å) | 30–1.6 (1.66–1.6) | Protein | 609 |
| Rsym or Rmerge (%) | 9.1 (35.4) | Ligand/ion | 8/1 |
| I/σI | 22.9 (2.9) | Water | 100 |
| Completeness (%) | 97.9 (95.5) | Average B-factors | 20.95 |
| Redundancy | 5.6 (4.6) | RMSD(a) bond lengths (Å) | 0.015 |
| | | RMSD(a) bond angles (°) | 1.61 |
(a) RMSD in bond lengths and angles are the root-mean-square (RMS) deviations from ideal values.
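As a simple consistency check on the cell dimensions in Table 1, the unit-cell volume can be computed with the general triclinic formula, which reduces to a·b·c·sin γ for this cell. The sketch below is illustrative; the function name is hypothetical, and the division by six relies on the six general positions of the space group.

```python
import math


def unit_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """General (triclinic) unit-cell volume in cubic angstroms."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha_deg, beta_deg, gamma_deg))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


# Cell dimensions from Table 1.
volume = unit_cell_volume(45.55, 45.55, 68.96, 90, 90, 120)

# The space group has six general positions, i.e. six asymmetric units per cell.
volume_per_asu = volume / 6

print(f"cell volume = {volume:.0f} A^3; per asymmetric unit = {volume_per_asu:.0f} A^3")
```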
The zinc site was located using the program SHELXD (36) and the heavy atom parameters were refined using the program SOLVE (37). The initial phases were obtained with the program OASIS-2004 (38) and then refined using the program DM (CCP4) (39). Finally, ARP/wARP (40) was used for automated model building.
Since the initial phases and the auto-traced model were of good quality, the subsequent refinement could be carried out readily using the program CNS (41). Manual adjustment of the structure was done with the program O (42), and one succinic acid molecule as well as water molecules were built into the positive difference density. The final model was refined with REFMAC5 (43).
Mutant design and site-directed mutagenesis
The design of mutants was based on a DNA–protein complex model between AtWRKY1-C and the W-box with the sequence 5′-ATCGTTGACCGAGTTGA-3′, which was generated from the known GCM–DNA complex (44) (PDB ID 1ODH). GCM in the complex was replaced by AtWRKY1-C after least-squares (LSQ) fitting between AtWRKY1-C and GCM using the program O (42). The root-mean-square deviation (RMSD) after LSQ fitting was 2.3 Å for 68 Cα atoms. A B-DNA model for specific binding was prepared using the programs CNS (41) and NAB (45) with the sequence 5′-ATCGTTGACCGAGTTGA-3′. The DNA in the GCM–DNA complex was then replaced using the LSQ fit function in XtalView (46).
The wild-type AtWRKY1-C cDNA was used as the template for mutagenesis. Mutations were introduced into AtWRKY1-C by PCR-based site-directed mutagenesis. DNA sequencing verified the introduction of the desired mutations and confirmed that no unwanted mutations were present in the mutated sequences.
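The LSQ fitting used to build the complex model was performed in O and XtalView. To illustrate the underlying idea, the sketch below computes the RMSD after optimal rigid-body superposition of paired Cα coordinates with the Kabsch algorithm in NumPy. This is a generic technique, not the code of either program, and the coordinates are random synthetic points rather than the AtWRKY1-C or GCM structures.

```python
import numpy as np


def kabsch_rmsd(mobile: np.ndarray, target: np.ndarray) -> float:
    """RMSD after optimal rigid-body superposition of paired (N, 3) coordinates."""
    if mobile.shape != target.shape or mobile.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove the translation by centering both sets on their centroids.
    p = mobile - mobile.mean(axis=0)
    q = target - target.mean(axis=0)
    # Optimal rotation from the SVD of the covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    diff = p @ rotation - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# Synthetic demonstration: 68 random points, rotated and translated.
rng = np.random.default_rng(0)
reference = rng.normal(scale=10.0, size=(68, 3))
angle = np.radians(40.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moved = reference @ rot_z.T + np.array([5.0, -3.0, 12.0])

rmsd_rigid = kabsch_rmsd(moved, reference)  # pure rigid-body motion
noisy = moved + rng.normal(scale=1.0, size=moved.shape)
rmsd_noisy = kabsch_rmsd(noisy, reference)  # rigid-body motion plus coordinate noise

print(f"rigid copy: {rmsd_rigid:.2e} A; noisy copy: {rmsd_noisy:.2f} A")
```

In practice the difficult step is deciding which residues to pair before fitting; the sketch assumes the pairing is already known.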
Electrophoretic mobility shift assays (EMSA)
The oligonucleotides 5′-ATCGTTGACCGAGTTGA-3′ and 5′-TCAACTCGGTCAACGAT-3′, containing the optimal binding site for AtWRKY1, were annealed to form a DNA duplex. A typical binding reaction (20 μl) contained 3 μg dsDNA, 20 mM HEPES/KOH (pH 7.2), 40 mM KCl, 1 mM EDTA, 0.5 mM DTT, 10% glycerol and 5 μg purified wild-type or mutant AtWRKY1-C protein. The binding reaction mixture was incubated at room temperature for 20 min, and the complex was separated from the free duplex by 12% non-denaturing polyacrylamide gel electrophoresis (PAGE) in 0.5× TBE at 70 V for 3.5 h. The gel was stained with EB. Images were captured using a fluorescence imaging system (Alpha Innotech Corporation, USA). A more ‘standard’ EMSA analysis was also performed. The standard binding reaction (20 μl) contained 0.5 μg of poly(dI-dC), 20 mM HEPES/KOH (pH 7.2), 40 mM KCl, 1 mM EDTA, 0.5 mM DTT, 10% glycerol, 0–180 ng purified protein and 10 ng double-stranded synthetic oligonucleotide labeled with [γ-32P]ATP using T4 polynucleotide kinase (NEB). For the competition assay, a 100-fold molar excess of the unlabeled specific probe was added. DNA–protein complexes were allowed to form at room temperature for 30 min and were then resolved on a 12% non-denaturing polyacrylamide gel in 0.5× TBE. The gel was dried and exposed to X-ray film for 2 h.
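The two EMSA oligonucleotides should be exact reverse complements in order to anneal into a blunt 17-bp duplex. The short sketch below checks this and locates the TGAC core of the W-box on the top strand; the helper name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


top = "ATCGTTGACCGAGTTGA"
bottom = "TCAACTCGGTCAACGAT"

strands_pair = reverse_complement(top) == bottom
core_start = top.find("TGAC") + 1  # 1-based position of the W-box core

print(f"length = {len(top)} nt, strands complementary: {strands_pair}, "
      f"TGAC core starts at position {core_start}")
```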
CD spectroscopic assays
All CD spectra of the wild-type and mutant AtWRKY1-C proteins were recorded on a Jobin Yvon CD 6 spectrometer (Longjumeau, France) at 298 K. The spectra of all proteins were recorded in phosphate-buffered saline (PBS), pH 7.5. For the far-UV CD spectra, a cell with a path length of 1 mm was used. Each spectrum was the average of four scans, corrected by subtracting a spectrum of the buffer solution without protein recorded under identical conditions. Each scan over the range 195–260 nm was obtained by taking data points every 0.5 nm with a 2-nm bandwidth and an integration time of 1 s.
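The processing described above (averaging four scans and subtracting the buffer spectrum) can be sketched as follows. The function names and the numerical values are hypothetical placeholders, not measured data; only the wavelength range, step size and number of scans come from the text.

```python
def wavelength_grid(start_nm=195.0, stop_nm=260.0, step_nm=0.5):
    """Wavelengths sampled in one scan, both end points included."""
    n_points = int(round((stop_nm - start_nm) / step_nm)) + 1
    return [start_nm + i * step_nm for i in range(n_points)]


def average_and_subtract(scans, buffer_scan):
    """Average the sample scans point by point, then subtract the buffer spectrum."""
    n_scans = len(scans)
    return [sum(values) / n_scans - blank
            for values, blank in zip(zip(*scans), buffer_scan)]


grid = wavelength_grid()

# Placeholder numbers (NOT measured data): four flat scans and a flat buffer blank.
toy_scans = [[-10.0 + offset] * len(grid) for offset in (-0.2, -0.1, 0.1, 0.2)]
toy_buffer = [-1.0] * len(grid)
corrected = average_and_subtract(toy_scans, toy_buffer)

print(f"{len(grid)} points from {grid[0]} to {grid[-1]} nm")
```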
RESULTS AND DISCUSSION
AtWRKY1 is involved in the SA signaling
SA is an important signal molecule in plant disease resistance. It has previously been shown that SA treatment can induce the expression of PR genes and activate systemic acquired resistance (SAR), which makes plants resistant to a spectrum of pathogens (47). To study the potential involvement of AtWRKY1 in the SA signaling pathway and its possible relation to NPR1, a key protein in SAR, we characterized the expression profile of AtWRKY1 by RNA gel blot in wild-type Arabidopsis seedlings, the npr1-3 mutant (12) and the NPR1-H line (33) in response to exogenous SA treatment. As an important marker of SAR, PR1 was also examined to monitor the SA treatment procedure. Consistent with previous reports, PR1 began to accumulate at 4 h post-treatment; its induction was abolished in npr1-3 and was stronger in NPR1-H than in wild type (Figure 1). Induced expression of AtWRKY1 was observed after 8 h post-treatment, becoming evident later than that of PR1. In npr1-3, AtWRKY1 was still expressed, but at a lower level than in wild-type plants. The opposite trend appeared in NPR1-H, with a higher expression level of AtWRKY1 than in wild type (Figure 1). These data suggest that AtWRKY1 is involved in the SA signaling pathway and that its induced expression is partially dependent on NPR1.
Figure 1.
AtWRKY1 is partially NPR1-dependent in the SA pathway. Four-week-old wild-type (Col-0), npr1-3 and 35S::NPR1 Arabidopsis plants were sprayed with 2 mM SA and harvested at the indicated time points. PR1 transcripts were detected in the same way as AtWRKY1 transcripts, using a different probe. The ethidium bromide stain of rRNA is shown for each lane to allow assessment of equal loading.
The nuclear localization of AtWRKY1-C
As transcriptional regulators, WRKY proteins have been shown to contain functional nuclear localization signals (NLS) and to be targeted to the nucleus (27,48). However, the NLS sequences of WRKY proteins are not well conserved and may be located at different positions. Analysis of the AtWRKY1 protein sequence revealed a potential NLS motif, KRRKK, at residues 273–277 near the C-terminal WRKY domain, which is a conserved feature of at least nine members of the group I WRKY proteins in Arabidopsis (Figure 2D). We therefore included this potential NLS motif in the AtWRKY1-C sequence for this study. When a fusion construct of AtWRKY1-C with dimeric GFP (dGFP) was introduced into onion epidermal cells, the green fluorescent signal was targeted specifically to the nucleus. In contrast, the fluorescence of dGFP alone and of the fusion protein of AtWRKY1-CΔNLS with dGFP was unevenly distributed throughout the cells (data not shown), which suggests that the potential NLS motif of AtWRKY1-C is indeed responsible for the nuclear localization of AtWRKY1.
Figure 2.
Structure of AtWRKY1-C and multiple sequence alignment. (A) Ribbon representation of the AtWRKY1-C domain. AtWRKY1-C is composed of five β-strands (yellow ribbons), which are numbered from the N-terminus. The zinc ion is shown as a purple sphere and the zinc-coordinating residues are represented by sticks (yellow for C, red for O, blue for N, orange for S). (B) 3D superposition of the structure of AtWRKY1-C and the best representative NMR structure of AtWRKY4-C (model 15), using LSQ Fit in O. The structures are shown as cartoons, with AtWRKY1-C colored yellow and AtWRKY4-C cyan. The zinc ions are represented as spheres, magenta in AtWRKY1-C and orange in AtWRKY4-C. (C) Charge distribution on the surface of the AtWRKY1-C structure calculated with GRASP. Positive charges are shown in blue and negative charges in red. (D) Structure-based sequence alignment of both the N-terminal and C-terminal WRKY domains of the nine Arabidopsis WRKY proteins from group I. The zinc-coordinating residues are shown on a blue background. Conserved residues that stabilize the structure and that recognize DNA are shown on red and yellow backgrounds, respectively. Residues of β1, highlighted in green but missing from the AtWRKY4-C structure, are rather conserved in all C-terminal domains of group I WRKY proteins. Residues marked by stars and triangles are scaffolds of two stable regions. Details of the interactions in the two regions are shown in Figure 3C and D.
Figure 3.
B-factor representation of the AtWRKY1-C crystal structure. All objects are colored according to B-factor values, from blue to red in order of increasing B-factor. (A) Plot of B-factor against residue number, with the B-factor averaged over each residue. (B) The overall structure of AtWRKY1-C as a ribbon colored according to B-factor values. The ends of the loop between strands β1 and β2, which have low B-factor values, are circled in red and enlarged in (C) and (D), where the interactions around these two ends are shown in detail. (C) The D308-W312-K341 triad at the C-terminal end of the loop between β1 and β2. Asp308 forms a well-defined salt-bridge with Lys341 and is extensively hydrogen bonded with the side chains of Trp312 and Tyr357 and the backbone of Tyr310. The three key residues Asp308, Trp312 and Lys341, components of the D308-W312-K341 triad, and their hydrogen-bonded residues Gly309 and Tyr357 are marked by stars in Figure 2D. (D) The stable N-terminal end of the loop between β1 and β2. Arg345 is hydrogen bonded with the backbones of Thr301, Phe303 and Thr350. A potential salt-bridge between Arg345 and Asp304 was also observed, as shown, and these two conserved residues are marked by triangles in Figure 2D.
Structure determination of AtWRKY1-C crystal
Although grown in the same mother liquor and at the same temperature, AtWRKY1-C crystallized in a space group, P3221, different from the one published before (34). The structure was determined to 1.6 Å resolution using the single-wavelength anomalous dispersion (SAD) method. The average Bijvoet ratio (⟨|ΔF|⟩/⟨F⟩) was 4.6%, in accordance with the high correlation coefficients (CC/all and CC/weak) of 35.94/20.70 given by SHELXD (36) in locating the zinc substructure. Phasing and density modification (DM) (39) were carried out consecutively using OASIS-2004 (38) and DM (CCP4). With a figure of merit (FOM) of 0.771 and density maps of good quality, 75 residues could be built automatically by ARP/wARP (40). However, no density could be observed for residues 266–292 and residues 369–371. The final model was refined manually to an R-factor/R-free of 17.9/20.4% at 1.6 Å resolution (Table 1). The final model includes 76 residues corresponding to residues 293–368, one zinc ion, one succinic acid molecule and 100 water molecules.
Description of the overall structure of AtWRKY1-C
The AtWRKY1-C structure is mainly composed of a five-stranded antiparallel β-sheet (β1, 294–300; β2, 312–318; β3, 327–332; β4, 340–345; β5, 352–358; see Figure 2D), with the disordered N-terminal 27 residues (including the potential NLS motif) and the C-terminal three residues missing from the structure (Figure 2). The defining sequence of WRKY proteins, ‘WRKYGQK’, spans the entire β2 strand. Due to the long bridging loop (Thr301–Arg311) between β1 and β2, which covers one side of the β-sheet, the structure of AtWRKY1-C looks rather globular and stable (Figure 2A).
The crystal structure of AtWRKY1-C differs from the recently determined NMR structure of a similar protein, the AtWRKY4 C-terminal domain (AtWRKY4-C, which shares about 53% sequence identity with AtWRKY1-C) (31). As shown in Figure 2B and D, the β1 strand of AtWRKY1-C is missing in the NMR AtWRKY4-C model; thus strands β2–β5 in the crystal structure of AtWRKY1-C correspond to strands β1–β4 of the NMR structure. The RMSD over the equivalent regions of the two structures (AtWRKY1-C residues 309–367, AtWRKY4-C residues 411–469) is 1.5 Å, indicating very similar structures and a common DNA-binding mechanism. Since the AtWRKY4-C domain was constructed to start from VQTTS, which is located in the middle of the β1 strand (Figure 2B and D with the alignment), the discrepancy in the organization of the β-sheet is most likely caused by the different ways of preparing the NMR and crystallization samples. For a fair comparison, we prepared two proteins and compared their CD spectra: AtWRKY4-C (Asp365–Ala469), constructed similarly to AtWRKY1-C, and AtWRKY4-CΔ1b (Val399–Ala469), similar to the NMR sample lacking the N-terminal β1 strand. The CD spectra showed a similar pattern for zinc binding for both AtWRKY4-C and AtWRKY4-CΔ1b and, in both cases, the characteristics of a folded β-sheet structure. However, there was a reduction in the CD signal of AtWRKY4-CΔ1b near the negative peak at 210–220 nm, indicating a reduced β-sheet signal in the truncated protein (data not shown). We thus predict that AtWRKY4-C would have a globular structure similar to that of AtWRKY1-C, had the AtWRKY4-C sample been prepared several residues longer at the N-terminus.
The conserved β1 and B-factor distribution of AtWRKY1-C
As shown in Figure 2D, in addition to the conserved WRKY-defining residues (in red, yellow and blue), the sequences in the β1 region (colored green) are well conserved in the C-terminal domains of the group I WRKY family members. We thus conclude that the C-terminal domains of all group I WRKY proteins possess a five-stranded β-sheet architecture very similar to that of AtWRKY1-C.
Figure 3 describes the B-factor distribution over the AtWRKY1-C structure. Figure 3A shows the B-factor plotted against residue number, and Figure 3B shows the overall structure colored by B-factor as defined in Figure 3A. The regions with high B-factors (higher than 25) are the N- and C-terminal ends and the loop between β2 and β3; this loop may be involved in conformational changes upon DNA binding. The middle part of the long loop connecting β1 and β2 and a few residues after β5 also showed slightly elevated B-factors, between 20 and 25. Most of the structure is quite stable and ordered, with an average B-factor of 21; in particular, the zinc-binding site and the central β-sheet all have B-factors below 20.
The conserved salt-bridges and the D308-W312-K341 triad
Interestingly, both ends of the connecting loop between β1 and β2 show a conserved pattern of ordered salt-bridge interactions and extensive hydrogen-bonding networks (Figure 3C and D). Figure 3C shows that at the C-terminal end of the loop, Asp308 forms a well-defined salt-bridge with Lys341 and also forms extensive hydrogen bonds with Tyr310, Trp312 and Tyr357. All these residues are very well conserved among the WRKY family proteins, indicating a common function in stabilizing the domain structure. The pivotal residue in this constellation is Asp308. The distance between Asp308 OD1 and Trp312 NE1 is 2.94 Å, and that between Asp308 OD2 and Lys341 NZ is 2.85 Å. Because these three residues have the lowest B-factors, we name this constellation the D308-W312-K341 triad. In the NMR study, the interactions around D308 could not be seen because the N-terminal region of that structure was undefined; only the hydrophobic interaction of AtWRKY4-C W414-K443 (AtWRKY1-C residues W312-K341), which is part of the triad constellation defined here, was mentioned (31). Figure 3D depicts another well-conserved salt-bridge, between Asp304 and Arg345, at the N-terminal end of the loop; Arg345 is also hydrogen bonded with the main chain of Thr301, Phe303 and Thr350.
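The residue pairs quoted for the two proteins (AtWRKY1-C W312 and K341 versus AtWRKY4-C W414 and K443) follow from the aligned ranges given earlier (309–367 and 411–469). The sketch below makes the mapping explicit under the assumption, not stated in the text, that the alignment is gap-free over this range, so that a constant offset applies; the function name is hypothetical.

```python
# Aligned ranges quoted in the structural comparison.
WRKY1_RANGE = (309, 367)
WRKY4_RANGE = (411, 469)

# A constant offset is only valid if the alignment has no gaps in this range
# (assumption); equal range lengths are a necessary condition for that.
same_length = (WRKY1_RANGE[1] - WRKY1_RANGE[0]) == (WRKY4_RANGE[1] - WRKY4_RANGE[0])
OFFSET = WRKY4_RANGE[0] - WRKY1_RANGE[0]


def wrky1_to_wrky4(resnum: int) -> int:
    """Map an AtWRKY1-C residue number to AtWRKY4-C numbering within the aligned range."""
    if not WRKY1_RANGE[0] <= resnum <= WRKY1_RANGE[1]:
        raise ValueError(f"residue {resnum} is outside the aligned range")
    return resnum + OFFSET


for name, number in (("Trp", 312), ("Lys", 341)):
    print(f"AtWRKY1-C {name}{number} -> AtWRKY4-C {name}{wrky1_to_wrky4(number)}")
```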
The zinc-binding site in AtWRKY1-C
One of the important features resolved by the high-resolution crystal structure of AtWRKY1-C is the well-ordered zinc-binding site. As shown in the electron density map in Figure 4A, the 1.6 Å data made it possible to resolve the coordination of the zinc ion by all the ligand atoms, and clearly showed that the Nδ of His361 and the Nε of His363 coordinate the zinc ion (Figure 4A and B). The distances between the ligand atoms and the zinc ion are 2.06 Å (Zn–Nδ of His361), 2.07 Å (Zn–Nε of His363), 2.28 Å (Zn–Sγ of Cys332) and 2.27 Å (Zn–Sγ of Cys337). All these distances are well within the ideal range for a zinc-binding site in proteins (49).
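The four coordination distances can be summarized by ligand atom type, as in the minimal sketch below. The atom names ND1, NE2 and SG are the standard PDB names for histidine Nδ, histidine Nε and cysteine Sγ; the function name is hypothetical, and no reference ranges are asserted here.

```python
from statistics import mean

# Zn-ligand distances (angstroms) as reported in the text.
ZINC_LIGAND_DISTANCES = {
    ("His361", "ND1"): 2.06,
    ("His363", "NE2"): 2.07,
    ("Cys332", "SG"): 2.28,
    ("Cys337", "SG"): 2.27,
}


def mean_distance_by_element(distances):
    """Group distances by the element of the ligand atom and average them."""
    by_element = {}
    for (_, atom_name), value in distances.items():
        # The first letter of these atom names is the element symbol (N or S).
        by_element.setdefault(atom_name[0], []).append(value)
    return {element: mean(values) for element, values in by_element.items()}


means = mean_distance_by_element(ZINC_LIGAND_DISTANCES)
for element, value in sorted(means.items()):
    print(f"mean Zn-{element} distance: {value:.3f} A")
```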
Figure 4.
The zinc coordination environment. (A) 3Fo-2Fc electron-density map showing the coordination of the zinc ion by Cys332, Cys337, His361 and His363. The map was prepared using the CCP4 program suite and displayed with PyMOL at a contour level of 1.5σ. The zinc ion is shown as a purple sphere and the zinc-coordinating residues are represented by sticks (yellow for C, red for O, blue for N, orange for S). (B) The distances between the zinc ion and the coordinating residues. The orientation and colors are the same as in (A).
The whole zinc-binding site is situated in a low B-factor region (Figure 3), and the zinc ligand residues have almost the lowest B-factors in the structure. The zinc-binding site lies right next to the low B-factor D308-W312-K341 triad described in Figure 3C. Together with the D308-W312-K341 triad, the zinc-binding site shapes and stabilizes the AtWRKY1-C structure, consistent with the finding that zinc binding is crucial for the stability of WRKY proteins (31).
The sequence signature of this zinc-binding site, C–X4–C–X23–H–X–H, somewhat resembles the classic C2H2 profile (50) that is characteristic of zinc fingers. The crystal structure clearly shows that the zinc ion is coordinated by two cysteines (Cys332, Cys337) and two histidines (His361, His363) (Figure 4). However, the site is structurally dissimilar to classic zinc fingers; instead, the structure is similar to that of a sequence-unrelated DNA-binding protein, Drosophila GCM (glial cells missing) (44) (PDB ID 1ODH). In contrast with the classic C2H2 zinc finger, AtWRKY1-C does not contain any helix, so it cannot bind DNA through a helix as most zinc-finger proteins do. The DNA binding of AtWRKY1-C is mediated by the β-hairpin region formed by β2 and β3, and this mode of DNA–protein interaction is similar to that predicted long ago by Church, Sussman and Kim for non-helical DNA binding (51). Previous work on DNA binding by the WRKY family has also shown that the conserved WRKYGQK region, located on the β2 strand, is important for DNA binding (29).
Conserved residues responsible for DNA binding
There are more than 70 members in the Arabidopsis WRKY family, grouped into three major groups (26). Nine members of group I, which carry two WRKY domains, were selected for multiple sequence alignment based on secondary structure information (Figure 2D) (sequence data were downloaded from http://rarge.gsc.riken.go.jp/rartf/tf_info/fasta_p1/TF_9_P1.fasta). The alignment shows that at least 18 residues are strictly conserved, including the four residues coordinating the zinc ion and the three residues of the conserved D308-W312-K341 triad.
As shown in Figure 2D, more than 10 well-conserved residues are positioned in the region between β2 and β3, including the WRKYGQK motif. In particular, the five consecutive residues RKYGQ, distinctive components of the WRKY family scaffold, belong to the β2 strand, while the other five conserved residues are located on the β3 strand in the pattern PRSYYR/K. Considering the specific DNA sequence that all WRKY members recognize, these conserved residues are good candidates for the specific DNA-binding process.
As shown in Figure 2C, the surface charge distribution of AtWRKY1-C is uneven, with a calculated pI (isoelectric point) of 9.5. On one side there are consecutive regions of positively charged residues, including Arg313, Lys314, Lys318, Lys321, Arg327 and Arg331, all of which are strictly conserved and located on the β2 and β3 strands. The well-conserved Lys340, located at the beginning of strand β4, also belongs to this positively charged region. Since the WRKYGQK sequence falls within this positively charged region and the WRKY motif has been shown to be responsible for binding to the W-box (29), it can be deduced that the β2 and β3 strands are most likely involved in DNA binding.
Identification and confirmation of the DNA-binding residues
A search for structures similar to AtWRKY1-C using the DALI server (52) returned a DNA–protein complex structure, the Drosophila GCM (44) (PDB ID 1ODH, Z-score > 6.2). When the GCM protein was superimposed on AtWRKY1-C by least-squares fitting, the resulting RMSD between the aligned parts (68 residues) was 2.3 Å. A model of the complex between W-box DNA and AtWRKY1-C was thus constructed, as shown in Figure 7A. By comparing the potential DNA-binding residues (those close to the DNA major groove in the complex model) with previous work on the DNA-binding residues of the WRKYGQK motif (29), the following single-residue mutants were constructed: R313E, K314A, K314R, Y315F, Y315R, G316F, Q317A, Q317K, R327A, R327E, Y330F, R331K, R331A and K340A. Maeo et al. (29) showed that all residues in the WRKYGQK motif (corresponding to residues 312–318 in AtWRKY1-C) are important for DNA binding. In particular, a change to Ala at any of W312, K314, Y315 or K318 completely abolished DNA binding, whereas substitution of R313, G316 or Q317 only reduced the amount of DNA binding (29). We therefore constructed multiple mutants for R313, K314, Y315, G316 and Q317. Furthermore, because the well-conserved residues R327, Y330, R331 and K340 are quite close to the DNA-binding sites, these residues were also selected for mutagenesis. Thirteen of the fourteen mutations are located on strands β2 and β3. This region, particularly the connecting loop, is flexible (high B-factor) in Figure 3A and B, indicating that conformational changes might occur upon DNA binding. EMSA was used to test the DNA-binding abilities of these mutants with a 17-bp duplex DNA fragment containing the W-box TGAC, as shown in Figure 5. An EMSA using a 32P-labeled probe according to a standard protocol was also performed, with similar results (not shown).
To examine the specificity of the binding, a competition experiment was performed using an excess of unlabeled probe of the same sequence. When a 100-fold excess of unlabeled oligonucleotide was included in the reaction, binding of AtWRKY1-C to the labeled probe was abolished completely (data not shown). All the mutants could be expressed and purified as readily as the wild-type AtWRKY1-C. Circular dichroism (CD) spectra of the mutant proteins were used to monitor their correct folding (Figure 6), indicating that the changes in DNA-binding ability are not due to structural changes caused by the mutations.
Figure 7.
A model of the AtWRKY1-C and W-box interaction. (A) Superimposition of AtWRKY1-C and the overlapping structure of the Drosophila GCM domain using LSQ Fit in the program O. Macromolecular structures are shown by cartoon loops, with the GCM domain colored yellow, AtWRKY1-C blue and zinc ions as orange and magenta spheres. The DNA double-helix is shown with sticks wrapped inside surface rendering. (B) The final schematic model of the specific interaction of AtWRKY1-C with the W-box, combining all information obtained from the structural and mutagenesis experiments presented in this work.
Figure 5.
EMSA with the consensus W-box binding site and wild-type AtWRKY1-C as well as mutant proteins. (A) Binding activity of proteins with mutations on β2. Each binding reaction contained 5 μg of purified protein and 3 μg of DNA duplex. The gel was stained with ethidium bromide (EB) after electrophoresis. The same amounts of purified proteins were run on SDS–PAGE and stained with Coomassie Brilliant Blue to show equal loading. (B) Binding activity of proteins with mutations on β3 and β4.
Figure 6.
CD spectra. (A) CD spectra of wild-type AtWRKY1-C and the mutant proteins on the β2 strand. The different proteins are distinguished from each other by color. (B) CD spectra of wild-type AtWRKY1-C and the mutant proteins on the β3 and β4 strands.
As shown in Figure 5, the mutations K314A, Y315R, G316F, R331A, R327A and R327E completely abolished DNA binding, while the CD spectra in Figure 6 show that these mutants are well-structured proteins. The loss of DNA binding must therefore be caused by the point mutations on a correctly folded structural scaffold. The DNA-binding activity of the mutants R313E, K314R, Y315F and Q317K was reduced, whereas the mutants Q317A, R331K, K340A and Y330F did not show any decrease in DNA-binding ability, indicating that these mutations did not affect the DNA binding of AtWRKY1-C. These results are in good agreement with previous work (29) and with conclusions drawn from the NMR structure (31).
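The EMSA outcomes reported above are easier to compare when grouped by residue. The following Python sketch is only an illustrative tabulation, not part of the original work: the three outcome labels and the mutant lists are taken from the text, while the names `EMSA_RESULTS`, `parse_mutation` and `outcomes_by_residue` are hypothetical.

```python
import re
from collections import defaultdict

# Qualitative EMSA outcomes as reported in the text (Figure 5).
EMSA_RESULTS = {
    "abolished": ["K314A", "Y315R", "G316F", "R327A", "R327E", "R331A"],
    "reduced": ["R313E", "K314R", "Y315F", "Q317K"],
    "unchanged": ["Q317A", "Y330F", "R331K", "K340A"],
}


def parse_mutation(name):
    """Split a name such as 'R313E' into (wild type, position, substitution)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", name)
    if match is None:
        raise ValueError(f"not a single-residue mutation: {name!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def outcomes_by_residue(results):
    """Map (position, wild-type residue) -> {substitution: outcome}."""
    table = defaultdict(dict)
    for outcome, names in results.items():
        for name in names:
            wild_type, position, substitution = parse_mutation(name)
            table[(position, wild_type)][substitution] = outcome
    # Sort by residue number so the table follows the sequence order.
    return dict(sorted(table.items()))


if __name__ == "__main__":
    for (position, wild_type), subs in outcomes_by_residue(EMSA_RESULTS).items():
        summary = ", ".join(f"{wild_type}{position}{s}: {o}" for s, o in subs.items())
        print(summary)
```

Grouping the data this way makes it visible that the outcome depends on the substitution as well as the position: for example, K314A abolishes binding while the conservative K314R only reduces it, and R331A abolishes binding while R331K leaves it unchanged.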
In the crystal structure as well as in the NMR structure, a kink is formed in the middle of β2 around Gly316. This kink was thought to help Gly316 protrude deeply into the DNA major groove and take part in specific binding. The assumption was confirmed by the mutation G316F, which lost almost all DNA-binding ability. The complete loss of DNA binding in the mutants R327E, R327A and R331A is of great interest, since this is the first time that these two positively charged residues on the β3 strand have been shown by site-directed mutagenesis to be crucial for DNA binding. Furthermore, the arrangement of these two arginine residues, which are four positions apart (j and j + 4), is consistent with Tateno's model of DNA binding at the major groove with a pair of β strands (53).
In summary, all the mutation results further confirm that strands β2 and β3 participate in DNA binding. They also suggest that residues Lys314, Gly316 and Lys318 on β2, and Arg327 and Arg331 on β3, are the key residues for specific DNA recognition, and that substitution of any one of them results in the loss of specific binding. In addition, these residues are distributed as i, i + 2, i + 4 on one strand and j, j + 4 on the other. This distribution is in agreement with Tateno's model (53), which proposed that proteins binding DNA through two β strands should do so in the DNA major groove in a convex manner. Based on such a model, it has been found that normally six consecutive base pairs are involved in specific recognition, and that large residues (not only charged ones), such as lysines at i and i + 4 and arginines at j and j + 4, could bridge the large gap separating the DNA.
The structural model of the DNA–AtWRKY1-C complex built in this study is based on the GCM–DNA structure (44), which differs in detail from Tateno's DNA-binding model (53). The β-strands of the GCM protein adopt a previously undescribed binding mode to interact with DNA (44).
Despite the absence of sequence and topological similarity, the five-stranded β-sheet of AtWRKY1-C overlaps with the corresponding strands of GCM with unexpectedly high accuracy (Figure 7A), except for the β2–β3 region including the flexible loop, which is expected to undergo conformational changes upon DNA binding. We thus refined our DNA–AtWRKY1-C complex model according to the mutagenesis results, and the resulting model is quite similar to the GCM–DNA complex in topology. A schematic description of this refined model of the interactions between AtWRKY1-C and DNA is shown in Figure 7B. This figure summarizes all the residues that might interact with the W-box DNA, providing structural explanations for the function of the WRKY superfamily of transcription factors.
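The residue spacing discussed above (i, i + 2, i + 4 on β2 and j, j + 4 on β3) follows directly from the residue numbers given in the text. The short Python sketch below is only an illustration of that arithmetic: it assumes, as stated in the text, that the WRKYGQK motif corresponds to residues 312–318, and the names `motif_numbering` and `offsets_from_first` are hypothetical.

```python
MOTIF = "WRKYGQK"
MOTIF_START = 312  # from the text: the motif corresponds to residues 312-318

# Key residues for specific DNA recognition, as listed in the text.
BETA2_KEY = [314, 316, 318]  # Lys314, Gly316, Lys318
BETA3_KEY = [327, 331]       # Arg327, Arg331


def motif_numbering(motif=MOTIF, start=MOTIF_START):
    """Map each residue number in the motif to its one-letter code."""
    return {start + k: residue for k, residue in enumerate(motif)}


def offsets_from_first(positions):
    """Return each position relative to the lowest-numbered one."""
    ordered = sorted(positions)
    return [p - ordered[0] for p in ordered]


if __name__ == "__main__":
    numbering = motif_numbering()
    print([f"{numbering[p]}{p}" for p in BETA2_KEY])
    print("beta2 offsets:", offsets_from_first(BETA2_KEY))
    print("beta3 offsets:", offsets_from_first(BETA3_KEY))
```

Because side chains along a β strand generally point to alternating faces of the sheet, residues at i, i + 2 and i + 4 lie on the same face, which is compatible with all three β2 residues facing the DNA in the model.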
ACCESSION NUMBER
The Protein Data Bank accession number for the AtWRKY1-C crystal structure discussed in this paper is PDB ID 2AYD.