The DNA-binding mode of archaeal feast/famine-regulatory proteins (FFRPs), i.e. paralogs of the Escherichia coli leucine-responsive regulatory protein (Lrp), was studied. Using the method of systematic evolution of ligands by exponential enrichment (SELEX), optimal DNA duplexes for interacting with TvFL3, FL10, FL11 and Ss-LrpB were identified as TACGA[AAT/ATT]TCGTA, GTTCGA[AAT/ATT]TCGAAC, CCGAAA[AAT/ATT]TTTCGG and TTGCAA[AAT/ATT]TTGCAA, respectively, all fitting into the form abcdeWWWēd̄c̄b̄ā. Here W is A or T, and e.g. a and ā are bases complementary to each other. Apparent equilibrium binding constants of the FFRPs and various DNA duplexes were determined, thereby confirming the DNA-binding specificities of the FFRPs. It is likely that these FFRPs recognize DNA in essentially the same way, since their DNA-binding specificities were all explained by the same pattern of chemical interactions relating amino-acid positions to base positions. As predicted from this relationship, when Gly36 of TvFL3 was replaced by Thr, the b base in the optimal DNA duplex changed from A to T, and, when Thr36 of FL10 was replaced by Ser, the b base changed from T to G/A. DNA-binding characteristics of other archaeal FFRPs, Ptr1, Ptr2, Ss-Lrp and LysM, are also consistent with the relationship.
Homologs of the Escherichia coli leucine-responsive regulatory protein (Lrp) are distributed throughout archaea and eubacteria, composing a large family of transcription factors (1–4). Sensing the presence of rich nutrition, in many cases through a high concentration of leucine, E. coli regulates ∼100 transcription units by Lrp, thereby changing its overall metabolism (5–7). To summarize this global regulation, Calvo and Matthews (5) used the term feast/famine regulation. For this reason, Lrp homologs can be referred to as feast/famine regulatory proteins (FFRPs) (1,3,4). In fact, sensing the lysine concentration, the archaeon Pyrococcus sp. OT3 regulates ∼200 transcription units using the FFRP FL11, thereby shifting its metabolism between the feast and famine modes (8). Lrp interacts not only with leucine, but also with alanine, isoleucine and valine (9). Similarly, various archaeal FFRPs interact with amino acids, some with multiple types (10,11). It is believed that the last common ancestor of extant organisms first differentiated into archaea and eubacteria. It is likely that this ancestor regulated its overall metabolism using an FFRP, sensing the nutritional condition by the concentration of an amino acid.
We determined the crystal structure of the FL11 dimer in complex with the DNA duplex TGAAA[AAT/ATT]TTTCA (8). To our knowledge this is the only FFRP–DNA complex structure determined so far. Although DNA had been added while crystallizing Lrp, the structure of the DNA was not determined (12). In the FL11–DNA crystal complex, six amino-acid residues of each FL11 monomer bound to 5 bp, TGAAA/TTTCA, at each end of the DNA [summarized in Figure 2 of (8)]. This DNA recognition mode is unusual, since the six residues are not positioned on one side of an α-helix or a β-sheet, but four of them cluster in a loop between α-helices (Figure 1).
DNA binding by other archaeal FFRPs has been characterized, i.e. LrpA from Pyrococcus furiosus (13,14), Ptr1 and Ptr2 from Methanocaldococcus jannaschii (15), and LysM (16), Ss-Lrp (17) and Ss-LrpB (18,19) from Sulfolobus solfataricus. In these characterizations only small numbers of binding sites were analyzed. Many of them were based on the assumption that FFRPs would regulate the genes encoding themselves, but this can be misleading. Consequently it was not clear whether or not the crystal complex represented a common DNA recognition mode shared by archaeal FFRPs (4,8).
Figure 2.
DNA fragments used in six cycles of selection with TvFL3 (A), FL10 (B), FL11 (C) and Ss-LrpB (D). A set of six mixtures of DNA fragments designed to have NNNNNNNWWWNNNNNNN insertions, 1.93 pmol each in 12 µl of 42 mM Na–phosphate buffer (pH 7.0) containing 125 mM NaCl and 6.7% (w/v) sucrose, was applied to a 12% polyacrylamide gel in the presence (+) and absence (−) of an FFRP, and electrophoresed. After electrophoresis the gel was stained with ethidium bromide. In order to show a general increase in the fraction of fragments interacting with each FFRP over successive cycles, when an FFRP was present (+), the molar ratio of DNA fragments to the FFRP dimer was kept at 1:2. However, in the real selection, the molar ratio was decreased from 1:1 (first–third cycles), through 1:0.5 (fourth) and 1:0.3 (fifth), to 1:0.2 (sixth) (see ‘Materials and Methods’ section). Also, in the real selection the DNA concentration was higher, 10 pmol in 12 µl. Arrows indicate bands of fragments bound by the FFRP dimers.
Figure 1.
Amino-acid sequences of the DNA-binding domains of FFRPs. Secondary structural elements, i.e. α-helices 1–3 (α1–3) and β-strand 1 (β1), were identified using the crystal structures of FL11 (8), LrpA (24) and Lrp (12). Six residues of FL11 bound to bases in the crystal complex with DNA (8) and residues at the identical positions in the other FFRPs are indicated by bold characters. In the crystal complex Asp6 and Asp9 interacted with Arg41, thereby directing Arg41 to a DNA phosphate (8). The three residues conserved among the FFRPs are italicized and underlined. The sequence identities of the DNA-binding domains of the archaeal FFRPs, FL11-Ptr2, are 31.3–59.6%, and those between the DNA-binding domain of Lrp and the archaeal domains are 30.8–41.9%. The numbering scheme shown here is used to describe amino-acid positions of FFRPs in the text.
In order to obtain a global insight into DNA recognition by archaeal FFRPs, in this study DNA duplexes optimal for interacting with four archaeal FFRPs and three variants of them, in which amino-acid residues were replaced, have been identified by selecting ∼100 sites tightly interacting with each protein, using the method of systematic evolution of ligands by exponential enrichment (SELEX) (20). By analyzing the nucleotide sequences of these DNA duplexes in the light of the amino-acid sequences of the FFRPs, a common contact pattern relating amino-acid positions to base positions has been deduced.
Archaeal FFRPs used in this study are TvFL3 (GenBank ID 13541958) from Thermoplasma volcanium, FL10 (14591370) and FL11 (14591302) from P. OT3, and Ss-LrpB (15898915) from S. solfataricus (Figure 1). These archaea are thermophiles, having optimal growth temperatures of 60°C (T. volcanium), 98°C (P. OT3) and 80°C (S. solfataricus) (21–23). Before this study the DNA-binding specificity of TvFL3 was unknown. Ss-LrpB binds to three sites in the ss-lrpB promoter, whose consensus is TTGYAWWWWWTRCAA, where Y is T or C, R is A or G and W is A or T (18). Interaction of Ss-LrpB with TTGCAA[AAT/ATT]TTGCAA and its variants, in which 1 or 2 bp were replaced by other bases, has been studied (19). FL10 is an ortholog of LrpA from P. furiosus, whose crystal structure was determined in the absence of DNA (24). LrpA represses transcription from the lrpA promoter (13,14), and the consensus of two LrpA-binding sites in the lrpA promoter is GTCGA[AGA/TCT]TCGAC (25). Binding of FL11 to various promoters was characterized, and the consensus of these binding sites is TTGAAA[AAT/ATT]TTTCAA (8). Nevertheless, it has turned out that not all of these sequences are optimal for interacting with the FFRPs.
MATERIALS AND METHODS
Protein purification
The tvfl3 gene was cloned into the pET15b vector and introduced into the E. coli strain BL21(DE3) (Novagen) in order to synthesize TvFL3 with a His-tag added to its N-terminus. To cells growing at 37°C at an OD600 of 0.5–0.8, isopropyl-1-thio-β-D-galactopyranoside (IPTG) was added to 1 mM, and the culture was continued for an additional 4 h. Cells were collected by centrifugation at 8000 × g for 10 min at 4°C, suspended in 200 ml of 50 mM NaH2PO4 containing 300 mM NaCl and 20 mM imidazole, and sonicated. After centrifugation at 48400 × g for 30 min at 4°C, the supernatant was incubated at 60°C for 30 min. After another centrifugation at 48400 × g for 20 min at 4°C, the supernatant was filtered through a membrane of pore size 0.45 µm, and subjected to affinity chromatography using Ni–NTA agarose (QIAGEN) and a linear, 0.02–1 M, gradient of imidazole in 50 mM NaH2PO4 containing 300 mM NaCl. Proteins eluting at 40–100 mM imidazole were subjected to gel filtration using Sephacryl S-300 (GE Healthcare) equilibrated with 50 mM NaH2PO4 containing 300 mM NaCl.
The tvfl3 gene was modified using the method of Higuchi et al. (26). A pair of DNA duplexes was amplified by PCR (27), which covered the gene from different ends to the position to be modified. Of the two primers used for PCR of each duplex, the one corresponding to the position to be modified was designed so that GGA coding Gly36 was replaced by GCA coding Ala or ACA coding Thr in the duplex amplified. In this paper amino-acid positions are described using the positions of corresponding residues in FL11 (Figure 1). The two duplexes were fused by PCR using the two of the four primers that correspond to the two ends of the gene. TvFL3[G36A] and TvFL3[G36T] were synthesized and purified in the same way as the original TvFL3 except that gel filtration using Sephacryl S-300 was not performed.
The fl10 gene was cloned into the pET21a vector and introduced into the E. coli strain Rosetta 2(DE3) (Novagen). After induction using IPTG, cells were suspended in 200 ml of 100 mM Na–phosphate buffer (pH 7.0) containing 150 mM NaCl and 5% (v/v) glycerol, and sonicated. After centrifugation at 48400 × g for 30 min at 4°C, the supernatant was incubated at 75°C for 30 min. After another centrifugation at 48400 × g for 20 min at 4°C, the supernatant was dialyzed against 50 mM Na–phosphate buffer (pH 7.0), and subjected to anion-exchange chromatography using Q-sepharose (GE Healthcare) and a linear, 0–1 M, gradient of NaCl in the same buffer. Proteins eluting at 400–650 mM NaCl were subjected to gel filtration using Sephacryl S-300 equilibrated with 50 mM Na–phosphate buffer (pH 7.0) containing 150 mM NaCl. The fl10 gene was modified using the method of Higuchi et al. (26) so that ACA coding Thr36 was replaced by TCA coding Ser. FL10[T36S] was synthesized and purified in the same way as the original FL10.
FL11 was synthesized and purified as previously described (11).
The ss-lrpB gene was cloned into the pET28a vector and introduced into the E. coli strain Rosetta 2(DE3) in order to synthesize Ss-LrpB with a His-tag added to its N-terminus. After induction using IPTG, cells were suspended in 120 ml of 50 mM K–phosphate buffer (pH 7.0) containing 300 mM NaCl and 20 mM imidazole, and sonicated. After centrifugation at 11900 × g for 20 min at 4°C, the supernatant was subjected to affinity chromatography using Ni–NTA superflow (QIAGEN) and a linear, 0.2–1 M, gradient of imidazole in 50 mM K–phosphate buffer (pH 7.0) containing 300 mM NaCl. Proteins eluting at 250–550 mM imidazole were concentrated, and applied to gel filtration using Sephacryl S-300 equilibrated with 50 mM K–phosphate buffer (pH 7.0) containing 300 mM NaCl.
Selection of DNA fragments tightly interacting with FFRPs
A mixture of single-stranded DNA fragments of 79 bases in the form 5′-GAAATTAATACGACTCACTATGGGGAGAGAGANNNNNNNWWWNNNNNNNAGAGAGTCGCTAGTTATTGCTCAGCGGTGG-3′ was synthesized, where NNNNNNNWWWNNNNNNN was inserted, with N being A, T, G or C, and W being A or T. Using PCR (27) with the primers CCACCGCTGAGCAATAACTAGCGACTCTCT and GAAATTAATACGACTCACTATGGGGAGAGAGA, DNA duplexes were synthesized and amplified.
The DNA duplexes, 10.0 pmol, were mixed with an FFRP at room temperature in 12 µl of 42 mM Na-phosphate buffer (pH 7.0) containing 125 mM NaCl and 6.7% (w/v) sucrose. Altogether 1.07 × 10⁹ different sequences fit into the form NNNNNNNWWWNNNNNNN, and with 10.0 pmol on average 5626 molecules were expected for each sequence. The quantities of the FFRP dimer used were 10.0 pmol in the first to third cycles, 5.0 pmol in the fourth cycle, 3.0 pmol in the fifth cycle and 2.0 pmol in the sixth cycle, yielding DNA to FFRP dimer ratios of 1:1, 1:0.5, 1:0.3 and 1:0.2, respectively. After being kept for 10 min at room temperature, the DNA–FFRP solution was submitted to electrophoresis using a 12% polyacrylamide gel and 90 mM Tris–borate buffer (pH 8.3) containing 1 mM ethylenediaminetetraacetic acid (EDTA). The gel was stained with ethidium bromide, and the part containing the band of duplexes bound by the FFRP dimer was cut out. When the band was not visually identified, the part corresponding to markers of 600–800 bp was cut out. By overnight incubation of gel pieces at 37°C, DNA fragments were eluted into 800 µl of solution A, i.e. 500 mM ammonium acetate containing 10 mM Mg(CH3COO)2, 1 mM EDTA and 0.1% sodium dodecyl sulfate (SDS). After proteins were removed using phenol–chloroform, DNA fragments were precipitated with ethanol and dissolved in 20 µl of water.
Using ExTaq polymerase (TaKaRa), the DNA fragments were amplified by two steps of PCR. At the first step, 15 cycles of PCR were carried out using 10 µl of the DNA solution mixed into 50 µl of reaction mixture. After electrophoresis using a 12% polyacrylamide gel and 40 mM Tris–acetate buffer (pH 8.1) containing 1 mM EDTA, DNA fragments were eluted into 400 µl of solution A by overnight incubation at 37°C. After proteins were removed using phenol–chloroform, DNA fragments were precipitated with ethanol and dissolved in 20 µl of water. At the second step, 25 cycles of PCR were carried out using 8 µl of the DNA solution mixed into 800 µl of reaction mixture, divided into 16 tubes of 50 µl each. After electrophoresis using a 12% polyacrylamide gel and 40 mM Tris–acetate buffer (pH 8.1) containing 1 mM EDTA, DNA fragments were eluted into 800 µl of solution A by overnight incubation at 37°C. After proteins were removed using phenol–chloroform, DNA fragments were precipitated with ethanol and dissolved in 20 µl of 50 mM Na-phosphate buffer (pH 7.0) containing 150 mM NaCl. The DNA solution was used in the subsequent cycle of selection. After the sixth cycle, DNA fragments identified as interacting with the FFRP were cloned into the HincII site of the pUC118 vector using Mighty Cloning Kit (TaKaRa), and sequenced using PRISM377 (ABI).
In order to confirm the identities of the central 3 bp in optimal DNA duplexes for interacting with the FFRPs, mixtures of single-stranded DNA fragments, in which GCTACGANNNTCGTACG, CGTTCGANNNTCGAACC and CTTGCAANNNTTGCAAC were inserted between GAAATTAATACGACTCACTATGGGGAGAGAGA and TGTGTGTCGCTAGTTATTGCTCAGCGGTGG, were synthesized for selection using TvFL3, FL10 and Ss-LrpB, respectively. Duplexes of each type were synthesized and amplified by PCR, and mixed with the FFRP at room temperature. In experiments using TvFL3 and FL10 the DNA to FFRP dimer ratio was kept at 1:0.05 (first cycle) or 1:0.033 (second to fourth cycles). In another experiment using Ss-LrpB the ratio was kept at 1:0.3 (first) or 1:0.2 (second to fourth). DNA duplexes identified as interacting with each FFRP were amplified by 30 cycles of PCR, with each cycle carried out in the same way as that in the second step of PCR in the other procedure, and used in the subsequent cycle of selection. After the fourth cycle, DNA fragments interacting with each FFRP were cloned into the pUC118 vector and sequenced.
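The library-size figures quoted for the NNNNNNNWWWNNNNNNN pool can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes, as one reading that reproduces the stated 1.07 × 10⁹, that a sequence and its reverse complement are counted as the same duplex; the text does not state this explicitly. The function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def library_complexity(n_random: int = 14, n_weak: int = 3) -> int:
    """Count distinct duplexes of the form N7-WWW-N7.

    Assumption (not stated in the text): a strand and its reverse
    complement describe the same duplex. A 17-mer whose central base is
    A or T can never equal its own reverse complement, so the number of
    duplexes is exactly half the number of single-strand sequences.
    """
    single_strand_sequences = 4 ** n_random * 2 ** n_weak
    return single_strand_sequences // 2


def copies_per_sequence(pmol: float, complexity: int) -> float:
    """Average number of molecules per distinct duplex in the pool."""
    return pmol * 1e-12 * AVOGADRO / complexity


complexity = library_complexity()
copies = copies_per_sequence(10.0, complexity)
print(f"{complexity:.3g} distinct duplexes")
print(f"{copies:.0f} copies of each duplex on average")
```

With the exact Avogadro constant this gives roughly 5.6 × 10³ copies per duplex. The 5626 quoted in the text follows from the rounded inputs 6.02 × 10¹² molecules and 1.07 × 10⁹ sequences.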
Sequence analysis of DNA fragments tightly interacting with FFRPs
The nucleotide sequences of DNA fragments, identified as tightly interacting with each FFRP, were analyzed. Using a sliding window, each fragment was scanned from the last GA of GAGAGAGA through the insertion to the first AG of AGAGAGTC against each reference sequence, thereby searching for the highest number of matches. Reference sequences having the highest average number of matches with insertions were identified from all 13-bp sequences, and also from the 1024 sequences in the form abcdeWWWēd̄c̄b̄ā. Here, e.g. a and ā are bases complementary to each other, and differences in combining the 3 W bases were ignored.
For each insertion, the 13-bp site best matching the consensus abcdeWWWēd̄c̄b̄ā sequence was identified as a binding site of the FFRP. Bases corresponding to ēd̄c̄b̄ā at binding sites were converted to the complementary bases, and jointly analyzed with bases originally corresponding to abcde, for calculating the frequencies of A, T, G and C at these positions. Base frequencies at −1 and −2 upstream of abcde as well as at +1 downstream of abcde were also calculated. The bases whose frequencies were 45% or higher at these positions were identified as optimal for interacting with the FFRP. In combination with the most frequent base combination between abcde and ēd̄c̄b̄ā, an optimal DNA duplex for interacting with the FFRP was identified.
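The scanning and counting steps described above can be sketched as follows. This is a minimal illustration, not the authors' program: it searches only the 1024 palindromic references (not all 13-bp sequences), ignores the −2, −1 and +1 positions, and runs on three hypothetical 21-base regions invented for the example. All function names are assumptions.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def palindromic_references():
    """Yield all 4**5 = 1024 references: abcde + WWW + complement of abcde."""
    for left in product("ATGC", repeat=5):
        half = "".join(left)
        yield half + "WWW" + reverse_complement(half)


def count_matches(reference, window):
    """Count matching positions; W in the reference accepts A or T."""
    return sum(
        1 for r, w in zip(reference, window)
        if r == w or (r == "W" and w in "AT")
    )


def best_window(reference, fragment):
    """Slide the reference along the fragment; return (score, start)."""
    best_score, best_start = -1, 0
    for start in range(len(fragment) - len(reference) + 1):
        score = count_matches(reference, fragment[start:start + len(reference)])
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


def best_reference(fragments):
    """Reference with the highest average best-window score."""
    def mean_score(reference):
        return sum(best_window(reference, f)[0] for f in fragments) / len(fragments)
    return max(palindromic_references(), key=mean_score)


def half_site_frequencies(sites):
    """Base frequencies at a-e, pooling each left half with the
    reverse complement of the matching right half."""
    counts = [Counter() for _ in range(5)]
    for site in sites:
        for half in (site[:5], reverse_complement(site[8:])):
            for position, base in enumerate(half):
                counts[position][base] += 1
    total = 2 * len(sites)
    return [{base: c[base] / total for base in "ATGC"} for c in counts]


# Hypothetical scanned regions: GA + 17-bp insertion + AG (21 bases each).
fragments = [
    "GAGCTACGAAATTCGTACGAG",
    "GAATTACGAATTTCGTAATAG",
    "GACCTACGATTTTCGTTGGAG",
]
reference = best_reference(fragments)
sites = []
for fragment in fragments:
    _, start = best_window(reference, fragment)
    sites.append(fragment[start:start + 13])
frequencies = half_site_frequencies(sites)
# Bases at or above the 45% threshold used in the text.
optimal = [[b for b, f in pos.items() if f >= 0.45] for pos in frequencies]
print(reference, optimal)
```

Pooling the two half-sites doubles the number of observations per position, which is why each site contributes two counts to every frequency.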
Determination of apparent equilibrium FFRP–DNA binding constants
Seven DNA duplexes of 79 bp each were synthesized. In each duplex, one strand carried one of the following inserts between GAAATTAATACGACTCACTATGGGGAGAGAGA at the 5′-end and TCGCTAGTTATTGCTCAGCGGTGG at the 3′-end: CGTTCGA[AAT]TCGAACCTGTGTG (referred to as the TTCGA duplex), CGTACGA[AAT]TCGTACCTGTGTG (the TACGA duplex), CGGTCGA[AAT]TCGACCCTGTGTG (the GTCGA duplex), CGGACGA[AAT]TCGTCCCTGTGTG (the GACGA duplex), CGCCGAAA[ATT]TTTCGGAGAGAG (the CGAAA duplex), CGCTGAAA[ATT]TTTCAGAGAGAG (the TGAAA duplex) or CTTGCAA[AAT]TTGCAACTGTGTG (the TGCAA duplex). One of the duplexes and an FFRP were mixed into 12 µl of 42 mM Na-phosphate buffer (pH 7.0) containing 125 mM NaCl and 6.7% (w/v) sucrose. While the concentration of the duplex was fixed at 2.25 × 10⁻⁸ M, that of the FFRP dimer was changed in six to seven steps from 0 M to a value between 1.8 × 10⁻⁷ M and 14.4 × 10⁻⁷ M, thereby creating a series. After being kept for 10 min at 22°C, the series was submitted to electrophoresis, using a 12% polyacrylamide gel and 90 mM Tris–borate buffer (pH 8.3) containing 1 mM EDTA. To make a calibration curve, nine standards of 12 µl each, containing the same DNA duplex at concentrations from 0.25 × 10⁻⁸ M to 2.25 × 10⁻⁸ M in steps of 0.25 × 10⁻⁸ M, were applied to each gel.
After electrophoresis each gel was stained with a DNA-binding fluorescent dye, SYBR® Gold (Invitrogen), following the procedure recommended by the manufacturer. Using an image analyzer, PharosFX™ (Bio-Rad), exciting at 488 nm and filtering at 530 nm, the integrated intensity of the DNA band unbound by the FFRP in each lane was quantified according to the calibration curve. Because the bound band smeared, its intensity in each lane was not measured directly but was taken as the difference between the unbound bands in the absence and presence of the FFRP [see (19) for similar identification]. Taking Y as the intensity of the bound band, and X as the total concentration of the FFRP, the curve $Y = aX^{b}/(c + X^{b})$ was fitted non-linearly, using the KaleidaGraph software version 4.0 (HULINKS). Using the curve, the total concentration of the FFRP at which the bound band intensity became 50% of the intensity of the unbound band in the absence of the FFRP, i.e. that of the total DNA, was obtained; this is here referred to as (FFRP50). The apparent equilibrium binding constant (KB) was calculated using the equation $K_B = 1/[(\mathrm{FFRP}_{50}) - 0.5(\text{total DNA})]$.
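The expression for KB is consistent with simple 1:1 binding of one FFRP dimer per duplex (an assumption made here for illustration): at half saturation the bound and free DNA concentrations both equal 0.5 × (total DNA), the free FFRP concentration is (FFRP50) − 0.5 × (total DNA), and the ratio [complex]/([free FFRP][free DNA]) reduces to the formula given. The sketch below shows how (FFRP50) and KB could be computed once the curve parameters a, b and c have been fitted. The parameter values are hypothetical and chosen only so that the result lands near the 3.50 × 10⁷ M⁻¹ reported for Ss-LrpB; they are not fitted values from the study.

```python
def bound_curve(x, a, b, c):
    """Empirical saturation curve Y = a * X**b / (c + X**b) from the text."""
    return a * x ** b / (c + x ** b)


def ffrp50(a, b, c, total_dna):
    """Total FFRP concentration at which Y equals half of the total DNA."""
    half = 0.5 * total_dna
    if a <= half:
        raise ValueError("fitted plateau never reaches 50% of the total DNA")
    # Solve a * X**b / (c + X**b) = half for X.
    return (half * c / (a - half)) ** (1.0 / b)


def apparent_kb(ffrp50_value, total_dna):
    """K_B = 1 / (FFRP50 - 0.5 * total DNA), as defined in the text."""
    return 1.0 / (ffrp50_value - 0.5 * total_dna)


TOTAL_DNA = 2.25e-8  # M, duplex concentration used in the binding assays

# Hypothetical fit parameters (not from the paper): plateau equal to the
# total DNA, exponent 1, and c chosen so that K_B comes out near 3.5e7.
a, b, c = TOTAL_DNA, 1.0, 1.0 / 3.5e7 + 0.5 * TOTAL_DNA

x50 = ffrp50(a, b, c, TOTAL_DNA)
kb = apparent_kb(x50, TOTAL_DNA)
print(f"FFRP50 = {x50:.3e} M, apparent K_B = {kb:.3e} per M")
```

Subtracting 0.5 × (total DNA) matters here because the DNA concentration (2.25 × 10⁻⁸ M) is of the same order as 1/KB, so the total and free FFRP concentrations differ noticeably at half saturation.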
RESULTS
Optimal DNA duplexes for interacting with TvFL3, FL10, FL11 and Ss-LrpB
After six cycles of selection from a pool of DNA duplexes that had insertions in the form NNNNNNNWWWNNNNNNN (Figure 2), 53–84 fragments tightly interacting with each of TvFL3, FL10, FL11 and Ss-LrpB were sequenced (Table 1). Of these, 96.2–100% had 17 bp between GAGAGAGA and AGAGAGTC, with WWW combinations positioned at the centers. The remaining 0–3.8% had insertions of 16 or 18 bp, or of 17 bp with combinations of 2 Ws and 1 S, i.e. G or C, at the centers. These deviations from the original design were probably created by replication errors during repeated PCR.
Table 1.
Sequences best matching insertions in fragments tightly interacting with FFRPs, identified from all 13 bp sequences or 1024 sequences in the abcdeWWWēd̄c̄b̄ā form, where W is A or T, and, e.g. a and ā are bases complementary to each other

| FFRP | No. fragments | Best 13 bp sequence | Score (a) | Best abcdeWWWēd̄c̄b̄ā | Score (a) |
|---|---|---|---|---|---|
| TvFL3 | 84 | TACGA[AAT/ATT]TCGTA | 10.65 | TACGAWWWTCGTA | 11.05 |
| TvFL3[G36A] | 73 | GTTCGA[AAT]TCGT/ACGA[ATT]TCGAAC | 10.70 | TACGAWWWTCGTA | 10.70 |
| TvFL3[G36T] | 80 | GTTCGA[AAT]TCGT/ACGA[ATT]TCGAAC | 10.36 | TTCGAWWWTCGAA | 10.10 |
| FL10 | 79 | CGA[AAT]TCGAACC/GGTTCGA[ATT]TCG | 10.65 | TTCGAWWWTCGAA | 10.98 |
| FL10[T36S] | 126 | GTGCGA[ATA]TCGA/TGCA[TAT]TCGCAC | 10.55 | TGCGAWWWTCGCA | 10.71 |
| FL11 | 73 | CGAAA[AAT/ATT]TTTCG | 11.48 | CGAAAWWWTTTCG | 11.73 |
| Ss-LrpB | 53 | TGCAA[AAT/ATT]TTGCA | 9.25 | TGCAAWWWTTGCA | 9.72 |

(a) An average of the highest number of matches found between the sequence and DNA fragments in regions from the last GA of GAGAGAGA through insertions to the first AG of AGAGAGTC.
From all 13-bp sequences, those best matching sets of insertions were identified as TACGA[AAT/ATT]TCGTA for TvFL3, CGA[AAT]TCGAACC/GGTTCGA[ATT]TCG for FL10, CGAAA[AAT/ATT]TTTCG for FL11 and TGCAA[AAT/ATT]TTGCA for Ss-LrpB (Table 1). The sequences identified for TvFL3, FL11 and Ss-LrpB fitted into the form abcdeWWWēd̄c̄b̄ā, where, e.g. a and ā are bases complementary to each other. That identified for FL10 was closely related to this form.
From the 1024 abcdeWWWēd̄c̄b̄ā sequences, those best matching insertions were identified as TACGAWWWTCGTA for TvFL3, TTCGAWWWTCGAA for FL10, CGAAAWWWTTTCG for FL11, and TGCAAWWWTTGCA for Ss-LrpB (Table 1). In each insertion the 13 bp best matching the identified abcdeWWWēd̄c̄b̄ā sequence constituted a binding site of the FFRP. The frequencies of the bases most common at a–e in each set of binding sites were 52.7–100% (Figure 3). At the −1 position immediately upstream of abcde the frequencies of G at FL10-binding sites, C at FL11-binding sites, and T at Ss-LrpB-binding sites were higher than 45% (Figure 3).
Figure 3.
Frequencies of the four bases at positions of sites identified as tightly interacting with FFRPs. Statistics of 84 sites best matching TACGAWWWTCGTA identified using DNA fragments having NNNNNNNWWWNNNNNNN insertions (A), those of 73 sites best matching TACGAWWWTCGTA (B), those of 80 sites best matching TTCGAWWWTCGAA (C), those of 79 sites best matching TTCGAWWWTCGAA (D), those of 126 sites best matching TGCGAWWWTCGCA (E), those of 73 sites best matching CGAAAWWWTTTCG (F), and those of 53 sites best matching TGCAAWWWTTGCA (G). The frequency 45% is indicated by horizontal lines.
The most frequent combination between abcde and ēd̄c̄b̄ā was always AAT/ATT (33.3–56.2%), followed by AAA/TTT (14.0–27.4%) (Table 2, WWW columns). At GCTACGANNNTCGTACG sites identified as tightly interacting with TvFL3, CGTTCGANNNTCGAACC sites identified as tightly interacting with FL10, and CTTGCAANNNTTGCAAC sites identified as tightly interacting with Ss-LrpB, the frequencies of AAT/ATT at the centers were even higher, 50.0–88.0% (Table 2, NNN columns).
Table 2.
Frequencies of base combinations between abcde and ēd̄c̄b̄ā at binding sites

| Combination | TvFL3 WWW (a) | TvFL3 NNN (b) | FL10 WWW (a) | FL10 NNN (b) | FL11 WWW (a) | Ss-LrpB WWW (a) | Ss-LrpB NNN (b) |
|---|---|---|---|---|---|---|---|
| No. of binding sites | 84 | 50 | 79 | 44 | 73 | 53 | 36 |
| AAT/ATT | 33.3 | 88.0 | 34.2 | 68.2 | 56.2 | 36.8 | 50.0 |
| AAA/TTT | 17.9 | 6.0 | 13.9 | 15.9 | 27.4 | 22.6 | 27.8 |
| ATA/TAT | 9.5 | 0 | 13.9 | 2.3 | 0 | 13.2 | 8.3 |
| TAA/TTA | 2.4 | 0 | 5.1 | 6.8 | 0 | 1.9 | 2.8 |
| 2W1S (c) | 36.9 | 6.0 | 32.9 | 6.8 | 16.4 | 23.6 | 8.3 |
| 1W2S (d) | 0 | 0 | 0 | 0 | 0 | 1.9 | 2.8 |
| 3S (e) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Values are given in percentages unless otherwise mentioned.
(a) Statistics of binding sites identified using DNA fragments having insertions in the NNNNNNNWWWNNNNNNN form.
(b) Statistics of binding sites in the form GCTACGANNNTCGTACG (TvFL3), CGTTCGANNNTCGAACC (FL10) or CTTGCAANNNTTGCAAC (Ss-LrpB).
(c) The combinations where 2 Ws and 1 S are combined along each strand.
(d) The combinations where 1 W and 2 Ss are combined along each strand.
(e) The combinations where 3 Ss are combined along each strand.
When selection was carried out using FL11 and DNA fragments having insertions in the NNNNNNNNNNNNNNNNN form (Yokoyama, K. et al., unpublished results), the frequencies of C at −1, C at a, G at b, A at c, A at d, and A at e at binding sites were 70.0–100%, and the most frequent combination between abcde and ēd̄c̄b̄ā was AAT/ATT (77.8%), followed by AAA/TTT (18.9%).
In this way optimal DNA duplexes for interacting with TvFL3, FL10, FL11 and Ss-LrpB were identified as TACGA[AAT/ATT]TCGTA, GTTCGA[AAT/ATT]TCGAAC, CCGAAA[AAT/ATT]TTTCGG and TTGCAA[AAT/ATT]TTGCAA, respectively.
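That the four optimal duplexes fit the palindromic form can be verified mechanically: the flank on the right of the central 3 bp must be the reverse complement of the flank on the left, and the two central alternatives AAT and ATT must be reverse complements of each other. The short check below is an illustrative sketch; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fits_palindromic_form(left: str, right: str) -> bool:
    """True if the right flank is the reverse complement of the left flank."""
    return right == reverse_complement(left)


# Flanks of the optimal duplexes reported in the text, on either side
# of the central [AAT/ATT].
OPTIMAL_FLANKS = {
    "TvFL3": ("TACGA", "TCGTA"),
    "FL10": ("GTTCGA", "TCGAAC"),
    "FL11": ("CCGAAA", "TTTCGG"),
    "Ss-LrpB": ("TTGCAA", "TTGCAA"),
}

results = {
    name: fits_palindromic_form(left, right)
    for name, (left, right) in OPTIMAL_FLANKS.items()
}
centre_is_symmetric = reverse_complement("AAT") == "ATT"
print(results, centre_is_symmetric)
```

Because AAT on one strand reads ATT on the other, the notation [AAT/ATT] describes a single duplex viewed from either strand.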
Interaction between FFRPs and 79-bp DNA duplexes
The apparent equilibrium binding constant (KB) of Ss-LrpB and a 79-bp duplex having TTGCAA[AAT/ATT]TTGCAA, i.e. the TGCAA duplex, was determined to be 3.50 × 10⁷ M⁻¹ (Figure 4A and B, Table 3). This value is similar to the 1.1–3.0 × 10⁷ M⁻¹ reported by another group for the apparent KB of Ss-LrpB and 45–150-bp duplexes having TTGCAA[AAT/ATT]TTGCAA (19).
Figure 4.
Interaction between FFRPs and 79-bp DNA duplexes. (A) A gel stained with a DNA-binding fluorescent dye, SYBR® Gold, after electrophoresis of the TGCAA duplex having TTGCAA[AAT/ATT]TTGCAA in the presence and absence of Ss-LrpB. On top the concentrations of Ss-LrpB are indicated. Bands of the duplex bound or unbound by Ss-LrpB are pointed to with arrows. (B) The binding profile of Ss-LrpB to the TGCAA duplex, obtained by analyzing the gel in A. (C) The binding profiles of FL10 to the TTCGA duplex having GTTCGA[AAT/ATT]TCGAAC, the TACGA duplex having GTACGA[AAT/ATT]TCGTAC, the GTCGA duplex having GGTCGA[AAT/ATT]TCGACC and the GACGA duplex having GGACGA[AAT/ATT]TCGTCC. (D) The binding profiles of TvFL3 to the TACGA, GACGA, TTCGA and GTCGA duplexes. (E) A gel stained with SYBR® Gold, after electrophoresis of the CGAAA duplex having CCGAAA[AAT/ATT]TTTCGG in the presence and absence of FL11. On top the concentrations of FL11 are indicated. Bands of the duplex bound or unbound by FL11 are pointed to with arrows. (F) The binding profile of FL11 to the CGAAA duplex, obtained by analyzing the gel in E, and that of FL11 to the TGAAA duplex having CTGAAA[AAT/ATT]TTTCAG, obtained by analyzing another gel.
Table 3.
Apparent and relative KB constants of FFRPs and 79-bp DNA duplexes

| FFRP | DNA (b) | Apparent KB (M⁻¹) | Relative KB (a) |
|---|---|---|---|
| TvFL3 | TACGA | 1.28 × 10⁷ | 1 |
| TvFL3 | GACGA | 0.90 × 10⁷ | 0.70 |
| TvFL3 | TTCGA | 0.82 × 10⁷ | 0.64 |
| TvFL3 | GTCGA | 0.73 × 10⁷ | 0.57 |
| FL10 | TTCGA | 1.04 × 10⁷ | 1 |
| FL10 | TACGA | 0.66 × 10⁷ | 0.63 |
| FL10 | GTCGA | 0.33 × 10⁷ | 0.32 |
| FL10 | GACGA | 0.18 × 10⁷ | 0.17 |
| FL11 | CGAAA | 3.45 × 10⁷ | 1 |
| FL11 | TGAAA | 1.54 × 10⁷ | 0.45 |
| Ss-LrpB | TGCAA | 3.50 × 10⁷ | 1 |

(a) The value relative to the highest apparent KB obtained with each FFRP.
(b) The 79-bp duplex that has GTACGA[AAT/ATT]TCGTAC (TACGA), GGACGA[AAT/ATT]TCGTCC (GACGA), GTTCGA[AAT/ATT]TCGAAC (TTCGA), GGTCGA[AAT/ATT]TCGACC (GTCGA), CCGAAA[AAT/ATT]TTTCGG (CGAAA), CTGAAA[AAT/ATT]TTTCAG (TGAAA) or TTGCAA[AAT/ATT]TTGCAA (TGCAA).
(F) The binding profile of FL11 to the CGAAA duplex, obtained by analyzing the gel in E, and that of FL11 to the TGAAA duplex having CTGAAA[AAT/ATT]TTTCAG, obtained by analyzing another gel.Apparent and relative KB constants of FFRPs and 79-bp DNA duplexesaThe value relative to the highest apparent KB obtained with each FFRP.bThe 79-bp duplex that has GTACGA[AAT/ATT]TCGTAC (TACGA), GGACGA[AAT/ATT]TCGTCC (GACGA), GTTCGA[AAT/ATT]TCGAAC (TTCGA), GGTCGA[AAT/ATT]TCGACC (GTCGA), CCGAAA[AAT/ATT]TTTCGG (CGAAA), CTGAAA[AAT/ATT]TTTCAG (TGAAA) or TTGCAA[AAT/ATT]TTGCAA (TGCAA).Among 79-bp duplexes used, FL10 showed the tightest interaction with the TTCGA duplex having GTTCGA[AAT/ATT]TCGAAC (apparent KB of 1.04 × 107 M−1), which was followed by the TACGA duplex having GTACGA[AAT/ATT]TCGTAC (apparent KB of 0.66 × 107 M−1), the GTCGA duplex having GGTCGA[AAT/ATT]TCGACC (apparent KB of 0.33 × 107 M−1), and the GACGA duplex having GGACGA[AAT/ATT]TCGTCC (apparent KB of 0.18 × 107 M−1), in this order (Figure 4C, Table 3). This observation is consistent with the binding specificity of FL10, identified using SELEX procedures. Also it suggests that the presence of T at position a is more important than that of T at position b for interaction with FL10, which is consistent with a higher frequency of T at a in the statistics (Figure 3D).TvFL3 showed the tightest interaction with the TACGA duplex (apparent KB of 1.28 × 107 M−1) (Figure 4D, Table 3). Binding to the GACGA duplex (apparent KB of 0.90 × 107 M−1) or the TTCGA duplex (apparent KB of 0.82 × 107 M−1) was weaker, and that to the GTCGA duplex was the weakest (apparent KB of 0.73 × 107 M−1). Although apparent KB of FL10 and the four duplexes varied between 100% and 17% of the highest value, those of TvFL3 varied between 100% and 57% of the highest value (Table 3), showing smaller differences.Apparent KB of FL11 and the CGAAA duplex having CCGAAA[AAT/ATT]TTTCGG was 3.45 × 107 M−1 (Figure 4E and F, Table 3). 
That of FL11 and the TGAAA duplex having CTGAAA[AAT/ATT]TTTCAG was 1.54 × 107 M−1, i.e. 45% of the value obtained with the CGAAA duplex (Figure 4F, Table 3).
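The relative KB column of Table 3 can be reproduced from the apparent KB values alone. The following is a minimal sketch in Python, not the authors' analysis code; the dictionary layout and the function name `relative_kb` are assumptions made for illustration, while the numbers are copied from Table 3.

```python
# Apparent KB values (M^-1) copied from Table 3, grouped by FFRP.
apparent_kb = {
    "TvFL3": {"TACGA": 1.28e7, "GACGA": 0.90e7, "TTCGA": 0.82e7, "GTCGA": 0.73e7},
    "FL10": {"TTCGA": 1.04e7, "TACGA": 0.66e7, "GTCGA": 0.33e7, "GACGA": 0.18e7},
    "FL11": {"CGAAA": 3.45e7, "TGAAA": 1.54e7},
    "Ss-LrpB": {"TGCAA": 3.50e7},
}


def relative_kb(values):
    """Scale each apparent KB by the highest value obtained with the same FFRP."""
    top = max(values.values())
    return {duplex: kb / top for duplex, kb in values.items()}


# Hypothetical container name: one table of relative values per protein.
relative = {ffrp: relative_kb(values) for ffrp, values in apparent_kb.items()}

for ffrp, table in relative.items():
    for duplex, rel in table.items():
        print(f"{ffrp:8s} {duplex}  {rel:.2f}")
```

Rounded to two decimals, the ratios agree with the tabulated relative values, for example 0.18/1.04 for FL10 and the GACGA duplex, and 1.54/3.45 for FL11 and the TGAAA duplex.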
Optimal DNA duplexes for interacting with TvFL3[G36A], TvFL3[G36T] and FL10[T36S]
SELEX experiments were carried out using variants of TvFL3, where Gly36 was replaced by Ala, i.e. TvFL3[G36A], and Thr, i.e. TvFL3[G36T], respectively, and a variant of FL10, i.e. FL10[T36S]. In this paper amino-acid positions are described using the positions of corresponding residues in FL11 (Figure 1). After six cycles of selection from a pool of DNA duplexes that had insertions in the NNNNNNNWWWNNNNNNN form, 73–126 fragments tightly interacting with each variant were sequenced (Table 1).
For TvFL3[G36A] the abcdeWWW sequence best matching insertions was identified as TACGAWWWTCGTA, which was followed by the second best, GACGAWWWTCGTC, and the third best, TTCGAWWWTCGAA. When statistics were made using the sites best matching TACGAWWWTCGTA, the frequency of A at b, 56.9%, was smaller than that obtained with the original TvFL3, 70.2% (Figure 3B). The frequency of T at b increased to 25.3% from 3.6%, obtained with the original TvFL3. At a the frequency of T decreased to 56.2% from 67.3%, and that of G increased to 30.1% from 24.4%. At −1 the frequency of G increased to 57.5% from 41.7%.
For TvFL3[G36T] the abcdeWWW sequence best matching insertions was identified as TTCGAWWWTCGAA, which was followed by the second best, GTCGAWWWTCGAC, and the third best, TACGAWWWTCGTA. When statistics were made using the sites best matching TTCGAWWWTCGAA, at b, the frequency of T, 45.0%, was higher than that of A, 30.6% (Figure 3C). At a the frequency of T further decreased to 45.6%, and that of G further increased to 41.3%. At −1 the frequency of G further increased to 61.3%.
For FL10[T36S] the abcdeWWW sequence best matching insertions was identified as TGCGAWWWTCGCA, which was followed by the second best, TACGAWWWTCGTA, and the third best, TTCGAWWWTCGAA. When statistics were made using the sites best matching TGCGAWWWTCGCA, at b the frequency of T decreased to 17.4% from 60.8%, obtained with the original FL10, and that of G increased to 42.6% from 20.3% (Figure 3E). Also the frequency of A at b increased to 31.2% from 10.1%. The frequency of T at a decreased to 69.3% from 83.5%, and that of G at d decreased to 83.5% from 98.7%.
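The position-wise statistics quoted above are percentages of each base at a given position among aligned binding sites. The sketch below shows one way such frequencies could be tallied in Python. The four sites are hypothetical placeholders, not the sequenced SELEX fragments, and only the first half-site is counted; whether the original statistics also pooled the complementary half is not stated here and is left as an assumption.

```python
from collections import Counter

# Hypothetical aligned 13-bp sites (half-site, WWW centre, complementary half).
# These are illustrative only and are NOT data from the SELEX experiments.
sites = [
    "TACGAAATTCGTA",
    "TACGAATTTCGTA",
    "TTCGAAATTCGAA",
    "GACGAATTTCGTC",
]

POSITIONS = "abcde"  # positions of the first half-site, in 5'-to-3' order


def base_frequencies(aligned_sites, position):
    """Return the percentage of A, C, G and T at one of the positions a-e."""
    index = POSITIONS.index(position)
    counts = Counter(site[index] for site in aligned_sites)
    total = sum(counts.values())
    return {base: 100.0 * counts.get(base, 0) / total for base in "ACGT"}


freq_b = base_frequencies(sites, "b")
print(freq_b)
```

Comparing such dictionaries between a variant and the original protein gives the kind of shift reported in the text, such as the change in the frequency of T at b.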
DISCUSSION
Recognition of CCGAAA[AAT/ATT]TTTCGG by FL11
Using the SELEX method an optimal DNA duplex for interacting with FL11 was identified as CCGAAA[AAT/ATT]TTTCGG. By contrast, the duplex co-crystallized with FL11 was TGTGAAA[AAT]TTTCACT/AGTGAAA[ATT]TTTCACA, where the central 13 bp correspond to the consensus of four tight binding sites of FL11 in the fl11 and lysine synthesis promoters (8). Compared with the optimal duplex, C is not positioned at −1, and at position a C is replaced by T.
In the crystal complex Thr37 and Ala34 of FL11 formed hydrophobic interactions with the methyl group of T at a, i.e. T(a), on its major groove side (Figure 5A). Similar hydrophobic interactions seem to be possible also with C5H and C6H of C at a (Figure 5B). It is difficult to predict which of C and T will be the optimal partner for a hydrophobic residue, since it depends on small differences in the binding geometry (28,29). In fact, as will be discussed in the next subsection, it is likely that Thr37 of Ss-LrpB and Thr37 of TvFL3 both bind to T(a). Binding of FL11 at the four sites in the fl11 and lysine synthesis promoters is essential for transcriptional regulation of the units (8,11). At a of the four sites, T is most frequent (50.0%), and only 1 C is found (12.5%). With the A/T content of the P. OT3 genome being 58.1% (30), in this genome the number of TG steps, 180702, is larger than that of CG steps, 92316, and the CCGAAAWWWTTTCGG sequence is not present. The consensus sequence of biologically functioning FL11-binding sites can be different from the tightest binding sequence in order to produce binding constants appropriate for regulation.
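The genome-level comparison above rests on two simple operations: counting dinucleotide steps and searching for a motif in which W stands for A or T. A minimal Python sketch of both follows. The sequence `toy` is a hypothetical stand-in for a genome, and counting on a single strand is an assumption; the counting convention behind the reported numbers is not specified in the text.

```python
import re


def count_step(sequence, step):
    """Count overlapping occurrences of a dinucleotide step on the given strand."""
    return sum(1 for i in range(len(sequence) - 1) if sequence[i:i + 2] == step)


# W (A or T) is expanded to the character class [AT].
OPTIMAL_FL11_SITE = re.compile(r"CCGAAA[AT]{3}TTTCGG")


def has_optimal_site(sequence):
    """Report whether CCGAAAWWWTTTCGG occurs anywhere in the sequence."""
    return OPTIMAL_FL11_SITE.search(sequence) is not None


# Hypothetical toy sequence, used only to exercise the two functions.
toy = "ATGCGTGAAAATTTTTCACGTG"
print(count_step(toy, "TG"), count_step(toy, "CG"), has_optimal_site(toy))
```

Because the motif is self-complementary once both AAT and ATT centres are allowed, scanning one strand with the W wildcard is sufficient for the presence test.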
Figure 5.
Chemical interactions formed (A and C) or predicted to be formed (B and D–F) between amino-acid side-chains of FFRPs and DNA bases on their major groove sides. A and B represent hydrophobic interactions. In C–F arrows indicate the donor to acceptor directions of hydrogen bonds. In the crystal complex (8) Thr37 of FL11 bound to T(a) (A), and the residue is predicted to bind to C(a) in the optimal duplex for interacting with FL11 (B). The pair of NH2 groups or an NH2 and the NH of Arg39 can be used for donating two hydrogen bonds to G(), and D shows an example.
In P. OT3 intracellular concentrations of K⁺ and Mg²⁺ are as high as 431 mM and 63 mM, respectively (Kawashima-Ohya, Y. et al., in preparation). Depending on the concentrations of K⁺ and Mg²⁺, KB of the TATA-binding protein (TBP) and the TATA-box DNA changes considerably, although it generally increases with increasing temperature (Kawashima-Ohya, Y. et al., in preparation). At high temperature with high concentrations of K⁺ and Mg²⁺, KB of FL11 and the TGAAA duplex relative to that of FL11 and the CGAAA duplex might be different from that observed at 22°C with 125 mM NaCl (Table 3). During electrophoresis, which is essential for the methods used in this study, it is difficult to maintain high metal concentrations. A different method is needed in order to characterize effects of high metal concentrations on DNA recognition by archaeal FFRPs.
In the optimal duplex for interacting with FL11, position −1 is fixed to C. Also positions −1 of the optimal duplexes for interacting with FL10 and Ss-LrpB are fixed to G and T, respectively. In the crystal complex bases at −1 or the complementary bases did not interact with any amino-acid residue of FL11 (8), and so mechanisms of fixing −1 to these bases remain unknown.
A common contact pattern relating amino-acid and base positions
Optimal DNA duplexes for interacting with TvFL3, FL10, FL11 and Ss-LrpB were identified as TACGA[AAT/ATT]TCGTA, GTTCGA[AAT/ATT]TCGAAC, CCGAAA[AAT/ATT]TTTCGG and TTGCAA[AAT/ATT]TTGCAA, respectively. Although the AAT/ATT combination is conserved at the centers, it is likely that this combination does not directly interact with amino-acid residues in the FFRPs. In the crystal complex 3 A:T basepairs at the center of the duplex did not interact with FL11 (8). Any combinations of A:T basepairs were tolerated there, but when all three were replaced by G:C, interaction with FL11 was severely weakened. An A:T basepair has two hydrogen bonds only, and it is less planar than a G:C basepair, which has three hydrogen bonds. The central A:T basepairs are important for bending the duplex around the FL11 dimer by propeller twisting (8).
It is expected that outside AAT/ATT similarities of, and differences between the nucleotide sequences of the optimal duplexes reflect similarities of, and differences between amino-acid residues forming chemical interactions with DNA bases. Some amino-acid side-chains form specific interactions with DNA bases (29). For example, Arg can donate two hydrogen bonds to the G base on the major groove side (Figure 5D), and Glu can accept a hydrogen bond from the C or A base on the major groove side (Figure 5E).
In the crystal complex (8) six amino-acid residues of each FL11 monomer, i.e. Leu24 in α-helix 2, Ala34-Thr37 in the loop connecting α-helices 2 and 3, and His39 in α-helix 3 (Figure 1), bound to 5 bp, TGAAA/TTTCA, at each end of the duplex in its major groove (Figure 6). Of the six residues, Leu24, Ser36 and Thr37 are also present in Ss-LrpB (Figure 1). In the crystal complex, Thr37 formed a hydrophobic interaction with T(a) (Figure 5A), and Leu24 formed those with T() and T(). Ser36 donated a bifurcated hydrogen bond to G(b) (Figure 5C). All four bases are retained in the optimal duplex for interacting with Ss-LrpB (Figure 6). In the crystal complex T() was bound by His39. While T() is replaced by G in the optimal duplex for interacting with Ss-LrpB, in Ss-LrpB His39 is replaced by Arg, a specific partner of G. Glu35 of FL11 is changed to Ile in Ss-LrpB. In the crystal complex, Glu35 of FL11 formed hydrogen bonds with A(c), A(d), T() and T(), indirectly through water molecules (8). Of the corresponding positions in the optimal duplex, and are occupied by T, the specific partner of Ile (Figure 6).
Figure 6.
Chemical contacts predicted to be formed from residues of FFRPs to bases in optimal DNA duplexes for interacting with the FFRPs in comparison with those formed in the FL11–DNA crystal complex. In the crystal complex Glu35 of FL11 formed hydrogen bonds with A(c), A(d), T() and T(), indirectly through water molecules. Here the bond with T() only is shown, with the water molecule indicated by W. Among bases possibly bound by residues 35 of other FFRPs, bases at only are indicated, since only the positions in the optimal duplexes are always occupied by specific partners of residues 35 in the FFRPs. Amino-acid residues the same as or similar to those of FL11, and bases predicted to be bound by these residues are highlighted by yellow backgrounds. Arg39 of the FFRPs, and G bases at in the optimal duplexes are highlighted by light-blue backgrounds. Contacts relating residues at the same positions with bases at the same positions are indicated by lines in the same colors. Chemical interactions predicted to be formed from residues 24 to the and bases, and those predicted to be formed alternatively to the e bases are differentiated by line types.
In the optimal duplexes for interacting with Ss-LrpB, TvFL3 and FL10, G bases are present at , and Arg residues are found at positions 39 in the FFRPs (Figure 6). While T bases are present at a, positions 37 of the FFRPs are occupied by Thr and Ala (Figure 6). It is likely that these Thr and Ala form hydrophobic interactions with the T bases. TvFL3 and FL10 have Glu35, and in the optimal duplexes C bases are positioned at and c (Figure 6). So among the four base positions indirectly bound by Glu35 of FL11, the positions in the optimal duplexes for interacting with TvFL3, FL10 and also Ss-LrpB are all occupied by partner bases of residues 35 of the FFRPs (Figure 6). Glu35 of FL11 was unable to bind to a C base, because is occupied by T in order to interact with Leu24 and His39. In this way the DNA-binding specificities of the archaeal FFRPs can be well explained by the same pattern of relationship between amino-acid positions and base positions.
In the crystal complex Ala34 bound to T(a) together with Thr37 (8). So it is likely that Pro34 of Ss-LrpB forms a hydrophobic interaction with T(a), and Ser34 of TvFL3 and FL10 donate hydrogen bonds to T(a). It is likely that C() and T() in the optimal duplex for interacting with FL10 are determined by Glu35 and Phe24 of FL10, but C() and T() in that for interacting with TvFL3 are determined by Glu35 and Asn24 of TvFL3 (Figure 6). A hydrophobic residue such as Phe can bind to two T/C bases neighboring along the same strand (29). Asn has a hydrogen bond donor as well as an acceptor, and can simultaneously bind to T and C (29). Alternatively, it can form two hydrogen bonds with A (31), which is positioned at e in the optimal duplex for interacting with TvFL3 (Figure 5F).
When Gly36 of TvFL3 was replaced by Thr, A(b) in the optimal duplex changed to T(b) (Figure 6). When Thr36 of FL10 was replaced by Ser, T(b) in the optimal duplex changed to G(b) or A(b) (Figure 6). It is likely that Ser36 of an FFRP forms a hydrogen bond with G or A at b, as indicated by the binding specificities of FL11, Ss-LrpB and FL10[T36S] (Figure 6). Thr36, by contrast, will form a hydrophobic interaction with T(b), as indicated by the binding specificities of FL10 and TvFL3[G36T].
A(b) in the optimal duplex for interacting with TvFL3 will not interact with the side-chain proton of Gly36 of TvFL3. To explain the presence of A(b) in the duplex, essential physical characteristics such as high bendability, where a pyrimidine–purine step, T(a)A(b), would be required, might better be considered (32). Alternatively, the OH group of Thr37 or Ser34 might form a hydrogen bond with A(b). In the crystal complex Thr37 of FL11 donated a hydrogen bond from the side-chain OH to the DNA–phosphate group between T(a) and G(−1) (8). When another OH of Ser34 is positioned nearby, the two OH groups might interact with each other so that one of them will form a hydrogen bond with A(b). If this happens, although residues 34 and 37 primarily interact with the a base, and residue 36 does so with the b base, more precisely speaking, the a and b bases are recognized by residues 34, 36 and 37 in combination.
By the amino-acid replacements at positions 36 of TvFL3 and FL10 the b bases in the optimal duplexes changed as predicted from the contact pattern (Figure 3). However, the frequencies of these bases at binding sites are not as high as one might expect. The most frequent bases at a did not change, and this fact is consistent with the contact pattern. However, their frequencies decreased by the replacements (Figure 3). These facts appear to be consistent with the idea of combinatorial recognition of the a and b bases by residues 34, 36 and 37.
DNA-binding characteristics of other archaeal FFRPs consistent with the contact pattern
Consensus of a LysM-binding site in the lysWXJK promoter in the genome of S. solfataricus and identical sites in three related genomes was reported as GGTWYKAAWWWSGWACC, where Y is C or T, K is G or T, and S is C or G (16). However, by fitting into a self-complementary form, the consensus can be described as GGTTCGA[AAT/ATT]TCGAACC. Consensus of 16 Ptr1-binding sites is TACGC[AAT/ATT]GCGTA, and that of 13 Ptr2-binding sites is GGACGA[AAA/TTT]TCGTCC (15). The abcde bases in these sequences can be explained using the contact pattern discussed in the preceding subsection and residues 24, 34–37 and 39 of the FFRPs (Figure 1).
The abcde sequence of the consensus LysM-binding site in the self-complementary form is the same as those of the optimal duplexes for interacting with FL10 and TvFL3[G36T] (Table 4). The presence of C(c), G(d) and A(e) can be explained by binding of Tyr24, Glu35 and Arg39 of LysM to bases in the same way as that of Phe24, Glu35 and Arg39 of FL10 predicted (Figure 6). A LysM ortholog from Sulfolobus acidocaldarius has Phe24 instead of Tyr24, and so the two residues are expected to have the same functions. The presence of T(a) and T(b) can be explained by hydrophobic interactions of Ala37 of LysM and T(a), and Ala36 and T(b) (Table 4).
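The rewriting of the reported LysM consensus into a self-complementary form can be checked mechanically: each concrete base must be one of the bases allowed by the degenerate code at the same position. The sketch below does this in Python with the standard IUPAC nucleotide codes. The function name `matches_consensus` and the variable names are assumptions introduced for illustration; the two sequences are those given in the text.

```python
# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "R": "AG", "M": "AC", "H": "ACT", "D": "AGT",
    "B": "CGT", "V": "ACG", "N": "ACGT",
}


def matches_consensus(sequence, consensus):
    """True if every base of the sequence is allowed by the code at that position."""
    if len(sequence) != len(consensus):
        return False
    return all(base in IUPAC[code] for base, code in zip(sequence, consensus))


LYSM_CONSENSUS = "GGTWYKAAWWWSGWACC"  # reported consensus (16)

# Self-complementary description GGTTCGA[AAT/ATT]TCGAACC, one string per centre.
candidates = ["GGTTCGA" + centre + "TCGAACC" for centre in ("AAT", "ATT")]

for candidate in candidates:
    print(candidate, matches_consensus(candidate, LYSM_CONSENSUS))
```

Both centre variants satisfy the reported consensus position by position, which is what allows the consensus to be described in the self-complementary form.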
Table 4.
Bases in consensus FFRP-binding sites and amino-acid residues of the FFRPs
| FFRP | Base a | Base b | Base c | Base d | Base e | Residue 24 | Residue 34 | Residue 35 | Residue 36 | Residue 37 | Residue 39 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LysM | T | T | C | G | A | Tyr | Ser | Glu | Ala | Ala | Arg |
| FL10 | T | T | C | G | A | Phe | Ser | Glu | Thr | Ala | Arg |
| TvFL3[G36T] | T | T | C | G | A | Asn | Ser | Glu | Thr | Thr | Arg |
| Ptr2 | G | A, G [a] | C | G | A | Tyr | Ser | Glu | Ser | Ser | Arg |
| FL11 | C | G | A | A | A | Leu | Ala | Glu | Ser | Thr | His |
| Ss-LrpB | T | G | C | A | A | Leu | Pro | Ile | Ser | Thr | Arg |
| FL10[T36S] | T | G, A [b] | C | G | A | Phe | Ser | Glu | Ser | Ala | Arg |
| Ptr1 | T | A | C | G | C | Phe | Ser | Glu | Gly | Thr | Arg |
| TvFL3 | T | A | C | G | A | Asn | Ser | Glu | Gly | Thr | Arg |
Bases at a and residues 34 and 37 are italicized and underlined. Bases at b and residues 36, except for Gly, are underlined. Bases at c, d and e, and residues 24, 35 and 39 are shown in bold. It is likely that these residues interact with the bases shown in the same expressions, i.e. underlined, italicized and underlined, or bold, or the complementary bases.
[a] G is frequent next to A at b of Ptr2-binding sites (15).
[b] The frequency of G at b of FL10[T36S]-binding sites does not reach 45%, and A is frequent next to G.
Ptr2 has the same residues as LysM at positions 24, 34, 35 and 39 (Table 4). Positions 36 and 37 are occupied by Ala in LysM but Ser in Ptr2. As has been discussed, Ser36 will form a hydrogen bond with G or A at b. At the 13 Ptr2-binding sites the most and next frequent bases at b are A and G, respectively (15). Among proteins listed in Table 4 only Ptr2 has Ser37, and the consensus Ptr2-binding site only has G(a). It is likely that Ser37 of Ptr2 donates a hydrogen bond to G(a).
Ptr1 has the same residues as TvFL3 at five of the six positions (Table 4). Instead of Asn present at position 24 of TvFL3, Ptr1 has Phe. As has been discussed for FL10 and TvFL3, this difference will not differentiate the binding specificity of the FFRPs. The consensus Ptr1-binding site has T(a)A(b)C(c)G(d)C(e) (15), which is almost the same as T(a)A(b)C(c)G(d)A(e) in the optimal duplex for interacting with TvFL3 (Table 4).
Binding of Ss-Lrp to the ss-lrp promoter was characterized by foot-printing and contact-probing experiments (17). Ss-Lrp has Leu24, Ser34, Pro35, Ala36, Thr37 and His39 (Figure 1). Using the contact pattern, an optimal duplex for interacting with Ss-Lrp is predicted to be TTAAAWWWTTTAA, since Thr37 and Ser34 will determine a to T in the same way as those of TvFL3, and Ala36 will determine b to T in the same way as that of LysM, and Leu24 and His39 will determine cde to AAA in the same way as those of FL11 (Table 4). Inside the region tightly protected by Ss-Lrp from DNase I cleavage around a cluster of bases identified as tightly contacted by amino-acid residues (17), TGAAA[ATT]TTTTA/TAAAA[AAT]TTTCA, and, shifted by 1 bp, GAAAA[TTT]TTTAA/TTAAA[AAA]TTTTC are present, which are very close to the site predicted to be optimal. While it is likely that the downstream half of the region was protected by a dimer binding around these sequences, Ss-Lrp forms a tetramer (17), and so the upstream half might have been protected by the other dimer in the same tetramer.
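How close the footprinted sequences are to the predicted site can be made concrete by building the 13-bp pattern from the half-site and counting mismatches. The Python sketch below does so. The helper names are assumptions made for illustration; the half-site TTAAA and the four observed 13-mers are taken from the text, with the bracketed centres written out.

```python
def reverse_complement(sequence):
    """Reverse complement of a DNA string made of A, C, G and T."""
    return sequence.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def duplex_from_half_site(half_site):
    """Build the pattern abcde + WWW + reverse complement of abcde."""
    return half_site + "WWW" + reverse_complement(half_site)


def mismatches(site, pattern):
    """Count positions where the site disagrees with the pattern; W accepts A or T."""
    if len(site) != len(pattern):
        raise ValueError("site and pattern must have the same length")
    return sum(
        1
        for base, code in zip(site, pattern)
        if not (base == code or (code == "W" and base in "AT"))
    )


predicted = duplex_from_half_site("TTAAA")

# Sequences found inside the protected region (17), centres written out.
observed = ["TGAAAATTTTTTA", "TAAAAAATTTTCA", "GAAAATTTTTTAA", "TTAAAAAATTTTC"]

for site in observed:
    print(site, mismatches(site, predicted))
```

A mismatch count is only a crude proxy for binding strength, since the measurements in Table 3 indicate that positions do not contribute equally, but it gives a quick check of the statement that the observed sequences are close to the predicted optimum.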
Factors modifying the contact pattern
Summarizing the discussion, it is likely that various archaeal FFRPs recognize DNA in essentially the same way as FL11 in the crystal complex. However, it is difficult to explain the DNA-binding specificity of E. coli Lrp by exactly the same contact pattern. Using another SELEX procedure, Cui et al. (33) identified an optimal duplex for interacting with Lrp as YAGHAW[AAT/ATT]WTDCTR, where H is a non-G base, and D is a non-C base. We have obtained preliminary results suggesting that an optimal duplex for interacting with the Lrp dimer is CAGCAT[AAT/ATT]ATGCTG (Yokoyama, K. et al., unpublished results). While Lrp has Asn24, Ser34, Pro35, Thr36, Pro37 and Leu39 (Figure 1), A(a) will not interact with the hydrophobic side-chain of Pro37, and G() will not interact with that of Leu39. Pro35 and Pro37 positioned nearby might affect the flexibility of the loop containing residues 34–37, thereby modifying the contact pattern.
Ser36 of FL11 bound to G(b) in the crystal complex (Figure 5C), and it is likely that Arg39 of an FFRP binds to G() (Figure 5D). However, it will be difficult to replace Arg39 by Ser, since the side-chain of Ser will be too short to reach G() from position 39. Similarly, Leu24 or Ile35 of Ss-LrpB, or Phe24 of FL10 will not be replaced by Ala. So it is important to identify sizes of amino-acid side-chains able to relate amino-acid and base positions as in the contact pattern. According to our analysis of the amino-acid sequences of FFRPs (Suzuki, M. et al., unpublished results), positions 34, 36 and 37 are generally occupied by small residues (see sequences in Figure 1). The average volumes of residues at the three respective positions are all close to the volumes of Ser and Thr. By contrast, the average volumes of residues at positions 24 and 39, respectively, are larger and close to the volumes of Glu-Lys. That of residues at position 35 is intermediate and close to the volumes of Asn-Glu. If all six residues are replaced by residues of fundamentally different sizes, in many cases the FFRP will become unable to bind to DNA. However, some FFRPs do have such combinations. Another contact pattern might be needed for characterizing their DNA recognition mode.
FUNDING
Core Research for Evolutional Science and Technology program of the Japan Science and Technology Agency. Funding for open access charge: Core Research for Evolutional Science and Technology program of the Japan Science and Technology Agency.
Conflict of interest statement. None declared.