Individual protein binding sites on DNA can be measured in bits of information. This information is related to the free energy of binding by the second law of thermodynamics, but binding kinetics appear to be inaccessible from sequence information since the relative contributions of the on- and off-rates to the binding constant, and hence the free energy, are unknown. However, the on-rate could be independent of the sequence since a protein is likely to bind once it is near a site. To test this, we used surface plasmon resonance and electromobility shift assays to determine the kinetics for binding of the Fis protein to a range of naturally occurring binding sites. We observed that the logarithm of the off-rate is indeed proportional to the individual information of the binding sites, as predicted. However, the on-rate is also related to the information, but to a lesser degree. We suggest that the on-rate is mostly determined by DNA bending, which in turn is determined by the sequence information. Finally, we observed a break in the binding curve around zero bits of information. The break is expected from information theory because it represents the coding demarcation between specific and nonspecific binding.
Transcription factors bind to a variety of sequences with different affinities (1). The amount of sequence variability within a set of binding sites is limited by physical requirements for binding, as well as the ability for the site to be distinguished from non-sites in the genome (2). A range of affinities allows for a subtle regulation of transcription. In the case of activators, higher affinity sites will presumably be bound longer than lower affinity sites, and have a greater probability of stabilizing the initiation complex, which in turn has a greater probability of transcribing a gene. Therefore, the affinity of the protein for a site is a direct indicator of the degree that that site will affect the gene expression.
Being able to predict binding affinities for different DNA targets is useful in characterizing genetic regulatory pathways. To do this, we use an information theory-based weight matrix to quantify protein binding to individual sequences (3).
Information theory was developed by Claude Shannon in the late 1940s to describe the movement of information in communications (4). When applied to biological systems it has proven to be useful (2,5–7). Based on the frequency of each base at each position in a set of aligned binding sites, we can determine the strength of an individual site in bits of information. This strength is called the individual information, R (rate of individual information transfer, bits per site) for a site (3). Advantages of this approach are discussed in Materials and Methods.
It has been shown that the protein–DNA dissociation constant, KD, varies with DNA sequences, and can be approximated by different weight matrix approaches (8–12). The information in a binding site should be related to the binding energy (13). Binding energy, in turn, is proportional to the logarithm of the ratio of the association (kon) and dissociation (koff) rate constants of binding. Since the on-rate depends on diffusion of the protein to the DNA binding site, we expected that the on-rate would be independent of the binding sequence. This suggests that the information of binding sites (R) should be linearly related to the logarithm of the off-rate. Others have reported differences in binding rate constants as a function of sequence (14–16), but they did not report any relationship between the rate constants and affinity predictions. No one has shown how information theory predictions of individual binding sites are related to binding and dissociation kinetics.
To address this issue, we used surface plasmon resonance (SPR) technology (17–19) and electrophoretic mobility shift assays (EMSA) to measure the binding kinetics for 13 Fis binding sites ranging in predicted site strength, based on our information theory approach. Fis is a pleiotropic homodimeric DNA binding protein involved in site-specific recombination, chromosomal compaction and transcriptional regulation (6,20,21). Because many genomic sites have been experimentally identified, a reliable Fis model could be constructed and verified (6,22), making it a good protein for this analysis.
MATERIALS AND METHODS
Constructing the Fis model
The Fis binding site model was built using the standard Delila programs (23,24) (Figure 1), and was originally presented in (6). Individual information analysis (3,25) of Fis binding sites was computed using a weight matrix from the equation:
$$R(b,l) = 2 - \left(-\log_2 f(b,l) + e(n)\right) \quad \text{(bits per base)} \qquad (1)$$
where f(b,l) is the frequency of each base b at each position l for all positions in an aligned set of binding sites. e(n) is a sample correction value where n is 120, the number of Fis binding sites and their complements that make up our frequency matrix. To determine the strength of a site (R), a DNA sequence is compared to the R(b,l) weight matrix and the information contribution of each base is summed across the site. There are several advantages to our approach. First, our models are composed of only experimentally verified binding sites, and do not require a training set of unproven ‘non-sites’ like many neural-networks or HMMs (26–29). Second, our method has no arbitrary parameters, and the theory predicts that all sites with greater than zero bits of information have a negative ΔG of binding (3). Third, the units of measurement, bits, allow direct comparison between different molecular systems. Fourth, the average R for all binding sites that define an R(b,l) is Rsequence, or the total information content [the area under a sequence logo (2)]. The information content is a measure of the sequence conservation and it is determined by the evolution of the sites in the genome (30).
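The weight-matrix calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the Delila implementation. The function names, the toy alignment, the closed-form approximation used for e(n), and the substitution of 1/(n + 2) for bases never observed at a position are all assumptions made here for the example.

```python
import math

BASES = "ACGT"

def small_sample_correction(n, alphabet_size=4):
    # Assumed approximation: e(n) ~ (s - 1) / (2 ln(2) n) for s symbols.
    return (alphabet_size - 1) / (2.0 * math.log(2) * n)

def weight_matrix(aligned_sites):
    """Return one {base: bits} dict per position of the aligned sites."""
    n = len(aligned_sites)
    width = len(aligned_sites[0])
    e_n = small_sample_correction(n)
    matrix = []
    for l in range(width):
        column = [site[l] for site in aligned_sites]
        weights = {}
        for b in BASES:
            count = column.count(b)
            # Assumption: avoid log2(0) by using 1/(n + 2) for unseen bases.
            f = count / n if count > 0 else 1.0 / (n + 2)
            weights[b] = 2.0 - (-math.log2(f) + e_n)
        matrix.append(weights)
    return matrix

def individual_information(sequence, matrix):
    """Sum the per-base weights across the site (bits per site)."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence and matrix must have the same width")
    return sum(matrix[l][b] for l, b in enumerate(sequence))

# Hypothetical toy alignment, not real Fis sites.
toy_sites = ["GGTTA", "GGTAA", "GCTTA", "GGATA"]
toy_matrix = weight_matrix(toy_sites)
strong = individual_information("GGTTA", toy_matrix)
weak = individual_information("CCAAT", toy_matrix)
```

In this toy case the sequence matching the most frequent base at every position scores highest, and a sequence built from disfavored bases scores below zero bits, which mirrors the consensus and anti-consensus oligos used in this work.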
Figure 1.
Sequence logo for the Fis protein (6). The heights of letters in each stack are proportional to the frequency of each base at that position. The height of the stack is the information content for that position (23). The total conservation, summed for all positions in the range −7 to +7 is Rsequence= 7.18 ± 0.23 bits per site (2), which is also the average of the individual information of all of the sites (3). The sine wave above the logo represents the 10.6 bp helical twist of B-form DNA (24). The positions presumably bound by the D helices from the major groove at ± 7 are marked with squares, the pyrimidine/purine steps that kink the DNA at ± 4 and ± 3 are marked with filled triangles, and the A/T bases that allow bending into the minor groove are marked with open triangles.
We used a Fis model ranging from −7 to +7 throughout this article. This assumes that positions outside this region do not affect binding and it is consistent with known footprinting data (6). The small amount of information observed in positions −10 to −8 and +8 to +10 (Figure 1) may correspond to overlapping adjacent Fis sites (22).
Individual information analysis was done using the program scan and sequence walkers were generated using lister (3,31) (Figure 2).
Figure 2.
Sequences used for analysis. Information analysis for individual sequences was computed (3) and displayed using sequence walkers (31). Those positions that are favored according to the R(b,l) weight matrix (contribute positive information) are represented by bases above the x-axis, whereas those bases that are not favored (contribute negative information) are below the x-axis. The height of each base is its information contribution to the site strength. The sum of all base heights is R for the sequence, and this is given on the right of the sequence walker. These sequences correspond to those in Table 1. The sequences are sorted by their strength in bits and the saturation of a colored rectangle behind each walker is proportional to that strength. As in Figure 1, the sine wave above the walker represents the 10.6 bp helical twist of B-form DNA (24).
Table 1.
Kinetics as determined by SPR
Oligo name       Ri (bits)   Number of experiments   Stability (s^−1)               Reference
anti-con         −30.6       6                       2.21 × 10^−1 ± 4.08 × 10^−3    This work
cin-336          4.9         2                       1.24 × 10^−1 ± 2.48 × 10^−3    (33)
hin-1096         5.4         2                       7.39 × 10^−2 ± 6.07 × 10^−4    (32)
lacP-560         6.6         4                       1.67 × 10^−2 ± 6.49 × 10^−5    (34)
ndhII-188        8.2         7                       7.37 × 10^−3 ± 1.10 × 10^−4    (35)
comp-ndhII-188   8.2         2                       6.24 × 10^−3 ± 4.19 × 10^−5    (35)
fis-333          10.1        1                       3.45 × 10^−3 ± 3.21 × 10^−5    (37)
tgt-1824         10.2        1                       2.62 × 10^−2 ± 1.21 × 10^−4    (36)
hin-180          10.4        2                       7.83 × 10^−3 ± 4.80 × 10^−5    (32)
thrU-87          10.9        1                       4.06 × 10^−3 ± 6.54 × 10^−5    (38)
ndhI-137         12.7        1                       2.90 × 10^−3 ± 5.80 × 10^−5    (35)
mut-con          12.8        6                       8.65 × 10^−4 ± 4.29 × 10^−6    This work
con              14.9        1                       9.40 × 10^−4 ± 1.39 × 10^−5    This work
‘Oligo’ is the name of the synthetic DNA hairpin as defined in this article or the name of the adjacent gene and base coordinate of the site in a GenBank entry (6). The sequences are given in Figure 2 and Supplementary Data Figure 1. R is the individual information for that site. ‘Number of experiments’ is the number of measurements made with each oligo. ‘Stability’ is the apparent koff that we measure using SPR and analyzed with Scrubber. ‘Reference’ is the reference describing the binding of Fis to that sequence.
Oligo construction
Thirteen oligos of varying information content were synthesized to measure binding kinetics. Ten of these contain naturally occurring Fis binding sites, where binding has been experimentally verified (32–38). These sites are presented in Table 1 and Figure 2. We chose these oligos to cover a spectrum of strengths from 4.9 to 12.7 bits, as assessed by our information theory approach. The three remaining oligos do not contain characterized binding sites, but have been engineered by us to test binding at additional site strengths.
The first engineered oligo is the Fis consensus of 5′-ATTGGTTAAATTTTAACCAAT-3′ over the range −10 to +10, containing three extra natural bases on each end (Figure 1), which is presumably the highest strength site (14.9 bits), and it does not occur in the Escherichia coli genome (named con in Table 1, Figure 2). The second oligo is a slight modification of this consensus, where we mutated the T at position +1 to a G to decrease the strength of the site to 12.8 bits (named mut-con in Table 1, Figure 2). The third oligo is the Fis anti-consensus of 5′-CGGCTGACCCCGGGTCAGCCG-3′, which is made up of the least favorable base at each position (named anti-con in Table 1, Figure 2). The kinetics of binding to this sequence are presumably those of nonspecific interactions of Fis with DNA.
All sequences were inserted into the same hairpin construct: 5′-GCTATCGCG-[Sequence]-ACGATCGCGC-GAA-GCGCGATCGT-[Complement of Sequence]-CGCGA-3′, where there is a 5′ 4 bp overhang of GCTA to allow for future modification, and a 3 bp loop of GAA in the center. This construct has been shown to form tight hairpins (6,39). All oligos were synthesized carrying a 5′-biotin tag (Synthegen, LLC) to allow immobilization of the oligos onto NeutrAvidin (NA)-coated sensor chips (B1 chips, Biacore Inc.). To test whether the orientation of a sequence in the hairpin affects binding, we inverted the ndhII-188 sequence in the hairpin to create comp-ndhII-188.
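The hairpin layout given above is easy to assemble and check programmatically. The following Python sketch builds the full oligo for a given site and confirms that the two arms of the stem pair. The function names are hypothetical, and the sketch assumes that "[Complement of Sequence]" means the reverse complement written 5′ to 3′, which is what a self-pairing hairpin requires.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def build_hairpin(site):
    """Assemble the hairpin construct around a binding-site sequence."""
    five_prime = "GCTATCGCG"    # GCTA overhang followed by TCGCG
    stem_left = "ACGATCGCGC"
    loop = "GAA"
    stem_right = "GCGCGATCGT"
    three_prime = "CGCGA"
    # Assumption: the second copy of the site is its reverse complement.
    return (five_prime + site + stem_left + loop
            + stem_right + reverse_complement(site) + three_prime)

def stem_is_paired(hairpin, overhang=4, loop_len=3):
    """Check that the arms on either side of the loop are complementary."""
    body = hairpin[overhang:]
    arm = (len(body) - loop_len) // 2
    left, right = body[:arm], body[arm + loop_len:]
    return right == reverse_complement(left)

consensus = "ATTGGTTAAATTTTAACCAAT"
con_hairpin = build_hairpin(consensus)
```

For the 21 base consensus site this gives a 79 nucleotide oligo whose 36 base arms pair exactly, leaving only the GCTA overhang and the GAA loop unpaired.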
SPR analysis
NeutrAvidin was purchased from Pierce. EDTA, SDS, NaCl and HEPES (pH 7.4) were purchased from Invitrogen. Potassium glutamate was purchased from Sigma-Aldrich. Tris-HCl (pH 7.5) was purchased from Quality Biological, Inc. Binding experiments were performed on Biacore 2000 and Biacore 3000 instruments (Biacore Inc.). NeutrAvidin was diluted to a final concentration of 25 μg/ml in 10 mM sodium acetate, pH 4.5. An immobilization wizard within the Biacore control software was used to immobilize no more than 4000 RU of NA. One RU, or resonance unit, corresponds to a change in the angle of the intensity minimum by 0.0001° as detected by the Biacore. The oligos were diluted to a final concentration of 1 mg/ml in immobilization buffer (10 mM Tris-HCl pH 7.5, 300 mM NaCl, 1 mM EDTA). To prepare double-stranded DNA, the oligos were heated to 95°C for 5 min, snap cooled on ice for 5 min, and incubated at room temperature for 15 min. The sample was then diluted 750-fold in immobilization buffer and injected manually over the surface until between 100 and 150 RUs were captured on the B1 sensor chip.
Purified Fis protein (22) was serially diluted in 1× running buffer (10 mM HEPES pH 7.4, 350 mM potassium glutamate (40), 3.4 mM EDTA, 0.01% BSA) to concentrations ranging from 100 nM for the high affinity oligos to 1000 nM for the low affinity oligos and injected at 25°C at a flow rate of 100 μl/min for 90 s. All oligos reached a stochastic steady state of Fis binding. Dissociation times were typically 90–360 s depending upon the stability of the complex. Disruption of any complex that remained bound after dissociation was achieved using two 50 μl injections of regeneration solution (0.1% SDS, 3.4 mM EDTA) followed by one EXTRACLEAN command, a running buffer wash to eliminate carry-over into the next experiment. At the beginning of each cycle, the needle was pre-dipped in running buffer before an injection of 100 μl running buffer. Similarly, each cycle was ended by an injection of 100 μl running buffer and an EXTRACLEAN command. Typically, every concentration of protein was injected twice from separate vials. In order to subtract any background noise from each data set, all samples were also run over a sensor chip surface of NA without oligo and injections of running buffer were performed for every experiment (‘double referencing’) (41). Data were fit to a single exponential decay model using both Scrubber 1.10 (42) and Biaevaluation 3.1 (Biacore, Inc).
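To illustrate what a single exponential decay fit extracts from the dissociation phase, the Python sketch below recovers a decay constant from response-versus-time data. It is not how Scrubber or Biaevaluation work internally. It uses a simple least-squares line through ln(response) as a stand-in technique, and the function name and the noise-free synthetic trace are assumptions made for the example.

```python
import math

def fit_single_exponential(times, responses):
    """Fit response = r0 * exp(-k * t) by a straight line through ln(response).

    Returns (k, r0). Points with non-positive response are skipped because
    their logarithm is undefined.
    """
    points = [(t, math.log(r)) for t, r in zip(times, responses) if r > 0]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return -slope, math.exp(intercept)

# Synthetic dissociation trace: 120 RU decaying at the lacP-560 stability
# from Table 1 (1.67e-2 per second), sampled every 10 s for 360 s.
true_k = 1.67e-2
times = [10.0 * i for i in range(37)]
responses = [120.0 * math.exp(-true_k * t) for t in times]
fitted_k, fitted_r0 = fit_single_exponential(times, responses)
```

On real sensorgrams a nonlinear fit is generally preferable, since taking logarithms gives too much weight to the noisy tail of the curve.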
Using EMSA, we found that nonspecific binding occurred with the long oligos used in the SPR experiments. Therefore we used hairpin oligos containing a Fis site (−7 to +7) with no additional bases, a loop (5′-GCGAAGC-3′) and the complementary sequence of the Fis site for EMSA. (See Supplementary Data Figure 1 for the sequences used.)
Competition EMSA between conF37, a 5′ 6-FAM labeled oligo 5′-GGTTAAATTTTAACC-GCGAAGC-GGTTAAAATTTAACC-3′ (Integrated DNA Technologies) containing the consensus Fis binding site, and unlabeled oligos containing naturally occurring and mutated Fis binding sites, was used to determine the KD of the sites. When a potassium glutamate-containing buffer was used for EMSA, Fis–DNA complexes smeared on the gel; we therefore used the following buffer. Binding reactions were carried out in 10 μl of solution containing 7.7 mM Bis Tris Propane-HCl, 10 mM NaCl, 0.5% glycerol, 10 mM MgCl2, 1 mM DTT, 800 nM Fis, 40 nM labeled conF37 oligo and 1.0, 1.5 or 2.0 μM competitors for 5 min at room temperature. The reactions were then separated by 2.2% agarose gel electrophoresis in 5 mM sodium borate, pH 8.5 (43), for 20 min at 5 V/cm, and the gel was scanned with an FMBIO II fluorescent scanner (Hitachi) with a 505 nm emission filter (Figure 5). (See Supplementary Data for how the data were analyzed.)
Figure 5.
Competition electrophoretic mobility shift assay with three different concentrations of oligos containing different Fis binding sites. (See Supplementary Data Figure 1 for the sequences.) For each concentration, the top band is Fis bound to the consensus 5′ 6-FAM labeled oligo and the bottom band is unbound labeled oligo (see Materials and Methods). The competitor concentrations shown are approximately: 1.0 μM low, 1.5 μM medium, 2.0 μM high; the exact values for each competitor are given in Supplementary Data. Lanes 1 to 13: competitor oligos 1 to 13; Lane 14: no competitor.
RESULTS AND DISCUSSION
The Fis sequence logo is consistent with models of the Fis/DNA complex (Figure 1) (6,44,22). Sequence conservation at positions ± 7 above 1 bit suggests that Fis binds two major grooves on the same face of the DNA (24). However, the distance between the D helices which bind these two major contacts is less than 10.6 bases, one helical twist of B-form DNA, suggesting that the DNA must bend to enable positions ± 7 to contact the D helices (24). Indeed, Fis bends DNA (44). The relatively low information content of 7.18 ± 0.23 bits over the range ± 7 bases, suggests that Fis is a fairly prolific binder (2, 30). This is consistent with the observed high concentration of Fis in response to nutrient upshifts (as many as 50 000 dimers per cell) (37). Finally, DNA methylation and DNase I hypersensitivity results are consistent with positions of significant sequence conservation (6). The correspondence between the physical and biochemical characterization of Fis binding with the sequence conservation supports the information-theory based Fis binding model.
We chose ten naturally occurring Fis binding sites and three synthetic sites for kinetic analysis. These sites covered a spectrum of strengths and are reported in Figure 2. The terms anti-consensus (anti-con) and consensus (con) refer to the weakest and strongest possible sites based on our model respectively (3).
In order to measure the binding kinetics of these oligos, we used SPR technology. Protein can be flowed over a mat of DNA tethered to a thin gold surface. As the protein associates and dissociates, the change in density on the surface can be monitored, and kon and koff can be determined (17,45). The SPR plots appeared to have one-stage binding, suggesting a simple association–dissociation mechanism (Figure 3).
Figure 3.
Sensorgram of Fis bound to different DNA sequences. All curves were normalized so that saturation of the chip is set to 1. At time zero, Fis was washed onto the SPR chip. At time 90 s, Fis was washed off the chip. The stability measurements reported in Table 1 were determined from the curve after 90 s.
All data obtained for the Fis dimer (22.4 kDa) on the Biacore machine were transport limited (46). That is, the kinetics of binding that are inferred from these experiments are not only a measurement of binding, but also a measure of the delivery of Fis to the chip surface. However, we were able to measure an apparent koff or ‘stability’ which is the rate of dissociation of Fis from the surface. Although this is not the true koff, because of the transport limitation, it is proportional to the true koff since the rate of transport is constant for all measurements. Additionally, surface effects such as nonspecific interactions of Fis with the chip surface could affect the SPR measure so that it does not entirely represent in vivo or in-solution conditions, but as with the rate of transport, such effects should also be constant for all measurements.
The stability kinetics measurement is strongly correlated to the individual information of the sites, with r2 = 0.84 (Figure 4). These values are presented in Table 1. The complexes of Fis with oligos ndhII and comp-ndhII had similar stabilities (7.4 × 10^−3 and 6.2 × 10^−3 s^−1 respectively), suggesting that orientation within the hairpin had little effect on the stability measurement. The dissociation of the protein from the anti-con oligo is faster than the dissociation from the weakest observed natural binder cin-336, 0.22 s^−1 versus 0.12 s^−1. This is presumably related to the energy difference between the weakest possible specific binding and nonspecific binding for Fis. The stability of the protein with the consensus and mutated consensus is very high, 9.4 × 10^−4 and 8.7 × 10^−4 s^−1, respectively.
Figure 4.
Binding site information is correlated to stability. For each sequence described in Figure 2 and Table 1, we plotted the stability versus the information R. Scrubber and Biaevaluation are two implementations of curve fitting by a single exponential decay describing the dissociation. Both were used to evaluate all of the data and slight differences were observed from small deviations in the start and stop points chosen for analysis. We plot each measurement independently. Although the anti-con oligo is presumably nonspecific at −30.6 bits, we plotted it as having 0 bits of information. All points at zero bits are for the anti-con oligo. The regression line (excluding the anti-con) is shown as a red line (r2 = −0.84). 99% confidence limits for the regression are shown with blue lines. The equation for the regression line is log2(Stability) = −0.70 × Individual information −0.84.
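A regression of this kind can be approximated from the averaged values in Table 1. The Python sketch below fits log2(stability) against individual information by ordinary least squares, leaving out the anti-con oligo as in the figure. It is only an approximation of the published analysis: the fit in the figure uses every measurement from both fitting programs as a separate point, so the slope and intercept obtained from the table averages are expected to differ somewhat from the published line. The variable and function names are hypothetical.

```python
import math

# (oligo, Ri in bits, stability in 1/s) copied from Table 1, anti-con excluded.
TABLE_1 = [
    ("cin-336", 4.9, 1.24e-1),
    ("hin-1096", 5.4, 7.39e-2),
    ("lacP-560", 6.6, 1.67e-2),
    ("ndhII-188", 8.2, 7.37e-3),
    ("comp-ndhII-188", 8.2, 6.24e-3),
    ("fis-333", 10.1, 3.45e-3),
    ("tgt-1824", 10.2, 2.62e-2),
    ("hin-180", 10.4, 7.83e-3),
    ("thrU-87", 10.9, 4.06e-3),
    ("ndhI-137", 12.7, 2.90e-3),
    ("mut-con", 12.8, 8.65e-4),
    ("con", 14.9, 9.40e-4),
]

def linear_regression(xs, ys):
    """Ordinary least squares; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)

information = [ri for _, ri, _ in TABLE_1]
log2_stability = [math.log2(k) for _, _, k in TABLE_1]
slope, intercept, r_squared = linear_regression(information, log2_stability)
print(f"log2(stability) = {slope:.2f} * Ri + {intercept:.2f}, r^2 = {r_squared:.2f}")
```

The negative slope means that each additional bit of information lowers the apparent off-rate by a roughly constant factor.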
The logic of our experiment is based on a series of simple relations:
Information is related to energy by a version of the Second Law of thermodynamics (13). The relationship is generally proportional (TDS in preparation) so we expect that the individual information should relate to the binding energy:
$$R \propto -\Delta G \qquad (2)$$
This is supported by experiments in a number of systems (8,9,47).
The binding energy is related to the binding constant:
$$\Delta G \propto \ln K_D \qquad (3)$$
The binding constant is a function of the on and off rates:
$$K_D = k_{off} / k_{on} \qquad (4)$$
Once a protein is at a binding site, it will frequently bind irrespective of how strong the binding is, so the on-rate, kon, should be roughly constant and this is observed in various other genetic systems (14–16, 48).
Combining the above
$$R \propto -\log_2 k_{off} \qquad (5)$$
so the more information a binding site has, the larger the number of contacts it can make with the protein (49) and correspondingly the more difficult it becomes for thermal noise to separate the two once they are bound together. The off-rate is strongly dependent on the detailed binding contacts since all of these have to be broken to release the protein.
Although our Biacore experiments gave the relationship of Equation (5) (Figure 4), they did not give us kon values. To investigate kon, we performed competitive EMSA experiments to determine KDs (Figure 5). The results show a linear relationship between the logarithm of KD and R [Equation (6)], with r2 = −0.73 (Supplementary Data). The experiment was repeated and similar results were obtained (data not shown).
Since koffs and KDs were measured by different techniques, the relative ratios between the sites should be correct but they may differ from the absolute values by an unknown multiplicative factor. On the log scale, this is in the additive constant. Using the KDs measured by EMSA and the koffs measured in the Biacore experiments, we calculated kon according to Equation (4) for each DNA. Unexpectedly, we observed that kon is related to the information.
By using linear regression of the logarithm of kon against R and of the logarithm of koff against R, we found that 49% of the variance of log kon and 78% of the variance of log koff is explained by the variance of R (Supplementary Data). Thus most of the off-rate is explained by the information in the sequence. In addition, a good portion of the on-rate is explained by the sequence, implying that another factor (we suggest sequence bendability) may be involved in the initial binding.
Are the evolved binding targets of Fis the result of the physical properties of DNA? It is possible that the bases that are specifically contacted have been adapted through natural selection to facilitate binding through bending. If this is true, then there should be a correlation between kon and koff. Indeed, kon and koff increase together with a positive correlation, and 85% of the variance of one is explained by the other, suggesting that some of the positions are important for both binding and bending (Supplementary Data Figure 2).
This proposal is consistent with our previous observations on the sequence logo of Fis (22). We found that patterns of bases in the Fis sites can be explained in two distinct ways. In Figure 1, the outer bases at ± 7, mostly G and C, are consistent with direct binding by Fis into the major groove but these contacts are too close to allow the D helices of Fis to fit into the major groove unless the DNA is also bent. Positions ± 4 and ± 3 contain pyrimidines and purines (respectively, on the 5′→3′ strand) which could be contacted directly through the major groove or which could provide a bendable step. Likewise positions −2 to +2 contain A or T which is also consistent with either direct minor groove contacts or with bending into the minor groove. Since the central positions from −4 to +4 do not appear to be contacted in our 3D model (22), binding of Fis may first involve specific contacts followed by bending that perhaps releases those contacts. This implies that the binding rate requires DNA sequence-dependent bending. If so, kon is controlled by the degree of flexibility of the DNA and that, in turn, is controlled by the DNA sequence. However, if Fis makes direct contacts to the central bases while bound (despite our modeling) then DNA sequence should determine the strength of binding, and this is indeed observed. We are led to suggest that both bending and direct contacts are involved in both of the on- and off-stages of Fis binding. Similar experiments relating the information content of binding sites for other proteins that do not bend DNA as strongly as Fis may reveal further insights into the binding process.
The experiments described here suggest that R is mostly dependent upon the logarithm of koff. It has previously been shown that the average R for all sites is Rsequence, the sequence conservation of a set of binding sites (3). Therefore, the results imply that the sequence conservation (the amount of variability among a set of binding sites) for a protein is directly related to the binding kinetics of that protein to its targets. A stronger binding protein that covers the same length of DNA will have a less variable site. Another aspect is that Rsequence evolves to match the information needed to find the sites in the genome, Rfrequency, which is a function of the size of the genome and number of sites (2,30). As a protein evolves to bind a greater number of targets, the average specific binding energy of that protein to its targets would decrease by increased koff.
Our experiment provides preliminary data supporting a distinction between two approaches to understanding the DNA recognition process. In Figure 4, no data points were obtained between the anti-consensus at −30.6 bits and 0 bits; however, the lowest positive Fis site, at 4.9 bits, has a log2(stability) around −3 and the anti-consensus is around −2, so the curve is linear with a negative slope to near zero bits and then presumably is essentially flat from there to −30.6 bits. As shown in Figure 6, a similar result occurs with a plot of binding energy versus information. We suggest that this apparent break at zero bits is a manifestation of the Second Law of Thermodynamics and the channel capacity. That is, the Second Law predicts that sites with positive information should have negative ΔG values and those with negative information should not bind because they have positive ΔG values (13). Shannon's channel capacity theorem predicts threshold effects in coded systems where there is a sharp boundary between recognized and unrecognized signals (50). The break in the curve therefore provides support for a coding interpretation of the binding interaction between Fis and DNA. This is in contrast with thermodynamic theories of binding, which generate a scale starting at the consensus, and which do not predict a specific boundary (8).
Figure 6.
Binding energy is linearly related to binding site information for positive information binding sites but apparently flat for sites with negative information. The curve appears to break near zero bits. The average KD values were normalized so that the Hin-180 sequence has the published value of Hin-D, 2 × 10^−9 M (51). Excluding the anti-consensus at −30.6 bits, the regression line is given in Equation (6) with r2 = −0.73.
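Because the binding constant is the ratio of the off-rate to the on-rate, an on-rate can be obtained from a measured off-rate and dissociation constant as kon = koff/KD. As a rough illustration, the Python sketch below combines the hin-180 stability from Table 1 with the Hin-D KD used for normalization in this figure. Several assumptions apply. The stability is a transport-limited apparent koff, so the result is an apparent kon known only up to a constant factor. The free energy uses the textbook relation ΔG = RT ln KD at 25°C with a 1 M standard state, which is not stated in this article. The function names are hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987e-3  # kcal per mol per K

def on_rate(k_off, k_d):
    """Apparent association rate constant (1/(M s)) from koff (1/s) and KD (M)."""
    return k_off / k_d

def binding_free_energy(k_d, temperature_k=298.15):
    """Assumed textbook relation: dG = R T ln(KD), in kcal/mol, 1 M standard state."""
    return GAS_CONSTANT_KCAL * temperature_k * math.log(k_d)

k_off_hin180 = 7.83e-3  # apparent koff (stability) for hin-180, Table 1
k_d_hin180 = 2e-9       # normalization value for Hin-180, M
k_on_hin180 = on_rate(k_off_hin180, k_d_hin180)
delta_g_hin180 = binding_free_energy(k_d_hin180)
print(f"apparent kon = {k_on_hin180:.3g} per M per s")
print(f"delta G = {delta_g_hin180:.1f} kcal/mol")
```

Since koff and KD come from different techniques, only the relative values of kon across sites are meaningful, not this absolute number.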
The individual information appears to be well correlated to the kinetics of binding. This not only gives greater confidence in our previous information theory based models, but also shows that it is a reliable approach to characterize genetic systems in silico. Furthermore, the relationship between information and energy is subtle (13), and this correlation helps ground the information theory approach in thermodynamics.