Catherine A. Kemme,1 Alexandre Esadze,1 and Junji Iwahara1
1 Department of Biochemistry & Molecular Biology, Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas 77555, United States.
Abstract
Functions of transcription factors require formation of specific complexes at particular sites in cis-regulatory elements of genes. However, chromosomal DNA contains numerous sites that are similar to the target sequences recognized by transcription factors. The influence of such "quasi-specific" sites on functions of the transcription factors is not well understood at present by experimental means. In this work, using fluorescence methods, we have investigated the influence of quasi-specific DNA sites on the efficiency of target location by the zinc finger DNA-binding domain of the inducible transcription factor Egr-1, which recognizes a 9 bp sequence. By stopped-flow assays, we measured the kinetics of Egr-1's association with a target site on 143 bp DNA in the presence of various competitor DNAs, including nonspecific and quasi-specific sites. The presence of quasi-specific sites on competitor DNA significantly decelerated the target association by the Egr-1 protein. The impact of the quasi-specific sites depended strongly on their affinity, their concentration, and the degree of their binding to the protein. To quantitatively describe the kinetic impact of the quasi-specific sites, we derived an analytical form of the apparent kinetic rate constant for the target association and used it for fitting to the experimental data. Our kinetic data with calf thymus DNA as a competitor suggested that there are millions of high-affinity quasi-specific sites for Egr-1 among the 3 billion bp of genomic DNA. This study quantitatively demonstrates that naturally abundant quasi-specific sites on DNA can considerably impede the target search processes of sequence-specific DNA-binding proteins.
Many transcription
factors and
DNA-repair/modifying enzymes perform their function by recognizing
particular sequences or structural signatures as targets in DNA. In
eukaryotes, this must be accomplished in the presence of billions
of base pairs of genomic DNA containing numerous nonspecific sites
that are structurally similar to the targets. While scanning DNA,
these proteins should encounter numerous sites on DNA, which positively
and negatively impact the kinetics of the protein’s target
search. Nonspecific sites near targets can accelerate the target association
process by creating an antenna that directs the protein to its target
through one-dimensional diffusion along DNA (“sliding”).[1−6] In contrast, nonspecific sites far outside the antenna on the same
DNA or sites on different DNA molecules can effectively trap proteins
because sliding or hopping from such sites does not directly lead
to target association.[4,7]
Since the discovery of amazingly rapid association of the Escherichia coli lac repressor with operator DNA in 1970,[8] many studies have focused on the mechanisms that accelerate
target DNA search by proteins. Translocation processes such as sliding,
hopping, and intersegment transfer were proposed as the mechanisms
for efficient target location, initially based on indirect evidence
from various biochemical experiments.[1,7,9] Mainly in the 21st century, these translocation processes
were directly confirmed by biophysical methods such as nuclear magnetic
resonance (NMR) and single-molecule techniques.[10−13] Meanwhile, studies that focus
on factors that decelerate the search process remain rare.[5] Trapping of proteins at nonfunctional sites on
DNA could be prevalent in the nucleus because of extremely high DNA
density (∼100 mg/mL).[14] Even though
80% of the DNA is covered by histones,[15] the concentration of accessible DNA (i.e., linkers) in the nuclei
is estimated to be as high as ∼0.5 mM. Furthermore, genomic
DNA includes many sites that are similar to the target sequence. Such
sites, which we term “quasi-specific” sites, should
exhibit relatively high affinities and therefore potentially trap
the proteins more effectively and hinder their search for targets.[16,17] To date, however, the influence of quasi-specific sites on functions
of sequence-specific DNA-binding proteins remains to be investigated
by experimental means.
We address this problem for the inducible
transcription factor
Egr-1 (also known as Zif268), which recognizes the 9 bp sequences,
GCG(T/G)GGGCG, via three zinc finger domains.[18,19] In the nervous system, Egr-1 functions as a regulator of synaptic
plasticity to promote memory formation.[20,21] In the cardiovascular
system, Egr-1 mediates the formation of scar tissue and intimal thickening
in response to damage caused by cardiovascular injury.[22,23] To activate these responses, Egr-1 must locate its target sequence
and initiate the gene response within a short time, because of its
limited lifetime in the nucleus (half-life of ∼0.5–1
h).[22] In our previous studies using NMR
and stopped-flow fluorescence spectroscopic methods,[24−31] we investigated DNA scanning and recognition by the DNA-binding
domain (DBD) of Egr-1 at molecular and atomic levels. This system
is suited for research on the target search process, especially because
the Egr-1DBD behaves well in various biochemical and biophysical
characterizations.
In this study, using fluorescence methods,
we demonstrate the influence
of various quasi-specific DNA sites on the efficiency of the target search
by Egr-1. Our work presents a kinetic model for analyzing the effect
of quasi-specific sites during the target DNA search process and provides
insight into how much this effect impedes Egr-1’s search process
in the nucleus.
Materials and Methods
Protein and DNA
The protein construct used in this
study was the Egr-1DBD, which consists of three zinc fingers (human Egr-1 residues 335–423). For the sake of simplicity, we will
refer to this construct as Egr-1 hereafter. This protein was expressed
in E. coli strain BL21(DE3) and purified as described
in our previous papers.[25,28,29] All fluorescence experiments used a 143 bp probe DNA duplex containing
an Egr-1 target sequence, GCGTGGGCG, near a 5′-end
to which a fluorescein amidite (FAM) is attached (Figure 1A). The same 143 bp probe DNA
was used in our previous studies.[25,26] This DNA duplex
was generated by polymerase chain reaction (PCR) with a FAM-labeled
primer, an unlabeled reverse primer, and the pUC19 plasmid (New England
BioLabs), and extensively purified with the PCR purification kit (Qiagen),
anion exchange chromatography, and polyacrylamide gel electrophoresis,
as described previously.[25] Four types of
unlabeled 28 bp competitor DNA duplexes were used in these experiments.
One competitor, termed DNA L, is a completely nonspecific duplex (Figure 1B), which was also
used in our previous work.[25,26,29] The other three competitor 28 bp duplexes are derivatives of DNA
L and contain a quasi-specific site with a 5, 6, or 7 bp match with
the 9 bp target sequence GCGTGGGCG (Figure 1C). Each chemically synthesized DNA strand
was purchased from Integrated DNA Technologies and purified via Mono-Q
anion exchange chromatography (GE Healthcare). After complementary
strands had been annealed, the 28 bp DNA duplexes were purified with
a second Mono-Q anion exchange chromatography as described. Calf thymus
DNA was purchased from Invitrogen and sonicated for fragmentation
into an average size of ∼500 bp, which was confirmed by 0.9%
agarose gel electrophoresis in TBE buffer (Invitrogen).
Figure 1
Measurement of relative affinities of quasi-specific DNA duplexes
for the Egr-1 zinc finger protein. (A) FAM-labeled 143 bp DNA duplex
used as the probe DNA. The Egr-1 target site is colored red. The same probe
DNA was used in our previous studies.[25,26] (B) Nonspecific
competitor DNA. This 28 bp duplex, termed DNA L, does not contain any
sites similar to the Egr-1 target. This nonspecific DNA was also used in our
previous studies.[25,26,29] (C) Quasi-specific DNA duplexes LW, LS, and
LT, which contain a sequence similar to the Egr-1 target.
(D) FAM fluorescence emission spectra measured for 2.5 nM probe DNA.
(E) Data from the competition assays. FAM fluorescence was measured
for the solutions of 2.5 nM probe DNA, 30 nM protein, and competitor
DNA at varied concentrations in 10 mM Tris-HCl (pH 7.5), 0.2 μM
ZnCl2, and 150 mM KCl. Fractions of the free probe DNA
were measured from FAM fluorescence as a function of the concentration
of the quasi-specific 28 bp DNA. Solid lines show the best-fit curves
obtained via nonlinear least-squares fitting with eq 3.
Competition Assays for the Specific versus Quasi-Specific and
Nonspecific DNA Duplexes
Relative affinities of quasi-specific
DNA duplexes for the Egr-1 zinc finger protein were measured by fluorescence-based
competition assays with an ISS PC1 spectrofluorometer. Using an excitation
wavelength of 460 nm and an emission wavelength of 521 nm, the FAM
fluorescence was measured for 2 mL solutions of the 143 bp FAM-labeled
DNA (2.5 nM), protein (30 nM), and competitor DNA (0–64 μM)
in a buffer of 10 mM Tris-HCl (pH 7.5), 0.2 μM ZnCl2, and 150 mM KCl. FAM fluorescence was also measured in the absence
of both protein and competitor DNA, which corresponds to the maximal
fluorescence intensity caused by the absence of quenching by macromolecular
interactions. The FAM fluorescence in the presence of 30 nM protein
but in the absence of competitor DNA corresponds to the minimal intensity
because of complete association of the target with the protein under
these conditions. The FAM fluorescence was measured as a function
of concentrations of competitor DNA and was normalized to the intensity
of the free probe with no competitor DNA. A control experiment with
no protein but with competitor DNA was also performed under identical
conditions. The normalized intensities from the control experiment
were subtracted from the intensity data at individual concentrations
of competitor DNA, so that any direct influence of competitor DNA
on FAM fluorescence would be removed. The fraction of the free probe
DNA (pfree) was calculated from these
intensities, assuming that each obtained intensity is the population-weighted
average of the intensities for the free and protein-bound states of
the probe DNA.
When the total concentrations of the probe DNA
(Dtot), protein (Ptot), and competitor DNA (Ctot)
satisfy the relationship Dtot ≪ Ptot ≪ Ctot, the fraction of the probe DNA in the free state (pfree) is given by[30]
$$p_{\mathrm{free}} = \left[1 + \frac{P_{\mathrm{tot}}\,K_{d(\mathrm{comp})}}{K_{d(\mathrm{probe})}\left(K_{d(\mathrm{comp})} + C_{\mathrm{tot}}\right)}\right]^{-1} \tag{1}$$
where Kd(comp) and Kd(probe) are the dissociation constants
for the competitor and probe DNA duplexes, respectively. The observed
fluorescence intensity (Iobs) should be
a function of pfree as follows:
$$I_{\mathrm{obs}} = p_{\mathrm{free}}\,I_{\mathrm{free}} + \left(1 - p_{\mathrm{free}}\right) I_{\mathrm{bound}} \tag{2}$$
where Ifree and Ibound are intrinsic fluorescence intensities
for free and protein-bound probe DNA duplexes, respectively. If Ctot ≫ Kd(comp), eq 2 becomes a simple
expression:
$$I_{\mathrm{obs}} = I_{\mathrm{bound}} + \left(I_{\mathrm{free}} - I_{\mathrm{bound}}\right)\frac{C_{\mathrm{tot}}}{C_{\mathrm{tot}} + \Gamma P_{\mathrm{tot}}} \tag{3}$$
The parameter Γ represents a relative
affinity defined as Kd(comp)/Kd(probe). This equation was used to determine
the relative affinity Γ of the quasi-specific DNA duplexes via
nonlinear least-squares fitting to the experimental Iobs data as a function of Ctot. Note that reaching the asymptote at high concentrations of the
competitor in this titration experiment is not a requisite for determination
of Γ, because the asymptote corresponds to Ifree, the fluorescence intensity of the free state of
the probe DNA, which was directly measured. The fitting calculations
were performed with the MATLAB software.
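The competition analysis described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code. It assumes Γ is taken as Kd(comp)/Kd(probe), the convention under which weaker competitors give Γ > 1, and it replaces a general nonlinear least-squares routine with a simple log-spaced grid scan. All function names and all numerical values (the probe Kd, the intensities, the concentration series) are hypothetical and chosen only for illustration.

```python
import math


def p_free_full(c_tot, p_tot, kd_probe, kd_comp):
    """Fraction of free probe DNA when D_tot << P_tot << C_tot."""
    # The competitor is in large excess, so it sets the free protein level.
    free_protein = p_tot / (1.0 + c_tot / kd_comp)
    return 1.0 / (1.0 + free_protein / kd_probe)


def p_free_simple(c_tot, p_tot, gamma):
    """Limit C_tot >> Kd(comp), with gamma = Kd(comp) / Kd(probe)."""
    return c_tot / (c_tot + gamma * p_tot)


def observed_intensity(pf, i_free, i_bound):
    """Population-weighted average of free and bound probe intensities."""
    return pf * i_free + (1.0 - pf) * i_bound


def fit_gamma(c_values, i_values, p_tot, i_free, i_bound,
              lo=1e-2, hi=1e6, n_grid=4001):
    """Least-squares estimate of gamma by scanning a log-spaced grid."""
    best_gamma, best_sse = None, math.inf
    for k in range(n_grid):
        gamma = lo * (hi / lo) ** (k / (n_grid - 1))
        sse = 0.0
        for c, i in zip(c_values, i_values):
            model = observed_intensity(p_free_simple(c, p_tot, gamma),
                                       i_free, i_bound)
            sse += (model - i) ** 2
        if sse < best_sse:
            best_gamma, best_sse = gamma, sse
    return best_gamma


# Synthetic demonstration; every number here is hypothetical (units: nM).
P_TOT, KD_PROBE, TRUE_GAMMA = 30.0, 0.05, 25.0
I_FREE, I_BOUND = 1.00, 0.93
conc = [500.0, 1000.0, 2000.0, 4000.0, 8000.0, 16000.0, 32000.0, 64000.0]
data = [
    observed_intensity(
        p_free_full(c, P_TOT, KD_PROBE, KD_PROBE * TRUE_GAMMA),
        I_FREE, I_BOUND)
    for c in conc
]
gamma_fit = fit_gamma(conc, data, P_TOT, I_FREE, I_BOUND)
print(f"fitted gamma = {gamma_fit:.1f}")
```

Because the synthetic data are generated with the full expression and fitted with the simplified one, the sketch also shows that the simplification costs little when every competitor concentration is far above Kd(comp).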
Stopped-Flow Fluorescence
Kinetic Assays
The target
search kinetics of Egr-1 was measured at 20 °C using an ISS PC-1
spectrofluorometer equipped with an Applied Photophysics RX.2000 stopped-flow
device. In these experiments, the following two solutions were rapidly
mixed at a 1:1 volume ratio (∼0.5 mL) by the stopped-flow device:
(1) a solution of the Egr-1 zinc finger protein and (2) a DNA solution
of FAM-labeled probe DNA and competitor DNA. Both solutions were in
a buffer of 10 mM Tris-HCl (pH 7.5), 0.2 μM ZnCl2, and 150 mM KCl. Immediately after the flow for mixing had been
stopped, the time course data of the fluorescence intensity were collected
for 4–35 s with a time interval of 20–50 ms. The FAM
fluorophore was excited at 460 nm, and the emission light that passed
through a long-pass filter with a cutoff at 515 nm (Edmund Optics)
was recorded. For the competitor, we used the synthetic 28 bp duplexes
shown in Figure 1 and
the sonicated calf thymus DNA. When the mixtures of synthetic 28 bp
duplexes were used as competitor DNA, the total concentration of
nonspecific and quasi-specific 28 bp duplexes was kept constant at
2 μM, though the concentrations of quasi-specific duplexes were
varied between 0.05 and 0.25 μM. When the sonicated calf thymus
DNA was used as the competitor, the experiment was performed at two
different “base pair” concentrations, 56 and 112 μM
(corresponding to 37 and 74 μg/mL, respectively). Each measurement
was repeated 8–20 times via multiple injections. In all kinetic
measurements, the concentration of the probe DNA (Dtot) was 2.5 nM, whereas the concentrations of the protein
(Ptot) and competitor (Ctot) were varied. To create a pseudo-first-order condition
that simplifies the kinetic analysis,[32] all binding reactions were conducted under conditions of Dtot ≪ Ptot ≪ Ctot.[25,26] The apparent pseudo-first-order rate constant (kapp) for target association was determined from the time
course of fluorescence intensity, I(t), via nonlinear least-squares fitting with
$$I(t) = I_{\infty} + \left(I_{0} - I_{\infty}\right)\exp\left(-k_{\mathrm{app}}\,t\right) \tag{4}$$
where I0 and I∞ represent the intensities
at time zero and infinite time, respectively. Rate constant kapp was measured as a function of protein concentration, and
the protein concentration dependence data were analyzed with the kinetic
model that is described below. MATLAB software was used for nonlinear
least-squares fitting.
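As an illustration of how kapp can be extracted from a monoexponential time course, the following Python sketch uses a linearized shortcut rather than the nonlinear least-squares fit described above: it assumes the plateau intensity is already known and regresses ln|I(t) − I∞| against time, so the slope is −kapp. The function names and the synthetic trace (a 5% intensity change, kapp = 2 s⁻¹, 20 ms sampling over 4 s) are hypothetical.

```python
import math


def mono_exp(t, i0, i_inf, k_app):
    """Monoexponential time course: I(t) = I_inf + (I0 - I_inf) exp(-k_app t)."""
    return i_inf + (i0 - i_inf) * math.exp(-k_app * t)


def estimate_k_app(times, intensities, i_inf):
    """Estimate k_app from the slope of ln|I(t) - I_inf| versus t.

    Assumes i_inf is known (for example from the plateau) and that noise
    is small; a full nonlinear fit would also optimize I0 and I_inf.
    """
    xs, ys = [], []
    for t, y in zip(times, intensities):
        diff = abs(y - i_inf)
        if diff > 0.0:  # skip points that have fully reached the plateau
            xs.append(t)
            ys.append(math.log(diff))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic, noise-free trace (hypothetical values).
times = [0.02 * k for k in range(201)]            # 0 to 4 s in 20 ms steps
trace = [mono_exp(t, 1.00, 0.95, 2.0) for t in times]
k_est = estimate_k_app(times, trace, i_inf=0.95)
print(f"estimated k_app = {k_est:.3f} s^-1")
```

With real, noisy data the late points close to the plateau dominate the error of the logarithm, which is one reason a direct nonlinear fit of the time course is generally preferable.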
Results
Relative Affinities of
Quasi-Specific DNA Duplexes
For quantitative characterizations
of the quasi-specific sites, we
first assessed their relative affinities with respect to the target
site. Our previous studies[25,26,29] on nonspecific interactions between the Egr-1 zinc finger protein
and DNA utilized a completely nonspecific 28 bp duplex, which we term
DNA L (Figure 1B).
This DNA does not contain any sequences similar to the Egr-1 target.
For the investigations of quasi-specific sites, we made three variants
of DNA L, which were named LW, LS, and LT (Figure 1C).
Each contains a quasi-specific sequence involving a 5 bp (LW), 6 bp (LS), or 7 bp (LT) match with the 9
bp target sequence GCGTGGGCG, and the subscripts in the
names of these variants stand for weak, strong, and tight, respectively,
representing their relative affinity for Egr-1.
Using fluorescence-based
competition assays,[30] we investigated affinities
of these quasi-specific DNA duplexes. In these experiments, the Egr-1
zinc finger protein (30 nM) and the FAM-labeled 143 bp probe DNA (2.5
nM) were mixed with competitor DNA, and the FAM fluorescence at equilibrium
was measured as a function of the competitor concentration. A fluorescent
FAM moiety is attached covalently to the 5′-end proximal to
the target site on the probe DNA. The FAM fluorescence is partially
quenched upon Egr-1’s association with the target site (Figure 1D). In the absence
of competitor DNA, the target site on the probe DNA is virtually 100%
bound to the protein because of its high affinity for the target (Kd < 0.1 nM) under the current conditions.[25,26] Addition of high-affinity quasi-specific DNA increased the unbound
target due to transfer of protein from the target to the competitor,
thereby reducing the fluorescence quenching effect (Figure 1D). From the fluorescence intensity
data along with the intensities for the free and protein-bound states,
we obtained the fractions of the free state of the target site on
the probe DNA at individual concentrations of competitor DNA (Figure 1E). Competitor DNA
duplexes at high concentrations outcompeted the target site on the
probe DNA, increasing the fraction of its free state. Using these
data, we determined the relative affinities of these quasi-specific
DNA duplexes with respect to the affinity of the target on the probe
DNA via nonlinear least-squares fitting with eq 3. The first two concentration points were
excluded from the fitting calculations because these concentrations
do not satisfy the inequality Ptot ≪ Ctot, which is required for eq 3. The best-fit curves are shown in Figure 1E. Values of Γ
= Kd(quasi-specific)/Kd(specific) for DNA duplexes LT, LS, and LW were determined to be 5.6
± 0.8, 25 ± 4, and (3.9 ± 1.7) × 10³, respectively.
These results qualitatively indicate that a sequence more similar
to the target sequence exhibits a stronger affinity, which is quite
reasonable. This set of quasi-specific DNA duplexes allowed us to
examine the relationship between the affinity and kinetic impact of
quasi-specific sites, as described below.
Impact of Quasi-Specific
Sites on the Kinetics of Target Search
By stopped-flow fluorescence
assays similar to those described
in our previous studies,[25,26] we investigated the
influence of the quasi-specific DNA on the target search kinetics
of Egr-1. The basic scheme for the kinetic experiment is depicted
in Figure 2A. In these
assays, a protein solution is mixed with a DNA solution containing
the probe DNA (final concentration, 2.5 nM), nonspecific competitor
DNA L, and quasi-specific competitor DNA LW, LS, or LT. The final total concentration of the competitor
DNA duplexes (i.e., nonspecific + quasi-specific) was kept constant
at 2000 nM, whereas the concentration of the quasi-specific competitor
was varied. Immediately after the flow of mixing was stopped, the
reaction time course for the association of the protein to the target
site was recorded by measuring the change in the FAM fluorescence
intensity over time. Some of the time course data are shown in Figure 2B. The percent change
in fluorescence intensity was typically 3–7%, depending on
the fraction of the protein-bound state of the target site on DNA
at equilibrium. The change was relatively small when the target site
on the probe DNA (2.5 nM) was outcompeted by the high-affinity quasi-specific
site of a substantially higher concentration (e.g., see the data with
50 nM DNA LT in Figure 2B). Time courses for the fluorescence intensity were
found to be monoexponential. The pseudo-first-order rate constants
(kapp) were determined from the time course
data at various concentrations of protein and quasi-specific DNA.
Figure 2
Impact
of quasi-specific DNA on the target search kinetics of the
Egr-1 zinc finger protein. (A) Schematic of the stopped-flow fluorescence
assay for investigating the impact of quasi-specific DNA. In this
assay, the change in FAM fluorescence was monitored upon mixing the
solution of the Egr-1 zinc finger protein with the solution containing
the 143 bp FAM-labeled DNA, nonspecific 28 bp DNA, and quasi-specific
28 bp DNA. The concentration of the probe DNA was 2.5 nM. The total
concentration of 28 bp duplexes (nonspecific + quasi-specific) was
2000 nM, and the concentration of the quasi-specific 28 bp DNA was
varied. (B) Examples of the fluorescence time course data and monoexponential
fittings. (C–E) Protein concentration dependence of the apparent
pseudo-first-order rate constant (kapp) for target association in the presence of quasi-specific DNA LW (C), LS (D), or LT (E). Circles show
the kapp constants obtained from monoexponential
fitting to the fluorescence time course data. The solid lines represent
the best-fit curves obtained via nonlinear least-squares fitting
with eqs 7–9. In these calculations, only two parameters, Kd,q and ka0, were
optimized. The buffer conditions for these experiments were 10 mM
Tris-HCl (pH 7.5), 0.2 μM ZnCl2, and 150 mM KCl.
Note that protein concentration dependence of the target search kinetics
becomes biphasic (rather than linear) in the presence of high-affinity
quasi-specific sites.
While the total competitor concentration was kept at 2 μM,
the pseudo-first-order rate constants (kapp) were measured at various concentrations of the protein in the presence
of 80, 150, and 250 nM quasi-specific DNA LW (Figure 2C) or LS (Figure 2D). For
the quasi-specific DNA LT, only a single concentration
of 50 nM was tested (Figure 2E) because the kinetic measurement at a higher concentration
of this duplex was difficult due to the small magnitude of the fluorescence
change. In all cases tested, we found that the presence of the quasi-specific
DNA made the target search kinetics considerably slower. For each
quasi-specific DNA, we measured the rate constants kapp using various concentrations of Egr-1, starting at
low concentrations (10–25 nM) and increasing until we reached
the upper limit of our instrument’s measurable range (∼20
s⁻¹). We found that as we increased the protein
concentration, the rate of association increased, as well. In the
case with only nonspecific DNA L being present as a competitor, the
dependence of kapp on protein concentration
was linear (black in Figure 2C–E), as expected for any second-order process. The
data for the cases in the presence of DNA LW were also
almost linear (Figure 2C). However, we found that the protein concentration dependence of kapp in the presence of DNA LS or
LT was clearly biphasic rather than linear (Figure 2D,E). At protein concentrations below
the concentration of quasi-specific DNA, the rate constant kapp increased
linearly with a shallow slope. However, when the concentration of
the Egr-1 zinc finger protein exceeded that of the quasi-specific
DNA, the slope increased dramatically and proceeded again in a linear
fashion. This tendency was more pronounced for high-affinity quasi-specific
DNA.
Kinetic Model for the Target Search in the Presence of Quasi-Specific
Sites
To quantitatively understand the kinetic influence
of the quasi-specific site, we modified our previous analytical expression
for the target search kinetics in the presence of nonspecific competitor
DNA. Previously, for a system involving protein, probe DNA, and competitor
DNA, we showed that when Dtot ≪ Ptot ≪ Ctot, the apparent second-order rate constant (ka) for target association is related to the intrinsic association
rate constant (kon,n) for each nonspecific
site as follows:[26]
$$k_{a} = \rho\,S\,\eta\,k_{\mathrm{on,n}} \tag{5}$$
Parameter ρ represents
a scaling factor
(0 < ρ < 1) due to the trapping of protein at nonspecific
sites and corresponds to the fraction of protein molecules that are
not trapped by any nonspecific sites during the target search process.
Parameter S represents the so-called antenna effect;[4,26,33] nonspecific sites near the target
on the same DNA serve as an antenna that attracts the protein and
makes the target association S-fold faster. Parameter
η represents an enhancement factor (η > 1) due to intersegment
transfer. On the basis of the discrete stochastic kinetic model of
Veksler and Kolomeisky,[34] we previously
gave explicit forms of parameters η and S as
functions of various kinetic rate constants, equilibrium constants,
and configurational factors.[26] When Dtot ≪ Ptot ≪ Ctot, parameter ρ is
given by
$$\rho = \frac{1}{Z} = \left(1 + \frac{N_{\mathrm{tot}}}{K_{d,\mathrm{n}}}\right)^{-1} \tag{6}$$
where Z corresponds to a
partition function for protein at the pseudoequilibrium during the
target search process, Ntot is the total
concentration of nonspecific sites (on competitor and probe DNA, excluding
those in the antenna region), and Kd,n is the dissociation constant for each nonspecific site.
For
the systems involving quasi-specific sites on competitor DNA, we make
the following two assumptions: (1) Parameters S and
η are virtually unaffected by the presence of quasi-specific
sites on competitor DNA, and (2) interactions of protein with quasi-specific
sites and with nonspecific sites reach steady states well before the
interaction with the target site reaches equilibrium. The first assumption
should be valid in the current case because the quasi-specific sites
are located only on the competitor DNA, not on the probe DNA. The
second assumption is justified when the concentrations of the quasi-specific
and nonspecific sites are far greater than the concentration of the
target site. The pseudoequilibrium for the nonspecific DNA was rigorously
validated using exact numerical simulations for the system with only
nonspecific competitor DNA in our previous work.[25] Under the assumption of the pseudoequilibrium during the
target search process, the trapping effect is represented by the following
parameter ρnq:
$$\rho_{\mathrm{nq}} = \frac{1}{Z_{\mathrm{nq}}} = \left(1 + \frac{N_{\mathrm{tot}}}{K_{d,\mathrm{n}}} + \frac{[\mathrm{Q}]}{K_{d,\mathrm{q}}}\right)^{-1} \tag{7}$$
where Znq represents
a partition function in the form of the binding polynomial[35] for protein at the pseudoequilibrium in the
presence of quasi-specific sites; [Q] is the concentration of the
quasi-specific sites in the free state; and Kd,q is the dissociation constant for each quasi-specific site.
Equation 7, in which ρnq takes the place of ρ in the expression for ka, can qualitatively
explain the biphasic dependence of the apparent pseudo-first-order
rate constants (kapp) on the total protein
concentration (Ptot) as seen in Figure 2C–E. A slope
of protein concentration dependence corresponds to an apparent second-order
rate constant ka. When the protein concentration
is low, a high affinity (i.e., Kd,q ≪ Kd,n) of the quasi-specific site and a large
fraction of its free state can make the [Q]/Kd,q term predominant in partition function Znq, rendering ρnq ≪ ρ. This
corresponds to the first phase of the biphasic dependence, where the
slope is far gentler than one in the absence of the quasi-specific
sites. When the protein concentration is significantly higher than
the total concentration of the quasi-specific site, most quasi-specific
sites are bound to the protein and [Q] can become virtually zero,
making ρnq ≈ ρ. This corresponds to
the second phase of the biphasic dependence, where the slope should
be virtually the same as that in the system involving no quasi-specific
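site.
In the case presented here, the concentration of quasi-specific
sites in the free state in the pseudoequilibrium during the target
search process is given by
$$[\mathrm{Q}] = \frac{1}{2}\left\{-\left(P_{\mathrm{tot}} - Q_{\mathrm{tot}} + Z K_{d,\mathrm{q}}\right) + \sqrt{\left(P_{\mathrm{tot}} - Q_{\mathrm{tot}} + Z K_{d,\mathrm{q}}\right)^{2} + 4 Z K_{d,\mathrm{q}} Q_{\mathrm{tot}}}\right\} \tag{8}$$
where Z = 1 + Ntot/Kd,n (i.e., the same as Z in eq 6) and Qtot is the total concentration of quasi-specific sites.
This expression is derived by solving the equations Kd,n = Ntot[P]/[NP], Kd,q = [Q][P]/[QP], Ptot = [P] + [NP] + [QP], and Qtot = [Q]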
+ [QP], where [P], [NP], and [QP] represent the concentrations of
free protein, nonspecific sites bound to protein, and the quasi-specific
site bound to protein, respectively. The apparent pseudo-first-order
rate constant (kapp) is given by
$$k_{\mathrm{app}} = \frac{\rho_{\mathrm{nq}}}{\rho}\,k_{a0}\,P_{\mathrm{tot}} = \frac{Z}{Z_{\mathrm{nq}}}\,k_{a0}\,P_{\mathrm{tot}} \tag{9}$$
where ka0 corresponds
to the second-order rate constant when no quasi-specific site is involved
in competitor DNA.
For our experimental data in panels C–E
of Figure 2, we conducted
fitting calculations
with eqs 7–9 via optimization of two parameters, ka0 and Kd,q. These calculations
require the experimental value of the dissociation constant (Kd,n) for the affinity of each nonspecific site.
In our previous study,[26] we determined Kd,n to be 16 μM for Egr-1 under the identical
buffer conditions with 150 mM KCl. The best-fit curves are shown together
with the experimental data in the graphs in Figure 2C–E. The fitting gave good agreement
with the experimental data. From these fittings to the kinetic data, Kd,q values of DNA duplexes LT, LS, and LW were calculated to be 0.07 ± 0.05,
1.0 ± 0.3, and 44 ± 7 nM, respectively. With experimental
uncertainties taken into consideration, ratios of these values from
the kinetic data are consistent with the relative affinity data from
the competition assays. These results suggest that our kinetic model
can explain the kinetic influence of quasi-specific sites both qualitatively
and quantitatively.
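The biphasic behavior predicted by this model can be reproduced with a short Python sketch. It assumes that the free quasi-specific site concentration [Q] is the positive root of the quadratic obtained from the four equilibrium and mass-balance relations given above, and that kapp equals ka0·Ptot scaled by Z/Znq. The value Kd,n = 16 μM is taken from the text; the values of Ntot and ka0, and the function names, are hypothetical placeholders rather than fitted results.

```python
import math


def free_quasi_sites(p_tot, q_tot, kd_q, z):
    """Positive root of [Q]^2 + b[Q] - Z*Kd_q*Q_tot = 0, b = P_tot - Q_tot + Z*Kd_q."""
    b = p_tot - q_tot + z * kd_q
    disc = math.sqrt(b * b + 4.0 * z * kd_q * q_tot)
    if b >= 0.0:
        # Algebraically equivalent form that avoids subtractive cancellation.
        return 2.0 * z * kd_q * q_tot / (b + disc)
    return 0.5 * (-b + disc)


def k_app(p_tot, q_tot, kd_q, ka0, n_tot, kd_n):
    """Apparent pseudo-first-order rate constant with quasi-specific traps."""
    z = 1.0 + n_tot / kd_n                       # nonspecific sites only
    q_free = free_quasi_sites(p_tot, q_tot, kd_q, z)
    z_nq = z + q_free / kd_q                     # adds the quasi-specific term
    return ka0 * p_tot * z / z_nq


# Concentrations in nM; KD_N from the text, N_TOT and KA0 hypothetical.
KD_N = 16000.0
N_TOT = 40000.0
KA0 = 0.1          # nM^-1 s^-1, placeholder
Q_TOT = 150.0      # total quasi-specific sites
KD_Q = 1.0         # on the order of the value reported for DNA LS

for p in (25.0, 50.0, 100.0, 150.0, 200.0, 300.0):
    print(f"P_tot = {p:5.0f} nM   k_app = {k_app(p, Q_TOT, KD_Q, KA0, N_TOT, KD_N):7.3f} s^-1")
```

Running the loop shows a shallow rise while Ptot is below Qtot and a much steeper rise once the quasi-specific sites are saturated, which is the qualitative signature described for DNA LS and LT.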
Quasi-Specific Sites in Genomic DNA
To examine the
influence of natural quasi-specific sites in genomic DNA on the target
search kinetics of Egr-1, we conducted the stopped-flow fluorescence
assays using calf thymus DNA as a competitor. In this experiment,
sonicated calf thymus DNA (average length, ∼500 bp) was used
instead of synthetic duplexes such as DNA L, LW, LS, and LT. Using base pair concentrations of 56
and 112 μM for the sonicated calf thymus DNA (equivalent to
2 and 4 μM, respectively, for 28 bp DNA) as a competitor, we
measured the target search kinetics of Egr-1 at 150 mM KCl. Figure 3 shows the dependence
of measured kapp constants on protein
concentration. The dependence in these experiments with calf thymus
DNA appeared to be nonlinear, as seen in the case with synthetic quasi-specific
DNA. In fact, fitting with proportional functions assuming a simple
second-order kinetics gave poor agreement with the experimental data
as shown in Figure 3 (dotted lines). These results strongly suggest a significant influence
of quasi-specific sites in calf thymus DNA.
Figure 3
Evidence of the kinetic
influence of natural quasi-specific sites
on the target search process by Egr-1. The graph shows the protein
concentration dependence of kapp constants
measured with the stopped-flow assay using calf thymus DNA as a competitor.
To reduce the viscosity, calf thymus DNA was fragmented into an average
length of ∼500 bp by sonication. The dotted lines represent
fitting with proportional functions. The solid lines are the best-fit
curves obtained via nonlinear least-squares fitting with eqs –9. The fitting calculation was performed for the two data sets simultaneously.
In this calculation, four fitting parameters were optimized: two ka0 parameters at two different overall DNA concentrations,
the apparent affinity (Kd,q), and probability
(fq) of quasi-specific sites. The global
fitting calculations gave an apparent probability of the quasi-specific
sites among the genomic DNA of 0.28 ± 0.03%, and Kd,q = 3.7 ± 0.8 nM. These data suggest that there
are ∼10⁶–10⁷ quasi-specific sites
with high affinity for Egr-1 in the genomic DNA.
To gain insight into the quantity and affinity of quasi-specific
sites in calf thymus DNA, we used our kinetic model to conduct global
fitting for the 56 and 112 μM base pair data. In this calculation,
we defined a probability fq for quasi-specific
sites, with which Qtot = fqNtot in eqs and 9, and optimized
four parameters: fq, Kd,q, and two ka0 parameters
individually defined for the two data sets. Application of the current
kinetic model to the genomic DNA containing various different quasi-specific
sites is obviously simplistic, because this model assumes a uniform Kd,q for all quasi-specific sites. Therefore,
the affinity (Kd,q) and concentration
(Qtot) from these calculations should
be regarded merely as apparent parameters. The global fitting calculation
with eqs –9 showed excellent agreement with both experimental
data sets (solid lines, Figure 3) and yielded a coefficient of determination higher than that
of the linear model (R² values of 0.985
vs 0.849). This calculation gave values for the apparent affinity
(Kd,q) and probability of quasi-specific
sites (fq) of 3.7 ± 0.8 nM and 0.0028
± 0.0003, respectively. These results suggest that high-affinity
quasi-specific sites number as many as ∼10⁶–10⁷ in 3 billion base pairs of calf thymus genomic DNA.
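The arithmetic behind this estimate can be checked by multiplying the fitted apparent probability fq by the genome size. The sketch below does only that. The variable names are illustrative, and carrying the ±0.0003 uncertainty through as simple lower and upper bounds is a simplifying assumption.

```python
GENOME_BP = 3e9   # genome size used in the text (3 billion bp)
F_Q = 0.0028      # fitted apparent probability of quasi-specific sites
F_Q_ERR = 0.0003  # reported uncertainty of F_Q

# Apparent number of high-affinity quasi-specific sites in the genome.
n_sites = F_Q * GENOME_BP
n_low = (F_Q - F_Q_ERR) * GENOME_BP
n_high = (F_Q + F_Q_ERR) * GENOME_BP

print(f"estimate: {n_sites:.2e} sites (range {n_low:.2e} to {n_high:.2e})")
```

The result falls between 10⁶ and 10⁷, in line with the range stated in the text.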
Discussion
Trapping at Nonfunctional Sites
Recently, methods such
as ChIP-on-chip[36] and ChIP-seq[37] have allowed for genome-wide studies of binding
sites of transcription factors in vivo. Such genome-wide
studies showed that transcription factors bind to many DNA sites that
are apparently nonfunctional in the nuclei.[38,39] As these methods detect only high occupancies of transcription factors
at sites with the strongest affinities,[40] there must be a far greater number of quasi-specific sites with
weaker affinities that are similar to the recognition sequences. This
should be particularly true for eukaryotes because their genome is
large and eukaryotic transcription factors recognize relatively short
sequences (typically <10 bp).[41,42] Because of
the large abundance, quasi-specific sites could substantially influence
transcription factors in vivo in both thermodynamic
and kinetic terms, as theoretically considered by Chakrabarti et al.[16]
In fact, our current results from the kinetic experiment with calf thymus DNA suggest that target DNA search by Egr-1 can be considerably impeded by ∼10⁶–10⁷ quasi-specific sites, which substantially increase the mean search time of Egr-1. For a pool of random sequences, the probability of finding an m bp match in a window of n bp covered by a transcription factor is given by
$$p(m, n) = 2 \, {}_{n}C_{m} \left(\frac{1}{4}\right)^{m} \left(\frac{3}{4}\right)^{n-m}$$
where C represents combinations and the factor of 2 accounts for the sequence match for the complementary strand. Using this, the total number of quasi-specific sites (m ≥ 6) for Egr-1 (n = 9) is estimated to be on the order of 10⁷ sites in human genomic DNA comprising 3 × 10⁹ bp. Thus, our experimental results are roughly consistent with this probabilistic estimate.
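The probabilistic estimate can be reproduced with a short calculation. This is a sketch under stated assumptions: the four bases are taken as equally likely and independent at each position, each of the n positions either matches (probability 1/4) or does not (probability 3/4), and the number of n bp windows is approximated by the genome length. The function names are illustrative.

```python
from math import comb


def match_probability(m, n):
    """Probability that a random n-bp window matches the target at
    exactly m positions, counting both strands (factor of 2)."""
    return 2 * comb(n, m) * 0.25 ** m * 0.75 ** (n - m)


def expected_quasi_sites(genome_bp, n, m_min):
    """Expected number of windows with at least m_min matching positions."""
    p_total = sum(match_probability(m, n) for m in range(m_min, n + 1))
    return p_total * genome_bp


# Egr-1 covers n = 9 bp; quasi-specific sites are taken as m >= 6.
p_quasi = sum(match_probability(m, 9) for m in range(6, 10))
n_quasi = expected_quasi_sites(3e9, n=9, m_min=6)

print(f"probability per window: {p_quasi:.4f}")
print(f"expected sites in 3e9 bp: {n_quasi:.1e}")
```

The probability per window is about 2%, giving an expected count on the order of 10⁷ sites, as stated in the text.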
Potential Role of Quasi-Specific Sites in the Regulation of Transcription Factors
Another remarkable finding from our
data is that the adverse effect of the quasi-specific sites on target
association disappears once quasi-specific sites are completely occupied
by proteins. This has two important implications. First, a relatively
high expression level of the transcription factors is required for
efficient regulation of their target genes, unless other proteins
occupy the nonfunctional quasi-specific sites. When the level of the
transcription factor exceeds a threshold at which binding to quasi-specific
sites is saturated, target association of the transcription factors
will become drastically enhanced. This sharp response is essentially
similar to the ultrasensitive response caused by protein sequestration,
which was studied for genetic circuits in yeast.[43,44] Second, functions of the transcription factors would be considerably
enhanced if other proteins (e.g., histones and other nuclear proteins)
bind to the quasi-specific sites and make them inaccessible for the
transcription factors. The quasi-specific sites could also be blocked
by other proteins of the same transcription factor family due to similar
sequence specificity in DNA binding. DNA methylation could block quasi-specific
sites by altering their affinities or by attracting methyl-CpG-binding
proteins to quasi-specific sites containing methylated CpG dinucleotides.
The latter should be particularly relevant to Egr-1. The 9 bp Egr-1
target sequences contain two CpG dinucleotides, yet their methylation
does not weaken association of Egr-1 with target DNA in vitro.[30] Interestingly, a genome-wide ChIP-on-chip
study of Egr-1-binding sites[45] showed that
the functional target sites for Egr-1 are colocalized with CpG islands.
Note that DNA methylation is rare (typically <10%) in CpG islands,
although the overall CpG methylation level is as high as 80% in mammalian
genomic DNA.[46−49] Because of this distribution, it is likely that methyl-CpG-binding
proteins do not block the functional target sites
for Egr-1 in the CpG islands but do block the majority
of quasi-specific sites. Western blot and DNA association data for
nuclear extracts (e.g., refs [22] and [23]) suggest that when induced, the level of nuclear Egr-1 in
vivo is roughly on the order of 10⁻⁹ to
10⁻⁷ M, corresponding to up to ∼10⁴ copies per nucleus. Considering that this number is smaller than
the estimated number of quasi-specific sites in genomic DNA, blocking
or releasing of quasi-specific sites may work as an effective mechanism
for the regulation of Egr-1 and other transcription factors. Further
studies are required to examine this interesting possibility.
Concluding Remarks
This study provides a quantitative description of the impact
of quasi-specific sites on target search kinetics for Egr-1. Depending
on their affinities and numbers, quasi-specific sites can substantially
impede the search process by trapping the protein. Because of this
effect, the protein concentration dependence of the apparent pseudo-first-order
kinetic rate constant for target association in the presence of quasi-specific
sites is biphasic (rather than linear) despite the second-order nature
of the target association process. When all quasi-specific sites are
saturated with proteins, the target association becomes far faster
because the strong trapping effect disappears. Given this observation,
it is reasonable to consider that quasi-specific sites can substantially
attenuate functions of transcription factors in vivo and that quasi-specific sites might play a role in the regulation
of transcription factors via indirect interplay with other nuclear
proteins.