Junji Iwahara1, Anatoly B Kolomeisky2. 1. Department of Biochemistry and Molecular Biology, Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, TX 77555, USA. Electronic address: j.iwahara@utmb.edu. 2. Department of Chemistry, Department of Chemical and Biomolecular Engineering, Department of Physics and Astronomy and Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA.
Abstract
To perform their functions, transcription factors and DNA-repair/modifying enzymes randomly search DNA in order to locate their specific targets on DNA. Discrete-state stochastic kinetic models have been developed to explain how the efficiency of the search process is influenced by the molecular properties of proteins and DNA as well as by other factors such as molecular crowding. These theoretical models not only offer explanations on the relation of microscopic processes to macroscopic behavior of proteins, but also facilitate the analysis and interpretation of experimental data. In this review article, we provide an overview on discrete-state stochastic kinetic models and explain how these models can be applied to experimental investigations using stopped-flow, single-molecule, nuclear magnetic resonance (NMR), and other biophysical and biochemical methods.
In cells, genomic DNA molecules are gigantic polymers containing millions to billions of nucleotide residues. To regulate particular genes, transcription factors must locate functional target sites within particular regulatory regions in the genome [1]. To maintain genomic integrity, DNA-repair enzymes must detect damage buried among numerous intact nucleotide residues of the genome [2]. These DNA-binding proteins specifically recognize particular structural signatures in DNA. They can also interact with nonspecific sites on DNA. Although the interactions are weaker for individual nonspecific sites, the vast quantity of nonspecific DNA segments compensates for their weak affinity. Kinetic and thermodynamic efficiencies for the proteins to bind to their functional targets on DNA are strongly influenced by prior interactions with non-target sites on genomic DNA [3-6].
Target search processes of DNA-binding proteins have been intensively studied in the past five decades. Since Riggs et al. discovered astonishingly rapid target location by the lac repressor in 1970 [7], the mechanisms allowing the DNA-binding proteins to efficiently locate their targets on DNA have been studied both experimentally and theoretically [8-27]. Arguably the most impactful work in this area was a series of papers published in 1981 by Berg, Winter, and von Hippel [28-30]. These researchers theorized some of the key concepts for protein translocation on DNA and used them to explain biochemical data on the lac repressor. They hypothesized that proteins search for their targets on DNA via several translocation modes such as sliding and hopping. This work was remarkable in that these hypothetical concepts were postulated when only a very limited number of experimental methods and no crystal structures of protein-DNA complexes were available. Now, sliding of proteins on DNA is a well-established fact, which has been directly observed for many DNA-binding proteins in vitro and even in vivo by single-molecule methods [8,10,14,19-21,31]. Other methods such as nuclear magnetic resonance (NMR) spectroscopy [13,32], stopped-flow fluorescence [33-35], and elaborate biochemical approaches [36-42] also provide rich and quantitative information about how proteins locate their targets on DNA.
In the 21st century, remarkable progress has also been made in understanding the molecular mechanisms of target DNA search processes (e.g. reviewed in Refs. [15, 18, 23, 25, 43]). A large number of theoretical studies have utilized the so-called chemical-kinetic or discrete-state stochastic models for binding of proteins at nonspecific sites on DNA [5,44-57]. For convenience, we refer to them as discrete-state stochastic kinetic models. The main advantage of these models is that exact analytical expressions for the mean search times can be obtained, as recently reviewed [25]. Many features of target DNA search (e.g., impacts of sequence heterogeneity, crowding, traps, DNA-looping, protein conformational fluctuations) were theoretically investigated by obtaining the search time as an analytical function of experimentally measured parameters (rates, diffusion constants, length of DNA, etc.) for underlying processes and configurations [44-54,56].
Importantly, because the discrete-state stochastic kinetic models can provide testable predictions, they can also facilitate experimental studies of the target DNA search processes. By incorporating the models into experimental investigations, kinetic rate constants and other parameters relevant to protein translocation on DNA can be determined. The analytical functions of target search kinetics are particularly useful for such experimental investigations.
In this review article, we provide an overview on discrete-state stochastic kinetic models, physical meaning of involved parameters, and experimental applications of the models. Showing some examples, we explain how these theoretical models can facilitate experimental analysis and interpretation of various biophysical and biochemical observations on the search processes.
Discrete-state stochastic kinetic models for protein translocation on DNA
Justification for discrete states
In discrete-state stochastic kinetic models for protein translocation on DNA, discrete states are defined for proteins being bound nonspecifically at different sites on DNA. One might rather suppose that protein translocation on DNA should occur in a continuous (as opposed to discrete) manner. Coarse-grained molecular dynamics simulations elegantly display continuous movements of proteins on DNA (e.g. Refs. [58-63]). Sliding of proteins on DNA is a random-walk process that is typically regarded as one-dimensional (1D) diffusion [28]. In the free state, a protein molecule undergoes three-dimensional (3D) diffusion. Obviously, it would be unreasonable to assume discrete states at different positions for the protein undergoing 3D diffusion in a solution. One may wonder why discrete states can be assumed for a protein undergoing 1D diffusion along DNA. What justifies the use of the discrete-state models?
Unlike 3D diffusion, sliding is not a barrierless process. In other words, for the protein to slide to an adjacent site, the protein molecule must first break interactions with nucleotides at the current site. Solution NMR studies on nonspecific DNA complexes showed that despite perpetual changes in binding sites, nonspecific DNA complexes of proteins share many structural features with the corresponding specific DNA complexes with the targets [64-69]. In nonspecific DNA complexes, intermolecular ion pairs should be formed between protein basic side chains and DNA phosphates. The protein molecule must transiently break all of these ion pairs when it moves from one site to another on DNA. The requirement of breaking all ion pairs could represent an energy barrier for sliding [70]. In fact, for many proteins, the 1D diffusion coefficient for sliding on DNA is ~10^2–10^3-fold smaller than the 3D diffusion coefficient calculated with the Stokes-Einstein equation, which gives the diffusion coefficient as a function of the hydrodynamic radius, viscosity, and temperature. This fact implies the presence of energy barriers for translocation of a protein along DNA. Upon overcoming a barrier, the protein can slide to an adjacent site and may form ion pairs with a shifted set of DNA phosphates. These energy barriers clearly define different protein states, allowing for successful use of discrete-state stochastic models.
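The comparison between 3D and 1D diffusion coefficients can be illustrated with a short calculation. The sketch below evaluates the Stokes-Einstein equation for a hypothetical protein; the hydrodynamic radius (2 nm), the viscosity of water, and the temperature are illustrative assumptions, and the 1D value is the Egr-1 zinc-finger coefficient quoted later in this article from Ref. [34].

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def stokes_einstein(radius_m, viscosity_pa_s=8.9e-4, temperature_k=298.15):
    """3D diffusion coefficient (m^2/s) of a sphere: k_B*T / (6*pi*eta*r)."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * radius_m)

# Hypothetical hydrodynamic radius of 2 nm (an assumption, not from the article).
d3 = stokes_einstein(2.0e-9)

# 1D diffusion coefficient for Egr-1 sliding at 110 mM KCl, as quoted from Ref. [34].
d1 = 7.1e-14

ratio = d3 / d1
print(f"D3 = {d3:.3e} m^2/s, D1 = {d1:.3e} m^2/s, D3/D1 = {ratio:.0f}")
```

With these assumed inputs the ratio is large, which is the qualitative point of the paragraph above; the exact value depends on the radius and solvent conditions chosen.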
Example model
Fig. 1A depicts an example of the discrete-state stochastic kinetic models for target DNA search by proteins. Similar, but more elaborate models were also developed to account for the effect of semi-specific sites viewed as traps, sequence heterogeneity, crowding, protein conformational dynamics and DNA looping [44-54,56]. The model shown in Fig. 1A involves two types of DNA duplexes whose concentrations and configurations are different. One of the DNA duplexes contains a total of L sites, of which only the m-th site from an edge is a target of the protein, and all others are nonspecific sites. The other DNA duplex (referred to as the ‘competitor DNA’) contains a total of M nonspecific sites and no target. This model can represent various systems, including one involving target-containing and non-target-containing segments separated by nucleosomes (Fig. 1B). Individual sites on each DNA are overlapped and shifted by a single base pair (bp) from adjacent sites (Fig. 1C). For example, for a 100-bp DNA and a protein that covers a 10-bp region, the number of sites L is 91 (=100 − 10 + 1).
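The site-counting rule in the last sentence can be written as a small helper. This is a minimal sketch; the function name `number_of_sites` is ours and is not part of the original model description.

```python
def number_of_sites(dna_length_bp: int, footprint_bp: int) -> int:
    """Number of overlapping sites, each shifted by 1 bp, on a linear DNA duplex."""
    if footprint_bp < 1 or dna_length_bp < footprint_bp:
        raise ValueError("footprint must be between 1 bp and the DNA length")
    return dna_length_bp - footprint_bp + 1

# The example from the text: a 100-bp DNA and a protein covering 10 bp.
print(number_of_sites(100, 10))
```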
Fig. 1.
A discrete-state stochastic kinetic model for target DNA search by proteins. (A) Sites and processes involved in the model. The system involves two types of DNA duplexes: one containing a target and the other nonspecific sites only. (B) Target search on linker DNA between nucleosomes. This can be modeled by the system shown in Panel A. (C) Nonspecific binding sites for a protein on B-form DNA. Each site is overlapped and shifted by 1 bp from adjacent sites.
This model involves the kinetic rate constants for dissociation, association, sliding, and intersegment transfer. The rate constants for the specific complex and the nonspecific complexes are separately defined, for which the subscripts ‘S’ and ‘N’ are used, respectively. The parameters used in this model are summarized in Table 1. The model involves the intrinsic (as opposed to apparent) association rate constants kon,N and kon,S, the dissociation rate constants koff,N and koff,S, the first-order rate constants ksl,N and ksl,S for sliding, and the second-order rate constants kIT,N and kIT,S for intersegment transfer (also known as direct transfer) between nonspecific sites on two distinct DNA duplexes.
Table 1
Parameters for the model shown in Fig. 1.
Parameters | Symbols | Units
Kinetic rate constants
Apparent rate constant for target association | ka | M^−1 s^−1
Intrinsic association rate constants [a] | kon,N (kon,S) | M^−1 s^−1
Intrinsic dissociation rate constants [a] | koff,N (koff,S) | s^−1
Sliding from a nonspecific site to an adjacent site [a] | ksl,N (ksl,S) | s^−1
Rate constant for intersegment transfer [a] | kIT,N (kIT,S) | M^−1 s^−1
DNA parameters
Number of sites in the target-containing DNA segment | L | (unitless)
Position of the target | m | (unitless)
Number of sites in the nonspecific DNA segment | M | (unitless)
Number of possible protein orientations for each nonspecific site (1 or 2) | ϕ | (unitless)
Base-pair thickness (3.4 Å) | lb | m
Concentrations
Total concentration of target-containing DNA segment | Dtot | M
Total concentration of nonspecific DNA segment | Ctot | M
Total protein concentration | Ptot | M
[a] Subscripts ‘S’ and ‘N’ in these symbols are for ‘specific’ and ‘nonspecific’ sites, respectively.
Relation to one-dimensional diffusion coefficient for sliding
In the discrete-state stochastic kinetic model presented above, ksl,N is defined as the rate constant for sliding from one site to an adjacent site. However, it is practically difficult to directly observe each step of sliding because the distance separating two neighboring sites is very short: 0.34 nm, the thickness of 1 base pair (bp). For example, although single-molecule analysis can provide a 1D diffusion coefficient D1 for sliding, the spatiotemporal resolution of single-molecule methods is not high enough to detect a shift of the protein’s position by 1 bp via sliding. The rate constant ksl,N for a single-step sliding from one site to an adjacent site is directly related to the 1D diffusion coefficient D1 as follows [33]:
$$D_1 = k_{sl,N}\, l_b^2 \tag{1}$$
where lb is the length of 1 bp along the DNA axis (i.e., lb = 3.4 × 10^−10 m).
The derivation of Eq. 1 is straightforward but worth describing here because it was not discussed previously. By definition, the diffusion coefficient D1 is related to the mean squared displacement ⟨x^2⟩ as follows:
$$\langle x^2 \rangle = 2 D_1 t \tag{2}$$
For a discrete-state system, since adjacent sites are overlapped and shifted by 1 bp (see Fig. 1C), the mean squared displacement of a protein along the DNA axis from the initial position is given by:
$$\langle x^2 \rangle = l_b^2 \sum_j (j - i)^2\, p_j \tag{3}$$
where j represents an index for each site; pj represents the probability of finding the protein at site j; and i is the site index for the protein’s initial position. This probability pj is defined for a consecutive sliding process and ∑pj = 1. During the consecutive sliding process from the initial association with DNA until dissociation, the differential equation for the time evolution of pj for each site (but not for the ends of DNA) is:
$$\frac{dp_j}{dt} = k_{sl,N}\,\left(p_{j-1} + p_{j+1} - 2 p_j\right) \tag{4}$$
Neglecting the two ends, which is valid for long DNA chains when pj for the end sites is small, Eqs. 3 and 4 give the time derivative of ⟨x^2⟩:
$$\frac{d\langle x^2 \rangle}{dt} = l_b^2\, k_{sl,N} \sum_j (j - i)^2 \left(p_{j-1} + p_{j+1} - 2 p_j\right) \tag{5}$$
This expression can be rearranged by shifting the summation index in the first two terms:
$$\frac{d\langle x^2 \rangle}{dt} = l_b^2\, k_{sl,N} \sum_j \left[(j + 1 - i)^2 + (j - 1 - i)^2 - 2 (j - i)^2\right] p_j = 2\, l_b^2\, k_{sl,N} \tag{6}$$
Eqs. 2 and 6 together lead to Eq. 1.
Eq. 1 provides an important connection between continuum models and discrete models for protein translocation on DNA. This relation is also useful when the discrete-state stochastic models are applied to interpret experimental data. In the literature, D1 is often given in bp^2 s^−1 units instead of m^2 s^−1 units. When D1 is given in bp^2 s^−1 units, it is numerically equivalent to ksl,N. For example, the D1 coefficient for the Egr-1 zinc-finger protein at 110 mM KCl was measured to be 6.1 × 10^5 bp^2 s^−1 [34]. This is equivalent to ksl,N = 6.1 × 10^5 s^−1 and also corresponds to D1 = 7.1 × 10^−14 m^2 s^−1.
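The relation between the single-step sliding rate constant and D1 can be checked numerically. The sketch below assumes that Eq. 1 has the form D1 = ksl × lb^2, which is consistent with the Egr-1 numbers quoted above, and that the interior sites obey a nearest-neighbor master equation with equal rates in both directions. The function names, lattice size, and time step are illustrative choices.

```python
import numpy as np

L_B = 3.4e-10  # length of 1 bp along the DNA axis, in meters

def d1_in_m2_per_s(k_sl):
    """Convert a sliding rate constant (s^-1) to D1 in m^2/s, assuming D1 = k_sl * l_b^2."""
    return k_sl * L_B ** 2

def d1_from_master_equation(k_sl, n_sites=201, dt=1.0e-7, n_steps=2000):
    """Estimate D1 (bp^2/s) from the growth of the mean squared displacement.

    Integrates dp_j/dt = k_sl * (p_{j-1} + p_{j+1} - 2 p_j) with explicit Euler
    steps. The time step must satisfy k_sl * dt < 0.5 for stability, and the
    lattice must be long enough that the end sites stay essentially empty.
    """
    p = np.zeros(n_sites)
    start = n_sites // 2
    p[start] = 1.0                       # protein starts at the central site
    offsets = np.arange(n_sites) - start
    for _ in range(n_steps):
        dp = np.zeros(n_sites)
        dp[1:-1] = k_sl * (p[:-2] + p[2:] - 2.0 * p[1:-1])
        p += dt * dp
    msd = float(np.sum(offsets ** 2 * p))  # mean squared displacement in bp^2
    return msd / (2.0 * n_steps * dt)      # <x^2> = 2 * D1 * t

k_sl = 6.1e5  # s^-1, the Egr-1 value at 110 mM KCl quoted in the text
d1_bp = d1_from_master_equation(k_sl)
d1_si = d1_in_m2_per_s(k_sl)
print(f"D1 = {d1_bp:.4e} bp^2/s = {d1_si:.2e} m^2/s")
```

The estimate in bp^2 s^−1 units should agree numerically with the input rate constant, which is the statement made in the text.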
Analytical expression of target search kinetics
An important feature of discrete-state stochastic kinetic models is that they can provide analytical expressions for the search kinetics [25]. The first-passage theory and backward master equations for probability density functions [71] can give an analytical expression for the mean search time T0 for an initially unbound protein molecule to reach a target [56]. Macroscopically, 1/T0 is related to the apparent rate constant for the protein-target association (ka) and corresponds to the product of ka and the concentration of the target DNA. This relation of 1/T0 to ka was confirmed through numerical simulations for ensemble populations of individual states by solving the rate equations for the same system [33].
An exact analytical expression for the apparent second-order rate constant (ka) for the target association is available for the model shown in Fig. 1A. Although the full derivation is rather complicated [34], the final expression for ka is remarkably simple and has a very clear physical meaning:
The parameter S represents an acceleration (S > 1) of target association through the antenna effect (see Section 2.5); ρ represents a deceleration (0 < ρ < 1) via trapping of proteins at nonspecific sites outside the antenna; and the parameter η represents an acceleration (η > 1) via intersegment transfer. These parameters are given as follows [34]:
In Eq. 8, Kd,N (= koff,N/kon,N) is the dissociation constant for each nonspecific site. The parameter ϕ is the number of possible orientations for each nonspecific site. Due to the structural pseudo-C2 symmetry of DNA, proteins that bind to DNA as a monomer can take two opposite orientations. This corresponds to ϕ = 2. For symmetric dimers, ϕ = 1. For example, for monomeric proteins, specific association with a target site occurs only in one of the two possible orientations, and the binding to the same site in the opposite orientation is regarded as nonspecific. In Eq. 10, y is a function of the sliding length λ (in bp; unitless):
The sliding length λ represents the mean length of sliding (in base pairs; unitless) and is given by [28]:
The parameter τ is the mean residence time of a protein bound to DNA nonspecifically. In the absence of intersegment transfer, the mean residence time τ is:
In the presence of the intersegment transfer mechanism, the time τ is shorter and given by:
Although details are obviously model-dependent, Eq. 7 captures some general features of the target DNA search kinetics. In the following subsections, based on the presented discrete-state stochastic kinetic model, we will explain the effects and factors that impact the search kinetics (Table 2).
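Two of the relations in this section lend themselves to quick arithmetic. The sketch below (i) converts an apparent association rate constant into a mean search time using 1/T0 = ka × [target DNA], and (ii) estimates the residence time implied by a sliding length, under the assumption that λ^2 = ksl,N × τ (the form commonly attributed to Ref. [28]; the equations themselves are not reproduced in this text, so this form is an assumption). The ka value is hypothetical; the sliding length and sliding rate constant are the Egr-1 values quoted in this article.

```python
def mean_search_time(k_a, target_conc):
    """Mean search time T0 (s) from 1/T0 = k_a * [target DNA]."""
    return 1.0 / (k_a * target_conc)

def residence_time_from_sliding_length(lam_bp, k_sl):
    """Residence time tau (s), ASSUMING lam^2 = k_sl * tau with lam in bp."""
    return lam_bp ** 2 / k_sl

# Hypothetical apparent rate constant (M^-1 s^-1) with 2.5 nM target DNA.
t0 = mean_search_time(k_a=1.0e8, target_conc=2.5e-9)

# Egr-1 at 110 mM KCl: sliding length 44 bp and k_sl = 6.1e5 s^-1 (Ref. [34]).
tau = residence_time_from_sliding_length(lam_bp=44.0, k_sl=6.1e5)

print(f"T0 = {t0:.2f} s, tau = {tau * 1e3:.2f} ms")
```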
Table 2
Effects and factors relevant to target search efficiency.
Parameters | Symbols | Units | Equations
Acceleration & deceleration factors
Antenna size | S | (unitless) | Eqs. 10–15
Deceleration by trapping | ρ | (unitless) | Eq. 8
Acceleration by intersegment transfer | η | (unitless) | Eq. 9
Sliding parameters
One-dimensional diffusion coefficient for sliding | D1 | m^2 s^−1 | Eq. 1
Sliding length | λ | bp (unitless) | Eq. 12
Mean time for sliding | τn | s | Eqs. 13, 14
Sliding length and antenna effect
To understand the target DNA search kinetics, the sliding length λ (also known as the scanning length) is important. The sliding length corresponds to the average distance that a protein can slide without dissociating from DNA [28]. It is relatively easy to measure through experiments (see Sections 3.1 and 3.2). The sliding length λ is directly related to the antenna effect (represented by the parameter S), one of the direct determinants of target search kinetics. Fig. 2A depicts the physical meanings of the parameter S and the sliding length λ. Consider a system with a target located in the middle of a long DNA segment. If a protein associates with a nonspecific site more than λ bp away from the target, the protein molecule is likely to dissociate from DNA before reaching the target via the sliding mechanism. If the protein binds to a nonspecific site within λ bp of the target, the protein can reach the target through sliding. In other words, nonspecific sites within ±λ bp can capture the protein and lead it to the target through sliding. In such a case, the target search kinetics is accelerated by a factor of up to 2λ. This effect, which is referred to as the antenna effect, depends on the position of the target and the DNA length. The parameter S represents S-fold acceleration by the antenna effect. Hereafter, we refer to S as the antenna size.
Fig. 2.
The antenna size S and the sliding length λ. (A) Physical meanings of the antenna size S and the sliding length λ. When a protein binds to a nonspecific site outside of the antenna, the protein does not reach the target through sliding. (B) Dependence of the antenna size S on the sliding length λ. Note that S ≈ L when λ » L. (C) Dependence of the antenna size S on the target position m. For Panels B and C, L = 47 was used, which corresponds to an approximate number of sites in a linker DNA segment of the average length in nuclei of human cells.
Eq. 10 gives an exact expression relating the antenna size S to the target position, the number of sites, and the sliding length. A less exact, but intuitively clearer form of S is also available. If the target position is near the middle of the DNA duplex (i.e., m ≈ L/2), S becomes virtually independent of m and is reduced to [18,28]:
$$S \approx 2\lambda \tanh\left(\frac{L}{2\lambda}\right) \tag{15}$$
The maximum value of S is 2λ, which is achieved when the target-containing DNA is much longer than the sliding length. Under this limit, the antenna spans up to the sliding length λ on each side of a target (Fig. 2A). If the target-containing DNA segment is short and λ » L, then S is virtually the same as L (Fig. 2B). In this limit, the entire DNA segment becomes the antenna. In other words, even if the protein exhibits a very long sliding length, the antenna size cannot exceed the size of the target-containing DNA segment. For example, the average length of linkers between nucleosome particles in human cells is only 56 bp [72]. Short lengths of linkers certainly limit the antenna effect. Although Eq. 15 is intuitively useful, this approximation is not valid when the target is located closer to an edge of DNA. Even in such cases, Eq. 10 accurately provides the antenna size S. Fig. 2C compares values of S calculated with Eqs. 15 and 10, showing how S depends on the target position m. Compared to those near the middle of a DNA segment, positions near a DNA edge give a smaller S and therefore should exhibit slower target association with the protein. The difference between the sliding length λ and the antenna size S can also be explained by the following argument. The parameter S gives the average number of distinct sites visited by the protein before dissociation from DNA. Thus, the antenna size cannot be less than one, while the sliding length can go below one. Even if the protein cannot slide after association with the DNA molecule, it will still check whether the association site is a target. Similarly, the antenna size cannot be larger than L, while it is possible to have λ > L. In this case, the protein can move on DNA by visiting the same sites multiple times.
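The limiting behaviors described above can be explored numerically. The sketch below compares the mid-DNA approximation S ≈ 2λ tanh(L/2λ) with a direct numerical estimate. The numerical estimate is not Eq. 10: it rests on our own assumptions that S can be identified with the sum, over all starting sites, of the probability of reaching the target by sliding before dissociation, that the DNA ends are reflecting, and that λ^2 = ksl,N/koff,N. Function names are illustrative.

```python
import numpy as np

def antenna_size_tanh(L, lam):
    """Approximation for a target near the middle of the DNA: 2*lam*tanh(L/(2*lam))."""
    return 2.0 * lam * np.tanh(L / (2.0 * lam))

def antenna_size_numeric(L, m, lam):
    """Sum over the L starting sites of the probability that the protein reaches
    target site m (1-based) by sliding before it dissociates.

    Assumptions (ours): reflecting DNA ends and lam**2 = k_sl / k_off.
    """
    k_off, k_sl = 1.0, lam ** 2
    A = np.zeros((L, L))
    b = np.zeros(L)
    target = m - 1
    for i in range(L):
        if i == target:
            A[i, i] = 1.0      # at the target, the reach probability is 1
            b[i] = 1.0
            continue
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < L]
        # Balance of exits: (k_off + n*k_sl) * P_i = k_sl * sum of neighbor P_j
        A[i, i] = k_off + k_sl * len(neighbors)
        for j in neighbors:
            A[i, j] = -k_sl
    return float(np.linalg.solve(A, b).sum())

L = 47  # number of sites used for Fig. 2B and 2C
for lam in (5.0, 44.0, 1000.0):
    s_mid = antenna_size_numeric(L, 24, lam)   # target in the middle
    s_edge = antenna_size_numeric(L, 2, lam)   # target near an edge
    print(f"lam={lam:7.1f}  tanh={antenna_size_tanh(L, lam):6.2f}  "
          f"mid={s_mid:6.2f}  edge={s_edge:6.2f}")
```

Under these assumptions the numerical estimate stays between 1 and L, approaches 2λ on long DNA, and is smaller for a target near an edge than for one in the middle, in line with the discussion above.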
Trapping effect
When proteins are bound to distant nonspecific sites and are unable to reach a target through sliding, these proteins are essentially trapped at non-productive positions. In the analytical expression of the apparent target association rate constant ka (Eq. 7), the parameter ρ represents the population of proteins that are not trapped at non-productive positions. When ρ is small, the trapping effect is strong, and the search kinetics is slow. The term ϕL – S in Eq. 8 corresponds to the number of nonspecific sites outside the antenna on the target-containing DNA duplex, and (ϕL – S)Dtot + ϕMCtot corresponds to the net overall concentration of nonspecific sites. The parameter ρ is virtually independent of S when the amount of competitor DNA is much larger than that of the target-containing DNA (i.e., Dtot « Ctot). Kd,N represents the dissociation constant for a nonspecific complex and is given by koff,N/kon,N. It should be noted that the denominator in Eq. 8 is in the form of a partition function based on the equilibrium constant for the protein: the first term (i.e., 1) corresponds to the statistical weight for the free state as the reference state; the second term is for the proteins bound to non-antenna regions of target-containing DNA; and the third term is for the proteins bound to competitor DNA. From this form, it is clear that the parameter ρ represents the fraction of protein molecules that are not trapped at any non-productive positions during the target search process. In terms of diffusion, the parameter ρ represents the population of proteins undergoing 3D diffusion. Through 3D diffusion, the protein can arrive at the antenna or it can directly reach the target, although with a smaller probability. The protein can reach the target through sliding from the antenna without dissociating from nonspecific DNA.
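The partition-function description of ρ can be turned into a short calculation. The sketch below assumes the form ρ = 1 / (1 + (ϕL − S)·Dtot/Kd + ϕM·Ctot/Kd), which is our reading of the three terms described in the paragraph above rather than a verbatim copy of Eq. 8. The Kd value and the two antenna sizes are hypothetical; the DNA parameters follow the conditions listed for Fig. 4.

```python
def untrapped_fraction(S, L, M, phi, d_tot, c_tot, kd_n):
    """Fraction of protein not trapped at non-productive sites (assumed form of rho).

    Terms in the denominator: 1 for the free state, one term for non-antenna
    sites on target-containing DNA, and one term for competitor DNA.
    """
    non_antenna = (phi * L - S) * d_tot / kd_n
    competitor = phi * M * c_tot / kd_n
    return 1.0 / (1.0 + non_antenna + competitor)

# DNA parameters as listed for Fig. 4; phi = 2 for a protein binding as a monomer.
common = dict(L=105, M=20, phi=2, d_tot=2.5e-9, c_tot=2000e-9,
              kd_n=1.0e-6)  # kd_n is a hypothetical value in M

rho_large_antenna = untrapped_fraction(S=70.0, **common)
rho_small_antenna = untrapped_fraction(S=1.0, **common)
print(rho_large_antenna, rho_small_antenna)
```

Because the competitor term dominates when Dtot « Ctot, the two results differ only slightly, which illustrates why ρ is virtually independent of S under such conditions.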
Impact of intersegment transfer
Although intersegment transfer has been neglected in many studies on target DNA search processes, some theoretical studies suggest that this translocation mechanism plays an important role in the search kinetics [73-75]. Because intersegment transfer requires an intermediate where a protein molecule transiently bridges two DNA duplexes [28], this mechanism is significant for multi-domain or multi-subunit DNA-binding proteins. If there is no intersegment transfer, proteins can transfer to the target-containing DNA segment only through dissociation, 3D diffusion, and re-association. Through the intersegment transfer mechanism, proteins can directly transfer from one DNA segment to another without going through the free state. As previously explained [33], intersegment transfer can be treated as a phenomenological second-order process. The model shown in Fig. 1A involves the second-order rate constant kIT,N for intersegment transfer from a nonspecific site. The parameter η represents acceleration of target search through intersegment transfer. If intersegment transfer is much faster than dissociation [i.e., kIT,N(ϕLDtot + ϕMCtot) » koff,N], then η » 1 and intersegment transfer can substantially increase the apparent target association rate constant ka and accelerate the search kinetics. The contribution of intersegment transfer can in principle exceed the contribution of sliding (i.e., η > S), especially for systems with a high density of DNA and short segments. Since the two DNA ends of each nucleosome particle are only ~60 Å apart, proteins might be able to bypass the roadblock of a nucleosome via intersegment transfer [69].
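The criterion for significant intersegment transfer can be evaluated directly from the quantities in the inequality above. The sketch below computes the ratio kIT,N(ϕL·Dtot + ϕM·Ctot)/koff,N; it does not attempt to reproduce Eq. 9 for η itself. The rate constants are hypothetical, and the DNA parameters follow the conditions listed for Fig. 4.

```python
def transfer_to_dissociation_ratio(k_it, k_off, phi, L, d_tot, M, c_tot):
    """Pseudo-first-order intersegment transfer rate divided by the dissociation rate."""
    site_concentration = phi * L * d_tot + phi * M * c_tot  # nonspecific sites, in M
    return k_it * site_concentration / k_off

# Hypothetical rate constants: k_it in M^-1 s^-1 and k_off in s^-1.
ratio = transfer_to_dissociation_ratio(
    k_it=1.0e7, k_off=100.0, phi=2, L=105, d_tot=2.5e-9, M=20, c_tot=2000e-9
)
print(f"transfer/dissociation ratio = {ratio:.2f}")
```

A ratio well above 1 corresponds to the regime in which the text expects η » 1; a ratio near 0 corresponds to negligible intersegment transfer.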
Experimental applications
The simultaneous presence of multiple processes poses a challenge in experimental studies on the target search mechanisms. The discrete-state stochastic kinetic models can greatly facilitate analyses and interpretation of various experimental data. Here, we describe how these models can be applied to experimental studies of target DNA search by proteins.
Ensemble kinetics experiments
Various biochemical and biophysical techniques can be used to measure the kinetics of the target search process whereby DNA-binding proteins locate their specific targets on DNA in the abundant presence of nonspecific DNA. The discrete-state stochastic kinetic model shown in Fig. 1A can be applied to ensemble kinetics experiments (e.g., fluorescence-based stopped-flow experiments) for quantitative investigations of the target DNA search processes under various conditions.
DNA length dependence of target association
In conjunction with the discrete-state stochastic kinetic model, DNA length dependence data for target association kinetics allow for precise determination of the sliding length λ. When the length dependence is studied, the apparent target association rate constant ka is measured for several DNA duplexes of different lengths, varying L, the number of sites on target-containing DNA. Since the equations of the parameters S, ρ, and η (i.e., Eqs. 8–10) contain L, all of these parameters will be affected when L is changed in experiments. However, if the amount of the competitor DNA is much larger than that of the target-containing DNA, satisfying LDtot « MCtot, then ρ and η become virtually independent of L, and the length dependence of ka will arise solely from the length dependence of S. Under such conditions, the sliding length λ can be accurately determined from the length-dependent ka data alone, because S involves only λ and two DNA configurational parameters m and L. As an example, Fig. 3 shows the length-dependence data of the target association kinetics measured for the Egr-1 zinc-finger protein at 110 mM KCl by a stopped-flow method [34]. In this case, longer DNA exhibited significantly faster target association for lengths between 33 and 88 bp, whereas DNA duplexes longer than 88 bp resulted in almost the same target association kinetics. This is because additional nonspecific sites far from the antenna do not increase the chance for the protein to reach the target via sliding. Through fitting using Eqs. 7–14, the sliding length λ was determined to be 44 ± 3 bp in this case. Because the target site was close to the edge, the exact analytical form of S (i.e., Eq. 10) was used.
Fig. 3.
Determination of the sliding length λ from experimental data of the DNA length-dependence of the apparent association rate constant ka. The data of the apparent kinetic rate constant for target association of the Egr-1 zinc-finger protein at 110 mM KCl are shown [34]. This protein is the DNA-binding domain of the human transcription factor Egr-1 and comprises three zinc fingers (Egr-1 residues 335–423). The target association kinetics was measured for 33, 48, 65, 88, 113, and 143-bp DNA duplexes. The nucleotide sequences of these DNA duplexes are shown. The target-containing probe DNA was 2.5 nM in each kinetic measurement. The solutions also contained a far larger amount of 28-bp nonspecific competitor DNA (2000 nM). The experimental ka data are shown by red circles in the graph on the left-hand side. The best-fit curve is shown by a red solid line. The sliding length was determined to be 44 ± 3 bp through nonlinear least-squares fitting to the ka data. The corresponding length is indicated by a black bar below the DNA sequences. Green bars represent the antenna sizes S calculated for individual target-containing DNA duplexes and indicate the region of S sites, including the target. The length of each green bar is (the length of the target) + S – 1 bp. Shown on the right-hand side is a graph indicating the dependence of S on the number of sites L for the system of m = 2 and λ = 44 bp. Eq. 10 was used for this graph.
The discrete-state stochastic kinetic model can also allow for determination of the 1D diffusion coefficient D1 (and hence the rate constant ksl,N; see Eq. 1) for sliding and the rate constants kon,N and koff,N, if the dissociation constant Kd,N and the rate constant kIT,N for intersegment transfer are available from other experiments. There are experimental methods to determine the dissociation constant Kd,N for a nonspecific site. Methods to determine the rate constant kIT,N are also available, as described below. Thus, the DNA-length dependence data can provide comprehensive information about sliding and about dissociation and re-association processes.
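The fitting strategy described in this subsection can be sketched with synthetic data. The example below assumes that, with a large excess of competitor DNA, ka is proportional to the antenna size S, so that λ and a single amplitude are the only free parameters. The antenna size is computed numerically as the summed probability of reaching the target by sliding before dissociation (our stand-in for Eq. 10, with reflecting ends and λ^2 = ksl,N/koff,N assumed). The site counts, amplitude, and grid are illustrative, and the data are synthetic rather than the measurements of Ref. [34].

```python
import numpy as np

def antenna_size(L, m, lam):
    """Summed probability of reaching target site m (1-based) by sliding
    before dissociation; a numerical stand-in for Eq. 10 (see assumptions above)."""
    k_off, k_sl = 1.0, lam ** 2
    A = np.zeros((L, L))
    b = np.zeros(L)
    for i in range(L):
        if i == m - 1:
            A[i, i], b[i] = 1.0, 1.0
            continue
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < L]
        A[i, i] = k_off + k_sl * len(neighbors)
        for j in neighbors:
            A[i, j] = -k_sl
    return float(np.linalg.solve(A, b).sum())

def fit_sliding_length(site_counts, ka_obs, m, lam_grid):
    """Grid search over lam; the amplitude is solved by linear least squares."""
    ka_obs = np.asarray(ka_obs, dtype=float)
    best_lam, best_sse = None, np.inf
    for lam in lam_grid:
        s = np.array([antenna_size(L, m, lam) for L in site_counts])
        amplitude = s @ ka_obs / (s @ s)
        sse = float(np.sum((ka_obs - amplitude * s) ** 2))
        if sse < best_sse:
            best_lam, best_sse = float(lam), sse
    return best_lam, best_sse

# Synthetic, noise-free data generated with lam = 44 and a target at m = 2.
site_counts = [25, 40, 57, 80, 105, 135]   # illustrative numbers of sites
ka_synth = 1.0e6 * np.array([antenna_size(L, 2, 44.0) for L in site_counts])

lam_fit, sse = fit_sliding_length(site_counts, ka_synth, 2, np.arange(10.0, 101.0, 1.0))
print(f"best-fit sliding length: {lam_fit:.0f} bp")
```

A grid search is used here only to keep the sketch dependency-free; with real, noisy data a nonlinear least-squares routine with uncertainty estimates would be the more usual choice.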
Experimental analysis of intersegment transfer
Discrete-state stochastic kinetic models also facilitate experimental investigations of intersegment transfer. There are several different experimental approaches to investigate intersegment transfer [33-35,66,69,76-81]. These approaches require kinetic measurements at varied concentrations of competitor DNA and can be categorized into two types.
One of the approaches to determine kIT is through measurements of apparent rate constants for transfer of the protein from a complex to competitor DNA at various concentrations. A typical experiment for these measurements involves mixing the protein-DNA complex with competitor DNA at concentrations much higher than Kd, satisfying koff « kon[D]. This inequality ensures that the rate-limiting step in the dissociation and re-association process is dissociation. Since dissociation is a first-order process, the apparent dissociation rate constant should be independent of the competitor DNA concentration in the absence of intersegment transfer. However, in the presence of intersegment transfer, the apparent dissociation rate constant is linearly dependent on the competitor DNA concentration, and the apparent rate constant is given by koff + kITϕMCtot. The rate constant kIT for intersegment transfer can be determined from this dependence. It should be noted that the second term is proportional to the number of sites M on the competitor DNA. When the efficiency of intersegment transfer is discussed, the number of nonspecific sites on competitor DNA should be taken into account.
The other type of experimental approach utilizes target association kinetics at various concentrations of competitor DNA [33,34]. In the absence of intersegment transfer (i.e., kIT,N = 0 and η = 1), the ka constant is inversely proportional to Ctot when the concentration of the competitor DNA is much higher than that of the target-containing DNA (i.e., Dtot « Ctot) [82]. This inverse proportionality arises from the parameter ρ (see Eq. 8). At a higher concentration of competitor DNA, the protein can be trapped at nonspecific sites more easily, which slows down the target association process. However, intersegment transfer can counteract this trapping effect [33]. Upon an increase in the concentration of competitor DNA, intersegment transfer becomes faster and the parameter η increases (see Eq. 9), which also increases the ka constant. The increase in the parameter η also affects the antenna size S. Due to these effects, the dependence of ka on Ctot deviates substantially from proportionality to 1/Ctot. Using Eqs. 7–14, the rate constant kIT,N for intersegment transfer can be determined from ka data at various Ctot concentrations.
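The first approach, in which the apparent dissociation rate constant depends linearly on the competitor concentration, can be sketched as a straight-line fit. The example assumes the relation k_app = koff + kIT·ϕ·M·Ctot stated above; the data are synthetic and the rate constants used to generate them are hypothetical.

```python
import numpy as np

def fit_intersegment_transfer(c_tot, k_app, phi, M):
    """Fit k_app = k_off + k_IT * phi * M * c_tot; return (k_off, k_IT)."""
    slope, intercept = np.polyfit(np.asarray(c_tot), np.asarray(k_app), 1)
    return float(intercept), float(slope / (phi * M))

# Synthetic data generated with hypothetical k_off = 50 s^-1 and k_IT = 2e6 M^-1 s^-1.
phi, M = 2, 20
c_tot = np.array([0.5, 1.0, 2.0, 4.0, 8.0]) * 1e-6   # competitor DNA, in M
k_app = 50.0 + 2.0e6 * phi * M * c_tot

k_off_fit, k_it_fit = fit_intersegment_transfer(c_tot, k_app, phi, M)
print(f"k_off = {k_off_fit:.1f} s^-1, k_IT = {k_it_fit:.3e} M^-1 s^-1")
```

Dividing the slope by ϕM, rather than reporting the slope itself, reflects the point made in the text that the number of nonspecific sites on the competitor DNA should be taken into account.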
Salt concentration dependence of target search kinetics
Because electrostatic interactions are crucial for protein-DNA association [70], the kinetics and thermodynamics of protein-DNA interactions strongly depend on the salt concentration used in experiments [29,34,64,67,83-88]. Counterion condensation theory predicts a linear relationship between logK and log[salt] for the dissociation constants of protein-DNA complexes [85,86] and a similar linear relationship between logk and log[salt] for some kinetic rate constants relevant to protein-DNA interactions [89]. In either case, the salt concentration dependence of these parameters is predicted to be monotonic. However, the salt concentration dependence of the apparent rate constants (k) for target search kinetics is not monotonic, and there exists a salt concentration that maximizes the efficiency of the target DNA search. For the lac repressor and the Egr-1 zinc-finger protein, the target search kinetics is fastest at a physiological ionic strength and slower at lower or higher ionic strengths [29,34]. The salt concentration dependence of the k rate constant measured for the Egr-1 zinc-finger protein at various concentrations of KCl is shown in Fig. 4A.
Fig. 4.
Salt concentration dependence of the target search kinetics for the Egr-1 zinc-finger protein. (A) The apparent target association rate k measured for the Egr-1 zinc-finger protein at various concentrations of KCl. The stopped-flow experiments were conducted using a 113-bp DNA containing an Egr-1 recognition sequence (D = 2.5 nM) and a nonspecific 28-bp DNA (C = 2000 nM). (B) Salt concentration dependence of the parameters S, η, and ρ. The parameter S represents the antenna size (see Section 2.5); η represents an acceleration by intersegment transfer; and ρ represents a deceleration by trapping of protein at nonspecific DNA. These parameters were calculated from the experimental data on the rate constants k, k and k, and the dissociation constant K for the Egr-1 zinc-finger protein along with Eqs. 8–10. The following conditions were used: L = 105 sites, M = 20 sites, and m = 2. Ionic-strength dependence represented by log k = a log[KCl] + b was assumed for k, k, k, and K, and the parameters a and b were calculated from the salt-dependence data for these constants. Adapted from Esadze et al. [34].
The salt concentration dependence can be explained using the equations for the discrete-state stochastic kinetic model. Based on experimental data, the parameters ρ, η, and S for the Egr-1 zinc-finger protein were calculated (Fig. 4B). The parameters S and η are decreasing functions of ionic strength, and the antenna effect and intersegment transfer substantially accelerate the target search kinetics at low ionic strengths. In contrast, the parameter ρ is an increasing function of ionic strength, and the trapping effect substantially decelerates the target search kinetics at low ionic strengths. The rate constants for electrostatically assisted macromolecular association (such as k) typically decrease with increasing ionic strength [90,91]. The rate constant k, as the product Sρηk (Eq. 6), is therefore maximized at a particular ionic strength, as shown in Fig. 4A.
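The caption of Fig. 4 states that each underlying constant was assumed to follow log k = a log[KCl] + b. As a minimal sketch of how the parameters a and b could be obtained from salt-dependence data, the following performs a least-squares fit in log-log space. The function names and the synthetic data are hypothetical, and the calculation of S, η, and ρ from the fitted constants (Eqs. 8–10) is not reproduced here.

```python
import math


def fit_log_log(salt_molar, values):
    """Fit log10(value) = a * log10([salt]) + b by least squares."""
    x = [math.log10(s) for s in salt_molar]
    y = [math.log10(v) for v in values]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    a = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    b = mean_y - a * mean_x
    return a, b


def interpolate(a, b, salt_molar):
    """Evaluate the fitted power law at a given salt concentration."""
    return 10 ** (a * math.log10(salt_molar) + b)


# Synthetic example: a constant that falls steeply with [KCl] (made-up numbers).
kcl = [0.05, 0.1, 0.2, 0.4]  # M
k_values = [10 ** (-3.0 * math.log10(c) + 2.0) for c in kcl]
a, b = fit_log_log(kcl, k_values)
print(a, b, interpolate(a, b, 0.15))
```

Each individual constant fitted this way is monotonic in salt concentration. The maximum in the apparent rate constant arises only when the fitted constants are combined through S, ρ, and η.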
Single-molecule experiments
In the 21st century, single-molecule fluorescence tracking of proteins bound to DNA has gained widespread popularity in the biophysical field [8,10,14,19-21,31]. Tracking thousands of protein molecules bound to DNA allows for quantitative analysis of sliding and dissociation. From the trajectory data, the mean squared displacement along the DNA axis can be obtained for proteins sliding on DNA. For a simple diffusion process, the mean squared displacement is a linear function of time and the slope corresponds to 2D1 (see Eq. 2). Therefore, the single-molecule fluorescence tracking data directly provide the 1D diffusion coefficient D1 for sliding. Through histogram analysis of the ‘bound time’ between the initial association and the final dissociation of the protein, the dissociation rate constant k can also be determined. The sliding length λ can also be determined using the D1 and k data along with Eqs. 12 and 13. Discrete-state stochastic kinetic models can provide additional information. From D1 data, one can estimate the mean time that the protein spends at a nonspecific site before sliding to either adjacent site. This time is given by l²/(2D1) based on Eq. 1, where l is the distance between adjacent sites. If the measured 1D diffusion coefficient is D1 = 6 × 10⁻¹⁴ m² s⁻¹, it suggests that the protein spends ~1 μs at each nonspecific site before sliding to an adjacent site (i.e., a shift by 1 bp, 0.34 nm). This information from the discrete-state stochastic kinetic model is useful because it is beyond the current highest spatiotemporal resolution of single-molecule techniques (~20 nm, ~500 μs; Ref. [92]). Based on Eq. 10, the antenna size S for a target in a particular DNA segment (e.g., a linker between two nucleosomes) can also be calculated. In other words, from the single-molecule data on proteins bound to nonspecific DNA, the discrete-state stochastic kinetic model can provide information about the extent of acceleration of target association kinetics via the sliding mechanism. This is important for understanding how the microscopic processes affect the efficiency of the target DNA search by proteins.
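The quantities discussed above can be obtained from tracking data with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the residence time per site uses the relation l²/(2D1) with l = 0.34 nm (an assumption that reproduces the ~1 μs estimate quoted above), and the dissociation rate constant is estimated as the reciprocal of the mean bound time, which assumes a single-exponential distribution of bound times rather than the histogram fitting mentioned in the text.

```python
def diffusion_coefficient_from_msd(times, msd):
    """Estimate D1 from the slope of MSD versus time (MSD = 2 * D1 * t + offset)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_m = sum(msd) / n
    slope = (sum((t - mean_t) * (m - mean_m) for t, m in zip(times, msd))
             / sum((t - mean_t) ** 2 for t in times))
    return slope / 2.0


def residence_time_per_site(d1, step=0.34e-9):
    """Mean time at one site before a 1-bp step to either neighbor (assumed l^2 / (2 D1))."""
    return step ** 2 / (2.0 * d1)


def dissociation_rate_from_bound_times(bound_times):
    """Reciprocal of the mean bound time (assumes single-exponential statistics)."""
    return len(bound_times) / sum(bound_times)


# Synthetic MSD data generated with D1 = 6e-14 m^2/s (made-up time points).
times = [0.01, 0.02, 0.03, 0.04]  # s
msd = [2.0 * 6e-14 * t for t in times]  # m^2
d1 = diffusion_coefficient_from_msd(times, msd)
tau = residence_time_per_site(d1)
print(d1, tau)  # tau is close to 1 microsecond
```

Fitting the slope with a free intercept, rather than forcing the line through the origin, keeps a constant offset from localization error out of the estimate of D1.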
NMR experiments
NMR spectroscopy is well suited to study dynamics of biological macromolecules at atomic and molecular levels and can provide spatiotemporal information on their dynamics [93]. The capabilities of NMR spectroscopy are actively being expanded with new technologies such as ultra-high field magnets (with 1H frequencies up to 1.2 GHz), 13C/15N direct-detection triple-resonance cryogenic probes, and dynamic nuclear polarization methods [94]. More challenging dynamic systems of biological macromolecules can now be studied with current NMR techniques. Some NMR-based approaches have been developed for investigating the target DNA search by proteins [13,32]. Structural information on the proteins scanning DNA can be obtained through NMR experiments [64-69]. For example, conformational mobility of particular domains or moieties within the proteins bound to DNA can be investigated. NMR spectroscopy also allows for investigations of ion-pair dynamics involving basic side chains at protein-DNA interfaces [95-98].
Some NMR methods can provide kinetic information about protein translocation on DNA. The discrete-state stochastic kinetic model of protein translocation on DNA can be incorporated directly into a master equation for NMR spectroscopy [99]. Due to the timescale of protein translocation on DNA, the McConnell equation [100,101] can be used to describe the behavior of nuclear magnetization for discrete states involved in the search process. Since proteins and DNA at chemical equilibrium are used and the concentrations of individual species remain constant in typical NMR experiments, any second-order process can be handled with a pseudo-first-order treatment, enabling the use of a kinetic matrix even for second-order processes (Fig. 5). By numerically solving NMR master equations incorporating the kinetic matrix for protein translocation on DNA, one can learn how the translocation process influences NMR data for proteins that are nonspecifically bound to DNA. Kinetic information about sliding, dissociation, and intersegment transfer can be obtained through this approach, as demonstrated for the HoxD9 homeodomain [99]. In principle, the discrete-state stochastic kinetic model for protein translocation on DNA can readily be incorporated into the equations on N-site exchange systems for CPMG R2 relaxation [100], R1ρ relaxation [102], cross-saturation [103], and paramagnetic relaxation enhancement (PRE) [104].
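The pseudo-first-order treatment described above can be illustrated with a reduced exchange equation, dM/dt = (K − R)M, in which K is the kinetic matrix and R is a diagonal matrix of relaxation rates. The sketch below is a simplified illustration rather than the master equation of Ref. [99]: it omits chemical-shift evolution, treats only two hypothetical states (protein bound to DNA a and to DNA b), and uses made-up rate constants and concentrations.

```python
def kinetic_matrix(rates, n_states):
    """Build K for dM/dt = K M from pseudo-first-order rates.

    rates maps (i, j) to the rate constant (s^-1) for transfer from state i to state j.
    Each column of K sums to zero, so total magnetization is conserved by exchange.
    """
    K = [[0.0] * n_states for _ in range(n_states)]
    for (i, j), k in rates.items():
        K[j][i] += k  # gain of state j from state i
        K[i][i] -= k  # loss from state i
    return K


def propagate(K, relax, m0, t_total, steps=10000):
    """Integrate dM/dt = (K - diag(relax)) M with a fourth-order Runge-Kutta scheme."""
    n = len(m0)
    A = [[K[r][c] - (relax[r] if r == c else 0.0) for c in range(n)] for r in range(n)]

    def deriv(m):
        return [sum(A[r][c] * m[c] for c in range(n)) for r in range(n)]

    h = t_total / steps
    m = list(m0)
    for _ in range(steps):
        k1 = deriv(m)
        k2 = deriv([mi + 0.5 * h * ki for mi, ki in zip(m, k1)])
        k3 = deriv([mi + 0.5 * h * ki for mi, ki in zip(m, k2)])
        k4 = deriv([mi + h * ki for mi, ki in zip(m, k3)])
        m = [mi + h * (a + 2 * b + 2 * c + d) / 6.0
             for mi, a, b, c, d in zip(m, k1, k2, k3, k4)]
    return m


# Hypothetical second-order transfer rate constant and free DNA concentrations.
k_transfer = 1.0e5                   # M^-1 s^-1 (made up)
free_dna_a, free_dna_b = 5e-4, 1e-3  # M (made up, constant at equilibrium)
rates = {(0, 1): k_transfer * free_dna_b,   # a -> b, pseudo-first-order
         (1, 0): k_transfer * free_dna_a}   # b -> a, pseudo-first-order
K = kinetic_matrix(rates, 2)
m_final = propagate(K, relax=[0.0, 0.0], m0=[1.0, 0.0], t_total=0.2)
print(m_final)  # approaches the equilibrium populations of the two states
```

Because the free DNA concentrations are constant at equilibrium, multiplying them into the second-order rate constant gives fixed matrix elements, which is what allows a linear kinetic matrix to be used. With nonzero entries in `relax`, the same routine describes exchange combined with relaxation.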
Fig. 5.
Discrete-state stochastic kinetic models can be incorporated into the McConnell equations for NMR investigations of protein translocation on DNA. (A) NMR spectra recorded for two nonspecific DNA complexes of the HoxD9 homeodomain and for a mixture of the two complexes. From NMR relaxation data for these samples, kinetic rate constants for protein translocation on DNA can be determined. (B) Kinetic matrices for protein translocation on DNA. NMR experiments are conducted using solutions at chemical equilibrium, so the macroscopic concentrations of chemical species do not change. Due to the constant concentrations of involved species in the NMR experiments, even second-order processes can be treated with a kinetic matrix for the McConnell equations. Adapted from Sahu et al. [99] with permission from the American Chemical Society.
Dissecting impacts of mutations on search kinetics
The discrete-state stochastic kinetic models can also facilitate in-depth investigations into how protein mutations impact the target DNA search because these models allow for the determination of various parameters for the underlying processes (Tables 1 and 2). Depending on mutations, different molecular properties may be impacted. Some mutations cause an increase in the diffusion coefficient D1 for sliding on DNA [68,105]. Such an increase in D1 (or k) does not necessarily cause a longer sliding length λ because dissociation could also be faster when the energy barrier for sliding is lower. Some mutations in the Egr-1 zinc-finger protein were found to modulate intersegment transfer [68]. Mutations can also shift the conformational equilibrium that governs the kinetic and thermodynamic properties of protein molecules during the target search process, as discussed for p53, TUS, and λ repressor proteins [106].
Limitations of discrete-state stochastic kinetic models
Although discrete-state stochastic kinetic models are useful for experimental investigations of target DNA search processes, experimentalists should be aware of some limitations of these models. The model shown in Fig. 1A assumes identical properties for each nonspecific site. This assumption simplifies the experimental analysis. However, the actual properties of individual sites may differ to some extent, depending on DNA sequence. If a DNA segment used in an experiment is supposed to be nonspecific but actually contains some sites whose affinities are substantially higher than others, these high-affinity sites will affect the search kinetics [48,107,108]. Another assumption for typical discrete-state stochastic kinetic models is that once a protein molecule dissociates from DNA, all sites have an equal chance of association with the protein. It can be argued that this follows from the fact that 3D diffusion is much faster than 1D diffusion, so that the protein can reach all sites on DNA via bulk diffusion. In reality, proximal sites should have a higher chance [109,110]. The assumption of equal probability of re-association may be valid only for dilute systems and may not be applicable to systems with high DNA density. Experiments should be designed so that the assumptions of the employed model are sufficiently valid. Nonetheless, it should also be noted that an advantage of discrete-state stochastic models is that they are flexible and can take these effects into account. For example, sequence heterogeneity [52], conformational switching [45,49], the presence of other proteins [54], and DNA looping [44,51,53] have already been incorporated into discrete-state stochastic kinetic models in previous theoretical studies. In principle, these modified models can be used in experimental analysis, though such applications remain to be examined.
Conclusions
Mainly from the perspective of experimental applications, we have provided an overview of discrete-state stochastic kinetic models for target DNA search by proteins. These simple theoretical models provide explicit relations between the underlying processes and the macroscopic kinetics of the target DNA search. Incorporating these models into the analyses of experimental data is relatively straightforward and can greatly facilitate various experimental studies on the target DNA search process. The analytical expressions of the search kinetics for these models allow for fitting calculations to determine various parameters such as the sliding length, the 1D diffusion coefficient for sliding, and the kinetic rate constants for microscopic processes from experimental data. Most importantly, this theoretical method significantly clarifies the details of molecular mechanisms for protein-DNA interactions. We hope that this review will encourage researchers to take advantage of these models for their experimental or theoretical studies on target DNA search processes.