Jaejun Choi1,2, Ryeonghyeon Kim1,2, Junseock Koh1. 1. School of Biological Sciences, Seoul National University, Seoul 08826, Korea. 2. These authors contributed equally to this work.
Abstract
Multivalent macromolecular interactions underlie the dynamic regulation of diverse biological processes in ever-changing cellular states. These interactions often involve binding of multiple proteins to a linear lattice, such as an intrinsically disordered protein or the chromosomal DNA, that carries many repeating recognition motifs. Quantitative understanding of such multivalent interactions on a linear lattice is crucial for exploring their unique regulatory potential in cellular processes. In this review, we first discuss the distinctive molecular features of the linear lattice system, with a particular focus on the overlapping nature of potential protein binding sites within a lattice. We then introduce two general quantitative frameworks, the combinatorial and conditional probability models, which deal with the overlap problem and relate the binding parameters to the experimentally measurable properties of linear lattice-protein interactions. Finally, we present two specific examples in which the quantitative models have been applied and further extended to provide biological insights into specific cellular processes. In the first case, the conditional probability model was extended to highlight the significant impact of nonspecific binding of transcription factors to the chromosomal DNA on gene-specific transcriptional activities. The second case presents recently developed combinatorial models that unravel the complex organization of target protein binding sites within an intrinsically disordered region (IDR) of a nucleoporin. In particular, these models have suggested a unique function of IDRs as a molecular switch coupling distinct cellular processes. The quantitative models reviewed here are expected to be advanced further for the dissection and functional study of more complex systems, including phase-separated biomolecular condensates.
Keywords:
biological linear lattice; combinatorial model; conditional probability model; multivalent binding; overlapping binding site
Recent advances in cutting-edge biotechnologies have provided opportunities to observe unprecedented molecular details of various biological processes (Ha et al., 2022; Mahamid et al., 2016; Oikonomou and Jensen, 2017; Sigal et al., 2018). Interpretation of such observations requires quantitative models dissecting the underlying macromolecular interactions. In turn, the quantitative information allows further understanding and prediction of spatiotemporal regulation of specific cellular processes in dynamically changing environments. The complexity of macromolecular interactions ranges from simple 1:1 binding to formation of phase-separated condensates with multivalent binding among two or more components (Banani et al., 2017; Lyon et al., 2021; Shin and Brangwynne, 2017). In contrast to 1:1 binding, multivalent interactions are difficult to describe with the simple mass action law and are instead modeled with more sophisticated frameworks that account for the presence of various molecular states (Bujalowski, 2006; Freire et al., 2009; Wyman and Gill, 1990). Furthermore, the quantitative models are often formulated with large numbers of parameters, and exemplary cases in which these parameters have been determined with suitable in vitro model systems and methods are exceedingly rare.
A linear or one-dimensional lattice is a relatively tractable multivalent system found in numerous cellular processes. Linear lattices present multiple binding motifs or domains to interact with diverse proteins or multiple copies of identical proteins (Fig. 1) (Cortese et al., 2008; Dunker et al., 2005; Fung et al., 2018). For instance, in many signaling pathways, scaffold proteins such as axin, BRCA1, and Ste5 recruit various target proteins via specific binding sites (Choi et al., 1994; Mark et al., 2005; Wodarz and Nusse, 1998). 
These scaffold-driven higher-order assemblies are predicted to colocalize and increase the local concentrations of the target proteins and thereby facilitate their interactions for efficient integration and propagation of diverse signals in the cell (Fig. 1A) (Noutsou et al., 2011; Xue et al., 2013). Another example is the intrinsically disordered regions (IDRs) of some nucleoporins (Nups) present in the nuclear pore complex (NPC) (Fig. 1B) (Frey and Gorlich, 2007; Radu et al., 1995). The Nup IDRs mediate massive yet selective molecular transport between the nucleus and cytoplasm through specific interactions with karyopherin (Kap) proteins carrying macromolecular cargos (Koh and Blobel, 2015; Schoch et al., 2012). These interactions are achieved by multiple interspersed phenylalanine-glycine (FG) motifs on an IDR capturing several Kap molecules (Bayliss et al., 2000).
Fig. 1
Schematic illustration of the representative linear lattice systems in cellular processes.
(A) Scaffold proteins recruiting diverse binding partners in signal transduction. (B) IDRs in the NPC binding multiple Kaps in nucleocytoplasmic transport. (C) Nonspecific sites on the chromosomal DNA for transcription factor binding.
Finally, nucleic acids are the most prominent linear lattice systems in the cell. In particular, the chromosomal DNA presents an enormous number of repeating phosphate groups along its backbone, creating electrostatic potentials for nonspecific protein-DNA interactions (Fig. 1C) (Berg et al., 1981; Stracy et al., 2021). This polyelectrolyte effect is a major driving force (Lohman et al., 1980; Record et al., 1976), particularly at low salt concentrations, for formation of nucleosomes (Shrader and Crothers, 1989; Widom, 1999) as well as for binding of chromatin architectural proteins such as HMG (high mobility group)-box proteins with little specificity for DNA base sequences (Dragan et al., 2004). Even specific DNA binding proteins typically engage their cationic amino acid side chains to neutralize DNA phosphate charges (Jen-Jacobson et al., 2000; Privalov et al., 2011). Thus, these proteins are expected to interact with nonspecific sites, which are present in overwhelming excess over specific sites in the chromosomal context. In addition, as the copy numbers of many transcription factors (TFs) are considered greater than those of their corresponding specific binding sites on DNA, the majority of these factors may exist in vivo in nonspecifically bound states (Bintu et al., 2005; Kao-Huang et al., 1977). The physiological impact of the nonspecific protein-DNA interaction is substantial, as demonstrated in the classical study by the von Hippel group (von Hippel et al., 1974) as well as in the recent seminal work by the Phillips group (Brewster et al., 2014). Both groups used Escherichia coli lac repressor as a model system to investigate the interplay among the copy numbers of TFs and their binding sites on DNA, the specificity ratio, and the inducer binding affinity in bacterial gene expression. 
The quantitative models proposed in these studies accurately described and predicted the expression profiles of the genes under the repressor regulation by incorporating nonspecific protein-DNA interactions as a “sink” for RNA polymerase and lac repressor.
Taken together, numerous protein-protein and protein-nucleic acid interactions can be perceived as multivalent interactions mediated by linear lattices. Thus, quantitative models for linear lattice systems are indispensable in understanding a broad range of biological processes and may be further extended to dissect more complex systems including phase-separated biomolecular condensates. In this review, we go over two general mathematical frameworks, the combinatorial and conditional probability models, for quantitative description of linear lattices. Prior to the detailed derivation of these models, the molecular features of multivalent interactions on a linear lattice are discussed qualitatively in light of how they differ fundamentally from 1:1 binding or discrete-site systems. The derivation is supplemented in Supplementary Information with detailed mathematical procedures that are omitted from, but not immediately evident in, the original articles. In the end, a couple of practical examples are discussed in which the models have been further extended and applied to highlight their physiological significance. For the alternative methods of sequence generating functions and transfer matrices, which handle multiple binding modes, heterogeneous lattices, and lattice conformational changes, readers are referred to the original and case studies (Bujalowski et al., 1989; Lifson, 1964; Schellman, 1974; Teif, 2007).
MOLECULAR FEATURES OF MULTIVALENT INTERACTIONS ON LINEAR LATTICES
It is straightforward to derive quantitative models for linear lattices that use discrete regions or domains to bind multiple distinct target proteins with an interaction stoichiometry of 1:1 for each target. In the absence of cooperativity among bound targets, the binding of each target can be handled independently of the other targets by the simple mass action law, which yields a quadratic equation in the total concentrations of the lattice and the corresponding target. An advanced model has been derived by constructing a partition function for a linear lattice with cooperativities among bound targets (Cho et al., 2021).
Complexity arises when a target protein occupies two or more binding motifs on a linear lattice. We consider a linear lattice with a total of M motifs and a target protein occluding n consecutive motifs (Fig. 2) (Epstein, 1978; McGhee and von Hippel, 1974). The binding motif can be any repeating unit, including a base-pair or phosphate on DNA and a short peptide motif or a PTM (post-translational modification) moiety on an intrinsically disordered protein (IDP). As DNA and proteins have particular directions in denoting their motifs (5’ to 3’ end or N to C-terminus), target proteins are assumed to be polar as well in recognizing the motifs. It is further assumed that there is no partial binding in which a target protein occludes fewer than n motifs. Then, the target binding stoichiometry (N) is the greatest integer less than or equal to M/n (N = [M/n]). A fundamental nature of the linear lattice system becomes evident when a target protein binds to a naked lattice (left panel in Fig. 2A). Because the target protein occupies n consecutive motifs, any motif except the rightmost n – 1 positions can be a starting point for target binding. Thus, potential target binding sites overlap, and the number of such overlapping sites equals M – n + 1, which exceeds the stoichiometry [M/n] whenever n ≥ 2 and M > n. In contrast, for a conventional system in which a target protein binds discrete and isolated sites (right panel in Fig. 2A), the number of binding sites is simply equal to the stoichiometry N = [M/n].
Fig. 2
Molecular features of multivalent interactions on a linear lattice (M = 9) where a protein occupies any n (= 3) consecutive motifs.
(A) The number of potential overlapping binding sites on a naked lattice (left panel) is greater as compared to a discrete-site system (right panel) with the same stoichiometry (N). (B) The number of potential binding sites eliminated upon binding of a protein depends on where the protein occupies on the lattice (left panel). In contrast, binding of a protein to the discrete-site system invariably eliminates only one potential binding site (right panel). (C) Possible configurations of the linear lattice with two proteins bound (left panel). Many configurations are futile for the last protein binding, resulting in apparent negative cooperativity among bound proteins. In contrast, all corresponding configurations in the discrete-site system are competent for binding (right panel).
As the linear lattice subsequently binds more target proteins, its overlapping nature generates additional features that deviate further from the discrete-site system. The number of potential binding sites eliminated upon binding of a protein depends on where the protein binds on the lattice. When a protein binds to a gap exactly n motifs long between prebound proteins on the lattice, only one potential site is removed. In contrast, if a gap is at least 3n – 2 motifs long, binding of a protein to this region can eliminate as many as 2n – 1 sites. For instance, binding of a protein with the site size of n = 3 to the three leftmost motifs on a linear lattice with a total of nine motifs eliminates three potential binding sites (the second figure in the left panel of Fig. 2B). Alternatively, if the protein occupies the three motifs at the center of the lattice, five potential binding sites are eliminated (the third figure in the left panel of Fig. 2B). However, in the discrete-site system, protein binding invariably eliminates only one potential binding site (right panel in Fig. 2B). Finally, it is difficult to completely saturate the linear lattice because the overlapping protein binding increasingly accumulates gaps of fewer than n motifs that are futile for binding. This point is explicitly illustrated in Fig. 2C (left panel), which lists all possible configurations of the linear lattice with [M/n] – 1 proteins bound. Among them, many are futile configurations in which the n free (unoccupied) motifs are scattered over the lattice, so that the bound proteins must rearrange to create a site with n consecutive motifs for the last protein to bind. Such a rearrangement, or reduction in the number of lattice configurations, corresponds to a loss of mixing entropy, culminating in apparent negative cooperativity among bound proteins. In contrast, the number of available binding sites is independent of the configuration of bound proteins in the discrete-site system (right panel in Fig. 2C). In summary, because of the overlapping nature of multivalent linear lattice-target interactions, a linear lattice initially presents more binding sites than the stoichiometry and thereby enhances protein binding as compared to a discrete-site system. However, as the density of bound proteins increases, the effect of the overlapping binding is reversed, attenuating saturation of the linear lattice.
The following sections review the quantitative models that address the overlap problem of the linear lattice and yield mathematical formulations relating the binding parameters to experimentally measurable properties of the lattice-target interactions. A core element of each model is the computation of the number of possible configurations for a given density of bound proteins on a lattice.
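The counting arguments above can be checked by brute force on the small lattice of Fig. 2 (M = 9, n = 3). The following is a minimal Python sketch; the function names are illustrative and do not come from the cited studies.

```python
from itertools import combinations

def count_sites(gap, n):
    """Overlapping binding sites available in a stretch of `gap` free motifs."""
    return max(gap - n + 1, 0)

def sites_eliminated(M, n, start):
    """Sites lost when one protein binds motifs start..start+n-1 of a naked lattice."""
    left, right = start - 1, M - (start + n - 1)
    return count_sites(M, n) - count_sites(left, n) - count_sites(right, n)

def configurations(M, n, i):
    """All placements of i non-overlapping proteins, as tuples of start positions."""
    starts = range(1, M - n + 2)
    return [c for c in combinations(starts, i)
            if all(b - a >= n for a, b in zip(c, c[1:]))]

def remaining_sites(M, n, config):
    """Binding sites still open to one more protein in a given configuration."""
    total, prev_end = 0, 0
    for s in config:
        total += count_sites(s - 1 - prev_end, n)  # gap to the left of this protein
        prev_end = s + n - 1
    return total + count_sites(M - prev_end, n)    # gap after the last protein

M, n = 9, 3
print(count_sites(M, n), M // n)                   # overlapping sites vs stoichiometry
print([sites_eliminated(M, n, s) for s in range(1, M - n + 2)])
two_bound = configurations(M, n, 2)
competent = [c for c in two_bound if remaining_sites(M, n, c) > 0]
print(len(two_bound), len(competent))
```

For this lattice the sketch reports 7 overlapping sites against a stoichiometry of 3, between 3 and 5 sites eliminated depending on where the first protein binds, and only 3 of the 10 two-protein configurations that can still accept a third protein.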
QUANTITATIVE FRAMEWORK FOR LINEAR LATTICE-PROTEIN INTERACTIONS: COMBINATORIAL MODEL
A complete set of parameters for description of linear lattice-protein interactions consists of the binding stoichiometry (N), binding constant (K), and cooperativity (ω) among bound proteins. As discussed above, the binding stoichiometry (N) is determined by the number of all motifs on a lattice (M) and the number occupied by a target protein (n, termed site size) (N = [M/n]). The binding constant (K) corresponds to the affinity between a protein and a site n motifs long. Cooperativity can arise from pairwise interactions between any two proteins bound to a linear lattice. Although there are in principle $_{i}C_{2}$ = i(i – 1)/2 pairs on a lattice with i (≥ 2) proteins bound, the models discussed in this review formulate cooperativity only for the interaction between nearest neighbors (i.e., a pair of contiguously bound proteins without any intervening free motifs). Thus, the cooperativity parameter (ω) is equivalent to an equilibrium constant for formation of a direct “contact point” between a pair of bound proteins. Under these definitions, a linear lattice presents three distinct types of protein binding sites (Fig. 3A): 1) an isolated site with the binding constant K; 2) a singly contiguous site with the binding constant Kω; 3) a doubly contiguous site with the binding constant Kω². If ω > 1 (or 0 < ω < 1), the nearest neighbor interaction is favorable (or unfavorable) and the protein binding is positively (or negatively) cooperative. For ω = 1, bound proteins are independent of each other and the binding is noncooperative.
Fig. 3
Calculation of the number of distinct lattice configurations with i proteins bound and j contact points.
(A) Three distinct types of protein binding sites on a linear lattice and the definitions of K and ω. (B) Dissection of a linear lattice into two distinct physical elements, runs and unattached free motifs. The i – j – 1 leftmost runs are attached at their righthand end with a free motif (termed attached free motif). (C) Creation of the distinct lattice configurations by combining the two elements.
A fundamental relationship between the binding parameters and experimental variables can be derived by constructing a partition function for a linear lattice (Freire et al., 2009; Wyman and Gill, 1990). The partition function is the sum of the relative probabilities, or statistical weights, of all possible protein-bound states of a linear lattice, with the free lattice assigned as the reference state of unit relative probability (i.e., statistical weight = 1). The statistical weight of a lattice with i proteins bound and j contact points among them is then given by $(K[P])^{i}\omega^{j}$, where [P] is the free protein concentration. However, in order to account for the presence of multiple configurations for a given set of (i, j), the statistical weight must be multiplied by the degeneracy term P(i, j), the number of distinct ways to distribute i proteins with j contact points on a lattice with M motifs. The partition function (Z) is then given by the following equation:
$$Z = 1 + \sum_{i=1}^{N}\sum_{j=0}^{i-1} P(i,j)\,(K[P])^{i}\,\omega^{j} \tag{1}$$
The average number of proteins bound per lattice (or binding density, ν), which is the principal quantity measured in all binding experiments, can be formulated from the partition function:
$$\nu = \frac{\partial \ln Z}{\partial \ln [P]} = \frac{1}{Z}\sum_{i=1}^{N}\sum_{j=0}^{i-1} i\,P(i,j)\,(K[P])^{i}\,\omega^{j} \tag{2}$$
Likewise, the average number of contact points per lattice can be calculated from a partial derivative of the partition function:
$$\langle j \rangle = \frac{\partial \ln Z}{\partial \ln \omega} = \frac{1}{Z}\sum_{i=1}^{N}\sum_{j=0}^{i-1} j\,P(i,j)\,(K[P])^{i}\,\omega^{j} \tag{3}$$
The final task in constructing the partition function is to derive the expression for P(i, j). Here we follow the original combinatorial derivation of P(i, j) (Epstein, 1978), highlighting the concept behind the mathematical procedures. A linear lattice with i proteins bound and j contact points may be dissected into two physical elements. The first element is a “run”, defined as a distinct cluster of contiguously bound proteins, and the number of runs equals i – j (Fig. 3B). Because there is at least one free motif between runs, each of the i – j – 1 leftmost runs must be attached with a free motif on its right side. The second element is the remaining free motifs, of which there are M – ni – (i – j – 1) unattached free motifs (≡ $N_f$). Then, the number (≡ $N_1$) of ways of mixing these two elements to create the distinct lattice configurations equals the number of ways of distributing i – j runs (accompanied by the i – j – 1 attached free motifs) and $N_f$ unattached free motifs into $N_f$ + i – j slots (Fig. 3C):
$$N_1 = \binom{N_f + i - j}{i - j} = \frac{(M - ni + 1)!}{(i - j)!\,(M - ni - i + j + 1)!} \tag{4}$$
In this expression, all runs have been treated as identical elements, regardless of the actual number of bound proteins in each run. Therefore, in order to complete the derivation of P(i, j), $N_1$ must be multiplied by the number (≡ $N_2$) of distinct ways to distribute i proteins into i – j runs:
$$N_2 = \binom{i - 1}{i - j - 1} = \frac{(i - 1)!}{(i - j - 1)!\,j!} \tag{5}$$
$N_2$ is mathematically equivalent to the number of ordered partitions (compositions) of the integer i into i – j positive integers. Finally, P(i, j) is derived as the following equation:
$$P(i,j) = N_1 N_2 = \frac{(M - ni + 1)!}{(i - j)!\,(M - ni - i + j + 1)!}\cdot\frac{(i - 1)!}{(i - j - 1)!\,j!} \tag{6}$$
For noncooperative binding (ω = 1), the number of contact points j becomes irrelevant and P(i, j) reduces to P(i), the number of ways of mixing i proteins and M – ni free motifs to build distinct lattice configurations:
$$P(i) = \sum_{j} P(i,j) = \binom{M - ni + i}{i} = \frac{(M - ni + i)!}{(M - ni)!\,i!} \tag{7}$$
Then, the partition function for noncooperative binding can be written in a simplified form:
$$Z = \sum_{i=0}^{N} P(i)\,(K[P])^{i} \tag{8}$$
In practice, the total lattice and protein concentrations ([L]tot and [P]tot), rather than the free protein concentration ([P]), are the known experimental variables. The total concentrations are related to each other and to the binding parameters through a simple mass balance equation:
$$[P]_{tot} = [P] + \nu\,[L]_{tot} \tag{9}$$
For a given set of binding parameters and reactant concentrations, this mass balance equation can be solved for [P] by numerical procedures such as the Newton-Raphson and bisection methods (Hamming, 1986). In turn, this solution allows calculation of the relative probabilities of all lattice configurations and of the ensemble-averaged quantities including Eqs. 2 and 3. 
Thus, the combinatorial method is straightforward and intuitive in constructing a partition function which illustrates distribution among various protein-bound states of a linear lattice as a function of lattice and protein concentrations. However, this method is difficult to apply to a very long linear lattice (i.e., M >> n) because the number of possible lattice configurations may be too large and potentially cause an overflow problem in computation.
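The workflow described in this section (build the degeneracy term, sum the statistical weights, and solve the mass balance for the free protein concentration) can be sketched in Python as follows. This is only an illustration under stated assumptions: P(i, j) is taken in the binomial form that follows from the run and free-motif counting argument of Epstein (1978), the bisection solver is one of the two numerical options named in the text, and all function names and parameter values are hypothetical.

```python
from math import comb

def degeneracy(M, n, i, j):
    """P(i, j): configurations with i bound proteins and j contact points.
    Assumed form: C(M - n*i + 1, i - j) * C(i - 1, j)."""
    if i == 0:
        return 1 if j == 0 else 0
    if j < 0 or j > i - 1:
        return 0
    return comb(M - n * i + 1, i - j) * comb(i - 1, j)

def weights(M, n, K, omega, p_free):
    """Statistical weights P(i, j) (K[P])^i omega^j for every state (i, j)."""
    x = K * p_free
    return {(i, j): degeneracy(M, n, i, j) * x**i * omega**j
            for i in range(M // n + 1) for j in range(max(i, 1))}

def binding_density(M, n, K, omega, p_free):
    """nu: average number of proteins bound per lattice."""
    w = weights(M, n, K, omega, p_free)
    return sum(i * v for (i, _), v in w.items()) / sum(w.values())

def solve_free_protein(M, n, K, omega, l_tot, p_tot, iterations=200):
    """Bisection on [P]tot = [P] + nu * [L]tot, which is monotonic in [P]."""
    lo, hi = 0.0, p_tot
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid + binding_density(M, n, K, omega, mid) * l_tot > p_tot:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical example: M = 9 motifs, n = 3, K = 1e6 M^-1, noncooperative binding
p_free = solve_free_protein(9, 3, 1e6, 1.0, l_tot=1e-6, p_tot=2e-6)
print(p_free, binding_density(9, 3, 1e6, 1.0, p_free))
```

Python integers have arbitrary precision, so the binomial coefficients themselves do not overflow here, but the floating-point weights can still become very large for long lattices, in line with the limitation noted above.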
QUANTITATIVE FRAMEWORK FOR LINEAR LATTICE-PROTEIN INTERACTIONS: CONDITIONAL PROBABILITY MODEL
Several quantitative frameworks have been proposed to treat an “infinitely” long linear lattice (M >> n), which is particularly relevant for proteins binding nonspecifically to the chromosomal DNA. Among these frameworks, we review the conditional probability model originally presented in the seminal work by McGhee and von Hippel (1974). In this model, conditional probabilities are formulated for the particular states (free or bound) of two consecutive motifs on a linear lattice. For instance, the conditional probability (ff) (or (fb₁)) is defined as, given a randomly chosen free motif, the probability of the subsequent righthand side motif being free (or bound by the left end of a protein) (Supplementary Fig. S1). In addition, the conditional probability (bₙf) (or (bₙb₁)) is defined as, given a motif bound by the right end of a protein, the probability of the subsequent motif being free (or bound by the left end of a protein) (Supplementary Fig. S1). The conditional probabilities were then used to derive an expression for the average number of free binding sites per lattice at a given binding density. This elegant approach yielded a modified form of the Scatchard equation:
$$\frac{\theta}{[P]} = K(1-n\theta)(ff)^{n+1} + 2K\omega(1-n\theta)(ff)^{n}(fb_1) + K\omega^{2}(1-n\theta)(ff)^{n-1}(fb_1)^{2} \tag{10a}$$
$$\frac{\theta}{[P]} = K(1-n\theta)\left[\frac{(2\omega-1)(1-n\theta)+\theta-R}{2(\omega-1)(1-n\theta)}\right]^{n-1}\left[\frac{1-(n+1)\theta+R}{2(1-n\theta)}\right]^{2},\quad R=\sqrt{\left[1-(n+1)\theta\right]^{2}+4\omega\theta(1-n\theta)} \tag{10b}$$
where θ corresponds to the average number of proteins bound per motif (i.e., θ = ν/M). The first bracketed term in Eq. 10b equals (ff), and the second equals (ff) + ω(fb₁).
Referring to Supplementary Information for the detailed mathematical procedures of the derivation, we focus on a few intuitive limiting cases that lead to interpretations of this equation consistent with the molecular features of the linear lattice system (McGhee and von Hippel, 1974).
1) In the case of ω = 1 (noncooperative binding), by using L’Hospital’s rule, the equation can be reduced to the following (see Supplementary Information for detailed mathematical procedures):
$$\frac{\theta}{[P]} = K(1-n\theta)\left[\frac{1-n\theta}{1-(n-1)\theta}\right]^{n-1} \tag{11}$$
Note that, for n = 1 (no overlap between bound proteins), the equation further reduces to the original Scatchard equation θ/[P] = K(1 – θ), in which the term (1 – θ) simply represents the fraction of free motifs. Because the square-bracketed term in Eq. 11 is always less than unity for n ≥ 2, the fraction of free motifs competent for binding is smaller than the total fraction of free motifs, 1 – nθ. Therefore, this result quantitatively supports that, even without genuine interactions among bound proteins (i.e., ω = 1), apparent negative cooperativity arises from the overlap among potential binding sites and the consequent futile gaps shorter than n motifs.
2) In the case of ω = 0 (infinite negative cooperativity), Eq. 10b reduces to the following expression:
$$\frac{\theta}{[P]} = K\left[1-(n+1)\theta\right]\left[\frac{1-(n+1)\theta}{1-n\theta}\right]^{n} \tag{12}$$
This reduced form simply corresponds to Eq. 11 with n replaced by n + 1. The increased binding site size demonstrates that, if the interaction between bound proteins is extremely unfavorable, there is effectively no contact point between any adjacently bound proteins. Instead, they are separated by a persistent free motif. This result clearly demonstrates the fundamental relationship between binding site size and cooperativity.
3) Further insight can be provided at the molecular level from the partial derivatives of Eqs. 10b and 11 with respect to θ at the limiting condition of θ → 0 (see Supplementary Information for detailed mathematical procedures):
$$\lim_{\theta\to 0}\frac{\partial(\theta/[P])}{\partial\theta} = -(2n+1)K + 2K\omega \tag{13}$$
Based on Eq. 10a, the partial derivative can be interpreted as a net change in the average numbers of all three types (Fig. 3A) of binding sites, weighted by their corresponding binding constants, upon binding of one protein to a naked (ν = 0) lattice. As illustrated in Fig. 2B, the binding of a protein to a sufficiently long region eliminates a total of 2n – 1 potential binding sites. In addition, the binding converts the two adjacent isolated binding sites into two singly contiguous binding sites (2∙Kω). Hence, a total of (2n – 1) + 2 isolated binding sites has been eliminated (– (2n + 1)∙K). Likewise, the partial derivative of Eq. 11 at θ → 0 is given by:
$$\lim_{\theta\to 0}\frac{\partial(\theta/[P])}{\partial\theta} = -(2n-1)K \tag{14}$$
Therefore, in the noncooperative case, the binding of one protein to a naked lattice simply eliminates 2n – 1 potential binding sites.
Taken together, although the conditional probability method is based on a different conceptual framework from the combinatorial approach, the final formulation provides intuitive interpretations fully consistent with the molecular features of the linear lattice systems. In practice, Eq. 10b is rearranged and incorporated into a mass balance equation relating the binding parameters to the total concentrations of lattice motif and protein ([M]tot and [P]tot):
$$[P]_{tot} = \frac{\theta}{K(1-n\theta)\,(ff)^{n-1}\left[(ff)+\omega(fb_1)\right]^{2}} + \theta\,[M]_{tot} \tag{15e}$$
Eq. 15e can be numerically solved for θ at given values of [M]tot and [P]tot. When interactions of proteins with short linear lattices (e.g., DNA oligomers) are analyzed, the equation can be partially corrected for the assumption of infinite lattice length by applying an “end effect” constant, (M – n + 1) / M, to the term (ff) (Tsodikov et al., 2001).
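The limiting cases discussed in this section can be verified numerically. The Python sketch below assumes the standard closed form of the McGhee and von Hippel (1974) isotherm, written here with (ff) as the first bracketed factor and (ff) + ω(fb₁) as the second; the function names and the parameter values in the example are hypothetical, and the short-lattice end-effect correction is not included.

```python
from math import sqrt, isclose

def ff_probability(theta, n, omega):
    """(ff): probability that the motif to the right of a free motif is also free."""
    free = 1.0 - n * theta
    if isclose(omega, 1.0):
        return free / (1.0 - (n - 1) * theta)      # L'Hospital limit for omega = 1
    r = sqrt((1.0 - (n + 1) * theta) ** 2 + 4.0 * omega * theta * free)
    return ((2.0 * omega - 1.0) * free + theta - r) / (2.0 * (omega - 1.0) * free)

def theta_over_free(theta, K, n, omega):
    """Right-hand side of the modified Scatchard equation, theta / [P]."""
    ff = ff_probability(theta, n, omega)
    fb1 = 1.0 - ff
    return K * (1.0 - n * theta) * ff ** (n - 1) * (ff + omega * fb1) ** 2

def free_protein(theta, K, n, omega):
    """[P] at a given theta; infinite once no binding-competent site remains."""
    rhs = theta_over_free(theta, K, n, omega)
    return float("inf") if rhs <= 0.0 else theta / rhs

def solve_theta(K, n, omega, m_tot, p_tot, iterations=200):
    """Bisection on [P]tot = [P](theta) + theta * [M]tot over 0 < theta < 1/n."""
    lo, hi = 0.0, 1.0 / n
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if free_protein(mid, K, n, omega) + mid * m_tot > p_tot:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical example: K = 1e5 M^-1, n = 10, omega = 1, 5 mM motifs, 1 uM protein
theta = solve_theta(1e5, 10, 1.0, m_tot=5e-3, p_tot=1e-6)
print(theta, free_protein(theta, 1e5, 10, 1.0))
```

Setting ω = 0 in `theta_over_free` reproduces the noncooperative expression with the site size increased by one, and a finite-difference slope at θ → 0 approaches K(2ω – 2n – 1), matching the interpretation given for the cooperative and noncooperative cases.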
APPLICATION AND EXTENSION OF THE QUANTITATIVE MODELS
Competition among multiple binding modes in protein-nucleic acid interactions
Spatiotemporal regulation of transcription is achieved by interactions between TFs and their specific binding sites on DNA. Because of the enormous number of nonspecific sites on the chromosomal DNA, binding of TFs to these regions must be taken into account to accurately predict the occupancy of the specific sites and thereby the transcription profiles of the corresponding genes (Brewster et al., 2014; von Hippel et al., 1974). In order to recapitulate the essential features of the competition between specific and nonspecific DNA binding, the conditional probability model was extended and applied to a hypothetical two-component system (TF and infinitely long DNA with a few embedded specific sites). While the 1:1 interaction between TF and a specific site is fully described by the binding constant Ksp, the nonspecific binding is characterized by the binding site size n (in base-pairs), the binding constant Kns, and the cooperativity parameter ω. Then, combining Eq. 10b with the mass-action law for the 1:1 specific binding, the TF concentrations of the free, specifically bound, and nonspecifically bound forms ([TF], [TF]sp,b, and [TF]ns,b) can be derived as the following equations:
$$[TF] = \frac{\theta}{K_{ns}(1-n\theta)\,(ff)^{n-1}\left[(ff)+\omega(fb_1)\right]^{2}} \tag{16a}$$
$$[TF]_{sp,b} = \frac{K_{sp}[TF]\,[D_{sp}]_{tot}}{1+K_{sp}[TF]} \tag{16b}$$
$$[TF]_{ns,b} = \theta\,[M]_{tot} \tag{16c}$$
where [Dsp]tot and [M]tot are the total concentrations of the specific site and the nonspecific binding motif (base-pair), respectively. Substituting Eq. 16a for [TF] in Eq. 16b, the mass balance equation for the total TF concentration ([TF]tot = [TF] + [TF]sp,b + [TF]ns,b) can be numerically solved for θ. The final outcome of the calculation is the fractional occupancy of the specific site (Ysp = [TF]sp,b/[Dsp]tot) as a function of the total concentration ratio between TF and the specific site ([TF]tot/[Dsp]tot ranging from 0 to 10) (upper panels in Figs. 4A and 4B). In the calculation, the ratio Ksp/Kns (termed specificity ratio) (Fig. 4A) or the total nonspecific motif concentration (Fig. 4B) was varied over orders of magnitude while the nonspecific binding site size and cooperativity were fixed at constant values for simplicity (n = 10, ω = 1).
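The calculation outlined above can be sketched in Python for the noncooperative case (ω = 1) used in the simulations. The binding constants, site size, and motif concentration follow the values quoted in the legend of Fig. 4; the specific-site concentration of 1 nM, the bisection solver, and all function names are assumptions made only for illustration.

```python
def free_tf(theta, k_ns, n):
    """Free TF concentration from the noncooperative lattice isotherm (omega = 1)."""
    free = 1.0 - n * theta
    if free <= 0.0:
        return float("inf")                      # lattice saturated
    return theta / (k_ns * free * (free / (1.0 - (n - 1) * theta)) ** (n - 1))

def specifically_bound(tf, k_sp, d_sp_tot):
    """1:1 mass-action binding of TF to the specific site."""
    return k_sp * tf * d_sp_tot / (1.0 + k_sp * tf)

def tf_distribution(tf_tot, d_sp_tot, m_tot, k_sp, k_ns, n, iterations=200):
    """Solve [TF]tot = [TF] + [TF]sp,b + [TF]ns,b for theta by bisection."""
    lo, hi = 0.0, 1.0 / n
    for _ in range(iterations):
        theta = 0.5 * (lo + hi)
        tf = free_tf(theta, k_ns, n)
        if tf + specifically_bound(tf, k_sp, d_sp_tot) + theta * m_tot > tf_tot:
            hi = theta
        else:
            lo = theta
    theta = 0.5 * (lo + hi)
    tf = free_tf(theta, k_ns, n)
    sp = specifically_bound(tf, k_sp, d_sp_tot)
    return {"Y_sp": sp / d_sp_tot,               # occupancy of the specific site
            "specific": sp / tf_tot,             # fraction of TF specifically bound
            "nonspecific": theta * m_tot / tf_tot,
            "free": tf / tf_tot}

D_SP_TOT = 1e-9                                  # assumed specific-site concentration (M)
for ratio in (0.5, 1.0, 2.0, 5.0, 10.0):
    result = tf_distribution(ratio * D_SP_TOT, D_SP_TOT, m_tot=5e-3,
                             k_sp=1e12, k_ns=1e5, n=10)
    print(ratio, result)
```

Rerunning the loop with a smaller `k_ns` (higher specificity ratio) or a smaller `m_tot` shifts the distribution toward the specifically bound state at low [TF]tot/[Dsp]tot.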
Fig. 4
Application and extension of the quantitative models for linear lattice systems.
(A and B) Effects of nonspecific protein-DNA interactions on transcription. Upper panels: Using an extended conditional probability model (Eq. 16), the fractional occupancy of specific DNA sites (Ysp = [TF]sp,b/[Dsp]tot) for binding of a hypothetical TF was calculated as a function of the molar ratio [TF]tot/[Dsp]tot for various sets of interaction parameters. Bottom panels: The corresponding fractional distributions of TF between specifically (solid curves) and nonspecifically (dashed curves) bound states were calculated. In these calculations, the value of Kns (A) or the concentration of nonspecific motifs ([M]tot) (B) was varied with the fixed values of Ksp = 1 × 10¹² M⁻¹, n = 10 bp, and ω = 1 ([M]tot = 5 mM in (A); Kns = 1 × 10⁵ M⁻¹ in (B)). (C) Quantitative model for assembly of the Nup153 IDR hub with multiple interaction partners and competitors (adapted from Cho et al., 2021). The Nup153 IDR presents a high-affinity 1:1 Kap binding site (purple) and a series of low-affinity sites for overlapping binding of multiple Kaps. Kap occupies multiple dipeptide (FG) motifs (pink vertical bars). Using advanced combinatorial models, fine-tuning of the Kap occupancy of the Nup153 IDR was predicted as a function of the location of the competitor binding site. In the partition function Z, Z0 corresponds to the partition function of the Nup153 IDR in the absence of competition; Kc[C] represents the competitor binding; the terms in the brackets are the partition functions for two subregions of the low-affinity sites separated by the competitor binding; (1 + Ks[P]) represents the 1:1 interaction of Kap with the high-affinity site. (D) On the basis of the multivalent, overlapping IDR-Kap interaction, the Nup153 IDR is proposed to function as a molecular switch to couple nucleocytoplasmic transport to transcription.
At a given specificity ratio and a total motif concentration, as the concentration ratio [TF]/[D] is increased, the fractional occupancy of the specific site by TF monotonically increases with an apparent hyperbolic feature (upper panels in Figs. 4A and 4B). However, the underlying distribution of TF exhibits a dynamic shift from specifically to nonspecifically bound states (bottom panels in Figs. 4A and 4B). For higher specificity ratio or lower nonspecific motif concentrations, the specific complex is predominant in the regime [TF]/[D] < 1, leading to a steep rise in occupancy of the specific site. Consequently, the transition to the nonspecifically bound state is achieved at higher concentration ratio. Therefore, under these conditions, a relatively small amount of TF is required to saturate the specific site and thereby fully activate transcription. Conversely, for lower specificity ratio or higher nonspecific motif concentrations, the nonspecific binding significantly competes with the specific binding even at low [TF]/[D] (bottom panels in Figs. 4A and 4B), attenuating saturation of the specific site (upper panels in Figs. 4A and 4B). These simulations suggest that, since protein-DNA interactions are generally sensitive to many cellular conditions such as salt concentration and osmotic stress, changes in these variables potentially fine-tune the specificity ratio of TFs and thereby the corresponding transcription levels. Furthermore, a change in chromosome packing may indirectly affect the TF-specific site interaction by altering the nonspecific site concentrations. 
Taken together, nonspecific protein-DNA interactions, via change in either specificity ratio or abundance of nonspecific sites, can modulate the occupancies of specific TF binding sites and consequently reprogram the gene-specific transcriptional activities.
Competitions between specific and nonspecific binding or among multiple nonspecific binding modes have been observed in numerous in vitro protein-DNA interactions as well (Bujalowski et al., 1988; Rajendran et al., 1998). Even studies using short oligonucleotides have shown similar competitions due to significantly low specificity ratios (Holbrook et al., 2001; Koh et al., 2008). In order to accurately determine a specific binding constant, the linear lattice models must be applied or further advanced to tease apart the contributions from multiple binding modes to the observed binding signal (Tsodikov et al., 2001).
Competition among distinct target proteins for binding to an intrinsically disordered protein
IDPs often utilize short peptide motifs to recruit multiple distinct targets or multiple copies of an identical target (Cumberworth et al., 2013; Hong et al., 2020; Wright and Dyson, 2015). These IDPs are collectively termed hubs and are involved in signal transduction and macromolecular transport. A representative example is Nup153, a subunit of the NPC that contains a long C-terminal IDR (~600 amino acids in length) (Krull et al., 2004). The IDR presents multiple FG-motifs to interact with Kaps carrying macromolecular cargos into and out of the nucleus. Multiple hydrophobic pockets on the Kap surface are the primary binding sites for the FG-motifs (Bayliss et al., 2000).
A recent thermodynamic study developed an advanced combinatorial model to demonstrate that the Nup153 IDR comprises a high-affinity 1:1 binding site and a series of low-affinity sites for binding of multiple Kaps (Fig. 4C) (Cho et al., 2021). Calorimetric data for various protein concentrations and IDR lengths were scrutinized to further show that the overlapping binding of Kaps to the low-affinity sites results in apparent negative cooperativity. Because the Nup153 IDR potentially interacts with nuclear proteins involved in transcription and chromatin organization (Kadota et al., 2020; Kasper et al., 1999), this study constructed composite combinatorial models to test how the multivalent Kap binding would be affected by competitive binding of nuclear proteins (Fig. 4C). Remarkably, the simulation revealed that the Kap occupancy of the low-affinity region can be fine-tuned by changing the location of the competitor binding site (Fig. 4C). This delicate modulation arises from the molecular feature of the overlapping binding: the number of potential Kap binding sites eliminated by the competition is determined by the position of the competitor binding site (Fig. 2B).
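The position dependence described above can be illustrated with a small combinatorial sketch. It is not the published model of Cho et al. (2021). It assumes a noncooperative finite lattice of identical low-affinity motifs, a competitor that blocks a contiguous stretch of motifs, and an independent high-affinity site whose factor (1 + Ks[P]) multiplies every term and therefore cancels out of the low-affinity occupancy. All function names, parameter names, and numerical values are hypothetical.

```python
from math import comb


def lattice_sums(n_motifs, size, x):
    """Return (Z, M1) for a finite noncooperative lattice.

    Z is the partition function and M1 the sum of k * weight, where k
    ligands of `size` motifs each can be placed on n_motifs motifs in
    comb(n_motifs - k*(size - 1), k) ways. x is the product K[P] for the
    low-affinity sites.
    """
    z = m1 = 0.0
    k = 0
    while k * size <= n_motifs:
        w = comb(n_motifs - k * (size - 1), k) * x ** k
        z += w
        m1 += k * w
        k += 1
    return z, m1


def mean_low_affinity_kaps(n_motifs, size, x, comp_start, comp_len, kc_c):
    """Mean number of Kaps on the low-affinity region when a competitor
    (statistical weight kc_c = Kc[C]) can block motifs
    comp_start .. comp_start + comp_len - 1."""
    z0, m0 = lattice_sums(n_motifs, size, x)
    zl, ml = lattice_sums(comp_start, size, x)
    zr, mr = lattice_sums(n_motifs - comp_start - comp_len, size, x)
    z = z0 + kc_c * zl * zr               # competitor-free + competitor-bound
    m = m0 + kc_c * (ml * zr + zl * mr)   # first moment of the Kap number
    return m / z


# Scan the competitor position along a hypothetical 12-motif region.
for start in range(0, 11):
    occ = mean_low_affinity_kaps(12, 4, 0.1, start, 2, 10.0)
    print(f"competitor at motif {start:2d}: mean bound Kaps = {occ:.3f}")
```

In this sketch a competitor bound at the end of the region removes only a few overlapping Kap sites, whereas one bound near the center removes more, so the mean Kap occupancy is lower for central positions.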
Therefore, assuming that the Kap occupancy is a proxy for the transport activity of the NPC, it is conceivable that the Nup153 IDR functions as a molecular switch coupling specific nuclear processes to distinct transport states. For instance, a strong promoter may be coupled to the NPC activity in such a way that specific TFs or co-activators associated with the strong promoter target a location in the Nup153 IDR that considerably reduces the Kap occupancy (Fig. 4D). As a consequence of the reduced general transport activity mediated by Kaps, a large amount of mRNA transcribed from the strong promoter may be efficiently exported through the NPC (Fig. 4D). Although awaiting experimental validation, the coupling mechanism built upon multivalent, overlapping IDP-target interactions may contribute to the functional versatility of the IDP hubs in dynamic cellular processes. This exemplary study demonstrates that the original combinatorial model can be readily expanded by simple mathematical operations to account for additional complexities in linear lattice-protein interactions including heterogeneous binding sites.
CONCLUSION
Linear lattice systems and their multivalent interactions with target proteins often regulate dynamic cellular processes. Because of the overlapping target binding sites on a linear lattice, quantitative understanding of such interactions requires a framework fundamentally different from that for simple 1:1 binding or discrete-site systems. In this review, we discussed the two prevalent approaches to unraveling the linear lattice systems, namely combinatorial and conditional probability models. Constructing the lattice partition functions by the combinatorial approach is straightforward and readily expandable for data analysis and predictions, as illustrated by the Nup153 IDR–Kap interaction. On the other hand, the conditional probability model provides invaluable physical insights consistent with the molecular features of the multivalent linear lattice–target interactions. Furthermore, this method is suitable for simulating in vivo nucleic acid systems of apparently infinite lattice length. These frameworks may serve as a cornerstone for developing sophisticated models to analyze more complex cellular processes, including competition among multiple DNA binding proteins on nucleosomal DNA (Segal and Widom, 2009) as well as formation of phase-separated condensates involving multiple components (Lyon et al., 2021).
Supplemental Materials
Note: Supplementary information is available on the Molecules and Cells website (