The properties of disordered proteins are thought to depend on intrinsic conformational propensities for polyproline II (PPII) structure. While intrinsic PPII propensities have been measured for the common biological amino acids in short peptides, the ability of these experimentally determined propensities to quantitatively reproduce structural behavior in intrinsically disordered proteins (IDPs) has not been established. Presented here are results from molecular simulations of disordered proteins showing that the hydrodynamic radius (Rh) can be predicted from experimental PPII propensities with good agreement, even when charge-based considerations are omitted. The simulations demonstrate that Rh and chain propensity for PPII structure are linked via a simple power-law scaling relationship, which was tested using the experimental Rh of 22 IDPs covering a wide range of peptide lengths, net charge, and sequence composition. Charge effects on Rh were found to be generally weak when compared to PPII effects on Rh. Results from this study indicate that the hydrodynamic dimensions of IDPs are evidence of considerable sequence-dependent backbone propensities for PPII structure that qualitatively, if not quantitatively, match conformational propensities measured in peptides.
Many proteins, and protein domains, that perform critical biological tasks have disordered structures under normal solution conditions [1-3]. These proteins are referred to as intrinsically disordered [4] and, accordingly, molecular models of disordered protein structures are needed to understand the physical basis for the activities [2,3], roles regulating key signaling pathways [5], and relationships to human health issues [6-9] that have been linked to intrinsically disordered proteins (IDPs).
The properties of disordered protein structures are often associated with conformational propensities for polyproline II (PPII) helix [10-12] and charge-based intramolecular interactions [13-15]. PPII propensities are locally determined [16] and intrinsic to amino acid type [17-19], while charge-charge interactions seem to be important for organizing disordered structures owing to both long- and short-range contacts [13-15,20,21]. Since chain preferences for PPII increase the hydrodynamic sizes of IDPs [22,23], and Coulombic interaction energies are distance-dependent, it could be argued that charge effects on IDP structures are modulated locally by intrinsic PPII propensities. A number of issues with that hypothesis, however, are apparent. First, it has not been established if PPII propensities measured in short peptide models of the unfolded states of proteins [17-19] translate to IDPs. It could be that PPII propensities are negligible and unimportant in IDP systems. Second, methods capable of separating the impact of weak to possibly strong local conformational propensities and charge-charge interactions in the context of flexible and disordered protein structures have not been demonstrated, but are required for testing any potential interdependence.
To investigate such issues, a computer algorithm [22-24] based on the Hard Sphere Collision (HSC) model [25] was developed for parsing the contributions of intrinsic PPII propensities and charge to the structures of IDPs, as represented by the hydrodynamic radius (Rh). An HSC model was chosen since PPII propensities and charge effects could be added separately and in steps, to isolate contributions to simulated IDP structures. Rh was chosen since experimental values are available for a wide range of IDP sequences, allowing direct comparisons to model-simulated Rh.
Here we demonstrate that Rh for disordered proteins trends with chain propensities for PPII structure by a simple power-law scaling relationship. Using experimental PPII propensities for the common biological amino acids from Kallenbach [17], Creamer [18], and Hilser [19], this relationship was tested against experimental Rh from 22 IDPs [23,26-42] ranging in size from 73 to 260 residues and net charge from 1 to 43. We observed that the power-law scaling function was able to reproduce IDP Rh with good agreement when using propensities from Hilser, while the Kallenbach and Creamer scales consistently overestimated Rh. The ability to describe Rh from just intrinsic PPII propensities associated with a sequence was supported by simulation results showing that charge effects on IDP Rh are generally weak. Relative to the effects of PPII propensities, charge effects on IDP Rh were substantial only when charged side chains were separated in sequence by 2 or fewer residue positions and if the sequence had higher than typical bias for one charge type (i.e., positive or negative). Overall, these results demonstrated that two seemingly disparate experimental datasets, IDP Rh and intrinsic PPII propensities, are in qualitative agreement, providing evidence for considerable sequence-dependent conformational preferences for PPII structure in the disordered states of biological proteins.
Results
Computer simulation of Rh dependence on PPII propensity
Rh for IDPs are sensitive to site-specific and general structural perturbations such as amino acid substitutions [23], changes in net charge [13,14], charge rearrangements [15], and temperature changes [22,43,44]. Fig 1 shows that IDP Rh differ substantially from Rh for folded proteins [22,45,46] that have similar residue length, N. Rh from modeling proteins with no strongly preferred conformations [22], which is referred to as a random coil [47], is also provided for comparison to the experimental values. The solid line representing coil Rh was determined from computer simulation of randomly configured polypeptide chains using an HSC model [22]. Owing to favorable native contacts that promote stable globular structures, folded proteins have Rh that are compacted relative to the Rh of simulated random coils. In contrast, the data in Fig 1 indicate that Rh from IDPs are generally larger than random coil estimates.
Fig 1
Rh comparison to number of residues, N. Filled and open circles represent experimental Rh for IDPs [23,26–42] and folded proteins [22,45,46], respectively. The solid line is the Rh dependence on N estimated from simulations of randomly configured protein structures [22]. Stippled lines show Rh for randomly configured structures with chain propensities for PPII (fPPII) from 0.1 to 1 in 0.1 increments. Every other stippled line is end-labeled by its fPPII value.
The dependence of Rh on N for chemically denatured proteins follows a power-law scaling relationship,
$$R_h = R_o N^{v} \tag{1}$$
where Ro is 2.2 Å and v is 0.57 [45]. To understand changes in Ro and v that are required for modeling the dependence of Rh on N for IDPs, it is useful to recognize that unfolded proteins in aqueous solutions absent high concentrations of guanidine hydrochloride or urea show Rh compaction [48] with a concomitant decrease in v [49]. Consistent with that observation, Marsh and Forman-Kay demonstrated that Rh and N scale with v = 0.509 for IDPs under normal conditions [49]. Ro for IDPs was found to depend on PRO content and net charge by,
where fPRO is the fractional number of PRO residues and |Q| the absolute net charge determined from sequence [49]. Since PRO residues have strong propensities for PPII helix, which is an extended structure [50], and repulsive interactions between charged groups likewise favor extended conformations to minimize unfavorable energetics, a simple molecular interpretation of Eq (2) can be offered whereby the Rh dependence on N for IDPs follows a baseline trend of Rh = (2.17 Å)∙N^0.509 (i.e., Ro with fPRO and |Q| set to zero) with sequence-dependent increases in Rh from this baseline owing to chain propensities for PPII and repulsive charge-charge interactions. Simulated Rh for random coils were observed to trend with N by Rh = (2.16 Å)∙N^0.509 [22], supporting this hypothesis (and reproduced in Fig 1). The effects of ALA to GLY substitutions on IDP Rh also indicated that chain propensities for PPII structure modulate IDP Rh and not simply PRO content [23].
To model the effects of PPII propensities on coil Rh, a sampling bias for PPII structure was applied to random coil simulations and the relationship between Rh, N, and fractional number of residues in the PPII conformation, fPPII, was determined [22,23]. This is shown in Fig 1 by stippled lines to demonstrate that increases in fPPII cause increases in coil Rh. These results were generated from simulations that modeled PPII bias by applying an identical sampling bias for PPII structure at each residue position in a polypeptide chain and, accordingly, did not include effects that could be caused by position-specific variations in PPII propensity.
To test for effects on coil Rh owing to PPII propensity variations within a polypeptide chain, conformational ensembles for N = 15, 25, 35, 50, and 75 were generated for poly-ALA with the algorithm modified to allow position-specific sampling rates for PPII structure. It was shown previously that the effects of N on Rh were mostly insensitive to amino acid sequence in HSC model simulations of disordered proteins [22] and thus poly-ALA was chosen as a computational simplification. Variations in PPII propensity among residue positions were simulated by applying a sampling bias for PPII structure (SPPII) at every position, every second position, every third position, every fourth position, or every fifth position in the poly-ALA chains. SPPII at values of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9 were tested at the indicated residue locations. This PPII sampling strategy resulted in 225 separate simulated ensembles (5 N lengths × 5 patterns × 9 SPPII values).
A set of simulations using randomly determined position-specific bias for PPII structure was also modeled using poly-ALA chains. These additional simulations used N = 15, 25, and 35, with each residue position assigned a different random value for SPPII. Position-specific random assignments were repeated 3 times for SPPII ranging from 0 to 1, 0 to 0.5, 0.25 to 0.75, and 0.5 to 1, resulting in an additional 36 simulated ensembles (3 N lengths × 3 distributions of random position-specific PPII biases × 4 applied ranges in PPII sampling bias).
The ensemble-averaged fractional number of residues in the PPII conformation (i.e., the propensity) can be different from SPPII in these simulations since randomly generated structures containing van der Waals contact violations are removed from the calculation. Differences between the applied sampling rate (i.e., SPPII) and the observed ensemble-averaged rate (i.e., fPPII) at SPPII-targeted positions followed the same Gaussian relationship that was established previously for whole-chain SPPII and fPPII comparisons [22] and thus straightforward conversion between applied and observed bias rates was available (S1 Fig). fPPII determined from simulation for residue positions with no applied SPPII was 0.012 ± 0.004.
Cumulative results from the >250 separate ensemble simulations were analyzed in terms of the power-law scaling relationship given by Eq (1). Previously, we demonstrated that the exponent, v, was dependent on SPPII while Ro was mostly independent of SPPII with an averaged value of 2.16 Å [22]. Fig 2A shows v, determined from ln(Rh/2.16)/ln(N), for each simulated ensemble and plotted as a function of fPPII calculated for the whole chain. Rh for each simulated ensemble was calculated as,
$$R_h = \frac{\langle L \rangle}{2} \tag{3}$$
and fPPII,chain as,
$$f_{PPII,chain} = \frac{\langle N_{PPII} \rangle}{N} \tag{4}$$
Fig 2
Simulated effect of PPII propensities on coil Rh.
Each circle and square represents a simulated disordered polypeptide. Squares are from ensembles simulated with position-specific PPII propensities assigned randomly; circles had PPII propensity assignments that followed the sequence patterns described in the text. In panel A, fPPII,chain was calculated as <NPPII>/N, where <NPPII> was the ensemble averaged number of residues with (Φ,Ψ) in the PPII region (-75±10, 145±10), and v was calculated as ln(Rh/Ro)/ln(N) using <L>/2 for Rh and 2.16 Å for Ro. These data were fit to v = vo + β∙ln(1-fPPII,chain), with vo and β as fit parameters, producing the solid line. In panel B, simulated Rh was calculated as <L>/2. Predicted Rh was determined from fPPII,chain using Rh = (2.16 Å)∙N^v and the panel A fit for v. The correlation (R2) between predicted and simulated Rh is provided in the figure.
In Eq (3), <L> = ∑ Li∙Pi, where Li is the maximum Cα-Cα distance calculated for state i, Pi is the Boltzmann probability for state i, and the summation was over all states i of an ensemble. In Eq (4), <NPPII> = ∑ NPPII,i∙Pi, where NPPII,i is the number of residues in the PPII conformation for state i. The distinction of “chain” given to fPPII in Eq (4) was provided to limit confusion between fPPII calculated for a whole chain versus fPPII calculated for specific residue positions.
The relationship between v and fPPII,chain for all simulations followed a logarithmic trend that was fit to the equation,
$$v = v_o + \beta \ln\left(1 - f_{PPII,chain}\right) \tag{5}$$
using the Levenberg-Marquardt method of nonlinear least squares [51,52]. The parameters vo and β were found to be 0.503 ± 0.002 and -0.11 ± 0.003, respectively. Fig 2B shows that Rh determined from fPPII,chain (Eq (4)) and N by combining Eqs (1) and (5) (see Eq (6) below) correlated strongly with Rh calculated directly from a simulated ensemble (Eq (3)). Not all possible patterns of position-specific PPII bias were tested in our computer trials. Results in Fig 2 predict, however, that in general a quantitative relationship exists for disordered proteins between Rh, N, and the ensemble-averaged per-residue chain propensity for PPII structure (fPPII,chain).
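As a minimal sketch of how Eqs (1) and (5) combine, the Python below evaluates the scaling exponent and the resulting Rh from the fitted values quoted in the text (Ro = 2.16 Å, vo = 0.503, β = -0.11). The function and constant names are illustrative assumptions and are not taken from the HSC algorithm itself.

```python
import math

# Values quoted in the text; the names are assumptions made for this sketch.
R_O = 2.16    # Å, pre-factor of the power law
V_O = 0.503   # fitted intercept of the exponent
BETA = -0.11  # fitted coefficient of ln(1 - f)


def scaling_exponent(f_chain):
    """Exponent v as a function of the chain PPII propensity, per Eq (5)."""
    if not 0.0 <= f_chain < 1.0:
        raise ValueError("f_chain must lie in [0, 1)")
    return V_O + BETA * math.log(1.0 - f_chain)


def predict_rh(n_residues, f_chain):
    """Rh in Å from chain length and chain PPII propensity, per Eq (1)."""
    if n_residues < 1:
        raise ValueError("n_residues must be positive")
    return R_O * n_residues ** scaling_exponent(f_chain)
```

Because β is negative and ln(1 - f) is negative for f > 0, the exponent grows with propensity, so `predict_rh` increases with `f_chain` at fixed N, which is the trend drawn by the stippled lines in Fig 1.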
Test of model using experimental PPII propensities
Results from HSC model simulations that are summarized in Figs 1 and 2 can be interpreted as an ideal relationship between Rh and N that includes the general effects of sterics and PPII propensities but omits other intrinsic and intramolecular factors. Contributions from Coulombic interaction energies to IDP Rh will be discussed below and added to this model. First, the simulation-derived relationship between Rh, N, and fPPII,chain is tested by applying experimental PPII propensities to the sequences of IDPs in Fig 1. The identity, sequence, and experimental Rh for each IDP are given in Supporting Information (S1 and S2 Tables). This dataset includes 22 IDPs containing 3016 total residue positions. Amino acids represented at rates greater than 0.05 in this dataset were, in rank order and listed by their three letter codes, SER (0.104), GLU (0.100), LEU (0.083), PRO (0.080), ASP (0.074), GLY (0.073), ALA (0.073), THR (0.061), LYS (0.055), GLN (0.053), and VAL (0.053).
Amino acid PPII propensities reported by Kallenbach [17], Creamer [18], and Hilser [19] for disordered proteins are reproduced in Table 1 and were used for testing the relationship,
$$R_h = (2.16\ \text{Å}) \cdot N^{\,0.503 - 0.11 \ln\left(1 - f_{PPII,chain}\right)} \tag{6}$$
Table 1
Intrinsic backbone PPII propensities measured in disordered peptides.
Amino acid   Kallenbach [17]   Creamer [18]      Hilser [19]
host (a)     Ac-G2XG2-NH2      Ac-P3XP3GY-NH2    Ac-VP2XVP2R3Y-NH2
ALA (A)      0.818             0.61              0.37
CYS (C)      0.557             0.55              0.25
ASP (D)      0.552             0.63              0.30
GLU (E)      0.684             0.61              0.42
PHE (F)      0.639             0.58              0.17
GLY (G)      -                 0.58              0.13
HIS (H)      0.428             0.55              0.20
ILE (I)      0.519             0.50              0.39
LYS (K)      0.581             0.59              0.56
LEU (L)      0.574             0.58              0.24
MET (M)      0.498             0.55              0.36
ASN (N)      0.667             0.55              0.27
PRO (P)      -                 0.67              1.00
GLN (Q)      0.654             0.66              0.53
ARG (R)      0.638             0.61              0.38
SER (S)      0.774             0.58              0.24
THR (T)      0.553             0.53              0.32
VAL (V)      0.743             0.49              0.39
TRP (W)      0.764             -                 0.25
TYR (Y)      0.630             -                 0.25
average      0.626             0.58              0.35
(a) Sequence of host peptide used to measure PPII propensity at the guest position, X.
These propensity scales were chosen since weak correlations are observed among the group (S2 Fig), indicating a potential for yielding different results when each set is used separately with Eq (6) for a given IDP sequence. A physical explanation for the different PPII propensity values reported for the amino acids is not given here (e.g., the reported ALA PPII propensities are very different when compared), other than to note that their measurements used host peptide sequences that were also very different (Table 1). Kallenbach measured PPII propensities in the background of a GLY-rich host peptide, whereas the scale reported by Creamer was determined for positions flanked on both sides by PRO residues. The propensity scale from Hilser was measured for positions located in between PRO and valine (VAL). Other PPII propensity scales were not included in these tests due to similarities to the Kallenbach, Creamer, or Hilser reported values. For example, a PPII propensity scale from Zondlo [53] correlated with the Creamer values (coefficient of determination, R2, of 0.58), likely owing to the use of a host peptide that also flanked the guest position with PRO residues.
Inspection of Table 1 shows that PPII propensities for tryptophan (TRP) and tyrosine (TYR) were not reported by Creamer. For these amino acids, we used the averaged Creamer-reported value calculated from the 18 other amino acids (0.58). In the Hilser set, TRP and TYR had lower than average PPII propensity. In contrast, TRP and TYR had higher than average PPII propensity in the Kallenbach set. Using the Creamer average was a compromise that likely had low significance in our tests since TRP and TYR had very low representation among the IDP sequences; 0.008 and 0.012, respectively. PPII propensities were not reported for PRO and GLY by Kallenbach. Here, we used 1 for PRO since it is generally accepted that PRO has the highest propensity for PPII structure [10,12,17-19]. This gave PRO a larger value than ALA (0.818), which was the amino acid with the highest reported propensity in the Kallenbach set. GLY was given a propensity of 0.50, which is lower than the Kallenbach average (0.626) but higher than the lowest value (0.428). This also was a compromise from observing that GLY had the lowest value in the Hilser set (0.13), but an average value in the Creamer set (0.58).
fPPII,chain was calculated for each IDP by using the amino acid PPII propensity given in Table 1, summing over the IDP sequence, and dividing by N. Fig 3A shows the experimental scales predict different chain propensities for PPII structure for each IDP sequence. The scale from Kallenbach gave fPPII,chain ranging from 0.746 to 0.628, whereas the Creamer and Hilser scales gave fPPII,chain from 0.609 to 0.579 and 0.489 to 0.283, respectively. Eq (6) was then used to predict Rh from fPPII,chain for comparison to experimentally observed Rh, which is shown in Fig 3B. The average prediction error (|predicted Rh − observed Rh|) and the correlation between predicted and observed Rh are given in Table 2. To assess contributions from the amino acid scales for predicting Rh, a null model was included by assigning each amino acid the PPII propensity of 0.012, the background fPPII calculated from HSC simulations when no sampling bias for PPII structure was applied (i.e., SPPII = 0). Accordingly, the null model represents random coil values.
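The chain propensity calculation described above (sum the per-residue propensities over the sequence and divide by N) can be sketched as follows. The Hilser values are copied from Table 1; the function names and the example sequence in the test of the sketch are hypothetical and serve only as illustration.

```python
# Hilser scale from Table 1, keyed by one-letter amino acid code.
HILSER = {
    "A": 0.37, "C": 0.25, "D": 0.30, "E": 0.42, "F": 0.17,
    "G": 0.13, "H": 0.20, "I": 0.39, "K": 0.56, "L": 0.24,
    "M": 0.36, "N": 0.27, "P": 1.00, "Q": 0.53, "R": 0.38,
    "S": 0.24, "T": 0.32, "V": 0.39, "W": 0.25, "Y": 0.25,
}

# Background propensity reported for positions with no applied PPII bias.
NULL_PROPENSITY = 0.012


def chain_propensity(sequence, scale):
    """Average the per-residue propensities in `scale` over `sequence`."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(scale[aa] for aa in sequence.upper()) / len(sequence)


def null_scale():
    """Null model: every amino acid gets the random coil background value."""
    return {aa: NULL_PROPENSITY for aa in HILSER}
```

Passing `null_scale()` instead of `HILSER` returns 0.012 for any sequence, which corresponds to the random coil reference used for the null model.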
Fig 3
Chain propensity for PPII from experimental scales and comparison of predicted and observed Rh.
Panel A gives fPPII,chain for each IDP sequence, ordered left to right to show the range obtained with each scale, calculated using experimental PPII propensities from Kallenbach (red triangles), Creamer (blue squares), and Hilser (open circles). X is fPPII,chain from the null model. Panel B shows Rh predicted for each IDP using Eq (6) and fPPII,chain from panel A. Symbols in panel B match panel A representations. Black dots show Rh predicted from the composite propensity scale. Stippled line is the identity line.
Table 2
Comparison of predicted and observed Rh.
Propensity Scale      Average Error (Å) (a)   R2 (b)   Average Normalized Error (c)   R2 (d)
Null (random coil)    7.1 ± 3.7               0.797    -0.28 ± 0.13                   0.265
Kallenbach            13.4 ± 5.4              0.819    0.51 ± 0.15                    0.301
Creamer               8.4 ± 4.3               0.817    0.32 ± 0.13                    0.297
Hilser                2.5 ± 1.8               0.825    0.006 ± 0.12                   0.407
Composite             2.4 ± 1.8               0.834    0.015 ± 0.12                   0.423
Static                2.6 ± 2.0               0.799    -0.016 ± 0.13                  0.291
(a) Determined from |predicted Rh − observed Rh|.
(b) Coefficient of determination, correlation of predicted Rh and observed Rh.
(c) Determined from (predicted Rh − observed Rh)/(random coil Rh).
(d) Coefficient of determination, correlation of normalized error and net charge density.
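The two length-normalized quantities reported in Table 2 can be sketched in a few lines of Python. This is an illustration under stated assumptions: the random coil Rh is taken from Eq (6) with the null model propensity of 0.012, and the net charge is counted from sequence as +1 for LYS/ARG and -1 for ASP/GLU, ignoring histidine and the termini. That counting rule is an assumption of this sketch; the text does not specify how net charge was assigned.

```python
import math

NULL_F = 0.012  # null model chain propensity quoted in the text


def predict_rh(n_residues, f_chain):
    """Rh in Å from Eq (6)."""
    return 2.16 * n_residues ** (0.503 - 0.11 * math.log(1.0 - f_chain))


def normalized_error(predicted_rh, observed_rh, n_residues):
    """(predicted Rh - observed Rh) / (random coil Rh) for a chain of N residues."""
    return (predicted_rh - observed_rh) / predict_rh(n_residues, NULL_F)


def net_charge_density(sequence):
    """|Q| / N, with Q counted from K, R (+1) and D, E (-1) only (assumption)."""
    seq = sequence.upper()
    q = seq.count("K") + seq.count("R") - seq.count("D") - seq.count("E")
    return abs(q) / len(seq)
```

A negative normalized error means the prediction is more compact than the measurement, which is the sign reported for the null model in Table 2.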
Different values of fPPII,chain predict different Rh for a given IDP sequence, as expected from Eq (6). For example, the null model, which used the smallest fPPII,chain values, predicts Rh that are smaller than observed for each IDP. In contrast, PPII propensities from Kallenbach and Creamer, which report relatively large fPPII,chain values, predict Rh that are larger than observed for each IDP. Experimental propensities from Hilser predict Rh that trend with the identity line, showing good agreement, but also showing scatter relative to that line (average error was 2.5 Å). In an attempt to reduce prediction error, a composite PPII propensity scale that used the Hilser values by default but the Kallenbach values for residues located between GLY (i.e., GLY-X-GLY) and Creamer values for residues located between PRO (i.e., PRO-X-PRO) was tested. This context-specific composite propensity scale (identified as “Composite” in Table 2 and Fig 3B) caused only small changes in predicted Rh, with no significant improvement in prediction capabilities relative to using only the Hilser reported PPII propensities.
Since Rh increases with N (Fig 1), prediction error was normalized for peptide length by,
$$\text{normalized error} = \frac{\text{predicted } R_h - \text{observed } R_h}{\text{random coil } R_h} \tag{7}$$
Random coil Rh was calculated using Eq (6) with fPPII,chain = 0.012, the null model value. Average normalized error is given in Table 2 for each propensity scale. Fig 4 shows trends in the normalized error with N and net charge density, determined as the absolute net charge normalized for peptide length,
$$\text{net charge density} = \frac{|Q|}{N} \tag{8}$$
Fig 4
Correlation of normalized error in predicted Rh to N and net charge density.
Normalized error and net charge density were calculated for each IDP using Eqs (7) and (8), respectively. In both panels, red triangles show normalized error from Rh predicted using the Kallenbach reported propensities, blue squares from Creamer reported propensities, open circles from Hilser reported propensities, black dots from the composite propensity scale, and X is the null model. Lines are linear fits to the five prediction sets colored as the symbols (Kallenbach scale was red; Creamer was blue, Hilser was stippled black, composite was solid black, and null was dotted black).
S1 Table gives net charge and N for each IDP. No obvious bias with peptide length (i.e., N) was observed in the normalized error for the Hilser and composite propensity scales. Normalized error clearly increased with N when using Kallenbach and Creamer values, indicating that these PPII propensities may be over-estimated when applied to IDP sequences to predict Rh. Since the exponent in Eq (6) becomes larger with increasing fPPII,chain, a set of propensity values that systematically are too large would cause normalized errors that increase with N.
It is interesting to note that normalized error correlated with net charge density for each experimental propensity scale (Fig 4B and Table 2), suggesting that prediction error was caused partially by charge effects on Rh that were not included in the model. This is not surprising since Marsh and Forman-Kay demonstrated that increases in net charge correlate with increases in IDP Rh [49] and the trend we observed of decreasing normalized error with increased net charge density is consistent with their conclusions. Extrapolating this trend to zero net charge density for the Hilser and composite propensity scales yields positive normalized errors suggesting that, in the background of no net charge contributions to Rh, the PPII propensities reported by Hilser may also be slightly too large when using Eq (6) to predict Rh.
While this analysis of experimental PPII propensities indicated that one of the scales was capable of reproducing experimental Rh with good agreement for a set of IDPs, it is important to recognize that comparative tests based on Eq (6) may not be sufficient to validate a propensity scale. Since Rh in this model depends only on N and chain averaged propensity for PPII structure, contrived scales that predict IDP Rh with similar agreement in terms of the average prediction error are simple to generate. For example, each IDP could be given a sequence-independent fPPII,chain value of 0.364, which was determined by converting experimental Rh to an apparent fPPII,chain using Eq (6) and then averaging over the IDP dataset. Using this static fPPII,chain to predict IDP Rh gives an average prediction error (identified as “Static” in Table 2) that is close to the error obtained when using the experimental scale from Hilser. Correlations between predicted and observed Rh and between normalized error and net charge density for the contrived static scale, however, decreased relative to the correlations that were observed with the experimental scales, suggesting that static representations of fPPII,chain may not fully capture some molecular dependencies that are inherent to IDP Rh.
To further investigate the capabilities of Eq (6) for relating IDP Rh and PPII propensity, random sets of amino acid scales were generated following a two-step protocol and analyzed. First, a random number between 0 and 1 was used to target an average propensity for a scale. Then, random scales were generated, where each amino acid was assigned a different random value between 0 and 1, until a set was found whose average for the 20 amino acids matched the target determined in the first step (±0.05). The goal from using two steps to generate scales was to ensure that chain averaged propensities in the high, medium, and low range were evenly sampled. This sampling scheme was repeated until 100,000 random scales were generated. Each propensity scale was then used to predict Rh from Eq (6) and the results are summarized in Fig 5. It was observed that randomly generated scales gave average prediction errors for the IDP dataset ranging from 1.9 to 239.8 Å, correlations between predicted and observed Rh ranging from 0.02 to 0.88, and correlations between normalized error and net charge density from 0 to 0.81. Optimal values for these metrics (i.e., highest correlations coupled with lowest average error) cluster near the values of R2 and average error that are obtained when using experimental PPII propensities from Hilser. This result shows that experimental Rh of the IDP dataset are in good qualitative agreement with experimental PPII propensities reported by Hilser, and vice versa, giving evidence that the molecular properties of IDPs that link Rh, N, and fPPII,chain are well-approximated by the simple power-law scaling relationship of Eq (6).
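The two-step protocol for generating random propensity scales can be sketched as a simple rejection loop. This is a hedged illustration and not the authors' code: the function name, the `max_tries` guard, and the optional `target` argument are assumptions added here. Note that with uniform per-residue draws, a 20-value average far from 0.5 is rarely accepted, and the text does not describe how extreme targets were reached, so the sketch simply gives up after `max_tries` attempts.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_scale(rng, target=None, tolerance=0.05, max_tries=100_000):
    """Return (target, scale) where the scale average is within tolerance of target.

    Step 1: pick a target average in [0, 1) unless one is supplied.
    Step 2: redraw all 20 propensities until their mean matches the target.
    """
    if target is None:
        target = rng.random()
    for _ in range(max_tries):
        scale = {aa: rng.random() for aa in AMINO_ACIDS}
        mean = sum(scale.values()) / len(AMINO_ACIDS)
        if abs(mean - target) <= tolerance:
            return target, scale
    raise RuntimeError("no scale matched the target within max_tries")
```

Each accepted scale could then be averaged over an IDP sequence and passed through Eq (6) in the same way as the experimental scales.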
Fig 5
Rh prediction from random PPII propensity scales.
Random scales were generated as described in the text and used to predict Rh for each IDP by Eq (6). Shown is the correlation (R2) obtained for each scale between observed and predicted Rh plotted against the correlation obtained between the normalized error (n. error) and the net charge density (ncd). Shown by color is the average prediction error of each scale. Random scales giving average prediction error larger than 75 Å were omitted to emphasize differences at lower error values.
Effects of Coulombic interaction energies on Rh
In the HSC model used for this study, a computer algorithm generates polypeptide structures by random conformational search until Rh (Eq (3)) converges to a stable ensemble-averaged value [22]. A structure-based energy function parameterized to solvent-accessible surface areas that has been tested extensively [54-62] is used to population-weight each randomly generated structure. To approximate charge effects on ensemble populations, the energy function was modified to include Coulombic interaction energies by,
$$E_{Coulomb} = \sum_{i} \sum_{j > i} \frac{332\, Z_i Z_j}{D\, R_{ij}}\, e^{-\kappa R_{ij}} \tag{9}$$
where the constant 332 converts the energy into units of kilocalories per mole at 25°C, D is the dielectric of water, Z is the charge at site i or j, Rij is the distance between two charged sites i and j (in Å), κ (the Debye parameter) accounts for screening from solution ionic strength, and the sums are over all charge-bearing sites. The Debye parameter was calculated as,
where I is ionic strength (in molarity, M). D
used was 78.3 [63] and I was 0.1 M to represent normal conditions. Since the simulations used poly-ALA chains, charged residues were modeled with a positive or negative charge located at the coordinates of the Cβ atom to denote the approximate location for flexible and charged side chains. Coordinates for the backbone N and O atoms of the first and last residues were used to assign positive and negative charge, respectively, to N- and C-termini. Simulations were limited to 25 residue poly-ALA chains to establish trends for the effects of charge on R
in this model. For each ensemble, an identical S
was applied at each residue position. S
was varied among the different simulations to target ensemble-averaged f
ranging from 0.1 to 0.92.Fig 6A shows that introducing charge at N- and C-termini had no effect on simulated R
for poly-ALA chains. Modeling negative charge at the Cβ position of each residue, or positive charge (S3 Fig), caused large increases in R
from repulsive electrostatic intramolecular interactions. Identical charge at every other residue position caused smaller increases in R
, while identical charge at every third position gave R
that were mostly similar to R
of poly-ALA modeled with no charges. These data predict that the effects of charge on IDP R
should weaken as charged residues separate in sequence, as expected. Fig 6B shows the ensemble-averaged distance between “charged” Cβ atoms that were closest in sequence for each ensemble in panel A, indicating repulsive charge-charge interactions at distances ≥9 Å had only minor effects on R
. The Debye length for the modeled conditions (i.e., 1/κ) was 9.6 Å, which is the distance where interactions between charged groups become negligible at a given ionic strength. The simulation results thus trend with expected outcomes for fully solvated charges. It was also observed that, for polypeptides with each residue position charged, f
calculated for an ensemble was larger than expected based upon the applied S
(Fig 6A inset). This result predicts that repulsive charge-charge interactions between side chain groups preferentially select for the extended PP
structure to minimize unfavorable interaction energies.
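The screened Coulombic term and the Debye length quoted above can be sketched numerically. This is a minimal sketch, not the authors' implementation: it assumes the energy has the standard Debye-Hückel pairwise form, 332∙Zi∙Zj∙exp(−κ∙Rij)/(D∙Rij) summed over unique pairs of charged sites, and it assumes the textbook expression κ = 50.29∙sqrt(I/(D∙T)) in Å⁻¹ for the Debye parameter. Function names and the temperature argument are hypothetical.

```python
import math

# Constant quoted in the text: converts e^2/Angstrom into kcal/mol.
COULOMB_CONSTANT = 332.0


def debye_parameter(ionic_strength_molar, dielectric=78.3, temperature_k=298.15):
    """Debye parameter kappa in 1/Angstrom.

    Assumption: textbook Debye-Hueckel expression; the exact formula
    used in the study is not reproduced in this section.
    """
    return 50.29 * math.sqrt(ionic_strength_molar / (dielectric * temperature_k))


def screened_coulomb_energy(charges, coords, ionic_strength_molar=0.1, dielectric=78.3):
    """Sum of screened pairwise Coulomb energies (kcal/mol).

    charges: list of site charges Z (e.g. -1 or +1).
    coords:  list of (x, y, z) positions in Angstrom, e.g. C-beta atoms.
    Assumption: each unique pair (i < j) is counted once.
    """
    kappa = debye_parameter(ionic_strength_molar, dielectric)
    energy = 0.0
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            r_ij = math.dist(coords[i], coords[j])
            energy += (COULOMB_CONSTANT * charges[i] * charges[j]
                       * math.exp(-kappa * r_ij) / (dielectric * r_ij))
    return energy


# Debye length (1/kappa) for the modeled conditions, I = 0.1 M and D = 78.3.
debye_length = 1.0 / debye_parameter(0.1)
```

Under these assumptions the computed Debye length is close to the 9.6 Å stated in the text, and like charges give a positive (unfavorable) energy that decays quickly once the separation exceeds that length.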
Fig 6
Simulated effect of charged residues on Rh.
In panel A, the stippled line is Rh from Eq (6) with N = 25 and fPPII = 0–0.98. Plotted symbols are Rh from poly-ALA simulations (N = 25) calculated using Eq (3). Open squares are uncharged poly-ALA and open circles have charged termini. Filled circles have each residue modeled with negative charge at the Cβ atom. Filled squares have every other residue modeled with negative charge, filled triangles have every third residue with negative charge, and X is every fourth residue with negative charge. Panel B shows the ensemble-averaged distance (in Å) between Cβ atoms from two charged residues, i and j, closest in sequence. Panel B symbols match panel A representations. A inset: comparison of observed fPPII (shown as obs fPPII) to fPPII expected from the applied SPPII (shown as applied fPPII; calculated as fPPII = SPPII − 0.062∙exp(−(SPPII − 0.63)²/(2∙0.28²)) [22]). Note that filled circles trend higher than other plotted data. Inset symbols match panel representations.
To test the effects of clusters of charge on Rh, polypeptides with patterns consisting of three consecutively charged residues were also simulated (Fig 7). Similar trends were observed, whereby the effects of charge on Rh weakened as charged groups (i.e., clusters) were separated in sequence. Charge clusters, however, affected Rh when modeled with 4 intervening non-charged residues, with weaker effects persisting at even larger separation distances between the clusters. This contrasts with the simulation results for non-clustered charged residues that exhibited negligible effects on Rh when charges were separated by as little as 2 intervening uncharged residue positions (Fig 6A).
Fig 7
Simulated effect of clusters of charged residues on Rh.
Filled circles, open circles, open squares, and the stippled line were reproduced from Fig 6A. As in Fig 6A, Rh was calculated from poly-ALA simulations with N = 25. A charge cluster was defined as three consecutive residues with negative charge modeled at the Cβ atoms. Charge clusters separated in sequence by two uncharged residues (no charge modeled at Cβ) are shown with filled squares whereas charge clusters separated by four uncharged residues are shown with filled triangles. X and + symbols represent charge clusters separated by six and eight uncharged residues, respectively. Inset: comparison of observed fPPII to fPPII expected from the applied SPPII (following Fig 6A inset description). Inset symbols match panel representations.
Since IDPs, in general, contain both positive and negative charges, simulations with opposite charge at adjacent residue positions were also performed. Fig 8A shows that repeating patterns of opposite charge had minimal effects on Rh in these simulations, even when each residue position was charged. This was mostly the case for charge clusters too (Fig 8B), with the exception that the simulation would sporadically generate ensembles with compacted Rh, whereby “compacted” is used to indicate Rh smaller than what was observed for non-charged poly-ALA coils of identical N. Overall, the amount of Rh compaction owing to favorable interactions between oppositely charged residues (or clusters) was small when compared to increases in Rh that were observed owing to unfavorable interactions between identically charged residues (or clusters).
Fig 8
Simulated effect on Rh from oppositely charged residues.
Stippled line in each panel was reproduced from Fig 6A. As in Fig 6A, Rh was calculated from poly-ALA simulations with N = 25. Charge was modeled with opposite charge at adjacent residue positions (panel A) or adjacent clusters (panel B). In panel A, filled circles have each residue modeled with charge at the Cβ atom (first residue negative, second residue positive, third residue negative, etc.). Filled squares have every other residue modeled with charge (first residue negative, third residue positive, etc.), filled triangles have every third residue modeled with charge, and X represents every fourth residue modeled with charge. In panel B, each residue in a cluster had identical charge while clusters adjacent in sequence had opposite charge. Filled circles are poly-ALA with every residue charged (i.e., residues 1–3 having negative charge, residues 4–6 with positive charge, residues 7–9 with negative charge, etc.). Charge clusters separated in sequence by two uncharged residues are shown with filled squares (i.e., residues 1–3 with negative charge, residues 4–5 uncharged, residues 6–8 with positive charge, etc.) whereas charge clusters separated by four uncharged residues are shown by filled triangles. X and + symbols represent charge clusters separated by six and eight uncharged residues, respectively. Insets: comparison of observed fPPII to fPPII expected from the applied SPPII (following Fig 6A inset description). Inset symbols match panel representations.
The results in Figs 6–8 from modeling charge effects on Rh indicate that, in general, the strongest effects on Rh should occur owing to identical charges at sequentially-adjacent residue positions (Figs 6 and 7) and for polypeptides with the least amount of mixing of positive and negative charge types (Fig 8). To test these two general observations, the IDP dataset was analyzed to determine the net number of adjacent charges in each IDP sequence. This was calculated by first summing the number of ASP residues that had GLU or ASP immediately next or prior in sequence with the number of GLU residues that had GLU or ASP immediately next or prior in sequence to determine the total number of negative charges with an adjacent negatively charged neighbor. A similar calculation was performed using LYS and ARG to determine the number of positive charges with an adjacent positively charged neighbor. The net number of adjacent charges for an IDP was then the absolute value of the difference between the positive and negative adjacent charge numbers (provided in S1 Table). Fig 9A shows that normalized error in predicted Rh for the IDP dataset trends with the net adjacent charge density (i.e., net adjacent charge normalized for peptide length), similar to the correlation that was observed between normalized error and net charge density (Fig 4B). This should be expected since net charge and net adjacent charge correlate with R² = 0.64 in the dataset.
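The counting procedure for net adjacent charge described above can be sketched in a few lines. This is an illustrative sketch that assumes sequences are given in one-letter code (D/E for ASP/GLU, K/R for LYS/ARG); the function names are hypothetical and the example sequence in the usage note is invented.

```python
NEGATIVE = frozenset("DE")  # ASP, GLU
POSITIVE = frozenset("KR")  # LYS, ARG


def count_with_like_neighbor(sequence, charged):
    """Count residues of one charge type that have a residue of the
    same charge type immediately before or after them in sequence."""
    count = 0
    for i, residue in enumerate(sequence):
        if residue not in charged:
            continue
        prior_like = i > 0 and sequence[i - 1] in charged
        next_like = i + 1 < len(sequence) and sequence[i + 1] in charged
        if prior_like or next_like:
            count += 1
    return count


def net_adjacent_charge(sequence):
    """Absolute difference between positive and negative adjacent-charge counts."""
    seq = sequence.upper()
    positive = count_with_like_neighbor(seq, POSITIVE)
    negative = count_with_like_neighbor(seq, NEGATIVE)
    return abs(positive - negative)


def net_adjacent_charge_density(sequence):
    """Net adjacent charge normalized for peptide length N."""
    return net_adjacent_charge(sequence) / len(sequence)
```

For the invented sequence "AEEDAKKA", three negative residues and two positive residues have a like-charged neighbor, so the net adjacent charge is 1 and the density is 1/8.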
Fig 9
Correlation of normalized error in predicted Rh to net adjacent charge density.
Panel A symbols and lines match their Fig 4 representations. Panel B shows correlations (R²) between normalized error and net adjacent charge density for all IDPs, IDPs in the high charge bias group (labeled as “high bias”), and IDPs in the low charge bias group (labeled as “low bias”). Red columns are correlations from using the Kallenbach propensity scale to predict Rh, blue from using the Creamer propensities, white the Hilser propensities, and black the composite propensity scale.
The set of IDPs was also split according to the amount of mixing of positive and negative charge types in a given sequence. To do this, a “charge bias” was calculated for each IDP as the simple ratio of total negative charges (sum of ASP and GLU residues) to total positive charges (sum of LYS and ARG residues), or vice versa, depending on which ratio gave a value greater than 1. As a metric for separating IDPs with “high” and “low” charge bias, a “typical” charge bias was calculated for the entire dataset from the concatenated sequence and found to be 1.9. The average IDP charge bias, found to be 4.2, was not used to separate IDPs since: 1) ratio-based distributions are skewed, 2) only 7 IDPs would have been in the “high” charge bias set, and 3) 4 of these 7 were sequences derived from the p53 protein. Using the charge bias of the concatenated sequence gave 12 IDPs in the high charge bias set and 10 IDPs in the low charge bias set.
Fig 9B shows that correlations between net adjacent charge density and normalized error in predicted Rh persisted in the set of IDPs with high charge bias and mostly disappeared for IDPs with low charge bias, seeming to agree with the simulation prediction that significant mixing of positive and negative charge types in a sequence should reduce charge effects on Rh. Applying this analysis to net charge density gave different results (S4 Fig). Correlations between net charge density and normalized error in predicted Rh decreased for both the high and low charge bias sets. This could be owing to trends shown in Fig 6, whereby net charge effects on Rh depended strongly on the distance between the charged groups. Overall, these results seem to indicate that charge effects on IDP structures are highly dependent on sequence; however, charge effects on Rh can be weakened substantially by mixing negative and positive charge types or by slight increases in the distances between charged groups in sequence. The hypothesis that charge effects on Rh may be generally weak for IDPs is supported by data in Fig 3B showing that Rh could be predicted without specific consideration of charges when provided an appropriate amino acid scale for intrinsic PPII propensities.
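The charge bias ratio and the split of the dataset at the bias of the concatenated sequence can be sketched as follows. This is a sketch under stated assumptions rather than the authors' code: sequences are taken to be in one-letter code, a sequence with only one charge type is assigned an infinite bias, a sequence with no charged residues is assigned a bias of 1, and "high" is taken to mean strictly greater than the concatenated-sequence value. None of these edge-case choices is specified in the text, and the function names are hypothetical.

```python
import math


def charge_bias(sequence):
    """Ratio of the more abundant charge type to the less abundant one (>= 1)."""
    seq = sequence.upper()
    negative = seq.count("D") + seq.count("E")  # ASP + GLU
    positive = seq.count("K") + seq.count("R")  # LYS + ARG
    larger, smaller = max(negative, positive), min(negative, positive)
    if larger == 0:
        return 1.0       # assumption: no charged residues, treated as unbiased
    if smaller == 0:
        return math.inf  # assumption: only one charge type present
    return larger / smaller


def split_by_charge_bias(sequences):
    """Split sequences at the charge bias of the concatenated dataset."""
    threshold = charge_bias("".join(sequences))
    high = [s for s in sequences if charge_bias(s) > threshold]
    low = [s for s in sequences if charge_bias(s) <= threshold]
    return threshold, high, low
```

Using the concatenated sequence as the threshold weights each IDP by its number of charged residues, which avoids the skew that the text notes for a simple average of per-protein ratios.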
Discussion
Fig 1 shows that experimental Rh for IDPs are much larger than computational predictions based on random coil modeling of the Rh dependence on N. Numerous studies have demonstrated the importance of Coulombic effects for regulating IDP structural preferences [13-15]. Thus, it could be surprising to note that sequence effects on IDP Rh can be predicted with good agreement from sequence differences in PPII propensity, even when other intramolecular factors are ignored. Rh predicted from IDP sequence and Eq (6) seemed to work best when using an experimental PPII propensity scale from Hilser and colleagues [19], or a composite scale that combined the Hilser, Kallenbach [17], and Creamer [18] propensities, giving an average error of ~2.5 Å for an IDP dataset covering a wide range of residue lengths, net charge, and sequence composition. As examples of sequence differences in this dataset, the fractional number of PRO residues (fPRO = (# PRO residues)/N) varied from 0 to 0.24, SER from 0.02 to 0.20, GLU from 0.06 to 0.31, and ALA from 0 to 0.16, indicating significant sequence diversity among the IDPs that were tested.
If it were established that molecular descriptions for Rh depend mostly on PPII propensities for disordered proteins, this would have important implications. First, Rh well above random coil estimates would indicate non-trivial preferences for PPII structure. Fig 1 shows this to be the case for many IDPs. Second, large variations in Rh for IDPs with similar N would indicate large differences in propensity for PPII structure among the biologically common amino acids. Observed differences in amino acid propensity for PPII [17-19,53] are thus consistent with the observed differences in Rh for IDPs with similar N. For example, consider that Rh varied from 24.5 Å to 32.4 Å for IDPs with N = 87–97 in Fig 1. The average prediction error in Rh for these 8 IDPs from using Eq (6) and the composite propensity scale was only 1.7 ± 0.7 Å, though net charge ranged from 4 to 29 for these proteins. In contrast, predictions using random coil values give Rh from 20.5 to 21.7 Å with an average error of 6.4 ± 2.7 Å.
The simulation-derived relationship between Rh, N, and fPPII appears to be surprisingly simple for disordered proteins. As noted above, Eq (6) should be interpreted as an ideal relationship that excludes many molecular factors known to regulate structural preferences in proteins (e.g., electrostatic effects, cis-trans isomerization rates). Observed deviations from this “ideal” behavior can then be interpreted in terms of factors that were not modeled, as shown (Fig 4B). We recognize that exclusive use of poly-ALA for computational modeling may prove to be unjustified with further studies. Poly-ALA was used as a simplifying step since the effects of N on Rh were mostly independent of amino acid sequence in previous HSC-based simulations and agreed with general IDP trends determined from a literature survey [22,49]. As shown here, this simulation-derived relationship provides a straightforward molecular explanation for Rh variations among IDPs. The Rh dependence on fPPII also predicts heat-induced compaction of IDP Rh since the enthalpy of unfolding PPII structure is positive [16,64]. Many studies have demonstrated Rh compaction caused by elevated temperatures for IDPs [22,43,44].
As noted above, the simulation results presented here could be interpreted as indicating that charge effects on Rh are generally weak for IDPs, relative to the effects of intrinsic PPII propensities. These data demonstrate, however, that certain sequence patterns of charge can modulate Rh substantially (see Fig 6). For charged groups, this would be those that are separated at distances averaging less than the solution Debye length, involving identical charge type (i.e., positive or negative), and within a region showing higher than typical charge bias. These general rules are in qualitative agreement with results from Pappu and colleagues showing that simulated hydrodynamic sizes for highly charged and disordered polypeptides, with every residue modeled as GLU or LYS, depend strongly on the mixing of negative and positive charge types [15]. In that study, mixing of charge types in a sequence caused structural compaction relative to biased charge distributions, similar to our own conclusions. The observation that unfavorable charge-charge interactions between side chain groups can promote PPII structure (Figs 6A and 7 insets) has also been noticed in computational studies from other researchers [14,65]. This result predicts multiple mechanisms for charge-mediated regulation of IDP structure, possibly owing to both the accumulation of charge and local modulation of PPII propensities. Overall, these data demonstrate the importance of sequence context for understanding the structural properties of IDPs and for describing quantitatively how disordered protein structures respond to discrete perturbations such as changes in charge state and amino acid substitutions.
Methods
Computer generation of polypeptide structures
A detailed description of the computer algorithm that was used is provided elsewhere [22,24]. Briefly, simulations of disordered protein structures were limited to poly-ALA polypeptides. Main chain atoms of poly-ALA were generated using the standard bond angles and bond lengths [66] and a random sampling of the dihedral angles Φ, Ψ, and ω. The dihedral angle ω was given a Gaussian fluctuation of ±5° around the trans value of 180°. To sample conformational space efficiently, (Φ,Ψ) values were restricted to the allowed Ramachandran regions [67]. Of the two possible positions of the side chain Cβ atom, the one corresponding to L-alanine was used throughout the studies. To calculate state distributions typical of protein ensembles, a structure-based energy function parameterized to solvent-accessible surface areas was used to population-weight the generated structures [54-62].
Comparison of fPPII and SPPII.
In this figure, SPPII is the average applied sampling rate for PPII for residues with SPPII ≠ 0 in a simulation, while fPPII is the observed per-position average PPII rate, also excluding residues with SPPII = 0. Open circles are from ensembles where position-specific SPPII followed the pattern specified in the text (i.e., different simulations had different SPPII ranging from 0.1 to 0.9 in 0.1 increments applied to each residue, every other residue, every third residue, etc.), which is why circles align at SPPII = 0.1–0.9 in 0.1 increments. Blue circles give the average fPPII for each applied SPPII. Open squares represent this calculation performed on simulations using randomly assigned position-specific SPPII. Stippled line is the identity; solid line is the relationship between fPPII and SPPII established previously for SPPII applied at constant values across all residues [22]. In general, fPPII trends with SPPII by: fPPII = SPPII − 0.062∙exp(−(SPPII − 0.63)²/(2∙0.28²)). This gives the algorithm the ability to target specific fPPII from the applied value of SPPII.
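The relationship between the applied sampling rate and the observed PPII fraction given in this caption can be evaluated directly, and inverted numerically to find the sampling rate that targets a chosen fraction. The sketch below uses the coefficients stated in the caption; the bisection-based inversion and the function names are illustrative assumptions, not the procedure used in the study.

```python
import math


def expected_fppii(s_ppii):
    """Observed PPII fraction expected for an applied sampling rate,
    using the empirical relationship quoted in the caption."""
    return s_ppii - 0.062 * math.exp(-((s_ppii - 0.63) ** 2) / (2 * 0.28 ** 2))


def sppii_for_target(f_target, tolerance=1e-10):
    """Find the applied sampling rate in [0, 1] that gives f_target.

    Assumption: simple bisection, which is valid because the
    relationship increases monotonically over this interval.
    """
    low, high = 0.0, 1.0
    if not expected_fppii(low) <= f_target <= expected_fppii(high):
        raise ValueError("target fraction is outside the reachable range")
    while high - low > tolerance:
        middle = 0.5 * (low + high)
        if expected_fppii(middle) < f_target:
            low = middle
        else:
            high = middle
    return 0.5 * (low + high)
```

The Gaussian correction is largest at an applied rate of 0.63, where the expected fraction is 0.63 − 0.062 = 0.568, and it becomes small toward either end of the range.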
Correlation of experimental PPII propensities for the common amino acids.
Panel A, correlation of Kallenbach [17] and Creamer reported values [18]. Panel B, correlation of Kallenbach and Hilser reported values [19]. Panel C, correlation of Creamer and Hilser reported values. Panel D, correlation of Creamer and Zondlo reported values [53].
Simulated effect of positively charged residues on Rh.
Stippled line is Rh from Eq (6) with N = 25 and fPPII from 0 to 0.98. Symbols are simulated Rh from ensembles of poly-ALA (N = 25) using Eq (3). Filled circles have each residue modeled with positive charge at the Cβ atom. Filled squares have every other residue modeled with positive charge, filled triangles have every third residue modeled with positive charge, and X represents every fourth residue modeled with positive charge. Inset: comparison of observed fPPII to fPPII expected from the applied SPPII (following Fig 6A inset description). Inset symbols match panel representations.
Correlation of normalized error in predicted Rh to net charge density.
Shown are correlations (R²) between normalized error and net charge density for all IDPs, IDPs in the high charge bias group (labeled as “high bias”), and IDPs in the low charge bias group (labeled as “low bias”). Red columns are correlations from using the Kallenbach propensity scale to predict Rh, blue from using the Creamer propensities, white the Hilser propensities, and black the composite propensity scale.