Literature DB >> 32345021

Investigating the Conformational Ensembles of Intrinsically Disordered Proteins with a Simple Physics-Based Model.

Yani Zhao¹, Robinson Cortes-Huerto¹, Kurt Kremer¹, Joseph F Rudzinski¹.

Abstract

Intrinsically disordered proteins (IDPs) play an important role in an array of biological processes but present a number of fundamental challenges for computational modeling. Recently, simple polymer models have regained popularity for interpreting the experimental characterization of IDPs. Homopolymer theory provides a strong foundation for understanding generic features of phenomena ranging from single-chain conformational dynamics to the properties of entangled polymer melts, but is difficult to extend to the copolymer context. This challenge is magnified for proteins due to the variety of competing interactions and large deviations in side-chain properties. In this work, we apply a simple physics-based coarse-grained model for describing largely disordered conformational ensembles of peptides, based on the premise that sampling sterically forbidden conformations can compromise the faithful description of both static and dynamical properties. The Hamiltonian of the employed model can be easily adjusted to investigate the impact of distinct interactions and sequence specificity on the randomness of the resulting conformational ensemble. In particular, starting with a bead-spring-like model and then adding more detailed interactions one by one, we construct a hierarchical set of models and perform a detailed comparison of their properties. Our analysis clarifies the role of generic attractions, electrostatics, and side-chain sterics, while providing a foundation for developing efficient models for IDPs that retain an accurate description of the hierarchy of conformational dynamics, which is nontrivially influenced by interactions with surrounding proteins and solvent molecules.

Entities: Chemical Disease Gene Species

Mesh：

Substances：

Year: 2020 PMID： 32345021 PMCID： PMC7246978 DOI： 10.1021/acs.jpcb.0c01949

Source DB: PubMed Journal: J Phys Chem B ISSN： 1520-5207 Impact factor: 2.991

Introduction

Despite lacking stable tertiary structure under physiological conditions, intrinsically disordered proteins (IDPs) are involved in a large number of important biological functions, including intracellular signaling and regulation, and are also associated with a broad range of diseases, including cancer, neurodegenerative diseases, amylidoses, diabetes, and cardiovascular disease.[1,2] The experimental characterization of IDPs is complicated by the heterogeneous nature of their disordered conformational ensembles (i.e., conformational distributions), which challenges traditional techniques developed for folded proteins. For example, X-ray crystallography and cryo-EM, which recover high-resolution images of biomolecules in the crystalline or frozen state, are fundamentally inappropriate for characterizing the distribution of relevant IDP conformations.[3] However, techniques including nuclear magnetic resonance (NMR), small-angle X-ray scattering (SAXS), single-molecule Förster resonance energy transfer (FRET), dynamic light scattering (DLS), and two-focus fluorescence correlation spectroscopy (2f-FCS) are capable of identifying the conformational transitions sampled by IDPs,[4−7] since they perform measurements of the protein as it fluctuates within its “natural” environment. However, these measurements provide limited resolution in terms of the specification of a unique corresponding microscopic distribution of conformations. In other words, there may exist multiple distinct conformational ensembles which reproduce the experimental measurements, requiring molecular models to infer the correct underlying distribution. As a result, molecular simulations have become increasingly important tools for obtaining microscopic insight that supports experimental observations (e.g., for the characterization of IDP conformational ensembles).[4] All-atom (AA) models have gained significant popularity for providing detailed descriptions of complex biomolecular processes and, in conjunction with reweighting techniques, can also be used to assist in the interpretation of experimental measurements. The application of AA simulations to study IDPs has brought to light transferability problems of standard models, which were constructed to stabilize three-dimensional structures of folded proteins. These force fields not only predict overly compact structures,[8] but distinct AA models can also generate widely varying and qualitatively different secondary structure content for a given protein sequence.[9] Recent efforts have been made to adjust these models to more accurately describe the properties of IDPs.[8,10,11] Despite these improvements, AA simulations remain prohibitively expensive for investigating the environment-dependent conformational dynamics of IDPs, due to the expansive conformational landscape traversed by these systems. Moreover, the large range of time scales (from picoseconds to hours), thermodynamic or chemical conditions (e.g., denaturation concentrations), as well as system variations (e.g., sequence mutations) commonly explored in experimental studies represent an overwhelming gap in computational accessibility for AA models that is unlikely to be overcome in the near future through improvements in software or hardware. The computational expense of these detailed models has motivated the use of much simpler polymer models[12,13] (e.g., ensemble construction methods[14] or analytically solvable polymer models)[4] to provide microscopic interpretations for the experimental characterizations of processes involving IDPs. The disordered nature of IDPs results in conformational heterogeneity and broad intramolecular distance distributions, reminiscent of generic models from the study of polymer physics.[15] However, these models are limited in resolution and often lack the ability to provide significant microscopic insight beyond what can already be inferred from experiments. Moreover, the simplicity of the model approximations have been shown to generate inconsistencies in the interpretation of experimental measurements.[16−19] Native-biased models (e.g., Go̅-type models)[20], which use experimentally determined protein structures to construct a potential energy function with the protein’s native state at the global minimum, have contributed immensely to our basic understanding of the driving forces for protein folding.[21−23] When combined with additional non-native interactions, these models provide a straightforward route to elucidate the essential features for reproducing a given experimental observation.[24−26] Although these models have been useful for investigating the environment-dependent folding processes of IDPs[27,28] (i.e., coupled folding and binding processes), their reliance on a well-defined native structure limits their ability to describe unfolded or disordered conformations. This limitation can even propagate into the characterization of the folding process of globular proteins, resulting in a qualitatively incorrect representation of folding pathways.[29] Recent work from Shell and co-workers aims to partially alleviate this limitation by combining transferable bonded interactions with traditional nativelike “nonbonded” interactions.[30] There have also been significant advancements in the development of physics-based coarse-grained (CG) models to describe the temperature-dependent collapse and liquid–liquid phase separation of IDPs.[31,32] Recently, Rudzinski and Bereau proposed a simple physics-based model[33] for describing largely disordered conformational ensembles of peptides. The foundational premise of the model is that the sampling of sterically forbidden conformations, due to missing degrees of freedom, can seriously complicate the faithful description of both static and dynamic properties in CG models of proteins. This complication is perhaps most severe for disordered ensembles, where conformational entropy plays an important role in shaping the free-energy landscape. For this reason, the steric interactions and local stiffness of the protein are described at a united-atom resolution (i.e., explicit representation of all heavy atoms). These interactions account only for excluded volume and chain stiffness, without explicit attractions between atoms which reside at significant separation along the peptide chain. In addition to these detailed interactions, CG attractive interactions are added to represent the characteristic driving forces for peptide secondary and tertiary structure formation. For example, in the introductory studies,[33,34] the authors employed a generic attractive interaction between Cβ carbons in order to model the effective attractions between side chains due to the hydrophobic effect. Additionally, attractive interactions between Cα atoms separated by three peptide bonds along the protein backbone were employed to model helix-forming hydrogen-bonding interactions. These two interactions represent the minimum set of interactions necessary for qualitative reproduction of the conformational ensemble of short peptides, that is, to sample helical, coil, and swollen (i.e., hairpinlike) structures. The model was shown to accurately characterize both structural and kinetic properties of helix–coil transitions in small peptides, demonstrating its potential for efficiently describing disordered ensembles, while retaining relevant microscopic details.[33,34] Furthermore, the Hamiltonian of the model can be easily adjusted to investigate the driving forces for particular processes. In this work, we apply variants of this simple physics-based model to investigate the role of distinct interactions in shaping disordered protein ensembles. As a model system, we consider the activation domain, ACTR, of the SRC-3 protein, a “fully disordered” protein with only transient helical propensity.[35−37] One way in which IDPs perform their function is by adapting to their environment through so-called coupled folding and binding processes.[38] For example, ACTR can form a structured complex with the nuclear-coactivator binding domain (NCBD) of the transcriptional coactivator CREB-binding protein (CBP), which plays an important role in the regulation of eukaryotic transcription.[39] CBP demonstrates the functional advantages of IDPs in the regulation of genes,[39] participating in interactions with more than 400 transcription factors in the cell.[40] In the absence of a binding partner, NCBD is a molten globule with three substantial helical regions[39] but undergoes coupled folding and binding processes with a variety of distinct ligands.[41] Within the NCBD/ACTR complex, the three helices of NCBD form a bundle with a hydrophobic groove in which ACTR is docked, and the assembly of the two proteins promotes three helices in ACTR[37] (see Figure ).

Figure 1

Visualization of the NCBD/ACTR folded complex (PDB ID: 1KBH). The number labels correspond to residue numbers at the beginning and ends of helices formed by NCBD (red) and ACTR (blue).

Visualization of the NCBD/ACTR folded complex (PDB ID: 1KBH). The number labels correspond to residue numbers at the beginning and ends of helices formed by NCBD (red) and ACTR (blue). Great efforts have been made to understand how IDPs recognize their binding partners.[42,43] For example, electrostatic attractions have been shown to play an important role in driving the formation of encounter complexes between the binding pair.[38,44] Additionally, the change in the solvent-accessible surface area of IDP residues upon binding suggests that IDPs can utilize different residues along the amino acid sequence for interactions with different binding partners.[2] The conformational diversity of IDPs leading to the folded state makes it challenging to precisely characterize their binding mechanisms both experimentally and computationally.[45,46] Previous work has identified two limiting mechanisms of coupled folding and binding: “conformational selection” and “induced fit”. The conformational selection mechanism is characterized by an IDP which samples the relevant folded structure (or some fraction of this structure) within the unbound ensemble. In the induced fit mechanism, the folded state only arises within the conformational ensemble of the IDP through interactions with its binding partner. In practice a combination of these is typically observed.[35,47,48] Thus, a key step to describing the binding mechanism for a particular IDP/partner pair, especially in cases where conformational selection is prominent, is to characterize the unbound ensembles of the molecules. Previous computational work employing Go̅-type models has found that the NCBD/ACTR folding process demonstrates dominant characteristics of the induced fit mechanism.[47] However, a mechanistic shift toward conformational selection is also possible when NCBD attains a distinct folded structure after a proline isomerization.[49] Computational investigations of coupled folding and binding typically employ models that are not explicitly constructed to accurately represent the unbound ensembles of the individual binding partners. While the unbound ensemble of NCBD has been analyzed using both AA and CG simulations,[35,36,50] the unbound conformational behavior of ACTR has not, to our knowledge, been investigated in detail. Instead, ACTR is usually taken to be a fully disordered ensemble, as characterized by simple polymer models.[15,51] The present investigation employs an intermediate resolution physics-based model to characterize the ensemble of ACTR in the absence of a binding partner. This model enables systematic analysis of the impact of distinct interactions and sequence specificity on the randomness of the resulting conformational ensemble. In particular, starting with a model akin to a bead–spring (BS) model and then adding more detailed interactions one by one, we construct a hierarchical set of models and perform a detailed comparison of their properties. Our analysis shows the following: (i) The incorporation of generic attractions between amino acid side chains significantly expands the diversity of the conformational ensemble, without severely perturbing the distribution of the radius of gyration. (ii) Electrostatic interactions can increase the ruggedness of the conformational landscape, reducing the overall conformational heterogeneity, but can simultaneously introduce additional routes for stabilizing particular secondary structures. (iii) Side chain sterics play a crucial role in determining the overall shape of the free-energy landscape through stabilization of particular structural motifs. The rest of the paper is organized as follows. In section , the hierarchical set of models, associated simulation protocol, and relevant analysis tools are described in detail. Section presents a detailed characterization of the hierarchy of CG models describing the unbound conformational ensemble of ACTR. Two additional polypeptides are also considered to investigate the effect of side chain excluded volume on the conformational ensembles of IDPs. Then, the transferability to the unbound ensemble of NCBD using the more detailed models is assessed. Finally, section provides a brief discussion and conclusions from the investigation.

Methods

Protein Sequences

This work considers the activation domain, ACTR, of the SRC-3 protein and the nuclear-coactivator binding domain (NCBD) of the transcriptional coactivator CREB-binding protein (CBP). The amino acid sequences of ACTR and NCBD are given by:where eqs and 2 are the sequences for ACTR and NCBD with 71 and 59 residues, respectively. The spacing in the equations separate the sequences into groups of 10 residues. Hydrophobic residues are labeled in red font, while the positively and negatively charged residues are denoted by “+” and “–”, respectively. Upon interaction, NCBD and ACTR form a stable folded complex (Protein Databank (PDB) ID: 1KBH, see Figure ).

A Simple Physics-Based Model for Describing Disordered Ensembles

ACTR and NCBD were modeled using a physics-based approach that represents the protein in near-atomic detail while treating the solvent implicitly through effective interactions between protein atoms.[33,34] The total potential energy function of the model can be written as a sum of three terms:Uloc represents local interactions contributing to chain connectivity and stiffness and employs the standard functional forms and parameters for bond, angle, dihedral, and 1–4 interactions given by the Amber99SB-ILDN force field[52,53] (see Figure S1). For reasons that will become clear below, we write Uloc as a sum of two contributions:where Ubond represents the bond interactions between pairs of covalently bonded atoms and Ustiff represents the remaining local interactions listed above. Uexc represents excluded volume interactions at a united-atom resolution (i.e., an explicit representation of all heavy atoms, without hydrogens). The excluded volume interaction for each heavy atom pair was determined by transforming the Lennard-Jones interactions between the pair (again given by the Amber99SB-ILDN force field) to a Weeks–Chandler–Andersen (WCA) potential (i.e., a purely repulsive potential). Uatt represents the attractive interactions employed between Cα, Cβ, and representative side-chain atoms and can contain several distinct contributions. In this work, we consider a hierarchy of eight different models which systematically build upon each other (see Table ). The first model employs only bond and excluded volume interactions similar to the self-avoiding random walk model from polymer theory: U(1) = Ubond + Uexc.[12] The second model adds stiffness to the chain by incorporating the other local interactions: U(2) = Uloc + Uexc. The remaining models employ the full Utot potential with varying representations of Uatt: U(id) = Uloc + Uexc + Uatt(id), for id ∈ {3a, 3b, 4, 5a, 5b, 6}. Model 3 employs attractive interactions between Cβ atoms, Uhp, to model the hydrophobic attraction between side chains: with σ = 0.5 nm. We consider two variants of model 3: (i) x = a, where the same parameter is employed for all amino acids (denoted homo), ϵhp,(a) = ϵhp, and (ii) x = b, where the parameter depends on the identity of the pair of residues (denoted hetero), , where εhp, is determined according to the Miyazawa–Jernigan interaction matrix[54] (see Figure S2 and Table S1). More specifically, to set the absolute scale of these interactions, we followed the work of Bereau and Deserno.[55] Briefly, the 20 × 20 Miyazawa–Jernigan interaction matrix is reduced to 20 residue-specific energy values, which approximately generate the full matrix through geometric averages between pairs of residue types. These energy values are then normalized to be between 0 (most hydrophilic) and 1 (most hydrophobic). Finally, a single overall interaction scale, εhp, is chosen to determine all values of εhp, simultaneously. Model 4 builds upon model 3 by incorporating electrostatic interactions, UDH, between charged residues: Uatt(4) = Uhp(b) + UDH. These interactions are described at a coarse-grained level of resolution (see Figure S3), using the Debye–Hückel formalism,[56] where the full point charge is placed on the last side chain carbon (i.e., furthest from the backbone) for each charged residue: arginine (R), lysine (K), aspartic acid (D), and glutamic acid (E). In particular, the electrostatic energy is given bywhere kJ mol–1 nm e–2 and ε = 80 at room temperature for monovalent salt; κ–1 is the Debye screening length; q and q are the point charges of the ith and jth charged sites; and r is the distance between these sites. κ–1 = 0.313 I–1/2 nm mol1/2 L–1/2, where is the ionic concentration of the solution, n is the number of unique ionic species, and c is the molar concentration of the ion type i with charge q.[57] Employing physiological concentrations, c = 0.1 mol/L for all ions, we obtain κ–1 = 1 nm.

Table 1

Overview of Interactions for Model Hierarchy

model id	U_bond	U_stiff	U_exc	U_hp^(x)	hp type	U_hb	U_DH
1	yes	no	yes	no	N/A	no	no
2	yes	yes	yes	no	N/A	no	no
3a	yes	yes	yes	yes	homo	no	no
3b	yes	yes	yes	yes	hetero	no	no
4	yes	yes	yes	yes	hetero	no	yes
5a	yes	yes	yes	yes	homo	yes	no
5b	yes	yes	yes	yes	hetero	yes	no
6	yes	yes	yes	yes	hetero	yes	yes

Model 5 builds upon model 3 by incorporating “local” hydrogen-bonding interactions, Uhb, between Cα atoms that are separated by three residues along the peptide backbone: Uatt(5 = Uhp( + Uhb. This interaction ensures that the proteins are capable of forming α-helical conformations. The incorporation of 1–4 hydrogen bonds independently from hydrogen bonds occurring between residues farther apart along the peptide chain allows the independent investigation of the driving forces for helical versus β-sheet conformations. The latter are not considered in the present study since ACTR does not have a substantial propensity toward β-sheet formation. In a way, the local hydrogen bonds represent a “nativelike” interaction for peptides that fold into a single helix. For this reason, the model was originally designated as a “hybrid Go̅” model, indicating the combination of atomically detailed physics-based interactions with simplistic (possibly natively biased) attractive interactions at a coarser level of resolution. Note that in previous work the hydrogen-bonding interaction was denoted nc for “native contact”. Following previous work employing native-biased CG models, we employ a hydrogen-bonding interaction with a Lennard-Jones form along with a desolvation barrier using the following functional form:[24]Uhb = ∑Udb,, whereIn eq , rcm = 0.5 nm is the position of the first potential minimum with a corresponding depth of εhb, and rdb = 0.65 nm is the position of the desolvation barrier maximum with a corresponding height of εdb = 0.4εhb. Z(r) = (rcm/r), Y(r) = (r – rdb)2, , B = mεssm(rssm – rdb)2( with εssm = εdb/100 and rssm = rcm + 0.3 nm, , and . The parameters k = 6, m = 3, and n = 2 control the shape of Uhb (see ref (33) for a plot of the potential). Again, two variants of Uhp( are considered, with homo- and heterotype interactions for x = a and x = b, respectively, as described above. Finally, model 6 also incorporates electrostatic interactions: Uatt(6) = Uhp(b) + Uhb + UDH. The hierarchy of models employed in this work is summarized in Table . Previous work using model 4 performed an extensive search in parameter space to characterize the behavior of the model in the context of helix–coil transitions of short peptides.[33,34] Here, we tune the parameters of the model in an attempt to accurately describe the conformational ensemble of ACTR. There are no adjustable parameters for the local, excluded volume, and electrostatic interactions. Moreover, as described above, several of the parameters for the hydrogen-bonding interactions have been fixed based on previous work.[33,34] Thus, the models are left with just two free parameters: εhp and εhb. εhp was initially determined by simulating model 3a with various parameter values and then comparing the generated Rg distribution with that determined from experimental measurements.[6] For model 3b (hetero hp type), the residue-specific hydrophobic attractions were applied such that the average hydrophobic interaction energy (i.e., the average value of εhp, along the chain) was identical to that of model 3a (homo hp type). εhp,(b) for ACTR is presented in Figure S3. After fixing εhp, εhb was determined by simulating model 4 with various parameter values and then comparing the generated average fraction of helical segments per residue, h(i), to experiments.[6] With the exception of the difference in εhp used for the homo and hetero variants described above, identical εhp and εhb parameters were employed for the entire hierarchy of models, wherever applicable. We hypothesize that the very accurate representation of sterics in the model will result in energetic parameters that are quite sequence-transferable for sequences that exhibit largely disordered ensembles. A challenging test of transferability is assessed toward the end of this work by considering the molten globule NCBD. When comparing models with fundamentally different interactions, there is no unique procedure for calibrating the energy scales of the models. When the interaction sets are not entirely different (as is the case for the hierarchy of models considered here), one option is to evaluate the models on the same absolute temperature scale, as dictated by the simulation protocol. This would lead to different ensemble properties at the relevant temperature, due to changes in the incorporated interactions. Alternatively, one can work with a reduced temperature scale, defined with respect to a reference temperature, T*, at which a particular ensemble property is reproduced. We follow this latter approach in the present work, and define T* as the temperature at which the average experimental radius of gyration is reproduced. In terms of the absolute temperatures employed in the simulation protocol, T* corresponds to 300 K for ACTR for models 3a, 3b, 5a, and 5b and 270 K for models 4 and 6. For NCBD, T* corresponds to 330 K for the two considered models, 5b and 6.

Simulations

All simulations of the hierarchical set of physics-based models were performed with the GROMACS 4.5.5 simulation suite[58] in the constant NVT ensemble while employing the stochastic dynamics algorithm with a friction coefficient γ = (2.0 τ)−1 and a time step of 1 × 10–3 τ. The CG unit of time, τ, can be determined from the fundamental units of length, mass, and energy of the simulation model. Employing any one of the Lennard-Jones radii and energies from the Amber99SB-ILDN force field yields a time unit on the order of 1 ps. We report the connection to physical units since the models are simulated using these units within the GROMACS suite. For simplicity, we define τ = 1 ps, and report the simulation protocol in units of τ. This relationship to physical units does not provide any meaningful description of the absolute time scale of characteristic dynamical processes generated by the model, due to a lost connection to the true dynamics.[59] The present study focuses on ensemble-averaged properties of the generated ensembles and does not attempt to calibrate or interpret the generated dynamics, although previous studies with this model have demonstrated the faithful reproduction of kinetic processes for secondary-structure formation.[33,34] For each peptide, a single chain was placed in a cubic box with a volume of (20 nm)[3] and simulated without periodic boundary conditions. Thus, no explicit cutoffs were used for the interaction functions described in the previous section. Replica exchange simulations[60] were performed to enhance the sampling of the system. In total, 16 temperatures ranging from 225 to 450 K were scanned with an average acceptance ratio of 0.4. These represent absolute simulation temperatures, which were transformed to reduced temperatures for comparison of different models (as described above). The exchange of replicas was attempted every 500 or 1000 τ, and each simulation was run for at least 500 000 τ. The convergence of the simulations were assessed by randomly dividing each trajectory into two groups and then checking for consistency of various observables, including the average radius of gyration and the average fraction of helical segments, as well as autocorrelation functions of the radius of gyration and of the end-to-end distance. Representative examples of the convergence tests are presented in Figures S4 and S5. For comparison with more generic polymer ensembles, we considered a BS model (often referred to as the Kremer–Grest model),[61] which represents each monomer (i.e., residue) with a single CG site. Connections between monomers are represented with the finite extensible nonlinear elastic (FENE) potential. We considered two variations of the BS model, which differed in the treatment of nonbonded interactions. The first (denoted “BS”) employed a purely repulsive WCA potential to represent interactions between monomers, while the second (denoted “BS-LJ”) employed a standard Lennard-Jones (LJ) potential with a cutoff rc = 2.5σ. The properties of the BS models are determined in reduced units in terms of the LJ interaction radius, σ, the well depth, ε, and the mass, m, of a monomer. The corresponding time unit is . The BS models were simulated at a temperature of T* = 2.0 ε/kB. Simulations of the BS models were performed with the ESPResSo++ package.[62] Each simulation employed a time step of 0.005 τ and was run for 3.2 × 109 τ, while using the Langevin thermostat with a damping coefficient of 1.0 τ–1.

Analysis

Polymeric Behavior

Because IDPs possess some properties similar to those of more generic polymer systems, such as long-range fluctuations and structural heterogeneity, traditional polymer physics analysis can be useful for providing an overarching description of the conformational ensembles of IDPs.[15] The single-chain backbone structure factor, which characterizes the overall shape of a molecule, is given by[63,64]where N is the number of residues (N = 71 for ACTR and N = 59 for NCBD) and q is the wave vector. r corresponds to the position of the Cα atom of the ith residue for the physics-based models and the position of the ith bead for the BS models. S(q) is widely used to characterize polymer systems.[64] We also calculated the shape parameters (radius of gyration), Re2 = (r–r1)2 (end-to-end distance), and (inter-residue distance between the Cα atoms). We will use the notation , where X = {Rg, Re, dC(i, j)}. The average (real space) distance between two residues separated by m residues along the chain is calculated as , where is a sum over all ij pairs with |j – i| = m and N* is the number of such pairs. Note that , where ν is the Flory scaling exponent. Thus, it is useful to consider the normalized quantitysuch that is constant for a random walk and proportional to m0.1 for a self-avoiding random walk.[13,64] For a slightly more detailed view of the ensemble, we also calculated contact probability maps, which are obtained by determining the probability that a pair of Cα atoms are within a given cutoff distance, rc, from one another. In this case, we have chosen rc = 1.0 nm. Additionally, we calculated the gyration tensor:where m, n ∈ {x, y, z}. Note that only the Cα atoms were taken into account in the calculation of the gyration tensor, for consistency with the BS models. The eigenvalues of S are calculated and ordered as λ1 ≤ λ2 ≤ λ3. The asphericity of the chain can be characterized in terms of these eigenvalues: . The asphericity values reported throughout the text are normalized by Rg2 = λ3 + λ2 + λ1: b̃ = b/Rg2. For a self-avoiding random walk, the ratios of eigenvalues are λ3:λ2:λ1 ≅ 12:3:1 (i.e., λ3/λ1 = 12 and λ3/λ2 = 4).[65]

Helical Propensity

The helical propensity of the peptide is characterized by the average fraction of helical segments, h(i), for each residue i. h(i) is calculated within the context of the Lifson–Roig formulation,[66] which represents the state of each residue as being in either a helical, h, or coil, c, state.[67] More specifically, h(i) is defined as the average propensity of sequential triplets of h states along the peptide chain. Following previous work,[68] we define the helical region of the Ramachandran (ϕ, ψ) map as ϕ ∈ [−160°, −20°] and ψ ∈ [−120°, 50°], although the precise definition has little impact on h(i).

Dimensionality Reduction and Clustering

The conformational landscape of disordered proteins is difficult to characterize within a low-dimensional representation. Linear dimensionality reduction methods typically fail to provide meaningful representations, due to the high level of structural heterogeneity and subtle distinctions between different sub-ensembles. Nonlinear manifold learning methods overcome the limited ability of linear methods to capture nonlinear relationships in the data and can determine the low-dimensional embedding based on a wide variety of criteria. These methods have been more successful in finding low-dimensional embeddings which provide a clear picture of distinct structures in disordered landscapes.[69,70] Here we employed the Uniform Manifold Approximation and Projection (UMAP) method, a type of multidimensional-scaling algorithm that attempts to find a balance between resolving global and local properties of the conformational landscape.[71] More specifically, given a set of N input features (e.g., intramolecular coordinates), the conformation of the peptide is defined within an N-dimensional space. UMAP obtains the optimal (nonlinear) projection into an n-dimensional space (n < N) using a cost function which simultaneously incorporates pairwise distances between conformations at the largest (global) and smallest (local) scales. In other words, the projection attempts to preserve these two sets of high-dimensional pairwise distances in the low-dimensional space, which results in the preservation of certain features of the conformational landscape. As input features, we employed pairwise distances between Cα atoms and angles between triplets of Cα atoms. To reduce the dimension of the input, we applied the following coarse-graining procedure. We divided the peptide into four-residue segments and computed the minimum distance between atoms belonging to pairs of segments. Pairs of segments separated by less than 3 other segments were excluded. Thus, a total of 28 pairwise distances were included in the input features. We then applied the same segment representation to calculate the average angles between triplets of segments, again excluding any combinations where any pair of segments is separated by less than 3 other segments. This yields a total of 84 angles. We performed UMAP with an embedding dimension of 2, using the standard Euclidean distance as the metric for evaluating similarity of structures (according to their input features). UMAP requires the choice of two other hyperparameters: the number of neighbors and the minimum distance. Over the range of hyperparameters considered, the resulting embedding space appeared to be relatively robust, but displayed a noticeable change in the “clustering” of data points as a function of either of the hyperparameters. We chose parameter values which resulted in “reasonable” clustering (i.e., a balance between a single cluster and a very diffuse landscape of points): 819 neighbors and 0.01 minimum distance. Since the conclusions made from this analysis are largely qualitative, we do not believe that the hyperparameter choice plays a significant role in our analysis. The UMAP projection was determined using the conformational ensemble generated by model 4. Subsequently, this projection was applied to the ensembles from each of the other models for consistent comparisons. This projection involves a “small” statistical component which has been shown to be normally distributed. Thus, we performed the projection 10 times for each configuration while randomly shuffling the input features. The average of the resulting UMAP coordinates were taken as the “true” projection and used to generate the free-energy landscapes presented below. While nonlinear dimensionality reduction is necessary for providing a clear description of the overall conformational ensemble, linear methods are very effective if one is only interested in distinguishing between different helical states. Thus, we also applied principal component analysis (PCA) on the conformation space characterized by the ϕ/ψ dihedral angles of each residue along the peptide backbone.[72] We then performed a k-means clustering[73] along the largest three principal components in order to partition the conformation space into 50 states. We subsequently grouped these 50 microstates into 8 coarser states by applying the PCCA+ dynamical coarse-graining method.[74]

Results and Discussion

In this work, we characterize the role of distinct interactions in determining the disordered ensembles of IDPs. The focus of the study is the “fully disordered” peptide ACTR, which displays only transient helical structures. ACTR has 71 residues consisting of 26 hydrophobic residues, 18 charged residues, and a net charge of −8 (eq ). The average radius of gyration of ACTR, , determined from small-angle X-ray scattering experiments, is 26.5 Å at 5 °C and 23.9 Å at 45 °C.[6] Note that the average size of ACTR decreases when the temperature is increased from 5 to 45 °C. It has been argued that many disordered proteins undergo such a collapse with increasing temperature due to the unfavorable solvation free energy of individual residues.[75] The temperature-dependent collapse of IDPs can be captured by atomistic simulations with explicit solvent, while temperature-dependent force field parameters are required for implicit solvent CG models.[76] For this reason, the present study focuses on the ensemble of conformations sampled at a single temperature. In particular, we focus on the higher temperature ensemble of ACTR and investigate models which approximately reproduce . We employ an intermediate-resolution physics-based CG model, which represents the excluded volume of the peptide with united-atom resolution, while treating the attractive interactions which stabilize secondary and tertiary structure in a much coarser manner. The model also represents the solvent implicitly through these attractive interactions. We consider eight distinct models with different interaction sets, as summarized in Table and described in detail in the Methods. The models are separated into three groups: (i) models 1 and 2, without explicit attractive interactions, (ii) models 3a, 3b, and 4, without hydrogen-bonding-like interactions, and (iii) models 5a, 5b, and 6, with hydrogen-bonding-like interactions.

ACTR as a Sequence-Specific Self-Avoiding Random Walk

By employing only bond and excluded volume interactions, model 1 treats ACTR as a self-avoiding polymer, similar to standard BS polymer models. The main difference here is that the excluded volume interactions are highly specific (represented at a united-atom level of resolution), such that they induce some amount of sequence specificity into the model. Figure a shows the distribution of Rg values for ACTR generated by simulations of model 1 (blue curve) at a reduced temperature 0.87T*. T* is defined as the temperature at which the model reproduces . For model 1, ⟨Rg⟩ is approximately independent of temperature, as expected for a self-avoiding random walk under athermal solvent conditions.[12] For this reason, and since there is no free interaction parameter in model 1 for reproducing , we cannot directly define T* in this case. However, the value of the temperature-independent ⟨Rg⟩ for model 1 is 26.4 Å (dashed blue line in Figure a), which is nearly the same as the experimentally measured . Therefore, we can interpret this model as representing an ensemble at 0.87T* ([5 °C + 273 °C]/[45 °C + 273 °C] ≃ 0.87).

Figure 2

Table 2

Eigenvalues of the Gyration Tensor and Normalized Asphericity Values

model id	λ₃ [nm²]	λ₂ [nm²]	λ₁ [nm²]	b̃
BS	4.88	1.25	0.40	0.62
1	5.08	1.34	0.43	0.61
2	6.47	1.76	0.55	0.61

BS-LJ	3.58	1.41	0.57	0.47
3a	3.54	1.20	0.41	0.53
3b	3.80	1.19	0.42	0.55
4	3.56	1.14	0.39	0.55

5a	3.55	1.15	0.40	0.54
5b	3.51	1.12	0.39	0.55
6	3.38	1.04	0.36	0.56

Figure 3

Ratio of eigenvalues of the gyration tensor: (a) λ3/λ1; (b) λ2/λ1.

Ratio of eigenvalues of the gyration tensor: (a) λ3/λ1; (b) λ2/λ1. Figure also presents properties generated from simulations of model 2 (red curves). In contrast to model 1, model 2 introduces an effective backbone stiffness into the set of interactions which results in an overall expansion of the peptide for comparable absolute temperatures. In fact, for this particular model, ⟨Rg(expt)⟩45°C is too low to reproduce at any temperature due to the fixed nature of the effective stiffness of the backbone, as determined by the Amber99SB-ILDN force field. Nevertheless, to illustrate the overall properties of the model, Figure presents results from 0.4T*, with ⟨Rg(2)⟩0.4 = 30.9 Å (dashed red line in Figure a). In this case, T* was approximated via a linear extrapolation of ln Rg(T) (i.e., assuming Arrhenius behavior). Figure b demonstrates that model 2 has properties similar to those of model 1 (i.e., the peptide behaves approximately as a polymer in good solvent). However, the crossover to S(q) ∼ q–1 occurs at a smaller q compared with model 1, indicating that the addition of backbone stiffness results in a larger approximate lk, as expected. The contact probability maps of the two models are also quite similar (Figure S10). However, Figure c demonstrates more clearly the effect of local backbone stiffness. In particular, grows more quickly for |j – i| ≤ 40, compared with model 1, and then drops slowly to a value of about 0.92 nm at |j – i| = N. The peak at |j – i| ≈ 40 indicates that the chain is locally more rigid in model 2, while the larger distance at |j – i| = N is indicative of more extended conformations overall, as seen in Figure a. We also compared the gyration tensor for these models (Figure ). The ratios of the gyration tensor eigenvalues for model 2 is λ3:λ2:λ1 = 11.76:3.20:1, again demonstrating behavior similar to that of model 1. The ensemble generated by model 2 also has asphericity comparable to that of the ensemble generated by model 1 (see Table ). To obtain a more detailed picture of the conformational landscapes of these models, we performed a dimensionality reduction using the UMAP nonlinear manifold learning algorithm[71] to determine a two-dimensional embedding upon which to view the ensembles. UMAP attempts to retain both the local pairwise connectivity as well as the overall global structure of the high-dimensional input space, within a lower-dimensional (e.g., two-dimensional) projection. As input features for this procedure, we employed distances between pairs of segments along the peptide and angles between triplets of segments, as described in the Methods. For consistent comparison, the two-dimensional UMAP embedding was determined from simulations of model 4 and subsequently applied to the other conformational ensembles. Figure a,b demonstrates an approximate physical interpretation of each of the embedding dimensions. Figure bi presents a scatter plot of points sampled along the embedding, with colors corresponding to the Rg of each conformation. There is a significant correlation between UMAP-1 and Rg, although this relationship is notably nonlinear. Additionally, the distribution of conformations is significantly broader along UMAP-1 compared with Rg. The second dimension is more difficult to directly interpret. Figure bii presents a scatter plot with colors corresponding to the average angle formed between segments 1, 7, and 11, when the peptide is partitioned into segments of four residues. Figure a presents an illustration of this angle for two representative conformations. As one moves from the lower left to the upper right of the embedding space, conformations display an overall transition from extended structures to more hairpin-like structures. The UMAP landscape provides a clearer view of the heterogeneous ensemble of structures sampled by ACTR, compared with, for example, free-energy landscapes plotted as a function of Rg and Re (Figure S11). The nonlinear nature of this embedding results in structured free-energy landscapes, which are often not possible for disordered ensembles using linear techniques.[69,70]Figure c presents the free-energy landscapes along the embedding for model 1 and for the BS model. Both models appear to sample very similar conformational ensembles (of primarily extended, larger Rg, structures), consistent with the analysis of the shape parameters above.

Figure 4

(a) Illustrations of the angle θ, formed between segments 1, 7, and 11, when the peptide is partitioned into segments of four consecutive residues along the backbone. (b) Heat maps of (i) Rg and (ii) θ along the coordinates determined from the UMAP manifold learning algorithm. (c) Free-energy landscapes generated by (i) model 1 and (ii) the BS model along the UMAP coordinates.

Effect of Hydrophobic Attraction between Side-Chains

Models 3a, 3b, and 4 go beyond the simple self-avoiding walk picture by incorporating attractive interactions between Cβ atoms to represent the solvent-induced hydrophobic attraction between amino acid side chains. While models 3b and 4 take into account the relative hydrophobicity of each residue and scale this hydrophobic attraction accordingly, model 3a employs a uniform hydrophobic attraction which reproduces the average hydrophobicity of the peptide chain. In addition to hydrophobic attractions, model 4 incorporates explicit electrostatic interactions between charged residues via the Debye–Hückel formalism. Figure presents a comparison of the properties generated by these models at T*. Figure a presents the distribution of Rg values for models 3a, 3b, and 4 as the blue, red, and orange curves, respectively. The distributions are nearly identical, although model 4 has a slight tendency toward more collapsed structures. This demonstrates an insensitivity in the overall dimensions of the peptide to changes in specific interactions between residues (given the constraints enforced by the excluded volume interactions). Similar to models 1 and 2, the formation of helices is negligible for these models (Figure S6). However, these models no longer demonstrate properties of a polymer in good solvent (Figure b,c). In particular, S(q) displays ν = 1/2 dependence, representing a polymer in Θ solvent. In other words, the attractive hydrophobic interactions approximately counteract the effect of excluded volume and chain stiffness, resulting in random walk behavior.

Figure 5

(a) Distribution of the radius of gyration, Rg, (b) single-chain backbone structure factor, S(q), (c) root-mean-square normalized distance between pairs of residues separated by |j – i| residues along the chain, , and (d) the probability of pairs of Cα atoms to be within a cutoff of 1.0 nm. In panel (a) the dashed black line indicates the experimental result of ⟨Rg⟩ at 45 °C. In panels (a)–(c), blue, red, orange, and magenta curves correspond to results from model 3a, model 3b, model 4, and the BS-LJ model, respectively. In panel (d), the top and bottom triangles correspond to results from model 3a and the BS-LJ model, respectively. Figure c also demonstrates notable differences of these conformational ensembles, relative to the self-avoiding random walks. In particular, displays a maximum at |j – i| ≈ 15, which reflects the local rigidity of the chain due to the backbone stiffness (as seen for model 2). As |j – i| increases beyond 15, decreases until a minimum is reached at |j – i| ≈ 55, due to the attractive hydrophobic interactions between side chains which promote more collapsed structures. Finally, the slight increase of for larger |j – i| values demonstrates persistent conformational heterogeneity (i.e., the ensemble is not completely collapsed). Models 3a and 3b demonstrate very similar behavior, although a slight expansion of distances is observed in model 3b over the entire range of |j – i| separations. The inclusion of electrostatics in model 4 results in noticeable compaction of the ensemble for larger |j – i| separations. This result may seem surprising, since ACTR has a −8 net charge. However, recall that we have calibrated the energy scale of each model by adjusting the absolute simulation temperature to match ⟨Rg⟩ with the experimental value. In this case, the direct effect of adding electrostatics to the model does indeed result in a shift in the Rg distribution to larger values if the absolute simulation temperature remains fixed, as expected from the net charge on the chain. By considering the models at T* we demonstrate that, given ensembles with fixed ⟨Rg⟩, the ensemble generated by the model with electrostatics samples somewhat more compact structures. We again compare these ensembles with a standard polymer model (BS-LJ) but incorporate attractive interactions between monomers, as described in the Methods. The obtained distribution of Rg values as a function of temperature can be seen in Figure S9. We again aligned the length scale of the models by applying a rescaling factor (0.73 nm) to the BS-LJ model such that ⟨Rg(BS-LJ)⟩ = ⟨Rg(expt)⟩45°C. The distribution of Rg generated by the BS-LJ model is presented in Figure a (magenta curve), showing a narrower distribution and fewer very compact structures compared with model 3a. This may be partially due to the fact that we have not reoptimized the temperature for the BS-LJ model (T* = 2.0 ε/kB) to fit the width of the distribution of Rg values. Significant differences are also observed in S(q) (Figure b), which demonstrates ν = 1/4 behavior, indicating that the chain behaves more like a polymer under poor solvent conditions in the BS-LJ model (i.e., samples overall more compact conformations). This result is consistent with previous work with this model, which identified the Theta temperature as approximately T* = 3.0 ε/kB.[78] The S(q) behavior appears to be in conflict with the distribution of Rg (Figure a), which is narrower than the distribution generated by model 3a, without sampling the compact tail of the distribution from model 3a. However, Figure c demonstrates that although a maximum in occurs at short residue separations in the BS-LJ model, due to a lack of interactions governing local stiffness of the chain, larger values are also attained in this region. These larger average distances between residues at short separation along the chain likely prevent the sampling of structures with the smallest Rg values. At the same time, the lack of chain rigidity along with the presence of attractive interactions between monomers together promote an increased sampling of compact structures, leading to apparently compact behavior at intermediate length scales. Additional distinctions between the two ensembles can be seen by examining the ratios of the gyration tensor eigenvalues, which are λ3:λ2:λ1 = 6.28:2.47:1 for the BS-LJ model and λ3:λ2:λ1 = 8.63:2.93:1 for model 3a (Figure ). Moreover, the ensemble generated by the BS-LJ model (b̃(BS-LJ) = 0.47) is slightly more spherical than the ensemble generated by model (b̃(3a) = 0.53). Figure d presents the contact probability maps generated by model 3a (upper left) and the BS-LJ model (lower right). While both models display increased probability of long-separation (along the chain) contacts, relative to the models without attractive interactions, the comparison highlights the simplicity of the BS-LJ ensemble relative to the ensemble generated by model 3a. In contrast to the slightly more expanded ensembles generated by models 1 and 2, sequence-specific excluded volume interactions (along with the details of local protein chain stiffness) appear to play a more significant role in determining the finer details of these more collapsed conformational ensembles. However, the contact probability maps of models 3a, 3b, and 4 display relatively smaller deviations from one another (Figure S10). Overall, the inclusion of attractive interactions results in a structured contact probability map, but remains largely independent of the precise distribution of hydrophobic attractions. Column (i) of Figure presents the free-energy landscapes for models 3a, 3b, and 4, plotted along the UMAP embedding introduced above. The most striking difference between these landscapes compared to those generated by the self-avoiding walk models is the expanded diversity of structures sampled despite rather similar distributions of Rg. The addition of attractive interactions results in sampling both more collapsed and more expanded structures compared with model 1. There are also more subtle differences between the conformational ensembles generated by models 3a, 3b, and 4. The redistribution of hydrophobicity in model 3b, compared with model 3a, leads to only a slight shift in the conformational ensemble, as indicated by the analysis above. The most prominent difference is perhaps the increased sampling of the smallest UMAP-1 (largest Rg) values, although this difference manifests itself as only a minor change in the overall distribution of Rg. There is also an increase of structures corresponding to the largest values of UMAP-1. Overall, the differences between the ensembles generated by models 3a and 3b appear to be distributed throughout the entire embedding space, resulting in “averaging out” and little difference in the overarching features of the disordered ensembles. However, the introduction of electrostatics (model 4) leads to more significant differences in the ensemble of structures and, in particular, a more rugged free-energy landscape (i.e., a larger number of clearly separated local minima), as seen in Figure ci. ACTR has 18 charged residues: 5 are positive charges and 13 are negatively charged (see eq ). Overall, the electrostatic interactions lead to increased sampling of compact structures (positive values of UMAP-1) and a slight increase in structures with the smallest UMAP-1 (largest Rg) values. The conformations along UMAP-2 (i.e., with different θ values) appear to more uniformly affected by the addition of electrostatic interactions. It should be noted that the calibration of the energy scales through the use of reduced temperatures, as discussed above, results in a distinct balance of stiffness versus attractive interactions in the different models. In the case of model 4 (and for model 6 below), a lower absolute simulation temperature is required for this model to reproduce the appropriate ⟨Rg⟩ value, resulting in larger stiffness energies relative to kBT*. This difference in absolute simulation temperatures might be interpreted as the reason for the larger difference in the free-energy landscape for model 4, compared with models 3a and 3b. Alternatively, one can say that given the fixed model details (e.g., chain stiffness, hydrophobicity, etc.), the ensemble which incorporates electrostatics and reproduces does so through an increase in the ensemble ruggedness.

Figure 6

Free-energy landscapes generated by models 3a, 3b, and 4 [column (i)] and models 5a, 5b, and 6 [column (ii)] along the coordinates determined from the UMAP manifold learning algorithm.

Transient Helices

In addition to hydrophobic attractions between side chains, models 5a, 5b, and 6 employ attractive interactions between Cα atoms separated by three peptide bonds along the protein backbone in order to represent hydrogen-bonding interactions. The parameter for this interaction was chosen to approximately reproduce the overall propensity for helices in ACTR, as measured in experiments (described further in the Methods). The current models do not include hydrogen-bonding-like interactions between residues farther apart along the peptide chain, since propensity toward β-sheet-like secondary structures has not been observed in ACTR. Similar to the previous set of models, model 5a employs uniform hydrophobic interactions, while models 5b and 6 use residue-specific hydrophobicity parameters. Additionally, model 6 incorporates electrostatic interactions between charged residues. Figure a presents the distribution of Rg values at T* for models 5a, 5b, and 6 as the red, blue, and orange curves, respectively. We find that the distributions are rather insensitive to the addition of hydrogen-bonding interactions. These models generate SAXS profiles and corresponding Kratky plots in good agreement with experimental measurements (see Figure S12 compared with Figure 2b of ref (7)).

Figure 7

Figure 8

Figure 9

Intracluster fraction of helical segments, h(i), for (a) N-terminus and (b) C-terminus folding pathways, as characterized on the PCA landscape (Figure ) using model 6.

(a) Conformational clusters of ACTR presented along the two dominant principal components (PCs). (b)–(d) Free-energy surfaces of ACTR generated using models 5a, 5b, and 6, plotted along the two dominant PCs. Intracluster fraction of helical segments, h(i), for (a) N-terminus and (b) C-terminus folding pathways, as characterized on the PCA landscape (Figure ) using model 6.

Clarifying the Role of Excluded Volume in the Formation of Helical Structures

We have considered the impact that both generic and specific attractive interactions have on the resulting conformational ensembles of peptides with the length and approximate excluded volume of ACTR. To explicitly demonstrate the role that the steric interactions play, we consider models for the uncharged polypeptides (Alanine)71 and (Glycine)71, denoted as polyA and polyG, respectively, which have the same parameters εhp and εhb as model 5a but lack the sequence-specific side-chain sterics of ACTR. Because the resulting ensembles are dramatically different from one another, it is not feasible to match by adjusting the temperature, as performed for the other models in this study. Instead, we employ the same absolute simulation temperature as for model 5a, allowing ⟨Rg⟩ to deviate from the experimental value. Figure presents the average fraction of helical segments, h(i), for ACTR, polyA, and polyG. In going from the ACTR model to the polyA model, all side-chain atoms except the Cβ atoms are removed. The removal of excluded volume interactions reduces the entropy loss upon helix formation, significantly promoting the sampling of helical conformations. By further removing the Cβ atoms, from polyA to polyG, the attractive interactions which stabilize compact structures (including helices) are removed, resulting in a complete absence of helical conformations. Despite the uniformity of the sequence, the helicity of polyA demonstrates sequence-dependent behavior. The two regions of smaller helicity at [25:30] and [47:52] (black lines in Figure ) arise due to the likelihood of the chain bending at positions corresponding to 1/3 and 2/3 of the total chain length, in order to maximize hydrophobic contact in compact structures (see Figure S13). Thus, even if one were to reparametrize the model to reproduce the appropriate ⟨Rg⟩ value and overall h(i) magnitude, the formation of helices in the models that do not accurately represent the side-chain sterics would be qualitatively incorrect. Furthermore, in contrast to the small differences in the overall “shape” of the protein for the various models with differing energetics considered above, there is a dramatic change in the conformational ensemble associated with the amendment of excluded volume interactions (Figure b). This motivates the use of models that accurately represent the protein sterics for the investigation of disordered conformational ensembles.

Figure 10

(a) Average fraction of helical segments, h(i), and (b) distribution of the radius of gyration, Rg, for ACTR (blue), polyA (red) and polyG (orange), determined from simulations of model 5a. The two regions marked by the black lines in panel (a) include the residues ranges [25:30] and [47:52].

Conformational Ensemble of NCBD

To investigate the applicability of the considered models for investigating distinct disordered ensembles, we consider NCBD, the binding partner of ACTR. NCBD has 59 residues, with 27 hydrophobic residues and 8 charged residues (see eq ). Its unbound conformer (PDB ID: 2KKJ) is a molten globule that has three helices at residue positions [6:19], [23:36], and [36:47] (see Figure for helix positions in the bound state).[35,50] Thus, the unbound NCBD protein generates a very distinct conformational ensemble compared with that of ACTR. In fact, NCBD and ACTR are representative examples of two different classes of IDPs[77] (see Figure S2b). The ⟨Rg⟩ for NCBD was measured from SAXS experiments to be approximately 18.8 Å under nativelike conditions.[6] We consider here only models 5b and 6, to investigate whether electrostatics play a significant role in shaping the unbound conformational ensemble of NCBD. Because initial simulations of these models resulted in a lack of helix stabilization, we increased the energy of the hydrogen-bonding-like interaction, εhb, from 13 to 16.9 (30% larger than that of ACTR), which lead to good agreement of both ⟨Rg⟩ and h(i) with respect to the experimental values. We have again calibrated the energy scale of the model by finding the simulation temperature at which the experimental values of ⟨Rg⟩ and h(i) are reproduced, independent from ACTR, although the resulting T* is only 10% larger (in absolute temperature units, i.e., K) than the value for ACTR. The adjustment of parameters to reproduce the properties of NCBD was expected, since these quantities are free-energy functions which rigorously depend on the system identity and thermodynamic state point.[80] In fact, the relative insensitivity of the model parameters indicates a certain level of transferability of the model, further motivating the use of simple energetic functions for representing disordered ensembles. Figure a presents the distribution of Rg for models 5b (red curve) and 6 (orange curve), which are nearly identical (⟨Rg⟩ = 18.7 and 18.3 Å, respectively). Similarly, S(q) (Figure b) demonstrates ν = 1/2 behavior for both models, indicating that electrostatics play a relatively small role in the overall shape of NCBD. However, Figure c demonstrates that is significantly different for models 5b and 6 for |j – i| > 15, qualitatively similar to the comparison of models 5b and 6 for ACTR. Without electrostatics (model 5b), NCBD demonstrates less compaction in the intermediate regime due to the onset of attractive interactions, compared with ACTR. The inclusion of electrostatics (model 6) leads to a greater degree of compaction for NCBD in this regime and a significant difference in generated by the two models. The eventual increase of demonstrates that NCBD retains significant conformational heterogeneity within its molten globule ensemble, despite the presence of largely formed helices. The difference in the behavior of for the two models in the case of NCBD is striking, considering the similarity of the ensemble in terms of ⟨Rg⟩, S(q), and h(i). This may be a result of the slightly lower propensity for middle helices in model 6 (Figure d), which can allow for the sampling of more compact structures through stacking of the outer helices. However, the gyration tensor provides further evidence of the similarity of the ensembles generated by models 5b and 6. The ratios of the gyration tensor eigenvalues are 9.75:3.35:1 and 9.78:3.04:1 for models 5b and 6, respectively, while the normalized asphericity values are 0.53 and 0.56. Overall, it appears that the conformational ensembles of IDPs with large fractions of secondary structure motifs may be more robust to perturbations in the interactions, assuming a fixed representation of sterics.

Figure 11

Conclusions

We have studied the ensembles of two intrinsically disordered peptides, ACTR and NCBD, using a simple physics-based model, which accurately represents peptide sterics and allows an adjustable parametrization to match experimental quantities (e.g., ⟨Rg⟩ and h(i)). A hierarchy of models was considered, which systematically incorporated an increasing number and complexity of interactions, in order to clarify the impact of these interactions on the features of the resulting ensembles. Our analysis demonstrates that the differences between these distinct ensembles are difficult to fully characterize using only traditional shape parameters, such as the distribution of radius of gyration values and the single-chain backbone structure factor. However, the root-mean-square normalized inter-residue distances between Cα atoms, the ratios of gyration tensor eigenvalues, and the contact probability map assist in further distinguishing the overarching features of the ensembles. Additionally, we have employed a manifold learning algorithm in this work, to determine an optimal two-dimensional representation for viewing the ensemble of conformations, which provides an effective way to further clarify the differences between distinct disordered ensembles. Our investigation found that, with respect to a self-avoiding random walk, disordered ensembles that incorporate hydrophobic interactions lead to a significant increase in conformational heterogeneity. However, given the presence of attractive interactions, the precise identity of these interactions (e.g., the distribution of hydrophobic interactions along the chain or the presence of electrostatics) appear to play a relatively small role in determining the major features of the disordered free-energy landscapes. At the same time, specific interactions can stabilize particular structures which may be relevant for processes under a perturbation of the system (e.g., when a disordered peptide comes into contact with its binding partner). For example, electrostatic interactions increase the ruggedness of the free-energy landscape and stabilize multiple routes to secondary structure formation. These effects appear to be more significant for more disordered, flexible IDPs (e.g., ACTR) than for molten globules (e.g., NCBD). While electrostatics are thought to play an important role in the formation of encounter complexes in IDPs,[38,39] the present work suggests that specific contacts between charged residues can promote the presence of transient helices within the ensemble of conformations sampled in solution, which may be relevant for coupled folding and binding processes. The flexible physics-based model employed in this work facilitated the reproduction of experimental ⟨Rg⟩ and h(i) values for both ACTR and NCBD. These two peptides are representative examples of two different classes of IDPs: “fully disordered” (ACTR) and molten globule (NCBD). Although the (free-energy) parameters of this simple model should be, in principle, highly sequence-specific, we find that only relatively small adjustments were necessary to reproduce the experimental measurements for both systems. This indicates a certain level of transferability in terms of the essential features shaping the free-energy landscape for these disordered systems, motivating the continued use of CG models. Moreover, in conjunction with previous investigations of helix–coil transitions,[33,34] our results indicate that excluded volume interactions play a key role in determining the overarching characteristics of heterogeneous landscapes. This further motivates the development of models that can accurately model protein sterics while efficiently sampling conformational space.

67 in total

Review 1. Target gene context influences the transcriptional requirement for the KAT3 family of CBP and p300 histone acetyltransferases.

Authors: David C Bedford; Lawryn H Kasper; Tomofusa Fukuyama; Paul K Brindle
Journal: Epigenetics Date: 2010-01-27 Impact factor: 4.528

2. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation.

Authors: Berk Hess; Carsten Kutzner; David van der Spoel; Erik Lindahl
Journal: J Chem Theory Comput Date: 2008-03 Impact factor: 6.006

3. 50 Years of Lifson-Roig Models: Application to Molecular Simulation Data.

Authors: Andreas Vitalis; Amedeo Caflisch
Journal: J Chem Theory Comput Date: 2011-12-28 Impact factor: 6.006

4. Construction of the free energy landscape of biomolecules via dihedral angle principal component analysis.

Authors: Alexandros Altis; Moritz Otten; Phuong H Nguyen; Rainer Hegger; Gerhard Stock
Journal: J Chem Phys Date: 2008-06-28 Impact factor: 3.488

5. Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues.

Authors: Rahul K Das; Rohit V Pappu
Journal: Proc Natl Acad Sci U S A Date: 2013-07-30 Impact factor: 11.205

Review 6. Perspective: Coarse-grained models for biomolecular systems.

Authors: W G Noid
Journal: J Chem Phys Date: 2013-09-07 Impact factor: 3.488

Review 7. Introducing protein intrinsic disorder.

Authors: Johnny Habchi; Peter Tompa; Sonia Longhi; Vladimir N Uversky
Journal: Chem Rev Date: 2014-04-17 Impact factor: 60.622

8. Conformational propensities of intrinsically disordered proteins influence the mechanism of binding and folding.

Authors: Munehito Arai; Kenji Sugase; H Jane Dyson; Peter E Wright
Journal: Proc Natl Acad Sci U S A Date: 2015-07-20 Impact factor: 11.205

9. How Robust Is the Mechanism of Folding-Upon-Binding for an Intrinsically Disordered Protein?

Authors: Daniela Bonetti; Francesca Troilo; Maurizio Brunori; Sonia Longhi; Stefano Gianni
Journal: Biophys J Date: 2018-04-24 Impact factor: 4.033

10. Sequence determinants of protein phase behavior from a coarse-grained model.

Authors: Gregory L Dignon; Wenwei Zheng; Young C Kim; Robert B Best; Jeetain Mittal
Journal: PLoS Comput Biol Date: 2018-01-24 Impact factor: 4.475

4 in total

Review 1. Physics-based computational and theoretical approaches to intrinsically disordered proteins.

Authors: Joan-Emma Shea; Robert B Best; Jeetain Mittal
Journal: Curr Opin Struct Biol Date: 2021-02-02 Impact factor: 6.809

2. Global Structure of the Intrinsically Disordered Protein Tau Emerges from Its Local Structure.

Authors: Lukas S Stelzl; Lisa M Pietrek; Andrea Holla; Javier Oroz; Mateusz Sikora; Jürgen Köfinger; Benjamin Schuler; Markus Zweckstetter; Gerhard Hummer
Journal: JACS Au Date: 2022-03-01

3. Tuning Protein Hydrogel Mechanics through Modulation of Nanoscale Unfolding and Entanglement in Postgelation Relaxation.

Authors: Matt D G Hughes; Sophie Cussons; Najet Mahmoudi; David J Brockwell; Lorna Dougan
Journal: ACS Nano Date: 2022-06-22 Impact factor: 18.027

4. Integrating single-molecule spectroscopy and simulations for the study of intrinsically disordered proteins.

Authors: Jhullian J Alston; Andrea Soranno; Alex S Holehouse
Journal: Methods Date: 2021-04-06 Impact factor: 3.608

4 in total