We present a new coarse-grained Cα-based protein model with a nonradial multibody pseudo-improper-dihedral potential that is transferable, time-independent, and suitable for molecular dynamics. It captures the nature of backbone and side-chain interactions between amino acid residues by adapting a simple improper dihedral term for a one-bead-per-residue model. It is parameterized for intrinsically disordered proteins and applicable to simulations of such proteins and their assemblies on millisecond time scales.
We present a new coarse-grained Cα-based protein model with a nonradial multibody pseudo-improper-dihedral potential that is transferable, time-independent, and suitable for molecular dynamics. It captures the nature of backbone and side-chain interactions between amino acid residues by adapting a simple improper dihedral term for a one-bead-per-residue model. It is parameterized for intrinsically disordered proteins and applicable to simulations of such proteins and their assemblies on millisecond time scales.
Molecular dynamics simulations provide insights into the properties
of biomolecular systems. They make use of empirical potentials[1−3] that depend on the length scale of the description. The atomic level
descriptions[4−9] are substantially distinct from the coarse-grained (CG) residue-level
descriptions in which a residue is represented by a single bead. In
between, there are CG models in which several atoms are grouped into
beads with still different sets of potentials.[10−14] The one-bead-per-residue CG protein models are especially
useful when analyzing large systems at long time scales such as those
occurring in the dynamics of virus capsids (assembly and indentation)[15−17] or protein stretching at near-experimental speeds.[18−20]The simplest versions of such a model are structure-based
or Go-like,[21−24] meaning that all parameters in the potentials are derived from the
experimentally determined native structure[25−28] and the solvent is implicit.
There is no unique way to construct a Go-like model because its most
important descriptor is the contact map, and there are various criteria
to define it. In addition, various contact potentials and the backbone
stiffness terms can be employed. Our benchmarking to the stretching
experiments[24] indicates that the Lennard-Jones
(LJ) potential between the effective beads combined with the native
contacts determined through an atomic overlap criterion[29−31] can correctly recreate protein dynamics of stretching and, in addition,
leads to proper folding. The criterion involves checking for an overlap
between spherical spaces associated with heavy atoms in a pair of
residues in the native state. Its presence introduces an attractive
potential well, and its absence results in a soft repulsion between
the beads.The structure-based approach clearly cannot be applied
to the intrinsically
disordered proteins (IDP)[32] because such
proteins dynamically adopt significantly differing conformations and
there is no dominant “native state”. Sampling their
rich energy landscape may be challenging for all-atom models, especially
in the case of simulating processes involving multiple chains, like
aggregation.[33] Short all-atom simulations
may still be used to parameterize CG models built solely for the purpose
of simulating one specific system (like in the multiscale approaches[34−36]); however, our goal is to construct a CG potential that is transferable
to many systems.We have argued[37] that the contact-based
CG description for IDPs is still possible provided that the contacts
are determined dynamically from the instantaneous shape of the backbone,
as described by the locations of the Cα atoms, and
are thus allowed to form and disappear in an adiabatic fashion. The
contacts can effectively arise either from the side-chain–side-chain,
side-chain–backbone, or backbone–backbone interactions.
This model also includes electrostatic interactions as described by
a Debye–Hückel (D-H) potential,[38] and it leads to a reasonable agreement with experimental and all-atom
theoretical results pertaining to the average geometry of the conformations
for a set of systems. It is also appealing computationally because
it effectively involves only two-body interactions. We have already
used this model to determine the phase diagram for aggregation of
polyglutamines,[39] which involved simulating
1800 residues for over 1 ms.However, switching the contacts
dynamically on and off violates
the detailed balance and can lead to nonequilibrium stationary states,
which have been studied, for example, in the context of active biomembranes.[40−42] One may hope that sufficiently slow, adiabatic switches may mask
such glitches, but it is nevertheless desirable to construct a model
without any time-dependent potentials.In our approach, we distinguish
side-chain and backbone interactions
using only the positions of the Cα atoms. Determining
the type of interaction between two residues in the previous model
required knowledge of the positions of six residues, but after the
contact was quasi-adiabatically turned on, the forces were acting
only between a pair of residues. Without this switching, we could
either return to Monte Carlo sampling (where the idea was originally
implemented[43,44]) or introduce a multibody term
in the potential (such terms are crucial in reproducing the fine structure
of residues that is lost in coarse-graining[45,46]). An example of such a term is the one used for dipole–dipole
interactions that can describe hydrogen bonding in the protein backbone.[47]Here, we propose a new and empirically
motivated molecular dynamics
model in which the short-range interactions are represented by time-independent
four-body potentials. The point of departure is an observation that
the backbone stiffness energy involves computation of a four-body
dihedral potential[48] that restricts the
angle between two planes, each set by three residues. This potential
can be used in an “improper” way to take into account
the rigidity of the side chain in a two-bead-per-residue model. When
two residues interact, this picture can be further simplified by removing
the side-chain bead (in our one-bead-per-residue model, the beads
are centered on the Cα atoms). We can still use the
improper dihedral term by replacing the side-chain bead by the bead
of the second residue that participates in the interaction and vice
versa: the side-chain bead of the second residue in the interacting
pair can be replaced by the bead representing the first residue. The
planes defined by this procedure are shown in Figure . Thus, despite using the four-body terms,
we still retain the pairwise nature of interactions. In our model,
each contact between residues is described by two pseudo-improper-dihedral
(PID) angles associated with each residue.
Figure 1
Idea of the PID angles.
The interaction between residues i and j involves angles Ψ (defined by i – 1, i, i + 1,
and j beads)
and Ψ (defined by j – 1, j, j + 1, and i beads).
Idea of the PID angles.
The interaction between residues i and j involves angles Ψ (defined by i – 1, i, i + 1,
and j beads)
and Ψ (defined by j – 1, j, j + 1, and i beads).We made a survey of the
structures from the Protein Data Bank,
which showed specific patterns made by the PID angles. Those patterns
are clearly different for the interactions made by the backbone and
the side-chain groups, which proves that we can distinguish these
two cases in our approach.We show that our new model can successfully
recreate the experimentally
determined radii of gyration (Rg) for
a set of 23 IDPs. Because replacing the quasi-adiabatic contacts with
four-body PID potentials significantly changed the dynamics, we had
to reparameterize all aspects of the model. In order to quantify which
variant of the model after reparameterization agrees best with the
experiment, we computed Pearson coefficients that show how close the
simulation and experimental results are. The best variants surpass
our previous model.We also compared energy distributions of
the models, which proved
that adiabatic switching caused discrepancies from the Boltzmann distribution.
Those discrepancies were not present in the new model with the PID
potential. However, the new model turned out to require significantly
more computational resources.
Methods
Results
of the PDB Survey
Virtually
all proteins are made from the same set of 20 amino acids, and the
geometry of interactions between the residues should be similar for
the case of IDPs and structured proteins even if the relative occurrence
of those interactions may be different (since, e.g., the IDPs contain
much fewer hydrophobic residues). Therefore, we made a survey of 21,090
structured proteins from the CATH database[49] (the set of proteins with the sequence similarity not exceeding
40%: cath-dataset-nonredundant-S40.pdb) to determine which PID angles
are favorable in the inter-residue interactions. We computed the values
of PID angles for each pair of contacting residues from the database,
where a contact between residues is defined through the overlap criterion.
These heavy atoms may be a part of a backbone or a side chain. If
a backbone atom from one residue overlaps with a side-chain atom from
another residue, we call it a backbone–side-chain contact (bs).
We analogously define side-chain–side-chain (ss) and backbone–backbone
(bb) contacts. There may be many overlaps, so one residue pair can
form more than one type of contact simultaneously.In order
to associate a characteristic set of PID angles and distances with
a unique type of contact, we derive subdistributions corresponding
to situations in which the overlaps arise only in one class of atoms
(e.g., only ss). Table shows how many overlap contacts of each type are in the database.
Table 1
Number of Overlap Contacts for All
Proteins in the Used Database (7,974,804 in Total)
type of contact
bb
bs
ss
number of residue pairs with only
this type of contact
624,699
953,782
1,870,746
number of residue pairs
that include this type of contact
3,742,271
3,872,292
4,297,919
Figure shows PID
angle distributions for the ss, bs, and bb cases. The distinction
between contact types is based on the overlap criterion described
above (green histograms) or on a subset of these contacts that fulfill
directional criteria introduced in the previous model[37] (red histograms). Figure shows distributions of contacts that are only one
type (e.g., only bb overlaps). Figure S2 in the Supporting Information is the same as Figure , but contacts can be of more than one type
(differences are minor and refer only to the bs case). The Boltzmann
inversion potential VB = – kBT ln (p(ΨPID)) made from these distributions was fitted by an analytical
function described in the next section. We observe that side-chain
and backbone contacts are associated with different sets of the PID
angles. The two minima for the bb case correspond to the parallel
or antiparallel β sheet or to the right- or left-handed α
helix (for the i, i +3 contacts,
only one minimum is present, which reflects right-handedness, see Figure S3 in the Supporting Information).
Figure 2
Distributions
of the PID angles in the contacts from the PDB survey
that are only of one type (green histograms). Local i, i + 3 and i, i + 4 contacts are excluded. Each contact has two angles. Distribution
of the first (ΨPID) is on the top panels, and that of the
second (ΨPID) is on the bottom panels. Subdistributions
made from contacts that obey the directional criteria defined in ref (37) are shown as red histograms.
The potential resulting from the Boltzmann inversion procedure (blue
dots; unit of energy, ε ≈ 1.5 kcal/mol) was fitted to
an analytical function (purple line).
Distributions
of the PID angles in the contacts from the PDB survey
that are only of one type (green histograms). Local i, i + 3 and i, i + 4 contacts are excluded. Each contact has two angles. Distribution
of the first (ΨPID) is on the top panels, and that of the
second (ΨPID) is on the bottom panels. Subdistributions
made from contacts that obey the directional criteria defined in ref (37) are shown as red histograms.
The potential resulting from the Boltzmann inversion procedure (blue
dots; unit of energy, ε ≈ 1.5 kcal/mol) was fitted to
an analytical function (purple line).Figure shows two-dimensional
distributions, where the PID angle ΨPID is on one
axis, and the Cα–Cα distance r is on the other axis. Different side chains result in
different distance distributions, as shown for GLN or ALA residues,
but the PID angle distributions stay mostly the same (with the exception
of special cases, PRO and GLY). We find that the backbone contacts
correspond to smaller and better defined distances than the side-chain
contacts, which result in a diffuse cloud for ΨPID ≈ 0 rad and r > 6 Å (side chain)
and
sharp peaks for ΨPID ≈ ±1 rad and r < 6 Å (backbone). Two-dimensional distributions
of the PID angle and distance for contacts made by all 20 types of
residues are available in Figure S4 in
the Supporting Information, and one-dimensional distance distributions
used to determine the equilibrium distances for ss interactions (rminss) are available in our previous article;[37] however, the values of rminss are reprinted in Table S1 in the Supporting Information.
Figure 3
Two-dimensional distributions
of contacts, where the PID angle
ΨPID is on one axis, and the Cα–Cα distance r is on the other axis. The i, i + 3 and i, i + 4 contacts are excluded. The distributions include each
contact obtained in the PDB survey where at least one residue in the
pair was of the given amino acid type (PRO, GLY, ALA, and GLN). The
PID angle ΨPID for each contact is the one associated
with this residue (if both residues were the same amino acid, then
it corresponds to two counts with two different PID angles).
Two-dimensional distributions
of contacts, where the PID angle
ΨPID is on one axis, and the Cα–Cα distance r is on the other axis. The i, i + 3 and i, i + 4 contacts are excluded. The distributions include each
contact obtained in the PDB survey where at least one residue in the
pair was of the given amino acid type (PRO, GLY, ALA, and GLN). The
PID angle ΨPID for each contact is the one associated
with this residue (if both residues were the same amino acid, then
it corresponds to two counts with two different PID angles).
Implementation of the PID
Potential
In our model, the interaction between two residues
depends on their
distance and the two PID angles they make. Therefore, we chose our
PID potential to be a product of three terms: V(ψA, ψB, r) = λA(ψA)λB(ψB)ϕ(r), where ψA is the first PID angle in
a pair, ψB is the second angle in the pair, and r is the Cα–Cα distance.
As the first approximation, we decided to use the cosine function
for λ and the LJ potential for , where rmin is the minimum, and εLJ is the depth (discussed
later). Due to the broad character of the bs distribution (green histograms
in Figure ), we take
only the bb and ss contacts into account (see section 2.3 in the Supporting Information). This feature is
distinct from our previous model in which the bs interactions were
included.[37] For the bb and ss interactions,
we have clearly defined peaks that can be fitted to the potential
function (separately for PID angles and distances). Each peak has
a different width and center, so the detailed form of the λ
function isEach pair of the ss contacts has rminss corresponding to the minimum
identified in our previous work.[37] Because
for r < rmin the ϕ(r) potential
becomes strongly repulsive, α parameters for ss and bb contacts
must be chosen so that if λbb ≠ 0, then λss = 0 and vice versa.Because the bb PID angle distribution
has two peaks, the bb potential
has two terms corresponding to both of them (ψ0bb+ and ψ0bb–). ThereforeThe repulsive part of the LJ potential should always be present
for small distances to prevent the residues from passing through one
another (excluded volume effect). This is why the bb terms have a
more complicated form:This formula ensures the presence of the excluded volume because,
for r < rminbb, the LJ potential ϕbb is no longer multiplied by the λbb factors. For the ss contacts, Vss(ψA, ψB, r) =
λss(ψA)λss(ψB)ϕss(r) and rminss depends
on the types of amino acids in the pair in contact (see Section 2.4
about the details of the LJ potential). The total PID potential is V = Vss + Vbb.Fitting the function to the Boltzmann inversion
potential based
on the contact distributions of only one type (bb, bs, or ss) that
fulfill directional criteria defined in ref (37) (red histograms in Figure ) resulted in the
following parameters: αbb+ = 6.4, αbb– = 6.0, αss = 1.2, ψ0ss = – 0.23 rad, ψ0bb+ = 1.05 rad,
and ψ0bb– = – 1.44 rad. Fitting to the distributions that do not have
to fulfill directional criteria (green histograms in Figure ) resulted in a model that
agreed poorly with the experiment (see section 3.2 of the Supporting Information). In order to improve numerical
efficiency, the cosine function was replaced by its algebraic approximation,
defined in section 1 of the Supporting
Information.Values of rmin are
given in ref (37) and Table S1 in the Supporting Information. In Section
3, we denote
the pseudo-improper-dihedral potential by letter P and the old, quasi-adiabatic
model by letter A.
Backbone Stiffness and
the Thermostat
Our model has an implicit solvent, represented
by the Langevin thermostat
with the damping term, so the equation of motion for the ith residue is , where m is the average
amino acid mass, r⃗ is the position
of the residue, F⃗ is the force
resulting from the potential, γ = 2m/τ
is the damping coefficient, and Γ⃗ is the thermal white noise with the variance σ2 = 2γkBT. The time unit τ ≈ 1 ns was verified for a different
model with the same equations of motion,[50] and even if for this new model τ is expected to be slightly
different, it should still correspond to an overdamped case with diffusional
(not ballistic) dynamics.The residues in our model are connected
harmonically with the spring constant k = 100 Å–2·ε and equilibrium distance 3.8 Å
(ε ≈ 1.5 kcal/mol is the energy unit[51]). The backbone stiffness potential in our model consists
of a bond angle and (proper) dihedral terms. Its depth and form were
obtained from a Boltzmann inversion potential based on a random coil
library.[52] Its exact analytical form is
described in ref (37). It is defined in kcal/mol, so it can be used to verify what is
the effective room temperature. The results for polyproline, which
cannot form side-chain–side-chain contacts (see the next section)
and has high backbone stiffness, indicate[37] that simulations for the room temperature 0.38 ε/kB give the best agreement with experimental values for
the polyproline end-to-end distance, so we set the temperature to
0.38 in the reduced units.Because the potential for backbone
stiffness, as in the previous
model, is based on a random coil library, it does not favor any secondary
structure. In order to make structures like α helices or β
turns possible, we allow attractive contacts between the ith and i + 3rd residues in the chains as these contacts
correspond to hydrogen bonds between backbone atoms in an all-atom
representation.[53] However, the nature of i, i + 4 contacts is different (see section 2.1 in the Supporting Information), so
in the results, we tried models with i, i + 4 contacts (+ in superscript) and without them (− in superscript).
Depth and Form of the Lennard-Jones Potential
The energy unit ε = 110 pN·Å ≈ 1.5 kcal/mol
is taken from the earlier models,[24,37] and although
it corresponds to the strength of one contact in those models, it
is not necessarily the case for the PID model. We keep εbb = ε because it is roughly equal to the energy of one
hydrogen bond made by the protein backbone,[54] but the varied nature of ss contacts required parameterization.
We checked values between 0 and 1 ε for the uniform εss potential (we denote this case as ME) and tried two matrices
where εss depends on the pair of residues: the classic
Miyazawa–Jernigan matrix based on PDB statistics from 1996[55] (denoted MJ) and the MDCG matrix derived from
all-atom simulations[56] (denoted MD). We
also tried scaling them by a factor from 0 to 1 (in the results, the
factor is denoted in subscript). In all three cases (ME, MJ, and MD),
εss = 0 for PRO and GLY residues.The distance
distributions for some pairs of residues are very broad.[37] This is caused by the ability of longer side
chains to deform and adapt different conformations. This ability is
lost in any CG model in which each residue is represented by a spherical
bead. To correct this, one may imagine a modified form of the LJ potential
for ss contacts:We tested both this modified
form (denoted by letter F, for flat)
and the traditional form (denoted by letter L) of the LJ potential.
Electrostatics
Our previous model
used a Debye–Hückel-screened electrostatic potential
with electric permittivity ϵ = 4 Å/r depending on the Cα– Cα distance r following the approach of Tozzini et al.[38] (thus we denote this version of electrostatic interactions by letter
T). However, this approach was designed for structured proteins, where
the permittivity inside the hydrophobic core is significantly lower
than in water. IDPs lack this core and are more solvent-exposed, so
we tried a simpler term with ϵ = 80 (we denote this term by
letter C). In both cases, the form of the electrostatic potential
is , where s is the screening
length that depends on the ionic strength. In our model, histidine
is considered to be uncharged.
Results
and Discussion
We tested our model on a benchmark of 23 IDPs
whose radius of gyration
was measured in SAXS (small-angle X-ray scattering) experiments[57−67] (their sequences, screening lengths used, and a Pappu diagram[68] are in section 3.1 of the Supporting Information). We considered over 200 variants
of the model: with a pseudo-improper-dihedral potential (denoted by
letter P) or the previous version with a quasi-adiabatic potential
(letter A), with or without i, i + 4 attractive contacts (+ or – in superscript), with the
standard (letter L) or flat (letter F) form of the LJ potential, different
depths of this potential (we considered three residue–residue
matrices: uniform, ME; Miyazawa–Jernigan, MJ; and all-atom
based, MD; all scaled by a factor denoted in subscript), and with
standard (letter C) or distance-dependent (letter T) electric permittivity.
Thus, the name of each version is made from four symbols (the full
legend is available in Table S4 in the
Supporting Information), e.g., A+ L ME1 T is
the model with quasi-adiabatic switching, attractive i, i + 4 contacts, LJ potential with uniform ε,
and electrostatics used by Tozzini et al.[38] P– F MD0.1 C means a model with the
PID potential, no i, i + 4 contacts,
LJ potential with a flat region and depth depending on the identity
of residues according to the MDCG matrix rescaled by 0.1, and classic
D-H electrostatics.For each model, we computed its Pearson
coefficient[24] defined as , where Rgexp is the radius of gyration from
the experiment, Rgsim is from the simulation, and the sum is over
each of N = 23 proteins. The full list of models
with their Pearson coefficients P and χ2 values is in section 2 of the
Supporting Information. Here, we show just the five most significant
ones, starting from the original model[37] shown in panel (a) of Figure .
Figure 4
Comparison between the simulated and measured radius of gyration
for 23 IDPs. Panels (a)–(f) show simulation results obtained
within six different models. The model names are specified in the
panels and defined in the text. The value of the Pearson coefficient
is also indicated in each panel. Colors indicate the number of residues
constituting a given IDP.
Comparison between the simulated and measured radius of gyration
for 23 IDPs. Panels (a)–(f) show simulation results obtained
within six different models. The model names are specified in the
panels and defined in the text. The value of the Pearson coefficient
is also indicated in each panel. Colors indicate the number of residues
constituting a given IDP.Applying a pseudo-improper-dihedral potential to this model results
in worse agreement with the experiment (panel (b)), so we reparameterized
the model by changing five features described above. The end result
(panel (c)) has much better agreement with the experiment. We checked
that this is indeed the result of using the new form of the potential
as applying the reparameterized features to the previous model (panels
(d) and (e)) improves it only slightly. Panel (f) shows the results
of the Gaussian chain model[69] calculated
with the formula , where n is the number
of residues, and b is the effective Kuhn length that
may be treated as a fit parameter. By fitting with the least-squares
method to the set of 23 IDPs, we obtained b = 6.7
Å.The Rg values for the set
of 23 IDPs
have been measured in SAXS experiments. We note that the direct result
of a biomolecular SAXS experiment is a scattering profile that contains
information not only about the value of Rg but also about the average shape and size of the IDP under study.
It is possible to compute such a scattering profile from an ensemble
of simulation structures and compare it with the experiment,[70] but this task involves fitting parameters and
requires raw experimental data, which are not available for many proteins
used in the testing set. On the other hand, the value of Rg is a single number that can be easily compared to simulation
results. For these reasons, we decided to compare our results only
with the Rg values. An example of a comparison
with a SAXS intensity profile for protein 6AAA is shown in Figure S12 in the Supporting Information.Another way to validate our model is to perform a histogram test,[71] where we can check if our simulation is consistent
with the Boltzmann distribution of energy: the state with energy E should occur with the probability p(E) = Ω(E) exp ( – E/kBT)/Q, where Ω(E) is the state density,
and Q is the normalization factor. If we perform
simulations using two different temperatures T1 and T2, then we can compute the
following quantity: log(p1(E)/p2(E)) = log (Q1/Q2) + E · (1/kBT2 – 1/kBT1). This quantity should depend on the energy linearly
with the coefficient (1/kBT2 – 1/kBT1). We can plot this dependence (treating log(Q1/Q2) as a fit parameter)
for the new version (P– F MD0.1 C) and
previous version (A+ L ME1 T) of the model.[37] Such a plot for proteins his5 (24 residues),
NRG1 (75 residues), and p53 (93 residues) is shown in Figure . The data points obtained
with the new model lie closer to the line with the coefficient (1/kBT2 – 1/kBT1).
Figure 5
Histogram test[71] for the new model (P– F MD0.1 C) and previous model (A+ L ME1 T)
based on simulations of proteins his5, NRG1,
and p53 at temperatures 0.35 and 0.4 ε/kB. Insets show the energy histograms binned every 1 ε
and used to construct the quantity log(p1(E)/p2(E)) shown on the main plots as a function of energy.
Histogram test[71] for the new model (P– F MD0.1 C) and previous model (A+ L ME1 T)
based on simulations of proteins his5, NRG1,
and p53 at temperatures 0.35 and 0.4 ε/kB. Insets show the energy histograms binned every 1 ε
and used to construct the quantity log(p1(E)/p2(E)) shown on the main plots as a function of energy.The new version of the model has significant advantages over
the
previous one: it better agrees with the experimental data and the
Boltzmann distribution. However, even though the computational speed
of both models scales almost linearly with the system size, the new
model is usually at least five times slower (see Figure ).
Figure 6
Physical time of 1 ns-simulation
run on a single core for the new
model (P– F MD0.1 C, squares) and previous
model (A+ L ME1 T, stars) as a function of the
system size. The number of protein chains in the simulation is indicated
by color. The system density of multichain simulations was set to
1 residue/nm3. Fitted lines have coefficients of 0.003
and 0.0005 s/residue, respectively, with a fit error in the order
of 0.0001 s/residue.
Physical time of 1 ns-simulation
run on a single core for the new
model (P– F MD0.1 C, squares) and previous
model (A+ L ME1 T, stars) as a function of the
system size. The number of protein chains in the simulation is indicated
by color. The system density of multichain simulations was set to
1 residue/nm3. Fitted lines have coefficients of 0.003
and 0.0005 s/residue, respectively, with a fit error in the order
of 0.0001 s/residue.The new model is also
harder to parallelize (due to multibody terms)
and does not perform well for folded proteins: we tried folding small
proteins (PDB access codes 1L2Y, 1UBQ, and 1ERY)
with it but arrived with an RMSD of 6 Å or higher even for the
smallest protein, 1L2Y (see section 4 of
the Supporting Information). This is not surprising because IDPs have
very weak inter-residue interactions: the top seven variants of our
model considered in IDP parameterization had their residue–residue
matrices multiplied by a factor smaller than 0.5 (see Figure ). It is interesting to note
that two out of our top seven variants multiply the matrix by 0, meaning
that there are no interactions with the exception of excluded volume,
backbone stiffness, and electrostatics. A simple Gaussian chain model
also works quite well for IDPs (panel (f) in Figure ). This proves that water is a good solvent
for IDPs, and their inter-residue interactions affect chain dimensions
in a minor way.[32] This fact was also used
in a recent hierarchical approach for studying IDPs, where only local
fragments are modeled in detail, and the whole chain is constructed
from those fragments.[72]
Figure 7
Top seven variants of
the PID model ranked by their Pearson coefficient.
Top seven variants of
the PID model ranked by their Pearson coefficient.The top-ranked variants of the model may be further refined
by
fine-tuning the ss interactions. In future studies, our new CG model
can be used to explore conformational dynamics of IDPs and their assemblies.
It can also be adapted to study physical properties of flexible linkers
in multidomain proteins.[73,74] In this case, each
of the structured domains can be kept stable by a Go-model potential,[24] and each of the linkers (as well as their interactions
with the domains) can be described by the PID potential.
Authors: Lukas S Stelzl; Lisa M Pietrek; Andrea Holla; Javier Oroz; Mateusz Sikora; Jürgen Köfinger; Benjamin Schuler; Markus Zweckstetter; Gerhard Hummer Journal: JACS Au Date: 2022-03-01