Intrinsically disordered proteins are ubiquitous throughout all known proteomes, playing essential roles in all aspects of cellular and extracellular biochemistry. To understand their function, it is necessary to determine their structural and dynamic behavior and to describe the physical chemistry of their interaction trajectories. Nuclear magnetic resonance is perfectly adapted to this task, providing ensemble averaged structural and dynamic parameters that report on each assigned resonance in the molecule, unveiling otherwise inaccessible insight into the reaction kinetics and thermodynamics that are essential for function. In this review, we describe recent applications of NMR-based approaches to understanding the conformational energy landscape, the nature and time scales of local and long-range dynamics and how they depend on the environment, even in the cell. Finally, we illustrate the ability of NMR to uncover the mechanistic basis of functional disordered molecular assemblies that are important for human health.
Unexpected discoveries regularly revolutionize our understanding
of molecular biology. The remarkable observation that intrinsically
disordered proteins are prevalent throughout all known proteomes represents
one such example, forcing a reassessment of established approaches
for investigating biological function at the molecular level.[1−5] Unlike folded proteins, the primary amino acid sequence of intrinsically
disordered proteins (IDPs) does not adopt a stable tertiary fold to
function but dynamically samples a broad free-energy surface. IDPs
thus access a vast conformational landscape that nevertheless encodes
specific biological activity.[6] This conformational
heterogeneity endows IDPs with considerable advantages over their
folded counterparts, for example, the ability to interact with multiple
partners, possibly simultaneously as in the case of hub-proteins.
Combining transient and local disorder-to-order transitions with rapid
dissociation rates allows efficient processing and provides the necessary
level of multivalent, weak intermolecular binding to transiently form
membraneless organelles[7] (another phenomenon
whose importance has revised our understanding of cell regulation
and function). In general, although the potential benefits of conformational
disorder are quite well discussed in the literature, we are still
discovering the true breadth of functional diversity encoded in IDPs.

Structural dynamics are of course essential to biological function
in all proteins, and the characterization of the conformational fluctuations
that enable function is a vital aspect of our quest for a molecular
understanding of biology. Complementary to the stabilization of distinct
conformational substates and the determination of their three-dimensional
structures at given points in a functional cycle, direct physical
methods such as infrared,[8,9] terahertz,[10] neutron,[11] dielectric,[12] Mössbauer,[13] and Raman[14] spectroscopies can be used
to describe the characteristic time scales of protein motions. Time-resolved
X-ray diffraction techniques[15] and X-ray
free electron lasers[16] also provide simultaneous
access to both high resolution structure and dynamics. Within the
broad panoply of physical techniques available to characterize biomolecular
dynamics, nuclear magnetic resonance (NMR) spectroscopy occupies a
unique place, providing atomic resolution information over an incredibly
broad range of motional time scales extending from tens of picoseconds
to hours or even days (Figure 1).
Figure 1
NMR probes biomolecular conformational changes on a vast range
of time scales. NMR spin relaxation provides accurate information
on the reorientational properties of relaxation-active interactions,
normally interatomic bonds, up to tens of nanoseconds. In the fast
exchange limit, a single NMR peak represents a population weighted
average over the chemical shifts of each populated substate. When
the exchange rate is in the same range as the difference in chemical
shifts of the distinct states, on time scales from tens of microseconds
to hundreds of milliseconds in proteins, line-broadening is observed,
and 1H, 13C, and 15N NMR exchange
approaches can be used to characterize interconversion between the
different conformational states. Exchange that is significantly slower
than the difference in chemical shifts of the distinct states gives
rise to slow exchange, allowing all states to be individually investigated.
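The exchange regimes summarized in this caption can be illustrated with a minimal Python sketch. It compares an exchange rate with the chemical shift difference between two states and computes the population-weighted shift expected in fast exchange. The function names, the factor-of-ten threshold separating the regimes, and the numerical values are illustrative assumptions, not quantities taken from the studies reviewed here.

```python
import math

def exchange_regime(k_ex, delta_omega, ratio=10.0):
    """Classify two-site exchange by comparing k_ex (s^-1) with |delta_omega| (rad/s).

    The factor `ratio` separating the regimes is an arbitrary illustrative choice.
    """
    r = k_ex / abs(delta_omega)
    if r >= ratio:
        return "fast"
    if r <= 1.0 / ratio:
        return "slow"
    return "intermediate"

def fast_exchange_shift(populations, shifts):
    """Population-weighted average chemical shift (ppm) observed in fast exchange."""
    total = sum(populations)
    return sum(p * s for p, s in zip(populations, shifts)) / total

# Hypothetical example: two states whose 15N shifts differ by 2 ppm,
# observed at a 15N Larmor frequency of 60.8 MHz (60.8 Hz per ppm).
delta_omega = 2.0 * math.pi * 2.0 * 60.8  # rad/s

for k_ex in (1.0, 1.0e3, 1.0e6):
    print(k_ex, exchange_regime(k_ex, delta_omega))

# Observed shift for a 90:10 mixture of states at 120 and 122 ppm
print(fast_exchange_shift([0.9, 0.1], [120.0, 122.0]))
```

In the fast regime only the weighted average is observable, which is why time scale information has to come from other measurements such as spin relaxation.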
Flexibility and dynamics not only define the physical nature but
also the biological function of IDPs, and the two major challenges
facing interpretation of experimental data from IDPs are related to
these characteristics. The first concerns the accurate description
of the conformational space sampled by the protein. NMR reports on
a population-weighted average over the ensemble of interconverting
states sampled at equilibrium so that as long as the exchange rates
are fast on an NMR time scale, conformation-dependent parameters,
such as chemical shift or scalar and dipolar couplings, report on
interconversion between a potentially immense number of conformers.
In practice for NMR studies of proteins using 1H, 15N, and 13C nuclei, this means interconversion
on time scales faster than hundreds of microseconds. Interpretation
of experimental data therefore requires statistical mechanical approaches
to evaluate the nature of the conformational ensemble. The available
degrees of conformational freedom that are accessible to IDPs significantly
outweigh the ability of the experimental constraints to uniquely define
the free-energy surface. Regardless of the approach used to delineate
the conformational space, caution must therefore be employed to derive
meaningful ensemble models that correctly describe the long-range
and local conformational sampling. To this end, there has been considerable
methodological development aiming to delineate the contours and limits
of local and long-range conformational space sampled by IDPs in solution,[17−27] from NMR, and other complementary biophysical techniques such as
small angle scattering and single molecule Förster resonance
energy transfer (smFRET).[28−32] Progress in this direction has focused on the use of extensive exploration
of conformational space, using for example stochastic sampling of
the available degrees of freedom, and subsequent identification of
combinations of conformers that when assembled into representative
ensembles agree with experimental data and can describe the contours
of the Boltzmann ensemble.[33−37] The success of such approaches is predicated on the ability to accurately
calculate the expected value of experimental data for a given conformation
or conformational sampling regime. The same end can be achieved via
ensemble restrained molecular dynamics simulation,[38−41] for example, by including experimental
data into the force field via a target function applied over the entire
ensemble.[42−51] The amount of detail concerning the conformational sampling of IDPs
in solution that can be derived from all of these ensemble approaches
depends of course heavily on the extent of experimental data available.[52]

The advantage of the fast exchange regime,
reporting on a population-weighted
average over an ensemble of states that interconvert on time scales
faster than 100 μs, also highlights its key limitation: more precise
information about the associated motional time scales
is not explicitly contained in this average. Knowledge of the time
scales of diffusion and chain dynamics, of interconversion rates between
locally structured binding-competent and incompetent substates, and
of transient contacts relating the conformational properties of distant
regions of IDPs will all play an essential role in developing a deeper
understanding of IDP reaction kinetics and thermodynamics. Understanding
the dynamic properties of IDPs complements Cartesian descriptions
of their exploration of conformational space, providing a new and
essential dimension to our description of their functional behavior.
In response to this challenge, time scales of conformational rearrangements
of IDPs have been investigated using a vast range of experimental
techniques,[53] sensitive to local conformational
dynamics such as infrared,[54,55] Raman,[56] or neutron spectroscopy[57−59] or to long-range interactions
using single molecule fluorescence,[60−69] electron paramagnetic resonance,[70−72] and NMR paramagnetic
relaxation spectroscopies,[73−79] but by far the most powerful technique is NMR spin relaxation.

NMR spin relaxation probes the angular correlation functions of
relaxation active mechanisms, typically dipole–dipole interactions
between neighboring nuclei, arising due to reorientation processes
of macromolecules on time scales ranging from tens of picoseconds to
tens of nanoseconds or even slower. These time scales are also readily
accessible to atomistic molecular dynamics (MD) simulation of fully
solvated proteins, rendering the combination of MD and NMR extremely
powerful. Advances in molecular simulation, in terms of accuracy of
force-fields or sampling of slower dynamic time scales,[80−85] have always accompanied advances in our understanding of the interpretation
of NMR relaxation in terms of global and local molecular motions,
demonstrating the synergy between these two atomic resolution techniques.
Indeed, 15N and 13C NMR relaxation data have
often been used to test and benchmark MD force fields and algorithms,[82,86−91] establishing the accuracy of dynamic trajectories of soluble, folded
proteins.

15N spin relaxation provides a remarkably
sensitive
probe of the motional time scales exhibited by IDPs, characterizing
the dynamic properties of bond vectors throughout the length of the
unfolded protein.[92] The physical interpretation
of the dynamic time scales contributing to the quenching of the angular
correlation function is however less straightforward than in the case
of folded proteins. The amount of information that can be extracted
from spin relaxation is also limited by the efficiency with which
fast large-scale motions quench the angular correlation function. 15N spin relaxation rates in unfolded proteins have
nevertheless been measured extensively, leading to the detection of
extensive pico- and nanosecond motions, as well as clear correlations
between motional time scales and structural propensities detected
from chemical shifts and scalar and dipolar couplings.[93−114]

Further insight into the actual physical origin of the motional
modes and time scales giving rise to NMR spin relaxation can again
be derived from the combination of MD simulation with spin relaxation
measurements.[115−118] Measured relaxation rates report on population-weighted averages
so that accurate simulation should account for fast motions occurring
over the ensemble of states sampled by the protein. The value of relaxation
rates associated with each substate depends on the nature of this
conformation, so that in principle it would be necessary to simulate
each of the substates and average the individual rates as a function
of their populations, or to simulate sufficiently long trajectories
to sample all individual states. In the case of globular proteins,
the identification and simulation of distinct conformational substates
that are in fast exchange on the chemical shift time scale but that
exhibit distinct fast reorientational properties have indeed been
shown to significantly improve the description of the ensemble of
fast motions, as measured by the reproduction of experimental 15N relaxation rates.[119] This demonstrates
the improved accuracy of dynamic information when considering the
entire free-energy surface but also the interdependence of fast and
slower motions in proteins. For IDPs, this potential interdependence
has an even greater importance and underlines the relevance of adequate
sampling of the ensemble of conformational states.[120]

Despite major progress in the simulation of highly
flexible or
unfolded proteins,[42,118,121−123] a more general application of these techniques
has been hindered by the inability of state-of-the-art force fields
to describe the dynamics of IDPs with acceptable accuracy.[90,124,125] While the degrees of conformational
freedom available to internuclear covalent bonds present in folded
proteins are mainly dictated by the immediate environment, and therefore
intraprotein interactions, for IDPs the protein–solvent interactions
take on a far greater importance, so that an imbalance between potential
energy terms reporting on protein–protein and protein–solvent
interactions[124] may result in inaccurate
kinetic and thermodynamic behavior. The resolution of this question,
and the development of force fields that can describe both folded
and unfolded proteins with equal accuracy,[126] remains an important challenge.[90,120,124,127−130] The availability of accurate and calibrated NMR relaxation rates
from proteins with well-described conformational behavior will undoubtedly
contribute to this important task.

Beyond the fast exchange
regime, NMR relaxation experiments no
longer represent a population-weighted average of the reorientational
properties of the exchanging species but report on motions occurring
on time scales defined by the difference in chemical shifts of the
exchanging subspecies, in the range of microseconds to milliseconds. In this
regime, NMR exchange spectroscopy is a particularly powerful way to
probe the molecular mechanisms underlying the exchange contributions,
providing information on the thermodynamics, free-energy landscape,
and kinetics of the interconversion between the species.[131−136]

Finally, our understanding of the functional modes adopted
by IDPs
is enriched by every physiologically relevant complex that is characterized
experimentally. The functional interactome of IDPs is vast and potentially
highly diverse, and our experimental sampling of the interaction modes
employed by IDPs remains extremely sparse. Although specific model
systems that are experimentally well-characterized provide useful
benchmarks, insight into the true diversity of the IDP interactome
requires more sampling, of more diverse systems, at atomic resolution.
Exchange NMR, whether fast, intermediate, or slow, provides powerful
tools to deliver this essential insight.

The aim of this review
is to describe recent developments of NMR-based
approaches to understand the conformational dynamic behavior of IDPs
in physiological, and even cellular environments, and to illustrate
the insight that NMR offers to reveal the mechanistic basis of functional
disordered assemblies that are important for human health. Part of
the power of NMR spectroscopy lies in the use of combinatorial approaches
with structural techniques such as cryoEM and X-ray diffraction that
provide the structural context within which the functional role of
IDRs can be best understood. Examples will also be shown of the ability
of NMR to characterize large-scale dynamics of complex biomolecular
assemblies comprising highly disordered elements.
Accurate Mapping of the Conformational Landscape of IDPs
An accurate understanding of the conformational properties
of IDPs,
and intrinsically disordered regions (IDRs) of multidomain proteins,
is of prime importance. The dynamic behavior of IDPs is defined
by the amino acid sequence, and the ability of the protein to interact
via, for example, linear motifs is encoded and controlled by the intrinsic
conformational sampling. In addition, IDRs, often linking folded domains,
define the free-energy landscape of the protein, providing the degrees
of conformational freedom of the entire molecular assembly.[6,137−139] Characteristics such as charge and hydrophobicity
distribution of IDPs have been interpreted in terms of their role
in controlling physical parameters, for example, compactness and extendedness,[140] and the ability of IDPs to participate in multivalent
interactions.[141−144] Similarly, regulation of these degrees of freedom can be achieved
by post-translationally modifying the chemical nature of the chain.[145−148]

Two recent studies described herein illustrate the importance
of
a detailed consideration of the averaging properties of different
experimental data types to understand the conformational nature of
IDRs. In particular, the combination of long-range and local transient
structure poses specific challenges to the analysis of disordered
proteins in terms of representative ensembles, and certain pitfalls
must be avoided to extract accurate structural information.

Chemical shifts and scalar couplings present two important features
that directly impact their interpretation. First, they depend primarily
on the local structural environment of the observed spin, and second,
if interconversion between the different states is much faster than
the difference between the expectation values of the different states
in isolation, the measured NMR spectrum represents a weighted average
of the ensemble of states. Conversely, parameters whose experimental
values depend on time-dependent interactions, such as paramagnetic
relaxation for example, require a more detailed consideration of the
averaging properties, as has been discussed.[79] Residual dipolar couplings (RDCs) depend on the average of the orientations
of the internuclear vector (I–S) with respect to the magnetic field,

$$D_{IS} = K \left\langle P_2(\cos\theta(t)) \right\rangle$$

where θ is the angle between the internuclear vector and the magnetic field, K describes physical constants
such as the gyromagnetic ratios
and the internuclear distance, and $P_2(x) = (3x^2 - 1)/2$. In
a molecule of fixed shape, we can expand this average,

$$D_{IS} = K \sum_{i,j} S_{ij} \cos\alpha_i \cos\alpha_j$$

where α refers to the orientation of the internuclear
vector
with respect to a traceless second rank tensor S that
describes the alignment properties of the molecule.

In highly
flexible proteins, S can clearly vary significantly
over the ensemble such that proteins of different shape, and therefore
different alignment properties, but identical local sampling, would
give rise to very different RDCs:

$$D_{IS} = K \left\langle \sum_{i,j} S_{ij}^{(k)} \cos\alpha_i^{(k)} \cos\alpha_j^{(k)} \right\rangle_k$$

where the average runs over the conformers k of the ensemble. Using simple and
intuitive simulation of target
ensembles, it was demonstrated that ensemble descriptions derived
from RDCs of molecular systems whose shape varies significantly over
the ensemble can actually reproduce experimental data very closely,
even without explicit consideration of the alignment properties of
the component conformations. However, the orientational properties
of the internuclear vectors are then severely compromised and inaccurately
describe the conformational space compared to the target ensemble.[149] This reiterates the long-held observation that
to accurately describe local and long-range conformational sampling,
it is necessary to respect both of these contributions to the average
over the ensemble of states.[150]

The
importance of considering long-range order in the interpretation
of RDCs was also illustrated in a recent study of the δ domain
of RNA polymerase (δ−RNAP), where multiple NMR parameters
and small angle scattering data were combined using the ensemble selection
approach, ASTEROIDS, to compare the free energy landscape of different
forms of the protein. ASTEROIDS uses extensive conformational sampling
described in an initial prior database, broadly sampling amino-acid
specific statistical-coil distribution for the unfolded chain,[151,152] and a genetic algorithm, to select representative subensembles of
conformers that in combination are in agreement with the experimental
data. The sampling of the prior database is modified iteratively until
convergence is achieved within the estimated uncertainty.[37]

In the case of δ−RNAP, the
90 amino acid C-terminal
IDR follows the similarly sized folded domain.[153] The IDR is locally highly charged, with mainly acidic but
also basic stretches of amino acids. As in the case of a number of
acidic disordered domains in RNA-polymerase machinery, the acidic
sequence has been suggested as an RNA mimic.[154]

Experimental data used to describe the conformational sampling
of the IDR included 13C, 15N, and 1H backbone chemical shifts, paramagnetic relaxation enhancements
(PREs), residual dipolar couplings (RDCs), and small-angle X-ray scattering
data. PREs provide clear evidence of transient long-range order in
the IDP, with apparent contacts between regions exhibiting opposite
charges (Figure 2).[155] Analysis of δ-RNAP in terms of representative
ensembles results in close agreement with expected behavior of the
averaged RDCs. Characteristic modulations of multiple RDCs were observed
in each peptide unit (manifest as quenching of the RDCs measured between
the points of contact), and these RDCs were only correctly predicted
when the long-range contact identified from the PREs was included
in the analysis.
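The dependence of averaged RDCs on conformer-specific alignment can be illustrated with a small numerical sketch. Each conformer carries its own traceless alignment tensor, and the coupling of a given internuclear vector is evaluated as K Σ S_ij cos α_i cos α_j before averaging over the ensemble. The tensors, magnitudes, and the value K = 1 below are hypothetical and chosen only to make the effect visible. This is not the procedure used in the studies cited above.

```python
import numpy as np

def rdc(unit_vector, saupe, k=1.0):
    """RDC for one vector in one conformer: K * sum_ij S_ij cos(a_i) cos(a_j)."""
    u = np.asarray(unit_vector, dtype=float)
    u = u / np.linalg.norm(u)  # direction cosines in the frame of the tensor
    return k * float(u @ np.asarray(saupe, dtype=float) @ u)

def ensemble_rdc(unit_vectors, saupe_tensors, k=1.0):
    """Average over conformers, each evaluated with its own alignment tensor."""
    return float(np.mean([rdc(u, s, k) for u, s in zip(unit_vectors, saupe_tensors)]))

def axial_tensor(axis, magnitude):
    """Traceless, axially symmetric tensor with principal axis `axis`."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    return magnitude * (1.5 * np.outer(a, a) - 0.5 * np.eye(3))

# Two hypothetical conformers with identical local geometry (vector along z)
# but different overall shapes, hence different alignment axes.
vectors = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
tensors = [axial_tensor([0, 0, 1], 1e-3), axial_tensor([1, 0, 0], 1e-3)]

per_conformer = ensemble_rdc(vectors, tensors)
shared_tensor = ensemble_rdc(vectors, [tensors[0], tensors[0]])
print(per_conformer, shared_tensor)
```

With identical local sampling, the two averages differ only because the alignment tensor changes between conformers, which is the situation described in the text for chains whose overall shape varies across the ensemble.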
Figure 2
Experimental comparison of conformational behavior of
the intrinsically
disordered δ subunit of bacterial RNA polymerase. (A) Experimental
parameters measured on wild-type protein (green bars) compared to
ensemble-averaged values calculated from 10 ensembles comprising 200-strong
ASTEROIDS ensembles (red lines). From top to bottom: secondary chemical
shifts, paramagnetic relaxation enhancements (labeled at residue 132),
residual dipolar couplings, and SAXS. Bottom: comparison of distribution
of radii of gyration from a statistical coil pool (black) and the
ASTEROIDS ensemble (red). Structural models of five conformations
are displayed below the plots (ordered domain in green, IDR in yellow
with positively and negatively charged residues highlighted in blue
and red, respectively). (B) Same parameters for the mutated protein
in which the lysine-rich tract 96KAKKKKAKK104 is replaced by 96EAEEEEAEE104. This
results in a clear abrogation of long-range contacts with the C-terminal
half of the domain that collapse the protein. This collapse, and its
abrogation, are visible not only in SAXS and PRE data but also in
the residual dipolar coupling data. (Reproduced with permission from
Kuban et al. 2019 Copyright 2019 ACS[156]).
Mutation of the cluster of basic
amino acids to acidic residues
abrogates the long-range contacts, resulting in extinction of the
characteristic PRE- and SAXS-derived evidence of compaction in the
wild type protein, revealing a highly extended IDR in the absence
of the basic cluster, and a disappearance of the characteristic long-range
RDC modulation. The combined analysis thus results in an accurate,
integrated description of the ensemble of states sampled by both wild-type
and mutant protein in solution, providing insight into the impact
of the electrostatic charge distribution on local and long-range conformational
behavior.[156] Interestingly, the loss of
long-range contacts induced by mutagenesis influences cell fitness
and transcription efficiency in vitro. While the
complete knockout of the delta subunit makes transcription too fast
and insensitive to regulation by initiating nucleoside triphosphates,
the mutation disrupting long-range contacts has the opposite effect:
it inhibits transcription from promoters that form unstable complexes
with RNA polymerase.
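The general idea of ensemble selection used in this section, choosing subensembles from a large prior pool so that ensemble-averaged observables agree with experiment, can be illustrated with a toy sketch. The code below is not ASTEROIDS: it replaces the genetic algorithm with a simple stochastic swap search, and the pool, observables, uncertainty, and ensemble size are synthetic assumptions introduced only for illustration.

```python
import numpy as np

def chi2(selected, pool_values, target, sigma):
    """Chi-square between ensemble-averaged and target observables."""
    avg = pool_values[selected].mean(axis=0)
    return float(np.sum(((avg - target) / sigma) ** 2))

def select_ensemble(pool_values, target, sigma, size, n_steps=5000, seed=0):
    """Stochastic swap search for a subensemble whose average matches the target.

    A swap is accepted only if it does not increase chi-square, so the
    returned score can never exceed the score of the initial random ensemble.
    """
    rng = np.random.default_rng(seed)
    n_pool = pool_values.shape[0]
    selected = rng.choice(n_pool, size=size, replace=False)
    initial = best = chi2(selected, pool_values, target, sigma)
    for _ in range(n_steps):
        candidate = rng.integers(n_pool)
        if candidate in selected:
            continue  # keep conformers unique within the ensemble
        trial = selected.copy()
        trial[rng.integers(size)] = candidate
        score = chi2(trial, pool_values, target, sigma)
        if score <= best:
            selected, best = trial, score
    return selected, initial, best

# Synthetic demonstration: 2000 pool conformers, 20 observables each.
# The "experimental" target is the average over a hidden subset of 50 conformers.
rng = np.random.default_rng(1)
pool = rng.normal(size=(2000, 20))
hidden = rng.choice(2000, size=50, replace=False)
target = pool[hidden].mean(axis=0)

selected, chi2_start, chi2_end = select_ensemble(pool, target, sigma=0.1, size=50)
print(f"chi2 before: {chi2_start:.1f}, after: {chi2_end:.1f}")
```

In a realistic application the pool values would be observables back-calculated for each conformer (chemical shifts, PREs, RDCs, scattering curves), and, as noted in the text, observables such as RDCs and PREs require averaging rules that go beyond the simple arithmetic mean used here.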
NMR Studies of IDP Dynamic Modes and Timescales
NMR Relaxation of IDPs and Models of Correlation Functions
As introduced earlier, NMR relaxation occurs due
to angular fluctuations of relaxation-active interactions resulting
in transitions and incoherent dephasing that relax the spin state
back to equilibrium.[92,157,158] The angular reorientation of such interactions can be described
in the time domain (correlation function C(t)) or the frequency domain (the spectral density
function J(ω)). Protein backbone
dynamics are typically characterized in solution using longitudinal
(R1) and transverse (R2) autocorrelated 15N relaxation rates, heteronuclear 1H–15N cross-relaxation (σNH), and 15N longitudinal (ηz) and transverse (ηxy) cross-correlated dipole–dipole/CSA
(chemical shift anisotropy) cross-relaxation.[92] The
advantage of measuring different rates lies in their distinct dependence
on different combinations of the angular spectral density function
at the characteristic Larmor frequencies defined by the spin system,
ωN, ωH, ωH ±
ωN.

If enough measurements are available, the
spectral density functions can be mapped from the different relaxation
rates[159,160] using reduced spectral density mapping[161−164] to estimate J(0), J(ωN) and an approximate mean value at high frequencies throughout the sequence.
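Reduced spectral density mapping can be sketched in a few lines of Python. The sketch below uses the widely used expressions for 15N R1, R2 and 1H–15N cross-relaxation of an isolated N–H pair, together with the standard reduced-mapping relations in which the high-frequency terms are replaced by a single value near 0.87ωH. It assumes the convention in which J(ω) = (2/5)τ/(1 + ω²τ²) for isotropic tumbling. The N–H distance (1.02 Å), the 15N CSA (−172 ppm), the field, and the 5 ns correlation time are assumed illustrative values, not parameters taken from the studies cited here.

```python
import math

MU0_OVER_4PI = 1e-7          # T m A^-1
HBAR = 1.054571817e-34       # J s
GAMMA_H = 2.6752218744e8     # rad s^-1 T^-1
GAMMA_N = -2.7126180e7       # rad s^-1 T^-1 (negative for 15N)
R_NH = 1.02e-10              # m, assumed effective N-H distance
CSA_N = -172e-6              # assumed 15N chemical shift anisotropy

def constants(b0):
    """Squared dipolar (d^2) and CSA (c^2) interaction constants at field b0 (T)."""
    d = MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_N / R_NH**3
    c = GAMMA_N * b0 * CSA_N / math.sqrt(3.0)
    return d * d, c * c

def rates_from_j(j, b0):
    """R1, R2 and cross-relaxation rate for a spectral density function j(omega)."""
    d2, c2 = constants(b0)
    wh, wn = GAMMA_H * b0, GAMMA_N * b0  # signed Larmor frequencies
    r1 = d2 / 4 * (j(wh - wn) + 3 * j(wn) + 6 * j(wh + wn)) + c2 * j(wn)
    r2 = (d2 / 8 * (4 * j(0.0) + j(wh - wn) + 3 * j(wn) + 6 * j(wh) + 6 * j(wh + wn))
          + c2 / 6 * (4 * j(0.0) + 3 * j(wn)))
    sigma = d2 / 4 * (6 * j(wh + wn) - j(wh - wn))
    return r1, r2, sigma

def reduced_mapping(r1, r2, sigma, b0):
    """Estimate J(0), J(wN) and J(0.87 wH) from R1, R2 and cross-relaxation."""
    d2, c2 = constants(b0)
    j_high = 4 * sigma / (5 * d2)
    j_n = (4 * r1 - 5 * sigma) / (3 * d2 + 4 * c2)
    j_0 = (6 * r2 - 3 * r1 - 2.72 * sigma) / (3 * d2 + 4 * c2)
    return j_0, j_n, j_high

# Round trip with a single Lorentzian (hypothetical 5 ns correlation time, 14.1 T)
tau, b0 = 5e-9, 14.1
lorentzian = lambda w: 0.4 * tau / (1.0 + (w * tau) ** 2)
r1, r2, sigma = rates_from_j(lorentzian, b0)
j_0, j_n, j_high = reduced_mapping(r1, r2, sigma, b0)
print(r1, r2, sigma)
print(j_0, j_n, j_high)
```

The round trip recovers the input spectral density only approximately, because the reduced mapping assumes that J(ω) varies as 1/ω² around the proton frequency. The approximation becomes less accurate for motions on time scales close to 1/ωH.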
Alternatively,
the correlation function of internal motional modes can be described
analytically, in terms of geometric and temporal parameters (for example,
n-site jumps or diffusion in a cone), although it can be difficult
to differentiate between these models on the basis of NMR relaxation
rates alone. A simple and popular alternative is to use the model-free
approach, where mathematical contributions to the autocorrelation
function are parametrized. The approach is simply understood in the
case of internal modes in a folded protein,[165−169] where it is possible to express the angular correlation function
as

$$C(t) = C_O(t)\, C_I(t)$$

where $C_O(t)$ is the correlation function for global
motion, and a faster internal contribution $C_I(t)$, that is not associated
with a specific motional mode, describes restricted motion relative
to the molecular frame:

$$C_I(t) = \left\langle P_2\big(\hat{\mu}(0) \cdot \hat{\mu}(t)\big) \right\rangle$$

where μ̂ is
a unit orientation
vector of the relevant relaxation-active interaction (dipolar or CSA).

If the internal correlation function $C_I(t)$ is approximated to a single
exponential, the associated spectral density function can be described
as

$$J(\omega) = \frac{S^2 \tau_c}{1 + \omega^2 \tau_c^2} + \frac{(1 - S^2)\, \tau'}{1 + \omega^2 \tau'^2}$$

where $\tau' = (\tau_i^{-1} + \tau_c^{-1})^{-1}$, $\tau_c$ describes
the overall rotational diffusion, $\tau_i$ is the internal correlation time,
and $S^2$ is the generalized order parameter. Extension[168] to two internal components with distinct correlation
times ($\tau_f$ and $\tau_s$) and order parameters ($S_f^2$ and $S_s^2$)
gives

$$J(\omega) = \frac{S^2 \tau_c}{1 + \omega^2 \tau_c^2} + \frac{(1 - S_f^2)\, \tau_f'}{1 + \omega^2 \tau_f'^2} + \frac{(S_f^2 - S^2)\, \tau_s'}{1 + \omega^2 \tau_s'^2}$$

where $S^2 = S_f^2 S_s^2$, $\tau_f' = (\tau_f^{-1} + \tau_c^{-1})^{-1}$, and $\tau_s' = (\tau_s^{-1} + \tau_c^{-1})^{-1}$.

This formalism is commonly used to interpret
relaxation measured
in folded proteins, with the global contribution to the autocorrelation
and spectral density functions assumed to be common for all sites.
Although, due to their high flexibility, IDPs are not expected to
exhibit a shared diffusion tensor for distinct regions in the chain,
the same mathematical formalism can be used to model the spectral
density functions of each site independently, assuming that the time
scales of the component modes are sufficiently separated, and that
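all the motions are isotropic:

$$J(\omega) = \sum_k \frac{A_k \tau_k}{1 + \omega^2 \tau_k^2}$$

with $\sum_k A_k = 1$, and

$$C(t) = \sum_k A_k\, e^{-t/\tau_k}$$

This formalism has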
been diversely exploited for the interpretation
of relaxation from partially denatured proteins and IDPs.[93,95,97,170−173] Alternatively, it is possible to describe the spectral density function
in terms of an analytical distribution of motions, of which the model-free
approach represents one of the simplest manifestations.[99,103,110,174] Here again, the complexity of the models makes differentiation difficult,
although they have been successfully used to explain the dynamic behavior
of synthetic homopolymers,[175] and surely
provide a more physical representation of the complex dynamics of
flexible proteins.[103]

In highly dynamic
molecules such as IDPs, large-amplitude motions
occur in the range of nanoseconds,[93−114] rapidly quenching angular correlation and reducing the slowest sensitive
time scales to the nanosecond range (at room temperature and in free
solution). Nevertheless, the existence of segmental motions was suggested
from the bell-shaped dependence of transverse relaxation components
(with respect to primary sequence, tailing off to low values at both
termini), in chemically denatured and intrinsically disordered proteins,[104] relating to stiffness or side chain bulkiness,[96,176] and from 1H relaxometry.[177] IDRs connected to folded domains have been shown to induce slower
components on the rotational diffusion properties of multidomain proteins
indicating the importance of local viscosity and drag on dynamic time
scales.[178−181] Faster time scales are expected to relate to more local dynamics,
for example, of backbone dihedral angles, which may be important in
terms of local folding or binding;[6,23,182−192] however, in general the physical origin of observed relaxation rates
remains weakly characterized.
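The model-free spectral densities discussed in this section can be sketched numerically. The code below evaluates a sum of Lorentzian contributions with amplitudes that add up to one, and checks that the single-internal-mode model-free form is a special case of that sum with amplitudes S² and 1 − S² and correlation times τc and τ′ = (τi⁻¹ + τc⁻¹)⁻¹. Normalization prefactors such as 2/5 are omitted here and vary between conventions. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def j_sum(omega, amplitudes, taus):
    """Sum of Lorentzians: J(w) = sum_k A_k tau_k / (1 + (w tau_k)^2), with sum A_k = 1."""
    a = np.asarray(amplitudes, dtype=float)
    t = np.asarray(taus, dtype=float)
    if a.shape != t.shape:
        raise ValueError("amplitudes and taus must have the same length")
    if not np.isclose(a.sum(), 1.0):
        raise ValueError("amplitudes must sum to 1")
    return float(np.sum(a * t / (1.0 + (omega * t) ** 2)))

def j_model_free(omega, s2, tau_c, tau_i):
    """Model-free spectral density with one internal mode (prefactor omitted)."""
    tau_eff = 1.0 / (1.0 / tau_c + 1.0 / tau_i)
    return (s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
            + (1.0 - s2) * tau_eff / (1.0 + (omega * tau_eff) ** 2))

# Hypothetical parameters: S2 = 0.8, tau_c = 5 ns, tau_i = 50 ps
s2, tau_c, tau_i = 0.8, 5e-9, 50e-12
tau_eff = 1.0 / (1.0 / tau_c + 1.0 / tau_i)
omega_n = 2.0 * np.pi * 60.8e6  # 15N Larmor frequency at 14.1 T, rad/s

print(j_model_free(omega_n, s2, tau_c, tau_i))
print(j_sum(omega_n, [s2, 1.0 - s2], [tau_c, tau_eff]))

# Three-component description of the kind applied to disordered chains
# (amplitudes and correlation times are purely illustrative)
print(j_sum(omega_n, [0.3, 0.5, 0.2], [50e-12, 1e-9, 5e-9]))
```

Because amplitudes and correlation times enter each Lorentzian together, different combinations can give similar values of J(ω) at a small number of frequencies, which is why measurements at several magnetic fields help to separate the contributions.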
Recent Applications of Model-Free Approaches to IDPs
It is clear from the multicomponent spectral density function introduced above that amplitudes and time scales of the different components
may be correlated and that the resulting parametrization will depend
on the accurate estimation of the number of contributions. In the
context of identifying the most appropriate model for the accurate
interpretation of NMR relaxation from IDPs, a number of recent studies
used extensive data sets to shed important light on the available
information content. Rather than fixing the number of contributions and determining
the most appropriate correlation times, Ferrage and co-workers[193] used an array of fixed correlation times (τk), distributed on a
logarithmic scale, with variable amplitudes (Ak), that could also be zero, to analyze the
spectral density function. The backbone dynamics of the partially disordered
protein Engrailed 2 were analyzed using a large range of auto- and
cross-correlated relaxation rates measured at five magnetic fields
between 400 and 1000 MHz 1H frequencies. This provides
a grid of motional amplitudes corresponding to six characteristic
correlation times for the entire protein, clearly delineating the
folded and unfolded domains, and revealing dominant time scales around
1 ns in the unfolded domain.
Gill et al.[194] also studied the dynamics of a partly unfolded protein,
the basic leucine-zipper region of GCN4. In this case, 15N R1, R2,
and σ, measured
at 600, 700, 800, and 900 MHz 1H frequency were analyzed
by rearranging the measured relaxation rates using a modified spectral
density mapping, and comparing these results to a model free analysis
using eq to determine
how many independent contributions can be extracted from this analysis.
The results demonstrate that the extended model-free approach accurately
describes the experimental data as well as being statistically justified
on the basis of the experimental uncertainty. The authors note that
more than three contributions cannot be theoretically justified from
these data.
A similar study of the dynamic behavior of the 126
amino acid C-terminal
disordered domain of Sendai virus nucleoprotein (NT) examined 15N R1, R2, σ, ηxy, and ηz measured at four magnetic field strengths
(600, 700, 850, and 950 MHz 1H frequency). In a first step,
autocorrelated and cross-correlated rates measured at each field strength
were analyzed using reduced spectral density mapping at each magnetic
field strength, confirming the self-consistency of the data, and the
absence of exchange contributions to R2. The data were then analyzed using eq to determine the optimal number
of contributions. Two procedures were undertaken, the first based
on statistical testing, to determine the minimum number of contributions
for each site. Models with 2 (τ1 and θ), 4
(τ1, τ2, A, and θ), 5 (A2, A3, τ2, τ3, and θ), or 6 (A2, A3, τ1, τ2, τ3, and θ) parameters were compared for all sites in the
molecule, corresponding to 1, 2, or 3 contributions to the relaxation-active
correlation function. The 3-component model was found to be justified
throughout the protein. Second, 10% of all data were removed from
each data set, and their values predicted from the parameters determined
from the remaining data sets, again demonstrating that 3 components
are essential to correctly predict experimental values. This implies
that sufficient relaxation data have been measured to justify the
more complex model.
Experimentally measured relaxation rates
vary significantly throughout
the length of IDPs, exhibiting apparent correlation with transient
secondary structure/linear motifs and differential dynamic behavior
depending on sequence composition. It is therefore interesting to
investigate the physical origin of the three components. The ability
to measure NMR relaxation rates in complex environments such as liquid–liquid
phase separation[195−197] and in cellulo[198] also calls for a careful analysis of the possible
physical mechanisms underlying these experimentally observed dynamic
modes. To this end, two approaches, described below, have recently
shed more light on the information content of this site-specific variation
of relaxation in IDPs, in particular concerning the relative importance
of local backbone conformational sampling and long-range chain-like
behavior. The first concerns the dependence of the different components
on environmental parameters such as temperature and crowding, and
the second combines novel MD-based approaches to the interpretation
of relaxation in IDPs.
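As a minimal sketch of how a multi-component model-free description maps onto observable 15N rates, the Python below assumes a spectral density written as a sum of Lorentzians with amplitudes A_k and correlation times τ_k, combined with the standard dipolar and CSA expressions for an amide 15N–1H pair. The N–H distance (1.02 Å), the CSA (−172 ppm), the use of Larmor frequency magnitudes, and the example amplitudes and correlation times are illustrative assumptions, not values taken from the studies discussed above.

```python
import math

MU0_OVER_4PI = 1e-7            # T m A^-1
HBAR = 1.054571817e-34         # J s
GAMMA_H = 2.6752218744e8       # rad s^-1 T^-1
GAMMA_N = 2.7116e7             # rad s^-1 T^-1 (magnitude; 15N is negative)
R_NH = 1.02e-10                # m, assumed effective N-H distance
CSA = 172e-6                   # assumed 15N CSA magnitude (unitless, i.e. ppm * 1e-6)


def spectral_density(omega, amplitudes, taus):
    """J(omega) = (2/5) * sum_k A_k tau_k / (1 + (omega tau_k)^2)."""
    return 0.4 * sum(a * t / (1.0 + (omega * t) ** 2)
                     for a, t in zip(amplitudes, taus))


def n15_rates(proton_mhz, amplitudes, taus):
    """Return (R1, R2, sigma_NH) in s^-1 at a given 1H frequency in MHz."""
    w_h = 2.0 * math.pi * proton_mhz * 1e6
    w_n = w_h * GAMMA_N / GAMMA_H          # Larmor frequency magnitudes
    d2 = (MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_N / R_NH ** 3) ** 2
    c2 = (CSA * w_n) ** 2 / 3.0

    def j(w):
        return spectral_density(w, amplitudes, taus)

    r1 = d2 / 4.0 * (j(w_h - w_n) + 3.0 * j(w_n) + 6.0 * j(w_h + w_n)) + c2 * j(w_n)
    r2 = (d2 / 8.0 * (4.0 * j(0.0) + j(w_h - w_n) + 3.0 * j(w_n)
                      + 6.0 * j(w_h) + 6.0 * j(w_h + w_n))
          + c2 / 6.0 * (4.0 * j(0.0) + 3.0 * j(w_n)))
    sigma = d2 / 4.0 * (6.0 * j(w_h + w_n) - j(w_h - w_n))
    return r1, r2, sigma


# Hypothetical three-component site: slow (5 ns), intermediate (1 ns), fast (50 ps).
taus = [5e-9, 1e-9, 50e-12]
flexible = n15_rates(600.0, [0.1, 0.4, 0.5], taus)
restricted = n15_rates(600.0, [0.3, 0.4, 0.3], taus)
```

In this sketch, moving amplitude from the fast to the slow component raises R2, which is the qualitative behavior expected for residues in transiently structured motifs.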
Developing a Unified Description of IDP Dynamics in Solution
Temperature-Dependent Relaxation Reveals Properties of Distinct Dynamic Modes
The study of NT, a disordered protein
containing a short helical linear motif, was extended to measure R1, R2, σ, ηxy, and ηz at four magnetic field strengths (600, 700, 850,
and 950 MHz) and over a large range of temperatures (268–298
K) (Figure 3A).[199] Up to 61 rates were measured for each amide
group in the protein and interpreted using a simple Arrhenius relationship
to couple the correlation times at the different temperatures (in
analogy to the study of the temperature-dependent response of a microcrystalline
protein by solid state NMR[200]):
$$\tau_k(T) = \tau_{k,\infty}\,\exp\!\left(\frac{E_{a,k}}{RT}\right)$$
Figure 3
Temperature-dependent 15N relaxation maps three modes
of intrinsically disordered protein dynamics. (A) 15N auto-
and cross-relaxation rates of NT measured at different magnetic field
strengths (green, 600 MHz 1H frequency; blue, 700 MHz;
red, 850 MHz; orange, 950 MHz) and at different temperatures (top:
298 K, second row 288 K, third row 278 K, bottom 274 K). (B–F)
Analysis of all relaxation data in (A), using a three-component model-free
approach, with characteristic correlation times related via an Arrhenius
expression. (B) Slow (τ3) and intermediate (τ2) correlation times at 274 K (red), 278 K (orange), 288 K
(green), and 298 K (blue). (C) Activation energies for slow (red)
and intermediate (blue) time scales for each residue. (D–F)
Amplitude of slow (D), intermediate (E), and fast (F) time scale contributions
(Reproduced with permission from Abyzov et al. JACS 2016 Copyright
2016 ACS[199]).
The different temperature dependences of the three components are
described by temperature coefficients, or activation energies, given
by $E_{a,k}$ ($\tau_{k,\infty}$ is the Arrhenius prefactor).
Fitting to this function requires the determination of parameters
defining the relative amplitude of the three components at each temperature
and the effective temperature coefficients of the intermediate and
slowest contribution (the fastest contribution around 50 ps shows
insignificant temperature dependence). Again, cross-validation by
removal of either 10% of all data, or data from each magnetic field,
indicates that the analysis is satisfactorily overdetermined. It is
worth pointing out that predictive cross-validation is not so common
in analysis of protein dynamics from NMR spin-relaxation but when
applied shows a reassuring level of confidence in the data analysis.[199]
The simultaneous analysis of data from
all five temperatures (Figure 3B–F) reveals
fascinating insight into the origin of the three resolved components.
The amplitude of the slowest component exhibits a bell-shaped distribution
with respect to primary sequence, with a clear maximum in the helical
region. The time scale parallels this distribution, reaching time
scales up to 25 ns in the helical region at 268 K. Although this contribution
is dominated by the slowest times experienced by the helix, the effective
activation energy, or rate of change of τ with temperature, exhibits a smooth function
along the sequence, reaching a maximum (20–25 kJ mol⁻¹) in the center of the sequence. It was proposed that the slowest
contribution reports on chain or segmental dynamics. The reason that
slower motions are detected in the helix is that C(t) is not as efficiently quenched by the high amplitude
fast motions occurring in the remaining unfolded part of the chain.
The residual order left after the more restricted fast motions occurring
in the helix allows for the detection of slower motion that has little
effect on correlation functions from the less-structured parts of
the chain. This is further supported by the analysis of data measured
using protein constructs engineered to comprise 50, 75, or 126 amino
acids, revealing a clear dependence of τ3 on the length of the peptide chain, as expected for
chain dynamics considered using Rouse or Zimm models.[201−203]
The intermediate motion has a much flatter distribution over the
unfolded regions, and the apparent activation energies are in the
range expected from studies of peptide backbone free energy landscapes.[204,205] In this case, there is a discontinuity in activation energy between
the unfolded and helical regions, motivating the suggestion that these
contributions report respectively on local fluctuations within Ramachandran
wells and constrained internal dynamics or partial unfolding in the
helix.[54,128,206,120]
Although relaxation in IDPs is often thought to provide information
essentially concerning subnanosecond motions, the analysis shown here
clearly demonstrates that short, structured motifs in unfolded polymers
are also dependent on slower, segmental or chain-like motions, or
whatever other motion finally quenches the angular correlation function.
Most regions are not sensitive to these motions because of the extent
of the faster motions, but if one can locally quench these, a great
deal of insight can be derived from the resulting relaxation rates.
We note that while the contribution of the slowest motion increases
at lower temperatures, as the fastest motion falls, the amplitude
of the intermediate motion systematically passes through a maximum
at 288 K. This may provide us with information about the shape of
the actual distribution of correlation times and their impact on the
sampled correlation function.
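The Arrhenius coupling of correlation times across temperatures can be illustrated with a short Python sketch. It assumes the usual form τ(T) = τ∞ exp(Ea/RT); the function names and the numerical values (an activation energy of 20 kJ mol⁻¹ and a correlation time of 5 ns at 298 K) are hypothetical and chosen only to fall in the range discussed above.

```python
import math

R_GAS = 8.314462618  # J mol^-1 K^-1


def arrhenius_tau(temperature_k, tau_inf, e_a):
    """Correlation time (s) at a temperature (K) for prefactor tau_inf and E_a (J/mol)."""
    return tau_inf * math.exp(e_a / (R_GAS * temperature_k))


def activation_energy(t1, tau1, t2, tau2):
    """Recover E_a (J/mol) from correlation times at two temperatures."""
    return R_GAS * math.log(tau1 / tau2) / (1.0 / t1 - 1.0 / t2)


# Hypothetical slow mode: 5 ns at 298 K with E_a = 20 kJ/mol.
E_A = 20e3
TAU_INF = 5e-9 / math.exp(E_A / (R_GAS * 298.0))

tau_274 = arrhenius_tau(274.0, TAU_INF, E_A)
tau_298 = arrhenius_tau(298.0, TAU_INF, E_A)
slowdown = tau_274 / tau_298          # roughly a factor of 2 on cooling by 24 K
```

Because one activation energy links all temperatures, data recorded at several temperatures constrain each mode with fewer free parameters than independent fits at each temperature would require.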
IDP Dynamics under Crowded Conditions Experienced In Cellulo
Although significant progress has thus
been made over recent years in our understanding of the information
provided by NMR relaxation studies of IDPs, it remained unclear how
to interpret data measured in more complex, and more specifically
in the more crowded, physiological environments in which they function.[208,209] This question is particularly relevant with respect to NMR in cellulo,[198,210−216] where IDPs function in environments with molecular concentrations
reaching 400 g/L,[217−219] very likely strongly affecting the time
scales of IDP dynamics.[220−222] The effect of local environment
on IDP function is also relevant for understanding the mechanistic
role of IDPs in membraneless organelles.[195−197,223−225] IDPs are far more solvent-exposed than
folded proteins, suggesting that complex multicomponent
physiological environments will very likely strongly influence
dynamic modes and time scales. Single molecule fluorescence techniques
have provided unique insight into the importance of so-called internal
and solvent friction on IDP dynamics and partially folded or destabilized
protein states as well as on the kinetics of protein folding.[66,226,227] These approaches have been used
to investigate the dynamics of IDPs[228,229] and protein
function[230] in the cellular environment.
Similarly, NMR spectroscopy has been used to investigate modulation
of the folding/unfolding equilibrium of globular proteins in cellulo, indicating changes in both population and exchange
rates as a function of the cellular milieu, and a dependence on weak,
so-called quinary[231] interactions between
the protein of interest and diverse other molecules constituting the
intracellular matrix.[232−234] NMR was also used to describe the impact
of the cellular milieu on protein dynamics, from small globular proteins
to IDPs.[210,211,215,235−238] In a detailed study, Theillet and co-workers compared the influence
of different viscogens on the dynamics of α-synuclein, with 15N relaxation measurements made in mammalian cells, revealing
changes in dynamics of the termini of the protein, presumably associated
with crowding-induced compaction or inter- and intramolecular interactions.
The extent of changes appeared to be more pronounced in cellulo, suggesting additional impact of intermolecular interactions on
the relative deceleration of the NH-backbone fluctuations.[198] In the context of these examples, and the growing
body of experimental data,[239−244] a physical framework that incorporates the effects of molecular
crowding on the dynamics of the protein would provide a welcome tool
allowing quantitative interpretation of NMR relaxation measured under
physiological conditions.
Recent work further addressed this
challenge by measuring dynamics
of IDPs as a function of environmental complexity. An extensive set
of multifield NMR relaxation rates was measured over a broad range
of conditions, using inert crowding agents to systematically modify
viscosity, as well as temperature (Figure 4).[207] This calibration
allowed the dynamics of two IDPs to be mapped as a function of environmental
conditions, including both viscosity and temperature. The two IDPs
exhibit distinct physical properties, comprising both partially folded
and highly flexible elements. Local viscosity, or nanoviscosity, was gauged by
measuring 1H longitudinal relaxation of water,[157] which, at the high magnetic fields used here,
is expected to be dominated by rotational diffusion of the water molecules.[245−247] The overall dependences of the nanoviscosity of the solvent and
solute on the concentration of viscogen show similar features, with
the intermediate and slow correlation times of the backbone of the
protein, and the 1H R1 both deviating from the
linear regime in the range of 200 mg/mL (Figure 4). Nevertheless, the two motional modes of
the protein backbone exhibit very different responses, with friction
coefficients that are much steeper (approximately a factor of 3) for
the slower motions. As noted from fluorescence-based studies, viscosity
probes of different dimensions are expected to measure different effective
viscosities,[248−252] so that friction coefficients would be expected to be characterized
by distinct length scales and to decrease for smaller probes.[253,254] This suggests, perhaps not surprisingly, that intermediate and slow
dynamic modes are associated with fragments of different dimensions,
for example, respectively, single and multiple peptide units. The
ratio of friction coefficients corresponding to intermediate and slow
motions was reproduced for both experimental systems (over 200 amino
acids), suggesting that the observation may be general. The observed
differences in effective friction coefficients may be related to observations
made by Schuler and co-workers that translational diffusion slows
down considerably more than rotational diffusion of the IDP prothymosin
α inside crowded cells, suggesting very different length scales
and susceptibilities to crowding.[229]
Figure 4
Viscosity-dependent 15N relaxation maps distinct response
of local and longer-range dynamics in intrinsically disordered proteins.
(A) Transverse (R2) and longitudinal (R1) relaxation, transverse cross-correlated DD/CSA
(η) and heteronuclear {1H}-15N nuclear Overhauser enhancement (NOE) recorded at
600, 700, and 850 MHz as a function of concentration of Dextran 40.
(B) Longitudinal water relaxation (solid red line, normalized to the
value in free solution; ρ0) shows a similar dependence
on concentration of viscogen to the intermediate time scale motion
(green points). The slow motional component (purple) resembles approximately
3* ρ0 (dotted line). (C) Friction coefficients (ε)
for intermediate backbone (blue) and slower, segmental (red) motions.
(D) Cartoon representation of the length scales of intermediate and
slower motions (Reproduced with permission from Adamski et al. JACS
2019[207] Copyright 2019 ACS).
On the basis of these observations, it was possible to develop,
and test, a single expression to describe the dynamic modes and their
characteristic time scales of IDPs in complex mixtures, their temperature
and viscosity coefficients, using a minimal set of physical parameters
to relate both the intermediate and slow time scales ($\tau_k$) to the nanoviscosity of the solvent:
$$\tau_k(C,T) = \tau'_{k,\infty}\,\left[1 + \varepsilon_k\,\rho(C)\right]\exp\!\left(\frac{E_{a,k}}{RT}\right)$$
where $\rho(C) = (\eta_C - \eta_0)/\eta_0 = (R_{1,C} - R_{1,0})/R_{1,0}$, $R_{1,0}$ and
$\eta_0$ are the longitudinal relaxation rate of water
and the viscosity in the absence of viscogen, $R_{1,C}$ and $\eta_C$ are the longitudinal relaxation rate of water and the viscosity at viscogen concentration C, and $\tau'_{k,\infty}$ is a prefactor
representing the correlation time at infinite dilution and temperature. $\varepsilon_k$ is the residue-specific
friction coefficient relative to η of intermediate or slow motions. The model turns
out to be robust and remarkably transferable in vitro. For example, once sequence-specific friction coefficients have
been determined as a function of concentration for a particular protein,
highly sensitive dynamic probes such as a complete set of 15N relaxation rates measured in very different crowding conditions
are predicted with very high accuracy, simply on the basis of the
measurement of the water R1 (Figure 5A).
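A minimal sketch of how such a prediction could be organized is given below. It assumes that each correlation time scales linearly with the relative nanoviscosity ρ through a friction coefficient ε and follows an Arrhenius temperature dependence; this functional form, the function names, and all numerical values are illustrative assumptions rather than the published parametrization.

```python
import math

R_GAS = 8.314462618  # J mol^-1 K^-1


def rho_from_water_r1(r1_crowded, r1_dilute):
    """Relative nanoviscosity estimated from water 1H longitudinal relaxation."""
    return (r1_crowded - r1_dilute) / r1_dilute


def tau_crowded(temperature_k, rho, tau_prime_inf, e_a, epsilon):
    """Assumed form: tau = tau'_inf * (1 + epsilon * rho) * exp(E_a / (R T))."""
    return tau_prime_inf * (1.0 + epsilon * rho) * math.exp(e_a / (R_GAS * temperature_k))


# Hypothetical water R1 values (s^-1) in crowded and dilute samples.
rho = rho_from_water_r1(0.6, 0.4)

# Hypothetical friction coefficients: slow mode about three times steeper.
EPS_INTERMEDIATE, EPS_SLOW = 1.0, 3.0
T = 298.0

def relative_slowdown(epsilon, e_a=20e3, tau_prime_inf=1e-12):
    """Ratio of crowded to dilute correlation time at fixed temperature."""
    crowded = tau_crowded(T, rho, tau_prime_inf, e_a, epsilon)
    dilute = tau_crowded(T, 0.0, tau_prime_inf, e_a, epsilon)
    return crowded / dilute

slow_ratio = relative_slowdown(EPS_SLOW)
intermediate_ratio = relative_slowdown(EPS_INTERMEDIATE)
```

Under these assumptions, a single water R1 measurement in the sample of interest fixes ρ, and the residue-specific ε values determined beforehand supply the rest of the prediction.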
Figure 5
Residue-specific friction
coefficients are transferable between
different in vitro crowding environments and even
predict values measured in cellulo. (A) Experimental 15N relaxation rates recorded on Sendai virus NT in the presence
of 135 g/L PEG (gray bars) compared to values calculated using sequence-specific
friction coefficients (eq ) (red lines) determined as a function of Dextran concentrations
and water relaxation in the sample of interest. For comparison, relaxation
rates predicted under dilute conditions are shown in blue. (B) Relaxation
rates measured at 600 MHz 1H frequency at a concentration
of 90 g/L PEG (colors as in (A)). (C) 15N relaxation rates
recorded in-cell (red points) compared to values calculated on the
basis of dynamic parameters determined in vitro (green
bars and line). Orange bars and lines show rates predicted for dilute
conditions. Experimentally determined friction coefficients and the
experimental measurement of the water R1 in cellulo were used in the prediction.
(Reproduced with permission from Adamski et al. JACS 2019[207] Copyright 2019 ACS).
Perhaps most remarkably, the expression reproduces experimental
relaxation measured in cellulo in Xenopus oocytes, on the basis of viscosity coefficients measured in vitro and nanoviscosity measured in the cell (Figure 5C). This unified
description offers new insight into the nature of IDPs, and extends
our ability to quantitatively investigate their conformational dynamics
in complex environments. Such a successful application of experimental
methodology from in vitro viscogen to in
cellulo observation may appear surprising in view of the
complexity of the cellular environment[255] and the evident inability of synthetic polymers to reproduce this
complexity.[256] This study suggests that
such concerns do not prevent the accurate prediction of average reorientational
properties of IDPs in cells and indicates that the averaging of observable
signals from IDPs and water remains closely coupled even in the multicompartmental
environment of the cell.
Interpreting NMR Relaxation in IDPs Using MD Simulation
Accounting for Ensemble Conformational Sampling to Interpret Relaxation from IDPs
Although MD simulation
provides unique insight into the conformational dynamics of IDPs,[42,118,122,123] force-fields that accurately describe the behavior of folded proteins
often fail to reproduce ensemble averaged properties of IDPs in solution,
probably due to the importance of protein–solvent interactions.
This in turn has motivated the conception of force fields that have
been specifically designed for IDPs.[90,120,124,127−130]
Spin relaxation remains the most powerful NMR observable to
characterize dynamic time scales at a sequence specific level, and
reproduction of experimental values is often the most challenging
for MD simulation. As described earlier, assuming conformational exchange
that is fast on the chemical shift (and relaxation rate) time scale,
experimentally observed rates derive from a population-weighted average
over individual relaxation occurring within the different states sampled
up to the micro- to millisecond range, such that $\langle R \rangle = \sum_k p_k R_k$ ($p_k$ and $R_k$ are the population and the
relaxation rate of each state $k$). The problem of reproducing experimental
relaxation rates from IDPs using MD simulation is illustrated in Figure 6, where the 18 rates
from Sendai virus NT are compared to those derived from several microseconds
of fully solvated trajectories, using (in 2016) state-of-the-art,
IDP-adapted force fields.[90,258]
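To make the link between a trajectory and a relaxation rate concrete, the sketch below computes a second-order (P2) angular correlation function from a series of unit bond vectors and forms a population-weighted average of per-state rates. It is a plain-Python illustration under simplifying assumptions: the input vectors are taken to be normalized N–H orientations sampled at a fixed time step, and the function names are hypothetical.

```python
def p2_correlation(vectors, max_lag):
    """C(lag) = < P2( u(t) . u(t + lag) ) >, averaged over all time origins."""
    n = len(vectors)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the number of frames")
    corr = []
    for lag in range(max_lag + 1):
        total = 0.0
        for i in range(n - lag):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[i + lag]))
            total += 1.5 * dot * dot - 0.5      # second Legendre polynomial
        corr.append(total / (n - lag))
    return corr


def weighted_average(populations, rates):
    """Population-weighted rate <R> = sum_k p_k R_k, assuming fast interconversion."""
    norm = sum(populations)
    return sum(p * r for p, r in zip(populations, rates)) / norm


# A rigid vector never decorrelates; a vector hopping between x and z
# decorrelates fully at odd lags and recorrelates at even lags.
rigid = p2_correlation([(0.0, 0.0, 1.0)] * 8, 3)
hopping = p2_correlation([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)] * 4, 2)
mean_rate = weighted_average([0.75, 0.25], [2.0, 6.0])
```

Limiting `max_lag` to a fraction of the trajectory length keeps enough time origins in the average for each lag, which is the practical reason correlation functions are usually truncated well before the end of the sampled window.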
Figure 6
NMR relaxation allows
for the identification of ensembles of time-dependent
trajectories that represent fast motions in interconverting substates.
(A) Experimental 15N relaxation rates recorded on Sendai
virus NT at 298 K in dilute conditions (gray bars) compared to values
calculated from 4 μs of MD simulation (blue line). The red
line shows values calculated from the ABSURD procedure targeting
only transverse relaxation measured at 850 MHz (orange box). (B) The
ABSURD procedure results in average time-dependent correlation functions
that can be decomposed into local and segmental motions of the peptide
chain. (Reproduced with permission from Salvi et al. JPCL 2016[125] Copyright 2016 ACS and Salvi et al. Angewandte
Chemie 2017[257] Copyright Wiley 2017).
Analysis of these trajectories indicates that the origin of the
discrepancy derives from the over-representation of rare events, such
as long-range contacts, whose frequency is poorly sampled, leading
to statistical instability because the sampled correlation time does
not fulfill the necessary criterion τ ≪ tmax,[259] where tmax is the
maximal sampled time of the angular correlation function. To address
this problem, the following procedure was adopted: The entire trajectory,
or multiple distinct trajectories nucleated from different conformations,
are divided into subtrajectories of 100 ns, from which correlation
functions C(τ) (and rates R) are calculated and combined in an ensemble average that explicitly
mimics the actual heterogeneous conformational origin of the measured
relaxation. The maximum length of each subtrajectory is dictated according
to the experimental analysis described above for the studies of two
IDPs, NT and MKK4. At T = 298 K, the slowest contribution
to the rotational correlation function detected by experimental spin
relaxation (see above) is approximately 5 ns, so that the dynamic
reorientations occurring in each distinct substate can be reasonably
sampled using a sampling window of 100 ns (tmax = 50 ns). The ABSURD (average block selection using relaxation
data) approach then estimates the relative weights of segments of C(τ)
with respect to a single experimental relaxation rate, compiling an
ensemble of subtrajectories that interchange on time scales significantly
slower than the correlation time limit (100 ns) and significantly
faster than the chemical shift time scale (100s of μs).[125] In this way, a representative ensemble of time-dependent
trajectories is identified, thereby extending the concept of conformationally
averaged ensemble-descriptions into the time dimension. Optimization
against a unique relaxation rate at a single field identifies an ensemble
of trajectories that systematically improves agreement with a broad
set of rates, sensitive to motions occurring on a range of time scales
(R1, R2, σ, and η measured at multiple fields) (Figure 6), as well as local (13C chemical
shift) and global (SAXS) conformational sampling properties.
The fact that the ensemble of trajectories improves reproduction
of “passive” dynamic reporters highlights the importance
of correctly sampling the free energy landscape of the IDP in solution,
and illustrates the complex interdependence of motions occurring on
time scales varying over many orders of magnitude. While it has previously
been shown that simulating motions occurring in distinct substates
improves reproduction of relaxation in folded proteins,[119,260] it is challenging to make this observation for IDPs.[120]
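The idea of selecting trajectory blocks against a single experimental rate can be sketched as follows. This is not the published ABSURD implementation: a simple greedy selection with replacement is swapped in for whatever optimizer the original work used, and the per-block rates are made-up numbers standing in for values that would be computed from each block's correlation function.

```python
def chi2(predicted, experimental):
    """Sum of squared deviations between predicted and experimental rates."""
    return sum((p - e) ** 2 for p, e in zip(predicted, experimental))


def ensemble_average(block_rates, selected):
    """Average per-residue rate over the selected blocks (repeats allowed)."""
    n_res = len(block_rates[0])
    return [sum(block_rates[i][j] for i in selected) / len(selected)
            for j in range(n_res)]


def greedy_select(block_rates, exp_rates, n_picks):
    """Greedily add the block that most improves agreement with experiment."""
    selected = []
    for _ in range(n_picks):
        best = min(
            range(len(block_rates)),
            key=lambda i: chi2(ensemble_average(block_rates, selected + [i]),
                               exp_rates),
        )
        selected.append(best)
    return selected


# Hypothetical R2 values (s^-1) for three blocks and two residues.
blocks = [[4.0, 6.0], [10.0, 12.0], [6.0, 4.0]]
experiment = [5.0, 5.0]

picked = greedy_select(blocks, experiment, n_picks=2)
chi2_selected = chi2(ensemble_average(blocks, picked), experiment)
chi2_uniform = chi2(ensemble_average(blocks, [0, 1, 2]), experiment)
```

In this toy case the block with anomalously high rates is never selected, mimicking the down-weighting of over-represented rare events; rates that were not used in the selection can then serve as independent validation.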
Analytical Description of the Dynamics of IDPs Sampled by NMR Relaxation
The ability to simulate the
ensemble averaged angular correlation functions is of course only
half of the challenge. In principle this function describes all of
the molecular mechanisms that are relaxation-active, but in practice
it is not straightforward to extract motional modes from this complex
function. To address this problem, the correlation function was recently
analytically decomposed into three components using internal coordinates
to describe librational and reorientational dihedral angle modes relative
to the average peptide plane, and tumbling of each peptide relative
to the laboratory frame.[257] This deconvolution
of the angular components allowed the identification of locally correlated
and segmental motions along the chain. The advantage of such an approach
was exemplified in a comparison of temperature dependent 15N relaxation measured on Sendai virus NT, and compared to relaxation
calculated from average correlation functions derived using different
force fields.[261] This allowed the identification not only
of the best force field over a range of temperatures (Figure 7) but also of the exact dynamic
mode that was responsible for the incorrect reproduction of experimental
data (in this case the reorientation of water molecules and their
correlation with intrasegmental backbone motions). In this way, the
combination of ABSURD and the analytical description of the correlation
functions can be seen as a forensic tool to improve molecular dynamics
force fields with respect to experimental data.
Figure 7
Temperature-dependent
NMR relaxation identifies accurate and transferable
molecular force fields for IDPs. Experimental 15N{1H} steady-state nOes (gray bars) measured on Sendai virus
NT at different magnetic fields (left 600 MHz, middle 700 MHz, and
right 850 MHz) and temperatures. ABSURD-selected ensembles of trajectories
using Charmm36m combined with the TIP4P/2005 water
model (red) reproduce experimental values better than when combined
with TIP3P (blue), at all temperatures. (Reproduced with permission
from Salvi et al. Sci. Adv. 2019[125] Copyright
2019 AAAS).
How Do IDPs Function? Time-Resolved Atomic Resolution Descriptions of IDP Complexes
The detailed study of IDP-binding
to receptors and cofactors has
revealed that IDP-based affinities range from tight subnanomolar binding
of highly specific chaperone complexes to multivalent interactions
with individual dissociation constants in the millimolar range.[262−267] NMR spectroscopy has the immense benefit of providing residue- or
even atomic-resolution detail of the interaction trajectories of IDPs,
even in the weak binding regime, and it is in this range of affinities
that it most often provides unique functional insight.
Depending on the exchange regime between free and bound protein,
NMR chemical shifts either report on the population-weighted average of the
free and bound forms of the protein (fast exchange, where the exchange
occurs at a rate faster than the difference in chemical shifts ΔΔω
in the two states) or, in slow exchange, in principle allow for
simultaneous detection of both environments.
The former regime
has been elegantly exploited by Brüschweiler
et al. to investigate the binding modes of different amino acids present
in disordered proteins by measuring the impact of aqueous colloidal
dispersions of anionic silica nanoparticles on the transverse relaxation
rates of IDPs.[268,269] Electrostatic and hydrophobic
interactions are thought to dominate these weak interactions, and
these are shown to differ largely between amino acid types. The authors
show that these interactions can be parametrized and the binding profile
of a given IDP can be accurately predicted using a simple mathematical
model. This method also has the considerable advantage that transverse
relaxation rates are impacted by motions occurring on time scales
that are normally difficult to access by solution state NMR, also
providing insight into the intrinsic dynamics of folded proteins.[270]
Beyond the fast exchange limit, intermediate
exchange, occurring
on time scales that are comparable to ΔΔω, leads
to line-broadening of the observable peaks (Figure ). This latter regime can be particularly
informative because NMR exchange spectroscopy can be used to unravel
the molecular mechanisms responsible for the observed broadening,
even at very low populations of the bound state, simultaneously providing
information both about the exchange kinetics and the free energy surface
of the exchanging environments. Rotating frame relaxation (R1ρ),[134,135] Carr–Purcell–Meiboom–Gill
(CPMG) relaxation dispersion,[131,132,136] chemical exchange saturation transfer (CEST),[133,271,272] and zz-exchange[273,274] provide information about exchange processes from the tens of microseconds
to the subsecond range.
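As an illustration of how the exchange regimes discussed above relate to measurable quantities, the following minimal Python sketch treats a two-state (free/bound) system. It is not taken from any of the cited studies: the function names, the threshold used to separate the regimes, and the example values are assumptions chosen for illustration, and the dispersion expression is the standard Luz–Meiboom approximation, which is only appropriate in the fast exchange limit.

```python
import math


def exchange_regime(k_ex, delta_omega, tolerance=3.0):
    """Classify two-state exchange by comparing k_ex (s^-1) with |delta_omega| (rad/s).

    The tolerance factor is an arbitrary illustrative choice, not a physical constant.
    """
    ratio = k_ex / abs(delta_omega)
    if ratio > tolerance:
        return "fast"
    if ratio < 1.0 / tolerance:
        return "slow"
    return "intermediate"


def fast_exchange_shift(p_bound, shift_free, shift_bound):
    """Population-weighted average shift observed in the fast exchange limit."""
    return (1.0 - p_bound) * shift_free + p_bound * shift_bound


def luz_meiboom_r2eff(nu_cpmg, r2_0, p_bound, delta_omega, k_ex):
    """Effective transverse relaxation rate (s^-1) from the Luz-Meiboom expression.

    nu_cpmg     : CPMG frequency in Hz
    r2_0        : exchange-free transverse relaxation rate in s^-1
    p_bound     : population of the minor (bound) state
    delta_omega : chemical shift difference between the states in rad/s
    k_ex        : exchange rate (k_on_apparent + k_off) in s^-1
    """
    phi_ex = p_bound * (1.0 - p_bound) * delta_omega ** 2
    x = k_ex / (4.0 * nu_cpmg)
    return r2_0 + (phi_ex / k_ex) * (1.0 - math.tanh(x) / x)


if __name__ == "__main__":
    # Hypothetical parameters: 5% bound state, 600 rad/s shift difference.
    for nu in (50.0, 200.0, 1000.0):
        print(nu, round(luz_meiboom_r2eff(nu, 10.0, 0.05, 600.0, 5000.0), 2))
```

In this sketch the exchange contribution is largest at low CPMG frequency and is progressively quenched as the pulsing rate increases, which is the dependence that relaxation dispersion experiments exploit to extract populations, rates, and shift differences.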
Describing the Interaction Trajectories of IDPs with Their Partner Proteins
The power of CPMG relaxation
dispersion to describe complex interaction trajectories of IDPs was
demonstrated by Sugase et al.,[182] who
studied the interaction between the KIX domain of CREB binding protein
and the phosphorylated form of the kinase-inducible activation domain
(pKID). 15N CPMG measurements in the presence of substoichiometric
admixtures of KIX provided evidence for weak binding between pKID
and KIX, and allowed the authors to propose a model of the binding
trajectory according to a three-site exchange model, describing binding
via a partially folded encounter complex. This approach has been further
exploited, using a combination of 1H, 13C, and 15N CPMG, to map the interaction trajectory of Sendai virus NT upon
binding to the C-terminal domain of the phosphoprotein (PX).[191] While 1H and 15N amide
chemical shifts are commonly used as probes to map interaction interfaces, 13C backbone chemical shifts are more sensitive to secondary
structure. 1H, 13C, and 15N CPMG experiments,
measured at substoichiometric admixtures of PX, were used to map the
conformational transitions along the interaction trajectory of the
partially formed helical motif (Figure 8). This motif had previously been characterized on
the basis of RDCs and chemical shifts as a rapidly exchanging ensemble
of distinct helical elements.[275] The initial
step of the interaction involves the stabilization of one of the helical
elements present in the free-state equilibrium in an encounter complex
on the surface of PX. This step is mainly characterized by 13C′ shift differences between the free state and the encounter complex.
The second and final step, as reported mainly by 1H and 15N shifts, involves binding of the stabilized NT helix into
a groove between two helices on the surface of PX. The combination
of multinuclear CPMG with measurements on both partners and at multiple
admixtures thus provides the necessary information to reconstruct
a complex interaction trajectory involving both folding and binding.
This study also highlights the importance of the intrinsic conformational
dynamics of the binding partners that is already present in their
free states. The conformational equilibrium of free NT comprises a
pre-existing population of the state that is stabilized in the encounter
complex, while the second binding step appears to be limited by breathing
motions that open and close the binding pocket on PX in its free form.[108] This example also demonstrates that simple
models of intermolecular interaction such as “induced-fit”
or “conformational selection” are not necessarily applicable
to interactions involving highly dynamic proteins such as IDPs, where
a broader terminology, for example, conformational funneling, would
be necessary to describe such multistate interaction trajectories.[192]
Figure 8
Multinuclear CPMG relaxation dispersion maps the molecular
recognition
trajectory of an intrinsically disordered protein as it binds its
physiological partner. (A) 1H, 13C, and 15N CPMG were used to map the interaction trajectory of Sendai
virus NT with the C-terminal domain of the phosphoprotein (PX). The
combination of multinuclear CPMG, measured at multiple substoichiometric
admixtures (2, 3.5, 5, and 8% of PX compared to NT) provides the necessary
information to reconstruct a complex interaction trajectory involving
both folding and binding. (B) The first step involves funnelling of
one of the helical elements present in the equilibrium of rapidly
exchanging substates, in an encounter complex on the surface of PX.
(C) The second step involves binding of the stabilized helix into
a groove between two helices on the surface of PX. (D) Relaxation
dispersion measured on NT confirms that the second step coincides
with events occurring on the surface of NT. (E) Representation of
the most likely interaction trajectory derived from the ensemble of
the experimental data. (Reproduced with permission from Schneider
et al. JACS 2015[191] Copyright 2015 American
Chemical Society).
The crowded environment
of living cells can clearly influence interactions
involving IDPs,[255,276,277] impacting association and dissociation rates, via nonspecific interactions
or modulation of the structural and dynamic behavior of the proteins
described above. Although fluorescence[278] and simulation have provided useful insight, for example, into the
potential impact of attractive and repulsive interactions with the
cellular milieu on coupled folding and binding,[279] atomic or residue-specific experimental characterizations
of IDP-mediated interactions in vivo remain relatively
rare.[198,280−282]

To achieve a
deeper understanding of the effects of crowding on
the thermodynamics and kinetics of reactions involving IDPs and their
partners, a more detailed, residue-specific picture is required, for
example, using relaxation and exchange measurements in crowded environments
and living cells. Kay and co-workers have already performed 15N R1ρ relaxation dispersion experiments
in a highly concentrated phase-separated state (which can be regarded
as a particular form of crowding) of the germ granule protein Ddx4,
discovering a slowly exchanging excited state with increased intermolecular
contacts.[283]
On the Importance of Multivalent, Weak Interactions in Biology
It is becoming increasingly clear that not all
IDPs fold upon binding to their partners, even locally. The nuclear
pore is filled with proteins (FG-nucleoporins) comprising extremely
long IDRs decorated with phenylalanine-glycine (FG) motifs
that control transit between the cytoplasm and the nucleoplasm.
Larger proteins can only pass the filter when bound to nuclear transport
receptors (NTRs). Despite the high selectivity of the filter, transport
across the pore is extremely fast. The crucial interaction between
NTRs and FG motifs was recently investigated using NMR, revealing
weak chemical shift perturbations in the nucleoporin Nup153 in the
presence of a series of NTRs.[68] In this
case, 15N R1ρ and chemical shift titration
confirmed that the interaction was in fast exchange, allowing an estimate
of the intrinsic individual dissociation constant of a single site
of around 8 mM. The presence of multiple motifs in a single protein
clearly illustrated the effect of multivalency on the apparent affinity,
with the apparent dissociation constant decreasing as multivalency increased. Finally, assignment
of both free and bound forms of Nup153 demonstrated a complete absence
of backbone conformational transition upon binding, with the disordered
domain maintaining a high level of plasticity in the complex. On the
basis of these results, a model was proposed of rapid passage, assured
by the quasi continuum of NTR-binding sites present throughout the
pore, and the fast on and off rates that are maintained by multivalent
ultraweak binding throughout this continuum. Related results were
also found for other nucleoporins,[284,285] suggesting
that the mechanism may be general.

Another example of the physiological
importance of ultraweak binding is provided by the study of the chaperone
complex between the partially disordered nucleoprotein (N) and the
intrinsically disordered phosphoprotein of Measles virus (MeV).[286] Paramyxoviral phosphoproteins (P) are essential
cofactors of the replication complex: they are tetrameric and all
comprise long IDRs that are hundreds of amino acids in length and
whose function remains largely unknown.[287] N has a folded domain that encapsidates the viral genome, protecting
it from the host immune system, and a disordered C-terminal domain.
ASTEROIDS analysis of the 304 amino acid IDR of P from MeV identifies
short helical elements in the N-terminal domain, and an additional
fourth helix 150 amino acids downstream of this (α4), adjacent to a highly acidic strand. The N-terminal helices bind
tightly to N, maintaining it in its monomeric form prior to encapsidation
of the RNA genome. The 90 kDa NP complex, which includes over 450 intrinsically
disordered residues, was investigated using NMR, identifying
the known N-terminal chaperone binding site, but also a second, previously
unknown binding site positioned at the fourth helical element, α4 (Figure 9). 15N CPMG using a molecular construct comprising only this site
revealed that the interaction has an intrinsic affinity that is around
5 orders of magnitude weaker than the main interaction site, allowing
P to transiently wrap around N, and to exchange between compact and
extended forms. Remarkably, the conserved interaction motif is shown
to be essential for viral replication. Although the exact role of
the second binding site remains unknown, it is possible that conformational
fluctuations of the acidic loop between the binding sites on P frustrate
access to the surface of N, for example, by cellular RNA, or inhibit
self-assembly with other N monomers. More generally, the combination
of two distant interactions involving the same IDR suggests the existence
of long-range coupling between the two interaction sites linking opposite
ends of N that is regulated by the highly disordered nature of P.
This example again highlights the extreme sensitivity of NMR to detect
ultraweak interactions, even in the presence of very strong affinity
interactions between the same partners.
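The affinities quoted in this section can be related to observable populations with a simple binding isotherm. The sketch below assumes a 1:1 binding model in fast exchange, which deliberately ignores the multivalency discussed above; all function names and numerical values are illustrative assumptions rather than parameters taken from the cited work.

```python
import math

R_GAS = 8.314462618  # gas constant in J mol^-1 K^-1


def bound_fraction(p_total, l_total, kd):
    """Fraction of the observed protein that is bound, for 1:1 binding.

    p_total, l_total and kd must share the same concentration unit.
    """
    if p_total <= 0.0:
        raise ValueError("p_total must be positive")
    b = p_total + l_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0
    return complex_conc / p_total


def observed_shift_change(p_total, l_total, kd, delta_max):
    """Fast-exchange chemical shift perturbation: bound fraction times the full shift change."""
    return bound_fraction(p_total, l_total, kd) * delta_max


def free_energy_gap_kj(affinity_ratio, temperature=298.15):
    """Binding free energy difference (kJ/mol) corresponding to a ratio of dissociation constants."""
    return R_GAS * temperature * math.log(affinity_ratio) / 1000.0


if __name__ == "__main__":
    # Hypothetical titration point: 0.1 mM observed protein, 1 mM partner, KD = 8 mM.
    print(bound_fraction(0.1, 1.0, 8.0))
    # Free energy gap between two sites whose affinities differ by five orders of magnitude.
    print(free_energy_gap_kj(1e5))
```

Under these assumptions a millimolar dissociation constant leaves only roughly 11% of the observed protein bound at the chosen concentrations, and a difference of five orders of magnitude in affinity corresponds to approximately 28.5 kJ/mol at 298 K.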
Figure 9
NMR detects essential,
ultraweak interactions in the dynamic assembly
of Measles virus nucleo/phosphoprotein complex. (A) 15N–1H HSQC spectrum of the complex formed between PTAIL and the nucleoprotein. The complex comprises more than 450 intrinsically
disordered amino acids. (B) Representation of the two interaction
sites involved in the complex. The phosphoprotein of Measles virus
(yellow) is known to bind the nucleoprotein (gray) in a tight complex
at its N-terminal end. NMR reveals a second binding site (δα4)
that is 150 amino acids away from the first binding site, in the middle
of a long intrinsically disordered domain that binds a distal site
of the nucleoprotein. NMR exchange (C) 15N CPMG and (D)
rotating frame relaxation in the free and bound forms of the region
140–304 of PTAIL, reveals that the intrinsic affinity
of this second site is 5 orders of magnitude lower than the known
binding site. (E) Normalized peak intensities (I/I0) of
P1–304 (50 μM) with P1–50N1–525 (gray, 25; red, 50; green, 100; and blue,
150 μM concentrations of P1–304). (F) Interaction
profile of P1–304,HELL → AAAA mutation (concentrations
as in E). Mutation of these four residues in the binding site knocks
out the second interaction and replication. (Reproduced with permission
from Milles et al. Sci. Adv. 2018[286] Copyright
2018 AAAS).
Atomic Resolution Descriptions of Highly Dynamic Molecular Assemblies from NMR
Disordered domains are thought
to play a role in the replication of numerous single-stranded RNA viruses,
with components of the replication machinery from both negative[287,288] and positive sense[289−293] RNA viruses exhibiting extensive disorder. A recent description
of the nucleoprotein of SARS-CoV-2, involved in protection of the
viral genome and regulation of gene transcription, revealed that the
flexible central region undergoes a disorder to order transition,
folding around the N-terminal domain of its viral partner nsp3 and
inducing a collapse of the remainder of the protein that impacts its
ability to bind RNA.[294]

Influenza
A represents another example where extreme disorder appears to play
an essential role in viral function. To efficiently replicate in human
cells, avian influenza polymerase undergoes host adaptation, with
adaptive mutations (in particular E627K) localized on two C-terminal
(627 and NLS) domains of the PB2 polymerase subunit. This region of
the protein shows remarkable behavior in solution, populating an equilibrium
between open and closed conformations that can be characterized using 15N CEST experiments. These reveal open-form chemical shifts that
are essentially identical to those of the isolated domains in free solution
and determine the exchange rate to be around 20 s–1.[295] The closed form is stabilized by
an interdomain salt bridge[296] while in
the open form the linker connecting the two domains becomes highly
dynamic and the two domains evolve freely. The host transcription
factor ANP32a was identified as an essential cofactor for the adaptation
of the viral polymerase,[297] suggesting
a direct interaction between the two proteins. ANP32a has a highly
acidic, intrinsically disordered domain whose length varies between
species, with the avian form containing a 33 amino acid insert, comprising
a unique hydrophobic hexapeptide and a repeat of the first 27 acidic
amino acids. Somehow the absence of this insert in mammals is compensated
by a single E627K mutation of the avian polymerase, allowing cross-species
infection. It was therefore important to investigate the complexes
between these two highly flexible proteins.

Here again, the
IDR mediates the interaction, with a polyvalent
interaction between the acidic tail of ANP32a and the positively charged
surface of the 627 domain.[298] The intrinsic KD measured from the side of ANP32a is more than
1 order of magnitude lower than the KD measured from the side of 627 due to the multiple interaction sites
on ANP32a dispersed along the IDR visiting the same sites on 627-NLS.
To characterize the dynamic ensembles, a series of eight cysteine
mutants of the avian and human adapted forms of 627-NLS were made,
and PREs measured on ANP32a. In the fast exchange regime, these data
provide a sensitive map of the population-weighted proximity of the
two proteins over the dynamic assembly and were used to develop an
ensemble description of the human and avian complexes using the ASTEROIDS
ensemble approach.

This comparison identifies clear distinctions
between the binding
modes exploited in the two complexes (Figure 10), as shown quantitatively in the average
distance map, where closer or more populated contacts are observed
between the positively charged 627 domain and the acidic IDR for the
human complex than for the avian complex where the electrostatic distribution
on the surface of 627 is disrupted by the E627K mutation. This study
allows us to speculate further on the role of the interaction in the
function of the replication complex and more generally demonstrates
the ability of NMR to characterize intermolecular complexes exhibiting
extreme levels of flexibility and multivalency.
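Because the PRE scales with the inverse sixth power of the electron–nucleus distance, the population-weighted averaging mentioned above is strongly biased toward close contacts. The following sketch, with a hypothetical function name and made-up distances, illustrates this point only; it is not the ASTEROIDS selection procedure used in the cited study.

```python
def effective_pre_distance(distances, weights=None):
    """Distance that a single conformer would need in order to give the same
    <r^-6> average as the supplied ensemble (same unit as the input distances).

    distances : electron-nucleus distances, one per conformer
    weights   : optional populations; equal populations are assumed if omitted
    """
    if not distances:
        raise ValueError("at least one distance is required")
    if weights is None:
        weights = [1.0] * len(distances)
    if len(weights) != len(distances):
        raise ValueError("weights and distances must have the same length")
    total = float(sum(weights))
    mean_inv6 = sum(w * float(r) ** -6 for w, r in zip(weights, distances)) / total
    return mean_inv6 ** (-1.0 / 6.0)


if __name__ == "__main__":
    # Hypothetical ensemble: one close contact (10 Å) among three distant conformers (40 Å).
    ensemble = [10.0, 40.0, 40.0, 40.0]
    print(effective_pre_distance(ensemble))
    print(sum(ensemble) / len(ensemble))  # arithmetic mean, for comparison
```

In this made-up example the effective distance is about 12.6 Å although the arithmetic mean is 32.5 Å, showing why sparsely populated close contacts can dominate PRE profiles measured on highly dynamic complexes.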
Figure 10
Influenza polymerase
forms a highly dynamic assembly with the intrinsically
disordered host transcription factor ANP32a in a species-specific way.
(A) PREs measured on hANP32A (orange, experimental;
and blue, representative ensembles selected using ASTEROIDS) in the
presence of paramagnetically labeled human adapted 627-NLS. (B) Same
information for avANP32A in the presence of paramagnetically
labeled avian adapted 627-NLS. (C, D) Representation of the dynamic
complexes determined from the data shown in A and B, respectively.
Multivalent interactions between ANP32a (yellow/red) and the 627 domain
(gray) are localized to the basic patch on the surface of 627. In
the case of avANP32A and avian adapted 627-NLS(E),
ANP32A disordered domain is in general closer to the NLS domain (yellow)
mediated by the hydrophobic hexapeptide (green). (E) Position of the
cysteine residues used to label 627-NLS. (F) Representation of the
ensemble of conformers of the hANP32A:627-NLS complex.
(G) Average distance difference matrix (in Å) between ANP32A
(x-axis) and the 627-NLS domains (y-axis) over the two ensembles. (Reproduced with permission from Camacho-Zarco
et al. Nat. Commun. 2020[298]).
It is perhaps not surprising that electrostatic interactions
in
low complexity IDPs can be responsible for highly multivalent interactions.
This was clearly demonstrated by a study combining smFRET and NMR spectroscopy
with coarse-grained MD simulation to investigate the complex
between two IDPs, the strongly basic histone H1 and the highly negatively
charged prothymosin-α.[299] Fluorescence
spectroscopy reveals affinities in the picomolar range, while NMR
and smFRET reveal that the proteins remain dynamic within the complex,
implying a high level of dynamic polyvalency and possible formation
of transient ternary complexes.[300] The
presence of dynamics in the bound state of IDRs was also characterized
in two recent studies of the disordered domains of the kinases MKK7[301] and MKK4[302,303] in complex with JNK1
and p38α. CEST, CPMG, and spin relaxation were measured as a
function of stoichiometric ratio, suggesting that the bound state
of MKK7, and the kinase specificity regions flanking the main interaction
site of MKK4, both exhibited additional dynamics in the bound state,
in the former case on the micro to millisecond time scale and the
latter on relaxation-active ps-ns time scales. Similar data were used
to investigate the interaction between Artemis and the DNA binding
domain of ligase IV, in this case identifying a single step binding
interaction.[304]
Perspectives
Over the course of this review, we have demonstrated the unique
insight that NMR offers concerning the structure, dynamics and interactions
of IDPs at atomic resolution not only in reduced systems comprising
isolated proteins but also in the context of more complex molecular
environments that are relevant to physiological function. In particular,
we have drawn attention to the importance of describing the ensemble
and time-averaging processes that govern interpretation of NMR parameters,
and the remarkable insight that this can provide concerning the functional
modes exploited by such highly dynamic systems. The power of NMR results
in part from analytical understanding of the ensemble and time-averaging
processes occurring on time scales covering orders of magnitude from
pico- to milliseconds, which remains one of its unique advantages for
studying flexible molecules. In addition to providing unique new insight
into the relationship between protein flexibility and function, the
combination of atomic resolution characterization of essential dynamic
processes from NMR with complementary structural and dynamic probes
that can be measured on similar sample preparations ensures an exciting
future for NMR as an integral tool for the investigation of increasingly
complex biological systems.