Lukas S Stelzl1,2,3,4, Lisa M Pietrek1, Andrea Holla5, Javier Oroz6,7, Mateusz Sikora1,8, Jürgen Köfinger1, Benjamin Schuler5,9, Markus Zweckstetter6,10, Gerhard Hummer1,11. 1. Department of Theoretical Biophysics, Max Planck Institute of Biophysics, Max-von-Laue-Straße 3, 60438 Frankfurt am Main, Germany. 2. Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg 2, 55128 Mainz, Germany. 3. KOMET 1, Institute of Physics, Johannes Gutenberg University Mainz, 55099 Mainz, Germany. 4. Institute of Molecular Biology (IMB), 55128 Mainz, Germany. 5. Department of Biochemistry, University of Zurich, 8057 Zurich, Switzerland. 6. German Center for Neurodegenerative Diseases (DZNE), von-Siebold-Str. 3a, 37075 Göttingen, Germany. 7. Rocasolano Institute for Physical Chemistry, CSIC, Serrano 119, 28006 Madrid, Spain. 8. Faculty of Physics, University of Vienna, Kolingasse 14-16, 1090 Vienna, Austria. 9. Department of Physics, University of Zurich, 8057 Zurich, Switzerland. 10. Department for NMR-based Structural Biology, Max Planck Institute for Multidisciplinary Sciences, Am Faßberg 11, 37077 Göttingen, Germany. 11. Institute for Biophysics, Goethe University Frankfurt, Max-von-Laue-Straße 9, 60438 Frankfurt am Main, Germany.
Abstract
The paradigmatic disordered protein tau plays an important role in neuronal function and neurodegenerative diseases. To disentangle the factors controlling the balance between functional and disease-associated conformational states, we build a structural ensemble of the tau K18 fragment containing the four pseudorepeat domains involved in both microtubule binding and amyloid fibril formation. We assemble 129-residue-long tau K18 chains with atomic detail from an extensive fragment library constructed with molecular dynamics simulations. We introduce a reweighted hierarchical chain growth (RHCG) algorithm that integrates experimental data reporting on the local structure into the assembly process in a systematic manner. By combining Bayesian ensemble refinement with importance sampling, we obtain well-defined ensembles and overcome the problem of exponentially varying weights in the integrative modeling of long-chain polymeric molecules. The resulting tau K18 ensembles capture nuclear magnetic resonance (NMR) chemical shift and J-coupling measurements. Without further fitting, we achieve very good agreement with measurements of NMR residual dipolar couplings. The good agreement with experimental measures of global structure such as single-molecule Förster resonance energy transfer (FRET) efficiencies is improved further by ensemble refinement. By comparing wild-type and mutant ensembles, we show that pathogenic single-point P301L, P301S, and P301T mutations shift the population from the turn-like conformations of the functional microtubule-bound state to the extended conformations of disease-associated tau fibrils. RHCG thus provides us with an atomically detailed view of the population equilibrium between functional and aggregation-prone states of tau K18, and demonstrates that global structural characteristics of this intrinsically disordered protein emerge from its local structure.
The paradigmatic disordered protein tau plays an important role in neuronal function and neurodegenerative diseases. To disentangle the factors controlling the balance between functional and disease-associated conformational states, we build a structural ensemble of the tau K18 fragment containing the four pseudorepeat domains involved in both microtubule binding and amyloid fibril formation. We assemble 129-residue-long tau K18 chains with atomic detail from an extensive fragment library constructed with molecular dynamics simulations. We introduce a reweighted hierarchical chain growth (RHCG) algorithm that integrates experimental data reporting on the local structure into the assembly process in a systematic manner. By combining Bayesian ensemble refinement with importance sampling, we obtain well-defined ensembles and overcome the problem of exponentially varying weights in the integrative modeling of long-chain polymeric molecules. The resulting tau K18 ensembles capture nuclear magnetic resonance (NMR) chemical shift and J-coupling measurements. Without further fitting, we achieve very good agreement with measurements of NMR residual dipolar couplings. The good agreement with experimental measures of global structure such as single-molecule Förster resonance energy transfer (FRET) efficiencies is improved further by ensemble refinement. By comparing wild-type and mutant ensembles, we show that pathogenic single-point P301L, P301S, and P301T mutations shift the population from the turn-like conformations of the functional microtubule-bound state to the extended conformations of disease-associated tau fibrils. RHCG thus provides us with an atomically detailed view of the population equilibrium between functional and aggregation-prone states of tau K18, and demonstrates that global structural characteristics of this intrinsically disordered protein emerge from its local structure.
Intrinsically disordered
proteins (IDPs) are enriched in the proteomes
of higher eukaryotes, where they perform essential functions.[1−3] In healthy neurons, the paradigmatic IDP tau binds and stabilizes
microtubules.[1] In diseased neurons, tau
loses the ability to bind to microtubules and forms the toxic aggregates
associated with Alzheimer’s and other neurodegenerative diseases.[2] Hyperphosphorylation of tau correlates with the
progression of Alzheimer’s disease. Tau has recently been shown
to form biomolecular condensates.[3−6] Dysregulation of the formation of biomolecular
condensates by mutations[7] and aberrant
post-translational modifications such as phosphorylation[4,7] may underlie the pathogenicity of tau. Some tau mutations, e.g.,
P301L and P301S, show drastic effects in patients and are used in
mouse models of tauopathies.[8,9] The conformational dynamics
of tau around P301 may play a direct role in modulating the aggregation
of tau in disease,[10−12] as studied also by molecular dynamics (MD) simulations
of tau fragments.[12] Efforts to gain a clearer
picture of the local conformational dynamics of tau promise a deeper
understanding of its roles in health and disease.The challenges
in resolving structural ensembles of IDPs ask for
an integrative approach.[13] Important progress
in dealing with the high flexibility of disordered biomolecules has
been made using nuclear magnetic resonance (NMR) spectroscopy,[14−17] solution X-ray scattering (SAXS),[18] and
single-molecule Förster resonance energy transfer (FRET).[19−23] To harness the full power of these experiments and interpret the
data in detail, the construction of ensembles of structures[24−32] has proved to be a powerful strategy, especially for the interpretation
of NMR experiments and the combination of multiple experimental methods.[31,33,34] For instance, Borgia et al.[32] combined data from single-molecule FRET, SAXS,
dynamic light scattering, and fluorescence correlation spectroscopy
with MD simulations to characterize the ensembles of a marginally
stable spectrin domain and the IDP ACTR over a broad range of solution
conditions. Gomes and co-workers[35] recently
described an ensemble of the disordered N-terminal region of the Sic
protein, obtained by integrating different combinations of SAXS, single-molecule
FRET and NMR experiments using the ENSEMBLE approach.[36]Structural ensembles obtained from computational
modeling can be
combined with experimental data by using Bayesian and maximum entropy
ensemble refinement methods.[29,37−45] The Bayesian formulation accounts naturally for uncertainties in
the measurements, the model used to generate the ensemble, and the
calculation of observables from the ensemble members.[39] Input ensembles[46] are obtained,
e.g., from MD simulations[44,47−49] or chain growth,[26,28,50−53] and are then minimally modified to account for the experimental
observations. However, for long protein or nucleic acid chains, it
is difficult to create initial ensembles that have sufficient overlap
with the final ensemble for reliable ensemble refinement. For experimental
data that report on the local structure along the chain of a disordered
protein, we expect that cumulative systematic errors in the MD force
field will cause the summed squared error χ2 between
model and experiment to grow linearly with the length of the chain.
As a consequence, the overlap between input and final ensemble deteriorates
exponentially as the chain grows in length. Consequently, for long
IDPs, only a few chains will tend to dominate the ensemble after refinement,
with the rest of the large ensemble being mostly irrelevant.The problem of poor overlap between the initial and final ensemble
can be overcome by applying a bias already in the generation of the
initial ensemble, e.g., by imposing restraints directly on observables
or related quantities in the initial MD simulations. The use of chemical
shifts and other NMR data in the structural modeling of flexible systems
has a long and productive history. Approaches based on fragment selection
proved particularly powerful.[54−56] Protocols have been developed
that combine biased fragment choice with corrections to remove the
biases introduced.[42] In an early combination
of biased chain growth with Bayesian weighting applied to tau K18,[28] overlapping peptide fragments were stitched
together. Fragment selection was biased to double the radius of gyration
in an otherwise overly compact ensemble. Steric clashes were resolved
by energy minimization in implicit solvent, and high-energy structures
were randomly removed in a pruning step. Excellent agreement with
NMR observables[27] could be achieved by
adjusting the weights of the ensemble members. However, formal and
practical questions are raised: how does one incorporate experimental
data already during chain growth without compromising the Bayesian
framework of ensemble refinement, where such information would normally
be used a posteriori? And how does one ensure that the final ensemble
is well-defined and fully reproducible?We will show here that
in a Bayesian formulation any bias in ensemble
generation can be accounted for fully and quantitatively in a final
global refinement step by exploiting the direct connection of ensemble
refinement to traditional free energy calculations.[39] Meaningful input ensembles can thus be generated without
sacrificing the rigor and reproducibility of the ensemble refinement
procedure.We propose reweighted hierarchical chain growth (RHCG)
as a general
method to integrate data reporting on local structure into models
of disordered and flexible polymeric molecules such as disordered
proteins or nucleic acids. Protein chains are assembled from fragment
structures, as obtained here from MD simulations. As in hierarchical
chain growth (HCG),[52] chains with steric
clashes are consistently removed in such a way that the resulting
ensemble does not depend on arbitrary choices such as the direction
of chain growth, N-to-C versus C-to-N. In RHCG, fragment choice is
biased according to experiments reporting on the local structure.
In a final reweighting step, any resulting bias is then removed. RHCG
is thus a form of importance sampling.Using RHCG, we arrive
at an integrative model of tau K18 with atomic
detail. Tau K18 contains the four pseudorepeat domains R1-R4 involved
both in functional binding to microtubules[57] and in forming amyloid fibrils.[10,12] NMR chemical
shift data that report on local structure are incorporated already
during chain growth. Electrostatic[58] and
other interactions between regions distant in sequence can impact
the global structure of IDPs. Deviations from random coil behavior
can emerge also from local residual structure.[29] For tau K18, it is not clear a priori how its local and
global structure are shaped. We show that the RHCG ensembles also
capture the global structure of tau K18, as probed by NMR, RDC, single-molecule
FRET, and SAXS measurement. The global structure of tau K18 is thus
determined to a significant degree by its local structure.By
comparing wild-type (WT) and mutant sequences, we provide a
molecular view of possible differences between tau in a healthy cell
and tau with pathogenic mutations. Our modeling of tau K18 reveals
turns as in microtubule-bound states and extended structures as in
tau fibrils. We found that pathogenic single-point P301 mutations
shift the equilibrium from the former to the latter, emphasizing the
close connection between functional forms of tau in solution and the
fibrillar structures in tau-associated pathologies.
Theory
Bayesian Ensemble
Refinement of Polymeric Molecules
We combine molecular simulations
with ensemble refinement to create
ensembles of proteins or nucleic acids that faithfully reflect the
distribution of conformations in experiment. To create an initial
ensemble, we adapt the hierarchical chain growth (HCG) method introduced
recently,[52] as described in detail below.
We then use Bayesian Inference of Ensembles (BioEn)[39] to adjust the weights of the individual ensemble members
according to the experimental data, e.g., NMR chemical shifts.BioEn ensemble refinement minimally adjusts the vector = (w1, ..., w) of normalized weights of
individual chains c = 1, ..., C in
the ensemble to match the experimental data. We define a posterior P(|data, I) as a function of the weights ,with P0 (|I) being the prior and P (data|, I) being the likelihood. Here, I denotes background
information, e.g., that we model polymeric molecules with internal
structure. The BioEn maximum-entropy prior[38] is given byθ is a hyperparameter
that controls
the strength of the entropy regularization and thus expresses our
confidence in the initial ensemble of chains.[39]SKL is the Kullback–Leibler (KL)
divergencewhich reports how close the normalized refined
weights w are to the
normalized reference weights w0.Assuming
Gaussian uncorrelated errors, the likelihood is withThe
first sum is over the different experimental
observations i = 1, ..., Mdata with measured values Y, and the second sum is over the ensemble members c = 1, ..., C. For each chain c and
observable i, we use a forward model to compute individual
observations y. The error
σ2 is the sum of the
squared standard errors of the measurements Y and the forward calculations y.In applications of BioEn to long biopolymers, small but systematic
weight corrections at the monomer level can add up to large corrections
overall. For NMR chemical shifts, for instance, the sum over i in eq corresponds
to a sum over residues. As a result, the χ2 statistic
is extensive; i.e., it tends to grow linearly with the length of the
chain. Reweighting of assembled chains thus becomes progressively
more challenging as the length of the chain grows (i.e., for chains
with more fragments). The reason is that it becomes progressively
unlikely that all fragments in an assembled chain occupy the relevant
subspace with proper weight. As a result, chains will contribute with
very uneven weights after BioEn reweighting. In other words, a few
chains will dominate, and the rest of the large ensemble will be more
or less irrelevant.
Reweighted Hierarchical Chain Growth
We address the
problem of poor overlap between initial and final ensemble by using
importance sampling. In MD simulations of complete biopolymer chains,
bias potentials could be introduced, acting for instance on the torsion
angles to better match NMR chemical shifts or J-couplings.
Here, we focus instead on fragment-based chain growth. The key idea
is to grow chains by using fragment libraries that have already been
biased to enrich the ensemble with members of high weight, and then
to correct for this biased choice of fragments in a final reweighting
step. If the bias weights were chosen perfectly, the final step would
give each chain equal weight.In RHCG, we adapt HCG[52] to assemble polymer chains from fragments. At
each of the N positions, fragments are picked at
random from a fragment library and then combined by superimposition
of residues at their termini with the equivalent residues in the adjacent
fragments. Any models with steric clashes are discarded. In HCG, all
fragments have equal weight; in RHCG, the fragments in the library
{i} (with F being the number of fragments created at position n) are picked according to a weight w normalized to Σw = 1 for all n. These weights have to be chosen appropriately, as described below,
and constitute our initial guess as to how likely a particular fragment
is in the final reweighted ensemble of chains. The probability p[ ] for a particular chain c to be created in
this way is given by the product of weights for each of its fragments,where f ∈ {1,...,F} is the index of fragment n in chain c.Here, we construct the fragment libraries from
MD simulations of
short overlapping blocked peptides. Alternatively, fragment libraries
can be constructed from MD simulations of full-length chains that
are then broken up into overlapping segments and reassembled by chain
growth. A similar approach has recently been used to explore the flexibility
of the SARS-CoV-2 spike stalk.[59] Fragment
libraries can also be built from experimentally resolved structures
with appropriately defined weights.We used NMR chemical shifts
to bias the fragment choice. The weights
of the fragments w were determined
with BioEn applied to the fragment library at position n with a confidence parameter θ. This confidence parameter was chosen to produce nearly uniform
weights w of the assembled
chains after a global BioEn reweighting (Figure S1C). Importantly, there is no issue of circularity because
the bias applied during chain growth is fully accounted for, as described
in the following section.
BioEn Reweighting of Assembled Chains
After the biased
assembly of an ensemble of chains, we use BioEn[39,40] to correct for the bias in chain growth and to reweight the entire
ensemble globally. To correct for the bias in chain assembly, chain c enters the global BioEn refinement with a relative weight
proportional to the reciprocal of the bias probability, , with which its fragments
were selected.
Normalization of these relative weights gives usor, expressed
more compactly in terms of reciprocal
weight factors,where the sum
extends over the C chains of the ensemble. To the
ensemble with these initial weights,
we then apply BioEn reweighting, using as a reference experimental
data reporting on local or global structural properties.
Chain Growth
with Nonbonded Interactions beyond Steric Repulsion
Fragment
assembly can, in principle, be extended to account for
nonbonded interactions beyond steric repulsion to account, e.g., for
electrostatic interactions between fragments.[60] This can be accomplished by using a free energy function G(f1,..,f) that describes the interfragment interactions in
chain c and can be calculated from an implicit solvent
model or, by free energy calculations, from explicit solvent models.
Chains c assembled from fragments f1 ,...,f are
then weighted by an additional factor exp[−βG(f1 ,..,f)] with 1/β = kBT and kB being the Boltzmann
constant and T being the absolute temperature. In
the Bayesian formulation, the normalized reference weight of chain c in an ensemble of C chains then becomesTo sample efficiently from this distribution,
one can again use importance sampling by performing hierarchical assembly[52] with biased fragment selection. If, as above, w is the bias weight factor
to choose fragment f at position n, then eq becomesHere,
we use only excluded volume interactions,
which amounts to exp(−βG) = 1 for chains
without interfragment steric clashes and exp(−βG) = 0 with clashes.
Assessment of Importance
Sampling
In ideal importance
sampling, we would grow chains of equal relative importance. Global
BioEn reweighting would then give each member of the resulting ensemble
equal weight, w = 1/C. We use the KL divergence of the BioEn-optimized weights w from ideal importance sampling
to assess the effectiveness of our bias in chain growth:If SKLbias ≲ 1, the overlap between
the ensembles produced by biased chain growth and after BioEn refinement
is large; conversely, if SKLbias ≫ 1, the chain growth protocol
should be optimized. We use SKLbias also to choose the confidence
parameter θ quantifying the strength
of the bias in fragment choice during RHCG. As illustrated in Figure S1C, SKLbias as a measure of weight uniformity
is minimal for a range of θ values
given a confidence parameter θ in the global BioEn ensemble
reweighting.
Methods
Hierarchical
and Reweighted Hierarchical Chain Growth
We generated structural
ensembles of tau K18 (residues 244–372)
using HCG[52] and RHCG. (RHCG software can
be downloaded free of charge at https://github.com/bio-phys/hierarchical-chain-growth.) Tau structures were assembled from 43 pentamer fragments with
two residues overlap between subsequent fragments. All fragments had
their N and C termini capped by acetyl and N-methyl
groups, respectively. The first (N-terminal) fragment started from
the last residue outside tau K18, which was then removed in chain
assembly. Fragment structures were sampled in all-atom replica exchange
molecular dynamics (REMD) with explicit solvent. For each fragment,
we used 24 replicas spanning a temperature range of 278–420
K. Each pentamer fragment was simulated for 100 ns as in our previous
study.[52] We used structures from the T = 278 K ensemble to assemble tau K18 chains, which corresponds
to the temperature of the NMR experiments.[27] To investigate the effect of point mutations at the P301 position,
we also sampled fragments with P301 and mutations P301L, P301S and
P301T. We repeated fragment simulations for WT P301, P301L, P301S,
and P301T fragments with residue 301 at the central position of their
respective fragments instead of the second position of its respective
pentamer. Since we lack detailed chemical shift information, the P301X
mutant chains were assembled with HCG, not RHCG. We note that in all
fragment REMD simulations P301 was sampled exclusively as trans isomer.We biased the fragment selection in RHCG according to Cα chemical shifts measured by NMR. At each fragment position n, we performed independent BioEn reweighting[39,40] using the chemical shift data reported for the nonterminal residues
in this fragment (Supporting Information (SI) text). A large confidence parameter of θ = 10 ensured improved consistency of the chemical shifts (with
the average χ2 across fragments dropping from 0.856
to 0.688) with minimal weight changes (SKLBioEn = 0.004 on
average). These local BioEn calculations gave us fragment weight factors w. In numerical tests on comparably
small ensembles of 104 chains and with θ = 5 fixed
for the global BioEn ensemble reweighting, we found that SKLBioEn was
minimal for θ = 5 to 10 (Figure S1C).We then used RHCG to build
ensembles of between 2000 and 106 WT tau K18 models from
the reweighted fragment libraries.
For reference, we also constructed unbiased ensembles of WT tau using
HCG[52] with unweighted fragment libraries.
HCG was also used to construct tau K18 ensembles of P301 mutants.
If not specified otherwise, the results shown are for ensembles of C = 50 000 chains. Following the procedure described in
ref (52), we assembled
10000 representatives at each hierarchy level below the final assembly
level to sample a high diversity of possible local conformations.
At the final level, full-length models were assembled from this pool.
The assembly process was trivially parallelized by using different
random number seeds. In a final step, the RHCG ensembles were reweighted
using BioEn to correct for the biased fragment choice while retaining
consistency with the NMR chemical shift data. In this global BioEn
reweighting step, the confidence parameter was set to θ = 5
according to an L-curve analysis (SI text
and Figure S1A). The resulting ensembles
were structurally diverse and, among 50 000 HCG and RHCG structures,
did not contain any knots (SI text).
Calculation of Experimental Observables
NMR Secondary Chemical
Shifts and J Couplings
For comparison
with NMR experiments, we calculated chemical shifts from fragments
and full-length structures using SPARTA+.[61] We subtracted random-coil shifts calculated using POTENCI[62] to compare to secondary chemical shifts ΔC. We computed 3JHNHα couplings with the Karplus parameters by Vögeli et al.[63] with the mdtraj Python library.[64]
NMR Residual Dipolar Couplings
RDCs
were calculated
from the ensembles of full-length structures with PALES[65,66] in the steric alignment mode. Even for random flight polymers, the
presence of an ordering medium modeled as a hard surface induces nonzero
RDCs.[67] The value DHN( for a particular residue r was calculated by computing
the alignment of each chain c in the ensemble with
PALES and then taking the average over all structureswhere DHNmax = 21.7 kHz for an idealized
amide bond length of 1.04 Å,[68] ϑ( is the angle between the amide bond vector of residue r in chain c, the external magnetic field, P2(x) = (3x2 – 1)/2 is the second-order Legendre polynomial,
and ⟨...⟩ denotes an average over the orientations of
the chain biased by the alignment.
Small-Angle X-ray Scattering
We used FoXS[69] to calculate SAXS intensity
profiles for the
tau K18 structures in an ensemble and then calculated the weighted
average over the ensemble. In the FoXS calculations, we took the solvation
shell into account by setting c2 = 3.
The excluded-volume parameter was set to the default value of c1 = 1. Geometric RG values were computed using the MDAnalysis library.[70,71] To compare measured scattering intensities to those predicted for
the weighted ensemble, Isim(q), we first estimated an intensity scale factor a and a constant for background correction b by performing
least-squares fitting ofto the SAXS intensities with q being the scattering vector. For a regime unaffected by aggregation, q > 0.012 Å–1, the best fit to
experiment
was achieved with the coefficients a = 1.1 ×
10–11 and b = 3.8 × 10–5. For q < 0.012 Å–1, we took possible mild aggregation into account by approximating
the scattering intensity including possible aggregates asBy least-squares
fitting with fixed a and b, we find
an aggregate intensity
of c = 0.001 56 and an aggregate size of Ra = 234 Å. The fit to the combined model
is shown in Figure S2. An earlier set of
scattering data[18] is restricted to q > 0.03 Å–1.
Comparison
to Single-Molecule FRET Experiments
We compared
Cα–Cα distances extracted
from FRET experiments using the SAW-ν polymer model[72] to RHCG models. To quantify the effect of the
fluorescent dyes on the distance distribution, we performed additional
calculations in which we adapted the RHCG method to add dyes[73] during chain growth (SI text and Figure S3).
Comparison
to NMR Paramagnetic Relaxation Enhancement Measurements
NMR
paramagnetic relaxation enhancement (PRE) measurements on tau
K18 have been previously reported.[74] We
computed PREs for the tau K18 ensembles using the PREdict[75] Python library (https://github.com/KULL-Centre/DEERpredict). PREdict adds explicit spin labels to the chains modeled with a
rotamer library. The PRE is calculated in the fast-exchange limit
with respect to both spin-label and chain dynamics. Details of the
PRE calculation are given in the SI text.
Experiments
Single-Molecule FRET Experiments
For the single-molecule
FRET experiments, tau K18 was labeled with Alexa Fluor 488 and CF660R
at its naturally occurring cysteine residues, C291 and C322 (SI text). The labeled tau K18 was diluted to
a concentration of 100 pM in 50 mM sodium phosphate buffer, pH 6.8,
1 mM DTT, 0.001% Tween 20 or 20 mM HEPES, 5 mM KCl, 10 mM MgCl2, pH 7.4, 1 mM DTT, 0.001% Tween 20. The experiments were
performed at 295 K on a MicroTime 200 confocal single-molecule instrument
(Pico-Quant, Berlin, Germany) as described in detail in the SI text. The SAW-ν model was used to analyze
the single-molecule FRET data to extract distances and the polymer
properties of tau K18[72] (SI text).
Small-Angle X-ray Scattering Experiments
SAXS data
were collected at 298 K from monodisperse samples of K18 ranging from
50 to 67 μM in 20 mM Hepes, 5 mM KCl, 10 mM MgCl2, 1 mM DTT at pH 7.4. Scattering profiles were analyzed with standard
procedures using ATSAS.[76] SAXS measurements
were performed at DESY (Hamburg, Germany) and Diamond Light Source
(Oxford, UK) stations.
Results and Discussion
RHCG Produces
a Diverse Ensemble of Tau K18 Chains
During chain assembly,
we applied a gentle bias on the fragment choice
by using fragment weights from BioEn reweighting against Cα chemical shifts. To correct for the bias, the assembled chains were
then reweighted with BioEn, again using the chemical shift data as
experimental reference. In this global BioEn reweighting step, the
chains were given near-uniform weights w with SKLbias ≪ 1 (Figure S1B). By comparison, the BioEn weights of the HCG ensemble
created without bias are less uniform. The resulting ensemble of tau
K18 is comprised of highly diverse structures with atomic detail (Figure C). The typical Cα root-mean-square distance (RMSD) between two chains
is about 26 Å (Figure S4 and SI text), and backbone dihedral angles are broadly
sampled (Figure S5).
Figure 1
The RHCG ensemble reproduces
global structural features of tau
K18. (A) Comparison of experimental (gray) and predicted (blue) 1H–15N RDCs, which were not used in the construction
of the RHCG ensemble (see Table S1 for
the amino acid sequence of tau K18). (B) Scatter plot of calculated
and measured RDCs. (C) Backbone traces of 30 members of the RHCG ensemble.
Zoom-ins show superpositions of 10 representative structures of a
turn at position L284-S285 (top) and an extended segment at position
Q276-I277 (bottom) with negative and positive RDCs, respectively,
as highlighted by shading in panel (A). (D) Comparison of calculated
(blue) and experimental SAXS scattering intensity profiles (gray symbols)
and from ref (18) (orange
dashed line). See Figure S2 for a plot
of the low-q regime. Inset: Distribution of RG in the RHCG ensemble. Vertical dashed lines
indicate the average RG from RHCG (blue)
and experiment[18] (gray; ± SEM shown
by shading). (E) Distribution of Cα–Cα distance inferred from FRET experiments using the SAW-ν
model[72] (gray), RHCG (blue), and RHCG*
(orange). Root-mean-square distances are indicated as (dashed) vertical
lines.
The RHCG ensemble reproduces
global structural features of tau
K18. (A) Comparison of experimental (gray) and predicted (blue) 1H–15N RDCs, which were not used in the construction
of the RHCG ensemble (see Table S1 for
the amino acid sequence of tau K18). (B) Scatter plot of calculated
and measured RDCs. (C) Backbone traces of 30 members of the RHCG ensemble.
Zoom-ins show superpositions of 10 representative structures of a
turn at position L284-S285 (top) and an extended segment at position
Q276-I277 (bottom) with negative and positive RDCs, respectively,
as highlighted by shading in panel (A). (D) Comparison of calculated
(blue) and experimental SAXS scattering intensity profiles (gray symbols)
and from ref (18) (orange
dashed line). See Figure S2 for a plot
of the low-q regime. Inset: Distribution of RG in the RHCG ensemble. Vertical dashed lines
indicate the average RG from RHCG (blue)
and experiment[18] (gray; ± SEM shown
by shading). (E) Distribution of Cα–Cα distance inferred from FRET experiments using the SAW-ν
model[72] (gray), RHCG (blue), and RHCG*
(orange). Root-mean-square distances are indicated as (dashed) vertical
lines.
RHCG Models of Tau K18
Capture the Average Local Structure of
Tau as Reported by NMR
Chemical shifts are accurate reporters
of local structure and secondary structure.[16,17,27,29,61,77] Overall, we found that
the Cα chemical shifts calculated for the RHCG ensemble
of tau K18 are close to random coil values, with secondary chemical
shifts ΔC mostly close to zero. Despite the
residual amplitude typically being smaller than the error of ≈1
ppm[61] in the forward chemical shift calculation,
the models capture important features of the variation of experimental
secondary chemical shifts along the tau K18 amino acid sequence, such
as a drop in secondary chemical shift going from L285 to V300. HCG
without reweighting of the fragment library underestimates the populations
of extended and β-strand like structures and overestimates the
helical-like conformations. Going from HCG to RHCG, the average residual
drops from 0.35 to 0.27 ppm and Pearson’s r for the secondary chemical shifts ΔC of the
Cα atoms increases from 0.28 to 0.41. RHCG lowers
in particular positive ΔC values, e.g., at
the S420 position (Figure S6A,B). In light
of the considerable uncertainties in the forward calculation (≈1
ppm) and the small ΔC amplitudes, a lower θ
value resulting in an even tighter fit was not justified (Figure S1A).We also calculated NMR 3JHNHα couplings, which report
primarily on the ϕ-dihedral angles of the protein backbone.
The couplings calculated for our models agree well with the NMR experimental
data[27] (Figure S7). Also in terms of 3JHNHα couplings, which were not used in the RHCG procedure, RHCG somewhat
improves the representation of the local structures over HCG, as reflected
by the increase of Pearson’s r from 0.59 to
0.62. The root-mean-squared error dropped from 0.47 Hz (HCG) to 0.41
Hz (RHCG). For reference, the uncertainty of the calculated 3JHNHα couplings has been estimated
at ∼0.9 Hz.[78] We do not expect a
more significant improvement because the 3JHNHα coupling is sensitive primarily to the ϕ
backbone torsion, whereas the Cα chemical shift used
in RHCG is particularly sensitive to the ψ backbone torsion.
Indeed, even for a simple Ala pentapeptide we found small but systematic
differences between a state-of-the-art force field and 3JHNHα couplings.[40] Overall we conclude that reweighting in fragment assembly
alleviates the small but systematic deviations caused by small imbalances
in state-of-the-art force fields used to generate fragment libraries.
As a result, the local structure of the tau K18 chains produced by
RHCG is more consistent with NMR chemical shift and J-coupling experiments.
The RHCG Ensemble of Tau K18 Reproduces the
Experimental NMR
Residual Dipolar Couplings
We calculated the RDCs for the
assembled tau K18 chain using the steric alignment mode of PALES,[66] and then averaged the RDC values over the ensemble
with the respective weight of the chain. The measured[27] and calculated RDCs agree remarkably well and capture both
the signature as a function of position along the chain (Figure A) and the magnitude
at individual residue positions (Figure B). Without further fitting, we obtained
Pearson r correlation coefficients of 0.73 for RHCG
and 0.70 for HCG for tau K18 ensembles of 50 000 models. This
consistency not only validates the ensemble but also gives direct
insights into the interpretation of the RDCs measured for IDPs. RDCs
inform on how restricted a chain is locally, with larger absolute
RDCs expected for more restricted segments than for fully flexible
segments.[15] The RDC DHN ∝⟨P2(cos(θ))⟩
reports on the relative orientation of an amide bond vector with respect
to the magnetic field. Changes in the sign of the measured RDCs have
been interpreted as changes in the direction of the protein backbone.[27] Our conformational ensemble reproduces the four
changes in the sign of DHN found in experiments.[27] Importantly, as highlighted for the region centered
on L284-S385 in Figure C, our structures on average trace a turn in the region where the
sign changes, as indicated by a shortened distance across the four-residue
segments (Figure S8). By contrast, in regions
such as Q276-I277, where the sign of DHN does not change, our structures do not show a preference in the
chain direction and scatter around an average straight chain (Figure C). We note that
simple polymeric models that ignore amino acid chemistry and the correlations
between subsequent residues tend not to capture the trends in the
experimental RDCs, as previously noted.[15,27,79]
Residual Dipolar Coupling Calculations Require
Large Ensemble
Sizes
The need for large ensembles has been highlighted before.[26] Building large ensembles relies on the possibility
to quickly generate statistically independent atomically detailed
models of IDPs. The RDC values predicted for particular residues in
our models are widely and asymmetrically distributed with a range
of about ±25 Hz (Figure A). By contrast, the experimental average is roughly in the
range of −5 to 10 Hz (Figure A). As a result, RDCs calculated from small ensembles
are biased (Figure B). We found that relatively large ensembles of ≥10 000
tau K18 chains are needed to get converged RDC values (Figure B). We found in particular
that Pearson’s r correlation coefficient improved
with increasing ensemble size. The ensemble-size dependence is similar
for RHCG and HCG, even if the RHCG ensemble consistently performs
somewhat better than the HCG ensemble (Figures D, 2B,C, and S9).
Figure 2
Large ensembles are required to capture NMR
RDC measurements. (A)
Distribution of 1H–15N RDC values for
L285 in RHCG ensembles of different size, as calculated by PALES[66] without rescaling. (B) Average 1H–15N RDC for L285 in dependence of the ensemble size for HCG
(dark green squares) and RHCG (blue circles). Error bars indicate
± SEM. (C) Ensemble-size dependence of Pearson r correlation coefficient between tau K18 1H–15N RDC measurements[27] and calculations
from RHCG (blue circles) and HCG (green squares), respectively.
Large ensembles are required to capture NMR
RDC measurements. (A)
Distribution of 1H–15N RDC values for
L285 in RHCG ensembles of different size, as calculated by PALES[66] without rescaling. (B) Average 1H–15N RDC for L285 in dependence of the ensemble size for HCG
(dark green squares) and RHCG (blue circles). Error bars indicate
± SEM. (C) Ensemble-size dependence of Pearson r correlation coefficient between tau K18 1H–15N RDC measurements[27] and calculations
from RHCG (blue circles) and HCG (green squares), respectively.
RDCs from Short Chain Segments
In
the modeling of RDCs
of IDRs, it is frequently assumed that ensembles of short peptide
segments of about 15 amino acids contain sufficient structural information
to calculate RDCs.[80,81] We tested this assumption by
cutting overlapping 15-mer segments out of the BioEn ensemble of full-length
tau K18 and then calculating the average RDCs for their central 9
amino acids using a steric alignment.[66] We found that the RDCs calculated for the full ensemble and for
the 15-mer segments are highly correlated (r = 0.91; Figure S10). Compared to the NMR RDCs, the correlation
coefficient for segments (r = 0.61) is nearly as
good as for full-length chains (r = 0.73). In line
with earlier findings,[81] we conclude that
comparably short peptide segments can indeed be used to model the
RDCs of long IDRs such as tau.This finding also makes it possible
to use RDC data during chain growth in RHCG. RDCs can be precalculated
either directly for fragments of sufficient length or for a library
of segments that have been assembled by chain growth. With the precalculated
RDCs, subsequent chain growth can be biased to improve the overlap
between the initial and BioEn-optimized ensembles of chains. Here,
for tau K18, including RDCs in chain growth proved unnecessary because
they were predicted accurately without any bias.
The RHCG Ensemble
Captures the Extension of Tau K18 in Solution
The RHCG ensemble
also captures the size and shape of tau K18 in
solution as probed by SAXS measurements (Figure D). The mean scattering profiles calculated
from our tau K18 models agree well with the experimental scattering
profiles (Figure D),
taking possible unspecific aggregation in the low q regime into account. The computed root-mean-square radius of gyration
of approximately 39 Å coincides with the experimentally determined R of 38 ± 3 Å.[18] The RHCG ensemble (⟨Rh⟩ = 34 Å) is also consistent with the hydrodynamic
radius Rh 34 ± 6 Å, as reported
by dynamic light scattering (DLS).[74]Rh was computed from the RHCG ensemble using
an empirical approach.[82,83]Our RHCG ensemble agrees
quite well with previously reported NMR paramagnetic relaxation enhancement
(PRE) measurements[74] (Figure S11), which were not used in the generation of our
ensembles. Spin-label dynamics were modeled with a rotamer-library
approach.[75] The overall shapes of the experimental
profiles measured for four different spin-labels[74] were captured without any refinement.[46] However, a fully quantitative comparison is challenging
because of the sensitive dependence of the PRE on infrequent close
contacts between proton and spin-label in the fast-exchange regime.
As a result, the calculated PRE profiles are noisy and, without weight
adjustments, tend to underestimate the actual PRE for residues and
labels close in sequence. The good agreement with SAXS, dynamic light
scattering, and NMR measurements suggests that the RHCG ensemble captures
the global conformational properties of tau K18 in solution quite
well without further refinement. However, BioEn reweighting of the
spin-label rotamers[46] used to calculated
the PRE and possibly also the chains should address some of the challenges
in calculating PREs of disordered proteins.
Structure of tau K18 as
Assessed by Single-Molecule FRET
Comparison to single-molecule
FRET experiments suggests that our
RHCG models are somewhat too extended (Figure E), with longer Cα–Cα distances in the RHCG ensemble than those extracted
from the FRET experiments.[45] This initial
analysis of the FRET data with a commonly used polymer model[72] provides a valuable check on the validity of
more involved comparisons with explicit representations of dyes.[45,73,84] In a BioEn calculation, we found
that already a small adjustment of the RHCG chain weights suffices
to match the mean distance deduced from FRET perfectly (RHCG* in Figure S3D and Table S2). The resulting RHCG* ensemble agrees as well with experiment as
the RHCG ensemble in terms of the SAXS measurements, and slightly
worse in terms of NMR RDC and PRE measurements (Figure S12 and Table S2). The Kullback–Leibler
divergence of SKL ≈ 0.2 corresponds
to a change of the underlying MD simulation potential energy function
of SKLkBT = ∫dx p(opt)(x)[U(opt)(x) – U(x)] ≈ 0.5 kJ/mol on average.[39] Conversely,
this sensitivity also highlights the intricacies of the free energy
landscape of disordered proteins, where subtle shifts in the energetics
result in appreciable changes in conformation.[85]We explored possible effects of the fluorescent dyes
by generating RHCG models with dyes attached. For these models, we
calculated the mean FRET efficiency and compared it directly to the
experimental measurement (Figure S3C).
We found that an even smaller force field correction of 0.35 kJ/mol
on average[39] would be sufficient to achieve
full consistency of the ensemble means (Figure S3D). Reweighting according to the FRET data changes the RG from 39.4 Å (RHCG) to 37.4 Å (RHCG*),
and with explicit dye models from 40.1 Å (RHCG+dyes) to 39.1
Å (RHCG+dyes*), respectively.The scaling exponent of 0.56
inferred from the SAW-ν model[72] is
close to the value of an excluded-volume
chain. The tau K18 segment is thus more extended than most moderately
charged disordered IDPs.[21] Interestingly,
the transfer efficiency and average distance between the Cys residues
of tau K18 from single-molecule FRET are virtually independent of
salt concentration (Figure S3C), indicating
that the rather pronounced expansion of this segment is not caused
by charge repulsion. The FRET experiments are thus in line with our
modeling, which highlights that local structural preferences along
the chain rather than long-range charge–charge interactions
primarily shape the ensemble of tau K18.
Aggregation-Prone Extended
Structures Feature Prominently in
the Solution Ensemble of Tau K18
Interestingly, a small but
significant fraction of our atomically detailed models feature conformations
of the two aggregation-prone hexapeptide motifs[10] as seen in the high-resolution structures of tau fibrils.[86,87] Chain growth thus captures biologically important structural features.
For the first hexapeptide motif 275VQIINK280, we found that about 9% of the models are within 1 Å Cα RMSD of a tau fragment fibril structure (PDB: 5V5B(87)) (Figure A,C). A similar fraction of the tau K18 population has local structures
matching that of a fibril from a corticobasal degeneration (CBD) patient
sample[88] (PDB: 6TJO). The fraction of our ensemble that closely
matches the experimental structures (Figures B and S13) is
clearly larger than what would be expected for a random six amino
acid segment. For the second hexapeptide motif 306VQIVYK311, we also found that about 8% of the models are within 1.0
Å Cα RMSD of the X-ray structure (PDB: 2ON9(86)) (Figure B,D), about 2.5 times more than what would be expected for random
hexapeptide segments. We found similar consistency for the second
hexapeptide motif with the structures of tau fibrils (Figure S13), as formed in Alzheimer’s
disease (PDB: 5O3O,[89]5O3T,[89]6HRE,[90]6HRF[90]), CBD (6TJO,[88]6VI3[91]), Pick’s disease (6GX5[92]), and chronic
traumatic encephalopathy (6NWP[93]). Experiments on tau
K18 in solution suggest that these motifs should be partially in extended
conformations, consistent with our ensemble.[16,27]
Figure 3
RHCG
ensembles feature the extended conformations seen in high-resolution
structures of tau fibrils. (A,B) 275VQIINK280 and (C,D) 306VQIVYK311 hexapeptide motifs
are compared to their experimental structures in tau fibrils. (A,C)
Five RHCG structures (Cα RMSD < 0.5 Å) from
RHCG are superimposed on the respective experimental structure (gray,
PDB: 5V5B and 2ON9). (B) Cumulative
distribution of RMSD to experimental structure. For reference, the
gray line shows the distributions obtained for the RMSD between 50 000
randomly chosen six amino-acid segments in our model ensembles and
the motifs in 5V5B. (D) Cumulative distribution of RMSD to experimental
structure. For reference, the gray line shows the distribution of
the RMSD between randomly chosen six amino-acid segments and the hexapeptide
motif in the fibril (PDB: 2ON9).
RHCG
ensembles feature the extended conformations seen in high-resolution
structures of tau fibrils. (A,B) 275VQIINK280 and (C,D) 306VQIVYK311 hexapeptide motifs
are compared to their experimental structures in tau fibrils. (A,C)
Five RHCG structures (Cα RMSD < 0.5 Å) from
RHCG are superimposed on the respective experimental structure (gray,
PDB: 5V5B and 2ON9). (B) Cumulative
distribution of RMSD to experimental structure. For reference, the
gray line shows the distributions obtained for the RMSD between 50 000
randomly chosen six amino-acid segments in our model ensembles and
the motifs in 5V5B. (D) Cumulative distribution of RMSD to experimental
structure. For reference, the gray line shows the distribution of
the RMSD between randomly chosen six amino-acid segments and the hexapeptide
motif in the fibril (PDB: 2ON9).
The Solution Ensemble Contains
the Functional Conformations
of Tau in Complex with Microtubules
We found that a considerable
fraction of WT tau K18 adopts locally compact turn-like structures
(Figure A–C).
Similar turn-like structures have been resolved by NMR transfer NOESY
experiments probing the conformations of microtubule-bound tau,[57] with an O(300)–N(303) distance below
4 Å in 18 out of the 20 structures in the NMR ensemble (PDB: 2MZ7; see Figure B). In the WT RHCG ensemble,
15% of structures of the 300VPGGG304 segment
are within 1 Å Cα RMSD of the closest representative
of the NMR ensemble (Figure A). This indicates that tau samples the turn-like structures
of the microtubule-bound form also free in solution.
Figure 4
Tau P301 mutations favor
more extended local structures. (A) Cumulative
distributions of the minimum Cα RMSD of 300VPGGG304 to the closest representative of the NMR ensemble
of microtubule-bound structures.[57] Results
are shown for the RHCG ensembles of WT tau K18 and for the HCG ensemble
of the P301L, P301S, and P301T variants. (B) Five representative structures
of 300VPGGG304 from RHCG (oxygen: red; nitrogen:
blue; carbon: cyan; Cα RMSD < 0.5 Å) are
superimposed on a representative of the NMR structural ensemble (gray
sticks, PDB: 2MZ7, structure 17). Tubes indicate the amino acid backbone. The O(300)–N(303)
hydrogen bond is indicated by the blue dashed line. (C) Cumulative
distributions of O(V300)–N(G303) distances for WT tau K18 from
RHCG compared to P301L, P301S, and P301T tau K18 variants from HCG.
(D) Representative local structures of WT tau K18. (E) Representative
local structures of the P301L variant. (F) Local structures of P301S.
(G) Local structures of P301T. In (D)–(G), the structures were
aligned on residues 300 and 301. Tubes indicate the backbone. Side-chain
heavy atoms, amide nitrogen, and Cα of residue 301
are shown as sticks (oxygen: red; nitrogen: blue; carbon: cyan).
Tau P301 mutations favor
more extended local structures. (A) Cumulative
distributions of the minimum Cα RMSD of 300VPGGG304 to the closest representative of the NMR ensemble
of microtubule-bound structures.[57] Results
are shown for the RHCG ensembles of WT tau K18 and for the HCG ensemble
of the P301L, P301S, and P301T variants. (B) Five representative structures
of 300VPGGG304 from RHCG (oxygen: red; nitrogen:
blue; carbon: cyan; Cα RMSD < 0.5 Å) are
superimposed on a representative of the NMR structural ensemble (gray
sticks, PDB: 2MZ7, structure 17). Tubes indicate the amino acid backbone. The O(300)–N(303)
hydrogen bond is indicated by the blue dashed line. (C) Cumulative
distributions of O(V300)–N(G303) distances for WT tau K18 from
RHCG compared to P301L, P301S, and P301T tau K18 variants from HCG.
(D) Representative local structures of WT tau K18. (E) Representative
local structures of the P301L variant. (F) Local structures of P301S.
(G) Local structures of P301T. In (D)–(G), the structures were
aligned on residues 300 and 301. Tubes indicate the backbone. Side-chain
heavy atoms, amide nitrogen, and Cα of residue 301
are shown as sticks (oxygen: red; nitrogen: blue; carbon: cyan).
Chain Growth Captures the Effect of Mutations
Toward Aggregation-Prone
Structures
The PGG motifs at the end of each repeat favor
turn-like structures.[94] We expect that
mutations of the prolines shift the local structure away from turns.
To test the effect of mutations at the 301 position, we considered
the frontotemporal dementia with Parkinsonism-linked to chromosome
17 (FTDP-17) mutations P301L, P301S and P301T. Mutations of P301 have
been shown to strongly promote tau aggregation[9,10] and
are used in mouse models of tauopathies.[8,9]In our
hierarchical modeling, the P301L, P301S, and P301T variants consistently
form more extended structures than WT (Figure C,D), both in ensembles of full-length tau
K18 (Figure C,E,F,G)
and in fragment MD simulations (Figure S14). This loss of turn-like structures is indicated by a more than
2-fold reduction in the fraction of O–N distances < 4 Å
between V300 and G303. The P301L mutation has been studied in detail
by NMR and biophysical experiments.[11] The
shift from turns to extended structures in our P301L ensemble is in
line with smaller 15N chemical shift values for K298, H299,
and V300 in P301L tau K18.[11]The
shift from turns to extended structures rationalizes the enhanced
aggregation propensity of tau P301L in vitro[10,12] because extended structures predominate in fibrils. Locally more
extended structures in the mutant proteins facilitate intermolecular
contacts between tau chains and subsequent assembly and aggregation
via intermolecular β-sheets. The shift to extended structures
seen here also explains why P301L tau binds less strongly to microtubules.[11,95] In a population-shift mechanism, P301L, P301S, and P301T mutations
thus appear to decrease the fraction of tau with locally compact turn
structures, which are competent to bind to microtubules and to increase
the fraction of aggregation-prone extended structures (Figures and S11). The combination of these two effects may render P301 mutations
deleterious both with respect to a loss in function and an increased
tendency to form disease-associated fibrils.
Figure 5
P301 mutations shift
the balance from functional to aggregation-prone
conformations. Turn conformations (bottom) are required for functional
microtubule binding (left), whereas extended conformations (top) are
associated with aggregation and the formation of pathogenic fibrils
(right). In the wild-type ensemble (P301; bottom), turn-like structures
predominate. By contrast, extended structures are significantly populated
in the mutant ensemble (P301L; top). The zoom-ins on the right show
representative backbone traces around amino acid 301 as tubes.
P301 mutations shift
the balance from functional to aggregation-prone
conformations. Turn conformations (bottom) are required for functional
microtubule binding (left), whereas extended conformations (top) are
associated with aggregation and the formation of pathogenic fibrils
(right). In the wild-type ensemble (P301; bottom), turn-like structures
predominate. By contrast, extended structures are significantly populated
in the mutant ensemble (P301L; top). The zoom-ins on the right show
representative backbone traces around amino acid 301 as tubes.According to chemical shift mapping, the P301L/P301S/P301T
mutations
do not significantly alter the overall structure of tau.[11] Whereas the tendency to form aggregation-prone
extended structures at position 301 more than doubles (see Figure ), the absolute increase
in the extended population is small (<15%) and confined locally
to the turn region. The change in the calculated radius of gyration
compared to WT is small, ∼0.2 Å, and thus within the uncertainty
of both calculations and measurements. The same limitation applies
to the mean Cα–Cα distances
of the fluorophore labeled residues, which change by only ∼0.1–0.3
Å.
Conclusions
We showed that reweighted
hierarchical chain growth captures both
the local and the global structures of tau K18. Locally, NMR Cα chemical shifts were reproduced within the expected
uncertainties without any fitting. The agreement was improved further
with only a gentle Bayesian ensemble refinement against NMR chemical
shift data. Globally, the tau K18 chains assembled in this way reproduced
SAXS, FRET, and NMR RDC measurements and thus captured the overall
shape, dimension, and changes in orientation. In addition, the FRET
experiments showed that the extension of tau K18 is insensitive to
varying salt concentration unlike other disordered proteins.[58] The global structure of tau K18 thus emerged
from its local structure in the sense that the ensembles of global
chain structures built by combining short peptide fragments capture
the measured global structural properties with good accuracy.Fragment assembly and coil models have proved highly successful
in the modeling of disordered proteins.[24−26,28,36,52,60,79] The quality
of the ensemble models can be improved even further by integrating
experimental data.[26,35] In BioEn,[39,40] the data enter through a χ2 term. The summed squared
error χ2 of the models often grows roughly linearly
with chain length, e.g., because of systematic errors in the force
field used to generate the fragment models. As a result, the relative
weights of the assembled chains in a refined ensemble will vary widely.
The overlap between the ensemble of assembled chains and the final
ensemble, as measured by exp(−SKL), then decreases exponentially with increasing chain length, and
ensemble refinement becomes increasingly inefficient.Reweighted
hierarchical chain growth is an importance sampling
procedure designed to address this problem by producing evenly weighted
ensembles. By applying a bias already during chain assembly, we ensure
that the assembled chains have near-uniform weights in the final ensemble.
A poorly designed importance sampling scheme would produce ensembles
with an uneven weight distribution, as indicated by a high value of SKLbias in eq . By using
hierarchical chain growth[52] and correcting
for any bias in the assembly process in a formally rigorous manner
using a form of Bayesian ensemble refinement, BioEn,[39] we ensure further that the final ensemble is well-defined
and independent of arbitrary choices in the assembly process, such
as the strength of the bias in fragment selection or the direction
of chain growth.In practice, RHCG may only be a starting point
for further investigations
and improvements. For instance, representative structures can be used
as seeds for MD simulations of the full-length protein.[52] By drawing conformations according to the BioEn
weights, one can systematically select subensembles that are consistent
with the available experimental data. If BioEn[39,40] indicates that entire regions of configuration space require large
changes in weights, up or down, one may need to bias chain growth
accordingly or may have to use different or improved simulation force
fields.[96]The tau K18 ensembles obtained
by reweighted hierarchical chain
growth revealed how patient-associated mutations shift the balance
from protein function to disease. In modeling the effect of mutations,
we took advantage of a chemically informed description[79,97−104] of the disordered tau protein. We found that, already free in solution,
the microtubule-interacting regions of tau K18 populate local structures
as observed in the microtubule-bound state by NMR. Also consistent
with conformational selection, we found that a comparable fraction
of free tau K18 chains exhibits local structures as observed in pathogenic
tau fibrils. We could further show that the disease-associated mutations
P301L, P301S, and P301T shift the balance away from the microtubule-bound
local turn structures toward the fibril-associated extended structures
(Figure ). Such shifts
can have dramatic effects on the kinetics of aggregation[105] by lowering the barrier to nucleation. Indeed,
a shift to extended structures was recently reported to be associated
with fibril formation in tau condensates.[106] The emergence of global structure from local structure thus extends
beyond chain shape, dimension, and orientation to the competition
between tau’s role as microtubule-bound regulator of cellular
transport and as fibril-forming driver of neuropathologies.
Authors: Andrea Cavalli; Xavier Salvatella; Christopher M Dobson; Michele Vendruscolo Journal: Proc Natl Acad Sci U S A Date: 2007-05-29 Impact factor: 11.205
Authors: Mikayel Aznauryan; Leonildo Delgado; Andrea Soranno; Daniel Nettels; Jie-Rong Huang; Alexander M Labhardt; Stephan Grzesiek; Benjamin Schuler Journal: Proc Natl Acad Sci U S A Date: 2016-08-26 Impact factor: 11.205
Authors: João M C Teixeira; Zi Hao Liu; Ashley Namini; Jie Li; Robert M Vernon; Mickaël Krzeminski; Alaa A Shamandy; Oufan Zhang; Mojtaba Haghighatlari; Lei Yu; Teresa Head-Gordon; Julie D Forman-Kay Journal: J Phys Chem A Date: 2022-08-28 Impact factor: 2.944
Authors: Justus F Gräf; Ivan Mikicic; Xiaofei Ping; Claudia Scalera; Katharina Mayr; Lukas S Stelzl; Petra Beli; Sebastian A Wagner Journal: iScience Date: 2022-08-09