
Global Structure of the Intrinsically Disordered Protein Tau Emerges from Its Local Structure.

Lukas S Stelzl^1,2,3,4, Lisa M Pietrek^1, Andrea Holla^5, Javier Oroz^6,7, Mateusz Sikora^1,8, Jürgen Köfinger^1, Benjamin Schuler^5,9, Markus Zweckstetter^6,10, Gerhard Hummer^1,11.

Abstract

The paradigmatic disordered protein tau plays an important role in neuronal function and neurodegenerative diseases. To disentangle the factors controlling the balance between functional and disease-associated conformational states, we build a structural ensemble of the tau K18 fragment containing the four pseudorepeat domains involved in both microtubule binding and amyloid fibril formation. We assemble 129-residue-long tau K18 chains with atomic detail from an extensive fragment library constructed with molecular dynamics simulations. We introduce a reweighted hierarchical chain growth (RHCG) algorithm that integrates experimental data reporting on the local structure into the assembly process in a systematic manner. By combining Bayesian ensemble refinement with importance sampling, we obtain well-defined ensembles and overcome the problem of exponentially varying weights in the integrative modeling of long-chain polymeric molecules. The resulting tau K18 ensembles capture nuclear magnetic resonance (NMR) chemical shift and J-coupling measurements. Without further fitting, we achieve very good agreement with measurements of NMR residual dipolar couplings. The good agreement with experimental measures of global structure such as single-molecule Förster resonance energy transfer (FRET) efficiencies is improved further by ensemble refinement. By comparing wild-type and mutant ensembles, we show that pathogenic single-point P301L, P301S, and P301T mutations shift the population from the turn-like conformations of the functional microtubule-bound state to the extended conformations of disease-associated tau fibrils. RHCG thus provides us with an atomically detailed view of the population equilibrium between functional and aggregation-prone states of tau K18, and demonstrates that global structural characteristics of this intrinsically disordered protein emerge from its local structure.


Year: 2022; PMID: 35373198; PMCID: PMC8970000; DOI: 10.1021/jacsau.1c00536

Source DB: PubMed; Journal: JACS Au; ISSN: 2691-3704

Introduction

Intrinsically disordered proteins (IDPs) are enriched in the proteomes of higher eukaryotes, where they perform essential functions.[1−3] In healthy neurons, the paradigmatic IDP tau binds and stabilizes microtubules.[1] In diseased neurons, tau loses the ability to bind to microtubules and forms the toxic aggregates associated with Alzheimer’s and other neurodegenerative diseases.[2] Hyperphosphorylation of tau correlates with the progression of Alzheimer’s disease. Tau has recently been shown to form biomolecular condensates.[3−6] Dysregulation of the formation of biomolecular condensates by mutations[7] and aberrant post-translational modifications such as phosphorylation[4,7] may underlie the pathogenicity of tau. Some tau mutations, e.g., P301L and P301S, show drastic effects in patients and are used in mouse models of tauopathies.[8,9] The conformational dynamics of tau around P301 may play a direct role in modulating the aggregation of tau in disease,[10−12] as studied also by molecular dynamics (MD) simulations of tau fragments.[12] Efforts to gain a clearer picture of the local conformational dynamics of tau promise a deeper understanding of its roles in health and disease. 
The challenges in resolving structural ensembles of IDPs call for an integrative approach.[13] Important progress in dealing with the high flexibility of disordered biomolecules has been made using nuclear magnetic resonance (NMR) spectroscopy,[14−17] solution X-ray scattering (SAXS),[18] and single-molecule Förster resonance energy transfer (FRET).[19−23] To harness the full power of these experiments and interpret the data in detail, the construction of ensembles of structures[24−32] has proved to be a powerful strategy, especially for the interpretation of NMR experiments and the combination of multiple experimental methods.[31,33,34] For instance, Borgia et al.[32] combined data from single-molecule FRET, SAXS, dynamic light scattering, and fluorescence correlation spectroscopy with MD simulations to characterize the ensembles of a marginally stable spectrin domain and the IDP ACTR over a broad range of solution conditions. Gomes and co-workers[35] recently described an ensemble of the disordered N-terminal region of the Sic1 protein, obtained by integrating different combinations of SAXS, single-molecule FRET, and NMR experiments using the ENSEMBLE approach.[36] Structural ensembles obtained from computational modeling can be combined with experimental data by using Bayesian and maximum entropy ensemble refinement methods.[29,37−45] The Bayesian formulation accounts naturally for uncertainties in the measurements, the model used to generate the ensemble, and the calculation of observables from the ensemble members.[39] Input ensembles[46] are obtained, e.g., from MD simulations[44,47−49] or chain growth,[26,28,50−53] and are then minimally modified to account for the experimental observations. However, for long protein or nucleic acid chains, it is difficult to create initial ensembles that have sufficient overlap with the final ensemble for reliable ensemble refinement. 
For experimental data that report on the local structure along the chain of a disordered protein, we expect that cumulative systematic errors in the MD force field will cause the summed squared error χ² between model and experiment to grow linearly with the length of the chain. As a consequence, the overlap between input and final ensemble deteriorates exponentially as the chain grows in length. For long IDPs, only a few chains will therefore tend to dominate the ensemble after refinement, with the rest of the large ensemble being mostly irrelevant. The problem of poor overlap between the initial and final ensemble can be overcome by applying a bias already in the generation of the initial ensemble, e.g., by imposing restraints directly on observables or related quantities in the initial MD simulations. The use of chemical shifts and other NMR data in the structural modeling of flexible systems has a long and productive history. Approaches based on fragment selection proved particularly powerful.[54−56] Protocols have been developed that combine biased fragment choice with corrections to remove the biases introduced.[42] In an early combination of biased chain growth with Bayesian weighting applied to tau K18,[28] overlapping peptide fragments were stitched together. Fragment selection was biased to double the radius of gyration in an otherwise overly compact ensemble. Steric clashes were resolved by energy minimization in implicit solvent, and high-energy structures were randomly removed in a pruning step. Excellent agreement with NMR observables[27] could be achieved by adjusting the weights of the ensemble members. However, formal and practical questions are raised: how does one incorporate experimental data already during chain growth without compromising the Bayesian framework of ensemble refinement, where such information would normally be used a posteriori? And how does one ensure that the final ensemble is well-defined and fully reproducible? 
We will show here that in a Bayesian formulation any bias in ensemble generation can be accounted for fully and quantitatively in a final global refinement step by exploiting the direct connection of ensemble refinement to traditional free energy calculations.[39] Meaningful input ensembles can thus be generated without sacrificing the rigor and reproducibility of the ensemble refinement procedure. We propose reweighted hierarchical chain growth (RHCG) as a general method to integrate data reporting on local structure into models of disordered and flexible polymeric molecules such as disordered proteins or nucleic acids. Protein chains are assembled from fragment structures, as obtained here from MD simulations. As in hierarchical chain growth (HCG),[52] chains with steric clashes are consistently removed in such a way that the resulting ensemble does not depend on arbitrary choices such as the direction of chain growth, N-to-C versus C-to-N. In RHCG, fragment choice is biased according to experiments reporting on the local structure. In a final reweighting step, any resulting bias is then removed. RHCG is thus a form of importance sampling. Using RHCG, we arrive at an integrative model of tau K18 with atomic detail. Tau K18 contains the four pseudorepeat domains R1–R4 involved both in functional binding to microtubules[57] and in forming amyloid fibrils.[10,12] NMR chemical shift data that report on local structure are incorporated already during chain growth. Electrostatic[58] and other interactions between regions distant in sequence can impact the global structure of IDPs. Deviations from random coil behavior can emerge also from local residual structure.[29] For tau K18, it is not clear a priori how its local and global structure are shaped. We show that the RHCG ensembles also capture the global structure of tau K18, as probed by NMR RDC, single-molecule FRET, and SAXS measurements. 
The global structure of tau K18 is thus determined to a significant degree by its local structure. By comparing wild-type (WT) and mutant sequences, we provide a molecular view of possible differences between tau in a healthy cell and tau with pathogenic mutations. Our modeling of tau K18 reveals turns as in microtubule-bound states and extended structures as in tau fibrils. We found that pathogenic single-point P301 mutations shift the equilibrium from the former to the latter, emphasizing the close connection between functional forms of tau in solution and the fibrillar structures in tau-associated pathologies.

Theory

Bayesian Ensemble Refinement of Polymeric Molecules

We combine molecular simulations with ensemble refinement to create ensembles of proteins or nucleic acids that faithfully reflect the distribution of conformations in experiment. To create an initial ensemble, we adapt the hierarchical chain growth (HCG) method introduced recently,[52] as described in detail below. We then use Bayesian Inference of Ensembles (BioEn)[39] to adjust the weights of the individual ensemble members according to the experimental data, e.g., NMR chemical shifts. BioEn ensemble refinement minimally adjusts the vector $\mathbf{w} = (w_1, \ldots, w_C)$ of normalized weights of individual chains c = 1, ..., C in the ensemble to match the experimental data. We define a posterior $P(\mathbf{w}|\mathrm{data}, I)$ as a function of the weights $\mathbf{w}$,

$$P(\mathbf{w}|\mathrm{data}, I) \propto P_0(\mathbf{w}|I)\, P(\mathrm{data}|\mathbf{w}, I)$$

with $P_0(\mathbf{w}|I)$ being the prior and $P(\mathrm{data}|\mathbf{w}, I)$ being the likelihood. Here, I denotes background information, e.g., that we model polymeric molecules with internal structure. The BioEn maximum-entropy prior[38] is given by

$$P_0(\mathbf{w}|I) \propto \exp(-\theta S_\mathrm{KL})$$

θ is a hyperparameter that controls the strength of the entropy regularization and thus expresses our confidence in the initial ensemble of chains.[39] $S_\mathrm{KL}$ is the Kullback–Leibler (KL) divergence

$$S_\mathrm{KL} = \sum_{c=1}^{C} w_c \ln \frac{w_c}{w_c^0}$$

which reports how close the normalized refined weights $w_c$ are to the normalized reference weights $w_c^0$. Assuming Gaussian uncorrelated errors, the likelihood is

$$P(\mathrm{data}|\mathbf{w}, I) \propto \exp(-\chi^2/2)$$

with

$$\chi^2 = \sum_{i=1}^{M_\mathrm{data}} \frac{\left(\sum_{c=1}^{C} w_c\, y_i^c - Y_i\right)^2}{\sigma_i^2}$$

The first sum is over the different experimental observations $i = 1, \ldots, M_\mathrm{data}$ with measured values $Y_i$, and the second sum is over the ensemble members c = 1, ..., C. For each chain c and observable i, we use a forward model to compute individual observations $y_i^c$. The error $\sigma_i^2$ is the sum of the squared standard errors of the measurements $Y_i$ and the forward calculations $y_i^c$. In applications of BioEn to long biopolymers, small but systematic weight corrections at the monomer level can add up to large corrections overall. For NMR chemical shifts, for instance, the sum over i in the expression for χ² corresponds to a sum over residues. 
As a result, the χ² statistic is extensive; i.e., it tends to grow linearly with the length of the chain. Reweighting of assembled chains thus becomes progressively more challenging as the length of the chain grows (i.e., for chains with more fragments). The reason is that it becomes progressively unlikely that all fragments in an assembled chain occupy the relevant subspace with proper weight. As a result, chains will contribute with very uneven weights after BioEn reweighting. In other words, a few chains will dominate, and the rest of the large ensemble will be more or less irrelevant.
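
The BioEn objective described in this section can be written down in a few lines. The following is a minimal sketch in plain Python, not the BioEn software itself; the function names and the nested-list data layout (`y[c][i]` for the forward-calculated observable i of chain c) are assumptions made for illustration. It evaluates the negative log-posterior, θ S_KL + χ²/2, up to an additive constant.

```python
import math

def kl_divergence(w, w0):
    """S_KL = sum_c w_c ln(w_c / w0_c); chains with w_c = 0 contribute zero."""
    return sum(wc * math.log(wc / w0c) for wc, w0c in zip(w, w0) if wc > 0.0)

def chi_squared(w, y, Y, sigma):
    """chi^2 = sum_i (sum_c w_c y[c][i] - Y[i])^2 / sigma[i]^2."""
    total = 0.0
    for i, (Yi, si) in enumerate(zip(Y, sigma)):
        ensemble_avg = sum(wc * yc[i] for wc, yc in zip(w, y))
        total += ((ensemble_avg - Yi) / si) ** 2
    return total

def neg_log_posterior(w, w0, y, Y, sigma, theta):
    """Negative log-posterior up to a constant: theta * S_KL + chi^2 / 2."""
    return theta * kl_divergence(w, w0) + 0.5 * chi_squared(w, y, Y, sigma)
```

Minimizing this function over normalized, non-negative weights would give the refined ensemble; the optimizer itself is omitted here. Because χ² sums over residues, its value for a fixed per-residue mismatch grows with chain length, which is the extensivity problem discussed above.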

Reweighted Hierarchical Chain Growth

We address the problem of poor overlap between initial and final ensemble by using importance sampling. In MD simulations of complete biopolymer chains, bias potentials could be introduced, acting for instance on the torsion angles to better match NMR chemical shifts or J-couplings. Here, we focus instead on fragment-based chain growth. The key idea is to grow chains by using fragment libraries that have already been biased to enrich the ensemble with members of high weight, and then to correct for this biased choice of fragments in a final reweighting step. If the bias weights were chosen perfectly, the final step would give each chain equal weight. In RHCG, we adapt HCG[52] to assemble polymer chains from fragments. At each of the N positions, fragments are picked at random from a fragment library and then combined by superimposition of residues at their termini with the equivalent residues in the adjacent fragments. Any models with steric clashes are discarded. In HCG, all fragments have equal weight; in RHCG, the fragments in the library at position n, indexed by $f = 1, \ldots, F_n$ (with $F_n$ being the number of fragments created at position n), are picked according to a weight $w_f^{(n)}$ normalized to $\sum_{f=1}^{F_n} w_f^{(n)} = 1$ for all n. These weights have to be chosen appropriately, as described below, and constitute our initial guess as to how likely a particular fragment is in the final reweighted ensemble of chains. The probability p[c] for a particular chain c to be created in this way is given by the product of weights for each of its fragments,

$$p[c] = \prod_{n=1}^{N} w_{f_n(c)}^{(n)}$$

where $f_n(c) \in \{1, \ldots, F_n\}$ is the index of fragment n in chain c. Here, we construct the fragment libraries from MD simulations of short overlapping blocked peptides. Alternatively, fragment libraries can be constructed from MD simulations of full-length chains that are then broken up into overlapping segments and reassembled by chain growth. 
A similar approach has recently been used to explore the flexibility of the SARS-CoV-2 spike stalk.[59] Fragment libraries can also be built from experimentally resolved structures with appropriately defined weights. We used NMR chemical shifts to bias the fragment choice. The weights of the fragments $w_f^{(n)}$ were determined with BioEn applied to the fragment library at position n with a fragment-level confidence parameter θ. This confidence parameter was chosen to produce nearly uniform weights $w_c$ of the assembled chains after a global BioEn reweighting (Figure S1C). Importantly, there is no issue of circularity because the bias applied during chain growth is fully accounted for, as described in the following section.
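
The biased fragment choice can be illustrated with a short sketch. This is not the published RHCG code: the function name `grow_chain` and the data layout are hypothetical, the hierarchical pairing of fragments is flattened into a simple loop over positions, and the superposition and steric clash checks are omitted. It only shows how one fragment per position is drawn according to its bias weight and how the log of the chain probability p[c] is accumulated.

```python
import math
import random

def grow_chain(fragment_weights, rng):
    """Draw one fragment index per position according to the bias weights.

    fragment_weights[n][f] is the normalized bias weight of fragment f at
    position n. Returns the picked indices and ln p[c], the log of the
    product of the picked weights. Clash checks are not modeled here.
    """
    picks = []
    log_p = 0.0
    for weights in fragment_weights:
        f = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        picks.append(f)
        log_p += math.log(weights[f])
    return picks, log_p

# Usage: two positions with two fragments each (made-up weights).
rng = random.Random(0)
picks, log_p = grow_chain([[0.5, 0.5], [0.25, 0.75]], rng)
```

Storing ln p[c] rather than p[c] is a design choice of this sketch: for a chain built from dozens of fragments, the product of weights can become very small.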

BioEn Reweighting of Assembled Chains

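After the biased assembly of an ensemble of chains, we use BioEn[39,40] to correct for the bias in chain growth and to reweight the entire ensemble globally. To correct for the bias in chain assembly, chain c enters the global BioEn refinement with a relative weight proportional to the reciprocal of the bias probability, 1/p[c], with which its fragments were selected. Writing $w_{f_n(c)}^{(n)}$ for the bias weight of the fragment chosen at position n of chain c, normalization of these relative weights gives us

$$w_c^0 = \frac{1 \Big/ \prod_{n=1}^{N} w_{f_n(c)}^{(n)}}{\sum_{c'=1}^{C} 1 \Big/ \prod_{n=1}^{N} w_{f_n(c')}^{(n)}}$$

or, expressed more compactly in terms of reciprocal weight factors,

$$w_c^0 = \frac{\prod_{n=1}^{N} \left(w_{f_n(c)}^{(n)}\right)^{-1}}{\sum_{c'=1}^{C} \prod_{n=1}^{N} \left(w_{f_n(c')}^{(n)}\right)^{-1}}$$

where the sum extends over the C chains of the ensemble. To the ensemble with these initial weights, we then apply BioEn reweighting, using as a reference experimental data reporting on local or global structural properties.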
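
The bias correction amounts to normalizing the reciprocal bias probabilities over the ensemble. A minimal sketch follows, assuming that the log-probability ln p[c] of each assembled chain was stored during chain growth; the function name `reference_weights` is hypothetical. The normalization is done in log space, since 1/p[c] can overflow for long chains.

```python
import math

def reference_weights(log_p):
    """Reference weights w0_c proportional to 1/p[c], normalized to sum to 1.

    log_p[c] is ln p[c] for chain c. Subtracting the maximum exponent before
    exponentiating (log-sum-exp trick) keeps the arithmetic finite.
    """
    neg = [-lp for lp in log_p]
    shift = max(neg)
    unnormalized = [math.exp(x - shift) for x in neg]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

# Usage: a chain grown with p = 0.25 gets twice the reference weight of a
# chain grown with p = 0.5.
w0 = reference_weights([math.log(0.5), math.log(0.25)])
```

These weights would then serve as the reference weights in the global BioEn refinement.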

Chain Growth with Nonbonded Interactions beyond Steric Repulsion

Fragment assembly can, in principle, be extended to account for nonbonded interactions beyond steric repulsion to account, e.g., for electrostatic interactions between fragments.[60] This can be accomplished by using a free energy function $G(f_1, \ldots, f_N)$ that describes the interfragment interactions in chain c and can be calculated from an implicit solvent model or, by free energy calculations, from explicit solvent models. Chains c assembled from fragments $f_1, \ldots, f_N$ are then weighted by an additional factor $\exp[-\beta G(f_1, \ldots, f_N)]$ with $1/\beta = k_\mathrm{B}T$, $k_\mathrm{B}$ being the Boltzmann constant, and T being the absolute temperature. In the Bayesian formulation, the normalized reference weight of chain c in an ensemble of C chains then becomes

$$w_c^0 = \frac{\exp[-\beta G(f_1(c), \ldots, f_N(c))]}{\sum_{c'=1}^{C} \exp[-\beta G(f_1(c'), \ldots, f_N(c'))]}$$

To sample efficiently from this distribution, one can again use importance sampling by performing hierarchical assembly[52] with biased fragment selection. If, as above, $w_{f_n(c)}^{(n)}$ is the bias weight factor to choose fragment $f_n(c)$ at position n, then the reference weight becomes

$$w_c^0 = \frac{\exp[-\beta G(f_1(c), \ldots, f_N(c))] \prod_{n=1}^{N} \left(w_{f_n(c)}^{(n)}\right)^{-1}}{\sum_{c'=1}^{C} \exp[-\beta G(f_1(c'), \ldots, f_N(c'))] \prod_{n=1}^{N} \left(w_{f_n(c')}^{(n)}\right)^{-1}}$$

Here, we use only excluded volume interactions, which amounts to exp(−βG) = 1 for chains without interfragment steric clashes and exp(−βG) = 0 with clashes.

Assessment of Importance Sampling

In ideal importance sampling, we would grow chains of equal relative importance. Global BioEn reweighting would then give each member of the resulting ensemble equal weight, $w_c = 1/C$. We use the KL divergence of the BioEn-optimized weights $w_c$ from ideal importance sampling to assess the effectiveness of our bias in chain growth:

$$S_\mathrm{KL}^\mathrm{bias} = \sum_{c=1}^{C} w_c \ln \frac{w_c}{1/C} = \sum_{c=1}^{C} w_c \ln (C\, w_c)$$

If $S_\mathrm{KL}^\mathrm{bias} \lesssim 1$, the overlap between the ensembles produced by biased chain growth and after BioEn refinement is large; conversely, if $S_\mathrm{KL}^\mathrm{bias} \gg 1$, the chain growth protocol should be optimized. We use $S_\mathrm{KL}^\mathrm{bias}$ also to choose the fragment-level confidence parameter θ quantifying the strength of the bias in fragment choice during RHCG. As illustrated in Figure S1C, $S_\mathrm{KL}^\mathrm{bias}$ as a measure of weight uniformity is minimal for a range of fragment-level θ values at a given confidence parameter θ of the global BioEn ensemble reweighting.
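
The diagnostic for the quality of importance sampling, the KL divergence of the refined weights from uniform weights 1/C, is simple to evaluate. The following sketch uses a hypothetical function name, `skl_bias`, and shows the two limiting cases: perfectly uniform weights and an ensemble collapsed onto a single chain.

```python
import math

def skl_bias(w):
    """KL divergence of normalized weights w from uniform weights 1/C.

    Computes sum_c w_c ln(C w_c); chains with zero weight contribute zero.
    """
    C = len(w)
    return sum(wc * math.log(C * wc) for wc in w if wc > 0.0)

# Usage: uniform weights give 0; a single dominant chain gives ln C.
uniform = skl_bias([0.25, 0.25, 0.25, 0.25])
collapsed = skl_bias([1.0, 0.0, 0.0, 0.0])
```

The upper bound ln C is reached when one chain carries all the weight, so values well below 1 indicate that most chains contribute to the refined ensemble.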

Methods

Hierarchical and Reweighted Hierarchical Chain Growth

We generated structural ensembles of tau K18 (residues 244–372) using HCG[52] and RHCG. (RHCG software can be downloaded free of charge at https://github.com/bio-phys/hierarchical-chain-growth.) Tau structures were assembled from 43 pentamer fragments with two residues overlap between subsequent fragments. All fragments had their N and C termini capped by acetyl and N-methyl groups, respectively. The first (N-terminal) fragment started from the last residue outside tau K18, which was then removed in chain assembly. Fragment structures were sampled in all-atom replica exchange molecular dynamics (REMD) with explicit solvent. For each fragment, we used 24 replicas spanning a temperature range of 278–420 K. Each pentamer fragment was simulated for 100 ns as in our previous study.[52] We used structures from the T = 278 K ensemble to assemble tau K18 chains, which corresponds to the temperature of the NMR experiments.[27] To investigate the effect of point mutations at the P301 position, we also sampled fragments with P301 and mutations P301L, P301S, and P301T. We repeated fragment simulations for WT P301, P301L, P301S, and P301T fragments with residue 301 at the central position of their respective fragments instead of the second position of its respective pentamer. Since we lack detailed chemical shift information, the P301X mutant chains were assembled with HCG, not RHCG. We note that in all fragment REMD simulations P301 was sampled exclusively as the trans isomer. We biased the fragment selection in RHCG according to Cα chemical shifts measured by NMR. At each fragment position n, we performed independent BioEn reweighting[39,40] using the chemical shift data reported for the nonterminal residues in this fragment (Supporting Information (SI) text). A large fragment-level confidence parameter of θ = 10 ensured improved consistency of the chemical shifts (with the average χ² across fragments dropping from 0.856 to 0.688) with minimal weight changes (SKLBioEn = 0.004 on average). 
These local BioEn calculations gave us the fragment weight factors. In numerical tests on comparably small ensembles of 10⁴ chains and with θ = 5 fixed for the global BioEn ensemble reweighting, we found that SKLBioEn was minimal for fragment-level θ = 5 to 10 (Figure S1C). We then used RHCG to build ensembles of between 2000 and 10⁶ WT tau K18 models from the reweighted fragment libraries. For reference, we also constructed unbiased ensembles of WT tau using HCG[52] with unweighted fragment libraries. HCG was also used to construct tau K18 ensembles of P301 mutants. If not specified otherwise, the results shown are for ensembles of C = 50 000 chains. Following the procedure described in ref (52), we assembled 10 000 representatives at each hierarchy level below the final assembly level to sample a high diversity of possible local conformations. At the final level, full-length models were assembled from this pool. The assembly process was trivially parallelized by using different random number seeds. In a final step, the RHCG ensembles were reweighted using BioEn to correct for the biased fragment choice while retaining consistency with the NMR chemical shift data. In this global BioEn reweighting step, the confidence parameter was set to θ = 5 according to an L-curve analysis (SI text and Figure S1A). The resulting ensembles were structurally diverse and, among 50 000 HCG and RHCG structures, did not contain any knots (SI text).

Calculation of Experimental Observables

NMR Secondary Chemical Shifts and J Couplings

For comparison with NMR experiments, we calculated chemical shifts from fragments and full-length structures using SPARTA+.[61] We subtracted random-coil shifts calculated using POTENCI[62] to compare to secondary chemical shifts ΔC. We computed ³JHNHα couplings with the Karplus parameters by Vögeli et al.[63] using the mdtraj Python library.[64]
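
For orientation, the Karplus relation behind the J-coupling calculation has the standard form J(ϕ) = A cos²(ϕ − 60°) + B cos(ϕ − 60°) + C. The sketch below is a plain-Python illustration and not the mdtraj implementation; the function names are hypothetical, and the coefficients A, B, and C are left as arguments because the specific values of ref 63 are not restated in the text. The numbers in the usage lines are placeholders only.

```python
import math

def karplus_3j_hnha(phi_deg, A, B, C, offset_deg=-60.0):
    """Karplus relation J = A cos^2(t) + B cos(t) + C with t = phi + offset."""
    t = math.radians(phi_deg + offset_deg)
    return A * math.cos(t) ** 2 + B * math.cos(t) + C

def ensemble_average(values, weights):
    """Weighted ensemble average; weights are assumed normalized to 1."""
    return sum(w * v for w, v in zip(weights, values))

# Usage with placeholder coefficients (in Hz), NOT the published parameters:
phis = [-65.0, -120.0, -150.0]                      # phi of one residue in three chains
couplings = [karplus_3j_hnha(p, A=8.0, B=-1.0, C=0.5) for p in phis]
j_avg = ensemble_average(couplings, [0.2, 0.5, 0.3])
```

The coupling is computed per structure and then averaged with the ensemble weights, because the Karplus relation is nonlinear in ϕ and averaging the angle first would give a different result.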

NMR Residual Dipolar Couplings

RDCs were calculated from the ensembles of full-length structures with PALES[65,66] in the steric alignment mode. Even for random flight polymers, the presence of an ordering medium modeled as a hard surface induces nonzero RDCs.[67] The value $D_\mathrm{HN}^{(r)}$ for a particular residue r was calculated by computing the alignment of each chain c in the ensemble with PALES and then taking the weighted average over all structures

$$D_\mathrm{HN}^{(r)} = \sum_{c=1}^{C} w_c\, D_\mathrm{HN}^\mathrm{max} \left\langle P_2\!\left(\cos \vartheta_c^{(r)}\right) \right\rangle$$

where $D_\mathrm{HN}^\mathrm{max}$ = 21.7 kHz for an idealized amide bond length of 1.04 Å,[68] $\vartheta_c^{(r)}$ is the angle between the amide bond vector of residue r in chain c and the external magnetic field, $P_2(x) = (3x^2 - 1)/2$ is the second-order Legendre polynomial, and ⟨...⟩ denotes an average over the orientations of the chain biased by the alignment.
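
The ensemble averaging step can be sketched as follows, assuming that per-chain RDCs have already been computed by an external program such as PALES and collected into a nested list; the layout `per_chain_rdc[c][r]` and the function names are assumptions of this illustration, which does not perform the alignment calculation itself.

```python
def p2(x):
    """Second-order Legendre polynomial P2(x) = (3 x^2 - 1) / 2."""
    return 0.5 * (3.0 * x * x - 1.0)

def ensemble_rdc(per_chain_rdc, weights):
    """Weighted average RDC per residue.

    per_chain_rdc[c][r] is the RDC of residue r computed for chain c;
    weights[c] is the normalized ensemble weight of chain c.
    """
    n_res = len(per_chain_rdc[0])
    return [
        sum(w * chain[r] for w, chain in zip(weights, per_chain_rdc))
        for r in range(n_res)
    ]

# Usage: two chains, two residues (made-up values in Hz).
avg = ensemble_rdc([[2.0, -4.0], [4.0, 0.0]], [0.25, 0.75])
```

P2 ranges from −1/2 for a bond vector perpendicular to the field to 1 for a parallel one, which is why both positive and negative RDCs occur along the chain.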

Small-Angle X-ray Scattering

We used FoXS[69] to calculate SAXS intensity profiles for the tau K18 structures in an ensemble and then calculated the weighted average over the ensemble. In the FoXS calculations, we took the solvation shell into account by setting c2 = 3. The excluded-volume parameter was set to the default value of c1 = 1. Geometric RG values were computed using the MDAnalysis library.[70,71] To compare measured scattering intensities to those predicted for the weighted ensemble, $I_\mathrm{sim}(q)$, we first estimated an intensity scale factor a and a constant for background correction b by performing least-squares fitting of

$$I(q) = a\, I_\mathrm{sim}(q) + b$$

to the SAXS intensities with q being the scattering vector. For a regime unaffected by aggregation, q > 0.012 Å⁻¹, the best fit to experiment was achieved with the coefficients a = 1.1 × 10⁻¹¹ and b = 3.8 × 10⁻⁵. For q < 0.012 Å⁻¹, we took possible mild aggregation into account by approximating the scattering intensity including possible aggregates as $a\, I_\mathrm{sim}(q) + b$ plus an additional aggregate contribution characterized by an aggregate intensity c and an aggregate size Ra. By least-squares fitting with fixed a and b, we find an aggregate intensity of c = 0.00156 and an aggregate size of Ra = 234 Å. The fit to the combined model is shown in Figure S2. An earlier set of scattering data[18] is restricted to q > 0.03 Å⁻¹.
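
The scale-and-background fit is an ordinary linear least-squares problem with a closed-form solution. The sketch below is an unweighted version written in plain Python; whether the original fit weighted the points by their experimental errors is not stated in the text, so the unweighted form and the function name `fit_scale_and_background` are assumptions.

```python
def fit_scale_and_background(i_sim, i_exp):
    """Least-squares estimate of a and b in i_exp ~ a * i_sim + b.

    i_sim and i_exp are intensities on the same q grid. All points are
    weighted equally (an assumption of this sketch).
    """
    n = len(i_sim)
    mean_sim = sum(i_sim) / n
    mean_exp = sum(i_exp) / n
    sxx = sum((x - mean_sim) ** 2 for x in i_sim)
    sxy = sum((x - mean_sim) * (y - mean_exp) for x, y in zip(i_sim, i_exp))
    a = sxy / sxx
    b = mean_exp - a * mean_sim
    return a, b

# Usage: synthetic data generated with a = 2 and b = 0.5 are recovered.
a, b = fit_scale_and_background([1.0, 2.0, 3.0, 4.0], [2.5, 4.5, 6.5, 8.5])
```

In practice one would restrict the input to the q range of interest, here q > 0.012 Å⁻¹, before fitting.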

Comparison to Single-Molecule FRET Experiments

We compared Cα–Cα distances extracted from FRET experiments using the SAW-ν polymer model[72] to RHCG models. To quantify the effect of the fluorescent dyes on the distance distribution, we performed additional calculations in which we adapted the RHCG method to add dyes[73] during chain growth (SI text and Figure S3).

Comparison to NMR Paramagnetic Relaxation Enhancement Measurements

NMR paramagnetic relaxation enhancement (PRE) measurements on tau K18 have been previously reported.[74] We computed PREs for the tau K18 ensembles using the PREdict[75] Python library (https://github.com/KULL-Centre/DEERpredict). PREdict adds explicit spin labels to the chains modeled with a rotamer library. The PRE is calculated in the fast-exchange limit with respect to both spin-label and chain dynamics. Details of the PRE calculation are given in the SI text.

Experiments

Single-Molecule FRET Experiments

For the single-molecule FRET experiments, tau K18 was labeled with Alexa Fluor 488 and CF660R at its naturally occurring cysteine residues, C291 and C322 (SI text). The labeled tau K18 was diluted to a concentration of 100 pM in either 50 mM sodium phosphate buffer (pH 6.8, 1 mM DTT, 0.001% Tween 20) or 20 mM HEPES buffer (5 mM KCl, 10 mM MgCl2, pH 7.4, 1 mM DTT, 0.001% Tween 20). The experiments were performed at 295 K on a MicroTime 200 confocal single-molecule instrument (PicoQuant, Berlin, Germany) as described in detail in the SI text. The SAW-ν model was used to analyze the single-molecule FRET data to extract distances and the polymer properties of tau K18[72] (SI text).

Small-Angle X-ray Scattering Experiments

SAXS data were collected at 298 K from monodisperse samples of K18 at concentrations ranging from 50 to 67 μM in 20 mM HEPES, 5 mM KCl, 10 mM MgCl2, 1 mM DTT at pH 7.4. Scattering profiles were analyzed with standard procedures using ATSAS.[76] SAXS measurements were performed at DESY (Hamburg, Germany) and Diamond Light Source (Oxford, UK) stations.

Results and Discussion

RHCG Produces a Diverse Ensemble of Tau K18 Chains

During chain assembly, we applied a gentle bias on the fragment choice by using fragment weights from BioEn reweighting against Cα chemical shifts. To correct for the bias, the assembled chains were then reweighted with BioEn, again using the chemical shift data as experimental reference. In this global BioEn reweighting step, the chains were given near-uniform weights w with SKLbias ≪ 1 (Figure S1B). By comparison, the BioEn weights of the HCG ensemble created without bias are less uniform. The resulting ensemble of tau K18 comprises highly diverse structures with atomic detail (Figure 1C). The typical Cα root-mean-square distance (RMSD) between two chains is about 26 Å (Figure S4 and SI text), and backbone dihedral angles are broadly sampled (Figure S5).

Figure 1

The RHCG ensemble reproduces global structural features of tau K18. (A) Comparison of experimental (gray) and predicted (blue) 1H–15N RDCs, which were not used in the construction of the RHCG ensemble (see Table S1 for the amino acid sequence of tau K18). (B) Scatter plot of calculated and measured RDCs. (C) Backbone traces of 30 members of the RHCG ensemble. Zoom-ins show superpositions of 10 representative structures of a turn at position L284-S285 (top) and an extended segment at position Q276-I277 (bottom) with negative and positive RDCs, respectively, as highlighted by shading in panel (A). (D) Comparison of calculated (blue) and experimental SAXS scattering intensity profiles (gray symbols) and from ref (18) (orange dashed line). See Figure S2 for a plot of the low-q regime. Inset: Distribution of RG in the RHCG ensemble. Vertical dashed lines indicate the average RG from RHCG (blue) and experiment[18] (gray; ± SEM shown by shading). (E) Distribution of Cα–Cα distance inferred from FRET experiments using the SAW-ν model[72] (gray), RHCG (blue), and RHCG* (orange). Root-mean-square distances are indicated as (dashed) vertical lines.

RHCG Models of Tau K18 Capture the Average Local Structure of Tau as Reported by NMR

Chemical shifts are accurate reporters of local structure and secondary structure.[16,17,27,29,61,77] Overall, we found that the Cα chemical shifts calculated for the RHCG ensemble of tau K18 are close to random coil values, with secondary chemical shifts ΔC mostly close to zero. Despite the residual amplitude typically being smaller than the error of ≈1 ppm[61] in the forward chemical shift calculation, the models capture important features of the variation of experimental secondary chemical shifts along the tau K18 amino acid sequence, such as a drop in secondary chemical shift going from L285 to V300. HCG without reweighting of the fragment library underestimates the populations of extended and β-strand-like structures and overestimates the helix-like conformations. Going from HCG to RHCG, the average residual drops from 0.35 to 0.27 ppm and Pearson’s r for the secondary chemical shifts ΔC of the Cα atoms increases from 0.28 to 0.41. RHCG lowers in particular positive ΔC values, e.g., at the S320 position (Figure S6A,B). In light of the considerable uncertainties in the forward calculation (≈1 ppm) and the small ΔC amplitudes, a lower θ value resulting in an even tighter fit was not justified (Figure S1A). We also calculated NMR ³JHNHα couplings, which report primarily on the ϕ-dihedral angles of the protein backbone. The couplings calculated for our models agree well with the NMR experimental data[27] (Figure S7). Also in terms of ³JHNHα couplings, which were not used in the RHCG procedure, RHCG somewhat improves the representation of the local structures over HCG, as reflected by the increase of Pearson’s r from 0.59 to 0.62. The root-mean-squared error dropped from 0.47 Hz (HCG) to 0.41 Hz (RHCG). 
For reference, the uncertainty of the calculated ³JHNHα couplings has been estimated at ∼0.9 Hz.[78] We do not expect a more significant improvement because the ³JHNHα coupling is sensitive primarily to the ϕ backbone torsion, whereas the Cα chemical shift used in RHCG is particularly sensitive to the ψ backbone torsion. Indeed, even for a simple Ala pentapeptide we found small but systematic differences between a state-of-the-art force field and ³JHNHα couplings.[40] Overall we conclude that reweighting in fragment assembly alleviates the small but systematic deviations caused by small imbalances in state-of-the-art force fields used to generate fragment libraries. As a result, the local structure of the tau K18 chains produced by RHCG is more consistent with NMR chemical shift and J-coupling experiments.
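
The two agreement measures quoted in this section, Pearson's r and the root-mean-squared error, can be computed per residue from calculated and measured values. The following is a generic sketch with hypothetical function names; it is not taken from the authors' analysis scripts.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmse(calc, meas):
    """Root-mean-squared error between calculated and measured values."""
    return math.sqrt(sum((c - m) ** 2 for c, m in zip(calc, meas)) / len(calc))

# Usage with made-up per-residue values:
r = pearson_r([0.1, -0.2, 0.3, 0.0], [0.2, -0.1, 0.2, -0.1])
err = rmse([6.8, 7.2, 7.5], [7.0, 7.0, 7.1])
```

The two measures are complementary: r is insensitive to a constant offset or overall scale, whereas the RMSE reports the absolute deviation in the units of the observable.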

The RHCG Ensemble of Tau K18 Reproduces the Experimental NMR Residual Dipolar Couplings

We calculated the RDCs for the assembled tau K18 chains using the steric alignment mode of PALES,[66] and then averaged the RDC values over the ensemble with the respective weight of each chain. The measured[27] and calculated RDCs agree remarkably well and capture both the signature as a function of position along the chain (Figure 1A) and the magnitude at individual residue positions (Figure 1B). Without further fitting, we obtained Pearson r correlation coefficients of 0.73 for RHCG and 0.70 for HCG for tau K18 ensembles of 50 000 models. This consistency not only validates the ensemble but also gives direct insights into the interpretation of the RDCs measured for IDPs. RDCs inform on how restricted a chain is locally, with larger absolute RDCs expected for more restricted segments than for fully flexible segments.[15] The RDC DHN ∝ ⟨P2(cos ϑ)⟩ reports on the relative orientation of an amide bond vector with respect to the magnetic field. Changes in the sign of the measured RDCs have been interpreted as changes in the direction of the protein backbone.[27] Our conformational ensemble reproduces the four changes in the sign of DHN found in experiments.[27] Importantly, as highlighted for the region centered on L284-S285 in Figure 1C, our structures on average trace a turn in the region where the sign changes, as indicated by a shortened distance across the four-residue segments (Figure S8). By contrast, in regions such as Q276-I277, where the sign of DHN does not change, our structures do not show a preference in the chain direction and scatter around an average straight chain (Figure 1C). We note that simple polymeric models that ignore amino acid chemistry and the correlations between subsequent residues tend not to capture the trends in the experimental RDCs, as previously noted.[15,27,79]

Residual Dipolar Coupling Calculations Require Large Ensemble Sizes

The need for large ensembles has been highlighted before.[26] Building large ensembles relies on the ability to rapidly generate statistically independent, atomically detailed models of IDPs. The RDC values predicted for particular residues in our models are widely and asymmetrically distributed with a range of about ±25 Hz (Figure A). By contrast, the experimental averages lie roughly in the range of −5 to 10 Hz (Figure A). As a result, RDCs calculated from small ensembles are biased (Figure B). We found that relatively large ensembles of ≥10 000 tau K18 chains are needed to obtain converged RDC values (Figure B). In particular, Pearson’s r correlation coefficient improved with increasing ensemble size. The ensemble-size dependence is similar for RHCG and HCG, although the RHCG ensemble consistently performs somewhat better than the HCG ensemble (Figures D, 2B,C, and S9).
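The statistical part of this argument can be sketched as follows: when the per-chain values are spread much more widely than the ensemble mean, the standard error of the mean shrinks only as 1/sqrt(N). The distribution below is a made-up broad, skewed toy distribution, not PALES output, and the sketch covers only sampling noise, not any other source of bias in small ensembles.

```python
import math
import random
import statistics

def mean_and_sem(values):
    """Return the sample mean and its standard error, SEM = s / sqrt(N)."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem

rng = random.Random(0)
# Hypothetical per-chain RDCs (Hz): Gaussian core plus an exponential tail.
per_chain_rdc = [
    rng.gauss(0.0, 8.0) + rng.expovariate(1.0 / 3.0) for _ in range(50_000)
]

# Mean and SEM for nested sub-ensembles of increasing size.
convergence = {
    n: mean_and_sem(per_chain_rdc[:n]) for n in (100, 1_000, 10_000, 50_000)
}
```

In this toy setting, a hundredfold increase in ensemble size reduces the SEM by roughly a factor of ten.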

Figure 2

Large ensembles are required to capture NMR RDC measurements. (A) Distribution of 1H–15N RDC values for L285 in RHCG ensembles of different size, as calculated by PALES[66] without rescaling. (B) Average 1H–15N RDC for L285 as a function of ensemble size for HCG (dark green squares) and RHCG (blue circles). Error bars indicate ± SEM. (C) Ensemble-size dependence of the Pearson r correlation coefficient between tau K18 1H–15N RDC measurements[27] and calculations from RHCG (blue circles) and HCG (green squares), respectively.

RDCs from Short Chain Segments

In the modeling of RDCs of IDRs, it is frequently assumed that ensembles of short peptide segments of about 15 amino acids contain sufficient structural information to calculate RDCs.[80,81] We tested this assumption by cutting overlapping 15-mer segments out of the BioEn ensemble of full-length tau K18 and then calculating the average RDCs for their central 9 amino acids using a steric alignment.[66] We found that the RDCs calculated for the full ensemble and for the 15-mer segments are highly correlated (r = 0.91; Figure S10). Compared to the NMR RDCs, the correlation coefficient for segments (r = 0.61) is nearly as good as for full-length chains (r = 0.73). In line with earlier findings,[81] we conclude that comparatively short peptide segments can indeed be used to model the RDCs of long IDRs such as tau. This finding also makes it possible to use RDC data during chain growth in RHCG. RDCs can be precalculated either directly for fragments of sufficient length or for a library of segments that have been assembled by chain growth. With the precalculated RDCs, subsequent chain growth can be biased to improve the overlap between the initial and BioEn-optimized ensembles of chains. Here, for tau K18, including RDCs in chain growth proved unnecessary because they were predicted accurately without any bias.
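The bookkeeping behind the segment test can be sketched as an index calculation. The step between consecutive segments is not stated above, so the default of one residue is an assumption, as are the function name and the zero-based, half-open index convention.

```python
def segment_windows(chain_length, segment_length=15, core_length=9, step=1):
    """Yield (seg_start, seg_end, core_start, core_end) as half-open, zero-based ranges.

    Each segment of `segment_length` residues contributes RDCs only for its
    central `core_length` residues; the flanking residues are discarded.
    """
    if (segment_length - core_length) % 2:
        raise ValueError("core cannot be centered in the segment")
    flank = (segment_length - core_length) // 2
    for start in range(0, chain_length - segment_length + 1, step):
        end = start + segment_length
        yield start, end, start + flank, end - flank

# 129-residue chain, as for tau K18.
windows = list(segment_windows(129))
# Residue indices that fall into the central region of at least one segment.
covered = sorted({i for _, _, c0, c1 in windows for i in range(c0, c1)})
```

With these settings the three residues at each chain terminus never fall into a central region, so segment-based RDCs are available only for the interior of the chain.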

The RHCG Ensemble Captures the Extension of Tau K18 in Solution

The RHCG ensemble also captures the size and shape of tau K18 in solution as probed by SAXS measurements (Figure D). The mean scattering profiles calculated from our tau K18 models agree well with the experimental scattering profiles (Figure D), taking possible unspecific aggregation in the low q regime into account. The computed root-mean-square radius of gyration of approximately 39 Å coincides with the experimentally determined RG of 38 ± 3 Å.[18] The RHCG ensemble (⟨Rh⟩ = 34 Å) is also consistent with the hydrodynamic radius Rh of 34 ± 6 Å, as reported by dynamic light scattering (DLS).[74] Rh was computed from the RHCG ensemble using an empirical approach.[82,83] Our RHCG ensemble agrees quite well with previously reported NMR paramagnetic relaxation enhancement (PRE) measurements[74] (Figure S11), which were not used in the generation of our ensembles. Spin-label dynamics were modeled with a rotamer-library approach.[75] The overall shapes of the experimental profiles measured for four different spin-labels[74] were captured without any refinement.[46] However, a fully quantitative comparison is challenging because of the sensitive dependence of the PRE on infrequent close contacts between proton and spin-label in the fast-exchange regime. As a result, the calculated PRE profiles are noisy and, without weight adjustments, tend to underestimate the actual PRE for residues and labels close in sequence. The good agreement with SAXS, dynamic light scattering, and NMR measurements suggests that the RHCG ensemble captures the global conformational properties of tau K18 in solution quite well without further refinement. However, BioEn reweighting of the spin-label rotamers[46] used to calculate the PRE, and possibly also of the chains, should address some of the challenges in calculating PREs of disordered proteins.
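For orientation, a root-mean-square radius of gyration over a weighted ensemble can be computed as in the following minimal sketch. Equal weighting of all atoms (no mass weighting) and the function names are assumptions; the toy numbers are not taken from the tau K18 ensemble.

```python
import math

def radius_of_gyration(coords):
    """Rg of one conformation from (x, y, z) tuples, all atoms weighted equally."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    msd = sum(sum((c[k] - center[k]) ** 2 for k in range(3)) for c in coords) / n
    return math.sqrt(msd)

def rms_radius_of_gyration(rg_values, weights):
    """Weighted root-mean-square Rg: sqrt(sum_i w_i Rg_i^2 / sum_i w_i)."""
    total = sum(weights)
    return math.sqrt(sum(w * rg ** 2 for w, rg in zip(weights, rg_values)) / total)

# Toy check: two points 2 Å apart have Rg = 1 Å.
rg_pair = radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
# Two hypothetical chains with Rg of 30 Å and 40 Å and equal weights.
rms_rg = rms_radius_of_gyration([30.0, 40.0], [0.5, 0.5])
```

Because the squares are averaged before the root is taken, the root-mean-square value is never smaller than the plain weighted mean of the individual Rg values.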

Structure of tau K18 as Assessed by Single-Molecule FRET

Comparison to single-molecule FRET experiments suggests that our RHCG models are somewhat too extended (Figure E), with longer Cα–Cα distances in the RHCG ensemble than those extracted from the FRET experiments.[45] This initial analysis of the FRET data with a commonly used polymer model[72] provides a valuable check on the validity of more involved comparisons with explicit representations of dyes.[45,73,84] In a BioEn calculation, we found that already a small adjustment of the RHCG chain weights suffices to match the mean distance deduced from FRET perfectly (RHCG* in Figure S3D and Table S2). The resulting RHCG* ensemble agrees as well with experiment as the RHCG ensemble in terms of the SAXS measurements, and slightly worse in terms of NMR RDC and PRE measurements (Figure S12 and Table S2). The Kullback–Leibler divergence of $S_\mathrm{KL} \approx 0.2$ corresponds to a change of the underlying MD simulation potential energy function of $S_\mathrm{KL} k_\mathrm{B} T = \int \mathrm{d}x\, p^{(\mathrm{opt})}(x)\,[U^{(\mathrm{opt})}(x) - U(x)] \approx 0.5$ kJ/mol on average.[39] Conversely, this sensitivity also highlights the intricacies of the free energy landscape of disordered proteins, where subtle shifts in the energetics result in appreciable changes in conformation.[85] We explored possible effects of the fluorescent dyes by generating RHCG models with dyes attached. For these models, we calculated the mean FRET efficiency and compared it directly to the experimental measurement (Figure S3C). We found that an even smaller force field correction of 0.35 kJ/mol on average[39] would be sufficient to achieve full consistency of the ensemble means (Figure S3D). Reweighting according to the FRET data changes the RG from 39.4 Å (RHCG) to 37.4 Å (RHCG*), and with explicit dye models from 40.1 Å (RHCG+dyes) to 39.1 Å (RHCG+dyes*), respectively. The scaling exponent of 0.56 inferred from the SAW-ν model[72] is close to the value of an excluded-volume chain. The tau K18 segment is thus more extended than most moderately charged IDPs.[21] Interestingly, the transfer efficiency and average distance between the Cys residues of tau K18 from single-molecule FRET are virtually independent of salt concentration (Figure S3C), indicating that the rather pronounced expansion of this segment is not caused by charge repulsion. The FRET experiments are thus in line with our modeling, which highlights that local structural preferences along the chain rather than long-range charge–charge interactions primarily shape the ensemble of tau K18.
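The conversion between the Kullback–Leibler divergence and an average energy change quoted above is simple arithmetic, sketched here. The temperature of 300 K is an assumption (it is not stated in this section), and the function names are chosen for illustration.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # molar gas constant in kJ/(mol K)

def kl_divergence(weights, reference):
    """S_KL = sum_i w_i ln(w_i / w0_i) for normalized weights w and reference w0."""
    return sum(w * math.log(w / w0) for w, w0 in zip(weights, reference) if w > 0.0)

def kl_to_energy(s_kl, temperature=300.0):
    """Return S_KL * k_B * T in kJ/mol; the default temperature is an assumed value."""
    return s_kl * GAS_CONSTANT * temperature

# S_KL of about 0.2 corresponds to roughly 0.5 kJ/mol at the assumed 300 K.
energy = kl_to_energy(0.2)
# Conversely, 0.35 kJ/mol corresponds to an S_KL of roughly 0.14.
s_kl_dyes = 0.35 / (GAS_CONSTANT * 300.0)
```

Both numbers are small compared with the thermal energy of about 2.5 kJ/mol at this temperature, which is the sense in which the required weight adjustments are gentle.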

Aggregation-Prone Extended Structures Feature Prominently in the Solution Ensemble of Tau K18

Interestingly, a small but significant fraction of our atomically detailed models feature conformations of the two aggregation-prone hexapeptide motifs[10] as seen in the high-resolution structures of tau fibrils.[86,87] Chain growth thus captures biologically important structural features. For the first hexapeptide motif 275VQIINK280, we found that about 9% of the models are within 1 Å Cα RMSD of a tau fragment fibril structure (PDB: 5V5B[87]) (Figure A,C). A similar fraction of the tau K18 population has local structures matching that of a fibril from a corticobasal degeneration (CBD) patient sample[88] (PDB: 6TJO). The fraction of our ensemble that closely matches the experimental structures (Figures B and S13) is clearly larger than what would be expected for a random six amino acid segment. For the second hexapeptide motif 306VQIVYK311, we also found that about 8% of the models are within 1.0 Å Cα RMSD of the X-ray structure (PDB: 2ON9[86]) (Figure B,D), about 2.5 times more than what would be expected for random hexapeptide segments. We found similar consistency for the second hexapeptide motif with the structures of tau fibrils (Figure S13), as formed in Alzheimer’s disease (PDB: 5O3O,[89] 5O3T,[89] 6HRE,[90] 6HRF[90]), CBD (6TJO,[88] 6VI3[91]), Pick’s disease (6GX5[92]), and chronic traumatic encephalopathy (6NWP[93]). Experiments on tau K18 in solution suggest that these motifs should be partially in extended conformations, consistent with our ensemble.[16,27]
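A Cα RMSD criterion of this kind can be evaluated as in the following sketch, which uses Kabsch-style optimal superposition via a singular value decomposition. The superposition method is an assumption, since it is not specified above, and the coordinates are random toy data rather than hexapeptide structures.

```python
import numpy as np

def kabsch_rmsd(mobile, target):
    """Minimum RMSD between two (N, 3) coordinate sets after optimal superposition."""
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(target, dtype=float)
    p = p - p.mean(axis=0)          # remove translation
    q = q - q.mean(axis=0)
    u, s, vt = np.linalg.svd(p.T @ q)
    if np.linalg.det(u @ vt) < 0.0:  # exclude improper rotations (reflections)
        s[-1] = -s[-1]
    squared = (p ** 2).sum() + (q ** 2).sum() - 2.0 * s.sum()
    return float(np.sqrt(max(squared, 0.0) / len(p)))

def fraction_within(rmsds, cutoff=1.0):
    """Fraction of models whose RMSD to the reference is below the cutoff (Å)."""
    return float(np.mean(np.asarray(rmsds) < cutoff))

# Toy check: a rotated and translated copy of six points has RMSD close to 0.
rng = np.random.default_rng(0)
reference = rng.normal(size=(6, 3)) * 3.0
angle = np.deg2rad(40.0)
rot_z = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
moved = reference @ rot_z.T + np.array([5.0, -2.0, 1.0])
rmsd_same = kabsch_rmsd(moved, reference)
fraction = fraction_within([0.4, 0.9, 1.6, 2.3], cutoff=1.0)
```

Applying the same `fraction_within` call to randomly chosen six-residue segments would give the baseline against which an enrichment, such as the factor of about 2.5 quoted above, can be expressed.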

Figure 3

RHCG ensembles feature the extended conformations seen in high-resolution structures of tau fibrils. (A,B) 275VQIINK280 and (C,D) 306VQIVYK311 hexapeptide motifs are compared to their experimental structures in tau fibrils. (A,C) Five structures from RHCG (Cα RMSD < 0.5 Å) are superimposed on the respective experimental structure (gray, PDB: 5V5B and 2ON9). (B) Cumulative distribution of RMSD to experimental structure. For reference, the gray line shows the distribution obtained for the RMSD between 50 000 randomly chosen six amino-acid segments in our model ensembles and the motif in 5V5B. (D) Cumulative distribution of RMSD to experimental structure. For reference, the gray line shows the distribution of the RMSD between randomly chosen six amino-acid segments and the hexapeptide motif in the fibril (PDB: 2ON9).

The Solution Ensemble Contains the Functional Conformations of Tau in Complex with Microtubules

We found that a considerable fraction of WT tau K18 adopts locally compact turn-like structures (Figure A–C). Similar turn-like structures have been resolved by NMR transfer NOESY experiments probing the conformations of microtubule-bound tau,[57] with an O(300)–N(303) distance below 4 Å in 18 out of the 20 structures in the NMR ensemble (PDB: 2MZ7; see Figure B). In the WT RHCG ensemble, 15% of structures of the 300VPGGG304 segment are within 1 Å Cα RMSD of the closest representative of the NMR ensemble (Figure A). This indicates that tau samples the turn-like structures of the microtubule-bound form also free in solution.

Figure 4

Tau P301 mutations favor more extended local structures. (A) Cumulative distributions of the minimum Cα RMSD of 300VPGGG304 to the closest representative of the NMR ensemble of microtubule-bound structures.[57] Results are shown for the RHCG ensemble of WT tau K18 and for the HCG ensembles of the P301L, P301S, and P301T variants. (B) Five representative structures of 300VPGGG304 from RHCG (oxygen: red; nitrogen: blue; carbon: cyan; Cα RMSD < 0.5 Å) are superimposed on a representative of the NMR structural ensemble (gray sticks, PDB: 2MZ7, structure 17). Tubes indicate the amino acid backbone. The O(300)–N(303) hydrogen bond is indicated by the blue dashed line. (C) Cumulative distributions of O(V300)–N(G303) distances for WT tau K18 from RHCG compared to P301L, P301S, and P301T tau K18 variants from HCG. (D) Representative local structures of WT tau K18. (E) Representative local structures of the P301L variant. (F) Local structures of P301S. (G) Local structures of P301T. In (D)–(G), the structures were aligned on residues 300 and 301. Tubes indicate the backbone. Side-chain heavy atoms, amide nitrogen, and Cα of residue 301 are shown as sticks (oxygen: red; nitrogen: blue; carbon: cyan).

Chain Growth Captures the Effect of Mutations Toward Aggregation-Prone Structures

The PGG motifs at the end of each repeat favor turn-like structures.[94] We expect that mutations of the prolines shift the local structure away from turns. To test the effect of mutations at the 301 position, we considered the frontotemporal dementia with parkinsonism linked to chromosome 17 (FTDP-17) mutations P301L, P301S, and P301T. Mutations of P301 have been shown to strongly promote tau aggregation[9,10] and are used in mouse models of tauopathies.[8,9] In our hierarchical modeling, the P301L, P301S, and P301T variants consistently form more extended structures than WT (Figure C,D), both in ensembles of full-length tau K18 (Figure C,E,F,G) and in fragment MD simulations (Figure S14). This loss of turn-like structures is indicated by a more than 2-fold reduction in the fraction of O–N distances < 4 Å between V300 and G303. The P301L mutation has been studied in detail by NMR and biophysical experiments.[11] The shift from turns to extended structures in our P301L ensemble is in line with smaller 15N chemical shift values for K298, H299, and V300 in P301L tau K18.[11] The shift from turns to extended structures rationalizes the enhanced aggregation propensity of tau P301L in vitro[10,12] because extended structures predominate in fibrils. Locally more extended structures in the mutant proteins facilitate intermolecular contacts between tau chains and subsequent assembly and aggregation via intermolecular β-sheets. The shift to extended structures seen here also explains why P301L tau binds less strongly to microtubules.[11,95] In a population-shift mechanism, P301L, P301S, and P301T mutations thus appear to decrease the fraction of tau with locally compact turn structures, which are competent to bind to microtubules, and to increase the fraction of aggregation-prone extended structures (Figures and S11). The combination of these two effects may render P301 mutations deleterious both with respect to a loss in function and an increased tendency to form disease-associated fibrils.
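The turn criterion used here, the fraction of structures with an O(V300)–N(G303) distance below 4 Å, and the resulting fold reduction can be sketched as follows. The distance lists are invented toy values, not data from the WT or mutant ensembles, and the function names are assumptions.

```python
import math

def atom_distance(a, b):
    """Euclidean distance between two (x, y, z) positions in Å."""
    return math.dist(a, b)

def turn_fraction(distances, cutoff=4.0):
    """Fraction of structures whose O-N distance is below the cutoff (Å)."""
    return sum(d < cutoff for d in distances) / len(distances)

# Hypothetical O(V300)-N(G303) distances in Å for ten structures each.
wt = [3.1, 3.4, 5.9, 7.2, 3.8, 6.5, 8.0, 3.0, 9.1, 6.6]
mutant = [6.1, 3.3, 7.9, 8.4, 9.0, 6.7, 7.5, 3.9, 8.8, 9.6]

wt_turns = turn_fraction(wt)
mutant_turns = turn_fraction(mutant)
fold_reduction = wt_turns / mutant_turns
```

For a weighted ensemble, the count in `turn_fraction` would be replaced by the sum of the weights of the structures that satisfy the cutoff.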

Figure 5

P301 mutations shift the balance from functional to aggregation-prone conformations. Turn conformations (bottom) are required for functional microtubule binding (left), whereas extended conformations (top) are associated with aggregation and the formation of pathogenic fibrils (right). In the wild-type ensemble (P301; bottom), turn-like structures predominate. By contrast, extended structures are significantly populated in the mutant ensemble (P301L; top). The zoom-ins on the right show representative backbone traces around amino acid 301 as tubes.

According to chemical shift mapping, the P301L/P301S/P301T mutations do not significantly alter the overall structure of tau.[11] Whereas the tendency to form aggregation-prone extended structures at position 301 more than doubles (see Figure ), the absolute increase in the extended population is small (<15%) and confined locally to the turn region. The change in the calculated radius of gyration compared to WT is small, ∼0.2 Å, and thus within the uncertainty of both calculations and measurements. The same limitation applies to the mean Cα–Cα distances of the fluorophore-labeled residues, which change by only ∼0.1–0.3 Å.

Conclusions

We showed that reweighted hierarchical chain growth captures both the local and the global structures of tau K18. Locally, NMR Cα chemical shifts were reproduced within the expected uncertainties without any fitting. The agreement was improved further with only a gentle Bayesian ensemble refinement against NMR chemical shift data. Globally, the tau K18 chains assembled in this way reproduced SAXS, FRET, and NMR RDC measurements and thus captured the overall shape, dimension, and changes in orientation. In addition, the FRET experiments showed that the extension of tau K18 is insensitive to varying salt concentration, unlike other disordered proteins.[58] The global structure of tau K18 thus emerged from its local structure in the sense that the ensembles of global chain structures built by combining short peptide fragments capture the measured global structural properties with good accuracy.

Fragment assembly and coil models have proved highly successful in the modeling of disordered proteins.[24−26,28,36,52,60,79] The quality of the ensemble models can be improved even further by integrating experimental data.[26,35] In BioEn,[39,40] the data enter through a χ2 term. The summed squared error χ2 of the models often grows roughly linearly with chain length, e.g., because of systematic errors in the force field used to generate the fragment models. As a result, the relative weights of the assembled chains in a refined ensemble will vary widely. The overlap between the ensemble of assembled chains and the final ensemble, as measured by exp(−SKL), then decreases exponentially with increasing chain length, and ensemble refinement becomes increasingly inefficient. Reweighted hierarchical chain growth is an importance sampling procedure designed to address this problem by producing evenly weighted ensembles. By applying a bias already during chain assembly, we ensure that the assembled chains have near-uniform weights in the final ensemble. A poorly designed importance sampling scheme would produce ensembles with an uneven weight distribution, as indicated by a high value of SKLbias in eq . By using hierarchical chain growth[52] and correcting for any bias in the assembly process in a formally rigorous manner using a form of Bayesian ensemble refinement, BioEn,[39] we ensure further that the final ensemble is well-defined and independent of arbitrary choices in the assembly process, such as the strength of the bias in fragment selection or the direction of chain growth.

In practice, RHCG may only be a starting point for further investigations and improvements. For instance, representative structures can be used as seeds for MD simulations of the full-length protein.[52] By drawing conformations according to the BioEn weights, one can systematically select subensembles that are consistent with the available experimental data. If BioEn[39,40] indicates that entire regions of configuration space require large changes in weights, up or down, one may need to bias chain growth accordingly or may have to use different or improved simulation force fields.[96]

The tau K18 ensembles obtained by reweighted hierarchical chain growth revealed how patient-associated mutations shift the balance from protein function to disease. In modeling the effect of mutations, we took advantage of a chemically informed description[79,97−104] of the disordered tau protein. We found that, already free in solution, the microtubule-interacting regions of tau K18 populate local structures as observed in the microtubule-bound state by NMR. Also consistent with conformational selection, we found that a comparable fraction of free tau K18 chains exhibits local structures as observed in pathogenic tau fibrils. We could further show that the disease-associated mutations P301L, P301S, and P301T shift the balance away from the microtubule-bound local turn structures toward the fibril-associated extended structures (Figure ). Such shifts can have dramatic effects on the kinetics of aggregation[105] by lowering the barrier to nucleation. Indeed, a shift to extended structures was recently reported to be associated with fibril formation in tau condensates.[106] The emergence of global structure from local structure thus extends beyond chain shape, dimension, and orientation to the competition between tau’s role as microtubule-bound regulator of cellular transport and as fibril-forming driver of neuropathologies.
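The exponential loss of overlap with chain length discussed in these Conclusions can be illustrated with a toy model. It assumes independently chosen fragment positions and a simple exponential reweighting factor per fragment; this is a didactic stand-in, not the BioEn optimization or the RHCG algorithm, and the penalties are invented numbers.

```python
import math

def reweight(reference, penalties):
    """Toy reweighting: w_j proportional to w0_j * exp(-penalty_j), then normalized."""
    raw = [w0 * math.exp(-p) for w0, p in zip(reference, penalties)]
    norm = sum(raw)
    return [r / norm for r in raw]

def kl_divergence(weights, reference):
    """S_KL = sum_i w_i ln(w_i / w0_i) for normalized weights."""
    return sum(w * math.log(w / w0) for w, w0 in zip(weights, reference) if w > 0.0)

# Hypothetical library at one position: four equally likely fragments with
# made-up penalties standing in for per-fragment chi^2 / 2 contributions.
reference = [0.25] * 4
penalties = [0.0, 0.2, 0.5, 1.0]
s_fragment = kl_divergence(reweight(reference, penalties), reference)

# For independent positions, S_KL of the product ensemble is additive, so the
# overlap exp(-S_KL) shrinks exponentially with the number of positions.
overlap = {n: math.exp(-n * s_fragment) for n in (1, 10, 30, 100)}
```

In this picture, applying the reweighting at each position during assembly, rather than once to complete chains, keeps every individual correction small, which is the motivation for importance sampling given above.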
