
Quantitative Description of Intrinsically Disordered Proteins Using Single-Molecule FRET, NMR, and SAXS.

Samuel Naudi-Fabra¹, Maud Tengo¹, Malene Ringkjøbing Jensen¹, Martin Blackledge¹, Sigrid Milles¹.

Abstract

Studying the conformational landscape of intrinsically disordered and partially folded proteins is challenging and only accessible to a few solution state techniques, such as nuclear magnetic resonance (NMR), small-angle scattering techniques, and single-molecule Förster resonance energy transfer (smFRET). While each of the techniques is sensitive to different properties of the disordered chain, such as local structural propensities, overall dimension, or intermediate- and long-range contacts, conformational ensembles describing intrinsically disordered proteins (IDPs) accurately should ideally respect all of these properties. Here we develop an integrated approach using a large set of FRET efficiencies and fluorescence lifetimes, NMR chemical shifts, and paramagnetic relaxation enhancements (PREs), as well as small-angle X-ray scattering (SAXS) to derive quantitative conformational ensembles in agreement with all parameters. Our approach is tested using simulated data (five sets of PREs and 15 FRET efficiencies) and validated experimentally on the example of the disordered domain of measles virus phosphoprotein, providing new insights into the conformational landscape of this viral protein that comprises transient structural elements and is more compact than an unfolded chain throughout its length. Rigorous cross-validation using FRET efficiencies, fluorescence lifetimes, and SAXS demonstrates the predictive nature of the calculated conformational ensembles and underlines the potential of this strategy in integrative dynamic structural biology.

Year: 2021 PMID: 34817999 PMCID: PMC8662727 DOI: 10.1021/jacs.1c06264

Source DB: PubMed Journal: J Am Chem Soc ISSN: 0002-7863 Impact factor: 15.419

Introduction

Intrinsically disordered proteins (IDPs) play important roles in many biological systems and perform their functions owing to their ability to sample conformational ensembles that can have different degrees of compactness and that often comprise transiently folded regions acting as interaction sites.[1,2] Although IDPs are devoid of stable secondary and tertiary structure, their primary structure determines their function and modulates the conformations sampled on a rapid time scale: small motifs can locally enrich the IDP in hydrophobic amino acids, and clusters of charged residues may lead to self-repulsion, thus affecting the properties of the chain.[3−5] Single-molecule Förster resonance energy transfer (smFRET) has proven to be a very powerful tool to access the dimensions of the unfolded chain by measuring the energy transfer between site-specifically attached donor and acceptor fluorophores as a function of their distance.[6,7] The technique is compatible with very large IDPs,[8] covers distances of approximately 2 to 10 nm, and provides structural information in the presence of transiently folded or folded domains,[9] in complex environments, and even within living cells.[10,11] Obtaining quantitative structural insight has, however, remained challenging, in particular because it is the distance between the fluorophores, rather than between their attachment points on the protein backbone, that is determined experimentally; the chemical composition of the dyes and their linkers therefore has to be taken into account in structural modeling.
For folded proteins, recent advances have overcome this problem by generating structural models that explicitly consider the attached fluorophores, mainly by calculating the volumes that the fluorophores can occupy when attached to a specific site in the protein (accessible volumes, AVs).[12−15] Determining distances in IDPs poses the additional challenge that the measured FRET efficiency (EFRET) reports on an ensemble of distances rather than on a single distance. This has frequently been taken into account by assuming that the inter-dye distance follows a Gaussian chain (or other polymer) distribution.[16] Such distributions can be expressed as a function of the number of amino acids between the attachment points of the fluorophores, and the contribution of the fluorophores and their linkers to the measured distance is usually accounted for by adding a number of extra residues. Although this approach has led to distance distributions in agreement with conformational ensembles derived from other experimental techniques (nuclear magnetic resonance, NMR, and small-angle X-ray scattering, SAXS),[17] the number of amino acids that has to be added to represent the dyes and their linkers is ambiguous.[17−19] This has consequences when distances within IDPs measured by different techniques are compared.
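The polymer-model route described above can be sketched numerically. The snippet below is a minimal illustration, not the procedure of any cited work. It assumes an ideal (Gaussian) chain with ⟨r²⟩ = C·N·b² and averages the static FRET efficiency E(r) = 1/(1 + (r/R0)⁶) over the end-to-end distance distribution. The segment length b = 3.8 Å, the ratio C = 2, and the number of "extra" residues standing in for the dyes and linkers (`n_dye_extra`) are hypothetical placeholders. R0 = 56 Å is the Förster radius quoted for Figure 3A. Changing `n_dye_extra` shifts ⟨E⟩, which shows why an ambiguous dye/linker correction matters.

```python
import numpy as np

R0 = 56.0  # Förster radius (Å), the value quoted for Figure 3A


def fret_efficiency(r, r0=R0):
    """Static FRET efficiency for donor-acceptor distance(s) r in Å."""
    return 1.0 / (1.0 + (np.asarray(r, dtype=float) / r0) ** 6)


def gaussian_chain_pdf(r, mean_sq):
    """End-to-end distance distribution P(r) of an ideal Gaussian chain."""
    a = 3.0 / (2.0 * mean_sq)
    return 4.0 * np.pi * r**2 * (a / np.pi) ** 1.5 * np.exp(-a * r**2)


def mean_efficiency(n_residues, n_dye_extra=9, b=3.8, c_ratio=2.0, r0=R0):
    """<E> over a Gaussian chain of n_residues + n_dye_extra segments.

    Hypothetical parametrization: <r^2> = c_ratio * N * b^2, where the
    n_dye_extra 'virtual residues' stand in for the dyes and their linkers.
    """
    n_eff = n_residues + n_dye_extra
    mean_sq = c_ratio * n_eff * b**2
    r = np.linspace(1e-3, 6.0 * np.sqrt(mean_sq), 20000)
    p = gaussian_chain_pdf(r, mean_sq)
    p /= p.sum()  # normalize on the uniform grid
    return float(np.sum(p * fret_efficiency(r, r0)))


for n in (20, 50, 100):
    # <E> without and with a hypothetical dye/linker correction
    print(n, round(mean_efficiency(n, n_dye_extra=0), 3),
          round(mean_efficiency(n, n_dye_extra=9), 3))
```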
Radii of gyration (RG) measured using SAXS and those inferred from end-to-end distances (RE) measured by smFRET have appeared to disagree for a long time.[20−22] A number of approaches have been presented to resolve this controversy. They employ improved analysis procedures and explicit ensembles, generated using Bayesian statistics or maximum entropy approaches, in agreement with smFRET and SAXS.[19,23−25] In this context, fluorophores have been attached in silico to describe the measured EFRET of individual distances (one distance per protein).[19,26] While these approaches are promising, the study of IDPs demands a systematic analysis that integrates distance information between different regions of the protein and its global extension, but also local structural information, to accommodate heterogeneity in compaction as well as the population of transiently structured elements. Here, we propose an approach for the systematic integration of various solution-state structural data on IDPs. It is based on the implementation of FRET efficiencies into the algorithm ASTEROIDS, which derives representative structural ensembles of IDPs from NMR and SAXS data describing both local conformational propensities and long-range distance information.[27,28] Our approach selects smaller ensembles from a large statistical coil ensemble using experimental data only. This statistical coil ensemble is calculated using flexible-meccano[29] and has dimensions approximately equal to those of a fully unfolded protein.[30] The fluorophores are explicitly taken into account through per-conformer calculation of AVs. This strategy requires neither a conversion between different distance measures (e.g., RG and RE) nor an approximation of the dye/linker length in the context of a polymer model. It therefore allows the description of IDPs with varying degrees of compactness along their sequence, in principle even including entirely folded domains.
We first select and cross-validate conformational ensembles using a large set of in silico PRE (paramagnetic relaxation enhancement) and FRET data. We then validate our approach against experimental FRET efficiencies, SAXS data, and NMR chemical shifts and PREs, obtaining new insights into the conformational landscape of an intrinsically disordered region of the measles virus phosphoprotein. Notably, in addition to a number of FRET efficiencies and SAXS data, we also use experimental fluorescence lifetimes of the FRET-labeled protein to cross-validate our conformational ensemble, demonstrating correct sampling of both the ensemble itself and the dye AVs. We demonstrate the complementarity between different parameters (particularly FRET and PREs) and the importance of using distance information across the IDP sequence to generate meaningful conformational ensembles. The presented approach allows dynamic integrative structural biology to be addressed quantitatively and in a predictive manner.

Results

FRET Distance Networks in Conformational Ensembles

To determine conformational ensembles based on experimental smFRET data, we build on an approach that has been developed and frequently used for calculating conformational ensembles from diverse NMR parameters and SAXS.[1,31−34] A large ensemble of conformers (e.g., 10 000) is calculated from a statistical distribution of Φ and Ψ angles of the protein backbone using the software flexible-meccano.[29] From this large ensemble, smaller subensembles that describe the experimental data are selected using the genetic algorithm ASTEROIDS.[35] Distance measurements through FRET rely on the attachment of a donor and an acceptor fluorophore to specific sites within the protein chain. Because our goal is to describe the experimental FRET efficiencies directly, the fluorescent dyes have to be accounted for explicitly in the conformational ensemble. We calculated accessible volumes for the fluorophores Alexa488 and Alexa594 attached to cysteines via maleimide chemistry, comprising a C5 linker connecting the Cys side chain and the fluorophore, as previously described.[12,36] As a first step, we calculated a conformational ensemble of a 110 amino acid long model protein containing two cysteines as dye attachment points, and we calculated AVs for every conformer in the ensemble. Sampling of the AV and sampling of the different conformers were both assumed to occur on a time scale significantly slower than the fluorescence lifetime, as suggested by AV sampling based on molecular dynamics simulations of fluorescently labeled DNA.[37] We first used the cysteine side chains as attachment points. To allow labeling positions that are not native cysteines but are generated experimentally through point mutations, we estimated the average distance between the Cβ atom and the SH group and elongated the linker length in the simulation accordingly (see Methods).
The distance distributions calculated on a 100 conformer ensemble with attachment points at either the SH group or the Cβ atom, each with its respective linker parametrization, are essentially identical (SI Figure 1). To select meaningful ensembles, AVs have to be calculated and FRET efficiencies determined for all conformers in the large flexible-meccano ensemble before selection using ASTEROIDS. Since AV calculation is time-consuming, the iterative sampling of positions in the AV was set to 500 iterations (Figure 1), and the pairwise distance calculation was coarsened (see Materials and Methods).
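To make the per-conformer AV step concrete, here is a minimal Monte Carlo sketch, assuming a strongly simplified geometry. The dye is treated as a sphere tethered to the attachment atom by a freely reachable linker. Trial positions are drawn uniformly within the linker length and rejected if the dye sphere clashes with any atom. The AV calculations cited above[12] are more elaborate, for example considering the linker path and width. The function names and all parameter values (`linker_length`, `dye_radius`, `atom_radius`) are hypothetical. `n_iter=500` only mirrors the number of sampling iterations mentioned in the text.

```python
import numpy as np


def sample_av(anchor, atoms, linker_length=20.0, dye_radius=5.0,
              atom_radius=1.7, n_iter=500, rng=None):
    """Monte Carlo sketch of a dye accessible volume (AV).

    Trial dye centers are drawn uniformly inside a sphere of radius
    linker_length around the anchor atom; a trial position is kept only if
    a dye sphere of radius dye_radius there does not overlap any atom.
    All geometric parameters are illustrative placeholders.
    """
    rng = np.random.default_rng() if rng is None else rng
    anchor = np.asarray(anchor, dtype=float)
    atoms = np.asarray(atoms, dtype=float)
    # uniform sampling in a ball: random direction, radius ~ L * u^(1/3)
    direction = rng.normal(size=(n_iter, 3))
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    radius = linker_length * rng.random(n_iter) ** (1.0 / 3.0)
    trial = anchor + direction * radius[:, None]
    # steric check against every protein atom
    dist = np.linalg.norm(trial[:, None, :] - atoms[None, :, :], axis=2)
    keep = np.all(dist > dye_radius + atom_radius, axis=1)
    return trial[keep]


def av_distance_histogram(av_donor, av_acceptor, bins=50):
    """Distance histogram over all donor/acceptor position pairs of one conformer."""
    d = np.linalg.norm(av_donor[:, None, :] - av_acceptor[None, :, :], axis=2)
    return np.histogram(d.ravel(), bins=bins, density=True)
```

In a full workflow, `sample_av` would be called twice per conformer, once for the donor site and once for the acceptor site. The resulting histograms would then be accumulated over the ensemble, as in Figure 1B.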

Figure 1

Influence of step size on conformational ensembles. (A) Examples of conformations of a model protein with accessible volumes (AVs) of Alexa488 (green) and Alexa594 (red), calculated using 100, 500, 1000, or 10 000 iterations (steps) for position determination. (B) Distance histogram over accessible volumes calculated over a 100 conformer ensemble using 100 (light gray), 500 (blue), 1000 (dark gray), and 10 000 (black) dye positions sampled iteratively.

Benchmarking an Ensemble Selection Using FRET against in Silico Data

After optimizing AV calculations for multiconformational ensembles, we investigated whether FRET efficiencies (EFRET) could be used in the ensemble selection algorithm ASTEROIDS. For this, we used an IDP sequence of 155 amino acids. For this sequence, we calculated an ensemble comprising a long-range contact between amino acid segments 15–25 and 90–100 and generated 15 in silico EFRET values (SI Figure 2) using AV calculations as described above. To obtain distances that adequately reflect the long-range behavior of the ensemble, we selected labeling positions covering different regions of the protein. We took care to include both short and long amino acid separations between the label attachment points so as to probe FRET efficiencies throughout the sensitive regime of FRET (around 2–10 nm). From a large statistical-coil ensemble calculated using flexible-meccano, we then used ASTEROIDS to select smaller subensembles of 200 conformers based on six of the 15 in silico FRET efficiencies (Figure 2). We then back-calculated the remaining nine FRET efficiencies, which were not used in the selection, from the selected ASTEROIDS ensemble. These back-calculated values matched the in silico FRET efficiencies of the input ensemble (comprising a long-range contact) with high accuracy (SI Figure 3A).
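The select-then-cross-validate logic can be illustrated with a toy genetic algorithm. This is a swapped-in minimal sketch, not ASTEROIDS itself, whose operators and scoring are not described here. Each candidate sub-ensemble is a set of indices into a conformer pool. Fitness is the χ² between ensemble-averaged and target observables, with σ = 0.02 as the allowed EFRET error mentioned later in the text. New candidates are produced by uniform crossover and random mutation. All function names and GA settings are hypothetical.

```python
import numpy as np


def chi2(ensemble, pool_obs, target, sigma):
    """Chi-square between ensemble-averaged and target observables."""
    pred = pool_obs[ensemble].mean(axis=0)
    return float(np.sum(((pred - target) / sigma) ** 2))


def select_ensemble(pool_obs, target, size=200, sigma=0.02, n_pop=40,
                    n_gen=200, mutation_rate=0.05, seed=0):
    """Toy genetic-algorithm selection of a sub-ensemble from a conformer pool.

    pool_obs: (n_conformers, n_observables) back-calculated per-conformer
    values (e.g., E_FRET); target: (n_observables,) experimental values.
    Returns the indices of the best sub-ensemble (repeats allowed).
    """
    rng = np.random.default_rng(seed)
    n_pool = pool_obs.shape[0]
    pop = [rng.choice(n_pool, size, replace=False) for _ in range(n_pop)]
    n_elite = max(2, n_pop // 4)
    for _ in range(n_gen):
        scores = [chi2(e, pool_obs, target, sigma) for e in pop]
        elite = [pop[i] for i in np.argsort(scores)[:n_elite]]
        children = []
        while len(children) < n_pop - n_elite:
            a, b = rng.choice(n_elite, 2, replace=False)
            # uniform crossover between two elite parents
            child = np.where(rng.random(size) < 0.5, elite[a], elite[b])
            # mutation: swap a few members for random pool conformers
            mutate = rng.random(size) < mutation_rate
            child[mutate] = rng.integers(0, n_pool, int(mutate.sum()))
            children.append(child)
        pop = elite + children
    scores = [chi2(e, pool_obs, target, sigma) for e in pop]
    return pop[int(np.argmin(scores))]
```

Cross-validation then amounts to running the selection on a subset of the columns of `pool_obs` (for instance six of 15 efficiencies). One then compares the held-out columns of `pool_obs[sel].mean(axis=0)` with their target values.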

Figure 2

Scheme of incorporation of FRET distances into ASTEROIDS based on simulated data. An in silico ensemble of conformations is generated, for which accessible volumes occupied by Alexa488 and Alexa594 are computed and FRET data are calculated (blue frame). FRET efficiencies are used as an input for ASTEROIDS selection (red frame) from a pool of statistical coil conformers (calculated from flexible-meccano, gray frame).

The FRET efficiencies used in this selection were chosen to represent varying distances across the protein sequence, and sufficient sampling of the different regions of the protein is indeed crucial for reproducing the long-range characteristics of the ensemble with confidence. We also tested a selection using only three FRET efficiencies, even when distributed along the sequence. In that case, the ASTEROIDS ensemble predicted the remaining FRET efficiencies poorly and captured the long-range distances of the simulated target ensemble much less well (SI Figure 4).

PREs and FRET Distances Provide Complementary Long-Range Distance Information

Through paramagnetic relaxation enhancements, NMR also offers a probe for longer-range distances of up to around 2.5 nm.[38] For this, a paramagnetic probe (usually a spin radical) is attached to a site-specifically engineered cysteine within the protein chain. Its effects on the spin relaxation of the different 1HN nuclei of the protein backbone are then measured; these effects depend on the inverse sixth power of the distance from the spin radical. PREs thus have a distance dependence similar to that of FRET, but with a different sensitive regime (Figure 3A–C). Indeed, the distance windows to which FRET and PREs are sensitive are entirely complementary. Only the combination of both techniques is therefore expected to provide insight into both the intermediate (around 1–3 nm) and the long-range (around 4–8 nm) distance ranges.
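The PRE sensitivity window can be reproduced with a short calculation. The following is a minimal sketch assuming the simplified Solomon–Bloembergen form commonly used for nitroxide labels: R2,PRE = K·r⁻⁶·(4τc + 3τc/(1 + ωH²τc²)). It also uses the usual intensity-ratio expression I/I0 = R2·exp(−R2,PRE·t)/(R2 + R2,PRE). τc = 5 ns follows Figure 3C. The proton Larmor frequency, intrinsic R2, and transverse delay t are hypothetical values. Spin-label reorientation, which the ensemble calculations in the text do consider, is ignored here.

```python
import numpy as np

# Constant commonly used for a nitroxide spin label in the simplified
# Solomon-Bloembergen expression (cm^6 s^-2).
K_NITROXIDE = 1.23e-32


def r2_pre(r_angstrom, tau_c=5e-9, larmor_mhz=600.0):
    """Transverse PRE rate (s^-1) of an amide proton at distance r (Å)."""
    omega_h = 2.0 * np.pi * larmor_mhz * 1e6  # proton Larmor frequency, rad/s
    r_cm = np.asarray(r_angstrom, dtype=float) * 1e-8
    spectral = 4.0 * tau_c + 3.0 * tau_c / (1.0 + (omega_h * tau_c) ** 2)
    return K_NITROXIDE / r_cm**6 * spectral


def intensity_ratio(r_angstrom, r2_dia=10.0, t_delay=10e-3, **kwargs):
    """Paramagnetic/diamagnetic peak intensity ratio I/I0 for an HSQC-type
    experiment with total transverse delay t_delay (s)."""
    g = r2_pre(r_angstrom, **kwargs)
    return r2_dia * np.exp(-g * t_delay) / (r2_dia + g)


for r in (10, 15, 20, 25, 30):
    # I/I0 rises from ~0 to ~1 over roughly 1-3 nm
    print(r, round(float(r2_pre(r)), 2), round(float(intensity_ratio(r)), 3))
```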

Figure 3

ASTEROIDS selection based on PREs and FRET efficiencies. (A) Dependence of the FRET rate (kET) on distance with a Förster radius of 56 Å and a fluorescence lifetime of the donor (τD) of 4 ns. (B) Dependence of the FRET efficiency (EFRET) on distance for a kET as displayed in A (blue curve, blue y axis). The red curve with the red y axis shows the dependence of peak intensity ratios (I/I0) for a paramagnetic as compared to a diamagnetic sample at a PRE rate described in C. Red and blue shading illustrate the distance ranges to which FRET and PREs are sensitive. (C) PRE rate (R2,PRE) dependence on the distance between the proton and electron spins for τc = 5 ns, with τc = τrτs/(τr + τs), τr being the rotational correlation time of the protein and τs the effective electron relaxation time (see ref (38)). Note that reorientation dynamics of the spin label were not taken into account for this illustration, but were considered in the ensemble calculations. (D) Schematic of a 155 residue long IDP comprising a long-range interaction between regions indicated by green boxes. Below: Distances for which EFRET has been calculated after in silico addition of the fluorescent dyes. Black protein constructs have been used in the ASTEROIDS selection; orange protein constructs have been used for cross-validation (corresponding EFRET above yellow background in E). See also SI Figure 2 for a more detailed scheme. (E) FRET efficiencies, plotted against the amino acid separation between the labels, of a flexible-meccano ensemble (gray), the simulated ensemble with a long-range contact (blue), and an ensemble selected based on six FRET efficiencies and five PRE labeling positions (red). Only EFRET values on a white background were used in the selection. Cross-validated distances are on a yellow background. Error bars on the blue points indicate the error in FRET efficiency that was allowed in the selection (0.02).
Error bars on the red points refer to the standard deviation of EFRET calculated from six independent selections. (F) Histogram of average pairwise Cα–Cα distances of the flexible-meccano ensemble (gray), the simulated ensemble (blue), and the ensemble selected based on PREs and six different FRET distances (red bars). (G) PREs of a flexible-meccano ensemble (gray lines), of the simulated ensemble with a long-range contact (blue lines), and of the selected ensemble (red bars). All simulated PREs (in blue) were used in the selection. Red error bars are standard deviations over six independent selections.

The contribution of paramagnetic relaxation can be determined directly by measuring spin relaxation. The FRET rate (kET), in contrast, is not directly accessible. It can only be inferred from the fluorescence lifetime of the donor in the absence (τD) and presence (τD(FRET)) of the acceptor,

$$k_{\mathrm{ET}} = \frac{1}{\tau_{D(\mathrm{FRET})}} - \frac{1}{\tau_D}$$

or from the FRET efficiency,

$$E_{\mathrm{FRET}} = \frac{k_{\mathrm{ET}}}{k_{\mathrm{ET}} + \tau_D^{-1}}$$

Both routes lead to a damped dependence of the measured parameter (τD(FRET) or EFRET) on the donor–acceptor distance at short distances. A similar dependence is obtained when the peak intensity ratios (I/I0) of the para- and diamagnetic PRE samples are considered, which allows a visual comparison of the complementary distance ranges (Figure 3B). We calculated PREs[39] for five different spin-radical attachment sites in our long-range ensemble and used these in silico PREs to select smaller subensembles of 200 conformers using ASTEROIDS. Although these ensembles capture all PREs very well, they fail to reproduce the expected FRET efficiencies (SI Figure 3C,D). This observation also holds if fast (faster than the fluorescence lifetime) sampling of the AVs is assumed (SI Figure 5).
The selection based on six FRET efficiencies described above, on the other hand, fails to reproduce the expected PREs, illustrating the complementary distance ranges to which PREs and FRET are sensitive (SI Figure 3B). An ensemble selected based on five in silico PRE labeling sites and six FRET efficiencies, however, reproduces all in silico PREs and EFRET excellently (Figure 3). This ensemble also reliably reproduces the expected average pairwise as well as specific Cα–Cα distance distributions (Figure 3F, SI Figure 6). These distributions can be calculated directly from the selected ensemble without additional approximations concerning the fluorescent dyes and their linkers (or PRE labels). Indeed, FRET efficiencies and PREs are both necessary to correctly describe a conformational ensemble that populates various intermediate- and long-range distances. A selection that includes only FRET or only PREs can be expected to reproduce the other parameter only within a very narrow distance window, depending on the properties of the pool of conformers from which ensembles are selected. We demonstrate this on a new set of in silico data, in which we allowed the long-range contact to extend up to 50 rather than 20 Å. FRET efficiencies, but not PREs, are sensitive to this range. In this case, selection based on six FRET efficiencies leads to agreement with the in silico PREs, which are not noticeably different from those of a flexible-meccano-derived statistical coil (SI Figure 7).
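The damped FRET response at short distances follows directly from the rate relations k_ET = (1/τD)(R0/r)⁶, E = k_ET/(k_ET + 1/τD) and τD(FRET) = 1/(1/τD + k_ET). The short sketch below uses R0 = 56 Å and τD = 4 ns as in Figure 3A; the function names are illustrative. It shows that k_ET changes by an order of magnitude between 20 and 30 Å while E and τD(FRET) barely move. This is why PREs, rather than FRET, resolve the short-range regime.

```python
import numpy as np


def k_et(r, r0=56.0, tau_d=4.0):
    """FRET rate (ns^-1): k_ET = (1/tau_D) * (R0/r)^6, r and R0 in Å."""
    return (r0 / np.asarray(r, dtype=float)) ** 6 / tau_d


def tau_d_fret(r, r0=56.0, tau_d=4.0):
    """Donor lifetime in the presence of the acceptor: 1/(1/tau_D + k_ET)."""
    return 1.0 / (1.0 / tau_d + k_et(r, r0, tau_d))


def e_fret(r, r0=56.0, tau_d=4.0):
    """FRET efficiency E = k_ET / (k_ET + 1/tau_D)."""
    k = k_et(r, r0, tau_d)
    return k / (k + 1.0 / tau_d)


r = np.array([20.0, 30.0, 56.0, 80.0])
# k_ET spans orders of magnitude at short distance, while E and tau_D(FRET) saturate
print(np.round(k_et(r), 3), np.round(e_fret(r), 3), np.round(tau_d_fret(r), 3))
```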

Analysis of Ensemble Sizes

Ensemble selections based on in silico data back-calculated from a known target ensemble also allowed us to test the number of conformers required to represent the data and to reliably reproduce the statistics of the target ensemble. We therefore performed selections of 10, 20, 50, 100, 200, and 400 conformers per ensemble and calculated average absolute deviations from the in silico data. This analysis indicates that the reproduction of the data improves as the ensemble size increases (Figure 4A and B), reaching excellent agreement with the in silico data from around 200 conformers per ensemble. Reproduction of the Cα–Cα distance distributions between the labeling sites is comparatively poor at low numbers of conformers and only starts improving once an ensemble size of approximately 100 conformers is reached; it improves further with increasing numbers of conformers (Figure 4C and SI Figure 6). We thus conclude that, overall, an ensemble size of 200 conformers offers a good compromise between reproducibility, statistics, and computation speed, as proposed earlier for ensembles selected based on PREs and residual dipolar couplings.[39]

Figure 4

Varying the size of the selected ensemble. (A and B) Average absolute deviations of the FRET efficiencies (A) or PREs (B) calculated from the selected ensemble (x) from the respective values of the target in silico ensemble (x0). Error bars show the corresponding standard deviations. Red points illustrate data not used in the selection. Ensemble sizes were 10, 20, 50, 100, 200, or 400 conformers. Dashed lines are exponential fits indicating the trend of the data. (C) Cα–Cα distances between the in silico labeling sites 2 and 92 for different ensemble sizes. In red are the distances calculated from one selection based on six FRET efficiencies and five PRE labeling sites. The expected Cα–Cα distances are shown in blue; the distances obtained from a flexible-meccano statistical coil ensemble in gray. Black numbers inside the graphs indicate the numbers of conformers used in the selected ensembles.

Description of Experimental FRET, PREs, and Chemical Shift Data

While our comprehensive in silico data set demonstrates how to accurately describe long-range distances within intrinsically disordered proteins, we also aimed to test the validity of this approach on experimental data. For this, we used a 110 residue long protein from the disordered N-terminus of the measles virus phosphoprotein (P1–100). This protein has been extensively characterized by NMR spectroscopy[1,40] and harbors two transient α-helices, as can be inferred from backbone chemical shifts (Figure 5D). We acquired the following data:

- nine FRET efficiencies;
- PREs from five different labeling sites (Figure 5A–C);
- a full set of backbone chemical shifts,[1] which are sensitive to local structural propensities;
- SAXS data, which report on the distribution of RG, i.e., the overall dimension of the protein.

FRET efficiencies were obtained from random labeling of two engineered cysteines with Alexa488 and Alexa594 using maleimide chemistry and were recorded on a custom-built single-molecule fluorescence spectrometer. The corrected (see Methods for details) FRET histograms were fit with double Gaussians describing a population at EFRET = 0 (donor-only population) and a population at EFRET > 0. The efficiency of the latter population was extracted for ensemble selection or cross-validation (Figure 5A, SI Figure 8 and SI Table 1). Comparison of the experimentally obtained FRET efficiencies with those expected from a flexible-meccano statistical coil ensemble suggests that P1–100 samples a conformational ensemble that is slightly more compact than a random coil.
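A burst-wise analysis along these lines can be sketched in a few lines. The authors fit double Gaussians to the corrected histograms. The sketch below instead uses a swapped-in, closely related technique: an expectation-maximization (EM) fit of a two-component Gaussian mixture directly to per-burst EFRET values. It assumes that donor and acceptor counts are already background- and crosstalk-corrected and applies EFRET = IA/(IA + γID). The function names, initial guesses, and the value of γ are hypothetical.

```python
import numpy as np


def burst_efficiency(i_d, i_a, gamma=1.0):
    """Per-burst E from background- and crosstalk-corrected photon counts."""
    i_d = np.asarray(i_d, dtype=float)
    i_a = np.asarray(i_a, dtype=float)
    return i_a / (i_a + gamma * i_d)


def two_gaussian_em(e, n_iter=200, init_means=(0.0, 0.5)):
    """Two-component Gaussian mixture fitted to per-burst E values by EM.

    Component 0 is seeded at the donor-only peak (E ~ 0), component 1 at
    the FRET population. Returns (weights, means, sigmas).
    """
    e = np.asarray(e, dtype=float)[:, None]
    mu = np.array(init_means, dtype=float)
    sigma = np.array([0.1, 0.1])
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each burst
        dens = w * np.exp(-0.5 * ((e - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: update weights, means and widths (widths floored for stability)
        nk = resp.sum(axis=0)
        w = nk / len(e)
        mu = (resp * e).sum(axis=0) / nk
        sigma = np.maximum(np.sqrt((resp * (e - mu) ** 2).sum(axis=0) / nk), 1e-3)
    return w, mu, sigma
```

The efficiency of the nonzero population is then `mu[1]`. Unlike a histogram fit, this variant is independent of the chosen bin width.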

Figure 5

Description of experimental FRET, PREs, and SAXS by a common multiconformational model. (A) Experimental FRET histograms of P1–100 (black bars) with double Gaussian fits (green), from which EFRET of the nonzero population was extracted. (B) 1H–15N heteronuclear single quantum coherence (HSQC) spectrum of P1–100 C64 unlabeled (green) and labeled with MTSL (yellow). (C) Visualization of FRET distances for which data have been acquired. (D) Cα secondary chemical shifts of P1–100 calculated based on experimental chemical shifts (blue) and based on chemical shifts calculated from an ensemble selected based on five PRE labeling positions, six FRET efficiencies, and chemical shifts (red). (E) Experimental (blue) PREs and PREs calculated from the selected ensemble (red). All PREs were used in the selection. PRE labeling sites are indicated by green dashed lines (note that the same cysteines have been used for PRE and FRET labeling). Intensity ratios between the PRE labeled (I) and unlabeled (I0) peaks are shown. (F) FRET efficiencies (EFRET) of P1–100 plotted against the amino acid distance between the fluorophores. The gray line indicates values expected from a flexible-meccano statistical coil (polynomial fit of in silico data presented in Figure 3). Experimental data are shown in blue with error bars resulting from standard deviations calculated from independent measurements. Red points indicate EFRET calculated from the ASTEROIDS selection. Data points plotted in front of a yellow background were not used in the selection. (G) Experimental SAXS curve (blue) and SAXS curve back-calculated from the ASTEROIDS ensemble (red). SAXS data were not used in the selection. (H) Cumulated fluorescence lifetime histograms calculated from the FRET population of the single molecule data (corresponding to FRET mutants shown in (A)).
Blue points are experimental data, and red curves are decays back-calculated from the selected ensemble, comprising a scattering contribution and scaled to best fit the experimental data.

Conformational ensembles comprising 200 conformers (see SI Figure 9 for an assessment of ensemble sizes) were selected using ASTEROIDS based on all PREs, chemical shifts (N, HN, CO, Cα, Cβ), and six of the nine experimental EFRET values. FRET efficiencies were included in the ensemble selection as described above. The selected ensemble reliably reproduced the data used in the selection (Figure 5D, E, and F) as well as the remaining FRET efficiencies that had not been used in the selection (Figure 5F). A SAXS curve acquired from P1–100, which was not used in the selection, was also well described by the ASTEROIDS ensemble selected based on PREs, chemical shifts, and FRET efficiencies. This suggests that the ensemble also captures the overall dimension of the protein (Figure 5G). Extended Guinier analysis[41] of the experimental SAXS curve and of the curve back-calculated from the ensemble yielded comparable RG values. These also agreed with the average RG calculated directly from the selected conformational ensemble (SI Figure 10). The scaling exponent calculated from the selected ensembles, which is indicative of solvent quality, was 0.52. This value is consistent with θ-solvent conditions (SI Figure 10A), under which excluded volume interactions cancel out.[25,42] Combining our experimental data with ASTEROIDS selections based on only FRET or only PREs shows that intermediate- and long-range distances of the conformational ensemble are only correctly sampled when both data sets are combined (SI Figure 11). This agrees with the theoretical complementarity of FRET and PREs regarding their sensitive distance ranges (Figure 3B), as shown for the in silico data set (SI Figure 3).
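How an RG and a scaling exponent can be read off an explicit ensemble is illustrated below. This is a minimal sketch, and the text does not specify how the exponent was computed. It assumes Cα-only traces with equal masses and fits √⟨r_ij²⟩ = b·|i − j|^ν over sequence separations, which is one common way to estimate ν. An ideal chain (θ conditions) gives ν ≈ 0.5, and a good-solvent self-avoiding chain gives a larger value. The function names and the `min_sep` cutoff are hypothetical.

```python
import numpy as np


def radius_of_gyration(coords):
    """RG of one conformer from (n_atoms, 3) coordinates, equal masses assumed."""
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt((centered**2).sum(axis=1).mean()))


def scaling_exponent(ensemble, min_sep=5):
    """Fit sqrt(<r_ij^2>) = b * |i - j|**nu over an ensemble of C-alpha traces.

    ensemble: array of shape (n_conformers, n_residues, 3).
    Returns (nu, b).
    """
    n_res = ensemble.shape[1]
    seps = np.arange(min_sep, n_res - min_sep)
    rms = []
    for s in seps:
        d = ensemble[:, s:, :] - ensemble[:, :-s, :]  # all pairs with |i - j| = s
        rms.append(np.sqrt((d**2).sum(axis=2).mean()))
    nu, log_b = np.polyfit(np.log(seps), np.log(rms), 1)
    return float(nu), float(np.exp(log_b))
```

Note that an RG computed this way from Cα positions ignores side chains and the hydration shell. It is therefore only a rough proxy for the SAXS-derived value.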
It is interesting to note that integrating PREs into the selection also improves the reproduction of two of the experimental FRET efficiencies. This indicates that, in the case of P1–100, the FRET efficiencies alone might not sufficiently cover all relevant protein regions. For this experimental data set, the time scale on which the fluorescent dyes sample their accessible volumes is not known a priori. We therefore additionally considered the other extreme case of AV sampling significantly faster than the fluorescence lifetime. FRET efficiencies of all conformers in the pool from which ensembles were selected were calculated under this assumption, and an ASTEROIDS selection was performed on the basis of six FRET efficiencies, five sets of PREs, and chemical shifts. This ensemble reproduces the FRET efficiencies not used in the selection less well than when slow (slower than the fluorescence lifetime) AV sampling is assumed (compare SI Figure 5B to Figure 5F). We thus conclude that “slow” AV sampling is appropriate for the P1–100 experimental FRET data. We note, however, that more rapid diffusion of fluorescent dyes has been observed for other experimental systems.[43,44] As an additional cross-validation of both the AV sampling and the calibrations employed for the smFRET experiments, we labeled one sample of P1–100 (C28–C64) with a different dye pair (Alexa488/Alexa647) and determined its FRET efficiency (SI Figure 12). In parallel, we simulated the Alexa488/Alexa647 dye pair on the ensemble selected based on smFRET (Alexa488/Alexa594), PREs, and chemical shifts.
The difference between the experimental EFRET (0.52) and the EFRET expected from the selected ensemble (0.56) is below the typical uncertainty determined in a recent multilaboratory study.[45] In all ASTEROIDS selections, an error of 0.02 for EFRET was allowed, in agreement with the measurement error over several independent measurements. A larger allowed error might nevertheless be considered appropriate,[45] as the measured quantum yields, the Förster distance R0, and the determination of spectral crosstalk are also error-prone. However, for P1–100, ASTEROIDS selections based on six FRET efficiencies, five sets of PREs, and chemical shifts that allow an error of 0.06 agree very well with those selected with an allowed error of 0.02 (SI Figure 13).
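The two AV-sampling extremes compared above differ only in where the average is taken. A minimal sketch, assuming a list of sampled dye-dye distances per conformer, with R0 = 56 Å and hypothetical function names:

- **Slow sampling.** Each dye geometry is frozen during the donor lifetime, so E(r) is averaged over positions.
- **Fast sampling.** Positions are averaged within the lifetime, so the rate ⟨(R0/r)⁶⟩ is averaged first and E = ⟨k⟩/(⟨k⟩ + 1/τD).

In both cases, conformers are assumed to interconvert slower than the lifetime, as in the text, so per-conformer efficiencies are averaged over the ensemble.

```python
import numpy as np


def e_slow(dists, r0=56.0):
    """Dye positions frozen during the donor lifetime: average E over positions."""
    return float(np.mean(1.0 / (1.0 + (np.asarray(dists, dtype=float) / r0) ** 6)))


def e_fast(dists, r0=56.0):
    """Dye positions averaged within the donor lifetime: average the rate first."""
    mean_rate = np.mean((r0 / np.asarray(dists, dtype=float)) ** 6)  # units of 1/tau_D
    return float(mean_rate / (mean_rate + 1.0))


def ensemble_efficiency(per_conformer_dists, regime="slow", r0=56.0):
    """Average per-conformer E (conformers interconvert slower than the lifetime)."""
    f = e_slow if regime == "slow" else e_fast
    return float(np.mean([f(d, r0) for d in per_conformer_dists]))


# For the same distances, concavity of x/(1+x) implies e_fast >= e_slow.
print(e_slow([30.0, 60.0, 90.0]), e_fast([30.0, 60.0, 90.0]))
```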

Reproduction of Experimental Fluorescence Lifetimes by Conformational Ensembles

Intensity-based FRET efficiencies are calculated from the numbers of emitted photons of the donor (ID) and the acceptor (IA), after cross-talk and background correction (see Methods). They are additionally corrected for differences in quantum yield and detection efficiency between the green and red channels (γ):

$$E_{\mathrm{FRET}} = \frac{I_A}{I_A + \gamma I_D}$$

Fluorescence lifetimes provide a complementary measure of the distance distributions of a conformational ensemble.[46,47] For a static donor–acceptor distance, EFRET can be calculated from the fluorescence lifetimes of the donor in the absence (τD) and presence (τD(FRET)) of the acceptor:

$$E_{\mathrm{FRET}} = 1 - \frac{\tau_{D(\mathrm{FRET})}}{\tau_D}$$

This is not the case for distances with dynamics slower than the fluorescence lifetime but faster than the interphoton time (usually on the order of tens of microseconds). For such distances, the fluorescence-weighted average donor lifetime ⟨τD(FRET)⟩f no longer yields the FRET efficiency:[47,48]

$$E_{\mathrm{FRET}} \neq 1 - \frac{\langle \tau_{D(\mathrm{FRET})} \rangle_f}{\tau_D}$$

Indeed, taking fluorescence lifetimes into account in the conformational ensemble of an IDP is complex. Every conformer in the ensemble contributes a single-exponential decay to the time-resolved fluorescence intensity of a time-correlated single photon counting (TCSPC) experiment, and the resulting multiexponential intensity decay is experimentally convolved with the instrument response function (IRF) of the smFRET setup.[17] To test whether the distance distributions of our conformational selection agree with our experimental fluorescence lifetimes, we first extracted the fluorescence intensity decays of the FRET population from our single-molecule data (SI Figure 14A). The IRF was measured independently under the same experimental conditions, described with a double Gaussian function, and convolved with the multiexponential decays expected for our conformational ensemble.
The resulting decay curves described the experimental intensity decays remarkably well (Figure 5H, SI Figure 14B). Our conformational ensemble therefore correctly reproduces another set of independent long-range data that was not used in the ASTEROIDS selection process. This confirms the validity of the selected ensemble as well as the time scales assumed for the motional sampling of both dyes and proteins within the ensemble.
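The forward model for the lifetime comparison can be sketched as follows, assuming a uniform time grid starting at t = 0, one equal-amplitude single-exponential decay per conformer, and a single Gaussian IRF for brevity. The text describes the measured IRF with a double Gaussian. A scattering term with the shape of the IRF is mixed in, and the result is scaled to the data by linear least squares. The function names, IRF parameters, and scattering fraction are hypothetical.

```python
import numpy as np


def ensemble_decay(t, tau_da, irf_center=1.0, irf_width=0.1, scatter_frac=0.0):
    """Donor intensity decay of a conformational ensemble, convolved with an IRF.

    t: uniform time grid in ns starting at 0; tau_da: per-conformer donor
    lifetimes tau_D(FRET) in ns. A single Gaussian IRF is used for brevity.
    Returns a model normalized to unit total counts.
    """
    t = np.asarray(t, dtype=float)
    tau_da = np.asarray(tau_da, dtype=float)
    # every conformer contributes one single-exponential decay (equal amplitudes)
    decay = np.exp(-t[:, None] / tau_da[None, :]).mean(axis=1)
    irf = np.exp(-0.5 * ((t - irf_center) / irf_width) ** 2)
    irf /= irf.sum()
    model = np.convolve(decay, irf)[: len(t)]  # discrete causal convolution
    model /= model.sum()
    # scattered excitation light follows the IRF shape
    return (1.0 - scatter_frac) * model + scatter_frac * irf


def scale_to_data(model, counts):
    """Least-squares amplitude scaling of a model decay onto measured counts."""
    model = np.asarray(model, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return model * (model @ counts) / (model @ model)
```

In practice, `tau_da` would hold the τD(FRET) value of each conformer, obtained from the AV-based distance for that conformer. The fraction `scatter_frac` would be fitted together with the amplitude.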

Discussion

A molecular description of the conformational landscape sampled by IDPs and proteins containing intrinsically disordered regions (IDRs) is of paramount interest. IDPs and IDRs are enriched in several essential biological processes, such as signaling,[49,50] cellular transport,[51,52] and gene regulation,[53,54] and their misregulation is often linked to disease.[55] Multiconformational models have been built mainly from NMR and small angle scattering data,[29,39,56−58] and in some individual cases from single-molecule FRET efficiencies.[19,26] However, these approaches fall short of integrating specific long-range and short-range information in a predictive manner. We now present a toolset that integrates the three most powerful techniques for the analysis of IDPs: NMR, SAXS, and single-molecule FRET. We show that several FRET efficiencies can be integrated into ensemble selections and reproduced with confidence. We perform the selection using the experimentally obtained FRET efficiencies rather than inferred distances, and we reproduce the corresponding fluorescence lifetimes. Modeling the fluorophores in terms of accessible volumes[12] on top of the pool of conformers from which the ensembles are selected is key to integrating parameters from techniques with different experimental requirements. Single-molecule FRET and PREs require the attachment of fluorophores or spin radicals, whereas SAXS and NMR require no labeling or stable isotope labeling, respectively. This approach assumes that the conformational ensemble remains quasi-identical in the presence and absence of the different labels (FRET/PRE) and that the parametrization of the AVs accurately reproduces the volumes sampled by the fluorophores. The successful cross-validation of a number of FRET efficiencies not used in the selection (including one with a different dye pair) and of a SAXS curve suggests that these assumptions are indeed correct.
The selection of explicit ensembles combined with in silico attachment of labels also allows the approach to be applied when the sampled distance distributions are complex, for example when they include transiently folded protein regions or even entire folded domains.[1,49,59] Distance distributions within the protein backbone can be calculated directly from the selected ensemble. While we employ the genetic algorithm ASTEROIDS[27] to select conformational ensembles in agreement with the experimental data, our developments concerning the integration of fluorophore AVs into conformational ensembles, as well as our insights into sampling a sufficient number of FRET distances along the protein sequence, can also be used with other ensemble selection approaches.[19,23,26] Importantly, we not only reproduce the FRET efficiencies used for the ensemble selection and cross-validate additional FRET efficiencies, but also reproduce their corresponding fluorescence lifetimes. Because the fluorescence lifetimes of a FRET sample also depend on the distance distribution between the two attached fluorophores[47] and have therefore frequently been used in the analysis of folded as well as intrinsically disordered proteins,[17,33,48,60−62] these results are particularly notable: by reproducing an independent data set, they testify to the predictive nature of our ensembles. We show that PREs and FRET efficiencies provide complementary intermediate- and long-range information on the conformational ensemble, and it is worth noting that the ensembles selected on the basis of chemical shifts, PREs, and FRET efficiencies also reproduce an independently measured SAXS curve.
This shows that these fundamentally different experimental techniques agree with each other, thereby also supporting recent advances[19,23] in resolving the long-standing controversy concerning the compaction of IDPs measured by smFRET and SAXS.[19−24] Apart from contributing distances much longer than those accessible by PREs, including smFRET in the calculation of conformational ensembles of IDPs or of proteins comprising intrinsically disordered regions has far-reaching consequences for the applicability of ensemble calculations. Because smFRET is limited neither by the size of the protein nor by any dynamic time scale sampled by the protein, FRET efficiencies can also be measured under conditions where NMR line broadening causes the signal to effectively disappear.[33,63] Furthermore, the low protein concentrations used in smFRET experiments (in the picomolar range) make aggregation-prone proteins accessible[54,64] and allow experiments to be performed in cells under physiological conditions.[10,11] Using FRET efficiencies for the calculation of conformational ensembles thus allows the conformational landscape of IDPs to be addressed under conditions that are not accessible by any other technique.

Conclusion

By using NMR, SAXS, and single-molecule FRET in an integrated manner to calculate multiconformational models that satisfy all data, we demonstrate how different experimental techniques can act synergistically to describe IDPs reliably, as shown here for the disordered domain of the measles virus phosphoprotein. Given the increasing awareness of the importance of IDPs, in particular in liquid–liquid phase separation,[65,66] we expect this tool-set to have an important impact on integrative multiconformational modeling of dynamic systems.

Materials and Methods

Accessible Volume Calculations

AV calculations were based on procedures described previously.[12,36] Briefly, the positions that each dye is expected to sample were calculated from a linker length L and three dye radii (R1, R2, R3; Table 1). Pairwise distances between the positions sampled by the donor and the acceptor fluorophore were then calculated with a coarsening step size of 200 applied to the position list. Distance histograms and average FRET efficiencies were compared over an ensemble of 200 conformers using step sizes of 10, 50, and 200. To calculate FRET efficiencies over large conformational ensembles, the speed of the AV calculation had to be optimized: the positions describing the accessible volumes were sampled iteratively, and 100, 500, 1000, and 10 000 iteration steps were tested for the reproducibility of distance histograms and average FRET efficiencies over a 100-conformer ensemble. Five hundred iterations yielded distance histograms accurate enough to reproduce FRET efficiencies reliably. To avoid "mutating" amino acids into cysteines in silico, AVs were calculated from the CB atom of the respective amino acid, and the linker length was optimized to account for the distance between the CB and SH atoms of a cysteine. This estimate (L = 22.83 Å) was based on geometrical considerations, and we verified on an ensemble that AVs calculated with CB as the attachment point reproduce the distance histograms and FRET efficiencies obtained from the same ensemble with SH as the attachment point (L = 21 Å). Scripts provided by Walczewska-Szewc et al.[36] were adapted to include these changes. Attachment points were read automatically from the PDB files of the conformational ensemble using in-house software and were then used to calculate AVs and distance histograms.
The parametrization for Alexa647 was adapted from Peter et al.[67] PDB files containing full side chains were used for the AV calculations. Per-conformer FRET files, containing the values for the different FRET label pairs used in the selection, were then generated as input for ASTEROIDS[27,28] selection.
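The AV step can be illustrated with a deliberately simplified Python/NumPy sketch. It is not the adapted Walczewska-Szewc et al. code: it represents the dye by a single sphere of radius `dye_radius` instead of the three-radius model of Table 1, ignores the linker path, and keeps uniformly drawn points within `linker_length` of the attachment atom that do not clash with any protein atom. The function names are hypothetical, and reading the coarsening step as "keep every step-th position" is our interpretation.

```python
import numpy as np


def sample_accessible_volume(attach, atoms, linker_length, dye_radius,
                             n_points=500, batch=500, max_batches=200, rng=None):
    """Hypothetical, simplified AV sampler (not the published AV algorithm).

    Draws points uniformly inside a sphere of radius `linker_length` (Å) around
    the attachment atom and keeps those at least `dye_radius` (Å) away from
    every protein atom. Linker path and three-radius dye shape are ignored
    (simplifying assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    kept, n_kept = [], 0
    for _ in range(max_batches):
        # Uniform sampling in a ball: isotropic direction, radius ~ L * U^(1/3)
        directions = rng.normal(size=(batch, 3))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        radii = linker_length * rng.random(batch) ** (1.0 / 3.0)
        points = attach + radii[:, None] * directions
        # Reject candidate dye positions that overlap with any protein atom
        dist = np.linalg.norm(points[:, None, :] - atoms[None, :, :], axis=2)
        ok = points[dist.min(axis=1) >= dye_radius]
        kept.append(ok)
        n_kept += len(ok)
        if n_kept >= n_points:
            break
    points = np.concatenate(kept)
    if len(points) < n_points:
        raise RuntimeError("accessible volume too small for the requested points")
    return points[:n_points]


def donor_acceptor_distances(donor_points, acceptor_points, step=1):
    """All donor-acceptor distances (Å), keeping every `step`-th AV position
    (our reading of the coarsening step)."""
    d = donor_points[::step]
    a = acceptor_points[::step]
    return np.linalg.norm(d[:, None, :] - a[None, :, :], axis=2).ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    atoms = rng.uniform(-15.0, 15.0, size=(200, 3))  # toy "protein" atoms (Å)
    donor = sample_accessible_volume(np.zeros(3), atoms, 22.83, 3.0, rng=rng)
    acceptor = sample_accessible_volume(np.array([40.0, 0.0, 0.0]), atoms, 22.83, 3.0, rng=rng)
    r = donor_acceptor_distances(donor, acceptor, step=10)
    print(r.size, r.mean())
```

Increasing `n_points` plays a role similar to using more sampling iterations: the distance histogram becomes smoother at a higher computational cost.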

Incorporation of FRET Efficiencies into Multiconformational Models

AVs were calculated per conformer as described above. Pairwise distances between the sampled volumes of the two fluorophores were calculated and converted into FRET efficiencies according to

$$\varepsilon(r) = \frac{1}{1 + (r/R_0)^6}$$

with the Förster distance R0 and the distance r between the sampled points in the AV. The average FRET efficiency of one conformer was then calculated as the average of ε over all n pairwise distances,

$$\langle E_{conf} \rangle = \frac{1}{n}\sum_{j=1}^{n} \varepsilon(r_j)$$

in accordance with a sampling of the AV on a time scale significantly longer than the fluorescence lifetime. The average FRET efficiency of the ensemble comprising all conformers (which is then compared to the measured EFRET) can be described as

$$\langle E_{ens} \rangle = \int P(r)\,\varepsilon(r)\,\mathrm{d}r$$

with ε(r) as defined above and P(r) the distance distribution containing all pairwise distances of the AVs of every conformer. Computationally, for an ensemble of m conformers, ⟨Eens⟩ can be calculated as

$$\langle E_{ens} \rangle = \frac{1}{m}\sum_{i=1}^{m} \langle E_{conf} \rangle_i$$

with ⟨Econf⟩i the average FRET efficiency of the ith member of the ensemble. For considerations assuming a sampling of the AV that is much faster than the fluorescence lifetime, the pairwise distances between fluorophore positions were first determined, raised to the inverse sixth power, and averaged per conformer,[68,69] and the FRET efficiency was calculated from this average on a conformer-by-conformer basis:

$$\langle E_{conf} \rangle_{fast} = \frac{1}{1 + \left(R_0^{6}\,\langle r^{-6} \rangle\right)^{-1}}$$

The R0 used in the calculations was determined experimentally. The quantum yield of P1–100 labeled with Alexa488 was determined by the comparative method[70] described previously, with fluorescein (in 0.1 M NaOH, Φ = 0.95, n = 1.334)[71] and Rhodamine 6G (in ethanol, Φ = 0.94, n = 1.361)[72] as quantum yield standards. The overlap integral J(λ) was determined from P1–100 samples labeled with Alexa488 and Alexa594.[46] Rapid orientational averaging of the fluorescent dyes was assumed, justifying the common approximation κ² = 2/3; fluorescence anisotropies measured on the different P1–100 samples suggested that this assumption was valid (SI Table 2).
R0 was then calculated according to[46]

$$R_0^6 = \frac{9000\,\ln(10)\,\kappa^2\,\Phi_D\,J(\lambda)}{128\,\pi^5\,n^4\,N_A}$$

with the Avogadro number NA, the overlap integral J(λ) (in M⁻¹ cm³, yielding R0 in cm), the orientation factor κ², the quantum yield of the donor in the absence of acceptor ΦD, and the refractive index n. A value of n = 1.3 was used for P1–100 in its measurement buffer. An R0 of 56 Å was obtained for the dye pair Alexa488/Alexa594 in 50 mM Na-phosphate pH 6, 150 mM NaCl, and 2 mM dithiothreitol (DTT). The same R0 was used to compute the in silico data set.
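The averaging schemes and the R0 expression described above can be checked with a short NumPy sketch. All function names are hypothetical; the fast-sampling branch assumes rate (r⁻⁶) averaging before conversion to an efficiency, the ensemble value assumes every conformer contributes the same number of position pairs, and `forster_distance` assumes J(λ) is supplied in M⁻¹ cm⁻¹ nm⁴ and converts it internally to M⁻¹ cm³. Apart from R0 = 56 Å, the inputs in the usage lines are placeholders.

```python
import math

import numpy as np

AVOGADRO = 6.02214076e23  # mol^-1


def fret_efficiency(r, r0):
    """Förster efficiency eps(r) = 1 / (1 + (r / R0)^6); r and R0 in Å."""
    r = np.asarray(r, dtype=float)
    return 1.0 / (1.0 + (r / r0) ** 6)


def conformer_efficiency_slow(distances, r0):
    """<E_conf> for AV sampling slower than the donor lifetime:
    the mean of eps over all n donor-acceptor position pairs."""
    return float(np.mean(fret_efficiency(distances, r0)))


def conformer_efficiency_fast(distances, r0):
    """<E_conf> for AV sampling faster than the donor lifetime:
    rates (proportional to r^-6) are averaged first, then converted."""
    d = np.asarray(distances, dtype=float)
    r_eff = np.mean(d ** -6.0) ** (-1.0 / 6.0)
    return float(fret_efficiency(r_eff, r0))


def ensemble_efficiency(distances_per_conformer, r0, fast=False):
    """<E_ens> = (1/m) * sum_i <E_conf>_i over m conformers."""
    per_conf = conformer_efficiency_fast if fast else conformer_efficiency_slow
    return float(np.mean([per_conf(d, r0) for d in distances_per_conformer]))


def forster_distance(j_nm4, phi_donor, n, kappa2=2.0 / 3.0):
    """R0 (Å) from R0^6 = 9000 ln(10) kappa^2 Phi_D J / (128 pi^5 n^4 N_A).

    Assumes J in M^-1 cm^-1 nm^4; 1 nm^4 = 1e-28 cm^4 gives J in M^-1 cm^3,
    for which the formula returns R0^6 in cm^6.
    """
    j_m_cm3 = j_nm4 * 1e-28
    r0_6 = (9000.0 * math.log(10.0) * kappa2 * phi_donor * j_m_cm3
            / (128.0 * math.pi ** 5 * n ** 4 * AVOGADRO))
    return r0_6 ** (1.0 / 6.0) * 1e8  # cm -> Å


if __name__ == "__main__":
    R0 = 56.0  # Å, Alexa488/Alexa594 value given in the text
    # Toy per-conformer distance lists (Å); real lists come from the AVs
    conformers = [np.array([40.0, 56.0, 70.0]), np.array([50.0, 60.0, 65.0])]
    print(ensemble_efficiency(conformers, R0), ensemble_efficiency(conformers, R0, fast=True))
    # Placeholder J, Phi_D, and n (not the measured P1-100 values)
    print(round(forster_distance(j_nm4=3.0e15, phi_donor=0.8, n=1.3), 1))
```

Because the efficiency is a concave function of the transfer rate, the fast-averaging value is never lower than the slow-averaging value for the same set of distances. In the units assumed here, the R0 expression reduces to the familiar R0 ≈ 0.211 (κ² n⁻⁴ ΦD J)^(1/6) Å.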

Generation of in Silico Ensemble

Flexible-meccano[29] was used to generate a large conformational ensemble (10 000 conformers) of a 155-amino-acid IDP. The centers of mass of the Cα atoms of residues 15–25 and of residues 90–100 were calculated, and all conformers in which the distance between these two centers of mass was less than 20 Å were selected. AVs of Alexa488 and Alexa594 were computed as described above for 15 in silico "labeling positions" (SI Figure 2), and FRET efficiencies for this ensemble comprising a long-range contact were calculated as described above and used either as input for ASTEROIDS or for cross-validation of ensembles selected using ASTEROIDS. Expected PREs for this ensemble were calculated as described elsewhere (labeling sites at residues 23, 65, 70, 92, and 130).[39] τc and τe were set to 5 and 0.5 ns, respectively. Here, τc = τrτs/(τr + τs) depends on the rotational correlation time of the protein (τr) and the effective electron relaxation time (τs), and τe = 1/(τi⁻¹ + τr⁻¹ + τs⁻¹) additionally depends on the effective correlation time of the spin label (τi), according to a model-free expression of the spectral density function.[38,39] The 1H R2 relaxation rate was assumed to be 18 s⁻¹ throughout the protein.
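The contact filter used to build the in silico ensemble can be sketched as follows, assuming the flexible-meccano conformers have already been parsed into a NumPy array of Cα coordinates (the parsing step and the function name are hypothetical). Because all Cα atoms have the same mass, the center of mass of each segment reduces to its centroid.

```python
import numpy as np


def select_contact_conformers(ca_xyz, residue_numbers, seg_a=(15, 25),
                              seg_b=(90, 100), cutoff=20.0):
    """Indices of conformers whose two segment centers are closer than `cutoff` (Å).

    ca_xyz: array (n_conformers, n_residues, 3) of Cα coordinates, assumed to be
    parsed beforehand from the flexible-meccano PDB files (hypothetical step).
    residue_numbers: array (n_residues,) of residue numbers.
    """
    res = np.asarray(residue_numbers)
    in_a = (res >= seg_a[0]) & (res <= seg_a[1])  # inclusive residue ranges
    in_b = (res >= seg_b[0]) & (res <= seg_b[1])
    # Equal Cα masses: the center of mass is the centroid of each segment
    com_a = ca_xyz[:, in_a, :].mean(axis=1)
    com_b = ca_xyz[:, in_b, :].mean(axis=1)
    dist = np.linalg.norm(com_a - com_b, axis=1)
    return np.flatnonzero(dist < cutoff), dist


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = rng.normal(scale=25.0, size=(100, 155, 3))  # toy coordinates, not a real ensemble
    idx, dist = select_contact_conformers(toy, np.arange(1, 156))
    print(len(idx), "of", len(dist), "conformers pass the 20 Å contact filter")
```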

Selection of Conformational Ensembles

Ensembles of 200 conformers were selected from a large statistical coil ensemble (10 000 conformers) generated with flexible-meccano,[29] using the genetic algorithm ASTEROIDS.[27] Selection based on PREs and chemical shifts was performed as described previously.[39] Selection based on FRET efficiencies assumed an error of 0.02 and was weighted at 50% of one NMR experiment (e.g., all PREs arising from one spin-labeling site). For P1–100, ensembles were first selected based on chemical shifts only, over four iterations of flexible-meccano/ASTEROIDS. A large conformational ensemble (10 000 conformers) was then calculated from the resulting Φ/Ψ angles, and subensembles were selected from it based on FRET efficiencies, PREs, and chemical shifts. FRET efficiencies not used in the selection were back-calculated as described above. SAXS curves were back-calculated using CRYSOL,[73] and chemical shifts were calculated using SPARTA.[74]
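To illustrate how FRET efficiencies with an error of 0.02 and a 50% weight can enter a single score together with NMR data, the sketch below uses a hypothetical weighted reduced χ² and a simple random-swap search. It is explicitly a stand-in, not the ASTEROIDS genetic algorithm, and the per-data-set normalization is our assumption. Predictions are averaged linearly over the selected conformers, which matches the definition of ⟨Eens⟩ for FRET efficiencies and is appropriate for chemical shifts.

```python
import numpy as np


def weighted_chi2(selection, predictions, experiments):
    """Hypothetical combined score: sum over data sets of
    weight * mean(((calc - exp) / sigma)^2).

    predictions: dict name -> array (n_pool, n_obs) back-calculated per conformer
    experiments: dict name -> (exp_values, sigma, weight)
    """
    score = 0.0
    for name, (exp, sigma, weight) in experiments.items():
        calc = predictions[name][selection].mean(axis=0)  # linear ensemble average
        score += weight * np.mean(((calc - exp) / sigma) ** 2)
    return float(score)


def select_ensemble(predictions, experiments, size, n_steps=5000, seed=0):
    """Random-swap search keeping a swap only if the score does not increase.

    A deliberately simple stand-in, NOT the ASTEROIDS genetic algorithm.
    """
    rng = np.random.default_rng(seed)
    n_pool = next(iter(predictions.values())).shape[0]
    selection = rng.choice(n_pool, size=size, replace=False)
    score = weighted_chi2(selection, predictions, experiments)
    for _ in range(n_steps):
        trial = selection.copy()
        outside = np.setdiff1d(np.arange(n_pool), trial)  # keeps members unique
        trial[rng.integers(size)] = rng.choice(outside)
        trial_score = weighted_chi2(trial, predictions, experiments)
        if trial_score <= score:
            selection, score = trial, trial_score
    return selection, score


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    preds = {"fret": rng.uniform(0.2, 0.8, (60, 3)), "shifts": rng.normal(0.0, 1.0, (60, 10))}
    target = np.arange(10)  # pretend the "true" ensemble is conformers 0-9
    exps = {"fret": (preds["fret"][target].mean(0), 0.02, 0.5),     # error 0.02, 50% weight
            "shifts": (preds["shifts"][target].mean(0), 0.3, 1.0)}  # hypothetical sigma
    sel, score = select_ensemble(preds, exps, size=10, n_steps=2000, seed=2)
    print(sorted(sel.tolist()), score)
```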

Back-Calculation of Fluorescence Lifetimes

Distance distributions between the donor and acceptor fluorophores were calculated from the selected conformational ensembles, and the corresponding fluorescence lifetime decays were calculated as[23]

$$I(t) = \mathrm{IRF}(t) \otimes \int P(r)\,\exp\!\left(-\frac{t}{\tau_{D(FRET)}(r)}\right)\mathrm{d}r$$

with the instrument response function (IRF) experimentally determined and described by a double Gaussian function, and the fluorescence lifetime of the donor in the presence of the acceptor, τD(FRET), calculated for every distance r according to

$$\tau_{D(FRET)}(r) = \frac{\tau_D}{1 + (R_0/r)^6}$$

The fluorescence lifetime of the donor in the absence of the acceptor (τD) and the Förster distance (R0) were experimentally determined. A scattering contribution was added to the fluorescence lifetime decays, and decay and scattering were scaled independently to best fit the experimental data.
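A discretized version of this decay model is sketched below, assuming a uniformly spaced time axis starting at zero, a distance histogram (distances with weights P(r)) taken from the ensemble, and a double-Gaussian IRF whose parametrization in `double_gaussian_irf` is hypothetical. The scattered light is modeled here as a scaled copy of the IRF (our assumption), and the fit of the two independent scale factors (`amp` and `scatter`) to the measured histogram is not shown.

```python
import numpy as np


def double_gaussian_irf(t, mu1, s1, mu2, s2, frac2):
    """Hypothetical double-Gaussian IRF on the time grid t, normalized to unit sum."""
    g = ((1.0 - frac2) * np.exp(-0.5 * ((t - mu1) / s1) ** 2)
         + frac2 * np.exp(-0.5 * ((t - mu2) / s2) ** 2))
    return g / g.sum()


def donor_decay(t, distances, weights, tau_d, r0, irf, amp=1.0, scatter=0.0):
    """Discretized I(t) = IRF (x) sum_r P(r) exp(-t / tau_DA(r)) + scatter * IRF.

    t: uniform time grid starting at 0 (ns); distances (Å) and weights form P(r);
    irf: IRF sampled on the same grid; amp and scatter are independent scales.
    """
    p = np.asarray(weights, float) / np.sum(weights)
    # Donor lifetime in the presence of the acceptor for each distance
    tau_da = tau_d / (1.0 + (r0 / np.asarray(distances, float)) ** 6)
    decay = (p[:, None] * np.exp(-t[None, :] / tau_da[:, None])).sum(axis=0)
    convolved = np.convolve(irf, decay)[: t.size]  # causal discrete convolution
    return amp * convolved + scatter * irf


if __name__ == "__main__":
    t = np.arange(0.0, 25.0, 0.016)  # ns
    irf = double_gaussian_irf(t, mu1=1.0, s1=0.1, mu2=1.3, s2=0.3, frac2=0.2)
    # Placeholder histogram, tau_D, and scale factors for illustration only
    model = donor_decay(t, [40.0, 55.0, 70.0], [1, 2, 1], tau_d=4.0, r0=56.0,
                        irf=irf, amp=1e4, scatter=50.0)
    print(model.max())
```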

Protein Production

P1–100 carrying an 8×His tag was expressed and purified as described earlier.[1,40] Briefly, a pET41c(+) plasmid containing P1–100 was transformed into Rosetta (λDE3)/pRARE (Novagen), and cultures were grown at 37 °C in lysogeny broth (LB) medium to an optical density (OD) above 0.6. Expression was induced with 1 mM isopropyl-β-d-thiogalactopyranoside and continued overnight at 20 °C. Cells were lysed by sonication in 20 mM Tris pH 8, 150 mM NaCl, and the protein was purified by standard Ni-affinity chromatography, eluting it from the Ni resin with lysis buffer containing 400 mM imidazole. The protein was then further purified on a Superdex 75 column (GE Healthcare) in 50 mM Na-phosphate pH 6, 150 mM NaCl, and 2 mM DTT. 15N-labeled protein was expressed following the same procedure, except that M9 minimal medium was used. All experiments were conducted in 50 mM Na-phosphate pH 6, 150 mM NaCl, and 2 mM DTT, except that buffers used for PRE experiments did not contain DTT.

Protein Labeling with Fluorophores or Spin Radical Labels

P1–100 was randomly labeled with Alexa488 and Alexa594 essentially as described previously.[8,75] Briefly, 20 mM DTT was added to the protein sample, which was then incubated overnight at 4 °C. The protein was dialyzed into degassed 50 mM Na-phosphate pH 7, 150 mM NaCl buffer until all DTT was removed. Alexa488 and Alexa594 were added simultaneously at an approximately 5-fold excess over protein. Labeling was allowed to proceed for 30 min at room temperature, followed by overnight incubation at 4 °C. The labeled protein was then separated from excess dye by size exclusion chromatography on an Enrich SEC70 column (Bio-Rad) in 50 mM Na-phosphate pH 6, 150 mM NaCl, and 2 mM DTT. Single-cysteine mutants of 15N P1–100 for PREs were labeled with S-(1-oxyl-2,2,5,5-tetramethyl-2,5-dihydro-1H-pyrrol-3-yl)methyl methanesulfonothioate (MTSL) following essentially the same procedure as for fluorescence labeling, except that the final buffer used for size exclusion chromatography did not contain DTT.

Experimental NMR Data

All NMR experiments were performed at 19 °C. The assignment of P1–100 was obtained previously[1] and used as input for ASTEROIDS[39] selection as well as for the calculation of secondary chemical shifts and secondary structure propensities (SSPs).[76] To measure PREs, HSQC spectra of the different Cys mutants of 15N P1–100 were recorded in the presence and absence of the MTSL label. Spectra were processed with NMRPipe,[77] peak intensities were extracted from the respective spectra, and the ratio of the peak intensities in the MTSL-labeled and unlabeled samples was determined and used as input for ASTEROIDS.
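The intensity-ratio step is simple to express in code. The second function adds a commonly used relation between R2,PRE and the HSQC intensity ratio, I_para/I_dia = R2 exp(−R2,PRE t)/(R2 + R2,PRE) (Battiste and Wagner type); whether this exact conversion was used here is an assumption. The default R2 = 18 s⁻¹ is taken from the in silico section, and `t_delay` is a hypothetical placeholder that depends on the pulse sequence.

```python
import numpy as np


def intensity_ratios(i_para, i_dia):
    """Per-residue ratio of peak intensities with (paramagnetic) and without
    (diamagnetic) MTSL; NaN where the reference peak is absent."""
    i_para = np.asarray(i_para, float)
    i_dia = np.asarray(i_dia, float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(i_dia > 0, i_para / i_dia, np.nan)


def predicted_ratio(r2_pre, r2=18.0, t_delay=0.010):
    """I_para / I_dia = R2 exp(-R2,PRE t) / (R2 + R2,PRE)  (assumed relation).

    r2 (s^-1) defaults to the 18 s^-1 used for the in silico data; t_delay (s)
    is a hypothetical placeholder for the HSQC transfer delay.
    """
    r2_pre = np.asarray(r2_pre, float)
    return r2 * np.exp(-r2_pre * t_delay) / (r2 + r2_pre)


if __name__ == "__main__":
    print(intensity_ratios([0.9, 0.2, 0.0], [1.0, 1.0, 1.0]))  # toy peak intensities
    print(predicted_ratio([0.0, 5.0, 50.0]))                    # toy R2,PRE values (s^-1)
```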

Experimental Single-Molecule FRET data

Single-molecule fluorescence was measured on a custom setup built around an Olympus IX73 microscope equipped with a 60× water immersion objective (NA 1.2). Light from a pulsed laser diode (40 MHz, LDH 485, PicoQuant, Berlin, Germany) was passed through a λ/4 plate and focused into the sample to excite freely diffusing P1–100 molecules with circularly polarized light. Fluorescence emission was spatially filtered through a 100 μm pinhole, separated into green (Alexa488) and orange (Alexa594) fluorescence, and focused onto two PMA Hybrid detectors (PicoQuant). Photons were counted using a HydraHarp (PicoQuant). smFRET experiments were performed at room temperature. FRET histograms were calculated using custom Python code. Lists of photon arrival times were first extracted using code written in C, adapted from demo code provided by PicoQuant.[78] Photon streams were then binned with a 1 ms bin width and subjected to a Lee filter before burst integration and thresholding,[79] using a threshold of at least 50 photons per burst. Fluorescence intensities were corrected for background, spectral crosstalk, differences in quantum yield (determined as described previously[70]), and differences in detection efficiency between the green and orange channels. Microtimes were extracted separately for bursts corresponding to the FRET peak and to the 0-FRET peak, and population-averaged lifetime histograms were built. The instrument response function was measured on buffer under the same conditions as the single-molecule experiments, and donor lifetimes were extracted by fitting the lifetime histograms of the 0-FRET population with a single-exponential function convolved with the IRF.
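The binning, Lee filtering, and burst thresholding steps can be sketched as follows. This is a minimal NumPy illustration, not the custom code used here: the Lee filter takes the classic form x_f = m + k(x − m) with k = var/(var + σ0²) over a sliding window, and the window half-width, σ0, and the per-bin threshold `bin_threshold` that delimits a burst are hypothetical choices. Only the 1 ms bin width and the 50-photon burst threshold come from the text.

```python
import numpy as np


def bin_photons(arrival_times_s, bin_width=1e-3):
    """Photon counts per bin (1 ms bins as in the text) from arrival times in s."""
    idx = np.floor(np.asarray(arrival_times_s, float) / bin_width).astype(int)
    return np.bincount(idx)


def lee_filter(x, half_window=2, sigma0=5.0):
    """Lee filter x_f = m + k (x - m), k = var / (var + sigma0^2), with local mean m
    and variance var over 2*half_window+1 bins (hypothetical parameters)."""
    x = np.asarray(x, float)
    kernel = np.ones(2 * half_window + 1) / (2 * half_window + 1)
    m = np.convolve(x, kernel, mode="same")
    var = np.clip(np.convolve(x ** 2, kernel, mode="same") - m ** 2, 0.0, None)
    k = var / (var + sigma0 ** 2)
    return m + k * (x - m)


def find_bursts(green, orange, bin_threshold=10.0, min_photons=50, **lee_kwargs):
    """(donor, acceptor) photon sums of bursts.

    A burst is a run of consecutive bins whose Lee-filtered total count exceeds
    `bin_threshold` (hypothetical); raw counts are summed over the run, and bursts
    with fewer than `min_photons` photons (50 in the text) are discarded.
    """
    green = np.asarray(green)
    orange = np.asarray(orange)
    above = lee_filter(green + orange, **lee_kwargs) > bin_threshold
    # Pad with zeros so every run has a rising (+1) and a falling (-1) edge
    edges = np.diff(np.concatenate(([0], above.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)  # exclusive end index
    bursts = []
    for s, e in zip(starts, ends):
        n_d, n_a = int(green[s:e].sum()), int(orange[s:e].sum())
        if n_d + n_a >= min_photons:
            bursts.append((n_d, n_a))
    return bursts
```

In practice, each detector channel would be binned separately with `bin_photons` on a common time grid before calling `find_bursts`.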

Corrections Employed in the smFRET Experiments

Buffer background was measured under the same conditions as the single-molecule experiments; bin-wise background contributions were determined for the donor and acceptor channels and subtracted from the bin-wise photon counts of the single-molecule FRET experiments. Differences in detection efficiencies and quantum yields were included in the correction factor γ:

$$\gamma = \gamma_{instrument}\,\frac{\Phi_{Ac}}{\Phi_{Do}}, \qquad \gamma_{instrument} = \frac{\eta_{Ac}}{\eta_{Do}}$$

with γinstrument the ratio of the detection efficiencies of the instrument for acceptor (ηAc) and donor (ηDo) signal, determined as described in Ferreon et al., 2009.[80] Briefly, fluorescence of free donor and acceptor dyes in the measurement buffer was measured on an ensemble fluorescence spectrometer (PTI QuantaMaster) and on the single-molecule fluorescence setup at the same excitation wavelength. Ensemble fluorescence spectra were corrected for wavelength-dependent detection differences, and the total signal was extrapolated to the full emission spectrum. Plots of the integrated ensemble fluorescence versus the fluorescence recorded on the single-molecule setup were fitted with a straight line, independently for donor and acceptor. The ratio of the slopes (mAc/mDo) gave γinstrument, which is 0.81 for the Alexa488/Alexa594 dye pair and 0.83 for the Alexa488/Alexa647 dye pair in 50 mM Na-phosphate pH 6, 150 mM NaCl, and 2 mM DTT. The spectral properties of fluorescently labeled P1–100 matched those of the free dyes in the same buffer. γinstrument was corrected daily based on a short measurement of Rhodamine 6G,[46] which emits into both the donor and acceptor channels of the smFRET setup. Fluorescence quantum yields of the donor (ΦDo) and acceptor (ΦAc) were determined from singly labeled P1–100 in the measurement buffer using the comparative method of Williams et al., as described above.[70] Rhodamine 101 (in ethanol, Φ = 1.0, n = 1.36)[81] was used as a quantum yield standard for Alexa594-labeled proteins (see section Incorporation of FRET Efficiencies into Multiconformational Models for the standards used for Alexa488-labeled proteins). For Alexa647-labeled proteins (SI Figure 12), cresyl violet (in ethanol rather than methanol, Φ = 0.54, n = 1.33)[82] was added as an additional quantum yield standard. A refractive index of n = 1.3 was used for all P1–100 samples. The quantum yields determined for the different P1–100 single-cysteine constructs were very similar, and their averages were therefore used both for the γ correction and for the calculation of the Förster distance (R0). Leakage was determined from the measurements undertaken for the γ correction, as the ratio of donor fluorescence detected in the acceptor channel to that detected in the donor channel of the smFRET setup. These values were corrected daily using the Rhodamine 6G calibration measurement and validated by ensuring that the donor-only peak of the single-molecule FRET histograms was located at a FRET efficiency of 0. To estimate the contribution of direct acceptor excitation, an IDP sample labeled with Alexa488 and Alexa594 separated by 164 amino acids, which is not expected to yield EFRET > 0, was prepared.[46] Although we cannot entirely exclude residual FRET in this sample, under this assumption the contribution of direct excitation was tentatively estimated at 0.2 photons per 1 ms bin. Because applying this correction changes EFRET by only about 0.01 to 0.03, we decided not to apply it.
This remains true if the ratio of the donor and acceptor extinction coefficients at the excitation wavelength is used to correct for direct excitation.[83,84] To test the validity of this approximation, the DNA sample "4-mid" labeled with Atto488/Atto594 and used in Hellenkamp et al., 2018,[45] was measured and corrected using the same procedure (SI Figure 12B; γ was determined independently, and quantum yields and R0 were taken from ATTO-TEC[85]), yielding EFRET = 0.39 compared with 0.41 ± 0.04 reported by Hellenkamp et al.[45]
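Putting these corrections together, a burst-wise corrected FRET efficiency can be sketched as below. The order of operations (background first, then leakage, then γ) follows the standard scheme and is our assumption about the implementation; direct excitation is deliberately left uncorrected, as decided above. Only γinstrument = 0.81 in the usage lines comes from the text; quantum yields, leakage, background, and burst counts are placeholders.

```python
import numpy as np


def gamma_factor(gamma_instrument, phi_acceptor, phi_donor):
    """gamma = gamma_instrument * Phi_Ac / Phi_Do, with gamma_instrument = eta_Ac / eta_Do."""
    return gamma_instrument * phi_acceptor / phi_donor


def corrected_efficiency(n_d, n_a, bg_d, bg_a, leakage, gamma):
    """Burst-wise E = n_A' / (n_A' + gamma * n_D').

    bg_d, bg_a: background photons expected within the burst (bin-wise background
    summed over the burst's bins). n_D' = n_D - bg_D and
    n_A' = n_A - bg_A - leakage * n_D' (donor crosstalk into the acceptor channel).
    Direct acceptor excitation is not corrected, following the text.
    """
    n_d = np.asarray(n_d, float) - bg_d
    n_a = np.asarray(n_a, float) - bg_a - leakage * n_d
    return n_a / (n_a + gamma * n_d)


if __name__ == "__main__":
    g = gamma_factor(0.81, phi_acceptor=0.66, phi_donor=0.92)  # placeholder quantum yields
    print(corrected_efficiency([120, 80], [60, 95], bg_d=2.0, bg_a=1.5, leakage=0.05, gamma=g))
```

A donor-only burst whose acceptor counts consist of background plus leakage maps to E = 0, which is the same check used above to validate the leakage values.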

Experimental SAXS Data

SAXS data were collected for five concentrations of P1–100 between 0.5 and 2 mg/mL at 20 °C on beamline BM29 at the European Synchrotron Radiation Facility (ESRF), Grenoble, France. Scattering was measured at a wavelength of 0.992 Å, with each sample exposed for 10 frames. Frames not affected by radiation damage were averaged, and buffer scattering curves were subtracted from the scattering curves of P1–100.
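Frame averaging and buffer subtraction can be sketched as follows. The rule used here to flag radiation damage (reduced χ² of each frame against the first frame, stopping at the first inconsistent frame because damage accumulates with exposure) and its cutoff are hypothetical; beamline processing pipelines apply their own criteria.

```python
import numpy as np


def average_undamaged_frames(frames, errors, chi2_cutoff=1.5):
    """Average the leading frames that agree with frame 0 (hypothetical damage test).

    Each frame's reduced chi^2 against the first frame is computed; averaging
    stops at the first frame exceeding `chi2_cutoff`, since radiation damage
    accumulates with exposure.
    """
    frames = np.asarray(frames, float)
    errors = np.asarray(errors, float)
    keep = [0]
    for i in range(1, len(frames)):
        chi2 = np.mean((frames[i] - frames[0]) ** 2 / (errors[i] ** 2 + errors[0] ** 2))
        if chi2 > chi2_cutoff:
            break
        keep.append(i)
    mean = frames[keep].mean(axis=0)
    err = np.sqrt((errors[keep] ** 2).sum(axis=0)) / len(keep)  # error of the mean
    return mean, err, keep


def subtract_buffer(sample, sample_err, buffer, buffer_err):
    """Buffer-subtracted intensity with errors combined in quadrature."""
    return sample - buffer, np.sqrt(sample_err ** 2 + buffer_err ** 2)
```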

Theoretical Comparison between FRET and PRE Rates

Figure A–C were generated assuming a static distance for both the FRET rate (kET) and the PRE rate (R2,PRE). The FRET rate

$$k_{ET} = \frac{1}{\tau_D}\left(\frac{R_0}{r}\right)^6$$

was calculated with a Förster distance (R0) of 56 Å and a donor fluorescence lifetime (τD) of 4 ns, where r is the distance between the donor and acceptor fluorophores.[46] EFRET was calculated from kET as

$$E_{FRET} = \frac{k_{ET}}{k_{ET} + \tau_D^{-1}}$$

The PRE transverse relaxation rate (R2,PRE) was calculated (in Gaussian units) according to

$$R_{2,PRE} = \frac{1}{15}\,\frac{s_e(s_e+1)\,\gamma_H^2\,g_e^2\,\mu_B^2}{r^6}\left[4J(0) + 3J(\omega_H)\right]$$

with the electron g-factor ge, the gyromagnetic ratio of the observed proton γH, the electron spin se, the Bohr magneton μB, and the proton frequency ωH. A spectral density function of

$$J(\omega) = \frac{\tau_c}{1 + \omega^2\tau_c^2}$$

was used, with τc = τrτs/(τr + τs), where τr is the rotational correlation time of the protein and τs the effective electron relaxation time. τc was set to 5 ns for Figure C. Here, r is the distance between the 1HN nucleus and the PRE label.[38,39] Note that for the calculation of PREs in the context of a conformational ensemble of an IDP, a model-free expression of the spectral density function was used to describe the internal motion of the IDP as well as the motion of the spin label:

$$J(\omega) = S^2\,\frac{\tau_c}{1 + \omega^2\tau_c^2} + \left(1 - S^2\right)\frac{\tau_e}{1 + \omega^2\tau_e^2}$$

The order parameter S² describes the motion of the dipolar interaction vector, τc is as described above, and τe = 1/(τi⁻¹ + τr⁻¹ + τs⁻¹) additionally depends on the effective correlation time of the spin label (τi).[39]
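The static-distance comparison between FRET and PRE can be reproduced with the sketch below. It evaluates the kET, EFRET, and R2,PRE expressions described above in Gaussian (cgs) units, where the prefactor se(se+1)γH²ge²μB²/15 evaluates to about 1.23 × 10⁻³² cm⁶ s⁻². The 600 MHz proton frequency used as the default for ωH is a hypothetical choice, since the field strength is not stated here.

```python
import numpy as np

# Gaussian (cgs) constants
GAMMA_H = 2.6752e4   # proton gyromagnetic ratio, rad s^-1 G^-1
G_E = 2.0023         # free-electron g-factor
MU_B = 9.2740e-21    # Bohr magneton, erg G^-1
S_E = 0.5            # electron spin of the nitroxide label


def k_et(r, r0=56.0, tau_d=4.0):
    """FRET rate k_ET = (1/tau_D) (R0/r)^6 in ns^-1 (r, R0 in Å; tau_D in ns)."""
    return (r0 / np.asarray(r, float)) ** 6 / tau_d


def e_fret(r, r0=56.0, tau_d=4.0):
    """E_FRET = k_ET / (k_ET + 1/tau_D)."""
    k = k_et(r, r0, tau_d)
    return k / (k + 1.0 / tau_d)


def j_rigid(omega, tau):
    """Lorentzian spectral density tau / (1 + omega^2 tau^2)."""
    return tau / (1.0 + (omega * tau) ** 2)


def j_model_free(omega, s2, tau_c, tau_e):
    """Model-free J(omega) = S^2 J(omega; tau_c) + (1 - S^2) J(omega; tau_e)."""
    return s2 * j_rigid(omega, tau_c) + (1.0 - s2) * j_rigid(omega, tau_e)


def pre_prefactor():
    """s_e(s_e+1) gamma_H^2 g_e^2 mu_B^2 / 15 in cm^6 s^-2."""
    return S_E * (S_E + 1.0) * GAMMA_H ** 2 * G_E ** 2 * MU_B ** 2 / 15.0


def r2_pre(r, tau_c=5e-9, omega_h=2 * np.pi * 600e6):
    """Static-distance R2,PRE in s^-1 (r in Å); 600 MHz is a hypothetical field."""
    r_cm = np.asarray(r, float) * 1e-8
    return pre_prefactor() / r_cm ** 6 * (4 * j_rigid(0.0, tau_c)
                                          + 3 * j_rigid(omega_h, tau_c))


if __name__ == "__main__":
    r = np.array([15.0, 25.0, 40.0, 60.0, 90.0])  # Å
    print(np.round(e_fret(r), 3))
    print(np.round(r2_pre(r), 2))
```

Printing both rows side by side shows the design point of the comparison: both observables fall off as r⁻⁶, but R0 = 56 Å places the sensitive FRET range at much longer distances than the PRE range.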

Table 1. Accessible volume parameters (in Å) used for each dye

Parameter	Alexa488	Alexa594	Alexa647
L	22.83	22.83	22.83
R₁	6.8	7.6	11
R₂	3.9	4.1	4.7
R₃	1.5	2.2	1.5
