A large range of debilitating medical conditions is linked to protein misfolding, which may compete with productive folding particularly in proteins containing multiple domains. Seventy-five per cent of the eukaryotic proteome consists of multidomain proteins, yet it is not understood how interdomain misfolding is avoided. It has been proposed that maintaining low sequence identity between covalently linked domains is a mechanism to avoid misfolding. Here we use single-molecule Förster resonance energy transfer to detect and quantify rare misfolding events in tandem immunoglobulin domains from the I band of titin under native conditions. About 5.5 per cent of molecules with identical domains misfold during refolding in vitro and form an unexpectedly stable state with an unfolding half-time of several days. Tandem arrays of immunoglobulin-like domains in humans show significantly lower sequence identity between neighbouring domains than between non-adjacent domains. In particular, the sequence identity of neighbouring domains has been found to be preferentially below 40 per cent. We observe no misfolding for a tandem of naturally neighbouring domains with low sequence identity (24 per cent), whereas misfolding occurs between domains that are 42 per cent identical. Coarse-grained molecular simulations predict the formation of domain-swapped structures that are in excellent agreement with the observed transfer efficiency of the misfolded species. We infer that the interactions underlying misfolding are very specific and result in a sequence-specific domain-swapping mechanism. Diversifying the sequence between neighbouring domains seems to be a successful evolutionary strategy to avoid misfolding in multidomain proteins.
Multidomain proteins comprise covalently linked, frequently similar domains, resulting in a high effective local protein concentration. It is therefore likely that these proteins have evolved to avoid interdomain misfolding in vivo. While co-translational, “domain-by-domain” folding is assumed to be important for avoiding misfolding[6], many proteins that are long-lived or subject to tensile forces fold and unfold numerous times during their lifetime and may thus be particularly vulnerable to misfolding. The giant muscle protein titin, for example, undergoes reversible domain unfolding, which may play a role in muscle elasticity[7].
Single-molecule techniques are ideal for detecting rare events[8] such as misfolding under native conditions. Indeed, the first evidence for misfolding of adjacent domains in long tandem arrays of the well-characterised 27th domain from the I band of titin, I27, was obtained using single-molecule atomic force microscopy (AFM)[9]. An alternative approach is single-molecule FRET[4,5], whose high sensitivity allows the detection of very small populations. FRET enables the mapping of intramolecular distances through the distance-dependent efficiency of excitation energy transfer between a donor and an acceptor fluorophore attached to specific positions of the protein[10] (for details, see Supplementary Fig. 1).
We hypothesised that denaturation of tandem constructs of titin domains with guanidinium chloride (GdmCl), followed by rapid refolding under native conditions, might allow formation of misfolded species, which should be detectable using single-molecule FRET[11]. We labelled a tandem construct of I27 (I27-I27) with a donor (Alexa Fluor 488) and an acceptor (Alexa Fluor 594) fluorophore via cysteine residues engineered on the protein surface, in the A-strand of domain 1 (E3C) and the G-strand of domain 2 (N83C) (Fig. 1a).
For a misfolded domain formed by strands from domains 1 and 2 to have the mechanical properties observed in the previous AFM experiments[9], we predicted that the labelled A-strand of domain 1 and G-strand of domain 2 must be adjacent. The correctly folded tandem would then show low transfer efficiency, whereas a misfolded state would show high transfer efficiency. A monomer of I27 labelled at the corresponding positions provides a model for the misfolded state (Fig. 1b). Labelling had little effect on the stability of I27 (Supplementary Fig. 2), and doubly labelled proteins that had never been unfolded in GdmCl (‘never-unfolded’) show single, correctly folded populations with transfer efficiencies (E) of 0.37 (±0.01) and 0.93 (±0.01) for the tandem I27-I27 and the monomer I27, respectively (Fig. 2a,b).
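The contrast between E = 0.37 and E = 0.93 follows from the steep distance dependence of FRET. As a minimal sketch, assuming the idealised Förster relation E = 1/(1 + (r/R0)^6) with a fixed Förster radius R0 and ignoring dye-linker dynamics and orientation effects, the efficiencies can be converted into donor-acceptor distances expressed in units of R0 (no value of R0 is given in this section). The function names are hypothetical.

```python
"""Sketch of the idealised Förster relation E = 1 / (1 + (r / R0)**6).

Assumption: static point dipoles with a fixed Förster radius R0; dye linker
dynamics, orientation factor and detection corrections are ignored.
"""


def efficiency_from_distance(r: float, r0: float) -> float:
    """Transfer efficiency for donor-acceptor distance r (same units as r0)."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_ratio_from_efficiency(e: float) -> float:
    """Invert the Förster relation, returning r / R0 for 0 < e < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return (1.0 / e - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    # Mean efficiencies quoted in the text for the native tandem and the monomer model.
    for label, e in [("I27-I27 native", 0.37), ("I27 monomer", 0.93)]:
        print(f"{label}: E = {e:.2f} -> r/R0 = {distance_ratio_from_efficiency(e):.2f}")
```

Under these assumptions the native tandem corresponds to r ≈ 1.09 R0 and the monomer model to r ≈ 0.65 R0, so a shift of the labelled strands by well under one Förster radius moves E across most of its range.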
Figure 1
Structures of Native and Misfolded I27 Constructs. (a) Natively folded I27-I27 tandem repeat with labelling positions highlighted (golden spheres). (b) Native I27 crystal structure (1tit.pdb) with labelling positions corresponding to those expected for the misfolded state of I27-I27. (c) One of the domain-swapped misfolded state structures formed in Gō-model simulations. (d) Schematic of this misfolded state topology: hydrogen bonds that are perpendicular to the direction of applied force in AFM mechanical unfolding are shown by dashed lines (circled). Four other misfolded state topologies were populated in the simulations (Supplementary Fig. 5b). We note that we cannot distinguish between such topologies from the results presented here.
Figure 2
Transfer Efficiency Histograms of Doubly-Labelled I27 Constructs. (a) ‘Never-unfolded’ I27-I27. (b) ‘Never-unfolded’ monomeric I27. (c) Refolded I27-I27. (d) Refolded I27-I27-I27; fits of individual populations are shown as coloured lines for clarity. Histograms are fitted with normal or log-normal distributions. The peak in the grey shaded area consists of events from molecules without an active acceptor fluorophore[28]. Note that in these experiments a short four-amino-acid linker (Arg-Ser-Glu-Leu) is included between the domains in the I27-I27 tandem to allow direct comparison with previous AFM and aggregation experiments[3,9]. ‘Never-unfolded’ I27-I27-I27 is shown in Supplementary Fig. 6.
We conducted refolding experiments by diluting unfolded I27-I27 into refolding buffer. The resulting transfer efficiency histograms exhibited two populations (Fig. 2c): one corresponding to the correctly folded native state (E = 0.37), and one with precisely the same transfer efficiency as the analogously labelled monomer (E = 0.93). This observation shows that the A-strand of the first domain and the G-strand of the second domain are arranged as in the monomer. A quantitative analysis (Supplementary Table 1) showed that 5.5(±0.2)% of the molecules are in the misfolded form.
Based on the results of the AFM studies[9], we had supposed that the misfolded species consisted of a single strand-swapped titin domain with the remaining sequence unstructured, and thus with an unfolding time similar to that of a native domain (τ ≈ 34 minutes[12]). We therefore investigated the unfolding kinetics of the misfolded state[13] (Fig. 3a). At high GdmCl concentrations, the decay in the number of high-transfer-efficiency events (E > 0.8), corresponding to the misfolded state, is fitted well by a single exponential (Fig. 3b), with rate constants slightly higher than the unfolding rate constant of I27wt determined in ensemble measurements[12] (Fig. 3c). We can also estimate the unfolding rate constant of the correctly folded species, which agrees well with the ensemble data (Fig. 3c; see Supplementary Fig. 3 and Supplementary Information). In the absence of denaturant, however, the misfolded species was surprisingly long-lived, converting to the correctly folded form only on a time scale of days (Supplementary Fig. 4). The formation of the misfolded structure is thus under kinetic, rather than thermodynamic, control. Its remarkable kinetic stability clearly distinguishes the misfolded species described here from short-lived, partially folded intermediates[14-16] that are sometimes termed “misfolded” because they contain some non-native interactions[17].
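The misfolded fractions quoted above come from fitting the transfer efficiency histograms with normal or log-normal distributions (Fig. 2 caption). As an illustration of how a small high-E population can be quantified, the sketch below swaps in a different but related technique: a two-component Gaussian mixture fitted by expectation-maximisation directly to per-burst efficiencies. It is not the authors' procedure; it assumes donor-only bursts (the grey-shaded peak in Fig. 2) have already been removed, and the function name and starting values are hypothetical.

```python
import numpy as np


def two_gaussian_em(e, mu_init=(0.4, 0.9), sigma_init=0.05, n_iter=200):
    """Fit a two-component Gaussian mixture to burst efficiencies by EM.

    e: transfer efficiencies of individual bursts (donor-only bursts removed).
    Returns (weights, means, sigmas); weights[1] is the estimated fraction of
    bursts in the high-E (misfolded-like) population.
    """
    e = np.asarray(e, dtype=float)[:, None]  # shape (N, 1) for broadcasting
    w = np.array([0.5, 0.5])
    mu = np.asarray(mu_init, dtype=float)
    sd = np.full(2, float(sigma_init))
    for _ in range(n_iter):
        # E-step: posterior probability that each burst belongs to each component.
        dens = w * np.exp(-0.5 * ((e - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and widths from the responsibilities.
        nk = resp.sum(axis=0)
        w = nk / resp.shape[0]
        mu = (resp * e).sum(axis=0) / nk
        sd = np.sqrt((resp * (e - mu) ** 2).sum(axis=0) / nk)
    return w, mu, sd
```

The starting means are placed near the two peaks expected from the never-unfolded controls. A log-normal component, as used for some histograms here, would need a different M-step, and fitting binned histograms rather than individual bursts gives similar fractions only when the populations are well separated.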
Figure 3
Unfolding Kinetics. (a) Evolution of transfer efficiency histograms over time (E ≥ 0.7) from single-molecule double-jump experiments in which refolded/misfolded I27-I27 (doubly labelled) was unfolded in 3.5 M GdmCl. Histograms were constructed for a moving window of 120 s that was shifted by 30 s for each increment (inset, colour key). (b) The number of events with E > 0.8 in each histogram in (a) was summed, and the resulting kinetics were fitted with a single exponential decay. The rate constants are unaffected by different window sizes or the use of non-overlapping windows. (c) Unfolding rate constants for the I27wt monomer (black; ensemble data from [12])*, and for the misfolded and natively folded states of I27-I27 from single-molecule measurements (red and blue, respectively). The error bars represent the standard error of the fit (see Methods). For some data points, the error bars are smaller than the symbols. *I27 domains have the same unfolding rate constants in tandem repeat proteins as in isolated domains.
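The windowed analysis described in the Fig. 3 caption can be sketched as follows: count bursts with E > 0.8 in 120 s windows advanced in 30 s steps, then fit the counts with a single exponential decay. This is a minimal sketch under stated assumptions: burst times are measured from the jump into denaturant, windows are half-open, each count is assigned to its window centre, and the fit is a log-linear least-squares fit without baseline or Poisson weighting (the fitting procedure actually used is in Methods and may differ, e.g. a weighted nonlinear fit). Function names are hypothetical.

```python
import numpy as np


def windowed_high_e_counts(times, effs, window=120.0, step=30.0,
                           threshold=0.8, t_end=None):
    """Count bursts with E > threshold in overlapping time windows.

    times: burst times (s) after the jump into denaturant; effs: their E values.
    Windows [start, start + window) are advanced by `step` up to t_end.
    Returns (window_centres, counts).
    """
    times = np.asarray(times, dtype=float)
    effs = np.asarray(effs, dtype=float)
    if t_end is None:
        t_end = times.max()
    n_windows = int(np.floor((t_end - window) / step)) + 1
    if n_windows < 1:
        raise ValueError("recording is shorter than one window")
    starts = step * np.arange(n_windows)
    high = times[effs > threshold]  # bursts attributed to the misfolded state
    counts = np.array([np.count_nonzero((high >= s) & (high < s + window))
                       for s in starts])
    return starts + window / 2.0, counts


def fit_single_exponential(t, counts):
    """Log-linear least-squares fit of counts = A * exp(-k * t); returns (A, k).

    Windows with zero counts are dropped because log(0) is undefined.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(counts, dtype=float)
    keep = y > 0
    slope, intercept = np.polyfit(t[keep], np.log(y[keep]), 1)
    return float(np.exp(intercept)), float(-slope)
```

The half-time follows from the fitted rate constant as ln 2 / k. Because overlapping windows share bursts, neighbouring points are correlated, which is one reason to check, as in the caption, that non-overlapping windows give the same rate constant.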
An explanation for the slow unfolding under native conditions is suggested by folding simulations of I27-I27 with a Gō-like model[18]. In these simulations, only native interactions are attractive, and the interaction between a given pair of residues is the same regardless of whether the residues are in the same or in different domains. While most trajectories result in two correctly folded domains, misfolded species with two fully folded, strand-swapped domains are occasionally formed. Five different strand-swapped topologies were observed (Fig. 1c,d and Supplementary Fig. 5). Such an extensively misfolded structure explains the persistence of the misfolded state: correct folding cannot occur while either misfolded domain remains folded. Since refolding rate constants are much higher than unfolding rate constants under native conditions, simultaneous unfolding of both domains is very unlikely, and conversion to the native state is extremely slow[19].
We tested the model further by investigating the refolding of a three-domain tandem of I27 with the FRET labels in domain 1 (E3C) and domain 3 (N83C). If domain-swapped structures were formed, we would expect two misfolded populations: one with the FRET efficiency of the monomer (misfolding between domains 1 and 3) and another with the efficiency of the I27-I27 tandem (misfolding between domains 1 and 2 or 2 and 3). This is precisely what we observe (Fig. 2d). The proportion of monomer-like (high-FRET) species in the three-domain tandem (2.8(±0.6)%) is significantly lower than the misfolded fraction in the two-domain tandem; this is likely to reflect the lower probability of association between domains that are more distant in sequence. The population of the misfolded species with dimer-like FRET efficiency (9(±2)%) was instead almost twice as high as in the two-domain tandem (5.5(±0.2)%); this is probably due to the two alternative ways to misfold analogously to the two-domain tandem (domain 1 with 2 and domain 2 with 3).
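The argument that conversion to the native state requires both strand-swapped domains to be unfolded at the same time can be made semi-quantitative with a toy kinetic model. As a minimal sketch, assume the two misfolded domains unfold (rate constant ku) and refold (rate constant kf) independently as two-state units, and that conversion proceeds once both are unfolded. The mean first-passage time from the both-folded state FF to the both-unfolded state UU then has a closed form, (kf + 3ku)/(2ku²). The model, function name and parameter values are illustrative assumptions, not a fit to the data.

```python
import numpy as np


def mfpt_both_unfolded(ku: float, kf: float) -> float:
    """Mean first-passage time from FF to UU for two independent two-state domains.

    Hypothetical scheme: FF -> UF at 2*ku (either domain unfolds);
    from UF, the unfolded domain refolds (UF -> FF) at kf, or the other
    domain also unfolds (UF -> UU) at ku. Solves
        T_FF = 1/(2 ku) + T_UF
        T_UF = 1/(ku + kf) + kf/(ku + kf) * T_FF
    which gives T_FF = (kf + 3 ku) / (2 ku**2).
    """
    a = np.array([[1.0, -1.0],
                  [-kf / (ku + kf), 1.0]])
    b = np.array([1.0 / (2.0 * ku), 1.0 / (ku + kf)])
    t_ff, _ = np.linalg.solve(a, b)
    return float(t_ff)
```

When kf ≫ ku the time approaches kf/(2ku²), which exceeds the single-domain unfolding time 1/ku by a factor of about kf/(2ku). This is why, in this picture, a misfolded species built from two domains that each unfold on a time scale comparable to native I27 (τ ≈ 34 minutes) could still persist for much longer under native conditions.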
Simulations with the Gō-like model also predict domain-swapped structures with monomer- and dimer-like FRET efficiencies, with relative populations similar to those in the experiment (Supplementary Fig. 6).
Much work has been dedicated to investigating the sequence specificity of protein aggregation[20,21], including the hypothesis that there is selective pressure to prevent oligomerisation by a domain-swapping mechanism[22,23]. Misfolding is often considered to precede aggregation, suggesting that the sequence-specific behaviour observed in aggregation also applies to misfolding. Our single-molecule FRET experiments allow us to test this hypothesis directly by investigating the mixed tandem constructs I27-I28 and I27-I32 (Fig. 4b). Indeed, I27-I28, natural neighbours in titin with only 24% sequence identity, did not yield any detectable population of high-FRET misfolded species upon refolding (Fig. 4c,d). However, misfolding is seen upon refolding of I27-I32 (sequence identity 42%) (Fig. 4e,f), to the same extent as for I27-I27 (Supplementary Table 1). This misfolded species also unfolds with the same rate constant as that of I27-I27 (Supplementary Fig. 7). The I27-I32 misfold is consistent with previous experiments showing chimeric domains of I27 and I32 to be stable[24,25]. This result strongly supports the idea that protein misfolding is sequence-specific. In proteins where sequence identity between neighbouring domains is high, the topology may prevent formation of stable misfolded species[19].
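The comparison rests on pairwise sequence identity between domains (24% for I27-I28 and 42% for I27-I32), together with the observation that neighbouring domains are less identical than non-adjacent ones. The sketch below shows one common way to compute these quantities from pre-aligned domain sequences. It assumes the domains have already been aligned (gaps written as "-") and counts identity over gap-free columns only; other denominators, such as alignment length or shorter-sequence length, are also in use and give different numbers. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of gap-free aligned columns at which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free aligned columns")
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else float("nan")


def adjacent_vs_nonadjacent(domains: List[str]) -> Tuple[float, float]:
    """Mean identity of sequence-neighbouring versus non-adjacent domain pairs.

    domains: pre-aligned domain sequences listed in chain order.
    """
    adjacent, distant = [], []
    for i, j in combinations(range(len(domains)), 2):
        pid = percent_identity(domains[i], domains[j])
        (adjacent if j == i + 1 else distant).append(pid)
    return _mean(adjacent), _mean(distant)
```

Applied to an aligned tandem array, the neighbouring-pair identities can then be compared directly with the 40 per cent level mentioned in the abstract.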
Figure 4
Transfer Efficiency Histograms of Tandem Constructs with Identical and Non-identical Domains. (a) and (b) I27-I27 ‘never-unfolded’ control and refolded, respectively. (c) and (d) I27-I28 ‘never-unfolded’ control and refolded, respectively. (e) and (f) I27-I32 ‘never-unfolded’ control and refolded, respectively. To mimic the natural protein, no linker was added between the domains in these experiments. Note that the frequency of misfolding was the same for I27-I27 with and without the linker, 5.5(±0.2)% and 5.7(±0.5)%, respectively (compare Fig. 2c with Fig. 4b and Supplementary Table 1). Adding the four-amino-acid linker also made no difference to the results for I27-I28 (Supplementary Fig. 9).
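Statements such as "the same" (5.5(±0.2)% versus 5.7(±0.5)% with and without the linker) and "significantly lower" (2.8(±0.6)% for monomer-like species in the three-domain tandem versus 5.5(±0.2)% in the two-domain tandem) can be checked with a simple two-sided z-test. This is a minimal sketch that assumes the quoted ± values are standard errors of independent, approximately normal estimates; the meaning of the uncertainties is not stated in this section, and the function name is hypothetical.

```python
import math


def z_test_difference(p1: float, se1: float, p2: float, se2: float):
    """Two-sided z-test for the difference between two independent estimates.

    Returns (z, p_value); assumes se1 and se2 are standard errors and that
    both estimates are approximately normally distributed.
    """
    z = (p2 - p1) / math.hypot(se1, se2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Misfolded fraction (%) of I27-I27 with and without the four-residue linker.
    print(z_test_difference(5.5, 0.2, 5.7, 0.5))
    # Monomer-like fraction (%): two-domain versus three-domain tandem.
    print(z_test_difference(5.5, 0.2, 2.8, 0.6))
```

Under these assumptions the linker comparison gives z ≈ 0.37, consistent with no difference, and the three-domain comparison gives z ≈ −4.3, consistent with a significantly lower fraction.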
Misfolding in our experiments is more frequent than had been observed in AFM experiments[9], suggesting that the tethering in those experiments reduces misfolding; this might be advantageous for titin domains in vivo. Unfolding of the misfolded species observed with AFM showed them to have the same mechanical resistance as correctly folded I27, but twice the chain length was released upon unfolding. Our results are entirely compatible with this finding. While the misfolded species has two folded domains and is thus stable under folding conditions, only the terminal domain would experience shearing of the H-bonds between the parallel A’ and G strands (Fig. 1d, circled) perpendicular to the direction of the applied force, in the same way as correctly folded I27[26], resulting in the same mechanical stability. Since force is not applied to the A and G strands in the central domain, this domain is likely to unfold at low force, together with the terminal domain. This hypothesis is supported by simulations (Supplementary Fig. 8).
In summary, our results suggest that diversifying the sequence composition between neighbouring domains is an effective evolutionary strategy to ensure efficient folding in multidomain proteins and to avoid the formation of stable misfolded species. This adds an important piece to the puzzle of understanding the problems encountered during the crucial evolutionary transition from single-domain to multidomain proteins.
Methods Summary
Protein production, labelling, ensemble equilibrium measurements, single-molecule experiments, instrumentation, data reduction and analysis are described in Methods. The resulting relative populations from all experiments and analysis techniques are summarised in Supplementary Table 1. Folding simulations using a Gō-like model were run with the CHARMM code[27], and mechanical unfolding simulations were performed as described in Methods.
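The key feature of the Gō-like model described in the main text is that a native contact between residues i and j is equally attractive whether the two residues sit in the same domain or in different domains. The sketch below shows how such a contact list could be built for a two-domain tandem from the native contacts of a single domain. It illustrates only the contact-list logic under stated assumptions (0-based residue indices, identical domains, an optional linker inserted between them); it is not the CHARMM setup, and contact energies and cutoffs are omitted. Names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Contact = Tuple[int, int]


def tandem_go_contacts(native: Iterable[Contact], domain_length: int,
                       linker: int = 0) -> Set[Contact]:
    """Attractive contacts for a two-domain tandem in a symmetric Gō-like model.

    native: 0-based residue pairs (i, j) in contact in the single-domain
        native structure.
    Each native pair is made attractive within domain 1, within domain 2 and,
    because residues are treated identically regardless of domain, between the
    two domains in both directions (the pairs that permit domain swapping).
    """
    offset = domain_length + linker  # index of the first residue of domain 2
    contacts: Set[Contact] = set()
    for i, j in native:
        for a in (i, i + offset):
            for b in (j, j + offset):
                contacts.add((min(a, b), max(a, b)))
    return contacts
```

For one native contact (0, 5) in a 10-residue domain, this yields the two intradomain copies (0, 5) and (10, 15) plus the interdomain pairs (0, 15) and (5, 10). In such a model, a strand-swapped structure is therefore just as favourable energetically as the native one, which is why it can form at all.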
Authors: Pétur O Heidarsson; Mohsin M Naqvi; Mariela R Otazo; Alessandro Mossa; Birthe B Kragelund; Ciro Cecconi Journal: Proc Natl Acad Sci U S A Date: 2014-08-25 Impact factor: 11.205