Literature DB >> 17545201

MinActionPath: maximum likelihood trajectory for large-scale structural transitions in a coarse-grained locally harmonic energy landscape.

Joel Franklin¹, Patrice Koehl, Sebastian Doniach, Marc Delarue.

Abstract

The non-linear problem of simulating the structural transition between two known forms of a macromolecule still remains a challenge in structural biology. The problem is usually addressed in an approximate way using 'morphing' techniques, which are linear interpolations of either the Cartesian or the internal coordinates between the initial and end states, followed by energy minimization. Here we describe a web tool that implements a new method to calculate the most probable trajectory that is exact for harmonic potentials; as an illustration of the method, the classical Calpha-based Elastic Network Model (ENM) is used both for the initial and the final states but other variants of the ENM are also possible. The Langevin equation under this potential is solved analytically using the Onsager and Machlup action minimization formalism on each side of the transition, thus replacing the original non-linear problem by a pair of linear differential equations joined by a non-linear boundary matching condition. The crossover between the two multidimensional energy curves around each state is found numerically using an iterative approach, producing the most probable trajectory and fully characterizing the transition state and its energy. Jobs calculating such trajectories can be submitted on-line at: http://lorentz.dynstr.pasteur.fr/joel/index.php.

Entities: Chemical Disease Species

Mesh：

Substances：
Proteins

Year: 2007 PMID： 17545201 PMCID： PMC1933200 DOI： 10.1093/nar/gkm342

Source DB: PubMed Journal: Nucleic Acids Res ISSN： 0305-1048 Impact factor: 16.971

INTRODUCTION

Both structural and dynamical properties of macromolecules are essential to understand in order to account for their biological function. There are numerous examples (1) of large-scale and biologically important structural rearrangements in the Protein Data Base (PDB), including allosteric molecules, molecular motors and receptors that undergo a transition from an open to a closed state upon ligand binding (induced fit). While experimental methods such as X-Ray crystallography can give an atomic description of the two end states, the transition state is inaccessible to such structural methods because it is by nature unstable. NMR might be amenable to give some answers to the dynamical aspects of the transition as shown very recently (2) but such applications will likely be very difficult for large macromolecular systems such as molecular motors. Yet it would be of tremendous importance to have access to the structural characteristics of such a transition state. Indeed this would open the way to design drugs against transiently formed intermediate structures instead of just the starting or end points of the transition. Also, it would help to understand enzymology at the molecular level, which proceeds through the stabilization of the transition state complex with the substrate(s) (3). When studying structural transitions, it is important to distinguish between morphing techniques (4), which interpolate linearly between the starting and end states, and reaction path direct determination using physical potentials. Indeed, interpolation techniques will likely fail for large amplitude transitions and a physical description of the transition is to be preferred, if feasible, over a purely geometrical one in all cases. Simulation techniques such as molecular dynamics (MD) can in principle be used to study such transition in atomic detail, but the time scale accessible to such methods is several orders of magnitude smaller than the time scale during which these phenomena occur in solution. A common strategy then is to resort to coarse-grained models. Among the various existing coarse-grained models, two of them have been intensively studied in the past for structural transitions, namely the elastic network model (ENM) (5) and the Go model (6). While it is clear that ENM has the potential to describe a good part of naturally occurring and documented structural transitions through a handful of low-frequency normal modes derived from this simple model (7) the problem remains that the ENM has by definition only one minimum and is therefore inadequate to describe the full transition. This problem is also present in the Go model, which basically assumes that only native-like contacts can occur during the transition whereas in reality non-native contacts may well appear and disappear in the process. Recently, several groups have started to address these points through innovative ways, taking into account the double-well character of the energy landscape. The main difficulty with these methods is that they have to make simplifying assumptions in order to locate the crossing points of the two energy surfaces (8–11). Here we revisit the original Kramers problem (12) of finding the trajectory between two stable states experiencing each a harmonic potential. We use an entirely different method, the so-called Onsager–Machlup method, which reformulates the Langevin equation as an action minimization problem (13). Such an approach has been used recently by several groups to study both peptide (14,15) and protein folding-unfolding transitions (16–19), using classical force fields and numerical simulations. Here we show that by using a normal mode representation we can solve the equations on each side of the transition analytically, i.e. without any approximation; the problem is then narrowed down to finding the crossing point of these two solutions, which is achieved numerically through a one-dimensional search. We have implemented the technique on a web server that will automatically generate a trajectory between two states provided by the user, given just a few parameters: the energy difference between the two states (usually close to zero or a few kT), the relative spring constants of the harmonic potentials, which can be estimated through a fit with the experimental B-factors, and the cutoff radius of the ENM (typically 10–12 Å).

MATERIALS AND METHODS

We assume that the energy landscape around each state (initial and final) is harmonic. To describe the transition between the two states we must solve the Langevin equation in the following energy landscape: for the left side of the transition state and for the right side of the transition state, where Q and P are the Hessians around the initial states X and the final state X, respectively, and ▵E accounts for a difference in energy between the two states X and X. Assuming an overdamped regime and following Onsager and Machlup (13), the stochastic Langevin equation is transformed into a deterministic differential equation by asking for a minimum mechanical action of the form S = ∫ (dX/dt + ∂ V/∂ X)2dt. This leads to: and where we require continuity for both positions and velocities at the crossing point t = t0. An analytical solution can be found for both the left and right sides using boundary conditions X(t = 0) = X and X(t = t0) = X#, and X(t = t0) = X# and X(t = T) = X, respectively, by decomposing both Q and P into eigenstates (normal modes) and solving each mode separately (the null space corresponding to the three overall translations and three overall rotations is treated separately). Continuity of speed dX/dt is also required at the crossing point t0 to fully specify the analytical solution. An initial guess is made for t0, which is then progressively refined numerically until U<(X(t0)) = U>(X(t0)), up to machine precision; we find that this requires the use of all 3N normal modes (Franklin et al., in preparation). In order to study both a simple and realistic model of the system, we use the ENM based on a C-alpha (CA) representation of the molecule, that captures well collective and large-scale amplitude movements which typically occur in macromolecular transitions (7,20). The Energy thus reads where the sum is restricted to those pairs with an inter-atomic distance less than R in the ground state (5). The spring constant k, which sets the scale of the energy E, can be taken to be different for the initial state (k<) and the final state (k>). Also, it is possible to assign a much stronger elastic constant (100k) for distances involving between consecutive CAs, which should remain at 3.8 Å, except for the rare cis-peptides bonds (21).

Protocol

A typical run of the program consists of the following steps (Figure 1):

Figure 1.

Flow chart of the web site. The input consists of two PDB files of one macromolecule in two different conformations, the estimated energy difference between the two states, the ENM cutoff (10–12 Å) and the elastic constants for the two states, which can be estimated on-line. The returned output is the most probable trajectory between the two states (loadable by PyMol) and the coordinates of the transition state, its energy E# with respect to the initial state and the value of the minimized action Stot in the trajectory. Insert: One-dimensional representation of the shift on the crossing point of the two harmonic curves when using different energy differences between the two states, and when using an increasing elastic constant for the end state.

Extract the CAs atoms from both the initial and final states and superimpose them with Profit (http://www.bioinf.org.uk/software/profit/index.html). For nucleic acids, we would typically use three atoms per nucleotide (P, C1’ and C4’). Fit the experimental B-factors, if they are known, of each form, to the computed <▵r2>, to estimate both k< and k>. The default is k< = k> = 0.1 kcal/mol/Å2 Submit the trajectory job. The only physical parameter remaining to be chosen is ▵E, the energy difference between the two states. In general ▵E is unknown but of the order of a few kcal/mol and can be refined in the following step of the procedure. The length of the simulation T is fixed by the prescription that Stot has reached a plateau. A few increasing trial values may be necessary to ensure this prescription, which is usually met with T = 150–200. The number of sampling points is arbitrary since the maximum likelihood trajectory is obtained analytically so that one can sample it at any arbitrary set of time snapshots, at the user's convenience. Once a trajectory has been calculated, both Stot and E# are examined: Stot to check for asymptotic behavior and E# to compare with ▵G#exp, if available. Different values of ▵E [step (iii)] can be tried to achieve the desired value of ▵G#exp which is related to the observed rate constant of the reaction. Once a satisfactory trajectory has been obtained, all atom models for the transition state and possibly each frame of the trajectory can be reconstructed using online tools. In particular, all atoms can be generated for proteins from CA-only coordinates by first generating backbone atoms using a library of fragments of the PDB of length 4 (22) and then generating all sidechain atoms using the self-consistent Mean Field method (23). This will inevitably deform somewhat the CA positions, usually within 0.5–0.7 Å. The whole procedure can therefore be described as essentially coarse-grained, producing trajectories for CAs only and then building sidechains. This is in line with the description of the dynamics of proteins made earlier by Hinsen and Kneller (24), which they describe as essentially harmonic and where anharmonicity enters in the rigid-body movement of side chains (described here as rotamers). Flow chart of the web site. The input consists of two PDB files of one macromolecule in two different conformations, the estimated energy difference between the two states, the ENM cutoff (10–12 Å) and the elastic constants for the two states, which can be estimated on-line. The returned output is the most probable trajectory between the two states (loadable by PyMol) and the coordinates of the transition state, its energy E# with respect to the initial state and the value of the minimized action Stot in the trajectory. Insert: One-dimensional representation of the shift on the crossing point of the two harmonic curves when using different energy differences between the two states, and when using an increasing elastic constant for the end state. One might argue that the reconstruction of all atoms introduces some damage in the continuity of the trajectory. An interpolation technique could be used to restore this continuity, or, alternatively, these snapshots could be used for ‘steered’ MD simulations using harmonic restraints on the CA positions.

RESULTS

We tested the structural transitions studied in (25) for which the root mean square distance (rmsd) between the two forms is in the range of 3.0 Å to 15 Å. In all cases the algorithm produced a solution that was subsequently checked visually using PyMol (http://www.pymol.org) or VMD (www.ks.uiuc.edu/Research/vmd) and with a post-processing program that showed they were satisfying both in terms of the absence of steric clashes and of the small deviation of consecutive CA–CA bond lengths around the ideal value of 3.8 Å (Table 1). This allowed in all cases the reconstruction of all atom models for the transition states. The largest computation was performed on citrate synthase, a dimer of 852 CA atoms, for which it takes about 2 h c.p.u. time on a 2.8 GHz Pentium IV Linux workstation. The same computation takes about 1 min for calmodulin (138 residues). We do not recommend submitting jobs with files containing more than 1000 atoms. A couple of transitions were taken from (20), in order to have a representative set of protein sizes: this allowed to determine the power law of the c.p.u. time as a function of size namely O(N3.6).

Table 1.

Structural transitions for which a path has been calculated (20,25)

Name	PDB	Rmsd (Angstrom)	# aa	C.p.u. (mn)	d(i, i + 1) trans. state
Calmodulin	1K9K	4.52	89	<1	4.05 (0.55)
	1K9P
Calmodulin	1CTR	15	138	1	4.66 (0.93)
	1CLL
Dihydrofolate reductase	1RX2	1.22	160	1	3.80 (0.09)
	1RX6
T4	178L	3.45	162	1	3.80 (0.11)
lysozyme	256L
Adenylate kinase	4AKE	7.1	214	2	3.97 (0.19)
	1AKE
Glutamine binding	1GGG	5.3	225	2	3.87 (0.21)
	1WDN
Ornithine binding	2LAO	4.7	242	2	3.87 (0.16)
	1LST
DNA Pol	1BPX	2.8	326	6	3.78 (0.12)
beta	1BPY
Maltodextrin binding	1ANF	3.8	370	9	3.86 (0.126)
	1OMP
Pol I	3KTQ	1.96	528	27	3.84 (0.36)
Taq	2KTQ
Pol I	1L3V	2.06	580	35	3.82 (0.12)
Bacillus	1LV5
Lactoferrin	1LFG	4.7	691	76	3.87 (0.16)
	1CB6
Citrate synthase	5CSC	3	852	160	3.84 (0.21)
	6CSC

Structural transitions for which a path has been calculated (20,25) Particularly impressive was the ability of the program to generate a continuous trajectory for calmodulin (in which case the rmsd between the final and initial states is 15 Å), for which a morphing technique (26) that interpolates all intramolecular distances within Rc and recently implemented on a web server by both this group (27) and ours (28) required adjusting Rc by trial and error. Due to the large-scale character of the transition, calmodulin displayed the largest deviations of CA–CA distances but this did not prevent reconstruction of all atoms. We now discuss in more detail the case of adenylate kinase (rmsd of 7.1 Å between final and initial state), which is a widely used and studied example of an induced-fit mechanism in the field of simulating structural transitions (8,10,11). We, as others, observe that the closure of the LID domain occurs first and is nearly completed once the rearrangement of the catalytic domain begins. The transition state is located near the end of the LID domain closure. This may be related to the observation that a few low-frequency normal modes suffice to describe the closure of the LID (25). There has been some speculation that the transition is accompanied by unfolding and ‘cracking’ due to the accumulation of elastic strain at some points in the trajectory (10,11). Note, however, that we use all 3N normal modes to describe the transition, not just a handful of low-frequency ones (10,11). To test this strain and cracking hypothesis we calculated a Q1 vs Q2 plot for this transition, where Q(t) measures the fraction of native contacts at time t along the trajectory, defined as pairs of atoms with dij

Figure 2.

Q1 versus Q2 plot for adenylate kinase transition (4AKE and 1AKE). Q1 (resp. Q2) is the fraction of native contacts (d < R) as in the initial form (resp. in the final form). The trajectory is identical, within machine precision, if one exchanges the role of the initial and final states. For comparison, the same plot is drawn for a trajectory generated according to UMMS (26,27). Another possibility is to plot Q’1(t)–Q’2(t) versus Qcommon(t), i.e. the difference of the number of contacts specific to each state vs the number of contacts seen in both states, for each snapshot (9,11). It is then seen that only about 5% of the common contacts are temporarily lost during the transition (Supplementary Figure S2). Finally, because the calculation is very rapid for adenylate kinase, it is even possible to screen the effect of mutations of each residue on the kinetics of the transition: for each residue, in turn, we assign a different elastic constant (10k) describing its interactions with its neighbors, and we report the transition state energy of the calculated trajectory (Supplementary Figure S3). This is a new application analogous to the phi-values in protein folding studies, allowing direct comparison with experiments.

DISCUSSION

There exists several variants of the ENM that have also been implemented by us, namely (i) the chemical bond model (21), to take care of possible outliers in CA–CA bond distances and (ii) a distance-dependent weighting scheme for the spring constant as in Hinsen (29), who used or Bahar and colleagues (30), who used . This has the effect of putting more constraints to short distances, which are the most critical ones (they should never come close to zero). Also, one must mention the fact that the Langevin equation used here has constant atomic friction coefficients (set to unity); one way to use more realistic environment-dependent friction coefficients is described in (24) and has also been implemented by us, with qualitatively little effect on the trajectory. The robustness of the trajectory with respect to systematic variations of the k ratio, on one hand, or ▵E, on the other hand, has been checked (Supplementary Figures S4 and S5). The method works if the harmonic approximation for the energy landscape around each state is valid. It would not make any sense to generate a trajectory between two states when it is known that there is an obligatory intermediate state, just to see if this intermediate shows up during the simulation; indeed, in this case, the harmonic approximation would not be valid any more. However, we note that the method can easily be generalized to an energy landscape with three harmonic wells, leading to a search for t0 to be conducted in a plane instead of a line. The procedure presented here is quite general, and not limited to proteins. In particular, we are interested in looking at structural transitions occurring in nucleic acids, both DNA and RNA. In both cases however, we still need a method for generating a full atom model based on a coarse representation of the molecule (for example, only including P, C4’ and C1’ atoms for each base). We are currently working on this problem.

CONCLUSION AND FUTURE WORK

In summary, we have presented a new method that combines the action minimization formalism and the ENM to generate trajectories between two known structural states of a given macromolecule. The method is exact if each state is experiencing a harmonic potential and, contrary to other similar methods, does not resort to numerical optimization methods such as Monte Carlo or simulated annealing. It was tested on a large test set of large amplitude structural rearrangements, involving rmsds between the two forms of up to 15 Å. Extensive comparisons with other methods were done in the case of adenylate kinase, which is the most documented example in this field, leading to the systematic use of Q1 versus Q2 plots to characterize the transition. More tests and comparisons with experimental results will be necessary to assess further the usefulness of the method. This will be made possible by the present web server, which allows experimentalists to look at possible scenarios that can be easily visualized and checked with PyMol or VMD for the structural transition they are interested in. We are planning to extend the method to the simulation of a transition driven by an applied force (Franklin et al., in preparation), for which direct comparison with single-molecule experiments will then be made possible.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

25 in total

1. Large Amplitude Elastic Motions in Proteins from a Single-Parameter, Atomic Analysis.

Authors:
Journal: Phys Rev Lett Date: 1996-08-26 Impact factor: 9.161

2. An atomically detailed study of the folding pathways of protein A with the stochastic difference equation.

Authors: Avijit Ghosh; Ron Elber; Harold A Scheraga
Journal: Proc Natl Acad Sci U S A Date: 2002-07-24 Impact factor: 11.205

3. Simplified normal mode analysis of conformational transitions in DNA-dependent polymerases: the elastic network model.

Authors: M Delarue; Y-H Sanejouand
Journal: J Mol Biol Date: 2002-07-26 Impact factor: 5.469

4. Large amplitude conformational change in proteins explored with a plastic network model: adenylate kinase.

Authors: Paul Maragakis; Martin Karplus
Journal: J Mol Biol Date: 2005-09-30 Impact factor: 5.469

5. Quantifying kinetic paths of protein folding.

Authors: Jin Wang; Kun Zhang; Hongyang Lu; Erkang Wang
Journal: Biophys J Date: 2005-07-01 Impact factor: 4.033

6. Anisotropic network model: systematic evaluation and a new web interface.

Authors: Eran Eyal; Lee-Wei Yang; Ivet Bahar
Journal: Bioinformatics Date: 2006-08-23 Impact factor: 6.937

7. Dominant pathways in protein folding.

Authors: P Faccioli; M Sega; F Pederiva; H Orland
Journal: Phys Rev Lett Date: 2006-09-06 Impact factor: 9.161

8. The Database of Macromolecular Motions: new features added at the decade mark.

Authors: Samuel Flores; Nathaniel Echols; Duncan Milburn; Brandon Hespenheide; Kevin Keating; Jason Lu; Stephen Wells; Eric Z Yu; Michael Thorpe; Mark Gerstein
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

9. UMMS: constrained harmonic and anharmonic analyses of macromolecules based on elastic network models.

Authors: Yunho Jang; Jay I Jeong; Moon K Kim
Journal: Nucleic Acids Res Date: 2006-07-01 Impact factor: 16.971

10. NOMAD-Ref: visualization, deformation and refinement of macromolecular structures based on all-atom normal mode analysis.

Authors: Erik Lindahl; Cyril Azuara; Patrice Koehl; Marc Delarue
Journal: Nucleic Acids Res Date: 2006-07-01 Impact factor: 16.971

35 in total

1. Manipulation of conformational change in proteins by single-residue perturbations.

Authors: C Atilgan; Z N Gerek; S B Ozkan; A R Atilgan
Journal: Biophys J Date: 2010-08-04 Impact factor: 4.033

2. Protein folding pathways and state transitions described by classical equations of motion of an elastic network model.

Authors: Gareth Williams; Andrew J Toon
Journal: Protein Sci Date: 2010-12 Impact factor: 6.725

3. Coupling between normal modes drives protein conformational dynamics: illustrations using allosteric transitions in myosin II.

Authors: Wenjun Zheng; D Thirumalai
Journal: Biophys J Date: 2009-03-18 Impact factor: 4.033

4. Independent saturation of three TrpRS subsites generates a partially assembled state similar to those observed in molecular simulations.

Authors: Poramaet Laowanapiban; Maryna Kapustina; Clemens Vonrhein; Marc Delarue; Patrice Koehl; Charles W Carter
Journal: Proc Natl Acad Sci U S A Date: 2009-01-27 Impact factor: 11.205

10. Large-scale evaluation of dynamically important residues in proteins predicted by the perturbation analysis of a coarse-grained elastic model.

Authors: Wenjun Zheng; Mustafa Tekpinar
Journal: BMC Struct Biol Date: 2009-07-10