Literature DB >> 25296279

Optimal Dimensionality Reduction of Multistate Kinetic and Markov-State Models.

Gerhard Hummer1, Attila Szabo2.   

Abstract

We develop a systematic procedure for obtaining rate and transition matrices that optimally describe the dynamics of aggregated superstates formed by combining (clustering or lumping) microstates. These reduced dynamical models are constructed by matching the time-dependent occupancy-number correlation functions of the superstates in the full and aggregated systems. Identical results are obtained by using a projection operator formalism. The reduced dynamic models are exact for all times in their full non-Markovian formulation. In the approximate Markovian limit, we derive simple analytic expressions for the reduced rate or Markov transition matrices that lead to exact auto- and cross-relaxation times. These reduced Markovian models strike an optimal balance between matching the dynamics at short and long times. We also discuss how this approach can be used in a hierarchical procedure of constructing optimal superstates through aggregation of microstates. The results of the general reduced-matrix theory are illustrated with applications to simple model systems and a more complex master-equation model of peptide folding derived previously from atomistic molecular dynamics simulations. We find that the reduced models faithfully capture the dynamics of the full systems, producing substantial improvements over the common local-equilibrium approximation.

Entities:  

Mesh:

Substances:

Year:  2014        PMID: 25296279      PMCID: PMC4516310          DOI: 10.1021/jp508375q

Source DB:  PubMed          Journal:  J Phys Chem B        ISSN: 1520-5207            Impact factor:   2.991


Introductory Remarks

Suppose we divide a high-dimensional dynamical system into two parts, 1 and 2, and wish to describe its dynamics by a two-state kinetic system, 12. What is the best way of choosing the rate constants R12 and R21? To get the thermodynamics right, the ratio of the rate constants must equal the exact equilibrium constant, R12/R21 = Peq(1)/Peq(2), where Peq(I) is the equilibrium population of state I. The sum of the rate constants, R12 + R21, which is the inverse relaxation time, can be chosen in a variety of ways. Arguably the simplest is to make the number of transitions exact between the two states at equilibrium. This is equivalent to the so-called local-equilibrium approximation, where it is assumed that the time evolution of each microstate within a given superstate is the same, but with an amplitude proportional to its equilibrium population. This approximation is an excellent one if the interconversion of the microstates within a given superstate happens to be much faster than that between the two aggregated states. Otherwise, it is valid only at short times. Alternatively, one can devise an approximation valid at long times by setting the sum of the rate constants equal to the absolute value of the first nonzero eigenvalue of the operator that describes the dynamics of the entire system. This approximation will tend to perform poorly at short times. There is, however, a compromise choice. One can force the relaxation time for the equilibrium fluctuations of the populations of the two states to be exact. Such fluctuations are described by the correlation function ⟨θ(t)θ(0)⟩ of an indicator θ(t) that is equal to 1 when the system is in state I at time t, and zero otherwise. Since this occupancy-number correlation function is in general multiexponential, its relaxation time is defined as the integral over all times of an appropriately normalized form of this correlation function. Thus, in essence, a multiexponential correlation function is approximated here by a single-exponential function with the exact relaxation time. The two curves deviate both at short and at long times, yet the areas under them are the same. It is this procedure that we will generalize to the case when the dynamical system is divided into many discrete substates (see Figure 1, top). For the sake of simplicity, we begin by assuming that the undivided system consists of discrete states whose dynamics is described by a master or rate equation. The generalization to a continuous state space, divided into cells, is straightforward. We also derive reduced Markov transition matrices for Markov-state models with dynamics in discrete time and space. Such models appear not only in the modeling of experimental data, but increasingly also in molecular simulation studies[1−8] to deal with the problem of sampling rare transitions[9] and large conformation spaces by using dynamic information collected locally over short time.[10−17] Correlation functions have been used in the validation[18] and construction[19] of such models.
Figure 1

Schematic illustrating the reduction of an n = 18 multistate kinetic model by lumping microstates (open circles; top left) into N = 3 superstates (shaded areas; top left and right). The n × n rate matrix K for transitions i → j between microstates i and j is reduced to an N × N matrix whose elements R for transitions I → J between superstates I and J depend explicitly on time in the full non-Markovian description (eq 11) and are time-independent in the Markovian approximation (eq 12).

Schematic illustrating the reduction of an n = 18 multistate kinetic model by lumping microstates (open circles; top left) into N = 3 superstates (shaded areas; top left and right). The n × n rate matrix K for transitions i → j between microstates i and j is reduced to an N × N matrix whose elements R for transitions I → J between superstates I and J depend explicitly on time in the full non-Markovian description (eq 11) and are time-independent in the Markovian approximation (eq 12). By allowing the dynamics of the reduced system to be non-Markovian (i.e., described by a time-dependent rate matrix), one can make the entire time dependence (not only the relaxation times!) of all equilibrium occupancy-number correlation functions exact. At first sight, this may be somewhat surprising, but the fact that this is possible follows from Zwanzig’s paper[20] “From Classical Dynamics to Continuous-Time Random Walks.” He divided configuration space into discrete cells and showed that when the Liouville equation was converted into a generalized Langevin equation for the cell occupancies using projection operators, the resulting noise term vanishes when initially all substates in a cell are in local equilibrium. Consequently, for such initial conditions, the probability of finding the system in a given cell satisfies a generalized (non-Markovian) master equation. In the first part of the Theory section of this paper, we show how this can be done in a completely elementary way. In the second part, we use a projection operator approach to combining states and show that it leads to the same results that we obtained in a less formal way. We then apply the general formalism to the dynamics of systems in a continuous state space, and to Markov-state models with discrete time steps. Finally, we present several simple illustrative examples and discuss the implication of our results on the important problem of how to choose the states that should be combined.

Theory

Definitions and General Framework

Consider a system with n discrete states, whose dynamics is described by an n × n rate matrix, whose elements K are the rate constants that describe the i-to-j transition. Conservation of probabilities require that K = −∑K or, in matrix notation, 1K = 0, where 1 is a column vector with n unit elements, and superscript T denotes the transpose. The populations p(i, t) evolve according toor in Laplace space [f̂(s) = ∫0∞dt exp(−st)f(t) for a general function f(t)]The normalized equilibrium populations are solutions of Kpeq = 0, ∑peq(i) = 1. For future reference, note that the s→0 limit of the Laplace transform of a function is just the area under that function from 0 to ∞. We wish to combine (i.e., aggregate or lump) the states of the original system (labeled by lower-case indices i = 1, 2, ..., n) into “superstates” labeled by capital indices I = 1, 2, ..., N, where N < n (see Figure 1). We will require that not only the equilibrium populations but also the occupancy-number correlation functions of the reduced states are exact. We will show that this is possible if we describe the dynamics of populations P(I, t) of the superstates by the non-Markovian rate equations of the formor in Laplace space using the convolution theoremwhere R̂(s) is a reduced N × N matrix with elements R̂(s), I, J = 1, 2,..., N, that remain to be determined (Figure 1, bottom right).

Construction of Exact Kinetic Model for Aggregated Superstates

Let Peq(I) be the normalized equilibrium population of state I. We wish to construct an R̂ with the property that R̂Peq = 0 forwhere the sum is over all substates i in superstate I. The occupancy-number auto- and cross-correlation functions of the original system are denoted by ⟨θ(t)θ(0)⟩, where the indicator function θ(t) is equal to one if the system is in state i at time t, and zero otherwise. We recall that a general equilibrium correlation function ⟨f(t)g(0)⟩ [where f(t) = ∑fθ(t) and g(t) = ∑gθ(t)] is defined in terms of the propagator or Green’s function G(i, t|j, 0) as ⟨f(t)g(0)⟩ = ∑fG(i, t|j, 0)gpeq(j). The propagators are the conditional probabilities that the system starting in state j at time t = 0 is in state i at time t and thus describe the time evolution of the populations (e.g., p(i,t) = ∑G(i,t|j,0)p(j,0)). They are the solution of eq 1 with initial conditions that at t = 0, G(i,0|j,0) = δ, with the Kronecker δ equal to 1 for i = j and zero otherwise. Thus, in matrix notation, G = exp(tK). Its Laplace transform is Ĝ = (sI – K)−1, where I is the unit matrix of dimension n. From these definitions it follows that the occupancy-number correlation functions areor in Laplace space, with Ĝ(i, s|j) the Laplace transform of G(i, t|j, 0),Thus, the exact number correlation functions in the reduced system denoted by C(t) = ⟨θ(t)θ(0)⟩ can be easily found by simply summing the above results over all i ∈ I and j ∈ J. In Laplace space we obtainIf we were to calculate Ĉ directly for the reduced system using the matrix R̂(s) introduced in eq 2 we would have (sI – R̂(s))–1Peq(J), where (sI – R̂(s))−1 is the corresponding Green’s function. Thus, if we determine R̂(s) fromthen by construction we have reached our goal of making the occupancy-number correlation functions for the reduced system exact. This is one of the key results of this paper. It has the following “non-equilibrium” interpretation. Suppose in the original system we start with an initial condition that all states are unoccupied except those that belong to substate I. Let us further assume that the initial populations of the microstates i ∈ I are proportional to their exact equilibrium populations [i.e., there is local equilibrium with p(i, t = 0) = peq(i)/Peq(I) for i ∈ I and p(i, t = 0) = 0 otherwise]. Then the subsequent populations of all superstates, calculated from the reduced non-Markovian description in eq 2, are exact for all times. As discussed in the Introductory Remarks, this result was obtained by Zwanzig in a more general case.[20]

Local-Equilibrium Approximation

We now examine the limiting cases of eq 6 that are Markovian, i.e., the dynamics is described by an ordinary N × N rate matrix with elements R. The limit s → ∞ of eq 6 corresponds to the limit of time t → 0,Hence, by defining a reduced, s-independent rate matrix Rle ≡ R̂ (s → ∞) satisfyingwe obtain a reduced description that is exact at short times. Physically, eq 7 means that the number of transitions between I and J per unit time is exact at equilibrium. This is just the local-equilibrium approximation, indicated by the superscript “le”, as can be verified by setting p(i, t) = peq(i)P(I, t)/Peq(I) in eq 1 and then summing both sides over i ∈ I for all I. The above derivation shows that, in general, it is valid only at short times.

Optimal Markovian Model

Now let us examine the (so-called) Markovian limit, s → 0, when eq 2a becomes dP/dt = RP, where R = R̂(0). One cannot simply set s = 0 in eq 6 because K has a zero eigenvalue and K–1 is thus not defined. At long times, G(i, t|j, 0) goes to Peq(i) independent of j (i.e., the system relaxes to equilibrium independent of the starting point, assuming that there are no disconnected sets of microstates). In Laplace space, this means that lim(sI – K)−1 = peq1T/s + ···. Therefore, we determine R̂(0) from This limit can be carried out using the identitywhich can be proved using the Sherman-Morrison formula[21] along with Kpeq = 0 = 1TK for any λ0. Using this identity and an analogous one involving R̂(0), with λ0 = 1 in units of reciprocal time, we find that the elements of the s-independent optimal reduced matrix are determined bywith R = R̂(0) and R = R̂(0). This can be inverted to get R, and a computationally convenient formula is given later in eq 12. We note that eq 8 is valid independent of the units of R and K because λ0 in the Sherman–Morrison formula above is arbitrary, and is conveniently set to one in units of reciprocal time here and in similar relations below. A reduced Markovian description using the reduced matrix R obtained in this way guarantees that the weighted cross-relaxation times, τ, are exact,for all I and J, with ∑τ = 0. In other words, using the reduced matrix R ensures that the areas under all occupancy-number auto and cross-correlation functions are exact. Since ∑[⟨θθ⟩ – ⟨θ⟩⟨θ⟩] = ⟨θθ⟩ – ⟨θ⟩⟨θ⟩, this also means that the cross-relaxation times t = τ/(⟨θθ⟩ – ⟨θ⟩⟨θ⟩) are exact, defined as the areas under the normalized occupancy-number correlation functions,We note in passing that if the equilibrium populations Peq and weighted cross-relaxation times τ are known, e.g., from experiment or molecular dynamics simulation, the corresponding reduced matrix can be constructed by inversion, R = Peq1 – D(PeqPeq + τ)−1, where D is a diagonal matrix with elements Peq(I). The somewhat unusual structure of eq 8 can be understood as follows. The matrix K has a spectral expansion of the form K = ∑λab = 0peq1 + ∑λab, where a (b) are the right (left) eigenvectors of K with eigenvalues λ. a1 = peq and b1 = 1 are the eigenvectors for eigenvalue λ1 = 0, and λ < 0 for j > 1. Now K–1 has a spectral expansion K–1 = (1/0)peq1 + ∑n(1/λ)abT, and hence does not exist, with the first term being infinite. However, if we add λ0peq1 to K, the resulting matrix has a spectral expansion λ0peq1 + ∑λab, and hence (λ0peq1 + K)−1 = λ0–1peq1 + ∑λ–1ab. For computational reasons it will prove convenient to rewrite the above results in matrix notation. To this end we introduce an n × N-dimensional aggregation matrix A with elementsThus, the relation P(I, t) = ∑p(i, t) can be written as P = Ap. In addition, we introduce diagonal matrices D and D = ADA with elements peq(i) and Peq(I) on the diagonal, respectively [i.e., (D) = peq(i)δ and (D) = Peq(I)δ]. Using this notation, eq 6 becomeswhich can be solved explicitly to giveSimilarly, eq 8 can be explicitly solved to give the reduced matrix in the Markovian (s → 0) limit:

Projection Operator Formulation

We now show how to construct the dynamics of the reduced system using projection operators.[22] This formalism is very general and starts by rearranging the exact dynamical equations (e.g., eq 1) for quantities of interest (e.g., the aggregated states) and whatever remains. Let be a projection operator with defined so that are the probabilities of interest. The complementary operator , defined so that , gives probabilities that are orthogonal to those of interest. Multiplying both sides of eq 1 by and then by , and setting on the right-hand sides, one findsor in Laplace spaceThese equations are exact. If one chooses initial conditions for which v(0) = 0, then one can solve eq 14b for v and substitute the result into eq 14a to get a closed equation for the quantity of interest, u:In the present context, we must choose in such a way that the time dependence of the projected populations u(t) of all microstates in a superstate I is the same, up to a proportionality factor, i.e., with c a constant that depends on i. Moreover, let us require that at equilibrium u = peq(i) for all i. These two requirements are satisfied if we setIn terms of the aggregation matrix A defined above, it follows from eq 16 that the corresponding projection operator can be written asNow to be able to use eq 15, we must restrict ourselves to initial conditions where v(0) = 0. Since v(0) = p(i, 0) – u(0) = p(i, 0) – peq(i)P(I, 0)/Peq(I) according to eq 16, it follows that v(0) = 0 for the initial condition that p(i, 0) = peq(i)P(I, 0)/Peq(I), i.e., p(i, 0) ∝ Cpeq(i) with a proportionality constant C ≥ 0 that depends only on the superstate I to which i belongs. This is exactly the condition of local equilibrium within superstates I. For such initial conditions, eq 15 is exact. However, we are not interested in the u’s but rather in the time evolution of the probabilities of the aggregated states P(I, t). From eq 16 it follows that u = DAD–1P. Using this, the definition of in eq 17, and the fact that , we can rewrite eq 15 in the same form as eq 2b, if we definewhich at first sight does not look anything like the result in eq 11 that we obtained by matching the occupancy-number correlation functions. However, they are in fact equivalent, which can be proved by using the Woodbury matrix inversion formula,[21]with M = sI – K, U = DAD–1, and V = AK.

Reduced Model for Dynamics in Continuous Space

We now show that the above results can be readily generalized when the original system is continuous and is divided into N cells. Let us denote integration over cell coordinates x in cell I by ∫dx. Since in discrete state space (sI – K)−1 is the Laplace transform of the Green’s function Ĝ(i, s|j), the appropriate generalization of eq 6 iswhere Peq(J) = ∫dxpeq(x) and Ĝ(x′, s|x) is the Laplace transform of G(x′, t|x, 0).

Markov-State Model in Discrete Time

The procedure of constructing a reduced dynamic description is also applicable to the discrete-time dynamics of Markov-state models,where M is the n × n Markov matrix of transition probabilities M from microstates j to i, and p is the vector of normalized probabilities p(i) of microstate i after step k, starting from p0. Thus, M, the kth power of M, is the discrete analogue of the Green’s function G(t). The analogue of Ĝ(s) is the generating functionwhere λ is a parameter, 0 ≤ λ ≤ 1. Therefore, the analogue of eq 6 iswith the following solution in matrix form:where T̂(λ) is the generating function of operators T that define an N × N reduced, non-Markovian transition process,withIn analogy to the s → ∞ limit above, for λ → 0 one recovers the local-equilibrium approximation valid at short times, Tle = T̂(λ = 0) ≡ T0. The Markovian limit corresponding to s → 0 is obtained for λ → 1, and interestingly eq 8 remains essentially the samewith T ≡ T̂(λ = 1) the reduced transition matrix in the Markovian limit. In matrix form, the reduced Markov transition matrix becomes

Illustrative Applications

Four-State to Two-State Reduction with Well Chosen Superstates

Consider a four-state system,To aggregate microstates 1 and 2 into a superstate 1 + 2, and 3 and 4 into another superstate 3 + 4, we define an aggregation matrixThe corresponding diagonal matrices of probabilities areandwith peq(i) = 1/4 in our example. From eq 12, we then obtain the following reduced matrix for the reduced two-state model:Figure 2 compares the time-dependent populations obtained with this reduced two-state rate matrix to those obtained by integration of the full four-state model and the local-equilibrium two-state approximation. As one would expect, with h/k becoming smaller, transitions between 1 + 2 and 3 + 4 become rarer, and the two-state approximation becomes increasingly accurate. Remarkably, even for h = k, where one might expect the two-state approximation to fail, the reduced model produces populations of the aggregated states in excellent agreement with the full four-state model (center panel of Figure 2). In the reduced model, small deviations at short and long times compensate each other, with the areas between the exact and approximate curves exactly canceling by construction (gray shading in top panel of Figure 2). By contrast, the local-equilibrium approximation eq 7 results in significant deviations already after very short times t ≪ 1/k, in particular for h ≥ k. In this example, the reduced model indeed strikes a good balance between reproducing the dynamics at short and long times, and results in significant improvements over the local-equilibrium approximation.
Figure 2

Populations of aggregated states 1 + 2 for model eq 26 obtained from the reduced rate matrix R in eq 27 (red lines) and by exact time integration of the full rate equations, eq 1a, through diagonalization of the matrix K (symbols). Results are shown for k = 2 and h = 5 (top), h = 1 (center), and h = 0.2 (bottom), starting from superstate 1 at time t = 0. The reduced model R is constructed such that the shaded gray areas between the populations exactly cancel and the exact equilibrium is recovered. By contrast, the populations obtained by using the local-equilibrium approximation, shown as solid black lines, match the time evolution only near t = 0. The populations of the other superstate, 3 + 4, are not shown, since they are exactly one minus the populations of the 1 + 2 superstate.

Populations of aggregated states 1 + 2 for model eq 26 obtained from the reduced rate matrix R in eq 27 (red lines) and by exact time integration of the full rate equations, eq 1a, through diagonalization of the matrix K (symbols). Results are shown for k = 2 and h = 5 (top), h = 1 (center), and h = 0.2 (bottom), starting from superstate 1 at time t = 0. The reduced model R is constructed such that the shaded gray areas between the populations exactly cancel and the exact equilibrium is recovered. By contrast, the populations obtained by using the local-equilibrium approximation, shown as solid black lines, match the time evolution only near t = 0. The populations of the other superstate, 3 + 4, are not shown, since they are exactly one minus the populations of the 1 + 2 superstate.

Four-State to Three-State Reduction with Poorly Chosen Superstates

Now we instead aggregate the central microstates 2 and 3 to form a three-state system. We then end up with a reduced matrixThe three ordered eigenvalues 0, – 2hk/(2h+k), and −2k of R are zero or negative, and agree exactly (eigenvalues 1 and 3) and to second order in k/h (eigenvalue 2) with those of the original rate matrix K. However, we notice that in this matrix the off-diagonal elements R13 = R31 are negative, i.e., in this example, R is not strictly a rate matrix. Thus, while it is always possible to construct a matrix R that leads to the exact relaxation times, there is no guarantee that it will have positive off-diagonal elements. If we nonetheless integrate the “rate equations” for R, dP/dt = RP, the solutions are all positive after a brief initial period, as illustrated in Figure 3. As h/k increases, the three-state approximation, with microstates 2 + 3 aggregated, becomes increasingly accurate. Visually, one can recognize that the time-integrated area between the exact populations and those of the reduced model is zero, which is the criterion used to match the relaxation times. By contrast, the simple local-equilibrium approximation works well only at very short times.
Figure 3

Populations of aggregated states 1 (red), 2 + 3 (green), and 4 (blue) for model eq 26 obtained from the reduced R matrix in eq 28 (lines) and by exact integration of eq 1a (symbols). Results are shown for k = 2 and h = 5 (top), h = 1 (center), and h = 0.2 (bottom), starting from superstate 1 at time t = 0. Whereas the population of state 2 + 3 is exact, exhibiting a single-exponential relaxation, the population of state 4 initially goes negative in the reduced model. For reference, the populations obtained from the local-equilibrium approximation are shown as solid black lines for states 1 and 4, with state 2 + 3 not shown since it is again integrated exactly for this model. Note that the reduced model is exact for all states if the initial state is the equilibrated 2 + 3 state.

Populations of aggregated states 1 (red), 2 + 3 (green), and 4 (blue) for model eq 26 obtained from the reduced R matrix in eq 28 (lines) and by exact integration of eq 1a (symbols). Results are shown for k = 2 and h = 5 (top), h = 1 (center), and h = 0.2 (bottom), starting from superstate 1 at time t = 0. Whereas the population of state 2 + 3 is exact, exhibiting a single-exponential relaxation, the population of state 4 initially goes negative in the reduced model. For reference, the populations obtained from the local-equilibrium approximation are shown as solid black lines for states 1 and 4, with state 2 + 3 not shown since it is again integrated exactly for this model. Note that the reduced model is exact for all states if the initial state is the equilibrated 2 + 3 state.

Reduction of 32-State Protein Folding Model

As a realistic example, we consider a 32-state model for the formation of a short α-helix in water.[18] It was previously shown that this 32-state model could be reasonably well approximated with a two-state model (with a helically folded state F and an unfolded state U) or a four-state model (with two folded states F1 and F2, and two unfolded states U1 and U2). Here we use the aggregation into superstates determined in ref (18) by using a semiquantitative procedure. Below, in the Concluding Remarks, we show how these superstates can also be found by using a quantitative hierarchical procedure that uses the reduced matrix. We now reduce the 32-state rate matrix into a two-state model (F and U), a three-state model (F1, F2, and U), and a four-state model (F1, F2, U1, and U2) based on the definition of these superstates given in ref (18). Using eq 12, we find:andin units of 1/ns, with the subscript indicating the order of the states (left to right, and top to bottom). We note that for these well-chosen superstates, all off-diagonal elements of the reduced matrices are positive. As listed in Table 1, the eigenvalues of the matrices, and thus the slowest relaxation times given by their negative reciprocals, are in excellent agreement with those of the full rate matrix.
Table 1

Relaxation Rates [1/ns] from Eigenvalues of Full 32-State Rate Matrix K and Reduced Rate Matrices R with N = 4, 3, and 2 Superstates, Respectivelya

states–λ1–λ2–λ3–λ4
3200.1610.5300.660
400.167 (0.244)0.556 (0.608)0.705 (0.788)
300.167 (0.244)0.557 (0.609) 
200.174 (0.282)  

Numbers in parentheses are for the local-equilibrium approximation Rle, which performs significantly worse in all cases.

Numbers in parentheses are for the local-equilibrium approximation Rle, which performs significantly worse in all cases. Starting from state U or U1, respectively, we calculated the time-dependent populations of the superstates and compared them to the exact populations obtained by integration of the 32-state coupled rate equations. As shown in Figure 4, the reduced model obtained by matching the relaxation times gives excellent approximations to the full dynamics, already at the two-state level. By contrast, the local-equilibrium approximation fails at times much shorter than the global relaxation time, which again is reflected in the inaccurate eigenvalues of the resulting reduced rate matrix Rle (Table 1).
Figure 4

Protein folding kinetics. The full peptide-folding model has 32 states[18] and was reduced to 2 states (F, U; top), 3 states (F1, F2, U; middle) and 4 states (F1, F2, U1, U2; bottom). At time t = 0, the system starts from an equilibrated state U, U, and U1, respectively (top to bottom). Exact populations as a function of time are shown as symbols. Solid lines of matching color show the results obtained for the reduced models with two to four states. For reference, populations obtained with the local-equilibrium approximation are shown as thin black lines.

Protein folding kinetics. The full peptide-folding model has 32 states[18] and was reduced to 2 states (F, U; top), 3 states (F1, F2, U; middle) and 4 states (F1, F2, U1, U2; bottom). At time t = 0, the system starts from an equilibrated state U, U, and U1, respectively (top to bottom). Exact populations as a function of time are shown as symbols. Solid lines of matching color show the results obtained for the reduced models with two to four states. For reference, populations obtained with the local-equilibrium approximation are shown as thin black lines.

Concluding Remarks

This paper considered how to construct a Markovian rate (or transition) matrix for a given choice of aggregated states. We have not discussed the important problem of identifying microstates that can be faithfully aggregated. This problem of lumping microstates is central to the analysis of kinetic data from simulation and experiment alike, from the modeling of measured kinetics data to the construction of Markov-state or master-equation models.[1,18,23−26] The procedures introduced here can help also in this endeavor. Specifically, we would like to take this opportunity to propose a hierarchical approach that should work well with large numbers of microstates. We first order all n microstates according to the components b2(i) of the left eigenvector of K corresponding to the largest nonzero eigenvalue (i.e., to make the second eigenvector nondecreasing as the state index increases, where we assume that this eigenvalue is not degenerate). We then divide the system into two states, one including the first k microstates in this ranked list, and the other the remaining n – k microstates. For each of these aggregations, we calculate the reduced two-state rate matrix R, and in turn the relaxation time as the negative reciprocal of its nonzero eigenvalue. We then find the value of k that maximizes this relaxation time. At the next level, this procedure can be repeated for the superstates obtained in the previous rounds, setting rates out of these superstates to zero. This recursive procedure can be truncated once the relaxation times in all possible divisions fall below a set threshold. We applied this hierarchical aggregation procedure to the 32-state helix folding model of ref (18). The results are shown in Figure 5. By identifying the slowest relaxation for divisions first of the entire system and then of the resulting superstates, we recover in the first step the F and U states of ref (18); then F1, F2, and U; and finally U1 and U2. So at least in cases without significant degeneracies in the eigenvalue spectrum, the hierarchical procedure based on maximizing the relaxation time of the reduced rate matrix produces sensible superstates that lead to reduced models whose characteristic times closely match those of the full system (Table 1). It will be interesting to see how well this algorithm performs in even more complex contexts.
Figure 5

Hierarchical aggregation procedure applied to 32-state helix folding model of ref (18). (A) Division into two superstates. States i were rank-ordered according to their component in the second left-hand eigenvector b2(i) (blue circles; right axis). Different superstates were formed by aggregating the first k of these ordered states into one superstate, and the remaining ones into the other. The relaxation time −1/λ2 of the resulting reduced two-state rate matrix was calculated by diagonalization (filled red squares; left axis). We obtain the longest relaxation time by lumping six microstates into a folded state F, and the remaining 26 microstates into an unfolded state U, as indicated by the shading. Microstates are labeled in the binary notation of ref (18) (top to bottom: N to C terminus; 1 indicating a helical residue). (B) Division of F into superstates F1 and F2. (C) Division of U into superstates U1 and U2. The resulting superstates F, F1, F2, U, U1, and U2 are all identical with those found previously with a more heuristic approach in ref (18).

Hierarchical aggregation procedure applied to 32-state helix folding model of ref (18). (A) Division into two superstates. States i were rank-ordered according to their component in the second left-hand eigenvector b2(i) (blue circles; right axis). Different superstates were formed by aggregating the first k of these ordered states into one superstate, and the remaining ones into the other. The relaxation time −1/λ2 of the resulting reduced two-state rate matrix was calculated by diagonalization (filled red squares; left axis). We obtain the longest relaxation time by lumping six microstates into a folded state F, and the remaining 26 microstates into an unfolded state U, as indicated by the shading. Microstates are labeled in the binary notation of ref (18) (top to bottom: N to C terminus; 1 indicating a helical residue). (B) Division of F into superstates F1 and F2. (C) Division of U into superstates U1 and U2. The resulting superstates F, F1, F2, U, U1, and U2 are all identical with those found previously with a more heuristic approach in ref (18). In summary, we have developed a systematic procedure to construct reduced dynamic descriptions of aggregated superstates obtained by combining (or clustering or lumping) microstates. The procedure is generally applicable to kinetic (or master equation) models with discrete states and continuous time evolution, Markov-state models with discrete states and discrete time evolution, or continuous space models with discrete or continuous time evolution. The reduced dynamic models are exact in their non-Markovian formulation. In the approximate Markovian limit, we provide simple analytic expressions for the reduced rate or Markov transition matrices. Even under the Markovian approximation, one recovers exact auto- and cross-relaxation times. The resulting reduced models thus strike an optimal balance between recovering the dynamics at short and long times. This approach is not only useful to construct reduced models for already defined groupings of microstates into superstates, but also helps in finding optimal superstates. Specifically, we found that maximizing the relaxation time of the reduced-matrix model provides a quantitative criterion that can be used in a hierarchical construction of superstates through aggregation of microstates.
  16 in total

1.  Essential dynamics of reversible peptide folding: memory-free conformational dynamics governed by internal hydrogen bonds.

Authors:  B L de Groot; X Daura; A E Mark; H Grubmüller
Journal:  J Mol Biol       Date:  2001-05-25       Impact factor: 5.469

2.  Absolute comparison of simulated and experimental protein-folding dynamics.

Authors:  Christopher D Snow; Houbi Nguyen; Vijay S Pande; Martin Gruebele
Journal:  Nature       Date:  2002-10-20       Impact factor: 49.962

3.  A coarse graining method for the identification of transition rates between molecular conformations.

Authors:  Susanna Kube; Marcus Weber
Journal:  J Chem Phys       Date:  2007-01-14       Impact factor: 3.488

4.  Finding transition pathways using the string method with swarms of trajectories.

Authors:  Albert C Pan; Deniz Sezer; Benoît Roux
Journal:  J Phys Chem B       Date:  2008-02-22       Impact factor: 2.991

5.  Coarse master equations for peptide folding dynamics.

Authors:  Nicolae-Viorel Buchete; Gerhard Hummer
Journal:  J Phys Chem B       Date:  2008-01-31       Impact factor: 2.991

6.  A minimum-reaction-flux solution to master-equation models of protein folding.

Authors:  Huan-Xiang Zhou
Journal:  J Chem Phys       Date:  2008-05-21       Impact factor: 3.488

7.  Slow unfolded-state structuring in Acyl-CoA binding protein folding revealed by simulation and experiment.

Authors:  Vincent A Voelz; Marcus Jäger; Shuhuai Yao; Yujie Chen; Li Zhu; Steven A Waldauer; Gregory R Bowman; Mark Friedrichs; Olgica Bakajin; Lisa J Lapidus; Shimon Weiss; Vijay S Pande
Journal:  J Am Chem Soc       Date:  2012-07-19       Impact factor: 15.419

8.  Peptide loop-closure kinetics from microsecond molecular dynamics simulations in explicit solvent.

Authors:  In-Chul Yeh; Gerhard Hummer
Journal:  J Am Chem Soc       Date:  2002-06-12       Impact factor: 15.419

9.  A Bayesian method for construction of Markov models to describe dynamics on various time-scales.

Authors:  Emily K Rains; Hans C Andersen
Journal:  J Chem Phys       Date:  2010-10-14       Impact factor: 3.488

10.  On the assumptions underlying milestoning.

Authors:  Eric Vanden-Eijnden; Maddalena Venturoli; Giovanni Ciccotti; Ron Elber
Journal:  J Chem Phys       Date:  2008-11-07       Impact factor: 3.488

View more
  12 in total

1.  Perspective: Computer simulations of long time dynamics.

Authors:  Ron Elber
Journal:  J Chem Phys       Date:  2016-02-14       Impact factor: 3.488

2.  Inherent structure versus geometric metric for state space discretization.

Authors:  Hanzhong Liu; Minghai Li; Jue Fan; Shuanghong Huo
Journal:  J Comput Chem       Date:  2016-02-24       Impact factor: 3.376

3.  Theory of Diffusion-Influenced Reaction Networks.

Authors:  Irina V Gopich; Attila Szabo
Journal:  J Phys Chem B       Date:  2018-10-04       Impact factor: 2.991

4.  Conformational analysis of replica exchange MD: Temperature-dependent Markov networks for FF amyloid peptides.

Authors:  Brajesh Narayan; Colm Herbert; Ye Yuan; Brian J Rodriguez; Bernard R Brooks; Nicolae-Viorel Buchete
Journal:  J Chem Phys       Date:  2018-08-21       Impact factor: 3.488

5.  How wet should be the reaction coordinate for ligand unbinding?

Authors:  Pratyush Tiwary; B J Berne
Journal:  J Chem Phys       Date:  2016-08-07       Impact factor: 3.488

Review 6.  Multiscale kinetic analysis of proteins.

Authors:  Jessica Mj Swanson
Journal:  Curr Opin Struct Biol       Date:  2021-12-16       Impact factor: 7.786

7.  Computational IR Spectroscopy of Insulin Dimer Structure and Conformational Heterogeneity.

Authors:  Chi-Jui Feng; Anton Sinitskiy; Vijay Pande; Andrei Tokmakoff
Journal:  J Phys Chem B       Date:  2021-04-30       Impact factor: 2.991

8.  VAMPnets for deep learning of molecular kinetics.

Authors:  Andreas Mardt; Luca Pasquali; Hao Wu; Frank Noé
Journal:  Nat Commun       Date:  2018-01-02       Impact factor: 14.919

9.  Kinetic and thermodynamic insights into sodium ion translocation through the μ-opioid receptor from molecular dynamics and machine learning analysis.

Authors:  Xiaohu Hu; Yibo Wang; Amanda Hunkele; Davide Provasi; Gavril W Pasternak; Marta Filizola
Journal:  PLoS Comput Biol       Date:  2019-01-24       Impact factor: 4.475

10.  Molecular details of dimerization kinetics reveal negligible populations of transient µ-opioid receptor homodimers at physiological concentrations.

Authors:  Derya Meral; Davide Provasi; Diego Prada-Gracia; Jan Möller; Kristen Marino; Martin J Lohse; Marta Filizola
Journal:  Sci Rep       Date:  2018-05-16       Impact factor: 4.379

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.