Gerhard Hummer1, Attila Szabo2. 1. †Department of Theoretical Biophysics, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany. 2. ‡Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892-0520, United States.
Abstract
We develop a systematic procedure for obtaining rate and transition matrices that optimally describe the dynamics of aggregated superstates formed by combining (clustering or lumping) microstates. These reduced dynamical models are constructed by matching the time-dependent occupancy-number correlation functions of the superstates in the full and aggregated systems. Identical results are obtained by using a projection operator formalism. The reduced dynamic models are exact for all times in their full non-Markovian formulation. In the approximate Markovian limit, we derive simple analytic expressions for the reduced rate or Markov transition matrices that lead to exact auto- and cross-relaxation times. These reduced Markovian models strike an optimal balance between matching the dynamics at short and long times. We also discuss how this approach can be used in a hierarchical procedure of constructing optimal superstates through aggregation of microstates. The results of the general reduced-matrix theory are illustrated with applications to simple model systems and a more complex master-equation model of peptide folding derived previously from atomistic molecular dynamics simulations. We find that the reduced models faithfully capture the dynamics of the full systems, producing substantial improvements over the common local-equilibrium approximation.
We develop a systematic procedure for obtaining rate and transition matrices that optimally describe the dynamics of aggregated superstates formed by combining (clustering or lumping) microstates. These reduced dynamical models are constructed by matching the time-dependent occupancy-number correlation functions of the superstates in the full and aggregated systems. Identical results are obtained by using a projection operator formalism. The reduced dynamic models are exact for all times in their full non-Markovian formulation. In the approximate Markovian limit, we derive simple analytic expressions for the reduced rate or Markov transition matrices that lead to exact auto- and cross-relaxation times. These reduced Markovian models strike an optimal balance between matching the dynamics at short and long times. We also discuss how this approach can be used in a hierarchical procedure of constructing optimal superstates through aggregation of microstates. The results of the general reduced-matrix theory are illustrated with applications to simple model systems and a more complex master-equation model of peptide folding derived previously from atomistic molecular dynamics simulations. We find that the reduced models faithfully capture the dynamics of the full systems, producing substantial improvements over the common local-equilibrium approximation.
Suppose we
divide a high-dimensional dynamical system into two
parts, 1 and 2, and wish to describe its dynamics by a two-state kinetic
system, 12. What is the best way of choosing the
rate constants R12 and R21? To get the thermodynamics right, the ratio of the
rate constants must equal the exact equilibrium constant, R12/R21 = Peq(1)/Peq(2), where Peq(I) is the equilibrium population
of state I. The sum of the rate constants, R12 + R21, which
is the inverse relaxation time, can be chosen in a variety of ways.
Arguably the simplest is to make the number of transitions exact between
the two states at equilibrium. This is equivalent to the so-called
local-equilibrium approximation, where it is assumed that the time
evolution of each microstate within a given superstate is the same,
but with an amplitude proportional to its equilibrium population.
This approximation is an excellent one if the interconversion of the
microstates within a given superstate happens to be much faster than
that between the two aggregated states. Otherwise, it is valid only
at short times. Alternatively, one can devise an approximation valid
at long times by setting the sum of the rate constants equal to the
absolute value of the first nonzero eigenvalue of the operator that
describes the dynamics of the entire system. This approximation will
tend to perform poorly at short times.There is, however, a
compromise choice. One can force the relaxation
time for the equilibrium fluctuations of the populations of the two
states to be exact. Such fluctuations are described by the correlation
function ⟨θ(t)θ(0)⟩ of an indicator
θ(t) that is equal
to 1 when the system is in state I at time t, and zero otherwise. Since this occupancy-number correlation
function is in general multiexponential, its relaxation time is defined
as the integral over all times of an appropriately normalized form
of this correlation function. Thus, in essence, a multiexponential
correlation function is approximated here by a single-exponential
function with the exact relaxation time. The two curves deviate both
at short and at long times, yet the areas under them are the same.It is this procedure that we will generalize to the case when the
dynamical system is divided into many discrete substates (see Figure 1, top). For the sake of simplicity, we begin by
assuming that the undivided system consists of discrete states whose
dynamics is described by a master or rate equation. The generalization
to a continuous state space, divided into cells, is straightforward.
We also derive reduced Markov transition matrices for Markov-state
models with dynamics in discrete time and space. Such models appear
not only in the modeling of experimental data, but increasingly also
in molecular simulation studies[1−8] to deal with the problem of sampling rare transitions[9] and large conformation spaces by using dynamic
information collected locally over short time.[10−17] Correlation functions have been used in the validation[18] and construction[19] of such models.
Figure 1
Schematic illustrating the reduction of an n =
18 multistate kinetic model by lumping microstates (open circles;
top left) into N = 3 superstates (shaded areas; top
left and right). The n × n rate
matrix K for transitions i → j between microstates i and j is reduced to an N × N matrix whose elements R for transitions I → J between superstates I and J depend explicitly on time in the full non-Markovian
description (eq 11) and are time-independent
in the Markovian approximation (eq 12).
Schematic illustrating the reduction of an n =
18 multistate kinetic model by lumping microstates (open circles;
top left) into N = 3 superstates (shaded areas; top
left and right). The n × n rate
matrix K for transitions i → j between microstates i and j is reduced to an N × N matrix whose elements R for transitions I → J between superstates I and J depend explicitly on time in the full non-Markovian
description (eq 11) and are time-independent
in the Markovian approximation (eq 12).By allowing the dynamics of the
reduced system to be non-Markovian
(i.e., described by a time-dependent rate matrix), one can make the
entire time dependence (not only the relaxation times!) of all equilibrium
occupancy-number correlation functions exact. At first sight, this
may be somewhat surprising, but the fact that this is possible follows
from Zwanzig’s paper[20] “From
Classical Dynamics to Continuous-Time Random Walks.” He divided
configuration space into discrete cells and showed that when the Liouville
equation was converted into a generalized Langevin equation for the
cell occupancies using projection operators, the resulting noise term
vanishes when initially all substates in a cell are in local equilibrium.
Consequently, for such initial conditions, the probability of finding
the system in a given cell satisfies a generalized (non-Markovian)
master equation. In the first part of the Theory section of this paper, we show how this can be done in a completely
elementary way. In the second part, we use a projection operator approach
to combining states and show that it leads to the same results that
we obtained in a less formal way. We then apply the general formalism
to the dynamics of systems in a continuous state space, and to Markov-state
models with discrete time steps. Finally, we present several simple
illustrative examples and discuss the implication of our results on
the important problem of how to choose the states that should be combined.
Theory
Definitions
and General Framework
Consider a system
with n discrete states, whose dynamics is described
by an n × n rate matrix, whose
elements K are the
rate constants that describe the i-to-j transition. Conservation of probabilities require that K = −∑K or, in matrix notation, 1K = 0, where 1 is a column vector with n unit
elements, and superscript T denotes the transpose.
The populations p(i, t) evolve according toor in Laplace space [f̂(s) = ∫0∞dt exp(−st)f(t) for a general
function f(t)]The normalized equilibrium populations are
solutions of Kpeq = 0, ∑peq(i) = 1. For future reference, note that the s→0
limit of the Laplace transform of a function is just the area under
that function from 0 to ∞.We wish to combine (i.e.,
aggregate or lump) the states of the original system (labeled by lower-case
indices i = 1, 2, ..., n) into “superstates”
labeled by capital indices I = 1, 2, ..., N, where N < n (see
Figure 1). We will require that not only the
equilibrium populations but also the occupancy-number correlation
functions of the reduced states are exact. We will show that this
is possible if we describe the dynamics of populations P(I, t) of the superstates by the
non-Markovian rate equations of the formor in Laplace space
using the convolution
theoremwhere R̂(s) is a reduced N × N matrix
with elements R̂(s), I, J = 1,
2,..., N, that remain to be determined (Figure 1, bottom right).
Construction of Exact Kinetic
Model for Aggregated Superstates
Let Peq(I) be the
normalized equilibrium population of state I. We
wish to construct an R̂ with the property that R̂Peq = 0 forwhere the sum is over all substates i in superstate I.The occupancy-number
auto- and cross-correlation functions of the original system are denoted
by ⟨θ(t)θ(0)⟩, where the indicator
function θ(t)
is equal to one if the system is in state i at time t, and zero otherwise. We recall that a general equilibrium
correlation function ⟨f(t)g(0)⟩ [where f(t) = ∑fθ(t) and g(t) =
∑gθ(t)] is defined in terms of the propagator or Green’s function G(i, t|j, 0) as ⟨f(t)g(0)⟩ = ∑fG(i, t|j, 0)gpeq(j). The propagators are the conditional probabilities that
the system starting in state j at time t = 0 is in state i at time t and
thus describe the time evolution of the populations (e.g., p(i,t) = ∑G(i,t|j,0)p(j,0)). They are the solution of eq 1 with
initial conditions that at t = 0, G(i,0|j,0) = δ, with the Kronecker δ equal to 1 for i = j and
zero otherwise. Thus, in matrix notation, G = exp(tK). Its Laplace transform is Ĝ = (sI – K)−1, where I is the unit matrix of dimension n. From these definitions it follows that the occupancy-number
correlation functions areor in Laplace space, with Ĝ(i, s|j) the Laplace
transform of G(i, t|j, 0),Thus,
the exact number correlation functions
in the reduced system denoted by C(t) = ⟨θ(t)θ(0)⟩
can be easily found by simply summing the above results over all i ∈ I and j ∈ J. In Laplace space we obtainIf we were to calculate Ĉ directly for the
reduced system using the matrix R̂(s) introduced in eq 2 we would have (sI – R̂(s))–1Peq(J), where (sI – R̂(s))−1 is the corresponding Green’s function.
Thus, if we determine R̂(s) fromthen by construction
we have reached our goal
of making the occupancy-number correlation functions for the reduced
system exact. This is one of the key results of this paper. It has
the following “non-equilibrium” interpretation. Suppose
in the original system we start with an initial condition that all
states are unoccupied except those that belong to substate I. Let us further assume that the initial populations of
the microstates i ∈ I are
proportional to their exact equilibrium populations [i.e., there is
local equilibrium with p(i, t = 0) = peq(i)/Peq(I) for i ∈ I and p(i, t = 0) = 0 otherwise]. Then the subsequent
populations of all superstates, calculated from the reduced non-Markovian
description in eq 2, are exact for all times.
As discussed in the Introductory Remarks,
this result was obtained by Zwanzig in a more general case.[20]
Local-Equilibrium Approximation
We now examine the
limiting cases of eq 6 that are Markovian, i.e.,
the dynamics is described by an ordinary N × N rate matrix with elements R. The limit s → ∞
of eq 6 corresponds to the limit of time t → 0,Hence, by defining
a reduced, s-independent rate matrix Rle ≡ R̂ (s → ∞)
satisfyingwe obtain a reduced
description that is exact
at short times. Physically, eq 7 means that
the number of transitions between I and J per unit time is exact at equilibrium. This is just the local-equilibrium
approximation, indicated by the superscript “le”, as
can be verified by setting p(i, t) = peq(i)P(I, t)/Peq(I) in eq 1 and then summing both sides over i ∈ I for all I. The above derivation shows
that, in general, it is valid only at short times.
Optimal Markovian
Model
Now let us examine the (so-called)
Markovian limit, s → 0, when eq 2a becomes dP/dt = RP, where R = R̂(0).
One cannot simply set s = 0 in eq 6 because K has a zero eigenvalue and K–1 is thus not defined. At long times, G(i, t|j, 0) goes to Peq(i)
independent of j (i.e., the system relaxes to equilibrium
independent of the starting point, assuming that there are no disconnected
sets of microstates). In Laplace space, this means that lim(sI – K)−1 = peq1T/s + ···.
Therefore, we determine R̂(0) fromThis limit can
be carried out using
the identitywhich can be proved using
the Sherman-Morrison
formula[21] along with Kpeq = 0 = 1TK for any λ0. Using this identity and an analogous one involving R̂(0), with λ0 = 1 in units of reciprocal
time, we find that the elements of the s-independent
optimal reduced matrix are determined bywith R = R̂(0) and R = R̂(0). This can be inverted
to get R, and a computationally
convenient formula is given later in eq 12.
We note that eq 8 is valid independent of the
units of R and K because λ0 in the Sherman–Morrison formula above is arbitrary, and is
conveniently set to one in units of reciprocal time here and in similar
relations below. A reduced Markovian description using the reduced
matrix R obtained in this way guarantees that the weighted
cross-relaxation times, τ, are
exact,for all I and J, with ∑τ = 0. In other words, using the reduced
matrix R ensures that the areas under all occupancy-number
auto and cross-correlation
functions are exact. Since ∑[⟨θθ⟩
– ⟨θ⟩⟨θ⟩] = ⟨θθ⟩ – ⟨θ⟩⟨θ⟩, this also means that the cross-relaxation times t = τ/(⟨θθ⟩ – ⟨θ⟩⟨θ⟩)
are exact, defined as the areas under the normalized occupancy-number
correlation functions,We note in passing that if the equilibrium
populations Peq and weighted cross-relaxation
times τ are known, e.g., from
experiment or molecular dynamics simulation, the corresponding reduced
matrix can be constructed by inversion, R = Peq1 – D(PeqPeq + τ)−1, where D is a diagonal matrix with elements Peq(I). The somewhat unusual
structure of eq 8 can be understood as follows.
The matrix K has a spectral expansion of
the form K = ∑λab = 0peq1 + ∑λab, where a (b) are the right
(left) eigenvectors of K with eigenvalues
λ. a1 = peq and b1 = 1 are the eigenvectors for eigenvalue λ1 = 0, and
λ < 0 for j > 1. Now K–1 has a spectral expansion K–1 = (1/0)peq1 + ∑n(1/λ)abT, and hence does not exist, with the first
term being infinite. However, if we add λ0peq1 to K, the resulting matrix has a spectral expansion λ0peq1 + ∑λab, and
hence (λ0peq1 + K)−1 =
λ0–1peq1 + ∑λ–1ab.For computational reasons it will prove convenient to rewrite
the
above results in matrix notation. To this end we introduce an n × N-dimensional aggregation matrix A with elementsThus, the relation P(I, t) = ∑p(i, t)
can be written as P = Ap. In addition, we introduce diagonal matrices D and D = ADA with elements peq(i) and Peq(I) on the diagonal, respectively [i.e.,
(D) = peq(i)δ and (D) = Peq(I)δ]. Using
this notation, eq 6 becomeswhich can be solved
explicitly to giveSimilarly, eq 8 can
be explicitly solved to give the reduced matrix in the Markovian (s → 0) limit:
Projection Operator Formulation
We now show how to
construct the dynamics of the reduced system using projection operators.[22] This formalism is very general and starts by
rearranging the exact dynamical equations (e.g., eq 1) for quantities of interest (e.g., the aggregated states)
and whatever remains. Let be
a projection operator with defined so that are the probabilities of interest. The
complementary operator , defined
so that , gives probabilities that are orthogonal to those of interest.
Multiplying both sides of eq 1 by and
then by , and
setting on the right-hand sides, one findsor in Laplace
spaceThese equations are exact. If one chooses
initial conditions for which v(0) = 0, then one can solve
eq 14b for v and substitute the
result into eq 14a to get a closed equation
for the quantity of interest, u:In the present context, we must
choose in
such a way that the time dependence
of the projected populations u(t) of all microstates in a superstate I is the same, up to a proportionality factor, i.e., with c a constant
that depends on i. Moreover, let
us require that at equilibrium u = peq(i) for
all i. These two requirements are satisfied if we
setIn terms of the aggregation matrix A defined above, it follows from eq 16 that
the corresponding projection operator can be written asNow to be able to use eq 15, we must restrict ourselves to initial conditions
where v(0) = 0. Since v(0) = p(i, 0) – u(0) = p(i, 0) – peq(i)P(I, 0)/Peq(I) according to eq 16, it follows that v(0) = 0 for the initial condition that p(i, 0) = peq(i)P(I, 0)/Peq(I), i.e., p(i, 0) ∝ Cpeq(i) with a proportionality
constant C ≥
0 that depends only on the superstate I to which i belongs. This is exactly the condition of local equilibrium
within superstates I. For such initial conditions,
eq 15 is exact. However, we are not interested
in the u’s but rather in the time evolution
of the probabilities of the aggregated states P(I, t). From eq 16 it follows that u = DAD–1P. Using this, the
definition of in
eq 17, and the
fact that , we can rewrite eq 15 in the same form as eq 2b, if we definewhich at first sight
does not look anything
like the result in eq 11 that we obtained by
matching the occupancy-number correlation functions. However, they
are in fact equivalent, which can be proved by using the Woodbury
matrix inversion formula,[21]with M = sI – K, U = DAD–1, and V = AK.
Reduced
Model for Dynamics in Continuous Space
We now
show that the above results can be readily generalized when the original
system is continuous and is divided into N cells.
Let us denote integration over cell coordinates x in cell I by ∫dx. Since in discrete state space (sI – K)−1 is the Laplace transform of the Green’s
function Ĝ(i, s|j), the appropriate generalization of eq 6 iswhere Peq(J) = ∫dxpeq(x) and Ĝ(x′, s|x) is the Laplace transform of G(x′, t|x, 0).
Markov-State Model in Discrete Time
The procedure of
constructing a reduced dynamic description is also applicable to the
discrete-time dynamics of Markov-state models,where M is the n × n Markov matrix of transition
probabilities M from
microstates j to i, and p is the vector of normalized probabilities p(i) of microstate i after step k, starting from p0. Thus, M,
the kth power of M, is the discrete
analogue of the Green’s function G(t). The analogue of Ĝ(s) is the
generating functionwhere λ is a parameter, 0 ≤ λ
≤ 1. Therefore, the analogue of eq 6 iswith the following solution in matrix form:where T̂(λ) is the
generating function of operators T that define an N × N reduced, non-Markovian transition process,withIn analogy to the s →
∞ limit above, for λ → 0 one recovers the local-equilibrium
approximation valid at short times, Tle = T̂(λ = 0) ≡ T0.
The Markovian limit corresponding to s → 0
is obtained for λ → 1, and interestingly eq 8 remains essentially the samewith T ≡ T̂(λ = 1) the
reduced transition matrix in the Markovian limit.
In matrix form, the reduced Markov transition matrix becomes
Illustrative
Applications
Four-State to Two-State Reduction with Well Chosen Superstates
Consider a four-state system,To aggregate
microstates 1 and 2 into a superstate
1 + 2, and 3 and 4 into another superstate 3 + 4, we define an aggregation
matrixThe
corresponding diagonal matrices of probabilities
areandwith peq(i) = 1/4 in our example. From eq 12, we then obtain the following reduced matrix for the reduced two-state
model:Figure 2 compares the
time-dependent populations obtained with this reduced two-state rate
matrix to those obtained by integration of the full four-state model
and the local-equilibrium two-state approximation. As one would expect,
with h/k becoming smaller, transitions
between 1 + 2 and 3 + 4 become rarer, and the two-state approximation
becomes increasingly accurate. Remarkably, even for h = k, where one might expect the two-state approximation
to fail, the reduced model produces populations of the aggregated
states in excellent agreement with the full four-state model (center
panel of Figure 2). In the reduced model, small
deviations at short and long times compensate each other, with the
areas between the exact and approximate curves exactly canceling by
construction (gray shading in top panel of Figure 2). By contrast, the local-equilibrium approximation eq 7 results in significant deviations already after
very short times t ≪ 1/k,
in particular for h ≥ k.
In this example, the reduced model indeed strikes a good balance between
reproducing the dynamics at short and long times, and results in significant
improvements over the local-equilibrium approximation.
Figure 2
Populations of aggregated
states 1 + 2 for model eq 26 obtained from the
reduced rate matrix R in eq 27 (red
lines) and by exact time integration of the
full rate equations, eq 1a, through diagonalization
of the matrix K (symbols). Results are shown for k = 2 and h = 5 (top), h = 1 (center), and h = 0.2 (bottom), starting from
superstate 1 at time t = 0. The reduced model R is constructed such that the shaded gray areas between the
populations exactly cancel and the exact equilibrium is recovered.
By contrast, the populations obtained by using the local-equilibrium
approximation, shown as solid black lines, match the time evolution
only near t = 0. The populations of the other superstate,
3 + 4, are not shown, since they are exactly one minus the populations
of the 1 + 2 superstate.
Populations of aggregated
states 1 + 2 for model eq 26 obtained from the
reduced rate matrix R in eq 27 (red
lines) and by exact time integration of the
full rate equations, eq 1a, through diagonalization
of the matrix K (symbols). Results are shown for k = 2 and h = 5 (top), h = 1 (center), and h = 0.2 (bottom), starting from
superstate 1 at time t = 0. The reduced model R is constructed such that the shaded gray areas between the
populations exactly cancel and the exact equilibrium is recovered.
By contrast, the populations obtained by using the local-equilibrium
approximation, shown as solid black lines, match the time evolution
only near t = 0. The populations of the other superstate,
3 + 4, are not shown, since they are exactly one minus the populations
of the 1 + 2 superstate.
Four-State to Three-State Reduction with Poorly Chosen Superstates
Now we instead aggregate the central microstates 2 and 3 to form
a three-state system. We then end up with a reduced matrixThe three ordered eigenvalues 0, –
2hk/(2h+k), and
−2k of R are zero or negative,
and agree exactly (eigenvalues 1 and 3) and to second order in k/h (eigenvalue 2) with those of the original
rate matrix K. However, we notice that in this matrix
the off-diagonal elements R13 = R31 are negative, i.e., in this example, R is not strictly a rate matrix. Thus, while it is always
possible to construct a matrix R that leads to the exact
relaxation times, there is no guarantee that it will have positive
off-diagonal elements. If we nonetheless integrate the “rate
equations” for R, dP/dt = RP, the solutions are all positive
after a brief initial period, as illustrated in Figure 3. As h/k increases, the
three-state approximation, with microstates 2 + 3 aggregated, becomes
increasingly accurate. Visually, one can recognize that the time-integrated
area between the exact populations and those of the reduced model
is zero, which is the criterion used to match the relaxation times.
By contrast, the simple local-equilibrium approximation works well
only at very short times.
Figure 3
Populations of aggregated states 1 (red), 2
+ 3 (green), and 4
(blue) for model eq 26 obtained from the reduced R matrix in eq 28 (lines) and by exact
integration of eq 1a (symbols). Results are
shown for k = 2 and h = 5 (top), h = 1 (center), and h = 0.2 (bottom), starting
from superstate 1 at time t = 0. Whereas the population
of state 2 + 3 is exact, exhibiting a single-exponential relaxation,
the population of state 4 initially goes negative in the reduced model.
For reference, the populations obtained from the local-equilibrium
approximation are shown as solid black lines for states 1 and 4, with
state 2 + 3 not shown since it is again integrated exactly for this
model. Note that the reduced model is exact for all states if the
initial state is the equilibrated 2 + 3 state.
Populations of aggregated states 1 (red), 2
+ 3 (green), and 4
(blue) for model eq 26 obtained from the reduced R matrix in eq 28 (lines) and by exact
integration of eq 1a (symbols). Results are
shown for k = 2 and h = 5 (top), h = 1 (center), and h = 0.2 (bottom), starting
from superstate 1 at time t = 0. Whereas the population
of state 2 + 3 is exact, exhibiting a single-exponential relaxation,
the population of state 4 initially goes negative in the reduced model.
For reference, the populations obtained from the local-equilibrium
approximation are shown as solid black lines for states 1 and 4, with
state 2 + 3 not shown since it is again integrated exactly for this
model. Note that the reduced model is exact for all states if the
initial state is the equilibrated 2 + 3 state.
Reduction of 32-State Protein Folding Model
As a realistic
example, we consider a 32-state model for the formation of a short
α-helix in water.[18] It was previously
shown that this 32-state model could be reasonably well approximated
with a two-state model (with a helically folded state F and an unfolded
state U) or a four-state model (with two folded states F1 and F2, and two unfolded states U1 and U2). Here we use the aggregation into superstates determined
in ref (18) by using
a semiquantitative procedure. Below, in the Concluding Remarks, we
show how these superstates can also be found by using a quantitative
hierarchical procedure that uses the reduced matrix.We now
reduce the 32-state rate matrix into a two-state model (F and U),
a three-state model (F1, F2, and U), and a four-state
model (F1, F2, U1, and U2) based on the definition of these superstates given in ref (18). Using eq 12, we find:andin units
of 1/ns, with the subscript indicating
the order of the states (left to right, and top to bottom). We note
that for these well-chosen superstates, all off-diagonal elements
of the reduced matrices are positive. As listed in Table 1, the eigenvalues of the matrices,
and thus the slowest relaxation times given by their negative reciprocals,
are in excellent agreement with those of the full rate matrix.
Table 1
Relaxation Rates [1/ns] from Eigenvalues
of Full 32-State Rate Matrix K and Reduced Rate Matrices R with N = 4, 3, and 2 Superstates, Respectivelya
states
–λ1
–λ2
–λ3
–λ4
32
0
0.161
0.530
0.660
4
0
0.167 (0.244)
0.556 (0.608)
0.705 (0.788)
3
0
0.167 (0.244)
0.557 (0.609)
2
0
0.174 (0.282)
Numbers in parentheses are for
the local-equilibrium approximation Rle, which
performs significantly worse in all cases.
Numbers in parentheses are for
the local-equilibrium approximation Rle, which
performs significantly worse in all cases.Starting from state U or U1, respectively, we calculated the time-dependent
populations
of the superstates and compared them to the exact populations obtained
by integration of the 32-state coupled rate equations. As shown in
Figure 4, the reduced model obtained by matching
the relaxation times gives excellent approximations to the full dynamics,
already at the two-state level. By contrast, the local-equilibrium
approximation fails at times much shorter than the global relaxation
time, which again is reflected in the inaccurate eigenvalues of the
resulting reduced rate matrix Rle (Table 1).
Figure 4
Protein folding kinetics. The full peptide-folding model
has 32
states[18] and was reduced to 2 states (F,
U; top), 3 states (F1, F2, U; middle) and 4
states (F1, F2, U1, U2; bottom). At time t = 0, the system starts from
an equilibrated state U, U, and U1, respectively (top to bottom). Exact populations
as a function of time are shown as symbols. Solid lines of matching
color show the results obtained for the reduced models with two to
four states. For reference, populations obtained with the local-equilibrium
approximation are shown as thin black lines.
Protein folding kinetics. The full peptide-folding model
has 32
states[18] and was reduced to 2 states (F,
U; top), 3 states (F1, F2, U; middle) and 4
states (F1, F2, U1, U2; bottom). At time t = 0, the system starts from
an equilibrated state U, U, and U1, respectively (top to bottom). Exact populations
as a function of time are shown as symbols. Solid lines of matching
color show the results obtained for the reduced models with two to
four states. For reference, populations obtained with the local-equilibrium
approximation are shown as thin black lines.
Concluding Remarks
This paper considered how to construct
a Markovian rate (or transition)
matrix for a given choice of aggregated states. We have not discussed
the important problem of identifying microstates that can be faithfully
aggregated. This problem of lumping microstates is central to the
analysis of kinetic data from simulation and experiment alike, from
the modeling of measured kinetics data to the construction of Markov-state
or master-equation models.[1,18,23−26] The procedures introduced here can help also in this endeavor. Specifically,
we would like to take this opportunity to propose a hierarchical approach
that should work well with large numbers of microstates. We first
order all n microstates according to the components b2(i) of the left eigenvector
of K corresponding to the largest nonzero eigenvalue
(i.e., to make the second eigenvector nondecreasing as the state index
increases, where we assume that this eigenvalue is not degenerate).
We then divide the system into two states, one including the first k microstates in this ranked list, and the other the remaining n – k microstates. For each of these
aggregations, we calculate the reduced two-state rate matrix R, and in turn the relaxation time as the negative reciprocal
of its nonzero eigenvalue. We then find the value of k that maximizes this relaxation time. At the next level, this procedure
can be repeated for the superstates obtained in the previous rounds,
setting rates out of these superstates to zero. This recursive procedure
can be truncated once the relaxation times in all possible divisions
fall below a set threshold.We applied this hierarchical aggregation
procedure to the 32-state
helix folding model of ref (18). The results are shown in Figure 5. By identifying the slowest relaxation for divisions first of the
entire system and then of the resulting superstates, we recover in
the first step the F and U states of ref (18); then F1, F2, and U; and
finally U1 and U2. So at least in cases without
significant degeneracies in the eigenvalue spectrum, the hierarchical
procedure based on maximizing the relaxation time of the reduced rate
matrix produces sensible superstates that lead to reduced models whose
characteristic times closely match those of the full system (Table 1). It will be interesting to see how well this algorithm
performs in even more complex contexts.
Figure 5
Hierarchical aggregation
procedure applied to 32-state helix folding
model of ref (18).
(A) Division into two superstates. States i were
rank-ordered according to their component in the second left-hand
eigenvector b2(i) (blue
circles; right axis). Different superstates were formed by aggregating
the first k of these ordered states into one superstate,
and the remaining ones into the other. The relaxation time −1/λ2 of the resulting reduced two-state rate matrix was calculated
by diagonalization (filled red squares; left axis). We obtain the
longest relaxation time by lumping six microstates into a folded state
F, and the remaining 26 microstates into an unfolded state U, as indicated
by the shading. Microstates are labeled in the binary notation of
ref (18) (top to bottom:
N to C terminus; 1 indicating a helical residue). (B) Division of
F into superstates F1 and F2. (C) Division of
U into superstates U1 and U2. The resulting
superstates F, F1, F2, U, U1, and
U2 are all identical with those found previously with a
more heuristic approach in ref (18).
Hierarchical aggregation
procedure applied to 32-state helix folding
model of ref (18).
(A) Division into two superstates. States i were
rank-ordered according to their component in the second left-hand
eigenvector b2(i) (blue
circles; right axis). Different superstates were formed by aggregating
the first k of these ordered states into one superstate,
and the remaining ones into the other. The relaxation time −1/λ2 of the resulting reduced two-state rate matrix was calculated
by diagonalization (filled red squares; left axis). We obtain the
longest relaxation time by lumping six microstates into a folded state
F, and the remaining 26 microstates into an unfolded state U, as indicated
by the shading. Microstates are labeled in the binary notation of
ref (18) (top to bottom:
N to C terminus; 1 indicating a helical residue). (B) Division of
F into superstates F1 and F2. (C) Division of
U into superstates U1 and U2. The resulting
superstates F, F1, F2, U, U1, and
U2 are all identical with those found previously with a
more heuristic approach in ref (18).In summary, we have developed
a systematic procedure to construct
reduced dynamic descriptions of aggregated superstates obtained by
combining (or clustering or lumping) microstates. The procedure is
generally applicable to kinetic (or master equation) models with discrete
states and continuous time evolution, Markov-state models with discrete
states and discrete time evolution, or continuous space models with
discrete or continuous time evolution. The reduced dynamic models
are exact in their non-Markovian formulation. In the approximate Markovian
limit, we provide simple analytic expressions for the reduced rate
or Markov transition matrices. Even under the Markovian approximation,
one recovers exact auto- and cross-relaxation times. The resulting
reduced models thus strike an optimal balance between recovering the
dynamics at short and long times. This approach is not only useful
to construct reduced models for already defined groupings of microstates
into superstates, but also helps in finding optimal superstates. Specifically,
we found that maximizing the relaxation time of the reduced-matrix
model provides a quantitative criterion that can be used in a hierarchical
construction of superstates through aggregation of microstates.
Authors: Vincent A Voelz; Marcus Jäger; Shuhuai Yao; Yujie Chen; Li Zhu; Steven A Waldauer; Gregory R Bowman; Mark Friedrichs; Olgica Bakajin; Lisa J Lapidus; Shimon Weiss; Vijay S Pande Journal: J Am Chem Soc Date: 2012-07-19 Impact factor: 15.419
Authors: Derya Meral; Davide Provasi; Diego Prada-Gracia; Jan Möller; Kristen Marino; Martin J Lohse; Marta Filizola Journal: Sci Rep Date: 2018-05-16 Impact factor: 4.379