Xia Xiao1, Neville Kallenbach1, Yingkai Zhang2. 1. Department of Chemistry, New York University, New York, New York 10003, United States. 2. Department of Chemistry, New York University, New York, New York 10003, United States; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China.
Abstract
Unlike native proteins that are amenable to structural analysis at atomic resolution, unfolded proteins occupy a manifold of dynamically interconverting structures. Defining the conformations of unfolded proteins is of significant interest and importance, for folding studies and for understanding the properties of intrinsically disordered proteins. Short-chain protein fragments, i.e., oligopeptides, provide an excellent test-bed in efforts to define the conformational ensemble of unfolded chains. Oligomers of alanine in particular have been extensively studied as minimalist models of the intrinsic conformational preferences of the peptide backbone. Even short alanine peptides occupy an ensemble of substates that are distinguished by small free energy differences, so that the problem of quantifying the conformational preferences of the backbone remains a fundamental challenge in protein biophysics. Here, we demonstrate an integrated computational-experimental-Bayesian approach to quantify the conformational ensembles of the model trialanine peptide in water. In this approach, peptide conformational substates are first determined objectively by clustering molecular dynamics snapshots based on both structural and dynamic information. Next, a set of spectroscopic data for each conformational substate is computed. Finally, a Bayesian statistical analysis of both experimentally measured spectroscopic data and computational results is carried out to provide a current best estimate of the substate population ensemble together with corresponding confidence intervals. This distribution of substates can be further systematically refined with additional high-quality experimental data and more accurate computational modeling. Using an experimental data set of NMR coupling constants, we have also applied this approach to characterize the conformational ensemble of trivaline in water.
An
emerging field in protein science is the study of intrinsically
disordered proteins (IDPs),[1−3] which do not fold into well-defined
3D structures in vitro but are functional in vivo. IDPs appear to be abundant in nature—it
has been predicted that about one-third of eukaryotic proteins contain
extended disordered regions, including histone tails, α-synuclein,
tau protein, p53, and BRCA1.[4] IDPs have
been implicated in cellular functioning, especially in regulation
and signaling.[5−7] Over 50 years ago, Tanford’s sedimentation
and viscosity measurements on denatured proteins led to a proposal
of the random coil model for unfolded proteins,[8] which assumes that the polypeptide backbone freely samples
all sterically allowed regions of the Ramachandran plot. In this view,
unfolded proteins and peptides represent featureless “freely
coiling” chains that occupy a multiplicity of conformations
with very large associated backbone entropy. However, recently several
lines of compelling spectroscopic evidence have converged to reveal
that the backbone conformation of short unfolded peptides, including
dipeptides and tripeptides, is structurally much more ordered than
predicted by the random coil model.[9−14] The unfolded peptide backbone clearly has conformational preferences
that are sequence- and context-dependent.[15−20] Thus, defining the conformations of peptides in unfolded states
has become a problem of current interest and importance. Advances
in this effort will enable construction of more accurate models of
intrinsically disordered proteins, enable elucidation of fundamental
principles of protein folding, and potentially help design novel functional
peptide modulators of biological processes.

In contrast to folded
globular proteins, which are routinely characterized
at atomic resolution by X-ray crystallography and NMR spectroscopy,
a comparably detailed characterization of unfolded peptides and proteins
is much more challenging due to their multiplicity of conformational
states and dynamic nature. For a given peptide, precise measurements
can be made using a variety of spectroscopic methods, but data interpretation
often requires ad hoc assumptions,[16,21−23] which introduce significant uncertainty and/or subjectivity
into the final results. Such assumptions also make it difficult to use the complete
set of available experimental data. For example, in one key study
to determine polyproline II conformation propensities for a host–guest
series of peptides AcGGXGGNH2, only one experimental data
set (3JαN) was employed to fit each peptide
to a two-state model, assuming that the experimental data are a weighted
average of data from two representative basins, PII and
β.[24] On the other hand, limitations
in sampling and force field accuracy place direct determination of
the relative stability of different conformational substates beyond
the predictive power of molecular dynamics simulations using current
force fields.[25−27]

One seminal attempt to overcome these difficulties
is a combined
molecular-dynamics/NMR (MD-NMR) approach developed by Graf et al.,[28] which aims to combine the accuracy of experimentally
measured spectroscopic data with detailed information on the set of
conformational substates provided by simulations. Taking the trialanine
peptide in aqueous solution at 300 K as a model system, they experimentally
measured a set of 15 J-coupling constants and carried out 100 ns of
explicit water molecular dynamics simulations in parallel. Snapshots
from the MD trajectory were assigned to three conformational substates
(α, β, PII) based on the Ramachandran plot
of the dihedral angles of the central residue. J-coupling values for
each substate were calculated from corresponding Karplus equations.
Finally, eight J-coupling constants for the central residue were employed
to determine substate weights by performing a global fit to a three-state
model, i.e., minimizing the difference between measured and calculated
weighted-average NMR parameters. This strategy represents a significant
advance over studies that rely on either an experimental or computational
approach alone. Nevertheless, this MD-NMR approach still has limitations:
the analysis is restricted to a three-state model of an individual
residue; the substates are predefined according to the Ramachandran
plot; and only part of the experimental data set is used.

In this paper, we present an integrated computational-experimental-Bayesian
framework (outlined in Figure 1) to characterize
peptide conformational ensembles. This aims to overcome limitations
in the published MD-NMR approach[28] by introducing
two key features: (1) peptide conformational substates are assigned
by clustering molecular dynamics snapshots based on both structural
and dynamic information, rather than on subjectively defined rectangular
regions of the Ramachandran plot; (2) a Bayesian statistical reweighting
algorithm is used to provide an integrated analysis of both the experimental
and computational data, which yields a current best estimate of substate
populations with corresponding confidence intervals. This approach
allows us to construct and assess multistate models of the trialanine
peptide in aqueous solution based on the full set of 15 measured J-couplings.
Our results show that the two most dominant conformational substates
of trialanine in water share the same polyproline II helix-like structure
(PII) at the central residue, while differing at the C-terminal residue. Our approach naturally allows for further systematic
refinement using supplemental data sets and more accurate computational
modeling of the relevant parameters.
Figure 1
A schematic
illustrating the Integrated Computational-Experimental-Bayesian
approach.
Methods
The central idea of the integrated computational-experimental-Bayesian
framework to characterize peptide conformational propensities is illustrated
in Figure 1. There are three steps in the computational
stage: (1a) Extensive molecular dynamics simulations are carried out
to generate an ensemble of peptide structure snapshots. (1b) MD snapshots
are clustered into peptide conformational substates with a “divide-and-merge”
approach based on both structural and dynamics information, allowing
MD population weights of conformational substates to be calculated.
(1c) For each conformational substate i, a set of
values of spectroscopic data is computed. In the experimental stage,
the key task is to obtain the corresponding experimentally measured
spectroscopic data, either from the literature or by carrying out
new experiments, or both. Finally, a Bayesian statistical algorithm
is employed to provide an integrated analysis of both computational
and experimental data, which yields a current best estimate of the
substate populations as well as the corresponding confidence intervals.

In comparison with Graf’s
approach, two key components of
this new integrated framework are the clustering step and the Bayesian
statistical algorithm, which we discuss in more detail below.
Clustering
In most studies, peptide
conformational substates are predefined as roughly rectangular regions
of a Ramachandran plot. Typically, conformation assignment considers only the
backbone torsion angles of a single residue in a polypeptide.[16,26,28] Here, we present a more objective
and robust method to define and assign peptide conformational substates,
i.e., a “divide-and-merge” two-stage clustering approach.
In the first stage, given a set of structural snapshots from molecular
dynamics simulations, we use Markov state models to identify residue-based
conformational macrostates based on both structural similarity and
dynamic information by employing the program MSMBuilder2.[29] Specifically, for each residue, MD trajectories
are clustered into residue-based microstates using a hybrid k-centers/k-medoids
clustering algorithm[29] with the
backbone RMSD as the structural similarity criterion. Then, kinetically
related microstates are grouped together into residue-based macrostates
using Perron Cluster Cluster Analysis (PCCA+).[30] In the second stage, these residue-based macrostates are
merged to yield substates of the whole peptide.[31] For each substate i, its MD population
weight $W_i^{\text{md}}$ and the computed values $D_{ij}^{\text{comp}}$ of a set of experimental observables (j = 1, ..., z) can then be obtained.
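The merge stage, together with the per-substate averaging of J-couplings, can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' code: it takes the stage-1 residue-based macrostate labels as given (one label list per residue, one label per snapshot; MSMBuilder2 and PCCA+ are not called), it renormalizes the populations of the retained substates to sum to 1 (an assumption consistent with the MD prior weights in Table 1 summing to about 100%), and it uses one commonly quoted Karplus parametrization for 3J(HN,Hα) (Vuister and Bax: A = 6.51, B = −1.76, C = 1.60 Hz, θ = φ − 60°) in place of the parameters listed in Table S1. All function names and toy labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def karplus_3j_hn_ha(phi_deg, a=6.51, b=-1.76, c=1.60):
    """3J(HN,HA) in Hz from the backbone phi angle (degrees).

    J = a*cos^2(theta) + b*cos(theta) + c with theta = phi - 60 deg.
    Default coefficients are an assumed literature parametrization,
    not necessarily those used in the paper (Table S1).
    """
    cos_t = math.cos(math.radians(phi_deg - 60.0))
    return a * cos_t ** 2 + b * cos_t + c


def merge_residue_states(per_residue_labels):
    """Stage 2: combine per-residue macrostate labels into whole-peptide substates.

    per_residue_labels holds one label sequence per residue, all of equal
    length (one label per MD snapshot); each snapshot gets a tuple label.
    """
    return list(zip(*per_residue_labels))


def substate_populations(substates, threshold=0.01):
    """MD population of each substate, keeping only those above `threshold`.

    Kept populations are renormalized to sum to 1 (an assumption).
    """
    n = len(substates)
    kept = {s: c / n for s, c in Counter(substates).items() if c / n > threshold}
    total = sum(kept.values())
    return {s: p / total for s, p in kept.items()}


def substate_averages(substates, values, keep):
    """Average a per-snapshot observable (e.g. a J-coupling) within each kept substate."""
    sums, counts = defaultdict(float), Counter()
    for s, v in zip(substates, values):
        if s in keep:
            sums[s] += v
            counts[s] += 1
    return {s: sums[s] / counts[s] for s in keep}


# Toy example: 3 residues, 8 snapshots, hypothetical macrostate labels.
res1 = ["A"] * 8
res2 = ["PII", "PII", "PII", "beta", "PII", "alpha", "PII", "beta"]
res3 = ["III", "III", "II", "III", "III", "I", "III", "III"]
phi2 = [-65.0, -70.0, -60.0, -120.0, -75.0, -60.0, -68.0, -130.0]

states = merge_residue_states([res1, res2, res3])
pops = substate_populations(states, threshold=0.15)
j_hn_ha = [karplus_3j_hn_ha(phi) for phi in phi2]
j_avg = substate_averages(states, j_hn_ha, pops)
print(pops)   # PII-III and beta-III kept, renormalized to 2/3 and 1/3
print(j_avg)
```

The toy threshold of 0.15 only makes the filtering visible with eight snapshots; the text uses a 1% cutoff on 1 million snapshots. Averaging J over snapshots, rather than evaluating the Karplus curve at a mean angle, follows the per-snapshot procedure described in the Computational Details.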
Bayesian Statistical Weighting Algorithm
With experimental data $D_j^{\text{exp}}$ collected, as well as the corresponding computed results $D_{ij}^{\text{comp}}$ for each conformational substate, a conventional approach to estimating substate population weights $W_i$, i = 1, ..., n, is to minimize an objective function, as in Graf’s approach. This method tends to be limited to two or three substates of a single residue and fails to account for the uncertainty/error in either the computed results or the experimental data. In addition, slightly different objective functions can lead to distinct results, so that this minimization may not distinguish among several different solutions. To overcome these limitations, we employ a Bayesian statistical algorithm to provide an integrated analysis of both computational and experimental data[32,33] and to determine conformational substate weights. In Bayesian inference, the belief in a hypothesis (H) is updated as additional evidence (E) is acquired by employing Bayes’ rule:[34]

$$P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}$$

The posterior probability P(H|E), the updated belief in the hypothesis after incorporating the additional evidence, is a function of two antecedents: a prior probability P(H), which is the initial belief in the hypothesis, and a likelihood function P(E|H), which is the conditional probability of acquiring the evidence given the hypothesis. P(E), the integrated likelihood of the evidence, is the same for all hypotheses being considered. In our characterization of peptide conformations, a set of substate weights W is treated as the hypothesis and the experimental data $\mathbf{D}^{\text{exp}}$ as the additional evidence, which leads to the following formulation:

$$P_{\text{posterior}}(\mathbf{W} \mid \mathbf{D}^{\text{exp}}) \propto P_{\text{likelihood}}(\mathbf{D}^{\text{exp}} \mid \mathbf{W})\, P_{\text{prior}}(\mathbf{W}) \qquad (1)$$

where $\mathbf{W} = \{W_1, \ldots, W_n\}$ is the vector of weights for n substates, subject to the constraints $\sum_i W_i = 1$ and $W_i \ge 0$, and $\mathbf{D}^{\text{exp}} = \{D_1^{\text{exp}}, \ldots, D_z^{\text{exp}}\}$ is the vector of z experimental data.
Prior Distribution
$P_{\text{prior}}(\mathbf{W})$ represents a priori knowledge about the weights of the n conformational substates of the peptide. For each substate i, given its initial weight $W_i^{\text{initial}}$ and the corresponding uncertainty $\sigma^2(W_i^{\text{initial}})$, a priori knowledge about the weight of this substate can be represented by a Gaussian distribution:

$$P_{\text{prior}}(W_i) = \frac{1}{\sqrt{2\pi}\,\sigma(W_i^{\text{initial}})} \exp\!\left[-\frac{\left(W_i - W_i^{\text{initial}}\right)^2}{2\sigma^2(W_i^{\text{initial}})}\right] \qquad (2)$$

Thus, the overall joint prior distribution can be expressed as

$$P_{\text{prior}}(\mathbf{W}) = \prod_{i=1}^{n} P_{\text{prior}}(W_i)$$

with the constraints that $\sum_i W_i = 1$ and $W_i \ge 0$. There are multiple ways to estimate the values of $W_i^{\text{initial}}$ and $\sigma(W_i^{\text{initial}})$ in eq 2. A straightforward approach, the MD prior, employs information obtained from the MD simulations and uses the MD-derived population weight $W_i^{\text{md}}$ as $W_i^{\text{initial}}$. When its uncertainty is not known, we assign $\sigma(W_i^{\text{initial}})$ an arbitrary large value of 20%. As a control that does not use information from the MD simulations, we also construct a simple random-coil (RC) prior, which assumes that each substate is equally populated, i.e., $W_i^{\text{initial}} = 1/n$, where n is the total number of substates being considered.
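A minimal sketch of the Gaussian prior in eq 2 and its product over substates, evaluated in log space, is shown below. It is illustrative only: weights are expressed as fractions (so the 20% width becomes 0.20), the constraints $\sum_i W_i = 1$ and $W_i \ge 0$ are enforced by returning −∞ outside the allowed region, and the helper names and the trial vector are hypothetical. The MD prior centers each Gaussian on $W_i^{\text{md}}$; the RC prior centers every Gaussian on 1/n.

```python
import math


def log_prior(weights, w_init, sigma, tol=1e-9):
    """Log joint Gaussian prior: sum_i log N(W_i; W_i^initial, sigma_i).

    Returns -inf when the weights leave the simplex (negative weight or sum != 1).
    """
    if any(w < 0.0 for w in weights) or abs(sum(weights) - 1.0) > tol:
        return -math.inf
    total = 0.0
    for w, w0, s in zip(weights, w_init, sigma):
        total += -0.5 * ((w - w0) / s) ** 2 - math.log(math.sqrt(2.0 * math.pi) * s)
    return total


def md_prior(w_md, width=0.20):
    """MD prior: Gaussians centered on MD-derived weights, arbitrary 20% width."""
    return list(w_md), [width] * len(w_md)


def rc_prior(n, width=0.20):
    """Random-coil prior: every substate equally populated, W_i^initial = 1/n."""
    return [1.0 / n] * n, [width] * n


# Nine-state example; MD centers are the Table 1 MD-prior values as fractions.
w_md = [0.021, 0.027, 0.048, 0.054, 0.107, 0.160, 0.103, 0.198, 0.281]
centers_md, sig_md = md_prior(w_md)
centers_rc, sig_rc = rc_prior(9)
trial = [0.036, 0.024, 0.040, 0.015, 0.012, 0.018, 0.123, 0.043, 0.689]  # hypothetical
print(log_prior(trial, centers_md, sig_md), log_prior(trial, centers_rc, sig_rc))
```

With a width as large as 0.20 on weights that are themselves mostly below 0.3, either prior is only weakly informative, which is consistent with the MD and RC priors in Table 1 leading to similar posterior estimates.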
Likelihood Function
$P_{\text{likelihood}}(\mathbf{D}^{\text{exp}} \mid \mathbf{W})$ represents the likelihood of observing the experimental data given a certain set of substate weights W. For each experimental observable $D_j^{\text{exp}}$, the associated likelihood function can also be modeled with a Gaussian density function:

$$P_{\text{likelihood}}(D_j^{\text{exp}} \mid \mathbf{W}) = \frac{1}{\sqrt{2\pi\left[\sigma^2(D_j^{\text{exp}}) + \sigma^2(D_j^{\text{comp}})\right]}} \exp\!\left[-\frac{\left(D_j^{\text{exp}} - \sum_{i=1}^{n} W_i D_{ij}^{\text{comp}}\right)^2}{2\left[\sigma^2(D_j^{\text{exp}}) + \sigma^2(D_j^{\text{comp}})\right]}\right]$$

where $D_{ij}^{\text{comp}}$ denotes the computed value of experimental observable j for conformational substate i, $\sigma(D_j^{\text{exp}})$ is the uncertainty in the experimental measurement of observable j, and $\sigma(D_j^{\text{comp}})$ is the error in the theoretical prediction of observable j. The overall joint likelihood function can be written as

$$P_{\text{likelihood}}(\mathbf{D}^{\text{exp}} \mid \mathbf{W}) = \prod_{j=1}^{z} P_{\text{likelihood}}(D_j^{\text{exp}} \mid \mathbf{W})$$

Once the prior distribution and the likelihood function are specified, the posterior distribution $P_{\text{posterior}}(\mathbf{W} \mid \mathbf{D}^{\text{exp}})$, our current best estimate of the conformational substate weights, is calculated using eq 1 by employing a Markov chain Monte Carlo (MCMC) algorithm.[35−37] The marginal posterior distribution for each substate i can be computed by

$$P_{\text{posterior}}(W_i \mid \mathbf{D}^{\text{exp}}) = \int P_{\text{posterior}}(\mathbf{W} \mid \mathbf{D}^{\text{exp}}) \prod_{k \ne i} dW_k$$

The final Bayesian estimate of the weight and uncertainty of substate i can be computed by

$$W_i^{\text{bayesian}} = \int W_i\, P_{\text{posterior}}(W_i \mid \mathbf{D}^{\text{exp}})\, dW_i, \qquad \sigma^2(W_i^{\text{bayesian}}) = \int \left(W_i - W_i^{\text{bayesian}}\right)^2 P_{\text{posterior}}(W_i \mid \mathbf{D}^{\text{exp}})\, dW_i$$
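The Gaussian likelihood and the χ2 agreement measure reported in the results tables share the same ingredients: the population-weighted prediction $\sum_i W_i D_{ij}^{\text{comp}}$ and the combined variance $\sigma^2(D_j^{\text{exp}}) + \sigma^2(D_j^{\text{comp}})$. The sketch below computes both, plus the marginal posterior mean and standard deviation of each weight from a list of sampled weight vectors, which is the sample analogue of the last two integrals. It is a hypothetical plain-Python illustration: `d_comp` is assumed to be an n × z nested list (substates × observables), and the toy numbers are made up.

```python
import math
import statistics


def predict(weights, d_comp):
    """Population-weighted computed observables: sum_i W_i * D_ij^comp for each j."""
    z = len(d_comp[0])
    return [sum(w * row[j] for w, row in zip(weights, d_comp)) for j in range(z)]


def log_likelihood(weights, d_comp, d_exp, sig_exp, sig_comp):
    """Log of the joint Gaussian likelihood (product over the z observables)."""
    total = 0.0
    for pred, obs, se, sc in zip(predict(weights, d_comp), d_exp, sig_exp, sig_comp):
        var = se ** 2 + sc ** 2
        total += -0.5 * (obs - pred) ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)
    return total


def chi2(weights, d_comp, d_exp, sig_exp, sig_comp):
    """Reduced chi^2: mean over observables of squared error / combined variance."""
    terms = [
        (obs - pred) ** 2 / (se ** 2 + sc ** 2)
        for pred, obs, se, sc in zip(predict(weights, d_comp), d_exp, sig_exp, sig_comp)
    ]
    return sum(terms) / len(terms)


def summarize(samples):
    """Marginal posterior mean and standard deviation of each weight from MCMC samples."""
    return [(statistics.mean(col), statistics.pstdev(col)) for col in zip(*samples)]


# Toy case: 2 substates, 2 observables (values in Hz are made up).
d_comp = [[2.0, 8.0], [8.0, 2.0]]
d_exp = [5.0, 5.0]
sig_exp = [0.5, 0.5]
sig_comp = [1.0, 1.0]
print(chi2([0.5, 0.5], d_comp, d_exp, sig_exp, sig_comp))  # 0.0
print(chi2([1.0, 0.0], d_comp, d_exp, sig_exp, sig_comp))  # 7.2
```

Because the log-likelihood equals −z·χ2/2 plus a constant that does not depend on W, weight vectors with lower χ2 always have higher likelihood; the prior is what distinguishes between weight vectors that fit the data about equally well.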
Computational Details
The initial simulation system was prepared by immersing the peptide
Ala3 in a rectangular water box with a minimum solute–wall
distance of 15 Å and neutralized by adding one Cl– counterion. Since the experiment was conducted at pH = 2, the N-terminus
of the peptide is protonated, as shown in Figure 2a. The AMBER 12[38] package with
the Amber99SB force field[27,39−41] was used to perform classical MD simulations, and water molecules
were described by the TIP3P[42] water model.
Following multistep minimizations and MD equilibrations, a 200 ns
NPT MD simulation was carried out. During the MD simulation, periodic
boundary conditions were employed with a 10 Å cutoff for nonbonded
interactions. Long-range electrostatic interactions were treated with
the particle mesh Ewald (PME)[43,44] method. All bonds involving
hydrogen atoms were constrained with the SHAKE[45] algorithm, and a time step of 2 fs was set. System temperature
was controlled at 300 K with the Berendsen thermostat,[45] and the pressure was maintained at 1 atm. Snapshots
were saved every 0.2 ps.
Figure 2
“Divide-and-Merge” two-stage
clustering of a trialanine
MD trajectory simulated with the Amber99SB force field and TIP3P water.
(a) Trialanine at pH = 2 with each residue labeled. (b) Stage 1, population
distribution from residue-based clustering with Markov state models.
(c) Stage 2, residue-based macrostates are merged to yield a total
of 30 substates for the whole peptide in principle, but only 22 substates
are observed in the MD simulation. (d) Structures and populations of the nine substates
with more than 1% population in the MD simulation.
With 1 million snapshots from the MD
simulations, a “divide-and-merge”
two-stage clustering approach is employed to define and assign peptide
conformational substates. In the first stage, for each residue, MD
snapshots are clustered into residue-based microstates by employing
the program MSMBuilder2,[29] and kinetically related microstates are grouped into residue-based macrostates. As illustrated
in Figure 2b, the first residue of the trialanine
peptide is clustered into two residue-based macrostates, the second
into five macrostates, and the third into three macrostates. In the
second stage, these residue-based macrostates are merged to yield,
in principle, a maximum of 2 × 5 × 3 = 30 substates for the
whole peptide (see Figure 2c), of which only
22 are observed in the MD simulation. We consider the substates
with >1% population in the MD simulation for further analysis,
which leaves nine conformational substates (see Figure 2d). The MD population $W_i^{\text{md}}$ of each substate i is then calculated from the clustering results. The
J-coupling constants for each snapshot are calculated from parametrized
Karplus equations[46−49] (see Table S1), and the average J-coupling
constants $D_{ij}^{\text{comp}}$ are then
computed for each substate i.

With the experimental data collected
for 15 J-coupling constants,[28] as listed
in Table S2, and the corresponding computed
values $D_{ij}^{\text{comp}}$, we carry out
a Bayesian statistical analysis, implemented in Python, to obtain
the posterior distribution $P_{\text{posterior}}(\mathbf{W} \mid \mathbf{D}^{\text{exp}})$ in eq 1. The random walk Metropolis–Hastings algorithm[35−37] is used to
sample this posterior distribution, and each posterior distribution has been sampled for 1 million
steps. The random walk steps are drawn from a uniform distribution,
and the step size of the random walk is adjusted to achieve an
acceptance probability of 30%–70%.
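The sampling step can be sketched as a random-walk Metropolis–Hastings sampler over weight vectors. The move used here is an assumption, since the text specifies only uniform random-walk steps: two substates are chosen at random and an amount δ ~ U(−h, h) of weight is shifted from one to the other. This keeps $\sum_i W_i = 1$ and is symmetric, so the acceptance probability reduces to the posterior ratio; moves that produce a negative weight have zero posterior density and are rejected. The step size h is tuned toward the 30%–70% acceptance window only during a separate burn-in phase and then frozen, so the recorded chain is a proper Markov chain. The function name and the toy Gaussian target are hypothetical; in practice `log_post` would be the sum of a log-prior and a log-likelihood, as in eq 1.

```python
import math
import random


def metropolis_simplex(log_post, w0, n_steps, step=0.05, n_tune=5000,
                       window=100, seed=0):
    """Random-walk Metropolis-Hastings on the probability simplex.

    Returns (samples, acceptance_rate, tuned_step). Step-size tuning happens
    only during the first n_tune (burn-in) steps, which are discarded.
    """
    rng = random.Random(seed)
    w = list(w0)
    lp = log_post(w)
    if not math.isfinite(lp):
        raise ValueError("starting weights must have finite posterior density")
    n = len(w)
    samples, n_acc, win_acc = [], 0, 0
    for t in range(n_tune + n_steps):
        i, k = rng.sample(range(n), 2)          # two distinct substates
        delta = rng.uniform(-step, step)        # uniform random-walk step
        prop = list(w)
        prop[i] += delta
        prop[k] -= delta                        # sum of weights is preserved
        lp_prop = log_post(prop)                # -inf if any weight < 0
        # Symmetric proposal: accept with probability min(1, p(prop) / p(w)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            w, lp = prop, lp_prop
            if t < n_tune:
                win_acc += 1
            else:
                n_acc += 1
        if t < n_tune:
            if (t + 1) % window == 0:           # retune every `window` steps
                rate = win_acc / window
                if rate < 0.3:
                    step *= 0.8
                elif rate > 0.7:
                    step *= 1.2
                win_acc = 0
        else:
            samples.append(list(w))
    return samples, n_acc / n_steps, step


def toy_log_post(w, mu=(0.7, 0.2, 0.1), sd=0.05):
    """Hypothetical target: isotropic Gaussian around mu, zero density if any W_i < 0."""
    if any(x < 0.0 for x in w):
        return -math.inf
    return -0.5 * sum(((x - m) / sd) ** 2 for x, m in zip(w, mu))


samples, acc, h = metropolis_simplex(toy_log_post, [1 / 3, 1 / 3, 1 / 3], n_steps=20000)
mean_w0 = sum(s[0] for s in samples) / len(samples)
print(round(mean_w0, 2), round(acc, 2), h)
```

The toy run uses 20 000 steps for speed, whereas the text reports 1 million steps per posterior; the per-weight means and standard deviations of `samples` are the sample estimates of $W_i^{\text{bayesian}}$ and $\sigma(W_i^{\text{bayesian}})$.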
Results and Discussion
With the protocol described above, we have
characterized a nine-substate
ensemble for trialanine in aqueous solution with both the MD prior (initial
weights calculated from MD simulations) and the RC prior (initial weights
from a random-coil model, which assumes equal population among all
substates). Table 1 shows that although
the initial weights of the two prior distributions are very far apart,
the Bayesian estimates of the substate weights are consistent, and both the
final confidence intervals (σ) and the error in reproducing
the experimental data (χ2) are significantly smaller
than those obtained with the initial substate weights. Figure 3a and b illustrate the prior distribution from MD simulations and
the posterior distribution of the final Bayesian model, and Figure 3c illustrates the significant reduction of error
in reproducing the experimental data. These results
demonstrate the applicability and robustness of our integrated Bayesian approach.
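Table 1
Nine-State Results for Trialanine Using the Amber99SB Force Field and TIP3P Water with Both MD Prior and Random Coil Prior^a

| Amber99SB & TIP3P | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | χ2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Central residue | α | α | α | β | β | β | PII | PII | PII | |
| Substate | A-α-II | A-α-I | A-α-III | A-β-II | A-β-I | A-β-III | A-PII-II | A-PII-I | A-PII-III | |
| MD prior: $W^{\text{initial}}$ (σ) | 2.1(20) | 2.7(20) | 4.8(20) | 5.4(20) | 10.7(20) | 16(20) | 10.3(20) | 19.8(20) | 28.1(20) | 10.52 |
| MD prior: $W^{\text{bayesian}}$ (σ) | 3.6(3.1) | 2.4(2) | 4(3.3) | 1.5(1.4) | 1.2(1.1) | 1.8(1.7) | 12.3(6.5) | 4.3(3.2) | 69.0(6.7) | 3.17 |
| RC prior: $W^{\text{initial}}$ (σ) | 11.1(20) | 11.1(20) | 11.1(20) | 11.1(20) | 11.1(20) | 11.1(20) | 11.1(20) | 11.1(20) | 11.1(20) | 14.63 |
| RC prior: $W^{\text{bayesian}}$ (σ) | 3.8(3.2) | 2.4(2.1) | 4.6(3.6) | 1.5(1.4) | 1.2(1.1) | 1.9(1.8) | 13.7(7.1) | 4.1(3.1) | 66.9(7.1) | 3.23 |

^a $W^{\text{initial}}$ refers to a priori knowledge about the weights of the n conformational substates of the peptide. $W^{\text{bayesian}}$ refers to the current best estimate of the substate weights and their confidence intervals. Weights and uncertainties are given in %. $\chi^2 = z^{-1}\sum_{j=1}^{z} \left(D_j^{\text{exp}} - D_j^{\text{comp}}\right)^2 / \left[\sigma^2(D_j^{\text{exp}}) + \sigma^2(D_j^{\text{comp}})\right]$, where $D_j^{\text{comp}} = \sum_i W_i D_{ij}^{\text{comp}}$.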
Figure 3
Nine-state results for trialanine with the Amber99SB
force field
and TIP3P water with the Bayesian algorithm and the MD prior: (a)
Simulated prior distribution based on the MD prior. (b) Simulated
posterior distribution of the final Bayesian model. (c) Differences
between computed and experimental J-coupling constants
for both the MD simulation and our approach. (d) Two dominant substates
in the final Bayesian model.
In our nine-substate ensemble, the A-PII-III substate
is the most dominant conformation, with a population of ∼67%;
the A-PII-II substate is the second most dominant conformation,
with a population of ∼13%, as shown in Figure 3d. Both substates have the central residue in the PII conformation but differ in the terminal dihedral angles.
This level of characterization cannot be achieved
with previous methods that focus on a single residue.

To assess the effect of the water model on our results, we
have carried out molecular dynamics simulations using Amber99SB for
the peptide and TIP4P-Ew[50] for water, a combination
previously shown to yield results in closer agreement
with experimentally measured J-coupling data than the Amber99SB/TIP3P
combination.[27] All other components of
our computational and analysis protocol are the same as above.
As shown in Table 2, MD simulations with
the Amber99SB/TIP4P-Ew combination yield 12 conformational substates
with populations above 1% after two-stage clustering. In comparison
with the results in Table 1, the three additional
conformational substates have the central residue in the αL conformation.
We have characterized the corresponding 12-substate ensemble for trialanine
with both the MD prior and the RC prior, as shown in Table 2. Not only are the results for the two priors very consistent, but
the first and second major substates, with populations of ∼65%
and ∼12%, are also the same as in the nine-state model, where they have
populations of ∼66% and ∼13%, respectively. This further
demonstrates the robustness of the integrated Bayesian approach.
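Table 2
12-State Results for Trialanine with Amber99SB Force Field and TIP4P-Ew Water with MD Prior and Random Coil Prior

| Amber99SB & TIP4P-Ew | 10 | 11 | 12 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | χ2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Central residue | αL | αL | αL | α | α | α | β | β | β | PII | PII | PII | |
| Substate | A-αL-II | A-αL-I | A-αL-III | A-α-II | A-α-I | A-α-III | A-β-II | A-β-I | A-β-III | A-PII-II | A-PII-I | A-PII-III | |
| MD prior: $W^{\text{initial}}$ (σ) | 2.2(20) | 1.9(20) | 2.9(20) | 3.1(20) | 2.1(20) | 5.9(20) | 5.4(20) | 8.1(20) | 14.9(20) | 9.7(20) | 17.3(20) | 26.5(20) | 9.78 |
| MD prior: $W^{\text{bayesian}}$ (σ) | 2.3(2) | 1.6(1.4) | 2.5(2.1) | 3.4(2.9) | 1.7(1.6) | 3.8(3.2) | 1.4(1.3) | 1.1(1.1) | 1.8(1.7) | 10.6(6.3) | 3.7(2.8) | 66.0(6.6) | 3.43 |
| RC prior: $W^{\text{initial}}$ (σ) | 8.3(20) | 8.3(20) | 8.3(20) | 8.3(20) | 8.3(20) | 8.3(20) | 8.3(20) | 8.3(20) | 8.3(20) | 8.3(20) | 8.3(20) | 8.3(20) | 16.59 |
| RC prior: $W^{\text{bayesian}}$ (σ) | 2.2(1.9) | 1.6(1.4) | 2.8(2.3) | 3.5(3) | 1.7(1.6) | 4.2(3.5) | 1.4(1.4) | 1.1(1.0) | 1.7(1.6) | 13.1(6.7) | 3.6(2.8) | 63.0(7.0) | 3.49 |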
To further examine the applicability
and reliability of the approach, we have also
carried out clustering and Bayesian analysis focusing on the central
amino acid of Ala3. The clustering yields five substates,
as shown in Figure 2b. Only 8 of the 15
experimental J-couplings (those labeled in red in Table S1)
are related to the dihedral angles of the
central residue, and these were used to characterize this five-state ensemble.
As shown in Tables 3 and 4, all results are very consistent, with the PII conformation most dominant at populations of 86% ±
5%, 82% ± 6%, 84% ± 5%, and 81% ± 6% for the MD and RC priors with TIP3P water and the MD and RC priors with TIP4P-Ew water, respectively. Meanwhile, all our results
(Tables 1–4)
consistently indicate that, for the central residue, the
α basin is the second most populated (less than 10%), while
the β conformation is the least populated. It
should be noted that Graf’s three-state model[28] for the central residue of Ala3 yields close
to zero population for the α conformation, which seems puzzling
given the helix propensity of Ala.
Table 3
Five-State Results for Trialanine with Amber99SB Force Field and TIP3P Water with Both MD Prior and Random Coil Prior

| Amber99SB & TIP3P | Y | αL | α | β | PII | χ2 |
|---|---|---|---|---|---|---|
| MD prior: $W^{\text{initial}}$ (σ) | 0.024(20) | 1.3(20) | 9.8(20) | 31.7(20) | 57.2(20) | 9.46 |
| MD prior: $W^{\text{bayesian}}$ (σ) | 3.2(2.6) | 3.5(2.7) | 5.5(4.1) | 2.0(1.8) | 85.8(4.9) | 2.10 |
| RC prior: $W^{\text{initial}}$ (σ) | 20.0(20) | 20.0(20) | 20.0(20) | 20.0(20) | 20.0(20) | 20.78 |
| RC prior: $W^{\text{bayesian}}$ (σ) | 4.0(3.0) | 3.9(2.9) | 7.6(5.1) | 2.1(1.9) | 82.4(5.6) | 2.32 |
Table 4
Five-State Results for Trialanine with Amber99SB Force Field and TIP4P-Ew Water with MD Prior and Random Coil Prior

| Amber99SB & TIP4P-Ew | Y | αL | α | β | PII | χ2 |
|---|---|---|---|---|---|---|
| MD prior: $W^{\text{initial}}$ (σ) | 1.4(20) | 6.9(20) | 11.0(20) | 28.0(20) | 52.7(20) | 8.99 |
| MD prior: $W^{\text{bayesian}}$ (σ) | 3.3(2.6) | 3.7(2.8) | 6.5(4.6) | 2.0(1.9) | 84.5(5.2) | 1.96 |
| RC prior: $W^{\text{initial}}$ (σ) | 20.0(20) | 20.0(20) | 20.0(20) | 20.0(20) | 20.0(20) | 20.35 |
| RC prior: $W^{\text{bayesian}}$ (σ) | 4.0(3.0) | 4.0(2.9) | 8.8(5.5) | 2.1(1.9) | 81.1(5.9) | 2.17 |
Finally, we applied this integrated computational-experimental-Bayesian
approach to characterize the conformational ensemble of trivaline
in aqueous solution, as illustrated in Figure 4a. We carried out 200 ns molecular dynamics simulations using the
Amber99SB force field with TIP3P water, and snapshots were clustered into eight
conformational substates, as shown in Figure 4b and c. Using Graf’s experimental data set of NMR coupling
constants,[28] we determined an eight-substate
conformational ensemble for trivaline (Table 5) as well as a five-substate ensemble for the central residue
of trivaline (Table 6), using both the MD and RC
priors. The results are again very consistent despite the use of different
priors or different numbers of conformational substates. The most
dominant conformational substate for trivaline has the central residue
in the PII conformation, with a population of ∼49% ±
7%, much lower than that for trialanine.
Figure 4
“Divide-and-merge”
two-stage clustering of a trivaline
MD trajectory simulated with the Amber99SB force field and TIP3P
water. (a) Trivaline at pH = 2 with each residue labeled. (b) Stage
1, population distribution from residue-based clustering with
Markov state models. (c) Stage 2, residue-based macrostates are merged
to yield a total of 10 substates for the whole peptide in principle,
but only eight substates exist in the MD simulation.
Table 5
Eight-State Results for Trivaline with Amber99SB Force Field and TIP3P Water with MD Prior and Random Coil Prior

| Amber99SB & TIP3P | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | χ2 |
|---|---|---|---|---|---|---|---|---|---|
| Central residue | αL 1 | αL 2 | α | α | β | β | PII | PII | |
| MD prior: $W^{\text{initial}}$ (σ) | 0.1(20) | 0.1(20) | 0.2(20) | 16.2(20) | 5.5(20) | 25.6(20) | 8.2(20) | 44.1(20) | 3.13 |
| MD prior: $W^{\text{bayesian}}$ (σ) | 6.0(4.1) | 6.1(4.1) | 2.9(2.4) | 19.2(7.6) | 2.7(2.2) | 8.8(5.5) | 3.4(2.7) | 51.1(7.1) | 1.99 |
| RC prior: $W^{\text{initial}}$ (σ) | 12.5(20) | 12.5(20) | 12.5(20) | 12.5(20) | 12.5(20) | 12.5(20) | 12.5(20) | 12.5(20) | 7.13 |
| RC prior: $W^{\text{bayesian}}$ (σ) | 6.4(4.3) | 6.5(4.3) | 2.9(2.5) | 20.8(8) | 2.8(2.4) | 8.9(5.6) | 4(3.1) | 47.7(7.2) | 2.05 |
Table 6
Five-State Results for Trivaline with Amber99SB Force Field and TIP3P Water with MD Prior and Random Coil Prior

| Amber99SB & TIP3P | αL 1 | αL 2 | α | β | PII | χ2 |
|---|---|---|---|---|---|---|
| MD prior: $W^{\text{initial}}$ (σ) | 0.1(20) | 0.1(20) | 16.4(20) | 31.0(20) | 52.3(20) | 4.39 |
| MD prior: $W^{\text{bayesian}}$ (σ) | 5.8(4.2) | 7(4.5) | 22.8(8.0) | 10.5(5.9) | 53.9(6.9) | 2.46 |
| RC prior: $W^{\text{initial}}$ (σ) | 20.0(20) | 20.0(20) | 20.0(20) | 20.0(20) | 20.0(20) | 9.46 |
| RC prior: $W^{\text{bayesian}}$ (σ) | 6.2(4.4) | 7.6(4.7) | 25.7(8.2) | 10.5(6) | 50.0(6.9) | 2.51 |
Summary
Conformational analysis of
unfolded peptides is notoriously challenging,
due to the intrinsically dynamic nature of the ensemble of accessible
states that are distinguished by small free energy differences. Data
from a variety of different spectroscopies including UVCD, VCD, Raman,
and ROA have been used to demonstrate that there are in fact strong
conformational preferences in unfolded states, modeled here by the
trialanine and trivaline peptides in water. As pointed out in a detailed
review by Adzhubei et al.,[51] the PII conformation plays a major role in unfolded peptide structure.
The main problem has been to quantify this or any other substate preference.
In this work, we have demonstrated an integrated computational-experimental-Bayesian
approach to characterize conformational ensembles. In comparison with
previous methods, this integrated approach offers several novel attractive
features: (i) It characterizes the whole chain rather than a single
residue. (ii) It provides an objective and robust method to define
and assign peptide conformational substates. (iii) It naturally includes
uncertainty estimations, taking errors in both experimental data and
computational results into account. (iv) Bayesian estimates of peptide
conformational substates and their confidence intervals can be further
systematically refined with additional high-quality experimental data
and more accurate computational modeling, including more reliable
force fields, more extensive sampling, and more accurate methods to
compute experimental observables. Work along this line is currently
in progress.

Here, we have applied this integrated approach
to define the conformational
ensembles of trialanine and trivaline in aqueous solution. Our results
concur with other studies that disprove the random-coil model (see the detailed
review by Adzhubei et al.[51]) and indicate
that the PII conformation is dominant in both tripeptides,
to different degrees. One conclusion of the new approach is that the
picture of a simple two-state distribution between β and PII conformations is oversimplified. Our current analysis points
to significantly lower populations of β structure than predicted
by earlier studies.[18,28] The integrated strategy reported
opens a way to quantitatively define the populations of conformation
ensembles in unfolded peptides using a systematic and consistent procedure.
Inclusion of different sequences and experimental data sets is currently
being investigated.