J C Gaines1,2, A Virrueta2,3, D A Buch4, S J Fleishman5, C S O'Hern1,2,3,6,7, L Regan1,2,8,9. 1. Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA. 2. Integrated Graduate Program in Physical and Engineering Biology (IGPPEB), Yale University, New Haven, CT 06520, USA. 3. Department of Mechanical Engineering and Materials Science, Yale University, New Haven, CT 06520, USA. 4. C. Eugene Bennett Department of Chemistry, 217 Clark Hall, West Virginia University, Morgantown, WV 26506, USA. 5. Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 76100, Israel. 6. Department of Physics, Yale University, New Haven, CT 06520, USA. 7. Department of Applied Physics, Yale University, New Haven, CT 06520, USA. 8. Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA. 9. Department of Chemistry, Yale University, New Haven, CT 06520, USA.
Abstract
Protein core repacking is a standard test of protein modeling software. A recent study of six different modeling software packages showed that they are more successful at predicting side chain conformations of core compared to surface residues. All the modeling software tested have multicomponent energy functions, typically including contributions from solvation, electrostatics, hydrogen bonding and Lennard-Jones interactions in addition to statistical terms based on observed protein structures. We investigated to what extent a simplified energy function that includes only stereochemical constraints and repulsive hard-sphere interactions can correctly repack protein cores. For single residue and collective repacking, the hard-sphere model accurately recapitulates the observed side chain conformations for Ile, Leu, Phe, Thr, Trp, Tyr and Val. This result shows that there are no alternative, sterically allowed side chain conformations of core residues. Analysis of the same set of protein cores using the Rosetta software suite revealed that the hard-sphere model and Rosetta perform equally well on Ile, Leu, Phe, Thr and Val; the hard-sphere model performs better on Trp and Tyr and Rosetta performs better on Ser. We conclude that the high prediction accuracy in protein cores obtained by protein modeling software and our simplified hard-sphere approach reflects the high density of protein cores and dominance of steric repulsion.
A grand challenge in biology is to design new protein–protein interactions for many
potential applications including point of care diagnostics (Rusling), sensors for proteinaceous
biological warfare agents (Sapsford) and more effective vaccines (Correia). In order to design new
proteins we must learn the rules for designing protein cores, which endow proteins and
protein complexes with stability. Computational protein design provides a unique approach
with which to gain fundamental insights into protein structure. It is important to benchmark
the predictions made by computational design software against known protein crystal
structures. A frequently used test for computational design software is side chain
conformation recovery, where the side chains are removed from a protein crystal structure
and the software attempts to recover the observed side chain conformations of all residues
(Peterson).
There are two categories of protein core repacking: one starts with all possible sequences
and seeks to recover the wild type sequence (Dobson; Dantas) and the other starts with the wild type sequence
and seeks to recover the observed combination of side chain dihedral angles. Here, we focus
on the second type, where the side chains of core residues are removed simultaneously and
all side chain dihedral angle combinations of the starting sequence are sampled. The optimal
combination is predicted and compared to the observed structure (see Fig. 1). Protein core repacking is a particularly
meaningful test of computational design software developed to design stable variants of
proteins (Goldenzweig) and design new protein–protein interactions (Fleishman).
Fig. 1
Illustration of single and combined
rotations for protein core repacking studies using PDB: 1C7K. (A) We show
a cluster of three interacting core residues (Thr, Leu, Val) shaded in green using
stick representation with the rest of the protein shaded in gray. (B) For
combined rotations, all three core residues, with atoms represented as spheres (C:
green, N: blue, O: oxygen), are rotated simultaneously and the repulsive steric
interactions are calculated between atoms in the three moving residues as well as
between atoms in the residues with fixed side chains. (C–E)
For single rotations, only one core residue ((C) Thr, (D) Leu or (E) Val) in the
cluster is rotated at a time, while the others remain fixed. Steric interactions are
calculated between atoms in the moving residue and atoms of all other residues in the
protein. In all cases, each atom in the protein is represented as a sphere, but
stationary atoms are shown here as sticks to highlight the residues that are not
rotated.
In recent work, Peterson performed side chain recovery for ~200 proteins using six different protein
modeling software suites (SCWRL (Krivov), OSCAR (Liang), RASP (Miao), Rosetta (Kuhlman and Baker, 2000), Sccomp (Eyal)
and FoldX (Guerois)). The key component of computational
protein design software is the energy function, which can include many terms:
stereochemistry (potentials that enforce equilibrium bond lengths and angles derived from
small molecule crystal structure data); statistical potentials derived from
backbone-dependent side chain rotamer libraries (Dunbrack and Cohen, 1997; Shapovalov and
Dunbrack, 2011); repulsive and attractive van der Waals atomic interactions;
hydrogen bonding; electrostatics; solvation; disulfide bond energy (RASP-specific), and an
ad hoc pairwise residue potential (Rosetta-specific). The energy
functions differ in the specific form and relative weights assigned to each of these
terms.
Overall, protein modeling software performs well for protein side chain recovery. In
particular, Peterson et al. found that all six software packages obtain
higher accuracy for their predictions for the side chain dihedral angle conformations for
core residues compared to surface residues. In addition, the software packages achieve
higher accuracy when predicting χ1 alone (90–95% within 40°) compared to
predictions of side chain dihedral angle combinations, e.g. χ1 and χ2
(82–87% within 40° for each). Because the rotamer recovery prediction accuracy for
all the protein design software tested is higher for core residues, here we investigate to
what extent an energy function that only includes stereochemistry and repulsive hard-sphere
atomic interactions can repack protein cores.
We take a systematic approach to protein core repacking studies. We first study single
residue rotations and then collective residue rotations, both using the hard-sphere model.
This comparison allows us to determine if multiple sterically allowed side chain
conformations are possible in the core. We then perform collective fixed-sequence core
repacking calculations using Rosetta, a well-established protein design software package,
and compare the results to those of the hard-sphere model. This comparison allows us to
identify the dominant forces that determine side chain conformations in protein cores.
In the Results section, we first describe studies of single residue rotations, where we
sample all side chain dihedral angle combinations of a single core residue, keeping the side
chain conformations of all other residues fixed to their crystal structure values. We
evaluate the energy of each side chain dihedral angle combination and compare the lowest
energy side chain dihedral angle combination for each core residue (Leu, Ile, Met, Phe, Ser,
Thr, Trp, Tyr, Val) to the observed values. We find that the hard-sphere model achieves a
prediction accuracy of greater than 90% (within 30°) for all residues except Met (84%) and
Ser (38%). We compare the results of single residue rotations to the results of collective
residue rotations, which provides insight into the number of possible ways to pack
interacting core residues.
For collective residue rotations, we simultaneously rotate the side chains of all residues
in each interacting cluster. We perform these calculations for the same clusters in all
proteins using both the hard-sphere model and Rosetta. We observe the same high prediction
accuracy for collective residue rotations as we did for single residue rotations for the
hard-sphere model: greater than 90% accuracy (within 30°) for all core residues except for
Met (77%) and Ser (36%) (see Figs 5 and 6). For combined rotations, Rosetta and the
hard-sphere model give the same high prediction accuracy (≥90% within 30°) for Ile, Leu,
Thr, Phe and Val (Fig. 7). The hard-sphere model
performs slightly better on aromatic residues than Rosetta, whereas Rosetta achieves much
higher accuracy for Ser. We discuss potential explanations for these differences in the
Results section. The cases for which the hard-sphere model does not achieve high prediction
accuracy allow us to identify when additional interactions are necessary to predict side
chain conformations.
Fig. 5
Combined rotations in the
context of the protein core: the fraction (F(Δχ)) of each residue
type for which the hard-sphere model prediction of the side chain conformation is Δχ
< 10° (yellow, left bar), 20° (red, center bar) or 30° (blue, right bar) from the
crystal structure for core residues in the Dunbrack 1.0 Å
database.
Fig. 6
Comparison of the accuracy of single and combined rotations for
core residues in the Dunbrack 1.0 Å database. Each bar shows the fraction of residues
for which the hard-sphere model prediction of the side chain conformation is Δχ <
30° for single (blue, left bar) or combined (red, right bar)
rotations.
Fig. 7
Comparison of the accuracy of combined rotations for core
residues in the Dunbrack 1.0 Å database using the hard-sphere model (red, left bar)
and Rosetta (yellow, right bar). Each bar shows the fraction F(Δχ) of
residues for which the model prediction was Δχ < 30°.
Materials and methods
Data sets of protein crystal structures and core residues
We use the Dunbrack 1.0 Å database (Wang and
Dunbrack, 2003, 2005) of
high-resolution protein crystal structures as the basis for our protein core repacking
studies. The Dunbrack 1.0 Å database contains 221 proteins with resolution ≤1.0 Å, side
chain B-factors per residue ≤30 Å², R-factor ≤0.2 and sequence identity
<50%. As a way to model the system at nonzero temperature and improve the statistics,
variations in bond lengths and angles are implemented by replacing each side chain with
different instances of the side chain taken from the Dunbrack 1.7 Å database, each with an
independent set of side chain bond lengths and angles (Zhou). The Dunbrack 1.7 Å database
contains ~800 proteins with resolution ≤1.7 Å (Dunbrack and Cohen, 1997). Additional studies were performed on a second
database, the ‘HiQ54’ database (Leaver-Fay), which contains 54 non-redundant, single-chain
monomeric proteins with resolution and MolProbity score <1.4 Å.
Our analysis focuses on the side chains of residues in protein cores. We have identified
all core residues in the Dunbrack 1.0 Å database using a method described previously
(Caballero; Gaines). In
brief, noncore atoms are identified that are on the surface of the protein or near an
interior void with a radius ≥1.4 Å. In our strict definition, a core residue is defined
as any residue containing exclusively core atoms (including hydrogen atoms). The numbers
of each amino acid that occur as core residues in the Dunbrack 1.0 Å database are given in
Table I.
Table I.
The number of each amino acid designated as core in the Dunbrack 1.0 Å database
Amino acid    No. in Dunbrack 1.0 Å database
Ala           529
Asn           50
Asp           78
Arg           6
Cys           142
Gln           17
Glu           31
Gly           453
His           24
Ile           453
Leu           355
Lys           3
Met           90
Phe           141
Pro           63
Ser           193
Thr           136
Trp           28
Tyr           69
Val           438
Total         849
Hard-sphere model
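As described in previous work (Zhou; Gaines), the ‘hard-sphere’ model treats each atom
i as a sphere that interacts pairwise with all other non-bonded atoms
j via the purely repulsive Lennard–Jones potential:
$$U_{\mathrm{RLJ}}(r_{ij}) = \frac{\epsilon}{72}\left[1 - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]^{2} \Theta(\sigma_{ij} - r_{ij}),$$
where $r_{ij}$ is the center-to-center separation between atoms i and j,
$\Theta(\sigma_{ij} - r_{ij})$ is the Heaviside step function, $\epsilon$ is the energy scale of the repulsive interactions,
$\sigma_{ij} = (\sigma_i + \sigma_j)/2$ and $\sigma_i/2$ is the radius of atom i. The values for the atomic radii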
($C_{sp^3}$, $C_{\mathrm{aromatic}}$: 1.5 Å; $C_{\mathrm{O}}$: 1.3 Å; O: 1.4 Å; N: 1.3 Å;
$H_{\mathrm{C}}$: 1.10 Å; $H_{\mathrm{O,N}}$: 1.00 Å and S: 1.75 Å) were obtained in prior
work (Zhou) by
minimizing the difference between the side chain dihedral angle distributions predicted by
the hard-sphere dipeptide mimetic model and those observed in protein crystal structures
for a subset of amino acid types. Hydrogen atoms were added using the REDUCE software
program (Word),
which sets the bond lengths for C-H, N-H and S-H to 1.1, 1.0 and 1.3 Å, respectively, and
the bond angles to 109.5° and 120° for angles involving $C_{sp^3}$ and
$C_{sp^2}$ atoms, respectively. Additional dihedral angle degrees of freedom
involving hydrogen atoms are chosen to minimize steric clashes (Word).
Predictions of the side chain conformations of single amino acids are obtained by
rotating each of the side chain dihedral angles $\chi_1, \ldots, \chi_n$
(with a fixed backbone conformation (Liu and Chen, 2016)) and finding the lowest energy conformations of the residue, where the
energy includes both intra- and inter-residue steric repulsive interactions (Fig. 1C–E). If the lowest energy conformation of the
residue is degenerate (i.e. multiple dihedral angle configurations result in the same
minimum energy), all lowest energy configurations are recorded. We then calculate the
Boltzmann weight of the lowest energy side chain conformation of the residue,
$P_i(\chi_1, \ldots, \chi_n) \propto e^{-U(\chi_1, \ldots, \chi_n)/k_B T}$,
where the temperature $T/\epsilon = 10^{-2}$ approximates
hard-sphere-like interactions. To sample bond length and angle fluctuations, each residue
is replaced with random bond length and angle combinations taken from the Dunbrack 1.7 Å
database and the new lowest energy conformation is found. We select 50 bond length and
angle variants, and for each find the lowest energy dihedral angle conformation and
corresponding $P_i$ values. We average $P_i$ over the variants to obtain $\langle P \rangle$. We
then compare the particular dihedral angle combination associated with
the highest value of $\langle P \rangle$ to the side chain dihedral angles of
the crystal structure. To assess the accuracy of the
hard-sphere model in predicting the side chain dihedral angles of residues in protein
cores, we calculated
$$\Delta\chi = \sqrt{\left(\chi_1^{\mathrm{xtal}} - \chi_1^{\mathrm{HS}}\right)^2 + \cdots + \left(\chi_n^{\mathrm{xtal}} - \chi_n^{\mathrm{HS}}\right)^2},$$
where the superscripts xtal and HS denote the crystal structure value and the hard-sphere prediction, respectively.
If multiple side chain configurations were reported in the Protein Data Bank for a given
protein, Δχ was calculated for all reported conformations with an occupancy ≥40% and the
smallest value of Δχ was selected. We calculate the fraction F(Δχ) of
residues with Δχ less than 10°, 20° and 30°. A discussion of the calculations of the error
bars for F(Δχ) is included in the Supplemental Material.
In addition to single residue rotations, we performed core repacking using combined
rotations of interacting core residues in each protein with the wild type amino acid
sequence. For the combined rotation method, all residues in an interacting cluster are
rotated simultaneously (with fixed backbone conformations), and the global minimum energy
conformation is identified (Fig. 1B). A cluster
of interacting residues is defined such that side chain atoms of each residue in the
cluster only interact with other residues in the cluster, but do not interact with the
side chains of other core residues in the protein (Fig. 2). Specifically, if an atomic overlap is possible between two
residues without an interaction with the protein backbone also occurring, those two
residues are considered to be interacting. Examples of interaction networks between core
residues in interacting clusters are given in Fig. 3C. Ala, Gly and Pro were excluded from this analysis since these amino acids do
not possess side chain dihedral angle degrees of freedom. In addition, we did not include
Cys residues because they can form disulfide bonds. The Dunbrack 1.0 Å database includes
352 distinct clusters (with greater than 1 residue). A few clusters contained 10 or more
residues, but these were not included in the analyses. We also removed clusters containing
the charged residues Arg, Asp, Glu and Lys and the polar residues Asn, Gln and His, which
are rare in protein cores (<10% of core residues). This resulted in a total of 250
clusters and 852 residues from the Dunbrack 1.0 Å database with sizes given in Fig. 3. The frequency of each amino acid in these
clusters is given in Table II. The HiQ54
database contains 50 core clusters with 2–15 residues per cluster (see Fig. 3B).
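The cluster definition above amounts to finding the connected components of a graph whose nodes are core residues and whose edges join residues that can overlap sterically. The following Python sketch illustrates that grouping step only. It assumes the pairwise overlap test has already been run and is supplied as a list of residue pairs; the function name `find_clusters`, the input format and the `max_size` argument are illustrative choices, not taken from the authors' code.

```python
from collections import defaultdict

def find_clusters(residues, interacting_pairs, max_size=None):
    """Group core residues into interacting clusters (connected components).

    residues          -- iterable of hashable residue labels
    interacting_pairs -- iterable of (a, b) pairs judged able to overlap
                         (assumed precomputed; the overlap test is not shown)
    max_size          -- optional cutoff; larger clusters are discarded
    """
    neighbors = defaultdict(set)
    for a, b in interacting_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    clusters = []
    for start in residues:
        if start in seen:
            continue
        # Iterative traversal collects every residue reachable from start.
        stack, component = [start], set()
        while stack:
            res = stack.pop()
            if res in component:
                continue
            component.add(res)
            stack.extend(neighbors[res] - component)
        seen |= component
        # Keep only clusters with more than one residue, as in the text.
        if len(component) > 1 and (max_size is None or len(component) <= max_size):
            clusters.append(sorted(component))
    return clusters

# Hypothetical labels and pairs, for illustration only.
demo_residues = ["r1", "r2", "r3", "r4", "r5", "r6"]
demo_pairs = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
print(find_clusters(demo_residues, demo_pairs))
```

Setting `max_size=9` would mimic the exclusion of clusters with 10 or more residues; removing clusters that contain charged or polar residues would be a separate filter on the returned lists.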
Fig. 2
Schematic in two dimensions of a protein that
contains three core clusters. Each amino acid is represented by disk-shaped atoms
that are connected by lines. The protein backbone is indicated by a thick black
line, and the thinner lines form the side chains. Each residue contains two backbone
atoms and between one and seven side chain atoms. ‘Surface’ residues are shaded
gray. Any residue that is completely surrounded by other atoms is designated as a
core residue. Each core cluster contains residues that interact with each other but
do not interact with the side chains of residues in another cluster. For example,
the cluster in blue has atoms that touch the backbone of the cluster in orange, but
these atoms do not interact with the side chains of residues in the orange cluster
without clashing with the backbone first. The three core clusters shown here contain
five (blue), five (orange) and two (green) residues.
Fig. 3
The distribution of
cluster sizes in the (A) Dunbrack 1.0 Å and (B) HiQ54
databases. Each cluster is defined as a set of residues in a protein core that
interact with each other, but not with any other side chains of other core residues.
(C) Examples of interaction networks based on two clusters of core
residues from protein PDB:1T3Y. The clusters contain eight (top) and five (bottom)
residues, respectively. Each line in the network indicates interactions between two
residues. For example, in the top cluster Ile 125 interacts with Ile 79 and Leu 120,
but does not interact with Trp 81 or Val 17 (in another
cluster).
Table II.
The number of each uncharged amino acid found in interacting clusters (with size greater than 1 residue) in the Dunbrack 1.0 Å database
Amino acid    No. in clusters in Dunbrack 1.0 Å database
Ile           163
Leu           179
Met           50
Phe           70
Ser           68
Thr           48
Trp           13
Tyr           29
Val           229
Total         849
Predictions from combined rotations for the side chain dihedral angle combinations of
core residues in a given cluster are obtained by rotating each of the side chain dihedral
angles χ1, …, χn
of all residues in that cluster and identifying the lowest energy side chain dihedral
angle combination, where the total energy includes the repulsive Lennard–Jones
interactions between atoms on a single residue as well as atoms on different residues both
in the given cluster and other residues in the protein. We represented the side chain
dihedral angle combinations as a tree, where each level represents an amino acid and the
nodes at each level represent the allowed side chain dihedral angle conformations for the
corresponding residue. We then implement a depth-first search to find the global energy
minimum and the corresponding side chain dihedral angle conformation. Bond lengths and
angles were varied by sampling 30 bond length and angle variants from the Dunbrack 1.7 Å
database. The Boltzmann weight $P_i$ for each variant was found
and averaged over the variants to obtain $\langle P \rangle$,
and Δχ was calculated as described above for single residue rotations.
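The tree search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: atoms are plain `(x, y, z, radius)` tuples, each residue comes with a precomputed list of candidate conformations, and the 1/72 prefactor in the pair potential is an assumed normalization. Intra-residue terms and the exclusion of bonded neighbors are omitted for brevity. The pruning step is an addition that the text does not mention; it is safe here because every pair energy is non-negative, so a partial sum is a lower bound on the total.

```python
import math

def u_rlj(r, sigma_ij, eps=1.0):
    """Purely repulsive Lennard-Jones energy for one atom pair.

    Zero unless the spheres overlap (r < sigma_ij). The 1/72 prefactor is
    an assumed normalization; it does not change which state is lowest.
    """
    if r >= sigma_ij:
        return 0.0
    if r <= 0.0:
        return math.inf
    return (eps / 72.0) * (1.0 - (sigma_ij / r) ** 6) ** 2

def overlap_energy(atoms_a, atoms_b, eps=1.0):
    """Sum u_rlj over all pairs; each atom is (x, y, z, radius)."""
    total = 0.0
    for *pos_a, rad_a in atoms_a:
        for *pos_b, rad_b in atoms_b:
            # sigma_ij is the sum of the two radii (mean of the diameters).
            total += u_rlj(math.dist(pos_a, pos_b), rad_a + rad_b, eps)
    return total

def repack_cluster(candidates, fixed_atoms, eps=1.0):
    """Depth-first search for the lowest-energy combination of conformations.

    candidates  -- candidates[k] lists the conformations of residue k, each
                   conformation being a list of (x, y, z, radius) atoms
    fixed_atoms -- atoms of the rest of the protein, which do not move
    Returns (energy, choice), where choice[k] indexes candidates[k].
    """
    best = [math.inf, None]
    choice = []

    def descend(level, energy):
        # Partial sums only grow, so this branch cannot beat the best leaf.
        if energy >= best[0]:
            return
        if level == len(candidates):
            best[0], best[1] = energy, tuple(choice)
            return
        for idx, conf in enumerate(candidates[level]):
            added = overlap_energy(conf, fixed_atoms, eps)
            for k, chosen in enumerate(choice):
                added += overlap_energy(conf, candidates[k][chosen], eps)
            choice.append(idx)
            descend(level + 1, energy + added)
            choice.pop()

    descend(0, 0.0)
    return best[0], best[1]

# Toy example with one atom per conformation (hypothetical coordinates).
fixed = [(0.5, 0.0, 0.0, 1.0)]
cands = [
    [[(0.0, 0.0, 0.0, 1.0)], [(5.0, 0.0, 0.0, 1.0)]],
    [[(5.5, 0.0, 0.0, 1.0)], [(10.0, 0.0, 0.0, 1.0)]],
]
print(repack_cluster(cands, fixed))
```

Each tree level corresponds to one residue and each branch to one of its candidate conformations. When several combinations share the minimum energy, this sketch returns only the first one found, whereas the text records all degenerate minima for single residue rotations.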
Rosetta predictions
The prediction accuracy for collective core repacking using the hard-sphere model was
compared to that from Rosetta (Leaver-Fay) for the same core clusters. We first generated relaxed
structures for each protein studied, using Rosetta's fast relax protocol with backbone
constraints that maintain the positions of the backbone heavy atoms near their crystal
structure locations (Tyka; Liu and Chen, 2016). Fifty
relaxed structures were produced and the five lowest energy structures were chosen for
core repacking. Rotamer sampling on all side chain dihedral angles using the wild type
amino acid sequence was set to the maximum value (i.e. the original rotamer value ± 0.25
standard deviations). For each of the five relaxed structures, we performed combined
repacking of the residues in each core cluster and selected the output conformation with
the lowest Rosetta energy. Δχ was calculated for each residue as described above,
resulting in five Δχ values for each residue, which were used to obtain the average
fraction F(Δχ) of residues with Δχ less than 10°, 20° and 30°. A sample
Rosetta script and a description of the calculations of the error bars for
F(Δχ) are given in the Supplemental Material.
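The scoring step shared by the hard-sphere and Rosetta predictions can be sketched in Python as below. The sketch assumes that Δχ is the Euclidean distance between predicted and observed dihedral angles, with each angle difference wrapped to account for periodicity; the wrapping and all function names are illustrative assumptions. Error bars are not covered.

```python
import math

def delta_chi(predicted, observed):
    """Distance in degrees between two sets of side chain dihedral angles.

    Assumption: Euclidean norm over chi_1..chi_n, with each difference
    wrapped into [-180, 180) before squaring.
    """
    if len(predicted) != len(observed):
        raise ValueError("angle lists must have the same length")
    total = 0.0
    for p, o in zip(predicted, observed):
        diff = (p - o + 180.0) % 360.0 - 180.0
        total += diff * diff
    return math.sqrt(total)

def min_delta_chi(predicted, alternates, min_occupancy=0.4):
    """Smallest delta_chi over reported conformations with enough occupancy.

    alternates -- list of (occupancy, angles) pairs from the crystal structure
    """
    eligible = [angles for occ, angles in alternates if occ >= min_occupancy]
    return min(delta_chi(predicted, angles) for angles in eligible)

def fraction_within(delta_chis, threshold):
    """F(delta_chi): fraction of residues with delta_chi below threshold."""
    if not delta_chis:
        return float("nan")
    return sum(d < threshold for d in delta_chis) / len(delta_chis)

def mean_fraction(per_structure_delta_chis, threshold):
    """Average F(delta_chi) over several structures (e.g. five relaxed models)."""
    fractions = [fraction_within(d, threshold) for d in per_structure_delta_chis]
    return sum(fractions) / len(fractions)

# Hypothetical values, for illustration only.
print(delta_chi([179.0], [-179.0]))
print(fraction_within([5.0, 15.0, 25.0, 35.0], 30.0))
```

Calling `fraction_within` with thresholds of 10, 20 and 30 gives the three bars reported per residue type, and `mean_fraction` averages the result over the five relaxed structures used in the Rosetta runs.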
Results
In previous studies, we have shown that the hard-sphere dipeptide model can recapitulate
the observed side chain dihedral angle distributions of nonpolar, aromatic and polar amino
acids (Cys, Ile, Leu, Phe, Ser, Thr, Trp, Tyr and Val) (Zhou). In more recent work (Caballero), we showed
that the hard-sphere model including both intra- and inter-residue interactions could
predict the side chain dihedral angle conformations of single residues in protein cores. The
prediction accuracy (within 20° of the observed structure) was greater than 90% for Ile,
Leu, Phe, Thr, Trp, Tyr and Val. This prior work focused on rotations of the side chains of
individual residues in protein cores. Here, we expand this work to examine the predictions
obtained by the hard-sphere model from simultaneous rotations of multiple residues in
protein cores (maintaining the wild type amino acid sequence), as well as to a larger
database of protein crystal structures. To enable a detailed comparison with a
well-established protein design software package, we compare the predictions of the
hard-sphere model to those from Rosetta on the same sets of core residues.
In Fig. 4, we investigate the accuracy of the
hard-sphere model in predicting the side chain dihedral angles of individual residues in
protein cores. For each amino acid (Ile, Leu, Met, Phe, Ser, Thr, Trp, Tyr and Val), we
calculate the percentage of residues for which the predicted side chain dihedral angle
conformation is within 10°, 20° and 30° of the crystal structure value. Consistent with our
prior results, the hard-sphere model accurately predicts the side chain dihedral angle
combinations of single residues in the context of the protein for Ile, Leu, Phe, Thr, Trp,
Tyr and Val (≥90% within 30°). This result emphasizes that the purely repulsive hard-sphere
model can accurately predict the side chain dihedral angle combinations for nonpolar and
uncharged amino acids. The quantitative values of our results differ slightly from those
found in Caballero because in the current study we use the much larger Dunbrack 1.0 Å database
of protein crystal structures.
Fig. 4
Single residue rotations in the context of the protein core: the
fraction (F(Δχ)) of each residue type for which the hard-sphere model
prediction of the side chain conformation is Δχ < 10° (yellow, left bar), 20° (red,
center bar) or 30° (blue, right bar) from the crystal structure for core residues in
the Dunbrack 1.0 Å database.
We find that the hard-sphere model is unable to predict with high accuracy the observed
side chain conformations for two residues that we studied: Ser and Met. Our results for Met
are consistent with those found in Virrueta. In this prior work, we found that local steric
interactions were insufficient to predict the shape of the P(χ3)
distribution for Met. It was necessary to add attractive atomic interactions to the
hard-sphere model to reproduce the observed P(χ3). Here, using
only repulsive interactions, we predict ~80% of Met residues within 30°. Our results for Ser
(only 38% within 30°) are also consistent with our prior work in Caballero. We speculate that because
the side chain of Ser is small, hydrogen-bonding interactions must be included to correctly
place its side chain. In contrast, we suggest that the more bulky Thr and Tyr side chains
cause steric interactions to determine the positioning of their side chains, even though
they are able to form hydrogen bonds (Zhou).
We obtain similar results when we perform combined rotations of core residues using the
hard-sphere model (Figs 5 and 6). Single and combined rotations have the same
prediction accuracy, which shows that there are very few arrangements of the residues in a
protein core that are sterically allowed and that the side chain conformations of most core
residues are dominated by packing constraints. Slightly lower prediction accuracy is found
for a few residues using combined rotations, because finding the conformation corresponding
to the global energy minimum may improve the accuracy for one residue, while lowering the
accuracy for another residue in the same cluster. We also performed single and collective
repacking on the HiQ54 data set and found similar accuracies for both single and combined
rotations for both data sets (these results are shown in the Supplementary Material).
We now compare the results of core repacking (with combined rotations) using the
hard-sphere model to those found using Rosetta (Fig. 7). For the residues Ile, Leu, Phe, Thr and Val, the hard-sphere model achieves a
similar prediction accuracy to that obtained by Rosetta. The largest differences occur for
Ser: Rosetta gives 85% (within 30°), while the hard-sphere model gives 36% (within 30°). We
previously speculated that because the side chain of Ser is small, hydrogen-bonding
interactions are more important for properly positioning its side chain than the side chain
of Thr. Rosetta includes hydrogen-bonding interactions, which is likely the reason for its
higher prediction accuracy.
Rosetta obtains prediction accuracies of 85% and 78% (within 30°) for Trp and Tyr,
respectively, while the hard-sphere model obtains 95% and 94% (within 30°) for Trp and Tyr,
respectively (Fig. 7). To further investigate
this difference, we calculated Δχ for χ1 and χ2 separately for both
residues (Fig. 8). For Trp, the hard-sphere model
performs slightly better than Rosetta at predicting χ1 and χ2. For
Tyr, Rosetta and the hard-sphere model perform equally well for χ1, but the
hard-sphere model performs better for χ2.
Fig. 8
Comparison of the accuracy of combined rotations for
core Met, Trp and Tyr residues in the Dunbrack 1.0 Å database using the hard-sphere
model (red, left bar) and Rosetta (yellow, right bar). Each bar shows the fraction
F(Δχ) of residues for which the model prediction was Δχ < 30°
for each side chain dihedral angle separately.
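The per-dihedral accuracy F(Δχ) described in this caption can be illustrated with a short calculation. The Python below is a minimal sketch, not the authors' analysis code: the function names `delta_chi` and `fraction_within` are hypothetical, the angle lists are made-up toy values rather than data from the study, and the deviation is assumed to be the smallest angular difference on a 360° circle.

```python
def delta_chi(pred_deg, xtal_deg, period=360.0):
    """Smallest absolute difference (degrees) between two dihedral angles.

    Assumption: angles are periodic with the given period, so that
    178 deg and -175 deg are 7 deg apart rather than 353 deg apart.
    """
    d = (pred_deg - xtal_deg) % period
    return min(d, period - d)


def fraction_within(pred, xtal, threshold_deg=30.0):
    """Fraction of residues with deviation below threshold_deg for one dihedral."""
    if len(pred) != len(xtal) or not pred:
        raise ValueError("need two non-empty angle lists of equal length")
    hits = sum(1 for p, x in zip(pred, xtal) if delta_chi(p, x) < threshold_deg)
    return hits / len(pred)


# Toy chi1 angles (degrees) for four residues; illustrative values only.
chi1_pred = [-65.0, 178.0, 62.0, -170.0]
chi1_xtal = [-60.0, -175.0, 95.0, 175.0]

for cutoff in (10.0, 20.0, 30.0):
    print(cutoff, fraction_within(chi1_pred, chi1_xtal, cutoff))
```

Applying `fraction_within` once per dihedral angle (χ1, χ2, χ3) gives the kind of per-angle breakdown plotted in Fig. 8. Side chains with symmetric rings may call for a 180° period on χ2; that choice is left to the caller through the `period` argument.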
For Met, both the hard-sphere model and Rosetta obtain prediction accuracies below 80% for
Δχ < 30°. Both the hard-sphere model and Rosetta accurately predict χ1 and
χ2 (above 90% within 30°), but have much lower prediction accuracies for
χ3 (below 80% within 30°) (see Fig. 8). In previous work, we showed that χ1 and χ2 of Met are
well predicted using the hard-sphere model, whereas χ3 is not (Virrueta). This
result holds both for the dipeptide model and in the context of the protein
core. In that study, we found that the addition of attractive atomic interactions
improves the prediction of χ3 for Met. The low χ3 prediction accuracy that the
hard-sphere model yields for Met in the current single and collective core repacking
is consistent with those results. For Rosetta, the
energy function includes statistical potentials that are based on backbone-dependent side
chain dihedral angle rotamer libraries. Such potentials do not fully account for the local
environment (i.e. side chain and backbone atoms of other residues). Instead, other terms in
the Rosetta energy function, for example attractive and repulsive Lennard–Jones atomic
interactions, are used to position the side chain in the local environment. We speculate
that the low prediction accuracy for χ3 of Met using Rosetta indicates that the
Lennard–Jones energy terms that account for local environment are not weighted appropriately
to identify the correct rotamer for an individual Met. Because Met represents only 6% of
core cluster residues, we do not pursue the modeling of Met further in this work.
Discussion
In this article, we showed several key results. First, single and collective core repacking
using the hard-sphere model give the same prediction accuracies for the side chain
conformations of seven of the most common core residues. This result implies that there are
no alternative sterically allowed conformations of core residues other than those in the
crystal structure. If alternative sterically allowed conformations existed, collective
repacking would have found them, and the prediction accuracy would have
decreased markedly relative to that for single residue rotations. It does not.
Thus, collective repacking reveals that the structures of protein cores are uniquely
specified by steric interactions.
Second, the hard-sphere model obtains prediction accuracies that are as high as or higher than
those of Rosetta for Ile, Leu, Phe, Thr, Val, Trp and Tyr. Thus, hard-sphere interactions are
dominant in determining side chain conformations for these residues. The hard-sphere model
and Rosetta both give <80% prediction accuracy for Met, which is caused by poor
prediction of the side chain dihedral angle χ3. Rosetta performs better on Ser,
presumably because Rosetta includes hydrogen-bonding interactions, which specify the
particular side chain conformation for each local environment. Interestingly, Thr and Tyr
can both hydrogen bond, but can be accurately predicted using the hard-sphere model alone,
presumably because they both have bulkier side chains than Ser. Third, we have shown that an
energy function that only includes stereochemistry and repulsive hard-sphere atomic
interactions can repack protein cores with high accuracy, which has important implications
both for our understanding of protein structure and for application-specific protein
design.
Why do the hard-sphere model and the six computational protein design software packages studied
in Peterson et al. obtain similarly high prediction accuracies for many core
residues? One reason is that protein cores are densely packed and thus steric repulsive
interactions are dominant (Chothia, 1975; Richards, 1977; Liang and Dill, 2001; Seeliger and de Groot, 2007; Gaines). In addition, the weights of the repulsive atomic
interactions and statistical potentials derived from backbone-dependent side chain dihedral
angle rotamer libraries are large in comparison to other terms in the energy functions of
the six software packages.
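To make the notion of a purely repulsive steric energy concrete, the following Python sketch scores a set of atoms by their pairwise overlaps. It is an illustration under stated assumptions, not the energy function used in this work: the functional form (a repulsive Lennard-Jones-like term that vanishes once two atoms no longer overlap), the atomic radii, the coordinates and the names `repulsive_energy` and `total_overlap_energy` are all assumptions chosen for the example.

```python
import math


def repulsive_energy(r, sigma, eps=1.0):
    """Purely repulsive pair energy: zero when r >= sigma, positive when atoms overlap.

    Assumed form: (eps / 72) * (1 - (sigma / r)**6)**2 for r < sigma.
    sigma is taken as the sum of the two atomic radii.
    """
    if r <= 0.0:
        raise ValueError("interatomic distance must be positive")
    if r >= sigma:
        return 0.0
    return (eps / 72.0) * (1.0 - (sigma / r) ** 6) ** 2


def total_overlap_energy(atoms, excluded=frozenset()):
    """Sum the repulsive energy over all atom pairs.

    atoms: list of (x, y, z, radius) tuples in angstroms.
    excluded: set of frozenset({i, j}) index pairs to skip, for example
    bonded atoms whose separation is fixed by stereochemistry.
    """
    total = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if frozenset((i, j)) in excluded:
                continue
            xi, yi, zi, ri = atoms[i]
            xj, yj, zj, rj = atoms[j]
            r = math.dist((xi, yi, zi), (xj, yj, zj))
            total += repulsive_energy(r, ri + rj)
    return total


# Two toy configurations with illustrative radii of 1.5 angstroms.
far = [(0.0, 0.0, 0.0, 1.5), (4.0, 0.0, 0.0, 1.5)]    # separated, no overlap
clash = [(0.0, 0.0, 0.0, 1.5), (2.0, 0.0, 0.0, 1.5)]  # overlapping spheres

print(total_overlap_energy(far), total_overlap_energy(clash))
```

With an energy of this kind, any side chain conformation that produces no overlaps scores exactly zero, so a repacking search only needs to distinguish conformations with steric clashes from those without.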