Literature DB >> 34524800

Force Field Effects in Simulations of Flexible Peptides with Varying Polyproline II Propensity.

Stéphanie Jephthah¹, Francesco Pesce², Kresten Lindorff-Larsen², Marie Skepö¹.

Abstract

Five peptides previously suggested to possess polyproline II (PPII) structure have here been investigated by using atomistic molecular dynamics simulations to compare how well four different force fields known for simulating intrinsically disordered proteins relatively well (Amber ff99SB-disp, Amber ff99SB-ILDN, CHARM36IDPSFF, and CHARMM36m) can capture this secondary structure element. The results revealed that all force fields sample PPII structures but to different extents and with different propensities toward other secondary structure elements, in particular, the β-sheet and "random coils". A cluster analysis of the simulations of histatin 5 also revealed that the conformational ensembles of the force fields are quite different. We compared the simulations to circular dichroism and nuclear magnetic resonance spectroscopy experiments and conclude that further experiments and methods for interpreting them are needed to assess the accuracy of force fields in determining PPII structure.

Entities: Chemical

Mesh：

Substances：

Year: 2021 PMID： 34524800 PMCID： PMC8515809 DOI： 10.1021/acs.jctc.1c00408

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Intrinsically disordered proteins and regions (IDPs and IDRs)—also recognized as natively unfolded proteins and peptides—are characterized by the lack of a well-defined tertiary structure in aqueous solution. Their structural properties are known to vary significantly, which makes it difficult to study them by standard methods. For example, because of their flexible behavior, IDPs and IDRs cannot be crystallized,[1] and until relatively recently, many molecular simulations of flexible peptides and proteins had a strong bias for sampling α-helical and β-sheet structures as well as too compact and stable structures.[2−4] Despite IDPs often being thought of as “unordered”, studies originating from the 1970s discovered that a few natively unfolded peptides possessed some degree of local order in their backbones, identified as the left-handed polyproline II (PPII) helix.[5] Although the PPII helix was discovered a long time ago, it is still substantially less known than, for example, the α-helix and the β-sheet.[6] One of the more recognized occurrences of the PPII helix might be as being part of the triple-helix structure of collagen, where it helps stabilize the collagen structure,[7] and indeed specific efforts have been devoted to optimizing computational models for collagen.[8,9] Another well-known and important property of the PPII helix has been observed in the binding to SH3 domains,[10] where it facilitates and mediates protein–protein interactions.[11,12] As a secondary structure element, the PPII helix is decidedly different from the α-helix and the β-sheet and perhaps less well-known, although frequently occurring in many proteins. The PPII helix has backbone dihedral angles of approximately (ϕ, ψ) = (−75°, +145°) with a helical pitch of 9.3 Å/turn and 3.0 residues/turn, which causes it to become quite extended.[6,13,14] The peptide bond in a PPII helix is always found as the trans isomer (ω = +180°) since the cis isomer (ω = 0°) yields a different secondary structure known as the polyproline I (PPI) helix ((ϕ, ψ) = (−75°, +160°) with a helical pitch of 5.6 Å/turn and 3.3 residues/turn).[15] One popular experimental technique for determining the secondary structure of proteins is circular dichroism (CD) spectroscopy. In the CD spectrum, the PPII helix is often associated with a strong band with negative ellipticity around 198 nm and a weak positive band around 218 nm.[5,6,16] There are several software packages available for analyzing CD data and providing estimates of the relative secondary structure content, at least in terms of α-helices and β-sheets.[17−19] Unfortunately, these algorithms may fall short when it comes to analyzing spectra of more disordered proteins that contain several less common secondary structure elements, including the PPII helix, and where the structural elements are not fixed in time. In such cases, the remaining secondary structure elements are lumped together and categorized as “others” or “random coils”. A similar problem is also encountered when the secondary structure content of protein structures is determined from simulations. Although the widely used DSSP (Dictionary of Secondary Structure of Proteins) program is able to identify and quantify a wider collection of secondary structure elements, it does not include the PPII helix. Fortunately, there is other software available that utilizes modified DSSP assignment to also include the PPII structure.[20,21] The development of force fields for simulating IDPs is constantly evolving to help alleviate the problem of overly collapsed structures in simulations and to make the simulations as computationally efficient and accurate as possible.[2−4] So far in force field development, the focus has mainly been on optimizing two aspects of IDP simulations.[22−25] The first aspect is the secondary structure propensities, which are often modified by adjusting the protein backbone dihedral parameters.[26−32] The second aspect concerns the balance of the protein–solvent interactions,[33,34] which is crucial to not sampling too compact IDP conformations and to accurately capturing the more extended conformations. This is generally controlled by increasing and fine-tuning the interaction between the protein and the water in the simulations. To evaluate new force fields for simulations of flexible peptides and proteins, different properties can be considered. Nuclear magnetic resonance (NMR) observables, such as scalar couplings and chemical shifts, are used for assessing force field accuracy by comparing simulated and experimental values and are in particular sensitive to local structural properties. Comparisons of scattering curves and the radius of gyration are used to evaluate the global compactness of the simulated proteins. In addition to such direct comparison to experimental observables, secondary structure propensities are often also assessed, and although more comprehensive analyses sometimes are used, they are most often restricted to the α-helix, the β-sheet, and the “random coil”. Conformational clustering is also sometimes used as a tool in force field analyses. Here we present a study where five different short peptides (7–24 residues long) with varying PPII propensities, as well as five variants of one of the peptides, have been simulated with four different force fields that are known to work relatively well for simulating IDPs. Our analyses were mainly focused on differences regarding the secondary structure content across peptides and force fields. Our findings revealed that, although all the chosen force fields give rise to conformational ensembles with some level of PPII structure, they do so to different extents and with different propensities toward other secondary structure elements. Additionally, all force fields capture a trend showing that the PPII content increases with the number of Pro residues in peptide chains consisting of only Ala and Pro residues.

Methods

Molecular Dynamics Simulations

Five different peptides known to possess PPII structure, as well as five different variants of one of the peptides, were simulated using atomistic molecular dynamics (MD) simulations. Names and amino acid sequences of the selected peptides are shown in Table .

Table 1

Names, Number of Amino Acid Residues (#aa), and Amino Acid Sequences of the Peptides Used in This Studya

Positively charged amino acids are shown in blue, and negatively charged amino acids are shown in red.

Positively charged amino acids are shown in blue, and negatively charged amino acids are shown in red. The simulations were performed with the GROMACS package (ver. 4.6.7),[35−37] with four different force fields that have also previously been used for simulating IDPs: (A) the AMBER ff99SB-disp force field with its own TIP4P-D-type water model,[38] (B) the AMBER ff99SB-ILDN force field[28] with the TIP4P-D water model,[34] (C) the CHARMM36IDPSFF[39,40] force field with the CHARMM-modified TIP3P water model,[41] and (D) the CHARMM36m force field,[31] also with the CHARMM-modified TIP3P water model. When discussed in the text, the force fields are referred to by their abbreviations, and when described in the figures and tables, they are referred to by their one-letter code (see Table ). A rhombic dodecahedron was used as a simulation box, with periodic boundary conditions in all directions. A minimum distance of 1 nm was set between the solute and the box edges. The initial, linear protein structures were built with the use of PyMOL.[42]

Table 2

Force Field Notations

force field	abbreviation	one-letter code
AMBER ff99SB-disp	A99SB-disp	A
AMBER ff99SB-ILDN	A99SB-ILDN	B
CHARMM36IDPSFF	C36IDPSFF	C
CHARMM36m	C36m	D

The Verlet leapfrog algorithm,[43] with a time step of 2 fs, was used to integrate the equations of motion. Nonbonded interactions were computed with a Verlet list cutoff scheme, short-ranged interactions were calculated by using a pair list with a cutoff of 1 nm, and long-ranged electrostatics were evaluated by using particle mesh Ewald summation[44] with cubic interpolation and a grid spacing of 0.16 nm. Long-ranged dispersion interactions were applied to the energies and pressures of the simulated systems. All bond lengths were constrained by using the LINCS algorithm.[45] A velocity-rescaling thermostat[46] with a relaxation time of 0.1 ps was used to keep a temperature of 293 or 300 K (see Table for details), and a Parrinello–Rahman barostat[47] kept the pressure at 1 bar throughout the simulations. A relaxation time of 2 ps was used, and the isothermal compressibility was set to that of water, i.e., 4.5 × 10–5 bar–1.

Table 3

Force Fields (FF), Total Production Run Times (t), and Temperatures (T) of All the Simulations

peptide(s)	FF	t (μs)	T (K)
A7, P-113, P₁₃, V1, V2, V3, V4, Pep3	A	5	300
A7, P₁₃, V1, V2, V2.2, V3, V4, Pep3	B	5	300
A7, P-113, P₁₃, V1, V2, V3, V4, Pep3	C	5	300
A7, P-113, P₁₃, V1, V2, V3, V4, Pep3	D	5	300
P-113	B	12	300
Hst5	A	5	293
Hst5	B	7	293
Hst5	C	5	293
Hst5	D	5	293

Energy minimization was done by using the steepest descent algorithm. Equilibration of the temperature and pressure was done in two steps and with position restraints on the proteins: (1) 500 ps in the NVT ensemble and (2) 1000 ps in the NPT ensemble. Five replicates with different starting seeds were used for each simulation. The final production runs were performed in the NPT ensemble for a total of 5 μs (5 × 1 μs) for the majority of the peptides. Hst5 with A99SB-ILDN was run for a total of 7 μs (1 × 3 μs + 2 × 2 μs), and P-113 with A99SB-ILDN was run for a total of 12 μs (1 × 8 μs + 2 × 2 μs). The differences among all the simulations are summarized in Table . Simulation data of Hst5 with A99SB-ILDN and Hst5 with C36m have previously been published in the paper by Jephthah et al.[48]

Simulation Analyses

The GROMACS tool g_analyze was used to obtain averages, autocorrelation functions, and error estimates (block averaging)[49] of the radius of gyration and the end-to-end distance of the simulated peptides. Results from these analyses were used to assess convergence and are presented in the Supporting Information (Figures S1–S15). The GROMACS tool g_cluster was used with the GROMOS algorithm[50] to obtain conformational clusters, and to obtain frames for representative structures. All protein structures were visualized and rendered with PyMOL.[42]

Principal Component Analysis

Principal component analysis (PCA) is a dimensionality reduction method that makes it possible to represent a fraction of the information contained in a large set of variables (or features) in a smaller set. This is achieved by calculating the eigenvectors (or principal components) of the variables’ covariance matrix. A PCA was performed, for each peptide, on an aggregated trajectory made by concatenating the trajectories resulting from the four different force fields. This, as suggested in ref (51), ensures a robust comparison of the force fields by projecting the resulting trajectories onto common principal components. PCA calculations were performed with pyEMMA[52] using as features the cosine and sine of each backbone dihedral. The analyses were based on the first two principal components.

Secondary Structure Analysis

The secondary structure was analyzed by using three methods: (i) the DSSP algorithm,[53] (ii) the DSSP-PPII algorithm,[20] and (iii) estimations from the Ramachandran plot. The last method is described in the following paragraph. The GROMACS tool g_chi was used to check the ω angles in the simulations (see Figure S16). The local structural preferences were also estimated from the dihedral angles of the peptide backbones, which were obtained by using the GROMACS tool g_rama. Only the α-helical (both the right-handed and the left-handed), the β-sheet, and the PPII helical regions of the Ramachandran map were considered for this analysis. Similar to what has previously been done in many other studies,[9,20,26,27,30,31,54−58] a residue was considered to be in the right-handed α-helix (αR) region of the Ramachandran map when −90° ≤ ϕ ≤ −30° and −90° ≤ ψ ≤ 0°, as illustrated in the Ramachandran map in Figure . Correspondingly, 30° ≤ ϕ ≤ 90° and 0° ≤ ψ ≤ 90° were used for the left-handed α-helix (αL) region, −180° ≤ ϕ < −104° and 180° ≤ ψ ≤ 104° were used for the β-sheet region, and −104° ≤ ϕ ≤ −46° and 116° ≤ ψ ≤ 174° were used for the PPII helix region. Residues not belonging to any of the aforementioned regions were unclassified but categorized as “bend/coil/turn” for simplification of the plots. We note that this classification is not based on secondary structure elements, but simply examines which regions of the Ramachandran map the different residues populate.

Figure 1

Example of a Ramachandran map illustrating the four different secondary structure regions analyzed in this study. The β-sheet region is defined by −180° ≤ ϕ < −104° and 180° ≤ ψ ≤ 104°, the PPII helix region is defined by −104° ≤ ϕ ≤ −46° and 116° ≤ ψ ≤ 174°, the αR-helix region is defined by −90° ≤ ϕ ≤ −30° and −90° ≤ ψ ≤ 0°, and the αL-helix region is defined by 30° ≤ ϕ ≤ 90° and 0° ≤ ψ ≤ 90°. Anything outside of these regions is classified as “random coil”. The plot is normalized for a maximum intensity of 1. The displayed angles of the example were obtained from simulations of Hst5 with A99SB-ILDN.

CD Prediction

To predict CD spectra from structural ensembles, we employed SESCA.[59] The SESCA algorithm has two steps: 1. The first step is per residue secondary structure assignment. We use DISICL[60] as the secondary structure prediction algorithm as it explicitly takes into account PPII conformations. 2. Spectral contributions from each secondary structure element in a conformation are combined to produce the CD spectra. In SESCA, the set of spectral contributions assigned to subsets of secondary structures are stored in the “basis sets”. Different basis sets for a given secondary structure assignment are available, which differ in the resolution of the spectral contributions definition. Optionally, the spectral contributions of the side chains may be added. We tested several of the available basis sets, but mainly used DS6-1SC1 (DS6-1 with side chain contribution), as this gave rise to the predicted spectra that resembled the most in shape the experimental spectra. Finally, the CD spectra from each conformation of the ensemble are linearly averaged.

Calculation of J-Couplings

We used Karplus-like equations[61] to calculate the backbone NMR scalar (J) couplings. This equation has the functional form In eq , θ is the torsional angle that determines the J-coupling constant, while A, B, and C are fitting parameters and δ is a shift used in some calculations. For 3JHNH, 3JH, 3JHNC′, and 3JHNC, we used the parametrization from Lindorff-Larsen et al.[62] The same parametrizations employed in the work of Graf et al.[63] were used for 1JNC,[64]2JNC,[65] and 3JHNC.[66] To compare the calculated coupling constants with the experimental values we usedwhere ⟨J⟩calc is the time average of the ith J-coupling constant from the frames of a simulation, J is the respective experimentally observed J-coupling constant, and σ is the error associated with the parametrization of the Karplus relationship.[65−67] Experimental errors and the errors on simulated averages are smaller than those from the Karplus parametrizations and were thus not included.

Results and Discussion

Effects of Force Field in Simulations of Five Peptides

An initial comparison of the effects of the force fields was done by analyzing the resulting average radius of gyration of the five peptides (Table ). All force fields resulted in similar average values for the radius of gyration for each of the individual peptides, although C36IDPSFF on average resulted in slightly more compact conformations compared to the other force fields. This was, however, not the case for P13, for which C36m sampled a slightly smaller average instead. Overall, it seemed like both A99SB-disp and A99SB-ILDN sampled similar averages for all peptides.

Table 4

Radius of Gyration, Rg (nm), for the Five Peptides with the Four Different Force Fields

FF	A7a	P-113	P₁₃a	Pep3	Hst5
A	0.62	0.97 ± 0.01	1.14	0.91 ± 0.03	1.29 ± 0.08
B	0.63	0.91 ± 0.03	1.14	0.94 ± 0.01	1.29 ± 0.05
C	0.60	0.86 ± 0.01	1.13	0.87 ± 0.01	1.18 ± 0.02
D	0.61	0.92 ± 0.02	1.11	0.97 ± 0.01	1.35 ± 0.03

The values of A7 and P13 are reported without error margins because their errors are smaller than 0.005 nm.

The values of A7 and P13 are reported without error margins because their errors are smaller than 0.005 nm. Because the conformational ensembles of IDPs are highly heterogeneous, it is not trivial to find a set of variables that can describe the high variability of an ensemble in a low-dimensional representation. For each peptide, we used PCA (on aggregated trajectories over all force fields as discussed under Methods) to represent and visualize the simulations in a space of reduced dimensionality. After projecting the ensembles from the different force fields onto a common subspace, we examine the free energy surfaces projected as a function of the first two principal components, and in general we find relatively similar surfaces. However, the relative probabilities of the conformational states may differ, with C36IDPSFF giving rise to less “rough” surfaces, while the others show regions poorly explored at the simulated temperature (Figure ).

Figure 2

Free energy surfaces as a function of the first (x-axis) and second (y-axis) principal components. As a result of performing the PCA on the aggregated trajectories, the PC coordinates for all the force fields on a row are the same. Nonetheless, it is worth highlighting that, in the case of P13 with C36m, we observe a shift of the minimum on the second PC axis. Since P13 is thought to mostly populate PPII conformations, we decided to characterize and compare the free energy minima resulting from A99SB-disp and C36m. Subtle differences were observed, both in the average radius of gyration (1.14 nm for A99SB-disp and 1.11 nm for CHARMM36m) and in the per residue average backbone dihedrals that, for both force fields, reside in the PPII ranges defined in DSSP-PPII[20] (Figure c). At the level of local structure, we find that A99SB-disp populates more PPII conformations than C36m (Figure d). Additionally, the PPII helix formed in simulations with A99SB-disp appears more bent with respect to an imaginary helix axis, while the PPII helix formed with C36m appears to be straighter (Figure a,b).

Figure 3

Structural analysis of free energy minima in A99SB-disp (blue) and C36m (red). (a) Side view and (b) C-terminal view of two representative P13 structures from A99SB-disp and C36m. (c) Average per residues ϕ and ψ dihedrals, compared to the PPII helix range defined in DSSP-PPII. (d) Per residue probability of PPII conformations as assigned by DSSP-PPII. Ramachandran plots depicting all dihedral angles of each simulated peptide are presented in Figure . A few differences were observed when the force fields were compared. First, the Amber force fields (especially A99SB-ILDN) clearly show a more distinct β-sheet region and the populations in both the α-helix region and the β-sheet region seem to be confined to smaller and more concentrated regions in the Amber simulations compared to the CHARMM simulations, where they seem to be spread out over larger areas. It is also worth noting that all Amber simulations have similar appearances/distributions over the Ramachandran space. The same is observed for the CHARMM simulations, and their appearance/distribution is slightly different from that of the Amber simulations. This strongly suggests that the different force field families (as might be expected) sample different structures. Interestingly, the PPII region seems to be somewhat shifted in the case of P13 with C36m, which is not seen for the other peptide simulations with the same force field. From studying these aggregated distributions across the Ramachandran plots alone, it is nearly impossible to obtain any detailed information on secondary structure propensities. Thus, the region populations have been quantified and are presented in Figure , where they are also compared to secondary structure estimates from the DSSP and the DSSP-PPII methods.

Figure 4

Ramachandran plots aggregated over the full sequence of the five main peptides from simulations using four different force fields. The plots are normalized for a maximum intensity of 1.

Figure 5

Stacked histograms of average secondary structure of simulated peptides obtained from three different methods: M1 = DSSP, M2 = DSSP-PPII, and M3 = Ramachandran populations of the regions defined in Figure . We note that while M1 and M2 refer to assessments of secondary structure, M3 only reports on the sampling of regions of the Ramachandran map of each individual residue independent of the conformations of its neighbors.

Ramachandran plots aggregated over the full sequence of the five main peptides from simulations using four different force fields. The plots are normalized for a maximum intensity of 1. Stacked histograms of average secondary structure of simulated peptides obtained from three different methods: M1 = DSSP, M2 = DSSP-PPII, and M3 = Ramachandran populations of the regions defined in Figure . We note that while M1 and M2 refer to assessments of secondary structure, M3 only reports on the sampling of regions of the Ramachandran map of each individual residue independent of the conformations of its neighbors. Comparisons of the average secondary structure content of each simulated peptide using three different methods are depicted in Figure . The DSSP analysis suggested that all peptides were fully disordered and dominated by bends, coils, and turns. Further investigation using DSSP-PPII did, however, reveal that approximately 10–35% of the secondary structure content was, in fact, PPII structure for most of the peptides. The P13 peptide was found to possess even more PPII structure (∼50–80%), which is reasonable since it is expected to mainly possess PPII structure in aqueous solution.[68,69] By comparing the secondary structure contents obtained from the different force fields using the DSSP-PPII method, A99SB-disp was found to sample more PPII structure than the other force fields for all peptides (except P13) and A99SB-disp and C36IDPSFF sampled slightly more α/310/π-helical and β-sheet/bridge content. A7 and P13 possessed no other secondary structure elements according to this analysis. The amount of unstructured content (coil/bend/turn) was highest for C36m in the case of P13. No other obvious secondary structure propensities and trends were discerned. Average populations of different regions of the Ramachandran map (corresponding to typical dihedral angles in different secondary structure elements) were also estimated from all of the simulations. Although this method was able to identify angles corresponding to PPII, it also showed large populations in the α-helical and β-sheet regions, which were not observed in the DSSP and the DSSP-PPII analyses. This is not surprising since the Ramachandran map includes all angles regardless of position, whereas secondary structures need several consecutive amino acids with the same classification for them to register as a secondary structure. The Ramachandran analysis indicated that C36IDPSFF provided less sampling in the structured regions for all peptides except P13, whereas A99SB-ILDN sampled larger populations in the β-sheet region than any other force field. Similar to what was observed in the DSSP-PPII analysis, A99SB-disp and C36IDPSFF were found to have larger sampling in the α-helical regions than the other two force fields. Each simulated peptide was investigated with DSSP-PPII to identify the average secondary structure content per amino acid residue (Figure ). All force fields gave similar secondary structure profiles for A7 and P-113, although a small difference was observed between the Amber and the CHARMM force fields. In P-113 the largest PPII content was centered around Arg-3 and Arg-9, and the lowest PPII content was found around Gly-6. The secondary structure profiles of P13 were similar for all force fields except for C36IDPSFF, where the PPII content was lower. For the Pep3 simulations, the largest PPII content was centered around Ala-3 and Pro-7 for all force fields. A small increase in PPII content was also observed around Asn-11 in Pep3 for all force fields except C36IDPSFF. The PPII content followed a sharper decrease toward the C-terminus in the simulations with A99SB-ILDN and C36IDPSFF, and the lowest PPII content was centered around Gly-5. The α/310/π-helical content in Pep3 was found mainly around Val-10 except for in the C36IDPSFF simulation, where a larger portion was centered around Lys-4. For Hst5, the largest PPII content was found around Lys-5 and Lys-13 for all force fields, in addition to a smaller peak around Lys-17. A99SB-ILDN and C36m had their α/310/π-helical content around Tyr-10, whereas it was located closer to the termini in the C36IDPSFF simulations and around His-19 in the A99SB-disp simulations. The simulation of Hst5 with A99SB-disp also gave rise to a low amount of α/310/π-helical content throughout most of the peptide. The β-sheet/bridge content in Hst5 was found around Lys-13 and Arg-22 in the Amber simulations but was more randomly distributed in the CHARMM simulations.

Figure 6

Stacked secondary structure histograms per amino acid residue of simulated peptides as obtained from DSSP-PPII algorithm.

Stacked secondary structure histograms per amino acid residue of simulated peptides as obtained from DSSP-PPII algorithm. Differences between the force fields were further analyzed for Hst5. Cluster analysis was performed for each individual force field, as well as for a concatenated trajectory in which all four force fields were included. The analysis was done using a root-mean-square-deviation (RMSD) cutoff of 0.5 nm. This value was chosen by examining the total population of the eight first clusters and varying the cluster radius (in steps of 0.05 nm) until their total population was closest to 50%. Additionally, using the same cutoff for all force fields made it easier to compare them. We do, however, note that because clustering methods compare “central structures”, they are not optimal to use for analyzing flexible peptides. Therefore, the number of clusters and their sizes that are presented here are not representations of the “truth”—they are simply used to compare the conformational sampling of the force fields. The representative structures of the top eight conformational clusters of Hst5 with the four force fields, as well as the force field mix, are presented in Figure . Visual inspection of the representative structures immediately reveals that the first cluster conformers are different for the different force fields. Comparing the combined percentage sizes of the top eight conformation clusters gives some indication of the relative variability of the conformations sampled by the four different force fields. A higher value means that there are fewer conformations sampled in the remaining clusters, which suggests a lower degree of conformational variability. By this reasoning, of the four selected force fields, A99SB-disp provides the smallest amount of conformational variability with the remaining three force fields being comparable.

Figure 7

Representative structures of top eight cluster conformations of Hst5 as simulated with the four different force fields (A–D), as well as from a mixture of the four force fields (Mix). The total percentages of the top eight clusters are given above the structures, and the relative size of each individual cluster is given below each structure. To get a more quantitative comparison among the force fields, their trajectories were concatenated, followed by a new cluster analysis where each structure could be traced back to its individual force field. The relative cluster populations of the individual force fields in the top eight clusters are illustrated in Figure . Although all force fields are represented in each cluster, they are not evenly distributed. For example, the first cluster is dominated by C36IDPSFF, whereas the second cluster mainly contains conformations from A99SB-disp and C36m. The fifth cluster is the most evenly distributed cluster across the force fields, and the sixth cluster is heavily dominated by A99SB-ILDN. From this analysis it is safe to say that, although the average properties of different force fields may be similar, the force fields’ individual conformational ensembles are rather different, which naturally leads to different secondary structure contents.

Figure 8

Weighted cluster population from the individual force fields in the top eight clusters of Hst5.

CD Prediction Using SESCA

Since the four force fields give rise to different conformational ensembles, one may reasonably ask which of the force fields is more representative of the real conformational ensemble in solution. To answer this question, we may attempt to compare the simulations to experimental data. This ideally requires a forward model to predict the experimental observables from an ensemble of structures. We here used data from CD spectroscopy, as CD is highly sensitive of secondary structure composition, and we used SESCA[59] as a forward model. Experimental data for A7, P-113, and Hst5 were obtained from Graf et al.,[63] Han et al.,[70] and Jephthah et al.,[48] respectively. Unfortunately, and as also noted for other IDPs in the papers by Fagerberg et al.[71] and Gopal et al.,[72] it was not possible to obtain a meaningful agreement between the experimental CD spectra and those predicted by SESCA (Figure ). This can be due to the fact that the main negative signature peak of a PPII helix may appear in experiments between 190 and 210 nm[73−75] because of non-secondary-structure contributions, while the spectral contribution associated with a PPII conformation in SESCA has a fixed position. Also, given the relative scarcity of PPII structure in folded proteins, it may be difficult to deconvolute its contribution when developing prediction methods for CD spectra. Also, a qualitative analysis based on the intensity of the main negative peak does not provide a decisive suggestion of what force field may be the most reliable. This is also complicated by some intensity scaling that may be needed to take into account uncertainty in the estimate of the concentration of the sample used for the experimental CD data. At this stage it is not clear if the source of the problem may be the force fields’ inaccuracy, finite sampling, or the inaccuracy of the CD calculation for these kinds of systems.

Figure 9

Comparison of experimental CD spectra with the ones predicted with SESCA from the conformational ensembles produced employing different force fields. The y-axes show the ellipticity, θ (deg cm2/dmol).

Comparing A7 Simulations to Experimental NMR Scalar Couplings

A perhaps more accurate way of comparing simulations and experiments of flexible proteins and peptides is by investigating NMR scalar couplings. Since experimental J-coupling constants are available in the literature for A7[63] and J-couplings are more commonly used in comparison with computational models, we decided to calculate J-couplings from our A7 simulations (see Methods). Simulations with A99SB-disp show the best agreement with the experimental data, followed by C36IDPSFF, while C36m and A99SB-ILDN give rise to a less good fit with the experimental data (Table ). This result may in part reflect that the target data for optimizing A99SB-disp included the experimental J-couplings for Ala5.[38] With 39 scalar couplings used to calculate the χ2 values, it appears that the deviations observed in C36m and A99SB-ILDN are greater than what would be expected by chance.

Table 5

Result of Comparison between the Experimental J-Coupling Constants of A7 and Those Calculated from the Conformational Ensembles Resulting from the Different Force Fields

FF	χ²
A	19.8
B	98.1
C	39.6
D	53.1

We examined in more detail the 3JHNH and 2JNC couplings, as 3JHNH can help discriminate the ϕ dihedral regions of β and αr/PPII elements, and 2JNC can help discriminate the ψ dihedral regions of β/PPII and αr.[63] We observe that the ψ angle distributions are relatively similar, with a strong peak corresponding to the β/PPII region, and the agreement with 2JNC is evenly good (Figure c,d). More differences can be observed for 3JHNH, where especially A99SB-ILDN and C36m show a lower population in the αr/PPII (4–7 Hz[76]) and a higher population in an unclassified region, resulting in a lower agreement with the experimental 3JHNH (Figure a,b).

Figure 10

Overview of two J-coupling constants predicted from A7 simulations. (a) 3JHNH and (c) 2JNC as a function of the underlying angles are represented as solid black lines (left y-axes), the experimental couplings (average over residues) are represented as dashed gray lines (left y-axes), and the dihedral angle distributions from the different force fields are represented in different colors (right y-axes). (b) and (d) show the experimental couplings (with errors used for the calculation of the χ2 associated with the parametrizations of the Karplus relationship) as solid black lines, and the predicted couplings from the different force fields are represented with different colors.

Effect of Proline Residue Content

A few variants of P13, V1–V4 (see Table ), were investigated to see how the Pro content affected the PPII propensities, as estimated by how much the residues sampled the PPII regions of the Ramachandran maps. Figure shows the PPII content as a function of the number of Pro residues in P13 and the peptide variants. All force fields yielded significant correlation (p < 0.05, see Table ) between the PPII content and the number of Pro residues for P13 and the chosen variants, where an increased number of Pro residues provided a larger PPII content. The slopes of these trends in the linear regression, however, differ depending on what force field was used, with C36m having the smallest increase and A99SB-ILDN having the largest increase. Furthermore, A99SB-ILDN was the force field that provided the strongest correlation (see Table ).

Figure 11

PPII helix content as obtained from DSSP-PPII analysis as a function of total number of Pro residues. Values from simulations of P13, V1, V2, V3, and V4, were used for this investigation.

Table 6

Linear Regression Statistics of Figure a

	m	c	r²	p
A	2.076	50.691	0.948	0.005
B	3.561	30.322	0.983	0.001
C	3.181	39.218	0.973	0.002
D	1.738	27.875	0.870	0.021

Slope (m), intercept (c), coefficient of determination (r2), and probability value (p).

Slope (m), intercept (c), coefficient of determination (r2), and probability value (p). PPII helix content as obtained from DSSP-PPII analysis as a function of total number of Pro residues. Values from simulations of P13, V1, V2, V3, and V4, were used for this investigation. V2.2 (see Table ) was simulated using the A99SB-ILDN force field to investigate if the PPII content is affected by the relative position of the Pro residues in the amino acid sequence. The average PPII contents of V2 and V2.2 from the DSSP-PPII analysis were found to be essentially the same: 56 and 55%, respectively. Although the PPII content at first did not seem to be affected by the patterning of the Pro residues, histograms of the secondary structure per amino acid residue of V2 and V2.2 (Figure ) revealed that the PPII content is significantly more localized to the Pro residues in V2.2, whereas it was more evenly distributed in V2. However, further investigation is needed to fully characterize this trend. It is, for example, necessary to study the effect of patterning of Pro in the other P13 variants as well. It would also be of interest to see how the trend is affected by the peptide length.

Figure 12

Stacked secondary structure histograms per amino acid residue of (a) V2 and (b) V2.2 as obtained from the DSSP-PPII algorithm.

Conclusions

In this study we have evaluated the differences among four different force fields in simulations of five short peptides with varying PPII propensities. All force fields gave similar ensemble averages of the radius of gyration, although the averages by the C36IDPSFF force field were generally slightly smaller compared to the other force fields. All force fields appeared to sample comparable regions of conformational spaces (as probed by the first two principal components) for each individual peptide, although with slightly different probabilities. Similarly, all force fields sampled the PPII structure, but to different extents. Additionally, some force fields were more prone to sampling other secondary structure elements. For example, A99SB-disp and C36IDPSFF sampled more α/310/π-helical and β-sheet/bridge content, C36m sampled less structured content than the other force fields, and A99SB-disp often had the highest PPII content. Direct comparison by conformational clustering revealed that the force fields have a bias toward different conformational clusters. CD prediction using SESCA was performed to examine which force field provided a more accurate conformational ensemble. Unfortunately, the method was not able to match the predicted and experimental spectra. We also calculated scalar couplings and compared them to experimental results for A7 and found two force fields (A99SB-disp and C36IDPSFF) that gave agreement roughly within expected error. We note that the calculations of both CD and scalar couplings contain contributions from all types of local structures and do not solely report on the accuracy of the PPII content. Finally, we investigated the effect of Pro residue content on the PPII content of short peptides containing only Ala and Pro, and we observed a correlation between the number of Pro residues in the amino acid sequence and the PPII content. We conclude by highlighting that we need better methods to calculate experimental observables that are sensitive to secondary structure preferences for flexible peptides. Such methods are often parametrized using folded protein structures and, thus, may be difficult to apply or inaccurate for disordered peptides and proteins.[77] In particular, we stress the need for better methods to link populations of PPII-like structures in simulations to a broader range of experimental observables and note that NMR chemical shifts can also be used for this purpose.[78,79] This is especially needed for simulations of proteins in which PPII might have a significant role, such as for example Hst5, SH3-binding peptides, and collagen.

61 in total

1. Angular dependence of 1J(Ni,Calphai) and 2J(Ni,Calpha(i-1)) coupling constants measured in J-modulated HSQCs.

Authors: Julia Wirmer; Harald Schwalbe
Journal: J Biomol NMR Date: 2002-05 Impact factor: 2.835

2. Protein simulations with an optimized water model: cooperative helix formation and temperature-induced unfolded state collapse.

Authors: Robert B Best; Jeetain Mittal
Journal: J Phys Chem B Date: 2010-11-01 Impact factor: 2.991

Review 3. The SH3 domain--a family of versatile peptide- and protein-recognition module.

Authors: Tomonori Kaneko; Lei Li; Shawn S-C Li
Journal: Front Biosci Date: 2008-05-01

4. Residue-specific force field based on the protein coil library. RSFF1: modification of OPLS-AA/L.

Authors: Fan Jiang; Chen-Yang Zhou; Yun-Dong Wu
Journal: J Phys Chem B Date: 2014-05-28 Impact factor: 2.991

5. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features.

Authors: W Kabsch; C Sander
Journal: Biopolymers Date: 1983-12 Impact factor: 2.505

6. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB.

Authors: James A Maier; Carmenza Martinez; Koushik Kasavajhala; Lauren Wickstrom; Kevin E Hauser; Carlos Simmerling
Journal: J Chem Theory Comput Date: 2015-07-23 Impact factor: 6.006

7. Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone φ, ψ and side-chain χ(1) and χ(2) dihedral angles.

Authors: Robert B Best; Xiao Zhu; Jihyun Shim; Pedro E M Lopes; Jeetain Mittal; Michael Feig; Alexander D Mackerell
Journal: J Chem Theory Comput Date: 2012-07-18 Impact factor: 6.006

Review 8. Collagen structure and stability.

Authors: Matthew D Shoulders; Ronald T Raines
Journal: Annu Rev Biochem Date: 2009 Impact factor: 23.643

9. Factors Affecting the Stabilization of Polyproline II Helices in a Hydrophobic Environment.

Authors: Nicola Zanna; Lorenzo Milli; Benedetta Del Secco; Claudia Tomasini
Journal: Org Lett Date: 2016-03-23 Impact factor: 6.005

10. Antifungal Activity and Action Mechanism of Histatin 5-Halocidin Hybrid Peptides against Candida ssp.

Authors: Juhye Han; Md Anirban Jyoti; Ho-Yeon Song; Woong Sik Jang
Journal: PLoS One Date: 2016-02-26 Impact factor: 3.240

2 in total

1. Changes in the Local Conformational States Caused by Simple Na⁺ and K⁺ Ions in Polyelectrolyte Simulations: Comparison of Seven Force Fields with and without NBFIX and ECC Corrections.

Authors: Natalia Lukasheva; Dmitry Tolmachev; Hector Martinez-Seara; Mikko Karttunen
Journal: Polymers (Basel) Date: 2022-01-08 Impact factor: 4.329

2. Self-Diffusive Properties of the Intrinsically Disordered Protein Histatin 5 and the Impact of Crowding Thereon: A Combined Neutron Spectroscopy and Molecular Dynamics Simulation Study.

Authors: Eric Fagerberg; Samuel Lenton; Tommy Nylander; Tilo Seydel; Marie Skepö
Journal: J Phys Chem B Date: 2022-01-19 Impact factor: 2.991

2 in total