Daniele Raimondi, Gabriele Orlando, Rita Pancsa, Taushif Khan, Wim F Vranken.
Abstract
Protein folding is a complex process that can lead to disease when it fails. The very early stages of protein folding are especially poorly understood; they are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is trained on early folding data from hydrogen-deuterium exchange (HDX) NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.
Year: 2017 PMID: 28821744 PMCID: PMC5562875 DOI: 10.1038/s41598-017-08366-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Average of leave-one-out stratified cross-validation (27 sets) performance of EFoldMine based on the 30 proteins available in Start2Fold (see Supplementary section 1 for information on the performance indicators).
| Parameter | Performance % |
|---|---|
| Sensitivity (Sen) | 73.1 |
| Specificity (Spe) | 75.2 |
| Accuracy (Acc) | 73.4 |
| Balanced Accuracy (Bac) | 74.1 |
| Precision (Pre) | 36.1 |
| Matthews Correlation Coefficient (MCC) | 35.4 |
| ROC Area Under the Curve (AUC), cutoff 0.169 | 80.8 |
| Best PPV, 10%, 5% | 45.6, 48.8 |
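As a reading aid for the table above, the sketch below computes the first six indicators from a binary confusion matrix using their standard textbook definitions. It assumes these definitions match those given in Supplementary section 1, which is not reproduced here. The function name `binary_metrics` and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not the counts behind the reported values.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification indicators, returned as fractions.

    tp/fn: early folding residues predicted as early / not early.
    tn/fp: other residues predicted as not early / early.
    """
    sen = tp / (tp + fn)                   # sensitivity (recall)
    spe = tn / (tn + fp)                   # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)  # accuracy
    bac = (sen + spe) / 2                  # balanced accuracy
    pre = tp / (tp + fp)                   # precision (PPV)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sen": sen, "Spe": spe, "Acc": acc,
            "Bac": bac, "Pre": pre, "MCC": mcc}


# Hypothetical counts for an imbalanced data set (illustration only).
m = binary_metrics(tp=73, fp=124, tn=376, fn=27)
for name, value in m.items():
    print(f"{name}: {100 * value:.1f}")
```

With few positives, precision can remain well below sensitivity and specificity, which is the pattern the table shows. Balanced accuracy is the mean of sensitivity and specificity, and the reported 74.1 agrees with (73.1 + 75.2) / 2 up to rounding. The reported figures are averages over cross-validation sets, so they are not expected to be exactly recomputable from one another.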
Figure 1. Myoglobin and leghemoglobin in relation to early folding scores. Myoglobin (PDB: 1MYF) (a) cartoon representation with helices colored from N to C terminus and (b) full per-residue EFoldMine prediction, with helix regions (A–H) indicated with colors as in (a). Leghemoglobin (PDB: 1BIN) (c) cartoon representation with helices colored from N to C terminus and (d) full per-residue EFoldMine prediction, with helix regions (A–H) indicated with colors as in (c).
Figure 2. Protein G and L in relation to early folding scores. (a) Protein G (PDB: 2GB1) with secondary structure elements (E1–E4, H1) indicated, and (b) the corresponding distributions of the protein G early folding scores as box plots. (c) Protein L (PDB: 2PTL) with secondary structure elements (E1–E4, H1) indicated, and (d) the corresponding distributions of the protein L early folding scores as box plots.
Figure 3. The distribution of the EFoldMine predictions separated by residues experimentally identified by HDX Mass Spectrometry (MS). (a) Apo-maltose binding protein and (b) the alpha subunit of tryptophan synthase. The separation by MS data for early (brown) and intermediate (purple) folding residues is shown, with the experimentally identified residues in dark shade, the remaining residues in light shade, and with the amino acid bias-corrected distributions included (no bias). The number of data points is indicated above each distribution, and the p-value for the difference between two distributions is shown below the compared pair. The protein structures show the early folding (brown) and intermediate folding (purple) regions.
Figure 4. The domain map of p27 Kip1 with the EFoldMine prediction. The blue shaded areas indicate known interaction sites, the red and green shaded areas within the cyclin A/CDK2 interacting region indicate helix and sheet forming segments, respectively.
Figure 5. Comparison of early folding scores to structure-related data. The early folding prediction scores (black graph, top) indicate which residues in the sequences will form structure first through local interaction between amino acids (green circle, top). These predictions are compared to (a) the relative solvent accessibility and the contact-S2 calculated from 2939 non-redundant PDB structures, with significant differences between the distribution of their values for the early folding residues (green) and other residues (brown), and (b) residues with evolutionary co-variation signals (light blue), which have higher early folding prediction scores than other residues (dark blue).