Fei Huang, Christopher J Oldfield, Bin Xue, Wei-Lun Hsu, Jingwei Meng, Xiaowen Liu, Li Shen, Pedro Romero, Vladimir N Uversky, A Dunker.
Abstract
BACKGROUND: The earliest whole-protein order/disorder predictor (Uversky et al., Proteins, 41: 415-427 (2000)), herein called the charge-hydropathy (C-H) plot, was originally developed using the Kyte-Doolittle (1982) hydropathy scale (Kyte & Doolittle, J. Mol. Biol., 157: 105-132 (1982)). Here the goal is to determine whether the performance of the C-H plot in separating structured and disordered proteins can be improved by using an alternative hydropathy scale.
Year: 2014 PMID: 25559583 PMCID: PMC4304195 DOI: 10.1186/1471-2105-15-S17-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The Order versus Disorder Prediction Performances of 19 Hydropathy Scales.
| Scales | Sens | Spec | Bal. Acc | AUC |
|---|---|---|---|---|
| Guy | 0.70 ± 0.16 | 0.97 ± 0.02 | 0.84 ± 0.09 | 0.90 ± 0.06 |
| Miyazawa | 0.70 ± 0.15 | 0.96 ± 0.02 | 0.83 ± 0.09 | 0.90 ± 0.11 |
| Manavalan | 0.70 ± 0.15 | 0.96 ± 0.03 | 0.83 ± 0.09 | 0.90 ± 0.07 |
| Sweet | 0.69 ± 0.14 | 0.97 ± 0.02 | 0.83 ± 0.09 | 0.91 ± 0.07 |
| Fauchere | 0.68 ± 0.13 | 0.97 ± 0.02 | 0.83 ± 0.08 | 0.88 ± 0.07 |
| Rose | 0.67 ± 0.17 | 0.97 ± 0.02 | 0.82 ± 0.09 | 0.91 ± 0.06 |
| Black | 0.64 ± 0.09 | 0.97 ± 0.02 | 0.81 ± 0.06 | 0.88 ± 0.06 |
| Woods | 0.61 ± 0.15 | 0.97 ± 0.03 | 0.79 ± 0.09 | 0.88 ± 0.06 |
| Breese | 0.64 ± 0.12 | 0.95 ± 0.04 | 0.80 ± 0.08 | 0.87 ± 0.08 |
| Leo | 0.61 ± 0.12 | 0.96 ± 0.03 | 0.79 ± 0.08 | 0.86 ± 0.08 |
| Kyte-Doolittle | 0.61 ± 0.16 | 0.96 ± 0.03 | 0.79 ± 0.09 | 0.87 ± 0.10 |
| Roseman | 0.56 ± 0.16 | 0.96 ± 0.02 | 0.76 ± 0.09 | 0.86 ± 0.08 |
| Chothia | 0.55 ± 0.13 | 0.96 ± 0.03 | 0.76 ± 0.08 | 0.88 ± 0.05 |
| Argos | 0.54 ± 0.10 | 0.97 ± 0.03 | 0.76 ± 0.06 | 0.85 ± 0.06 |
| Janin | 0.52 ± 0.16 | 0.96 ± 0.02 | 0.74 ± 0.09 | 0.86 ± 0.06 |
| Tanford | 0.49 ± 0.14 | 0.96 ± 0.03 | 0.73 ± 0.08 | 0.86 ± 0.09 |
| Eisenberg | 0.48 ± 0.19 | 0.96 ± 0.03 | 0.72 ± 0.11 | 0.85 ± 0.05 |
| Welling | 0.40 ± 0.14 | 0.97 ± 0.03 | 0.69 ± 0.09 | 0.79 ± 0.07 |
| Wolfenden | 0.36 ± 0.11 | 0.97 ± 0.02 | 0.67 ± 0.07 | 0.79 ± 0.06 |
For equations and explanations, see the Methods section at the end of this manuscript:
Sens: Sensitivity
Spec: Specificity
Bal. Acc: Balanced accuracy (average of sensitivity and specificity)
AUC: Area under the curve
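As a minimal sketch of how the first three columns relate to one another, the snippet below computes sensitivity, specificity, and balanced accuracy from confusion-matrix counts. It assumes that disordered proteins are the positive class, which the table does not state; the function name and the example counts are hypothetical and are not taken from the study.

```python
def sens_spec_balacc(tp, fn, tn, fp):
    """Return (sensitivity, specificity, balanced accuracy).

    Assumption: "positive" means disordered, "negative" means structured.
    tp/fn count disordered proteins predicted disordered/structured;
    tn/fp count structured proteins predicted structured/disordered.
    """
    sens = tp / (tp + fn)          # fraction of disordered proteins recovered
    spec = tn / (tn + fp)          # fraction of structured proteins recovered
    bal_acc = (sens + spec) / 2    # average of the two, as defined above
    return sens, spec, bal_acc


if __name__ == "__main__":
    # Hypothetical counts chosen only for illustration.
    print(sens_spec_balacc(tp=70, fn=30, tn=97, fp=3))
```

Because balanced accuracy averages the two per-class rates, it is not inflated when one class is much larger than the other, unlike plain accuracy.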
The Order versus Disorder Prediction Performances of 19 Hydropathy Scales Measured by Other Metrics.
| Scales | F | MCC | PPV | NPV |
|---|---|---|---|---|
| Guy | 0.75 ± 0.12 | 0.71 ± 0.13 | 0.82 ± 0.10 | 0.94 ± 0.03 |
| Miyazawa | 0.74 ± 0.11 | 0.70 ± 0.12 | 0.80 ± 0.10 | 0.94 ± 0.03 |
| Manavalan | 0.74 ± 0.11 | 0.70 ± 0.12 | 0.80 ± 0.10 | 0.94 ± 0.03 |
| Sweet | 0.74 ± 0.08 | 0.71 ± 0.08 | 0.83 ± 0.11 | 0.94 ± 0.03 |
| Fauchere | 0.74 ± 0.08 | 0.70 ± 0.09 | 0.83 ± 0.12 | 0.94 ± 0.02 |
| Rose | 0.73 ± 0.12 | 0.70 ± 0.13 | 0.82 ± 0.09 | 0.94 ± 0.03 |
| Black | 0.71 ± 0.05 | 0.67 ± 0.06 | 0.81 ± 0.12 | 0.93 ± 0.02 |
| Woods | 0.68 ± 0.12 | 0.64 ± 0.13 | 0.78 ± 0.12 | 0.93 ± 0.03 |
| Breese | 0.68 ± 0.11 | 0.63 ± 0.13 | 0.75 ± 0.15 | 0.93 ± 0.02 |
| Leo | 0.68 ± 0.10 | 0.64 ± 0.12 | 0.79 ± 0.15 | 0.93 ± 0.02 |
| Kyte-Doolittle | 0.67 ± 0.13 | 0.63 ± 0.14 | 0.78 ± 0.14 | 0.93 ± 0.03 |
| Roseman | 0.64 ± 0.15 | 0.59 ± 0.17 | 0.75 ± 0.15 | 0.92 ± 0.03 |
| Chothia | 0.63 ± 0.11 | 0.59 ± 0.13 | 0.77 ± 0.15 | 0.92 ± 0.03 |
| Argos | 0.63 ± 0.09 | 0.59 ± 0.10 | 0.78 ± 0.13 | 0.92 ± 0.02 |
| Janin | 0.59 ± 0.14 | 0.55 ± 0.12 | 0.74 ± 0.11 | 0.91 ± 0.03 |
| Tanford | 0.57 ± 0.14 | 0.53 ± 0.14 | 0.72 ± 0.15 | 0.91 ± 0.02 |
| Eisenberg | 0.56 ± 0.16 | 0.53 ± 0.18 | 0.74 ± 0.20 | 0.91 ± 0.03 |
| Welling | 0.50 ± 0.15 | 0.48 ± 0.13 | 0.78 ± 0.19 | 0.89 ± 0.02 |
| Wolfenden | 0.46 ± 0.11 | 0.43 ± 0.13 | 0.69 ± 0.15 | 0.89 ± 0.02 |
For equations and explanations, see the Methods section at the end of this manuscript:
F: the F1 score
MCC: Matthews correlation coefficient
PPV: Positive predictive value
NPV: Negative predictive value
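The four metrics in this table can all be derived from the same confusion-matrix counts. The sketch below uses the standard textbook definitions and again assumes that disordered is the positive class; the function name, the zero-denominator handling for MCC, and the example counts are illustrative choices, not details from the manuscript's Methods section.

```python
import math


def f1_mcc_ppv_npv(tp, fn, tn, fp):
    """Return (F1, MCC, PPV, NPV) from confusion-matrix counts.

    Assumption: "positive" means disordered, "negative" means structured.
    """
    ppv = tp / (tp + fp)                 # precision for the disordered class
    npv = tn / (tn + fn)                 # precision for the structured class
    sens = tp / (tp + fn)                # recall for the disordered class
    f1 = 2 * ppv * sens / (ppv + sens)   # harmonic mean of PPV and sensitivity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Returning 0.0 when the denominator vanishes is a common convention,
    # adopted here as an assumption.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc, ppv, npv


if __name__ == "__main__":
    # Hypothetical counts chosen only for illustration.
    print(f1_mcc_ppv_npv(tp=8, fn=2, tn=9, fp=1))
```

MCC uses all four cells of the confusion matrix, so it penalizes a predictor that does well on only one class, whereas F1 ignores true negatives.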
A comparison of 3 hydropathy scales.
| Scale | W | Y | I | F | C | L | V | M | N | T |
|---|---|---|---|---|---|---|---|---|---|---|
| IDP-Hydropathy | 10.66 | 6.64 | 6.19 | 5.79 | 5.62 | 5.17 | 4.64 | 2.49 | 2.06 | 1.22 |
| Guy | -0.51 | -0.21 | -1.13 | -2.12 | -1.42 | -1.18 | -1.27 | -1.59 | 0.48 | 0.07 |
| Kyte-Doolittle | -0.90 | -1.30 | 4.50 | 2.80 | 2.50 | 3.80 | 4.20 | 1.90 | -3.50 | -0.70 |

| Scale | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| IDP-Hydropathy | 0.91 | 0.07 | 0.02 | -0.48 | -1.23 | -1.84 | 2.18 | -2.20 | -2.43 | -3.89 |
| Guy | 0.10 | 1.91 | 0.33 | 0.78 | 0.83 | 0.52 | -0.50 | 0.95 | 1.40 | 0.73 |
| Kyte-Doolittle | 1.80 | -4.50 | -0.40 | -3.50 | -3.50 | -0.80 | -3.20 | -3.50 | -3.90 | -1.60 |
Figure 1. Charge-hydropathy plots. In (A) the IDP-Hydropathy scale was used, in (B) the Guy (1985) hydropathy scale, and in (C) the Kyte-Doolittle (1982) hydropathy scale. Red circles indicate disordered proteins; blue circles indicate structured proteins. For these plots, each scale was normalized to the interval 0 to 1. Guy's scale was multiplied by -1 prior to normalization to conform to the sign convention of the Kyte-Doolittle scale. In (A) the function describing the boundary is:
Figure 2. Correlation coefficients between IDP-Hydropathy and AAindex clusters. H: Hydrophobicity cluster; B: β propensity cluster; P: Physicochemical properties cluster; C: Composition cluster; O: Other properties cluster; A: α and turn propensities cluster.
Mean, median, standard deviation, max, and min of |r| between IDP-Hydropathy and the AAindex scales in each cluster.
| Cluster | Mean | Median | Std | Max | Min |
|---|---|---|---|---|---|
| H | 0.75 | 0.75 | 0.07 | 0.86 | 0.60 |
| B | 0.72 | 0.72 | 0.08 | 0.82 | 0.58 |
| P | 0.49 | 0.49 | 0.16 | 0.76 | 0.23 |
| C | 0.43 | 0.36 | 0.13 | 0.61 | 0.31 |
| O | 0.23 | 0.21 | 0.06 | 0.32 | 0.18 |
| A | 0.08 | 0.05 | 0.08 | 0.28 | 0.01 |
H: Hydropathy cluster
B: β propensity cluster
P: Physicochemical properties cluster
C: Composition cluster
O: Other properties cluster
A: α and turn propensities cluster
Figure 3. Comparing the IDP-Hydropathy scale against Guy's scale (A) and the Kyte-Doolittle scale (B). Each letter is the one-letter code for an amino acid. Note that Guy's scale (A) uses the opposite sign convention for transfer free energy compared with the Kyte-Doolittle scale: in Guy's scale a positive value indicates a hydrophilic residue, while in the Kyte-Doolittle and IDP-Hydropathy scales a positive value indicates a hydrophobic residue. The r value is the correlation coefficient between the two scales compared.
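The sign flip, the normalization to the interval 0 to 1, and the correlation coefficient described in the figure captions can be sketched as follows. This is an illustration only: it uses the ten residues (W through T) whose Guy and Kyte-Doolittle values appear in the scale comparison table, it assumes r is the Pearson correlation coefficient, and it assumes the normalization is a simple min-max rescaling. The helper names are hypothetical, and the result over ten residues will not match the values reported for all twenty.

```python
import math

RESIDUES = ["W", "Y", "I", "F", "C", "L", "V", "M", "N", "T"]
# Values as listed in the scale comparison table (first ten residues only).
GUY = [-0.51, -0.21, -1.13, -2.12, -1.42, -1.18, -1.27, -1.59, 0.48, 0.07]
KD = [-0.90, -1.30, 4.50, 2.80, 2.50, 3.80, 4.20, 1.90, -3.50, -0.70]


def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Multiply Guy's scale by -1 so that positive means hydrophobic,
    # matching the Kyte-Doolittle convention, then normalize both scales.
    guy_norm = min_max([-g for g in GUY])
    kd_norm = min_max(KD)
    for res, g, k in zip(RESIDUES, guy_norm, kd_norm):
        print(f"{res}: Guy={g:.2f}  KD={k:.2f}")
    print("r =", round(pearson(guy_norm, kd_norm), 3))
```

Min-max rescaling is a positive linear transformation, so it leaves r unchanged; only the multiplication by -1 changes the sign of the correlation.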
IDP-Hydropathy scale performance compared to 4 disorder propensity scales, DisProt, TopIDP, FoldUnfold, and B-value.
| Method | Sens | Spec | Bal. Acc | AUC | F | MCC | PPV | NPV |
|---|---|---|---|---|---|---|---|---|
| | 0.81 ± 0.11 | 0.98 ± 0.02 | 0.90 ± 0.07 | 0.94 ± 0.05 | 0.84 ± 0.08 | 0.82 ± 0.09 | 0.89 ± 0.09 | 0.96 ± 0.02 |
| | 0.77 ± 0.12 | 0.97 ± 0.04 | 0.87 ± 0.08 | 0.94 ± 0.06 | 0.80 ± 0.08 | 0.77 ± 0.10 | 0.85 ± 0.14 | 0.96 ± 0.02 |
| | 0.76 ± 0.11 | 0.97 ± 0.02 | 0.87 ± 0.07 | 0.93 ± 0.04 | 0.79 ± 0.06 | 0.76 ± 0.06 | 0.84 ± 0.07 | 0.96 ± 0.02 |
| | 0.72 ± 0.12 | 0.97 ± 0.02 | 0.85 ± 0.07 | 0.91 ± 0.07 | 0.77 ± 0.10 | 0.73 ± 0.11 | 0.82 ± 0.11 | 0.95 ± 0.02 |
| Guy | 0.70 ± 0.16 | 0.97 ± 0.02 | 0.84 ± 0.09 | 0.90 ± 0.06 | 0.75 ± 0.12 | 0.71 ± 0.13 | 0.82 ± 0.10 | 0.94 ± 0.03 |
| | 0.67 ± 0.14 | 0.98 ± 0.02 | 0.83 ± 0.08 | 0.91 ± 0.07 | 0.74 ± 0.11 | 0.71 ± 0.12 | 0.85 ± 0.10 | 0.94 ± 0.02 |
| Kyte-Doolittle | 0.61 ± 0.16 | 0.96 ± 0.03 | 0.79 ± 0.09 | 0.87 ± 0.10 | 0.67 ± 0.13 | 0.63 ± 0.14 | 0.78 ± 0.14 | 0.93 ± 0.03 |
The accuracy metrics for Guy and Kyte-Doolittle hydropathy scales are also presented as references.
For equations and explanations, see the Methods section at the end of this manuscript:
Sens: Sensitivity
Spec: Specificity
Bal. Acc: Balanced accuracy (average of sensitivity and specificity)
AUC: Area under the curve
F: the F1 score
MCC: Matthews correlation coefficient
PPV: Positive predictive value
NPV: Negative predictive value
VLXT and VSL2 per residue prediction over our entirely disordered/structured dataset.
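| Dataset | VLXT: predicted disordered | VLXT: predicted structured | VSL2: predicted disordered | VSL2: predicted structured |
|---|---|---|---|---|
| Disordered | 58% | ~ | 78% | ~ |
| Structured | ~ | 78% | ~ | 74% |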
Entries are the fractions of residues predicted to be disordered (or structured) across the entire disordered (or structured) dataset. For simplicity, only the diagonal entries for each predictor are shown.