Marcin J Mizianty, Wojciech Stach, Ke Chen, Kanaka Durga Kedarisetti, Fatemeh Miri Disfani, Lukasz Kurgan.
Abstract
MOTIVATION: Intrinsically disordered proteins play a crucial role in numerous regulatory processes. Their abundance and ubiquity combined with a relatively low quantity of their annotations motivate research toward the development of computational models that predict disordered regions from protein sequences. Although the prediction quality of these methods continues to rise, novel and improved predictors are urgently needed.
Year: 2010 PMID: 20823312 PMCID: PMC2935446 DOI: 10.1093/bioinformatics/btq373
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
List of input information sources used by disorder predictors based on machine learning classifiers (sorted by the year of publication)
| Prediction method | Input information | Meta predictor | Reference | ||||||
|---|---|---|---|---|---|---|---|---|---|
| AA type | AA propensity | AA position | PSSM profile | SS prediction | Solvent accessibility prediction | Terminus indicator | |||
| GS-metaDisorder | X | X | X | X | X | Unpublished data | |||
| MD | X | X | X | X | X | Schlessinger | |||
| metaPrDOS | X | Ishida and Kinoshita ( | |||||||
| DISpro | X | X | X | Hecker | |||||
| OnD-CRF | X | X | Wang and Sauer ( | ||||||
| POODLE-S | X | X | Shimizu | ||||||
| POODLE-L | X | X | Hirose | ||||||
| POODLE-W | X | X | Shimizu | ||||||
| DisPSSMP2 | X | X | Su | ||||||
| PrDOS | X | X | Ishida and Kinoshita ( | ||||||
| MULTICOM-CMFR | X | X | X | X | Cheng | ||||
| DISOPRED2 | X | X | Ward | ||||||
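The "AA type" and "Terminus indicator" inputs listed above are commonly encoded per residue over a sliding sequence window. The following is a minimal Python sketch of such an encoding, assuming a one-hot vector over the 20 standard amino acids plus a single bit that marks window positions falling beyond either chain end; the function name `window_features` and the default half-width of 7 are hypothetical choices, not the encoding used by any of the listed predictors.

```python
from typing import List

# The 20 standard amino acids; non-standard residues get an all-zero one-hot vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_features(seq: str, center: int, half_width: int = 7) -> List[float]:
    """Hypothetical encoding of residue `center` of `seq` as a flat feature vector.

    For each position in [center - half_width, center + half_width] it emits:
      * 20 one-hot "AA type" values (all zeros outside the chain or for
        non-standard residues),
      * 1 terminus-indicator value, set to 1.0 when the position lies beyond
        the N- or C-terminus (an assumed definition for illustration).
    """
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        one_hot = [0.0] * len(AMINO_ACIDS)
        outside = pos < 0 or pos >= len(seq)
        if not outside and seq[pos] in AMINO_ACIDS:
            one_hot[AMINO_ACIDS.index(seq[pos])] = 1.0
        features.extend(one_hot)
        features.append(1.0 if outside else 0.0)
    return features
```

Each residue therefore yields `(2 * half_width + 1) * 21` values; predictors that also use PSSM profiles or predicted secondary structure would append those per-position values in the same window loop.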
Fig. 1. Architecture of the MFDp method.
Comparison of predictive quality measured on the MxD and CASP8 datasets when considering all disordered regions
| Dataset | Predictor | MCC | Significance | Sw | Significance | ACC | SENS | SPEC | AUC | Significance |
|---|---|---|---|---|---|---|---|---|---|---|
| MxD | MFDp | ±0.02 | | ±0.01 | | ±0.01 | 0.76 ±0.01 | 0.75 ±0.01 | ±0.01 | |
| | MD | 0.43 ±0.02 | + | 0.48 ±0.01 | + | 0.74 ±0.01 | 0.68 ±0.01 | 0.80 ±0.01 | ±0.01 | = |
| | | 0.40 ±0.02 | + | 0.44 ±0.02 | + | 0.72 ±0.01 | 0.66 ±0.01 | 0.78 ±0.01 | 0.78 ±0.01 | + |
| | | 0.39 ±0.02 | + | 0.42 ±0.01 | + | 0.71 ±0.01 | 0.60 ±0.01 | 0.82 ±0.01 | 0.78 ±0.01 | + |
| | | 0.37 ±0.01 | + | 0.38 ±0.01 | + | 0.69 ±0.01 | 0.53 ±0.01 | ±0.01 | 0.78 ±0.01 | + |
| | NORSnet | 0.34 ±0.02 | + | 0.37 ±0.02 | + | 0.68 ±0.01 | 0.55 ±0.02 | 0.81 ±0.01 | 0.74 ±0.01 | + |
| | | 0.33 ±0.01 | + | 0.40 ±0.01 | + | 0.70 ±0.01 | 0.78 ±0.01 | 0.62 ±0.01 | 0.77 ±0.01 | + |
| | Ucon | 0.31 ±0.01 | + | 0.34 ±0.01 | + | 0.67 ±0.01 | 0.57 ±0.01 | 0.77 ±0.01 | 0.74 ±0.01 | + |
| | | 0.19 ±0.01 | + | 0.22 ±0.01 | + | 0.61 ±0.00 | ±0.01 | 0.38 ±0.01 | 0.69 ±0.01 | + |
| CASP8 | MFDp | ±0.06 | | 0.63 ±0.06 | | 0.82 ±0.03 | 0.68 ±0.06 | 0.95 ±0.00 | 0.89 ±0.02 | |
| | 379 | 0.59 ±0.06 | + | 0.65 ±0.05 | – | 0.82 ±0.03 | 0.71 ±0.05 | 0.94 ±0.00 | 0.91 ±0.02 | – |
| | | 0.59 ±0.06 | + | 0.61 ±0.06 | + | 0.80 ±0.03 | 0.65 ±0.06 | 0.95 ±0.00 | 0.88 ±0.02 | + |
| | 297 | 0.57 ±0.05 | + | 0.66 ±0.05 | – | ±0.02 | 0.74 ±0.05 | 0.92 ±0.00 | 0.90 ±0.02 | – |
| | 97 | 0.56 ±0.05 | + | 0.65 ±0.05 | – | 0.82 ±0.02 | 0.73 ±0.05 | 0.92 ±0.00 | ±0.02 | – |
| | 153 | 0.55 ±0.05 | + | ±0.05 | – | ±0.02 | 0.76 ±0.05 | 0.90 ±0.00 | ±0.02 | – |
| | | 0.54 ±0.06 | + | 0.52 ±0.06 | + | 0.76 ±0.03 | 0.56 ±0.06 | 0.96 ±0.00 | 0.85 ±0.03 | + |
| | | 0.53 ±0.09 | + | 0.45 ±0.09 | + | 0.73 ±0.05 | 0.48 ±0.09 | ±0.00 | 0.82 ±0.03 | + |
| | 69 | 0.51 ±0.05 | + | 0.66 ±0.04 | – | ±0.02 | 0.80 ±0.04 | 0.86 ±0.00 | 0.90 ±0.02 | – |
| | NORSnet | 0.48 ±0.12 | + | 0.37 ±0.11 | + | 0.69 ±0.06 | 0.39 ±0.11 | ±0.00 | 0.79 ±0.04 | + |
| | | 0.42 ±0.05 | + | 0.59 ±0.04 | + | 0.80 ±0.02 | 0.78 ±0.04 | 0.82 ±0.01 | 0.86 ±0.02 | + |
| | MD | 0.42 ±0.06 | + | 0.56 ±0.06 | + | 0.78 ±0.03 | 0.71 ±0.06 | 0.85 ±0.01 | 0.85 ±0.03 | + |
| | Ucon | 0.29 ±0.06 | + | 0.34 ±0.07 | + | 0.67 ±0.03 | 0.47 ±0.07 | 0.87 ±0.00 | 0.74 ±0.04 | + |
| | | 0.19 ±0.03 | + | 0.31 ±0.03 | + | 0.65 ±0.01 | ±0.03 | 0.45 ±0.01 | 0.78 ±0.03 | + |
We report averages and standard deviations (SDs) over 1000 bootstrap repetitions, each using 80% of the chains. Underlined methods are used as inputs for MFDp. The methods are sorted by MCC, and the highest values are shown in bold. The ‘significance’ columns give the results of tests of the differences between MFDp and each other method; the tests compare the average values from the 1000 bootstrap repetitions. A + or – means that MFDp is statistically significantly better or worse, respectively, with P < 0.01; = means that the results are not significantly different.
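The ACC and Sw columns can be recovered from SENS and SPEC: the values in the table agree with ACC = (SENS + SPEC)/2 and Sw = SENS + SPEC − 1 (for MD on MxD, (0.68 + 0.80)/2 = 0.74 and 0.68 + 0.80 − 1 = 0.48). The Python sketch below computes these residue-level measures and MCC from confusion counts, treating disordered residues as the positive class. The `Confusion` class and `metrics` function are hypothetical names, and the original evaluation's handling of unannotated residues is not reproduced.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # disordered residues predicted as disordered
    fn: int  # disordered residues predicted as ordered
    tn: int  # ordered residues predicted as ordered
    fp: int  # ordered residues predicted as disordered


def metrics(c: Confusion) -> dict:
    """Residue-level measures; assumes both classes occur at least once."""
    sens = c.tp / (c.tp + c.fn)  # fraction of disordered residues recovered
    spec = c.tn / (c.tn + c.fp)  # fraction of ordered residues recovered
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    mcc = (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0
    return {
        "SENS": sens,
        "SPEC": spec,
        "ACC": (sens + spec) / 2,  # balanced accuracy
        "Sw": sens + spec - 1,     # weighted score
        "MCC": mcc,
    }
```

With 68 of 100 disordered and 80 of 100 ordered residues predicted correctly, `metrics` gives SENS 0.68, SPEC 0.80, ACC 0.74 and Sw 0.48, the same combination as the MD row on MxD. Because ACC and Sw average over the two classes, they are not dominated by the more numerous ordered residues.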
Comparison of predictive quality measured on the MxD and CASP8 datasets when considering disordered regions that are ≥30 residues long
| Dataset | Predictor | MCC | Significance | Sw | Significance | ACC | SENS | SPEC | AUC | Significance |
|---|---|---|---|---|---|---|---|---|---|---|
| MxD | MFDp | ±0.02 | | ±0.02 | | ±0.01 | 0.77 ±0.01 | 0.75 ±0.01 | ±0.01 | |
| | MD | ±0.02 | + | 0.50 ±0.02 | + | 0.75 ±0.01 | 0.70 ±0.01 | 0.80 ±0.01 | ±0.01 | = |
| | | 0.41 ±0.02 | + | 0.45 ±0.01 | + | 0.72 ±0.01 | 0.63 ±0.01 | 0.82 ±0.01 | 0.79 ±0.01 | + |
| | | 0.40 ±0.02 | + | 0.46 ±0.02 | + | 0.73 ±0.01 | 0.67 ±0.02 | 0.78 ±0.01 | 0.78 ±0.01 | + |
| | NORSnet | 0.37 ±0.02 | + | 0.41 ±0.02 | + | 0.70 ±0.01 | 0.59 ±0.02 | 0.81 ±0.01 | 0.76 ±0.01 | + |
| | | 0.37 ±0.01 | + | 0.38 ±0.01 | + | 0.69 ±0.01 | 0.53 ±0.01 | ±0.01 | 0.78 ±0.01 | + |
| | | 0.33 ±0.01 | + | 0.40 ±0.01 | + | 0.70 ±0.01 | 0.79 ±0.01 | 0.62 ±0.01 | 0.77 ±0.01 | + |
| | Ucon | 0.32 ±0.01 | + | 0.37 ±0.01 | + | 0.68 ±0.01 | 0.60 ±0.01 | 0.77 ±0.01 | 0.75 ±0.01 | + |
| | PROFbval | 0.19 ±0.01 | + | 0.22 ±0.01 | + | 0.61 ±0.00 | ±0.01 | 0.38 ±0.01 | 0.70 ±0.01 | + |
| CASP8 | MFDp | ±0.10 | | ±0.09 | | 0.86 ±0.04 | 0.78 ±0.09 | 0.95 ±0.00 | 0.90 ±0.04 | |
| | | 0.59 ±0.16 | = | 0.57 ±0.16 | + | 0.79 ±0.08 | 0.59 ±0.17 | ±0.00 | 0.87 ±0.05 | + |
| | | 0.59 ±0.15 | = | 0.60 ±0.15 | + | 0.80 ±0.07 | 0.62 ±0.15 | ±0.00 | 0.85 ±0.06 | + |
| | | 0.58 ±0.11 | + | 0.68 ±0.10 | + | 0.84 ±0.05 | 0.73 ±0.10 | 0.95 ±0.00 | 0.90 ±0.04 | = |
| | 379 | 0.56 ±0.10 | + | 0.71 ±0.08 | + | 0.86 ±0.04 | 0.77 ±0.08 | 0.94 ±0.00 | ±0.03 | – |
| | 297 | 0.54 ±0.09 | + | ±0.08 | + | ±0.04 | 0.81 ±0.08 | 0.92 ±0.00 | 0.91 ±0.04 | – |
| | | 0.53 ±0.11 | + | 0.59 ±0.10 | + | 0.79 ±0.05 | 0.63 ±0.10 | 0.96 ±0.00 | 0.86 ±0.05 | + |
| | 97 | 0.52 ±0.09 | + | 0.71 ±0.08 | + | 0.85 ±0.04 | 0.79 ±0.08 | 0.92 ±0.00 | ±0.03 | – |
| | 153 | 0.50 ±0.09 | + | 0.72 ±0.08 | + | 0.86 ±0.04 | 0.81 ±0.08 | 0.90 ±0.00 | 0.92 ±0.03 | – |
| | 69 | 0.44 ±0.08 | + | 0.69 ±0.07 | + | 0.85 ±0.03 | 0.83 ±0.07 | 0.86 ±0.00 | 0.91 ±0.03 | – |
| | | 0.37 ±0.07 | + | 0.63 ±0.07 | + | 0.81 ±0.04 | 0.81 ±0.07 | 0.82 ±0.01 | 0.88 ±0.04 | + |
| | MD | 0.36 ±0.09 | + | 0.58 ±0.11 | + | 0.79 ±0.05 | 0.73 ±0.11 | 0.85 ±0.01 | 0.86 ±0.05 | + |
| | Ucon | 0.28 ±0.09 | + | 0.41 ±0.12 | + | 0.71 ±0.06 | 0.54 ±0.12 | 0.87 ±0.00 | 0.77 ±0.06 | + |
| | PROFbval | 0.16 ±0.04 | + | 0.31 ±0.05 | + | 0.66 ±0.03 | ±0.05 | 0.45 ±0.01 | 0.81 ±0.06 | + |
We report averages and standard deviations (SDs) over 1000 bootstrap repetitions, each using 80% of the chains. Underlined methods are used as inputs for MFDp. The methods are sorted by MCC, and the highest values are shown in bold. The ‘significance’ columns give the results of tests of the differences between MFDp and each other method; the tests compare the average values from the 1000 bootstrap repetitions. A + or – means that MFDp is statistically significantly better or worse, respectively, with P < 0.01; = means that the results are not significantly different.
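The bootstrap protocol described in the footnote resamples whole chains rather than individual residues. A minimal Python sketch follows, assuming each repetition draws 80% of the chains without replacement (the footnote does not say which), pools their residues, and scores the pooled predictions. `bootstrap` and `sensitivity` are hypothetical helpers; any metric with the same signature can be passed in.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# One chain = (true labels, predicted labels); 1 = disordered, 0 = ordered.
Chain = Tuple[List[int], List[int]]


def sensitivity(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of disordered residues predicted as disordered (0.0 if none)."""
    positives = sum(1 for t in y_true if t == 1)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives if positives else 0.0


def bootstrap(chains: Sequence[Chain],
              metric: Callable[[List[int], List[int]], float],
              repetitions: int = 1000,
              fraction: float = 0.8,
              seed: int = 0) -> Tuple[float, float]:
    """Mean and SD of `metric` over repeated chain-level subsamples."""
    if repetitions < 2:
        raise ValueError("need at least two repetitions to estimate an SD")
    rng = random.Random(seed)
    k = max(1, round(fraction * len(chains)))
    scores = []
    for _ in range(repetitions):
        subset = rng.sample(list(chains), k)  # assumption: without replacement
        y_true = [y for true, _ in subset for y in true]
        y_pred = [y for _, pred in subset for y in pred]
        scores.append(metric(y_true, y_pred))
    return statistics.mean(scores), statistics.stdev(scores)
```

Sampling at the chain level keeps residues of the same protein together, so the SD reflects variation between proteins rather than between neighbouring, highly correlated residues. The significance tests themselves are not sketched because the footnote does not name the test used.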
Fig. 2. ROC curves for the predictions on the (A) MxD and (B) CASP8 datasets.
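Each ROC curve traces SENS against 1 − SPEC as the threshold on a predictor's disorder score varies, and the AUC columns summarize the whole curve. A minimal Python sketch of the AUC computation follows, using the standard equivalence between the AUC and the probability that a randomly chosen disordered residue receives a higher score than a randomly chosen ordered one, with ties counted as one half. `roc_auc` is a hypothetical name, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC from residue labels (1 = disordered, 0 = ordered) and disorder scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both disordered and ordered residues")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # disordered residue ranked above ordered one
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))
```

For dataset-scale evaluation, a sort-based O(n log n) version or scikit-learn's `roc_auc_score` would be preferable; the pairwise form just makes the probabilistic meaning of the AUC explicit.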
Comparison of predictions of proteins with long (≥30 residues) disordered segments on the MxD dataset
| Predictor | MCC | ACC | SENS | SPEC | AUC |
|---|---|---|---|---|---|
| MFDp | 0.82 | 0.71 | |||
| Ucon | 0.52 | 0.73 | 0.53 | 0.85 | |
| 0.52 | 0.76 | 0.71 | 0.81 | 0.82 | |
| 0.52 | 0.75 | 0.63 | 0.87 | 0.83 | |
| 0.51 | 0.74 | 0.60 | 0.89 | 0.83 | |
| MD | 0.49 | 0.74 | 0.67 | 0.81 | 0.80 |
| NORSnet | 0.48 | 0.71 | 0.51 | 0.93 | 0.80 |
| 0.47 | 0.73 | 0.62 | 0.82 | ||
| PROFbval | 0.39 | 0.69 | 0.73 | 0.66 | 0.76 |
Underlined methods are used as inputs for MFDp. The methods are sorted by the MCC values and the highest values are shown in bold.
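Selecting proteins with long disordered segments amounts to finding runs of at least 30 consecutive disordered residues in the per-residue annotation. The Python sketch below assumes annotations are strings in which 'D' marks a disordered residue, as in the Fig. 4 caption. `disordered_segments` and `has_long_disorder` are hypothetical helpers, and the original study may treat unannotated residues or short gaps inside a segment differently.

```python
import re
from typing import List, Tuple


def disordered_segments(labels: str, min_length: int = 30) -> List[Tuple[int, int]]:
    """Half-open (start, end) spans of 'D' runs with length >= min_length."""
    return [(m.start(), m.end())
            for m in re.finditer(r"D+", labels)
            if m.end() - m.start() >= min_length]


def has_long_disorder(labels: str, min_length: int = 30) -> bool:
    """True if the chain contains at least one sufficiently long 'D' run."""
    return bool(disordered_segments(labels, min_length))
```

Applying `has_long_disorder` to each chain's annotation would give the subset of proteins evaluated in the table above under these assumptions.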
Fig. 3. ROC curves for the predictions of proteins with long disordered segments on the MxD dataset.
Fig. 4. Comparison of predictions from MFDp, DISOPRED2 (DP2), IUPREDL (IUPL), IUPREDS (IUPS), DISOclust (DISOc) and MD, and from the two CASP8 predictors with the highest MCC, McGuffin (379) and GeneSilicoMetaServer (297), for CASP8 targets T0480 (left) and T0404 (right). The ‘–’ and ‘D’ denote ordered and disordered residues, respectively; the actual disorder annotations are shown in the first line.
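Residue-level comparisons like those in Fig. 4 reduce to confusion counts between the annotation line and each prediction line. A minimal Python sketch follows, assuming both lines have equal length and that 'D' marks disorder while any other character (for example '–' or '-') marks order. `residue_counts` is a hypothetical helper; its TP, FN, TN and FP counts are the quantities from which SENS, SPEC, ACC, Sw and MCC are computed.

```python
from typing import Dict


def residue_counts(actual: str, predicted: str) -> Dict[str, int]:
    """Confusion counts; 'D' = disordered, any other character = ordered."""
    if len(actual) != len(predicted):
        raise ValueError("annotation and prediction must cover the same residues")
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for a, p in zip(actual, predicted):
        if a == "D":
            counts["TP" if p == "D" else "FN"] += 1  # annotated disordered
        else:
            counts["FP" if p == "D" else "TN"] += 1  # annotated ordered
    return counts
```

Treating every non-'D' character as ordered lets the same function accept either dash variant used for ordered residues.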