Lukasz P Kozlowski, Janusz M Bujnicki.
Abstract
BACKGROUND: Intrinsically unstructured proteins (IUPs) lack a well-defined three-dimensional structure. Some of them may assume a locally stable structure under specific conditions, e.g. upon interaction with another molecule, while others function in a permanently unstructured state. The discovery of IUPs challenged the traditional protein structure paradigm, which held that a protein's function is determined by its specific, well-defined structure. As of December 2011, approximately 60 methods for computational prediction of protein disorder from sequence had been made publicly available. They are based on different approaches, such as evolutionary information, energy functions, and various statistical and machine learning methods.
Year: 2012 PMID: 22624656 PMCID: PMC3465245 DOI: 10.1186/1471-2105-13-111
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Description of disorder predictors analyzed in this work
| Method | Description | Availability | Reference |
| DisEMBL | ANN trained to predict classic loops (DSSP), flexible loops with high B-factors, residues with missing coordinates in X-ray structures, low-complexity regions and regions prone to aggregation. | local installation | [ |
| DISOPRED2 | SVM trained to predict residues with missing coordinates. | local installation | [ |
| DISpro | Recursive neural networks (RNNs) trained to predict missing coordinates. | local installation | [ |
| GlobPlot | A simple method based on several hydrophobicity scales to predict regions of missing coordinates and loops with high B-factors. | local installation | [ |
| iPDA | Incorporates information about sequence conservation, predicted secondary structure, sequence complexity and hydrophobic clusters. | web service | [ |
| IUPred | Estimates pairwise interaction energies using a statistical potential. Two versions for predicting long and short disorder. | web service | [ |
| Pdisorder | Combines a neural network, a linear discriminant function and an acute smoothing procedure to recognize disordered and ordered regions in proteins. | web service | [ |
| Poodle-s | SVM trained for short disorder detection (uses PSSMs generated by PSI-BLAST). | web service | [ |
| Poodle-l | Predicts long disorder using an SVM. | web service | [ |
| PrDOS | Predicts missing coordinates in 3D structure using SVM and PSSMs from PSI-BLAST. | web service | [ |
| Spritz | Predicts long and short disorder (missing coordinates) using two separate SVMs. Utilizes secondary structure. | web service | [ |
| RONN | Predicts missing coordinates using an ANN. | local installation | [ |
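Each primary predictor in the table above returns its own per-residue order/disorder call, which a meta-predictor has to combine. As a minimal sketch, assuming every method's output has already been reduced to a binary label per residue (1 = disordered), an unweighted majority vote could look like this. The function name, the 0.5 cut-off and the toy calls are hypothetical, and MetaDisorder's own consensus schemes are not claimed to work this way.

```python
from typing import Dict, List


def majority_consensus(predictions: Dict[str, List[int]], min_fraction: float = 0.5) -> List[int]:
    """Hypothetical unweighted vote: a residue is called disordered (1) when
    more than `min_fraction` of the predictors call it disordered."""
    if not predictions:
        raise ValueError("need at least one predictor")
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must cover the same residues")
    n_methods = len(predictions)
    length = lengths.pop()
    consensus = []
    for i in range(length):
        votes = sum(p[i] for p in predictions.values())
        consensus.append(1 if votes / n_methods > min_fraction else 0)
    return consensus


# Hypothetical binary calls (1 = disordered) for a 6-residue fragment.
calls = {
    "IUPred_short": [1, 1, 0, 0, 0, 1],
    "DISOPRED2": [1, 1, 1, 0, 0, 0],
    "RONN": [1, 0, 0, 0, 1, 1],
}
print(majority_consensus(calls))  # [1, 1, 0, 0, 0, 1]
```

An unweighted vote treats every method as equally reliable; the performance table later in this record shows large differences between methods, which is the usual motivation for weighting them instead.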
Description of fold recognition methods used by MetaDisorder
| Method | Description | Availability | Reference |
| PSI-BLAST | Position-Specific Iterated BLAST uses position-specific scoring matrices derived during the search of the nr database | local installation | [ |
| FFAS | Profile-profile alignment and fold-recognition algorithm for fold and function assignment | local installation | [ |
| mGenThreader | The method combines profile-profile alignments with secondary-structure specific gap-penalties, classic pair- and solvation potentials using a linear combination optimized with a regression SVM model | local installation | [ |
| HHsearch | Generalizes the alignment of protein sequences with a profile hidden Markov model (HMM) to the case of pairwise alignment of profile HMMs | local installation | [ |
| PCONS | A neural-network-based consensus predictor | local installation | [ |
| PHYRE | Combines profile-profile alignment with secondary structure matching. | web service | [ |
Summary of the datasets employed in this study
| Number of proteins | 566 | 1147 | 122 |
| Number of residues in disordered regions | 54,570 (23.45%) | 18,146 (6.28%) | 3,068 (11.11%) |
| Number of residues in ordered regions | 178,094 (76.55%) | 270,862 (93.72%) | 24,546 (88.89%) |
| Total number of residues | 232,664 | 289,008 | 27,614 |
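Each percentage in this table is a residue count divided by the column total. The short check below recomputes the totals and both percentages from the raw counts; the column keys are placeholders, since the table does not name its three datasets.

```python
# Residue counts (disordered, ordered) copied from the table above, one entry
# per column in left-to-right order. The keys are placeholders: the table
# does not label its three datasets.
COUNTS = {
    "column 1": (54_570, 178_094),
    "column 2": (18_146, 270_862),
    "column 3": (3_068, 24_546),
}


def disorder_summary(disordered: int, ordered: int) -> tuple:
    """Return (total residues, % disordered, % ordered), rounded to 2 decimals."""
    total = disordered + ordered
    return total, round(100 * disordered / total, 2), round(100 * ordered / total, 2)


for name, (d, o) in COUNTS.items():
    total, pct_d, pct_o = disorder_summary(d, o)
    print(f"{name}: {total:,} residues, {pct_d}% disordered, {pct_o}% ordered")
```

All three columns reproduce the totals and percentages reported in the table.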
Thresholds used in fold recognition programs for classification of potentially good, medium and poor alignments
| Method | Good | Medium | Poor |
| PSI-BLAST* | < 2e-06 | < 0.023 | > 0.023 |
| FFAS | < −34.5 | < −8.5 | > −8.5 |
| mGenThreader | > 0.65 | > 0.546 | < 0.546 |
| HHsearch* | > 95 | > 80 | < 80 |
| PCONS | > 2.17 | > 1.03 | < 1.03 |
| PHYRE | < 0.085 | < 0.27 | > 0.27 |
* The same thresholds were used regardless of the database searched.
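The table gives two cut-offs per method. For PSI-BLAST, FFAS and PHYRE lower scores are better, while for mGenThreader, HHsearch and PCONS higher scores are better. A minimal sketch of binning a score with these cut-offs follows; the function and dictionary names are hypothetical, and because the table leaves scores exactly equal to a cut-off unassigned, the sketch places them in the poor class as an assumption.

```python
from typing import NamedTuple


class Thresholds(NamedTuple):
    good: float
    medium: float
    lower_is_better: bool  # inferred from the inequality signs in the table


# Cut-offs transcribed from the table above (hypothetical data structure).
THRESHOLDS = {
    "PSI-BLAST": Thresholds(2e-06, 0.023, True),
    "FFAS": Thresholds(-34.5, -8.5, True),
    "mGenThreader": Thresholds(0.65, 0.546, False),
    "HHsearch": Thresholds(95, 80, False),
    "PCONS": Thresholds(2.17, 1.03, False),
    "PHYRE": Thresholds(0.085, 0.27, True),
}


def classify_alignment(method: str, score: float) -> str:
    """Return 'good', 'medium' or 'poor' for a fold-recognition score."""
    t = THRESHOLDS[method]
    if t.lower_is_better:
        if score < t.good:
            return "good"
        if score < t.medium:
            return "medium"
    else:
        if score > t.good:
            return "good"
        if score > t.medium:
            return "medium"
    # Assumption: scores exactly on a cut-off are treated as poor.
    return "poor"


print(classify_alignment("FFAS", -40.0))     # good
print(classify_alignment("HHsearch", 88))    # medium
print(classify_alignment("PSI-BLAST", 0.5))  # poor
```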
Figure 1. MetaDisorder web-server interface. a) User-friendly web interface: the main plot can be easily zoomed in and out, and the results reported by all primary methods can be downloaded in the CASP format. b) Simple text output format suitable for machine processing.
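Figure 1 notes that primary-method results can be downloaded in the CASP format. As a sketch, assuming the CASP disorder (DR) layout of one `residue state probability` record per line between a `MODEL` line and `END`, with D for disordered and O for ordered, the records could be read as follows. The parser name and the sample text (including its target and author fields) are hypothetical, and MetaDisorder's own simple text output (panel b) may use a different layout.

```python
from typing import List, Tuple


def parse_casp_dr(text: str) -> List[Tuple[str, str, float]]:
    """Hypothetical parser: extract (residue, state, probability) records
    from a CASP DR-style prediction, assuming the layout described above."""
    records = []
    in_model = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MODEL":
            in_model = True
            continue
        if fields[0] == "END":
            break
        if in_model and len(fields) >= 3 and fields[1] in ("O", "D"):
            records.append((fields[0], fields[1], float(fields[2])))
    return records


# Hypothetical sample prediction for a 4-residue target.
example = """PFRMAT DR
TARGET T0000
AUTHOR XXXX-XXXX-XXXX
METHOD hypothetical example
MODEL 1
M D 0.91
S D 0.78
L O 0.32
K O 0.12
END
"""
residues = parse_casp_dr(example)
disordered = [i + 1 for i, (_, state, _) in enumerate(residues) if state == "D"]
print(disordered)  # [1, 2]
```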
Performance of disorder prediction on the combined pdbRemark465, CASP7 and Disprot dataset
| Method | Sw | MCC | AUC |
| FloatCons | 0.475 | ||
| BinCons | 0.599 | 0.843 ± 0.003 | |
| iPDA | 0.555 | 0.419 | 0.829 ± 0.004 |
| DISPROT (VSL2) | 0.539 | 0.399 | 0.830 ± 0.001 |
| DISOPRED | 0.481 | 0.436 | 0.778 ± 0.003 |
| POODLE-S | 0.474 | 0.423 | 0.828 ± 0.004 |
| PrDOS | 0.469 | 0.442 | 0.810 ± 0.006 |
| POODLE-L | 0.464 | 0.397 | 0.794 ± 0.004 |
| RONN | 0.450 | 0.350 | 0.762 ± 0.006 |
| IUPred (short) | 0.445 | 0.412 | 0.788 ± 0.002 |
| DisPSSMP | 0.442 | 0.377 | 0.776 ± 0.004 |
| IUPred (long) | 0.432 | 0.392 | 0.787 ± 0.004 |
| Spritz (long) | 0.418 | 0.377 | - |
| Pdisorder | 0.383 | 0.350 | - |
| Dispro | 0.355 | 0.411 | - |
| Spritz (short) | 0.334 | 0.306 | - |
| DisEMBL | 0.289 | 0.232 | - |
| GlobPlot | 0.187 | 0.172 | - |
The highest value for each score is shown in bold.
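Sw and MCC are both computed from per-residue confusion-matrix counts. The table does not define Sw; the sketch assumes the CASP disorder-assessment form Sw = sensitivity + specificity − 1, which is consistent with the sensitivity and specificity reported for BinCons in the CASP results table (0.741 + 0.920 − 1 = 0.661). Labels are 1 for disordered and 0 for ordered residues; the helper names and the toy sequences are hypothetical.

```python
import math
from typing import Sequence, Tuple


def confusion(observed: Sequence[int], predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), treating 1 (disordered) as the positive class."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    return tp, tn, fp, fn


def sw_score(tp: int, tn: int, fp: int, fn: int) -> float:
    """Assumed CASP form: Sw = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


observed = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
counts = confusion(observed, predicted)
print(counts)                       # (2, 6, 1, 1)
print(round(sw_score(*counts), 3))  # 0.524
print(round(mcc(*counts), 3))       # 0.524
```

Both scores happen to coincide in this toy case (11/21); on realistic data, where disordered residues are a small minority, they generally differ, which is why the table reports them separately.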
Figure 2. Receiver operating characteristic (ROC) curves and their area under the curve (AUC) for the disorder prediction methods used to construct the FloatCons meta-predictor, on a combined dataset comprising Disprot, CASP7 targets and PDBremark465. False-positive rate (FPR) values are shown on a logarithmic scale.
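The AUC values in Figure 2 and in the tables summarize each ROC curve as a single number. A minimal sketch, assuming per-residue labels (1 = disordered) and real-valued disorder scores, computes it with the Mann-Whitney rank formula, which equals the area under the empirical ROC curve when tied scores receive average ranks. The function name and the toy data are hypothetical, and the sketch does not reproduce the ± error estimates, whose procedure is not described here.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the rank-sum (Mann-Whitney U) formula; O(n log n)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both disordered and ordered residues")
    rank_sum = sum(r for r, label in zip(ranks, labels) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.4]
print(roc_auc(labels, scores))  # 0.875
```

The rank-based form avoids comparing every disordered residue with every ordered one, which matters for datasets of the size summarized earlier (hundreds of thousands of residues).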
The results of our meta-predictors and top-scoring primary methods in CASP8 and CASP9
| Method | Sw | AUC | Sensitivity | Specificity |
| FloatCons | 0.908 ± 0.017 | 0.904 ± 0.004 | ||
| BinCons | 0.661 | 0.897 ± 0.021 | 0.741 ± 0.050 | 0.920 ± 0.003 |
| DisoClust | 0.644 | 0.908 ± 0.018 | 0.727 ± 0.047 | 0.917 ± 0.004 |
| MULTICOM | 0.660 ± 0.039 | 0.896 ± 0.019 | 0.796 ± 0.039 | 0.864 ± 0.004 |
| Mahmood-Torda | 0.619 ± 0.061 | 0.641 ± 0.061 | ||
| POODLE-L | 0.588 ± 0.066 | 0.895 | 0.646 ± 0.066 | 0.942 ± 0.004 |
| Method | Sw | AUC | Sensitivity | Specificity |
| FloatCons | 0.427 ± 0.009 | 0.795 ± 0.011 | 0.574 ± 0.020 | 0.854 ± 0.009 |
| GSmetaDisorder3D | 0.391 ± 0.007 | 0.784 ± 0.012 | 0.411 ± 0.016 | |
| GSmetaDisorderMD | 0.476 ± 0.006 | 0.818 ± 0.008 | 0.821 ± 0.010 | |
| GSmetaDisorderMD2 | 0.841 ± 0.014 | 0.860 ± 0.012 | ||
| PrDOS2 | 0.509 ± 0.002 | 0.609 ± 0.008 | 0.857 ± 0.003 | |
| MULTICOM-REFINE | 0.500 ± 0.003 | 0.821 ± 0.008 | 0.651 ± 0.003 | 0.851 ± 0.004 |
The highest value for each score is shown in bold.
Evaluation of GSmetaDisorder3D, GSmetaDisorderMD and GSmetaDisorderMD2 on CASP8 targets
| Method | MCC | Sw | AUC |
| FloatCons | 0.606 | 0.904 | |
| GSmetaDisorder3D | 0.589 | 0.519 ± 0.024 | 0.833 |
| GSmetaDisorderMD | 0.558 ± 0.034 | 0.927 | |
| GSmetaDisorderMD2 | 0.607 ± 0.042 | ||