Xin Deng, Jordan Gumm, Suman Karki, Jesse Eickholt, Jianlin Cheng.
Abstract
Protein disordered regions are segments of a protein chain that do not adopt a stable structure. A variety of protein disorder prediction methods have been developed and are widely used, not only in traditional bioinformatics domains, including protein structure prediction, protein structure determination and function annotation, but also in many other biomedical fields. The relationship between intrinsically disordered proteins and some human diseases has given disorder prediction a significant role in disease identification and epidemiological investigations. Disordered proteins can also serve as potential targets for drug discovery, with an emphasis on the disordered-to-ordered transition in the disordered binding regions, and this has led to substantial research in drug discovery or design based on protein disordered region prediction. Furthermore, protein disorder prediction has been applied to healthcare by predicting the disease risk of mutations in patients and studying the mechanistic basis of diseases. As the applications of disorder prediction increase, so too does the need to make quick and accurate predictions. To fill this need, we also present a new approach to predicting protein residue disorder using wide sequence windows that is applicable at the genomic scale.
Keywords: applications of disorder prediction; deep networks; machine learning; protein disorder prediction
Year: 2015 PMID: 26198229 PMCID: PMC4519904 DOI: 10.3390/ijms160715384
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Performance of WiDNdisorder (wide deep network disorder predictor) on the DO1111_TEST dataset.
| Predictor | Balanced Accuracy | ±SE | F-Measure | ±SE | Sw | ±SE | AUC | ±SE |
|---|---|---|---|---|---|---|---|---|
| WiDNdisorder | 79.6 | 0.8 | 67.8 | 3.9 | 59.2 | 1.5 | 86.8 | 0.26 |
| ESpritz (v1.3) | 73.8 | 0.4 | 60.8 | 1.8 | 47.5 | 0.79 | 81.9 | 0.30 |
| DISOPRED (v3) | 76.8 | 0.4 | 68.7 | 4.7 | 53.7 | 0.81 | 91.5 | 0.22 |
| DNdisorder | 77.6 | 0.9 | 67.3 | 0.7 | 55.2 | 1.7 | 85.4 | 0.27 |
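
The metric columns can be derived from per-residue confusion counts. The following is a minimal sketch, assuming the definitions commonly used in the CASP disorder assessments (disorder as the positive class, and Sw weighted by class frequencies). This record does not state the formulas, and the function name `disorder_metrics` is hypothetical.

```python
def disorder_metrics(tp, fp, tn, fn):
    """Per-residue metrics from confusion counts (disorder = positive class).

    Assumed definitions, not taken from this record. Values are returned as
    fractions in [0, 1]; multiply by 100 to compare with percentages.
    Raises ZeroDivisionError if a class or the predicted-positive set is empty.
    """
    sensitivity = tp / (tp + fn)          # recall on disordered residues
    specificity = tn / (tn + fp)          # recall on ordered residues
    precision = tp / (tp + fp)

    balanced_accuracy = (sensitivity + specificity) / 2
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)

    # Weighted score Sw: each class is weighted by the frequency of the
    # other class, so the rare disordered class counts as much as order.
    n_dis, n_ord = tp + fn, tn + fp
    total = n_dis + n_ord
    w_dis, w_ord = n_ord / total, n_dis / total
    sw = (w_dis * tp - w_ord * fp + w_ord * tn - w_dis * fn) / (
        w_dis * n_dis + w_ord * n_ord
    )
    return {
        "balanced_accuracy": balanced_accuracy,
        "f_measure": f_measure,
        "sw": sw,
    }
```

Under these assumed definitions, Sw simplifies to sensitivity + specificity − 1, that is, 2 × balanced accuracy − 1. The reported values are consistent with this: for WiDNdisorder, 2 × 79.6 − 100 = 59.2.
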
Performance of WiDNdisorder on the CASP10 dataset.
| Predictor | Balanced Accuracy | ±SE | F-Measure | ±SE | Sw | ±SE | AUC | ±SE |
|---|---|---|---|---|---|---|---|---|
| WiDNdisorder | 71.7 | 0.8 | 33.8 | 1.2 | 43.3 | 1.5 | 80.9 | 0.63 |
| ESpritz (v1.3) | 72.0 | 0.8 | 38.7 | 2.1 | 43.8 | 1.6 | 81.3 | 0.64 |
| PrDOS-CNF | 69.4 | 0.8 | 51.2 | 1.4 | 38.7 | 1.7 | 88.3 | 0.53 |
| DISOPRED (v3) | 69.0 | 0.9 | 52.0 | 1.8 | 38.0 | 2.0 | 87.2 | 0.55 |
| DNdisorder | 73.1 | 1.0 | 34.4 | 0.9 | 46.2 | 1.9 | 82.3 | 0.62 |
Figure 1. Performance of disorder prediction methods on the DO1111_TEST dataset.
Figure 2. Performance of disorder prediction methods on the CASP10 dataset.
Comparison of Tier 1 and Tier 2 predictions from WiDNdisorder on the DO1111_TEST and CASP10 datasets.
| Dataset | Tier 1 AUC | ±SE | Tier 2 AUC | ±SE |
|---|---|---|---|---|
| DO1111_TEST | 84.1 | 0.28 | 86.8 | 0.26 |
| CASP10 | 78.8 | 0.65 | 80.9 | 0.63 |
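
AUC is the area under the ROC curve of the per-residue disorder scores. A minimal sketch of one standard way to compute it, via the rank-sum (Mann-Whitney) identity with average ranks for ties, is shown here. The function name `roc_auc` is hypothetical, and this record does not say how the authors computed AUC or its standard error.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from binary labels (1 = disordered) and scores.

    Equals the probability that a randomly chosen disordered residue scores
    higher than a randomly chosen ordered one (ties count as one half).
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks, averaging the ranks inside groups of tied scores.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one residue of each class")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

The function returns a fraction in [0, 1]; the table values appear to be the same quantity expressed as a percentage.
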
Recall of disordered predictions by disorder region length on the DO1111_TEST dataset.
| Predictor | 1–5 residues | 6–15 residues | 16–25 residues | >25 residues |
|---|---|---|---|---|
| WiDNdisorder | 76.2 | 57.9 | 67.2 | 74.3 |
| ESpritz (v1.3) | 74.7 | 63.3 | 67.4 | 55.9 |
| DISOPRED (v3) | 39.2 | 43.3 | 52.2 | 58.5 |
| DNdisorder | 75.4 | 74.1 | 76.1 | 57.9 |
Recall of disordered predictions by disorder region length on the CASP10 dataset.
| Predictor | 1–5 residues | 6–15 residues | 16–25 residues | >25 residues |
|---|---|---|---|---|
| WiDNdisorder | 57.5 | 60.0 | 57.8 | 50.8 |
| ESpritz (v1.3) | 46.2 | 52.7 | 65.6 | 52.9 |
| PrDOS-CNF | 28.7 | 39.8 | 46.9 | 46.1 |
| DISOPRED (v3) | 25.1 | 33.6 | 49.8 | 47.5 |
| DNdisorder | 48.8 | 64.6 | 66.0 | 53.4 |
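
A breakdown of recall by region length can be sketched as follows. This is an illustration under stated assumptions: recall is taken per residue within the true disordered regions of each length bin, labels are 0/1 lists for a single chain, and the names `LENGTH_BINS`, `bin_name` and `recall_by_region_length` are hypothetical. The record does not specify whether the authors counted residues or whole regions.

```python
from itertools import groupby

# Length bins used in the recall tables: (name, lowest length, highest length).
LENGTH_BINS = [("1-5", 1, 5), ("6-15", 6, 15), ("16-25", 16, 25), (">25", 26, None)]


def bin_name(length):
    """Return the name of the bin that contains a region of this length."""
    for name, low, high in LENGTH_BINS:
        if length >= low and (high is None or length <= high):
            return name
    raise ValueError("region length must be at least 1")


def recall_by_region_length(true_labels, pred_labels):
    """Per-residue recall of disorder, grouped by true region length.

    true_labels / pred_labels: equal-length lists of 0 (order) or 1 (disorder).
    Bins with no true disordered residues map to None.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    hits = {name: 0 for name, _, _ in LENGTH_BINS}
    totals = {name: 0 for name, _, _ in LENGTH_BINS}

    start = 0
    for label, run in groupby(true_labels):   # consecutive runs of 0s or 1s
        length = sum(1 for _ in run)
        if label == 1:
            name = bin_name(length)
            totals[name] += length
            hits[name] += sum(pred_labels[start:start + length])
        start += length

    return {
        name: (hits[name] / totals[name] if totals[name] else None)
        for name in totals
    }
```

For a whole dataset, the hit and total counts would be accumulated over all chains before dividing.
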
AUC for disorder predictions by input window size on the DO1111_TEST dataset.
| Deep Network Configuration | 31 residues | 51 residues | 71 residues | 91 residues |
|---|---|---|---|---|
| dropout only | 78.6 | 81.6 | 82.6 | 83.3 |
| maxout nodes only | 78.0 | 79.2 | 80.8 | 80.9 |
| maxout nodes with dropout | 81.7 | 83.3 | 83.7 | 84.1 |
AUC for disorder predictions by input window size on the CASP10 dataset.
| Deep Network Configuration | 31 residues | 51 residues | 71 residues | 91 residues |
|---|---|---|---|---|
| dropout only | 71.0 | 70.0 | 73.0 | 74.9 |
| maxout nodes only | 71.3 | 73.0 | 74.7 | 75.8 |
| maxout nodes with dropout | 76.7 | 77.7 | 78.4 | 78.8 |
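
The configuration names refer to two general deep-learning components. The sketch below illustrates them in plain Python under assumptions: a maxout node outputs the maximum of several affine functions of its input, and dropout is shown in its common "inverted" form. It is not the authors' implementation; the number of pieces, layer sizes and dropout rate are not given in this record, and the names `maxout_unit` and `dropout` are hypothetical.

```python
import random


def maxout_unit(inputs, weights, biases):
    """Output of one maxout node.

    weights: list of k weight vectors (one per linear piece),
    biases:  list of k biases. The node returns the largest of the
    k affine responses w . x + b.
    """
    return max(
        sum(w * x for w, x in zip(piece, inputs)) + b
        for piece, b in zip(weights, biases)
    )


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each activation with probability `rate`.

    Kept activations are scaled by 1 / (1 - rate) so the expected value
    is unchanged; at prediction time (training=False) nothing is dropped.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

A layer combining both would apply `dropout` to the outputs of a list of maxout nodes during training only.
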
Figure 3. Graphical representation of a neural network.
Figure 4. General deep network architecture used for Tier 1 order/disorder prediction.
Figure 5. Tier 2 deep network architecture used to make final order/disorder prediction.
Figure 6. Distribution of the length of disordered regions in the DO1111_TEST dataset. Each bin represents a range of five residues, and the last bin represents the number of disordered regions that have a length greater than 100 residues.
Figure 7. Distribution of the length of disordered regions in the CASP10 dataset. Each bin represents a range of five residues, and the last bin represents the number of disordered regions that have a length greater than 100 residues.
Figure 8. Distribution of the length of disordered regions in the DO1111_TRAIN dataset. Each bin represents a range of five residues, and the last bin represents the number of disordered regions that have a length greater than 100 residues.
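
The binning described in these captions can be sketched as follows: bins of five residues, plus a final overflow bin for regions longer than 100 residues. This is a minimal illustration that assumes the first bin covers lengths 1 to 5 and that each chain is given as a list of 0/1 labels; the function name `region_length_histogram` is hypothetical.

```python
from itertools import groupby


def region_length_histogram(label_sequences, bin_width=5, max_length=100):
    """Count disordered regions per length bin.

    label_sequences: iterable of per-chain label lists (1 = disordered).
    Returns a list of counts for bins 1-5, 6-10, ..., 96-100, followed by
    one final bin for regions longer than `max_length` residues.
    """
    n_regular_bins = max_length // bin_width
    counts = [0] * (n_regular_bins + 1)

    for labels in label_sequences:
        for label, run in groupby(labels):    # consecutive runs of 0s or 1s
            if label == 1:
                length = sum(1 for _ in run)
                # Lengths above max_length all fall into the last bin.
                index = min((length - 1) // bin_width, n_regular_bins)
                counts[index] += 1
    return counts
```
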
Figure 9. Wide windows used for sequence encoding.
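
A sliding-window encoding of the kind named in this caption might look like the sketch below. It assumes a simple one-hot encoding of the 20 standard amino acids with an extra flag for window positions that fall outside the chain; the actual input features used by WiDNdisorder are not described in this record, and the names `AMINO_ACIDS` and `encode_windows` are hypothetical. The default width of 91 is the largest window size in the tables above.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_windows(sequence, window_size=91):
    """Build one feature vector per residue from a centered sequence window.

    Each window position contributes 21 values: a one-hot amino acid code
    plus a flag set to 1.0 when the position lies beyond either chain end.
    Unknown residue letters are left as all zeros.
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd number")
    half = window_size // 2
    slot_width = len(AMINO_ACIDS) + 1

    encoded = []
    for center in range(len(sequence)):
        features = []
        for position in range(center - half, center + half + 1):
            slot = [0.0] * slot_width
            if 0 <= position < len(sequence):
                index = AA_INDEX.get(sequence[position])
                if index is not None:
                    slot[index] = 1.0
            else:
                slot[-1] = 1.0            # padding flag beyond the termini
            features.extend(slot)
        encoded.append(features)
    return encoded
```

With a 91-residue window this yields 91 × 21 = 1911 inputs per residue, which shows how quickly the input layer grows as the window widens.
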