Bi Zhao, Lukasz Kurgan.
Abstract
Intrinsic disorder prediction is an active area that has produced over 100 predictors. We identify and investigate a recent trend towards the development of deep neural network (DNN)-based methods. The first DNN-based method was released in 2013, and since 2019 deep learners have accounted for the majority of new disorder predictors. We find that the 13 currently available DNN-based predictors are diverse in their topologies, network sizes and the inputs that they utilize. Using the blind test dataset from the recent community assessment of intrinsic disorder predictions (CAID), we empirically show that the deep learners are statistically more accurate than other types of disorder predictors. We also identify several well-rounded DNN-based predictors that are accurate, fast and/or conveniently available. Their popularity, favorable predictive performance and architectural flexibility suggest that deep networks are likely to fuel the development of future disorder predictors. Novel hybrid designs of deep networks could be used to accommodate the diversity of types and flavors of intrinsic disorder. We also discuss the scarcity of DNN-based methods for the prediction of disordered binding regions and the need to develop more accurate methods for this prediction.
Keywords: BRNN, Bidirectional recurrent neural networks; CAID, Critical Assessment of Intrinsic Protein Disorder; CASP, Critical Assessment of Structure Prediction; CNN, Convolutional neural networks; DNN, Deep neural network; Deep learning; Deep neural networks; Disordered binding regions; Disordered regions; FFNN, Feed forward neural networks; IDP, Intrinsically disordered protein; IDR, Intrinsically disordered region; Intrinsic disorder; Prediction
Year: 2022 PMID: 35356546 PMCID: PMC8927795 DOI: 10.1016/j.csbj.2022.03.003
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Summary of intrinsic disorder predictors developed since 2013, when the first deep learning-based method was released. The predictors are sorted chronologically by year of publication. “*” denotes predictors that are used in Fig. 3.
| Predictor name | Year published | Reference | Applies DNN | Availability | URL |
|---|---|---|---|---|---|
| MFDp2 | 2013 | No | WS | ||
| DNdisorder | 2013 | Yes | N/A | N/A | |
| preDNdisorder | 2013 | No | N/A | N/A | |
| Ulg-GIGA | 2013 | No | N/A | N/A | |
| DisMeta | 2014 | No | WS | ||
| disCoP | 2014 | No | WS | ||
| DynaMine | 2014 | No | SP + WS | ||
| PON-Diso | 2014 | No | WS | ||
| DISOPRED3* | 2015 | No | SP + WS | ||
| s2D-1 | 2015 | No | No | N/A | |
| s2D-2* | 2015 | No | No | N/A | |
| DisoMCS | 2015 | No | N/A | N/A | |
| DeepCNF-D | 2015 | Yes | SP | ||
| AUCpreD* | 2016 | Yes | N/A | N/A | |
| AUCpreD-np* | 2016 | Yes | N/A | N/A | |
| DisPredict (DisPredict2)* | 2016 | No | SP | ||
| MobiDB-lite* | 2017 | No | WS | ||
| SPOT-Disorder* | 2017 | Yes | SP + WS | ||
| IUpred2A-long* | 2018 | No | SP + WS | ||
| IUpred2A-short* | 2018 | No | SP + WS | ||
| pyHCA* | 2018 | No | No | SP | |
| SPOT-Disorder-Single* | 2018 | Yes | SP + WS | ||
| Predictor by Zhao and Xue | 2018 | No | No | N/A | |
| IDP-CRF | 2018 | No | No | N/A | |
| rawMSA* | 2019 | Yes | SP | ||
| SPOT-Disorder2* | 2019 | Yes | SP + WS | ||
| Spark-IDPP | 2019 | No | No | N/A | |
| IDP-FSP | 2019 | No | No | N/A | |
| DisoMine* | 2020 | No | Yes | WS | |
| ODiNPred | 2020 | No | WS | ||
| IDP-Seq2Seq* | 2020 | Yes | WS | ||
| flDPnn* | 2021 | Yes | SP + WS | ||
| flDPlr* | 2021 | No | No | N/A | |
| IUPred3 | 2021 | No | SP + WS | ||
| RFPR-IDP* | 2021 | Yes | WS | ||
| Metapredict* | 2021 | Yes | SP + WS |
“No” in the Reference column means that a given predictor was not published in a peer-reviewed journal but was included based on its participation in the CASP and/or CAID assessments.
Availability: “SP”, released as a standalone program; “WS”, released as a web server; “No”, released as neither an SP nor a WS; “N/A” (not available), an SP and/or WS was released at the time of publication (i.e., a URL was provided in the original article) but was not accessible when access was tested in February 2022.
Fig. 1 Development of disorder predictors since 2013, when the first deep learning-based predictor was released. The left/right y-axis gives the number/fraction of predictors in a given time period. The predictors are color-coded: green represents deep neural network-based methods and blue represents other types of predictors.
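The per-period counts plotted in Fig. 1 can be reproduced from the first summary table. The sketch below is a minimal Python illustration, assuming yearly bins (the period boundaries used in the figure are not stated here). The predictor names, years and DNN flags are transcribed from that table; `PREDICTORS` and `dnn_share` are hypothetical names introduced for illustration.

```python
from collections import Counter

# (predictor, year published, applies DNN), transcribed from the summary table
PREDICTORS = [
    ("MFDp2", 2013, False), ("DNdisorder", 2013, True),
    ("preDNdisorder", 2013, False), ("Ulg-GIGA", 2013, False),
    ("DisMeta", 2014, False), ("disCoP", 2014, False),
    ("DynaMine", 2014, False), ("PON-Diso", 2014, False),
    ("DISOPRED3", 2015, False), ("s2D-1", 2015, False), ("s2D-2", 2015, False),
    ("DisoMCS", 2015, False), ("DeepCNF-D", 2015, True),
    ("AUCpreD", 2016, True), ("AUCpreD-np", 2016, True),
    ("DisPredict (DisPredict2)", 2016, False),
    ("MobiDB-lite", 2017, False), ("SPOT-Disorder", 2017, True),
    ("IUpred2A-long", 2018, False), ("IUpred2A-short", 2018, False),
    ("pyHCA", 2018, False), ("SPOT-Disorder-Single", 2018, True),
    ("Predictor by Zhao and Xue", 2018, False), ("IDP-CRF", 2018, False),
    ("rawMSA", 2019, True), ("SPOT-Disorder2", 2019, True),
    ("Spark-IDPP", 2019, False), ("IDP-FSP", 2019, False),
    ("DisoMine", 2020, True), ("ODiNPred", 2020, False), ("IDP-Seq2Seq", 2020, True),
    ("flDPnn", 2021, True), ("flDPlr", 2021, False), ("IUPred3", 2021, False),
    ("RFPR-IDP", 2021, True), ("Metapredict", 2021, True),
]

def dnn_share(predictors, start, end):
    """Return (number of DNN-based predictors, total predictors) for years start..end inclusive."""
    in_range = [is_dnn for _, year, is_dnn in predictors if start <= year <= end]
    return sum(in_range), len(in_range)

# Yearly bins (an assumption; Fig. 1 may group years differently)
totals = Counter(year for _, year, _ in PREDICTORS)
dnn = Counter(year for _, year, is_dnn in PREDICTORS if is_dnn)
for year in sorted(totals):
    print(f"{year}: {dnn[year]}/{totals[year]} DNN-based ({dnn[year] / totals[year]:.0%})")

n_dnn, n_all = dnn_share(PREDICTORS, 2019, 2021)
print(f"2019-2021: {n_dnn}/{n_all} = {n_dnn / n_all:.0%}")
```

With these data, 7 of the 12 predictors published from 2019 to 2021 (about 58%) are DNN-based, compared with 6 of the 24 published from 2013 to 2018, which is consistent with the statement that deep learners account for the majority of new predictors since 2019.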
Fig. 3 Comparison of predictive performance between disorder predictors that utilize deep neural networks (in red) and the other disorder predictors (in blue). Predictive performance is quantified with AUC, AUPR, F1 and MCC. Results of individual predictors are denoted by dots, and the distributions of these values are summarized with box plots. *** means that the predictive performance of the deep learners is significantly higher than that of the other methods (p-value < 0.05).
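For readers unfamiliar with these measures, the following Python sketch shows how three of them can be computed from per-residue annotations (1 = disordered, 0 = ordered). It is an illustration, not the evaluation code used for CAID or for Fig. 3: the 0.5 threshold that turns propensity scores into binary predictions is an assumption, AUPR is omitted, and the pairwise AUC loop is quadratic and only meant for small examples.

```python
import math

def confusion(labels, preds):
    """Count TP, FP, TN, FN for binary per-residue labels and predictions (1 = disordered)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp, fp, tn, fn

def f1(labels, preds):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, _, fn = confusion(labels, preds)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def mcc(labels, preds):
    """Matthews correlation coefficient; returns 0.0 when any marginal count is zero."""
    tp, fp, tn, fn = confusion(labels, preds)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(labels, scores):
    """ROC AUC as the probability that a disordered residue outscores an ordered one (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical data, not CAID results)
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.4, 0.2, 0.3, 0.6]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
print(round(auc(labels, scores), 3), round(f1(labels, preds), 3), round(mcc(labels, preds), 3))
```

AUC uses the real-valued propensities directly, whereas F1 and MCC depend on the binary predictions, so the latter two can change with the chosen threshold.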
Summary of intrinsic disorder predictors that use deep neural network models. The predictors are sorted chronologically by year of publication. X marks the inputs used by a given predictor. “*” denotes predictors that are used in Fig. 3.
| Predictor name | Year published | Input: sequence | Input: evolutionary features | Input: predicted structural features | Input: physicochemical properties | Network type | Network size | AUC | Runtime (min) |
|---|---|---|---|---|---|---|---|---|---|
| DNdisorder | 2013 | X | X | RBM | Moderately deep | N/A | N/A | ||
| DeepCNF-D | 2015 | X | X | X | CNN | Moderately deep | N/A | N/A | |
| AUCpreD* | 2016 | X | X | X | X | CNN | Moderately deep | 0.757 | 7.0 |
| AUCpreD-np* | 2016 | X | X | X | CNN | Moderately deep | 0.751 | <0.5 | |
| SPOT-Disorder* | 2017 | X | X | X | BRNN | Moderately deep | 0.744 | 5.0 | |
| SPOT-Disorder-Single* | 2018 | X | X | X | BRNN + CNN | Deep | 0.757 | 0.8–1.0 | |
| rawMSA* | 2019 | X | X | BRNN + CNN | Very deep | 0.780 | >10.0 | ||
| SPOT-Disorder2* | 2019 | X | X | X | BRNN + CNN | Very deep | 0.760 | >10.0 | |
| DisoMine* | 2020 | X | BRNN | Moderately deep | 0.765 | <0.5 | |||
| IDP-Seq2Seq* | 2020 | X | X | X | BRNN | Very deep | 0.754 | 12.0 | |
| flDPnn* | 2021 | X | X | X | FFNN | Moderately deep | 0.814 | 0.5–1.0 | |
| RFPR-IDP* | 2021 | X | X | BRNN + CNN | Moderately deep | 0.722 | <0.5 | ||
| Metapredict* | 2021 | X | BRNN | Moderately deep | 0.746 | <0.5 | |||
Sequence: the input sequence was encoded and used directly as a predictive input.
Evolutionary features: features computed from the input sequence, including the position-specific scoring matrix (PSSM), entropy-based conservation, and multiple sequence alignment.
Predicted structural features: structural features predicted from the input sequence, such as putative secondary structure, solvent accessibility, and half-sphere exposure.
Physicochemical properties: properties of the amino acids in the input sequence, including polarizability, hydrophobicity, and isoelectric point.
Network type: the type of deep neural network used: “RBM” (Restricted Boltzmann Machine); “CNN” (Convolutional Neural Network); “BRNN” (Bidirectional Recurrent Neural Network); and “FFNN” (Feed Forward Neural Network).
Network size: the number of hidden layers: moderately deep with 2 to 3 layers; deep with 4 to 5 layers; and very deep with more than 5 layers.
Runtime: the average runtime in minutes to predict one amino acid sequence. N/A denotes that results could not be collected because a working implementation of the corresponding predictor is not available.
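To make the input categories concrete, the sketch below builds a per-residue feature matrix from two inputs that can be derived from the sequence alone: a one-hot encoding of each residue and one physicochemical property. It is a minimal Python illustration under stated assumptions: the Kyte-Doolittle hydrophobicity scale stands in for the physicochemical properties (individual predictors use their own property sets and scales), and `encode_residues` is a hypothetical helper. Evolutionary features such as a PSSM and predicted structural features would be appended to each residue's vector in the same way, but they require external tools and are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydrophobicity scale, used here as one example physicochemical property (assumption)
KD_HYDROPHOBICITY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def encode_residues(sequence):
    """Return one 21-value feature vector per residue: 20 one-hot positions plus scaled hydrophobicity."""
    features = []
    for residue in sequence.upper():
        # Sequence input: one-hot vector over the 20 standard amino acids
        one_hot = [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]
        # Physicochemical input: hydrophobicity scaled from [-4.5, 4.5] to [-1, 1];
        # non-standard residues (e.g. X) get an all-zero one-hot vector and 0.0 here
        hydro = KD_HYDROPHOBICITY.get(residue, 0.0) / 4.5
        features.append(one_hot + [hydro])
    return features

matrix = encode_residues("MKVLAAGIE")
print(len(matrix), len(matrix[0]))
```

A network would then consume this matrix with one row per residue, for example as successive time steps of a BRNN or as input channels of a one-dimensional CNN.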
Fig. 2 Heatmap that compares 11 available deep learners based on three key characteristics: predictive performance quantified with AUC, speed measured with runtime, and mode of availability. The predictors are sorted chronologically by year of publication. The color-coded scores represent quality, where 2 (dark blue) is best, 1 (blue) is intermediate, and 0 (light blue) is worst. The AUC values are categorized into three groups using a statistical test that measures the robustness of differences between predictors over different protein sets; details are described in the text. Methods whose AUCs are not statistically different (p-value ≥ 0.05) from the best (worst) performing flDPnn (RFPR-IDP) are labeled with 2 (0), while the remaining predictors are labeled with 1. The runtime is divided into three ranges: < 1 min (score of 2); between 1 and 10 min (score of 1); and ≥ 10 min (score of 0). The availability score counts the number of modes, where 2 means that both SP (standalone program) and WS (web server) are available and 1 that either SP or WS is available.
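The scoring rules in the Fig. 2 caption can be written out as small functions. The Python sketch below is a hypothetical illustration: the function names are invented, the p-values for the AUC comparison must come from the statistical test described in the text (not reproduced here), a predictor compared with itself is assumed to receive p = 1.0, and when a method is statistically indistinguishable from both the best and the worst predictor, the best-match rule is assumed to take precedence.

```python
def runtime_score(minutes):
    """Average runtime per sequence: < 1 min -> 2; 1 to < 10 min -> 1; >= 10 min -> 0."""
    if minutes < 1.0:
        return 2
    if minutes < 10.0:
        return 1
    return 0

def availability_score(availability):
    """Count release modes: 'SP + WS' -> 2; 'SP' or 'WS' -> 1; anything else -> 0."""
    modes = {part.strip() for part in availability.split("+")}
    return len(modes & {"SP", "WS"})

def auc_score(p_vs_best, p_vs_worst, alpha=0.05):
    """2 if the AUC is not significantly different from the best method (flDPnn),
    0 if not significantly different from the worst (RFPR-IDP), 1 otherwise.
    Checking the best method first is an assumption."""
    if p_vs_best >= alpha:
        return 2
    if p_vs_worst >= alpha:
        return 0
    return 1

# Runtimes (min) and availability taken from the tables above
print(runtime_score(7.0), runtime_score(12.0), availability_score("SP + WS"))
```

For example, AUCpreD's runtime of 7.0 min gives a runtime score of 1, IDP-Seq2Seq's 12.0 min gives 0, and the “SP + WS” availability of SPOT-Disorder2 gives an availability score of 2.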