Ranran Chen1,2, Xinlu Li1,2, Yaqing Yang1,2, Xixi Song1,2, Cheng Wang1,2, Dongdong Qiao3.
Abstract
Intrinsically disordered proteins (IDPs) participate in many biological processes, including the regulation of transcription, translation, and the cell cycle, by interacting with other proteins. As the amount of available disordered-sequence data grows, identifying IDP binding sites is crucial for the functional annotation of these proteins. Over the past decades, many computational approaches have been developed to predict protein-protein binding sites of IDPs (IDP-PPIS) from protein sequence information, and new IDP-PPIS predictors appear every year with the rapid development of artificial intelligence. An up-to-date overview of the methods in this field is therefore needed. In this paper, we collected 30 representative predictors published recently and summarized their databases, features, and algorithms. We described how the features were generated from public data and used to predict IDP-PPIS, along with the methods used to generate the feature representations. All the predictors were divided into three categories: scoring functions, machine learning-based prediction, and consensus approaches. For each category, we described the algorithms and their performance. We hope that this manuscript not only provides a full picture of the current state of IDP binding prediction but also serves as a guide for selecting among the methods. More importantly, it may shed light on future development trends and principles.
Keywords: ML; intrinsically disordered protein (IDP); machine learning; protein functions; protein interaction sites prediction; protein sequence
Year: 2022 PMID: 36250006 PMCID: PMC9567019 DOI: 10.3389/fmolb.2022.985022
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
FIGURE 1. The workflow of each type of method. The figure illustrates the three main categories described in this article: (A) scoring function-based methods, (B) machine learning-based methods, and (C) consensus-based methods. The key steps for each type of method are depicted in the diagram. In (A), the ANCHOR workflow is used to represent how a scoring function works. Machine learning-based methods perform the prediction with various types of machine learning models, such as SVMs and neural networks, based on features extracted from different perspectives. Consensus-based methods predict IDP binding sites by weighting different prediction models and combining them optimally. The final results shown in (A) come from Mészáros et al. (2018), those in (B) from Malhis et al. (2016), and those in (C) from Barik et al. (2020).
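The consensus idea in panel (C), weighting several prediction models and combining them, can be sketched as a weighted average of per-residue scores. This is a minimal illustration under stated assumptions, not the procedure of any predictor in this review: the function names, the predictor labels, the fixed weights, and the 0.5 threshold are all hypothetical, and published consensus methods may learn their combination in other ways.

```python
from typing import Dict, List, Sequence


def consensus_scores(
    per_predictor_scores: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-residue binding scores from several predictors.

    Hypothetical helper: weights are normalized to sum to 1, so each combined
    score stays within the range of the input scores.
    """
    if not per_predictor_scores:
        raise ValueError("at least one predictor is required")
    lengths = {len(scores) for scores in per_predictor_scores.values()}
    if len(lengths) != 1:
        raise ValueError("all predictors must score the same residues")
    total_weight = sum(weights[name] for name in per_predictor_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    combined = [0.0] * lengths.pop()
    for name, scores in per_predictor_scores.items():
        normalized = weights[name] / total_weight
        for i, score in enumerate(scores):
            combined[i] += normalized * score
    return combined


def call_binding_residues(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Mark a residue as binding when its combined score reaches the threshold."""
    return [score >= threshold for score in scores]


if __name__ == "__main__":
    # Illustrative scores for a three-residue sequence from two made-up predictors.
    scores = {"predictor_a": [0.2, 0.8, 0.6], "predictor_b": [0.4, 0.6, 1.0]}
    weights = {"predictor_a": 1.0, "predictor_b": 3.0}
    combined = consensus_scores(scores, weights)
    print(combined, call_binding_residues(combined))
```

In practice the weights would be tuned on a held-out set, for example in proportion to each predictor's validation performance.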
Common features of intrinsically disordered protein-protein interaction sites predictors.
| Categories | Features | Tools to calculate | References of the tools |
|---|---|---|---|
| Amino acid composition | Amino acid composition | | |
| Predicted structural features | Secondary structure | PSIPRED & GOR-I & SPIDER2 & SPOT-1D & Porte | |
| | Accessible surface area | SPIDER2 & SPOT-1D & EDTSurf | |
| | Backbone angle | SPIDER2 & SPOT-1D | |
| | Hemispheric exposure | SPIDER2 & SPOT-1D | |
| | Contact numbers | SPIDER2 & SPOT-1D | |
| | B-factor | ProDy & PROFbval | |
| | Structural motifs | BRNN | |
| Disorder scores | Disorder scores | IUPred & Espritz & VSL2 & DISOPRED2 & DISOclust & MFDp | |
| Physicochemical properties | Physicochemical properties | AAindex database | |
| Evolutionary information | Position-Specific Scoring Matrix | PSI-BLAST | |
| | Bigram feature | | |
| | Hidden Markov Model | HHblits | |
| | Shannon entropy | | |
| Other features | The length and location of IDR | | |
| | Sequence complexity | SEG algorithm | |
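Two of the features listed above can be computed directly from sequence data without external tools. The sketch below shows amino acid composition (whole sequence and sliding window) and Shannon entropy. It is an illustration only: the function names and the window size are hypothetical, and the entropy is assumed here to be taken over the residues of one alignment column, whereas individual predictors may derive it differently (for example from PSSM rows).

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence: str) -> Dict[str, float]:
    """Fraction of each standard amino acid; non-standard symbols are ignored."""
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def windowed_composition(sequence: str, window: int = 5) -> List[Dict[str, float]]:
    """Composition in a window centered on each residue (truncated at the termini)."""
    half = window // 2
    return [
        amino_acid_composition(sequence[max(0, i - half): i + half + 1])
        for i in range(len(sequence))
    ]


def shannon_entropy(symbols: str) -> float:
    """Shannon entropy in bits of the symbol distribution in `symbols`.

    Assumption: `symbols` holds the residues observed at one alignment column.
    """
    if not symbols:
        raise ValueError("at least one symbol is required")
    total = len(symbols)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(symbols).values()
    )


if __name__ == "__main__":
    print(amino_acid_composition("AAGG")["A"])
    print(shannon_entropy("AAGG"), shannon_entropy("AAAA"))
```

A fully conserved column has zero entropy, and the value grows as the column becomes more variable, which is why it is grouped with the evolutionary features.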
Summary of intrinsically disordered protein-protein interaction sites predictors.
| Categories | Years | Predictors | References | Algorithms | Databases | Features | Performance | URL |
|---|---|---|---|---|---|---|---|---|
| Scoring function based | 2010 | retro-MoRFs | | Sequence alignment | RNase E and p53 and SRC-3 and SwissProt and PDB | Disorder scores and Sequence similarity | Not Available | Not Available |
| | 2009 | ANCHOR | | Energy estimation | DisProt and PDB | Pairwise interaction energy | Accuracy 0.67 | |
| | 2018 | ANCHOR2 | | Energy estimation | DisProt and PDB and UniProt and DIBS | Pairwise interaction energy | AUC 0.901 | |
| Machine-learning based | 2012 | MoRFpred | | SVM | PDB and UniProtKB and Published literature | B-factors and ASA and Disorder scores and Physicochemical properties and PSSM | AUC 0.697 | |
| | 2013 | MFSPSSMpred | | SVM | PDB and UniProt and Published literature | AAC and PSSM | AUC 0.758 | Not Available |
| | 2014 | DISOPRED3 | | SVM | DisProt and PDB and UniProt | AAC and PSSM and The length and location of IDR | MCC 0.126 | |
| | 2015 | MoRFCHiBi | | SVM | PDB and UniProtKB and Published literature | AAC and Physicochemical properties | AUC 0.770 | |
| | 2016 | MoRFCHiBiLight | | Bayes rule | PDB and UniProtKB and Published literature | Disorder scores and Physicochemical properties | AUC 0.868 | |
| | 2016 | MoRFCHiBiWeb | | Bayes rule | PDB and UniProtKB and Published literature | Disorder scores and Physicochemical properties and PSSM | AUC 0.894 | |
| | 2016 | fMoRFpred | | SVM | PDB and UniProtKB and Published literature | AAC and SS and Disorder scores and Physicochemical properties | AUC 0.59–0.67 | |
| | 2016 | Predict-MoRFs | | SVM | PDB and UniProt | HMM | AUC 0.702 | |
| | 2016 | PSSMpred | | SVM | DisProt and PDB and UniProtKB and ELM | PSSM | AUC 0.758 | |
| | 2017 | Yu et al. | | SVM | PDB and UniProtKB/Swiss-Prot | AAC and SS and ASA and Physicochemical properties and KNN score | AUC 0.9679 | Not Available |
| | 2018 | Fang et al. | | SVM | PDB and UniProt | PSSM | AUC 0.713 | Not Available |
| | 2018 | MoRFPred-plus | | SVM | DisProt and PDB and UniProtKB and Published literature | Physicochemical properties and HMM | AUC 0.821 | |
| | 2007 | alpha-MoRFpred | | Feed-forward neural networks | PDB and SwissProt | SS and Disorder scores and Physicochemical properties and Shannon’s entropy | Sensitivity 0.87; Specificity 0.87; Accuracy 0.87 | Not Available |
| | 2012 | SLiMPred | | BRNN | DisProt and PDB and UniProtKB and ELM | SS and Structural motifs and ASA and Disorder scores | AUC 0.69 | |
| | 2013 | PepBindPred | | BRNN | ELM and SCOP | SS and Disorder scores and Vina score | AUC 0.75 | |
| | 2013 | SPINE-D | | Neural-network | DisProt | SS and ASA | MCC 0.15 | |
| | 2016 | SPOT-Disorder | | LSTM | DisProt and PDB and UniProt | SS and Backbone angles and HSE and CN and ASA and Physicochemical properties and PSSM and Shannon entropy | MCC 0.309 | |
| | 2019 | SPOT-Disorder2 | | LSTM | DisProt and PDB and UniProt and MobiDB | SS and Backbone angles and HSE and CN and ASA and PSSM and HMM | MCC 0.155 | |
| | 2021 | DeepDISOBind | | Multi-task deep neural network | DisProt | SS and RAAPs | AUC 0.771 | |
| | 2021 | flDPnn | | RF and Feedforward neural network | DisProt | SS and Disorder scores and PSSM | AUC 0.79 | |
| | 2015 | DisoRDPbind | | Logistic regression | DisProt | AAC and SS and Disorder scores and Physicochemical properties and Sequence complexity | AUC 0.62–0.72 | |
| | 2019 | IDRBind | | Gradient boosted trees and CRF | PDB and IDEAL and peptiDB and Docking Benchmark 5 and Published literature | AAC and B-factors and ASA and Physicochemical properties and PSSM | MCC 0.31 | |
| Consensus | 2018 | OPAL | | Integrating predictors | PDB and UniProtKB and Published literature | SS and Backbone angles and HSE and ASA and Physicochemical properties | AUC 0.795–0.870 | |
| | 2018 | OPAL+ | | Integrating predictors | PDB and UniProtKB and Published literature | SS and Backbone angles and HSE and ASA and Physicochemical properties and HMM and Bigram feature vectors | AUC 0.820–0.876 | |
| | 2019 | Sharma et al. | | Integrating predictors | PDB and UniProtKB and Published literature | SS and Backbone angles and HSE and CN and ASA and Physicochemical properties | AUC 0.797–0.881 | |
| | 2020 | HybridPBRpred | | Integrating predictors | DisProt and PDB | AAC and SS and ASA and Disorder scores and Physicochemical properties and HHM and RAAP | AUC 0.795 | |
| | 2020 | DEPICTER | | Integrating predictors | DisProt and PDB | AAC and SS and Disorder scores and Physicochemical properties and Sequence complexity and Pairwise interaction energy | AUC 0.87 | |
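The Performance column mixes several metrics (accuracy, sensitivity, specificity, MCC, and AUC), so values in different rows are not directly comparable. As a reminder of what each metric measures, the sketch below computes them from per-residue labels (1 for binding, 0 for non-binding) using the standard definitions. The function names and the toy data are hypothetical, and the sketch says nothing about the datasets or thresholds used by the predictors in the table.

```python
import math
from typing import Dict, Sequence


def confusion_metrics(labels: Sequence[int], predictions: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy, and MCC from binary per-residue calls."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equally long")
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(pairs),
        # By convention MCC is reported as 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: six residues, scores thresholded at 0.5 to get binary calls.
    labels = [1, 1, 0, 0, 0, 1]
    scores = [0.9, 0.6, 0.4, 0.7, 0.1, 0.3]
    calls = [1 if s >= 0.5 else 0 for s in scores]
    print(confusion_metrics(labels, calls))
    print(roc_auc(labels, scores))
```

AUC is threshold-free, while the other four depend on the cutoff used to turn scores into binding calls. MCC is informative when binding residues are a small minority, which is typical for disordered regions.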