Ji-Wei Chang, Yan-Qing Zhou, Muhammad Tahir Ul Qamar, Ling-Ling Chen, Yu-Duan Ding.
Abstract
Most cellular functions depend on proteins physically interacting with partner proteins. Mapping protein–protein interactions (PPIs) is therefore an important first step towards understanding the basics of cell function. Several experimental techniques operating in vivo or in vitro, especially high-throughput methods, have made significant contributions to screening large numbers of protein interaction partners. In addition, computational approaches to PPI prediction, supported by the rapid accumulation of data from experimental techniques, 3D structure determination, and genome sequencing, have accelerated the mapping of PPIs. In this review, we shed light on in silico PPI prediction methods that integrate evidence from multiple sources, including evolutionary relationships, functional annotation, sequence/structure features, network topology and text mining. These methods are designed to integrate multi-dimensional evidence, to devise strategies for predicting novel interactions, and to keep results consistent while increasing prediction coverage and accuracy.
Keywords: PPIs; interaction prediction; physical interactions; support vector machine
Year: 2016 PMID: 27879651 PMCID: PMC5133940 DOI: 10.3390/ijms17111946
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Different methods for detecting protein–protein interactions.
Interaction databases for constructing gold-standard datasets (until September 2016).
| Name | Description | Points/Edges | C | E | L | O | Last Update | Ref. I | Ref. II |
|---|---|---|---|---|---|---|---|---|---|
| BioGRID 3.4 | An interaction repository with data compiled through comprehensive curation efforts. | 65,099/836,212 | N | P | P | 61 | September 2016 | [ | |
| IntAct 4.2.5 | Provides a strong, freely available, open source database system and analysis tools for molecular interaction data. | 93,856/653,104 | N | P | P | 8 | September 2016 | [ | |
| PDB | A database containing experimentally determined three-dimensional structures of proteins. | 126,079/NA | N | P | P | NA | September 2016 | [ | |
| STRING | A database including protein interactions containing both physical and functional associations. | 9.6 million/184 million | P | P | P | 2031 | September 2016 | [ | |
| APID | Integrates known experimentally validated PPIs into interactomes, using a methodological approach that reports quality levels and coverage over the proteomes. | 90,379/678,441 | N | P | P | 25 | June 2016 | [ | |
| DIP | A database combining experimental PPI information from a variety of sources. | 28,764/81,627 | N | P | P | 826 | February 2014 | [ | |
| HitPredict 4 | A resource of experimentally determined PPI with reliability scores. | 70,808/398,696 | N | P | N | 105 | September 2015 | [ | |
| MINT | Focuses on experimentally verified protein−protein interactions mined from the scientific literature by expert curators. | 25,530/125,464 | N | P | P | 611 | September 2013 | [ | |
| TAIR-nbrowse | Provides Arabidopsis PPI data curated from the literature by TAIR curators. | 2452/8626 | N | P | P | 1 | September 2011 | [ | |
| HPRD Release 9 | A centralized platform integrating interaction networks of human proteins. | 30,047/41,327 | N | P | N | 1 | April 2010 | [ | |
| PINA2.0 | An integrated platform for protein interaction network construction, filtering, analysis, visualization and management. | 12,969/365,930 | N | P | N | 7 | May 2014 | [ | |
| Negatome 2.0 * | A collection of experimentally supported non-interacting protein pairs and domain pairs that are unlikely to engage in direct physical interactions. | 3376/6532 | N | P | P | NA | 2014 | [ | |
Points/Edges, number of interactors (proteins)/number of interactions; C, computationally supported; E, experimentally supported; L, literature curated; O, number of organisms; Ref., reference resources; N, negative PPIs contained; P, positive PPIs contained; *, negative datasets of PPIs; NA, Not Available.
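As a rough illustration of how resources like those in the table might be combined into a gold standard, the sketch below pools positive pairs from several sources (for example, edge lists exported from BioGRID and IntAct), treats pairs as unordered, drops self-pairs, and builds a negative set from curated non-interacting pairs (Negatome-style) topped up with randomly sampled pairs. The function names, inputs, and the random-sampling step are assumptions for illustration, not a procedure taken from the review; randomly paired proteins are only presumed to be non-interacting.

```python
import random
from itertools import chain
from typing import FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]


def to_pairs(edges: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Collapse directed or duplicated edges into unordered pairs, dropping self-pairs."""
    return {frozenset((a, b)) for a, b in edges if a != b}


def build_gold_standard(
    positive_sources: Iterable[Iterable[Tuple[str, str]]],
    curated_negatives: Iterable[Tuple[str, str]] = (),
    n_random_negatives: int = 0,
    seed: int = 0,
) -> Tuple[Set[Pair], Set[Pair]]:
    # Positives: union of all sources (hypothetical edge-list inputs).
    positives: Set[Pair] = set()
    for source in positive_sources:
        positives |= to_pairs(source)

    # Curated negatives, minus any pair that some source also reports as interacting.
    negatives = to_pairs(curated_negatives) - positives

    # Assumption: random protein pairs not seen as positives are treated as negatives.
    proteins = sorted(set(chain.from_iterable(positives)))
    rng = random.Random(seed)
    target = len(negatives) + n_random_negatives
    attempts, max_attempts = 0, 100 * (n_random_negatives + 1)
    while len(proteins) >= 2 and len(negatives) < target and attempts < max_attempts:
        pair = frozenset(rng.sample(proteins, 2))
        if pair not in positives:
            negatives.add(pair)
        attempts += 1
    return positives, negatives
```

Random negatives are convenient but may include undiscovered true interactions, which is one reason curated negative sets such as Negatome are listed alongside positive databases.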
Figure 2. Total non-redundant interactions of major species in BioGRID (Version 3.4.140, September 2016) and IntAct (September 2016).
Annotated protein pairs with diverse evidence.
| Categories | Feature | Abbreviation | Ref. |
|---|---|---|---|
| EVO | Gene Fusion Event | FE | [ |
| | Gene Cluster | GCL | [ |
| | Gene Neighborhood | GN | [ |
| | Phylogenetic Profile | PP | [ |
| FF | GO Cellular Component | COM | [ |
| | Coessentiality | ESS | [ |
| | Gene/Protein Coexpression | Exp | [ |
| | GO Molecular Function | FUN | [ |
| | Colocalization | Loc | [ |
| | Ortholog/Sequence Similarity | ORT | [ |
| | GO Biological Process | PRO | [ |
| | Coregulation/Transcriptional Regulation | Reg | [ |
| TOP | Graphical Invariants | GI | [ |
| | Probabilistic Graphical Model | PGM | [ |
| | Small-World Clustering Coefficients | SCC | [ |
| SEQ | Conjoint Triad | COT | [ |
| | N-grams | NGR | [ |
| | ORF Codon Usage | ORF | [ |
| | Position-Specific Scoring Matrix | PSSM | [ |
| | 2D Structure | 2DS | [ |
| STR | 3D Structure | 3DS | [ |
| | Average of the Cumulative Hydropathy Indices | ACH | [ |
| | Domain–Domain Interaction | DDI | [ |
| | DSSP Structure in PDB | DSSP | [ |
| | Electrostatics | ELE | [ |
| | Protein Fold | Fold | [ |
| | Generalized Born | GB | [ |
| | High Quality AA Indices | HQI | [ |
| | Predicted Accessibility | pA | [ |
| | Physico-Chemical Properties | PHC | [ |
| | PSIPRED Structure | PSIP | [ |
| | Posttranslational Modifications | PTM | [ |
| | Relative Solvent Accessibility | RSA | [ |
| | Surface Area | SA | [ |
| | Van der Waals Forces | VDW | [ |
| TM | Literature-Curated | LC | [ |
Categories: Evolutionary relationship (EVO), Functional features (FF), Network topology (TOP), Sequence-based signatures (SEQ), Structure-based signatures (STR), Text mining (TM). Abbreviations follow the cited references or were defined by the authors.
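To make one EVO feature concrete: a phylogenetic profile (PP) records, for each protein, whether an ortholog is present in each of a set of reference genomes, and proteins with similar profiles are candidate functional partners. The minimal sketch below assumes binary presence/absence profiles and uses Hamming distance as the similarity measure (other studies use correlation or mutual information); the protein IDs and ortholog assignments are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def phylogenetic_profile(
    protein: str, orthologs: Dict[str, Set[str]], genomes: Sequence[str]
) -> List[int]:
    """Binary vector: 1 if the protein has an ortholog in that reference genome, else 0."""
    present = orthologs.get(protein, set())
    return [1 if g in present else 0 for g in genomes]


def hamming(p: Sequence[int], q: Sequence[int]) -> int:
    """Number of genomes in which exactly one of the two proteins has an ortholog."""
    return sum(1 for x, y in zip(p, q) if x != y)


def predict_pp_links(
    orthologs: Dict[str, Set[str]], genomes: Sequence[str], max_distance: int = 0
) -> List[Tuple[str, str]]:
    """Return protein pairs whose profiles differ in at most `max_distance` genomes."""
    names = sorted(orthologs)
    profiles = {n: phylogenetic_profile(n, orthologs, genomes) for n in names}
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if hamming(profiles[a], profiles[b]) <= max_distance
    ]


# Hypothetical ortholog assignments across five reference genomes.
genomes = ["Ec", "Sc", "Hs", "At", "Os"]
orthologs = {"P1": {"Ec", "Sc", "Hs"}, "P2": {"Ec", "Sc", "Hs"}, "P3": {"At"}}
print(predict_pp_links(orthologs, genomes))  # [('P1', 'P2')]
```

Raising `max_distance` trades precision for coverage, the same tension that evidence-combining methods try to balance.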
Figure 3. Schematic diagram of the conjoint triad method [55].
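The conjoint triad (COT) encoding can be sketched as follows. This sketch assumes the seven-class grouping of the 20 standard amino acids (by dipole and side-chain volume) that is commonly used with this method, counts every window of three consecutive residues as one of 7^3 = 343 class triads, normalizes the counts, and concatenates the vectors of the two proteins into a 686-dimensional pair feature. The (f − min)/max normalization and skipping windows with non-standard residues are assumptions of this sketch; published variants differ.

```python
from itertools import product
from typing import List

# Assumed seven-class grouping of the standard amino acids.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}
TRIADS = list(product(range(len(CLASSES)), repeat=3))  # 343 class triads
TRIAD_INDEX = {t: i for i, t in enumerate(TRIADS)}


def conjoint_triad(seq: str) -> List[float]:
    """343-dimensional normalized triad-frequency vector for one protein sequence."""
    classes = [AA_TO_CLASS.get(aa) for aa in seq.upper()]
    counts = [0] * len(TRIADS)
    for i in range(len(classes) - 2):
        window = classes[i:i + 3]
        if None in window:
            continue  # skip windows containing non-standard residues (e.g. X, U)
        counts[TRIAD_INDEX[tuple(window)]] += 1
    lo, hi = min(counts), max(counts)
    if hi == 0:
        return [0.0] * len(counts)  # sequence too short to contain any triad
    return [(c - lo) / hi for c in counts]


def pair_vector(seq_a: str, seq_b: str) -> List[float]:
    """Concatenate the two proteins' vectors into one 686-dimensional pair feature."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)


print(len(pair_vector("MKTAYIAKQR", "GAVLC")))  # 686
```

Concatenation makes the pair feature order-dependent; a classifier such as an SVM is then trained on these vectors, or the pair can be presented in both orders.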
Figure 4. Two methods to predict domain–domain interactions (DDIs) from PPIs. Proteins A and B are a pair of interacting proteins in a PPI network. Protein A contains domains a and b, whereas protein B contains domains c, d and e. The PPI is interpreted as the result of interactions among multiple domain pairs. (A) A method that treats a domain pair as the basic unit of protein interaction; (B) a method that uses a domain combination pair as the basic unit of the prediction model [83].
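The two units in Figure 4 can be contrasted in code. The sketch below, with hypothetical function names, enumerates (A) all cross-protein domain pairs (six for proteins A and B) and (B) pairs of domain combinations, here taken to be all non-empty domain subsets of each protein (3 × 7 = 21 pairs for A and B; this reading of "domain combination" is an assumption). It also scores each domain pair with a simple association ratio, the fraction of protein pairs carrying that domain pair that are known to interact; this is one classic way to infer DDIs from PPIs, not necessarily the method of [83].

```python
from collections import Counter
from itertools import combinations, product
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

DomainPair = Tuple[str, str]


def domain_pairs(doms_a: Iterable[str], doms_b: Iterable[str]) -> Set[DomainPair]:
    """(A) Every domain of one protein paired with every domain of the other (order-free)."""
    return {tuple(sorted((x, y))) for x, y in product(doms_a, doms_b)}


def domain_combinations(doms: Iterable[str]) -> List[FrozenSet[str]]:
    """All non-empty subsets of a protein's domains (assumed meaning of 'combination')."""
    unique = sorted(set(doms))
    return [frozenset(c) for r in range(1, len(unique) + 1) for c in combinations(unique, r)]


def combination_pairs(doms_a, doms_b) -> Set[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """(B) Pairs of domain combinations, one taken from each protein."""
    return set(product(domain_combinations(doms_a), domain_combinations(doms_b)))


def association_scores(ppis, domains: Dict[str, List[str]]) -> Dict[DomainPair, float]:
    """Fraction of protein pairs carrying a domain pair that are listed as interacting."""
    interacting = {frozenset(p) for p in ppis}
    hits, totals = Counter(), Counter()
    for a, b in combinations(sorted(domains), 2):
        is_ppi = frozenset((a, b)) in interacting
        for dp in domain_pairs(domains[a], domains[b]):
            totals[dp] += 1
            hits[dp] += int(is_ppi)
    return {dp: hits[dp] / totals[dp] for dp in totals}


# Proteins A and B from Figure 4, plus a hypothetical third protein C.
domains = {"A": ["a", "b"], "B": ["c", "d", "e"], "C": ["a", "f"]}
print(len(domain_pairs(domains["A"], domains["B"])))       # 6
print(len(combination_pairs(domains["A"], domains["B"])))  # 21
scores = association_scores([("A", "B")], domains)
print(scores[("a", "c")], scores[("b", "c")])              # 0.5 1.0
```

The combinatorial growth in (B) is visible even here: three domains on one protein already yield seven combinations, so combination-based models need more data per parameter than pair-based ones.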
Selected studies and online tools for PPI prediction using evidence-combining methods (until September 2016).
| Class | Description | Classifiers | Evidence | Organisms | Ref. | (URL) (Last Update) (Points/Edges) |
|---|---|---|---|---|---|---|
| DDI | iPfam: catalogs of protein family interactions, including domain and ligand interactions, calculated from known structures | NA | PHC, 3DS | NA | [ | ( |
| DDI | 3did: database of three-dimensional interacting domains, a collection of DDIs in proteins for which high-resolution 3D structures are known | NA | PHC, 3DS | NA | [ | ( |
| DDI | DOMINE: a database of known and predicted DDIs | POI | Exp, PP, FE, FUN, PRO, COM, DDI, 3DS | NA | [ | ( |
| DDI | Combines protein interaction datasets from multiple species to predict DDIs | NB, EC | FUN, PRO, GF, DDI, etc. | 4 ( | [ | NA |
| PPI | Predicting PPIs in | EC | ORT, COM, PRO | 1 ( | [ | NA |
| PPI | CitrusNet: sweet orange PPI network | KNN | DDI, ORT, COT | 1 ( | [ | ( |
| PPI | A predicted interactome for | EC | Exp, ORT, Loc | 1 | [ | NA |
| PPI | PRIN: a predicted rice interactome network | EC | FUN, PRO, COM, Exp, ORT | 1 ( | [ | ( |
| PPI | TSEMA: predicts the interaction between two families of proteins using a Monte Carlo approach | MC | PP | NA | [ | ( |
| PPI | Predicting PPI using graph invariants and a neural network | ANN | PGM, GI, PHC | NA | [ | NA |
| PPI | IID: integrated interactions database providing tissue-specific PPIs for model organisms | EC | Exp, ORT, etc. | 6 ( | [ | ( |
| PPI | FpClass: interactions and properties of human proteins | association analysis | DDI, FUN, PRO, COM, PTM, Exp, ORT, PSIP | 1 ( | [ | ( |
| PPI | PAIR: the predicted Arabidopsis interactome resource | SVM | PP, PRO, FUN, COM, Exp, DDI | 1 ( | [ | ( |
| PPI | SPPS: sequence-based protein partners search | SVM | COT | 5 ( | [ | ( |
| PPI | PIPs: human PPI prediction database | NB | Exp, ORT, DDI, Loc, PTM | 5 ( | [ | ( |
| PPI | Six classifiers and different biological data were used to predict interactions | RF, kRF, NB, DT, LR, SVM | Exp, FUN, PRO, COM, ESS, Reg, FE, GN, PP, ORT, DDI, etc. | NA | [ | NA |
| PPI | SSWRF: an ensemble of SVM and SWRF methods | SVM, SWRF | PSSM, ACH, RSA | NA | [ | NA |
| PPI | A sequence-based approach combining MCD and SVM methods | MCD, SVM | COT, SeqS | 1 ( | [ | NA |
| PPI | PrePPI: predicts PPI using both structural and nonstructural information | LR | ORT, FUN, PRO, COM, ESS, Exp, PP, etc. | 2 ( | [ | ( |
| PPI | MLPPI: multi-level machine learning prediction of PPI in yeast | SVM | 2DS&PHC (PSIP, DSSP, HQI), SEQ, etc. | 1 ( | [ | ( |
| PPI | Probabilistic model of the human PPI network | NB | PRO, Exp, ORT, DDI | 1 ( | [ | NA |
| PPI | Characterization and prediction of PPI in the yeast | LR | DDI, Fold, FE, PP, GN, Loc, PRO, Exp | 1 ( | [ | NA |
| PPI | InPrePPI method: an integrated method for prediction of PPI | AC | GCL, PP, FE, GN | 1 ( | [ | ( |
| PPI | Global genome-scale PPI network in | NB | ORT, FE, GN, PP, FUN, PRO, COM, Exp, DDI | 1 ( | [ | NA |
| PPIS | LORIS method: sequence-based L1-logreg classifier proposed to identify PPIS | L1-logreg | PSSM, ACH, RSA | NA | [ | NA |
| PPIS | Struct2Net, iWRAP & Coev2Net | PGM, LR, etc. | ORT | 3 ( | [ | ( |
| PPIS | PRISM2: protein interactions by structural matching | EC | RSA, ORT | NA | [ | ( |
| PPIS | MIEC-SVM: structure-based method for predicting protein recognition specificity | SVM | VDW, ELE, GB, SA, PTM, etc. | NA | [ | ( |
| PPIS | PSIVER method | NB, KDE | PSSM, pA | NA | [ | NA |
Points/Edges, number of interactors (proteins or domains)/number of interactions; Ref., reference resources; NA, Not Available; URL, Uniform Resource Locator (some sites are currently under maintenance); DDI, Domain–Domain Interaction; PPI, Protein–Protein Interaction; PPIS, Protein–Protein Interaction Site. Classifiers: AC, Integrated Value of the Accuracy and Coverage; ANN, Artificial Neural Network; DT, Decision Tree; EC, Evidence Counting; KDE, Kernel Density Estimation; KNN, K-nearest Neighbor; kRF, RF similarity-based k-Nearest-Neighbor; L1-logreg, L1-regularized Logistic Regression; LR, Logistic Regression; MC, Monte Carlo; MCD, Multi-scale Continuous and Discontinuous Sequence Representation Approach; NB, Naive Bayes; PGM, Probabilistic Graphical Model; POI, Prediction Overlap Index; RF, Random Forest; SVM, Support Vector Machine; SWRF, Sample-weighted Random Forest. Organisms: At, Arabidopsis thaliana; Ce, Caenorhabditis elegans; Cs, Citrus sinensis; Dm, Drosophila melanogaster; Ec, Escherichia coli; Hs, Homo sapiens; Mm, Mus musculus; Os, Oryza sativa; Rn, Rattus norvegicus; Sc, Saccharomyces cerevisiae.
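Two of the simplest combination schemes in the table, evidence counting (EC) and naive Bayes (NB), can be contrasted in a few lines. The sketch assumes each evidence type (labelled with abbreviations from the feature table, such as Exp and ORT) is a binary feature with known probabilities of being present among interacting and non-interacting pairs; in practice these would be estimated from a gold standard, and the numbers below are invented for illustration. NB multiplies per-evidence likelihood ratios under a conditional-independence assumption and converts the result into a posterior using a prior.

```python
import math
from typing import Mapping, Tuple


def evidence_count(evidence: Mapping[str, bool]) -> int:
    """Evidence counting (EC): number of evidence types that support the pair."""
    return sum(1 for present in evidence.values() if present)


def naive_bayes_log_lr(
    evidence: Mapping[str, bool], likelihoods: Mapping[str, Tuple[float, float]]
) -> float:
    """Sum of per-evidence log likelihood ratios, assuming conditional independence.

    likelihoods[name] = (P(present | interacting), P(present | non-interacting)).
    """
    total = 0.0
    for name, present in evidence.items():
        p_pos, p_neg = likelihoods[name]
        if present:
            total += math.log(p_pos / p_neg)
        else:
            total += math.log((1.0 - p_pos) / (1.0 - p_neg))
    return total


def posterior(log_lr: float, prior: float) -> float:
    """Posterior odds = prior odds x likelihood ratio, converted back to a probability."""
    odds = prior / (1.0 - prior) * math.exp(log_lr)
    return odds / (1.0 + odds)


# Invented probabilities for illustration only.
likelihoods = {"Exp": (0.6, 0.2), "ORT": (0.5, 0.1)}
pair = {"Exp": True, "ORT": True}
print(evidence_count(pair))                                              # 2
print(round(posterior(naive_bayes_log_lr(pair, likelihoods), 0.5), 4))  # 0.9375
```

Unlike EC, which weights every evidence type equally, NB lets more discriminative evidence (ORT in this invented example) contribute more, and missing evidence can lower the score rather than simply not adding to it.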