Xiaotian Hu, Cong Feng, Tianyi Ling, Ming Chen.
Abstract
Protein-protein interactions (PPIs) play key roles in a broad range of biological processes. Disruption of PPIs often causes physical and mental diseases, which has made PPIs a focus of research on disease mechanisms and clinical treatment. A large number of PPIs have been identified by in vivo and in vitro experimental techniques, and the growing scale of PPI data, together with the inherent complexity of interaction mechanisms, has encouraged increasing use of computational methods to predict PPIs. In recent years, deep learning has played an increasingly important role in machine learning owing to its remarkable capacity for non-linear transformation. In this article, we present a comprehensive introduction to deep learning in PPI prediction, covering the diverse learning architectures, benchmarks and extended applications.
Keywords: Biological prediction; Deep learning; Feature embedding; Protein–protein interaction
Year: 2022 PMID: 35832624 PMCID: PMC9249595 DOI: 10.1016/j.csbj.2022.06.025
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1. Timeline for computational PPI prediction methods.
Typical protein–protein interaction databases for deep learning models.
| Database | Proteins | Interactions | Organisms | URL | Confidence scores | Type of information | Used by |
|---|---|---|---|---|---|---|---|
| DIP | 28,850 | 81,923 | | | Unavailable | Interactions | DeepPPI, DPPI, PIPR, DeepFE-PPI, Liu’s work, DeepTrio, FSFDW, TAGPPI, S-VGAE |
| HPRD | 30,047 | 41,327 | | | Unavailable | Interactions, disease associations, domain annotations | DeepFE-PPI, DeepPPI, S-VGAE |
| HIPPIE | 17,000 | 273,900 | | | Available | Interactions, disease associations | DPPI, Liu’s work |
| BioGRID | 82,082 | 1,244,672 | | | Unavailable | Interactions, GO associations | DeepTrio, D-SCRIPT |
| STRING | 67,592,464 | 296,567,750 | | | Available | Interactions | PIPR, D-SCRIPT, MTT, TAGPPI |
| IntAct | 118,759 | 1,184,144 | | | Available | Interactions | MTT |
| HPIDB | 16,332 | 69,787 | Hosts and pathogens | | Unavailable | Interactions, host and pathogen associations | DeepViral, TransPPI |
| MINT | 27,069 | 132,249 | | | Available | Interactions | |
| RCSB PDB | 128,685 | NA | | | Unavailable | Complexes, structures, disease associations | CAMP, TransPPI |
NA, not available from the original paper.
Fig. 2. Overall deep learning framework for PPI prediction.
Recently proposed deep learning methods for PPI prediction.
| Method | Year | Main learning structure | Sources of input feature | Encoding method | Combining method |
|---|---|---|---|---|---|
| DeepPPI | 2017 | Multilayer Perceptron | Protein sequences | Seven sequence-based features (like amino acid composition) | Concatenation |
| DPPI | 2018 | Convolutional Neural Networks | Protein sequences | Protein position specific scoring matrices (PSSM) derived by PSI-BLAST | Element-wise multiplication |
| DeepFE-PPI | 2019 | Multilayer Perceptron | Protein sequences | Pre-trained model embedding (Word2vec) | Concatenation |
| PIPR | 2019 | Bidirectional Gated Recurrent Unit and Convolutional Neural Networks | Protein sequences | Pre-trained model embedding (Skip-Gram) | Element-wise multiplication |
| S-VGAE | 2020 | Graph Convolutional Neural Networks | Protein sequences and topology information of PPI networks | Conjoint triad (CT) method | Concatenation |
| Liu’s work | 2020 | Graph Convolutional Neural Networks | Protein sequences and topology information of PPI networks | One-hot encoding | Concatenation |
| DeepViral | 2021 | Word2Vec model and Convolutional Neural Networks | Protein sequences, phenotypes associated with human genes and pathogens, and the Gene Ontology annotations of human proteins | DL2Vec embedding model | Dot product |
| FSNN-LGBM | 2021 | Multilayer Perceptron | Protein sequences | Pseudo amino acid composition (PseAAC) and conjoint triad (CT) methods | Element-wise multiplication |
| TransPPI | 2021 | Convolutional Neural Networks | Protein sequences | Protein position specific scoring matrices (PSSM) derived by PSI-BLAST | Concatenation |
| DeepTrio | 2021 | Convolutional Neural Networks | Protein sequences | Trainable symbol lexicon embedding | Element-wise addition |
| FSFDW | 2021 | Skip-Gram (DeepWalk) | Protein sequences and topology information of PPI networks | Sequence-based features selected by Louvain method and Term variance | Element-wise multiplication |
| NXTfusion | 2021 | Multilayer Perceptron | Protein-Protein, Protein-Domain, Protein-Tissue and Protein-Disease relations | One-hot encoding | Bilinear transformation |
| MTT | 2021 | Multilayer Perceptron | Protein sequences | Pre-trained model embedding (UniRep) | Element-wise multiplication |
| CAMP | 2021 | Convolutional Neural Networks and Self-attention | Protein sequences, secondary structures, polarity, and hydropathy properties | Protein position specific scoring matrices (PSSM) calculated by PSI-BLAST and trainable symbol lexicon embedding | Concatenation |
| D-SCRIPT | 2021 | Broadcast subtraction and multiplication, and Convolutional Neural Networks | Protein sequences | Pre-trained model embedding (Bepler and Berger’s work) | Broadcast subtraction and broadcast multiplication |
| TAGPPI | 2022 | Convolutional Neural Networks and Graph attention networks | Protein sequences and structures | Pre-trained model embedding (SeqVec) | Concatenation |
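
The combining methods listed in the last column can be illustrated on two toy embedding vectors. The following is a minimal sketch in plain Python, not the implementation used by any of the listed methods. The function names, the toy vectors and the weight matrix `W` are hypothetical, and the bilinear form shown (u^T W v, yielding a single scalar) is a generic definition assumed here for illustration. The broadcast operations used by D-SCRIPT are not covered.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def _check_same_length(u: Vector, v: Vector) -> None:
    """Element-wise operations require embeddings of equal length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")


def concatenate(u: Vector, v: Vector) -> Vector:
    """Concatenation: the pair vector is [u ; v] (order-dependent)."""
    return list(u) + list(v)


def elementwise_multiply(u: Vector, v: Vector) -> Vector:
    """Element-wise (Hadamard) multiplication of two embeddings."""
    _check_same_length(u, v)
    return [a * b for a, b in zip(u, v)]


def elementwise_add(u: Vector, v: Vector) -> Vector:
    """Element-wise addition of two embeddings."""
    _check_same_length(u, v)
    return [a + b for a, b in zip(u, v)]


def dot_product(u: Vector, v: Vector) -> float:
    """Dot product: reduces the pair to a single similarity score."""
    return sum(elementwise_multiply(u, v))


def bilinear(u: Vector, W: Matrix, v: Vector) -> float:
    """Generic bilinear form u^T W v (assumed definition, one output unit)."""
    if len(W) != len(u) or any(len(row) != len(v) for row in W):
        raise ValueError("W must have shape (len(u), len(v))")
    return sum(
        u[i] * W[i][j] * v[j]
        for i in range(len(u))
        for j in range(len(v))
    )


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings of two proteins.
    protein_a = [1.0, 2.0, 3.0]
    protein_b = [4.0, 5.0, 6.0]
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

    print(concatenate(protein_a, protein_b))
    print(elementwise_multiply(protein_a, protein_b))
    print(elementwise_add(protein_a, protein_b))
    print(dot_product(protein_a, protein_b))
    print(bilinear(protein_a, identity, protein_b))
```

Element-wise multiplication, element-wise addition and the dot product give the same result when the two proteins are swapped, whereas concatenation depends on the input order.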
The reported performance and efficiency of PPI deep learning methods.
| Method | Acc. (%) | Prec. (%) | Sen. (%) | Spec. (%) | F1 (%) | MCC (%) | AUC | AUPRC | Training time | Training environment | Benchmark |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepPPI | 94.43 | 96.65 | 92.06 | NA | NA | 88.97 | NA | NA | 369 s | Intel Xeon E2520 CPU with 16 GB memory | |
| DPPI | 94.55 | 96.68 | 92.24 | NA | NA | NA | NA | NA | NA | 32 AMD 6272 CPUs | |
| DeepFE-PPI | 94.78 | 96.45 | 92.99 | NA | NA | 89.62 | NA | NA | 1008 s | Intel Core i5-7400 with 16 GB memory | |
| PIPR | 97.09 | 97.00 | 97.17 | 97.00 | 97.09 | 94.17 | NA | NA | 150 s | NVIDIA GeForce GTX 1080 Ti GPU | |
| S-VGAE | 99.15 | 98.90 | 99.41 | 98.89 | 99.15 | NA | NA | NA | NA | NVIDIA GeForce GTX 1080 GPU with 7 GB memory | |
| Liu’s work | 95.33 | 97.02 | 93.55 | NA | NA | NA | NA | NA | NA | NA | |
| DeepViral | NA | NA | NA | NA | NA | NA | 0.800 | NA | NA | NVIDIA Tesla V100 GPU | Host and pathogen PPIs from HPIDB |
| FSNN-LGBM | 98.70 | 99.11 | 98.28 | 99.12 | NA | 97.41 | 0.997 | NA | NA | NA | |
| DeepTrio | 97.55 | 98.95 | 96.12 | 98.98 | 97.52 | 95.15 | NA | NA | NA | NVIDIA Tesla P100 GPU with 16 GB memory | |
| FSFDW | NA | NA | NA | NA | NA | NA | 0.794 | NA | NA | NA | |
| NXTfusion | NA | NA | NA | NA | NA | NA | 0.988 | 0.778 | NA | NA | |
| MTT | NA | 93.53 | 94.05 | NA | 93.79 | NA | 0.980 | 0.980 | NA | NVIDIA GeForce GTX 1080 Ti GPU with 11 GB memory | VirusMINT database |
| CAMP | NA | NA | NA | NA | NA | NA | 0.872 | 0.641 | 2 h | 48 CPU cores and one NVIDIA GeForce GTX 1080 Ti GPU | Protein-peptide interactions from the RCSB PDB and DrugBank |
| D-SCRIPT | NA | 72.8 | 27.8 | NA | NA | NA | 0.833 | 0.516 | 3 days | A single 32 GB GPU | |
| TAGPPI | 97.81 | 98.10 | 98.26 | 98.10 | 97.80 | 95.63 | 0.977 | NA | NA | NVIDIA TITAN RTX with 24 GB memory | |
NA, not available from the original paper.