Xiaodi Yang, Shiping Yang, Panyu Ren, Stefan Wuchty, Ziding Zhang.
Abstract
Identifying human-virus protein-protein interactions (PPIs) is an essential step for understanding viral infection mechanisms and the antiviral response of the human host. Recent advances in high-throughput experimental techniques have enabled a substantial accumulation of human-virus PPI data, which has in turn fueled the development of machine learning-based human-virus PPI prediction methods. Deep learning is emerging as a very promising approach to predicting human-virus PPIs: it can integrate large-scale datasets, learn complex sequence-structure relationships of proteins, and convert the learned patterns into final prediction models with high accuracy. Focusing on recent progress in deep learning-powered human-virus PPI prediction, we review the technical details of these newly developed methods, including dataset preparation, deep learning architectures, feature engineering, and performance assessment. Moreover, we discuss the current challenges and potential solutions, and provide future perspectives on human-virus PPI prediction in the coming post-AlphaFold2 era.
Keywords: deep learning; human-virus protein-protein interactions; machine learning; prediction; transfer learning
Year: 2022 PMID: 35495666 PMCID: PMC9051481 DOI: 10.3389/fmicb.2022.842976
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 6.064
FIGURE 1 (A) Workflow of human-virus PPI prediction, covering dataset construction, feature engineering, model construction, and performance assessment. ROC indicates the receiver operating characteristic curve and PR indicates the precision-recall curve. (B) Transfer learning for the human-virus PPI prediction task. H, V, and P1 represent a human protein, a viral protein, and a single protein, respectively.
Existing deep learning methods for predicting human-virus PPIs.
| Method | Virus species | Input information | Embedding approach | Model architecture | Number of positive/negative samples | Negative sampling | URL |
|---|---|---|---|---|---|---|---|
| TransPPI | Multiple viruses | Protein sequences | PSSM | CNN + MLP + transfer learning | 31,381/313,810 | Dissimilarity-based negative sampling | |
| DeepViral | 14 viral families | Protein sequences, functions, and disease phenotypes | one-hot and node2vec | CNN + MLP | 24,678/246,780 | Random sampling | |
| LSTM-PHV | All viruses | Protein sequences | word2vec | LSTM + MLP | 22,383/223,830 | Dissimilarity-based negative sampling | |
| DeepVHPPI | Multiple viruses | Protein sequences | one-hot | CNN + MLP + transfer learning | 22,653/226,530 | Dissimilarity-based negative sampling | |
| MTT | Multiple viruses | Protein sequences | mLSTM | MLP + transfer learning | Multiple settings | Multiple settings | |
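
The positive/negative counts listed in the table all correspond to a 1:10 ratio (for example, 24,678 positives against 246,780 negatives). As an illustration only, the random sampling strategy listed for DeepViral could be sketched as below. This is not the implementation of any method in the table: the function name `sample_random_negatives`, its parameters, and the toy protein identifiers are hypothetical, and dissimilarity-based negative sampling, which most of the other methods use, is not shown.

```python
import random


def sample_random_negatives(positives, human_proteins, viral_proteins,
                            ratio=10, seed=0):
    """Draw ratio * (number of positives) random human-virus pairs.

    Hypothetical helper: a pair counts as a negative simply because it is
    absent from the known positives, which is the assumption behind
    random negative sampling.
    """
    rng = random.Random(seed)
    humans = sorted(set(human_proteins))
    viruses = sorted(set(viral_proteins))
    human_set, virus_set = set(humans), set(viruses)
    known = set(positives)
    target = ratio * len(known)

    # Count the unlabeled pairs so the sampling loop is guaranteed to finish.
    covered = sum(1 for h, v in known if h in human_set and v in virus_set)
    available = len(humans) * len(viruses) - covered
    if target > available:
        raise ValueError("not enough unlabeled pairs for the requested ratio")

    negatives = set()
    while len(negatives) < target:
        pair = (rng.choice(humans), rng.choice(viruses))
        if pair not in known:  # reject known interactions
            negatives.add(pair)
    return sorted(negatives)


# Toy usage with made-up identifiers: 2 positives yield 20 negatives.
toy_positives = [("H1", "V1"), ("H2", "V2")]
toy_humans = ["H1", "H2", "H3", "H4", "H5", "H6"]
toy_viruses = ["V1", "V2", "V3", "V4", "V5"]
toy_negatives = sample_random_negatives(toy_positives, toy_humans, toy_viruses)
```

Fixing the seed keeps the sampled negatives reproducible across runs, which matters when methods are compared on the same dataset.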