Sumit Madan, Victoria Demina, Marcus Stapf, Oliver Ernst, Holger Fröhlich.
Abstract
Predicting and understanding virus-host protein-protein interactions (PPIs) is relevant to the development of novel therapeutic interventions. In addition, virus-like particles open novel opportunities to deliver therapeutics to targeted cell types and tissues. Given our incomplete knowledge of PPIs on the one hand and the cost and time associated with experimental procedures on the other, we propose a deep learning approach to predict virus-host PPIs. Our method (Siamese Tailored deep sequence Embedding of Proteins [STEP]) is based on recent deep protein sequence embedding techniques, which we integrate into a Siamese neural network. After showing the state-of-the-art performance of STEP on external datasets, we apply it to two use cases, severe acute respiratory syndrome coronavirus 2 and John Cunningham polyomavirus, to predict virus-host PPIs. Altogether, our work highlights the potential of deep sequence embedding techniques originating from the field of natural language processing (NLP), as well as explainable artificial intelligence methods, for the analysis of biological sequences.
Keywords: John Cunningham polyomavirus major capsid protein VP1; SARS-CoV-2 spike glycoprotein; Siamese neural network; deep protein sequence embeddings; protein-protein interactions; virus-host interactions
Year: 2022 PMID: 36124304 PMCID: PMC9481957 DOI: 10.1016/j.patter.2022.100551
Source DB: PubMed Journal: Patterns (N Y) ISSN: 2666-3899
Figure 2. Architecture of our STEP model, a Siamese neural network that uses ProtBERT embeddings
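The idea behind a Siamese network over protein embeddings can be illustrated with a minimal sketch. It is not the STEP implementation: the random vectors stand in for ProtBERT embeddings, and the single shared linear layer, the element-wise product used to combine the two branches, and all names and dimensions are assumptions made for illustration only.

```python
import math
import random


def make_encoder(in_dim, out_dim, seed=0):
    """Build one encoder (linear layer + ReLU) that BOTH branches share."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def encode(vec):
        # Same weights are applied to whichever protein is passed in.
        return [max(0.0, sum(w * x for w, x in zip(row, vec)))
                for row in weights]

    return encode


def siamese_score(encode, head_weights, bias, emb_a, emb_b):
    """Interaction probability for a protein pair (hypothetical head)."""
    za, zb = encode(emb_a), encode(emb_b)
    # Assumed combination step: element-wise product of the two branches.
    combined = [a * b for a, b in zip(za, zb)]
    logit = sum(w * c for w, c in zip(head_weights, combined)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(42)
EMB_DIM, HIDDEN_DIM = 8, 4  # toy dimensions, chosen for readability

# Placeholder vectors standing in for per-protein sequence embeddings.
virus_emb = [rng.gauss(0, 1) for _ in range(EMB_DIM)]
host_emb = [rng.gauss(0, 1) for _ in range(EMB_DIM)]

encode = make_encoder(EMB_DIM, HIDDEN_DIM)
head_weights = [rng.uniform(-1, 1) for _ in range(HIDDEN_DIM)]

score = siamese_score(encode, head_weights, 0.0, virus_emb, host_emb)
swapped = siamese_score(encode, head_weights, 0.0, host_emb, virus_emb)
print(f"interaction score: {score:.4f}")
```

Because both inputs pass through the same encoder and the product is symmetric, swapping the two proteins leaves the score unchanged in this sketch; a different combination step (for example, concatenation) would not have that property.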
Overview of the results of comparative evaluation of STEP on LSTM-PHV, yeast, and human PPI datasets
| Method | AUC | AUPR | F1 | MCC |
|---|---|---|---|---|
| Comparative analysis on host-virus PPI dataset from Tsukiyama et al. | | | | |
| Tsukiyama et al. | 97.58% (±0.13%) | 93.86% (±0.35%) | 91.00% (±0.53%) | 90.30% (±0.53%) |
| STEP (ours) | 98.72% (±0.16%)∗ | 95.71% (±0.51%)∗ | 91.53% (±0.65%)∗ | 90.82% (±0.72%)∗ |
| Comparative analysis on single independent host-virus PPI test dataset from Tsukiyama et al. | | | | |
| Yang et al. | 96.30% | 81.00% | 72.40% | 69.70% |
| Tsukiyama et al. | 97.30% | 93.80% | 91.10%∗ | 90.40%∗ |
| STEP (ours) | 98.50%∗ | 94.50%∗ | 89.69% | 88.76% |
| Comparative analysis on yeast PPI dataset from Guo et al. | | | | |
| Guo et al. | NA | NA | 87.34% (±1.33%) | 75.09% (±2.51%) |
| Chen et al. | NA | NA | 97.09% (±0.23%) | 94.17% (±0.48%) |
| STEP (ours) | 99.61% (±0.10%) | 99.58% (±0.17%) | 97.37% (±0.27%)∗ | 94.77% (±0.54%)∗ |
| Comparative analysis on human PPI dataset from Sun et al. | | | | |
| Sun et al. | NA | NA | 97.15% | NA |
| STEP (ours) | 99.74% (±0.03%) | 99.66% (±0.04%) | 98.84% (±0.09%)∗ | 97.67% (±0.18%) |
NA, not available in original publication.
For the LSTM-PHV and yeast PPI datasets, we applied 5-fold cross-validation (CV), as did the authors of the respective studies. For the human PPI dataset of Sun et al., we applied 10-fold CV to train the STEP models. The highest values are marked with asterisks. More details of each experiment can be found in Tables S1–S3.
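For readers less familiar with the two threshold-based metrics reported in the table, the following sketch computes F1 and the Matthews correlation coefficient (MCC) from confusion-matrix counts using their standard definitions. The counts are made up for illustration and are not taken from any of the evaluated datasets.

```python
import math


def f1_score(tp, fp, fn):
    """F1: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient, ranging from -1 to 1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts for a balanced test set.
tp, fp, fn, tn = 90, 10, 10, 90

print(f"F1:  {f1_score(tp, fp, fn):.2%}")   # F1:  90.00%
print(f"MCC: {mcc(tp, fp, fn, tn):.2%}")    # MCC: 80.00%
```

Unlike F1, MCC also takes the true negatives into account, which is one reason the two values can differ noticeably for the same classifier.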
Figure 1. Receiver operating characteristic (ROC) curve (left) and precision-recall curve (AUPR; right) obtained by applying the STEP-brain model on unseen test data
Top 10 predicted interactions of the JCV major capsid protein VP1 and human receptors ranked by the probability obtained by our model
| Rank | Receptor protein ID | Receptor protein name | Score (in %) | Associated GO molecular function |
|---|---|---|---|---|
| 1 | | UPF0606 protein KIAA1549 | 99.31 | – |
| 2 | | SLIT and NTRK-like protein 5 | 99.09 | protein binding |
| 3 | | polycystic kidney disease protein 1-like 3 | 98.68 | calcium channel activity, sour taste receptor activity |
| 4 | | voltage-dependent L-type calcium channel subunit alpha-1F | 98.63 | high voltage-gated calcium channel activity, metal ion binding |
| 5 | | versican core protein | 98.51 | calcium ion binding, hyaluronic acid binding, glycosaminoglycan binding, extracellular matrix structural constituent conferring compression resistance |
| 6 | | receptor-type tyrosine-protein phosphatase zeta | 98.33 | protein tyrosine phosphatase activity, integrin binding, protein binding, phosphatase activity, hydrolase activity, phosphoprotein phosphatase activity, transmembrane receptor protein tyrosine phosphatase activity |
| 7 | | neuroligin-1 | 98.33 | neurexin family protein binding, signaling receptor activity, identical protein binding, cell adhesion molecule binding, scaffold protein binding, PDZ domain binding, amyloid-beta binding |
| 8 | | interphotoreceptor matrix proteoglycan 2 | 98.23 | heparin binding, hyaluronic acid binding, extracellular matrix structural constituent |
| 9 | | melanocortin receptor 3 | 98.19 | peptide hormone binding, G protein-coupled receptor activity, melanocyte-stimulating hormone receptor activity, neuropeptide binding, melanocortin receptor activity |
| 10 | | receptor-type tyrosine-protein phosphatase gamma | 98.14 | protein tyrosine phosphatase activity, identical protein binding, phosphatase activity, transmembrane receptor protein tyrosine phosphatase activity, hydrolase activity, phosphoprotein phosphatase activity |
Results for the outer-loop folds of the nested CV of the STEP-virus-host model, evaluated on test sets with a 1:1 ratio of positive to pseudo-negative instances
| Outer fold | AUC | AUPR |
|---|---|---|
| 1 | 88.17% | 89.93% |
| 2 | 86.83% | 88.62% |
| 3 | 77.03% | 77.73% |
| 4 | 82.52% | 81.67% |
| 5 | 82.56% | 82.15% |
| Mean | 83.42% (±3.91%) | 84.02% (±4.58%) |
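The summary row can be recomputed from the five outer-fold values in the table. The sketch below does so with the Python standard library; using the population standard deviation (`pstdev`) is an assumption inferred from the numbers, since it reproduces the reported ± values, whereas the sample standard deviation would give larger ones.

```python
from statistics import mean, pstdev

# Outer-fold results copied from the table above (in %).
auc = [88.17, 86.83, 77.03, 82.52, 82.56]
aupr = [89.93, 88.62, 77.73, 81.67, 82.15]


def summarize(values):
    """Return (mean, population standard deviation), rounded to 2 decimals."""
    return round(mean(values), 2), round(pstdev(values), 2)


auc_mean, auc_sd = summarize(auc)
aupr_mean, aupr_sd = summarize(aupr)

print(f"AUC:  {auc_mean:.2f}% (±{auc_sd:.2f}%)")    # AUC:  83.42% (±3.91%)
print(f"AUPR: {aupr_mean:.2f}% (±{aupr_sd:.2f}%)")  # AUPR: 84.02% (±4.58%)
```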