Abstract
Pathogen-host protein interactions are fundamental for pathogens to manipulate host signaling pathways and subvert host immune defense. For most pathogens, very few or no experimental studies have been conducted to investigate their signaling cross-talk with the host. In this study, we propose a computational framework to validate the biological assumption that human protein-protein interaction (PPI) networks alone are sufficient to infer pathogen-host PPIs via pathogen functional mimicry. Pathogen functional mimicry assumes that a pathogen functionally mimics and substitutes for host counterpart proteins in order to participate in or hijack host cellular processes. Through pathogen functional mimicry defined via gene ontology (GO) semantic similarity, we first use the known human PPIs as templates to infer pathogen-host PPIs, and the inferred PPIs are then used as training data to build an l2-regularized logistic regression model for novel pathogen-host PPI prediction. Independent tests on experimental data from human immunodeficiency virus and Francisella tularensis validate the effectiveness of the proposed pathogen functional mimicry technique. Performance comparisons also show that the proposed technique outperforms the existing pathogen sequence mimicry approaches and transfer learning methods. The proposed framework provides a new avenue for studying experimentally less-studied pathogens in the worst-case scenario where very few or no experimental pathogen-host PPIs are available. As two case studies, we apply the proposed framework to Salmonella typhimurium and human respiratory syncytial virus to reconstruct the pathogen-host PPI networks and further investigate the interference of these two pathogens with human immune signaling and the transcription regulatory system.
Keywords: GO semantic similarity; Human protein-protein interaction networks; Pathogen functional mimicry; Pathogen-host protein interaction networks; Signaling pathways; Transfer learning
Year: 2019 PMID: 31956393 PMCID: PMC6956678 DOI: 10.1016/j.csbj.2019.12.008
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
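The template-based inference step described in the abstract can be illustrated with a minimal Python sketch. It assumes a plain Jaccard overlap of GO term sets as a stand-in for the GO semantic similarity measure used in the study; the protein identifiers, GO labels, function names, and the 0.5 threshold are all hypothetical. It is not the authors' implementation.

```python
def jaccard(a, b):
    """Overlap of two GO term sets (a simple stand-in for GO semantic similarity)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def infer_pathogen_host_ppis(human_ppis, human_go, pathogen_go, threshold=0.5):
    """Infer pathogen-host pairs from human PPI templates.

    If pathogen protein p is similar enough to human protein h, p is assumed
    to mimic h functionally and inherits h's human interaction partners.
    """
    # Build an undirected adjacency map from the human PPI templates.
    partners = {}
    for a, b in human_ppis:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)

    inferred = set()
    for p, p_terms in pathogen_go.items():
        for h, h_terms in human_go.items():
            if jaccard(p_terms, h_terms) >= threshold:
                for target in partners.get(h, ()):
                    inferred.add((p, target))
    return inferred


# Toy data: every identifier and GO label below is a hypothetical placeholder.
human_ppis = [("H1", "H2"), ("H1", "H3"), ("H4", "H5")]
human_go = {
    "H1": {"go_a", "go_b"},
    "H2": {"go_c"},
    "H4": {"go_z"},
}
pathogen_go = {"P1": {"go_a", "go_b", "go_c"}}

inferred = infer_pathogen_host_ppis(human_ppis, human_go, pathogen_go)
print(sorted(inferred))
```

In this toy example, P1 shares two of its three GO labels with H1, so it clears the threshold and inherits H1's partners H2 and H3. Pairs inferred this way would serve as positive training examples for a downstream classifier such as the l2-regularized logistic regression model mentioned in the abstract.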
Fig. 1 ROC curves for 5-fold cross validation performance on human immunodeficiency virus and Francisella tularensis.
Performance estimation of 5-fold cross validation and independent test on HIV and F. tularensis.
| HIV | Size | Combined-instance | | | Homolog-instance | | | Target-instance | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | PR | SE | MCC | PR | SE | MCC | PR | SE | MCC |
| Positive | 50,060 | 0.935 | 0.9452 | 0.8856 | 0.9353 | 0.9452 | 0.8854 | 0.9352 | 0.9454 | 0.8812 |
| Negative | 50,060 | 0.9441 | 0.9336 | 0.8854 | 0.9436 | 0.9333 | 0.8851 | 0.9387 | 0.9273 | 0.8797 |
| Acc; MCC | | 93.95%; 0.8854 | | | 93.93%; 0.8852 | | | 93.68%; 0.8808 | | |
| ROC-AUC | | 0.9819 | | | 0.9818 | | | 0.9805 | | |
| F1 score | | 0.9401 | | | 0.9402 | | | 0.9403 | | |

| F. tularensis | Size | Combined-instance | | | Homolog-instance | | | Target-instance | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | PR | SE | MCC | PR | SE | MCC | PR | SE | MCC |
| Positive | 41,796 | 0.7083 | 0.9683 | 0.684 | 0.708 | 0.972 | 0.687 | 0.7323 | 0.9683 | 0.6972 |
| Negative | 41,796 | 0.9499 | 0.6012 | 0.6493 | 0.9558 | 0.6013 | 0.6531 | 0.9459 | 0.6103 | 0.6587 |
| Acc; MCC | | 78.47%; 0.6343 | | | 78.61%; 0.6363 | | | 79.79%; 0.6540 | | |
| ROC-AUC | | 0.9510 | | | 0.9541 | | | 0.9504 | | |
| F1 score | | 0.8181 | | | 0.8193 | | | 0.834 | | |

| Independent test (recognition rate) | HIV | | F. tularensis | |
|---|---|---|---|---|
| | Positive | Negative | Positive | Negative |
| | 73.09% | 91.67% | 66.47% | 60.78% |
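The per-class and overall metrics in the table can be related to confusion-matrix counts with a short Python sketch. It assumes that PR and SE denote precision and sensitivity (recall), which the table does not spell out, and it uses the standard textbook definitions; the counts in the example are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)    # PR (assumed meaning): fraction of predicted positives that are correct
    sensitivity = tp / (tp + fn)  # SE (assumed meaning): fraction of true positives recovered
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
m = binary_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Swapping the roles of the two classes (treating negatives as the class of interest) gives the corresponding negative-class row. MCC is less sensitive to class imbalance than accuracy, which may be why the table reports both.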
Fig. 2 ROC curves for 5-fold cross validation performance on human respiratory syncytial virus and Salmonella typhimurium.
Performance estimation of 5-fold cross validation and independent test on HRSV and S. typhimurium.
| HRSV | Size | Combined-instance | | | Homolog-instance | | | Target-instance | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | PR | SE | MCC | PR | SE | MCC | PR | SE | MCC |
| Positive | 1,310 | 0.9495 | 0.9908 | 0.9409 | 0.9495 | 0.9908 | 0.9408 | 0.9516 | 0.9908 | 0.9403 |
| Negative | 1,310 | 0.9904 | 0.9473 | 0.9407 | 0.9904 | 0.9472 | 0.9407 | 0.9895 | 0.9447 | 0.9399 |
| Acc; MCC | | 96.91%; 0.9400 | | | 96.90%; 0.9399 | | | 96.88%; 0.9395 | | |
| ROC-AUC | | 0.9952 | | | 0.9952 | | | 0.9948 | | |
| F1 score | | 0.9697 | | | 0.9697 | | | 0.9708 | | |

| S. typhimurium | Size | Combined-instance | | | Homolog-instance | | | Target-instance | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | PR | SE | MCC | PR | SE | MCC | PR | SE | MCC |
| Positive | 50,000 | 0.8428 | 0.943 | 0.796 | 0.8433 | 0.9431 | 0.796 | 0.8578 | 0.9434 | 0.8032 |
| Negative | 50,000 | 0.9353 | 0.824 | 0.7914 | 0.935 | 0.8236 | 0.7912 | 0.9301 | 0.828 | 0.796 |
| Acc; MCC | | 88.36%; 0.7891 | | | 88.36%; 0.7891 | | | 88.84%; 0.7972 | | |
| ROC-AUC | | 0.9545 | | | 0.9545 | | | 0.9543 | | |
| F1 score | | 0.8901 | | | 0.8904 | | | 0.8986 | | |

| Independent test (recognition rate) | HRSV | | S. typhimurium | |
|---|---|---|---|---|
| | Positive | Negative | Positive | Negative |
| | 79.31% | 96.55% | 75.81% | |
Fig. 3 Fifteen top biological processes interfered with by human respiratory syncytial virus (A) and S. typhimurium (B).
Fig. 4 Human proteins predicted to be targeted by the HRSV protein SH|P69360 (A) and the S. typhimurium protein dnaQ|P0A1G9 (B).
Fig. 5 Pathway enrichment analyses. A–B show the number of HRSV proteins that target each specific human immune signaling pathway and the number of human immune signaling pathways that each HRSV protein targets. C–D show the number of S. typhimurium proteins that target each specific human immune signaling pathway and the number of human immune signaling pathways that each S. typhimurium protein targets.
Fig. 6 Pathogen interference with host cellular transcriptional activities. A–B show the number of HRSV proteins that target each specific human gene/protein associated with gene transcription and the number of human transcriptional genes/proteins that each HRSV protein targets. C–D show the number of S. typhimurium proteins that target each specific human gene/protein associated with gene transcription and the number of human transcriptional genes/proteins that each S. typhimurium protein targets. For clarity, only the top 25 transcription-associated human proteins are illustrated in (A) and (C).
Fig. 7 Performance comparison with the existing methods.