Xiang Zhou, Byungkyu Park, Daesik Choi, Kyungsook Han.
Abstract
BACKGROUND: Viral infection involves a large number of protein-protein interactions (PPIs) between a virus and its host. These interactions range from the initial binding of viral coat proteins to host membrane receptors to the hijacking of the host transcription machinery by viral proteins. Therefore, identifying PPIs between a virus and its host helps in understanding the mechanism of viral infection and in designing antiviral drugs. Many computational methods have been developed to predict PPIs, but most of them are intended for PPIs within a species rather than PPIs across different species, such as those between a virus and its host.
Keywords: Interspecies protein-protein interaction; Prediction model; Virus and host
Year: 2018 PMID: 30367586 PMCID: PMC6101077 DOI: 10.1186/s12864-018-4924-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
The number of known host–virus PPIs and viruses interacting with a host
| Host classification | Major hosts (taxonomy ID) | #Host-virus PPIs | #Interacting virus taxonomy IDs |
|---|---|---|---|
| Human | Homo sapiens (9606) | 11,491 | 246 |
| Non-human animal | Mus musculus (10090) | 191 | 89 |
| Non-human animal | Bos taurus (9913) | 125 | 32 |
| Non-human animal | Rattus norvegicus (10116) | 86 | 19 |
| Non-human animal | Sus scrofa (9823) | 57 | 10 |
| Non-human animal | Gallus gallus (9031) | 15 | 9 |
| Non-human animal | Equus caballus (9796) | 7 | 6 |
| Non-human animal | Drosophila melanogaster (7227) | 4 | 3 |
| Non-human animal | Canis lupus familiaris (9615) | 3 | 1 |
| Plant | Arabidopsis thaliana (3702) | 17 | 11 |
| Bacteria | Escherichia coli K-12 (83333) | 78 | 9 |
| Bacteria | Streptococcus pneumoniae (170187) | 49 | 2 |
| Bacteria | Pseudomonas aeruginosa (208963) | 13 | 4 |
| Bacteria | Escherichia coli (562) | 3 | 1 |
| Others | 15 hosts | 18 | 15 |
| Total | 29 hosts | 12,157 | 332* |
332*: the total number of non-redundant viruses in terms of taxonomy IDs
Fig. 1 (a) Training dataset 1 (TR1): 10,955 PPIs between human and any virus except H1N1. Test dataset 1 (TS1): 381 PPIs between human and H1N1 virus. Training dataset 2 (TR2): 11,341 PPIs between human and any virus except Ebola virus. Test dataset 2 (TS2): 150 PPIs between human and Ebola virus. (b) Training dataset 3 (TR3): 11,617 PPIs between any host and any virus except H1N1. Test dataset 1 (TS1): 381 PPIs between human and H1N1 virus. Training dataset 4 (TR4): 12,007 PPIs between any host and any virus except Ebola virus. Test dataset 2 (TS2): 150 PPIs between human and Ebola virus
Fig. 2 Training dataset 5 (TR5): 11,491 PPIs between human and any virus. Test dataset 5.1 (TS5.1): 488 PPIs between non-human animal and any virus. Test dataset 5.2 (TS5.2): 17 PPIs between plant and any virus. Test dataset 5.3 (TS5.3): 143 PPIs between bacteria and any virus. Test dataset 5.4 (TS5.4): 666 PPIs between non-human host and any virus (combined set of test datasets 5.1, 5.2, 5.3 and 18 PPIs with 15 other hosts)
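The two splitting schemes described in the captions for Fig. 1 and Fig. 2 (hold out one virus, or hold out all non-human hosts) can be illustrated with a minimal sketch. The `PPI` record layout, the function names, the toy records, and the choice to discard held-out-virus interactions with non-test hosts are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import NamedTuple, Optional, Set

class PPI(NamedTuple):
    """Hypothetical record layout for one host-virus interaction."""
    host_taxid: int      # e.g. 9606 for Homo sapiens
    virus: str           # illustrative virus label, e.g. "H1N1"
    host_protein: str
    virus_protein: str

def split_by_virus(ppis, held_out_virus: str,
                   train_hosts: Optional[Set[int]] = None,
                   test_host: int = 9606):
    """Hold out one virus: test on its PPIs with `test_host`, train on the rest.

    train_hosts=None means "any host" (as in TR3/TR4); {9606} restricts
    training to human PPIs (as in TR1/TR2).
    """
    train, test = [], []
    for p in ppis:
        if p.virus == held_out_virus:
            if p.host_taxid == test_host:
                test.append(p)
            # Assumption: held-out-virus PPIs with other hosts are discarded.
        elif train_hosts is None or p.host_taxid in train_hosts:
            train.append(p)
    return train, test

def split_by_host(ppis, train_host: int = 9606):
    """Train on one host (human by default), test on every other host (TR5/TS5.4 style)."""
    train = [p for p in ppis if p.host_taxid == train_host]
    test = [p for p in ppis if p.host_taxid != train_host]
    return train, test

# Toy records only; protein names are placeholders.
toy = [
    PPI(9606, "H1N1", "hA", "vA"),
    PPI(9606, "Ebola", "hB", "vB"),
    PPI(9606, "HIV-1", "hC", "vC"),
    PPI(10090, "H1N1", "mA", "vA"),
    PPI(10090, "HIV-1", "mB", "vC"),
]
tr1, ts1 = split_by_virus(toy, "H1N1", train_hosts={9606})  # TR1/TS1 style
tr3, ts1_b = split_by_virus(toy, "H1N1")                     # TR3/TS1 style
tr5, ts5 = split_by_host(toy)                                # TR5/TS5.4 style
```

Splitting by virus or host, rather than at random, is what lets the test sets probe viruses or hosts that the model never saw during training.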
The average sequence similarity between proteins in training datasets and those in test datasets
| Proteins in training datasets | Target proteins in test datasets | Average sequence similarity |
|---|---|---|
| 766 virus proteins in TR1, TR3 | 11 H1N1 virus proteins in TS1 | 9.6% |
| 774 virus proteins in TR2, TR4 | 3 Ebola virus proteins in TS2 | 10.9% |
| 3,924 human proteins in TR5 | 368 non-human animal proteins in TS5.1 | 10.7% |
| 3,924 human proteins in TR5 | 13 plant proteins in TS5.2 | 10.6% |
| 3,924 human proteins in TR5 | 106 bacteria proteins in TS5.3 | 10.4% |
Fig. 3 An example of a feature vector for a pair of host and virus proteins. RFAT: relative frequency of amino acid triplets. FDAT: frequency difference of amino acid triplets between virus and host proteins. AC: amino acid composition. A pair of host and virus proteins is represented by a feature vector with 1175 elements
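To give a rough feel for the kind of sequence-derived features named in the Fig. 3 caption, the sketch below computes an amino acid composition and triplet frequencies for toy sequences. It is only an illustration: the caption does not give the exact definitions, normalization, or any amino acid grouping, so the function names, the use of the raw 20-letter alphabet, and the "virus minus host" sign convention are all assumptions, and the output does not reproduce the 1175-element vector.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(seq: str) -> list:
    """Fraction of each standard amino acid in `seq` (20 values)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]

def triplet_frequencies(seq: str) -> dict:
    """Relative frequency of each overlapping residue triplet seen in `seq`."""
    triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
    n = len(triplets)
    return {t: c / n for t, c in Counter(triplets).items()} if n else {}

def triplet_frequency_difference(host_seq: str, virus_seq: str) -> dict:
    """Per-triplet difference (virus minus host; the sign is an assumption)."""
    host = triplet_frequencies(host_seq)
    virus = triplet_frequencies(virus_seq)
    return {t: virus.get(t, 0.0) - host.get(t, 0.0)
            for t in set(host) | set(virus)}

# Toy sequences, not real proteins.
ac = amino_acid_composition("AAC")
diff = triplet_frequency_difference("AAAA", "AAAC")
```

A full feature vector for a protein pair would concatenate such per-protein and pairwise values in a fixed order.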
Results of 10-fold cross validation of the SVM model on 12,157 PPIs between any host and any virus, with different ratios of positive to negative instances
| P:N | Dataset | SN(%) | SP(%) | ACC(%) | PPV(%) | NPV(%) | MCC | AUC |
|---|---|---|---|---|---|---|---|---|
| 1:1 | 1 | 84.93 | 86.03 | 85.48 | 85.87 | 85.09 | 0.709 | 0.926 |
| 1:1 | 2 | 84.92 | 86.06 | 85.49 | 85.89 | 85.09 | 0.701 | 0.926 |
| 1:1 | 3 | 85.36 | 85.92 | 85.64 | 85.84 | 85.44 | 0.712 | 0.925 |
| 1:1 | mean ± SD | 85.07 ± 0.3 | 86.00 ± 0.1 | 85.54 ± 0.1 | 85.87 ± 0.0 | 85.21 ± 0.2 | 0.71 ± 0.0 | 0.93 ± 0.0 |
| 1:2 | 1 | 78.91 | 91.17 | 87.08 | 81.72 | 89.64 | 0.707 | 0.923 |
| 1:2 | 2 | 78.29 | 91.03 | 86.78 | 81.36 | 89.34 | 0.700 | 0.921 |
| 1:2 | 3 | 78.22 | 91.18 | 86.86 | 81.59 | 89.33 | 0.701 | 0.920 |
| 1:2 | mean ± SD | 78.47 ± 0.4 | 91.13 ± 0.1 | 86.91 ± 0.2 | 81.56 ± 0.2 | 89.44 ± 0.2 | 0.70 ± 0.0 | 0.92 ± 0.0 |
| 1:3 | 1 | 74.55 | 93.32 | 88.63 | 78.82 | 91.66 | 0.691 | 0.920 |
| 1:3 | 2 | 74.61 | 93.56 | 88.82 | 79.43 | 91.70 | 0.696 | 0.919 |
| 1:3 | 3 | 74.62 | 93.41 | 88.72 | 79.07 | 91.69 | 0.693 | 0.920 |
| 1:3 | mean ± SD | 74.59 ± 0.0 | 93.43 ± 0.1 | 88.72 ± 0.1 | 79.11 ± 0.3 | 91.68 ± 0.0 | 0.69 ± 0.0 | 0.92 ± 0.0 |
SN: sensitivity, SP: specificity, ACC: accuracy, PPV: positive predictive value, NPV: negative predictive value, MCC: Matthews correlation coefficient, AUC: the area under the ROC
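As a quick check on how the "mean ± SD" rows summarize the three datasets, the sketch below recomputes the SN summary for the 1:1 ratio from the three reported values. The use of the sample standard deviation (n − 1 denominator) is an assumption; it happens to agree with the reported 85.07 ± 0.3, but the source does not state which estimator was used.

```python
from statistics import mean, stdev

# SN(%) for datasets 1, 2 and 3 at P:N = 1:1, copied from the table above.
sn_values = [84.93, 84.92, 85.36]

sn_mean = mean(sn_values)   # arithmetic mean of the three runs
sn_sd = stdev(sn_values)    # sample standard deviation (assumed estimator)

print(f"{sn_mean:.2f} ± {sn_sd:.1f}")
```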
Results of 10-fold cross validation with datasets of virus-host PPIs using different combinations of features
| Features | SN(%) | SP(%) | ACC(%) | PPV(%) | NPV(%) | MCC | AUC |
|---|---|---|---|---|---|---|---|
| RFAT | 82.85 | 84.04 | 83.45 | 83.84 | 83.05 | 0.668 | 0.903 |
| FDAT | 68.34 | 57.84 | 63.11 | 61.86 | 64.65 | 0.264 | 0.689 |
| AC | 59.85 | 68.11 | 63.98 | 65.24 | 62.92 | 0.281 | 0.698 |
| Composition | 71.79 | 55.79 | 63.79 | 61.89 | 66.42 | 0.279 | 0.685 |
| Transition | 74.05 | 55.72 | 64.88 | 62.58 | 68.23 | 0.302 | 0.713 |
| Distribution | 71.79 | 31.55 | 51.67 | 51.19 | 52.80 | 0.036 | 0.515 |
| RFAT+FDAT+AC | 84.73 | 85.62 | 85.18 | 85.49 | 84.86 | 0.703 | 0.920 |
| Composition+Transition+Distribution | 76.51 | 61.72 | 69.12 | 66.65 | 72.43 | 0.386 | 0.787 |
| All 6 features | 85.36 | 85.92 | 85.64 | 85.84 | 85.44 | 0.712 | 0.925 |
SN: sensitivity, SP: specificity, ACC: accuracy, PPV: positive predictive value, NPV: negative predictive value, MCC: Matthews correlation coefficient, AUC: area under the ROC
Results of testing the prediction model on PPIs of new viruses
| Dataset | SN(%) | SP(%) | ACC(%) | PPV(%) | NPV(%) | MCC | AUC |
|---|---|---|---|---|---|---|---|
| TR1–TS1 | 89.76 | 66.14 | 77.95 | 72.61 | 86.60 | 0.575 | 0.886 |
| TR2–TS2 | 90.67 | 65.33 | 78.00 | 72.34 | 87.50 | 0.579 | 0.867 |
| TR3–TS1 | 88.98 | 65.88 | 77.43 | 72.28 | 85.67 | 0.564 | 0.884 |
| TR4–TS2 | 94.67 | 68.67 | 81.67 | 75.13 | 92.79 | 0.656 | 0.890 |
TR1: training dataset of PPIs between human and any virus except H1N1. TS1: test dataset of PPIs between human and H1N1 virus. TR2: training dataset of PPIs between human and any virus except Ebola virus. TS2: test dataset of PPIs between human and Ebola virus. TR3: training dataset of PPIs between any host and any virus except H1N1. TR4: training dataset of PPIs between any host and any virus except Ebola virus. SN: sensitivity, SP: specificity, ACC: accuracy, PPV: positive predictive value, NPV: negative predictive value, MCC: Matthews correlation coefficient, AUC: area under the ROC
Results of testing the prediction models trained with human-virus PPIs (TR5) on PPIs of new hosts
| Dataset | SN(%) | SP(%) | ACC(%) | PPV(%) | NPV(%) | MCC | AUC |
|---|---|---|---|---|---|---|---|
| TR5–TS5.1 | 66.39 | 65.98 | 66.19 | 66.12 | 66.26 | 0.324 | 0.733 |
| TR5–TS5.2 | 76.47 | 58.82 | 67.65 | 65.00 | 71.43 | 0.359 | 0.761 |
| TR5–TS5.3 | 59.44 | 74.83 | 67.13 | 70.25 | 64.85 | 0.347 | 0.736 |
| TR5–TS5.4 | 64.87 | 67.87 | 66.37 | 66.87 | 65.89 | 0.327 | 0.731 |
TS5.1: test dataset of PPIs between non-human animal and any virus. TS5.2: test dataset of PPIs between plant and any virus. TS5.3: test dataset of PPIs between bacteria and any virus. TS5.4: test dataset of PPIs between any non-human host (non-human animal, plant, bacteria and 15 other hosts) and any virus. SN: sensitivity, SP: specificity, ACC: accuracy, PPV: positive predictive value, NPV: negative predictive value, MCC: Matthews correlation coefficient, AUC: the area under the ROC
Results of testing our SVM and DeNovo’s SVM [6] on DeNovo’s dataset of 425 positive and 425 negative PPIs
| Method | SN(%) | SP(%) | ACC(%) | PPV(%) | NPV(%) | MCC | AUC |
|---|---|---|---|---|---|---|---|
| Our SVM | 80.00 | 88.94 | 84.47 | 87.86 | 81.64 | 0.692 | 0.897 |
| DeNovo’s SVM | 80.71 | 83.06 | 81.90 | – | – | – | – |
SN: sensitivity, SP: specificity, ACC: accuracy, PPV: positive predictive value, NPV: negative predictive value, MCC: Matthews correlation coefficient, AUC: the area under the ROC, “–”: not available
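The abbreviations used in these tables follow the standard confusion-matrix definitions, which can be sketched as below. The function name is hypothetical, and the counts (TP = 340, FN = 85, TN = 378, FP = 47) are not reported in the source: they are back-calculated here from the "Our SVM" SN and SP on 425 positive and 425 negative PPIs, purely as an illustration.

```python
from math import sqrt

def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SN": tp / (tp + fn),                    # sensitivity (recall)
        "SP": tn / (tn + fp),                    # specificity
        "ACC": (tp + tn) / (tp + fn + tn + fp),  # accuracy
        "PPV": tp / (tp + fp),                   # positive predictive value
        "NPV": tn / (tn + fn),                   # negative predictive value
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Back-calculated counts (an assumption, see the note above).
m = classification_metrics(tp=340, fn=85, tn=378, fp=47)
for name, value in m.items():
    print(name, round(value, 4))
```

With these counts, the percentages round to the SN, SP, ACC, PPV, NPV and MCC values listed for "Our SVM". AUC cannot be derived from a single confusion matrix, since it depends on the ranking of prediction scores.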
Results of 5-fold cross validation of our SVM and Barman’s SVM [14] with Barman’s dataset of 1035 positive and 1035 negative PPIs
| Method | SN(%) | SP(%) | ACC(%) | PPV(%) | NPV(%) | MCC | AUC | F1(%) |
|---|---|---|---|---|---|---|---|---|
| Our SVM | 76.14 | 83.77 | 79.95 | 82.46 | 77.80 | 0.601 | 0.858 | 79.17 |
| Barman’s SVM | 67.00 | 74.00 | 71.00 | 72.00 | – | 0.440 | 0.730 | 69.41 |
SN: sensitivity, SP: specificity, ACC: accuracy, PPV: positive predictive value, NPV: negative predictive value, MCC: Matthews correlation coefficient, AUC: the area under the ROC, F1 = 2 × (SN × PPV)/(SN + PPV), “–”: not available
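The F1 definition in the footnote can be checked directly against the SN and PPV columns of the table. The helper name below is hypothetical; the inputs are the percentages reported for each method.

```python
def f1_from_sn_ppv(sn: float, ppv: float) -> float:
    """F1 as the harmonic mean of sensitivity (SN) and precision (PPV)."""
    return 2 * sn * ppv / (sn + ppv)

f1_ours = f1_from_sn_ppv(76.14, 82.46)    # "Our SVM" row
f1_barman = f1_from_sn_ppv(67.00, 72.00)  # "Barman's SVM" row

print(round(f1_ours, 2), round(f1_barman, 2))
```

Both results round to the F1(%) values listed in the table (79.17 and 69.41).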