Lei Yang, Yukun Han, Huixue Zhang, Wenlong Li, Yu Dai.
Abstract
Protein-protein interactions (PPIs) underlie almost all cellular processes, including metabolic cycles, DNA transcription and replication, and signaling cascades. Experimental methods for identifying PPIs are time-consuming and expensive, so computational approaches for predicting PPIs are important. In this paper, an improved machine-learning model for PPI prediction is proposed. Taking into account the factors that affect prediction, a feature extraction and fusion method is proposed to broaden the variety of features considered. In addition, to account for the effect of the input order of the two proteins, we propose a "Y-type" Bi-RNN model and train the network in both the forward and the backward direction. To limit the extra training time incurred by the additional pass, whether backward or forward, this paper proposes a weight-sharing policy that minimizes the number of parameters to be trained. The experimental results show that the proposed method achieves an accuracy of 99.57%, recall of 99.36%, sensitivity of 99.76%, precision of 99.74%, MCC of 99.14%, and AUC of 99.56% on the benchmark dataset.
Year: 2020 PMID: 32626745 PMCID: PMC7312734 DOI: 10.1155/2020/5072520
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Benchmark dataset.
| Dataset | Positive samples | Negative samples | Total |
|---|---|---|---|
| Benchmark set | 29071 | 31496 | 60567 |
| Training set | 26128 | 28439 | 54567 |
| Hold-out test set | 2943 | 3057 | 6000 |
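The paper does not publish its split procedure; the roughly 90/10 hold-out above can be sketched as a stratified split that preserves the positive/negative ratio per class. The function name and the toy label counts below are illustrative assumptions, not the authors' code:

```python
import random

def stratified_split(labels, test_size=0.1, seed=0):
    """Split sample indices into train/test sets, keeping class balance.

    `labels` is a list of 0/1 interaction labels; returns
    (train_idx, test_idx) as sorted index lists.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)                      # shuffle within each class
        n_test = round(len(idx) * test_size)  # per-class test quota
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels mimicking the benchmark's slight positive/negative imbalance.
labels = [1] * 29 + [0] * 31
train, test = stratified_split(labels, test_size=0.1)
```

Scaling the toy counts by 1000 reproduces the table's proportions: about 90% of each class goes to training and 10% to the hold-out set.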
Species dataset.
| Species | Set | Positive samples | Negative samples | Total |
|---|---|---|---|---|
| | Original set | 37027 | 37027 | 74054 |
| | Training set | 29622 | 29622 | 59244 |
| | Testing set | 7405 | 7405 | 14810 |
| | Original set | 5943 | 5943 | 11886 |
| | Training set | 4754 | 4754 | 9508 |
| | Testing set | 1189 | 1189 | 2378 |
| | Original set | 6954 | 6954 | 13908 |
| | Training set | 5023 | 5023 | 10046 |
| | Testing set | 1931 | 1931 | 3862 |
| | Original set | 4030 | 4030 | 8060 |
| | Training set | 3224 | 3224 | 6448 |
| | Testing set | 806 | 806 | 1612 |
| | Original set | 1458 | 1458 | 2916 |
| | Training set | 1116 | 1116 | 2232 |
| | Testing set | 342 | 342 | 684 |
| | Original set | 22683 | — | 22683 |
Physicochemical properties of 20 amino acids.
| Code | Hydrophobicity | Hydrophilicity | Side-chain volume | Polarity | Polarizability | SASA | NCI | |
|---|---|---|---|---|---|---|---|---|
| A | 0.62 | -0.5 | 27.5 | 8.1 | 0.046 | 1.181 | 0.007187 | 12.772 |
| C | 0.29 | -1 | 44.6 | 5.5 | 0.128 | 1.461 | -0.03661 | 10.4312 |
| D | -0.9 | 3 | 40 | 13 | 0.105 | 1.587 | -0.02382 | 8.4134 |
| E | -0.74 | 3 | 62 | 12.3 | 0.151 | 1.862 | 0.006802 | 9.1455 |
| F | 1.19 | -2.5 | 115.5 | 5.2 | 0.29 | 2.228 | 0.037552 | 11.6877 |
| G | 0.48 | 0 | 0 | 9 | 0 | 0.881 | 0.179052 | 12.742 |
| H | -0.4 | -0.5 | 79 | 10.4 | 0.23 | 2.025 | -0.01069 | 12.669 |
| I | 1.38 | -1.8 | 93.5 | 5.2 | 0.186 | 1.81 | 0.021631 | 12.5099 |
| K | -1.5 | 3 | 100 | 11.3 | 0.219 | 2.258 | 0.017708 | 15.9477 |
| L | 1.06 | -1.8 | 93.5 | 4.9 | 0.186 | 1.931 | 0.051672 | 12.4699 |
| M | 0.64 | -1.3 | 94.1 | 5.7 | 0.221 | 2.034 | 0.002683 | 11.6655 |
| N | -0.78 | 2 | 58.7 | 11.6 | 0.134 | 1.655 | 0.005392 | 11.3355 |
| P | 0.12 | 0 | 41.9 | 8 | 0.131 | 1.468 | 0.239531 | 11.9434 |
| Q | -0.85 | 0.2 | 80.7 | 10.5 | 0.18 | 1.932 | 0.049211 | 11.8677 |
| R | -2.53 | 3 | 105 | 10.5 | 0.291 | 2.56 | 0.043587 | 15.839 |
| S | -0.18 | 0.3 | 29.3 | 9.2 | 0.062 | 1.298 | 0.004627 | 11.8877 |
| T | -0.05 | -0.4 | 51.3 | 8.6 | 0.108 | 1.525 | 0.003352 | 12.0855 |
| V | 1.08 | -1.5 | 71.5 | 5.9 | 0.14 | 1.645 | 0.057004 | 12.1677 |
| W | 0.81 | -3.4 | 145.5 | 5.4 | 0.409 | 2.663 | 0.037977 | 12.662 |
| Y | 0.26 | -2.3 | 117.3 | 6.2 | 0.298 | 2.368 | 0.023599 | 11.8677 |
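A common way to use such a property table, and plausibly one ingredient of the feature extraction step, is to map each residue to its property vector and pool over the sequence to get a fixed-length feature. The sketch below is an assumption rather than the paper's exact fusion method; only three residues are included for brevity, with values copied from the table (hydrophobicity, hydrophilicity, side-chain volume, polarity, polarizability, SASA, NCI):

```python
# Per-residue property vectors from the table above (three residues only;
# the real encoder would cover all 20 amino acids).
PROPS = {
    "A": (0.62, -0.5, 27.5, 8.1, 0.046, 1.181, 0.007187),
    "C": (0.29, -1.0, 44.6, 5.5, 0.128, 1.461, -0.03661),
    "G": (0.48, 0.0, 0.0, 9.0, 0.0, 0.881, 0.179052),
}

def mean_property_vector(seq):
    """Encode a sequence as the mean of its residues' property vectors,
    yielding a fixed-length feature regardless of sequence length."""
    rows = [PROPS[aa] for aa in seq if aa in PROPS]
    return tuple(sum(col) / len(rows) for col in zip(*rows))

feat = mean_property_vector("ACG")
```

Mean pooling is only one choice; concatenating several pooled statistics (mean, variance, autocorrelation) is a typical way to "fuse" features in this literature.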
Figure 1. "Y-type" Bi-RNN model diagram with local weight sharing.
Figure 2. Schematic diagram of LSTM neurons.
Figure 3. Schematic diagram of the forward and backward model training process.
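The figures describe two input branches that share one set of weights. A minimal sketch of that idea, with toy weights and a deliberately simplified linear "branch" standing in for the paper's Bi-RNN layers, shows why sharing the branch weights makes the pair score independent of the input order while keeping only one branch's worth of parameters:

```python
# Toy shared branch weights; the real model would hold LSTM parameters here.
W = [0.2, -0.1, 0.4]

def branch(x):
    """Shared encoder applied to one protein's feature vector."""
    return sum(w * xi for w, xi in zip(W, x))

def score(a, b):
    """Order-insensitive pair score: evaluate both input orders through
    the same shared branch and average, mimicking the forward plus
    backward training passes described in the abstract."""
    fwd = branch(a) + 0.5 * branch(b)   # protein A first, then B
    bwd = branch(b) + 0.5 * branch(a)   # protein B first, then A
    return (fwd + bwd) / 2.0

a, b = [1.0, 2.0, 3.0], [0.5, 0.0, -1.0]
```

Because both branches reference the same `W`, training either pass updates the same parameters, which is the cost-containment point of the weight-sharing policy.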
Performances of the deep neural network with local weight sharing.
| Test set | Accuracy (%) | Recall (%) | Sensitivity (%) | Precision (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|---|
| 1 | 99.88 | 99.87 | 99.88 | 99.87 | 99.75 | 99.88 |
| 2 | 99.88 | 99.75 | 100.00 | 100.00 | 99.75 | 99.87 |
| 3 | 99.57 | 99.21 | 99.88 | 99.87 | 99.14 | 99.55 |
| 4 | 99.88 | 99.13 | 99.88 | 99.87 | 99.02 | 99.50 |
| 5 | 99.82 | 99.62 | 100.00 | 100.00 | 99.63 | 99.81 |
| Hold-out | 99.57 | 99.36 | 99.76 | 99.74 | 99.14 | 99.56 |
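The metrics in these tables follow the standard confusion-matrix definitions (sensitivity being recall on the positive class). A small self-contained sketch with made-up counts:

```python
import math

def metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, precision, and MCC from confusion-matrix
    counts (tp/fp/tn/fn), as reported in the tables above."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sens = tp / (tp + fn)                 # true-positive rate
    prec = tp / (tp + fp)                 # positive predictive value
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return acc, sens, prec, mcc

acc, sens, prec, mcc = metrics(tp=95, fp=5, tn=95, fn=5)
```

Note that MCC stays meaningful under class imbalance, which is why it is reported alongside accuracy throughout.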
Performances with different proportions of training and testing sets.
| Train/test | Accuracy (%) | Recall (%) | Sensitivity (%) | Precision (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|---|
| 0.3/0.7 | 99.71 | 99.07 | 99.84 | 99.83 | 98.94 | 99.46 |
| 0.25/0.75 | 99.95 | 100.00 | 99.90 | 99.90 | 99.90 | 99.95 |
| 0.2/0.8 | 99.88 | 99.87 | 99.88 | 99.87 | 99.75 | 99.88 |
| 0.1/0.9 | 99.75 | 99.49 | 100.00 | 100.00 | 99.51 | 99.74 |
Performance comparisons on datasets for other species.
| Species | Test set | Accuracy (%) | Recall (%) | Sensitivity (%) | Precision (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|---|---|
| | 1 | 95.29 | 91.22 | 99.35 | 99.29 | 90.88 | 95.29 |
| | 2 | 95.40 | 91.65 | 99.14 | 99.07 | 91.05 | 95.39 |
| | 3 | 95.40 | 91.94 | 98.85 | 98.76 | 91.01 | 95.39 |
| | 4 | 95.04 | 90.71 | 99.35 | 99.29 | 90.41 | 95.03 |
| | 5 | 95.22 | 91.07 | 99.35 | 99.29 | 90.75 | 95.21 |
| | Hold-out | 95.04 | 90.71 | 99.35 | 99.29 | 90.41 | 95.03 |
| | 1 | 98.33 | 96.81 | 99.76 | 99.74 | 96.68 | 98.28 |
| | 2 | 98.20 | 96.56 | 99.76 | 99.74 | 96.44 | 98.16 |
| | 3 | 98.14 | 96.43 | 99.76 | 99.74 | 96.32 | 98.09 |
| | 4 | 98.33 | 97.19 | 99.40 | 99.35 | 96.67 | 98.30 |
| | 5 | 98.45 | 97.32 | 99.52 | 99.48 | 96.92 | 98.42 |
| | Hold-out | 98.14 | 96.81 | 99.40 | 99.35 | 96.30 | 98.10 |
| | 1 | 99.94 | 99.83 | 100.00 | 100.00 | 99.87 | 99.92 |
| | 2 | 99.99 | 99.97 | 100.00 | 100.00 | 99.98 | 99.99 |
| | 3 | 99.88 | 99.83 | 99.91 | 99.83 | 99.74 | 99.87 |
| | 4 | 99.99 | 99.97 | 100.00 | 100.00 | 99.98 | 99.99 |
| | 5 | 99.96 | 99.92 | 99.98 | 99.97 | 99.91 | 99.95 |
| | Hold-out | 99.86 | 99.61 | 100.00 | 100.00 | 99.70 | 99.80 |
| | 1 | 99.93 | 99.88 | 99.97 | 99.97 | 99.85 | 99.93 |
| | 2 | 99.92 | 99.86 | 99.97 | 99.97 | 99.84 | 99.92 |
| | 3 | 99.87 | 99.84 | 99.91 | 99.90 | 99.74 | 99.87 |
| | 4 | 99.89 | 99.84 | 99.93 | 99.93 | 99.77 | 99.88 |
| | 5 | 99.86 | 99.82 | 99.91 | 99.90 | 99.73 | 99.86 |
| | Hold-out | 99.94 | 99.88 | 100.00 | 100.00 | 99.88 | 99.94 |
Figure 4. Experimental results of different feature extraction and fusion methods in a support vector machine.
Comparison with other methods.
| Method | Accuracy (%) | Sensitivity (%) | Precision (%) | MCC (%) |
|---|---|---|---|---|
| Our work | 99.57 | 99.76 | 99.74 | 99.14 |
| Work in Ref. [ | 98.78 | 98.23 | 98.61 | 97.57 |
| Work in Ref. [ | 97.38 | 94.76 | 100.00 | 94.89 |
| Work in Ref. [ | 94.43 | 96.65 | 94.38 | 88.97 |
| Work in Ref. [ | 93.92 | 91.10 | 96.45 | 88.56 |
| Work in Ref. [ | 92.65 | 92.63 | 92.67 | 86.40 |
| Work in Ref. [ | 92.05 | 88.82 | 95.87 | 86.09 |
| Work in Ref. [ | 89.33 | 88.87 | 89.93 | N/A |
| Work in Ref. [ | 83.35 | 92.95 | 83.32 | 63.77 |
Comparison of time performance with other deep learning methods.
| Method | Our work | 5-layer fully connected neural network | DeepPPI [ | Deep neural network without local weight sharing |
|---|---|---|---|---|
| Time (seconds) | 312 | 309 | 382 | 563 |
Training other models using the forward- and backward-reconstructed features.
| Method | Test set | Accuracy (%) | Recall (%) | Sensitivity (%) | Precision (%) | MCC (%) |
|---|---|---|---|---|---|---|
| Xu et al.'s work [ | 1 | 84.77 | 67.50 | 94.06 | 85.95 | 65.78 |
| | 2 | 84.88 | 67.53 | 94.04 | 85.68 | 65.78 |
| | 3 | 84.88 | 67.68 | 94.53 | 85.33 | 65.47 |
| | 4 | 84.85 | 68.01 | 93.65 | 84.85 | 65.56 |
| | 5 | 84.57 | 66.41 | 94.07 | 85.42 | 64.93 |
| | Average | 84.79 | 67.43 | 94.07 | 85.45 | 65.50 |
| | Average before improvement | 83.35 | 65.45 | 92.95 | 83.32 | 63.77 |
Performances on the cross-species validations.
| Training set | Test set | Accuracy (%) |
|---|---|---|
| | | 98.39 |
| | | 95.75 |
| | | 91.11 |
| | | 86.15 |
| | | 50.20 |
| | | 96.23 |
| | | 55.00 |
| | | 52.97 |
| | | 50.19 |
| | | 49.72 |
| | | 97.23 |
| | | 55.32 |
| | | 51.19 |
| | | 49.72 |
| | | 65.97 |
| | | 95.23 |
| | | 51.66 |
| | | 50.24 |
| | | 44.12 |
| | | 43.81 |