Xue Li, Peifu Han, Gan Wang, Wenqi Chen, Shuang Wang, Tao Song.
Abstract
BACKGROUND: Protein-protein interactions (PPIs) govern how intracellular molecules carry out tasks such as transcriptional regulation, information transduction, and drug signalling. Traditional wet-lab experiments for obtaining PPI information are costly and time-consuming.
RESULT: This paper proposes SDNN-PPI, a PPI prediction method based on self-attention and deep learning. The method uses amino acid composition (AAC), conjoint triad (CT), and auto covariance (AC) to extract global and local features of protein sequences, and applies self-attention to enhance DNN feature extraction for PPI prediction. To assess the generalization ability of SDNN-PPI, 5-fold cross-validation was run on the intraspecific interaction datasets of Saccharomyces cerevisiae (core subset) and human, where the accuracy reaches 95.48% and 98.94%, respectively. On the interspecific interaction datasets human-Bacillus anthracis and human-Yersinia pestis, the accuracy is 93.15% and 88.33%, respectively. On the independent test sets Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, the prediction accuracy is 100% in every case, which is higher than that of previous PPI prediction methods. To further evaluate the strengths and weaknesses of the model, it was applied to a one-core network and a crossover network, and the data show that it correctly predicts the interaction pairs in these networks.
Keywords: Deep learning; Deep neural network; Protein-protein interactions; Self-attention
Year: 2022 PMID: 35761175 PMCID: PMC9235110 DOI: 10.1186/s12864-022-08687-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 4.547
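
The amino acid composition (AAC) descriptor named in the abstract can be illustrated with a minimal sketch. It assumes AAC means the relative frequency of each of the 20 standard residues, which is the usual definition; the function name `aac`, the residue ordering, and the handling of non-standard symbols are illustrative choices, not details taken from the paper.

```python
from collections import Counter

# Ordering of the 20 standard residues is an arbitrary, illustrative choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> list[float]:
    """Return a 20-dimensional amino acid composition vector."""
    counts = Counter(sequence.upper())
    # Assumption: symbols outside the 20 standard residues are ignored.
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


vector = aac("AAGC")  # hypothetical toy sequence
```

The resulting vector has one entry per residue type and sums to 1, so it captures global composition but no positional information.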
Compositions of the four benchmark data sets
| Data sets | Interaction pairs | Noninteraction pairs | Protein pairs |
|---|---|---|---|
| S.cerevisiae(core subset) | 5594 | 5594 | 11188 |
| Human | 3899 | 4262 | 8161 |
| Human-B.Anthracis | 3094 | 9500 | 12594 |
| Human-Y.pestis | 4097 | 12500 | 16597 |
| Saccharomyces cerevisiae | 17257 | 48594 | 65551 |
Classification of amino acids based on amino acid side chains and dipole volume
| Cluster | Amino acid |
|---|---|
| 1 | A, G, V |
| 2 | I, L, F, P |
| 3 | Y, M, T, S |
| 4 | H, N, Q, W |
| 5 | R, K |
| 6 | D, E |
| 7 | C |
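
The seven clusters above give 7 × 7 × 7 = 343 possible triads of consecutive residues, which is how a conjoint triad (CT) descriptor is usually built. The sketch below counts triads with a sliding window of three residues; the min-max scaling and the skipping of unknown symbols are assumptions, and the function name `conjoint_triad` is hypothetical.

```python
from itertools import product

# Residue clusters copied from the table above (cluster id -> residues).
CLUSTERS = {1: "AGV", 2: "ILFP", 3: "YMTS", 4: "HNQW", 5: "RK", 6: "DE", 7: "C"}
RESIDUE_TO_CLUSTER = {aa: cid for cid, members in CLUSTERS.items() for aa in members}

# All 343 ordered triads of cluster ids, each with a fixed position in the vector.
TRIADS = list(product(range(1, 8), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(sequence: str) -> list[float]:
    """Return a 343-dimensional conjoint triad vector for one protein."""
    # Assumption: residues outside the seven clusters are skipped.
    ids = [RESIDUE_TO_CLUSTER[aa] for aa in sequence.upper() if aa in RESIDUE_TO_CLUSTER]
    counts = [0] * len(TRIADS)
    for i in range(len(ids) - 2):
        counts[TRIAD_INDEX[tuple(ids[i:i + 3])]] += 1
    lo, hi = min(counts), max(counts)
    if hi == lo:
        return [0.0] * len(counts)
    # Assumption: min-max scaling to reduce the effect of sequence length.
    return [(c - lo) / (hi - lo) for c in counts]
```

For the toy sequence `"AGVC"` the cluster ids are 1, 1, 1, 7, so only the triads (1, 1, 1) and (1, 1, 7) are non-zero.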
Fig. 1 Neural network procedure
Fig. 2 Networks with dropout
Fig. 3 Model of self-attention
Fig. 4 SDNN-PPI for prediction of protein-protein interaction
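
This record does not describe the attention layer used in SDNN-PPI, so the following is only a sketch of the standard scaled dot-product form of self-attention, written in plain Python for small inputs. The projection matrices `w_q`, `w_k`, `w_v` and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over the rows of x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)               # similarity of every query to every key
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights    # weighted sum of values, plus the weights
```

Each output row is a weighted average of the value rows, with weights that sum to 1, so features that correlate strongly with a given position contribute more to its representation.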
Performance of different coding methods
| Encoding methods | Length | ACC(%) | Spec(%) | Sens(%) | Prec(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|---|---|
| ACC+CT+LD+AC | 1203 | 92.31 ± 0.66 | 94.37 ± 0.25 | 90.26 ± 1.28 | 94.13 ± 0.25 | 84.70 ± 1.27 | 97.00 |
| ACC+CT+LD | 993 | 92.00 ± 0.66 | 93.31 ± 1.04 | 90.69 ± 0.57 | 93.14 ± 1.01 | 84.03 ± 1.33 | 97.03 |
| ACC+CT+AC | 573 | ||||||
| ACC+LD+AC | 860 | 91.41 ± 0.52 | 93.06 ± 0.87 | 89.76 ± 0.64 | 92.83 ± 0.86 | 82.87 ± 1.06 | 96.58 |
| CT+LD+AC | 1183 | 89.50 ± 0.68 | 90.97 ± 0.58 | 88.02 ± 1.47 | 90.70 ± 0.50 | 79.04 ± 1.33 | 95.62 |
| ACC+CT | 363 | 89.79 ± 0.65 | 90.79 ± 0.81 | 88.79 ± 0.87 | 90.61 ± 0.76 | 79.61 ± 1.29 | 95.78 |
| ACC+LD | 650 | 88.93 ± 0.43 | 89.67 ± 1.37 | 88.20 ± 1.53 | 89.54 ± 1.14 | 77.91 ± 0.88 | 95.05 |
| ACC+AC | 230 | 85.74 ± 1.80 | 87.20 ± 1.74 | 84.29 ± 2.66 | 86.82 ± 1.74 | 71.54 ± 3.60 | 92.59 |
| CT+LD | 973 | 90.47 ± 0.52 | 92.76 ± 0.57 | 88.18 ± 0.93 | 92.42 ± 0.55 | 81.03 ± 1.02 | 95.97 |
| CT+AC | 553 | 89.06 ± 0.55 | 90.70 ± 0.87 | 87.43 ± 1.34 | 90.40 ± 0.73 | 78.19 ± 1.07 | 94.74 |
| LD+AC | 840 | 91.44 ± 0.44 | 92.72 ± 0.72 | 90.17 ± 0.56 | 92.54 ± 0.69 | 82.92 ± 0.89 | 96.60 |
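
The Length column is consistent with per-descriptor sizes of 20, 343, 630, and 210 for the components labelled ACC, CT, LD, and AC in the table. These sizes are inferred from the listed totals rather than stated in this record, and the short check below confirms that they reproduce every row.

```python
# Per-descriptor lengths inferred from the Length column (an assumption).
SIZES = {"ACC": 20, "CT": 343, "LD": 630, "AC": 210}

# Lengths exactly as listed in the table above.
TABLE = {
    "ACC+CT+LD+AC": 1203, "ACC+CT+LD": 993, "ACC+CT+AC": 573, "ACC+LD+AC": 860,
    "CT+LD+AC": 1183, "ACC+CT": 363, "ACC+LD": 650, "ACC+AC": 230,
    "CT+LD": 973, "CT+AC": 553, "LD+AC": 840,
}


def combined_length(name: str) -> int:
    """Sum the sizes of the descriptors joined by '+' in a combination name."""
    return sum(SIZES[part] for part in name.split("+"))


# Collect any rows where the inferred sizes disagree with the table.
mismatches = {
    name: (combined_length(name), listed)
    for name, listed in TABLE.items()
    if combined_length(name) != listed
}
```

An empty `mismatches` dictionary means concatenation of the descriptors explains all eleven listed lengths.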
Comparison among different layer architectures for SDNN-PPI on S.cerevisiae(core subset)
| Architectures | ACC(%) | Spec(%) | Sens(%) | Prec(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|---|
| DNN-PPI a | 94.9 | 96.35 | 95.83 | 96.25 | 89.84 | 98.54 |
| DNN-PPI b | 91.78 | 93.56 | 89.99 | 93.33 | 83.62 | 97.05 |
| SDNN-PPI a | 95.16 | 96.96 | 93.37 | 96.86 | 90.4 | 98.53 |
| SDNN-PPI b | 95.21 | 96.98 | 93.44 | 96.87 | 90.48 | 98.56 |
| SDNN-PPI | | | | | | |
Prediction results of S.cerevisiae (core subset) under five-fold cross-validation
| testing set | ACC(%) | Spec(%) | Sens(%) | Prec(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|---|
| 1 | 95.44 | 97.59 | 93.48 | 97.48 | 90.97 | 98.42 |
| 2 | 95.17 | 96.69 | 94.1 | 96.59 | 90.39 | 98.92 |
| 3 | 95.13 | 97.32 | 93.3 | 97.20 | 90.35 | 98.28 |
| 4 | 95.49 | 96.37 | 94.36 | 96.27 | 91.98 | 98.74 |
| 5 | 96.16 | 98.21 | 93.74 | 98.14 | 92.39 | 98.80 |
| average | 95.48 ± 0.37 | 97.23 ± 0.66 | 93.80 ± 0.39 | 97.13 ± 0.66 | 91.02 ± 0.74 | 98.63 |
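
The "average" row can be reproduced from the per-fold values. For the ACC column, the mean and the population standard deviation of the five folds give 95.48 ± 0.37; whether the authors used the population estimator is an assumption, and other columns can differ in the last digit because the fold values are already rounded.

```python
from statistics import mean, pstdev

# ACC values of the five folds, copied from the table above.
acc_folds = [95.44, 95.17, 95.13, 95.49, 96.16]


def summarize(values):
    """Return (mean, population standard deviation), rounded to two decimals."""
    return round(mean(values), 2), round(pstdev(values), 2)


acc_mean, acc_std = summarize(acc_folds)  # 95.48 and 0.37
```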
Prediction results of Human data set under five-fold cross-validation
| Testing set | ACC(%) | Spec(%) | Sens(%) | Prec(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|---|
| 1 | 98.78 | 99.49 | 98.08 | 98.97 | 97.06 | 99.67 |
| 2 | 99.29 | 98.85 | 99.49 | 99.23 | 98.46 | 99.63 |
| 3 | 98.85 | 99.10 | 98.46 | 98.97 | 97.56 | 99.51 |
| 4 | 98.78 | 99.36 | 98.72 | 99.1 | 97.95 | 99.72 |
| 5 | 98.97 | 98.84 | 99.10 | 98.83 | 96.8 | 99.46 |
| average | 98.94 ± 0.19 | 99.10 ± 0.24 | 98.77 ± 0.49 | 99.02 ± 0.13 | 97.57 ± 0.60 | 99.60 |
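
The column headers in these tables are standard binary-classification metrics. As a reminder of how they relate to confusion-matrix counts, the sketch below uses the textbook definitions; the counts in the example are hypothetical and the function assumes no denominator is zero except for the guarded MCC term.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Return ACC, Spec, Sens, Prec, and MCC in percent from confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    spec = tn / (tn + fp)   # true negative rate
    sens = tp / (tp + fn)   # true positive rate (recall)
    prec = tp / (tp + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    raw = {"ACC": acc, "Spec": spec, "Sens": sens, "Prec": prec, "MCC": mcc}
    return {name: 100.0 * value for name, value in raw.items()}


# Hypothetical counts, not taken from the paper.
example = binary_metrics(tp=90, tn=95, fp=5, fn=10)
```

AUC is not included because it depends on the ranking of predicted scores rather than on a single confusion matrix.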
Prediction results of Human-B.Anthracis data set under five-fold cross-validation
| Testing set | ACC(%) | Spec(%) | Sens(%) | Prec(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|---|
| 1 | 91.44 | 85.14 | 97.74 | 86.8 | 83.54 | 97.93 |
| 2 | 93.78 | 93.05 | 94.51 | 93.15 | 87.57 | 98.65 |
| 3 | 92.49 | 87.72 | 97.25 | 88.79 | 85.36 | 98.03 |
| 4 | 94.26 | 90.78 | 97.74 | 91.39 | 88.73 | 98.22 |
| 5 | 93.78 | 91.76 | 95.79 | 92.07 | 87.62 | 98.32 |
| average | 93.15 ± 1.03 | 89.69 ± 2.88 | 96.61 ± 1.27 | 90.44 ± 2.32 | 86.57 ± 1.87 | 98.23 |
Prediction results of Human-Y.pestis data set under five-fold cross-validation
| Testing set | ACC(%) | Spec(%) | Sens(%) | Prec(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|---|
| 1 | 85.91 | 75.98 | 95.85 | 79.94 | 73.28 | 95.73 |
| 2 | 88.16 | 83.9 | 92.43 | 85.15 | 76.61 | 95.39 |
| 3 | 88.16 | 80.83 | 95.49 | 83.3 | 77.16 | 95.73 |
| 4 | 91.27 | 89.26 | 93.29 | 89.68 | 82.62 | 96.31 |
| 5 | 88.16 | 83.76 | 92.55 | 85.07 | 76.61 | 95.55 |
| average | 88.33 ± 1.71 | 82.74 ± 4.34 | 93.92 ± 6.06 | 84.63 ± 1.85 | 77.26 ± 3.79 | 95.74 |
Comparison results of different PPIs prediction methods on S.cerevisiae (core subset)
| Methods | ACC(%) | Spec(%) | Sens(%) | Prec(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|---|
| DeepPPI | 94.43 ± 0.30 | N/A | 92.06 ± 0.36 | 96.65 ± 0.59 | 88.97 ± 0.62 | N/A |
| DeepFE-PPI | 94.78 ± 0.61 | N/A | 92.99 ± 0.66 | 96.45 ± 0.87 | 89.62 ± 1.23 | N/A |
| LightGBM-PPI | 95.07 | 97.94 | 92.21 | 97.82 | 90.30 | N/A |
| Bio2Vec | 93.30 | N/A | 92.70 | 93.55 | 87.49 | 97.20 |
| StackPPI | 94.64 | 96.46 | 92.81 | 96.33 | 89.34 | N/A |
| GTB-PPI | 95.15 ± 0.25 | N/A | 92.21 ± 0.36 | 97.97 ± 0.60 | 90.45 ± 0.53 | N/A |
| AE-LGBM | 95.40 ± 0.20 | 92.10 ± 0.30 | N/A | 91.00 ± 0.40 | N/A | |
| GcForest-PPI | 95.44 ± 0.18 | N/A | 92.72 ± 0.44 | 91.02 ± 0.35 | N/A | |
| SDNN-PPI | 95.48 ± 0.37 | 97.23 ± 0.66 | 93.80 ± 0.39 | 97.13 ± 0.66 | 91.02 ± 0.74 | 98.63 |
Comparison results of different PPIs prediction methods on Human
| Methods | ACC(%) | Spec(%) | Sens(%) | Prec(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|---|
| RPEC | 96.59 | N/A | 96.72 | 96.18 | 93.18 | N/A |
| Bio2Vec | 97.31 | N/A | 96.28 | 98.48 | 94.76 | |
| GWOSVM | 94.56 | N/A | 95.55 | 93.08 | 89.51 | N/A |
| DeepFE-PPI | 98.71 ± 0.30 | N/A | 98.54 ± 0.55 | 98.77 ± 0.53 | 97.43 ± 0.61 | N/A |
| AE-LGBM | 98.70 ± 0.10 | 98.10 ± 0.20 | N/A | 97.30 ± 0.30 | N/A | |
| AE-AC | 97.19 | 98.06 | 96.34 | N/A | N/A | N/A |
| SDNN-PPI | 98.94 ± 0.19 | 99.10 ± 0.24 | 98.77 ± 0.49 | 99.02 ± 0.13 | 97.57 ± 0.60 | 99.60 |
Comparison results of different PPIs prediction methods on Human-B.Anthracis
| Methods | ACC(%) | Sens(%) | Prec(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|
| LBE-BN | 78.70 | 73.00 | 42.00 | 43.40 | 83.70 |
| LBE-NB | 82.50 | 53.80 | 47.80 | 39.70 | 82.10 |
| LBE-RF | 85.40 | 24.0 | 67.00 | 34.00 | 86.80 |
| ACC-BN | 77.40 | 51.70 | 37.30 | 30.30 | 79.00 |
| LBE-j48 | 80.06 | 31.20 | 39.60 | 23.90 | 54.10 |
| LD-DNN | 91.70 | 89.50 | 83.50 | 96.37 | |
| SDNN-PPI | 93.15 | 96.61 | 90.44 | 86.57 | 98.23 |
Comparison results of different PPIs prediction methods on Human-Y.pestis
| Methods | ACC(%) | Sens(%) | Prec(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|
| LBE-BN | 76.10 | 73.50 | 38.60 | 40.10 | 81.30 |
| LBE-NB | 80.90 | 45.50 | 43.2 | 32.80 | 78.60 |
| LBE-RF | 84.6 | 16.00 | 66.30 | 27.30 | 83.50 |
| ACC-BN | 80.00 | 52.40 | 42.10 | 34.90 | 75.60 |
| LBE-j48 | 80.10 | 27.90 | 37.10 | 20.80 | 51.70 |
| LD-DNN | 87.30 | 84.20 | 74.90 | 94.99 | |
| SDNN-PPI | 88.33 | 93.92 | 84.63 | 77.26 | 95.74 |
Prediction results of four data sets in kappa coefficient
| Data sets | S.cerevisiae(core subset) | Human | Human-B.Anthracis | Human-Y.pestis |
|---|---|---|---|---|
| kappa | 0.91 | 0.98 | 0.85 | 0.76 |
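
The kappa coefficient corrects observed agreement for the agreement expected by chance. The sketch below uses the standard definition of Cohen's kappa for a binary confusion matrix; the counts in the example are hypothetical, and it is an assumption that the table was computed this way.

```python
def cohen_kappa(tp: int, tn: int, fp: int, fn: int) -> float:
    """Cohen's kappa for a binary confusion matrix."""
    n = tp + tn + fp + fn
    p_observed = (tp + tn) / n
    # Chance agreement: product of predicted and true class frequencies, per class.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts, not taken from the paper.
kappa = cohen_kappa(tp=90, tn=95, fp=5, fn=10)
```

When the two true classes are balanced, the chance agreement is 0.5 and kappa reduces to 2 × ACC − 1, which is consistent with the 0.91 listed for the balanced S. cerevisiae core subset (ACC 95.48%).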
Comparison of ACC of different PPIs prediction methods on independent test sets
| Methods/Species | C.elegans | E.coli | H.sapiens | M.musculus |
|---|---|---|---|---|
| test pairs | 4013 | 6984 | 1412 | 313 |
| DeepPPI | 94.84 | 92.19 | 93.77 | 91.37 |
| DeepFE-PPI | | | | |
| LightGBM-PPI | 90.16 | 92.16 | 94.83 | 94.57 |
| StackPPI | 97.11 | 98.71 | 97.66 | 98.40 |
| GcForest-PPI | 96.01 | 96.3 | 98.58 | 99.04 |
| GTB-PPI | 92.42 | 94.06 | 97.38 | 98.08 |
| AE-LGBM | 90.10 | 92.10 | 94.80 | 94.50 |
| SDNN-PPI | 100 | 100 | 100 | 100 |
Fig. 5The predicted results of PPIs networks of a one-core network for CD9
Fig. 6The predicted results of a crossover network for the Wnt-related pathway
Performance of different methods on PPI network
| Network | LightGBM-PPI | StackPPI | GTB-PPI | AE-LGBM | GcForest-PPI | SDNN-PPI |
|---|---|---|---|---|---|---|
| CD9 | 15/16 | N/A | 15/16 | 16/16 | 16/16 | |
| Wnt | 89/96 | 93/96 | 92/96 | 95/96 | 94/96 | |