Xue Wang, Yuejin Wu, Rujing Wang, Yuanyuan Wei, Yuanmiao Gui
Abstract
Protein-protein interactions (PPIs) play an important role in the life activities of organisms. With the availability of large amounts of protein sequence data, PPI prediction methods have attracted increasing attention. A variety of protein sequence coding methods have emerged, but training models on these encodings is particularly time-consuming. To address this issue, we propose a novel matrix sequence coding method. Based on a deep neural network (DNN) and a novel matrix protein sequence descriptor, we constructed a model for predicting PPIs. On human PPI data, the method achieved an accuracy of 94.34%, a recall of 98.28%, an area under the curve (AUC) of 97.79%, and a loss of 23.25%. When the model was evaluated on a non-redundant dataset, its prediction accuracy was 88.29%. These results indicate that the matrix of sequence (MOS) descriptor can enhance the predictive power of PPI models and reduce training time, making it a useful complement for future proteomics research. The experimental code and experimental results can be found at https://github.com/smalltalkman/hppi-tensorflow.
Year: 2019 PMID: 31173605 PMCID: PMC6555512 DOI: 10.1371/journal.pone.0217312
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Amino acid classification based on dipoles and side-chain volumes.
| Class | Amino acids |
|---|---|
| 1 | A, G, V |
| 2 | I, L, F, P |
| 3 | Y, M, T, S |
| 4 | H, N, Q, W |
| 5 | R, K |
| 6 | D, E |
| 7 | C |
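The same seven-class grouping underlies the conjoint triad (CT) descriptor, one of the baselines compared below; its 7³ = 343 possible class triples per protein give the 686-dimensional pair vector listed for DNN-CT. The construction of MOS itself is not described in this record, so the sketch below illustrates CT instead, as an example of how the class table turns a sequence into a fixed-length vector. It is a minimal sketch under stated assumptions: non-standard residues are dropped, and counts are scaled by the largest count, which may differ from the normalization used in published CT implementations. The function names are hypothetical.

```python
# Seven amino-acid classes from the table above (classes 1..7 become indices 0..6).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}


def encode(seq):
    """Map a protein sequence to 0-based class indices.

    Assumption: non-standard residues (X, U, B, ...) are simply dropped.
    """
    return [AA_TO_CLASS[aa] for aa in seq.upper() if aa in AA_TO_CLASS]


def conjoint_triad(seq):
    """343-dim CT vector: counts of every class triple, scaled by the largest count."""
    counts = [0] * 343  # 7 ** 3 possible class triples
    classes = encode(seq)
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        counts[a * 49 + b * 7 + c] += 1
    peak = max(counts) or 1  # sequences shorter than 3 residues give all zeros
    return [n / peak for n in counts]


def pair_vector(seq_a, seq_b):
    """Concatenate the two proteins' CT vectors into one 686-dim pair feature."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)


print(len(pair_vector("MKTAYIAKQR", "GAVLIPC")))  # 686
```

Concatenating the two per-protein vectors is the simplest way to form a pair feature; it makes the pair representation order-dependent, which is one reason some pipelines also add the swapped pair during training.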
Fig 1. Neural network training procedure.
Adjusting the learning rate of our model.
| Learning rate | Accuracy | AUC | Recall | Loss | Train time (s/100 steps) |
|---|---|---|---|---|---|
| 0.01 | 0.7926±0.0256 | 0.8807±0.023 | 0.8923±0.0239 | 0.4373±0.0377 | 0.1296±0.001 |
| 0.001 | 0.7553±0.0377 | 0.8495±0.0367 | 0.8603±0.0393 | 0.4900±0.0489 | 0.1263±0.0012 |
| 0.0001 | 0.6784±0.0273 | 0.7584±0.0307 | 0.7616±0.0337 | 0.5871±0.0271 | 0.1307±0.0011 |
| 0.00001 | 0.6363±0.0076 | 0.6921±0.0096 | 0.687±0.01 | 0.6443±0.0193 | 0.1303±0.001 |
Adjusting the network width of our model.
| Width | Accuracy | AUC | Recall | Loss | Train time (s/100 steps) |
|---|---|---|---|---|---|
| 128 | 0.7191±0.02 | 0.8139±0.0169 | 0.8232±0.0182 | 0.542±0.0236 | 0.1594±0.0034 |
| 256 | 0.7476±0.0199 | 0.8455±0.0161 | 0.8618±0.0182 | 0.4966±0.0276 | 0.1875±0.0014 |
| 512 | 0.8277±0.0352 | 0.9107±0.0296 | 0.9211±0.0288 | 0.3826±0.0527 | 0.2497±0.0026 |
| 1024 | 0.8234±0.0317 | 0.9081±0.0283 | 0.9192±0.0278 | 0.3882±0.0484 | 0.3483±0.0034 |
Adjusting the network depth of our model.
| Depth | Accuracy | AUC | Recall | Loss | Train time (s/100 steps) |
|---|---|---|---|---|---|
| 512×512 | 0.8262±0.0376 | 0.913±0.0316 | 0.9241±0.0303 | 0.3904±0.0513 | 0.8478±0.0081 |
| 512×512×512 | 0.9159±0.0563 | 0.9621±0.0379 | 0.9689±0.035 | 0.2598±0.075 | 1.4374±0.0159 |
| 512×512×512×512 | 0.7988±0.0407 | 0.891±0.0335 | 0.9068±0.0333 | 0.428±0.0544 | 2.0333±0.0087 |
| 512×512×512×512×512 | 0.7104±0.0898 | 0.7976±0.1201 | 0.8447±0.0483 | 0.5276±0.0785 | 2.6208±0.0126 |
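In the depth table, training time rises by roughly 0.59 s per 100 steps for each added 512-unit layer (0.8478, 1.4374, 2.0333, 2.6208). Counting trainable parameters of a fully connected network gives one rough explanation: every extra 512×512 layer adds the same number of weights. The sketch below assumes a 58-dimensional input (the MOS pair-vector size from the dimension table) and a 2-unit output layer; neither the input size used in the tuning runs nor the output layer is stated in this record, and `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(input_dim, hidden, output_dim=2):
    """Weights plus biases of a fully connected network with the given hidden widths.

    Assumption: a 2-unit output layer (interacting vs. non-interacting).
    """
    sizes = [input_dim, *hidden, output_dim]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Hidden-layer configurations from the depth table, with an assumed 58-dim input.
for depth in range(2, 6):
    print(depth, dense_param_count(58, [512] * depth))
# 2 293890
# 3 556546
# 4 819202
# 5 1081858
```

Each added layer contributes 512 × 512 + 512 = 262,656 parameters, so the parameter count grows linearly with depth, matching the roughly constant per-layer increase in training time.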
Results based on DNN with CT, AC, LD, and MOS on the benchmark dataset.
| Method | Accuracy | Recall | AUC | Loss |
|---|---|---|---|---|
| DNN-CT | 0.9711±0.0038 | 0.9891±0.0009 | 0.9835±0.0018 | 0.2747±0.0686 |
| DNN-AC | 0.9684±0.0013 | 0.9867±0.0013 | 0.9802±0.0022 | 0.6591±0.3178 |
| DNN-LD | 0.953±0.0087 | 0.9828±0.003 | 0.9757±0.0043 | 0.3623±0.0924 |
| DNN-MOS | 0.9434±0.0078 | 0.9828±0.0023 | 0.9779±0.0028 | 0.2325±0.0154 |
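The tables report accuracy, recall, AUC, and loss. The sketch below shows how these metrics are commonly computed from labels and predicted interaction probabilities. It assumes a 0.5 decision threshold and that "Loss" is the mean binary cross-entropy; the record does not define either. AUC is computed in its rank form, as the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, which is quadratic in the sample size and meant for illustration only.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of pairs whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """True positives divided by all actual positives (interacting pairs)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return tp / sum(1 for t in y_true if t == 1)


def auc(y_true, scores):
    """ROC AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_loss(y_true, probs, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


# Toy example: two interacting pairs (1) and two non-interacting pairs (0).
y_true = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.3, 0.6]
y_pred = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 threshold
print(accuracy(y_true, y_pred), recall(y_true, y_pred), auc(y_true, probs))  # 0.5 0.5 0.75
```

Because AUC depends only on how scores rank positives against negatives, it can stay high while accuracy at a fixed threshold is lower, as in the MOS row where AUC (0.9779) exceeds accuracy (0.9434).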
Training time, test time, and feature dimensions of DNN with CT, AC, LD, and MOS on the benchmark dataset.
| Method | Train time (s) | Test time (s) | Feature dimension | Dataset |
|---|---|---|---|---|
| DNN-CT | 0.2852±0.0039 | 1.39E-05 | 686 | HPRD (36591) + Swiss-Prot (36324) |
| DNN-AC | 0.2186±0.0014 | 1.32E-05 | 420 | HPRD (36591) + Swiss-Prot (36324) |
| DNN-LD | 0.4045±0.0141 | 1.48E-05 | 1260 | HPRD (36591) + Swiss-Prot (36324) |
| DNN-MOS | 0.1261±0.0039 | 1.28E-05 | 58 | HPRD (36591) + Swiss-Prot (36324) |
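The abstract's claim that MOS reduces training time can be quantified directly from the table above. The short sketch below uses only the mean training times and feature dimensions reported there and expresses each method relative to DNN-MOS; `relative_to` is a hypothetical helper name.

```python
# Mean training time (s, as reported) and feature dimension per method, from the table above.
RESULTS = {
    "DNN-CT": (0.2852, 686),
    "DNN-AC": (0.2186, 420),
    "DNN-LD": (0.4045, 1260),
    "DNN-MOS": (0.1261, 58),
}


def relative_to(baseline, results=RESULTS):
    """Return {method: (time ratio, dimension ratio)} relative to the baseline method."""
    base_time, base_dim = results[baseline]
    return {name: (t / base_time, d / base_dim) for name, (t, d) in results.items()}


for name, (time_ratio, dim_ratio) in relative_to("DNN-MOS").items():
    print(f"{name}: {time_ratio:.2f}x train time, {dim_ratio:.1f}x dimension")
# DNN-CT: 2.26x train time, 11.8x dimension
# DNN-AC: 1.73x train time, 7.2x dimension
# DNN-LD: 3.21x train time, 21.7x dimension
# DNN-MOS: 1.00x train time, 1.0x dimension
```

The ratios show that training time does not grow in proportion to input dimension: LD uses about 21.7 times as many features as MOS but takes only about 3.2 times as long to train in these measurements.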
Results of DNN with different feature extraction methods on a non-redundant dataset.
| Method | Accuracy | Recall | AUC |
|---|---|---|---|
| DNN-MOS | 88.29% | 93.63% | 92.23% |
| | 89.88% | 93.79% | 91.78% |
| | 93.35% | 96.24% | 94.99% |
| | 85.84% | N/A | N/A |
Performance of MOS with different classifiers (DT: decision tree; KN: k-nearest neighbors; RF: random forest) on the human dataset.
| Methods | Accuracy | Recall | AUC |
|---|---|---|---|
| DT-MOS | 0.9436 | 0.9365 | 0.9436 |
| KN-MOS | 0.8301 | 0.6973 | 0.8298 |
| RF-MOS | 0.9729 | 0.9611 | 0.9729 |