Abstract
Proteins are the primary building blocks of living organisms. They interact with one another and thereby take part in various biological processes. Knowledge of protein-protein interactions (PPIs) helps in understanding the functionality of proteins, the causes and growth of diseases, and the design of new drugs. However, there is a vast gap between the number of available protein sequences and the number of identified protein-protein interactions. To bridge this gap, researchers have proposed several computational methods to reveal the interactions between proteins. These methods depend solely on sequence-based information about proteins. With the advancement of technology, other types of protein information, such as 3D structure, have become available. Deep learning techniques are now applied successfully in various domains, including bioinformatics. The current work therefore combines different modalities, namely 3D structures and sequence-based information of proteins, with deep learning algorithms to predict PPIs. The proposed approach is divided into several phases. We first build several volumetric representations of each protein from its 3D coordinates and three attributes of its amino acids, the building blocks of proteins: hydropathy index, isoelectric point, and charge. A pre-trained ResNet50 model, a type of convolutional neural network, is used to extract features from these representations. Autocovariance and conjoint triad, two widely used sequence-based methods for encoding proteins, provide the second modality. A stacked autoencoder is used to obtain a compact form of the sequence-based information. Finally, the features obtained from the different modalities are concatenated in pairs and fed into the classifier to predict labels for protein pairs.
We evaluated the approach on a human PPI dataset and a Saccharomyces cerevisiae PPI dataset and compared our results with state-of-the-art deep-learning-based classifiers. The results achieved by the proposed method are superior to those obtained by the existing methods. Extensive experiments on different datasets indicate that our approach to learning and combining features from two different modalities is useful in PPI prediction.
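The two sequence encodings named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the seven-class grouping is the one commonly used for the conjoint triad method, the autocovariance uses only the Kyte-Doolittle hydropathy index (published autocovariance encoders typically combine several normalised physicochemical properties and a larger lag), and the function names and `max_lag` default are hypothetical.

```python
# Seven amino-acid classes commonly used by the conjoint triad method.
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CT_CLASSES) for aa in group}

# Kyte-Doolittle hydropathy index (the only property used in this sketch).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def conjoint_triad(seq):
    """Return a 343-dimensional (7 x 7 x 7) normalised triad-frequency vector."""
    # Assumption: residues outside the 20 standard amino acids are skipped.
    classes = [AA_TO_CLASS[aa] for aa in seq if aa in AA_TO_CLASS]
    counts = [0] * 343
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        counts[a * 49 + b * 7 + c] += 1
    lo, hi = min(counts), max(counts)
    if hi == 0:  # sequence shorter than three residues
        return [0.0] * 343
    return [(count - lo) / hi for count in counts]


def autocovariance(seq, max_lag=3, scale=HYDROPATHY):
    """Return the autocovariance of one property for lags 1..max_lag."""
    values = [scale[aa] for aa in seq if aa in scale]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    mean = sum(values) / n
    centred = [v - mean for v in values]
    return [
        sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]
```

For a protein pair, the two per-protein vectors would then be combined (for example concatenated) before being passed to a downstream model; how exactly the paper does this is not specified in the abstract.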
Year: 2020 PMID: 33154416 PMCID: PMC7645622 DOI: 10.1038/s41598-020-75467-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Illustration of binary and several other attributes’ volumetric representation for enzyme 1A00.
Figure 2. Residual block with two layers.
Figure 3. Memory block cell of LSTM network.
Figure 4. The diagram depicting working of proposed method.
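A volumetric representation of the kind shown in Figure 1 can be sketched by binning 3D coordinates into a voxel grid. The sketch below is only an illustration under stated assumptions: one coordinate per residue, a cubic grid centred on the centroid, a hypothetical `grid_size` and `resolution`, and attribute values summed per voxel. None of these choices is taken from the paper, and the `CHARGE` table is a simplified illustrative mapping.

```python
import math

# Illustrative attribute table (simplified side-chain charge); an assumption.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def voxelize(residues, grid_size=8, resolution=4.0, attribute=None):
    """Bin residues into a sparse voxel grid.

    residues  : list of (amino_acid, x, y, z) tuples
    attribute : optional dict mapping amino acid -> value; if None, the
                grid is binary (1.0 where at least one residue falls)
    Returns a dict {(i, j, k): value} holding only the occupied voxels.
    """
    if not residues:
        return {}
    n = len(residues)
    centroid = [sum(r[axis] for r in residues) / n for axis in (1, 2, 3)]
    half_extent = grid_size * resolution / 2.0
    grid = {}
    for residue in residues:
        aa, coords = residue[0], residue[1:4]
        index = tuple(
            math.floor((c - c0 + half_extent) / resolution)
            for c, c0 in zip(coords, centroid)
        )
        if not all(0 <= i < grid_size for i in index):
            continue  # residue lies outside the grid
        if attribute is None:
            grid[index] = 1.0
        else:
            grid[index] = grid.get(index, 0.0) + attribute.get(aa, 0.0)
    return grid
```

Calling `voxelize(residues)` gives the binary occupancy grid, while `voxelize(residues, attribute=CHARGE)` gives one attribute channel; a dense array for a convolutional network could be filled from the returned dictionary.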
The average repeated 3-fold cross-validation results on different features of proteins using multi-layer perceptron.
| Binary | 0.9027 | 0.9511 | 0.7862 | 0.9187 | 0.9342 | 0.7658 | 0.9676 | 0.9836 |
| Charge | 0.9538 | 0.9761 | 0.9875 | |||||
| Isoelectric | 0.9349 | 0.9565 | 0.8827 | 0.9527 | 0.9539 | 0.8462 | 0.9776 | 0.9881 |
| Hydropathy | 0.9360 | 0.8672 | 0.9473 | 0.9552 | 0.8471 | |||
| Avg of all Features | 0.9048 | 0.9122 | 0.8869 | 0.9528 | 0.9308 | 0.7843 | 0.9627 | 0.9810 |
The best results are marked in bold.
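The threshold-based scores reported in tables like the one above can be derived from confusion-matrix counts. The sketch below assumes the standard definitions of accuracy, sensitivity, specificity, precision, F-score, and the Matthews correlation coefficient (MCC); the function name and the returned dictionary keys are hypothetical.

```python
import math


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from raw counts."""
    sensitivity = _ratio(tp, tp + fn)   # recall / true-positive rate
    specificity = _ratio(tn, tn + fp)   # true-negative rate
    precision = _ratio(tp, tp + fp)
    f_score = _ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "mcc": mcc,
    }
```

As a consistency check on the reported figures, the harmonic mean of 0.9679 and 0.9753 (the second and fourth values of the first row of the first Human PPI cross-validation table) is about 0.9716, which equals the fifth value in that row.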
The repeated 3-fold cross-validation results on Human PPI dataset using LSTM-based classifier that integrates structural features and autocovariance.
| Repeat | Fold | Accuracy | Sensitivity | Specificity | Precision | F-score | MCC | AUROC | AUPRC |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.9600 | 0.9679 | 0.9409 | 0.9753 | 0.9716 | 0.9040 | 0.9803 | 0.9866 |
| 1 | 2 | 0.9738 | 0.9830 | 0.9514 | 0.9799 | 0.9815 | 0.9365 | 0.9904 | 0.9949 |
| 1 | 3 | 0.9771 | 0.9829 | 0.9630 | 0.9847 | 0.9838 | 0.9447 | 0.9916 | 0.9949 |
| 2 | 1 | 0.9522 | 0.9582 | 0.9377 | 0.9738 | 0.9659 | 0.8863 | 0.9773 | 0.9852 |
| 2 | 2 | 0.9658 | 0.9694 | 0.9570 | 0.9820 | 0.9756 | 0.9183 | 0.9861 | 0.9907 |
| 2 | 3 | 0.9781 | 0.9815 | 0.9699 | 0.9874 | 0.9845 | 0.9474 | 0.9925 | 0.9958 |
| 3 | 1 | 0.9536 | 0.9770 | 0.8971 | 0.9582 | 0.9675 | 0.8870 | 0.9778 | 0.9857 |
| 3 | 2 | 0.9765 | 0.9890 | 0.9462 | 0.9779 | 0.9834 | 0.9429 | 0.9919 | 0.9956 |
| 3 | 3 | 0.9799 | 0.9824 | 0.9739 | 0.9891 | 0.9857 | 0.9517 | 0.9946 | 0.9970 |
| Mean | | | | | | | | | |
| Std. deviation | | 0.0103 | 0.0092 | 0.0216 | 0.0087 | 0.0073 | 0.0246 | 0.0064 | 0.0045 |
The average results are marked in bold.
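The summary rows of a repeated cross-validation table can be recomputed from the per-fold values. The sketch below takes the first numeric column of the nine fold rows above (assumed here to be accuracy) and uses Python's `statistics` module; the helper name `summarise` is hypothetical.

```python
import statistics

# First numeric column of the nine fold rows (3 repeats x 3 folds).
fold_accuracy = [
    0.9600, 0.9738, 0.9771,   # repeat 1
    0.9522, 0.9658, 0.9781,   # repeat 2
    0.9536, 0.9765, 0.9799,   # repeat 3
]


def summarise(values):
    """Return (mean, population standard deviation) of the fold scores."""
    return statistics.mean(values), statistics.pstdev(values)


mean_acc, std_acc = summarise(fold_accuracy)
print(f"mean={mean_acc:.4f}  std={std_acc:.4f}")
```

The population form (`pstdev`, dividing by n) yields roughly 0.0103 for this column, which agrees with the reported standard deviation; the sample form (`stdev`, dividing by n - 1) would give roughly 0.0109.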
The repeated 3-fold cross-validation results on Human PPI dataset using LSTM-based classifier that integrates structural features and conjoint triad.
| Repeat | Fold | Accuracy | Sensitivity | Specificity | Precision | F-score | MCC | AUROC | AUPRC |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.9629 | 0.9645 | 0.9590 | 0.9827 | 0.9735 | 0.9121 | 0.9847 | 0.9892 |
| 1 | 2 | 0.9727 | 0.9762 | 0.9643 | 0.9851 | 0.9806 | 0.9346 | 0.9934 | 0.9965 |
| 1 | 3 | 0.9774 | 0.9808 | 0.9691 | 0.9871 | 0.9840 | 0.9457 | 0.9952 | 0.9977 |
| 2 | 1 | 0.9598 | 0.9512 | 0.9498 | 0.9786 | 0.9647 | 0.8845 | 0.9801 | 0.9873 |
| 2 | 2 | 0.9760 | 0.9840 | 0.9566 | 0.9821 | 0.9830 | 0.9420 | 0.9906 | 0.9941 |
| 2 | 3 | 0.9807 | 0.9827 | 0.9759 | 0.9900 | 0.9863 | 0.9537 | 0.9960 | 0.9980 |
| 3 | 1 | 0.9648 | 0.9675 | 0.9582 | 0.9824 | 0.9749 | 0.9163 | 0.9852 | 0.9911 |
| 3 | 2 | 0.9712 | 0.9735 | 0.9654 | 0.9855 | 0.9795 | 0.9312 | 0.9915 | 0.9949 |
| 3 | 3 | 0.9825 | 0.9874 | 0.9707 | 0.9878 | 0.9876 | 0.9577 | 0.9968 | 0.9982 |
| Mean | | | | | | | | | |
| Std. deviation | | 0.0075 | 0.0108 | 0.0076 | 0.0033 | 0.0068 | 0.0218 | 0.0055 | 0.0038 |
The average results are marked in bold.
The repeated 3-fold cross-validation results on Human PPI dataset using LSTM-based classifier that integrates structural features with autocovariance and conjoint triad.
| Repeat | Fold | Accuracy | Sensitivity | Specificity | Precision | F-score | MCC | AUROC | AUPRC |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.9606 | 0.9584 | 0.9658 | 0.9854 | 0.9717 | 0.9076 | 0.9847 | 0.9904 |
| 1 | 2 | 0.9760 | 0.9810 | 0.9639 | 0.9850 | 0.9830 | 0.9422 | 0.9940 | 0.9968 |
| 1 | 3 | 0.9809 | 0.9865 | 0.9675 | 0.9865 | 0.9865 | 0.9540 | 0.9949 | 0.9972 |
| 2 | 1 | 0.9648 | 0.9814 | 0.9249 | 0.9693 | 0.9753 | 0.9145 | 0.9834 | 0.9905 |
| 2 | 2 | 0.9728 | 0.9787 | 0.9586 | 0.9828 | 0.9807 | 0.9346 | 0.9901 | 0.9944 |
| 2 | 3 | 0.9815 | 0.9870 | 0.9683 | 0.9869 | 0.9869 | 0.9554 | 0.9957 | 0.9978 |
| 3 | 1 | 0.9590 | 0.9695 | 0.9337 | 0.9725 | 0.9710 | 0.9014 | 0.9803 | 0.9863 |
| 3 | 2 | 0.9748 | 0.9742 | 0.9763 | 0.9900 | 0.9820 | 0.9402 | 0.9917 | 0.9950 |
| 3 | 3 | 0.9831 | 0.9885 | 0.9699 | 0.9875 | 0.9880 | 0.9591 | 0.9967 | 0.9984 |
| Mean | | | | | | | | | |
| Std. deviation | | 0.0086 | 0.0091 | 0.0164 | 0.0067 | 0.0061 | 0.0203 | 0.0056 | 0.0039 |
The average results are marked in bold.
The prediction performances on test set of Human PPI dataset for different multimodal feature combinations.
| Structural+AC | 0.9692 | 0.9354 | 0.9741 | 0.9785 | 0.9246 | 0.9831 | ||
| Structural+CT | 0.9706 | 0.9804 | 0.9463 | 0.9783 | 0.9794 | 0.9282 | 0.9831 | 0.9886 |
| Structural+AC+CT | 0.9807 | 0.9887 |
The best results are marked in bold.
The prediction performances on test set of Saccharomyces cerevisiae PPI dataset for different multimodal feature combinations.
| Structural+AC | 0.9206 | 0.8922 | 0.9177 | 0.8424 | 0.9764 | 0.9776 | ||
| Structural+CT | 0.9266 | 0.9232 | 0.9226 | 0.9263 | 0.8532 | 0.9777 | 0.9796 | |
| Structural+AC+CT | 0.9276 | 0.9462 | 0.9443 |
The best results are marked in bold.
The results of Human PPIs dataset for different feature combinations on test set.
| Unimodal | AC | 0.9348 | 0.9534 | 0.8907 | 0.9537 | 0.9536 | 0.8440 | 0.9679 | 0.9800 |
| CT | 0.9484 | 0.9578 | 0.9252 | 0.9693 | 0.9635 | 0.8756 | 0.9804 | ||
| Structural | 0.9167 | 0.9512 | 0.8314 | 0.9330 | 0.9420 | 0.7945 | 0.9619 | 0.9793 | |
| Bimodal | Structural+AC | 0.9692 | 0.9354 | 0.9741 | 0.9785 | 0.9246 | 0.9831 | 0.9897 | |
| Structural+CT | 0.9706 | 0.9804 | 0.9463 | 0.9783 | 0.9794 | 0.9282 | 0.9831 | 0.9886 | |
| Structural+AC+CT | 0.9807 | 0.9887 |
The best results are marked in bold.
The results of Saccharomyces cerevisiae PPIs dataset for different feature combinations on test set.
| Unimodal | AC | 0.8855 | 0.9067 | 0.8646 | 0.8683 | 0.8871 | 0.7718 | 0.9447 | 0.9386 |
| CT | 0.9094 | 0.9204 | 0.8987 | 0.8994 | 0.9097 | 0.8191 | 0.9633 | 0.9605 | |
| Structural | 0.8559 | 0.8761 | 0.8361 | 0.8403 | 0.8578 | 0.7126 | 0.9281 | 0.9270 | |
| Bimodal | Structural+AC | 0.9206 | 0.8922 | 0.9177 | 0.8424 | 0.9764 | 0.9776 | ||
| Structural+CT | 0.9266 | 0.9232 | 0.9226 | 0.9263 | 0.8532 | 0.9777 | 0.9796 | ||
| Structural+AC+CT | 0.9276 | 0.9462 | 0.9443 |
The best results are marked in bold.
Figure 5Illustration of the performance of models trained on different feature combinations. (a) Human PPIs dataset, (b) Saccharomyces cerevisiae PPIs dataset.
Performance comparison between proposed approach and existing methods on the test set of Human PPIs dataset.
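| Method | Accuracy | Sensitivity | Specificity | Precision | F-score | MCC | AUROC | AUPRC |
|---|---|---|---|---|---|---|---|---|
| SAE_AC | 0.9682 | NA | NA | NA | NA | NA | NA | NA |
| SAE_CT | 0.9447 | NA | NA | NA | NA | NA | NA | NA |
| Proposed approach | | | | | | | | |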
The best results are marked in bold.
Note: NA means not available.
Performance comparison between proposed approach and existing methods on Saccharomyces cerevisiae PPIs dataset.
| Method | Accuracy | Sensitivity | Specificity | Precision | F-score | MCC | AUROC | AUPRC |
|---|---|---|---|---|---|---|---|---|
| DeepPPI-Sep | 0.9250 | 0.9056 | 0.9449 | 0.9438 | NA | 0.8508 | 0.9743 | NA |
| DeepPPI-Con | 0.9001 | 0.8847 | 0.9160 | 0.9150 | NA | 0.8008 | 0.9576 | NA |
| EnsDNN-Con | 0.9068 | 0.9014 | 0.9119 | 0.9119 | 0.9062 | 0.8143 | 0.9645 | NA |
| EnsDNN-Sep | 0.9119 | 0.9223 | 0.9017 | 0.9041 | 0.9129 | 0.8244 | 0.9659 | NA |
| EnsDNN | 0.9529 | 0.9512 | 0.9548 | 0.9545 | 0.9529 | 0.9059 | 0.9700 | NA |
| Proposed approach | | | | | | | | |
The best results are marked in bold.
Note: NA means not available.
p-values obtained for different bimodal feature set combinations of two datasets.
| Structural+AC | 1.10954e-22 | 0.0345 |
| Structural+CT | 2.31427e-23 | 2.19183e-06 |
| Structural+AC+CT | 2.72237e-24 | 2.66818e-15 |