Xiao-Rui Su, Lun Hu, Zhu-Hong You, Peng-Wei Hu, Bo-Wei Zhao.
Abstract
BACKGROUND: Protein-protein interactions (PPIs) play an important role in regulating cells and signals. Despite the ongoing efforts of bioassay groups, the available data remain incomplete, which limits our ability to understand the molecular roots of human disease. There is therefore an urgent need for a computational method that predicts PPIs from the perspective of a molecular system.
Keywords: Heterogeneous molecular network; LINE; Network representation learning; Protein sequence; Protein–protein interaction
Year: 2022 PMID: 35710342 PMCID: PMC9205098 DOI: 10.1186/s12859-022-04766-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 The overview of MTV-PPI
The statistics of associations in the heterogeneous molecular network
| Type of associations | Sources | Number |
|---|---|---|
| miRNA-LncRNA | lncRNASNP2 | 8374 |
| miRNA-Disease | HMDD | 16,427 |
| miRNA-Protein | miRTarBase | 4944 |
| LncRNA-Disease | LncRNADisease | 1264 |
| Protein–Protein | STRING | 19,237 |
| Protein-Disease | DisGeNET | 25,087 |
| Drug-Protein | DrugBank | 11,107 |
| Drug-Disease | CTD | 18,416 |
| LncRNA-Protein | LncRNA2Target | 690 |
| Total | – | 105,546 |
The statistics of nodes in the heterogeneous molecular network
| Type of nodes | Number |
|---|---|
| Protein | 1649 |
| LncRNA | 769 |
| miRNA | 1023 |
| Disease | 2062 |
| Drug | 1025 |
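The Total row of the association table can be cross-checked against the per-type counts. The short Python sketch below simply re-adds the figures from the two statistics tables (it does not load any of the source databases); the node total and the count of protein-incident associations are derived here and are not stated in the text.

```python
# Association counts per type, copied from the association statistics table.
association_counts = {
    "miRNA-LncRNA": 8374,
    "miRNA-Disease": 16427,
    "miRNA-Protein": 4944,
    "LncRNA-Disease": 1264,
    "Protein-Protein": 19237,
    "Protein-Disease": 25087,
    "Drug-Protein": 11107,
    "Drug-Disease": 18416,
    "LncRNA-Protein": 690,
}

# Node counts per type, copied from the node statistics table.
node_counts = {"Protein": 1649, "LncRNA": 769, "miRNA": 1023, "Disease": 2062, "Drug": 1025}

total_associations = sum(association_counts.values())
total_nodes = sum(node_counts.values())

# Associations with a protein on at least one end (type names split on "-").
protein_incident = sum(
    n for name, n in association_counts.items() if "Protein" in name.split("-")
)

print(total_associations)  # 105546, matching the Total row
print(total_nodes)         # 6528
print(protein_incident)    # 61065
```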
Fig. 2 An illustration of the process of extracting inter-view feature
Fig. 3 An illustration of first-order proximity and second-order proximity in LINE
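Fig. 3 refers to the two notions of proximity in LINE (Tang et al., 2015). First-order proximity models a direct edge as $p_1(v_i, v_j) = \sigma(\mathbf{u}_i^\top \mathbf{u}_j)$ with loss $O_1 = -\sum_{(i,j)} w_{ij} \log p_1(v_i, v_j)$; second-order proximity gives each vertex a separate context vector $\mathbf{u}'_j$ and models $p_2(v_j \mid v_i)$ as a softmax over $\mathbf{u}'^\top_k \mathbf{u}_i$, with loss $O_2 = -\sum_{(i,j)} w_{ij} \log p_2(v_j \mid v_i)$. The Python sketch below evaluates both losses on a hypothetical three-vertex toy graph. It computes the exact softmax rather than the negative-sampling approximation LINE uses in practice, and it is not MTV-PPI's training code.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def first_order_prob(u_i, u_j):
    """First-order proximity: sigmoid of the dot product of two vertex embeddings."""
    return 1.0 / (1.0 + math.exp(-dot(u_i, u_j)))

def second_order_prob(u_i, context, j):
    """Second-order proximity: softmax over all context vectors, conditioned on vertex i."""
    scores = [dot(c, u_i) for c in context]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return exps[j] / sum(exps)

def first_order_loss(edges, emb):
    """O1 = -sum over edges of w_ij * log p1(v_i, v_j)."""
    return -sum(w * math.log(first_order_prob(emb[i], emb[j])) for i, j, w in edges)

def second_order_loss(edges, emb, context):
    """O2 = -sum over directed edges of w_ij * log p2(v_j | v_i)."""
    return -sum(w * math.log(second_order_prob(emb[i], context, j)) for i, j, w in edges)

# Hypothetical toy graph: weighted edges (i, j, w_ij) and 2-d vectors per vertex.
edges = [(0, 1, 1.0), (1, 2, 2.0)]
emb = [[0.5, 0.1], [0.4, 0.3], [-0.2, 0.6]]      # vertex embeddings u_i
context = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]  # context embeddings u'_j

print(first_order_loss(edges, emb))
print(second_order_loss(edges, emb, context))
```

For an undirected network, the second-order loss is usually applied to both directions of each edge, so each undirected edge would appear twice in `edges`.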
Predictive performance under each fold on the heterogeneous molecular network
| Fold | Acc. | Sen. | Pre. | AUC | AUPR |
|---|---|---|---|---|---|
| 0 | 0.8703 | 0.8264 | 0.9060 | 0.9341 | 0.9346 |
| 1 | 0.8732 | 0.8332 | 0.9056 | 0.9370 | 0.9378 |
| 2 | 0.8602 | 0.8181 | 0.8933 | 0.9234 | 0.9268 |
| 3 | 0.8617 | 0.8342 | 0.8828 | 0.9298 | 0.9270 |
| 4 | 0.8620 | 0.9124 | 0.9019 | 0.9262 | 0.9277 |
| Overall |
Best results are bolded
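The per-fold table reports folds 0 to 4, i.e. five-fold cross-validation. A minimal Python sketch of such a split follows, assuming plain random (non-stratified) partitioning of the samples; the paper's exact splitting procedure is not given in this excerpt, and the function names are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k disjoint, near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Fold f takes every k-th shuffled index starting at position f.
    return [idx[f::k] for f in range(k)]

def train_test_splits(n_samples, k=5, seed=0):
    """Yield (fold, train_idx, test_idx); each fold serves exactly once as the test set."""
    folds = k_fold_indices(n_samples, k, seed)
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield f, train_idx, test_idx

# Usage: 12 hypothetical samples split into 5 folds.
for fold, train_idx, test_idx in train_test_splits(12, k=5):
    print(fold, len(train_idx), len(test_idx))
```

Fixing the seed makes the partition reproducible, so every method in a comparison can be evaluated on the same folds.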
Results of various methods
| Methods | Acc. | Sen. | Pre. | AUC | AUPR |
|---|---|---|---|---|---|
| LR | 0.7717 ± 0.0066 | 0.7551 ± 0.0090 | 0.7329 ± 0.0092 | 0.8482 ± 0.0060 | 0.8411 ± 0.0058 |
| DPPI | 0.8007 ± 0.0087 | 0.7623 ± 0.0099 | 0.7677 ± 0.0090 | 0.8726 ± 0.0076 | 0.8903 ± 0.0078 |
| WSRC | 0.8225 ± 0.0105 | 0.7623 ± 0.0097 | 0.7987 ± 0.0123 | 0.9022 ± 0.0089 | 0.8975 ± 0.0086 |
| LPPI | 0.8062 ± 0.0116 | 0.7232 ± 0.0103 | 0.8424 ± 0.0173 | 0.8022 ± 0.0154 | |
| PIPR | 0.7536 ± 0.0090 | 0.7678 ± 0.0100 | 0.7456 ± 0.0098 | 0.8331 ± 0.0094 | 0.8246 ± 0.0096 |
| MTV-PPI | 0.8249 ± 0.0085 |
Best results are bolded
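Acc., Sen. and Pre. in these tables are the usual confusion-matrix metrics: accuracy $(TP+TN)/(TP+FP+TN+FN)$, sensitivity $TP/(TP+FN)$ and precision $TP/(TP+FP)$. The Python sketch below computes them from binary predictions, assuming label 1 marks an interacting protein pair. The `format_mean_std` helper assumes the "mean ± std" cells use the sample standard deviation over folds; the excerpt does not say which variant was used.

```python
import statistics

def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = interacting pair, 0 = non-interacting)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def acc_sen_pre(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / (tp + fp + tn + fn)
    sen = tp / (tp + fn) if tp + fn else 0.0  # sensitivity (recall on positives)
    pre = tp / (tp + fp) if tp + fp else 0.0  # precision
    return acc, sen, pre

def format_mean_std(values):
    """Format per-fold scores as 'mean ± std' (sample std; an assumption)."""
    return f"{statistics.mean(values):.4f} ± {statistics.stdev(values):.4f}"

# Hypothetical labels and predictions for eight protein pairs.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
print(acc_sen_pre(y_true, y_pred))         # (0.625, 0.75, 0.6)
print(format_mean_std([0.80, 0.82, 0.84]))  # 0.8200 ± 0.0200
```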
Fig. 4 ROC and PR curves obtained by MTV-PPI and all baseline algorithms
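The AUC and AUPR columns are the areas under the ROC and PR curves shown in Fig. 4. The dependency-free Python sketch below computes both from prediction scores. ROC AUC is computed as the probability that a random positive outranks a random negative, which equals the area under the ROC curve. AUPR is estimated as average precision, one common estimator; this excerpt does not say whether the authors used it or trapezoidal integration of the PR curve, and the two can differ slightly.

```python
def roc_auc(y_true, scores):
    """ROC AUC: fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """AUPR estimate: mean of precision@k taken at the rank k of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda x: x[0], reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for k, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            total += hits / k
    return total / n_pos

# Hypothetical labels and classifier scores for five protein pairs.
y = [1, 0, 1, 1, 0]
s = [0.9, 0.8, 0.7, 0.3, 0.2]
print(roc_auc(y, s))            # ≈ 0.667
print(average_precision(y, s))  # ≈ 0.806
```

The pairwise loop is quadratic and meant only for illustration; library implementations sort once and scale to large test sets.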
Fig. 5 Predictive performance with two different aggregators
Fig. 6 Results with different network embedding algorithms
Predictive performance with different feature types
| Feature Type | Acc. | Sen. | Pre. | AUC | AUPR |
|---|---|---|---|---|---|
| Inter-view feature | 0.7491 ± 0.0090 | 0.6945 ± 0.0109 | 0.7797 ± 0.0103 | 0.8206 ± 0.0080 | 0.8185 ± 0.0181 |
| Intra-view feature | 0.8570 ± 0.0045 | 0.8130 ± 0.0105 | 0.8916 ± 0.0099 | 0.9240 ± 0.0046 | 0.9238 ± 0.0093 |
| Aggregated feature |
Best results are bolded
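This excerpt does not spell out how the intra-view and inter-view features are combined into the "Aggregated feature", nor which two aggregators Fig. 5 compares. Purely to illustrate the idea, the Python sketch below assumes two generic options, concatenation and an element-wise mean, and builds a pair feature by concatenating the two proteins' aggregated vectors. The function names and the 3-dimensional vectors are hypothetical.

```python
def concat_aggregate(intra, inter):
    """Concatenate the two views into one longer vector."""
    return list(intra) + list(inter)

def mean_aggregate(intra, inter):
    """Element-wise mean of the two views (requires equal dimensionality)."""
    if len(intra) != len(inter):
        raise ValueError("mean aggregation needs views of equal length")
    return [(a + b) / 2 for a, b in zip(intra, inter)]

def pair_feature(prot_a, prot_b, aggregate):
    """Feature for a candidate pair: aggregate each protein's views, then concatenate A and B."""
    return aggregate(*prot_a) + aggregate(*prot_b)

# Hypothetical (intra-view, inter-view) vectors for two proteins.
p1 = ([0.2, 0.4, 0.6], [1.0, 0.0, 0.5])
p2 = ([0.1, 0.3, 0.5], [0.0, 1.0, 0.5])
print(pair_feature(p1, p2, concat_aggregate))  # 12 values
print(pair_feature(p1, p2, mean_aggregate))    # 6 values
```

Concatenation keeps both views intact at the cost of a longer vector, while the mean keeps the dimension fixed but requires both views to have the same size.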
Fig. 7 ROC and PR curves obtained by various features
Predictive performance with various classifiers
| Classifier | Acc. | Sen. | Pre. | AUC | AUPR |
|---|---|---|---|---|---|
| SVM | 0.7103 ± 0.0078 | 0.7577 ± 0.0113 | 0.6921 ± 0.0074 | 0.7747 ± 0.0077 | 0.7686 ± 0.0074 |
| LR | 0.7056 ± 0.0072 | 0.7452 ± 0.0119 | 0.6905 ± 0.0067 | 0.7733 ± 0.0078 | 0.7667 ± 0.0076 |
| NB | 0.6772 ± 0.0084 | 0.7392 ± 0.0098 | 0.6578 ± 0.0090 | 0.7563 ± 0.0071 | 0.7827 ± 0.0075 |
| AdaBoost | 0.6946 ± 0.0088 | 0.7306 ± 0.0115 | 0.6816 ± 0.0090 | 0.7669 ± 0.0094 | 0.7713 ± 0.0086 |
| XGBoost | 0.8600 ± 0.0081 | 0.8419 ± 0.0109 | 0.9240 ± 0.0048 | ||
| RF | 0.8249 ± 0.0085 | 0.9301 ± 0.0050 |
Best results are bolded
Fig. 8 ROC and PR curves obtained by various classifiers
Detailed information on each subnetwork
| Name | # Nodes | # Interactions |
|---|---|---|
| Protein–protein (PP) | 1649 | 19,237 |
| miRNA–protein–protein (MiPP) | 2672 | 24,181 |
| lncRNA–protein–protein (LncPP) | 2418 | 19,927 |
| Disease–protein–protein (DiPP) | 3711 | 44,324 |
| Drug–protein–protein (DrPP) | 2674 | 30,344 |
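The sizes in this table are consistent with each X-protein-protein subnetwork being the protein-protein network plus all nodes of type X and the X-protein associations from the association table. The Python check below is a sketch based on that reading of the counts, not on a construction procedure stated in the text.

```python
# Counts taken from the node and association statistics tables.
protein_nodes = 1649
pp_edges = 19237
other_nodes = {"miRNA": 1023, "lncRNA": 769, "Disease": 2062, "Drug": 1025}
protein_links = {"miRNA": 4944, "lncRNA": 690, "Disease": 25087, "Drug": 11107}

def subnetwork_size(kind):
    """Nodes and edges of an X-protein-protein subnetwork: PP network + X nodes + X-protein edges."""
    nodes = protein_nodes + other_nodes[kind]
    edges = pp_edges + protein_links[kind]
    return nodes, edges

for kind in other_nodes:
    print(kind, subnetwork_size(kind))
```

Each result matches the corresponding row above, e.g. `subnetwork_size("Disease")` gives `(3711, 44324)` for DiPP.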
Experimental results obtained on each subnetwork
| Subnetwork | Acc. | Sen. | Pre. | AUC | AUPR |
|---|---|---|---|---|---|
| PP | 0.8348 ± 0.0065 | 0.7869 ± 0.0094 | 0.8704 ± 0.0075 | 0.9047 ± 0.0038 | 0.9144 ± 0.0041 |
| MiPP | 0.8420 ± 0.0022 | 0.7936 ± 0.0055 | 0.8786 ± 0.0049 | 0.9095 ± 0.0020 | 0.9198 ± 0.0022 |
| LncPP | 0.8350 ± 0.0044 | 0.7865 ± 0.0086 | 0.8711 ± 0.0081 | 0.9042 ± 0.0028 | 0.9143 ± 0.0028 |
| DiPP | 0.8352 ± 0.0048 | 0.7796 ± 0.0064 | 0.8772 ± 0.0068 | 0.9053 ± 0.0031 | 0.9139 ± 0.0032 |
| DrPP | 0.8537 ± 0.0034 | 0.8057 ± 0.0052 | 0.8913 ± 0.0026 | 0.9213 ± 0.0039 | 0.9291 ± 0.0040 |
| All |
Best results are bolded