Xiaoxu Wang, Yijia Zhang, Peixuan Zhou, Xiaoxia Liu.
Abstract
BACKGROUND: Protein complexes are essential for understanding cell organization and function. Predicting complexes from protein-protein interaction (PPI) networks with computational methods has become an active research area in recent years, and many prediction methods have been proposed. However, how to use the information in known protein complexes remains a fundamental and pressing problem in protein complex prediction.
Keywords: Network representation learning; Protein complex prediction; Protein–protein interaction networks; Supervised learning
Year: 2022 PMID: 35879648 PMCID: PMC9317086 DOI: 10.1186/s12859-022-04850-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 Overall flow chart of our method
Fig. 2 Overall flow chart of the SVCC model
Model parameter settings
| Parameters | Value |
|---|---|
| C | 3 |
| Kernel | Poly |
| Degree | 4 |
| Gamma | scale |
| Coef0 | 0 |
| Probability | True |
| Tol | 0.001 |
| Cache_size | 200 |
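The parameter names in this table match the constructor arguments of scikit-learn's `SVC`. Assuming that is the library behind the SVC-based model (the text here does not say so), the settings could be written as in the sketch below. The data from `make_classification` is a synthetic stand-in, not the paper's features.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Settings copied from the table; mapping them onto scikit-learn's SVC
# is an assumption, not something the source confirms.
svc_params = dict(
    C=3,
    kernel="poly",
    degree=4,
    gamma="scale",
    coef0=0,
    probability=True,  # enables predict_proba
    tol=0.001,
    cache_size=200,    # kernel cache size in MB
)

clf = SVC(**svc_params)

# Placeholder data, used only to show that the estimator fits and
# returns class probabilities.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
clf.fit(X, y)
proba = clf.predict_proba(X[:5])  # one row of class probabilities per sample
print(proba.shape)
```

With `probability=True`, `predict_proba` becomes available, which is presumably what allows candidate complexes to be scored rather than only labelled.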
Basic information of two protein interaction networks
| Dataset | Number of nodes | Number of edges | Avg. number of neighbors |
|---|---|---|---|
| DIP | 3490 | 11,189 | 6.412 |
| HPRD | 7307 | 29,213 | 7.996 |
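The average-neighbor column is consistent with the mean degree of an undirected graph, 2E/N, where N is the number of nodes and E the number of edges. The short check below recomputes it from the node and edge counts in the table; the function name is illustrative.

```python
def mean_degree(num_nodes: int, num_edges: int) -> float:
    """Mean degree of an undirected graph: each edge adds 1 to two nodes."""
    return 2 * num_edges / num_nodes

# Node and edge counts taken from the table above
networks = {"DIP": (3490, 11189), "HPRD": (7307, 29213)}

for name, (n, e) in networks.items():
    print(name, round(mean_degree(n, e), 3))
```

Both values round to the figures listed in the table (6.412 and 7.996).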
Comparison of experimental results on DIP and HPRD data sets
| Dataset | Method | Number | Precision | Recall | F-score |
|---|---|---|---|---|---|
| DIP | MCODE | 72 | 0.5138 | 0.1010 | 0.1689 |
| DIP | COACH | 747 | 0.4310 | 0.4685 | 0.4490 |
| DIP | CMC | 709 | 0.3004 | 0.4631 | 0.3644 |
| DIP | ClusterONE | 363 | 0.5041 | 0.3661 | 0.4241 |
| DIP | GANE | 326 | 0.6012 | 0.4303 | 0.5016 |
| DIP | EWCA | 1028 | 0.5642 | 0.4767 | 0.5168 |
| DIP | SLPC | 766 | 0.6057 | 0.4631 | 0.5249 |
| DIP | SVCC only | 946 | 0.5898 | 0.4699 | 0.5231 |
| DIP | Our | 514 | 0.7684 | 0.4330 | **0.5539** |
| HPRD | COACH | 1914 | 0.3322 | 0.5805 | 0.4226 |
| HPRD | CMC | 2399 | 0.3384 | 0.7741 | 0.4710 |
| HPRD | ClusterONE | 875 | 0.3942 | 0.3348 | 0.3621 |
| HPRD | GANE | 755 | 0.3668 | 0.3124 | 0.3374 |
| HPRD | EWCA | 1915 | 0.4976 | 0.5832 | 0.5370 |
| HPRD | SLPC | 2431 | 0.4452 | 0.6901 | 0.5412 |
| HPRD | MCODE | 137 | 0.5182 | 0.1050 | 0.1746 |
| HPRD | SVCC only | 2649 | 0.3812 | 0.8243 | 0.5213 |
| HPRD | Our | 1252 | 0.5559 | 0.7186 | |
The highest F-score is in bold
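The F-score column agrees, to within last-digit rounding, with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below assumes that definition and recomputes a few rows from the precision and recall values listed in the table.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs taken from the comparison table
rows = {
    ("DIP", "COACH"): (0.4310, 0.4685),
    ("DIP", "Our"): (0.7684, 0.4330),
    ("HPRD", "EWCA"): (0.4976, 0.5832),
}

for key, (p, r) in rows.items():
    print(key, round(f_score(p, r), 4))
```

The same function can be applied to any row whose F-score cell is blank in this extraction, keeping in mind that values recomputed from rounded inputs may differ from the published ones in the fourth decimal place.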
Parameter settings of six supervised learning models
| ID | Model | Parameters |
|---|---|---|
| 1 | RF | n_estimators = 1000 |
| 2 | LR | C = 1.0 |
| 3 | KNN | n_neighbors = 5 |
| 4 | XGBoost | booster = gbtree, learning_rate = 0.3, max_depth = 6, min_child_weight = 1 |
| 5 | AdaBoost | base_estimator = DecisionTreeClassifier, algorithm = SAMME, n_estimators = 350, learning_rate = 0.4 |
| 6 | GBDT | learning_rate = 0.1, n_estimators = 100, max_depth = 2, min_samples_split = 1.0, min_samples_leaf = 2 |
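Several of these parameter names match scikit-learn estimators. Assuming scikit-learn was used (an assumption; the table does not name the library), four of the six baselines could be instantiated as in the sketch below. XGBoost is left out because it lives in a separate package, and AdaBoost because the name of its base-estimator argument differs between scikit-learn versions.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Values copied from the table; the choice of scikit-learn classes is assumed.
models = {
    "RF": RandomForestClassifier(n_estimators=1000),
    "LR": LogisticRegression(C=1.0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "GBDT": GradientBoostingClassifier(
        learning_rate=0.1,
        n_estimators=100,
        max_depth=2,
        min_samples_split=1.0,  # a float is read as a fraction of the samples
        min_samples_leaf=2,
    ),
}

for name, model in models.items():
    print(name, type(model).__name__)
```

In scikit-learn, a float `min_samples_split` of 1.0 means a node must hold every training sample before it can be split, so under this reading each GBDT tree would split only at the root.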
Fig. 3 Experimental comparison results of supervised models on the DIP network
Fig. 4 Experimental comparison results of supervised models on the HPRD network
Parameter settings of five network representation learning methods
| ID | Method | Parameters |
|---|---|---|
| 1 | node2vec | walk-length = 80, number-walks = 10, p = 8.0, q = 1.0, dimensions = 64 |
| 2 | DeepWalk | walk-length = 80, number-walks = 10, dimensions = 64 |
| 3 | HOPE | dimensions = 64 |
| 4 | LINE | epoch = 5, order = 3, clf-ratio = 0.5, dimensions = 64 |
| 5 | SDNE | alpha = 1e-6, beta = 5, nu1 = 1e-5, nu2 = 1e-4, batch_size = 200, epoch = 5, learning_rate = 0.01, dimensions = 64 |
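To illustrate what `walk-length`, `number-walks`, `p` and `q` control, here is a minimal pure-Python sketch of the second-order biased random walk used by node2vec, run on a toy graph. The graph, the protein labels and the function names are hypothetical, and the later step of training skip-gram embeddings (64 dimensions in the table) is not shown.

```python
import random

def node2vec_walk(adj, start, walk_length, p, q, rng):
    """One second-order biased walk containing at most `walk_length` nodes."""
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay within the previous node's neighborhood
            else:
                weights.append(1.0 / q)  # move further away
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

def generate_walks(adj, num_walks, walk_length, p, q, seed=0):
    """Start `num_walks` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(adj)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(adj, node, walk_length, p, q, rng))
    return walks

# Toy undirected graph with hypothetical protein labels
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]
adj = {}
for u, v in toy_edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

walks = generate_walks(adj, num_walks=10, walk_length=80, p=8.0, q=1.0)
print(len(walks), len(walks[0]))
```

Setting `p = q = 1` makes every weight equal, which reduces the walk to the uniform random walk used by DeepWalk. A large `p` such as 8.0 makes immediate backtracking unlikely.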
Fig. 5 The impact of different network representation learning methods on the experimental performance of the DIP network
Fig. 6 The impact of different network representation learning methods on the experimental performance of the HPRD network
Comparison of experimental results using different classification labels on the DIP and HPRD datasets
| Dataset | SVC train set | RF train set | Precision | Recall | F-score |
|---|---|---|---|---|---|
| DIP | Two categories | Two categories | 0.7684 | 0.4330 | 0.5539 |
| DIP | Two categories | Three categories | 0.8726 | 0.4112 | |
| DIP | Three categories | Two categories | 0.5562 | 0.4467 | 0.4955 |
| DIP | Three categories | Three categories | 0.5579 | 0.4494 | 0.4978 |
| HPRD | Two categories | Two categories | 0.5559 | 0.7186 | |
| HPRD | Two categories | Three categories | 0.7891 | 0.4966 | 0.6096 |
| HPRD | Three categories | Two categories | 0.4485 | 0.4299 | 0.4390 |
| HPRD | Three categories | Three categories | 0.6472 | 0.2919 | 0.4023 |
The highest F-score is in bold
Ten predicted complexes with low p-values that match the true complexes on the DIP network
| ID | Complex | Match | GO_Process | GO_Function | GO_Component |
|---|---|---|---|---|---|
| 1 | | 1.0 | 5.73543e-12 | 7.89663e-08 | 1.33543e-16 |
| 2 | | 0.8 | 6.51289e-12 | 4.55447e-14 | 6.52337e-10 |
| 3 | | 0.86 | 6.7513e-11 | 4.80533e-15 | 1.4919e-19 |
| 4 | | 0.83 | 3.56116e-16 | 1.33543e-16 | 3.81553e-17 |
| 5 | | 0.85 | 2.62478e-19 | 6.57407e-17 | 5.95926e-20 |
| 6 | | 1.0 | 9.13462e-10 | 2.69145e-09 | 6.52473e-11 |
| 7 | | 1.0 | 3.00595e-12 | 6.52473e-11 | 9.10894e-15 |
| 8 | | 0.8 | 1.29404e-13 | 1.59353e-12 | 3.56116e-16 |
| 9 | | 0.83 | 2.79506e-14 | 5.83637e-06 | 2.0486e-12 |
| 10 | | 1.0 | 5.70914e-10 | 5.58191e-07 | 6.52473e-11 |
Ten predicted complexes with low p-values that match the true complexes on the HPRD network
| ID | Complex | Match | GO_Process | GO_Function | GO_Component |
|---|---|---|---|---|---|
| 1 | | 0.83 | 1.16739e-14 | 3.054e-07 | 2.53167e-12 |
| 2 | | 1.0 | 1.52359e-07 | 6.80376e-09 | 5.39083e-07 |
| 3 | | 0.86 | 4.48623e-13 | 1.51728e-07 | 4.56648e-17 |
| 4 | | 0.83 | 6.12908e-06 | 7.33099e-11 | 5.9544e-20 |
| 5 | | 0.61 | 1.74854e-20 | 1.60322e-06 | 4.16646e-19 |
| 6 | | 1.0 | 2.55384e-08 | 3.37375e-07 | 1.2821e-07 |
| 7 | | 1.0 | 2.46197e-10 | 2.09624e-09 | 1.83324e-07 |
| 8 | | 0.8 | 7.87032e-11 | 6.84817e-06 | 7.78789e-13 |
| 9 | | 0.83 | 7.34722e-08 | 5.99134e-09 | 1.42065e-08 |
| 10 | | 1.0 | 2.7759e-05 | 2.59935e-05 | 4.61418e-09 |
Eight predicted complexes with low p-values that do not match the true complexes on the DIP and HPRD networks
| Dataset | ID | Complex | Match | GO_Process | GO_Function | GO_Component |
|---|---|---|---|---|---|---|
| DIP | 1 | | 0.21 | 4.76944e-10 | 3.56521e-05 | 2.2827e-09 |
| DIP | 2 | YOR304W | 0.25 | 4.24108e-08 | 2.88883e-08 | 7.01017e-07 |
| DIP | 3 | | 0.33 | 7.42936e-10 | 5.70436e-09 | 2.91387e-11 |
| DIP | 4 | YHL006C YDR078C YIL132C YLR376C | 0.0 | 3.00595e-12 | 6.54104e-05 | 4.55447e-14 |
| HPRD | 1 | SHC1 | 0.12 | 1.00517e-08 | 1.14531e-06 | 3.6605e-07 |
| HPRD | 2 | | 0.2 | 3.43893e-12 | 1.97181e-14 | 1.63409e-13 |
| HPRD | 3 | | 0.1 | 2.01548e-06 | 3.66353e-06 | 1.16493e-05 |
| HPRD | 4 | CDH18 CDH9 CDH6 | 0.0 | 9.51598e-09 | 4.37102e-06 | 3.82459e-09 |