Bihai Zhao1, Sai Hu2, Xueyong Li1, Fan Zhang1, Qinglong Tian1, Wenyin Ni3.
Abstract
BACKGROUND: Accurate annotation of protein functions remains a major challenge for understanding life in the post-genomic era. Many computational methods based on protein-protein interaction (PPI) networks have been proposed to predict protein function. However, the precision of these predictions still needs to be improved because PPI networks are incomplete and noisy. Integrating network topology with biological information could improve the accuracy of protein function prediction and may also lead to the discovery of multiple interaction types between proteins. Current algorithms generate a single network, which is obtained as a weighted sum of all types of protein interactions.
Year: 2016 PMID: 27678214 PMCID: PMC5039885 DOI: 10.1186/s40246-016-0087-x
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Fig. 1 a The original, experimentally validated protein-protein interaction network. b The co-annotation network constructed from GO profiles. c The co-expression network constructed from time-course gene expression data. d The network reconstructed by current methods from the PPI, co-annotation and co-expression networks
Fig. 2 Example of multilayer protein networks
Fig. 3 Visualization of constructed multilayer protein networks
Fig. 4 The tensor representation of a multilayer protein network
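A tensor representation such as the one named in Fig. 4 can be illustrated with a small sketch. This is not the authors' implementation: it assumes each layer is an undirected, weighted edge list, and it stores the weight between proteins i and j in layer k at `tensor[i, j, k]`. The layer names PIL, SDL and SCL come from the tables in this record; the proteins, edges, edge weights, the helper `build_tensor` and the layer weights used for the collapse are all hypothetical. The last step shows the single-network alternative described in the abstract, a weighted sum over layers.

```python
import numpy as np

def build_tensor(proteins, layers):
    """Stack one symmetric adjacency matrix per layer into an n x n x L tensor."""
    index = {p: i for i, p in enumerate(proteins)}
    n = len(proteins)
    tensor = np.zeros((n, n, len(layers)))
    for k, edges in enumerate(layers):
        for u, v, w in edges:
            i, j = index[u], index[v]
            tensor[i, j, k] = w  # undirected edge: fill both directions
            tensor[j, i, k] = w
    return tensor

# Toy data (assumed for illustration only).
proteins = ["A", "B", "C", "D"]
pil = [("A", "B", 1.0), ("B", "C", 1.0)]
sdl = [("A", "C", 0.6)]
scl = [("A", "B", 0.8), ("C", "D", 0.5)]

tensor = build_tensor(proteins, [pil, sdl, scl])

# Collapsing the layers with a weighted sum yields a single n x n network.
# The layer weights here are arbitrary.
layer_weights = np.array([0.5, 0.25, 0.25])
collapsed = np.tensordot(tensor, layer_weights, axes=([2], [0]))
```

Keeping the third axis preserves which layer each edge came from, whereas `collapsed` merges that information into one weight per protein pair.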
Statistical analysis of the influence of three layers
| Layers | Annotated proteins | Precision | Recall | F-measure |
|---|---|---|---|---|
| PIL | 1274 | 0.3791 | 0.1094 | 0.1697 |
| SDL | 1215 | 0.3595 | 0.1538 | 0.2154 |
| SCL | 1103 | 0.3404 | 0.1829 | 0.238 |
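The final value in each row of the table above matches the harmonic mean of the precision and recall in that row, which is the usual definition of the F-measure. The short check below recomputes it from the tabulated values. The function name `f_measure` is an illustrative choice, and because the inputs are already rounded, the results should be expected to agree with the table only to about three decimal places.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from the table above.
rows = {
    "PIL": (0.3791, 0.1094),
    "SDL": (0.3595, 0.1538),
    "SCL": (0.3404, 0.1829),
}

for layer, (p, r) in rows.items():
    print(layer, round(f_measure(p, r), 4))
```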
Fig. 5 a The constructed multilayer protein network. b The tensor representation of the MPN after pre-processing. c The list of predicted functions for the unknown protein A generated by the FP-MPN method
Statistical analysis of overlaps of functions
| OS | Proportion (all proteins) | Proportion (proteins with more than one function) |
|---|---|---|
| (0, 0.2] | 2.81 % | 5.64 % |
| (0.2, 0.4] | 13.90 % | 27.95 % |
| (0.4, 0.6] | 27.05 % | 54.41 % |
| (0.6, 0.8] | 2.02 % | 4.06 % |
| (0.8, 1] | 54.22 % | 7.93 % |
The influence of access sequence
| Categories | Schemes | Precision | Recall | F-measure | CR |
|---|---|---|---|---|---|
| BP | SCL → SDL → PIL | 0.444 | 0.427 | 0.435 | 0.426 |
| BP | SCL → PIL → SDL | 0.462 | 0.401 | 0.429 | 0.374 |
| BP | SDL → PIL → SCL | 0.452 | 0.404 | 0.426 | 0.396 |
| BP | SDL → SCL → PIL | 0.442 | 0.424 | 0.433 | 0.422 |
| BP | PIL → SDL → SCL | 0.453 | 0.404 | 0.427 | 0.397 |
| BP | PIL → SCL → SDL | 0.459 | 0.398 | 0.426 | 0.372 |
| MF | SCL → SDL → PIL | 0.569 | 0.544 | 0.556 | 0.508 |
| MF | SCL → PIL → SDL | 0.566 | 0.535 | 0.55 | 0.495 |
| MF | SDL → PIL → SCL | 0.585 | 0.54 | 0.561 | 0.505 |
| MF | SDL → SCL → PIL | 0.568 | 0.543 | 0.555 | 0.507 |
| MF | PIL → SDL → SCL | 0.584 | 0.539 | 0.561 | 0.504 |
| MF | PIL → SCL → SDL | 0.573 | 0.541 | 0.557 | 0.5 |
| CC | SCL → SDL → PIL | 0.463 | 0.439 | 0.451 | 0.415 |
| CC | SCL → PIL → SDL | 0.468 | 0.43 | 0.448 | 0.4 |
| CC | SDL → PIL → SCL | 0.473 | 0.424 | 0.447 | 0.402 |
| CC | SDL → SCL → PIL | 0.461 | 0.439 | 0.45 | 0.413 |
| CC | PIL → SDL → SCL | 0.473 | 0.424 | 0.448 | 0.403 |
| CC | PIL → SCL → SDL | 0.467 | 0.429 | 0.447 | 0.4 |
Overall comparisons of various methods
| Categories | Methods | MP | Precision | Recall | F-measure | CR |
|---|---|---|---|---|---|---|
| BP | FP-MPN | 1595 | 0.444 | 0.427 | 0.435 | 0.426 |
| BP | Zhang | 810 | 0.225 | 0.220 | 0.222 | 0.216 |
| BP | DCS | 1148 | 0.312 | 0.314 | 0.313 | 0.327 |
| BP | DSCP | 1298 | 0.357 | 0.359 | 0.358 | 0.363 |
| BP | PON | 572 | 0.150 | 0.140 | 0.145 | 0.161 |
| MF | FP-MPN | 995 | 0.569 | 0.544 | 0.556 | 0.508 |
| MF | Zhang | 608 | 0.332 | 0.332 | 0.332 | 0.316 |
| MF | DCS | 839 | 0.461 | 0.462 | 0.461 | 0.441 |
| MF | DSCP | 927 | 0.518 | 0.515 | 0.516 | 0.489 |
| MF | PON | 413 | 0.223 | 0.216 | 0.22 | 0.228 |
| CC | FP-MPN | 1265 | 0.463 | 0.439 | 0.451 | 0.415 |
| CC | Zhang | 561 | 0.197 | 0.196 | 0.197 | 0.198 |
| CC | DCS | 876 | 0.306 | 0.309 | 0.307 | 0.315 |
| CC | DSCP | 1014 | 0.364 | 0.363 | 0.364 | 0.356 |
| CC | PON | 440 | 0.148 | 0.138 | 0.143 | 0.158 |
Fig. 6 The precision-recall curves of FP-MPN compared with four other existing algorithms
Fig. 7 FP/TP curves of various methods
Statistical analysis of FP/TP of various methods
| Categories | Methods | Maximum | Minimum | Average | Middle |
|---|---|---|---|---|---|
| BP | FP-MPN | 9.44 | 0.72 | 6.48 | 7.18 |
| BP | Zhang | 40.29 | 1.59 | 20.96 | 21.04 |
| BP | DCS | 33.94 | 2.12 | 18.64 | 18.94 |
| BP | DSCP | 32.14 | 1.75 | 17.49 | 17.75 |
| BP | PON | 9.39 | 3.07 | 6.98 | 7.41 |
| MF | FP-MPN | 6.19 | 0.53 | 5.23 | 5.99 |
| MF | Zhang | 45.5 | 0.9 | 22.81 | 22.71 |
| MF | DCS | 39.41 | 1.18 | 21.28 | 21.88 |
| MF | DSCP | 38.54 | 0.94 | 20.4 | 20.73 |
| MF | PON | 4.57 | 1.85 | 4.2 | 4.57 |
| CC | FP-MPN | 7.39 | 0.72 | 5.88 | 6.59 |
| CC | Zhang | 53.51 | 2.12 | 27.29 | 27.09 |
| CC | DCS | 38.15 | 2.36 | 21.49 | 22.25 |
| CC | DSCP | 37.02 | 1.81 | 20.45 | 21.04 |
| CC | PON | 6.88 | 3.07 | 6.02 | 6.57 |
Fig. 8 The precision-recall curves of various methods using tenfold cross-validation
Selected functions predicted by various methods
| Categories | Proteins | FP-MPN | Zhang | DCS | DSCP | PON |
|---|---|---|---|---|---|---|
| BP | YGL100W | … | GO:0000723 | GO:0043161 | GO:0043161 | GO:0000001 |
| BP | YNL262W | … | … | … | … | … |
| BP | YLR321C | … | … | … | … | … |
| BP | YBR278W | … | … | … | | |
| MF | YBR114W | … | … | … | … | GO:0000386 |
| MF | YJR052W | … | GO:0008134 | GO:0043130 | | |
| MF | YJR140C | … | GO:0046933 | … | | |
| MF | YBL021C | … | GO:0003713 | GO:0003713 | GO:0003713 | GO:0003713 |
| CC | YNL161W | … | … | … | … | … |
| CC | YBR198C | … | GO:0070210 | GO:0070210 | GO:0070210 | … |
| CC | YDR167W | … | GO:0005666 | … | | |
| CC | YNL273W | … | GO:0005751 | GO:0005751 | | |
Fig. 9 Comparison of the running time of various methods