Zhixia Teng, Maozu Guo, Qiguo Dai, Chunyu Wang, Jin Li, Xiaoyan Liu.
Abstract
In this paper, we propose SeekFun, a novel method that predicts protein function from a weighted mapping between domains and GO terms. First, the weighted mapping is constructed from the GO annotations and domain composition of the proteins, with the association strength between each domain and GO term weighted by symmetrical conditional probability. Second, the mapping is extended along the true paths of the terms in the GO hierarchy. Finally, the terms associated with a protein's resident domains are transferred to the host protein, and the actual annotations of the host protein are determined by the association strengths. Our comparisons show that SeekFun outperforms the compared methods in most cases. SeekFun provides a flexible and effective way to predict protein function. It benefits from the well-constructed mapping between domains and GO terms, as well as from a reasonable strategy for inferring a protein's annotations from those of its domains.
Year: 2014 PMID: 24868539 PMCID: PMC4017789 DOI: 10.1155/2014/641469
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
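The three steps in the abstract can be illustrated with a minimal Python sketch. It assumes the common definition of symmetrical conditional probability, SCP(d, t) = P(d, t)^2 / (P(d) P(t)); the abstract does not state the exact formula, the rule for weighting ancestor terms, or the decision threshold, so the max-propagation rule, the `threshold` parameter, the function names, and the toy identifiers below are all hypothetical.

```python
from collections import Counter
from itertools import product


def scp_mapping(proteins):
    """Weight each co-occurring (domain, GO term) pair by
    SCP(d, t) = P(d, t)^2 / (P(d) * P(t)), estimated from counts."""
    dom_n, go_n, pair_n = Counter(), Counter(), Counter()
    for domains, terms in proteins.values():
        dom_n.update(domains)
        go_n.update(terms)
        pair_n.update(product(domains, terms))
    # The 1/N factors cancel, so raw counts are sufficient.
    return {(d, t): c * c / (dom_n[d] * go_n[t]) for (d, t), c in pair_n.items()}


def extend_true_path(mapping, parents):
    """Copy each association to every ancestor of its term.
    Assumption: an ancestor keeps the strongest weight among its descendants."""
    extended = dict(mapping)
    for (d, t), w in mapping.items():
        stack, seen = list(parents.get(t, ())), set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            key = (d, ancestor)
            extended[key] = max(extended.get(key, 0.0), w)
            stack.extend(parents.get(ancestor, ()))
    return extended


def predict(domains, mapping, threshold=0.5):
    """Transfer terms from resident domains to the host protein and keep
    those whose best association strength reaches the (assumed) threshold."""
    scores = {}
    for (d, t), w in mapping.items():
        if d in domains:
            scores[t] = max(scores.get(t, 0.0), w)
    return {t: w for t, w in scores.items() if w >= threshold}


# Toy data: protein -> (domains, GO terms). All identifiers are made up.
proteins = {
    "P1": ({"D1"}, {"GO:a", "GO:b"}),
    "P2": ({"D1", "D2"}, {"GO:a"}),
    "P3": ({"D2"}, {"GO:c"}),
}
# Toy hierarchy: term -> direct parents.
parents = {"GO:a": {"GO:root"}, "GO:b": {"GO:a"}, "GO:c": {"GO:root"}}

mapping = scp_mapping(proteins)
extended = extend_true_path(mapping, parents)
predicted = predict({"D2"}, extended)
```

In this toy example, D1 and GO:a always co-occur with each other's full support relative to their marginals, so their association receives the maximum weight of 1.0, while a protein carrying only D2 is assigned the terms whose weight for D2 reaches the threshold.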
The details of the experimental datasets.
| | Uniref50 | SwissProt | TrEMBL |
|---|---|---|---|
| Number of annotated proteins | 20693 | 17176 | 19526 |
| Number of proteins with domains | 11673 | 15810 | 13588 |
| Number of involved domains | 4998 | 4430 | 3642 |
| Number of involved GOs | 4812 | 7572 | 3992 |
Figure 1. Comparison of the distributions of relevance on similar datasets. R_dSCP, R_SCP, and log R_PV denote the relevance computed by conditional probability, symmetrical conditional probability, and P value, respectively. S_i is constructed by taking nine of the ten equal-size partitions of SwissProt at a time, i = 1, 2, ..., 10. Likewise, U_j and T_k denote the subsets constructed from Uniref50 and TrEMBL, respectively, j, k = 1, 2, ..., 10. The curves display the distributions of relevance on similar subsets of the experimental datasets.
Figure 2. Comparison of the distributions of relevance on significantly different datasets. R_dSCP, R_SCP, and log R_PV denote the relevance computed by conditional probability, symmetrical conditional probability, and P value, respectively. SwissProt, Uniref50, and TrEMBL are the significantly different datasets. The curves display the distributions of relevance on the experimental datasets.
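The subset construction described in the Figure 1 caption (nine of ten equal-size partitions at a time) can be sketched as follows. The caption does not say how proteins are assigned to partitions, so the random shuffle, the `seed` parameter, the dropping of any remainder, and the function name are assumptions made for illustration.

```python
import random


def nine_of_ten_subsets(items, seed=0):
    """Split items into ten equal-size partitions and return ten subsets,
    each made of nine partitions (subset i leaves out partition i)."""
    items = list(items)
    random.Random(seed).shuffle(items)  # assumed: random assignment
    size = len(items) // 10             # assumed: any remainder is dropped
    parts = [items[k * size:(k + 1) * size] for k in range(10)]
    return [
        [x for k, part in enumerate(parts) if k != i for x in part]
        for i in range(10)
    ]


subsets = nine_of_ten_subsets(range(100))
```

Any two subsets built this way share eight of their nine partitions, which is why they serve as "similar" datasets in the comparison.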
Compare the impact of R_SCP on protein function prediction.
| Method | Metric | Uniref50 MF | Uniref50 BP | Uniref50 CC | SwissProt MF | SwissProt BP | SwissProt CC | TrEMBL MF | TrEMBL BP | TrEMBL CC |
|---|---|---|---|---|---|---|---|---|---|---|
| Predpfam2go | Precision | 0.5568 | 0.6094 | 0.5978 | 0.4861 | 0.532 | 0.5557 | 0.3856 | 0.3482 | 0.3954 |
| | Recall | 0.441 | 0.2888 | 0.1747 | | 0.4496 | 0.2255 | 0.6176 | 0.6027 | 0.2255 |
| | F-measure | 0.4922 | 0.3918 | 0.2704 | 0.5721 | 0.4873 | 0.3208 | 0.4748 | 0.4414 | 0.2872 |
| Predweighted | Precision | 0.2979 | 0.2502 | 0.1944 | 0.3514 | 0.2609 | 0.2611 | 0.3472 | 0.2179 | 0.2033 |
| | Recall | | | | 0.5946 | | | | | |
| | F-measure | 0.4312 | 0.3681 | 0.3183 | 0.4417 | 0.3727 | 0.3887 | 0.4827 | 0.3325 | 0.3259 |
| Predcombine | Precision | | | | | | | | | |
| | Recall | 0.6971 | 0.5823 | 0.7655 | 0.56 | 0.4093 | 0.5984 | 0.7662 | 0.6371 | 0.7309 |
| | F-measure | | | | | | | | | |
The best results are in bold.
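The row that follows Recall for each method is consistent with the F-measure, the harmonic mean of precision and recall. A minimal Python check, using the Predpfam2go values on Uniref50 copied from the table above (the function name is illustrative):

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Predpfam2go on Uniref50, in the order MF, BP, CC.
precision = [0.5568, 0.6094, 0.5978]
recall = [0.441, 0.2888, 0.1747]
reported = [0.4922, 0.3918, 0.2704]

computed = [f_measure(p, r) for p, r in zip(precision, recall)]
# Inputs are already rounded to four decimals, so allow a small tolerance.
matches = [abs(c - r) < 5e-4 for c, r in zip(computed, reported)]
```

The computed values agree with the reported ones to within rounding of the four-decimal inputs.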
Compare the impact of RSC on protein function prediction.
| Method | Metric | Uniref50 MF | Uniref50 BP | Uniref50 CC | SwissProt MF | SwissProt BP | SwissProt CC | TrEMBL MF | TrEMBL BP | TrEMBL CC |
|---|---|---|---|---|---|---|---|---|---|---|
| RPE | Precision | 0.2709 | 0.1582 | 0.184 | 0.328 | 0.2334 | 0.2866 | 0.2803 | 0.1664 | 0.2131 |
| | Recall | | | | | | | | | |
| | F-measure | 0.4076 | 0.2575 | 0.3044 | 0.4424 | 0.3195 | 0.4184 | 0.4224 | 0.2709 | 0.3443 |
| RSC | Precision | | | | | | | | | |
| | Recall | 0.7876 | 0.6856 | 0.8163 | 0.5953 | 0.4294 | 0.6083 | 0.8229 | 0.6985 | 0.7716 |
| | F-measure | | | | | | | | | |
The best results are in bold.
Compare the performances of the concerned methods.
| Method | Metric | Uniref50 MF | Uniref50 BP | Uniref50 CC | SwissProt MF | SwissProt BP | SwissProt CC | TrEMBL MF | TrEMBL BP | TrEMBL CC | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| NB | Precision | 0.7778 | 0.7339 | 0.7421 | 0.8362 | 0.8121 | | | | | 0.8201 |
| | Recall | 0.0428 | 0.0319 | 0.0244 | 0.5012 | 0.4212 | 0.3718 | 0.5086 | 0.3721 | 0.4819 | 0.3062 |
| | F-measure | 0.0812 | 0.0612 | 0.0473 | 0.6267 | 0.5547 | 0.5156 | 0.6493 | 0.5172 | 0.6259 | 0.4088 |
| DRDO | Precision | 0.7716 | 0.7151 | 0.7109 | 0.8232 | 0.8004 | 0.8312 | 0.8644 | 0.8073 | 0.8623 | 0.7985 |
| | Recall | 0.1777 | 0.1385 | 0.1115 | 0.5868 | 0.5023 | 0.4437 | 0.5517 | 0.429 | 0.5422 | 0.387 |
| | F-measure | 0.2888 | 0.2321 | 0.1928 | 0.6852 | 0.6173 | 0.5786 | 0.6735 | 0.5603 | 0.6657 | 0.4994 |
| DRDO-NB | Precision | 0.8375 | 0.6906 | 0.7439 | 0.7379 | 0.7186 | 0.6766 | 0.8426 | 0.8471 | 0.7512 | 0.7607 |
| | Recall | 0.2094 | 0.232 | 0.2695 | 0.2394 | 0.2272 | 0.2633 | 0.157 | 0.1502 | 0.1452 | 0.2104 |
| | F-measure | 0.335 | 0.3474 | 0.3956 | 0.3615 | 0.3452 | 0.379 | 0.2647 | 0.2551 | 0.2434 | 0.3252 |
| dcGO | Precision | 0.4342 | 0.3751 | 0.3014 | 0.558 | 0.5253 | 0.4375 | 0.3801 | 0.3473 | 0.3494 | 0.412 |
| | Recall | 0.6127 | 0.503 | 0.6127 | | | 0.5904 | 0.6692 | 0.5137 | 0.6509 | 0.5764 |
| | F-measure | 0.5083 | 0.4297 | 0.4041 | 0.5805 | 0.4731 | 0.5026 | 0.4848 | 0.4144 | 0.4547 | 0.4725 |
| SeekFun | Precision | | | | | | 0.7751 | 0.8064 | 0.8071 | 0.8163 | |
| | Recall | | | | 0.5953 | 0.4294 | | | | | |
| | F-measure | | | | | | | | | | |
The best results are in bold.