Suyu Mei.
Abstract
Recent years have witnessed much progress in computational modelling of protein subcellular localization. However, existing sequence-based predictive models show moderate or unsatisfactory performance, and gene ontology (GO) based models risk overestimating performance for novel proteins. Furthermore, many human proteins have multiple subcellular locations, which complicates the computational modelling. To date, few studies have focused on predicting the subcellular localization of human proteins that may reside in multiple cellular compartments. In this paper, we propose a multi-label multi-kernel transfer learning model for human protein subcellular localization (MLMK-TLM). MLMK-TLM introduces a multi-label confusion matrix, formally defines three multi-labelling performance measures, and adapts one-against-all multi-class probabilistic outputs to the multi-label learning scenario. On this basis, it extends our published work GO-TLM (gene ontology based transfer learning model for protein subcellular localization) and MK-TLM (multi-kernel transfer learning based on Chou's PseAAC formulation for protein submitochondria localization) to multiplex human protein subcellular localization. With proper homolog knowledge transfer, a comprehensive survey of model performance for novel proteins, and multi-labelling capability, MLMK-TLM should gain more practical applicability. Experiments on a human protein benchmark dataset show that MLMK-TLM significantly outperforms the baseline model and demonstrates good multi-labelling ability for novel human proteins. Some findings (predictions) are validated by the latest Swiss-Prot database. The software can be freely downloaded at http://soft.synu.edu.cn/upload/msy.rar.
Year: 2012 PMID: 22719847 PMCID: PMC3374840 DOI: 10.1371/journal.pone.0037716
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
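The abstract mentions adapting one-against-all probabilistic outputs to the multi-label scenario. A minimal sketch of one common way to do this is given here: keep every label whose probability reaches a cutoff, and always keep the top-scoring label. The function name `assign_labels`, the cutoff value 0.1, and the example probabilities are assumptions for illustration only; the record does not state the rule or threshold the author actually used.

```python
from typing import Dict, List


def assign_labels(probs: Dict[str, float], threshold: float = 0.1) -> List[str]:
    """Turn per-label probabilities into a multi-label prediction.

    Every label whose probability reaches `threshold` is kept; the
    top-scoring label is always kept so that no protein is left unlabelled.
    The threshold value is an assumption, not taken from the paper.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    top_label = ranked[0][0]
    return [label for label, p in ranked if p >= threshold or label == top_label]


# Hypothetical one-against-all probabilities for a single protein.
example = {
    "Nucleus": 0.62,
    "Cytoplasm": 0.25,
    "Cytoskeleton": 0.08,
    "Golgi apparatus": 0.05,
}
print(assign_labels(example))  # ['Nucleus', 'Cytoplasm']
```

Lowering the cutoff yields more labels per protein at the cost of more non-target hits, so in practice it would need to be tuned on held-out data.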
Figure 1. Illustration of divergent homolog in terms of subcellular localization.
Optimal performance on 3681 human locative protein dataset.
| 77 | 0.9063 | 0.7532 | 0.8229 | 0.8772 | 0.6494 | 0.7504 | 0.8235 | 0.7273 | 0.7694 |
| 817 | 0.7845 | 0.8556 | 0.7704 | 0.8061 | 0.8042 | 0.7552 | 0.7380 | 0.8446 | 0.7322 |
| 79 | 0.9123 | 0.6582 | 0.7710 | 0.7910 | 0.6709 | 0.7231 | 0.8333 | 0.6329 | 0.7212 |
| 24 | 0.9167 | | 0.6467 | 0.8000 | | 0.6306 | 0.9167 | | 0.6467 |
| 229 | 0.9302 | 0.8734 | 0.8951 | 0.9151 | 0.8472 | 0.8730 | 0.8818 | 0.8472 | 0.8557 |
| 385 | 0.9525 | 0.8857 | 0.9096 | 0.9284 | 0.8753 | 0.8906 | 0.9413 | 0.8753 | 0.8977 |
| 161 | 0.9214 | 0.8012 | 0.8534 | 0.9161 | 0.8137 | 0.8576 | 0.8705 | 0.7516 | 0.8010 |
| 77 | 1.0000 | 0.7143 | 0.8426 | 0.9828 | 0.7403 | 0.8504 | 0.9825 | 0.7273 | 0.8426 |
| 24 | 0.9500 | 0.7917 | 0.8665 | 0.8947 | 0.7083 | 0.7949 | 0.8947 | 0.7083 | 0.7949 |
| 364 | 0.9620 | 0.9038 | 0.9255 | 0.9477 | 0.8956 | 0.9131 | 0.9339 | 0.8544 | 0.8825 |
| 1021 | 0.8479 | 0.9334 | 0.8486 | 0.8156 | 0.9314 | 0.8238 | 0.8287 | 0.8815 | 0.8028 |
| 47 | 0.9750 | 0.8298 | 0.8983 | 0.9318 | 0.8723 | 0.9004 | 0.9762 | 0.8723 | 0.9219 |
| 354 | 0.8746 | 0.8672 | 0.8576 | 0.8169 | 0.8446 | 0.8130 | 0.8680 | 0.8362 | 0.8370 |
| 22 | 1.0000 | | 0.7375 | 1.0000 | | 0.7375 | 1.0000 | | 0.7060 |
| 87.04%/0.8606 | 85.22%/0.8411 | 83.97%/0.8277 |
Figure 2. Performance on 3681 human protein dataset with varying homologs.
Figure 3. Kernel weight estimation on 3681 human locative protein dataset.
Multi-labelling evaluation for optimistic case.
| 480 | 3.33% |
| 0 | 0 | 42.92% |
| 43 | 2.33% |
| 0 | 13.95% |
| 0 |
| 3 | 0 |
| 33.33% | 0 | 0 |
| 0 |
Multi-labelling evaluation for pessimistic case.
| 480 | 3.75% |
| 0 | 0 | 34.58% |
| 2.50% |
| 43 | 0 |
| 0 | 9.30% |
| 0 |
| 3 | 0 |
| 66.67% |
| 0 | 0 |
| 0 |
Multi-labelling evaluation—perfect label match.
|
| |
|
| Q9UHB9;Q9UBQ5;Q9BT78;Q6SJ96;Q86W56;Q14145;Q9GZM5;Q9ULJ6;Q96Q15;Q15025;P06734;Q9NTJ3;O95391;Q92973;Q9UNS2;Q53HL2;Q9BZZ5;O43592;Q13352;Q9UHD9;Q9UHB6;Q9UQ80;P61221;Q9C0E2;P61011;P35520;O43324;Q9UI26;Q14738;P04792;Q9NYL5;Q7L7V1;Q86VP3;Q15392;Q7Z699;Q07954;Q13421;P11532;Q15004;Q12794;Q9P1T7;Q96QU8;Q9GZU1;Q9UKA2;O95070;P00488;O95153;Q12830;P16989;Q9Y6A2;Q92934;O60869;Q8N2I9;Q8N488;P38936;P16442;Q14493;Q9Y282;Q9HAP2;Q92551;Q9Y3C4;Q9NS86;Q96DU7;Q14978;P07954;Q9NXE4;P37840;Q9NRA8;P15907;Q08211;O76054;Q02880;O95997;Q8TEM1;P24071;Q969M3;Q9Y251;O95406;Q8NFM7;Q8WXI7;P14060;Q92820;P30519;Q96S99;Q7Z417;O95163;P23490;Q9BUP3;Q96Q89;Q96FX2;Q9Y6K9;O43157;O75533;Q49AN0;O75312;P56693;O60921;O95456;Q9Y6Q9;Q9BZG8;Q8N668;Q86X55;P35052;Q86UB2;Q99436;P29728;Q96PL5;Q9NY26;O14492;O15360;O75419;Q15020;P31512;O95273;Q8TEQ6;Q96PM5;Q9UNY4;O14980;Q9BV57;Q92681;Q6AZY7;O95479;Q9NZ42;O43174;Q93084;P20309;Q71F23;Q08J23;Q99519;Q86UW9;Q49MI3;P52630;P62826;Q00597;P61457;O00628;P60900;Q9H8E8;Q96C86;Q9BZJ0;Q9H8T0;Q9UKT4;Q12934;Q06787;Q13485;Q8IVL5;Q6RW13;Q8IWL8;Q13363;Q9Y314;P55060;Q9GZY1;O95453;Q07955;Q8WXG1;Q9C000;Q6IMN6;Q13765;O94829;Q9BRS8;O75365;P13987;O15354;Q9NSB8;Q14512;Q8TC92;Q96BI3;Q9NPH3;Q14703;P78329;Q04656;P58340;Q15276;O75884;Q7L5N1;Q969Q6;Q96KS0;Q86UA6;Q9NQW1;Q13098;Q7Z4G1;Q9BWU0;Q9BWS9;P21580;Q9UN88;Q13868;Q9NPD3;O14929;O15304;P61960;Q15650;Q13231;Q9NXR7;Q86YF9;P08684;Q9NWZ8;Q9BPZ7;P50748;Q03933;Q9NYF8;Q9Y397;Q96FF9;Q14534;P50542;Q00978;Q9H9T3;Q11203;O75881;Q9Y6K0;Q9P212;P43681;O43493;Q9UM00; |
|
| Q9UHB9;Q9UBQ5;Q9BT78;Q6SJ96;Q86W56;Q9GZM5;Q9ULJ6;O43663;Q96Q15;Q15025;Q9NTJ3;O95391;Q92973;Q9UNS2;Q53HL2;Q8IZY2;Q9BZZ5;O43592;Q13352;Q9UHB6;Q9UQ80;Q9C0E2;P22059;P61011;P35520;O43324;Q9UI26;Q14738;P04792;Q9NYL5;Q7L7V1;Q86VP3;Q15392;Q9UHK6;Q7Z699;P08962;Q15004;Q12794;Q9P1T7;Q96QU8;Q9GZU1;Q9UKA2;O95070;P00488;O95153;Q12830;P16989;Q13316;Q9Y6A2;Q92934;O60869;Q8N2I9;Q8N488;P38936;P16442;Q14493;Q9Y282;Q9HAP2;Q92551;Q9Y3C4;Q9NS86;Q96DU7;Q14978;P07954;Q9NXE4;Q9Y5Z9;P37840;Q9NRA8;P15907;Q08211;O76054;Q02880;O95997;Q8TEM1;Q969M3;Q9Y251;O95406;Q8WXI7;P14060;Q92820;P30519;Q96S99;Q7Z417;O95163;Q9BUP3;Q96Q89;Q96FX2;Q9Y6K9;O75533;Q49AN0;O75312;P56693;O60921;O95456;Q9Y6Q9;Q9BZG8;Q8N668;Q86X55;P35052;Q86UB2;Q99436;Q96PL5;O15360;O75419;Q15020;O95273;Q8TEQ6;Q96PM5;Q9UNY4;O14980;Q9BV57;Q92681;O95479;Q9NZ42;O43174;Q8ND25;P20309;Q71F23;Q08J23;Q99519;Q86UW9;P52630;P62826;Q00597;P61457;O00628;Q9H8E8;Q96C86;Q9BZJ0;Q9UKT4;Q12934;Q06787;Q13485;Q6RW13;Q8IWL8;Q13363;Q9Y314;P55060;Q9GZY1;O95453;Q07955;Q9C000;Q13765;O94829;Q9BRS8;O15354;Q9NSB8;Q14512;Q8TC92;Q96BI3;Q9NPH3;Q14703;P78329;P58340;Q15276;O75884;Q7L5N1;Q969Q6;Q96KS0;Q86UA6;Q92624;Q13098;Q7Z4G1;Q9BWU0;Q9BWS9;P21580;Q13868;Q9NPD3;O14929;P61960;Q15650;Q13231;Q9NXR7;Q86YF9;P08684;Q03135;Q9NWZ8;Q9BPZ7;P50748;Q03933;Q9NYF8;Q9Y397;Q96FF9;Q14534;P50542;Q00978;Q11203;O75881;Q9Y6K0;Q9P212; |
|
| Q9UBQ5;Q9BT78;Q6SJ96;Q86W56;Q9ULJ6;O43663;Q96Q15;Q15025;P06734;Q9NTJ3;O95391;Q92973;Q9UNS2;Q53HL2;Q9BZZ5;O43592;Q13352;Q9UHB6;Q9UQ80;Q9C0E2;P61011;P35520;O43324;Q9UI26;Q14738;P04792;Q9NYL5;Q7L7V1;Q15392;Q9UHK6;Q7Z699;P11532;Q15004;Q12794;Q9P1T7;Q96QU8;Q9UKA2;A5X5Y0;P00488;O95153;Q12830;P16989;Q9Y6A2;Q92934;O60869;Q8N488;P38936;P16442;Q14493;Q9HAP2;Q92551;Q9Y3C4;Q9NS86;Q96DU7;Q14978;P07954;Q9NXE4;Q9Y5Z9;P37840;Q9NRA8;P15907;Q08211;O76054;Q02880;O95997;Q9UKL3;P24071;Q969M3;Q9Y251;O95406;Q92820;P30519;Q96S99;Q7Z417;O95163;Q9BUP3;Q96Q89;Q96FX2;O75533;Q49AN0;P56693;O60921;O95456;Q9Y6Q9;Q9BZG8;Q86X55;P35052;P58335;Q86UB2;Q99436;Q96PL5;O15360;O75419;Q15020;O95273;Q8TEQ6;Q96PM5;Q9UNY4;O14980;Q9BV57;Q92681;Q9NZ42;O43174;P20309;Q71F23;Q08J23;Q86UW9;P52630;P62826;Q00597;P61457;Q9H8E8;Q96C86;Q9BZJ0;Q08378;Q9UKT4;Q06787;Q13485;Q8IWL8;Q13363;Q9Y314;P55060;O95453;Q07955;Q9C000;Q6IMN6;Q13765;O94829;Q9BRS8;Q9UKT7;P49721;Q14512;Q8TC92;Q96BI3;Q9NPH3;Q14703;P78329;Q04656;P58340;O75884;Q7L5N1;Q969Q6;Q96KS0;Q86UA6;Q92624;Q13439;Q13098;Q7Z4G1;Q9BWU0;Q9BWS9;P21580;Q13868;Q9NPD3;O14929;P61960;Q13231;Q9NXR7;P08684;Q9NWZ8;P50748;Q03933;Q9NYF8;Q9Y397;Q96FF9;Q14534;Q00978;Q53QV2;Q9H9T3;Q11203;O75881; |
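The lists above name the proteins whose predicted labels match their true labels perfectly. One plausible reading is that a protein is a perfect match when its predicted label set equals its true label set exactly. A minimal sketch of that check follows; the protein IDs, label sets, and function names (`perfect_matches`, `format_ids`) are hypothetical and are not taken from the dataset or from the author's software.

```python
from typing import Dict, List, Set


def perfect_matches(
    true_labels: Dict[str, Set[str]],
    predicted: Dict[str, Set[str]],
) -> List[str]:
    """Return IDs whose predicted label set equals the true label set."""
    return [
        pid
        for pid, labels in true_labels.items()
        if predicted.get(pid, set()) == labels
    ]


def format_ids(ids: List[str]) -> str:
    """Render IDs as a semicolon-terminated list, as in the table above."""
    return "".join(f"{pid};" for pid in ids)


# Hypothetical multi-label annotations and predictions.
true_labels = {
    "PROT_A": {"Nucleus", "Cytoplasm"},
    "PROT_B": {"Mitochondrion"},
    "PROT_C": {"Nucleus"},
}
predicted = {
    "PROT_A": {"Cytoplasm", "Nucleus"},      # same set, order irrelevant
    "PROT_B": {"Mitochondrion", "Nucleus"},  # one extra label, not perfect
    "PROT_C": {"Nucleus"},
}

matched = perfect_matches(true_labels, predicted)
print(format_ids(matched))  # PROT_A;PROT_C;
```

Comparing sets rather than lists makes the check independent of label order, which matters when predictions are ranked by probability.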
Multi-labelling evaluation—non-perfect label match.
Illustrations:
[1] True Subcellular Locations: denotes the labels from the Hum-mPLoc 2.0 dataset (GOA database version 70.0, released March 10, 2008);
[2] http://www.uniprot.org/uniprot/ (UniProt release 2011_11, Nov 16, 2011), where [0.136] denotes the probability that the protein is assigned to the label Cytoskeleton;
[3] Nucleus[0.115]: Non-target Label Hit (wrong prediction), NOT validated by the latest Swiss-Prot database (http://www.uniprot.org/uniprot/, UniProt release 2011_11, Nov 16, 2011), where [0.115] denotes the probability that the protein is assigned to the label Nucleus.
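The notes above use the notation `Label[probability]` and distinguish hits on target labels from non-target label hits. A minimal sketch of how such entries could be parsed and sorted into the two groups is shown here. The regular expression, the function names, and the example true-label set are assumptions for illustration; checking a non-target hit against a newer Swiss-Prot release, as the notes describe, is outside the scope of this sketch.

```python
import re
from typing import List, Set, Tuple

# Matches entries such as "Nucleus[0.115]" (label text, then probability).
ENTRY = re.compile(r"^\s*(?P<label>[^\[\]]+?)\s*\[(?P<prob>[0-9]*\.?[0-9]+)\]\s*$")


def parse_prediction(text: str) -> Tuple[str, float]:
    """Split 'Label[0.123]' into ('Label', 0.123)."""
    match = ENTRY.match(text)
    if match is None:
        raise ValueError(f"not in Label[probability] form: {text!r}")
    return match.group("label"), float(match.group("prob"))


def classify_hits(
    true_labels: Set[str], predictions: List[str]
) -> List[Tuple[str, float, str]]:
    """Tag each predicted label as a 'target' or 'non-target' hit."""
    result = []
    for item in predictions:
        label, prob = parse_prediction(item)
        kind = "target" if label in true_labels else "non-target"
        result.append((label, prob, kind))
    return result


# Hypothetical protein annotated only with Cytoplasm in the benchmark.
hits = classify_hits({"Cytoplasm"}, ["Cytoplasm[0.750]", "Nucleus[0.115]"])
for label, prob, kind in hits:
    print(f"{label}\t{prob:.3f}\t{kind}")
```

A non-target hit is not necessarily an error: as the notes indicate, some are confirmed by later database releases, so a separate validation step would be needed before counting it as wrong.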
Multi-labelling evaluation for moderate case.
| 480 | 4.37% |
| 0 | 0 | 38.75% |
| 0.63% |
| 43 | 0 |
| 0 | 9.30% |
| 0 |
| 3 | 0 |
| 0 |
| 0 | 0 |