Jian-Sheng Wu, Hai-Feng Hu, Shan-Cheng Yan, Li-Hua Tang.
Abstract
Nature often brings several domains together to form multidomain and multifunctional proteins with a vast number of possibilities. In our previous study, we showed that protein function prediction is naturally and inherently a Multi-Instance Multi-Label (MIML) learning task. Automated protein function prediction is typically implemented under the assumption that the functions of labeled proteins are complete, that is, that there are no missing labels. In practice, however, only a subset of a protein's functions is known, and whether the protein has other functions is unknown. Protein function prediction therefore suffers from the weak-label problem, and prediction with incomplete annotation fits naturally into the MIML with weak-label learning framework. In this paper, we apply the state-of-the-art MIML with weak-label learning algorithm, MIMLwel, to predict protein functions in two typical real-world electricigens that have been widely used in microbial fuel cell (MFC) research. Our experimental results validate the effectiveness of the MIMLwel algorithm in predicting protein functions with incomplete annotation.
Year: 2015 PMID: 26075251 PMCID: PMC4436452 DOI: 10.1155/2015/619438
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
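The MIML view of a protein described in the abstract can be sketched as a small data structure. The protein is a bag whose instances are its domains, and its label set is only partially known. This is a minimal illustration under stated assumptions. `ProteinBag`, its fields, and the feature values are hypothetical and are not the authors' data format.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ProteinBag:
    """A protein viewed as one MIML example (hypothetical layout, not the paper's code).

    instances:    one feature vector per domain, i.e. the instances of the bag
    known_labels: indices of the functions (e.g. GO terms) annotated so far
    """
    protein_id: str
    instances: List[List[float]]
    known_labels: Set[int] = field(default_factory=set)

    def label_vector(self, n_classes: int) -> List[int]:
        # Weak-label convention: 1 = known relevant function,
        # 0 = not annotated, which means "unknown", not "irrelevant".
        return [1 if j in self.known_labels else 0 for j in range(n_classes)]


# Hypothetical protein with two domains and two annotated functions.
p = ProteinBag("P1", instances=[[0.2, 0.7, 0.1], [0.9, 0.0, 0.3]], known_labels={1, 3})
print(len(p.instances), p.label_vector(5))  # 2 [0, 1, 0, 1, 0]
```

The resulting vector matches row P1 of the toy matrix (a) at the end of this page. Under the weak-label assumption, each 0 only means "not annotated" and should not be read as a confirmed negative.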
Characteristics of the datasets.
| Organism | Examples | Classes | Instances per bag | Labels per example |
|---|---|---|---|---|
| Geobacter sulfurreducens | 379 | 320 | 3.20 ± 1.21 | 3.14 ± 3.33 |
| Shewanella loihica PV-4 | 373 | 344 | 3.14 ± 1.19 | 3.55 ± 5.00 |
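The two summary columns of this table, instances per bag and labels per example, are reported as mean ± standard deviation over all examples. The following sketch shows how such statistics could be computed. It assumes that each protein is given as a list of instance vectors plus a set of label indices. The function names are hypothetical, and counting classes as the distinct labels that occur and using the population standard deviation are both assumptions.

```python
import statistics
from typing import List, Sequence, Set, Tuple


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    # Population standard deviation (assumption: the paper does not state the variant).
    return statistics.fmean(values), statistics.pstdev(values)


def dataset_summary(bags: List[list], labels: List[Set[int]]):
    """Return (examples, classes, instances-per-bag stats, labels-per-example stats).

    bags[i]:   list of instance vectors (domains) of protein i
    labels[i]: set of function indices annotated for protein i
    """
    n_classes = len(set().union(*labels))  # distinct labels seen in the data
    return (
        len(bags),
        n_classes,
        mean_std([len(b) for b in bags]),
        mean_std([len(y) for y in labels]),
    )


# Toy data: three proteins with 1, 2 and 3 domains (feature vectors reduced to [0.0]).
bags = [[[0.0]], [[0.0], [0.0]], [[0.0], [0.0], [0.0]]]
labels = [{0}, {0, 1}, {2}]
n, q, (mi, si), (ml, sl) = dataset_summary(bags, labels)
print(n, q, f"{mi:.2f} ± {si:.2f}", f"{ml:.2f} ± {sl:.2f}")  # 3 3 2.00 ± 0.82 1.33 ± 0.47
```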
Performance of MIMLwel with different weak-label ratios (W.L.R.) on the two datasets (HL: Hamming loss; maF1: macro-F1; miF1: micro-F1; ↓ lower is better, ↑ higher is better).
| Datasets | W.L.R. | HL↓ | maF1↑ | miF1↑ |
|---|---|---|---|---|
| Geobacter sulfurreducens | 20% | 0.010 ± 0.002 | 0.003 ± 0.004 | 0.032 ± 0.035 |
| | 40% | 0.010 ± 0.002 | 0.009 ± 0.005 | 0.116 ± 0.038 |
| | 60% | 0.010 ± 0.002 | 0.016 ± 0.006 | 0.201 ± 0.034 |
| | 80% | 0.011 ± 0.001 | | |
| Shewanella loihica PV-4 | 20% | 0.013 ± 0.002 | 0.009 ± 0.008 | 0.145 ± 0.111 |
| | 40% | 0.010 ± 0.002 | 0.005 ± 0.003 | 0.092 ± 0.039 |
| | 60% | 0.011 ± 0.003 | 0.010 ± 0.006 | 0.167 ± 0.072 |
| | 80% | 0.011 ± 0.003 | | |
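HL, maF1 and miF1 are standard multi-label measures. The sketch below computes them from a 0/1 ground-truth matrix `Y` and a 0/1 prediction matrix `P`, where rows are proteins and columns are functions. It is an illustrative implementation, not the evaluation code used in the paper. In particular, scoring a label with neither true nor predicted positives as F1 = 0 is an assumption, and implementations differ on this point.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows = proteins, columns = functions; entries are 0 or 1


def hamming_loss(Y: Matrix, P: Matrix) -> float:
    """Fraction of (protein, function) pairs that are predicted wrongly."""
    n, q = len(Y), len(Y[0])
    wrong = sum(y != p for y_row, p_row in zip(Y, P) for y, p in zip(y_row, p_row))
    return wrong / (n * q)


def _counts(Y: Matrix, P: Matrix, j: int) -> Tuple[int, int, int]:
    """True positives, false positives and false negatives for function j."""
    tp = sum(1 for y, p in zip(Y, P) if y[j] == 1 and p[j] == 1)
    fp = sum(1 for y, p in zip(Y, P) if y[j] == 0 and p[j] == 1)
    fn = sum(1 for y, p in zip(Y, P) if y[j] == 1 and p[j] == 0)
    return tp, fp, fn


def _f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # assumption: an empty label scores 0


def macro_f1(Y: Matrix, P: Matrix) -> float:
    """Average of the per-function F1 scores."""
    q = len(Y[0])
    return sum(_f1(*_counts(Y, P, j)) for j in range(q)) / q


def micro_f1(Y: Matrix, P: Matrix) -> float:
    """F1 computed from counts pooled over all functions."""
    totals = [0, 0, 0]
    for j in range(len(Y[0])):
        for k, c in enumerate(_counts(Y, P, j)):
            totals[k] += c
    return _f1(*totals)


Y = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]  # toy ground truth
P = [[1, 0, 0], [0, 1, 1], [1, 0, 0]]  # toy predictions
print(round(hamming_loss(Y, P), 3), round(macro_f1(Y, P), 3), round(micro_f1(Y, P), 3))  # 0.333 0.556 0.667
```

Macro-F1 weights every function equally, so many rare functions with F1 = 0 pull it well below micro-F1, which pools all decisions before computing F1.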
Figure 1: Performance of MIMLwel on both datasets with an 80% weak-label ratio (W.L.R.), for different values of the scaling factor μ with the fraction parameter α fixed at 0.1, and for different values of α with μ fixed at 1.0. In most cases, MIMLwel performs best with μ = 1.0 and α = 0.1.
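The caption describes a one-at-a-time sensitivity protocol. First μ is varied while α stays at 0.1, and then α is varied while μ stays at 1.0. The sketch below captures only that protocol. `toy_score` is a hypothetical stand-in for training MIMLwel and measuring a metric, the candidate grids are made up, and the sketch does not model the role of μ and α inside MIMLwel.

```python
from typing import Callable, Iterable, List, Tuple

Curve = List[Tuple[float, float]]  # (parameter value, score)


def one_at_a_time_sweep(
    evaluate: Callable[[float, float], float],
    mu_values: Iterable[float],
    alpha_values: Iterable[float],
    mu_fixed: float = 1.0,
    alpha_fixed: float = 0.1,
) -> Tuple[Curve, Curve]:
    """Vary mu with alpha held fixed, then vary alpha with mu held fixed."""
    mu_curve = [(mu, evaluate(mu, alpha_fixed)) for mu in mu_values]
    alpha_curve = [(a, evaluate(mu_fixed, a)) for a in alpha_values]
    return mu_curve, alpha_curve


def best(curve: Curve) -> float:
    # Parameter value with the highest score (ties resolve to the first occurrence).
    return max(curve, key=lambda t: t[1])[0]


# Hypothetical stand-in for "train MIMLwel and measure a metric"; it peaks at (1.0, 0.1).
def toy_score(mu: float, alpha: float) -> float:
    return -abs(mu - 1.0) - abs(alpha - 0.1)


mu_curve, alpha_curve = one_at_a_time_sweep(toy_score, [0.1, 0.5, 1.0, 2.0], [0.05, 0.1, 0.2, 0.5])
print(best(mu_curve), best(alpha_curve))  # 1.0 0.1
```

A one-at-a-time sweep is cheaper than a full grid search, but it can miss interactions between the two parameters.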
Comparison (mean ± std.) of MIMLwel with four state-of-the-art MIML methods under different weak-label ratios on the Geobacter sulfurreducens dataset.
| W.L.R. | Methods | HL↓ | maF1↑ | miF1↑ |
|---|---|---|---|---|
| 20% | MIMLwel | | 0.003 ± 0.004 | |
| | MIMLNN | | 0.000 ± 0.000 | 0.000 ± 0.000 ● |
| | MIMLRBF | | 0.002 ± 0.003 | 0.002 ± 0.003 ● |
| | MIMLSVM | 0.012 ± 0.002 | | 0.005 ± 0.003 ● |
| | EnMIMLNN_metric | | 0.002 ± 0.002 | 0.001 ± 0.002 ● |
| 40% | MIMLwel | | | |
| | MIMLNN | | 0.000 ± 0.000 | 0.000 ± 0.000 ● |
| | MIMLRBF | | 0.004 ± 0.004 | 0.003 ± 0.003 ● |
| | MIMLSVM | 0.012 ± 0.001 | 0.006 ± 0.003 | 0.006 ± 0.003 ● |
| | EnMIMLNN_metric | | 0.003 ± 0.004 | 0.003 ± 0.003 ● |
| 60% | MIMLwel | 0.010 ± 0.002 | | |
| | MIMLNN | 0.010 ± 0.001 | 0.001 ± 0.001 | 0.001 ± 0.001 ● |
| | MIMLRBF | | 0.009 ± 0.007 | 0.008 ± 0.007 ● |
| | MIMLSVM | 0.011 ± 0.001 | 0.008 ± 0.003 | 0.008 ± 0.003 ● |
| | EnMIMLNN_metric | 0.010 ± 0.001 | 0.009 ± 0.004 | 0.008 ± 0.004 ● |
| 80% | MIMLwel | 0.011 ± 0.001 | | |
| | MIMLNN | 0.010 ± 0.001 | 0.002 ± 0.001 ● | 0.002 ± 0.001 ● |
| | MIMLRBF | | 0.009 ± 0.004 ● | 0.008 ± 0.004 ● |
| | MIMLSVM | 0.011 ± 0.001 | 0.008 ± 0.002 ● | 0.008 ± 0.002 ● |
| | EnMIMLNN_metric | | 0.013 ± 0.004 | 0.012 ± 0.004 ● |
Comparison (mean ± std.) of MIMLwel with four state-of-the-art MIML methods under different weak-label ratios on the Shewanella loihica PV-4 dataset.
| W.L.R. | Methods | HL↓ | maF1↑ | miF1↑ |
|---|---|---|---|---|
| 20% | MIMLwel | 0.013 ± 0.002 | | |
| | MIMLNN | | 0.000 ± 0.000 | 0.000 ± 0.000 ● |
| | MIMLRBF | 0.011 ± 0.003 | 0.001 ± 0.001 | 0.001 ± 0.001 ● |
| | MIMLSVM | 0.012 ± 0.002 | 0.005 ± 0.002 | 0.004 ± 0.002 ● |
| | EnMIMLNN_metric | | 0.001 ± 0.001 | 0.001 ± 0.001 ● |
| 40% | MIMLwel | | | |
| | MIMLNN | | 0.000 ± 0.000 | 0.000 ± 0.000 ● |
| | MIMLRBF | | 0.001 ± 0.002 | 0.001 ± 0.002 ● |
| | MIMLSVM | 0.012 ± 0.002 | 0.004 ± 0.002 | 0.004 ± 0.002 ● |
| | EnMIMLNN_metric | | 0.001 ± 0.003 | 0.001 ± 0.003 ● |
| 60% | MIMLwel | 0.011 ± 0.003 | | |
| | MIMLNN | | 0.001 ± 0.001 | 0.001 ± 0.001 ● |
| | MIMLRBF | | 0.004 ± 0.004 | 0.003 ± 0.003 ● |
| | MIMLSVM | 0.012 ± 0.003 | 0.005 ± 0.001 | 0.005 ± 0.002 ● |
| | EnMIMLNN_metric | | 0.005 ± 0.003 | 0.004 ± 0.003 ● |
| 80% | MIMLwel | 0.011 ± 0.003 | | |
| | MIMLNN | 0.010 ± 0.003 | 0.002 ± 0.001 | 0.001 ± 0.001 ● |
| | MIMLRBF | | 0.008 ± 0.005 | 0.007 ± 0.005 ● |
| | MIMLSVM | 0.012 ± 0.003 | 0.005 ± 0.002 | 0.005 ± 0.001 ● |
| | EnMIMLNN_metric | 0.010 ± 0.003 | 0.006 ± 0.004 | 0.005 ± 0.003 ● |
Comparison of predicted GO molecular functions on two example proteins.
| Organism/UniProt ID | Molecular function in UniProt | Methods | GO molecular function list |
|---|---|---|---|
| | (1) 4 iron, 4 sulfur cluster binding; … | Ground truth | GO:0008270, GO:0046872, GO:0000287, GO:0051539, GO:0030145, GO:0005506, GO:0004160 |
| | | MIMLwel | GO:0005524, … |
| | | MIMLNN | Null |
| | | MIMLRBF | |
| | | MIMLSVM | GO:0050567 |
| | | EnMIMLNN_metric | |
| | (1) ATP binding; … | Ground truth | GO:0003924, GO:0005524, GO:0004386, GO:0008270, GO:0016887, GO:0046961, GO:0005215, GO:0017111, GO:0004004, GO:0008094, GO:0008565 |
| | | MIMLwel | …, GO:0043565 |
| | | MIMLNN | Null |
| | | MIMLRBF | |
| | | MIMLSVM | |
| | | EnMIMLNN_metric | |
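The table compares each method's predicted GO terms with the ground-truth list for a protein. Set-based precision and recall are one simple way to quantify such a comparison. In the sketch below, the ground truth is the first example's list from the table. `hypothetical_pred` is an invented prediction set used only for illustration and is not any method's actual output. Treating an empty prediction ("Null") as precision 0 is an assumption.

```python
from typing import Iterable, Tuple


def set_precision_recall(predicted: Iterable[str], truth: Iterable[str]) -> Tuple[float, float]:
    """Precision and recall of a predicted GO-term set against a ground-truth set."""
    pred, true = set(predicted), set(truth)
    hits = len(pred & true)
    precision = hits / len(pred) if pred else 0.0  # assumption: "Null" scores 0
    recall = hits / len(true) if true else 0.0
    return precision, recall


truth_ex1 = ["GO:0008270", "GO:0046872", "GO:0000287", "GO:0051539",
             "GO:0030145", "GO:0005506", "GO:0004160"]
hypothetical_pred = ["GO:0046872", "GO:0005524"]  # invented for illustration
p, r = set_precision_recall(hypothetical_pred, truth_ex1)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.14
```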
(a) Original
| | F1 | F2 | F3 | F4 | F5 |
|---|---|---|---|---|---|
| P1 | 0 | 1 | 0 | 1 | 0 |
| P2 | 0 | 0 | 1 | 0 | 1 |
| P3 | 1 | 1 | 0 | 0 | 1 |
| P4 | 0 | 1 | 1 | 0 | 0 |
| P5 | 1 | 0 | 0 | 1 | 0 |
| P6 | 0 | 1 | 0 | 0 | 0 |
(b) Task 1
| | F1 | F2 | F3 | F4 | F5 |
|---|---|---|---|---|---|
| P1 | 0 | ? | 0 | 1 | 0 |
| P2 | 0 | 0 | ? | ? | 1 |
| P3 | 1 | ? | 0 | ? | 1 |
| P4 | ? | 1 | 1 | 0 | 0 |
| P5 | 1 | 0 | ? | ? | 0 |
| P6 | 0 | 1 | ? | 0 | 0 |
(c) Task 2
| | F1 | F2 | F3 | F4 | F5 |
|---|---|---|---|---|---|
| P1 | 0 | ? | 0 | 1 | 0 |
| P2 | 0 | 0 | ? | ? | 1 |
| P3 | 1 | ? | 0 | ? | 1 |
| P4 | ? | 1 | 1 | 0 | 0 |
| P5 | ? | ? | ? | ? | ? |
| P6 | ? | ? | ? | ? | ? |
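Matrices (a) to (c) illustrate two weak-label settings. In Task 1, some entries of every protein's label vector are hidden ("?"). In Task 2, the same entries are hidden and proteins P5 and P6 are also left entirely unlabeled. The sketch below generates matrices of that shape from (a). The uniform random masking controlled by a `rate` parameter is an assumption made for illustration. It is not necessarily how the paper defines or applies its weak-label ratio.

```python
import random
from typing import List, Optional, Sequence, Set

Label = Optional[int]  # 1 or 0 when observed, None when unknown ("?")

ORIGINAL = [  # matrix (a): rows P1..P6, columns F1..F5
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]


def mask_entries(Y: Sequence[Sequence[Label]], rate: float, rng: random.Random) -> List[List[Label]]:
    """Task-1 style: hide each entry independently with probability `rate`."""
    return [[None if rng.random() < rate else y for y in row] for row in Y]


def mask_rows(Y: Sequence[Sequence[Label]], rows: Set[int]) -> List[List[Label]]:
    """Task-2 style: additionally hide every label of the proteins in `rows`."""
    return [[None] * len(row) if i in rows else list(row) for i, row in enumerate(Y)]


def render(Y: Sequence[Sequence[Label]]) -> str:
    # Print "?" for unknown entries, as in matrices (b) and (c).
    return "\n".join(" ".join("?" if v is None else str(v) for v in row) for row in Y)


rng = random.Random(0)
task1 = mask_entries(ORIGINAL, 0.3, rng)
task2 = mask_rows(task1, {4, 5})  # P5 and P6 become fully unlabeled
print(render(task2))
```

Deriving Task 2 from Task 1 keeps rows P1 to P4 identical in both, which is the relationship shown between matrices (b) and (c).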