Anton Kolesov, Dmitry Kamyshenkov, Maria Litovchenko, Elena Smekalova, Alexey Golovizin, Alex Zhavoronkov.
Abstract
Multilabel classification is often hindered by incompletely labeled training datasets: for some items of such a dataset (or even for all of them), some labels may be omitted. In this case, we cannot know whether any item is labeled fully and correctly. A classifier trained directly on an incompletely labeled dataset performs poorly. To overcome this problem, we added an extra step, training set modification, before training the classifier. In this paper, we try two algorithms for training set modification: weighted k-nearest neighbor (WkNN) and soft supervised learning (SoftSL). Both approaches are based on similarity measurements between data vectors. We performed the experiments on AgingPortfolio (a text dataset) and then rechecked them on Yeast (nontext genetic data). We tried SVM and RF classifiers on the original datasets and then on the modified ones. For each dataset, our experiments demonstrated that both classification algorithms performed considerably better when preceded by the training set modification step.
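
As a rough illustration of what a similarity-based "add" step could look like, the sketch below adds a label to an item when enough of its nearest neighbors carry that label. It is not the authors' WkNN procedure: the cosine similarity, the scoring rule (sum of neighbor similarities divided by k), and the names `cosine` and `add_labels_wknn` are assumptions made for this example. Only the parameter names `k` and `T` are taken from the table captions of this record.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def add_labels_wknn(X, Y, k=3, T=0.5):
    """Return a copy of the label sets Y with labels added by a weighted k-NN vote.

    X: list of feature vectors. Y: list of label sets (possibly incomplete).
    Assumed rule: for item i, each of its k most similar items votes for its own
    labels with a weight equal to the similarity; a label is added when the
    summed weight divided by k is at least T. Existing labels are never removed.
    """
    new_Y = [set(labels) for labels in Y]
    for i, x in enumerate(X):
        neighbours = sorted(
            ((cosine(x, X[j]), j) for j in range(len(X)) if j != i),
            reverse=True,
        )[:k]
        scores = {}
        for sim, j in neighbours:
            for label in Y[j]:  # vote with the original labels only
                scores[label] = scores.get(label, 0.0) + sim
        for label, score in scores.items():
            if score / k >= T:
                new_Y[i].add(label)
    return new_Y


# Toy usage: the second item is almost identical to the first and inherits "b".
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
Y = [{"a", "b"}, {"a"}, {"c"}]
restored = add_labels_wknn(X, Y, k=1, T=0.5)
```

Votes are taken from the original label sets rather than the partially restored ones, so the result does not depend on the order in which items are visited.
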
Year: 2014 PMID: 24587817 PMCID: PMC3920912 DOI: 10.1155/2014/781807
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1. Decision rules before and after missing label restoration.
Microaveraged precision, recall, and F1-measure (F1), obtained on the AgingPortfolio dataset with different classification methods.
| Method | Precision | Recall | F1 |
|---|---|---|---|
| SVM with fixed parameters | 0.8649 | 0.1983 | 0.2977 |
| SVM with parameter tuning | 0.7727 | 0.3302 | 0.4159 |
| SVM, del+WkNN | 0.5538 | 0.4452 | 0.4439 |
| SVM, add+WkNN | 0.4664 | 0.5684 | 0.4707 |
| SVM, del+SoftSL | 0.2132 | 0.6914 | 0.3259 |
| SVM, add+SoftSL | 0.3850 | 0.5639 | 0.4576 |
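
For readers unfamiliar with microaveraging, the following minimal sketch shows the standard definition: true positives, false positives, and false negatives are pooled over all items and labels before precision, recall, and F1 are computed. The function name `micro_prf` and the set-based representation are choices made for this example; the exact evaluation protocol behind the reported numbers (for instance, any averaging over runs or folds) is not described in this record.

```python
def micro_prf(true_sets, pred_sets):
    """Microaveraged precision, recall and F1 for multilabel predictions.

    true_sets, pred_sets: equal-length lists of label sets, one pair per item.
    Counts are pooled over all items and labels before the ratios are taken.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy usage with two items.
p, r, f1 = micro_prf([{"a", "b"}, {"c"}], [{"a"}, {"c", "d"}])
```
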
Average number of categories per document and documents per category in AgingPortfolio training set before and after modification.
| Modification method | Categories per doc. | Docs. in category |
|---|---|---|
| No modification | 4.4 | 45.09 |
| Add+WkNN | 15.15 | 155.14 |
| Add+SoftSL | 16.6 | 168.87 |
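
The two statistics in this table can be computed directly from the label sets. The sketch below is one plausible way to do so; the name `label_density` is hypothetical, and it assumes that "documents per category" is averaged over the categories that occur at least once in the set.

```python
def label_density(label_sets):
    """Return (average categories per document, average documents per category).

    label_sets: list of label sets, one per document.
    Assumption: only categories that appear at least once are counted.
    """
    if not label_sets:
        return 0.0, 0.0
    assignments = sum(len(labels) for labels in label_sets)
    categories = set().union(*label_sets)
    per_doc = assignments / len(label_sets)
    per_category = assignments / len(categories) if categories else 0.0
    return per_doc, per_category


# Toy usage: 6 label assignments over 4 documents and 3 categories.
per_doc, per_category = label_density([{"a", "b"}, {"a"}, {"a", "c"}, {"a"}])
```
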
Figure 2. CROC curves for different categories of AgingPortfolio (SVM classification with WkNN analysis).
Figure 3. CROC curves for different categories of AgingPortfolio (SVM classification with SoftSL analysis).
Figure 4. Comparison of WkNN and SoftSL analysis with SVM classification: CROC curves for different categories of AgingPortfolio.
Microaveraged results for category “experimental techniques: in vivo methods” (AgingPortfolio dataset).
| Method | Precision | Recall | F1 |
|---|---|---|---|
| SVM only | 0.6 | 0.12 | 0.2 |
| With del+WkNN | 0.7879 | 0.26 | 0.391 |
| With add+WkNN | 0.66 | 0.33 | 0.44 |
| With del+SoftSL | 0.2140 | 0.64 | 0.3208 |
| With add+SoftSL | 0.4653 | 0.47 | 0.4677 |
Microaveraged results for category “cancer and related diseases: malignant neoplasms including in situ” (AgingPortfolio dataset).
| Method | Precision | Recall | F1 |
|---|---|---|---|
| SVM only | 0.4444 | 0.4 | 0.4211 |
| With del+WkNN | 0.1277 | 0.9 | 0.2236 |
| With add+WkNN | 0.24 | 0.9 | 0.3789 |
| With del+SoftSL | 0.0549 | 1.0 | 0.1040 |
| With add+SoftSL | 0.1032 | 0.975 | 0.1866 |
Microaveraged results for category “aging mechanisms by anatomy: cell level” (AgingPortfolio dataset).
| Method | Precision | Recall | F1 |
|---|---|---|---|
| SVM only | 0.5167 | 0.3827 | 0.4397 |
| With del+WkNN | 0.5946 | 0.2716 | 0.3729 |
| With add+WkNN | 0.4123 | 0.5802 | 0.4821 |
| With del+SoftSL | 0.5342 | 0.4815 | 0.5065 |
| With add+SoftSL | 0.4182 | 0.5679 | 0.4817 |
Microaveraged results for category “aging mechanisms by anatomy: cell level: cellular substructures” (AgingPortfolio dataset).
| Method | Precision | Recall | F1 |
|---|---|---|---|
| SVM only | 0.52 | 0.2167 | 0.3059 |
| With del+WkNN | 1.0 | 0.0167 | 0.0328 |
| With add+WkNN | 0.4493 | 0.5167 | 0.4806 |
| With del+SoftSL | 0.75 | 0.15 | 0.25 |
| With add+SoftSL | 0.3667 | 0.55 | 0.44 |
Microaveraged results for AgingPortfolio dataset obtained with Random Forest Classification with different training set modifications.
| Method | Precision | Recall | F1 |
|---|---|---|---|
| RF only | 0.4738 | 0.2033 | 0.2467 |
| With del+WkNN | 0.4852 | 0.2507 | 0.2870 |
| With add+WkNN | 0.3058 | 0.4255 | 0.3194 |
Microaveraged results for SVM, trained on “incompletely labeled” Yeast dataset (with different fraction of deleted labels p) with add+WkNN label restoration. Optimal WkNN parameters k and T are acquired via grid search.
| p | Optimal k | Optimal T | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 0 | — | — | — | — | — |
| 0.1 | 10 | 0.3 | 0.6582 | 0.6847 | 0.6712 |
| 0.2 | 10 | 0.25 | 0.6525 | 0.6811 | 0.6665 |
| 0.3 | 10 | 0.15 | 0.6357 | 0.7137 | 0.6725 |
| 0.4 | 10 | 0.1 | 0.6604 | 0.669 | 0.6648 |
| 0.5 | 10 | 0.05 | 0.6225 | 0.7259 | 0.6702 |
| 0.6 | 10 | 0.05 | 0.6248 | 0.7261 | 0.6716 |
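
The caption states that k and T were chosen by grid search. A generic exhaustive search over a small grid might be sketched as follows. The scoring callback `score_fn` is hypothetical: which metric was optimized, and on which data split, is not stated in this record.

```python
from itertools import product


def grid_search(score_fn, k_values, T_values):
    """Return ((k, T), score) for the grid point that maximizes score_fn(k, T)."""
    best_pair, best_score = None, float("-inf")
    for k, T in product(k_values, T_values):
        score = score_fn(k, T)
        if score > best_score:  # keep the first best point on ties
            best_pair, best_score = (k, T), score
    return best_pair, best_score


# Toy usage with a made-up score surface that peaks at k=10, T=0.3.
def toy_score(k, T):
    return -((k - 10) ** 2) - (T - 0.3) ** 2


best_pair, best_score = grid_search(toy_score, [5, 10, 15], [0.05, 0.3, 0.5])
```

In practice `score_fn` would restore labels with the given k and T, train the classifier, and return a validation metric such as microaveraged F1.
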
Microaveraged results for SVM, trained on “incompletely labeled” Yeast dataset (with different fraction of deleted labels p).
| p | Precision | Recall | F1 |
|---|---|---|---|
| 0 | 0.7176 | 0.5707 | 0.6358 |
| 0.1 | 0.7337 | 0.5233 | 0.6109 |
| 0.2 | 0.7354 | 0.4056 | 0.5229 |
| 0.3 | 0.6260 | 0.2442 | 0.3513 |
| 0.4 | 0.3544 | 0.1191 | 0.1783 |
| 0.5 | 0 | 0 | 0 |
| 0.6 | 0 | 0 | 0 |
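
One way to produce an "incompletely labeled" training set like the one used here is to drop labels at random. The sketch below removes each label independently with probability p, so that roughly a fraction p of all labels disappears. This is an assumption for illustration: the record only says that a fraction p of labels was deleted, not how they were selected. The name `delete_labels` and the `seed` parameter are likewise hypothetical.

```python
import random


def delete_labels(label_sets, p, seed=0):
    """Return a copy of label_sets in which each label is dropped with probability p.

    label_sets: list of label sets, one per item. The input is not modified.
    Labels are visited in sorted order so that a given seed is reproducible.
    """
    rng = random.Random(seed)
    return [
        {label for label in sorted(labels) if rng.random() >= p}
        for labels in label_sets
    ]


# Toy usage: about 30% of the labels are expected to be removed.
full = [{"a", "b", "c"}, {"b"}, {"a", "c"}]
incomplete = delete_labels(full, p=0.3, seed=42)
```
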
Microaveraged results for RF, trained on “incompletely labeled” Yeast dataset (with different fraction of deleted labels p).
| p | Precision | Recall | F1 |
|---|---|---|---|
| 0 | 0.6340 | 0.5087 | 0.5315 |
| 0.1 | 0.6081 | 0.4613 | 0.4959 |
| 0.2 | 0.5693 | 0.3648 | 0.4133 |
| 0.3 | 0.5788 | 0.3240 | 0.3873 |
| 0.4 | 0.5068 | 0.2471 | 0.3094 |
| 0.5 | 0.5194 | 0.2431 | 0.3104 |
| 0.6 | 0.4354 | 0.1621 | 0.2224 |
Microaveraged results for RF, trained on “incompletely labeled” Yeast dataset (with different fraction of deleted labels p) with add+WkNN label restoration. Optimal WkNN parameters k and T are acquired via grid search.
| p | Optimal k | Optimal T | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 0 | — | — | — | — | — |
| 0.1 | 10 | 0.3 | 0.5940 | 0.7120 | 0.6224 |
| 0.2 | 10 | 0.25 | 0.5626 | 0.7462 | 0.6146 |
| 0.3 | 10 | 0.15 | 0.5282 | 0.7992 | 0.6095 |
| 0.4 | 10 | 0.10 | 0.5223 | 0.7648 | 0.5940 |
| 0.5 | 10 | 0.05 | 0.4672 | 0.8388 | 0.5726 |
| 0.6 | 15 | 0.05 | 0.4547 | 0.8695 | 0.5707 |