Xiaoli Jiang, Jing Zhou, Xinyue Qiao, Chang Peng, Shiwen Su.
Abstract
This paper proposes a distance-based multilabel classification algorithm. The algorithm combines k-nearest neighbors (kNN) with a neighborhood classifier (NC) to impose double constraints on the number and the distance of the neighbors. Specifically, a radius constraint is introduced into the kNN model to improve classification accuracy, and the quantity constraint k is added to the NC model to speed up computation. From the neighbors that satisfy both constraints, the probability of each label is estimated by the Bayesian rule, and the classification decision is made according to these probabilities. Experimental results show that the proposed algorithm has slight advantages over similar algorithms in computation speed and classification accuracy.
Year: 2022 PMID: 36172313 PMCID: PMC9512597 DOI: 10.1155/2022/9891971
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1: (a) 3NN and (b) bounded 3NN for nonuniformly distributed data.
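The double constraint described in the abstract (at most k neighbors, all within a given radius) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `bounded_knn`, the Euclidean metric, and the toy points are all hypothetical choices made for the example.

```python
import math
from typing import List, Sequence


def bounded_knn(query: Sequence[float],
                points: Sequence[Sequence[float]],
                k: int,
                radius: float) -> List[int]:
    """Return indices of at most k nearest points that lie within `radius`.

    Assumption: Euclidean distance; the excerpt does not name the metric.
    """
    # Pair every distance with its point index, then sort nearest first.
    dists = sorted((math.dist(query, p), i) for i, p in enumerate(points))
    # Quantity constraint: keep the k nearest. Distance constraint: drop
    # any of those k that fall outside the radius.
    return [i for d, i in dists[:k] if d <= radius]


# Toy data (hypothetical): three nearby points and one far away.
points = [(0.5, 0.0), (0.0, 1.0), (3.0, 0.0), (0.2, 0.2)]
print(bounded_knn((0.0, 0.0), points, k=3, radius=1.5))    # [3, 0, 1]
print(bounded_knn((0.0, 0.0), points, k=3, radius=0.6))    # [3, 0]
print(bounded_knn((10.0, 10.0), points, k=3, radius=1.5))  # []
```

In a sparse region the bounded search may return fewer than k neighbors, or none at all, whereas plain kNN would always return k points regardless of how far away they are.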
Pseudo-code of ML-kNN.
| Input: | |
|---|---|
| Output: | |
| Step 1 | For |
| Step 2 | Identify |
| Step 3 | End for |
| Step 4 | For |
| Step 5 | Learn the probabilities |
| Step 8 | End for |
| Step 9 | Identify |
| Step 5 | For |
| Step 6 | Calculate the statistics |
| Step 8 | End for |
| Step 7 | Return |
Pseudo-code of NC.
| Input: | |
|---|---|
| Output: | |
| Step 1 | For |
| Step 2 | Identify |
| Step 3 | End for |
| Step 4 | For |
| Step 5 | Learn the probabilities |
| Step 8 | End for |
| Step 9 | Identify |
| Step 5 | For |
| Step 6 | Calculate the statistics |
| Step 8 | End for |
| Step 7 | Return |
Pseudo-code of BML-kNN.
| Input: | |
|---|---|
| Output: | |
| Step 1 | For |
| Step 2 | Compute |
| Step 3 | |
| Step 4 | For |
| Step 5 | Compute |
| Step 6 | If |
| Step 7 | |
| Step 8 | Else |
| Step 9 | |
| Step 10 | End if |
| Step 11 | End for |
| Step 12 | For |
| Step 13 | |
| Step 14 | Compute |
| Step 15 | End for |
| Step 16 | End for |
| Step 17 | For |
| Step 18 | Compute |
| Step 19 | Compute |
| Step 20 | End for |
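The abstract states that label probabilities are estimated from the selected neighbors by the Bayesian rule. Because the formulas of the pseudo-code did not survive in this copy, the sketch below follows the well-known ML-kNN style of estimation (a smoothed prior and smoothed neighbor-count likelihoods for a single label) as an assumption; the paper's exact formulas may differ. The names `fit_label_model` and `label_posterior` and the smoothing parameter `s` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_label_model(neighbor_counts: Sequence[int],
                    has_label: Sequence[bool],
                    k: int,
                    s: float = 1.0) -> Tuple[float, List[float], List[float]]:
    """Estimate, for ONE label, a smoothed prior and count likelihoods.

    neighbor_counts[i]: number of instance i's neighbors that carry the label
                        (at most k; may be smaller under a radius constraint).
    has_label[i]:       whether training instance i itself carries the label.
    """
    m = len(has_label)
    prior = (s + sum(has_label)) / (2 * s + m)
    c_pos = [0] * (k + 1)  # c_pos[j]: labeled instances with j labeled neighbors
    c_neg = [0] * (k + 1)  # c_neg[j]: unlabeled instances with j labeled neighbors
    for count, y in zip(neighbor_counts, has_label):
        (c_pos if y else c_neg)[count] += 1
    n_pos, n_neg = sum(c_pos), sum(c_neg)
    like_pos = [(s + c) / (s * (k + 1) + n_pos) for c in c_pos]
    like_neg = [(s + c) / (s * (k + 1) + n_neg) for c in c_neg]
    return prior, like_pos, like_neg


def label_posterior(count: int, prior: float,
                    like_pos: Sequence[float],
                    like_neg: Sequence[float]) -> float:
    """Bayes rule: P(label | `count` of the neighbors carry the label)."""
    num = prior * like_pos[count]
    return num / (num + (1.0 - prior) * like_neg[count])


# Toy training statistics (hypothetical) for one label with k = 3.
prior, like_pos, like_neg = fit_label_model(
    [3, 2, 0, 1], [True, True, False, False], k=3)
print(round(label_posterior(3, prior, like_pos, like_neg), 3))  # 0.667
print(round(label_posterior(0, prior, like_pos, like_neg), 3))  # 0.333
```

A label would then be assigned when its posterior exceeds 0.5, and the posteriors themselves can serve as the ranking scores used by the ranking-based metrics.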
Multilabel data sets.
| Dataset | Type | Instances | Features | Labels | Domain |
|---|---|---|---|---|---|
| Enron | Nominal | 1702 | 1001 | 53 | Text |
| Medical | Nominal | 978 | 1449 | 45 | Text |
| Corel5k | Nominal | 5000 | 499 | 374 | Images |
| Genbase | Nominal | 662 | 1185 | 27 | Biology |
| Yeast | Numerical | 2417 | 103 | 14 | Biology |
| Emotion | Numerical | 593 | 72 | 6 | Music |
| CAL500 | Numerical | 500 | 62 | 174 | Music |
| Scene | Numerical | 2407 | 294 | 6 | Images |
| Mediamill | Numerical | 43907 | 120 | 101 | Video |
| Nus-wide | Numerical | 269648 | 129 | 81 | Images |
Figure 2: Variation of evaluation metrics with the neighborhood radius on the Scene dataset.
Figure 3: Variation of evaluation metrics with the neighborhood radius on the Yeast dataset.
Figure 4: Variation of evaluation metrics with parameter k on the Yeast dataset.
Figure 5: Variation of evaluation metrics with parameter k and the radius on the Yeast dataset.
Average running time.
| Dataset | ML-kNN | NC | BOOSTEXTER | RANK-SVM | BML-kNN |
|---|---|---|---|---|---|
| Yeast | 1.73 | 3.65 | 1.78 | 1.88 | 1.12 |
| Emotion | 0.49 | 0.26 | 0.33 | 0.58 | 0.11 |
| Genbase | 0.16 | 0.33 | 0.17 | 0.23 | 0.19 |
| Scene | 1.12 | 1.35 | 1.18 | 1.16 | 0.61 |
| Enron | 0.84 | 2.52 | 0.86 | 0.89 | 0.97 |
| Medical | 0.34 | 0.82 | 0.62 | 0.78 | 0.39 |
| Corel5k | 12.1 | 46.4 | 18.2 | 19.2 | 15.8 |
| CAL500 | 1.53 | 1.87 | 1.49 | 1.64 | 1.43 |
| Mediamill | 82.1 | 90.7 | 87.0 | 81.1 | 76.6 |
| Nus-wide | 565 | 596 | 529 | 581 | 547 |
Hamming loss.
| Dataset | ML-kNN | NC | BOOSTEXTER | RANK-SVM | BML-kNN |
|---|---|---|---|---|---|
| Yeast | 0.22 | 0.26 | 0.28 | 0.27 | 0.21 |
| Emotion | 0.27 | 0.29 | 0.26 | 0.24 | 0.23 |
| Genbase | 0.04 | 0.07 | 0.05 | 0.06 | 0.03 |
| Scene | 0.09 | 0.09 | 0.11 | 0.07 | 0.10 |
| Enron | 0.06 | 0.07 | 0.08 | 0.07 | 0.05 |
| Medical | 0.01 | 0.02 | 0.03 | 0.03 | 0.02 |
| Corel5k | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 |
| CAL500 | 0.11 | 0.13 | 0.12 | 0.12 | 0.09 |
| Mediamill | 0.03 | 0.04 | 0.05 | 0.04 | 0.02 |
| Nus-wide | 0.02 | 0.02 | 0.03 | 0.02 | 0.01 |
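For readers unfamiliar with the metric, Hamming loss is commonly defined as the fraction of instance-label pairs that are predicted incorrectly (lower is better). The sketch below uses that standard definition as an assumption, since the paper's own formula is not reproduced here; `hamming_loss` and the toy matrices are hypothetical.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (instance, label) entries where prediction and truth differ."""
    n_instances = len(y_true)
    n_labels = len(y_true[0])
    wrong = sum(t != p
                for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    return wrong / (n_instances * n_labels)


# Two instances, three labels; two of the six entries are wrong.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 1, 1]]
print(round(hamming_loss(y_true, y_pred), 3))  # 0.333
```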
One-error.
| Dataset | ML-kNN | NC | BOOSTEXTER | RANK-SVM | BML-kNN |
|---|---|---|---|---|---|
| Yeast | 0.25 | 0.29 | 0.26 | 0.28 | 0.22 |
| Emotion | 0.38 | 0.39 | 0.36 | 0.37 | 0.35 |
| Genbase | 0.02 | 0.02 | 0.03 | 0.01 | 0.02 |
| Scene | 0.24 | 0.31 | 0.27 | 0.28 | 0.22 |
| Enron | 0.31 | 0.34 | 0.38 | 0.41 | 0.39 |
| Medical | 0.27 | 0.28 | 0.31 | 0.28 | 0.26 |
| Corel5k | 0.74 | 0.73 | 0.79 | 0.76 | 0.80 |
| CAL500 | 0.12 | 0.15 | 0.17 | 0.12 | 0.11 |
| Mediamill | 0.17 | 0.19 | 0.20 | 0.18 | 0.16 |
| Nus-wide | 0.57 | 0.60 | 0.67 | 0.58 | 0.56 |
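One-error is usually defined as the fraction of instances whose top-ranked label is not among the true labels (lower is better). A minimal sketch under that assumed definition follows; `one_error` and the example scores are hypothetical, and ties are broken in favor of the lowest label index.

```python
from typing import Sequence


def one_error(y_true: Sequence[Sequence[int]],
              scores: Sequence[Sequence[float]]) -> float:
    """Fraction of instances whose highest-scoring label is irrelevant."""
    errors = 0
    for truth, score in zip(y_true, scores):
        # Index of the label with the highest score (first one on ties).
        top = max(range(len(score)), key=score.__getitem__)
        if not truth[top]:
            errors += 1
    return errors / len(y_true)


y_true = [[1, 0, 0], [0, 1, 0]]
scores = [[0.9, 0.2, 0.1], [0.8, 0.3, 0.1]]
print(one_error(y_true, scores))  # 0.5
```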
Coverage.
| Dataset | ML-kNN | NC | BOOSTEXTER | RANK-SVM | BML-kNN |
|---|---|---|---|---|---|
| Yeast | 6.22 | 6.31 | 6.25 | 6.27 | 6.16 |
| Emotion | 2.31 | 2.36 | 2.33 | 2.37 | 2.29 |
| Genbase | 0.56 | 0.53 | 0.56 | 0.61 | 0.59 |
| Scene | 0.45 | 0.47 | 0.49 | 0.50 | 0.44 |
| Enron | 15.4 | 15.1 | 14.4 | 14.1 | 13.9 |
| Medical | 2.49 | 2.45 | 2.61 | 2.56 | 2.64 |
| Corel5k | 115 | 112 | 121 | 113 | 122 |
| CAL500 | 113 | 117 | 126 | 123 | 126 |
| Mediamill | 19.0 | 20.1 | 20.2 | 19.6 | 18.9 |
| Nus-wide | 14.0 | 13.6 | 13.9 | 13.8 | 13.3 |
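Coverage is commonly defined as how far down the ranked label list one must go, on average, to include all true labels of an instance, minus one (lower is better). The sketch below assumes that convention and assumes every instance has at least one true label; `coverage` and the toy data are hypothetical.

```python
from typing import Sequence


def coverage(y_true: Sequence[Sequence[int]],
             scores: Sequence[Sequence[float]]) -> float:
    """Average of (worst rank among the true labels) - 1, ranks starting at 1.

    Assumption: each instance has at least one true label.
    """
    total = 0
    for truth, score in zip(y_true, scores):
        # Labels ordered from highest to lowest score.
        order = sorted(range(len(score)), key=lambda j: -score[j])
        rank = {label: r for r, label in enumerate(order, start=1)}
        total += max(rank[j] for j, t in enumerate(truth) if t) - 1
    return total / len(y_true)


y_true = [[1, 0, 1], [0, 1, 0]]
scores = [[0.9, 0.5, 0.1], [0.2, 0.7, 0.1]]
print(coverage(y_true, scores))  # 1.0
```

Some libraries report this quantity without subtracting one, so values from different tools may differ by a constant offset of 1.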
Average precision.
| Dataset | ML-kNN | NC | BOOSTEXTER | RANK-SVM | BML-kNN |
|---|---|---|---|---|---|
| Yeast | 0.78 | 0.73 | 0.77 | 0.83 | 0.87 |
| Emotion | 0.62 | 0.67 | 0.65 | 0.69 | 0.71 |
| Genbase | 0.98 | 0.98 | 0.97 | 0.98 | 0.99 |
| Scene | 0.89 | 0.86 | 0.83 | 0.82 | 0.86 |
| Enron | 0.63 | 0.59 | 0.62 | 0.61 | 0.65 |
| Medical | 0.80 | 0.89 | 0.83 | 0.85 | 0.87 |
| Corel5k | 0.23 | 0.21 | 0.20 | 0.24 | 0.19 |
| CAL500 | 0.46 | 0.43 | 0.45 | 0.47 | 0.49 |
| Mediamill | 0.69 | 0.67 | 0.66 | 0.69 | 0.72 |
| Nus-wide | 0.46 | 0.47 | 0.49 | 0.51 | 0.52 |
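Average precision, unlike the other metrics here, is better when higher. It is commonly defined per instance as the mean, over the true labels, of the fraction of true labels ranked at or above each one. The sketch below assumes that standard definition and at least one true label per instance; `average_precision` and the toy data are hypothetical.

```python
from typing import Sequence


def average_precision(y_true: Sequence[Sequence[int]],
                      scores: Sequence[Sequence[float]]) -> float:
    """Mean over instances of the label-ranking average precision.

    Assumption: each instance has at least one true label.
    """
    total = 0.0
    for truth, score in zip(y_true, scores):
        order = sorted(range(len(score)), key=lambda j: -score[j])
        hits = 0          # true labels seen so far while walking down the ranking
        precision_sum = 0.0
        for r, label in enumerate(order, start=1):
            if truth[label]:
                hits += 1
                precision_sum += hits / r  # precision at this true label's rank
        total += precision_sum / hits
    return total / len(y_true)


y_true = [[1, 0, 1], [0, 1, 0]]
scores = [[0.9, 0.5, 0.1], [0.2, 0.7, 0.1]]
print(round(average_precision(y_true, scores), 3))  # 0.917
```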
Ranking loss.
| Dataset | ML-kNN | NC | BOOSTEXTER | RANK-SVM | BML-kNN |
|---|---|---|---|---|---|
| Yeast | 0.18 | 0.19 | 0.23 | 0.21 | 0.25 |
| Emotion | 0.25 | 0.23 | 0.22 | 0.23 | 0.26 |
| Genbase | 0.02 | 0.01 | 0.03 | 0.02 | 0.02 |
| Scene | 0.09 | 0.14 | 0.11 | 0.13 | 0.07 |
| Enron | 0.10 | 0.12 | 0.13 | 0.09 | 0.12 |
| Medical | 0.05 | 0.1297 | 0.19 | 0.19 | 0.14 |
| Corel5k | 0.14 | 0.13 | 0.12 | 0.17 | 0.14 |
| CAL500 | 0.18 | 0.18 | 0.19 | 0.19 | 0.17 |
| Mediamill | 0.06 | 0.07 | 0.09 | 0.08 | 0.05 |
| Nus-wide | 0.12 | 0.14 | 0.13 | 0.12 | 0.11 |
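Ranking loss is commonly defined as the average fraction of (true label, false label) pairs in which the false label is scored at least as high as the true one (lower is better). The sketch below assumes that definition and that every instance has at least one true and one false label; `ranking_loss` and the toy data are hypothetical.

```python
from typing import Sequence


def ranking_loss(y_true: Sequence[Sequence[int]],
                 scores: Sequence[Sequence[float]]) -> float:
    """Average fraction of misordered (relevant, irrelevant) label pairs.

    Assumption: each instance has at least one relevant and one irrelevant label.
    """
    total = 0.0
    for truth, score in zip(y_true, scores):
        relevant = [j for j, t in enumerate(truth) if t]
        irrelevant = [j for j, t in enumerate(truth) if not t]
        # A pair is misordered when the irrelevant label scores at least as high.
        bad = sum(score[a] <= score[b] for a in relevant for b in irrelevant)
        total += bad / (len(relevant) * len(irrelevant))
    return total / len(y_true)


y_true = [[1, 0, 1], [0, 1, 0]]
scores = [[0.9, 0.5, 0.1], [0.2, 0.7, 0.1]]
print(ranking_loss(y_true, scores))  # 0.25
```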