Zhenwu Wang¹, Tielin Wang¹, Benting Wan², Mengjie Han³.
Abstract
Multi-label classification (MLC) is a supervised learning problem in which an object is naturally associated with multiple concepts because it can be described along various dimensions. How to exploit the resulting label correlations is the key issue in MLC. The classifier chain (CC) is a well-known MLC approach that can learn complex coupling relationships between labels. However, CC has two obvious drawbacks: (1) the label ordering is chosen at random, although it usually has a strong effect on predictive performance; (2) all labels are inserted into the chain, although some of them may carry irrelevant information that degrades the prediction of the others. In this work, we propose a partial classifier chain method with feature selection (PCC-FS) that exploits the correlations between the label and feature spaces and thus solves both problems simultaneously. In PCC-FS, feature selection is performed by learning the covariance between the feature set and the label set, which eliminates irrelevant features that can diminish classification performance. Couplings in the label set are extracted, and the coupled labels of each label are inserted simultaneously into the chain structure for training and prediction. Experimental results on five evaluation metrics show that the proposed method significantly improves on eight state-of-the-art MLC algorithms.
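The abstract combines two ideas: covariance-based feature selection, and a chain in which each label's classifier receives only its coupled labels instead of every preceding label. The sketch below is a hypothetical illustration of that structure, not the PCC-FS algorithm itself. The feature score (summed absolute feature-label covariance), the nearest-centroid base learner, and the hand-written `parents` map of coupled labels are all assumptions chosen to keep the example short. Setting `parents[l]` to every earlier label in `order` gives an ordinary classifier chain; a shorter list gives a partial chain.

```python
from statistics import mean


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def select_features(X, Y, k):
    """Hypothetical selection rule: keep the k features whose summed |covariance|
    with all label columns is largest (an assumption, not the paper's exact method)."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        score = sum(abs(covariance(col, [y[l] for y in Y])) for l in range(len(Y[0])))
        scores.append((score, j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def fit_centroids(rows, targets):
    """Toy binary base learner: one centroid per class (both classes must occur)."""
    pos = [r for r, t in zip(rows, targets) if t == 1]
    neg = [r for r, t in zip(rows, targets) if t == 0]
    return [mean(c) for c in zip(*pos)], [mean(c) for c in zip(*neg)]


def predict_centroid(model, row):
    """Predict 1 if the row is closer (squared Euclidean) to the positive centroid."""
    c_pos, c_neg = model
    d_pos = sum((a - b) ** 2 for a, b in zip(row, c_pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(row, c_neg))
    return 1 if d_pos < d_neg else 0


def fit_chain(X, Y, order, parents):
    """Train one classifier per label; its input is x plus the TRUE values
    of the labels listed in parents[l] (the labels inserted for it)."""
    models = {}
    for l in order:
        rows = [list(x) + [y[p] for p in parents[l]] for x, y in zip(X, Y)]
        models[l] = fit_centroids(rows, [y[l] for y in Y])
    return models


def predict_chain(models, x, order, parents):
    """Predict labels in chain order, feeding PREDICTED parent labels forward.
    Every parent of a label must appear earlier in `order`."""
    pred = {}
    for l in order:
        pred[l] = predict_centroid(models[l], list(x) + [pred[p] for p in parents[l]])
    return [pred[l] for l in sorted(pred)]


# Toy data: feature 2 is constant, and label 2 always equals label 0.
X = [[1.0, 0.0, 7.0], [0.9, 0.1, 7.0], [0.0, 1.0, 7.0],
     [0.1, 0.9, 7.0], [1.0, 1.0, 7.0], [0.0, 0.0, 7.0]]
Y = [[1, 0, 1], [1, 0, 1], [0, 1, 0], [0, 1, 0], [1, 1, 1], [0, 0, 0]]

keep = select_features(X, Y, 2)
Xs = [[row[j] for j in keep] for row in X]
order = [0, 1, 2]
full_cc = {l: order[:i] for i, l in enumerate(order)}  # ordinary chain
partial = {0: [], 1: [], 2: [0]}                       # hypothetical couplings
models = fit_chain(Xs, Y, order, partial)
print(keep, [predict_chain(models, x, order, partial) for x in Xs])
```

The constant feature has zero covariance with every label and is dropped. In the partial chain only label 0 is inserted for label 2, whereas the full chain would also insert label 1.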
Keywords: classifier chains; feature selection; label correlation; multi-label classification
Year: 2020 PMID: 33286912 PMCID: PMC7597295 DOI: 10.3390/e22101143
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. The framework of the PCC-FS algorithm.
Description of multi-label datasets.
| Dataset | Instance Number | Label Number | Continuous Feature Number | Discrete Feature Number | Density | Field |
|---|---|---|---|---|---|---|
| emotions | 593 | 6 | 72 | 0 | 0.311 | Music |
| CAL500 | 502 | 174 | 68 | 0 | 0.150 | Music |
| yeast | 2417 | 14 | 103 | 0 | 0.303 | Biology |
| flags | 194 | 7 | 10 | 9 | 0.485 | Image |
| scene | 2407 | 6 | 294 | 0 | 0.179 | Image |
| birds | 645 | 19 | 258 | 2 | 0.053 | Audio |
| enron | 1702 | 53 | 0 | 1001 | 0.064 | Text |
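The Density column is the usual label density: label cardinality (the average number of relevant labels per instance) divided by the number of labels. For emotions, 0.311 × 6 ≈ 1.87 relevant labels per instance on average. A minimal sketch, assuming the label matrix is given as a list of 0/1 rows:

```python
def label_cardinality(Y):
    """Average number of relevant labels per instance."""
    return sum(sum(row) for row in Y) / len(Y)


def label_density(Y):
    """Label cardinality normalized by the number of labels."""
    return label_cardinality(Y) / len(Y[0])


Y = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 1, 0]]  # 3 toy instances, 4 labels
print(label_cardinality(Y), label_density(Y))    # 2.0 0.5
```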
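Performance comparison of nine algorithms on the emotions dataset. Values in parentheses are ranks among the nine algorithms (1 = best); lower is better for Hamming loss, ranking loss, one-error, and coverage, and higher is better for average precision.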
| Algorithm | Hamming Loss | Ranking Loss | One Error | Coverage | Average Precision |
|---|---|---|---|---|---|
| HOMER | 0.2525(5) | 0.3040(6) | 0.4032(5) | 2.5863(6) | 0.6897(6) |
| LP | 0.2808(6) | 0.3393(7) | 0.4605(7) | 2.6669(7) | 0.6643(7) |
| RAkEL | 0.2153(3) | 0.1891(4) | 0.3120(3) | 1.9464(4) | 0.7758(4) |
| Rank-SVM | 0.3713(9) | 0.4273(9) | 0.6154(9) | 3.0264(8) | 0.5714(9) |
| BP_MLL | 0.3519(8) | 0.4143(8) | 0.5868(8) | 3.0759(9) | 0.5732(8) |
| CC | 0.2171(4) | 0.1729(3) | 0.3124(4) | 1.8134(3) | 0.7852(3) |
| CCE | 0.2064(2) | 0.1646(2) | 0.2719(2) | 1.7699(2) | 0.8001(2) |
| LLSF-DL | 0.2893(7) | 0.2867(5) | 0.4273(6) | 2.3407(5) | 0.6936(5) |
| PCC-FS | (1) | (1) | (1) | (1) | (1) |
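The five criteria in these tables follow standard multi-label definitions, and the parenthesized ranks confirm the direction of each: lower is better for Hamming loss, ranking loss, one-error, and coverage, higher for average precision. The sketch below implements the common forms for 0/1 label rows `Y`, real-valued label scores `S`, and thresholded predictions `P`. The authors' tie handling and thresholding rule are not stated, so those details (ties broken by label index, a 0.5 threshold) are assumptions. Coverage uses the "maximum rank minus one" convention, which is consistent with the emotions coverage values (about 1.77 for CCE) falling below that dataset's label cardinality of about 1.87. Every instance is assumed to have at least one relevant and one irrelevant label.

```python
def label_ranks(scores):
    """Rank labels by descending score (1 = top); ties keep label-index order."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for r, j in enumerate(order, start=1):
        ranks[j] = r
    return ranks


def hamming_loss(Y, P):
    """Fraction of instance-label pairs that are misclassified."""
    q = len(Y[0])
    return sum(sum(a != b for a, b in zip(y, p)) for y, p in zip(Y, P)) / (len(Y) * q)


def one_error(Y, S):
    """Fraction of instances whose top-scored label is irrelevant."""
    return sum(y[max(range(len(s)), key=s.__getitem__)] == 0 for y, s in zip(Y, S)) / len(Y)


def coverage(Y, S):
    """Average depth needed to reach all relevant labels (max rank - 1)."""
    total = 0
    for y, s in zip(Y, S):
        r = label_ranks(s)
        total += max(r[j] for j, v in enumerate(y) if v) - 1
    return total / len(Y)


def ranking_loss(Y, S):
    """Average fraction of (relevant, irrelevant) label pairs that are mis-ordered."""
    total = 0.0
    for y, s in zip(Y, S):
        rel = [j for j, v in enumerate(y) if v]
        irr = [j for j, v in enumerate(y) if not v]
        bad = sum(1 for a in rel for b in irr if s[a] <= s[b])
        total += bad / (len(rel) * len(irr))
    return total / len(Y)


def average_precision(Y, S):
    """For each relevant label, the fraction of relevant labels ranked at or above it."""
    total = 0.0
    for y, s in zip(Y, S):
        r = label_ranks(s)
        rel = [j for j, v in enumerate(y) if v]
        total += sum(sum(1 for k in rel if r[k] <= r[j]) / r[j] for j in rel) / len(rel)
    return total / len(Y)


Y = [[1, 0, 1], [0, 1, 0]]
S = [[0.9, 0.2, 0.4], [0.6, 0.3, 0.1]]
P = [[int(v >= 0.5) for v in s] for s in S]  # assumed decision threshold of 0.5
for f, args in [(hamming_loss, (Y, P)), (ranking_loss, (Y, S)), (one_error, (Y, S)),
                (coverage, (Y, S)), (average_precision, (Y, S))]:
    print(f.__name__, f(*args))
```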
Performance comparisons of nine algorithms on the CAL500 dataset.
| Algorithm | Hamming Loss | Ranking Loss | One Error | Coverage | Average Precision |
|---|---|---|---|---|---|
| HOMER | 0.2104(8) | 0.4071(8) | 0.8565(8) | 169.3291(8) | 0.2609(8) |
| LP | 0.1993(7) | 0.6559(9) | 0.9880(9) | 171.1590(9) | 0.1164(9) |
| RAkEL | 0.1686(6) | 0.2870(7) | 0.3529(7) | 165.3036(7) | 0.4008(7) |
| Rank-SVM | 0.1376(2) | 0.1824(5) | 0.1156(2) | 129.2589(2) | 0.4986(4) |
| BP_MLL | 0.1480(5) | 0.1815(4) | 0.1186(3) | 130.084(6) | 0.4967(5) |
| CC | 0.1386(4) | 0.1814(3) | 0.1216(4) | 129.698(4) | 0.5025(3) |
| CCE | 0.1377(3) | 0.1781(2) | 0.1255(5) | 129.851(5) | 0.5094(2) |
| LLSF-DL | 0.2330(9) | 0.1985(6) | 0.3500(6) | (1) | 0.4463(6) |
| PCC-FS | (1) | (1) | (1) | 129.616(3) | (1) |
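Each parenthesized number ranks the nine algorithms on one dataset and one metric. The sketch below shows that bookkeeping; averaging the ranks of tied values is an assumption (the usual convention before a Friedman test), since the tables show no ties. Ranking the eight listed CAL500 average-precision values, where higher is better, and shifting by one reproduces their ranks 2 to 9, leaving rank 1 for the PCC-FS entry.

```python
def rank_algorithms(values, higher_is_better=False):
    """Rank algorithms on one dataset/metric (1 = best); tied values share the mean rank."""
    names = sorted(values, key=values.get, reverse=higher_is_better)
    ranks, i = {}, 0
    while i < len(names):
        j = i
        # Extend j over a run of equal values starting at position i.
        while j + 1 < len(names) and values[names[j + 1]] == values[names[i]]:
            j += 1
        for name in names[i:j + 1]:
            ranks[name] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


cal500_ap = {"HOMER": 0.2609, "LP": 0.1164, "RAkEL": 0.4008, "Rank-SVM": 0.4986,
             "BP_MLL": 0.4967, "CC": 0.5025, "CCE": 0.5094, "LLSF-DL": 0.4463}
ranks = rank_algorithms(cal500_ap, higher_is_better=True)
print({name: r + 1 for name, r in ranks.items()})  # shift by one: PCC-FS holds rank 1
```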
Performance comparisons of nine algorithms on the yeast dataset.
| Algorithm | Hamming Loss | Ranking Loss | One Error | Coverage | Average Precision |
|---|---|---|---|---|---|
| HOMER | 0.2619(8) | 0.3287(8) | 0.2871(7) | 9.2457(8) | 0.6259(8) |
| LP | 0.2768(9) | 0.3977(9) | 0.5143(9) | 9.3607(9) | 0.5733(9) |
| RAkEL | 0.2270(6) | 0.2143(6) | 0.2946(8) | 7.5086(6) | 0.7144(6) |
| Rank-SVM | 0.2450(7) | 0.1928(5) | 0.2521(5) | 6.3554(3) | 0.7217(5) |
| BP_MLL | 0.2112(5) | 0.1761(4) | 0.2429(4) | 6.5101(5) | 0.7473(4) |
| CC | (1) | 0.1699(3) | 0.2238(2) | 6.4200(4) | 0.7596(3) |
| CCE | 0.2015(3) | 0.1679(2) | 0.2284(3) | 6.3475(2) | 0.7645(2) |
| LLSF-DL | 0.2019(4) | 0.2585(7) | 0.2790(6) | 8.5881(7) | 0.7004(7) |
| PCC-FS | 0.2013(2) | (1) | (1) | (1) | (1) |
Performance comparisons of nine algorithms on the flags dataset.
| Algorithm | Hamming Loss | Ranking Loss | One Error | Coverage | Average Precision |
|---|---|---|---|---|---|
| HOMER | 0.2683(3) | 0.2895(7) | 0.3703(7) | 4.1221(7) | 0.7630(7) |
| LP | 0.2962(6) | 0.5011(8) | 0.5587(8) | 4.9495(8) | 0.6407(8) |
| RAkEL | (1) | 0.2332(5) | 0.2255(3) | 3.7824(3) | 0.8118(2) |
| Rank-SVM | 0.5931(9) | 0.7052(9) | 0.7618(9) | 5.5303(9) | 0.4987(9) |
| BP_MLL | 0.3225(8) | 0.2226(3) | 0.2288(4) | 3.8734(5) | 0.8028(5) |
| CC | 0.2785(5) | 0.2136(2) | 0.2344(5) | (1) | 0.8117(3) |
| CCE | 0.2749(4) | 0.2252(4) | 0.2283(2) | 3.8513(4) | 0.8032(4) |
| LLSF-DL | 0.2987(7) | 0.2786(6) | 0.2838(6) | 4.0921(6) | 0.7683(6) |
| PCC-FS | 0.2619(2) | (1) | (1) | 3.7492(2) | (1) |
Performance comparisons of nine algorithms on the scene dataset.
| Algorithm | Hamming Loss | Ranking Loss | One Error | Coverage | Average Precision |
|---|---|---|---|---|---|
| HOMER | 0.1488(7) | 0.2345(9) | 0.4595(9) | 1.2685(9) | 0.6946(9) |
| LP | 0.1439(6) | 0.2121(8) | 0.3984(7) | 1.1550(8) | 0.7308(8) |
| RAkEL | 0.1014(4) | 0.0998(4) | 0.2672(4) | 0.5854(4) | 0.8378(4) |
| Rank-SVM | 0.1501(8) | 0.1039(5) | 0.2863(5) | 0.6266(5) | 0.8275(5) |
| BP_MLL | 0.1859(9) | 0.1383(6) | 0.4550(8) | 0.7725(6) | 0.7411(7) |
| CC | 0.1137(5) | (1) | 0.2439(2) | (1) | (1) |
| CCE | 0.0949(2) | 0.0922(3) | 0.2447(3) | 0.5439(3) | 0.8495(3) |
| LLSF-DL | 0.0998(3) | 0.1898(7) | 0.3482(6) | 1.0536(7) | 0.7579(6) |
| PCC-FS | (1) | 0.0872(2) | (1) | 0.5193(2) | 0.8540(2) |
Performance comparisons of nine algorithms on the birds dataset.
| Algorithm | Hamming Loss | Ranking Loss | One Error | Coverage | Average Precision |
|---|---|---|---|---|---|
| HOMER | 0.0641(4) | 0.2076(3) | 0.8231(6) | 5.2117(6) | 0.3806(5) |
| LP | 0.0731(5) | 0.2733(7) | 0.9009(9) | 6.0743(8) | 0.2673(6) |
| RAkEL | (1) | (1) | 0.6957(4) | 4.0188(4) | 0.5271(3) |
| Rank-SVM | 0.1080(9) | 0.5359(9) | 0.8273(7) | 6.2380(9) | 0.2421(7) |
| BP_MLL | 0.0587(3) | 0.4486(8) | 0.8637(8) | 5.2591(7) | 0.2023(9) |
| CC | 0.0923(8) | 0.2487(6) | 0.4959(2) | 3.3094(2) | 0.5311(2) |
| CCE | 0.0878(7) | 0.2521(5) | 0.5162(3) | 3.3771(3) | 0.5138(4) |
| LLSF-DL | 0.0536(2) | 0.1809(2) | 0.8229(5) | 4.2128(5) | 0.2239(8) |
| PCC-FS | 0.0827(6) | 0.2265(4) | (1) | (1) | (1) |
Performance comparisons of nine algorithms on the enron dataset.
| Algorithm | Hamming Loss | Ranking Loss | One Error | Coverage | Average Precision |
|---|---|---|---|---|---|
| HOMER | 0.0606(7) | 0.2471(7) | 0.4918(7) | 28.0953(7) | 0.5067(7) |
| LP | 0.0707(8) | 0.5480(8) | 0.8144(8) | 39.5441(8) | 0.2246(8) |
| RAkEL | 0.0484(5) | 0.2011(6) | 0.2774(2) | 25.2163(6) | 0.6156(6) |
| Rank-SVM | 0.0737(9) | 0.6256(9) | 0.8854(9) | 45.4454(9) | 0.1471(9) |
| BP_MLL | 0.0553(6) | (1) | (1) | (1) | (1) |
| CC | 0.0481(4) | 0.0781(2) | 0.3099(4) | 11.8147(2) | 0.6817(3) |
| CCE | 0.0480(3) | 0.0809(4) | 0.3264(6) | 12.0097(4) | 0.6720(4) |
| LLSF-DL | (1) | 0.1545(5) | 0.2954(3) | 20.9599(5) | 0.6408(5) |
| PCC-FS | 0.0474(2) | 0.0784(3) | 0.3152(5) | 11.8254(3) | 0.6832(2) |
Figure 2. Average rank of the nine compared algorithms on different datasets.
Figure 3. Evaluation of algorithms using the sum of ranking differences (SRD). Scaled SRD values are plotted on the x-axis and left y-axis; the right y-axis shows the relative frequencies (black curve). Parameters of the Gaussian fit are m = 66.81 and s = 7.28. Probability levels 5% (XX1), median (Med), and 95% (XX19) are also given.
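SRD compares each method's ranking of a set of objects (rows) with a reference ranking: both columns are converted to ranks, and the absolute rank differences are summed. Scaled SRD divides by the largest attainable SRD and multiplies by 100. The sketch below is a generic implementation under stated assumptions: which rows the authors ranked (for example, dataset and metric combinations) and which reference they used (a row-wise average or best value is a common choice) are not given here, so `reference` is supplied by the caller, and the randomization test behind the Gaussian fit is not reproduced. For lower-is-better metrics, values would need to be negated first so that all columns point the same way.

```python
def to_ranks(xs):
    """Rank values ascending (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in order[i:j + 1]:
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def srd(column, reference):
    """Sum of absolute differences between the two rankings of the same rows."""
    return sum(abs(a - b) for a, b in zip(to_ranks(column), to_ranks(reference)))


def srd_max(n):
    """Largest SRD for n rows (a ranking against its exact reversal)."""
    return n * n // 2 if n % 2 == 0 else (n * n - 1) // 2


def scaled_srd(column, reference):
    return 100.0 * srd(column, reference) / srd_max(len(column))


# Toy values only: one method's scores on four rows vs. a hypothetical reference.
print(scaled_srd([0.71, 0.64, 0.52, 0.80], [0.70, 0.50, 0.66, 0.75]))
```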
Summary of the Friedman statistics and the critical value for each evaluation criterion (k = 9 compared algorithms; N = 7 datasets).
| Evaluation Criteria | Friedman Statistic F_F | Critical Value (α = 0.05) |
|---|---|---|
| Hamming Loss | 4.2558 | 2.1382 |
| Ranking Loss | 8.9239 | 2.1382 |
| One-Error | 7.8679 | 2.1382 |
| Coverage | 9.6106 | 2.1382 |
| Average Precision | 13.6875 | 2.1382 |
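These statistics can be recomputed from the per-dataset ranks. With k algorithms, N datasets, and average ranks R_j, the Friedman statistic is chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4), and the Iman-Davenport form F_F = (N - 1) chi2_F / (N(k - 1) - chi2_F) follows an F distribution with k - 1 = 8 and (k - 1)(N - 1) = 48 degrees of freedom, whose 5% critical value is the 2.1382 listed. The sketch below uses the Hamming-loss ranks read from the performance tables (where a rank is missing from a column, it is the one not taken by the other eight algorithms) and reproduces F_F ≈ 4.2558; the critical value itself is not computed here.

```python
ALGOS = ["HOMER", "LP", "RAkEL", "Rank-SVM", "BP_MLL", "CC", "CCE", "LLSF-DL", "PCC-FS"]

# Hamming-loss ranks per dataset; each row lists ranks in ALGOS order.
HL_RANKS = {
    "emotions": [5, 6, 3, 9, 8, 4, 2, 7, 1],
    "CAL500":   [8, 7, 6, 2, 5, 4, 3, 9, 1],
    "yeast":    [8, 9, 6, 7, 5, 1, 3, 4, 2],
    "flags":    [3, 6, 1, 9, 8, 5, 4, 7, 2],
    "scene":    [7, 6, 4, 8, 9, 5, 2, 3, 1],
    "birds":    [4, 5, 1, 9, 3, 8, 7, 2, 6],
    "enron":    [7, 8, 5, 9, 6, 4, 3, 1, 2],
}


def friedman(rank_rows):
    """Return average ranks, Friedman chi-square, and Iman-Davenport F_F."""
    n, k = len(rank_rows), len(rank_rows[0])
    avg = [sum(col) / n for col in zip(*rank_rows)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4)
    ff = (n - 1) * chi2 / (n * (k - 1) - chi2)
    return avg, chi2, ff


avg, chi2, ff = friedman(list(HL_RANKS.values()))
print(dict(zip(ALGOS, (round(r, 3) for r in avg))))
print(round(chi2, 4), round(ff, 4))
```

Because F_F exceeds the critical value for every criterion in the table, the null hypothesis that all algorithms perform equally is rejected, which is what licenses the post-hoc Nemenyi comparison.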
Figure 4. Comparison of PCC-FS (control algorithm) against the other compared algorithms using the Nemenyi test.
Comparisons between PCC-FS and other algorithms (√: PCC-FS achieves a significantly better average rank than the compared algorithm under the Nemenyi test).
| Algorithm | Hamming Loss | Ranking Loss | One Error | Coverage | Average Precision |
|---|---|---|---|---|---|
| HOMER | | √ | √ | √ | √ |
| LP | √ | √ | √ | √ | √ |
| RAkEL | | | | | |
| Rank-SVM | √ | √ | √ | √ | √ |
| BP_MLL | | | | | √ |
| CC | | | | | |
| CCE | | | | | |
| LLSF-DL | | | | | √ |
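Under the Nemenyi test, two algorithms differ significantly when their average ranks differ by at least the critical difference CD = q_alpha * sqrt(k(k+1) / (6N)). For k = 9, the tabulated q_alpha is 3.102 at alpha = 0.05 and 2.855 at alpha = 0.10, giving CD ≈ 4.54 and CD ≈ 4.18 with N = 7. The significance level is not stated here, so alpha = 0.10 below is an assumption. Applied to the average-precision rank sums read from the performance tables (HOMER 50, LP 55, RAkEL 32, Rank-SVM 48, BP_MLL 39, CC 18, CCE 21, LLSF-DL 43, PCC-FS 9), it flags BP_MLL and LLSF-DL alongside HOMER, LP, and Rank-SVM, consistent with the single check mark in each of those two rows; at alpha = 0.05, BP_MLL would not be flagged.

```python
import math

# Tabulated Nemenyi critical values q_alpha for k = 9 algorithms.
Q_ALPHA_K9 = {0.05: 3.102, 0.10: 2.855}

# Average-precision rank sums over the N = 7 datasets.
AP_RANK_SUMS = {"HOMER": 50, "LP": 55, "RAkEL": 32, "Rank-SVM": 48, "BP_MLL": 39,
                "CC": 18, "CCE": 21, "LLSF-DL": 43, "PCC-FS": 9}


def nemenyi_cd(q_alpha, k, n):
    """Critical difference between average ranks for k algorithms on n datasets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


def worse_than_control(avg_ranks, control, cd):
    """Algorithms whose average rank trails the control's by at least cd."""
    base = avg_ranks[control]
    return sorted(a for a, r in avg_ranks.items() if a != control and r - base >= cd)


avg_ap = {a: s / 7 for a, s in AP_RANK_SUMS.items()}
cd = nemenyi_cd(Q_ALPHA_K9[0.10], k=9, n=7)  # assumed alpha = 0.10
print(round(cd, 3), worse_than_control(avg_ap, "PCC-FS", cd))
```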
Figure 5. Confidence interval for ranking difference.