Mirko Polato, Ivano Lauriola, Fabio Aiolli.
Abstract
Kernel-based classifiers, such as SVMs, are considered state-of-the-art algorithms and are widely used in many classification tasks. However, these methods are hard to interpret, and for this reason they are often regarded as black-box models. In this paper, we propose a new family of Boolean kernels for categorical data in which features correspond to propositional formulas applied to the input variables. The idea is to create human-readable features that ease the extraction of interpretation rules directly from the embedding space. Experiments on artificial and benchmark datasets show the effectiveness of the proposed family of kernels, in terms of classification accuracy, with respect to established ones such as RBF.
Keywords: SVM; boolean kernels; categorical data; kernel methods
Year: 2018 PMID: 33265534 PMCID: PMC7512961 DOI: 10.3390/e20060444
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
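To illustrate how propositional features can be counted without building the embedding explicitly, here is a minimal sketch of three monotone Boolean kernels, presumably those labelled mC, mD and mDNF in the tables below. It assumes binary input vectors and that each feature is a conjunction (mC) or a disjunction (mD) of distinct input variables, or a disjunction of d distinct conjunctions of c variables (mDNF). Under those assumptions, mC reduces to the binomial coefficient C(⟨x,z⟩, c), mD follows by inclusion-exclusion as C(n,d) − C(n−‖x‖₁,d) − C(n−‖z‖₁,d) + C(n−‖x‖₁−‖z‖₁+⟨x,z⟩,d), and mDNF applies the mD count inside the mC feature space. All function names are hypothetical, and the closed forms are checked against explicit brute-force feature maps rather than taken from the paper.

```python
from itertools import combinations
from math import comb


def _disjunction_count(n, nx, nz, shared, d):
    # Inclusion-exclusion over the C(n, d) clauses: drop those with no active
    # variable in x, drop those with none in z, add back those with none in either.
    return (comb(n, d) - comb(n - nx, d) - comb(n - nz, d)
            + comb(n - nx - nz + shared, d))


def mc_kernel(x, z, c):
    """Monotone conjunctive kernel (assumed definition): conjunctions of c
    distinct variables true in both x and z, i.e. c-subsets of shared actives."""
    return comb(sum(a & b for a, b in zip(x, z)), c)


def md_kernel(x, z, d):
    """Monotone disjunctive kernel (assumed definition): disjunctions of d
    distinct variables true in both x and z."""
    shared = sum(a & b for a, b in zip(x, z))
    return _disjunction_count(len(x), sum(x), sum(z), shared, d)


def mdnf_kernel(x, z, d, c):
    """Monotone DNF kernel (assumed definition): the mD count applied in the
    mC feature space, i.e. disjunctions of d distinct c-conjunctions."""
    n_conj = comb(len(x), c)  # dimension of the mC feature space
    return _disjunction_count(n_conj, comb(sum(x), c), comb(sum(z), c),
                              mc_kernel(x, z, c), d)


# Explicit feature maps, used only to check the closed forms above.
def _sat_conj(v, conj):
    return all(v[i] for i in conj)


def brute_mc(x, z, c):
    return sum(_sat_conj(x, s) and _sat_conj(z, s)
               for s in combinations(range(len(x)), c))


def brute_md(x, z, d):
    return sum(any(x[i] for i in s) and any(z[i] for i in s)
               for s in combinations(range(len(x)), d))


def brute_mdnf(x, z, d, c):
    conjs = list(combinations(range(len(x)), c))

    def sat(v, clause):
        return any(_sat_conj(v, conj) for conj in clause)

    return sum(sat(x, cl) and sat(z, cl) for cl in combinations(conjs, d))


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 1]
    z = [1, 1, 1, 0, 0, 1]
    for k in (1, 2, 3):
        assert mc_kernel(x, z, k) == brute_mc(x, z, k)
        assert md_kernel(x, z, k) == brute_md(x, z, k)
    assert mdnf_kernel(x, z, 2, 2) == brute_mdnf(x, z, 2, 2)
    print(mc_kernel(x, z, 2), md_kernel(x, z, 2), mdnf_kernel(x, z, 2, 2))  # prints: 3 13 48
```

The closed forms cost a handful of binomial coefficients per pair, whereas the brute-force maps enumerate every clause, a number that grows combinatorially with n, d and c; that gap is what makes such kernels usable with a kernel machine while keeping each feature a readable formula.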
Summary of the Boolean kernels just presented; … stands for the dimension of the feature space.
Dataset information: name, number of instances (# Examples), class distribution (pos/neg, %), number of features (# Features), and number of active variables per example (# Active).
| Dataset Name | # Examples | pos/neg (%) | # Features | # Active |
|---|---|---|---|---|
|  | 101 | 40/60 | 36 | 16 |
|  | 106 | 50/50 | 228 | 57 |
|  | 148 | 45/55 | 44 | 15 |
|  | 232 | 46/54 | 32 | 16 |
|  | 266 | 54/46 | 97 | 35 |
|  | 267 | 79/21 | 44 | 22 |
|  | 277 | 71/29 | 41 | 9 |
|  | 339 | 41/59 | 34 | 15 |
|  | 432 | 50/50 | 17 | 6 |
|  | 653 | 55/45 | 40 | 9 |
|  | 958 | 65/35 | 27 | 9 |
|  | 1066 | 49/51 | 41 | 11 |
|  | 1728 | 30/70 | 21 | 6 |
|  | 2000 | 47/53 | 180 | 47 |
|  | 3175 | 48/52 | 240 | 60 |
|  | 3196 | 52/48 | 73 | 36 |
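The feature and active-variable counts in the dataset table are consistent with a one-hot (indicator) encoding of the categorical attributes, where every example activates exactly one binary variable per attribute. The sketch below assumes that encoding, which the table does not state explicitly; `one_hot_encode` and the toy rows are hypothetical.

```python
def one_hot_encode(rows):
    """Map each categorical row to a binary vector with one indicator variable
    per (attribute, value) pair observed in the data (assumed encoding)."""
    n_attrs = len(rows[0])
    # Distinct values of each attribute, sorted to fix the variable order.
    domains = [sorted({row[j] for row in rows}) for j in range(n_attrs)]
    index = [{value: i for i, value in enumerate(dom)} for dom in domains]
    # Starting position of each attribute's block of indicators.
    offsets, total = [], 0
    for dom in domains:
        offsets.append(total)
        total += len(dom)
    encoded = []
    for row in rows:
        vec = [0] * total
        for j, value in enumerate(row):
            vec[offsets[j] + index[j][value]] = 1
        encoded.append(vec)
    return encoded


if __name__ == "__main__":
    # Hypothetical toy data: 3 attributes with 3, 2 and 2 distinct values.
    rows = [("red", "S", "round"), ("green", "L", "square"), ("blue", "S", "square")]
    encoded = one_hot_encode(rows)
    print(len(encoded[0]))             # "# Features": 7 binary variables
    print([sum(v) for v in encoded])   # "# Active": [3, 3, 3], one per attribute
```

Under this encoding, "# Active" equals the number of original attributes and "# Features" the sum of their domain sizes, so every input vector has the same number of ones.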
AUC performance on the benchmark datasets. For each dataset, the best-performing kernel is highlighted in boldface and marked with a dot (·).
| Dataset | mC | mD | C | D | mDNF | mCNF | DNF | CNF |
|---|---|---|---|---|---|---|---|---|
| 4.38 | 5.13 | 3.06 | 5.44 | 4.00 | 3.88 | 3.13 |
AUC performance on the benchmark datasets. For each dataset, the best-performing kernel is highlighted in boldface and marked with a dot (·).
| Dataset | Linear | RBF | DNF [ | d-mDNF | Tanimoto | avg.mDNF | avg.mCNF | avg.DNF | avg.CNF |
|---|---|---|---|---|---|---|---|---|---|
|  | 7.25 | 5.25 | 7.81 | 4.13 | 5.00 | 5.13 | 5.31 | 4.56 | 3.50 |
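The AUC tables above do not say how AUC was computed. For reference, here is a minimal sketch of the standard pairwise (Mann-Whitney) definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `auc` is hypothetical; the scores would typically be classifier decision values, and labels are assumed to be 1 for positives and anything else (e.g. 0 or -1) for negatives.

```python
def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly,
    counting tied scores as half a correct pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 3 of the 4 (positive, negative) pairs are ordered correctly.
    print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # prints: 0.75
```

The double loop is quadratic in the number of examples, which is fine for datasets of the sizes listed above; a sort-based rank computation gives the same value in O(m log m).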