Georgios Feretzakis, Dimitris Kalles, Vassilios S Verykios.
Abstract
The sharing of data among organizations has become increasingly common in areas such as banking, electronic commerce, advertising, marketing, health, and insurance. However, an organization will most likely want to keep some patterns hidden when it shares its datasets with others. This article focuses on preserving the privacy of sensitive patterns when inducing decision trees. We propose a heuristic approach for hiding a specific rule that could otherwise be inferred from a binary decision tree induced from the data. This hiding method is preferable to other heuristic solutions such as output perturbation or cryptographic techniques, which limit the usability of the data, because the raw data itself remains readily available for public use. The method can hide decision tree rules with minimal impact on all other derived rules.
Keywords: data sharing; decision trees; entropy; hiding rules; information gain; privacy preserving
Year: 2019 PMID: 33267048 PMCID: PMC7514818 DOI: 10.3390/e21040334
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. A binary decision tree before (left) and after (right) hiding and the associated rule sets.
Figure 2. (a) Original decision tree; (b) modified decision tree with the absence of
Figure 3. Original decision tree and Attribute 2 (A2) to be hidden.
(a) Original and modified attributes’ values; (b) original and modified gain ratios.
| (a) Attribute | Original #1 | Original #2 | Modified #1 | Modified #2 | (b) Attribute | Original Gain Ratio | Modified Gain Ratio |
|---|---|---|---|---|---|---|---|
| A1 | t | t | t | t | A1 | 0 | 0 |
| A2 | f | t | f | | A2 | 1 | 0 |
| A3 | t | t | t | t | A3 | 0 | 0 |
| A4 | f | f | f | f | A4 | 0 | 0 |
| A5 | t | t | t | t | A5 | 0 | 0 |
| A6 | f | t | f | | A6 | 1 | 0 |
| A7 | f | t | f | | A7 | 1 | 0 |
| A8 | t | t | t | t | A8 | 0 | 0 |
| A9 | t | f | | f | A9 | 1 | 0 |
| A10 | f | t | f | | A10 | 1 | 0 |
| A11 | f | t | f | | A11 | 1 | 0 |
| A12 | f | t | f | | A12 | 1 | 0 |
| A13 | f | f | f | f | A13 | 0 | 0 |
| A14 | t | f | | f | A14 | 1 | 0 |
| A15 | f | f | f | f | A15 | 0 | 0 |
| A16 | t | f | | f | A16 | 1 | 0 |
| A17 | f | f | f | f | A17 | 0 | 0 |
| A18 | f | f | f | f | A18 | 0 | 0 |
| A19 | f | f | f | f | A19 | 0 | 0 |
| A20 | f | t | f | | A20 | 1 | 0 |
| A21 | f | f | f | f | A21 | 0 | 0 |
| A22 | f | f | f | f | A22 | 0 | 0 |
| Class | n | p | n | p | | | |
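The gain ratios of 1 and 0 in part (b) can be reproduced for the two-instance case with a short sketch. It assumes the textbook C4.5 definition, gain ratio = (H(class) − H(class | attribute)) / H(attribute), which may differ in detail from how the figures in the tables were produced. The function names `entropy` and `gain_ratio` are illustrative only, and the choice to return 0 for a constant attribute is a convention adopted here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, classes):
    """Textbook C4.5 gain ratio of one attribute with respect to the class."""
    total = len(values)
    # Group the class labels by attribute value.
    groups = {}
    for v, c in zip(values, classes):
        groups.setdefault(v, []).append(c)
    # Expected class entropy after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(classes) - conditional
    split_info = entropy(values)
    # Assumption: a constant attribute (zero split information) scores 0.
    return info_gain / split_info if split_info > 0 else 0.0


classes = ["n", "p"]                     # Class row of the table
print(gain_ratio(["f", "t"], classes))   # original A2 values -> 1.0
print(gain_ratio(["f", "f"], classes))   # a constant attribute -> 0.0
```

With only two instances of different classes, an attribute either separates them perfectly (gain ratio 1) or takes the same value on both (gain ratio 0), which is consistent with the values listed in part (b).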
Figure 4. Final decision tree without Attribute 2 (A2).
Weka output for the original and modified data sets.
| Metric | Original | Modified |
|---|---|---|
| Correctly Classified Instances | 182 | 182 |
| Incorrectly Classified Instances | 5 | 5 |
| Kappa statistic | 0.834 | 0.834 |
| Mean absolute error | 0.0352 | 0.0352 |
| Root-mean-squared error | 0.1328 | 0.1328 |
| Relative absolute error | 29.2968% | 29.2968% |
| Root relative squared error | 48.8664% | 48.8664% |
Figure 5. Original decision tree and the attribute (A22) to be hidden.
(a) Modified attributes’ values; (b) original and modified gain ratios.
| (a) Attribute | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | (b) Attr | Original Gain Ratio | Modified Gain Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A1 | f | f | t | f | f | f | t | t | A1 | 0.003383098 | 0.020571 |
| A2 | t | t | t | t | t | t | t | t | A2 | 0.813346207 | 1.110375 |
| A3 | f | f | f | f | f | f | f | f | A3 | 0 | 0 |
| A4 | f | f | f | f | f | f | f | f | A4 | 0 | 0 |
| A5 | f | f | f | f | f | f | f | f | A5 | 0 | 0 |
| A6 | f | f | f | f | f | f | f | f | A6 | 0 | 0 |
| A7 | f | f | f | f | f | f | f | f | A7 | 0 | 0 |
| A8 | f | f | f | f | f | f | f | f | A8 | 0 | 0 |
| A9 | f | f | f | f | f | f | f | f | A9 | 0 | 0 |
| A10 | f | f | f | f | f | f | f | f | A10 | 0 | 0 |
| A11 | f | f | f | f | f | f | f | f | A11 | 0 | 0 |
| A12 | f | f | f | f | f | f | f | f | A12 | 0 | 0 |
| A13 | f | f | f | f | f | f | f | f | A13 | 0 | 0 |
| A14 | f | f | f | f | t | f | f | f | A14 | 0.169914322 | 0 |
| A15 | f | f | f | f | f | f | f | f | A15 | 0 | 0 |
| A16 | f | f | f | f | | | | f | A16 | 0.506553684 | 0.784201 |
| A17 | f | f | f | f | f | t | f | f | A17 | 0.169914322 | 0 |
| A18 | f | f | f | f | f | f | f | f | A18 | 0 | 0 |
| A19 | f | f | f | f | f | f | f | f | A19 | 0 | 0 |
| A20 | f | f | f | f | f | f | f | f | A20 | 0 | 0 |
| A21 | f | f | f | f | f | f | f | f | A21 | 0 | 0 |
| A22 | | | | t | t | t | t | t | A22 | 1 | 0 |
| A23 | f | f | f | f | f | f | f | f | A23 | 0 | 0 |
| A24 | t | t | t | t | t | t | t | t | A24 | 0 | 0 |
| A25 | f | f | f | f | f | f | f | f | A25 | 0 | 0 |
| A26 | f | f | f | f | f | f | f | f | A26 | 0 | 0 |
| A27 | f | f | f | f | f | f | f | f | A27 | 0 | 0 |
| A28 | f | | | f | f | t | f | t | A28 | 0.251990035 | 0.020571 |
| A29 | t | t | f | t | t | t | f | f | A29 | 0.003383098 | 0.020571 |
| A30 | t | t | t | t | t | t | t | t | A30 | 0 | 0 |
| A31 | f | f | f | f | f | f | f | f | A31 | 0 | 0 |
| A32 | t | f | f | | t | f | f | f | A32 | 0.019367128 | 0.020571 |
| A33 | f | f | f | f | f | f | f | f | A33 | 0 | 0 |
| Class | p | p | p | n | n | n | n | n | | | |
Figure 6. Final decision tree without Attribute 22 (A22).
Weka output for the original and modified data sets.
| Metric | Original | Modified |
|---|---|---|
| Correctly Classified Instances | 3159 (98.8423%) | 3157 (98.7797%) |
| Incorrectly Classified Instances | 37 (1.1577%) | 39 (1.2203%) |
| Kappa statistic | 0.9768 | 0.9755 |
| Mean absolute error | 0.0167 | 0.0175 |
| Root-mean-squared error | 0.0914 | 0.0934 |
| Relative absolute error | 3.3492% | 3.4997% |
| Root relative squared error | 18.3009% | 18.7075% |
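The percentages in the first two rows follow directly from the instance counts, as the small check below illustrates. The helper name `summarize` is hypothetical and used only for this sketch; the counts are those reported in the table.

```python
def summarize(correct, incorrect):
    """Return (total instances, % correctly classified, % incorrectly classified)."""
    total = correct + incorrect
    return (
        total,
        round(100 * correct / total, 4),
        round(100 * incorrect / total, 4),
    )


# Counts taken from the "Original" and "Modified" columns of the table.
print(summarize(3159, 37))  # (3196, 98.8423, 1.1577)
print(summarize(3157, 39))  # (3196, 98.7797, 1.2203)
```

Both data sets contain 3196 instances, so hiding the attribute changes the classification of two instances in this experiment.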
Figure 7. Original decision tree and Attribute 28 (A28) to be hidden.
(a) Original and modified attributes’ values; (b) original and modified gain ratios.
| (a) Attribute | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | (b) Attr | Original Gain Ratio | Modified Gain Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A1 | f | f | f | f | f | f | f | f | A1 | 0 | 0 |
| A2 | f | f | f | f | f | f | f | f | A2 | 0 | 0 |
| A3 | f | f | f | f | f | f | f | f | A3 | 0 | 0 |
| A4 | f | f | f | f | f | f | f | f | A4 | 0 | 0 |
| A5 | f | f | f | f | f | f | f | f | A5 | 0 | 0 |
| A6 | f | f | f | f | f | f | f | f | A6 | 0 | 0 |
| A7 | f | f | f | f | f | f | f | f | A7 | 0 | 0 |
| A8 | f | f | f | f | f | f | f | f | A8 | 0 | 0 |
| A9 | f | f | t | f | t | f | f | f | A9 | 0.15107 | 0 |
| A10 | f | f | f | f | f | f | f | f | A10 | 0 | 0 |
| A11 | f | f | f | f | f | f | f | f | A11 | 0.26173 | 0.45915 |
| A12 | t | t | t | t | t | t | t | t | A12 | 0.98794 | 2.21025 |
| A13 | f | f | f | f | f | f | f | f | A13 | 0 | 0 |
| A14 | f | f | f | f | f | f | f | f | A14 | 0 | 0 |
| A15 | f | f | f | f | f | f | f | f | A15 | 0 | 0 |
| A16 | t | | | | | | t | t | A16 | 0.3529 | 0.96254 |
| A17 | f | f | f | f | f | f | f | f | A17 | 0 | 0 |
| A18 | f | f | f | f | f | f | f | f | A18 | 0 | 0 |
| A19 | f | f | f | f | f | f | f | f | A19 | 0 | 0 |
| A20 | f | f | t | f | t | f | f | f | A20 | 0.15107 | 0 |
| A21 | f | f | f | f | f | f | f | f | A21 | 0 | 0 |
| A22 | f | f | f | f | f | f | t | t | A22 | 0.09092 | 0.27402 |
| A23 | f | f | f | f | f | f | f | f | A23 | 0 | 0 |
| A24 | t | t | t | t | t | t | t | t | A24 | 0.63072 | 1.33031 |
| A25 | f | f | f | f | f | f | f | f | A25 | 0 | 0 |
| A26 | f | f | f | f | f | f | f | f | A26 | 0 | 0 |
| A27 | f | f | f | f | f | f | f | f | A27 | 0 | 0 |
| A28 | f | f | f | f | f | f | f | | A28 | 0.54007 | 0 |
| A29 | f | f | f | f | f | f | f | f | A29 | 0 | 0 |
| A30 | f | f | f | f | f | f | f | f | A30 | 0 | 0 |
| A31 | f | f | f | f | f | f | f | f | A31 | 0 | 0 |
| A32 | t | t | t | t | t | t | t | t | A32 | 0 | 0 |
| A33 | t | t | t | t | t | t | t | t | A33 | 0 | 0 |
| Class | n | p | p | p | p | p | p | n | | | |
Figure 8. Final decision tree without Attribute 28 (A28).
Weka output for the original and modified data sets.
| Metric | Original | Modified |
|---|---|---|
| Correctly Classified Instances | 3159 (98.8423%) | 3159 (98.8423%) |
| Incorrectly Classified Instances | 37 (1.1577%) | 37 (1.1577%) |
| Kappa statistic | 0.9768 | 0.9768 |
| Mean absolute error | 0.0167 | 0.0166 |
| Root-mean-squared error | 0.0914 | 0.0911 |
| Relative absolute error | 3.3492% | 3.3253% |
| Root relative squared error | 18.3009% | 18.2355% |