Bostjan Brumen, Marjan Heričko, Andrej Sevčnikar, Jernej Završnik, Marko Hölbl.
Abstract
BACKGROUND: Medical data are gold mines for deriving knowledge that could change the course of a single patient's life or even the health of an entire population. A data analyst needs full access to the relevant data, but legal regulations on the privacy and confidentiality of medical data may deny such access, especially when the analyst is not affiliated with the data owner.
Keywords: computer-assisted; confidentiality; data analysis; data protection; medical decision making; patient data privacy
Year: 2013 PMID: 24342053 PMCID: PMC3877744 DOI: 10.2196/jmir.2471
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Weka Explorer user interface showing settings used.
Results of analyses on original and encrypted data files for tree size, number of leaves, and accuracy.
| Database name | Original: tree size^a, n | Original: leaves^b, n | Original: accuracy^c, % | Encrypted: tree size^a, n | Encrypted: leaves^b, n | Encrypted: accuracy^d, % |
|---|---|---|---|---|---|---|
| Abalone | 2312 | 1183 | 21.97 | 2312 | 1183 | 21.97 |
| Acute inflammations | 5 | 3 | 100.00 | 5 | 3 | 100.00 |
| Arrhythmia | 99 | 50 | 71.43 | 99 | 50 | 71.43 |
| Audiology (standardized) | 54 | 32 | 83.12 | 54 | 32 | 83.12 |
| Breast cancer | 6 | 4 | 68.04 | 6 | 4 | 68.04 |
| Breast cancer Wisconsin (original) | 27 | 14 | 95.38 | 27 | 14 | 95.38 |
| Breast tissue | 29 | 15 | 47.22 | 29 | 15 | 47.22 |
| Cardiotocography | 19 | 14 | 98.34 | 33 | 25 | 98.34 |
| Contraceptive method choice | 263 | 157 | 55.29 | 263 | 157 | 55.29 |
| Covertype | 29,793 | 14,897 | 93.59 | 29,793 | 14,897 | 93.59 |
| Dermatology | 41 | 31 | 92.74 | 41 | 31 | 92.74 |
| Echocardiogram | 9 | 5 | 70.37 | 9 | 5 | 70.37 |
| Ecoli | 43 | 22 | 78.95 | 43 | 22 | 78.95 |
| Haberman’s survival | 5 | 3 | 75.96 | 5 | 3 | 75.96 |
| Hepatitis | 21 | 11 | 79.25 | 21 | 11 | 79.25 |
| Horse colic | 29 | 18 | 68.55 | 29 | 18 | 68.55 |
| Iris | 9 | 5 | 96.08 | 9 | 5 | 96.08 |
| Lung cancer | 16 | 10 | 63.64 | 16 | 10 | 63.64 |
| Lymphography | 34 | 21 | 78.00 | 34 | 21 | 78.00 |
| Mammographic mass | 15 | 12 | 82.26 | 15 | 12 | 82.26 |
| Mushroom | 30 | 25 | 100.00 | 30 | 25 | 100.00 |
| Pima Indians diabetes | 39 | 20 | 76.25 | 39 | 20 | 76.25 |
| Post-operative patient | 1 | 1 | 70.97 | 1 | 1 | 70.97 |
| Primary tumor | 88 | 47 | 39.13 | 88 | 47 | 39.13 |
| Seeds | 15 | 8 | 97.18 | 15 | 8 | 97.18 |
| Soybean (large) | 93 | 61 | 90.52 | 93 | 61 | 90.52 |
| Spectf heart | 17 | 9 | 66.67 | 17 | 9 | 66.67 |
| Statlog (heart) | 45 | 27 | 76.09 | 45 | 27 | 76.09 |
| Yeast | 369 | 185 | 58.81 | 369 | 185 | 58.81 |
| Zoo | 17 | 9 | 94.12 | 17 | 9 | 94.12 |
^a Number of nodes (measurements) in a tree.
^b Number of decision rules in a tree.
^c Percentage of correctly classified original items with respect to all items (ie, the number of times the tree's rules lead to the right decision).
^d Percentage of correctly classified encrypted items with respect to all items (ie, the number of times the tree's rules lead to the right decision).
Paired samples statistics.
| Pairs | Mean | SD | SEM |
|---|---|---|---|
| Original size | 1118.1 | 5432.1 | 991.8 |
| Encrypted size | 1118.6 | 5432.0 | 991.8 |
| Original leaves | 563.5 | 2715.7 | 495.8 |
| Encrypted leaves | 563.7 | 2715.7 | 495.8 |
| Original accuracy | 0.763^a | 0.189 | 0.034 |
| Encrypted accuracy | 0.763^a | 0.189 | 0.034 |

^a The correlation and t test cannot be computed because the standard error of the difference is zero.
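The relationship between the SD and SEM columns, and the reason the paired t test is undefined for the accuracy pair, can be illustrated with a minimal sketch. It assumes n = 30 (the number of datasets listed in the results table) and the textbook formulas SEM = SD / sqrt(n) and t = mean(d) / (SD(d) / sqrt(n)); the function names `sem` and `paired_t` are illustrative, not taken from the paper, and the three accuracy values are only the first rows of the results table.

```python
import math
import statistics

N_DATASETS = 30  # assumption: one observation per dataset in the results table


def sem(sd, n):
    """Standard error of the mean from a standard deviation and sample size."""
    return sd / math.sqrt(n)


def paired_t(xs, ys):
    """Paired t statistic, or None when every paired difference is identical."""
    diffs = [x - y for x, y in zip(xs, ys)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        # Standard error of the difference is zero, so t is undefined.
        return None
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# SEM values for tree size and leaves, recomputed from the reported SDs.
print(round(sem(5432.1, N_DATASETS), 1))
print(round(sem(2715.7, N_DATASETS), 1))

# Accuracy is identical on original and encrypted data (first three rows shown),
# so all differences are zero and no t statistic can be formed.
accuracy_original = [21.97, 100.00, 71.43]
accuracy_encrypted = [21.97, 100.00, 71.43]
print(paired_t(accuracy_original, accuracy_encrypted))
```

Returning `None` rather than raising an error is a design choice for this sketch; a statistics package would typically report the test as not computable instead.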
Bootstrapped paired samples test results.
| Pairs | Mean | Bias | SEM | 95% CI | P value |
|---|---|---|---|---|---|
| Pair 1: Original size–encrypted size | –0.5 | –0.3 | 0.4 | –2.1, –0.5 | .19 |
| Pair 2: Original leaves–encrypted leaves | –0.2 | –0.1 | 0.2 | –0.9, –0.2 | .19 |
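As a rough illustration of how a bootstrapped interval for a paired mean difference can be obtained, the sketch below uses a plain percentile bootstrap on the tree-size differences. The percentile method, the resample count, the seed, and the helper name `bootstrap_mean_ci` are all assumptions; the exact procedure used for the reported results is not stated here, so the interval it prints should not be expected to reproduce the table. The input follows the results table, where Cardiotocography (19 vs 33 nodes) is the only one of the 30 datasets whose tree size differs.

```python
import random


def bootstrap_mean_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of paired differences."""
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the differences with replacement and record each resample's mean.
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Original minus encrypted tree size: 29 datasets with no difference,
# plus Cardiotocography (19 - 33).
size_diffs = [0] * 29 + [19 - 33]

mean_diff = sum(size_diffs) / len(size_diffs)
lo, hi = bootstrap_mean_ci(size_diffs)

print(round(mean_diff, 1))  # mean paired difference, rounded as in the table
print(lo, hi)
```

The mean difference is -14/30, which rounds to the -0.5 reported for Pair 1.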
Figure 2. Decision tree model to assist in diagnosing diabetes mellitus built with plain text data from the Pima Indians Diabetes Dataset.
Figure 3. Decision tree model to assist in diagnosing diabetes mellitus built with encrypted data.
List of queries for transforming encrypted data to original.
| Query | Result |
|---|---|
| SELECT original_atribute FROM lookup_table WHERE renamed_attribute='U2FsdGVkX1/GxsbGxsbGxqJGWHYKVll/Ghr/VuGPcjE=' | 2_hr_postload_plasma_glucose |
| SELECT original_value FROM lookup_table WHERE encrypted_value=255 AND attribute_name='2_hr_postload_plasma_glucose' | 127 |
| SELECT original_atribute FROM lookup_table WHERE renamed_attribute='U2FsdGVkX1/GxsbGxsbGxnCuqWM8tY1K+ndRFEKNw6w=' | Body mass index |
| SELECT original_value FROM lookup_table WHERE encrypted_value=53.8 AND attribute_name='Body mass index' | 26.4 |
| SELECT original_atribute FROM lookup_table WHERE renamed_attribute='U2FsdGVkX1/GxsbGxsbGxht0/44acjje2SK5W1ldQ24=' | Age |
| SELECT original_value FROM lookup_table WHERE encrypted_value=57 AND attribute_name='Age' | 28 |
| SELECT original_value FROM lookup_table WHERE encrypted_value='U2FsdGVkX1/GxsbGxsbGxkb6Q6IlZ7BQpR5rsBg4Oi0=' AND attribute_name='Class' | Tested_negative |
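The lookup queries above can be tried end to end with a small in-memory SQLite database. This is only a sketch: the column names come from the queries (including the spelling `original_atribute`), but the single-table layout with untyped columns, the helper names `original_attribute` and `original_value`, and the use of SQLite itself are assumptions, since the actual schema and database engine are not given here. Only the rows shown in the table are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed layout: one table holding both attribute-name and value mappings.
# Columns are left untyped so numeric and text values are stored as given.
conn.execute(
    """CREATE TABLE lookup_table (
           renamed_attribute, original_atribute,
           attribute_name, encrypted_value, original_value)"""
)

rows = [
    # Attribute-name mappings (renamed attribute -> original attribute).
    ("U2FsdGVkX1/GxsbGxsbGxqJGWHYKVll/Ghr/VuGPcjE=", "2_hr_postload_plasma_glucose", None, None, None),
    ("U2FsdGVkX1/GxsbGxsbGxnCuqWM8tY1K+ndRFEKNw6w=", "Body mass index", None, None, None),
    ("U2FsdGVkX1/GxsbGxsbGxht0/44acjje2SK5W1ldQ24=", "Age", None, None, None),
    # Value mappings (attribute, encrypted value -> original value).
    (None, None, "2_hr_postload_plasma_glucose", 255, 127),
    (None, None, "Body mass index", 53.8, 26.4),
    (None, None, "Age", 57, 28),
    (None, None, "Class", "U2FsdGVkX1/GxsbGxsbGxkb6Q6IlZ7BQpR5rsBg4Oi0=", "Tested_negative"),
]
conn.executemany("INSERT INTO lookup_table VALUES (?, ?, ?, ?, ?)", rows)


def original_attribute(renamed):
    """Map a renamed (encrypted) attribute name back to its original name."""
    row = conn.execute(
        "SELECT original_atribute FROM lookup_table WHERE renamed_attribute = ?",
        (renamed,),
    ).fetchone()
    return row[0] if row else None


def original_value(attribute, encrypted):
    """Map an encrypted value of a given attribute back to its original value."""
    row = conn.execute(
        "SELECT original_value FROM lookup_table "
        "WHERE encrypted_value = ? AND attribute_name = ?",
        (encrypted, attribute),
    ).fetchone()
    return row[0] if row else None


name = original_attribute("U2FsdGVkX1/GxsbGxsbGxqJGWHYKVll/Ghr/VuGPcjE=")
print(name, original_value(name, 255))
```

Parameterized queries (`?` placeholders) are used instead of string concatenation so that the Base64 strings, which contain `/`, `+`, and `=`, need no manual quoting.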