Yongan Zhao1, Xiaofeng Wang1, Xiaoqian Jiang2, Lucila Ohno-Machado2, Haixu Tang1.
Abstract
OBJECTIVE: To propose a new approach to privacy-preserving data selection that helps data users access human genomic datasets efficiently without undermining patients' privacy.
Keywords: Differential Privacy; Genome-wide association studies; Haplotype blocks; Privacy-preserving techniques; Single nucleotide polymorphisms (SNPs); Test statistics
Year: 2014 PMID: 25352565 PMCID: PMC4433380 DOI: 10.1136/amiajnl-2014-003043
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1: The privacy risks of the pilot data built from the first dataset are low in the noise-added data generated with the equal (A) and unequal (B) haplotype-based approaches. Each dot represents the test value (Ti) of a specific individual in the case (left) or test (right) group. The solid line indicates the 0.99 confidence level for re-identification of case individuals, estimated from the test statistic values of the test individuals.
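The threshold line described in the Figure 1 caption can be illustrated with a small sketch. This record does not give the test statistic or the estimation procedure, so the code below assumes that the threshold is the empirical 0.99 quantile (nearest-rank method) of the test-group values, and that a case individual counts as at risk when its value exceeds that threshold. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def empirical_quantile(values, q):
    """Return the q-th empirical quantile of values (nearest-rank method)."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(values)
    # 1-based nearest rank; the small epsilon guards against float round-off
    # pushing an exact product such as 0.75 * 4 just above the integer.
    rank = max(1, math.ceil(q * len(ordered) - 1e-9))
    return ordered[rank - 1]


def count_flagged(case_values, test_values, confidence=0.99):
    """Estimate a threshold from the test group and count case values above it.

    Assumption: the threshold is the `confidence` quantile of the test-group
    statistic values; the source does not state how the line is computed.
    """
    threshold = empirical_quantile(test_values, confidence)
    flagged = sum(1 for t in case_values if t > threshold)
    return threshold, flagged


if __name__ == "__main__":
    # Toy statistic values, invented purely for illustration.
    test_group = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
    case_group = [-0.8, 0.2, 0.7, 1.1]
    threshold, flagged = count_flagged(case_group, test_group, confidence=0.99)
    print(f"threshold={threshold}, case individuals above it={flagged}")
```

Under these assumptions, a low privacy risk corresponds to few or no case individuals lying above the threshold.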
Selection of pilot datasets generated by using three noise-adding approaches
| Noise adding approaches | SNP-based | Equal-haploblock | Unequal-haploblock |
|---|---|---|---|
| Correct-order | 187 | 390 | 536 |
| Best-pick | 347 | 666 | 822 |
Dataset selection based on utility evaluation
Number of successes: correct order (best-pick)
| Noise adding | Confidence | χ² | Fisher's | G-test | Trends test |
|---|---|---|---|---|---|
| SNP-based | – | 187 (347) | 196 (387) | 170 (383) | 168 (348) |
| | 0.5 | 63 (219) | 96 (283) | 18 (115) | 0 (40) |
| | 0.8 | 135 (320) | 174 (361) | 144 (330) | 153 (333) |
| | 0.9 | 153 (310) | 175 (376) | 143 (343) | 145 (336) |
| | 0.95 | 222 (430) | 215 (461) | 210 (441) | 185 (399) |
| e-h* | – | 390 (666) | 560 (797) | 436 (683) | 401 (685) |
| | 0.5 | 398 (669) | 564 (809) | 429 (713) | 385 (653) |
| | 0.8 | 599 (819) | 792 (942) | 645 (832) | 614 (845) |
| | 0.9 | 751 (899) | 813 (988) | 798 (920) | 770 (924) |
| | 0.95 | 846 (976) | 738 (1000) | 858 (974) | 836 (968) |
| un-h† | – | 536 (822) | 639 (896) | 532 (833) | 538 (817) |
| | 0.5 | 578 (883) | 786 (951) | 612 (897) | 625 (909) |
| | 0.8 | 716 (969) | 913 (971) | 745 (966) | 695 (949) |
| | 0.9 | 852 (981) | 983 (991) | 888 (986) | 837 (995) |
| | 0.95 | 967 (995) | 993 (994) | 980 (1000) | 958 (1000) |
*Equal-haploblock.
†Unequal-haploblock.
Number of experiments with high confidence of selecting the best dataset
Number of successes with high confidence
| Confidence level | Noise adding | χ² | Fisher's | G-test | Trends test |
|---|---|---|---|---|---|
| ≥0.9 | e-h* | 724 | 714 | 778 | 733 |
| | un-h† | 918 | 978 | 959 | 909 |
| ≥0.95 | e-h | 694 | 576 | 743 | 710 |
| | un-h | 892 | 971 | 924 | 875 |
| ≥0.99 | e-h | 611 | 550 | 663 | 635 |
| | un-h | 836 | 931 | 883 | 833 |
*Equal-haploblock.
†Unequal-haploblock.
Data selection on a real clinical genomic dataset
Number of successes: correct order (best-pick)
| Noise adding | Confidence | χ² | Fisher's | G-test | Trends test |
|---|---|---|---|---|---|
| SNP-based | – | 145 (323) | 133 (289) | 125 (297) | 133 (296) |
| | 0.5 | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| | 0.8 | 0 (0) | 89 (272) | 0 (0) | 130 (303) |
| | 0.9 | 106 (270) | 151 (307) | 90 (261) | 154 (338) |
| | 0.95 | 184 (371) | 153 (334) | 162 (348) | 126 (300) |
| e-h* | – | 238 (463) | 211 (454) | 223 (446) | 228 (504) |
| | 0.5 | 20 (194) | 275 (574) | 15 (139) | 279 (641) |
| | 0.8 | 188 (616) | 284 (607) | 225 (644) | 413 (682) |
| | 0.9 | 293 (611) | 394 (662) | 322 (640) | 262 (595) |
| | 0.95 | 404 (734) | 401 (686) | 433 (696) | 30 (435) |
| un-h† | – | 301 (646) | 292 (602) | 291 (616) | 239 (528) |
| | 0.5 | 308 (768) | 381 (775) | 304 (748) | 254 (682) |
| | 0.8 | 249 (865) | 299 (772) | 287 (887) | 399 (788) |
| | 0.9 | 332 (815) | 509 (774) | 372 (808) | 164 (622) |
| | 0.95 | 506 (854) | 377 (755) | 575 (863) | 18 (440) |
*Equal-haploblock.
†Unequal-haploblock.