Mete Akgün, Nico Pfeifer, Oliver Kohlbacher.
Abstract
MOTIVATION: Diagnosis and treatment decisions based on genomic data have become widespread as the cost of genome sequencing gradually decreases. In this context, disease-gene association studies are of great importance. However, genomic data is highly sensitive compared to other data types, because it contains information about individuals and their relatives. Many studies have shown that this information can be inferred from the query-response pairs of genomic databases. In this work, we propose a method that uses secure multi-party computation (MPC) to query genomic databases in a privacy-preserving manner. The proposed solution privately outsources genomic data from arbitrarily many sources to two non-colluding proxies and allows genomic databases to be stored safely in semi-honest cloud environments. It provides data privacy, query privacy, and output privacy by using XOR-based sharing and, unlike previous solutions, allows queries to run efficiently over hundreds of thousands of genomic data entries.
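The XOR-based sharing mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `xor_share` and `xor_reconstruct` are hypothetical, and the 48-bit default only mirrors the variant length used in the benchmarks. The sketch assumes a variant is already encoded as an integer and shows how a data source could split it into two shares, one per proxy, so that neither share alone reveals the value.

```python
import secrets
from typing import Tuple


def xor_share(value: int, bits: int = 48) -> Tuple[int, int]:
    """Split `value` into two XOR shares (hypothetical helper).

    The first share is uniformly random; the second is value XOR first.
    Either share on its own is a uniformly random `bits`-bit number.
    """
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the given bit length")
    share_1 = secrets.randbits(bits)   # would be sent to proxy D1
    share_2 = value ^ share_1          # would be sent to proxy D2
    return share_1, share_2


def xor_reconstruct(share_1: int, share_2: int) -> int:
    """Recombine two XOR shares into the original value."""
    return share_1 ^ share_2


# Example: share an (assumed) 48-bit encoded variant and recombine it.
variant = 0xABCDEF012345
s1, s2 = xor_share(variant)
recovered = xor_reconstruct(s1, s2)

# XOR of two shared values can be computed share-wise, without interaction.
a1, a2 = xor_share(5)
b1, b2 = xor_share(9)
xor_of_shared = xor_reconstruct(a1 ^ b1, a2 ^ b2)  # equals 5 ^ 9
```

Because XOR can be evaluated locally on shares, it is a standard property of this kind of sharing that only AND gates require communication between the two parties, which is why MPC benchmarks typically report the number of AND gates and the circuit depth.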
Year: 2022 PMID: 35150254 PMCID: PMC9004657 DOI: 10.1093/bioinformatics/btac070
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. General system architecture of our solution. Genomic variant stores communicate with the two non-colluding proxy servers D1 and D2. Users can query all data through these proxy servers in a secure manner.
Fig. 2. Encoding of a single variant.
Fig. 3. Generation of the variant tree.
Fig. 4. Comparison of time performance of our solution and Demmler et al.’s solution under various numbers of variants/numbers of query variants. (a) Runtime with a single patient, a varying number of variants, a fixed variant length = 48 bit, and 5 query variants, (b) runtime with a single patient, 100 000 variants, a fixed variant length = 48 bit and a varying number of query variants.
Benchmark results and circuit properties for varying variant count at fixed variant length 48 bit, query length 5 and mask count 16
| No. of variants | Demmler: No. of ANDs | Demmler: Depth | Demmler: Time (ms) | Demmler: Comm (MB) | Our solution: No. of ANDs | Our solution: Depth | Our solution: IO time (ms) | Our solution: MPC time (ms) | Our solution: Total time (ms) | Our solution: Comm (MB) |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 3.0×10⁴ | 17 | 330 | <1 | 3.8×10³ | 13 | <1 | 158 | 158 | <1 |
| 1000 | 2.4×10⁵ | 20 | 1623 | 4 | 3.8×10³ | 13 | <1 | 198 | 198 | <1 |
| 10 000 | 3.9×10⁶ | 24 | 11 721 | 63 | 3.8×10³ | 13 | 1 | 224 | 225 | <1 |
| 100 000 | 3.1×10⁷ | 27 | 85 126 | 510 | 3.8×10³ | 13 | 1 | 264 | 265 | <1 |
| 1 000 000 | 2.5×10⁸ | 30 | 675 562 | 4089 | 3.8×10³ | 13 | 1 | 290 | 291 | <1 |
| 3 000 000 | 1.0×10⁹ | 32 | 2 694 093 | 16 357 | 3.8×10³ | 13 | 1 | 304 | 305 | <1 |
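To make the gap in the table above easier to read, the following sketch computes the ratio of Demmler et al.'s runtime to the total runtime of the proposed solution for each variant count. The timings are copied from the table; the names `rows` and `speedup` are illustrative and do not come from the source.

```python
# (number of variants, Demmler time in ms, our total time in ms), from the table above
rows = [
    (100, 330, 158),
    (1_000, 1_623, 198),
    (10_000, 11_721, 225),
    (100_000, 85_126, 265),
    (1_000_000, 675_562, 291),
    (3_000_000, 2_694_093, 305),
]


def speedup(baseline_ms: float, ours_ms: float) -> float:
    """Return how many times faster `ours_ms` is than `baseline_ms`."""
    return baseline_ms / ours_ms


# Map each variant count to its runtime ratio, rounded to one decimal place.
speedups = {n: round(speedup(demmler, ours), 1) for n, demmler, ours in rows}

for n, ratio in speedups.items():
    print(f"{n:>9} variants: {ratio:>8.1f}x")
```

On these figures the ratio grows from roughly 2× at 100 variants to roughly 8800× at 3 000 000 variants, because the baseline runtime grows with the number of variants while the total runtime of the proposed solution stays within a few hundred milliseconds.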
Benchmark results and circuit properties for varying query length at fixed variant length 48 bit, variant count 100 000 and mask count 16
| No. of queries | Demmler: No. of ANDs | Demmler: Depth | Demmler: Time (ms) | Demmler: Comm (MB) | Our solution: No. of ANDs | Our solution: Depth | Our solution: IO time (ms) | Our solution: MPC time (ms) | Our solution: Total time (ms) | Our solution: Comm (MB) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 6.3×10⁶ | 24 | 18 070 | 103 | 767 | 11 | <1 | 197 | 197 | <1 |
| 2 | 1.2×10⁷ | 25 | 34 982 | 204 | 1535 | 12 | <1 | 211 | 211 | <1 |
| 4 | 2.5×10⁷ | 26 | 68 587 | 409 | 3071 | 13 | 1 | 237 | 238 | <1 |
| 8 | 5.1×10⁷ | 27 | 132 171 | 830 | 6143 | 14 | 2 | 261 | 263 | <1 |
| 16 | 1.0×10⁸ | 28 | 270 283 | 1635 | 12 287 | 15 | 4 | 356 | 360 | <1 |
| 32 | 2.0×10⁸ | 29 | 542 836 | 3271 | 24 575 | 16 | 7 | 462 | 469 | <1 |
| 64 | 4.0×10⁸ | 30 | 1 085 198 | 6542 | 49 151 | 17 | 14 | 607 | 621 | <1 |
| 128 | 8.0×10⁸ | 31 | 2 171 901 | 13 086 | 98 303 | 21 | 29 | 937 | 966 | 1 |
Benchmark results and circuit properties for varying mask count at fixed variant length 48 bit, variant count 1 000 000 and query length 5
| No. of masks | No. of ANDs | Depth | IO time (ms) | MPC time (ms) | Total time (ms) | Comm (MB) |
|---|---|---|---|---|---|---|
| 16 | 3.8×10³ | 13 | 1 | 277 | 278 | <1 |
| 64 | 1.5×10⁴ | 15 | 1 | 699 | 700 | <1 |
| 256 | 6.1×10⁴ | 17 | 1 | 1174 | 1175 | 1 |
| 1024 | 2.4×10⁵ | 19 | 1 | 1901 | 1902 | 4 |
| 4096 | 9.6×10⁵ | 21 | 1 | 5108 | 5109 | 16 |
| 16 384 | 3.9×10⁶ | 23 | 1 | 11 286 | 11 287 | 63 |
Fig. 5. Time performance of our solution under various numbers of masks/numbers of patients. (a) Runtime with a varying number of masks, 1 000 000 variants, a fixed variant length = 48 bit, and 5 query variants, (b) runtime with a varying number of patients, 1 000 000 variants, a fixed variant length = 64 bit and 5 query variants.
Benchmark results and circuit properties for varying patient count at fixed query length 5, variant length 48 bit, variant count 1 000 000 and the number of masks 16
| No. of patients | No. of ANDs | Depth | IO time (ms) | MPC time (ms) | Total time (ms) | Comm (MB) |
|---|---|---|---|---|---|---|
| 2¹⁰ | 3.9×10⁶ | 29 | 1048 | 11 602 | 12 650 | 64 |
| 2¹² | 1.5×10⁷ | 34 | 3998 | 43 190 | 47 188 | 255 |
| 2¹⁴ | 6.2×10⁷ | 39 | 15 663 | 167 019 | 182 682 | 1023 |
| 2¹⁶ | 2.5×10⁸ | 46 | 68 307 | 631 831 | 700 138 | 4092 |
| 2¹⁸ | 1.0×10⁹ | 49 | 255 113 | 2 634 416 | 2 889 529 | 16 371 |
| 2²⁰ | 4.0×10⁹ | 54 | 1 021 196 | 10 607 612 | 11 628 808 | 65 485 |
Benchmark results and circuit properties for varying database count at fixed query length 5, variant length 48 bit, total number of variants of all patients and the number of masks 16
| No. of databases | No. of ANDs | Depth | IO time (ms) | MPC time (ms) | Total time (ms) | Comm (MB) |
|---|---|---|---|---|---|---|
| 2⁸ | 3.9×10⁶ | 29 | 302 | 3307 | 3609 | 64 |
| 2¹⁰ | 3.9×10⁶ | 29 | 1129 | 12 270 | 13 399 | 64 |
| 2¹² | 1.5×10⁷ | 34 | 4197 | 45 603 | 49 800 | 255 |
| 2¹⁴ | 6.2×10⁷ | 39 | 15 663 | 167 019 | 182 682 | 1023 |
| 2¹⁶ | 2.5×10⁸ | 46 | 59 471 | 628 844 | 688 315 | 4092 |
| 2¹⁸ | 1.0×10⁹ | 49 | 223 026 | 2 352 573 | 2 575 599 | 16 371 |
| 2²⁰ | 4.0×10⁹ | 54 | 835 895 | 8 779 585 | 9 615 480 | 65 485 |