Md Momin Al Aziz1, Parimala Thulasiraman2, Noman Mohammed2.
Abstract
BACKGROUND: Technological advances and the digitization of healthcare data have given the scientific community a large quantity of genomic data. Such datasets have enabled a deeper understanding of several diseases and of our health in general. However, these genome datasets require a large storage volume and present technical challenges in retrieving meaningful information. Furthermore, the privacy aspects of genomic data limit access and often hinder timely scientific discovery.
Keywords: Outsourcing Genomic Data on Cloud; Parallel Construction of Generalized Suffix Tree; Privacy-preserving Queries on Genomic Data; Reverse Merkle Tree
Year: 2022 PMID: 35715724 PMCID: PMC9206251 DOI: 10.1186/s12863-022-01053-x
Source DB: PubMed Journal: BMC Genom Data ISSN: 2730-6844
Fig. 1 Computational framework of the proposed method, where the data owner holds the genomic dataset and constructs the GST in parallel on a private computing cluster (one-time preprocessing). The GST is then outsourced securely to the Cloud Server (CS), where the query q from a researcher is executed in a privacy-preserving manner
Sample haplotype data representation where s∈{0,1} are the different positions on the same sequence
| # | ||||||
|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 1 | 0 |
| 2 | 1 | 1 | 1 | 0 | 1 | 0 |
| 3 | 1 | 1 | 0 | 0 | 0 | 1 |
| 4 | 0 | 1 | 0 | 1 | 1 | 0 |
| 5 | 0 | 1 | 0 | 1 | 0 | 1 |
Fig. 2 Uncompressed Suffix Tree (Trie) construction
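To make the uncompressed suffix tree (trie) concrete, here is a minimal sketch that inserts every suffix of every row of the sample haplotype table into a nested-dictionary trie. The function names `build_suffix_trie` and `contains` are hypothetical, and the sketch only answers substring membership; it does not track which sequence or position a match came from, and it is not the authors' parallel construction.

```python
# Rows of the sample haplotype table, written as binary strings.
HAPLOTYPES = ["100010", "111010", "110001", "010110", "010101"]


def build_suffix_trie(sequences):
    """Insert every suffix of every sequence into a nested-dict trie.

    Each dict is a node; each key is a one-symbol edge label.
    (Hypothetical helper, not taken from the paper.)
    """
    root = {}
    for seq in sequences:
        for start in range(len(seq)):
            node = root
            for symbol in seq[start:]:
                node = node.setdefault(symbol, {})
    return root


def contains(trie, query):
    """Return True if `query` is a substring of any inserted sequence."""
    node = trie
    for symbol in query:
        if symbol not in node:
            return False
        node = node[symbol]
    return True


trie = build_suffix_trie(HAPLOTYPES)
print(contains(trie, "0101"))  # occurs in row 5
print(contains(trie, "0000"))  # occurs in no row
```

Because every suffix is stored symbol by symbol, a lookup costs one dictionary step per query symbol, at the price of a trie whose size grows quadratically with sequence length.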
Fig. 3 Vertical partitioning with path graphs (%1,%2) merging [8]
Fig. 4 Bi-Directional partitioning scheme where data is separated into both rows and columns and merged using the shared memory model [8]
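The partitioning schemes named in these captions can be sketched as plain data splits. This sketch assumes that horizontal partitioning splits the dataset by rows (sequences) and vertical partitioning by columns (positions), which is consistent with the bi-directional caption but is not spelled out here. All function names are hypothetical, and the merging step (path graphs, shared memory) is deliberately left out.

```python
# Rows of the sample haplotype table, written as binary strings.
ROWS = ["100010", "111010", "110001", "010110", "010101"]


def horizontal_partition(rows, parts):
    """Split the sequences (rows) into at most `parts` contiguous groups."""
    size = -(-len(rows) // parts)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def vertical_partition(rows, parts):
    """Split every sequence into at most `parts` contiguous column ranges."""
    width = len(rows[0])
    size = -(-width // parts)  # ceiling division
    return [[row[c:c + size] for row in rows] for c in range(0, width, size)]


def bidirectional_partition(rows, row_parts, col_parts):
    """Split by rows first, then split each row group by columns."""
    return [vertical_partition(group, col_parts)
            for group in horizontal_partition(rows, row_parts)]


print(horizontal_partition(ROWS, 2))
print(vertical_partition(ROWS, 2))
print(bidirectional_partition(ROWS, 2, 2))
```

Each block could then be handed to a separate processor for tree building; how the partial trees are merged is the part the figures describe and this sketch does not attempt.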
Fig. 5 Reverse Merkle Hash for Suffix Tree on S1=010101 where we hash the value of each node in a top-down fashion
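One way to read "hash the value of each node in a top-down fashion" is that each node's hash is derived from its parent's hash and its own symbol, so a node hash commits to the whole path from the root. The sketch below illustrates that idea on S1=010101. The use of SHA-256, the root seed, and the function names are assumptions made for illustration; the encryption layer of the proposed scheme is omitted entirely.

```python
import hashlib

# Hypothetical root value; the source does not specify how the root is seeded.
ROOT_DIGEST = hashlib.sha256(b"root").digest()


def chain_hash(parent_digest, symbol):
    """Hash a node from its parent's digest and its own symbol (top-down)."""
    return hashlib.sha256(parent_digest + symbol.encode()).digest()


def node_hashes(sequence):
    """Digests of every node in the suffix trie of `sequence`.

    Walking each suffix from the root visits exactly the trie's nodes;
    identical paths yield identical digests, so the set deduplicates them.
    """
    digests = set()
    for start in range(len(sequence)):
        digest = ROOT_DIGEST
        for symbol in sequence[start:]:
            digest = chain_hash(digest, symbol)
            digests.add(digest)
    return digests


def query_hash(query):
    """Digest of the path spelled by `query`, computed the same way."""
    digest = ROOT_DIGEST
    for symbol in query:
        digest = chain_hash(digest, symbol)
    return digest


stored = node_hashes("010101")
print(len(stored))                   # number of distinct trie nodes
print(query_hash("1010") in stored)  # "1010" is a substring of S1
print(query_hash("11") in stored)    # "11" is not
```

Under this reading, an exact-match query reduces to a single set lookup on the final digest, because that digest can only be present if the entire query path exists in the tree.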
Fig. 6 The search protocol of our proposed solution for Exact Match (Definition 1). Data owners are offline after sharing the encrypted GST with CS; only the researchers and CS need to be online for the search operation. The encrypted queries are sent to CS and matched against the encrypted GST to produce the final result
Execution time (in minutes) for the Horizontal and Vertical partitioning schemes with processors p={1,2,…,16}
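| Data | Serial (p=1) | Distributed (p=2) | Distributed (p=4) | Distributed (p=8) | Distributed (p=16) | Shared (p=2) | Shared (p=4) | Shared (p=8) | Shared (p=16) | Hybrid (p=2) | Hybrid (p=4) | Hybrid (p=8) | Hybrid (p=16) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|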
| Horizontal Partitioning | |||||||||||||
| 200 | 0.08 | 0.23 | 0.09 | 0.09 | 0.10 | 0.14 | 0.05 | 0.04 | 0.03 | 0.14 | 0.07 | 0.05 | 0.05 |
| 300 | 0.27 | 1.04 | 0.23 | 0.2 | 0.23 | 0.38 | 0.15 | 0.11 | 0.08 | 0.37 | 0.16 | 0.12 | 0.12 |
| 400 | 0.59 | 2.03 | 0.55 | 0.38 | 0.38 | 1.18 | 0.35 | 0.21 | 0.2 | 1.12 | 0.31 | 0.23 | 0.25 |
| 500 | 1.53 | 3.14 | 1.32 | 1.06 | 1.01 | 2.27 | 0.57 | 0.36 | 0.28 | 2.09 | 0.52 | 0.38 | 0.41 |
| 1000 | 14.55 | 16.23 | 8.34 | 6.31 | 6.09 | 17.38 | 5.56 | 3.27 | 2.28 | 17.14 | 4.18 | 3.12 | 3.08 |
| Vertical Partitioning | |||||||||||||
| 200 | 0.08 | 0.19 | 0.08 | 0.05 | 0.03 | 0.16 | 0.07 | 0.04 | 0.02 | 0.14 | 0.05 | 0.03 | 0.02 |
| 300 | 0.27 | 0.56 | 0.28 | 0.17 | 0.09 | 0.48 | 0.22 | 0.16 | 0.08 | 0.39 | 0.13 | 0.10 | 0.06 |
| 400 | 0.59 | 1.41 | 1.05 | 0.36 | 0.16 | 1.44 | 1.01 | 0.34 | 0.19 | 1.21 | 0.32 | 0.21 | 0.13 |
| 500 | 1.53 | 3.07 | 1.49 | 1.08 | 0.37 | 3.18 | 1.49 | 1.08 | 0.36 | 2.35 | 0.58 | 0.40 | 0.24 |
| 1000 | 14.55 | 25.24 | 12.25 | 9.06 | 5.20 | 22.56 | 13.11 | 7.2 | 4.37 | 18.22 | 6.31 | 4.49 | 3.10 |
GST construction time (in seconds) using bi-directional partition scheme with processors p={1,4,8,16} [8]
| Data | Serial (p=1) | Distributed (p=4) | Distributed (p=8) | Distributed (p=16) | Shared (p=4) | Shared (p=8) | Shared (p=16) | Hybrid (p=4) | Hybrid (p=8) | Hybrid (p=16) | p=32 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 4.8 | 94.2 | 90 | 87 | 43.8 | 42.6 | 38.4 | 70.8 | 73.2 | 75.6 | 1.51 |
| 300 | 16.2 | 121.8 | 107.4 | 106.2 | 72 | 48.6 | 43.2 | 88.8 | 75 | 75.6 | 1.37 |
| 400 | 35.4 | 168.6 | 148.2 | 124.8 | 102.6 | 54 | 57.6 | 114 | 87 | 96 | 1.36 |
| 500 | 91.8 | 231.6 | 151.8 | 154.8 | 145.2 | 76.2 | 62.4 | 146.4 | 103.8 | 105 | 1.36 |
| 1000 | 873 | 1135.2 | 428.4 | 291.6 | 856.8 | 202.2 | 154.8 | 635.4 | 312 | 214.2 | 1.36 |
Maximum Execution time (in seconds) of Tree Building (TB), Add Path (AP) and Tree Merge (TM) for dataset, D1000
| Processors (p) | Horizontal TB | Horizontal AP | Horizontal TM | Vertical TB | Vertical AP | Vertical TM | Bi-directional TB | Bi-directional AP | Bi-directional TM |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 113.35 | - | 70.02 | 292.97 | 2.7 | 66.8 | 4.01 | 0.37 | 3.85 |
| 8 | 47.38 | - | 85.4 | 138.87 | 2.9 | 61.1 | 0.62 | 0.16 | 1.8 |
| 16 | 15.6 | - | 98 | 64.4 | 3.2 | 57.6 | 0.12 | 0.07 | 1.2 |
Results on the speedup for dataset, D1000×1000 for all memory models and partitioning schemes with processors p={2,4,8,16}
| Method | Distributed (p=4) | Distributed (p=8) | Distributed (p=16) | Shared (p=4) | Shared (p=8) | Shared (p=16) | Hybrid (p=4) | Hybrid (p=8) | Hybrid (p=16) |
|---|---|---|---|---|---|---|---|---|---|
| Horizontal | 1.19 | 1.61 | 2.80 | 1.11 | 2.02 | 3.33 | 2.31 | 3.24 | 4.69 |
| Vertical | 1.74 | 2.31 | 2.39 | 2.62 | 4.45 | 6.38 | 3.48 | 4.66 | 4.72 |
| Bi-directional | 0.77 | 2.04 | 2.99 | 1.02 | 4.32 | 5.64 | 1.37 | 2.80 | 4.08 |
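The speedup figures can be reproduced from the execution times, assuming speedup is the usual ratio of serial time to parallel time (the definition is not restated here). The sketch below uses two values from the execution-time table for the 1000 dataset: the serial time of 14.55 minutes and a 4-processor distributed time of 8.34 minutes. The `efficiency` helper is an additional, hypothetical convenience that divides speedup by the processor count.

```python
def speedup(serial_time, parallel_time):
    """Ratio of serial to parallel execution time (same units for both)."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, processors):
    """Speedup per processor; 1.0 would be ideal linear scaling."""
    return speedup(serial_time, parallel_time) / processors


# Serial and 4-processor distributed times (minutes) for the 1000 dataset.
s = speedup(14.55, 8.34)
e = efficiency(14.55, 8.34, 4)
print(round(s, 2))  # 1.74
print(round(e, 2))  # 0.44
```

The result of about 1.74 matches a value reported in the speedup table, and the efficiency of about 0.44 shows how far the run is from linear scaling on 4 processors.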
Exact Matching, SMM and TSMM (Query 1, 3 and 4) using GST considering different datasets and query lengths (time in milliseconds)
| Query Length | EM | SMM | TSMM | EM | SMM | TSMM |
|---|---|---|---|---|---|---|
| 300 | 0.5 | 140 | 94 | 0.3 | 80 | 70 |
| 400 | 0.5 | 140 | 150 | 0.4 | 130 | 120 |
| 500 | 0.6 | 210 | 220 | 0.5 | 190 | 180 |
| 1000 | 1.1 | 680 | 720 | - | - | - |
Secure Exact Matching (EM), SMM and TSMM (Query 1, 3 and 4) considering different datasets and query lengths (time in milliseconds). QP, GC and |q| denote query processing time, Garbled Circuit and query length, respectively
| | | Reverse Merkle Hash with | GC | Shimizu et al. [ | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| QP | EM | SMM | PVSMM | EM | SMM | PVSMM | EM | EM | SMM | SMM | |
| 300 | 0.79 | 41.4 | 11599 | 2862 | 37.1 | 11100 | 2761 | 63246 | 63583 | 50358 | 43163 |
| 400 | 0.84 | 43.9 | 15337 | 3901 | 36.9 | 15385 | 3760 | 63194 | 62639 | 64867 | 55609 |
| 500 | 0.9 | 42.7 | 18563 | 4836 | 37.2 | 18477 | 4875 | 63439 | 62048 | 70754 | 67965 |
| 1000 | 1.58 | 45.2 | 36761 | 9368 | - | - | - | 63391 | - | 160854 | |