Yoshiki Nakagawa, Satsuya Ohata, Kana Shimizu.
Abstract
The development of privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches for variable-length substring matches between a query and a database sequence. Our approach hinges on a technique that efficiently applies the FM-index to a secret-sharing scheme. More precisely, we developed an algorithm that achieves a secure table lookup in which [Formula: see text] is computed for a given depth of recursion, where [Formula: see text] is an initial position and V is a vector. We applied the secure table lookup to vectors created from the FM-index. The notable feature of the secure table lookup is that its time, communication, and round complexities do not depend on the table length N once the query has been input. Therefore, a substring match by reference to the FM-index-based table can also be conducted independently of the database length, and the overall search time is dramatically improved compared with previous approaches. In an experiment using a human genome sequence of length 10 million as the database and a query of length 100, the query response time of our protocol was at least three orders of magnitude faster than that of a non-indexed database search protocol in a realistic computation/network environment.
Keywords: FM-index; LCP array; Maximal exact match; Private genome sequence search; Secret sharing; Secure multiparty computation; Suffix array
Year: 2022 PMID: 35473587 PMCID: PMC9040336 DOI: 10.1186/s13015-022-00211-1
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.721
Fig. 1 Arithmetic addition and multiplication over secret sharing
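The figure itself is not reproduced here. As a rough illustration of what addition and multiplication over secret sharing typically look like, the following minimal sketch uses two-party additive sharing with a Beaver triple. The ring size `MOD` and all function names are assumptions made for this example, not the paper's notation.

```python
import secrets

MOD = 2**32  # illustrative ring size (an assumption, not from the paper)

def share(x):
    """Split x into two additive shares modulo MOD."""
    s0 = secrets.randbelow(MOD)
    return s0, (x - s0) % MOD

def reconstruct(s0, s1):
    """Recombine two additive shares."""
    return (s0 + s1) % MOD

def add_shares(x, y):
    """Addition is local: each party adds the shares it holds."""
    return (x[0] + y[0]) % MOD, (x[1] + y[1]) % MOD

def beaver_triple():
    """Trusted initializer: shares of random a, b and of c = a * b."""
    a = secrets.randbelow(MOD)
    b = secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)

def mul_shares(x, y, triple):
    """Multiplication consumes one Beaver triple and one round of opening."""
    a, b, c = triple
    # The parties open d = x - a and e = y - b; both are uniformly random.
    d = reconstruct((x[0] - a[0]) % MOD, (x[1] - a[1]) % MOD)
    e = reconstruct((y[0] - b[0]) % MOD, (y[1] - b[1]) % MOD)
    # x * y = c + d*b + e*a + d*e; only party 0 adds the public d*e term.
    z0 = (c[0] + d * b[0] + e * a[0] + d * e) % MOD
    z1 = (c[1] + d * b[1] + e * a[1]) % MOD
    return z0, z1
```

For example, `reconstruct(*mul_shares(share(6), share(7), beaver_triple()))` recombines to the product of the two secrets, while neither party sees 6 or 7 on its own.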
Secure subprotocols used in this paper
| Input | Output | |
|---|---|---|
Fig. 2 Schematic view of our goal and model. (0) Server (DB holder) distributes Beaver triples. (A reliable third party can serve as the trusted initializer instead.) (1) Server distributes shares of the database. (2) User (query holder) distributes shares of the query. (3) The computing nodes jointly calculate shares of the result. (4) The results are sent to User. The offline phase is (0), the DB preparation phase is (1), and the Search phase consists of (2)–(4)
Summary of complexities for our protocols and related protocols
| | Btime | Bsize | Dtime | Dsize | Stime | Comm. | Round |
|---|---|---|---|---|---|---|---|
| ss-ROT (proposed) | 0 | 0 | | | | | |
| Secure LPM (proposed) | | | | | | | |
| [ | − | − | − | − | | | |
| Baseline LPM | | | | | | | |
| Secure LMEM (proposed) | | | | | | | |
| Baseline LMEM | | | | | | | |
Btime and Bsize are the generation time and size of the Beaver triples (BTs). Dtime and Dsize are the generation time and size of the database shares. Stime is the time for the Search phase. Comm. is the size of the data exchanged between computing nodes. Round is the number of data exchanges
Fig. 3 Example of a search when , , and . The goal is to compute . Here we assume generates . In Step 1 of Search phase, and jointly compute to obtain . ( is randomized by , so no element of V is leaked.) In a similar way, and compute and . In Step 2, and output and respectively. Since , , , and , ss-ROT successfully computes
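The symbols of this caption were lost in extraction, so the exact protocol cannot be recovered from it. The sketch below is therefore not ss-ROT itself. It only illustrates one plausible way a recursive lookup V[V[...V[p]...]] can be evaluated on shares with per-level cost independent of the table length: each level uses a copy of V whose positions are rotated by one random offset and whose values are masked by the next offset. The dealer role, the function names, and the assumption that V holds indices in the range 0 to N-1 are all hypothetical.

```python
import secrets

def share(x, n):
    """Two additive shares of x modulo n."""
    s0 = secrets.randbelow(n)
    return s0, (x - s0) % n

def prepare_tables(V, depth):
    """Dealer side: shares of rotated, masked copies of V, one per level."""
    n = len(V)
    r = [secrets.randbelow(n) for _ in range(depth + 1)]
    tables0, tables1 = [], []
    for j in range(depth):
        t0, t1 = [0] * n, [0] * n
        for i, v in enumerate(V):
            # Position is rotated by r[j]; the stored value is masked by r[j+1].
            t0[(i + r[j]) % n], t1[(i + r[j]) % n] = share((v + r[j + 1]) % n, n)
        tables0.append(t0)
        tables1.append(t1)
    return (tables0, tables1), share(r[0], n), share(r[depth], n)

def recursive_lookup(p_shares, tables, r_first, r_last, n):
    """Node side: shares of V applied depth times to p (requires depth >= 1)."""
    # Open the masked start index p + r[0]; it is uniformly random.
    m = (p_shares[0] + r_first[0] + p_shares[1] + r_first[1]) % n
    depth = len(tables[0])
    for j in range(depth):
        s0, s1 = tables[0][j][m], tables[1][j][m]  # constant-time local reads
        if j < depth - 1:
            m = (s0 + s1) % n  # opened value V[...] + r[j+1] is still masked
    # Remove the last mask on the shares without opening the result.
    return (s0 - r_last[0]) % n, (s1 - r_last[1]) % n
```

In this sketch every opened value is offset by a fresh uniform random number, which is the sense in which randomization keeps the elements of V hidden, and each level costs one table read and one opening regardless of N.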
Fig. 4 Example of a secure table lookup when = GCT and = ACGT. Only the lookup for a lower bound is shown. For simplicity, and are denoted by and . () is computed by , and . V is referenced securely by using R. is computed by . is computed by . is computed by
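To make the table-lookup view of an FM-index search concrete, here is a plaintext (non-secure) sketch of standard backward search in which each update of the lower or upper bound is a single read from a per-character vector. The helper names and the naive suffix-array construction are assumptions for illustration, and the paper's own vectors may be laid out differently.

```python
def build_fm_tables(text):
    """Per-character lookup tables V[c][i] = C[c] + rank_c(BWT, i)."""
    s = text + "$"  # sentinel, sorts before A, C, G, T
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix array
    bwt = [s[i - 1] for i in sa]  # s[-1] is the sentinel when i == 0
    alphabet = sorted(set(s))
    # C[c]: number of characters in s that are strictly smaller than c.
    C = {c: sum(ch < c for ch in s) for c in alphabet}
    V = {}
    for c in alphabet:
        column, count = [C[c]], 0
        for ch in bwt:
            count += ch == c
            column.append(C[c] + count)
        V[c] = column  # length len(s) + 1
    return V, len(s)

def backward_search(V, n, pattern):
    """Half-open suffix-array interval [lo, hi) of suffixes starting with pattern."""
    lo, hi = 0, n
    for c in reversed(pattern):
        if c not in V:
            return 0, 0
        # Each bound is updated by one table lookup indexed by the previous bound.
        lo, hi = V[c][lo], V[c][hi]
    return lo, hi

V, n = build_fm_tables("ACGT")
lo, hi = backward_search(V, n, "CG")
print(hi - lo)  # number of occurrences of "CG" in "ACGT"
```

Because the new bound depends only on the previous bound and the current character, a search is a chain of nested lookups, which is the structure the secure table lookup is designed to evaluate.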
Offline time (Time), offline size (Size), DB preparation time (Time), DB preparation size (Size), Search time on a local machine (Time), Search communication size (Size), estimated Search time for three environments: LAN (0.2 ms/10 Gbps), WAN (10 ms/100 Mbps), and WAN (50 ms/10 Mbps), for (only for Baseline LMEM), , and
| Protocol | N | Offline time | Offline size | DB preparation time | DB preparation size | Search time | Search size | LAN (0.2 ms/10 Gbps) | WAN (10 ms/100 Mbps) | WAN (50 ms/10 Mbps) |
|---|---|---|---|---|---|---|---|---|---|---|
| Secure LPM (proposed) | | 0.166 | 0.013 | 123 | 305 | 0.141 | 0.010 | 0.181 | 2.162 | 10.249 |
| | | 0.141 | 0.013 | 1248 | 3051 | 0.113 | 0.010 | 0.153 | 2.134 | 10.221 |
| | | 0.150 | 0.013 | 12628 | 30517 | 0.126 | 0.010 | 0.167 | 2.147 | 10.234 |
| Secure LPM2 (proposed) | | 2.318 | 0.162 | 123 | 77 | 2.888 | 0.040 | 3.028 | 9.911 | 38.020 |
| | | 2.317 | 0.162 | 1236 | 774 | 2.878 | 0.040 | 3.018 | 9.901 | 38.010 |
| | | 2.342 | 0.162 | 12387 | 7748 | 2.939 | 0.040 | 3.079 | 9.962 | 38.071 |
| [ | | – | – | – | – | 691 | 163 | 691 | 707 | 838 |
| | | – | – | – | – | 7817 | 517 | 7818 | 7863 | 8261 |
| | | – | – | – | – | 20 h< | – | – | – | – |
| Baseline (LPM) | | 3995 | 184 | 0.146 | 0.095 | 13 | 122 | 13 | 24 | 118 |
| | | 38767 | 1841 | 1.522 | 0.954 | 164 | 1227 | 165 | 268 | 1196 |
| | | 20 h< | – | – | – | – | – | – | – | – |
| Secure LMEM (proposed) | | 7.619 | 1.704 | 435 | 1068 | 4.817 | 0.999 | 5.577 | 42.900 | 195.654 |
| | | 7.882 | 1.704 | 4467 | 10681 | 4.926 | 0.999 | 5.686 | 43.009 | 195.763 |
| | | 8.457 | 1.704 | 46384 | 106811 | 5.740 | 0.999 | 6.501 | 43.824 | 196.578 |
| Baseline (LMEM) | | 12747 | 611 | 0.015 | 0.010 | 46 | 407 | 46 | 80 | 389 |
| | | 20 h< | – | – | – | – | – | – | – | – |
The size unit is MB and the time unit is s except for the cell describing “20 h<”
Fig. 5 Estimated time (actual search time on a local machine + estimated data-transfer time) for various N
Fig. 6 Estimated time (actual search time on a local machine + estimated data-transfer time) for various
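The captions describe the estimate as measured local search time plus an estimated data-transfer time, but do not spell out the transfer model. A common choice, assumed here rather than taken from the paper, charges one network latency per communication round plus the communication size divided by the bandwidth. The function name, its parameters, and the sample values are hypothetical.

```python
def estimated_time(local_s, comm_mb, rounds, latency_ms, bandwidth_mbps):
    """Local time plus an assumed latency-and-bandwidth transfer term, in seconds."""
    latency_term = rounds * latency_ms / 1000.0     # one latency per round
    bandwidth_term = comm_mb * 8.0 / bandwidth_mbps  # megabytes to megabits
    return local_s + latency_term + bandwidth_term

# Hypothetical protocol: 1.0 s local time, 1 MB exchanged over 100 rounds.
for name, latency_ms, bandwidth_mbps in [
    ("LAN", 0.2, 10_000),
    ("WAN", 10, 100),
    ("WAN", 50, 10),
]:
    print(name, estimated_time(1.0, 1.0, 100, latency_ms, bandwidth_mbps))
```

Under this model a protocol with many rounds is dominated by latency on a WAN, while a protocol with large messages is dominated by the bandwidth term.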