Peeter Laud, Alisa Pankova.
Abstract
BACKGROUND: Practical applications for data analysis may require combining multiple databases belonging to different owners, such as health centers. The analysis should be performed without violating the privacy of either the centers themselves or the patients whose records these centers store. To avoid biased analysis results, it may be important to remove duplicate records among the centers, so that each patient's data is taken into account only once. This task is very closely related to privacy-preserving record linkage.
Keywords: Deduplication; Privacy; Privacy-preserving record linkage; Secure multiparty computation
Year: 2018 PMID: 30309353 PMCID: PMC6180364 DOI: 10.1186/s12920-018-0400-8
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1. System diagram. The setting to which the solutions proposed in this paper are applied
First solution
| Encrypting the hashes | 29 m |
| Shuffling the encrypted hashes | 60 s |
| Finding the first element in each duplicated set | 30 s |
| Unshuffling the boolean results | 12 s |
| Public sort, housekeeping | 18 s |
| Total | 32 m |
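The steps timed in the table above (hashing, then finding the first element in each duplicated set and returning boolean results) can be illustrated with a plaintext analogue. The following is only a minimal sketch under stated assumptions: it performs no encryption, shuffling, or secure multiparty computation, so it offers none of the privacy guarantees the paper targets. The helper names `record_hash` and `first_occurrence_flags`, the SHA-256 hash, the normalisation step, and the sample records are all hypothetical choices made for illustration.

```python
import hashlib


def record_hash(record: str) -> str:
    """Hash a record identifier after a simple (hypothetical) normalisation."""
    normalised = record.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def first_occurrence_flags(hashes):
    """Return one boolean per hash: True for the first element of each
    duplicate set, False for every later repeat."""
    seen = set()
    flags = []
    for h in hashes:
        flags.append(h not in seen)
        seen.add(h)
    return flags


# Illustrative records pooled from several centers (made-up data).
records = [
    "Alice 1980-01-01",
    "Bob 1975-05-05",
    "alice 1980-01-01 ",   # same patient as the first record after normalisation
    "Carol 1990-09-09",
]

flags = first_occurrence_flags([record_hash(r) for r in records])
deduplicated = [r for r, keep in zip(records, flags) if keep]
```

Here `flags` marks the third record as a repeat, so `deduplicated` keeps three records. In the secure setting the comparison would run on protected hashes rather than on values visible to any single party.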
Fig. 2. Second solution (w/o polynomials). Efficiency graph of the second solution without polynomial optimization. The number of CPU threads is limited to 2
Second solution (w/o polynomials)
| Housekeeping (avg per client) | 0.28 s |
| Encrypting the hashes (avg per client) | 6.3 s |
| Shuffling the encrypted hashes (avg per client) | 0.51 s |
| Detecting duplicates (min of 1000 clients) | 20 s |
| Detecting duplicates (max of 1000 clients) | 46 s |
| Total (1000 clients) | 12 h 45 m |
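As a rough consistency check on the table above, the final row (12 h 45 m) can be related to the per-client rows. The sketch below assumes that this figure is the sum of the processing times of 1000 clients handled one after another; that reading is an assumption, not something stated in the table. The variable names are illustrative.

```python
def to_seconds(hours=0, minutes=0, seconds=0.0):
    """Convert a duration given in hours, minutes and seconds to seconds."""
    return hours * 3600 + minutes * 60 + seconds


clients = 1000
total_s = to_seconds(hours=12, minutes=45)        # final row of the table
per_client_s = total_s / clients                  # average time per client

# Per-client averages from the table: housekeeping, encrypting, shuffling.
fixed_s = 0.28 + 6.3 + 0.51

# What remains is the implied average duplicate-detection time per client
# (valid only under the sequential-sum assumption stated above).
implied_detect_avg_s = per_client_s - fixed_s

print(f"average per client: {per_client_s:.1f} s")
print(f"implied detection average: {implied_detect_avg_s:.1f} s")
```

Under that assumption the implied detection average is about 38.8 s, which lies between the reported minimum (20 s) and maximum (46 s), so the rows are at least mutually consistent.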
Fig. 3. Second solution (w/ polynomials). Efficiency graph of the second solution with polynomial optimization. The number of CPU threads is limited to 2
Second solution (w/ polynomials)
| Housekeeping (avg per client) | 0.28 s |
| Encrypting the hashes (avg per client) | 5.8 s |
| Shuffling the encrypted hashes (avg per client) | 0.53 s |
| Detecting duplicates (min of 1000 clients) | 14 s |
| Detecting duplicates (max of 1000 clients) | 666 s |
| Total (1000 clients) | 88 h 8 m |
Fig. 4. Second solution (w/o polynomials, no thread limit). Efficiency graph of the second solution without polynomial optimization. The number of CPU threads is 32
Fig. 5. Second solution (w/ polynomials, no thread limit). Efficiency graph of the second solution with polynomial optimization. The number of CPU threads is 32
Second solution (w/o polynomials, no thread limit)
| Housekeeping (avg per client) | 0.28 s |
| Encrypting the hashes (avg per client) | 6.3 s |
| Shuffling the encrypted hashes (avg per client) | 0.50 s |
| Detecting duplicates (min of 1000 clients) | 20 s |
| Detecting duplicates (max of 1000 clients) | 40 s |
| Total (1000 clients) | 11 h 20 m |
Second solution (w/ polynomials, no thread limit)
| Housekeeping (avg per client) | 0.28 s |
| Encrypting the hashes (avg per client) | 5.8 s |
| Shuffling the encrypted hashes (avg per client) | 0.55 s |
| Detecting duplicates (min of 1000 clients) | 14 s |
| Detecting duplicates (max of 1000 clients) | 112 s |
| Total (1000 clients) | 19 h 30 m |