Martin Lablans, Kay Hamacher, Tobias Kussel, Torben Brenner, Galina Tremper, Josef Schepers.
Abstract
BACKGROUND: The low number of patients suffering from any given rare disease poses a difficult problem for medical research: with the exception of some specialized biobanks and disease registries, potential study participants' information is disjoint and distributed over many medical institutions. Whenever some of those facilities are in close proximity, a significant overlap of patients can reasonably be expected, further complicating statistical study feasibility assessments and data gathering. Due to the sensitive nature of medical records and identifying data, data transfer and joint computations are often forbidden by law or associated with prohibitive amounts of effort. To alleviate this problem and to support rare disease research, we developed the Mainzelliste Secure EpiLinker (MainSEL) record linkage framework, a secure multi-party computation based application that uses trusted-third-party-less cryptographic protocols to perform privacy-preserving record linkage with high security guarantees. In this work, we extend MainSEL to allow the record linkage based calculation of the number of common patients between institutions. This allows privacy-preserving statistical feasibility estimations for further analyses and data consolidation. Additionally, we created easy-to-deploy software packages using microservice containerization and continuous deployment/continuous integration. We performed tests with medical researchers using MainSEL in real-world medical IT environments, using synthetic patient data.
Keywords: Intersection cardinality; Medical informatics; Multi-party computation; Rare disease; Record linkage
Year: 2022 PMID: 36209221 PMCID: PMC9547637 DOI: 10.1186/s12967-022-03671-6
Source DB: PubMed Journal: J Transl Med ISSN: 1479-5876 Impact factor: 8.440
Fig. 1 Visual example of the Bloom filter-based Dice similarity measure. In this example, the strings “MEIER” and “MAYERS” are compared using different hash functions and a 12-bit Bloom filter. The colors mark the differences. Note that a change of one letter leads to at most 2k different set bits; that is, small changes in the strings lead to small changes in the bit pattern
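The comparison illustrated in Fig. 1 can be sketched in a few lines of Python. This is a plaintext illustration only, not the MainSEL implementation: the bigram tokenization, the salted SHA-256 hash construction, the default of k = 2 hash functions, and the function names `bigrams`, `bloom_filter`, and `dice` are all assumptions made for this example. Only the 12-bit filter length and the two example strings come from the caption.

```python
import hashlib


def bigrams(text: str) -> set[str]:
    """Split a string into its set of overlapping two-character tokens (assumed tokenization)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bloom_filter(text: str, m: int = 12, k: int = 2) -> int:
    """Encode the bigrams of `text` into an m-bit Bloom filter, returned as an integer bit mask.

    Each bigram sets up to k bits, one per salted hash (hypothetical hash construction).
    """
    bits = 0
    for gram in bigrams(text):
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            position = int.from_bytes(digest[:8], "big") % m
            bits |= 1 << position
    return bits


def dice(a: int, b: int) -> float:
    """Dice coefficient of two bit masks: 2 * |a AND b| / (|a| + |b|)."""
    total = bin(a).count("1") + bin(b).count("1")
    if total == 0:
        return 0.0  # convention chosen here for two empty filters
    return 2 * bin(a & b).count("1") / total


if __name__ == "__main__":
    meier = bloom_filter("MEIER")
    mayers = bloom_filter("MAYERS")
    print(f"{meier:012b}")
    print(f"{mayers:012b}")
    print(dice(meier, mayers))
```

With a filter as short as 12 bits, hash collisions are frequent, so the resulting similarity is only indicative; the short length mirrors the figure and is not a practical parameter choice.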
Fig. 2 MainSEL architectural overview. The diagram shows two MainSEL Docker Compose stacks interacting in a virtual private network established by an OpenVPN server. Only the OpenVPN and Stunnel components interface with the open network; all MainSEL core components use stack-internal networking
Fig. 3 Setup and online runtime in seconds for varying database sizes and four circuit variants (cf. Sect. 3.2), in three network environments: A: 0.3 ms latency + 1 Gbit/s bandwidth, B: 0.3 ms + 100 Mbit/s, C: 100 ms + 1 Gbit/s. The field configuration of the Mainzelliste, developed by the German Cancer Research Center (DKFZ), was used in all benchmarks
Comparison of the setup and online runtimes of the MPC RL based intersection cardinality procedure for varying numbers of records in circuit variant . The three network configurations from Fig. 3 are compared. The reported network communication cost is the sum of sent and received data
| Database size | #Rounds | Comm. setup [MiB] | Comm. online [MiB] | Setup phase A [s] | Setup phase B [s] | Setup phase C [s] | Online phase A [s] | Online phase B [s] | Online phase C [s] |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 266 | 0.6 | 0.1 | 0.014 | 0.01 | 0.8 | 0.063 | 0.063 | 13 |
| 10 | 330 | 5.7 | 0.7 | 0.078 | 0.073 | 1.5 | 0.085 | 0.081 | 16 |
| 25 | 346 | 14.1 | 1.7 | 0.14 | 0.14 | 1.8 | 0.081 | 0.1 | 16 |
| 50 | 362 | 28.1 | 3.4 | 0.73 | 0.69 | 2.3 | 0.1 | 0.1 | 17 |
| 100 | 378 | 53.7 | 6.7 | 1.9 | 1.9 | 4.9 | 0.14 | 0.14 | 19 |
| 500 | 410 | 279 | 25.6 | 12 | 11 | 13 | 0.34 | 0.34 | 21 |
| 1,000 | 426 | 557.8 | 47.1 | 24 | 23 | 28 | 0.57 | 0.6 | 23 |
| 2,500 | 458 | 1,394.4 | 115.5 | 60 | 60 | 64 | 1.3 | 1.3 | 32 |
| 5,000 | 474 | 2,788.6 | 222.5 | 120 | 120 | 120 | 2.5 | 2.5 | 39 |
| 10,000 | 490 | 5,577.4 | 444.9 | 240 | 240 | 250 | 5.8 | 5.7 | 51 |
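The scaling behaviour in the table above can be checked with a short calculation. The sketch below copies the database size, round count, and setup communication columns from the table and tests two empirical observations: the round count matches 266 + 16·⌈log2 n⌉ for every listed size, and the setup communication stays between roughly 0.5 and 0.6 MiB per record. Both are fits to the reported numbers, not formulas stated in the source, and the names `ROWS` and `fitted_rounds` are hypothetical.

```python
# (database size, #rounds, setup communication in MiB), copied from the table above
ROWS = [
    (1, 266, 0.6),
    (10, 330, 5.7),
    (25, 346, 14.1),
    (50, 362, 28.1),
    (100, 378, 53.7),
    (500, 410, 279.0),
    (1000, 426, 557.8),
    (2500, 458, 1394.4),
    (5000, 474, 2788.6),
    (10000, 490, 5577.4),
]


def fitted_rounds(n: int) -> int:
    """Empirical fit to the table (an assumption, not taken from the source).

    (n - 1).bit_length() equals ceil(log2(n)) for every integer n >= 1.
    """
    return 266 + 16 * (n - 1).bit_length()


if __name__ == "__main__":
    for n, rounds, setup_mib in ROWS:
        matches = fitted_rounds(n) == rounds
        print(n, rounds, matches, round(setup_mib / n, 3))
```

A round count that grows with the logarithm of the database size while communication grows linearly is consistent with the runtimes in network C, where the 100 ms latency dominates the online phase for small databases.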
Groupings of institutions participating in the real-world tests with synthetic data
| Test Number | Party 1 | Party 2 | Party 3 |
|---|---|---|---|
| 1 | University Medical Centre Mannheim | RWTH Aachen University | Berlin Institute of Health |
| 2 | University Hospital Carl Gustav Carus, Dresden | University Hospital Frankfurt | University Medical Centre Mannheim |
| 3 | University Hospital Tübingen | University Hospital Würzburg | University Hospital Regensburg |
Fig. 4 Composition of the synthetic datasets. All three generated datasets consist of roughly 18 000 records. The pairwise overlap consists of around 200 records. In addition, 8 records are included in all three datasets
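To make the dataset composition in Fig. 4 concrete, the following sketch builds three sets of placeholder record IDs with that structure and counts their overlaps in plaintext. It is an illustration under stated assumptions: integer IDs stand in for patient records, the sizes are rounded to exactly 18,000, 200, and 8, the 200 pairwise records are assumed to include the 8 records common to all three sets, and `build_datasets` is a hypothetical helper. MainSEL computes such cardinalities through privacy-preserving record linkage, which this sketch does not attempt to reproduce.

```python
from itertools import combinations, count


def build_datasets(size: int = 18_000, pairwise: int = 200, triple: int = 8) -> dict[str, set[int]]:
    """Create three ID sets with the overlap structure described in Fig. 4 (assumed exact sizes)."""
    ids = count()  # generator of fresh placeholder IDs

    def take(n: int) -> set[int]:
        return {next(ids) for _ in range(n)}

    names = ("A", "B", "C")
    common = take(triple)  # records present in all three datasets
    data = {name: set(common) for name in names}

    # Records shared by exactly two datasets (assumption: pairwise count includes the triple records)
    for x, y in combinations(names, 2):
        shared = take(pairwise - triple)
        data[x] |= shared
        data[y] |= shared

    # Fill each dataset with records unique to it
    for name in names:
        data[name] |= take(size - len(data[name]))
    return data


if __name__ == "__main__":
    datasets = build_datasets()
    for x, y in combinations(datasets, 2):
        print(x, y, len(datasets[x] & datasets[y]))
    print("all three:", len(set.intersection(*datasets.values())))
```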
Fig. 5 Runtime composition of the full MainSEL system comparing two databases with 100 patients each. The “Bare MainSEL” setup consists of only the PostgreSQL, Mainzelliste, and Secure EpiLinker containers