
Optimization of the Mainzelliste software for fast privacy-preserving record linkage.

Florens Rohde, Martin Franke, Ziad Sehili, Martin Lablans, Erhard Rahm.

Abstract

BACKGROUND: Data analysis for biomedical research often requires a record linkage step to identify records from multiple data sources referring to the same person. Due to the lack of unique personal identifiers across these sources, record linkage relies on the similarity of personal data such as first and last names or birth dates. However, the exchange of such identifying data with a third party, as is the case in record linkage, is generally subject to strict privacy requirements. This problem is addressed by privacy-preserving record linkage (PPRL) and pseudonymization services. Mainzelliste is an open-source record linkage and pseudonymization service used to carry out PPRL processes in real-world use cases.
METHODS: We evaluate the linkage quality and performance of the linkage process using several real and near-real datasets that differ in size and in the error rate of matching records. We compare (plaintext) record linkage with PPRL based on encoded records (Bloom filters). Furthermore, since the Mainzelliste software offers no blocking mechanism, we extend it with phonetic blocking as well as novel blocking schemes based on locality-sensitive hashing (LSH) to improve runtime for both standard and privacy-preserving record linkage.
RESULTS: The Mainzelliste achieves high linkage quality for PPRL using field-level Bloom filters due to the use of an error-tolerant matching algorithm that can handle variances in names, in particular missing or transposed name compounds. However, due to the absence of blocking, the runtimes are unacceptable for real use cases with larger datasets. The newly implemented blocking approaches improve runtimes by orders of magnitude while retaining high linkage quality.
CONCLUSION: We conduct the first comprehensive evaluation of the record linkage facilities of the Mainzelliste software and extend it with blocking methods to improve its runtime. We observe very high linkage quality for both plaintext and encoded data, even in the presence of errors. The blocking methods improve runtime by orders of magnitude, thus facilitating use in research projects with large datasets and many participants.
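
The two techniques the abstract names, field-level Bloom filter encoding and LSH-based blocking, can be illustrated with a minimal Python sketch. This is not the Mainzelliste implementation, and the abstract does not specify any of the details used here: the filter length, the number of hash functions, the bigram padding, the SHA-256-based hashing, the Dice coefficient as similarity measure, and all function names are assumptions chosen for illustration. The blocking part uses Hamming LSH (sampling fixed bit positions of the filter), which is one common LSH family for bit vectors and may differ from the schemes evaluated in the paper. Phonetic blocking is not shown.

```python
import hashlib
import random

BLOOM_BITS = 256   # filter length in bits (assumed value)
NUM_HASHES = 4     # hash functions per bigram (assumed value)


def bigrams(value):
    """Split a padded, lower-cased string into its set of character bigrams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value):
    """Encode one field as the set of bit positions set in its Bloom filter."""
    positions = set()
    for gram in bigrams(value):
        for seed in range(NUM_HASHES):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            positions.add(int.from_bytes(digest[:8], "big") % BLOOM_BITS)
    return positions


def dice(a, b):
    """Dice coefficient of two bit-position sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def make_lsh_keys(num_keys, bits_per_key, seed=42):
    """Choose random bit positions once; each group defines one blocking key."""
    rng = random.Random(seed)
    return [rng.sample(range(BLOOM_BITS), bits_per_key) for _ in range(num_keys)]


def blocking_keys(filter_bits, lsh_keys):
    """Hamming LSH: read the filter at each group's positions.

    Two records become a candidate pair if they share at least one
    (group index, bit pattern) key, so only those pairs need a full comparison.
    """
    return [
        (idx, "".join("1" if p in filter_bits else "0" for p in group))
        for idx, group in enumerate(lsh_keys)
    ]


# Hypothetical example values: a name, a typo variant, and an unrelated name.
a = bloom_encode("Schmidt")
b = bloom_encode("Schmitt")
c = bloom_encode("Rahm")
keys = make_lsh_keys(num_keys=8, bits_per_key=8)
shared_ab = set(blocking_keys(a, keys)) & set(blocking_keys(b, keys))
print(dice(a, b), dice(a, c), len(shared_ab))
```

In this sketch, more keys raise the chance that true matches share a block, while more bits per key make blocks smaller and cut the number of comparisons; that trade-off is what a blocking scheme has to tune against linkage quality.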


Keywords:  Blocking; Locality-sensitive hashing; Mainzelliste; Privacy-preserving record linkage

Year: 2021  PMID: 33451317  PMCID: PMC7809773  DOI: 10.1186/s12967-020-02678-1

Source DB: PubMed  Journal: J Transl Med  ISSN: 1479-5876  Impact factor: 5.531


  1 in total

Review 1.  An Architecture for Translational Cancer Research As Exemplified by the German Cancer Consortium.

Authors:  Martin Lablans; Esther Erika Schmidt; Frank Ückert
Journal:  JCO Clin Cancer Inform       Date:  2018-12
  2 in total

1.  The efficacy of automated feedback after internet-based depression screening: Study protocol of the German, three-armed, randomised controlled trial DISCOVER.

Authors:  Franziska Sikorski; Hans-Helmut König; Karl Wegscheider; Antonia Zapf; Bernd Löwe; Sebastian Kohlmann
Journal:  Internet Interv       Date:  2021-07-21

2.  Record linkage based patient intersection cardinality for rare disease studies using Mainzelliste and secure multi-party computation.

Authors:  Martin Lablans; Kay Hamacher; Tobias Kussel; Torben Brenner; Galina Tremper; Josef Schepers
Journal:  J Transl Med       Date:  2022-10-08       Impact factor: 8.440

