Christopher Hampf¹, Lars Geidel², Norman Zerbe³, Martin Bialke⁴, Dana Stahl², Arne Blumentritt², Thomas Bahls⁴, Peter Hufnagl³, Wolfgang Hoffmann⁴.
Abstract
BACKGROUND: Identity management is a central component of medical research. Patients are recruited at various sites, which requires an error-tolerant record linkage method to ensure that each patient is registered only once. In large research projects or institutions, identity management has to deal with several thousand or even millions of patients. In such environments, the registration process can lead to long runtimes caused by record linkage. The Central Biomaterial Bank of the Charité (ZeBanC) searched for an identity management solution that can handle millions of patients in large research projects with acceptable performance. The goal of this paper was to simulate the registration of several million patients using the E-PIX service at Charité - Universitätsmedizin Berlin. The E-PIX service was evaluated in terms of runtime, memory requirements, and processor utilization. At least 20 million patients had to be registered. The runtimes for registering patients into databases of various sizes were to be examined, and the maximum number of patients that the E-PIX service can handle was to be determined.
Keywords: Data privacy protection; Data quality; Duplicate detection; Identity management; Patient data; Record linkage
Year: 2020 PMID: 32066455 PMCID: PMC7027209 DOI: 10.1186/s12967-020-02257-4
Source DB: PubMed Journal: J Transl Med ISSN: 1479-5876 Impact factor: 5.531
Specification of the computer system used to carry out the benchmark tests
| Component | Description |
|---|---|
| Processor | Two processors, each with 8 cores and 16 threads (Hyper-Threading), clocked at 2.9 GHz (boost up to 3.8 GHz) |
| Memory | 128 GiB RAM; the E-PIX service, i.e. its Java Virtual Machine (JVM), was limited to 120 GiB. The remaining 8 GiB were reserved for other processes, such as the database and the client that sends requests to the E-PIX service |
| Operating System | Microsoft Windows Server 2012 R2 (64-bit) |
Fig. 1 Communication between client and E-PIX. The client transforms the personally identifiable information (PII) into the SOAP format, sends it to E-PIX for registration, and starts the runtime measurement. After registration, the client receives the MPI from E-PIX, the runtime measurement is stopped, and a new registration can be started.
Fig. 2 Sequential registration of 3 million patients. Duration of registering up to 3 million patients into E-PIX version 2.8.2. The requests were sent sequentially, and the system ran continuously without restarts.
Fig. 3 Sequential registration of 6.5 million patients. Duration of registering up to 6.5 million patients into E-PIX version 2.8.2. The requests were sent sequentially, and the system was restarted after every 500,000 registrations. After some restarts, the runtimes were lower than expected.
Fig. 4 Average duration per new registration into databases of various sizes. Average duration of registering one new patient into an E-PIX version 2.8.2 database that already contains a given number of registered patients. Each runtime shown is the average of 100,000 registrations.
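The measurement procedure of Fig. 1, combined with the averaging over 100,000 registrations used in Fig. 4, can be sketched as a small timing harness. This is a minimal Python sketch under stated assumptions: `to_request` and `send` are hypothetical placeholders for building the SOAP request and calling the E-PIX service (neither interface is specified here), and the clock is started only after the request has been built, following the order described in Fig. 1.

```python
import statistics
import time
from typing import Callable, Iterable


def time_registrations(
    to_request: Callable[[dict], str],  # hypothetical: PII record -> SOAP request
    send: Callable[[str], str],         # hypothetical: request -> MPI from E-PIX
    records: Iterable[dict],
) -> list[float]:
    """Register records strictly one after another and return runtimes in seconds."""
    runtimes = []
    for record in records:
        request = to_request(record)  # transformation into SOAP is not timed
        start = time.perf_counter()   # start measuring once the request is sent
        send(request)                 # blocks until the MPI is returned
        runtimes.append(time.perf_counter() - start)
    return runtimes


def mean_runtime(runtimes: list[float], batch: int = 100_000) -> float:
    """Average runtime over the last `batch` registrations (Fig. 4 uses 100,000)."""
    return statistics.fmean(runtimes[-batch:])


if __name__ == "__main__":
    # Stub service for illustration only; a real client would call E-PIX here.
    fake_records = [{"name": f"patient-{i}"} for i in range(1000)]
    samples = time_registrations(
        to_request=lambda r: f"<register>{r['name']}</register>",
        send=lambda req: "MPI-" + str(hash(req)),
        records=fake_records,
    )
    print(f"mean: {mean_runtime(samples) * 1e3:.4f} ms over {len(samples)} calls")
```

Because each call blocks until the MPI arrives, the harness measures sequential registration as in Figs. 2 and 3; restarting the service every 500,000 registrations (Fig. 3) would happen outside this loop.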
Number of patients who can be registered in 1 day, depending on the size of the existing database
| Already registered patients | Registrations per day |
|---|---|
| 0 | 1,460,594 |
| 1 million | 766,026 |
| 2 million | 496,069 |
| 3 million | 375,129 |
| 4 million | 244,452 |
| 5 million | 190,867 |
| 6 million | 178,675 |
| 10 million | 118,845 |
| 15 million | 76,056 |
| 20 million | 45,094 |
| 25 million | 29,990 |
| 30 million | 16,361 |
As expected, the number of registrations per day decreases reciprocally with the number of already registered patients. The patients were registered sequentially. In the third benchmark, only 100,000 patients were registered per cycle. Consequently, registering them into a database prefilled with 10 million records took less than one day, so the number of registrations per day for that database size was extrapolated. All other numbers are based on the actual number of patient registrations with cyclic restarts.
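To compare this table with the per-registration runtimes of Fig. 4, the daily throughput can be converted into an average time per registration by dividing the 86,400 seconds of a day by the number of registrations. The Python sketch below does this and also shows a simple way to scale a run that finished in under a day up to 24 hours. The function names are illustrative, and the linear scale-up is an assumption: the exact extrapolation procedure used for the 10 million row is not described here.

```python
# Registrations per day from the table above, keyed by the number of
# already registered patients (in millions).
REGISTRATIONS_PER_DAY = {
    0: 1_460_594,
    1: 766_026,
    2: 496_069,
    3: 375_129,
    4: 244_452,
    5: 190_867,
    6: 178_675,
    10: 118_845,
    15: 76_056,
    20: 45_094,
    25: 29_990,
    30: 16_361,
}

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def mean_seconds_per_registration(per_day: int) -> float:
    """Average wall-clock time of one sequential registration, in seconds."""
    return SECONDS_PER_DAY / per_day


def extrapolate_per_day(registrations: int, elapsed_seconds: float) -> int:
    """Assumed linear scale-up of a run shorter than one day to a full day."""
    return round(registrations * SECONDS_PER_DAY / elapsed_seconds)


if __name__ == "__main__":
    for millions, per_day in REGISTRATIONS_PER_DAY.items():
        ms = 1000 * mean_seconds_per_registration(per_day)
        print(f"{millions:>2} million registered: {ms:8.1f} ms per registration")
```

For example, 1,460,594 registrations per day into an empty database correspond to roughly 59 ms per registration, whereas 16,361 per day at 30 million patients correspond to roughly 5.3 s.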
Memory used after a restart of E-PIX version 2.8.2 for various numbers of registered patients
| Registered patients (millions) | Used memory (GiB) |
|---|---|
| 10 | 27.59 |
| 15 | 48.81 |
| 20 | 78.72 |
| 25 | 82.18 |
| 30 | 110.83 |
The relationship is not linear and depends on the remaining available memory.
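The non-linearity becomes visible when computing, for each row, the average memory per registered patient and the additional memory between consecutive database sizes. The following Python sketch does both. Dividing the total used memory by the number of patients is a simplification, since it attributes all memory in use (including caches and other overhead) to patient records.

```python
from __future__ import annotations

# Used memory after a restart (GiB), keyed by registered patients in millions,
# copied from the table above.
USED_MEMORY_GIB = {10: 27.59, 15: 48.81, 20: 78.72, 25: 82.18, 30: 110.83}

BYTES_PER_GIB = 1024 ** 3


def bytes_per_patient(millions: int, gib: float) -> float:
    """Naive average: total used memory divided by the number of patients."""
    return gib * BYTES_PER_GIB / (millions * 1_000_000)


def increments(table: dict[int, float]) -> list[tuple[int, int, float]]:
    """Additional GiB used between consecutive database sizes."""
    sizes = sorted(table)
    return [(a, b, table[b] - table[a]) for a, b in zip(sizes, sizes[1:])]


if __name__ == "__main__":
    for millions, gib in USED_MEMORY_GIB.items():
        print(f"{millions} million: {bytes_per_patient(millions, gib):,.0f} bytes/patient")
    for a, b, extra in increments(USED_MEMORY_GIB):
        print(f"{a} -> {b} million: +{extra:.2f} GiB")
```

With these values, the step from 20 to 25 million patients adds only about 3.5 GiB, while each other 5-million step adds roughly 21 to 30 GiB, and the naive average ranges from roughly 2,960 to 4,230 bytes per patient.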