Patrick J Thoral, Jan M Peppink, Ronald H Driessen, Eric J G Sijbrands, Erwin J O Kompanje, Lewis Kaplan, Heatherlee Bailey, Jozef Kesecioglu, Maurizio Cecconi, Matthew Churpek, Gilles Clermont, Mihaela van der Schaar, Ari Ercole, Armand R J Girbes, Paul W G Elbers.
Abstract
OBJECTIVES: Critical care medicine is a natural environment for machine learning approaches to improve outcomes for critically ill patients, as admissions to ICUs generate vast amounts of data. However, technical, legal, ethical, and privacy concerns have so far kept the critical care medicine community from making these data readily available. The Society of Critical Care Medicine and the European Society of Intensive Care Medicine have identified ICU patient data sharing as one of the priorities under their Joint Data Science Collaboration. To encourage ICUs worldwide to share their patient data responsibly, we now describe the development and release of the Amsterdam University Medical Centers Database (AmsterdamUMCdb), the first freely available critical care database in full compliance with privacy laws from both the United States and Europe, as an example of the feasibility of sharing complex critical care data.
Year: 2021 PMID: 33625129 PMCID: PMC8132908 DOI: 10.1097/CCM.0000000000004916
Source DB: PubMed Journal: Crit Care Med ISSN: 0090-3493 Impact factor: 9.296
Figure 1. An overview of the process to create AmsterdamUMCdb from different source databases (A). The anonymization threshold separates personal data from anonymous data (B). The applied risk-based deidentification strategy, demonstrating the iterative nature of performing deidentification (C). Final table structure depicting the relations with the admissions table (D). Capitalized words in the tables refer to the data types used: INTEGER (whole number), SMALLINT (small-range integer), BIGINT (large-range integer), FLOAT (floating-point number), or VARCHAR (variable-size character data). DBs = databases, EHR = electronic health record, GLIMS = General Laboratory Information Management System, ID = identifier, LIS = Laboratory Information System, PDMS = Patient Data Management System, PSS = patient scoring system.
Characteristics of Patients and Data in Amsterdam University Medical Centers Database (AmsterdamUMCdb)
| Characteristics | Total | ICU | Medium Care Unit (High-Dependency Unit) |
|---|---|---|---|
| Distinct patients, n | 20,109 | 16,518 | 4,295 |
| ICU admissions, n | 23,106 | 18,386 | 4,720 |
| ICU length of stay, d, median (IQR) | 1.08 (0.83–3.67) | 1.25 (0.92–4.71) | 0.83 (0.71–1.62) |
| Gender | | | |
| Male, n (%) | 12,799 (63.65) | 10,565 (63.96) | 2,234 (52.01) |
| Age, yr, n (%) | | | |
| 18–39 | 2,202 (10.95) | 1,538 (9.31) | 743 (17.30) |
| 40–49 | 1,897 (9.43) | 1,356 (8.21) | 613 (14.27) |
| 50–59 | 3,405 (16.93) | 2,740 (16.59) | 800 (18.63) |
| 60–69 | 5,272 (26.22) | 4,518 (27.35) | 954 (22.21) |
| 70–79 | 5,293 (26.32) | 4,635 (28.06) | 824 (19.19) |
| 80+ | 2,040 (10.14) | 1,731 (10.48) | 361 (8.41) |
| Admission year, n (%) | | | |
| 2003–2009 | 8,556 (42.55) | 7,940 (48.07) | 809 (18.84) |
| 2010–2016 | 11,553 (57.45) | 8,578 (51.93) | 3,486 (81.16) |
| Admission type, n (%) | | | |
| Surgical admissions | 11,294 (48.88) | 8,942 (48.63) | 2,352 (49.83) |
| Urgent admissions | 6,246 (27.03) | 4,985 (27.11) | 1,261 (26.72) |
| Reason for admission, n (%) | | | |
| Cardiothoracic surgery | 5,935 (25.69) | 5,759 (31.32) | 176 (3.73) |
| Sepsis | 3,136 (13.57) | 2,751 (14.96) | 385 (8.16) |
| Respiratory failure | 1,568 (6.79) | 1,402 (7.63) | 166 (3.52) |
| Neurosurgery | 1,619 (7.01) | 739 (4.02) | 880 (18.64) |
| Trauma | 902 (3.90) | 613 (3.33) | 289 (6.12) |
| Gastrointestinal surgery | 1,149 (4.97) | 800 (4.35) | 349 (7.39) |
| Vascular surgery | 1,037 (4.49) | 791 (4.30) | 246 (5.21) |
| Cardiac arrest | 959 (4.15) | 958 (5.21) | 1 (0.02) |
| Neurologic disorders (nontraumatic) | 628 (2.72) | 475 (2.58) | 153 (3.24) |
| Cardiac disorders (including cardiogenic shock) | 538 (2.33) | 485 (2.64) | 53 (1.12) |
| Supportive therapies, n (%) | | | |
| Vasopressors and/or inotropes | 13,575 (58.75) | 12,809 (69.67) | 766 (16.23) |
| Mechanical ventilation | 16,680 (72.19) | 16,305 (88.68) | 375 (7.94)ᵃ |
| Renal replacement therapy | 1,140 (4.93) | 1,136 (6.18) | 4 (0.08) |
| Outcome, n (%) | | | |
| Death at unit discharge | 2,288 (9.90) | 2,216 (12.05) | 72 (1.53) |
| Death < 1 yr after discharge | 4,730 (20.47) | 4,002 (21.77) | 728 (15.42) |
| Severity scores | | | |
| Urgent patients | | | |
| APACHE II score, median (IQR) | 19 (13–26) | 21 (16–27) | 12 (8–16) |
| SOFA score (day 1), median (IQR) | 7 (4–10) | 8 (5–10) | 2 (1–4) |
| Elective patients | | | |
| APACHE II score, median (IQR) | 16 (12–20) | 17 (14–21) | 11 (8–15) |
| SOFA score (day 1), median (IQR) | 6 (4–8) | 6 (4–8) | 2 (1–4) |
APACHE = Acute Physiology And Chronic Health Evaluation, IQR = interquartile range, SOFA = Sequential Organ Failure Assessment.
ᵃ Noninvasive ventilation.
For conciseness, only major categories of reason for admission are shown. Reasons for admission documented without diagnostic codes (i.e., full text only) were excluded from the analysis. APACHE II score ranges from 0 to 71; higher scores indicate greater severity of illness. SOFA score ranges from 0 to 24; higher scores indicate greater severity of illness.
Assumed Background Knowledge and Assessment of Reidentification Risk After Risk-Based Deidentification
| Hypothetical Adversary | Friendly Researcher | Rogue Researcher | Rogue Insurance Company |
|---|---|---|---|
| Assumed background knowledge | | | |
| Gender | X | X | X |
| Age | X | X | X |
| Weight | X | X | |
| Height | X | X | |
| Admission date | X | X | X |
| Survival at discharge | X | X | X |
| Number of ICU admissions | X | | |
| Assessment of reidentification risk | | | |
| P(access) | 1.00 | 1.00 | 0.27 |
| P(intention) | 0.20 | 0.10 | 0.10 |
| Average risk | | | |
| P(reidentification) | 0.047 | 0.047 | 0.009 |
| | 89 | 89 | 682 |
| | 26 | 26 | 65 |
| Maximum risk | | | |
| P(reidentification) | 0.50 | 0.50 | 0.50 |
| | 2 | 2 | 2 |
| | 2 | 2 | 2 |
| P final risk | 0.01 | 0.05 | 0.0002 |
Strict average risk was used to determine the final risk for the “friendly researcher” and the “rogue insurance company,” whereas maximum risk was used for the “rogue researcher.” For the “friendly researcher,” P(intention) is the acquaintance risk, i.e., the risk of knowing somebody in the database.
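The record does not state how the final risk is derived from the other rows. As a sanity check, the sketch below assumes the final risk is the product P(access) × P(intention) × P(reidentification), using the average or maximum reidentification risk as described in the note above. The function name `final_risk` and the dictionary layout are hypothetical; only the numeric inputs come from the table.

```python
# Inputs copied from the reidentification-risk table.
# ASSUMPTION: final risk = P(access) * P(intention) * P(reidentification).
# The source gives only the tabulated values, not this formula.
adversaries = {
    # Average reidentification risk is used for this adversary.
    "friendly researcher": {"p_access": 1.00, "p_intention": 0.20, "p_reid": 0.047},
    # Maximum reidentification risk is used for this adversary.
    "rogue researcher": {"p_access": 1.00, "p_intention": 0.10, "p_reid": 0.50},
    # Average reidentification risk is used for this adversary.
    "rogue insurance company": {"p_access": 0.27, "p_intention": 0.10, "p_reid": 0.009},
}


def final_risk(p_access: float, p_intention: float, p_reid: float) -> float:
    """Return the product of the three probabilities (assumed formula)."""
    return p_access * p_intention * p_reid


for name, probs in adversaries.items():
    print(f"{name}: {final_risk(**probs):.6f}")
```

Under this assumption the products are 0.0094, 0.05, and 0.000243, which round to the tabulated final risks of 0.01, 0.05, and 0.0002. The agreement is consistent with a multiplicative model but does not confirm that it is the authors' exact method.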