Christopher M Sauer1,2, Tariq A Dam1, Leo A Celi1,2,3,4,5,6,7,8, Martin Faltys5, Miguel A A de la Hoz2,3,6, Lasith Adhikari7, Kirsten A Ziesemer8, Armand Girbes1, Patrick J Thoral1, Paul Elbers1. 1. Department of Intensive Care Medicine, Laboratory for Critical Care Computational Intelligence (LCCI), Amsterdam Medical Data Science (AMDS), Amsterdam Cardiovascular Science (ACS), Amsterdam Institute for Infection and Immunity (AII), Amsterdam UMC, Vrije Universiteit, Amsterdam, The Netherlands. 2. Laboratory for Computational Physiology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA. 3. Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA. 4. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA. 5. Department of Intensive Care Medicine, University Hospital, University of Bern, Bern, Switzerland. 6. Big Data Department, Fundacion Progreso y Salud, Regional Ministry of Health of Andalucia, Sevilla, Spain. 7. Connected Care and Personal Health, Philips Research North America, Cambridge, MA. 8. University Library, Vrije Universiteit, Amsterdam, The Netherlands.
Abstract
OBJECTIVE: As data science and artificial intelligence continue to gain traction rapidly, the publication of freely available ICU datasets has become invaluable for propelling data-driven clinical research. In this guide for clinicians and researchers, we aim to: 1) systematically search for and identify all publicly available adult clinical ICU datasets, 2) compare their characteristics, data quality, and richness and critically appraise their strengths and weaknesses, and 3) suggest which datasets are appropriate for answering researchers' clinical questions. DATA SOURCES: A systematic search was performed in PubMed, arXiv, medRxiv, and bioRxiv. STUDY SELECTION: We selected all studies that reported on publicly available adult patient-level intensive care datasets. DATA EXTRACTION: A total of four publicly available, adult, critical care, patient-level databases were included (Amsterdam University Medical Center database [AmsterdamUMCdb], eICU Collaborative Research Database [eICU-CRD], High time-resolution intensive care unit dataset [HiRID], and Medical Information Mart for Intensive Care-IV [MIMIC-IV]). Databases were compared using a priori defined categories, including demographics, patient characteristics, and data richness. The study protocol and search strategy were prospectively registered. DATA SYNTHESIS: Four ICU databases fulfilled all criteria for inclusion and were queried using SQL (PostgreSQL version 12; PostgreSQL Global Development Group) and analyzed using R (R Foundation for Statistical Computing, Vienna, Austria). The number of unique patient admissions varied between 23,106 (AmsterdamUMCdb) and 200,859 (eICU-CRD). Frequency of laboratory values and vital signs was highest in HiRID, for example, 5.2 (±3.4) lactate values per day and 29.7 (±10.2) systolic blood pressure values per hour.
Treatment intensity varied: vasopressor and ventilatory support were provided to 69.0% and 83.0% of patients in AmsterdamUMCdb versus 12.0% and 21.0% in eICU-CRD, respectively. ICU mortality ranged from 5.5% in eICU-CRD to 9.9% in AmsterdamUMCdb. CONCLUSIONS: We identified four publicly available adult clinical ICU datasets. Sample size, severity of illness, treatment intensity, and frequency of reported parameters differ markedly between the databases. These differences should guide clinicians and researchers in choosing the databases best suited to answering their clinical questions.
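Frequency figures such as "5.2 (±3.4) lactate values per day" summarize how often a parameter is recorded per admission across a database. The abstract does not state the exact definition, so the following is only a minimal Python sketch of one plausible way to compute such a metric, assuming frequency = number of measurements divided by ICU length of stay expressed in the reporting window (days or hours), then summarized as mean and sample SD across admissions. The record layout, the toy data, and the function names `frequency_per_admission` and `summarize` are hypothetical; the review itself used SQL and R, and its normalization may differ.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical long-format records: (admission_id, parameter name).
# Illustrative toy data only, not drawn from any of the four databases.
measurements = [
    (1, "lactate"), (1, "lactate"), (1, "lactate"), (1, "lactate"),
    (2, "lactate"), (2, "lactate"),
    (3, "lactate"), (3, "lactate"), (3, "lactate"),
]

# Hypothetical ICU length of stay per admission, in hours.
length_of_stay_hours = {1: 24.0, 2: 48.0, 3: 12.0}


def frequency_per_admission(records, los_hours, parameter, window_hours):
    """Return {admission_id: measurements per window} for one parameter.

    Assumption: frequency = count / (length of stay / window length).
    Admissions with no measurement of the parameter get a frequency of 0,
    because we iterate over all admissions in los_hours.
    """
    counts = defaultdict(int)
    for admission_id, name in records:
        if name == parameter:
            counts[admission_id] += 1
    return {
        adm: counts[adm] / (hours / window_hours)
        for adm, hours in los_hours.items()
        if hours > 0  # skip admissions with no recorded stay to avoid division by zero
    }


def summarize(freqs):
    """Return (mean, sample SD) of the per-admission frequencies."""
    values = list(freqs.values())
    return mean(values), stdev(values)


if __name__ == "__main__":
    # window_hours=24.0 gives values per day; 1.0 would give values per hour.
    lactate_per_day = frequency_per_admission(
        measurements, length_of_stay_hours, "lactate", 24.0
    )
    m, sd = summarize(lactate_per_day)
    print(f"{m:.1f} (±{sd:.1f}) lactate values per day")  # 3.7 (±2.5)
```

Normalizing by length of stay, rather than reporting raw counts, keeps long admissions from dominating the comparison. Including admissions that never had the parameter measured (frequency 0) gives a database-level picture rather than one restricted to patients who were sampled.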
Authors: Christopher M Sauer; Tariq A Dam; Leo A Celi; Martin Faltys; Miguel A A de la Hoz; Lasith Adhikari; Kirsten A Ziesemer; Armand Girbes; Patrick J Thoral; Paul Elbers Journal: Crit Care Med Date: 2022-03-02