Rahuldeb Sarkar1,2, Christopher Martin3,4, Heather Mattie5, Judy Wawira Gichoya6, David J Stone7, Leo Anthony Celi8,9,5.
1. Departments of Respiratory Medicine and Critical Care, Medway NHS Foundation Trust, Gillingham, Kent, UK.
2. Faculty of Life Sciences, King's College London, London, UK.
3. UCL Institute for Health Informatics, London, UK.
4. Crystallise Ltd, UK.
5. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA, 02115.
6. Interventional Radiology & Informatics, Department of Radiology & Imaging Sciences, Emory University, 1364 Clifton Rd NE Suite AG08 Atlanta, GA 30322.
7. Departments of Anesthesiology and Neurosurgery, and the Center for Advanced Medical Analytics, University of Virginia School of Medicine, Charlottesville, VA, 22908.
8. Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA 02139.
9. Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA 02215.
Abstract
BACKGROUND: Despite wide utilisation of severity scoring systems for case-mix determination and benchmarking in the intensive care unit (ICU), the possibility of scoring bias across ethnicities has not been examined. Recent guidelines on the use of illness severity scores to inform triage decisions for allocation of scarce resources, such as mechanical ventilation during the current COVID-19 pandemic, warrant examination for possible bias in these models. We investigated the performance of three severity scoring systems (APACHE IVa, OASIS, SOFA) across ethnic groups in two large ICU databases in order to identify possible ethnicity-based bias.

METHOD: Data from the eICU Collaborative Research Database and the Medical Information Mart for Intensive Care were analysed for score performance in Asians, African Americans, Hispanics and Whites after appropriate exclusions. Discrimination and calibration were determined for all three scoring systems in all four groups.

FINDINGS: While measurements of discrimination, by area under the receiver operating characteristic curve (AUROC), were significantly different among the groups, they did not display any discernible systematic patterns of bias. In contrast, measurements of calibration, by standardised mortality ratio (SMR), indicated persistent, and in some cases significant, patterns of difference between Hispanics and African Americans versus Asians and Whites. The differences between African Americans and Whites were consistently statistically significant. While calibrations were imperfect for all groups, the scores consistently demonstrated a pattern of over-predicting mortality for African Americans and Hispanics.

INTERPRETATION: The systematic differences in calibration across ethnic groups suggest that illness severity scores reflect bias in their predictions of mortality.

FUNDING: LAC is funded by the National Institutes of Health through NIBIB R01 EB017205. There was no specific funding for this study.
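The two performance measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: it assumes each ICU stay is reduced to a hypothetical `(group, died, predicted_probability)` tuple, takes the SMR as observed deaths divided by the sum of predicted mortality probabilities, and computes the AUROC by direct pairwise comparison. The function names and the toy records are illustrative only and are not data from either database.

```python
from collections import defaultdict


def auroc(labels, scores):
    """Discrimination: chance that a random death is scored above a random
    survivor, counting ties as one half."""
    deaths = [s for y, s in zip(labels, scores) if y == 1]
    survivors = [s for y, s in zip(labels, scores) if y == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(deaths) * len(survivors))


def smr(labels, predicted):
    """Calibration: observed deaths divided by expected deaths.
    A value below 1 means the score over-predicts mortality."""
    expected = sum(predicted)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return sum(labels) / expected


def performance_by_group(records):
    """records: iterable of (group, died, predicted_probability) tuples."""
    labels = defaultdict(list)
    probs = defaultdict(list)
    for group, died, prob in records:
        labels[group].append(died)
        probs[group].append(prob)
    return {
        g: {"auroc": auroc(labels[g], probs[g]), "smr": smr(labels[g], probs[g])}
        for g in labels
    }


# Toy records with hypothetical group labels (not study data).
toy_records = [
    ("group_a", 1, 0.8), ("group_a", 0, 0.2),
    ("group_a", 0, 0.5), ("group_a", 1, 0.5),
    ("group_b", 1, 0.9), ("group_b", 0, 0.6),
    ("group_b", 0, 0.5), ("group_b", 0, 0.4),
]

if __name__ == "__main__":
    for group, metrics in performance_by_group(toy_records).items():
        print(group, metrics)
```

In this toy example `group_b` has perfect discrimination but an SMR well below 1, which shows how a score can rank patients correctly within a group while still over-predicting that group's mortality.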