Algorithmic fairness audits in intensive care medicine: artificial intelligence for all?

Davy van de Sande¹, Jasper van Bommel², Eline Fung Fen Chung², Diederik Gommers², Michel E van Genderen².

Keywords: Artificial intelligence; Bias; Equity; Intensive care

Year: 2022 PMID: 36258241 PMCID: PMC9578232 DOI: 10.1186/s13054-022-04197-5

Source DB: PubMed Journal: Crit Care ISSN: 1364-8535 Impact factor: 19.334

Research on artificial intelligence (AI) has emerged as a promising field with the potential to improve patient outcomes, for example, by optimizing the timing of antibiotic therapy in the intensive care unit (ICU) or through AI-based delirium management, as recently published in this journal [1, 2]. Despite this potential, not all patients may benefit equally from such advancements; ‘unfair’ or ‘unequal’ AI algorithms could reinforce systemic health disparities. For example, a recent study demonstrated that an AI algorithm consistently underdiagnosed chest X-ray pathologies in black and female patients [3]. In fact, even well-established ICU prediction models could be unfair. During the COVID-19 pandemic, Sequential Organ Failure Assessment (SOFA)-based allocation of ICU resources was shown to be racially inequitable and could have induced disparities [4]. These results stress that future AI-based ICU interventions and policies in particular should be fair and have a similar impact on all patients involved, irrespective of gender, ethnicity, and other protected personal characteristics, as recently stated by the World Health Organization (WHO) [5].

One of the reasons AI research has skyrocketed in intensive care medicine [6] is the availability of large public datasets, such as the Medical Information Mart for Intensive Care (MIMIC) [7]. These data are often collected at a single site and as such could underrepresent different subpopulations across different ICUs [8]. To illustrate, less than 10% (18,719/189,415) of the patients registered in the two largest ICU databases worldwide are African-American, while the vast majority are white male patients [8]. Given the serious consequences of unequal algorithms that could arise from such biased data [9], and the fact that several methods exist to mitigate such biases [10], it seems clear that an ‘algorithmic fairness audit’ should be part of the development and implementation process. Such an audit should facilitate the evaluation and reporting of an AI algorithm’s performance on specific subpopulations instead of only on the total population, which is the current standard (Fig. 1).
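
As a rough illustration of reporting performance per subpopulation rather than only for the total population, the sketch below computes discrimination (AUROC) overall and per group. It is a minimal example under stated assumptions, not the authors' method: the function names `auroc` and `audit_by_subgroup`, the group labels "A" and "B", and the toy data are all hypothetical.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC via the rank-based (Mann-Whitney) formulation.

    Returns None when either class is absent, because the metric
    is undefined in that case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def audit_by_subgroup(labels, scores, groups):
    """Report sample size and AUROC overall and for each subgroup."""
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    report = {"overall": {"n": len(labels), "auroc": auroc(labels, scores)}}
    for g, (ys, ss) in by_group.items():
        report[g] = {"n": len(ys), "auroc": auroc(ys, ss)}
    return report


# Hypothetical toy data: outcome labels, predicted risks, subgroup labels.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.3, 0.6, 0.7, 0.4, 0.5]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

report = audit_by_subgroup(labels, scores, groups)
for name, row in report.items():
    print(name, row)
```

In this toy example the overall AUROC looks acceptable while group "B" performs far worse than group "A", which is the kind of gap that a total-population evaluation alone would hide.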

Fig. 1

Schematic overview of the intensive care medicine artificial intelligence fairness audit. Conventional clinical patient data (e.g., vital signs, laboratory values, and demographics) are typically used to train an AI algorithm, and its performance is then evaluated on an internal or external test dataset to see whether it works in the first place. Next, the fairness audit should take place: evaluate model performance across multiple subpopulations (for example, based on ethnicity, age, gender, or other characteristics). If concerns regarding algorithmic fairness arise, re-training and/or re-calibration should be considered (go/no-go). *Protected personal characteristics such as ethnicity, socioeconomic information, and others need to be collected in patient health records. AI = artificial intelligence

Although we acknowledge the complexity of algorithmic fairness, several practical steps could help to prevent unequal algorithms from making their way to the bedside of ICU patients. We outline three of them here.

Firstly, a common understanding of the protected personal characteristics (e.g., age, gender, and ethnicity) that should, at minimum, be obtained is crucial to adequately design and perform fairness audits. The real question here is: to which protected personal characteristics should an AI algorithm definitely be fair? In answering this question, we must account for historical (racial) and societal disparities [11] and intensify dialogue between key stakeholders (data protection authorities, editorial teams, patients, ICU professionals, and ethical review boards). In addition, disease manifestation and comorbidity may differ between ethnic groups; for example, multimorbidity is more common among African-American patients than among white patients [12]. With the above in mind, a list of protected personal characteristics should be composed so that fairness audits can be performed and reported uniformly.
Secondly, and based on the former, relevant protected personal characteristics need to be routinely and uniformly collected in patient health records, worldwide. For example, ethnicity and socioeconomic information are typically protected under human rights codes but are unavailable in most ICUs outside of the USA, while age and gender are widely available [8]. In practical terms, this means we have to define specific subpopulations (e.g., define ethnic groups), train healthcare professionals, standardize data collection, and potentially adjust local policies, among others. Several recommendations could already help to collect such information [13], such as implementing standardized collection forms in regular health checkups within primary care, linking data from primary and secondary care, implementing strict terms for the use of such data, and periodically evaluating data quality and completeness. Also, several lessons can be learned from existing examples such as the UK, where ethnicity data are already routinely recorded in patient health records.

Lastly, we need to determine which metrics should be used to assess fairness: are standard AI performance metrics (discrimination and calibration) sufficient, or do we need fairness-specific metrics? There is a wealth of metrics that can be used to assess whether treatments or predictions are equally divided over individuals or protected patient groups on multiple levels (e.g., are true positives and false positives equally distributed over protected and unprotected groups? Are the false negative and false positive ratios the same between protected and unprotected groups? Do patients from protected and unprotected groups with the same risk prediction have the same probability of correctly belonging to the positive class?) [10]. The most appropriate metric depends mainly on the context of the clinical problem; there is no one size that fits all [14].
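
To make the group-level questions above concrete, the following sketch compares true positive rate, false positive rate, and positive predictive value between groups at a fixed decision threshold. It is only an illustration under assumptions: the 0.5 threshold, the function names `confusion_rates` and `fairness_gaps`, the group labels, and the toy data are hypothetical, and which gap matters depends on the clinical context.

```python
def confusion_rates(labels, preds):
    """True positive rate, false positive rate, and positive predictive value."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)

    def ratio(num, den):
        return num / den if den else None  # undefined with an empty denominator

    return {
        "tpr": ratio(tp, tp + fn),
        "fpr": ratio(fp, fp + tn),
        "ppv": ratio(tp, tp + fp),
    }


def fairness_gaps(labels, scores, groups, threshold=0.5):
    """Per-group rates plus the largest between-group difference per metric."""
    preds = [1 if s >= threshold else 0 for s in scores]
    per_group = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        per_group[g] = confusion_rates(
            [labels[i] for i in idx], [preds[i] for i in idx]
        )
    gaps = {}
    for metric in ("tpr", "fpr", "ppv"):
        vals = [r[metric] for r in per_group.values() if r[metric] is not None]
        gaps[metric] = max(vals) - min(vals) if len(vals) >= 2 else None
    return per_group, gaps


# Hypothetical toy data for two groups.
labels = [1, 1, 0, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.2, 0.6, 0.7, 0.3, 0.1, 0.4]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group, gaps = fairness_gaps(labels, scores, groups)
print(per_group)
print(gaps)
```

Equal true and false positive rates across groups correspond to what the fairness literature commonly calls equalized odds, while equal positive predictive value corresponds to predictive parity; these criteria generally cannot all be satisfied at once, which is one reason the choice of metric is context dependent.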
As a starting point, an AI algorithm’s discrimination and calibration should be evaluated on various subpopulations before making the step toward clinical implementation. Depending on the context, additional fairness-specific metrics should also be determined. To improve algorithmic fairness, we therefore advocate for a standard fairness audit based on readily available data (age and gender) when developing and implementing AI algorithms in the ICU. Parallel to this, protected personal characteristics should be identified and collected so that fairness outcomes can be evaluated thoroughly on multiple aspects in the future.

Also, as the maturity of AI in intensive care medicine is expected to shift in the upcoming years from development to clinical implementation, (unforeseen) ethical considerations become increasingly important [15]. An AI fairness audit should be part of a larger set of ethical considerations to ensure safe and fair usage of AI in the ICU field. We are currently composing such a set based on the WHO guidance on AI ethics [5] (PROSPERO database ID: CRD42022347871).
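
The subgroup calibration check and the go/no-go decision described above could be sketched roughly as follows. This is a simplified illustration, not a validated procedure: it uses the observed-to-expected event ratio as a coarse calibration summary, and the 0.2 tolerance, the function names `calibration_by_group` and `go_no_go`, and the toy data are hypothetical assumptions.

```python
def calibration_by_group(labels, risks, groups):
    """Observed event rate, mean predicted risk, and their ratio per group.

    An observed/expected (O/E) ratio near 1 suggests calibration-in-the-large
    holds for that group; it does not assess calibration across risk strata.
    """
    out = {}
    for g in sorted(set(groups)):
        ys = [y for y, x in zip(labels, groups) if x == g]
        rs = [r for r, x in zip(risks, groups) if x == g]
        observed = sum(ys) / len(ys)
        expected = sum(rs) / len(rs)
        out[g] = {
            "observed": observed,
            "expected": expected,
            "oe_ratio": observed / expected if expected > 0 else None,
        }
    return out


def go_no_go(calibration, tolerance=0.2):
    """Flag groups whose O/E ratio deviates from 1 by more than the tolerance.

    The tolerance is a hypothetical value chosen for illustration only.
    """
    flagged = [
        g
        for g, c in calibration.items()
        if c["oe_ratio"] is None or abs(c["oe_ratio"] - 1.0) > tolerance
    ]
    return ("no-go" if flagged else "go"), flagged


# Hypothetical toy data: the model underestimates risk in group "B".
labels = [1, 0, 1, 0, 1, 1, 1, 0]
risks = [0.5, 0.5, 0.5, 0.5, 0.25, 0.25, 0.5, 0.5]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

calibration = calibration_by_group(labels, risks, groups)
decision, flagged = go_no_go(calibration)
print(calibration)
print(decision, flagged)
```

A "no-go" result here would only indicate that re-training or re-calibration should be considered for the flagged groups; with small subgroups, such ratios are noisy and would need confidence intervals before any decision is made.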

13 in total

1. Racial Disparities in ICU Outcomes: A Systematic Review.

Authors: Samuel K McGowan; Kalli A Sarigiannis; Samuel C Fox; Michael A Gottlieb; Elaine Chen
Journal: Crit Care Med Date: 2022-01-01 Impact factor: 7.598

2. Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset.

Authors: Chuizheng Meng; Loc Trinh; Nan Xu; James Enouen; Yan Liu
Journal: Sci Rep Date: 2022-05-03 Impact factor: 4.996

3. Strategies to record and use ethnicity information in routine health data.

Authors: Ash Routen; Ashley Akbari; Amitava Banerjee; Srinivasa Vittal Katikireddi; Rohini Mathur; Martin McKee; Vahe Nafilyan; Kamlesh Khunti
Journal: Nat Med Date: 2022-07 Impact factor: 87.241

4. The future of intensive care: delirium should no longer be an issue. (Review)

Authors: Katarzyna Kotfis; Irene van Diem-Zaal; Shawniqua Williams Roberson; Mark van den Boogaard; Yahya Shehabi; E Wesley Ely; Marek Sietnicki
Journal: Crit Care Date: 2022-07-05 Impact factor: 19.334

5. Moving from bytes to bedside: a systematic review on the use of artificial intelligence in the intensive care unit.

Authors: Davy van de Sande; Michel E van Genderen; Joost Huiskens; Diederik Gommers; Jasper van Bommel
Journal: Intensive Care Med Date: 2021-06-05 Impact factor: 17.440

6. Equitably Allocating Resources during Crises: Racial Differences in Mortality Prediction Models.

Authors: Deepshikha Charan Ashana; George L Anesi; Vincent X Liu; Gabriel J Escobar; Christopher Chesley; Nwamaka D Eneanya; Gary E Weissman; William Dwight Miller; Michael O Harhay; Scott D Halpern
Journal: Am J Respir Crit Care Med Date: 2021-07-15 Impact factor: 30.528

7. Timing of antibiotic therapy in the ICU. (Review)

Authors: Marin H Kollef; Andrew F Shorr; Matteo Bassetti; Jean-Francois Timsit; Scott T Micek; Andrew P Michelson; Jose Garnacho-Montero
Journal: Crit Care Date: 2021-10-15 Impact factor: 9.097

8. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations.

Authors: Laleh Seyyed-Kalantari; Haoran Zhang; Matthew B A McDermott; Irene Y Chen; Marzyeh Ghassemi
Journal: Nat Med Date: 2021-12-10 Impact factor: 87.241

9. MIMIC-III, a freely accessible critical care database.

Authors: Alistair E W Johnson; Tom J Pollard; Lu Shen; Li-Wei H Lehman; Mengling Feng; Mohammad Ghassemi; Benjamin Moody; Peter Szolovits; Leo Anthony Celi; Roger G Mark
Journal: Sci Data Date: 2016-05-24 Impact factor: 6.444

10. Examining multimorbidity differences across racial groups: a network analysis of electronic medical records.

Authors: Pankush Kalgotra; Ramesh Sharda; Julie M Croff
Journal: Sci Rep Date: 2020-08-11 Impact factor: 4.379