Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset.

Chuizheng Meng, Loc Trinh, Nan Xu, James Enouen, Yan Liu

Abstract

The recent release of large-scale healthcare datasets has greatly propelled research on data-driven deep learning models for healthcare applications. However, because such deep models are black boxes, concerns about interpretability, fairness, and bias in healthcare scenarios where human lives are at stake call for a careful and thorough examination of both datasets and models. In this work, we focus on MIMIC-IV (Medical Information Mart for Intensive Care, version IV), the largest publicly available healthcare dataset, and conduct comprehensive analyses of interpretability, dataset representation bias, and prediction fairness of deep learning models for in-hospital mortality prediction. First, we analyze the interpretability of deep learning mortality prediction models and observe that (1) the best-performing interpretability method successfully identifies critical features for mortality prediction across various prediction models and also recognizes important features that domain knowledge does not consider; (2) prediction models rely on demographic features, raising fairness concerns. We therefore evaluate the fairness of the models and indeed observe unfairness: (1) there is disparate treatment in prescribing mechanical ventilation among patient groups across ethnicity, gender, and age; (2) models often rely on racial attributes unequally across subgroups to generate their predictions. We further draw concrete connections between interpretability methods and fairness metrics by showing how feature importance from interpretability methods can help quantify potential disparities in mortality predictors. Our analysis demonstrates that prediction performance is not the only factor to consider when evaluating models for healthcare applications, since high prediction performance may result from unfair use of demographic features.
Our findings suggest that future research on AI models for healthcare applications can benefit from adopting this interpretability-and-fairness analysis workflow and from verifying whether models achieve superior performance at the cost of introducing bias.
© 2022. The Author(s).
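The abstract's central idea of quantifying disparities across patient subgroups can be illustrated with a small sketch. This is not the authors' code, and the data below is hypothetical; it shows one common group-fairness measure (the demographic parity gap, i.e. the spread in positive-prediction rates across subgroups) of the kind such an analysis workflow might compute.

```python
# Illustrative sketch (hypothetical data, not the paper's implementation):
# measure how unevenly a binary classifier issues positive predictions
# across patient subgroups, e.g. ethnicity or gender groups.

def positive_rate(preds):
    """Fraction of positive (1) predictions within one group."""
    return sum(preds) / len(preds)

def demographic_parity_gap(preds_by_group):
    """Largest gap in positive-prediction rates across all groups.

    A gap of 0 means every group receives positive predictions at the
    same rate; larger gaps indicate potential disparate treatment.
    """
    rates = [positive_rate(p) for p in preds_by_group.values()]
    return max(rates) - min(rates)

# Hypothetical binary mortality predictions, split by subgroup.
preds = {
    "group_a": [1, 0, 1, 1, 0, 1, 0, 1],  # rate 5/8 = 0.625
    "group_b": [0, 0, 1, 0, 0, 1, 0, 0],  # rate 2/8 = 0.250
}
print(f"demographic parity gap: {demographic_parity_gap(preds):.3f}")
# -> demographic parity gap: 0.375
```

In the same spirit, per-subgroup feature-importance scores from an interpretability method could be compared group-against-group to flag features (such as racial attributes) that a model relies on unequally.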

Year:  2022        PMID: 35504931      PMCID: PMC9065125          DOI: 10.1038/s41598-022-11012-2

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.996


References: 20 in total

1.  PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals.

Authors:  A L Goldberger; L A Amaral; L Glass; J M Hausdorff; P C Ivanov; R G Mark; J E Mietus; G B Moody; C K Peng; H E Stanley
Journal:  Circulation       Date:  2000-06-13       Impact factor: 29.690

2.  The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine.

Authors:  J L Vincent; R Moreno; J Takala; S Willatts; A De Mendonça; H Bruining; C K Reinhart; P M Suter; L G Thijs
Journal:  Intensive Care Med       Date:  1996-07       Impact factor: 17.440

3.  Can AI Help Reduce Disparities in General Medical and Mental Health Care?

Authors:  Irene Y Chen; Peter Szolovits; Marzyeh Ghassemi
Journal:  AMA J Ethics       Date:  2019-02-01

4.  Benchmarking deep learning models on large healthcare datasets.

Authors:  Sanjay Purushotham; Chuizheng Meng; Zhengping Che; Yan Liu
Journal:  J Biomed Inform       Date:  2018-06-05       Impact factor: 6.317

5.  A new severity of illness scale using a subset of Acute Physiology And Chronic Health Evaluation data elements shows comparable predictive accuracy.

Authors:  Alistair E W Johnson; Andrew A Kramer; Gari D Clifford
Journal:  Crit Care Med       Date:  2013-07       Impact factor: 7.598

6.  Evaluating the Visualization of What a Deep Neural Network Has Learned.

Authors:  Wojciech Samek; Alexander Binder; Grégoire Montavon; Sebastian Lapuschkin; Klaus-Robert Müller
Journal:  IEEE Trans Neural Netw Learn Syst       Date:  2017-11       Impact factor: 10.451

7.  Minimax Pareto Fairness: A Multi Objective Perspective.

Authors:  Natalia Martinez; Martin Bertran; Guillermo Sapiro
Journal:  Proc Mach Learn Res       Date:  2020-07

8.  The Influence of Race/Ethnicity and Education on Family Ratings of the Quality of Dying in the ICU.

Authors:  Janet J Lee; Ann C Long; J Randall Curtis; Ruth A Engelberg
Journal:  J Pain Symptom Manage       Date:  2015-09-16       Impact factor: 3.612

9.  The APACHE III prognostic system. Risk prediction of hospital mortality for critically ill hospitalized adults.

Authors:  W A Knaus; D P Wagner; E A Draper; J E Zimmerman; M Bergner; P G Bastos; C A Sirio; D J Murphy; T Lotring; A Damiano
Journal:  Chest       Date:  1991-12       Impact factor: 9.410

10.  There Is Hope After All: Quantifying Opinion and Trustworthiness in Neural Networks.

Authors:  Mingxi Cheng; Shahin Nazarian; Paul Bogdan
Journal:  Front Artif Intell       Date:  2020-07-31
Cited by: 2 in total

1.  Integrated multimodal artificial intelligence framework for healthcare applications.

Authors:  Luis R Soenksen; Yu Ma; Cynthia Zeng; Leonard Boussioux; Kimberly Villalobos Carballo; Liangyuan Na; Holly M Wiberg; Michael L Li; Ignacio Fuentes; Dimitris Bertsimas
Journal:  NPJ Digit Med       Date:  2022-09-20

2.  Algorithmic fairness audits in intensive care medicine: artificial intelligence for all?

Authors:  Davy van de Sande; Jasper van Bommel; Eline Fung Fen Chung; Diederik Gommers; Michel E van Genderen
Journal:  Crit Care       Date:  2022-10-18       Impact factor: 19.334

