Hossein Estiri1,2, Zachary H Strasser1,2, Sina Rashidian3,4, Jeffrey G Klann1,2,5, Kavishwar B Wagholikar1,2, Thomas H McCoy6, Shawn N Murphy1,5,7,8.
Abstract
OBJECTIVE: The increasing translation of artificial intelligence (AI)/machine learning (ML) models into clinical practice brings an increased risk of direct harm from modeling bias; however, bias remains incompletely measured in many medical AI applications. This article aims to provide a framework for objective evaluation of medical AI from multiple aspects, focusing on binary classification models.
Keywords: COVID-19; bias; electronic health records; medical AI; predictive model
Year: 2022; PMID: 35511151; PMCID: PMC9277645; DOI: 10.1093/jamia/ocac070
Source DB: PubMed; Journal: J Am Med Inform Assoc; ISSN: 1067-5027; Impact factor: 7.942
Figure 1. Generating bias metrics from MLHO models using EHR data from retrospective and prospective COVID-19 cohorts. *The dot plot is a schematic of the COVID-19 patient population over time. MLHO was applied to EHR data from the retrospective cohort to develop predictive models and produce bias metrics. Prospective bias metrics were generated by applying the retrospective predictive models to prospective cohorts.
Figure 2. Changes in the 2 model-level metrics for discrimination (AUROC, left panels) and error (Brier score, right panel) by group and over time. *The top-10 models for each outcome are broken down by race, ethnicity, gender, and over time.
Figure 3. Comparing model-level performance metrics using the Wilcoxon rank-sum test. *A color-coded cell indicates some type of bias compared to the overall model. −−− and −− represent values significantly smaller than the overall model (at P < .001 and P < .01, respectively). +++ and ++ represent values significantly larger than the overall model (at P < .001 and P < .01, respectively). **Discrimination power and error are opposing measures: better discrimination means smaller error.
Figure 4. The diagnostic reliability (calibration) diagrams for each outcome, broken down by group and temporal direction. *The background lines represent reliability curves from each of the top models selected for prospective evaluation.
Figure 5. Mean absolute error of predictions across patient age.
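The model-level comparisons named in the captions for Figures 2 and 3 (AUROC, Brier score, and a Wilcoxon rank-sum test between a subgroup and the overall model) can be illustrated with a minimal, standard-library sketch. This is not the MLHO implementation: the function names, the `(group, label, probability)` record layout, and the normal approximation without tie correction are all assumptions made for illustration.

```python
from collections import defaultdict
from math import erfc, sqrt


def _midranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def auroc(y_true, y_prob):
    """Discrimination: AUROC via the rank (Mann-Whitney U) formulation."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both classes present")
    ranks = _midranks(y_prob)
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def brier(y_true, y_prob):
    """Error: mean squared difference between probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def metrics_by_group(records):
    """records: iterable of (group, label, probability) tuples (assumed layout)."""
    buckets = defaultdict(lambda: ([], []))
    for group, y, p in records:
        buckets[group][0].append(y)
        buckets[group][1].append(p)
    return {g: (auroc(ys, ps), brier(ys, ps)) for g, (ys, ps) in buckets.items()}


def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = _midranks(list(a) + list(b))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return erfc(abs((w - mean) / sd) / sqrt(2))
```

Under these assumptions, one would compute each metric per model for a subgroup and for the overall population, then pass the two lists of per-model values to `rank_sum_p`; a small p-value with a lower subgroup AUROC or a higher subgroup Brier score would correspond to a flagged cell in a comparison like Figure 3.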