Crossing the chasm from model performance to clinical impact: the need to improve implementation and evaluation of AI.

Jayson S Marwaha, Joseph C Kvedar.

Year: 2022 PMID: 35241790 PMCID: PMC8894388 DOI: 10.1038/s41746-022-00572-2

Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352

Artificial intelligence (AI) has attracted considerable interest for many years because of its potential to improve clinical care, yet its actual impact on patient outcomes when deployed in clinical settings remains largely unknown. In a recent systematic review, Zhou et al.[1] show, perhaps surprisingly, that this impact has so far been limited. They reviewed 65 randomized controlled trials (RCTs) evaluating AI-based clinical interventions and found that in nearly 40% of studies, AI prediction tools offered no clinical benefit over the standard of care. Among the subset of trials the authors judged to be at low risk of bias, deep learning (DL) predictive models offered only minimal clinical benefit over traditional statistical (TS) risk calculators, and machine learning (ML) models offered no benefit over TS tools. Somewhat counterintuitively, most of the AI tools in these trials had an excellent area under the receiver operating characteristic curve (AUROC, a common performance metric for predictive models) during development (median AUROC 0.81, IQR 0.75–0.90) and validation (median AUROC 0.83, IQR 0.79–0.97). This is a humbling reminder that strong predictive performance does not guarantee clinical impact at the bedside. As the science of building accurate predictive models advances, our ability to translate these advances into real-world clinical utility lags behind. How can we bridge this gap between AUROCs and clinical benefit?
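
To make the metric concrete, AUROC can be read as the probability that a model assigns a higher risk score to a randomly chosen patient who has the outcome than to one who does not, with ties counted as half. The Python sketch below computes it directly from that pairwise definition. It is an illustration under stated assumptions: the function name `auroc` and the toy labels and scores are hypothetical and are not drawn from Zhou et al. Because the statistic depends only on how patients are ranked, it says nothing about calibration or about what clinicians do once a prediction is made.

```python
from itertools import product


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5).

    Hypothetical helper for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    # Compare every positive case against every negative case.
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = outcome occurred, 0 = it did not.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(auroc(labels, scores))  # 8 of 9 positive-negative pairs ranked correctly
```

Squaring every score preserves their order, so `auroc(labels, [s ** 2 for s in scores])` returns the same value even though the predicted risks themselves change. The pairwise loop is quadratic and is used here for clarity. A practical implementation would typically sort the scores instead.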

Building out the implementation science of AI

Limited user adoption, driven by a lack of clinician trust and model interpretability among many other factors, has long been cited as a key barrier to clinical impact[2,3]. Encouraging providers to thoughtfully incorporate a model's prediction into their decisions about patient care remains a challenge with no clear solution yet, particularly when the model's prediction and the clinician's judgement diverge. Yet significant hurdles persist even after clinicians buy in. A successful AI tool is one that triggers a tailored workflow: its prediction must be translated into the most appropriate human intervention to generate clinical value[4]. Recent examples of clinically impactful predictive models are those that have been coupled with the optimal real-world intervention for each possible model output[5]. Unfortunately, little work exists on this issue: interventions are often selected somewhat arbitrarily or left to clinician judgement[4]. We must develop methods for systematically identifying the best possible intervention to pair with an accurate prediction.

Using real-world evidence to evaluate AI

To better understand the impact of AI at the bedside, we must embrace new ways of evaluating it. As Zhou et al. highlight, few randomized trials have addressed this question to date. However, traditional RCTs, which are time-consuming and costly, are not the only way to measure the impact of these tools. To answer this question faster and at lower cost, we should also leverage rich sources of observational data (e.g., administrative claims databases and electronic health records [EHRs]) together with causal inference methods to passively monitor the impact of AI in clinical practice, as an adjunct to clinical trials. The US Food and Drug Administration (FDA) has begun using real-world data to inform regulatory decisions for drugs and devices[6]; researchers studying AI should adopt a similar approach.

Exploring new applications of AI

Zhou et al. reveal three patterns: applications of AI at the bedside have been almost entirely limited to individual diagnostic and prognostic predictions; the primary outcomes of trials evaluating AI have been limited to performance on specific clinical tasks (e.g., adenoma detection rate on endoscopy); and superiority in these trials has typically been defined as exceeding human performance. To uncover additional opportunities for AI to create value for health systems, researchers must be more flexible in identifying use cases, selecting outcomes of interest, and defining clinical benefit. Providing targeted outreach to vulnerable patients[7], enabling rapid comparative effectiveness studies at the bedside[8], and automating burdensome administrative tasks[9] deserve further exploration as applications of AI. Improving population health metrics[7], reducing administrative costs[10], and easing constraints on providers' time and resources should be examined as outcomes in future trials. Furthermore, the narrow definition of a beneficial AI tool as one that outperforms humans should be broadened to include tools that effectively complement them, either by matching human performance on repetitive tasks or by forming a synergistic human-computer intervention that accomplishes more than either could alone[11,12]. The findings of Zhou et al. highlight several important opportunities to advance the field of clinical AI. Expanded applications, broader definitions of clinical benefit, new evaluation methods, and tailored interventions are just a few of many considerations that may help bridge the gap between in silico predictive performance and real-world utility.

9 in total

1. The false hope of current approaches to explainable artificial intelligence in health care. [Review]

Authors: Marzyeh Ghassemi; Luke Oakden-Rayner; Andrew L Beam
Journal: Lancet Digit Health Date: 2021-11

2. Prediction of patient disposition: comparison of computer and human approaches and a proposed synthesis.

Authors: Yuval Barak-Corren; Isha Agarwal; Kenneth A Michelson; Todd W Lyons; Mark I Neuman; Susan C Lipsett; Amir A Kimia; Matthew A Eisenberg; Andrew J Capraro; Jason A Levy; Joel D Hudgins; Ben Y Reis; Andrew M Fine
Journal: J Am Med Inform Assoc Date: 2021-07-30

3. Trust and medical AI: the challenges we face and the expertise needed to overcome them.

Authors: Thomas P Quinn; Manisha Senadeera; Stephan Jacobs; Simon Coghlan; Vuong Le
Journal: J Am Med Inform Assoc Date: 2021-03-18

4. A framework for making predictive models useful in practice.

Authors: Kenneth Jung; Sehj Kashyap; Anand Avati; Stephanie Harman; Heather Shaw; Ron Li; Margaret Smith; Kenny Shum; Jacob Javitz; Yohan Vetteth; Tina Seto; Steven C Bagley; Nigam H Shah
Journal: J Am Med Inform Assoc Date: 2021-06-12

5. Comment on: Truth and truthiness: evidence, experience and clinical judgement in surgery.

Authors: J S Marwaha; B Beaulieu-Jones; W Yuan; G A Brat
Journal: Br J Surg Date: 2021-12-01

6. Precision population analytics: population management at the point-of-care.

Authors: Paul C Tang; Sarah Miller; Harry Stavropoulos; Uri Kartoun; John Zambrano; Kenney Ng
Journal: J Am Med Inform Assoc Date: 2021-03-01

7. Clinical impact and quality of randomized controlled trials involving interventions evaluating artificial intelligence prediction tools: a systematic review. [Review]

Authors: Qian Zhou; Zhi-Hang Chen; Yi-Heng Cao; Sui Peng
Journal: NPJ Digit Med Date: 2021-10-28

8. Predictive analytics and tailored interventions improve clinical outcomes in older adults: a randomized controlled trial.

Authors: Sara Bersche Golas; Mariana Nikolova-Simons; Ramya Palacholla; Jorn Op den Buijs; Gary Garberg; Allison Orenstein; Joseph Kvedar
Journal: NPJ Digit Med Date: 2021-06-10