Is Artificial Intelligence Better Than Human Clinicians in Predicting Patient Outcomes?

Abstract

In contrast with medical imaging diagnostics powered by artificial intelligence (AI), in which deep learning has led to breakthroughs in recent years, patient outcome prediction poses an inherently challenging problem because it focuses on events that have not yet occurred. Interestingly, the performance of machine learning-based patient outcome prediction models has rarely been compared with that of human clinicians in the literature. Human intuition and insight may be sources of underused predictive information that AI will not be able to identify in electronic data. Both human and AI predictions should be investigated together with the aim of achieving a human-AI symbiosis that synergistically and complementarily combines AI with the predictive abilities of clinicians. ©Joon Lee. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 26.08.2020.

Keywords: artificial intelligence; human-AI symbiosis; human-generated predictions; machine learning; patient outcome prediction

Year: 2020 PMID: 32845249 PMCID: PMC7481865 DOI: 10.2196/19918

Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428

Absence of Human-Generated Predictions in Patient Outcome Research

In recent years, there has been a proliferation of patient outcome prediction research that applies machine learning (ML) and artificial intelligence (AI) to electronic health records (EHRs) and other clinical and administrative health data. The central premises are that 1) complex health data contain predictive information that ML can effectively extract and transform into a predictive algorithm and 2) accurate prediction of patient outcomes can facilitate early, preventative intervention and more efficient health care resource allocation through identification of high-risk patients. For example, predicting which intensive care unit patients are likely to develop sepsis can prompt early initiation of fluid resuscitation, vasopressor therapy, or antibiotics, which can reduce damage from insufficient organ perfusion [1,2]. Although AI has been enormously successful in medical imaging diagnostics, where the medical condition of interest is already present or absent in the images (eg, diagnosis of diabetic retinopathy [3] and classification of skin lesions [4]), patient outcome prediction poses an inherent challenge of predicting events that have not yet occurred (eg, mortality, length of stay, and readmission) [5]. This challenge is common to both AI and human clinicians. Interestingly, while human and AI predictions are often directly compared in medical imaging research [6-8], patient outcome prediction studies tend to focus only on ML and seldom investigate human predictions. This is corroborated by a number of systematic reviews and meta-analyses, which target only ML methods [9-14] or empirical methods [15-19]. This gap in the literature is consistent across a wide range of medical specialties and diseases, including trauma [9], cancer [11], neurosurgery [10], depression [12], acute gastrointestinal bleeding [13], sepsis [14], acute liver failure [15], ischemic stroke [16], thermal injury [17], and cardiovascular disease [18,19].
The absence of human predictions appears to be a recent trend, as older literature prior to the current widespread use of modern ML and EHRs includes more comparisons of human and AI predictions [20-23]. There are several possible reasons why human performance is more frequently studied in medical imaging than in patient outcome prediction. First, radiologists are trained to analyze, interpret, and classify images, whereas most other medical specialists are not trained to directly predict patient outcomes. While accurate prognostic information can certainly be helpful in any medical specialty, it is usually generated by empirical risk scoring systems such as the Framingham Risk Score [24] or Acute Physiology and Chronic Health Evaluation (APACHE) [25] rather than by human clinicians. Second, human predictions in medical imaging are readily available from routine clinical practice or can be generated systematically by trained radiologists. Conversely, it is rare for clinicians in other medical specialties to record patient outcome predictions that they generate on a regular basis. Third, the implicit assumption is that humans cannot accurately predict patient outcomes because analysis of complex, high-dimensional clinical data may be required; moreover, recall bias is rampant in the human mind.

Humans and AI Should Work as a Team

However, there is no reason to rule out the possibility that human clinicians can outperform AI in patient outcome prediction, at least in some clinical scenarios. While AI can only access information that can be recorded in the form of electronic data, human clinicians interact face-to-face with their patients and have access to both clinical and contextual information. The qualitative information collected via clinicians’ five senses can be critical in patient outcome prediction; however, this information is mostly absent in EHRs, if it is possible to record it at all. Although some qualitative observations can be recorded in EHRs as free-text notes, such as nursing notes, these data are logged in a limited, inconsistent fashion. Human intuition and insight may well be the most underused resources in patient outcome prediction. While the performance of ML-based patient outcome prediction models appears impressive on paper, the most accurately predicted cases tend to be “easy” cases where the likely outcomes are already obvious to human clinicians [26]. This further supports the hypothesis that human clinicians perform well in patient outcome prediction. On the other hand, AI easily outperforms humans in processing, analyzing, and finding patterns in complex, high-dimensional data [27]. As demonstrated by IBM Watson [28] and AlphaGo [29], the memory, attention, and information processing abilities of AI vastly exceed the capabilities of human cognition [30]. This AI advantage is crucial for extracting and using data-driven insights from big data [31]; it is also key to the recent successful breakthroughs in ML, particularly in deep learning [32], in a number of problem domains, including medical imaging [33]. In addition, AI does not suffer from fatigue [34] or cognitive biases (eg, recall bias) [35] as humans do. 
However, even if AI outperforms human clinicians in patient outcome prediction, human performance represents a more meaningful benchmark that puts AI performance in better perspective. Understanding the superiority of AI in comparison with humans can facilitate adoption of AI technology in real patient care. The bottom line is that both AI and humans can make unique contributions to patient outcome prediction, and they should help each other to maximize predictive performance. Patient outcome prediction research should aim for human-AI symbiosis, where the respective predictive abilities of AI and human clinicians are combined in a synergistic and complementary way [36]. Given the challenging nature of patient outcome prediction, creating an AI to act alone without human help will simply lead to suboptimal predictive performance because even state-of-the-art ML technology cannot leverage information that is not present in the data [26]. Another way for AI and humans to work together is via the human-in-the-loop model, where humans directly inform machines on how to learn from the data at hand by providing guidance based on human intuition and knowledge. The term “interactive machine learning” [37] was coined to describe this paradigm; it encompasses more well-known branches of ML, such as active learning, where humans select which data points should be labelled. This human-in-the-loop approach can greatly reduce the computational complexity of some ML problems; for example, it has shown promising results in protein folding [38]. Moreover, in the field of human-computer interaction, the human-in-the-loop concept has been studied in the context of vehicle control [39], security [40,41], and decision-making [40,42]. Knowledge from these application areas can potentially inform the design of human-AI symbiosis in patient outcome prediction.
AI and human prediction performance may vary across different types of patients. Complex patterns in data can be more predictive than human intuition in certain patient subgroups, and the opposite may be true in other subpopulations. An investigation of how AI and human predictions can be optimally combined for different types of patients could directly contribute to advancing precision medicine. A better understanding of the respective predictive powers of AI and humans in various clinical scenarios can also help increase human trust in AI (eg, “For this type of patient, I need to trust AI more because most predictive information is buried in the complex data”). This can facilitate evidence-based adoption of AI technology. For human clinicians to completely trust AI, it is necessary to understand why an algorithm arrives at a given conclusion; this requires transparency, traceability, and causality. The active field of explainable AI has been producing useful methods, such as SHapley Additive exPlanations (SHAP) [43], that can help explain how ML models work at an algorithmic level (this explanation is almost always based on correlation rather than causation); however, human clinicians ultimately want to elevate this algorithmic explainability to a model that is understandable by humans with sufficient causal understanding, also known as causability [44]. Therefore, mapping explainability to causability will be key in achieving true human-AI symbiosis. One major roadblock to the proposed human-AI symbiosis is the need to collect a large number of human predictions in a variety of clinical scenarios, which is labor-intensive and adds to clinicians’ workloads. Seamlessly integrated electronic prediction collection platforms (eg, embedded in a multi-center EHR system) can minimize this burden and enable large-scale prediction collection.

From Patient Outcome Prediction to Real Impact

Once predictive performance is optimized via human-AI symbiosis, the next important step is to formulate clinical guidelines so that the predictive information is actionable. This is a crucial step, as accurate predictions alone will not lead to any real impact; rather, the combination of accurate predictions and appropriate interventions by clinicians will have a greater effect [5,26]. The ultimate goal of patient outcome prediction is to improve patient outcomes and decrease health care costs through early intervention and efficient use of health care resources. To prove that this goal has been met, we will need to perform randomized clinical trials of AI-driven patient care [45], such as that conducted by Wijnberge and colleagues [46]. In addition to simply comparing AI with human work alone, these randomized clinical trials should investigate a promising third species: human-AI symbiosis.

37 in total

1. Clinical versus actuarial predictions of violence of patients with mental illnesses.

Authors: W Gardner; C W Lidz; E P Mulvey; E C Shaw
Journal: J Consult Clin Psychol Date: 1996-06

2. Mastering the game of Go with deep neural networks and tree search.

Authors: David Silver; Aja Huang; Chris J Maddison; Arthur Guez; Laurent Sifre; George van den Driessche; Julian Schrittwieser; Ioannis Antonoglou; Veda Panneershelvam; Marc Lanctot; Sander Dieleman; Dominik Grewe; John Nham; Nal Kalchbrenner; Ilya Sutskever; Timothy Lillicrap; Madeleine Leach; Koray Kavukcuoglu; Thore Graepel; Demis Hassabis
Journal: Nature Date: 2016-01-28 Impact factor: 49.962

3. From Local Explanations to Global Understanding with Explainable AI for Trees.

Authors: Scott M Lundberg; Gabriel Erion; Hugh Chen; Alex DeGrave; Jordan M Prutkin; Bala Nair; Ronit Katz; Jonathan Himmelfarb; Nisha Bansal; Su-In Lee
Journal: Nat Mach Intell Date: 2020-01-17

4. Randomized Clinical Trials of Artificial Intelligence.

Authors: Derek C Angus
Journal: JAMA Date: 2020-02-17 Impact factor: 56.272

5. Dermatologist-level classification of skin cancer with deep neural networks.

Authors: Andre Esteva; Brett Kuprel; Roberto A Novoa; Justin Ko; Susan M Swetter; Helen M Blau; Sebastian Thrun
Journal: Nature Date: 2017-01-25 Impact factor: 49.962

6. The Canadian C-spine rule performs better than unstructured physician judgment.

Authors: Glen Bandiera; Ian G Stiell; George A Wells; Catherine Clement; Valerie De Maio; Katherine L Vandemheen; Gary H Greenberg; Howard Lesiuk; Robert Brison; Daniel Cass; Jonathan Dreyer; Mary A Eisenhauer; Iain Macphail; R Douglas McKnight; Laurie Morrison; Mark Reardon; Michael Schull; James Worthington
Journal: Ann Emerg Med Date: 2003-09 Impact factor: 5.721

Review 7. High-performance medicine: the convergence of human and artificial intelligence.

Authors: Eric J Topol
Journal: Nat Med Date: 2019-01-07 Impact factor: 53.440

Review 8. Prediction models for cardiovascular disease risk in the general population: systematic review.

Authors: Johanna A A G Damen; Lotty Hooft; Ewoud Schuit; Thomas P A Debray; Gary S Collins; Ioanna Tzoulaki; Camille M Lassale; George C M Siontis; Virginia Chiocchia; Corran Roberts; Michael Maia Schlüssel; Stephen Gerry; James A Black; Pauline Heus; Yvonne T van der Schouw; Linda M Peelen; Karel G M Moons
Journal: BMJ Date: 2016-05-16

Review 9. Clinical prediction models for mortality and functional outcome following ischemic stroke: A systematic review and meta-analysis.

Authors: Marion Fahey; Elise Crayton; Charles Wolfe; Abdel Douiri
Journal: PLoS One Date: 2018-01-29 Impact factor: 3.240

Review 10. Causability and explainability of artificial intelligence in medicine.

Authors: Andreas Holzinger; Georg Langs; Helmut Denk; Kurt Zatloukal; Heimo Müller
Journal: Wiley Interdiscip Rev Data Min Knowl Discov Date: 2019-04-02

6 in total

1. How Clinicians Perceive Artificial Intelligence-Assisted Technologies in Diagnostic Decision Making: Mixed Methods Approach.

Authors: Deana Shevit Goldin; Hyeyoung Hah
Journal: J Med Internet Res Date: 2021-12-16 Impact factor: 5.428

2. Clinician Preimplementation Perspectives of a Decision-Support Tool for the Prediction of Cardiac Arrhythmia Based on Machine Learning: Near-Live Feasibility and Qualitative Study.

Authors: Stina Matthiesen; Søren Zöga Diederichsen; Mikkel Klitzing Hartmann Hansen; Christina Villumsen; Mats Christian Højbjerg Lassen; Peter Karl Jacobsen; Niels Risum; Bo Gregers Winkel; Berit T Philbert; Jesper Hastrup Svendsen; Tariq Osman Andersen
Journal: JMIR Hum Factors Date: 2021-11-26

3. Association of radiation dose intensity with overall survival in patients with distant metastases.

Authors: Johnny Kao; Mark K Farrugia; Samantha Frontario; Amanda Zucker; Emily Copel; John Loscalzo; Ashish Sangal; Boramir Darakchiev; Anurag Singh; Symeon Missios
Journal: Cancer Med Date: 2021-09-30 Impact factor: 4.452

4. Physicians' Perceptions of and Satisfaction With Artificial Intelligence in Cancer Treatment: A Clinical Decision Support System Experience and Implications for Low-Middle-Income Countries.

Authors: Srinivas Emani; Angela Rui; Hermano Alexandre Lima Rocha; Rubina F Rizvi; Sergio Ferreira Juaçaba; Gretchen Purcell Jackson; David W Bates
Journal: JMIR Cancer Date: 2022-04-07

5. CREATE: A New Data Resource to Support Cardiac Precision Health.

Authors: Seungwon Lee; Bing Li; Elliot A Martin; Adam G D'Souza; Jason Jiang; Chelsea Doktorchik; Danielle A Southern; Joon Lee; Natalie Wiebe; Hude Quan; Cathy A Eastwood
Journal: CJC Open Date: 2020-12-27

6. A Clinical Decision Support System for the Prediction of Quality of Life in ALS.

Authors: Anna Markella Antoniadi; Miriam Galvin; Mark Heverin; Lan Wei; Orla Hardiman; Catherine Mooney
Journal: J Pers Med Date: 2022-03-10
