| Literature DB >> 35851286 |
Paul Festor1,2, Yan Jia3,4, Anthony C Gordon5, A Aldo Faisal6,7, Ibrahim Habli3,8, Matthieu Komorowski1,5.
Abstract
OBJECTIVES: Establishing confidence in the safety of Artificial Intelligence (AI)-based clinical decision support systems is important prior to clinical deployment and regulatory approval for systems with increasing autonomy. Here, we undertook safety assurance of the AI Clinician, a previously published reinforcement learning-based treatment recommendation system for sepsis.
Keywords: artificial intelligence; decision support systems, clinical; machine learning; safety management
Year: 2022 PMID: 35851286 PMCID: PMC9289024 DOI: 10.1136/bmjhci-2022-100549
Source DB: PubMed Journal: BMJ Health Care Inform ISSN: 2632-1009
Figure 1 Safety assurance methodology for the AI Clinician (adapted from AMLAS3). The process alternates between defining safety constraints to mitigate clinical hazards in the context of use of the system (left panel) and model adjustments and evaluation against the defined constraints. AMLAS, Assurance of Machine Learning in Autonomous Systems.
Description and rationale for the four chosen clinical scenarios
| Hazardous clinical scenario | Clinical safety impact | Prevalence in MIMIC-III dataset | Safety-driven refinement of RL model | Updated safety evidence | Caveats or uncertainties |
| A: giving no vasopressors and low or no fluids (≤20 mL/hour) to a patient with low BP. | Sustained untreated hypotension leading to organ failure and death. | MAP <55: | Add 30 points of intermediate penalty if the condition is met. | The modified ‘safe’ policy had a lower rate of unsafe behaviour than the original AI policy in three scenarios; the difference was not significant in the fourth (see figure 4). | No clear threshold for defining hypotension. |
| B: giving the maximum vasopressor dose (>0.65 µg/kg/min) to a patient with high BP. | Excessive blood pressure leading to increased risk of organ failure, bleeding and stroke. | MAP >95: | | | No clear threshold for defining hypertension. Some patients may have a clinical indication for high BP targets (eg, TBI). |
| C: giving no fluids to a patient with low BP and low CVP. | Hypotensive and likely hypovolaemic patient left untreated. | MAP ≤55 and CVP ≤5: | | | Measuring fluid volume status is very difficult. CVP is a poor proxy but the closest approximation available in the data. |
| D: giving the maximum dose of fluids (>240 mL/hour) to a patient with normal BP, high cumulative fluid balance and high CVP. | Giving excessive fluids to a septic patient who is unlikely to be hypovolaemic is harmful, leading to fluid accumulation, a known risk factor for organ failure and poor outcomes. | MAP ≥75 and cumulative balance >10 L and CVP ≥15: | | | |
BP, blood pressure; CVP, central venous pressure (expressed in mm Hg); MAP, mean arterial pressure (expressed in mm Hg); TBI, traumatic brain injury.
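The safety-driven refinement in the table (an intermediate penalty of 30 points whenever a state-action pair matches one of the four hazardous scenarios) can be sketched as reward shaping. The state/action encoding, variable names, and dictionary keys below are illustrative assumptions, not the authors' implementation; only the thresholds come from the table.

```python
# Sketch of the table's safety penalty as reward shaping.
# Thresholds follow the table; the data structures are hypothetical.

SAFETY_PENALTY = 30.0  # intermediate penalty from the table

def is_hazardous(state, action):
    """Return True if (state, action) matches scenarios A-D from the table."""
    map_, cvp, balance = state["map"], state["cvp"], state["fluid_balance_l"]
    vaso, fluids = action["vaso_ug_kg_min"], action["fluids_ml_h"]

    scenario_a = map_ < 55 and vaso == 0 and fluids <= 20   # untreated hypotension
    scenario_b = map_ > 95 and vaso > 0.65                  # max vasopressors, high BP
    scenario_c = map_ <= 55 and cvp <= 5 and fluids == 0    # hypovolaemic, no fluids
    scenario_d = (map_ >= 75 and balance > 10
                  and cvp >= 15 and fluids > 240)           # fluid overload
    return scenario_a or scenario_b or scenario_c or scenario_d

def shaped_reward(reward, state, action):
    """Subtract the intermediate safety penalty when a hazard is triggered."""
    return reward - SAFETY_PENALTY if is_hazardous(state, action) else reward
```

In this shaping scheme the penalty is applied at the offending time step, steering the learnt policy away from the hazardous regions without changing the terminal outcome signal.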
Figure 2 Visualisation of the differences between the proportion of human and AI unsafe decisions, and statistical significance. Each subplot corresponds to one scenario. For each scenario, tests were run across a range of blood pressure thresholds. Within each subplot, the top plot shows the variation of the number of human and AI unsafe decisions over the range of blood pressure thresholds, and the bottom plot shows the statistical significance of this difference. The AI consistently leads to a lower number of unsafe decisions in all scenarios, except for scenario C where the difference was not statistically significant. Scenarios C and D reflect our previous study6 showing that the AI Clinician is more conservative in terms of fluid doses.
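The per-threshold comparison in figure 2 amounts to repeating a test of two proportions over a sweep of MAP cut-offs. The caption does not state which test the authors used; the sketch below uses a two-sided two-proportion z-test as an illustrative choice, and the trajectory format is hypothetical.

```python
# Sketch of a threshold sweep comparing human vs AI unsafe-decision rates.
# The two-proportion z-test here is an assumed stand-in for the paper's test.
from math import sqrt, erf

def two_proportion_p(unsafe_a, n_a, unsafe_b, n_b):
    """Two-sided p-value for H0: the two unsafe-decision rates are equal."""
    p_pool = (unsafe_a + unsafe_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (unsafe_a / n_a - unsafe_b / n_b) / se
    # Normal-tail probability via the error function (stdlib only)
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

def sweep_thresholds(decisions, thresholds):
    """Test human vs AI unsafe-decision rates at each MAP threshold.

    `decisions` is a list of (map_value, human_unsafe, ai_unsafe) tuples,
    with the unsafe flags as 0/1 for that patient state (hypothetical format)."""
    results = []
    for t in thresholds:
        below = [(h, a) for m, h, a in decisions if m < t]
        n = len(below)
        if n == 0:
            continue
        h_unsafe = sum(h for h, _ in below)
        a_unsafe = sum(a for _, a in below)
        results.append((t, h_unsafe, a_unsafe,
                        two_proportion_p(h_unsafe, n, a_unsafe, n)))
    return results
```

Sweeping the threshold rather than fixing it addresses the caveat in the table that there is no single agreed cut-off for hypotension or hypertension.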
Figure 3 SHapley additive explanations (SHAP) relative feature importance analysis,12 highlighting which patient characteristics were associated with ‘unsafe’ clinician behaviour in the four scenarios. The most important features are at the top, and the least important at the bottom. On each line, a positive SHAP value indicates that the feature is associated with unsafe decisions, and the spread indicates the strength of the association. The colour indicates whether the influence stems from high or low feature values. For example, in scenario A, a low SOFA score (blue SHAP values) is associated with a higher risk (positive SHAP values) of unsafe decisions. For a glossary of terms and abbreviations, see online supplemental appendix C. SOFA, Sequential Organ Failure Assessment.
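The attribution idea behind figure 3 is the Shapley value: a feature's contribution is its average marginal effect on the model output over all orderings in which features are revealed. Production SHAP uses model-specific approximations; the exact enumeration below is only workable for a handful of features and is illustrative, with a simple baseline-substitution convention for "missing" features that is an assumption, not the SHAP library's exact semantics.

```python
# Sketch of exact Shapley attributions for a small model.
from itertools import permutations

def shapley_values(predict, features, baseline):
    """Exact Shapley attributions for `predict` at the point `features`.

    `predict` maps a dict of feature values to a score; features not yet
    revealed take their value from `baseline`."""
    names = list(features)
    phi = {n: 0.0 for n in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        prev = predict(current)
        for name in order:           # reveal features one at a time
            current[name] = features[name]
            new = predict(current)
            phi[name] += new - prev  # marginal contribution in this ordering
            prev = new
    return {n: phi[n] / len(orderings) for n in names}
```

A positive attribution for a patient means that feature pushed the model towards predicting an unsafe decision, matching the reading of positive SHAP values in the figure.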
Figure 4 Results from the model retraining with added safety constraints. (A) Proportion of unsafe decisions in the four scenarios (see text) for three agents: human clinicians (behaviour policy), the original AI Clinician (learnt initial policy) and the modified AI Clinician (learnt safe policy). The original AI Clinician has a lower proportion of unsafe behaviour than human clinicians, while the modified ‘safe’ policy does better than the original AI Clinician. (B) Off-policy policy evaluation of the original and the modified AI Clinician policies. The value of the modified policy, with the added safety constraints, is slightly lower than that of the unrestricted policy (median, IQR): 90 (89.2–90) for the modified policy versus 99.5 (99.5–99.5) for the original policy, both being higher than the clinicians’ (0, –1.5 to 0.6). Bottom: distribution of the 25 actions for the initial (C) and improved (D) policies. The safe AI policy recommends more low-dose vasopressors, likely to try and correct instances of hypotension left untreated.
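The off-policy policy evaluation in figure 4B estimates the value of a learnt policy from trajectories generated by clinicians (the behaviour policy). A common estimator for this setting is weighted importance sampling (WIS); the paper does not specify its exact estimator here, so the sketch below is an illustrative WIS implementation with a hypothetical trajectory format of (state, action, reward) steps.

```python
# Sketch of off-policy evaluation by weighted importance sampling (WIS).
# Policies are functions returning the probability of `action` in `state`.

def wis_value(trajectories, target_policy, behaviour_policy, gamma=0.99):
    """Estimate the target policy's value from behaviour-policy trajectories."""
    weights, returns = [], []
    for traj in trajectories:
        w, g, discount = 1.0, 0.0, 1.0
        for state, action, reward in traj:
            # Reweight by how much more (or less) likely the target policy
            # is to take the clinician's action than the behaviour policy.
            w *= target_policy(state, action) / behaviour_policy(state, action)
            g += discount * reward
            discount *= gamma
        weights.append(w)
        returns.append(g)
    total_w = sum(weights)
    if total_w == 0:
        return 0.0
    return sum(w * g for w, g in zip(weights, returns)) / total_w
```

The self-normalisation (dividing by the summed weights) trades a small bias for much lower variance than ordinary importance sampling, which matters when evaluated policies diverge from clinician behaviour.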