Advancing health equity with artificial intelligence.

Nicole M Thomasian^1,2, Carsten Eickhoff^3,4, Eli Y Adashi^5.

Abstract

Population and public health are in the midst of an artificial intelligence revolution capable of radically altering existing models of care delivery and practice. Just as AI seeks to mirror human cognition through its data-driven analytics, it can also reflect the biases present in our collective conscience. In this Viewpoint, we use past and counterfactual examples to illustrate the sequelae of unmitigated bias in healthcare artificial intelligence. Past examples indicate that if the benefits of emerging AI technologies are to be realized, consensus around the regulation of algorithmic bias at the policy level is needed to ensure their ethical integration into the health system. This paper puts forth regulatory strategies for uprooting bias in healthcare AI that can inform ongoing efforts to establish a framework for federal oversight. We highlight three overarching oversight principles in bias mitigation that map to each phase of the algorithm life cycle.

Keywords: Algorithmic bias; Artificial intelligence; Health equity; Health policy; Machine learning

Year: 2021 PMID: 34811466 PMCID: PMC8607970 DOI: 10.1057/s41271-021-00319-5

Source DB: PubMed Journal: J Public Health Policy ISSN: 0197-5897 Impact factor: 2.222

Introduction

Artificial intelligence (AI) continues to feature prominently in the health sector by way of its ever-growing contribution to population and public health practice, management, and surveillance [1, 2]. However, just as AI seeks to mirror human cognition through its data-driven analytics, it can also reflect the biases present in our collective conscience. Indeed, a number of algorithms guiding population medicine initiatives, federal healthcare reimbursements, and clinical management have been found to discriminate against protected social groups [3-6]. In this Viewpoint, we argue that federal authorities should implement a regulatory framework for healthcare AI that incorporates standards to ensure health equity. To support this argument, we present past and counterfactual examples of racially biased algorithms alongside their respective mitigation strategies. It should be noted, however, that many of the principles described herein with respect to race will also apply to other social identifiers such as sex, income bracket, etc. The overarching objective of this paper is, therefore, to raise awareness around the important topic of bias in AI among health experts and policymakers to guide future oversight directions.

Background

Highly visible examples of biased computation in healthcare, such as race corrections for glomerular filtration rate and pulmonary function, continue to endure despite decades of efforts to eradicate them [5]. It could be even more difficult to undo algorithmic bias in AI that, due to model complexity and varying degrees of artificial intelligence literacy, may not be readily apparent to the user. This idea of concealed discrimination is not new to the health sector and can occur regardless of intent. Implicit bias is a classic example of this phenomenon. In the context of skin cancer, implicit bias contributes to delays in melanoma diagnoses among dark-skinned patients that translate to worse outcomes [7]. Another example would be well-documented disparities in the administration of pain medication on the basis of race [8]. In terms of public health at large, this pattern of obfuscation as a means of reinventing racism can be illustrated using the concept of the ‘submerged state.’ The submerged state makes reference to the use of concealed welfare in the form of federal disbursements or tax credit programs as an under-recognized means of reinforcing the racial wealth gap [9]. A recent example impacting public health systems in the US would be the appropriations of the Coronavirus Aid, Relief, and Economic Security (CARES) Act funds using a formula based on lost healthcare facility revenue, a miscalibration of need such that hospitals in communities of color remain inadequately reimbursed [4]. Our goal is to avoid the case where unfettered applications of artificial intelligence in healthcare become a nidus for bias ‘submerged’ within the complexity of AI models.

Federal governance

In January 2021, the US Food and Drug Administration (FDA) indicated its intent to develop regulatory guidance for AI in its ‘Artificial Intelligence/Machine Learning-Based Action Plan,’ an update that follows an earlier FDA discussion paper in April 2019 [10]. As of September 2021, however, the FDA has already approved a total of 77 artificial intelligence applications under the ‘Software as a Medical Device’ (SaMD) classification for use in clinical settings (see Fig. 1). Pursuant to section 520(o) of the 21st Century Cures Act, SaMD refers to software that is intended to primarily drive clinical decision making or to analyze patient health data or medical images [11]. It should be noted that a large proportion of artificial intelligence algorithms are exempted by this definition and are already in widespread use throughout the health sector. Therefore, while the establishment of federal guidance for clinical AI that incorporates health equity considerations would set a much-needed precedent for fairness in medical informatics, there is no guarantee that such a framework would trickle down to the FDA-exempt majority of healthcare AI that still influences resource allocation, access to public health services, and medical care. Limitations in federal authority make it incumbent on developers, public health practitioners, and policymakers to also assist in shifting the status quo of bias mitigation from a desirable addition to an unequivocal inclusion.

Fig. 1

FDA-Approved Artificial Intelligence-based Algorithms as of September 2021

A lifecycle approach to regulation

‘Artificial intelligence’ refers to computational models that automate tasks typically performed by humans, and this umbrella term encompasses machine learning algorithms (See Box 1) [12]. From a regulatory perspective, it is critical for policymakers to understand that bias mitigation should not end with AI model development but, rather, extend across the product lifecycle. The AI lifecycle moves from model development to validation to implementation, with maintenance and updates also possible in the post-implementation period. Bias can enter at any point during the AI lifecycle. Below, we illustrate key topics in algorithmic bias with strategies to address ongoing barriers that should be incorporated into a federal legal guidance for advancing health equity in AI. We also highlight a key tension: how can we account for racial disparities in care access and quality while avoiding the use of the social indicator of race to draw inferences about intrinsic human biology? The bias mitigation principles described herein are illustrated using broad strokes, as they are intended as a general guide for public health practitioners and policymakers. A glossary of key terms can be found in Box 1.

Box 1

Glossary of key terms

Term	Definition
Artificial intelligence (AI)	An umbrella term referring to computational technologies that automate tasks typically performed by humans
Machine learning	A subset of AI that refers to models that can learn from examples without the explicit programming of rules
Healthcare AI	An umbrella term referring to AI for use in the health sector (i.e., disease surveillance, diagnostics and treatment, resource allocation, delivery of health services, workflow, etc.)
Protected group	Groups that face discrimination due to a shared social characteristic that are protected under the federal legal code (i.e., race, gender, age, ability, etc.)
Algorithmic bias	An algorithm’s performance, allocation, or outcome for a protected social group puts them at a (dis-)advantage with respect to the unprotected social group
Health equity	The ability of all patients to attain their full health potential is the same across all groups [36]
Development	Creation of the model: a process that encompasses data pre-processing, model training/validation/testing efforts
Validation (regulatory)	Assessment of model performance prior to its formal implementation
Implementation	Integration of the AI model into the healthcare setting for real-world use
Maintenance	Updates made to the AI model after it is in real-world use to assure a continued high-quality performance
Training	A process where the model learns trends or categories from data
Validation (model)	A process that confirms the generality of the trained model and explores different hyperparameter choices
Testing	A process that evaluates model performance on an unseen dataset
Pre-training	A process that trains a model on a large, non-specific dataset prior to subsequent fine-tuning on the actual dataset to improve overall performance
Federated learning	Each institution trains a model using their home data and the model weights are communicated to a centralized server to develop an aggregate model; there is no sharing of protected health information
Cyclic weight transfer	An institution trains a model using their home data and passes the updated model weights to the next institution, the process repeats until all institutions have participated; there is no sharing of protected health information
Bias accounting	The process of measuring bias, when applicable to the algorithm’s intended use case
Bias mitigation	The process of correcting for bias, when applicable to the algorithm’s intended use case
Positive predictive value	The likelihood that if you screen positive that you actually have the disease
Negative predictive value	The likelihood that if you screen negative that you actually do not have the disease
Equalized odds	No difference in sensitivity and specificity across all groups
Predictive parity	No difference in positive predictive value rates across all groups
Demographic parity	No difference in positive outcome rates across all groups
Validation (AI lifecycle)	Evaluation of model performance prior to formal implementation
Interpretability	The degree to which the decision process of AI is understandable to humans
Continuously learning AI	AI that can update in real-time to learn from incoming data

Development

Approaches to bias mitigation in the development phase address data- or model-specific factors. Below, we describe strategies on both sides of this dichotomy, providing real-world examples for context.

Data factors

Developers should consider how limitations in data quality can threaten model performance with respect to race. At a minimum, data should include patients from various racial backgrounds in cases where failure to account for race is linked to known disparities in care. This principle can be challenging in practice, however, particularly when working with rare diseases. Take acromegaly, for example, an endocrine pathology that results from excess growth hormone production and classically presents with characteristic changes in facial bone structure. Facial recognition software with AI is currently being explored for early detection of acromegaly, but studies to date have been performed in exclusively white or Asian populations [13-15]. The incidence of acromegaly does not vary with race, and early diagnosis is critical to improving patient outcomes and quality of life [16]. This situation raises the question: how can we improve the availability of racially diverse data in order to promote health equity? Creation of curated, open databases with deidentified or aggregate patient information is one excellent solution to combat racially imbalanced data [2]. However, this option might not always be feasible for facial and skin images, geographically linked data, or environmental exposure data due to privacy concerns [17]. Another option would be to pretrain models on large, non-specific datasets or to use synthetic data generation to boost the model’s ability to recognize a diversity of cases prior to training with the original data (See Box 1) [15]. Finally, collaborative training techniques such as federated learning and cyclic weight transfer can boost data availability, as they allow for model training across multiple institutions without transfer of patient data (See Box 1) [18, 19].
To promote transparency, developers should document the distributions of patient characteristics for race in aggregate, and they should explain their reasoning for implementing or not implementing techniques to improve the model’s performance with respect to race. Finally, developers should consider and convey how any subjective or missing health data, such as entries in the electronic medical record, may be coded with preexisting human biases, and how this may affect their model predictions with respect to race. For example, consider a population medicine algorithm that identifies patients who might benefit from a specialized community chronic pain clinic and associated physical rehabilitation program. If this model uses subjective criteria such as provider-reported pain scores, this could bias against people of color, who are generally perceived as having less pain than their white counterparts [20]. This impact of data reliability on model performance should also be subsequently assessed during both model evaluation and implementation using bias-auditing techniques.
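The two collaborative training techniques defined in Box 1 can be illustrated with a minimal sketch. It assumes a toy one-variable linear model and synthetic data; the function names (`local_train`, `federated_round`, `cyclic_round`), the learning rate, and the three "sites" are hypothetical choices made for illustration and are not taken from the cited studies. In both schemes only model weights move between institutions, never the underlying records.

```python
from typing import List, Tuple

Weights = Tuple[float, float]        # (slope, intercept) of y = w*x + b
Dataset = List[Tuple[float, float]]  # local (x, y) pairs; these never leave the site


def local_train(weights: Weights, data: Dataset,
                lr: float = 0.05, epochs: int = 20) -> Weights:
    """Run a few epochs of gradient descent on one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_round(weights: Weights, sites: List[Dataset]) -> Weights:
    """Federated learning: every site trains from the same starting weights and
    a central server averages the results, weighted by local sample count."""
    total = sum(len(d) for d in sites)
    updates = [(local_train(weights, d), len(d)) for d in sites]
    w = sum(u[0] * n for u, n in updates) / total
    b = sum(u[1] * n for u, n in updates) / total
    return w, b


def cyclic_round(weights: Weights, sites: List[Dataset]) -> Weights:
    """Cyclic weight transfer: weights are handed from site to site in turn."""
    for d in sites:
        weights = local_train(weights, d)
    return weights


# Synthetic data (assumption): every site follows y = 2x + 1 but each one
# only observes a different slice of the x range.
site_a = [(i / 10, 2 * (i / 10) + 1) for i in range(0, 10)]
site_b = [(i / 10, 2 * (i / 10) + 1) for i in range(10, 20)]
site_c = [(i / 10, 2 * (i / 10) + 1) for i in range(20, 25)]
sites = [site_a, site_b, site_c]

fed: Weights = (0.0, 0.0)
cyc: Weights = (0.0, 0.0)
for _ in range(200):
    fed = federated_round(fed, sites)
    cyc = cyclic_round(cyc, sites)

print("federated:", fed)
print("cyclic:   ", cyc)
```

In this toy setting both schemes recover a slope near 2 and an intercept near 1 even though no site sees the full range of inputs. Real deployments would add secure aggregation, handling of site heterogeneity, and governance controls that this sketch omits.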

Model factors

Next, we will discuss how bias can be engineered into models by their design. Failure to account for race in contexts where known disparities are relevant to the algorithm’s intended use case has been repeatedly shown to entrench bias [3, 21, 22]. Consider, for example, an algorithm that enrolls patients into high-risk care management programs using a risk score [3]. One such widely used, race-blind model was found to systematically under-enroll black patients, who were sicker than their white counterparts for a given risk score [3]. The increased enrollment threshold for black patients to gain access to these beneficial programs was due to the effect of proxy variables, namely healthcare costs, which were lower in black participants. This relationship is believed to be owing to a long-standing lack of trust in the healthcare system and disparities in health access among black Americans. Both of these observations are well described in the literature and the latter finding is also consistent with the itemized distribution of healthcare expenses across groups in the study [3, 23]. By accounting for race through adjustment of the data label choice, the authors were able to correct the algorithm so that placement was adjudicated fairly [3]. The above example illustrates how failing to assess how race and other aspects of social identity are treated by the model can compromise health equity, and it brings us to our discussion of bias accounting. It is important to note that there are many metrics that can be used for bias accounting during development (e.g., equalized odds, predictive parity, demographic parity, etc.) (See Box 1). Bias metric(s) should be carefully selected based on the algorithm’s intended use case, as they are not universally compatible [21]. Take, for example, an algorithm that is in compliance with predictive parity such that all groups are selected at an identical positive predictive value rate.
That same model could simultaneously violate the principle of equalized odds such that sensitivity and specificity differ across groups. We can revisit our earlier example of implicit bias in the diagnosis of skin cancer to illustrate this principle. Let us consider a hypothetical health surveillance image recognition algorithm where patients can take a picture of their mole(s) using a smartphone app to see if they should seek further evaluation to rule out malignancy. We ensure that this model is in compliance with predictive parity, but it is later found to disproportionately misclassify malignant melanomas in dark-skinned patients as benign lesions of low concern. How is this possible? One interpretation lies in our failure to account for false-negative rates across groups. False negatives represent instances where patients with a malignant lesion are falsely reassured, a disparity to which implicit bias is thought to contribute and that we want to avoid replicating in our model [7]. Therefore, a metric that limits discrepancies in false-negative rates across groups such as equalized odds might be preferred for this algorithm’s intended use case. As this process will look different for each algorithm, it is critical for developers to provide well-documented evidence and reasoning to justify tradeoffs in chosen bias mitigation techniques [3, 21]. Ideally this information would be made publicly available in a deidentified format whenever possible, but should, at a minimum, be reviewed by those overseeing implementation of the algorithm prior to its use. Bias adjustments are to be made in cases where the social impacts of race are clearly linked to access, diagnosis, or treatment of the healthcare question at hand.
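The incompatibility between bias metrics described above can be made concrete with a small bias-accounting sketch. The confusion-matrix counts, the group labels, and the 0.02 tolerance are invented for illustration only and are not data from any study; the metric definitions follow Box 1. The sketch shows a hypothetical mole-screening model that satisfies predictive parity while violating equalized odds.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one patient group."""
    tp: int  # malignant lesion, flagged for follow-up
    fp: int  # benign lesion, flagged for follow-up
    fn: int  # malignant lesion, falsely reassured
    tn: int  # benign lesion, correctly reassured

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def positive_rate(self) -> float:
        return (self.tp + self.fp) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def false_negative_rate(self) -> float:
        return self.fn / (self.tp + self.fn)


def gap(groups: Dict[str, Confusion],
        metric: Callable[[Confusion], float]) -> float:
    """Largest between-group difference for a given metric."""
    values = [metric(c) for c in groups.values()]
    return max(values) - min(values)


def audit(groups: Dict[str, Confusion], tolerance: float = 0.02) -> Dict[str, bool]:
    """Check each Box 1 fairness criterion against an assumed tolerance."""
    return {
        "predictive_parity": gap(groups, lambda c: c.ppv) <= tolerance,
        "equalized_odds": (gap(groups, lambda c: c.sensitivity) <= tolerance
                           and gap(groups, lambda c: c.specificity) <= tolerance),
        "demographic_parity": gap(groups, lambda c: c.positive_rate) <= tolerance,
    }


# Hypothetical counts: both groups have PPV = 0.80, but the second group
# has far more missed malignancies.
groups = {
    "lighter_skin": Confusion(tp=80, fp=20, fn=20, tn=880),
    "darker_skin": Confusion(tp=40, fp=10, fn=60, tn=890),
}

report = audit(groups)
fnr_gap = gap(groups, lambda c: c.false_negative_rate)
print(report)
print("false-negative rate gap:", round(fnr_gap, 3))
```

With these invented counts the audit passes predictive parity yet fails equalized odds, because 60% of malignant lesions in the second group are missed versus 20% in the first. Which criterion should govern, and what tolerance is acceptable, remains a use-case decision of the kind the text asks developers to document.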

Validation

Model validation on retrospective data alone is not rigorous enough to account for biases that can emerge in real-world conditions. Prospective studies in healthcare settings, ideally in the form of a randomized trial, should be federally mandated for AI applications with the potential to cause a patient bodily harm or death. To navigate the tension between optimizing model performance and safety without diminishing incentives to develop new AI, it may be beneficial to develop a tiered-risk system linked to specific validation criteria. Regulators can look towards AI-reporting recommendations in academia as a reference when formulating validation criteria, as the research community has recently developed a number of checklists designed to quantify the robustness of artificial intelligence studies [24, 25]. Again, as many AI applications in population and public health will be exempt from federal review, we would encourage those overseeing implementation at their institution or organization to play an active role in upholding rigorous validation criteria for algorithms prior to any large-scale rollouts. Model interpretability is an important step in validation that also doubles as a bias mitigation strategy by providing explanations about the ‘inner-workings’ of an AI model in a way that is understandable to humans. Briefly, interpretable AI can be achieved using ‘model-specific’ methods that are constrained to a certain model type or broadly adaptable ‘model-agnostic’ techniques [26] (see Box 1). Interpretability pipelines can highlight important model logic or features at two levels: global methods assess population-level performance, whereas local explanations reveal the reasoning underlying a specific model prediction instance. Both approaches to interpretability can be useful depending on the intended use case and both are becoming increasingly available and computationally affordable [26-29]. 
Developers can use interpretability to detect bias during the validation stage because it can highlight when the model is (1) using race when it should not be or (2) not using race when it should be in its evaluation of a given dataset or input.
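As a rough illustration of a global, model-agnostic check, the sketch below shuffles one input column at a time and measures how often the model's predictions change. This is a simple permutation-style probe chosen for brevity, not one of the specific methods in the cited references; the two toy models, the integer `lab_score` feature, and the binary `race_flag` column are hypothetical. A non-zero result for the race column indicates the model is using race; a zero result indicates it is not, which may itself be a problem when race-linked disparities are relevant to the use case.

```python
import random
from typing import Callable, List

Row = List[int]  # [lab_score, race_flag]; both columns are synthetic
LAB, RACE = 0, 1


def permutation_sensitivity(predict: Callable[[Row], int], rows: List[Row],
                            column: int, repeats: int = 20, seed: int = 0) -> float:
    """Fraction of predictions that change when one column is shuffled.

    Treats the model as a black box, so it applies to any model type.
    """
    rng = random.Random(seed)
    baseline = [predict(row) for row in rows]
    changed = 0
    for _ in range(repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        for row, value, base in zip(rows, shuffled, baseline):
            probe = list(row)      # copy so the original data is untouched
            probe[column] = value  # swap in another patient's value
            changed += predict(probe) != base
    return changed / (repeats * len(rows))


def race_dependent_model(row: Row) -> int:
    """Toy model that applies a higher referral threshold when race_flag is 1."""
    lab_score, race_flag = row
    return int(lab_score > 50 + 30 * race_flag)


def race_blind_model(row: Row) -> int:
    """Toy model that ignores the race column entirely."""
    return int(row[LAB] > 50)


# Synthetic cohort: lab scores 0..99 with an alternating race flag
rows = [[i, i % 2] for i in range(100)]

dependent_on_race = permutation_sensitivity(race_dependent_model, rows, RACE)
blind_to_race = permutation_sensitivity(race_blind_model, rows, RACE)
print("race-dependent model, race column:", dependent_on_race)
print("race-blind model, race column:    ", blind_to_race)  # 0.0
```

The probe only reports whether a feature influences predictions. Deciding whether that influence is appropriate, cases (1) and (2) above, still requires judgment about the algorithm's intended use.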

Implementation

Interpretability is also critical in controlling for human factors that can manifest during the implementation phase. Users of healthcare AI may, quite reasonably, find it hard to trust a model without understanding how it works. This resistance can have the untoward effect of preventing the realization of the equitable outcomes that a well-built algorithm was designed to achieve. We feel that the healthcare field must move towards interpretability as a standard prerequisite for all models slated for real-world implementation, with explanations from developers for cases in which their implementation is not feasible or reliable. In the future, it may be possible to leverage artificial intelligence to automate interpretability pipelines to streamline and promote their use by developers [30]. Interpretability is only part of the story when it comes to the discussion of human factors during algorithm implementation [31, 32]. Developers should attempt to understand why population and public health practitioners choose to use or to ignore model recommendations and develop controls for these factors whenever possible. For example, a cumbersome user interface, user fatigue, or organizational constraints such as ties to financial reimbursement or liability could cause humans to ignore the advice of an unbiased algorithm. Therefore, it is important for those overseeing implementation to measure user uptake and qualitative user experience (the latter should also take place during development) in addition to outcomes data during post-market re-appraisals. Next, continual bias auditing and surveillance are a key arbiter of equity in the implementation phase. Performance and bias checks should be a shared responsibility among healthcare AI developers and end organizations, and they should be incorporated into algorithm tuning and maintenance at regular intervals.
Checks may need to occur more or less frequently depending on the automatic update permissions of the algorithm in the real-world setting and the risk to health and safety. Regulators should work with involved stakeholders to develop a system that does not use overly burdensome reporting requirements that would discourage the use of continuously learning AI that can update in real time [33] (see Box 1). A solution may also be baked into this problem, as there is a potential to leverage AI for future automation of this bias screening process to promote the uptake and frequency of surveillance. We would also advocate for the creation of AI task forces at the federal, state, and local public health levels to spearhead the continuous review of models for bias and broader quality control in the post-implementation period. In a related point, the transparency and stewardship of AI algorithms are currently stymied by a general lack of inventory, and this should be made a priority moving forward. For example, as of September 1st 2021, FDA-approved artificial intelligence algorithms are currently indexed as approved medical devices without a distinct search filter or database, which has prompted the creation of third-party databases for tracking (see Fig. 1) [34, 35]. Since many AI applications are in use in an auxiliary decision support, allocation, or workflow capacity not requiring FDA approval, it is critical for population and public health organizations to maintain their own dedicated inventories for bias surveillance purposes. While the examples outlined in this paper discuss race consciousness and racism in healthcare AI, many of these principles will also translate to other aspects of patient social identity such as sex, religion, and income brackets. 
Further research is needed to determine how best to pick up on bias related to intersectionality and to non-discrete identifiers (such as gender, age, lifestyle, ability, and employment sector) that are harder to classify, stratify over, and rectify. Still, we envision AI as a powerful force for advancing health equity that, if applied with the proper controls, can mitigate persistent inequalities that plague our healthcare through fair and unbiased evaluation. Looking ahead, we must develop harmonized standards for health equity in AI and make every effort to uphold them at the federal, industry, academic, and community levels.

1. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): A Guide for Authors and Reviewers.

Authors: John Mongan; Linda Moy; Charles E Kahn
Journal: Radiol Artif Intell Date: 2020-03-25

Review 2. Machine Learning and Health Care Disparities in Dermatology.

Authors: Adewole S Adamson; Avery Smith
Journal: JAMA Dermatol Date: 2018-11-01 Impact factor: 10.282

3. Deep-Learning Approach to Automatic Identification of Facial Anomalies in Endocrine Disorders.

Authors: Ren Wei; Chendan Jiang; Jun Gao; Ping Xu; Debing Zhang; Zhicheng Sun; Xiaohai Liu; Kan Deng; Xinjie Bao; Guoqiang Sun; Yong Yao; Lin Lu; Huijuan Zhu; Renzhi Wang; Ming Feng
Journal: Neuroendocrinology Date: 2019-07-19 Impact factor: 4.914

4. The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database.

Authors: Stan Benjamens; Pranavsingh Dhunnoo; Bertalan Meskó
Journal: NPJ Digit Med Date: 2020-09-11

5. Racial and ethnic disparities in emergency department analgesic prescription.

Authors: Joshua H Tamayo-Sarver; Susan W Hinze; Rita K Cydulka; David W Baker
Journal: Am J Public Health Date: 2003-12 Impact factor: 9.308

6. RICORD: A Precedent for Open AI in COVID-19 Image Analytics.

Authors: Harrison X Bai; Nicole M Thomasian
Journal: Radiology Date: 2021-01-05 Impact factor: 11.105

7. Automatic Detection of Acromegaly From Facial Photographs Using Machine Learning Methods.

Authors: Xiangyi Kong; Shun Gong; Lijuan Su; Newton Howard; Yanguo Kong
Journal: EBioMedicine Date: 2017-12-15 Impact factor: 8.143

8. Digital technology and COVID-19.

Authors: Daniel Shu Wei Ting; Lawrence Carin; Victor Dzau; Tien Y Wong
Journal: Nat Med Date: 2020-04 Impact factor: 53.440

9. Federated Learning for Healthcare Informatics.

Authors: Jie Xu; Benjamin S Glicksberg; Chang Su; Peter Walker; Jiang Bian; Fei Wang
Journal: J Healthc Inform Res Date: 2020-11-12

Review 10. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension.

Authors: Xiaoxuan Liu; Samantha Cruz Rivera; David Moher; Melanie J Calvert; Alastair K Denniston
Journal: Nat Med Date: 2020-09-09 Impact factor: 87.241
