Emerging Ethical Considerations for the Use of Artificial Intelligence in Ophthalmology.

Nicholas G Evans¹, Danielle M Wenner², I Glenn Cohen³, Duncan Purves⁴, Michael F Chiang⁵, Daniel S W Ting⁶, Aaron Y Lee⁷.

Year: 2022 PMID: 36249707 PMCID: PMC9560632 DOI: 10.1016/j.xops.2022.100141

Source DB: PubMed Journal: Ophthalmol Sci ISSN: 2666-9145

Rapid developments in artificial intelligence (AI) promise improved diagnosis and care for patients, but raise ethical issues.1, 2, 3, 4, 5 Over 6 months, in consultation with the American Academy of Ophthalmology Committee on Artificial Intelligence, we analyzed potential ethical concerns, with a focus on applications of AI in ophthalmology that are deployed or will be deployed in the near future. We identified 3 pressing issues: (1) transparency, paradigmatically through the explanation or interpretation of AI models; (2) attribution of responsibility for particular harms arising from the use or misuse of AI; and (3) scalability of use cases and screening infrastructure.

Transparency

The ability to understand why a machine learning model has produced a particular result is an oft-cited ethical principle for AI.4, 5, 7, 8, 9, 10 We distinguish between AI models that are interpretable, or governed by models that are directly understandable by humans, and AI models that are too complex for any human to comprehend (sometimes called “black box” models), requiring post hoc explainability for how results are produced. Recent work has shown that lack of transparency is associated with decreased accuracy of AI algorithms.11, 12 Issues of transparency may arise, for example, in diagnosing diabetic retinopathy, glaucoma, age-related macular degeneration, and retinopathy of prematurity (ROP). Transparency also may be important when an AI model does not perform as expected or gives a false answer. Given a novel image to analyze, for example, AI may misdiagnose a patient based on an incomplete or inadequate training set. Machine learning and especially deep learning platforms need to be trained on large amounts of historical data (e.g., fundus photography) to learn which features of an image are associated with a particular condition. When a novel image is presented that is atypical, such as if a diabetic retinopathy AI model is given an image harboring central retinal vein occlusion, the AI model may provide false or even nonsense answers. Without transparency, it may be impossible to explain why a particular failure occurred. Even if the general explanation is that the training set is insufficiently broad, what data are missing or needed may be opaque.

Transparency is arguably secondary to the capacity for AI to improve patient outcomes and public health. Machine learning systems in ophthalmology have been tested, but to date only 1 trial has demonstrated improved patient outcomes.
Experiences in other specialties, such as a 2017 trial of using automated interpretation of cardiotocographs in labor, have found no improvement in clinical outcomes as a result of AI. Thus, transparency may be insufficient to justify the use of AI if it fails to improve patient outcomes.

The degree to which transparency is obligatory may also depend on the medical specialty. In some cases, accurate, empirically verified results may be sufficient. In infectious disease, for example, broad-spectrum antibiotics may be tried in the absence of detailed information about a pathogen. However, ophthalmology is highly explainable in diagnostic terms, with strict definitions for most diseases. Deferring to AI may present a significant decrease in confidence in the diagnostic process, especially when only modest increases in verifiability are achieved. The degree to which this arises, and how this trade-off between transparency and confidence varies by specialty, needs further investigation.

Lack of sufficient transparency may exacerbate other issues in the use of medical AI. Although human physicians can reflect on and justify their actions to colleagues, an AI model’s mistakes are predetermined through training. Errors may propagate from a single point of failure if they become the diagnostic standard across, say, an entire hospital network. Patients may seek a second opinion, but if an algorithm is widely distributed, the same system may be performing the diagnosis at a separate clinic. Future AI models may be able to revise their predictions in response to new data, but this presents its own challenges, especially if these revisions lack transparency. Excessive trust in AI may be worse for patient outcomes than if AI were approached more skeptically.

Sometimes, the benefits of AI may outweigh transparency concerns. Consider ROP, a leading cause of childhood blindness worldwide.
The clinical benefit of screening is well established, but is hampered globally by cost and human labor requirements. Artificial intelligence may provide a low-cost screening option in resource-scarce settings, where even modest improvements in testing and treatment could have a significant impact, given the steep long-term costs of ROP. Although challenges translating diagnosis to treatment in low-income settings remain, the large potential benefits and low cost justify the use of AI.

Explainable AI may obviate some of these transparency concerns. However, Rudin has noted that explainability may be a misnomer. Instead, the focus should be on creating models that are inherently interpretable, rather than attempting to generate solutions for unexplainable AI. For the foreseeable future, then, a tension exists between deploying black-boxed AI immediately and waiting for explainable AI, where delays may come at the cost of improvements to patient outcomes.

Responsibility

Ethical frameworks may distinguish between the responsibility for ensuring AI performs in a certain way and the moral or legal liability when harms occur. Herein, we deal with only ethical responsibilities and not, for example, legal liability, although these are related issues. In health care, a responsibility gap arises when responsibility cannot be easily attributed to 1 or more actors, including hospitals, health and malpractice insurers, individual physicians and nurses, and so on. In ophthalmology, 1 private company, IDx, has accepted responsibility for errors in their AI platform, effectively attempting to close the responsibility gap through claiming responsibility for AI outcomes, enshrining this in legal terms by purchasing liability and malpractice insurance on behalf of the platform.

Companies are responsible for ensuring that AI algorithms function appropriately and safely when used as indicated, but may not be for off-label uses. In their consideration of the legal aspects of AI, for example, IDx claims their principles require that creators “assume liability for harm caused by the diagnostic output of the device when used properly and on-label.” Responsibility for ensuring appropriate off-label use thus may seem to fall to the provider, but the fragile nature of these models means even strong associations between patient outcomes and off-label postmarket AI use may be undermined if subtle changes in patient characteristics cause the algorithm to produce flawed results.13, 19 Whether providers can responsibly determine appropriate use based on these unknown variations is unclear.

Responsibility issues may become more acute in future adaptive AI systems that update their weightings of factors associated with a diagnosis in response to new data. Here, responsibility for appropriate use might include managing which data are retained by the system.
For these adaptive regimes, evaluating performance for on-label and off-label conditions will require continuous postmarket monitoring, rather than the current premarket approval approach for pharmaceuticals or other devices.

Allocating responsibility at the level of governance and regulation is an additional challenge. Others have argued that regulation of AI should focus on continuous monitoring with a system view that sees new AI as part of a larger network of actors and institutions and evaluates its performance in the context of that network. The obligation to promote benefits and reduce harms is held jointly by, and distributed between, the creators and users of an AI platform. However, implementing this in practice would require overhauling the institutions that govern medical innovation and practice.

One preliminary approach would require large, adaptive clinical trials of human adjudication versus AI diagnosis. This approach could validate AI performance in a variety of contexts to improve outcomes, adapt to other potential uses, and develop trust in the system. In 2018, engineers at Google demonstrated that image adjudication by retinal specialists improved algorithmic outcomes for the diagnosis of diabetic retinopathy. In the same year, IDx reported that their autonomous AI-based diagnostic platform exceeds human reference standards. Last year, 2 AI-assisted ROP diagnosis packages were approved for use as part of China’s developing medical AI landscape. When specialist opinion can be linked to correct surrogate outcomes or risk of poor outcomes, these trials become an intermediate step toward demonstrating the efficacy of AI, improving patient outcomes, enhancing trust, and providing a broader context for AI use.
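
As an illustration only, the continuous postmarket monitoring discussed in this section, with specialist adjudication as the reference, might be sketched as a rolling check on sensitivity. The class name, the baseline, the tolerance, and the window size below are all hypothetical choices made for the example; none of them comes from a regulator, a vendor, or the studies cited here.

```python
from collections import deque
from typing import Optional


class RollingSensitivityMonitor:
    """Tracks AI sensitivity over the most recent adjudicated positive cases.

    Hypothetical sketch: only cases that human adjudication confirmed as
    disease-positive are recorded, so the rolling mean of AI "hits" is an
    estimate of sensitivity on recent cases.
    """

    def __init__(self, baseline: float, tolerance: float, window: int) -> None:
        self.baseline = baseline      # sensitivity assumed at approval time
        self.tolerance = tolerance    # allowed drop before raising a flag
        self._hits = deque(maxlen=window)

    def record(self, ai_flagged_disease: bool) -> None:
        """Record the AI result for one adjudicated disease-positive case."""
        self._hits.append(ai_flagged_disease)

    def sensitivity(self) -> Optional[float]:
        """Rolling sensitivity, or None until the window is full."""
        if len(self._hits) < self._hits.maxlen:
            return None
        return sum(self._hits) / len(self._hits)

    def drifted(self) -> bool:
        """True when rolling sensitivity falls below baseline - tolerance."""
        current = self.sensitivity()
        return current is not None and current < self.baseline - self.tolerance


# Usage with made-up numbers: 10-case window, 0.90 baseline, 0.05 tolerance.
monitor = RollingSensitivityMonitor(baseline=0.90, tolerance=0.05, window=10)
for hit in [True] * 9 + [False]:
    monitor.record(hit)
print(monitor.sensitivity(), monitor.drifted())
```

A real monitoring regime would also need to track specificity, subgroup performance, and the quality of the adjudication itself; this sketch covers only the single signal of sensitivity against a fixed baseline.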

Scalability and Implementation

One promise of AI is to automate high-volume screening. Consider a near-future hypothetical. In the United Kingdom, the English National Health Service Diabetic Eye Screening Program screened more than 2 million patients from 2015 through 2016 for diabetic retinopathy. We could imagine a case in which this service incorporates AI diagnosis, an implementation that could place most cases of diabetic retinopathy in the country under a single algorithm.

Two failure modes exist for mass AI-driven diagnostics. First, standard errors in diagnostics matter at scale: a sensitivity of 99.9% for a test that applies to a condition affecting hundreds of millions of patients still entails hundreds of thousands of false-negative results. Importantly, transitioning to AI could redistribute false-positive or false-negative results in a population. This raises concerns of justice if, for example, AI misdiagnoses disproportionately impact disadvantaged groups, as has occurred with pulse oximeters and radiograph datasets, resulting in a form of health poverty in which individuals, groups, or populations are unable to benefit from AI because of a scarcity of representative data, and may even be harmed by it at the population level. The degree to which this may occur with ophthalmologic AI applications is an empirical question. However, we do know that racial bias in ophthalmologic clinical trials is an ongoing concern, and this trend could continue into AI development if it remains unchecked.

That said, the distribution of harms using AI might be traded against the distribution of services through the deployment of AI, such that:

1. Some patients have worse outcomes than others because of the distribution of risk by AI; yet
2. Those patients have better outcomes than they would otherwise have had because
   a. The AI model is ultimately less biased than physician treatment alone or
   b. The benefits of access to services outweigh the potential harms of bias or
   c. Both.

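
The scale argument in this section (99.9% sensitivity still missing hundreds of thousands of cases) and the concern about redistributed errors can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name is ours, and the population sizes and subgroup sensitivities are invented for the example rather than taken from any study.

```python
def expected_false_negatives(affected: int, sensitivity: float) -> int:
    """Expected number of affected patients that a test fails to detect."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    return round(affected * (1.0 - sensitivity))


# Scale: the same 99.9% sensitivity applied to hypothetical population sizes.
for affected in (100_000_000, 300_000_000):
    missed = expected_false_negatives(affected, 0.999)
    print(f"{affected:,} affected -> {missed:,} expected false negatives")

# Redistribution: invented subgroup figures, given as (affected, sensitivity).
groups = {
    "well_represented": (900_000, 0.95),
    "under_represented": (100_000, 0.85),
}
total_affected = sum(n for n, _ in groups.values())
total_missed = sum(expected_false_negatives(n, s) for n, s in groups.values())
overall_sensitivity = 1.0 - total_missed / total_affected

for name, (n, s) in groups.items():
    print(f"{name}: miss rate {1.0 - s:.0%}")
print(f"overall sensitivity: {overall_sensitivity:.2%}")
```

With these made-up figures the headline sensitivity looks acceptable, while patients in the smaller group are missed at three times the rate of the larger one. An aggregate metric alone therefore cannot show how errors are distributed.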
Consider the proliferation of telemedicine during the COVID-19 pandemic, particularly for individuals who otherwise may experience delayed diagnosis or treatment.27, 28 Artificial intelligence-assisted diagnostics could make it easier to diagnose patients remotely and at local points of care using, for example, innovations such as slit-lamp biomicroscopes used with smartphones and AI-based interpretation of results. A potential trade-off arises between errors caused by AI when a physician cannot directly access the patient and the benefits of receiving early diagnosis. In rare or emergent cases (such as a pandemic) where travel to a medical facility presents additional risk, AI may provide preliminary guidance on whether to seek care inside a clinical setting. Moreover, even if AI does produce worse outcomes than a physician diagnosis, AI may be justified to the extent that delayed or missed diagnosis is worse.

However, the social benefit of AI to telemedicine relies in part on the extent to which inequalities of access to information technology can be remedied. Telemedicine is unevenly adopted by providers, may not be supported by insurers, and depends on reliable internet access. However, smartphone penetration may be higher than access to specialist medical care in some if not many areas, and thus favorable trade-offs may exist through local AI-driven solutions. As with other emerging technologies, the setting in which medical AI is implemented is a major determinant of the risks and benefits.

A second failure mode is a systemic failure that affects all or most users simultaneously. These very low-probability, very high-consequence events could arise, for example, in the case of a continual learning AI system intended to improve with additional data, but that, through sustained machine error, ultimately diverges radically from its original parameters and begins assigning false results.
Depending on how submissions to the AI platform are structured, adversarial uses could arise in which intentionally doctored images are submitted to achieve the same effect. Protection from systemic failures is unlikely to be achieved through self-governance and will require regulatory action to guard against. Adding ongoing cyber security and fault tree testing to the approval requirements is 1 solution, but 2 challenges arise. First, premarket regulation typically does not entail continuous monitoring of the system; study of results by human analysts and quality control tests against the algorithm to prevent system failures may become dysfunctional on a large scale. Second, the Food and Drug Administration regulates only medical devices, of which IDx is one, but some AI models (such as the Apple Watch pulse oximeter [Apple, Inc]) may constitute a general wellness product designed to be sold directly to consumers. Addressing both challenges may reduce the possibility of low-probability and high-consequence events, but represents trade-offs in system efficiency and resource use around AI in medicine.

In response, the Food and Drug Administration and similar agencies in other countries may require reform to accommodate the challenges presented by AI. Alternatively, the mismatch between the current regulatory structure and the potential impacts of AI in medicine may mean that the Food and Drug Administration is ultimately not well-suited for regulating AI. In the latter case, a new agency may be required, or governance could occur through a different mechanism entirely, for example, through government payment choices in national health insurance schemes.

In conclusion, artificial intelligence presents a range of novel opportunities to improve medical care and to make health care more widely accessible to patients. However, the use of AI raises many ethical concerns, even in cases where it augments the capabilities of human physicians and technicians.
These issues are in part endogenous to AI, and are in part a function of the regulatory, social, and political circumstances in which it is developed and implemented. Realizing the full benefits of AI will require reaching a consensus on which trade-offs are acceptable as this technology is implemented at scale.

23 in total

1. Artificial Intelligence and Black-Box Medical Decisions: Accuracy versus Explainability.

Authors: Alex John London
Journal: Hastings Cent Rep Date: 2019-01 Impact factor: 2.683

2. Machine diagnosis.

Authors: Aaron Lee
Journal: Nature Date: 2019-04-10 Impact factor: 49.962

3. Using Deep Learning to Automate Goldmann Applanation Tonometry Readings.

Authors: Ted Spaide; Yue Wu; Ryan T Yanagihara; Shu Feng; Omar Ghabra; Jonathan S Yi; Philip P Chen; Francy Moses; Aaron Y Lee; Joanne C Wen
Journal: Ophthalmology Date: 2020-04-25 Impact factor: 12.079

4. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.

Authors: Cynthia Rudin
Journal: Nat Mach Intell Date: 2019-05-13

5. Computerised interpretation of fetal heart rate during labour (INFANT): a randomised controlled trial.

Authors:
Journal: Lancet Date: 2017-03-21 Impact factor: 79.321

Review 6. The English National Screening Programme for diabetic retinopathy 2003-2016.

Authors: Peter H Scanlon
Journal: Acta Diabetol Date: 2017-02-22 Impact factor: 4.280

7. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices.

Authors: Michael D Abràmoff; Philip T Lavin; Michele Birch; Nilay Shah; James C Folk
Journal: NPJ Digit Med Date: 2018-08-28