
With an eye to AI and autonomous diagnosis.

Pearse A Keane, Eric J Topol.

Keywords:  Preventive medicine; Randomized controlled trials

Year:  2018        PMID: 31304321      PMCID: PMC6550235          DOI: 10.1038/s41746-018-0048-y

Source DB:  PubMed          Journal:  NPJ Digit Med        ISSN: 2398-6352


In this issue of npj Digital Medicine, Abramoff and colleagues report the findings from a prospective study evaluating the performance of a diabetic retinopathy diagnostic system (IDx-DR) in a primary care setting.[1] This represents an important clinical milestone: in April 2018, these results formed the basis for FDA approval of the system, making it the first fully autonomous AI-based system approved for marketing in the USA.[2] Given the transformative potential of AI for healthcare (in particular a technique referred to as “deep learning”), but also the hype that surrounds it, this lays an important foundation for the future translation of such technologies into routine clinical practice.
Deep learning uses artificial neural networks, so called because of their superficial resemblance to biological neural networks, as computational models to discover intricate structure in large, high-dimensional datasets.[3] Although first championed in the 1980s, deep learning has come to prominence in recent years, driven in large part by graphics processing units (GPUs), originally developed for video gaming, by cloud computing, and by the increasing availability of large, carefully annotated datasets. Since 2012, deep learning has brought seismic changes to the technology industry, with major breakthroughs in areas as diverse as image and speech recognition, natural language translation, robotics, and even self-driving cars. In 2015, Scientific American listed deep learning as one of its ‘world changing’ ideas for the year.[4] Deep learning is particularly well suited to image classification tasks and so has huge potential in medical imaging (scans, slides, skin lesions) and, more broadly, in the frequently recurring patterns of medical practice associated with screening, triage, diagnosis, and monitoring.
A number of recent research studies have demonstrated this potential in multiple domains, albeit in retrospective, in silico settings.[5] The work reported by Abramoff et al. is an important milestone as the first of its kind to be performed in a prospective, real-world clinical environment, using a product that will be commercially available rather than a research prototype. The need for external validation studies is well recognized in the machine learning community; however, there may be less awareness of the additional value provided specifically by a prospective clinical study, or of the time, effort, and considerable cost that such studies entail. Prospective, non-interventional studies, such as that described by Abramoff and colleagues, will likely be fundamental to answering questions about the efficacy of automated diagnosis. However, such studies will not address clinical effectiveness: do patients directly benefit from the use of such AI systems? In the case of diabetic retinopathy, the question might be: do patients ultimately have good (or at least non-inferior) visual outcomes when this system is used? This is not a trivial point. Computer-aided detection (CAD) systems for mammography were approved by the FDA in 1998, and by 2008, 74% of all screening mammograms in the Medicare population were interpreted using this technology.[6] However, nearly 20 years later, a large study concluded that “CAD does not improve diagnostic accuracy of mammography and may result in missed cancers. These results suggest that insurers pay more for computer-aided detection with no established benefit to women.”[6] To address this issue properly, prospective interventional studies should be required. Of course, randomized clinical trials may not be feasible or warranted in every case; however, it will be incumbent on the clinical community to engage with this question.
A further important point is that, historically, diagnostic accuracy studies have often been poorly reported. As AI systems move further into clinical practice, it will become increasingly important for STARD (Standards for Reporting of Diagnostic Accuracy Studies) and other reporting guidelines to be both followed and regularly updated.[7] The clinical research community also has blind spots. In particular, there is little awareness of the so-called ‘AI chasm’: the gulf between developing a scientifically sound algorithm and using it in meaningful real-world applications.[8] It is one thing to develop an algorithm that works well on a small dataset from a specific population; it is quite another to develop one that generalizes to other populations and across different imaging modalities. There is also a large gulf between the experimental code produced for a proof-of-concept research study and the code eventually used in a product with regulatory approval. The latter constitutes a medical device and so must typically be rewritten from the ground up, with a quality management system in place and in compliance with Good Manufacturing Practice. The time, expertise, and expense this requires can be considerable, and are likely beyond the reach of clinicians without an industry partner or other significant commercial support.
It is also important to highlight that many aspects of the regulatory processes for AI are still evolving, and that the implications of this are uncertain, both for the planning of clinical trials and for commercial development. Firstly, it is worth explicitly addressing a prevalent misconception: that AI diagnostic systems keep learning once deployed. Although these systems typically learn by being trained on large amounts of labelled images, at some point this training is stopped and diagnostic thresholds are set.
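To make the train-then-lock sequence concrete, here is a minimal Python sketch under stated assumptions. It supposes that a trained model has already produced a numeric score for each image (the model itself is not shown), picks a cut-off on a tuning set so that a target sensitivity is reached, and then freezes that cut-off so it cannot change in use. The names `pick_threshold` and `LockedClassifier`, the sensitivity-target rule, and all scores and labels are hypothetical illustrations; the source does not describe how IDx-DR set its thresholds.

```python
import dataclasses
import math
from dataclasses import dataclass
from typing import Sequence


def pick_threshold(scores: Sequence[float], labels: Sequence[int],
                   target_sensitivity: float) -> float:
    """Hypothetical rule: return the highest cut-off whose sensitivity on a
    tuning set is at least target_sensitivity. A case is called positive when
    score >= cut-off; label 1 means disease present per the reference standard."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("tuning set contains no disease-positive cases")
    # How many positives must score at or above the cut-off
    # (the small epsilon guards against floating-point round-up).
    needed = max(1, math.ceil(target_sensitivity * len(positives) - 1e-9))
    needed = min(needed, len(positives))
    return positives[needed - 1]


@dataclass(frozen=True)
class LockedClassifier:
    """Hypothetical deployed system: once built, the cut-off cannot be changed."""
    threshold: float

    def classify(self, score: float) -> str:
        return "refer" if score >= self.threshold else "no refer"


# Illustrative, made-up model scores and reference-standard labels.
tuning_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
tuning_labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

locked = LockedClassifier(pick_threshold(tuning_scores, tuning_labels, 0.80))
print(locked.threshold)        # 0.6
print(locked.classify(0.65))   # refer
try:
    locked.threshold = 0.5     # attempted 'on the job' update is rejected
except dataclasses.FrozenInstanceError:
    print("threshold is locked")
```

After locking, the same score always yields the same output, which is the sense in which a locked AI system behaves like a conventional, non-learning diagnostic device.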
In the work by Abramoff and colleagues, the software was locked before the clinical trial; from that point on, it behaves much like a non-AI diagnostic system. In other words, the algorithm no longer learns ‘on the job’. It may be some years before clinical trial methodologies and regulatory frameworks have evolved to deal with algorithms capable of learning on a case-by-case basis in a real-world setting. Secondly, it is worth highlighting that IDx-DR was reviewed under the FDA’s De Novo premarket review pathway.[8] This pathway is for novel low- to moderate-risk devices for which there is no legally marketed predicate device. The bar for approval of subsequent diabetic retinopathy AI diagnostic systems is likely to be higher.
While this study is undoubtedly a milestone and an important benchmark for future research, it is also important to touch on some of its shortcomings. Although participants were recruited from 10 primary care sites, it is still a relatively small study by diagnostic accuracy standards. Because initial numbers of patients with potentially referable diabetic retinopathy were low, a pre-specified enrichment strategy was instituted, in which patients with poorer control of their diabetes were preferentially recruited. The low prevalence of disease in screening populations is likely to remain an issue in the design of prospective AI studies. Partly because of these small numbers, it is not possible to draw conclusions about the efficacy of the system in evaluating the most severe, sight-threatening forms of diabetic retinopathy, which require urgent ophthalmic intervention to prevent irreversible visual loss. The study end points also warrant closer scrutiny. The pre-specified sensitivity end point agreed with the FDA was 85.0%, and this was met with a point estimate of primary sensitivity of 87.2%.
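Whether a sensitivity end point counts as “met” depends on what is compared with the target: the point estimate, or the lower bound of its confidence interval. The Python sketch below makes that distinction explicit. It uses the Wilson score interval, one of several standard binomial interval methods and not necessarily the one used in the study, and the counts are hypothetical, chosen only so that the point estimate clears an 85.0% target while the lower bound does not. The worst-case line, which counts a few excluded participants as missed cases, is likewise an illustrative assumption rather than the authors’ actual imputation; `sensitivity_summary` is a hypothetical helper.

```python
import math
from typing import Dict, Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def sensitivity_summary(tp: int, fn: int, target: float) -> Dict[str, object]:
    """Compare both the point estimate and the lower CI bound with a target."""
    sensitivity = tp / (tp + fn)
    lower, upper = wilson_interval(tp, tp + fn)
    return {
        "sensitivity": sensitivity,
        "ci": (lower, upper),
        "point_estimate_above_target": sensitivity > target,
        "lower_bound_above_target": lower > target,
    }


# Hypothetical counts (not the study's data): 174 of 200 diseased participants detected.
main = sensitivity_summary(tp=174, fn=26, target=0.85)
print(main)  # point estimate 0.87 clears 0.85, but the lower bound (~0.816) does not

# Hypothetical worst case: 10 excluded participants assumed diseased and missed.
worst = sensitivity_summary(tp=174, fn=26 + 10, target=0.85)
print(worst["sensitivity"], worst["ci"][0] > 0.75)  # below 0.85; lower bound still above 0.75
```

With these made-up numbers, a rule based on the point estimate declares the 85.0% end point met, whereas a rule based on the lower confidence bound does not; under the worst-case assumption, even the point estimate falls below 85.0% while a 75% threshold is still excluded.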
However, the confidence interval around this 87.2% estimate was 81.8–91.2%; that is, it spanned the 85.0% superiority end point. The study also employed an intention-to-screen protocol; however, 40 participants who were successfully enrolled were excluded from analysis because their images were subsequently found to be of insufficient quality for grading by the image reading center. The authors attempt to address this by repeating the analysis under a worst-case scenario in which all such images are assumed to be incorrectly graded; under this approach, sensitivity would be 80.7% (76.7–84.2%). They note that this calculation rules out a pre-specified inferiority hypothesis of 75%, but do not highlight that the superiority end point would no longer be met. In larger-scale studies, such discrepancies may be important.
Aside from these methodological questions, there are some clinical limitations. The reviewers correctly highlight a number of other pathologies subsequently identified by the Wisconsin Image Reading Center, including possible glaucoma and possible age-related macular degeneration. Although the system is not intended for this purpose, it will inevitably encounter patients with these and other, more serious, pathologies (for example, retinal detachment or choroidal melanoma). In its current version, the algorithm can only provide a classification related to diabetic retinopathy and would not identify these other retinal conditions. The diagnostic system also has quite narrow criteria for use. It requires images to be acquired with a specific retinal fundus camera (Topcon NW400), which costs approximately $18,000; it is approved by the FDA only to detect “more than mild” diabetic retinopathy; and it excludes many patients with pre-existing diabetic retinopathy.
The exclusion of patients with pre-existing diabetic retinopathy may be a particular issue, because patients with diabetic retinopathy often fail to attend appointments for eye examinations and may not be aware of any treatment they have previously received for this condition. A considerable body of work has highlighted this issue in the context of diabetic retinopathy screening in the UK.[9] One potential future solution is to empower patients to perform their own retinal examination with a smartphone, with cloud-based AI interpretation. This would likely require pupillary dilation or infrared light, but it would sidestep the expense and inconvenience of formal eye examinations. There is also the question of whether the now-approved device will see significant uptake in the clinic. Besides the expense, it remains to be determined how and where it would be implemented. Will primary care clinics incorporate retinal screening into their practice? Nor is this truly an ‘autonomous system’, since someone still needs to acquire the images; who will do that?
Diabetic retinopathy in particular, and other diseases of the eye more generally, have been a major focus of AI research in medicine to date. In large retrospective studies of diabetic retinopathy, algorithmic diagnosis from either fundus photographs or optical coherence tomography was compared with that of ophthalmologists, and accuracy was higher than in the current trial (AUC as high as 0.99 in two datasets).[10,11] This is noteworthy, but to be expected, since results obtained by looking backwards at existing datasets are unlikely to mirror prospective clinical assessment. While it is always easy to be critical of studies that forge new ground, it is important to applaud the authors for this pivotal work. Although deep learning will not be a panacea, it has huge potential in many clinical areas where high-dimensional data are mapped to a simple classification and where datasets may remain stable over extended periods.
Consequently, it will be incumbent on healthcare professionals to become more familiar with deep learning and other AI technologies in the coming years, to ensure that they are used appropriately. This study represents an important first step in that direction.
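The retrospective figures quoted above are AUC values, which summarize how well a model ranks diseased above non-diseased cases across every possible threshold, whereas a locked system is assessed by its sensitivity and specificity at a single operating point. The minimal Python sketch below, using made-up scores, computes both summaries from the same data. The function names are hypothetical, and the pairwise AUC calculation uses the rank (Mann–Whitney) interpretation, written for clarity rather than speed.

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via its rank interpretation: the probability
    that a randomly chosen positive case scores higher than a randomly chosen
    negative one (ties count as one half). O(P*N), fine for small examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def operating_point(scores: Sequence[float], labels: Sequence[int],
                    threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity at one fixed cut-off,
    calling a case positive when score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up scores and labels, purely for illustration.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(auc(scores, labels))                   # 0.8125
print(operating_point(scores, labels, 0.5))  # (0.75, 0.5)
```

The single AUC value does not by itself say which sensitivity and specificity a deployed, locked system will deliver; that depends on where its threshold was fixed.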
References (7 in total)

1.  Reporting quality of diagnostic accuracy studies: a systematic review and meta-analysis of investigations on adherence to STARD.

Authors:  Daniël A Korevaar; W Annefloor van Enst; René Spijker; Patrick M M Bossuyt; Lotty Hooft
Journal:  Evid Based Med       Date:  2013-12-24

2.  Deep learning.

Authors:  Yann LeCun; Yoshua Bengio; Geoffrey Hinton
Journal:  Nature       Date:  2015-05-28

3.  Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs.

Authors:  Varun Gulshan; Lily Peng; Marc Coram; Martin C Stumpe; Derek Wu; Arunachalam Narayanaswamy; Subhashini Venugopalan; Kasumi Widner; Tom Madams; Jorge Cuadros; Ramasamy Kim; Rajiv Raman; Philip C Nelson; Jessica L Mega; Dale R Webster
Journal:  JAMA       Date:  2016-12-13

4.  Diagnostic Accuracy of Digital Screening Mammography With and Without Computer-Aided Detection.

Authors:  Constance D Lehman; Robert D Wellman; Diana S M Buist; Karla Kerlikowske; Anna N A Tosteson; Diana L Miglioretti
Journal:  JAMA Intern Med       Date:  2015-11

5.  Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning.

Authors:  Daniel S Kermany; Michael Goldbaum; Wenjia Cai; Carolina C S Valentim; Huiying Liang; Sally L Baxter; Alex McKeown; Ge Yang; Xiaokang Wu; Fangbing Yan; Justin Dong; Made K Prasadha; Jacqueline Pei; Magdalene Y L Ting; Jie Zhu; Christina Li; Sierra Hewett; Jason Dong; Ian Ziyar; Alexander Shi; Runze Zhang; Lianghong Zheng; Rui Hou; William Shi; Xin Fu; Yaou Duan; Viet A N Huu; Cindy Wen; Edward D Zhang; Charlotte L Zhang; Oulan Li; Xiaobo Wang; Michael A Singer; Xiaodong Sun; Jie Xu; Ali Tafreshi; M Anthony Lewis; Huimin Xia; Kang Zhang
Journal:  Cell       Date:  2018-02-22

6.  Attitudes, access and anguish: a qualitative interview study of staff and patients' experiences of diabetic retinopathy screening.

Authors:  A E Hipwell; J Sturt; A Lindenmeyer; I Stratton; R Gadsby; P O'Hare; P H Scanlon
Journal:  BMJ Open       Date:  2014-12-15

7.  Opportunities and obstacles for deep learning in biology and medicine.

Authors:  Travers Ching; Daniel S Himmelstein; Brett K Beaulieu-Jones; Alexandr A Kalinin; Brian T Do; Gregory P Way; Enrico Ferrero; Paul-Michael Agapow; Michael Zietz; Michael M Hoffman; Wei Xie; Gail L Rosen; Benjamin J Lengerich; Johnny Israeli; Jack Lanchantin; Stephen Woloszynek; Anne E Carpenter; Avanti Shrikumar; Jinbo Xu; Evan M Cofer; Christopher A Lavender; Srinivas C Turaga; Amr M Alexandari; Zhiyong Lu; David J Harris; Dave DeCaprio; Yanjun Qi; Anshul Kundaje; Yifan Peng; Laura K Wiley; Marwin H S Segler; Simina M Boca; S Joshua Swamidass; Austin Huang; Anthony Gitter; Casey S Greene
Journal:  J R Soc Interface       Date:  2018-04
