
Generalisability through local validation: overcoming barriers due to data disparity in healthcare.

William Greig Mitchell1,2, Edward Christopher Dee3, Leo Anthony Celi4,5,6,7.   

Abstract

Cho et al. report deep learning model accuracy for tilted myopic disc detection in a South Korean population. Here we explore the importance of generalisability of machine learning (ML) in healthcare, and we emphasise that recurrent underrepresentation of data-poor regions may inadvertently perpetuate global health inequity. Creating meaningful ML systems is contingent on understanding how, when, and why different ML models work in different settings. While we echo the need for the diversification of ML datasets, such a worthy effort will take time and does not obviate the use of presently available datasets, provided conclusions are validated and re-calibrated for different groups prior to implementation. The importance of external ML model validation on diverse populations should be highlighted where possible, especially for models built with single-centre data.


Keywords:  Disparity; Healthcare equity; Machine learning; Ophthalmology

Year:  2021        PMID: 34020592      PMCID: PMC8138973          DOI: 10.1186/s12886-021-01992-6

Source DB:  PubMed          Journal:  BMC Ophthalmol        ISSN: 1471-2415            Impact factor:   2.209


We read with great interest the article by Cho et al. describing the application of deep learning to recognise optic disc tilt, and the discussion of its importance in ophthalmic measurements [1]. The rapid evolution of artificial intelligence in ophthalmic image recognition has created unprecedented opportunities for efficient, accurate and cost-effective diagnosis with less human input – of particular value in resource-poor settings where specialist input is scarce [2]. While we are encouraged by the model accuracy and commend the authors for describing the strengths and weaknesses of their study, we would like to highlight a limitation and an area of further exploration that would strengthen the utility of their work: the need to evaluate the generalisability of their model outside their single-centre, paediatric South Korean population. We believe validation of the model developed by Cho et al. on other populations, particularly those lacking local imaging repositories, would be of great value. Sociodemographic disparities in machine learning (ML) are well described, with recurrent underrepresentation of some populations posing substantial risks of unknown ML biases. Indeed, a recent report noted that 172 countries (totalling 3.5 billion people) have no publicly available ophthalmic imaging datasets [3], starkly illustrating the possibility of sampling bias and subsequent poor generalisability in global ML studies. Such disparity in data availability, if left unchecked, may inadvertently perpetuate global health inequity. Generalisability is itself not binary, which nuances issues of sampling bias in clinical applications of ML [4]. A proxy for validity, generalisability is challenged when translating findings across different clinical settings – if not by demographic diversity (as may be the case for Cho et al.), then by unique patient-level differences or local clinical idiosyncrasies.
Creating clinically useful ML systems is therefore contingent not only upon demographic and clinical generalisability, but also on an understanding of how, when, and why different ML models work in different settings. Although we echo the need for increased diversification of ML datasets, such a worthy effort will take time. The need for increased diversity does not obviate the use of presently available datasets if conclusions are conscientiously validated and re-calibrated for different populations prior to implementation. For example, a recent study from India outlined the value of validating ML models on diverse populations: the model, built with relatively homogeneous sociodemographic data, was demonstrated to be broadly applicable to other populations in disease detection [5]. Indeed, there are myriad publicly available ophthalmic imaging datasets whose algorithms could be broadly validated to assess the extent of their value to populations in data-poor regions, which may lack the infrastructure to develop their own repositories [6-13]. As shown by Gulshan and others, validation of algorithms based on inevitably imperfect data can identify when and where these models still hold clinical value. While models may not be universally generalisable [14], identifying populations in which they are accurate – and to what degree, and in what circumstances – still holds importance, particularly in allowing countries lacking the infrastructure to build local imaging datasets to benefit from international ML findings. Investing in the infrastructure for local validation and re-calibration will also lay the groundwork for the eventual contribution of local data to international repositories, which may be required to enhance the local validity of models.
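Identifying the populations in which a model remains accurate – and to what degree – amounts, in practice, to stratified external validation. The following is a minimal sketch of that idea using scikit-learn; the cohort, subgroup labels (`site_A`, `site_B`), and model scores are simulated for illustration only and are not drawn from any study cited here.

```python
# Sketch: evaluating a pre-trained classifier's discrimination separately per
# population subgroup, to identify where the model still holds clinical value.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical external-validation cohort: outcome labels, model scores, and a
# subgroup tag (e.g. imaging site or demographic stratum).
n = 600
group = rng.choice(["site_A", "site_B"], size=n)
y_true = rng.integers(0, 2, size=n)

# Simulate a model that discriminates well at site_A and poorly at site_B
# (larger score noise at site_B).
noise_sd = np.where(group == "site_A", 0.5, 2.0)
y_score = y_true + noise_sd * rng.normal(size=n)

# Report discrimination per subgroup rather than one pooled figure.
for g in ["site_A", "site_B"]:
    mask = group == g
    auc = roc_auc_score(y_true[mask], y_score[mask])
    print(f"{g}: AUROC = {auc:.2f} (n = {mask.sum()})")
```

A pooled AUROC would average away exactly the subgroup differences the letter warns about; reporting per-stratum performance makes them visible.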
While there are no hard and fast rules regarding the amount of data needed to validate and re-calibrate a model trained on population A before deploying it to population B, the process of validation and re-calibration requires certain steps and features [15-17]. Variables in the original model from population A must be present in the dataset from population B. Data from population B used for validation must be as recent as possible. A target acceptable discrimination and calibration should be set by those who will use the model and those who will be affected by it, with special attention paid to evaluating accuracy in marginalised groups; if model performance falls below the set threshold, re-calibration is necessary. In general, the number of patients required should be an order of magnitude greater than the number of features in the model; for images, a principal component analysis can be performed to determine which image features are important. Another crucial factor in determining the minimum cohort size is the prevalence of the diagnosis (for a classification algorithm) or of the event (for a prediction algorithm): the less prevalent the diagnosis or event, the larger the sample size required. Although medicine stands to benefit immensely from publicly available anonymised data and its applications in artificial intelligence [18], building equitable sociodemographic representation in data repositories is crucial. In the meantime, conscientious local validation and re-calibration will elucidate how and when current ML findings can be applied to heterogeneous populations, and may help to ameliorate disparities in access to ML-driven tools. The importance of model validation on other diverse populations should be emphasised where possible, especially for models built with single-centre data.
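The validation-then-re-calibration loop described above can be sketched in a few lines with scikit-learn. Everything here is hypothetical: the population-B cohort is simulated, the original model's outputs are represented only by miscalibrated predicted probabilities, and the 0.80 AUROC target stands in for a locally agreed performance threshold.

```python
# Sketch: external validation of a model on population B, followed by
# Platt-style re-calibration (logistic regression on the model's log-odds)
# using local data. All data and thresholds are simulated/hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(42)

# Hypothetical population-B validation cohort: the original model ranks
# patients well, but its probabilities are systematically shifted upward.
n = 500
y_b = rng.integers(0, 2, size=n)
logit = 1.5 * (y_b - 0.5) + rng.normal(size=n)
p_orig = 1 / (1 + np.exp(-(logit + 1.0)))  # +1.0 shift -> overestimates risk

# Step 1: validate against the locally agreed target.
TARGET_AUROC = 0.80  # hypothetical threshold set with local stakeholders
auc = roc_auc_score(y_b, p_orig)
print(f"AUROC on B: {auc:.2f}  Brier before: {brier_score_loss(y_b, p_orig):.3f}")

# Step 2: discrimination transfers but calibration does not, so refit a
# simple logistic mapping from the original log-odds to local outcomes.
log_odds = np.log(p_orig / (1 - p_orig)).reshape(-1, 1)
recal = LogisticRegression().fit(log_odds, y_b)
p_recal = recal.predict_proba(log_odds)[:, 1]
print(f"Brier after re-calibration: {brier_score_loss(y_b, p_recal):.3f}")
```

Re-calibration of this kind needs far less local data than retraining a model from scratch, which is what makes it attractive for settings without large imaging repositories; if discrimination itself falls below the target, re-calibration alone will not rescue the model.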
References: 14 in total

1.  ORIGA(-light): an online retinal fundus image database for glaucoma analysis and research.

Authors:  Zhuo Zhang; Feng Shou Yin; Jiang Liu; Wing Kee Wong; Ngan Meng Tan; Beng Hai Lee; Jun Cheng; Tien Yin Wong
Journal:  Annu Int Conf IEEE Eng Med Biol Soc       Date:  2010

2.  Ensuring Fairness in Machine Learning to Advance Health Equity.

Authors:  Alvin Rajkomar; Michaela Hardt; Michael D Howell; Greg Corrado; Marshall H Chin
Journal:  Ann Intern Med       Date:  2018-12-04       Impact factor: 25.391

3.  REVIEW - a reference data set for retinal vessel profiles.

Authors:  Bashir Al-Diri; Andrew Hunter; David Steel; Maged Habib; Taghread Hudaib; Simon Berry
Journal:  Conf Proc IEEE Eng Med Biol Soc       Date:  2008

4.  Automated measurement of the arteriolar-to-venular width ratio in digital color fundus photographs.

Authors:  Meindert Niemeijer; Xiayu Xu; Alina V Dumitrescu; Priya Gupta; Bram van Ginneken; James C Folk; Michael D Abramoff
Journal:  IEEE Trans Med Imaging       Date:  2011-06-16       Impact factor: 10.048

5. [Review] Deployment of Artificial Intelligence in Real-World Practice: Opportunity and Challenge.

Authors:  Mingguang He; Zhixi Li; Chi Liu; Danli Shi; Zachary Tan
Journal:  Asia Pac J Ophthalmol (Phila)       Date:  2020 Jul-Aug

6.  Sample-Size Determination Methodologies for Machine Learning in Medical Imaging Research: A Systematic Review.

Authors:  Indranil Balki; Afsaneh Amirabadi; Jacob Levman; Anne L Martel; Ziga Emersic; Blaz Meden; Angel Garcia-Pedrero; Saul C Ramirez; Dehan Kong; Alan R Moody; Pascal N Tyrrell
Journal:  Can Assoc Radiol J       Date:  2019-09-12       Impact factor: 2.248

7. [Review] Accelerating ophthalmic artificial intelligence research: the role of an open access data repository.

Authors:  Ashley Kras; Leo A Celi; John B Miller
Journal:  Curr Opin Ophthalmol       Date:  2020-09       Impact factor: 3.761

8.  Performance of a Deep-Learning Algorithm vs Manual Grading for Detecting Diabetic Retinopathy in India.

Authors:  Varun Gulshan; Renu P Rajan; Kasumi Widner; Derek Wu; Peter Wubbels; Tyler Rhodes; Kira Whitehouse; Marc Coram; Greg Corrado; Kim Ramasamy; Rajiv Raman; Lily Peng; Dale R Webster
Journal:  JAMA Ophthalmol       Date:  2019-09-01       Impact factor: 7.389

9.  Robust vessel segmentation in fundus images.

Authors:  A Budai; R Bock; A Maier; J Hornegger; G Michelson
Journal:  Int J Biomed Imaging       Date:  2013-12-12

10. [Review] Application of machine learning in ophthalmic imaging modalities.

Authors:  Yan Tong; Wei Lu; Yue Yu; Yin Shen
Journal:  Eye Vis (Lond)       Date:  2020-04-16
Cited by: 3 in total

1.  Sensor Data Integration: A New Cross-Industry Collaboration to Articulate Value, Define Needs, and Advance a Framework for Best Practices.

Authors:  Ieuan Clay; Christian Angelopoulos; Anne Lord Bailey; Aaron Blocker; Simona Carini; Rodrigo Carvajal; David Drummond; Kimberly F McManus; Ingrid Oakley-Girvan; Krupal B Patel; Phillip Szepietowski; Jennifer C Goldsack
Journal:  J Med Internet Res       Date:  2021-11-09       Impact factor: 5.428

2. [Review] Optimizing human-centered AI for healthcare in the Global South.

Authors:  Chinasa T Okolo
Journal:  Patterns (N Y)       Date:  2022-01-03

3.  Global disparity bias in ophthalmology artificial intelligence applications.

Authors:  Luis Filipe Nakayama; Ashley Kras; Lucas Zago Ribeiro; Fernando Korn Malerbi; Luis Salles Mendonça; Leo Anthony Celi; Caio Vinicius Saito Regatieri; Nadia K Waheed
Journal:  BMJ Health Care Inform       Date:  2022-04
