Selen Bozkurt1, Eli M Cahan1,2, Martin G Seneviratne1, Ran Sun1, Juan A Lossio-Ventura1, John P A Ioannidis1,3,4,5,6, Tina Hernandez-Boussard1,4,7.
1. Department of Medicine, Stanford University, Stanford, California, USA.
2. NYU School of Medicine, New York, New York, USA.
3. Department of Epidemiology and Population Health, School of Medicine, Stanford University, Stanford, California, USA.
4. Department of Biomedical Data Science, Stanford University, Stanford, California, USA.
5. Department of Statistics, Stanford University, Stanford, California, USA.
6. Meta-Research Innovation Center at Stanford, Stanford University, Stanford, California, USA.
7. Department of Surgery, Stanford University, Stanford, California, USA.
Abstract
OBJECTIVE: The development of machine learning (ML) algorithms to address a variety of issues faced in clinical practice has increased rapidly. However, questions have arisen regarding biases in their development that can affect their applicability in specific populations. We sought to evaluate whether studies developing ML models from electronic health record (EHR) data report sufficient demographic data on the study populations to demonstrate representativeness and reproducibility.

MATERIALS AND METHODS: We searched PubMed for articles applying ML models to improve clinical decision-making using EHR data. We limited our search to papers published between 2015 and 2019.

RESULTS: Across the 164 studies reviewed, demographic variables were inconsistently reported and/or included as model inputs. Race/ethnicity was not reported in 64% of studies; gender and age were not reported in 24% and 21% of studies, respectively. Socioeconomic status of the population was not reported in 92% of studies. Studies that mentioned these variables often did not report whether they were included as model inputs. Few models (12%) were validated using external populations. Few studies (17%) open-sourced their code. Populations in the ML studies included higher proportions of White and Black subjects, but a lower proportion of Hispanic subjects, than the general US population.

DISCUSSION: The demographic characteristics of study populations are poorly reported in the ML literature based on EHR data. Demographic representativeness in training data and model transparency are necessary to ensure that ML models are deployed in an equitable and reproducible manner. Wider adoption of reporting guidelines is warranted to improve representativeness and reproducibility.