Griffin M Weber (1,2), William G Adams (3), Elmer V Bernstam (4), Jonathan P Bickel (5), Kathe P Fox (6), Keith Marsolo (7), Vijay A Raghavan (8), Alexander Turchin (9), Xiaobo Zhou (10), Shawn N Murphy (11), Kenneth D Mandl (1,5).
1. Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
2. Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA.
3. Department of Pediatrics, Boston Medical Center, Boston, MA, USA.
4. Department of Internal Medicine, McGovern Medical School, School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, USA.
5. Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA.
6. Department of Analytics and Behavior Change, Aetna, Hartford, CT, USA.
7. Department of Pediatrics, Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA.
8. Scientific Information Management, Merck, Boston, MA, USA.
9. Division of Endocrinology, Brigham and Women's Hospital, Boston, MA, USA.
10. Department of Radiology, Wake Forest University School of Medicine, Winston Salem, NC, USA.
11. Department of Neurology, Massachusetts General Hospital, Boston, MA, USA.
Abstract
OBJECTIVE: One promise of nationwide adoption of electronic health records (EHRs) is the availability of data for large-scale clinical research studies. However, because the same patient could be treated at multiple health care institutions, data from only a single site might not contain the complete medical history for that patient, meaning that critical events could be missing. In this study, we evaluate how simple heuristic checks for data "completeness" affect the number of patients in the resulting cohort and introduce potential biases. MATERIALS AND METHODS: We began with a set of 16 filters that check for the presence of demographics, laboratory tests, and other types of data, and then systematically applied all 2^16 (65,536) possible combinations of these filters to the EHR data for 12 million patients at 7 health care systems and to a separate payor claims database of 7 million members. RESULTS: EHR data showed considerable variability in data completeness across sites and high correlation between data types. For example, the fraction of patients with diagnoses increased from 35.0% in all patients to 90.9% in those with at least 1 medication. An unrelated claims dataset independently showed that most filters select members who are older and more likely female and can eliminate large portions of the population whose data are actually complete. DISCUSSION AND CONCLUSION: As investigators design studies, they need to balance their confidence in the completeness of the data against the effects that placing requirements on the data has on the resulting patient cohort.
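Evaluating every combination of k boolean filters means reporting cohort size and demographics for 2^k filter subsets. A minimal sketch of one way to do this is shown below; it is not the authors' code. It assumes each patient can be reduced to a bitmask recording which filters they pass, and it uses a "sum over supersets" transform so that all 2^k cohorts are tallied in O(k * 2^k) steps rather than rescanning the patients once per combination. The filter names, the `make_record` helper, and the toy records are hypothetical; the abstract does not list the actual 16 filters.

```python
# Sketch: evaluate all 2^k combinations of "data completeness" filters at once.
# FILTERS below are HYPOTHETICAL placeholders, not the study's actual 16 filters.
FILTERS = ["has_demographics", "has_diagnosis", "has_medication", "has_lab"]
BIT = {name: 1 << i for i, name in enumerate(FILTERS)}


def patient_mask(record):
    """Encode which filters a patient passes as a bitmask."""
    mask = 0
    for name, bit in BIT.items():
        if record.get(name):
            mask |= bit
    return mask


def superset_sums(values, n_bits):
    """For every mask s, return the sum of values[t] over all t that contain s."""
    f = list(values)
    for i in range(n_bits):
        bit = 1 << i
        for s in range(1 << n_bits):
            if not s & bit:
                f[s] += f[s | bit]
    return f


def evaluate_all_combinations(records):
    """Cohort size, mean age, and % female for every filter combination."""
    n = len(FILTERS)
    size = 1 << n
    count, age_sum, female = [0] * size, [0.0] * size, [0] * size
    for r in records:  # one pass: bucket patients by their exact mask
        m = patient_mask(r)
        count[m] += 1
        age_sum[m] += r["age"]
        female[m] += 1 if r["sex"] == "F" else 0
    # A patient passes combination c iff its mask is a superset of c.
    count = superset_sums(count, n)
    age_sum = superset_sums(age_sum, n)
    female = superset_sums(female, n)
    return {
        c: {
            "n": count[c],
            "mean_age": age_sum[c] / count[c] if count[c] else None,
            "pct_female": 100.0 * female[c] / count[c] if count[c] else None,
        }
        for c in range(size)
    }


def conditional_fraction(results, target, given):
    """Fraction of patients passing `given` who also pass `target`."""
    base = results[given]["n"]
    return results[given | target]["n"] / base if base else None


def make_record(age, sex, *present):
    """Build a toy record; `present` lists the hypothetical filters it satisfies."""
    rec = {"age": age, "sex": sex}
    rec.update({name: (name in present) for name in FILTERS})
    return rec


# Toy data for illustration only (not study data).
records = [
    make_record(70, "F", "has_demographics", "has_diagnosis", "has_medication", "has_lab"),
    make_record(30, "M", "has_demographics"),
    make_record(50, "F", "has_demographics", "has_lab"),
    make_record(40, "M", "has_demographics", "has_diagnosis", "has_medication"),
]
res = evaluate_all_combinations(records)
print(res[0])             # no filters applied: the full population
print(res[BIT["has_lab"]])  # cohort restricted to patients with a lab test
print(conditional_fraction(res, BIT["has_diagnosis"], BIT["has_medication"]))
```

In this toy data, the fraction with a diagnosis rises from 0.5 overall to 1.0 among patients with a medication, and the lab filter selects an older, all-female cohort: the same kinds of correlation and selection effects the abstract describes, shown here only on made-up records. With 16 filters the tables have 65,536 entries, and the transform touches each one 16 times per tracked quantity.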