Benjamin A Goldstein (1,2), Matthew Phelan (2), Neha J Pagidipati (2,3), Sarah B Peskoe (1). 1. Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA. 2. Center for Predictive Medicine, Duke Clinical Research Institute, Durham, North Carolina, USA. 3. Department of Medicine, Duke University, Durham, North Carolina, USA.
Abstract
OBJECTIVE: Electronic health record (EHR) data have become a central data source for clinical research. One concern with using EHR data is that the process through which individuals engage with the health system, and thereby come to appear in EHR data, can be informative. We have termed this process informed presence. In this study we use simulation and real data to assess how informed presence can affect inference. MATERIALS AND METHODS: We first simulated a visit process in which a series of biomarkers were observed informatively and uninformatively over time. We then compared inference derived from a randomized controlled trial (ie, uninformative visits) with inference derived from EHR data (ie, potentially informative visits). RESULTS: We find bias only when there is a strong association both between the biomarker and the outcome and between the biomarker and the visit process. Moreover, once there are some uninformative visits, this bias is mitigated. In the data example, we find that when the "true" associations are null, there is no observed bias. DISCUSSION: These results suggest that an informative visit process can exaggerate an association but cannot induce one. Furthermore, careful study design can mitigate the potential bias when some noninformative visits are included. CONCLUSIONS: While there are legitimate concerns regarding biases that "messy" EHR data may induce, the conditions for such biases are extreme and can be accounted for.
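The idea of an informative versus uninformative visit process can be illustrated with a minimal sketch. This is not the authors' simulation: the function name `simulate_observed_mean`, the logistic visit model, the intercept of -1.0, and the parameter `gamma` are all hypothetical choices made for illustration. The sketch shows only how informed presence can distort which biomarker values get recorded, not the downstream association estimates studied in the paper.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_observed_mean(n_patients, n_times, gamma, seed=0):
    """Return the mean biomarker value across all recorded visits.

    gamma = 0 gives an uninformative visit process; gamma > 0 makes a
    visit more likely when the biomarker is high (assumed mechanism).
    """
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n_patients):
        patient_level = rng.gauss(0.0, 1.0)  # latent patient-level biomarker level
        for _ in range(n_times):
            x = patient_level + rng.gauss(0.0, 1.0)  # true value at this time point
            p_visit = expit(-1.0 + gamma * x)  # chance the value is recorded
            if rng.random() < p_visit:
                total += x
                count += 1
    return total / count


# The true population mean of the biomarker is 0 in both scenarios.
uninformative = simulate_observed_mean(2000, 10, gamma=0.0)
informative = simulate_observed_mean(2000, 10, gamma=1.0)
print(f"uninformative: {uninformative:.3f}, informative: {informative:.3f}")
```

Under these assumptions, the uninformative scenario should recover a recorded mean near the true value of 0, while the informative scenario should over-represent high biomarker values. Mixing in some visits generated with `gamma=0.0` would be one way to explore the abstract's point that uninformative visits mitigate the distortion.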