Kueiyu Joshua Lin (1,2), Gary E Rosenthal (3), Shawn N Murphy (4,5), Kenneth D Mandl (6), Yinzhu Jin (1), Robert J Glynn (1), Sebastian Schneeweiss (1)
1. Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
2. Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
3. Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA.
4. Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
5. Research Information Science and Computing, Partners Healthcare, Somerville, MA, USA.
6. Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
Abstract
OBJECTIVE: Electronic health record (EHR) data-discontinuity, i.e., receiving care outside of a particular EHR system, may cause misclassification of study variables. We aimed to validate an algorithm that identifies patients with high EHR data-continuity in order to reduce such bias.
MATERIALS AND METHODS: We analyzed data from two EHR systems linked with Medicare claims data from 2007 through 2014, one in Massachusetts (MA, n=80,588) and the other in North Carolina (NC, n=33,207). We quantified EHR data-continuity by the Mean Proportion of Encounters Captured (MPEC) by the EHR system, taking the claims data as the complete record. The prediction model for MPEC was developed in MA and validated in NC. Stratified by predicted EHR data-continuity, we quantified misclassification of 40 key variables by the Mean Standardized Difference (MSD) between the proportions of these variables based on EHR data alone vs the linked claims-EHR data.
RESULTS: The mean MPEC was 27% in the MA system and 26% in the NC system. Predicted and observed EHR data-continuity were highly correlated (Spearman correlation=0.78 and 0.73, respectively). The misclassification (MSD) of the 40 variables in patients of the predicted high EHR data-continuity cohort was significantly smaller (44%, 95% CI: 40-48%) than that in the remaining population.
DISCUSSION: The comorbidity profiles were similar in patients with high vs low EHR data-continuity. Therefore, restricting an analysis to patients with high EHR data-continuity may reduce information bias while preserving the representativeness of the study cohort.
CONCLUSION: We have successfully validated an algorithm that can identify a high EHR data-continuity cohort representative of the source population.
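The two metrics named in the methods can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything here is an assumption for illustration: the per-patient capture proportion is taken as a simple ratio of EHR-recorded to claims-recorded encounters, and the standardized difference uses the conventional formula for two proportions. The function names are hypothetical and are not the authors' code.

```python
from math import sqrt


def proportion_captured(ehr_encounters, claims_encounters):
    """Share of claims-recorded encounters that also appear in the EHR.

    Assumption: claims are the complete record, so the ratio is capped at 1.
    Returns None when the patient has no claims encounters.
    """
    if claims_encounters == 0:
        return None
    return min(ehr_encounters / claims_encounters, 1.0)


def mean_proportion_captured(patients):
    """Average the per-patient capture proportion over (ehr, claims) pairs."""
    values = [proportion_captured(e, c) for e, c in patients]
    values = [v for v in values if v is not None]
    return sum(values) / len(values)


def standardized_difference(p_ehr, p_linked):
    """Conventional absolute standardized difference between two proportions."""
    pooled_var = (p_ehr * (1 - p_ehr) + p_linked * (1 - p_linked)) / 2
    if pooled_var == 0:
        return 0.0
    return abs(p_ehr - p_linked) / sqrt(pooled_var)


def mean_standardized_difference(pairs):
    """Average standardized difference over (EHR-only, linked) proportion pairs."""
    diffs = [standardized_difference(a, b) for a, b in pairs]
    return sum(diffs) / len(diffs)
```

In this sketch, a variable whose prevalence is 20% in EHR data alone but 40% in the linked claims-EHR data contributes `standardized_difference(0.2, 0.4)` to the mean; a smaller mean over all variables indicates less misclassification.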