Yong Chen (1), Jianqiao Wang (1), Jessica Chubak (2), Rebecca A Hubbard (1)
(1) Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
(2) Kaiser Permanente Washington Health Research Institute, Kaiser Permanente Washington, Seattle, WA, USA.
Abstract
PURPOSE: Many outcomes derived from electronic health records (EHR) are not only imperfect but may also suffer from exposure-dependent differential misclassification, because the quality and availability of EHR data vary across exposure groups. The objective of this study was to quantify the inflation of type I error rates that can result from differential outcome misclassification.

METHODS: We used data on gold-standard and EHR-derived second breast cancers in a cohort of women with a prior breast cancer diagnosis from 1993 to 2006 enrolled in Kaiser Permanente Washington. We simulated an exposure that was independent of the true outcome status. A surrogate outcome was then simulated with sensitivity and specificity that varied according to exposure status. We estimated the type I error rate for a test of association relating this exposure to the surrogate outcome, while varying outcome sensitivity and specificity in exposed individuals.

RESULTS: Type I error rates were substantially inflated above the nominal level (5%) for even modest departures from nondifferential misclassification. Holding sensitivity in exposed and unexposed groups at 85%, a difference in specificity of 10 percentage points between the exposed and unexposed (80% vs 90%) resulted in a 36% type I error rate. Type I error was inflated more by differential specificity than by differential sensitivity.

CONCLUSIONS: Differential outcome misclassification may induce spurious findings. Researchers using EHR-derived outcomes should use misclassification-adjusted methods whenever possible or conduct sensitivity analyses to investigate the possibility of false-positive findings, especially for exposures that may be related to the accuracy of outcome ascertainment.
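The simulation design described in METHODS can be illustrated with a minimal Python sketch. It is not the authors' code: it draws a fully synthetic cohort rather than using the Kaiser Permanente Washington data, and the sample size, outcome prevalence, 50% exposure probability, and the two-proportion z-test are all assumptions chosen for illustration. The function name `simulate_type1_error` is hypothetical. Because of these differences, the sketch is not expected to reproduce the 36% figure reported above; it only shows the mechanism by which differential specificity produces rejections under a true null.

```python
import math
import random


def simulate_type1_error(n=1000, prevalence=0.1, sens=(0.85, 0.85),
                         spec=(0.90, 0.90), n_reps=400, seed=0):
    """Estimate how often a two-sided 5% test rejects when exposure is
    independent of the TRUE outcome.

    sens and spec are (unexposed, exposed) pairs describing the accuracy
    of the surrogate outcome. All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    z_crit = 1.959964  # two-sided 5% critical value, standard normal
    rejections = 0
    for _ in range(n_reps):
        # counts[e] = [group size, surrogate positives] for exposure e
        counts = [[0, 0], [0, 0]]
        for _ in range(n):
            e = 1 if rng.random() < 0.5 else 0   # exposure, assumed 50%
            true_y = rng.random() < prevalence   # independent of exposure
            if true_y:
                # true case: detected with probability = sensitivity
                surrogate = rng.random() < sens[e]
            else:
                # true non-case: false positive with prob = 1 - specificity
                surrogate = rng.random() >= spec[e]
            counts[e][0] += 1
            counts[e][1] += int(surrogate)
        (n0, x0), (n1, x1) = counts
        if n0 == 0 or n1 == 0:
            continue  # degenerate replicate, counted as no rejection
        pooled = (x0 + x1) / (n0 + n1)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n0 + 1 / n1))
        if se > 0 and abs(x1 / n1 - x0 / n0) / se > z_crit:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    # Nondifferential misclassification: same accuracy in both groups.
    print("nondifferential:", simulate_type1_error())
    # Differential specificity: 90% in unexposed vs 80% in exposed.
    print("differential:   ", simulate_type1_error(spec=(0.90, 0.80)))
```

Under these assumed settings, the nondifferential run should stay near the nominal 5% level, while the differential-specificity run should reject far more often, since the lower specificity in the exposed group adds false positives among the many true non-cases.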