Jing Huang, Rui Duan, Rebecca A Hubbard, Yonghui Wu, Jason H Moore, Hua Xu, Yong Chen.
Abstract
OBJECTIVES: This study proposes a novel Prior knowledge guided Integrated likelihood Estimation (PIE) method to correct bias in estimations of associations due to misclassification of electronic health record (EHR)-derived binary phenotypes, and evaluates the performance of the proposed method by comparing it to 2 methods in common practice.
Keywords: association study; bias reduction; electronic health record; misclassification; prior information
Year: 2018 PMID: 29206922 PMCID: PMC7378882 DOI: 10.1093/jamia/ocx137
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. Comparison of likelihood function with unknown accuracy (blue solid line), likelihood function conditioned on misspecified accuracy (black solid line), likelihood function conditioned on known accuracy (black dashed line), and prior knowledge guided integrated likelihood function (red solid line). The true sensitivity and specificity are 90%.
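
The difference between a likelihood conditioned on fixed accuracy and an integrated likelihood can be sketched in code. The example below is a minimal illustration, not the authors' implementation. It assumes the standard misclassification model for a binary outcome, in which the observed phenotype is positive with probability sens·p + (1 − spec)·(1 − p), where p is the true-outcome probability from a logistic model, and it approximates the integrated likelihood by averaging the conditional likelihood over Monte Carlo draws from the prior. The simulated data, the intercept of −1 (held fixed for simplicity), the sample size, the number of draws, and all function names are hypothetical; the prior parameters are the PIE1 values from the prior table.

```python
import math
import random

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def log_lik(beta0, beta1, sens, spec, xs, ss):
    """Log-likelihood of surrogate outcomes ss, conditional on fixed accuracy."""
    total = 0.0
    for x, s in zip(xs, ss):
        p_true = expit(beta0 + beta1 * x)
        # Probability that the error-prone phenotype is observed as positive.
        p_obs = sens * p_true + (1.0 - spec) * (1.0 - p_true)
        total += math.log(p_obs if s == 1 else 1.0 - p_obs)
    return total

def integrated_log_lik(beta0, beta1, accuracy_draws, xs, ss):
    """Log of the likelihood averaged over (sens, spec) draws from the prior."""
    logs = [log_lik(beta0, beta1, se, sp, xs, ss) for se, sp in accuracy_draws]
    top = max(logs)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs) / len(logs))

def draw_accuracy(mu, sigma, rng):
    """One draw from 0.5 + 1/2 * logitnormal(mu, sigma)."""
    return 0.5 + 0.5 * expit(rng.gauss(mu, sigma))

# Hypothetical simulated data: binary exposure, true accuracy 90% / 90%.
rng = random.Random(1)
true_beta0, true_beta1, true_sens, true_spec = -1.0, 1.0, 0.9, 0.9
xs = [rng.randint(0, 1) for _ in range(300)]
ss = []
for x in xs:
    y = rng.random() < expit(true_beta0 + true_beta1 * x)
    p_positive = true_sens if y else 1.0 - true_spec
    ss.append(1 if rng.random() < p_positive else 0)

# Prior draws for (sensitivity, specificity) using the PIE1 parameters.
accuracy_draws = [
    (draw_accuracy(0.67, 0.60, rng), draw_accuracy(0.73, 0.80, rng))
    for _ in range(40)
]

# Evaluate the integrated log-likelihood over a coarse grid of beta1 values.
grid = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
curve = {b1: integrated_log_lik(true_beta0, b1, accuracy_draws, xs, ss) for b1 in grid}
best_beta1 = max(curve, key=curve.get)
print("grid maximizer of the integrated log-likelihood:", best_beta1)
```

Passing a single (sens, spec) pair as `accuracy_draws` reduces the integrated likelihood to the conditional one, which is how the known-accuracy and misspecified-accuracy curves would be obtained in this sketch.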
Five prior distributions used for the proposed PIE method
| Prior name | Sensitivity prior: distribution | Sensitivity prior: sd | Specificity prior: distribution | Specificity prior: sd |
|---|---|---|---|---|
| PIE1 | 0.5+1/2*logitnormal(0.67,0.60) | 0.07 | 0.5+1/2*logitnormal(0.73,0.80) | 0.08 |
| PIE1_sv | 0.5+1/2*logitnormal(0.70,0.20) | 0.02 | 0.5+1/2*logitnormal(0.80,0.23) | 0.03 |
| PIE2 | 0.5+1/2*logitnormal(0.50,0.60) | 0.07 | 0.5+1/2*logitnormal(0.58,0.60) | 0.07 |
| PIE2_lv | 0.5+1/2*logitnormal(0.50,1.20) | 0.12 | 0.5+1/2*logitnormal(0.53,1.20) | 0.12 |
| PIE3 | uniform(0.60,0.90) | 0.09 | uniform(0.65,0.95) | 0.09 |
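
As a rough check on how these priors behave, the sketch below draws from two of them and computes the empirical standard deviation. It assumes that logitnormal(a, b) denotes the inverse logit of a normal variable with mean a and standard deviation b on the logit scale, a reading that is consistent with the sd column but is not stated in the table itself. The function names, seed, and number of draws are illustrative choices.

```python
import math
import random
import statistics

def sample_shifted_logitnormal(mu, sigma, n, rng):
    """Draw n values of 0.5 + 0.5 * expit(Z) with Z ~ Normal(mu, sigma)."""
    return [0.5 + 0.5 / (1.0 + math.exp(-rng.gauss(mu, sigma))) for _ in range(n)]

def sample_uniform(low, high, n, rng):
    """Draw n values from a uniform distribution on [low, high]."""
    return [rng.uniform(low, high) for _ in range(n)]

rng = random.Random(0)
n_draws = 50_000

# Sensitivity priors for PIE1 and PIE3, with parameters copied from the table.
draws = {
    "PIE1 sensitivity": sample_shifted_logitnormal(0.67, 0.60, n_draws, rng),
    "PIE3 sensitivity": sample_uniform(0.60, 0.90, n_draws, rng),
}

prior_sd = {name: statistics.pstdev(values) for name, values in draws.items()}
for name, sd in prior_sd.items():
    print(f"{name}: empirical sd = {sd:.3f}")
```

The shift and scaling by 0.5 confine every logit-normal draw to the interval (0.5, 1), so these priors only place mass on phenotyping algorithms that perform better than chance.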
Figure 2. Illustration of the 5 types of prior distributions in the PIE method: PIE1 (distributions peak at the true values of sensitivity and specificity); PIE1_sv (distributions peak at the true values of sensitivity and specificity, with small variance); PIE2 (distributions have peaks that differ from true values); PIE2_lv (distributions have peaks that differ from true values, with large variance); and PIE3 (uniform distributions not centered at the true values). Vertical dashed line marks the true value of sensitivity or specificity, and solid line marks the peak of the prior distribution.
Figure 3. Box plots of estimates of β1 using the ML method with correctly specified sensitivity and specificity (gold standard), the method ignoring misclassification (naïve), the ML method with misspecified sensitivity and specificity (ML-MS), and the prior knowledge guided integrated likelihood method with 3 priors (PIE1, PIE2, PIE3). Solid black segment in each box shows the median of the estimates.
Comparison of methods for estimation of the association parameter, β1, in terms of bias and standard deviation
| | | Bias: GS | Bias: Naïve | Bias: ML-MS | Bias: PIE1 | Bias: PIE2 | Bias: PIE3 | SD: GS | SD: Naïve | SD: ML-MS | SD: PIE1 | SD: PIE2 | SD: PIE3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.85, 0.90 | 1 | 0.00 | −0.42 | 0.70 | 0.04 | 0.21 | 0.06 | 0.10 | 0.09 | 0.24 | 0.17 | 0.20 | 0.20 |
| 0.85, 0.90 | 1.5 | 0.03 | −0.78 | 1.24 | 0.07 | 0.24 | 0.08 | 0.17 | 0.08 | 0.47 | 0.28 | 0.28 | 0.30 |
| 0.65, 0.80 | 1 | 0.04 | −0.70 | −0.45 | 0.11 | −0.23 | −0.42 | 0.22 | 0.09 | 0.10 | 0.51 | 0.39 | 0.14 |
| 0.65, 0.80 | 1.5 | 0.08 | −1.15 | −0.79 | 0.17 | −0.17 | −0.68 | 0.44 | 0.08 | 0.10 | 0.69 | 0.62 | 0.22 |
Abbreviations: GS, gold standard (ML method with true sensitivity and specificity); Naïve, method ignoring misclassification; ML-MS, ML method with misspecified sensitivity and specificity; PIE1, PIE2, PIE3, PIE methods under 3 priors.
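
The negative bias of the naïve method reflects attenuation toward the null under nondifferential misclassification. The sketch below illustrates this with a closed-form calculation for a single binary exposure, where the large-sample naïve log odds ratio is the difference in logits of the observed-positive probabilities. The intercept of −1 and the binary exposure are hypothetical choices, so the resulting number is not expected to reproduce the simulation results reported in the table.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log odds."""
    return math.log(p / (1.0 - p))

def naive_log_odds_ratio(beta0, beta1, sens, spec):
    """Large-sample log odds ratio when the misclassified outcome is treated as truth."""
    observed = []
    for x in (0, 1):
        p_true = expit(beta0 + beta1 * x)
        # Probability that the error-prone phenotype is positive at exposure level x.
        observed.append(sens * p_true + (1.0 - spec) * (1.0 - p_true))
    return logit(observed[1]) - logit(observed[0])

beta0, beta1 = -1.0, 1.0  # beta0 is a hypothetical choice
naive = naive_log_odds_ratio(beta0, beta1, sens=0.85, spec=0.90)
print(f"true beta1 = {beta1:.2f}, naive limit = {naive:.2f}, bias = {naive - beta1:.2f}")
```

With perfect sensitivity and specificity the function returns the true β1, which is a quick sanity check on the formula.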
Figure 4. Box plots of estimates of β1 using the prior knowledge guided integrated likelihood method with 4 priors (PIE1_sv, PIE1, PIE2_lv, PIE2). Solid black segment in each box shows the median of the estimates.
Summary statistics of the variables of interest in the diabetes dataset from KPW
| Variables of interest | n (%) |
|---|---|
| Treated diabetes | |
| Yes | 230 (11.4) |
| No | 1792 (88.6) |
| Hypertension | |
| Yes | 1403 (69.4) |
| No | 619 (30.6) |
| Race | |
| White | 1821 (90.1) |
| Nonwhite | 201 (9.9) |
Estimated effect sizes (in log odds ratio scale) of the risk factors for diabetes using different methods
| Method | Hypertension: point estimate | Hypertension: relative bias (%) | BMI: point estimate | BMI: relative bias (%) | Race: point estimate | Race: relative bias (%) |
|---|---|---|---|---|---|---|
| Gold standard | 0.53 | 0 | 0.09 | 0 | 0.57 | 0 |
| Naïve | 0.41 | −23 | 0.08 | −11 | 0.48 | −16 |
| ML-MS | 0.68 | 28 | 0.10 | 11 | 0.65 | 14 |
| PIE | 0.48 | −9 | 0.09 | 0 | 0.56 | 2 |
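
The relative bias columns appear to express each estimate's deviation from the gold standard as a percentage of the gold standard estimate. That definition is an assumption, since the table does not state it, but the sketch below shows that it reproduces the hypertension column from the rounded point estimates. The function name is illustrative.

```python
def relative_bias_percent(estimate, reference):
    """Deviation of an estimate from a reference value, as a percentage of the reference."""
    return 100.0 * (estimate - reference) / reference

# Hypertension point estimates (log odds ratio scale) copied from the table.
gold_standard = 0.53
hypertension = {"Naïve": 0.41, "ML-MS": 0.68, "PIE": 0.48}

relative_bias = {
    method: round(relative_bias_percent(estimate, gold_standard))
    for method, estimate in hypertension.items()
}
print(relative_bias)
```

Because the published point estimates are rounded to two decimals, recomputing other columns this way may differ slightly from the tabulated values.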