Yen Sia Low1, Blanca Gallego2, Nigam Haresh Shah1.
Abstract
AIMS: Electronic health records (EHR), containing rich clinical histories of large patient populations, can provide evidence for clinical decisions when evidence from trials and literature is absent. To enable such EHR-based observational studies in real time, particularly in emergencies, rapid confounder control methods that can handle numerous variables and adjust for biases are imperative. This study compares the performance of 18 automatic confounder control methods.
Keywords: bias; clinical decision support; cohort studies; confounding; electronic health records; machine learning; propensity scores
Year: 2015 PMID: 26634383 PMCID: PMC4933592 DOI: 10.2217/cer.15.53
Source DB: PubMed Journal: J Comp Eff Res ISSN: 2042-6305 Impact factor: 1.744
Description of the 18 confounder control methods being compared.
| Method | Variable selection | Matching | Outcome model |
| Random matching | – | Random | Logistic regression |
| ExpertPS_match | Expert selection | By closest PS | Conditional logistic regression |
| ExpertPS_adjust | Expert selection | – | Logistic regression |
| hdPS_match | Above minimum prevalence and ranked by univariate association with outcome | By closest PS | Conditional logistic regression |
| hdPS_adjust | As for hdPS_match | – | Logistic regression |
| lassoPS_match | Lasso regularization | By closest PS | Conditional logistic regression |
| lassoPS_adjust | Lasso regularization | – | Logistic regression |
| rfPS_match | Multiple random subspaces | By closest PS | Conditional logistic regression |
| rfPS_adjust | Multiple random subspaces | – | Logistic regression |
| lassoMV | Lasso regularization | – | Lasso logistic regression |
| Euclidean | – | By closest distance | Conditional logistic regression |
| Jaccard | – | By closest distance | Conditional logistic regression |
| Dice | – | By closest distance | Conditional logistic regression |
| Cosine | – | By closest distance | Conditional logistic regression |
| Pearson | – | By closest distance | Conditional logistic regression |
| Spearman | – | By closest distance | Conditional logistic regression |
| Bootstrap | – | Random | Logistic regression |
| Jackknife | – | Random | Logistic regression |
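The distance-based rows above (Jaccard, Dice, Cosine) match each exposed patient to the closest control. The record does not give the matching ratio, caliper, or implementation, so the following is only a minimal sketch under stated assumptions: covariates are binary 0/1 vectors, matching is greedy 1:1 without replacement, and the helper name `match_closest` is hypothetical.

```python
import math

def jaccard(a, b):
    """Jaccard similarity of two equal-length 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def dice(a, b):
    """Dice similarity of two equal-length 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 2 * both / total if total else 0.0

def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_closest(exposed, controls, similarity):
    """Greedy 1:1 matching without replacement (an assumption, not the
    paper's stated procedure): each exposed patient takes the most similar
    control that has not been used yet."""
    unused = set(range(len(controls)))
    pairs = []
    for i, e in enumerate(exposed):
        if not unused:
            break
        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = max(sorted(unused), key=lambda j: similarity(e, controls[j]))
        pairs.append((i, best))
        unused.remove(best)
    return pairs

# Toy data for illustration only
exposed = [[1, 1, 0, 0], [0, 0, 1, 1]]
controls = [[0, 0, 1, 1], [1, 1, 0, 1], [1, 0, 0, 0]]
pairs = match_closest(exposed, controls, jaccard)
```

Greedy matching depends on the order in which exposed patients are processed; an optimal matching algorithm would avoid that, at higher computational cost.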
Balance of baseline variables between exposed and matched controls in simulated datasets 1 and 2.
Heatmaps showing the balance of baseline variables between the exposed and matched control groups in dataset 1, in terms of (A) -log10 p-values and (B) standardized mean differences. Lighter cells indicate smaller values and darker cells larger values. (C) Heatmap showing the fraction of times each variable was selected by each confounder control method in dataset 1; darker cells mark variables that were selected more frequently. Heatmaps (D–F) are the equivalents of heatmaps (A–C) for dataset 2.
Balance of baseline variables between exposed and matched controls in dataset 3.
Heatmaps showing the balance of baseline variables between the exposed and matched control groups in dataset 3, in terms of (A) -log10 p-values and (B) standardized mean differences. Lighter cells indicate smaller values and darker cells larger values. (C) PS distributions of the exposed (red) and control (black) groups before and after matching by the PS methods in dataset 3.
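The standardized mean difference used as a balance measure in these heatmaps can be illustrated as follows. The record does not say which variant was used; this sketch assumes the common definition that divides the difference in group means by the square root of the average of the two sample variances.

```python
import math
from statistics import mean, variance

def standardized_mean_difference(exposed, control):
    """Difference in means divided by the pooled standard deviation,
    sqrt((var_exposed + var_control) / 2). Assumed definition."""
    diff = mean(exposed) - mean(control)
    pooled_sd = math.sqrt((variance(exposed) + variance(control)) / 2)
    if pooled_sd == 0:
        # Both groups are constant: balanced if equal, otherwise unbounded
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd

# Toy values for one baseline variable, for illustration only
smd = standardized_mean_difference([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

An absolute value close to zero indicates that the variable is well balanced between the two groups; the sign only shows which group has the larger mean.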
Performance, as mean (standard deviation), of the 18 confounder control methods in simulated dataset 1.
| Baseline | True OR ≈ 0.70 | | | | |
| Random matching | 0.74 (0.23) | 0.31 (0.02) | 0.12 (0.09) | 1863 (44) | 0.2 (0.05) |
| ExpertPS_match | 0.65 (0.30) | 0.45 (0.10) | 0.26 (0.23) | 1333 (48) | 0.9 (0.2)† |
| ExpertPS_adjust | 0.67 (0.22) | 0.33 (0.02) | 0.07 (0.05) | 2000 (0) | 0.9 (0.2)† |
| hdPS_match | 0.70 (0.24) | 0.32 (0.03) | 0.13 (0.10) | 1748 (104) | 1.1 (0.4) |
| hdPS_adjust | 0.69 (0.21) | 0.30 (0.02) | 0.09 (0.07) | 2000 (0) | 1.1 (0.4) |
| lassoPS_match | 0.67 (0.26) | 0.39 (0.04) | 0.20 (0.16) | 1199 (47) | 1.0 (0.3) |
| lassoPS_adjust | 0.66 (0.22) | 0.34 (0.02) | 0.12 (0.09) | 2000 (0) | 1.0 (0.3) |
| rfPS_match | 0.68 (0.26) | 0.39 (0.04) | 0.19 (0.15) | 1237 (44) | 0.7 (0.3) |
| rfPS_adjust | 0.66 (0.22) | 0.34 (0.02) | 0.11 (0.08) | 2000 (0) | 0.7 (0.3) |
| lassoMV | 0.72 (0.23) | 0.30 (0.04) | 0.10 (0.08) | 2000 (0) | 1.1 (0.3) |
| Euclidean | 0.70 (0.26) | 0.37 (0.04) | 0.17 (0.14) | 1336 (31) | 0.2 (0.07) |
| Jaccard | 0.66 (0.22) | 0.34 (0.02) | 0.11 (0.08) | 1361 (32) | 10.4 (1.3) |
| Dice | 0.75 (0.29) | 0.37 (0.04) | 0.20 (0.15) | 1361 (32) | 8.4 (0.9) |
| Cosine | 0.74 (0.28) | 0.37 (0.04) | 0.19 (0.15) | 1359 (32) | 8.5 (0.8) |
| Pearson | 0.75 (0.28) | 0.37 (0.03) | 0.20 (0.15) | 1340 (40) | 1.2 (0.2) |
| Spearman | 0.76 (0.30) | 0.37 (0.04) | 0.21 (0.17) | 1316 (40) | 1.3 (0.2) |
| Bootstrap | 0.73 (0.23) | 0.31 (0.02) | 0.11 (0.08) | 1863 (44) | 18.2 (1.37) |
| Jackknife | 0.75 (0.23) | 0.34 (0.02) | 0.11 (0.08) | 1553 (26) | 16.1 (1.6) |
†The reported times for ExpertPS refer to the computing times of the logistic models and do not include time for expert consultation, which was not necessary because the causal structure was known for the simulated datasets.
OR: Odds ratio.
Performance, as mean (standard deviation), of the 18 confounder control methods in simulated dataset 2.
| Baseline | True OR ≈ 1.63 | | | | |
| Random matching | 1.55 (0.33) | 0.20 (0.01) | 0.09 (0.07) | 1998 (7) | 0.2 (0.07) |
| ExpertPS_match | 1.76 (2.01) | 0.38 (0.11) | 0.25 (0.25) | 1231 (47) | 1.7 (0.4)† |
| ExpertPS_adjust | 1.48 (0.36) | 0.23 (0.01) | 0.07 (0.05) | 2000 (0) | 1.7 (0.4)† |
| hdPS_match | 1.39 (0.32) | 0.22 (0.01) | 0.13 (0.10) | 1626 (108) | 5.1 (1.0) |
| hdPS_adjust | 1.38 (0.28) | 0.20 (0.01) | 0.11 (0.08) | 2000 (0) | 5.1 (1.0) |
| lassoPS_match | 1.41 (0.40) | 0.27 (0.02) | 0.16 (0.12) | 1117 (46) | 2.2 (0.5) |
| lassoPS_adjust | 1.39 (0.33) | 0.23 (0.01) | 0.12 (0.09) | 2000 (0) | 2.2 (0.5) |
| rfPS_match | 1.39 (0.36) | 0.25 (0.02) | 0.15 (0.12) | 1256 (47) | 2.4 (0.5) |
| rfPS_adjust | 1.38 (0.31) | 0.22 (0.01) | 0.12 (0.08) | 2000 (0) | 2.4 (0.5) |
| lassoMV | 1.54 (0.33) | 0.19 (0.02) | 0.08 (0.06) | 2000 (0) | 5.6 (0.9) |
| Euclidean | 1.47 (0.39) | 0.25 (0.02) | 0.15 (0.11) | 1328 (22) | 0.4 (0.09) |
| Jaccard | 1.56 (0.36) | 0.22 (0.01) | 0.12 (0.09) | 1604 (18) | 12.4 (2.2) |
| Dice | 1.56 (0.37) | 0.22 (0.01) | 0.12 (0.09) | 1604 (17) | 10.2 (1.7) |
| Cosine | 1.55 (0.39) | 0.22 (0.01) | 0.12 (0.09) | 1606 (18) | 10.3 (1.8) |
| Pearson | 1.56 (0.38) | 0.23 (0.01) | 0.12 (0.09) | 1582 (20) | 1.6 (0.3) |
| Spearman | 1.56 (0.39) | 0.23 (0.01) | 0.12 (0.09) | 1555 (21) | 1.9 (0.3) |
| Bootstrap | 1.57 (0.34) | 0.23 (0.01) | 0.09 (0.07) | 1643 (10) | 19.9 (2.6) |
| Jackknife | 1.55 (0.33) | 0.20 (0.01) | 0.09 (0.07) | 1998 (7) | 22.9 (2.8) |
†The reported times for ExpertPS refer to the computing times of the logistic models and do not include time for expert consultation, which was not necessary because the causal structure was known for the simulated datasets.
OR: Odds ratio.
Bubble plot of the estimated effect size β (bubble color) and its 95% CI (bubble size) across 14 methods (rows) and 20 outcomes (columns) in dataset 3.
Small, intensely colored bubbles indicate significant effects with narrow 95% CIs. Because the PS methods used both covariate adjustment and matching, results from covariate adjustment are overlaid on the results from matching (see Supplementary Data C for related forest plots and Supplementary Data D for numerical values).
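The tables report odds ratios while this plot reports β, the log odds ratio from a logistic model. As a hedged illustration of how the two relate, the sketch below converts β and its standard error into an odds ratio with a Wald 95% CI. The record does not state how the intervals were computed, so the Wald form and the function name `odds_ratio_with_ci` are assumptions.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper), with OR = exp(beta) and a Wald interval
    exp(beta +/- z * se). The Wald interval is an assumed choice."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

def is_significant(lower, upper):
    """An effect is significant at this level when the CI excludes OR = 1."""
    return lower > 1.0 or upper < 1.0

# Illustrative inputs only: beta chosen so that OR = 0.70, se is made up
or_, lower, upper = odds_ratio_with_ci(math.log(0.70), 0.30)
```

A narrow interval that excludes 1 corresponds to the small, intensely colored bubbles described in the caption.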