Imane Benasseur, Denis Talbot, Madeleine Durand, Anne Holbrook, Alexis Matteau, Brian J Potter, Christel Renoux, Mireille E Schnitzer, Jean‐Éric Tarride, Jason R Guertin.
Abstract
PURPOSE: Confounding adjustment is required to estimate the effect of an exposure on an outcome in observational studies. However, variable selection and unmeasured confounding are particularly challenging when analyzing large healthcare databases. Machine learning methods may help address these challenges. The objective was to evaluate the capacity of such methods to select confounders and to reduce unmeasured confounding bias.
Keywords: algorithms; biostatistics; confounding factors; machine learning; pharmacoepidemiology; propensity score
Year: 2022 PMID: 34953160 PMCID: PMC9304306 DOI: 10.1002/pds.5403
Source DB: PubMed Journal: Pharmacoepidemiol Drug Saf ISSN: 1053-8569 Impact factor: 2.732
FIGURE 1. Directed acyclic graph illustrating the problem of unmeasured confounders. Double‐headed arrows between variables are notational shorthand indicating that unobserved common causes may exist and induce correlations. True confounders affect both the exposure and the outcome, but only measured confounders can be included in the analysis (box). Proxies for unmeasured confounders are variables that are affected by or correlated with the unmeasured confounders but are not confounders themselves.
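To make the role of proxies in Figure 1 concrete, the sketch below simulates a hypothetical binary DAG (U → A, U → Y, A → Y, U → P) with an unmeasured confounder U and a proxy P that matches U 90% of the time. The effect sizes, the helper names, and the use of simple standardization (rather than any of the machine learning methods compared in the study) are illustrative assumptions. With these numbers the true risk difference is 0.10; the crude estimate is about 0.22, standardizing on the proxy gives roughly 0.15, and standardizing on U itself recovers about 0.10.

```python
import random

def simulate(n, seed=1, proxy_accuracy=0.9):
    """Hypothetical binary DAG: U -> A, U -> Y, A -> Y, U -> P (proxy of U)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.random() < 0.5                                  # unmeasured confounder
        p = u if rng.random() < proxy_accuracy else not u       # proxy, sometimes misclassified
        a = rng.random() < (0.7 if u else 0.3)                  # exposure depends on U
        y = rng.random() < 0.2 + 0.1 * a + 0.3 * u              # true risk difference of A is 0.1
        rows.append((u, p, a, y))
    return rows

def risk(rows, exposed):
    """Proportion with the outcome among rows whose exposure equals `exposed`."""
    ys = [y for (_, _, a, y) in rows if a == exposed]
    return sum(ys) / len(ys)

def crude_rd(rows):
    return risk(rows, True) - risk(rows, False)

def standardized_rd(rows, idx):
    """Risk difference standardized over the levels of the binary column at position idx."""
    total = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[idx] == level]
        weight = len(stratum) / len(rows)                       # marginal stratum weight
        total += weight * (risk(stratum, True) - risk(stratum, False))
    return total
```

Usage: `rows = simulate(100_000)`, then compare `crude_rd(rows)`, `standardized_rd(rows, 1)` (proxy) and `standardized_rd(rows, 0)` (the confounder itself). Adjusting for the proxy removes part, but not all, of the confounding bias.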
FIGURE 2. Flowchart of the simulation study.
Summary of the parameters for simulating the synthetic data
| Scenario | Sub‐groups | Correlations | Exposure‐covariate associations (α) | Outcome‐covariate associations | Hidden variable |
|---|---|---|---|---|---|
| 1 | (X1,…,X40), (X41,…,X70), (X71,…,X90), (X91,…,X100) | W. = 0.2, B. = 0.1 | (α1, α2, α5, α6, α71, α72, α75, α76) = 1; (α41, α42, α45, α46, α91, α92, α95, α96) = 1; all other … | (…); (…); all other … | None |
| 2 | As in Scenario 1 | W. = 0.2, B. = 0.1 | As in Scenario 1 | As in Scenario 1 | X1 |
| 3 | As in Scenario 1 | W. = 0.4, B. = 0.2 | As in Scenario 1 | As in Scenario 1 | None |
| 4 | As in Scenario 1 | W. = 0.4, B. = 0.2 | As in Scenario 1 | As in Scenario 1 | X1 |
Abbreviations: W. = within dimensions, B. = between dimensions.
Results of simulation Scenario 1 (weak correlations, no hidden confounder)
| Method | Bias | SD | ESE | Rel. RMSE | CP |
|---|---|---|---|---|---|
| True | 0.003 | 0.030 | 0.030 | 1.00 | 94.4 |
| Crude | 0.155 | 0.027 | 0.026 | 5.14 | 0.0 |
| BAC | 0.008 | 0.035 | 0.032 | 1.19 | 90.8 |
| GBCEE | 0.005 | 0.037 | 0.036 | 1.24 | 92.5 |
| GLiDeR | 0.026 | 0.034 | 0.025 | 1.40 | 72.6 |
| SC‐TMLE | 0.005 | 0.037 | 0.023 | 1.23 | 77.8 |
| hdPS IPTW (…) | 0.094 | 0.039 | 0.033 | 3.33 | 23.6 |
| hdPS Match (…) | 0.094 | 0.042 | 0.041 | 3.38 | 26.3 |
| m‐hdPS IPTW (…) | 0.018 | 0.042 | 0.041 | 1.49 | 91.4 |
| m‐hdPS Match (…) | 0.018 | 0.046 | 0.039 | 1.61 | 88.9 |
Abbreviations: CP, Coverage of 95% confidence intervals; ESE, estimated standard error; Rel. RMSE, relative root‐mean squared error (compared to true model); SD, standard deviation.
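The performance measures in the results tables can be computed from the replications of each method roughly as sketched below. The exact definitions are assumptions: signed bias, ESE as the mean of the estimated standard errors, coverage based on Wald intervals (estimate ± 1.96 × SE), and Rel. RMSE as the method's RMSE divided by that of the "True" model. The function name `performance` is hypothetical.

```python
import math
import statistics

def performance(estimates, std_errors, truth, reference_rmse=None, z=1.96):
    """Summarize simulation replications: Bias, SD, ESE, RMSE, Rel. RMSE and CP (%)."""
    bias = statistics.fmean(estimates) - truth           # signed bias
    sd = statistics.stdev(estimates)                     # empirical SD across replications
    ese = statistics.fmean(std_errors)                   # mean estimated standard error
    rmse = math.sqrt(statistics.fmean([(e - truth) ** 2 for e in estimates]))
    covered = [e - z * s <= truth <= e + z * s for e, s in zip(estimates, std_errors)]
    cp = 100 * sum(covered) / len(covered)               # coverage in percent
    rel_rmse = rmse / reference_rmse if reference_rmse is not None else None
    return {"bias": bias, "sd": sd, "ese": ese, "rmse": rmse,
            "rel_rmse": rel_rmse, "cp": cp}
```

Usage: compute `performance(...)` for the "True" model first, then pass its `"rmse"` as `reference_rmse` when summarizing every other method. A gap between SD and ESE (for example GLiDeR and SC‐TMLE above) signals underestimated standard errors, which shows up as coverage below 95%.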
Results of simulation Scenario 2 (weak correlations, one hidden confounder)
| Method | Bias | SD | ESE | Rel. RMSE | CP |
|---|---|---|---|---|---|
| True | 0.091 | 0.029 | 0.029 | 1.00 | 13.0 |
| Crude | 0.153 | 0.025 | 0.026 | 1.63 | 0.0 |
| BAC | 0.029 | 0.034 | 0.032 | 0.47 | 84.0 |
| GBCEE | 0.026 | 0.036 | 0.035 | 0.46 | 87.8 |
| GLiDeR | 0.042 | 0.032 | 0.025 | 0.55 | 58.1 |
| SC‐TMLE | 0.025 | 0.035 | 0.024 | 0.45 | 70.6 |
| hdPS IPTW (…) | 0.095 | 0.035 | 0.033 | 1.06 | 20.1 |
| hdPS Match (…) | 0.094 | 0.039 | 0.035 | 1.07 | 25.3 |
| m‐hdPS IPTW (…) | 0.032 | 0.039 | 0.040 | 0.52 | 83.8 |
| m‐hdPS Match (…) | 0.031 | 0.042 | 0.039 | 0.55 | 85.7 |
Abbreviations: CP, Coverage of 95% confidence intervals; ESE, estimated standard error; Rel. RMSE, relative root‐mean squared error (compared to true model); SD, standard deviation.
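As a rough consistency check on the Rel. RMSE column, using the approximation RMSE ≈ √(Bias² + SD²) with the rounded Scenario 2 values gives the following. Values below 1 mean a method outperformed the "True" model, which in this scenario is itself biased (bias 0.091), presumably because it cannot adjust for the hidden confounder X1.

```latex
\begin{aligned}
\mathrm{RMSE} &\approx \sqrt{\mathrm{Bias}^2 + \mathrm{SD}^2} \\
\mathrm{RMSE}_{\text{True}} &\approx \sqrt{0.091^2 + 0.029^2} \approx 0.0955 \\
\mathrm{RMSE}_{\text{Crude}} &\approx \sqrt{0.153^2 + 0.025^2} \approx 0.1550,
  \qquad \frac{0.1550}{0.0955} \approx 1.62 \\
\mathrm{RMSE}_{\text{BAC}} &\approx \sqrt{0.029^2 + 0.034^2} \approx 0.0447,
  \qquad \frac{0.0447}{0.0955} \approx 0.47
\end{aligned}
```

The small discrepancy with the tabulated 1.63 for the crude estimator is consistent with rounding of the reported bias and SD.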
Results of simulation Scenario 3 (strong correlations, no hidden confounder)
| Method | Bias | SD | ESE | Rel. RMSE | CP |
|---|---|---|---|---|---|
| True | 0.002 | 0.031 | 0.031 | 1.00 | 94.4 |
| Crude | 0.189 | 0.026 | 0.026 | 6.14 | 0.0 |
| BAC | 0.009 | 0.036 | 0.034 | 1.18 | 92.6 |
| GBCEE | 0.005 | 0.040 | 0.039 | 1.30 | 93.3 |
| GLiDeR | 0.037 | 0.035 | 0.024 | 1.64 | 61.2 |
| SC‐TMLE | 0.004 | 0.037 | 0.023 | 1.19 | 77.1 |
| hdPS IPTW (…) | 0.109 | 0.040 | 0.035 | 3.73 | 17.1 |
| hdPS Match (…) | 0.108 | 0.044 | 0.035 | 3.74 | 19.9 |
| m‐hdPS IPTW (…) | 0.040 | 0.048 | 0.044 | 2.00 | 78.2 |
| m‐hdPS Match (…) | 0.041 | 0.049 | 0.039 | 2.05 | 77.2 |
Abbreviations: CP, Coverage of 95% confidence intervals; ESE, estimated standard error; Rel. RMSE, relative root‐mean squared error (compared to true model); SD, standard deviation.
Results of simulation Scenario 4 (strong correlations, one hidden confounder)
| Method | Bias | SD | ESE | Rel. RMSE | CP |
|---|---|---|---|---|---|
| True | 0.087 | 0.029 | 0.030 | 1.00 | 18.5 |
| Crude | 0.188 | 0.025 | 0.026 | 2.06 | 0.0 |
| BAC | 0.027 | 0.036 | 0.034 | 0.49 | 85.3 |
| GBCEE | 0.023 | 0.040 | 0.038 | 0.50 | 88.7 |
| GLiDeR | 0.051 | 0.035 | 0.024 | 0.66 | 46.0 |
| SC‐TMLE | 0.022 | 0.038 | 0.023 | 0.47 | 72.5 |
| hdPS IPTW (…) | 0.110 | 0.039 | 0.035 | 1.27 | 16.6 |
| hdPS Match (…) | 0.110 | 0.041 | 0.035 | 1.27 | 17.7 |
| m‐hdPS IPTW (…) | 0.050 | 0.044 | 0.043 | 0.72 | 71.7 |
| m‐hdPS Match (…) | 0.052 | 0.046 | 0.039 | 0.75 | 70.2 |
Abbreviations: CP, Coverage of 95% confidence intervals; ESE, estimated standard error; Rel. RMSE, relative root‐mean squared error (compared to true model); SD, standard deviation.
Results of plasmode simulation Scenario 1 based on electronic health record data from Quebec, Canada, public insurance (N = 10 000; 336 covariates, one hidden confounder)
| Method | Bias | SD | ESE | Rel. RMSE | CP |
|---|---|---|---|---|---|
| True | −0.009 | 0.008 | 0.007 | 1.00 | 76.8 |
| Crude | −0.059 | 0.009 | 0.009 | 5.14 | 0.0 |
| BAC | 0.000 | 0.009 | 0.008 | 0.78 | 91.7 |
| GBCEE | 0.002 | 0.009 | 0.009 | 0.81 | 93.8 |
| GLiDeR* | 0.001 | 0.095 | 0.035 | 8.16 | 91.5 |
| SC‐TMLE | 0.002 | 0.009 | 0.008 | 0.81 | 86.5 |
| hdPS IPTW (…) | 0.019 | 0.017 | 0.014 | 2.23 | 91.0 |
| hdPS Match (…) | 0.016 | 0.011 | 0.010 | 1.70 | 86.5 |
| m‐hdPS IPTW (…) | 0.028 | 0.028 | 0.019 | 3.40 | 89.8 |
| m‐hdPS Match (…) | 0.018 | 0.011 | 0.010 | 1.82 | 82.5 |
Note: *12 replications were dropped because their estimates lay outside the range of possible values (RD < −1 or RD > 1).
Abbreviations: CP, Coverage of 95% confidence intervals; ESE, estimated standard error; Rel. RMSE, relative root‐mean squared error (compared to true model); SD, standard deviation.
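A risk difference is a difference of two probabilities, so any estimate below −1 or above 1 is impossible. The exclusion rule in the note above can be sketched as follows, assuming each replication yields a paired estimate and standard error; the function name is hypothetical. Boundary values of exactly −1 or 1 are kept, since only RD < −1 or RD > 1 are dropped.

```python
from typing import List, Tuple

def drop_out_of_range(estimates: List[float], std_errors: List[float],
                      lower: float = -1.0, upper: float = 1.0
                      ) -> Tuple[List[float], List[float], int]:
    """Keep replications whose RD estimate lies in [lower, upper]; count the rest."""
    kept = [(e, s) for e, s in zip(estimates, std_errors) if lower <= e <= upper]
    kept_est = [e for e, _ in kept]
    kept_se = [s for _, s in kept]
    n_dropped = len(estimates) - len(kept)       # e.g. 12 replications for GLiDeR above
    return kept_est, kept_se, n_dropped
```

The retained estimates and standard errors can then be summarized as usual, while the number of dropped replications is reported alongside the results.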
Results of plasmode simulation Scenario 2 based on electronic health record data from Quebec, Canada, public insurance (N = 20 000; 573 covariates, one hidden confounder)
| Method | Bias | SD | ESE | Rel. RMSE | CP |
|---|---|---|---|---|---|
| True | −0.016 | 0.008 | 0.006 | 1.00 | 26.4 |
| Crude | −0.072 | 0.006 | 0.006 | 3.88 | 0.0 |
| BAC | 0.001 | 0.007 | 0.003 | 0.36 | 59.1 |
| GBCEE | 0.003 | 0.008 | 0.007 | 0.47 | 92.7 |
| GLiDeR* | NA | NA | NA | NA | NA |
| SC‐TMLE | 0.002 | 0.008 | 0.007 | 0.42 | 88.2 |
| hdPS IPTW (…) | 0.102 | 0.215 | 0.067 | 12.83 | 61.8 |
| hdPS Match (…) | 0.016 | 0.010 | 0.010 | 1.04 | 71.8 |
| m‐hdPS IPTW (…) | 0.107 | 0.202 | 0.069 | 12.35 | 63.6 |
| m‐hdPS Match (…) | 0.017 | 0.011 | 0.010 | 1.06 | 65.5 |
Note: *Most GLiDeR estimates (88/110) lay outside the range of possible values (RD < −1 or RD > 1).
Abbreviations: CP, Coverage of 95% confidence intervals; ESE, estimated standard error; Rel. RMSE, relative root‐mean squared error (compared to true model); SD, standard deviation.
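For the IPTW rows, one common way to estimate a risk difference from propensity scores is the normalized (Hájek) weighted estimator sketched below. It assumes the propensity scores have already been estimated (for instance from an hdPS model) and is not necessarily the exact estimator used in the study. Because each arm's risk is a weighted mean of 0/1 outcomes, the estimate always stays within [−1, 1], although extreme weights can still make it highly variable, as the large SDs of the IPTW rows in this scenario suggest.

```python
def iptw_risk_difference(exposure, outcome, propensity):
    """Normalized (Hajek) IPTW estimate of the risk difference.

    exposure and outcome hold 0/1 values; propensity holds estimated P(A = 1 | covariates).
    Exposed subjects get weight 1/ps and unexposed subjects 1/(1 - ps); each arm's risk
    is the weighted mean outcome within that arm.
    """
    num1 = den1 = num0 = den0 = 0.0
    for a, y, ps in zip(exposure, outcome, propensity):
        if not 0 < ps < 1:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if a == 1:
            w = 1 / ps
            num1 += w * y
            den1 += w
        else:
            w = 1 / (1 - ps)
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0
```

Normalizing by the sum of weights in each arm, rather than by the sample size, is what keeps each weighted risk in [0, 1].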
FIGURE 3. Bias (squares) and SD (bars) of the estimates according to simulation scenario. Scenario 1: weak correlations, no hidden confounder; Scenario 2: weak correlations, one hidden confounder; Scenario 3: strong correlations, no hidden confounder; Scenario 4: strong correlations, one hidden confounder.
FIGURE 4. Bias and SD (bars) of the different estimators in the plasmode simulation based on electronic health record data from Quebec, Canada, public insurance. In Scenario 1, n = 10 000, 336 covariates are considered and one confounder is hidden. In Scenario 2, n = 20 000, 573 covariates are considered and five confounders are hidden.