Richard Wyss, Chen Yanover, Tal El-Hay, Dimitri Bennett, Robert W Platt, Andrew R Zullo, Grammati Sari, Xuerong Wen, Yizhou Ye, Hongbo Yuan, Mugdha Gokhale, Elisabetta Patorno, Kueiyu Joshua Lin.
Abstract
PURPOSE: Supplementing investigator-specified variables with large numbers of empirically identified features that collectively serve as 'proxies' for unspecified or unmeasured factors can often improve confounding control in studies utilizing administrative healthcare databases. Consequently, there has been a recent focus on the development of data-driven methods for high-dimensional proxy confounder adjustment in pharmacoepidemiologic research. In this paper, we survey current approaches and recent advancements for high-dimensional proxy confounder adjustment in healthcare database studies.
Keywords: causal inference; confounding; machine learning
Year: 2022 PMID: 35729705 PMCID: PMC9541861 DOI: 10.1002/pds.5500
Source DB: PubMed Journal: Pharmacoepidemiol Drug Saf ISSN: 1053-8569 Impact factor: 2.732
FIGURE 1. Illustration and examples for 'proxy confounder' adjustment.
FIGURE 2. Different phases for high-dimensional proxy confounder adjustment.
FIGURE 3. Causal diagram illustrating one scenario where the use of marginal empirical associations for confounder selection can result in over-adjusting for instrumental variables. In this causal structure, X2 is marginally associated with both treatment and outcome, but is independent of the outcome after conditioning on X1.
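The scenario in the Figure 3 caption can be reproduced with a small simulation. The sketch below is illustrative only and is not taken from the paper: the linear data-generating model, the coefficients, and the null treatment effect are all assumptions chosen so that X2 affects treatment but reaches the outcome only through its correlation with X1. The helper names `pearson` and `residualize` are hypothetical.

```python
import random
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


def residualize(v, x):
    """Residuals of v after a simple linear regression of v on x."""
    n = len(x)
    mx, mv = sum(x) / n, sum(v) / n
    slope = (sum((a - mx) * (b - mv) for a, b in zip(x, v))
             / sum((a - mx) ** 2 for a in x))
    return [b - mv - slope * (a - mx) for a, b in zip(x, v)]


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20000

# Assumed structure: X1 is a confounder; X2 is correlated with X1 and
# affects treatment, but has no effect of its own on the outcome.
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.7 * a + rng.gauss(0, 1) for a in x1]
treatment = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
# Assumption: no treatment effect, so the outcome depends on X1 only.
outcome = [a + rng.gauss(0, 1) for a in x1]

marginal_x2_treatment = pearson(x2, treatment)
marginal_x2_outcome = pearson(x2, outcome)
# Partial correlation of X2 and outcome given X1, via residualization.
partial_x2_outcome = pearson(residualize(x2, x1), residualize(outcome, x1))

print("corr(X2, treatment):     ", round(marginal_x2_treatment, 3))
print("corr(X2, outcome):       ", round(marginal_x2_outcome, 3))
print("corr(X2, outcome | X1):  ", round(partial_x2_outcome, 3))
```

Under these assumptions, a screen based on marginal associations would flag X2 as a confounder because it correlates with both treatment and outcome, while the partial correlation given X1 is close to zero, which is the behavior of an instrument-like variable.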
Examples of diagnostic metrics for causal inference.
| Condition being tested | Possible diagnostic checks | Limitations and comments |
|---|---|---|
| Positivity | Overlap of the estimated propensity score (PS) across treatment groups | The impact of limited overlap can depend on the adjustment approach. Including more covariates for adjustment can decrease overlap. Consequently, it can be difficult to determine the optimal adjustment set in terms of maximizing confounding control vs bias due to nonoverlap. |
| Conditional exchangeability on measured covariates | Covariate balance across treatment groups after PS adjustment | Primarily used for PS analyses. Less useful for causal inference approaches that model the outcome directly, including doubly robust methods. Can be difficult to quantify the impact of residual imbalance on bias in estimated treatment effects. Can be difficult to determine on which variables balance should be assessed (e.g., do not want to balance instrumental variables). |
| | Prediction diagnostics to assess correct model specification | Can reward PS models that include instruments. More useful for causal inference approaches that model the outcome, including doubly robust methods. |
| | Simulation-based approaches for generating synthetic datasets to evaluate bias in estimated treatment effects | A very general approach that is applicable to any causal inference method. Requires advanced simulation techniques to closely approximate the confounding structure of the study population. |
| Violation of conditional exchangeability due to unmeasured confounding | Real negative and positive control exposures and/or outcomes | Can be useful for identifying bias caused by unmeasured confounders. Can be difficult to identify good negative and/or positive controls. |
| Sensitivity to hidden biases (e.g., unmeasured confounding, misclassification) | E-value | Implementation and communication are simple and straightforward. Recent critiques have argued that the E-value can be misleading due to its simplicity. |
| | Formal quantitative bias analysis | Several approaches have been proposed to conduct in-depth sensitivity analyses for hidden biases. These can provide a more detailed assessment of the robustness of causal analyses, but are subject to underlying assumptions and can be tedious to implement. |
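Two of the diagnostics listed in the table have simple closed forms. The following is a minimal sketch rather than the authors' implementation: the function names are hypothetical, the covariate balance check uses the common standardized mean difference with a pooled standard deviation, and the E-value uses the VanderWeele and Ding formula for a risk ratio point estimate. The sample values are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Difference in covariate means divided by the pooled standard deviation.

    Uses sample variances and the pooling sqrt((s1^2 + s0^2) / 2).
    """
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def e_value(risk_ratio):
    """E-value for a risk ratio point estimate.

    For RR >= 1 the value is RR + sqrt(RR * (RR - 1)); estimates below 1
    are inverted first so that protective effects are handled symmetrically.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + sqrt(rr * (rr - 1))


# Hypothetical covariate values (e.g., age) in each treatment group.
age_treated = [61, 67, 70, 72, 75]
age_control = [58, 63, 66, 69, 71]
print("SMD for age:", round(standardized_mean_difference(age_treated, age_control), 3))

# Hypothetical risk ratio estimate.
print("E-value for RR = 2.0:", round(e_value(2.0), 3))
```

An absolute standardized mean difference below 0.1 is an often-used rule of thumb for acceptable balance, although, as the table notes, the link between residual imbalance and bias in the effect estimate is hard to quantify. The E-value is the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away the estimate.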