M Sanni Ali, Daniel Prieto-Alhambra, Luciane Cruz Lopes, Dandara Ramos, Nivea Bispo, Maria Y Ichihara, Julia M Pescarini, Elizabeth Williamson, Rosemeire L Fiaccone, Mauricio L Barreto, Liam Smeeth.
Abstract
Randomized clinical trials (RCTs) are accepted as the gold-standard approach to measuring the effects of an intervention or treatment on outcomes. They are also the designs of choice for health technology assessment (HTA). Randomization ensures that individuals assigned to treatment and to control or comparator are comparable in both measured and unmeasured pretreatment characteristics. However, even adequately powered RCTs are not always feasible, for reasons such as cost, time, practical and ethical constraints, and limited generalizability. RCTs rely on data collected from selected, homogeneous populations under highly controlled conditions; hence, they provide evidence on the efficacy of interventions rather than on their effectiveness. Alternatively, observational studies can provide evidence on the relative effectiveness or safety of a health technology compared with one or more alternatives when delivered in routine health care practice. In observational studies, however, treatment assignment is a non-random process based on an individual's baseline characteristics; hence, treatment groups may not be comparable in their pretreatment characteristics. As a result, a direct comparison of outcomes between treatment groups may lead to a biased estimate of the treatment effect. Propensity score approaches have been used to achieve balance, or comparability, of treatment groups in terms of their measured pretreatment covariates, thereby controlling for confounding bias when estimating treatment effects. Despite the popularity of propensity score methods and important recent methodological advances, misunderstandings about their applications and limitations are all too common. In this article, we review propensity score methods, their extended applications, recent advances, and their strengths and limitations.
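The following sketch illustrates the abstract's central point with a small hypothetical dataset (not from the article). A single binary covariate x affects both treatment and outcome, and the true treatment effect is exactly 2. The sketch compares the crude difference in means with an inverse probability of treatment weighted (IPTW) difference. As a balance check, it also reports the standardized mean difference (SMD) of x before and after weighting. The data, the function names, and the saturated propensity score (PS) model (the treated fraction within each level of x) are illustrative assumptions.

```python
import math

# Hypothetical toy data (not from the article): rows are (x, t, y) with a
# binary covariate x, binary treatment t, and outcome y = 2*t + 5*x with no
# noise, so the true treatment effect is exactly 2. Treatment is more common
# when x = 1, which makes x a confounder of the crude comparison.
def make_data():
    rows = []
    for x, n_treated, n_untreated in [(0, 20, 80), (1, 80, 20)]:
        rows += [(x, 1, 2 + 5 * x)] * n_treated
        rows += [(x, 0, 5 * x)] * n_untreated
    return rows


def propensity_scores(rows):
    """Saturated PS model for one binary covariate: e(x) = P(T = 1 | X = x)."""
    n, k = {}, {}
    for x, t, _ in rows:
        n[x] = n.get(x, 0) + 1
        k[x] = k.get(x, 0) + t
    return {x: k[x] / n[x] for x in n}


def weighted_group_mean(rows, weights, column, arm):
    """Weighted mean of rows[i][column] among rows whose treatment equals arm."""
    pairs = [(r[column], w) for r, w in zip(rows, weights) if r[1] == arm]
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


def smd_binary(p1, p0):
    """Standardized mean difference for a binary covariate."""
    return (p1 - p0) / math.sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)


def crude_vs_iptw(rows):
    """Return {label: (treatment effect, SMD of x)} for crude and IPTW analyses."""
    ps = propensity_scores(rows)
    # ATE weights: 1/e(x) for the treated, 1/(1 - e(x)) for the untreated.
    iptw = [1 / ps[x] if t == 1 else 1 / (1 - ps[x]) for x, t, _ in rows]
    results = {}
    for label, w in [("crude", [1.0] * len(rows)), ("iptw", iptw)]:
        effect = (weighted_group_mean(rows, w, 2, 1)
                  - weighted_group_mean(rows, w, 2, 0))
        smd = smd_binary(weighted_group_mean(rows, w, 0, 1),
                         weighted_group_mean(rows, w, 0, 0))
        results[label] = (effect, smd)
    return results


if __name__ == "__main__":
    for label, (effect, smd) in crude_vs_iptw(make_data()).items():
        print(f"{label}: effect = {effect:.3f}, SMD of x = {smd:.3f}")
```

On this toy dataset the crude difference is 5 with an SMD of 1.5 for x. The weighted difference is 2, and the SMD is essentially 0 (up to floating-point rounding). With real data the PS would come from a fitted model, such as logistic regression, and balance would rarely be exact.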
Keywords: bias; confounding; effectiveness; health technology assessment; observational study; propensity score; safety; secondary data
Year: 2019 PMID: 31619986 PMCID: PMC6760465 DOI: 10.3389/fphar.2019.00973
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
Figure 1. Methods to control for confounding in observational studies. *Multiple imputation is valid when the Missing at Random (MAR) assumption holds; **if a time-varying confounder is affected by previous treatment, all PS-based methods except marginal structural models (MSMs) using inverse probability of treatment weighting (IPTW) will give biased estimates; ***self-controlled case-series design; ♠ stratification on an effect modifier, with adjustment within the strata to account for other covariates; ♠♠ disease risk score (prognostic score) method; ♠♠♠ restriction, or choosing an active comparison group instead of a non-user group; ♣ G-formula and ♣♣ G-estimation of structural nested models, which rely on specification of the outcome model; ♣♣♣ instrumental variable methods. (Adapted in part from Schneeweiss (2006), Uddin et al. (2016), and Zhang et al. (2018).)
Figure 2. Causal diagrams representing time-varying treatment (AZT), outcome (progression to AIDS), and time-varying confounding (CD4 count). (A) The time-varying confounder is not affected by prior treatment. (B) The time-varying confounder is affected by prior treatment. (C) The time-varying confounder is affected by an unmeasured factor U, which is also associated with the outcome. (D) Conditioning or stratifying on the time-varying confounder, indicated by a box around CD4₁, creates an association between the time-varying confounder CD4₀ and the unmeasured factor U.
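Panel B shows a case where CD4 count is affected by prior AZT treatment. In that situation, MSMs are typically fitted with cumulative stabilized weights. The weight at visit k is the product, over visits j ≤ k, of two probabilities of the treatment actually received at visit j. The numerator is that probability given past treatment. The denominator is that probability given past treatment and the confounder (CD4) history. The sketch below assumes both per-visit probabilities have already been predicted by two fitted treatment models. The function name and input values are hypothetical.

```python
def cumulative_stabilized_weights(treatment, p_num, p_den):
    """Return the cumulative stabilized IPTW at each visit for one person.

    treatment[k]: observed treatment at visit k (1 = AZT, 0 = no AZT).
    p_num[k]: P(A_k = 1 | past treatment), from a hypothetical numerator model.
    p_den[k]: P(A_k = 1 | past treatment, CD4 history), from a hypothetical
              denominator model.
    """
    if not (len(treatment) == len(p_num) == len(p_den)):
        raise ValueError("inputs must have one entry per visit")
    weights, w = [], 1.0
    for a, pn, pd in zip(treatment, p_num, p_den):
        # Probability of the treatment actually received under each model.
        num = pn if a == 1 else 1.0 - pn
        den = pd if a == 1 else 1.0 - pd
        w *= num / den  # running product over visits 0..k
        weights.append(w)
    return weights


if __name__ == "__main__":
    # Hypothetical person: untreated at visit 0, starts AZT at visit 1.
    print(cumulative_stabilized_weights([0, 1, 1], [0.3, 0.4, 0.9], [0.2, 0.8, 0.9]))
```

Each person-visit would then enter a weighted outcome model (for example, a pooled logistic regression) with the weight for that visit. Because the CD4 history appears only in the weights, the CD4 count is not conditioned on in the outcome model.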
Comparison of the different propensity score methods.
| Characteristics | Matchingᵃ | Stratificationᵇ | Regressionᶜ | IPTWᵈ |
|---|---|---|---|---|
| Model dependence | Minimum | Minimum | High | Minimum |
| Application¹ | Easy | Easy | Easy | Complex |
| Overall transparency | High | High | Low | Medium |
| Easy to communicate | Yes | Yes | Not always | Not always |
| Design and analysis | Separated | Separated | Mixed | Separated |
| Easy to check balance | Yes | Yes | No | Yes |
| Requires additional assumptions² | No | No | Yes | No |
| Excludes individuals from analysis³ | Yes | No | No | Yes/No |
| Variance estimation | Not clear | Easy | Easy | Complex |
| Easy to interpret⁴ | Not always | Yes | No | Often |
| "Propensity score paradox" | Sensitive | No | No | No |
| Estimand⁵ | Often ATT | ATE, ATT | ATE | ATE, ATT |
| Time-varying confounding⁶ | No | No | No | Yes |
| Multiple treatments | Possible | Complex | Complex | Easier |
| Multilevel treatment applications | Exist | Exist | None | Exist |
| Treatment effect modification | Easier | Complex | Easier | Complex |
ᵃ Constructs treated and untreated matched groups with similar propensity scores.
ᵇ Constructs subgroups of treated and untreated individuals, often within quintiles or deciles of the PS.
ᶜ The PS, as a single summary of all covariates included in the PS model, is used as a covariate in the outcome regression model.
ᵈ PSs are used as weights to create a pseudo-population in which exposure and the measured covariates included in the treatment (PS) model are independent (Ali et al., 2016).
¹ Estimation of stabilized weights, as well as extension to time-varying treatment and confounding in the MSM framework, can be complex (Ali et al., 2016).
² Requires correct specification of both the PS model and the outcome model, in addition to the basic assumptions of positivity and no unmeasured confounding (Ali et al., 2016).
³ Weight trimming excludes some individuals in the tails of the propensity score distribution.
⁴ Interpretation can be difficult in three situations. In PSM, when treated individuals are excluded, the estimand may no longer be the ATT. In stratification, difficulty arises when there is treatment effect modification by the PS. In regression adjustment using the PS, it arises when non-collapsible effect measures such as odds ratios are used.
⁵ ATT, average treatment effect in the treated; ATE, average treatment effect. Modifying the matching or weighting method enables estimation of either the ATT or the ATE.
⁶ When a time-varying confounder is affected by previous treatment, all propensity score-based methods, including standard IPTW, fail to control correctly for confounding bias; MSMs using IPTW, however, do control for it.
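Footnotes a and 4 can be illustrated with a greedy 1:1 nearest-neighbor matching sketch. Matching is done without replacement and with a caliper on the PS scale. Treated individuals with no control inside the caliper are dropped. The resulting average difference therefore describes only the matched treated, which is the interpretive issue raised in footnote 4. The data, the caliper of 0.1, and the highest-PS-first processing order are illustrative assumptions, not recommendations.

```python
# Hypothetical example data: (id, propensity score, outcome).
TREATED = [("t1", 0.30, 10.0), ("t2", 0.55, 12.0), ("t3", 0.90, 15.0)]
CONTROLS = [("c1", 0.28, 8.0), ("c2", 0.50, 11.0),
            ("c3", 0.62, 9.0), ("c4", 0.10, 7.0)]


def greedy_caliper_match(treated, controls, caliper):
    """Greedy 1:1 nearest-neighbour PS matching without replacement.

    Returns (pairs, unmatched_ids); each pair is (treated_row, control_row).
    """
    available = list(controls)
    pairs, unmatched = [], []
    # Process treated individuals from highest to lowest PS (one common
    # greedy ordering; results can depend on this choice).
    for row in sorted(treated, key=lambda r: r[1], reverse=True):
        best = min(available, key=lambda c: abs(c[1] - row[1]), default=None)
        if best is None or abs(best[1] - row[1]) > caliper:
            unmatched.append(row[0])  # excluded from the matched analysis
            continue
        available.remove(best)  # without replacement: each control used once
        pairs.append((row, best))
    return pairs, unmatched


def mean_pair_difference(pairs):
    """Average treated-minus-control outcome over the matched pairs."""
    return sum(t[2] - c[2] for t, c in pairs) / len(pairs)


if __name__ == "__main__":
    pairs, unmatched = greedy_caliper_match(TREATED, CONTROLS, caliper=0.1)
    print("matched:", [(t[0], c[0]) for t, c in pairs])
    print("unmatched treated:", unmatched)
    print("mean difference among matched treated:", mean_pair_difference(pairs))
```

Here t3 has no control within the caliper, so the estimate of 1.5 refers only to t1 and t2. This is why reporting the number of treated individuals excluded, together with their characteristics, matters.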
Figure 3. Multiple imputation in propensity score methods: multiple copies of the imputed data are created and the propensity score is estimated in each of these datasets. Either (A) the treatment effect is estimated in each of the datasets, or (B) the propensity scores from the multiple datasets are pooled and the treatment effect is estimated in a single dataset. *Other PS methods (stratification, IPTW, and covariate adjustment using the PS) could also be used instead of matching.
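The caption does not say how results are combined, so the two functions below are a sketch under stated assumptions. For approach (A), per-dataset estimates are commonly combined with Rubin's rules. The pooled estimate is the mean of the m estimates. Its variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. For approach (B), one simple way to pool is to average each individual's PS across the imputed datasets. The function names and numbers are hypothetical.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Approach (A): combine per-imputation estimates with Rubin's rules.

    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations, one variance each")
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


def average_ps(ps_by_imputation):
    """Approach (B): average each person's PS over the m imputed datasets.

    ps_by_imputation[j][i] is the PS of person i in imputed dataset j.
    """
    m = len(ps_by_imputation)
    return [sum(person) / m for person in zip(*ps_by_imputation)]


if __name__ == "__main__":
    # Hypothetical effect estimates and their variances from m = 3 imputations.
    print(rubin_pool([0.50, 0.60, 0.55], [0.010, 0.012, 0.011]))
    print(average_ps([[0.2, 0.5], [0.4, 0.7]]))
```

With approach (B), the averaged PS is then used once, for example for matching, in a single dataset. Other pooling choices for the PS are possible.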
Figure 4. Methods to assess treatment effect modification in propensity score matching.
Summary of considerations when planning, conducting, and reporting propensity score analysis.
| Characteristics | What to consider | Methods available | What should or should not be done |
|---|---|---|---|
| Missing data | Missing data mechanism | Multiple imputation if data are missing at random (MAR) | Avoid complete case analysis and the missing indicator category; the latter may be biased even when the missing completely at random (MCAR) assumption holds. |
| Variable selection | Potential confounders, intermediates, colliders | Clinical knowledge/expert opinion; association of variables with the outcome (and treatment); balance diagnostics | Avoid adjusting for intermediates, colliders, and strong instrumental variables (the latter only when strong unmeasured confounding is known or suspected). Avoid the use of p-values or stepwise variable selection methods. |
| Propensity score estimation | Variables included, interactions, and higher-order terms | Logistic regression, recursive partitioning, neural networks, classification and regression trees, random forests, and boosted regression | Report the estimation method and the variables included in the propensity score model. |
| Propensity score methods | The research question, the treatment effect estimand, and the extent of overlap | Density plots of propensity scores | Report density plots or histograms of the propensity score distribution (preferably with overlapping coefficients of the density plots). |
| Propensity score matching | Matching algorithm, matching with or without replacement, and matching ratio | Exact (coarsened) matching, nearest neighbor matching (with or without a caliper), stratified matching, and full matching. The matching ratio can be 1-to-1, 1-to-many, variable ratio, or full matching. | Report the size of the starting population, the number matched, and the number excluded (with their pretreatment characteristics). |
| Propensity score stratification | Number of strata | Deciles or quintiles of the propensity score | Report the number of strata used and the covariate balance between treatment groups in each stratum. |
| Regression adjustment using propensity score | Linearity of the relationship between the outcome and the propensity score | | Report whether the linear relationship between the outcome and the propensity score was checked and whether it holds. |
| Inverse probability of treatment weighting | Whether there is sufficient overlap (positivity) | Weighted regression; robust variance estimation or bootstrapping for constructing confidence intervals | Report how the weights were calculated, whether they were stabilized, the mean weights in both treatment groups, and whether trimming was done. |
| Time-varying exposure | Whether there is time-varying confounding and, if so, whether it is affected by previous treatment | Marginal structural models using IPTW, G-formula, and G-estimation of structural nested models | If previous treatment affects time-varying confounders, avoid matching, stratification, and regression adjustment; apply an MSM using IPTW. |
| Treatment effect modification | Identify potential effect modifiers | Matching on the PS within strata of the effect modifier, among others | Avoid stratified analysis of the PS-matched data without adjustment for covariates. |
| Multilevel treatment | Whether a multilevel structure exists in the data, and the number of clusters/levels | Multilevel propensity score methods | Avoid single-level propensity score applications. Include the multilevel structure at least in propensity score estimation or in the outcome analysis, preferably in both. |
| Multiple treatments | Number of treatment groups, and whether the treatment categories are ordered (such as dosage) | Multiple matching and weighting: multinomial logistic regression, ordinal logistic regression, or generalized boosted models | |
| Residual confounding | Whether there is residual imbalance in covariates | Doubly robust methods, propensity score calibration (PSC), high-dimensional propensity score (HDPS) method | Report which method was used and why. |
| Unmeasured confounding | Whether there is potential unmeasured confounding, or whether the data contain proxies for unmeasured confounders | Alternative methods such as instrumental variable methods, PSC, HDPS, or sensitivity analysis | Report the method used and the sensitivity analysis conducted. |
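Several IPTW reporting items in the table are easier to see in code: how weights are calculated, whether they are stabilized, mean weights by group, and trimming. The sketch assumes the PS has already been estimated and uses the standard weight formulas:

- ATE weights are 1/e for the treated and 1/(1 - e) for the untreated.
- ATT weights are 1 for the treated and e/(1 - e) for the untreated.
- Stabilized ATE weights multiply the ATE weight by the marginal proportion receiving the treatment the person actually received.

Trimming here drops individuals whose PS falls outside chosen bounds, as described in footnote 3 of the comparison table. The bounds, data, and function names are hypothetical.

```python
def iptw_weights(ps, treated, estimand="ATE", stabilized=False):
    """Per-person IPTW weights from propensity scores ps and 0/1 treatment.

    Stabilization is applied only to ATE weights in this sketch.
    """
    if estimand not in ("ATE", "ATT"):
        raise ValueError("estimand must be 'ATE' or 'ATT'")
    p_treat = sum(treated) / len(treated)  # marginal P(T = 1)
    weights = []
    for e, t in zip(ps, treated):
        if not 0 < e < 1:
            raise ValueError("positivity violated: PS must lie strictly in (0, 1)")
        if estimand == "ATE":
            w = 1 / e if t else 1 / (1 - e)
            if stabilized:
                w *= p_treat if t else 1 - p_treat
        else:  # ATT: treated keep weight 1, untreated are reweighted
            w = 1.0 if t else e / (1 - e)
        weights.append(w)
    return weights


def mean_weight_by_group(weights, treated):
    """Mean weight in each treatment group, as recommended for reporting."""
    groups = {0: [], 1: []}
    for w, t in zip(weights, treated):
        groups[t].append(w)
    return {t: sum(ws) / len(ws) for t, ws in groups.items() if ws}


def trim_by_ps(ps, lower, upper):
    """Indices of individuals kept after excluding PS outside [lower, upper]."""
    return [i for i, e in enumerate(ps) if lower <= e <= upper]


if __name__ == "__main__":
    ps, treated = [0.2, 0.5, 0.8, 0.5], [1, 1, 0, 0]  # hypothetical
    sw = iptw_weights(ps, treated, stabilized=True)
    print("stabilized ATE weights:", sw)
    print("mean weight by group:", mean_weight_by_group(sw, treated))
    print("ATT weights:", iptw_weights(ps, treated, estimand="ATT"))
    print("kept after trimming to [0.1, 0.7]:", trim_by_ps(ps, 0.1, 0.7))
```

Large weights usually come from PS values near 0 or 1. This is why the overlap (positivity) check in the table is done before weighting.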