Sola Han, Hae Sun Suh.
Abstract
We aimed to compare conventional and machine learning approaches to deriving propensity scores (PS). The comparison covered each approach's ability to balance baseline covariates and the impact of residual confounding. The Health Insurance Review and Assessment Service database (January 2012–September 2019) was used. Patients with atrial fibrillation (AF) who initiated oral anticoagulants between July 2015 and September 2018 were included. The outcome of interest was stroke/systemic embolism. To estimate PS, we used a logistic regression model (a conventional approach) and a generalized boosted model (GBM), a machine learning approach. Both PS matching and inverse probability of treatment weighting (IPTW) were performed. To evaluate balance, standardized differences, p-values, and boxplots were used. To explore residual confounding, E-values and negative control outcomes were used. In total, 129,434 patients were identified. All baseline covariates were well balanced. However, the distributions of continuous variables appeared more similar between treatment groups when GBM was applied. E-values ranged from 1.75 to 2.70 and were generally higher with GBM. In the negative control outcome analysis, slightly more nonsignificant hazard ratios were observed with GBM. In this empirical example of comparative effectiveness analysis, GBM provided better covariate balance and was less affected by residual confounding than the conventional approach.
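The balance diagnostics named in the abstract can be illustrated with a minimal Python sketch of inverse probability of treatment weighting and the standardized difference. It assumes ATE-type weights: 1/PS for treated patients and 1/(1 − PS) for comparison patients. It also assumes a pooled-variance standardized difference computed with weighted variances. The abstract specifies neither the weight type nor the variance formula. The function names and toy data are hypothetical, and the propensity scores are taken as given rather than fitted with logistic regression or GBM.

```python
import math


def iptw_weights(treated, ps):
    """ATE-type inverse probability of treatment weights (an assumption here).

    Treated units get 1/ps; comparison units get 1/(1 - ps).
    """
    return [1.0 / p if t else 1.0 / (1.0 - p) for t, p in zip(treated, ps)]


def _weighted_mean_var(x, w):
    # Weighted mean and weighted (population-style) variance.
    total = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / total
    return mean, var


def standardized_difference(x, treated, weights=None):
    """(mean_treated - mean_control) / sqrt((var_treated + var_control) / 2)."""
    if weights is None:
        weights = [1.0] * len(x)
    groups = {True: ([], []), False: ([], [])}
    for xi, t, wi in zip(x, treated, weights):
        groups[bool(t)][0].append(xi)
        groups[bool(t)][1].append(wi)
    m1, v1 = _weighted_mean_var(*groups[True])
    m0, v0 = _weighted_mean_var(*groups[False])
    return (m1 - m0) / math.sqrt((v1 + v0) / 2.0)


if __name__ == "__main__":
    # Hypothetical toy data: a binary covariate that drives treatment, with the
    # true propensity score (0.75 if x == 1, else 0.25) assumed to be known.
    x = [1, 1, 1, 1, 0, 0, 0, 0]
    treated = [1, 1, 1, 0, 1, 0, 0, 0]
    ps = [0.75 if xi == 1 else 0.25 for xi in x]
    w = iptw_weights(treated, ps)
    print("unweighted SMD:", round(standardized_difference(x, treated), 3))
    print("IPTW SMD:", round(standardized_difference(x, treated, w), 3))
```

On the toy data, weighting by the true propensity score moves the covariate's standardized difference from about 1.15 to zero. A threshold such as 0.1 is a common convention for "well balanced," though the study's cut-off is not stated in this record.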
Keywords: atrial fibrillation; comparative effectiveness research; machine learning; propensity score
Year: 2022 PMID: 36232216 PMCID: PMC9566283 DOI: 10.3390/ijerph191912916
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
Figure 1. Study scheme. (A) Patient selection and follow-up. (B) Overall process.
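Summarized process for comparing E-values in this study:

1. Calculate E-values from the main analysis (E-values from the conventional and machine learning approaches).
2. Calculate an anchor E-value (the E-value derived from the PS model excluding age and sex) for each comparison.
3. Calculate the meaningful difference Δ (the difference between the E-value from the main analysis and the anchor E-value).
4. Calculate the difference in E-values from the main analysis (e.g., E-value from IPTW (GBM) − E-value from IPTW (logistic)).
5. If the difference in E-values from the main analysis (step 4) ≥ Δ (step 3), the E-values are meaningfully different.
6. If the difference in E-values from the main analysis (step 4) < Δ (step 3), the E-values are not meaningfully different.
7. Calculate the maximum possible coverage for the comparisons in step 6.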
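Step 1 can be sketched in Python with the E-value formula of VanderWeele and Ding (2017), E = RR + sqrt(RR × (RR − 1)), applied after inverting protective ratios. The sketch assumes stroke/systemic embolism is rare enough that the hazard ratio approximates a risk ratio. The source does not state which conversion, if any, was used. The function names and the example hazard ratio are hypothetical.

```python
import math


def e_value_rr(rr):
    """E-value for a risk ratio (VanderWeele and Ding, 2017).

    Protective estimates (rr < 1) are inverted before applying
    E = RR + sqrt(RR * (RR - 1)).
    """
    if rr <= 0:
        raise ValueError("ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


def e_value_hr(hr, lower, upper):
    """Point and confidence-limit E-values for a hazard ratio.

    ASSUMPTION: the outcome is rare, so the HR approximates a risk ratio.
    The confidence-limit E-value uses the limit closest to the null and
    is 1 when the interval includes 1.
    """
    point = e_value_rr(hr)
    limit = lower if hr >= 1 else upper
    crosses_null = lower <= 1 <= upper
    ci = 1.0 if crosses_null else e_value_rr(limit)
    return point, ci


if __name__ == "__main__":
    # Hypothetical HR and 95% CI, not taken from the study.
    point, ci = e_value_hr(0.60, 0.50, 0.72)
    print(f"E-value: {point:.2f} (CI limit: {ci:.2f})")
```

A larger E-value means an unmeasured confounder would need stronger associations with both treatment and outcome to explain away the estimate. Under this sketch, the study's comparison across PS approaches reduces to comparing these numbers.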
Figure 2. Patient-selection flow.
E-values of the estimated hazard ratios for stroke or systemic embolism.

| Comparison | E-value, PSM (logistic) (1, 3) | E-value, PSM (GBM) (2, 3) | E-value, IPTW (logistic) (1, 3) | E-value, IPTW (GBM) (2, 3) | Anchor E-value, IPTW (4) | Meaningful difference Δ (5) | Maximum possible coverage, % (6) |
|---|---|---|---|---|---|---|---|
| Comparison 1 | 2.24 (1.92) | 2.23 (1.90) | 2.32 (2.04) | 2.35 (2.07) | 2.32 | 0.03 | - |
| Comparison 2 | 1.96 (1.66) | 2.01 (1.71) | 2.00 (1.78) | 2.08 (1.85) | 1.67 | 0.41 | 29 |
| Comparison 3 | 2.13 (1.76) | 2.19 (1.81) | 2.19 (1.82) | 2.30 (1.93) | 2.36 | 0.06 | - |
| Comparison 4 | 1.75 (1.47) | 1.89 (1.60) | 1.89 (1.64) | 1.93 (1.68) | 1.65 | 0.28 | 64 |
| Comparison 5 | 2.45 (2.05) | 2.27 (1.88) | 2.52 (2.13) | 2.43 (2.05) | 2.42 | 0.01 | - |
| Comparison 6 | 2.70 (2.32) | 2.67 (2.28) | 2.53 (2.21) | 2.56 (2.24) | 2.10 | 0.46 | 37 |
| Comparison 7 | 2.01 (1.73) | 2.04 (1.75) | 2.02 (1.78) | 2.04 (1.80) | 1.94 | 0.10 | 30 |
| Comparison 8 | 2.05 (1.76) | 2.08 (1.78) | 2.04 (1.80) | 2.10 (1.85) | 1.72 | 0.38 | 16 |
Comparison 1, standard dose of apixaban vs warfarin; Comparison 2, reduced dose of apixaban vs warfarin; Comparison 3, standard dose of dabigatran vs warfarin; Comparison 4, reduced dose of dabigatran vs warfarin; Comparison 5, standard dose of edoxaban vs warfarin; Comparison 6, reduced dose of edoxaban vs warfarin; Comparison 7, standard dose of rivaroxaban vs warfarin; Comparison 8, reduced dose of rivaroxaban vs warfarin. (1) Conventional approach: PSM (logistic) and IPTW (logistic). (2) Machine learning approach: PSM (GBM) and IPTW (GBM). (3) A larger E-value indicates that stronger associations of an unmeasured confounder with both treatment and outcome would be needed to nullify the observed hazard ratio. (4) E-values derived from an analysis that excludes age and sex from the PS model (anchor E-values). (5) The meaningful difference Δ was defined by the researchers of this study as the difference between the E-value from the main analysis and the anchor E-value. It should be interpreted under this study-specific definition. (6) Maximum possible coverage (%) = [(maximum E-value from main analysis) − (minimum E-value from main analysis)] / Δ × 100. It was calculated only for comparisons in which the E-values from the main analysis were not meaningfully different; "-" indicates not calculated. PSM, propensity score matching; IPTW, inverse probability of treatment weighting; GBM, generalized boosted model.
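The decision rule and footnote (6) can be checked against the tabulated values with a short Python sketch. It takes Δ directly from the table. It also treats the "difference in E-values from the main analysis" as the spread (maximum minus minimum) across the four PS approaches. That is an assumption the source does not spell out. With these inputs, the printed coverages (29, 64, 37, 30, 16) match the table. So do the "-" entries for Comparisons 1, 3, and 5.

```python
# E-values for the hazard ratios from the table, in column order:
# PSM (logistic), PSM (GBM), IPTW (logistic), IPTW (GBM); then the tabulated delta.
TABLE = {
    "Comparison 1": ([2.24, 2.23, 2.32, 2.35], 0.03),
    "Comparison 2": ([1.96, 2.01, 2.00, 2.08], 0.41),
    "Comparison 3": ([2.13, 2.19, 2.19, 2.30], 0.06),
    "Comparison 4": ([1.75, 1.89, 1.89, 1.93], 0.28),
    "Comparison 5": ([2.45, 2.27, 2.52, 2.43], 0.01),
    "Comparison 6": ([2.70, 2.67, 2.53, 2.56], 0.46),
    "Comparison 7": ([2.01, 2.04, 2.02, 2.04], 0.10),
    "Comparison 8": ([2.05, 2.08, 2.04, 2.10], 0.38),
}


def max_possible_coverage(main_e_values, delta):
    """Maximum possible coverage (%), or None if the E-values differ meaningfully.

    ASSUMPTION: the "difference in E-values from the main analysis" compared
    with delta is the spread (max - min) across the four PS approaches.
    """
    spread = max(main_e_values) - min(main_e_values)
    if spread >= delta:
        return None  # meaningfully different: coverage is not calculated
    return spread / delta * 100.0


if __name__ == "__main__":
    for name, (e_values, delta) in TABLE.items():
        coverage = max_possible_coverage(e_values, delta)
        print(name, "-" if coverage is None else round(coverage))
```

Reading the output, a low coverage (e.g., 16% for Comparison 8) means the four approaches differ by only a small fraction of Δ.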
Figure 3. Hazard ratios for negative control outcomes.