Propensity score methods for comparative-effectiveness analysis: A case study of direct oral anticoagulants in the atrial fibrillation population.

Giorgio Ciminata¹, Claudia Geue¹, Olivia Wu¹, Manuela Deidda¹, Noemi Kreif², Peter Langhorne³.

Abstract

OBJECTIVE: To explore methodological challenges when using real-world evidence (RWE) to estimate comparative-effectiveness in the context of Health Technology Assessment of direct oral anticoagulants (DOACs) in Scotland.
METHODS: We used linkage data from the Prescribing Information System (PIS), Scottish Morbidity Records (SMR) and mortality records for newly anticoagulated patients to explore methodological challenges in the use of Propensity score (PS) matching, Inverse Probability Weighting (IPW) and covariate adjustment with PS. Model performance was assessed by standardised difference. Clinical outcomes (stroke and major bleeding) and mortality were compared for all DOACs (including apixaban, dabigatran and rivaroxaban) versus warfarin. Patients were followed for 2 years from first oral anticoagulant prescription to first clinical event or death. Censoring was applied for treatment switching or discontinuation.
RESULTS: Overall, a good balance of patients' covariates was obtained with every PS model tested. IPW was found to be the best performing method in assessing covariate balance when applied to subgroups with relatively large sample sizes (combined-DOACs versus warfarin). With the IPTW-IPCW approach, the treatment effect tends to be larger, but still in line with the treatment effect estimated using other PS methods. Covariate adjustment with PS in the outcome model performed well when applied to subgroups with smaller sample sizes (dabigatran versus warfarin), as this method does not require further reduction of sample size, and trimming or truncation of extreme weights.
CONCLUSION: The choice of adequate PS methods may vary according to the characteristics of the data. If the assumption of no unobserved confounding holds, multiple approaches should be identified and tested. PS based methods can be implemented using routinely collected linked data, thus supporting Health Technology decision-making.

Year: 2022 PMID: 35073380 PMCID: PMC8786176 DOI: 10.1371/journal.pone.0262293

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752

Introduction

Comparative-effectiveness research aims to reduce the gap between clinical research and clinical practice [1, 2], thus providing clinicians, patients and policy makers with the clinical evidence needed to make informed decisions concerning healthcare. In this context, both randomised controlled trials (RCTs) and real-world evidence (RWE) contribute to generating clinical evidence for decision-making. In randomised trials, randomisation ensures that patient characteristics such as age, sex, comorbidities and disease severity are similarly distributed between treatment groups; the observed difference in terms of outcome between the treatment groups in the study population can therefore be attributed to the treatment [2]. In RWE, the absence of randomisation does not allow for an unbiased comparison between patients who are exposed and those who are not exposed to the treatment under study. Hence, the observed differences in health outcomes between the groups may be influenced by the population characteristics or other additional factors rather than by the treatment. Crucially, a lack of randomisation in RWE studies gives rise to confounding by indication, which occurs when the prognostic factors used for treatment selection, such as disease severity, also affect the outcome [1]. For instance, patients with more severe conditions receive more intense treatments and, as a result, when outcomes are compared naïvely among treatment groups, the more intensive treatment may be associated with poorer outcomes. Nevertheless, RWE may provide additional insights concerning the safety and effectiveness of a treatment and in some cases may be the only available source of evidence if randomised data are not available [1]. Thus, RWE is increasingly used in Health Technology Assessment to inform reimbursement and coverage decisions.
In this context, RWE is used in the “accelerated market access” process where initial decisions are conditional on additional randomised and non-randomised evidence generated over time [3]. Historically, regression adjustment has been used to address confounding in RWE; but over the last decade, there has been an increasing interest in the application of propensity score (PS) based methods, such as matching and inverse probability weighting (IPW), when using observational data in medical research. Propensity score methods attempt to mimic the process of randomisation by estimating the probability of treatment assignment conditional on observed baseline characteristics [1, 4]. Propensity score methods offer several advantages over conventional regression methods [4]. However, while PS methods may reduce the bias due to observable confounders such as age, sex and existing comorbidities, unobserved confounders, such as patients’ tolerability and access to healthcare, may still bias the PS estimates. Propensity score methods can address observed confounding if the assumption of ‘no unobserved confounding’ is reasonable, i.e. that the investigator was able to measure all variables that both influence the treatment assignment and are prognostic of the outcome [5]. To account for the presence of both observed and unobserved confounding, different statistical methods such as instrumental variables, difference in differences and regression discontinuity should be used as an alternative to PS methods [6, 7]. The objective of this study is to explore methodological challenges in using RWE, with a focus on PS based methods, to estimate comparative-effectiveness for a case study of direct oral anticoagulants (DOACs), a class of drugs, including apixaban, dabigatran and rivaroxaban, used for the prevention of stroke in the population affected by atrial fibrillation (AF). The rapid onset of action, following oral administration, is one of the main assets of DOACs.
The predictability of pharmacodynamics and pharmacokinetics allows DOACs to be used at a fixed dose without requirement for routine anticoagulation monitoring [8, 9]. This study forms part of a wider project that used routinely collected data where clinical and comparative-effectiveness of DOACs are assessed in greater detail. Confounding by indication appears to be an issue with DOACs, and most studies assessing their comparative-effectiveness have used different PS based methods to address observed confounders [10, 11]. However, in most studies assessing the effectiveness of DOACs (either head-to-head or compared to warfarin), neither a comparison between PS methods nor the reason for selecting a specific PS method is provided [10, 11]. Despite the clear differences between PS methods, the choice of one method over another often appears to be arbitrary, without a clear rationale supporting model selection. Among twenty-two studies identified from a recent systematic review assessing the effectiveness of DOACs compared to warfarin [11], only two studies appeared to compare PS methods or justify PS model selection.

Methods

Data sources and cohort

Fully anonymised data were obtained from the Information Services Division (ISD) of NHS Scotland as part of a wider project that used routinely collected data to evaluate the comparative-effectiveness of DOACs in the prevention of stroke in the AF population. Scotland offers a robust record linkage system, where administrative patient-level health data are routinely collected. All patients treated with either warfarin or DOACs between 2009 and 2017 were identified from the Scottish Prescribing Information System (PIS), a database that includes prescribing records for all medicines and their associated costs, which are prescribed and dispensed by community pharmacies, dispensing doctors, and a small number of specialist appliance suppliers [12]. Records from PIS are available from 2009 onwards; therefore, to establish a cohort of patients with a first prescription of warfarin or DOACs, and no exposure to anticoagulation within one year prior to the index date, only patients starting anticoagulation from 2010 onwards were included in the analysis. Individual-level data linkage was then carried out with General Acute Inpatient and Day Case Scottish Morbidity Records 01 and mortality records to identify a cohort of AF patients (defined using ICD-10 code I48X) and clinical and mortality events. Inpatient records contain all general acute admissions, categorized as inpatient stays or day cases, discharged from non-obstetric and non-psychiatric specialties [13]. The clinical outcomes were identified from SMR01 according to ICD-10 and OPCS-4 codes (S1 Table). Further, to ensure that only patients that were likely to have received OACs because of an AF diagnosis were included, any patients with a diagnosis other than AF were excluded from the analysis. Clinical codes for inclusion and exclusion criteria are presented in the S2 Table. 
From our cohort of AF patients who are first time OAC users, we defined three subgroups: those on warfarin, those on any DOAC (including prescriptions of apixaban, dabigatran and rivaroxaban), and those on dabigatran only. We combined all DOAC prescriptions into a single subgroup to ensure an adequate overall treatment sample size; we refer to this subgroup as the combined-DOACs subgroup. The dabigatran-only subgroup is the smallest subgroup of DOAC users in Scotland compared to apixaban and rivaroxaban [14], and was therefore used to assess whether any of the PS approaches tested was sensitive to sample size. Thus, two comparisons were possible: combined-DOACs versus warfarin and dabigatran versus warfarin.

Propensity score estimation

Propensity score methods estimate the probability of treatment assignment conditional on observed baseline characteristics [1, 4]. The PS estimation was carried out across our subgroups (warfarin, combined-DOACs and dabigatran), resulting in two different PS models, one for each comparison. Propensity scores were estimated with a logit model accounting for a series of baseline characteristics of first time OAC users. We accounted for age and sex, which are relevant drivers of treatment choice and are prognostic of the outcomes of interest. We also accounted for socio-economic status using the Scottish Index of Multiple Deprivation (SIMD), which ranks areas from the most to the least deprived and is measured in quintiles, where the most and the least deprived areas are represented by 1 and 5, respectively [15]. Further, PS were estimated taking into account the risk scores calculated (for each patient, over the 5-year period prior to their first anticoagulation prescription) with the risk prediction tools CHA2DS2-VASc and HAS-BLED, designed to stratify, in the context of AF, the risk of stroke and the risk of bleeding, respectively [16-18]. Other relevant confounders that we used in our PS estimation were ischaemic stroke or systemic embolism or transient ischaemic attack (TIA), vascular disease, hypertension, diabetes, cancer, prescriptions predisposing to bleeding, and comorbidity. The PS in each of the two PS models was estimated using the full set of covariates listed above. The proportion of missing data was <5%; hence, imputation of missing values was not used [19], and a complete available case analysis was carried out. For each PS model tested, and for each comparison (combined-DOACs versus warfarin and dabigatran versus warfarin), the PS distribution was inspected graphically to identify potentially extreme weights and to ascertain whether an adequate overlapping distribution had been achieved.
Extreme weights are considered as such if PSs are <0.1 for the treatment group (combined-DOACs and dabigatran) or >0.9 for the control group (warfarin). Distributions of the predicted probabilities between treatment and control groups should overlap to indicate that covariates between groups are comparable [4, 20].
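The estimation and inspection steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy covariates (a centred age value and a sex indicator), the gradient-ascent fitting routine and the function names `fit_logit`, `propensity`, `flag_extreme` and `common_support` are all assumptions introduced here. Only the logit form and the 0.1 and 0.9 cut-offs come from the text.

```python
import math

def propensity(beta, xi):
    """Predicted probability of treatment for one covariate vector."""
    z = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, t, lr=0.5, n_iter=2000):
    """Fit logistic regression coefficients by batch gradient ascent (illustrative)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)  # intercept followed by one coefficient per covariate
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, t):
            resid = ti - propensity(beta, xi)
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def flag_extreme(ps, treated, low=0.1, high=0.9):
    """Rule from the text: PS < 0.1 in the treatment group, PS > 0.9 in the control group."""
    return [(ti == 1 and p < low) or (ti == 0 and p > high)
            for p, ti in zip(ps, treated)]

def common_support(ps, treated):
    """Range of PS values that is covered by both groups."""
    ps_t = [p for p, ti in zip(ps, treated) if ti == 1]
    ps_c = [p for p, ti in zip(ps, treated) if ti == 0]
    return max(min(ps_t), min(ps_c)), min(max(ps_t), max(ps_c))

# Hypothetical toy data: [centred age, sex]; 1 = DOAC (treated), 0 = warfarin (control).
X = [[-1.0, 0], [-0.5, 1], [0.0, 0], [0.5, 1], [1.0, 0], [-1.0, 1], [0.2, 0], [0.8, 1]]
t = [1, 1, 0, 0, 0, 1, 1, 0]

beta = fit_logit(X, t)
ps = [propensity(beta, xi) for xi in X]
print(flag_extreme(ps, t), common_support(ps, t))
```

In practice the logit model would be fitted with an established statistics package and the overlap inspected graphically, as in Fig 1; the helper functions only make the flagging rule explicit.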

Propensity score models assessed

Assuming that no unobserved confounding was a reasonable assumption, and with the support of guidelines on the use of observational data to inform estimates of treatment effectiveness in technology appraisal [5], we identified and tested different PS based methods for the combined-DOACs versus warfarin and dabigatran versus warfarin comparisons: PS matching, covariate adjustment including the PS as a covariate, and a series of IPW methods. With propensity score matching we created, for each comparison, a sub-sample of each treatment group sharing a similar PS value. This allows outcomes between treatment groups to be directly compared [4]. A key aspect of this PS method is whether matching should be done with or without replacement. In the first case, any patient from the control group can be used several times, for more than one treated individual. Replacement is particularly useful in settings where the treatment group significantly outnumbers the control group. By contrast, matching without replacement allows patients from the control group to be matched against those in the treatment group only once [4, 21]. In our data, warfarin (the control) outnumbers the treatment group (combined-DOACs and dabigatran); this is due to the adoption of DOACs for the prevention of stroke in the AF population being relatively recent compared to warfarin [22]. Thus, PS matching without replacement was selected as the most suitable PS matching method. In the covariate adjustment method, the only steps required in the PS model were deciding the functional form of the regression model and PS estimation. The IPW methods we tested were Inverse Probability of Treatment Weighting (IPTW) and IPTW combined with Inverse Probability of Censoring Weighting (IPCW). In the IPTW method, a weight, reflecting the probability of being exposed to either combined-DOACs or dabigatran and equal to the reciprocal of the PS, was assigned to each patient in the treatment group.
A weight equal to the reciprocal of one minus the PS was assigned to patients in the warfarin group. In the IPW method combining IPTW with IPCW, two different sets of weights were estimated. The weights for IPTW were estimated as discussed. Those for IPCW were estimated by censoring patients who switched treatment, and by assigning weights to individuals who were not censored but shared similar characteristics with the switchers. Then, IPTW and IPCW weights were multiplied to obtain the overall weight reflecting ATE and censoring [4, 23, 24]. The adequacy of model specification for PS matching and IPW methods was assessed by means of standardised differences, a measure generally used to compare the mean of variables between treatment and control groups. The use of standardised differences for balance assessment has been advocated in the literature as it is invariant to sample size and can be applied across different PS methods. Further, such a measure is easily interpretable using graphical displays even with a large number of covariates [4, 20]. For the covariate adjustment method, we used the “weighted conditional” standardised difference as described by Austin (2008). With this method, the standardised difference in the mean of a covariate between treated and untreated subjects, conditional on the propensity score, is integrated over the distribution of the propensity score [20]. With both methods, differences in the means of covariates are considered negligible if below the threshold of 0.1 standard deviations [25]. Although there is no universal agreement on what the threshold for the standardised difference should be, the threshold of 0.1 is now widely considered adequate for diagnosing covariate balance and imbalance [26]. The PS methods described above were used to estimate the average treatment effect (ATE), defined as the average treatment effect for the entire population (i.e. regardless of whether a particular individual has been treated) [27]. Specifically, in our analysis, the ATE, being the estimand of interest, was estimated on the whole AF population and for each comparison, i.e. the ATE of being treated with any DOAC (combined-DOACs) and the ATE of being treated with dabigatran.
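The weighting scheme described in this subsection can be written out as a short sketch. The IPTW weights follow the text directly (reciprocal of the PS for treated patients, reciprocal of one minus the PS for warfarin patients). The IPCW weight is taken here to be the reciprocal of an estimated probability of remaining uncensored, which is a common formulation but an assumption on our part, since the text does not spell out the censoring model. All numbers and function names are made up for illustration.

```python
def iptw_weight(ps, treated):
    """ATE weight: 1/PS for treated patients, 1/(1 - PS) for controls."""
    return 1.0 / ps if treated else 1.0 / (1.0 - ps)

def ipcw_weight(p_uncensored):
    """Assumed form: inverse of the estimated probability of remaining uncensored."""
    return 1.0 / p_uncensored

def combined_weight(ps, treated, p_uncensored):
    """Overall weight: product of the IPTW and IPCW weights."""
    return iptw_weight(ps, treated) * ipcw_weight(p_uncensored)

# Hypothetical patients: (propensity score, treated flag, P(remaining uncensored)).
patients = [
    (0.25, 1, 0.8),
    (0.25, 0, 0.5),
    (0.50, 1, 1.0),
]
weights = [combined_weight(ps, t, pu) for ps, t, pu in patients]
print(weights)
```

A treated patient with a low PS receives a large weight, which is why poor overlap translates directly into extreme weights.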

Outcome model

Cox proportional hazards regression was used to compare risks between control and treatment groups, for each comparison and for three major AF related clinical events: stroke-all (including haemorrhagic and ischaemic), major bleeding and all-cause mortality. To compensate for any potential remaining covariate imbalance and to further reduce the bias caused by residual differences in observed baseline covariates, we included age, sex, socio-economic status and comorbidity in our outcome models. The other variables, described in the Propensity score estimation subsection, were assumed to be captured by comorbidity and were therefore not included. Patients were censored if they switched or discontinued treatment; for each method, the risks of stroke, major bleeding or death (for patients exposed to either DOACs or warfarin) were estimated from anticoagulation initiation to the time of clinical event or death during a 2-year follow-up period. The first clinical event for each treatment was determined within a competing risk framework. In this analysis, treatment discontinuation, i.e. a temporal gap between consecutive prescriptions, was considered to have occurred if the gap exceeded a 28-day threshold and the supply of the penultimate prescription did not fill the gap. The threshold was identified in a drug utilisation study using the same patient-level data utilised in this paper [22]. For the IPW method combining IPTW with IPCW, censoring was specifically modelled in the PS model. As previously described, patients who switched treatment were censored, while weights were assigned to individuals who were not censored but shared similar characteristics with the switchers. These weights were then multiplied by the weights obtained from IPTW.
In addition to comparing PS models in terms of performance by measuring the standardised differences for each covariate, the ATE, estimated with the outcome model for each of the clinical outcomes selected, was compared across methods to assess whether and how it differs depending on the PS method used.
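The 28-day discontinuation rule used for censoring can be illustrated with a small sketch. The text does not fully specify how the gap is measured, so the version below makes two assumptions: the gap is counted from the end of the previous prescription's supply to the next dispensing date (or to the end of follow-up), and the patient is censored at the end of that supply. The function name and the example dates are hypothetical.

```python
from datetime import date, timedelta

GAP_THRESHOLD_DAYS = 28  # threshold stated in the text

def discontinuation_date(prescriptions, follow_up_end, threshold=GAP_THRESHOLD_DAYS):
    """Return the assumed censoring date for discontinuation, or None.

    prescriptions: list of (dispensing_date, days_supply), sorted by date.
    Assumption: a gap runs from the end of one supply to the next dispensing
    date, with the end of follow-up standing in after the last prescription.
    """
    next_dates = [d for d, _ in prescriptions[1:]] + [follow_up_end]
    for (dispensed, supply), next_date in zip(prescriptions, next_dates):
        supply_end = dispensed + timedelta(days=supply)
        if (next_date - supply_end).days > threshold:
            return supply_end
    return None

# Hypothetical patient: the second supply runs out well before the third dispensing.
rx = [(date(2015, 1, 1), 28), (date(2015, 2, 5), 28), (date(2015, 5, 1), 28)]
print(discontinuation_date(rx, follow_up_end=date(2016, 12, 31)))
```

Other readings of the rule (for example, carrying over unused supply from earlier prescriptions) would change the dates returned, so the choice should follow the drug utilisation study cited above.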

Ethics and data sharing

We obtained the necessary permissions and approvals to access these national datasets. No ethical approval was needed. All data underlying the analyses are confidential and subject to disclosure control. Data can only be obtained through application to Information Services Division (ISD) via the Public Benefit and Privacy Panel (PBPP).

Results

Cohort characteristics

From the cohort of first time OAC users identified from the PIS between 2009 and December 2017, two subgroups of patients on either warfarin (34,876) or combined-DOACs (15,142) were identified. Among the combined-DOACs users, 622 patients were on dabigatran. Overall, mean age of patients at the time of the first prescription was similar across all treatment groups. Across all treatments, patients with the highest risk of stroke and the lowest risk of bleeding, measured using the CHA2DS2-VASc and HAS-BLED score respectively, represented the majority. While most patients had no comorbidities across all treatment groups, those on warfarin represented the biggest proportion. Further, the proportion of patients with a history of stroke or TIA was lower in the warfarin group than any other treatment group. About one third of patients on anticoagulation had hypertension, which is an important risk factor for stroke. In addition, the majority of patients across all treatment groups were also on medication predisposing to bleeding such as aspirin and non-steroidal anti-inflammatory drugs. Patients’ baseline characteristics are reported in Table 1.

Table 1

Baseline characteristics.

Characteristics	Warfarin, N (%)	Combined-DOACs, N (%)	Dabigatran, N (%)
Subgroup	34,876	15,142	622
Sex
Men	20,007 (57.37)	8,433 (55.69)	378 (60.77)
Women	14,869 (42.63)	6,709 (44.31)	244 (39.23)
Mean age (SD)	75 (11.09)	74 (11.32)	72 (11.10)
SIMD (Scottish index of multiple deprivation)
1 (most deprived)	6,814 (19.54)	2,813 (18.58)	87 (13.99)
2	7,420 (21.28)	2,965 (19.58)	104 (16.72)
3	7,297 (20.92)	3,039 (20.07)	149 (23.95)
4	6,977 (20.01)	3,110 (20.54)	171 (27.49)
5 (least deprived)	6,368 (18.26)	3,215 (21.23)	111 (17.85)
CHA2DS2-VASc score (risk of stroke)
0–1 (low to moderate risk)	7,705 (22.09)	3,429 (22.65)	171 (27.49)
2–3 (moderate to high risk)	11,232 (32.21)	4,606 (30.42)	195 (31.35)
≥4 (high risk)	15,939 (45.70)	7,107 (46.94)	256 (41.16)
HAS-BLED score (risk of bleeding)
0–2 (low to moderate risk)	24,875 (71.32)	9,862 (65.13)	447 (71.86)
≥3 (moderate to high risk)	10,001 (28.68)	5,280 (34.87)	175 (28.14)
Comorbidity
No comorbidity	18,374 (52.68)	6,502 (42.94)	311 (50.00)
1 comorbidity	6,952 (19.93)	3,525 (23.28)	133 (21.38)
>1 comorbidity	9,550 (27.38)	5,115 (33.78)	178 (28.62)
Stroke or TIA	2,542 (7.29)	1,912 (12.63)	80 (12.86)
Vascular disease	4,903 (14.06)	2,562 (16.92)	85 (13.67)
Hypertension	10,901 (31.26)	5,361 (35.40)	200 (32.15)
Diabetes mellitus	4,449 (12.76)	2,275 (15.02)	85 (13.67)
Cancer	2,904 (8.33)	1,342 (8.86)	43 (6.91)
Drugs causing bleeding	18,843 (54.03)	8,453 (55.82)	314 (50.48)

Note: DOACs = Direct Oral Anticoagulants, SIMD = Scottish Index of Multiple Deprivation, TIA = Transient Ischaemic Attack.

Propensity score distribution

The PSs for the combined-DOACs versus warfarin comparison showed an adequate overlapping distribution (Fig 1A). Some overlap was also observed in the dabigatran versus warfarin comparison; however, all of the PSs generated were extreme and the overlap was poor (Fig 1B). In such cases, applying PS trimming or truncation of extreme weights is not feasible.

Fig 1

Propensity score distribution for warfarin, combined-DOACs, and dabigatran.

Note: DOACs = Direct Oral Anticoagulants.

Covariate balance assessment

Following the initial graphical assessment of the PS model specification, the distribution of baseline covariates between treatment groups was assessed by means of standardised differences. As shown in Fig 2, the unadjusted standardised differences indicated an adequate starting balance for most of the baseline characteristics of patients on combined-DOACs or on warfarin, with differences in the means of covariates below the threshold of 0.1 standard deviations.
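As a worked illustration of the balance measure, the sketch below recomputes unadjusted standardised differences from the counts and means in Table 1, using the usual pooled-variance formulas for binary and continuous covariates. The function names are ours, the mean ages in Table 1 are rounded to whole years, and the values obtained this way are a re-computation that need not match the figures exactly.

```python
import math

def std_diff_binary(p_treated, p_control):
    """Standardised difference for a binary covariate (proportions between 0 and 1)."""
    pooled_var = (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2
    return (p_treated - p_control) / math.sqrt(pooled_var)

def std_diff_continuous(mean_t, sd_t, mean_c, sd_c):
    """Standardised difference for a continuous covariate."""
    return (mean_t - mean_c) / math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)

THRESHOLD = 0.1  # threshold for negligible imbalance used in the text

# Counts taken from Table 1: combined-DOACs (n = 15,142) versus warfarin (n = 34,876).
n_doac, n_warf = 15142, 34876
d_stroke_tia = std_diff_binary(1912 / n_doac, 2542 / n_warf)
d_cancer = std_diff_binary(1342 / n_doac, 2904 / n_warf)
d_age = std_diff_continuous(74, 11.32, 75, 11.09)

for name, d in [("Stroke or TIA", d_stroke_tia), ("Cancer", d_cancer), ("Age", d_age)]:
    print(name, round(d, 3), "imbalanced" if abs(d) > THRESHOLD else "balanced")
```

After matching or weighting, the same functions would be applied to the matched sample or to weighted proportions and means.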

Fig 2

Covariate imbalance assessment for combined-DOACs vs. warfarin.

Note: PS = Propensity Score, PSM = Propensity Score Matching, IPW = Inverse Probability Weighting, SIMD = Scottish Index of Multiple Deprivation, TIA = Transient Ischaemic Attack.

Overall, a good balance of patients’ covariates was obtained with every PS model tested. However, the standardised difference for CHA2DS2-VASc score ≥4 did not improve with the PS matching method. Nevertheless, the standardised difference for this characteristic was still below the threshold, whether adjusted or unadjusted.
In the dabigatran versus warfarin comparison, the unadjusted standardised differences above the threshold reflected some differences in terms of age, socio-economic status and previous history of stroke or TIA between dabigatran and warfarin users at baseline. An adequate balance was nevertheless achieved for all covariates with the PS covariate adjustment method. Propensity score matching failed to provide a good balance in terms of patient characteristics between the dabigatran and warfarin groups. Similarly, improved balance was not achieved for every covariate when using the IPW approach. In particular, the balance for the socio-demographic characteristics captured by SIMD (category 5) and for the covariate indicating a high risk of bleeding (HAS-BLED score ≥3), although still below the threshold, was worse than the unadjusted baseline balance (Fig 3).

Fig 3

Covariate imbalance assessment for dabigatran vs. warfarin.

Note: PS = Propensity Score, PSM = Propensity Score Matching, IPW = Inverse Probability Weighting, SIMD = Scottish Index of Multiple Deprivation, TIA = Transient Ischaemic Attack.

Clinical outcomes

Despite the differences in terms of baseline characteristics and sample size, the treatment effect is comparable across methods (Figs 4 and 5). The treatment effect estimated with the IPTW-IPCW approach tends to be larger than, but still in line with, the treatment effect estimated with the other PS methods. This is particularly evident when the sample size of the treatment subgroup is relatively small (Fig 5).

Fig 4

HRs for combined-DOACs vs. warfarin by propensity score methods.

Note: DOACs = Direct Oral Anticoagulants, PS = Propensity Score, PSM = Propensity Score Matching, IPTW = Inverse Probability of Treatment Weighting, IPCW = Inverse Probability of Censoring Weighting.

Fig 5

HRs for dabigatran vs. warfarin by propensity score methods.

Note: PS = Propensity Score, PSM = Propensity Score Matching, IPTW = Inverse Probability of Treatment Weighting, IPCW = Inverse Probability of Censoring Weighting.

Discussion

In clinical practice, population case mix may diverge substantially, making a comparison of the safety and effectiveness of two health interventions difficult. Propensity score methods allow for reducing any potential imbalance between covariates and obtaining more homogeneous and comparable treatment groups [4, 28]. Although PS methods have largely been used in comparative-effectiveness research assessing the effectiveness of DOACs compared to warfarin [10, 11], only two studies appeared to compare PS methods or justify PS model selection. In one of these two studies, the use of IPTW was justified by stating that in survival analysis, PS weighting offers greater bias reduction compared to other methods such as matching or stratification; nevertheless, this was not empirically tested in their analysis [29]. In the other study, carried out by Forslund and colleagues (2018), IPTW and PS stratification were used in the sensitivity analysis to support the validity of the main Cox regression analyses, but no direct comparison between methods was made [30]. In addition, we screened eighteen other studies using PS based methods to control for confounding, identified from another recent systematic review assessing the head-to-head comparative-effectiveness of DOACs [10]. Among these studies we identified only one additional study where different PS approaches were tested [31]. Despite the popularity of PS methods, there are limitations in their application. The generalisability of results may be an important issue when using the matching method, as a significant proportion of individuals will be omitted when creating the matched sub-sample. Unlike the PS matching method, IPW analysis is carried out on the entire cohort. Moreover, IPW offers, along with matching, an important advantage over the covariate adjustment with PS approach, in that it requires only the PS model specification for a correct ATE estimation.
However, with poor PS overlap, the resulting extreme weights directly derived from the PS may undermine the robustness of the model [23, 24, 32]. In our analysis, even before adjusting with PS estimates, the baseline characteristics between groups were already adequately balanced. However, in some cases the standardised differences indicated that the balance of certain baseline characteristics between treatments did not improve after PS adjustment. This occurrence is reported in the literature, and it seems to be common with the PS matching method when the propensity score is misspecified or matching with replacement is required [25]. Overall, patients on dabigatran were younger, at lower risk of stroke and had fewer comorbidities than patients on warfarin. This seems to suggest that dabigatran was selectively prescribed to patients at lower risk of stroke who were, in general, healthier than patients on warfarin. Evidence of selective prescribing of dabigatran in younger patients with lower risk of stroke has been reported in the literature [33]. Among the PS methods tested, IPW showed the best covariate balance with a relatively large sample size (combined-DOACs versus warfarin comparison). However, PS covariate adjustment, which is less sensitive to sample size because it does not require trimming or truncation of extreme weights as IPW methods do, showed the best covariate balance in the dabigatran versus warfarin comparison. Nevertheless, all the different PS methods tested produced treatment effects of similar magnitude. In general, PS covariate adjustment has been perceived as less robust than PS matching and IPW methods, as it is more sensitive to distributional assumptions and PS specification, and may therefore not reflect the true treatment effect [23, 24, 32]. Nevertheless, PS covariate adjustment was found to be a valid option to adjust for confounding by indication and in some instances outperformed the other methods, yielding much reduced standardised differences.
Moreover, PS methods may not necessarily perform better in addressing covariate imbalance than conventional regression. In particular, Elze and colleagues (2017) found that in the presence of substantial covariate imbalance, with some individuals receiving very large weights, IPW methods give inaccurate treatment effect estimates. In the case studies evaluated, after truncation, the estimated treatment effect moved towards the crude treatment effect, indicating the inadequacy of these methods in adjusting for covariate imbalance in the presence of heavy weights. On the other hand, the performance of PS matching and standard covariate adjustment was comparable, although PS matching gave less accurate estimates in some instances [34].
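Weight truncation, as referred to in the discussion above, can be sketched as capping weights at chosen percentiles. The percentile cut-offs, the nearest-rank percentile definition and the example weights below are assumptions for illustration; the text does not report which truncation rule, if any, was applied.

```python
import math

def truncate_weights(weights, lower_pct=1, upper_pct=99):
    """Cap weights at the given percentiles (nearest-rank), leaving the others unchanged."""
    ordered = sorted(weights)
    n = len(ordered)

    def nearest_rank(pct):
        rank = max(1, math.ceil(pct * n / 100))
        return ordered[rank - 1]

    low, high = nearest_rank(lower_pct), nearest_rank(upper_pct)
    return [min(max(w, low), high) for w in weights]

# Hypothetical weights with one extreme value, truncated at the 10th and 90th percentiles.
weights = [1.1, 1.3, 1.2, 2.0, 1.5, 1.4, 1.6, 1.8, 1.7, 40.0]
truncated = truncate_weights(weights, lower_pct=10, upper_pct=90)
print(truncated)
```

Truncation limits the influence of a few heavily weighted patients, at the cost of pulling the estimate back towards the unadjusted comparison, which is the behaviour reported by Elze and colleagues.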

Limitations

In this study we provide an overview of the PS based methods used to address confounding by indication; however, there are a number of limitations inherent to the nature of RWE and PS based methods. Firstly, the relatively small size of the dabigatran subgroup did not allow us to test PS stratification, a method involving the stratification of individuals into mutually exclusive subgroups according to their estimated PS [4]. A further constraint in this analysis concerns the inability of PS methods to address unmeasured confounding, which may still bias the estimates. In particular, it is recognised that confounding by indication is the main source of confounding for newly marketed medications, where early adopters are most likely to prescribe new drugs when they become available, whereas other prescribers may prefer to opt for existing drugs with proven and established clinical effectiveness. While PS methods can address confounding by indication, there may still be unobserved confounders, such as patients’ tolerability and access to healthcare, that are difficult to measure [35].

Conclusion

We have shown how routinely collected linked data can be used to implement PS based methods to generate robust and credible real-world evidence to inform reimbursement and coverage decisions. PS matching and IPW methods are considered theoretically superior to PS covariate adjustment, as the latter may be more prone to model misspecification. In this study, IPW showed the best covariate balance when applied to subgroups with relatively large sample sizes. However, when applied to subgroups with relatively small sample sizes, using the PS as a covariate in the outcome model should be considered, as this method does not require further reduction of sample size or the trimming or truncation of extreme weights. Therefore, relying on a single method for reducing bias due to confounding should be avoided, as the method of choice may not reflect the true treatment effect, thus leading to an incorrect interpretation of the real-world effect of a given intervention. It follows that, as long as assumptions such as no unobserved confounding hold, several methods should be identified and tested. As the choice of adequate PS methods may vary according to the characteristics of the observational data available, an appropriate methodological design should be in place for comparative-effectiveness analyses, including: the assessment of PS overlap between treatments, inspection of extreme weights, and comparison of PS methods by their standardised differences.
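Two of the checks recommended above, PS overlap and inspection of extreme weights, can be sketched in a few lines. This is a hedged illustration on simulated data: `common_support` and `effective_sample_size` are hypothetical helper names, and the min/max definition of common support and the Kish effective sample size are just one possible choice of diagnostics.

```python
import numpy as np

def common_support(ps, treated):
    """Return the PS range covered by both groups and the share of patients outside it."""
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    low = max(ps[treated].min(), ps[~treated].min())
    high = min(ps[treated].max(), ps[~treated].max())
    share_outside = float(np.mean((ps < low) | (ps > high)))
    return low, high, share_outside

def effective_sample_size(weights):
    """Kish effective sample size: (sum of weights)^2 / sum of squared weights."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

# Simulated (hypothetical) cohort.
rng = np.random.default_rng(3)
n = 2000
ps = np.clip(rng.beta(2.0, 3.0, n), 0.01, 0.99)
treated = rng.random(n) < ps
weights = np.where(treated, 1.0 / ps, 1.0 / (1.0 - ps))

low, high, share_outside = common_support(ps, treated)
ess = effective_sample_size(weights)
print(f"common support: [{low:.3f}, {high:.3f}], share outside: {share_outside:.1%}")
print(f"largest weight: {weights.max():.1f}, effective sample size: {ess:.0f} of {n}")
```

A narrow common support or an effective sample size far below the number of patients signals that weighting relies heavily on a few individuals, in which case a method that does not use weights may be preferable.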

Clinical outcomes—ICD-10 and OPCS-4 codes.

Codes for inclusion/exclusion criteria.

ICD-10 = International Statistical Classification of Diseases and Related Health Problems 10th Revision, OPCS-4 = Classification of Interventions and Procedures, BNF = British National Formulary, VTE = venous thromboembolism. (PDF)

20 Aug 2021

PONE-D-21-15771
Propensity score methods for comparative-effectiveness analysis: a case study of direct oral anticoagulants in the atrial fibrillation population
PLOS ONE

Dear Dr. Ciminata,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 03 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,
Carmine Pizzi
Academic Editor
PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

2. Thank you for stating the following in the Competing Interests section: “GC and CG have received research grants from Bristol-Myers Squibb UK and Pfizer UK outside the submitted work. OW has received consulting fee from Bayer UK outside the submitted work. MD, NK and PL declare no conflict of interest.” Please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials, by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared. Please include your updated Competing Interests statement in your cover letter; we will change the online submission form on your behalf.

3. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability. Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized. Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access. We will update your Data Availability statement to reflect the information you provide in your cover letter.

4. Please note that in order to use the direct billing option the corresponding author must be affiliated with the chosen institute. Please either amend your manuscript to change the affiliation or corresponding author, or email us at plosone@plos.org with a request to remove this option.

Comments to the Author

Reviewer #1: The present paper is brilliant and of high qualitative level. Some issues. In the abstract it should be added how the models performed best, that is, which outcomes were evaluated to assess performance of the model. Methods: why were patients on "dabigatran only" assessed, and not rivaroxaban etc.? Methods: I would remove propensity score to mimic RCT, which is correct but not formally true. Methods dealing with missing values should be added.

Reviewer #2: The paper is well written and deals with the use of the best methodological approach to estimate the effectiveness of anticoagulation treatment by using a dataset extracted from the Scottish Morbidity Records (SMR). The aim of the authors was to explore the use of Propensity Score (PS) matching, Inverse Probability Weighting (IPW) and covariate adjustment with PS. As a clinician, I have appreciated the efforts of the authors to give a clear view on the best statistical analysis to an important topic with crucial influence on Health Technology Assessment (HTA). The authors have clarified the main differences among the statistical approaches, for example indicating IPW as the best performing method to assess covariate balance when applied to subjects of relatively large sample size (such as combined DOACs versus warfarin). Instead, covariate adjustment with PS appears to be the most appropriate method when applied to subjects of relatively small sample size (such as dabigatran versus warfarin). Just one comment on the concept of the introduction of dabigatran in clinical practice. The use of dabigatran as opposed to warfarin is more linked to the more convenient management of the drug than to other reasons. This is what has happened in recent years in clinical practice, but the same is also seen for apixaban and rivaroxaban.
13 Oct 2021

We would like to thank both reviewers for their time and valuable comments. We have addressed these as follows.

Reviewer #1: In the abstract it should be added how the models performed best, that is which outcomes were evaluated to assess performance of the model.
Response: In the Abstract the following is already stated: “Model performance was assessed by standardised difference. Clinical outcomes and mortality were compared for all DOACs (including apixaban, dabigatran and rivaroxaban) versus warfarin”. The outcomes used in our model are now specified in the methods section of the abstract (line 41). However, the clinical outcomes were not used to assess model performance, but to indicate how the size of the treatment effect (reported in the form of Hazard Ratios) may change according to the PS method employed. For instance, with the IPTW-IPCW approach, the treatment effect tends to be larger than, but still in line with, the treatment effect estimated with other PS methods. This is now stated in the results section of the Abstract (lines 48 and 49).

Reviewer #1: why patients in "dabigatran only" were assessed? and not rivaroxaban etc?
Response: An explanation of why patients on “dabigatran only” were assessed is already provided in the manuscript; that is, they form the smallest subgroup of DOAC users in Scotland compared to apixaban and rivaroxaban. However, this has now been made clearer (line 169).

Reviewer #1: Methods: I would remove propensity score to mimic RCT, which is correct but not formally true.
Response: The sentence referring to propensity score mimicking RCT has now been deleted (line 174).

Reviewer #1: methods dealing with missing value should be added.
Response: No methods for dealing with missing values were used. Given that missing data was <5%, no imputation method was used; a complete available case analysis was provided instead. This is now stated in the manuscript (lines 194-196).
Reviewer #2: Just one comment on the concept of the introduction of dabigatran in clinical practice. The use of dabigatran as opposed to warfarin is more linked to the more convenient management of the drug than to other reasons. This is what has happened in the recent years in the clinical practice, but the same also is seen for apixaban and rivaroxaban.
Response: More on the “more convenient management” of DOACs versus warfarin has now been added to the methods section (line 110-13). That is, rapid onset of action, and the possibility of using fixed doses without requirement for routine anticoagulant monitoring.

Kind Regards,
Dr Giorgio Ciminata

Submitted filename: Response_to_Reviewers.docx

21 Dec 2021

Propensity score methods for comparative-effectiveness analysis: a case study of direct oral anticoagulants in the atrial fibrillation population
PONE-D-21-15771R1

Dear Dr. Ciminata,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Carmine Pizzi
Academic Editor
PLOS ONE

12 Jan 2022

PONE-D-21-15771R1
Propensity score methods for comparative-effectiveness analysis: a case study of direct oral anticoagulants in the atrial fibrillation population

Dear Dr. Ciminata:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Prof Carmine Pizzi
Academic Editor
PLOS ONE

28 in total

1. Validating recommendations for coronary angiography following acute myocardial infarction in the elderly: a matched analysis using propensity scores.

Authors: S T Normand; M B Landrum; E Guadagnoli; J Z Ayanian; T J Ryan; P D Cleary; B J McNeil
Journal: J Clin Epidemiol Date: 2001-04 Impact factor: 6.437

2. Using inverse probability-weighted estimators in comparative effectiveness analyses with observational databases.

Authors: Lesley H Curtis; Bradley G Hammill; Eric L Eisenstein; Judith M Kramer; Kevin J Anstrom
Journal: Med Care Date: 2007-10 Impact factor: 2.983

3. Goodness-of-fit diagnostics for the propensity score model when estimating treatment effects using covariate adjustment with the propensity score.

Authors: Peter C Austin
Journal: Pharmacoepidemiol Drug Saf Date: 2008-12 Impact factor: 2.890

Review 4. Estimating causal effects from large data sets using propensity scores.

Authors: D B Rubin
Journal: Ann Intern Med Date: 1997-10-15 Impact factor: 25.391

5. Assessing the comparative effectiveness of newly marketed medications: methodological challenges and implications for drug development.

Authors: S Schneeweiss; J J Gagne; R J Glynn; M Ruhl; J A Rassen
Journal: Clin Pharmacol Ther Date: 2011-11-02 Impact factor: 6.875

6. Constructing inverse probability weights for marginal structural models.

Authors: Stephen R Cole; Miguel A Hernán
Journal: Am J Epidemiol Date: 2008-08-05 Impact factor: 4.897

Review 7. Comparison of Propensity Score Methods and Covariate Adjustment: Evaluation in 4 Cardiovascular Studies.

Authors: Markus C Elze; John Gregson; Usman Baber; Elizabeth Williamson; Samantha Sartori; Roxana Mehran; Melissa Nichols; Gregg W Stone; Stuart J Pocock
Journal: J Am Coll Cardiol Date: 2017-01-24 Impact factor: 24.094

Review 8. Pharmacological and Non-pharmacological Treatments for Stroke Prevention in Patients with Atrial Fibrillation.

Authors: Laura Ueberham; Nikolaos Dagres; Tatjana S Potpara; Andreas Bollmann; Gerhard Hindricks
Journal: Adv Ther Date: 2017-09-27 Impact factor: 3.845

9. Major Bleeding Risk During Anticoagulation with Warfarin, Dabigatran, Apixaban, or Rivaroxaban in Patients with Nonvalvular Atrial Fibrillation.

Authors: Gboyega Adeboyeje; Gosia Sylwestrzak; John J Barron; Jeff White; Alan Rosenberg; Jacob Abarca; Geoffrey Crawford; Rita Redberg
Journal: J Manag Care Spec Pharm Date: 2017-09

10. Comparative effectiveness and safety of non-vitamin K antagonist oral anticoagulants and warfarin in patients with atrial fibrillation: propensity weighted nationwide cohort study.

Authors: Torben Bjerregaard Larsen; Flemming Skjøth; Peter Brønnum Nielsen; Jette Nordstrøm Kjældgaard; Gregory Y H Lip
Journal: BMJ Date: 2016-06-16
