Replicating Randomized Trial Results with Observational Data Using the Parametric g-Formula: An Application to Intravenous Iron Treatment in Hemodialysis Patients.

Angelo Karaboyas^1,2, Hal Morgenstern^3, Nancy L Fleischer^2, Douglas E Schaubel^4,5, Bruce M Robinson^1,6.

Abstract

BACKGROUND: Reproducibility of clinical and epidemiologic research is important to generalize findings and has increasingly been scrutinized. A recently published randomized trial, PIVOTAL, evaluated high vs low intravenous iron dosing strategies to manage anemia in hemodialysis patients in the UK. Our objective was to assess the reproducibility of the PIVOTAL trial findings using data from a well-established cohort study, the Dialysis Outcomes and Practice Patterns Study (DOPPS).
METHODS: To overcome the absence of randomization in the DOPPS, we applied the parametric g-formula, an extension of standardization to longitudinal data. We estimated the effect of a proactive high-dose vs reactive low-dose iron supplementation strategy on all-cause mortality (primary outcome), hemoglobin, two measures of iron concentration (ferritin and TSAT), and erythropoiesis-stimulating agent dose over 12 months of follow-up in 6325 DOPPS patients.
RESULTS: Comparing high- vs low-iron dose strategies, the 1-year mortality risk difference was 0.020 (95% CI: 0.008, 0.031) and risk ratio was 1.20 (95% CI: 1.07, 1.33), compared with null 1-year findings in the PIVOTAL trial. Differences in secondary outcomes were directionally consistent but of lesser magnitude than in the PIVOTAL trial.
CONCLUSION: Our findings are somewhat consistent with the recent PIVOTAL trial, with discrepancies potentially attributable to model misspecification and differences between the two study populations. In addition to the importance of our results to nephrologists and hence hemodialysis patients, our analysis illustrates the utility of the parametric g-formula for generalizing results and comparing complex and dynamic treatment strategies using observational data.

Keywords: anemia; causal inference; dialysis; iron; nephrology; reproducibility

Year: 2020 PMID: 33204166 PMCID: PMC7667704 DOI: 10.2147/CLEP.S283321

Source DB: PubMed Journal: Clin Epidemiol ISSN: 1179-1349 Impact factor: 4.790

Background

Large high-quality randomized trials are costly, time-consuming, and inflexible to different selection criteria and intervention protocols, and are often impractical or unethical to conduct. A practical alternative is to apply the parametric g-formula, an extension of standardization to longitudinal data, which is well suited to evaluate complex and dynamic treatment strategies using observational data.1–6 In this study, we present an application to anemia treatment in patients with end-stage kidney disease undergoing hemodialysis 3 times/week. Conflicting evidence from observational data exists regarding the safety of high-dose intravenous (IV) iron supplementation in hemodialysis patients.7–12 IV iron is often administered to complement erythropoiesis-stimulating agent (ESA) treatment and avoid iron deficiency by replacing the iron utilized for erythropoiesis.13 IV iron dosing decisions are, in the context of hemoglobin level, guided primarily by serum ferritin, a marker of iron stores, and transferrin saturation (TSAT), a marker of circulating iron.14 Investigators of the Proactive IV Iron Therapy in Haemodialysis Patients (PIVOTAL) study, a large, open-label, UK-based randomized controlled trial, concluded that a proactive high-dose (vs reactive low-dose) IV iron treatment regime was superior.15 The first objective of our study is to replicate findings from the PIVOTAL trial by applying the parametric g-formula to hemodialysis patients in the European arm of the Dialysis Outcomes and Practice Patterns Study (DOPPS), where anemia management practices are relatively similar to the UK.16 The second objective is to simulate the PIVOTAL study in a similar trial population by applying the parametric g-formula to DOPPS patients restricted according to PIVOTAL inclusion criteria. If the hypothetical target trial we emulate is similar enough to the actual trial, the PIVOTAL findings should be replicable in our simulation. 
The potential to evaluate many variations of complex intervention strategies across different populations using the parametric g-formula could prove to be enormously informative in the age of big data.

Methods

Data Source

The DOPPS is a prospective cohort study of center-based, adult chronic hemodialysis patients in 21 countries, ongoing since 1996. Study sites and patients are randomly selected to achieve nationally representative samples in each country. Details on study design and objectives are included in prior publications17,18 and at . Study approval and patient consent were obtained as required by national and local ethics committee regulations. This analysis included a cohort of hemodialysis patients from 7 European countries (Belgium, France, Germany, Italy, Spain, Sweden, UK) in DOPPS Phase 4 (2009–2011) and Phase 5 (2012–2015). Information on patient demographics and comorbidity history was abstracted from medical records at DOPPS enrollment. Measured laboratory values and medication prescriptions were abstracted from medical records at baseline and monthly during follow-up.

Protocol

We designed a target trial to match the PIVOTAL trial19 as closely as possible, and then utilized DOPPS data to emulate this target trial (and thus, the PIVOTAL trial itself). In the PIVOTAL trial,15 the IV iron dose assigned each month depended on the most recent values of ferritin and TSAT. In the proactive high-dose arm, 400 mg IV iron was administered monthly unless upper thresholds of ferritin (>700 ng/mL) or TSAT (>40%) were reached, in which case iron was withheld for 1 month. In the reactive low-dose arm, 100, 200, or 400 mg of IV iron was administered monthly, depending on levels of TSAT and ferritin (Table 1), with IV iron withheld if ferritin >700 ng/mL, TSAT >40%, or both ferritin >200 ng/mL and TSAT >20%. Protocol details and the extent to which we were able to emulate the trial are included in the and , as recommended by Lodi et al.20

Table 1

Summary of PIVOTAL Trial15 Treatment Strategies Emulated in DOPPS

	Proactive High Dose			Reactive Low Dose
	TSAT (%)			TSAT (%)
Ferritin (ng/mL)	≤20	21–39	≥40	≤20	21–39	≥40
<100	400	400	0	400	400	0
100–200	400	400	0	200	200	0
201–700	400	400	0	100	0	0
>700	0	0	0	0	0	0

Notes: Intravenous (IV) iron dose administered in the following month (mg) based on most recent value of serum ferritin and transferrin saturation (TSAT) under the proactive high-dose vs reactive low-dose treatment strategy. Iron doses in both the PIVOTAL trial15 and this DOPPS analysis were completely determined by the most recent ferritin and TSAT value; Grey background highlights situations in which the IV iron dose administered differs between the 2 protocols.
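
The dosing rules in Table 1 can be written as a small lookup function. The sketch below is a minimal Python illustration and not the authors' SAS code; the function name `iv_iron_dose` and the strategy labels `"high"` and `"low"` are hypothetical. It follows the cell boundaries printed in Table 1 (TSAT ≥40% withholds iron) and assumes that values falling between printed categories (for example TSAT 20.5% or ferritin 200.5 ng/mL) belong to the higher category.

```python
def iv_iron_dose(ferritin, tsat, strategy):
    """Monthly IV iron dose (mg) from the most recent ferritin (ng/mL) and TSAT (%).

    Encodes Table 1; boundary handling between printed categories is an assumption.
    """
    if strategy not in ("high", "low"):
        raise ValueError("strategy must be 'high' or 'low'")
    # Upper safety thresholds apply to both strategies.
    if ferritin > 700 or tsat >= 40:
        return 0
    if strategy == "high":
        # Proactive arm: fixed 400 mg unless a threshold is reached.
        return 400
    # Reactive arm: dose steps down as ferritin rises.
    if ferritin < 100:
        return 400
    if ferritin <= 200:
        return 200
    # Ferritin 201-700: 100 mg only when TSAT is low, otherwise withhold.
    return 100 if tsat <= 20 else 0


# The two strategies diverge for a patient with ferritin 300 ng/mL and TSAT 25%.
print(iv_iron_dose(300, 25, "high"), iv_iron_dose(300, 25, "low"))
```

Keeping the rule as a pure function of the latest laboratory values mirrors the note above that doses were completely determined by the most recent ferritin and TSAT.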

Summary of PIVOTAL Trial15 Treatment Strategies Emulated in DOPPS Notes: Intravenous (IV) iron dose administered in the following month (mg) based on most recent value of serum ferritin and transferrin saturation (TSAT) under the proactive high-dose vs reactive low-dose treatment strategy. Iron doses in both the PIVOTAL trial15 and this DOPPS analysis were completely determined by the most recent ferritin and TSAT value; Grey background highlights situations in which the IV iron dose administered differs between the 2 protocols.

Statistical Analysis

To test the high vs low-dose IV iron treatment strategies, we implemented the parametric g-formula to account for the treatment (IV iron) – confounder (ferritin, TSAT) feedback loop (Figure 1). The two primary steps of the parametric g-formula are (Step 1) modeling the joint distribution of all variables (Table 2) and (Step 2) simulating variables over the follow-up period using the estimates from Step 1. These steps are described in detail in the . Additional details related to the formulae and assumptions have been previously reported.2,21
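
For orientation, the quantity targeted by these two steps can be written in the g-formula notation common in the methodological literature. The symbols below are not taken from this paper and are introduced only for illustration: $L_k$ denotes the time-varying covariates in month $k$, $A_k$ the IV iron dose, $Y_k$ an indicator of death by month $k$ (with $Y_0 = 0$), overbars denote history, and $\bar{a}^{g}_k$ is the dose history implied by strategy $g$. Terms with index $-1$ are empty, and the survival factor for $j = 0$ equals 1.

```latex
\Pr\left[Y^{g}_{K+1}=1\right]
= \sum_{k=0}^{K} \sum_{\bar{l}_k}
  \Pr\left[Y_{k+1}=1 \mid \bar{A}_k=\bar{a}^{g}_k,\ \bar{L}_k=\bar{l}_k,\ Y_k=0\right]
  \prod_{j=0}^{k} \Big\{
    f\left(l_j \mid \bar{l}_{j-1},\ \bar{a}^{g}_{j-1},\ Y_j=0\right)\,
    \Pr\left[Y_j=0 \mid \bar{A}_{j-1}=\bar{a}^{g}_{j-1},\ \bar{L}_{j-1}=\bar{l}_{j-1},\ Y_{j-1}=0\right]
  \Big\}
```

Under this reading, Step 1 supplies parametric estimates of each conditional density and probability, and Step 2 approximates the sum over covariate histories by Monte Carlo simulation (for continuous covariates the sum is an integral).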

Figure 1

Illustration of longitudinal data collection and hypothesized relationships.

Table 2

Summary of Step 1 Models and Covariates

Model	Variables	Regression Model When Used as Outcome	Functional Form When Used as Predictor
1	Hospitalization	Logistic	Binary (yes/no)
2	C-reactive protein	Linear (log-scale)	Log-linear
3	Serum albumin	Linear	Linear
4	Serum phosphorus	Linear (log-scale)	Categories (3.5, 5.5, 7.0 mg/dL)
5	Hemoglobin	Linear	Categories (9, 10, 11, 12, 13 g/dL)
6	Serum ferritin	Linear (log-scale)	Categories (100, 200, 400, 700, 1000 ng/mL)
7	TSAT	Linear (log-scale)	Categories (15, 20, 25, 30, 35, 40%)
8	Catheter use	Logistic	Binary (yes/no)
9	IV iron dose	Multinomial logistic	Categories (0, 35, 63 mg/week)
10+11	ESA dose	Logistic, linear (log-scale)	Categories (0, 3000, 6000, 9000, 15000 units/week)
12	Died next month	Logistic	N/A

Notes: Categories, indicates cut-points used; IV iron doses were largely discrete, and so the 3 non-zero categories of 1–35 (mostly 25), 35–63 (mostly 50 or 62.5), and >63 (mostly 100) mg/week generally correspond to 100, 200, and 400 mg/month, respectively; For ESA dose, separate models were used to first model use (yes/no), and then the dosage among the users; Died next month is an indicator for whether the patient died during the following (not current) month.
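
The Step 2 simulation can be sketched in a few lines of Python. This is a deliberately reduced illustration under stated assumptions, not the authors' implementation: it tracks a single covariate (log ferritin) instead of the 12 models in Table 2, and every coefficient (`-4.8`, `0.0005`, `0.9`, and so on) is a placeholder chosen for readability, not an estimate from DOPPS. The names `simulate_risk`, `always_400`, and `never_treat` are hypothetical.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_risk(strategy, n_patients=2000, n_months=12, seed=0):
    """Monte Carlo estimate of cumulative mortality risk under a dosing strategy.

    All model coefficients are placeholders, NOT fitted values.
    """
    rng = random.Random(seed)
    deaths = 0
    for _ in range(n_patients):
        # Placeholder baseline covariate distribution.
        log_ferritin = rng.gauss(math.log(350), 0.6)
        for _ in range(n_months):
            # Treatment is set by the strategy, not drawn from a fitted model.
            dose = strategy(math.exp(log_ferritin))
            # Placeholder "died next month" logistic model.
            if rng.random() < expit(-4.8 + 0.0005 * dose):
                deaths += 1
                break
            # Placeholder covariate model: next ferritin depends on past dose,
            # which is the treatment-confounder feedback loop.
            log_ferritin = (0.9 * log_ferritin + 0.1 * math.log(300)
                            + 0.0005 * dose + rng.gauss(0, 0.1))
    return deaths / n_patients


def always_400(ferritin):
    """Toy strategy: 400 mg unless ferritin exceeds 700 ng/mL."""
    return 400 if ferritin <= 700 else 0


def never_treat(ferritin):
    """Toy strategy: no IV iron."""
    return 0


risk_a = simulate_risk(always_400, seed=1)
risk_b = simulate_risk(never_treat, seed=1)
```

The key design point is that the same fitted models drive every simulated arm; only the rule that assigns the dose changes between strategies.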

For our second objective, we attempted to more closely replicate results from the PIVOTAL trial by restricting our DOPPS sample based on PIVOTAL inclusion criteria. We excluded patient-months (not patients) that did not meet the criteria for inclusion; then prior to Step 1, for each patient, we selected as the new “baseline” the first month that the patient met PIVOTAL eligibility criteria.19 Figure 2 summarizes these criteria and how we attempted to replicate each criterion in DOPPS. Step 1 models included the baseline month and all subsequent patient-months for eligible patients; Step 2 was then carried out as in the primary analysis.
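
The re-baselining step for the restricted analysis can be illustrated as follows. This is a minimal sketch assuming each patient's follow-up is an ordered list of monthly records; the function `rebaseline`, the record layout, and the single-criterion `is_eligible` predicate are hypothetical. Only the ferritin <400 ng/mL cut-off is used here as an example criterion; the full set of criteria in Figure 2 is not reproduced.

```python
def rebaseline(months, is_eligible):
    """Return follow-up starting at the first eligible month, or [] if never eligible.

    `months` is an ordered list of monthly records for one patient.
    """
    for i, record in enumerate(months):
        if is_eligible(record):
            # New baseline is month i; keep it and all subsequent months.
            return months[i:]
    return []


def is_eligible(record):
    """Illustrative predicate using one example criterion only."""
    return record["ferritin"] < 400


patient = [
    {"month": 0, "ferritin": 520},
    {"month": 1, "ferritin": 380},  # first eligible month becomes baseline
    {"month": 2, "ferritin": 450},  # kept: follows the new baseline
]
restricted = rebaseline(patient, is_eligible)
```

Note that months after the new baseline are retained even if the patient would no longer satisfy the criterion, matching the description that Step 1 included the baseline month and all subsequent patient-months.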

Figure 2

Flow diagram with PIVOTAL trial15 exclusion criteria.

We reported 12-month trajectories for all modeled variables and mortality risk for: (1) observed DOPPS data; (2) natural course (expected) simulation; (3) PIVOTAL high-dose simulation; and (4) PIVOTAL low-dose simulation. We sought to make three comparisons: observed data vs natural course simulation (1 vs 2) to check for model misspecification; PIVOTAL high vs low-dose (3 vs 4) simulations to assess the treatment strategies; and simulated PIVOTAL strategies vs the published PIVOTAL trial15 data to assess how closely our parametric g-formula results matched a real randomized trial. From our simulations, we reported the 1-year mortality risk ratio (RR) and risk difference (RD) comparing the two PIVOTAL strategies. Confidence intervals (CIs) were estimated by combining multiple imputations with bootstrapping based on the “MI boot (pooled sample)” procedure22 previously implemented by Karaboyas et al.23 We also performed a complete-case sensitivity analysis. In general, we relied on published g-formula analyses by Taubman24 and others,2,21,25–27 following their step-by-step approach to help guide our analysis coded using SAS version 9.4 (SAS Institute, Cary, NC).
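
As a rough illustration of how multiple imputation and bootstrapping can be combined, the sketch below follows our understanding of a pooled-sample approach: each imputed dataset is bootstrapped, all resulting estimates are pooled, and the CI is read off the percentiles of the pooled distribution. The details of the cited procedure may differ. The function names, the toy binary outcome data, and the use of a simple mean as the estimator (in place of a full g-formula run) are all assumptions made for illustration.

```python
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a sorted list, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def mi_boot_pooled_ci(imputed_datasets, estimator, n_boot=200, alpha=0.05, seed=0):
    """Percentile CI from bootstrap estimates pooled across imputed datasets."""
    rng = random.Random(seed)
    estimates = []
    for data in imputed_datasets:
        for _ in range(n_boot):
            # Resample rows with replacement within this imputed dataset.
            resample = [rng.choice(data) for _ in data]
            estimates.append(estimator(resample))
    estimates.sort()
    return percentile(estimates, alpha / 2), percentile(estimates, 1 - alpha / 2)


def mean(values):
    return sum(values) / len(values)


# Toy data: three "imputed" versions of a binary death indicator (illustrative only).
imputed = [
    [1] * 20 + [0] * 180,
    [1] * 22 + [0] * 178,
    [1] * 19 + [0] * 181,
]
ci_low, ci_high = mi_boot_pooled_ci(imputed, mean)
```

In a real analysis the estimator would rerun both g-formula steps on each resample, which is why such intervals are computationally expensive.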

Results

Study Sample

Models in Step 1 utilized data from 97,044 patient-months across 6325 patients; the median (interquartile range [IQR]) number of months contributed by each patient was 15 (9, 26). Table 3 shows baseline patient characteristics for (1) the full DOPPS sample used in our primary analysis (N=6325); (2) the DOPPS subset after restricting based on PIVOTAL eligibility criteria (N=1508); and (3, 4) PIVOTAL patients randomized to the high-dose and low-dose IV iron treatment protocols. Note that cells shown as “–” in the PIVOTAL columns represent variables not reported in the PIVOTAL publication (Table 1 in Macdougall et al).15 There were several key differences between DOPPS patients and PIVOTAL participants: DOPPS patients were older, had been on hemodialysis for a longer period, weighed less, had higher levels of serum ferritin, TSAT, and hemoglobin, and were more likely to have a history of heart failure, hypertension, and peripheral vascular disease. Some of these differences were neutralized by further restriction of the DOPPS data based on PIVOTAL eligibility criteria (eg, time since hemodialysis start, ferritin, TSAT), but others were not (eg, age, weight, hemoglobin, comorbidity history). The extent of missing data in each sample is shown in .

Table 3

Summary of Baseline Patient Characteristics in the DOPPS (by Type of Analysis) and the PIVOTAL Trial15 (by Treatment Group)

	DOPPS Observational Data		PIVOTAL Trial Data
Patient Characteristics	Primary Analysis	PIVOTAL- Restricted	Proactive High-Dose Arm	Reactive Low-Dose Arm
N patients	6325	1508	1093	1048
Time-fixed variables
Age (years)	65.9 ± 15.1	66.0 ± 15.4	62.7 ± 14.9	62.9 ± 15.1
Sex (% male)	61%	62%	65%	66%
Time since HD start (months)	23.0 (6.1, 62.0)	4.1 (2.5, 6.5)	4.9 (2.8, 8.4)	4.8 (2.8, 8.1)
Weight (kg)	72.0 ± 16.8	75.3 ± 17.6	81.3 ± 21.0	82.9 ± 20.9
Anemia-related variables
Serum ferritin (ng/mL)	357 (183, 581)	179 (99, 276)	214 (132, 305)	217 (137, 301)
TSAT (%)	24 (18, 33)	19 (15, 24)	20 (16, 24)	20 (16, 24)
Hemoglobin (g/dL)	11.4 ± 1.4	11.2 ± 1.4	10.6 ± 1.4	10.5 ± 1.4
ESA use (%)	88%	100%	100%	100%
ESA dose (1000 units/week)	7.8 (4.8, 12.5)	8.6 (5.0, 13.0)	8.0 (5.0, 10.0)	8.0 (5.0, 12.0)
IV iron use (%)	70%	81%	–	–
IV iron dose (mg/month)	383 ± 232	439 ± 245	–	–
Other time-updated variables
Serum albumin (g/dL)	3.7 ± 0.5	3.7 ± 0.5	–	–
Serum phosphorus (mg/dL)	4.9 ± 1.6	5.1 ± 1.5	–	–
C-reactive protein (mg/L)	6.0 (2.9, 13.4)	5.0 (2.9, 10.5)	6.0 (3.3, 13.9)	7.0 (4.0, 15.0)
Hospitalized in last month (%)	10%	10%	–	–
Catheter use (%)	28%	35%	41%	41%
Comorbidity history (%)
Coronary artery disease	34%	33%	–	–
Heart failure	21%	21%	4%	4%
Cerebrovascular disease	16%	15%	–	–
Other cardiovascular disease	31%	28%	–	–
Cancer (non-skin)	17%	17%	–	–
Diabetes	36%	42%	45%	44%
Hepatitis B or C	5%	0%	0%	0%
Gastrointestinal bleeding	5%	6%	–	–
Hypertension	87%	89%	74%	72%
Lung disease	14%	14%	–	–
Neurologic disease	12%	11%	–	–
Psychiatric disorder	17%	14%	–	–
Peripheral vascular disease	30%	28%	8%	9%
Recurrent cellulitis, gangrene	9%	7%	–	–

Notes: Mean ± standard deviation, median (IQR), or % shown; Median ESA dose restricted to users; PIVOTAL trial data derived from Table 1 in Macdougall et al15 with variables shown as “–” if not reported; PIVOTAL-restricted DOPPS patients are a subset of the patients included in the primary analysis, but further restricted to emulate PIVOTAL exclusion criteria.

Abbreviations: HD, hemodialysis; TSAT, transferrin saturation; ESA, erythropoiesis-stimulating agent; IV, intravenous.

Parametric g-Formula Results: Full DOPPS Sample

For the 6325 DOPPS patients included in our primary analysis, we first compared observed data (ie, mean or median levels for up to 12 months of DOPPS follow-up) with our natural course simulation, and found minimal deviations (). The 1-year mortality risk in parametric g-formula simulations was 0.120 vs 0.101 under the high vs low IV iron dose simulated interventions (Figure 3A); the corresponding RR was 1.20 (95% CI: 1.07, 1.33), and the RD was 0.020 (95% CI: 0.008, 0.031). Differences in secondary outcomes under the two interventions over the 12-month simulation were as follows: mean hemoglobin was 0.13 (95% CI: 0.09, 0.17) g/dL higher for the high- vs low-dose strategy (Figure 3B). Median ferritin was 357 ng/mL at baseline and increased to 475 ng/mL under the high-dose strategy while decreasing to 292 ng/mL under the low-dose strategy, a difference at 12 months of 182 (95% CI: 171, 196) ng/mL (Figure 3C). Median TSAT was 25% and decreased slightly to 23.9% under the low-dose strategy, and gradually increased to 27.5% under the high-dose strategy, a difference of 3.6% (95% CI: 3.2%, 4.0%) (Figure 3D). Median ESA dose was 506 (95% CI: 287, 718) units/week lower (6.7% lower) under the high- vs low-dose strategy at 12 months (Figure 3E). Mean assigned IV iron dose (including 0 doses) was much greater under the high vs low IV iron dose strategy (253 vs 80 mg/month) at 12 months (Figure 3F). Comparing cumulative dosing over the 12-month period, patients assigned to the high- vs low-dose strategy received 5.8% (95% CI: 3.3%, 8.1%) less ESA and three times as much IV iron (3166 vs 981 mg) (). Results were generally consistent in a sensitivity analysis based on a complete-case analysis of 40,721 patient-months across 3,994 patients ().

Figure 3

Comparison of proactive high-dose vs reactive low-dose IV iron treatment strategy over 12 months using the parametric g-formula. High-dose and low-dose strategies defined by PIVOTAL trial15 protocol as described in Table 1; Outcomes: (A) all-cause mortality, (B) hemoglobin, (C) serum ferritin, (D) TSAT, (E) ESA dose, (F) IV iron dose.

Parametric g-Formula Results: Restricted to a PIVOTAL-Like Subset of DOPPS

In our second objective attempting to replicate the PIVOTAL population by further restriction of the DOPPS data as described in Figure 2, our sample size was reduced from 6325 to 1508 patients. In this subset, we found no major departures from the observed data in our natural-course simulation (). The 1-year mortality risk was 0.098 vs 0.083 under the high vs low IV iron dose-simulated interventions (Figure 4A); the corresponding RR was 1.19 (95% CI: 0.84, 1.59) and the RD was 0.015 (95% CI: −0.015, 0.041) – very similar to the overall sample, albeit with less precision. Baseline levels of hemoglobin, ferritin, and TSAT were much lower in this subset compared to the primary analysis (Table 3); subsequent rises are illustrated under both treatment strategies – though more pronounced under the high-dose strategy – and after 12 months, the differences between strategies (Figure 4B–D) were comparable to those observed in the overall sample. Doses of ESA (Figure 4E) and IV iron (Figure 4F) were higher at baseline in this subset, reflecting patient differences due to PIVOTAL restrictions, but eventually reached a steady state, with doses under the two strategies similar to the overall sample.
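
The risk ratio and risk difference follow directly from the two simulated risks. The short Python check below recomputes them from the rounded 1-year risks reported in the text; the helper name `risk_contrast` is hypothetical. Because the inputs are rounded to three decimals, the recomputed full-sample values (about 1.19 and 0.019) differ slightly from the reported 1.20 and 0.020, which were presumably calculated from unrounded risks.

```python
def risk_contrast(risk_high, risk_low):
    """Return (risk ratio, risk difference) for high- vs low-dose strategy."""
    return risk_high / risk_low, risk_high - risk_low


# Rounded 1-year mortality risks as reported in the text.
full_rr, full_rd = risk_contrast(0.120, 0.101)              # full DOPPS sample
restricted_rr, restricted_rd = risk_contrast(0.098, 0.083)  # PIVOTAL-like subset

print(round(full_rr, 2), round(full_rd, 3))
print(round(restricted_rr, 2), round(restricted_rd, 3))
```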

Figure 4

Comparison of proactive high-dose vs reactive low-dose IV iron treatment strategy over 12 months using the parametric g-formula, restricted to PIVOTAL-like patients. High-dose and low-dose strategies defined by PIVOTAL trial15 protocol as described in Table 1; N=1508 PIVOTAL-like DOPPS patients restricted to emulate PIVOTAL exclusion criteria; Outcomes: (A) all-cause mortality, (B) hemoglobin, (C) serum ferritin, (D) TSAT, (E) ESA dose, (F) IV iron dose.

Comparisons with PIVOTAL

Table 4 summarizes our parametric g-formula results – both primary (Objective 1) and restricted (Objective 2) – in comparison to the PIVOTAL randomized trial. The 1-year mortality risk was about 0.08 in both PIVOTAL arms, whereas we observed a risk difference of 0.019 (primary) and 0.015 (restricted) under the high- vs low-dose simulation. After 12 months, the difference in the mean cumulative IV iron dose assigned under the high vs low-dose strategy was ~2000 mg in the PIVOTAL trial and in both of our analyses. We found that median cumulative ESA dose was 20,000–30,000 units lower under the high- vs low-dose strategy after 12 months; this difference was smaller than the 90,000 units lower median cumulative ESA dose reported in the PIVOTAL trial. Similarly, differences in laboratory values after 12 months under the high vs low-dose strategy in the full DOPPS analysis (0.13 g/dL higher mean hemoglobin, 183 ng/mL higher median ferritin, 3.6% higher median TSAT) were directionally consistent with PIVOTAL findings, but smaller in magnitude (as estimated from in Macdougall et al15: ~0.2 g/dL higher mean hemoglobin, ~450 ng/mL higher median ferritin, ~7% higher median TSAT).

Table 4

Summary of Findings: Comparing PIVOTAL Trial with DOPPS Simulation Using the Full Sample (Objective 1) and PIVOTAL-Like Restricted Subset

	PIVOTAL Trial: Randomized Results			DOPPS Simulation: Full Sample			DOPPS Simulation: PIVOTAL-Restricted
Outcomes	High Dose	Low Dose	Difference	High Dose	Low Dose	Difference	High-Dose	Low-Dose	Difference
N patients	1093	1048	–	6325	6325	–	1508	1508	–
Laboratory values at baseline
Mean hemoglobin (g/dL)	10.6	10.5	–	11.39	11.39	–	11.16	11.16	–
Median ferritin (ng/mL)	214	217	–	357	357	–	184	184	–
Median TSAT (%)	20.0	20.0	–	25.0	25.0	–	20.0	20.0	–
Laboratory values after 12 months^a
Mean hemoglobin (g/dL)	11.1	10.9	0.2	11.54	11.41	0.13	11.46	11.35	0.11
Median ferritin (ng/mL)	580	130	450	475	292	183	435	268	167
Median TSAT (%)	26	19	7	27.5	23.9	3.6	27.0	23.4	3.6
Cumulative dose through 12 months^a
Median ESA dose (1000 units)	380	470	−90	342	364	−22	353	379	−26
Mean IV iron dose (mg)	3800	1800	2000	3166	981	2185	3460	1267	2193
All-cause mortality^a
1-year risk	0.08	0.08	0	0.120	0.101	0.019	0.098	0.083	0.015

Notes: PIVOTAL trial data derived from Macdougall et al15; ^a Indicates numbers were approximated from figures; PIVOTAL-restricted DOPPS patients are a subset of the patients included in the primary analysis, but further restricted to emulate PIVOTAL exclusion criteria.

Discussion

In the DOPPS cohort of hemodialysis patients, we implemented the parametric g-formula to compare patient outcomes under two simulated IV iron treatment regimens defined by the protocol used in the recently published PIVOTAL randomized trial.15 In both the overall DOPPS sample and PIVOTAL-restricted subset, we found that after 12 months, the proactive high-dose vs reactive low-dose strategy resulted in much higher serum ferritin levels, slightly higher levels of hemoglobin and TSAT, and slightly lower ESA doses, but a higher risk of mortality. Thus, our findings do not suggest a preference for the proactive high IV iron dose regimen. Our simulated differences in laboratory values after 12 months under the high vs low-dose strategy were directionally consistent with the PIVOTAL trial, but smaller in magnitude. Our mortality results were, however, not consistent with the trial; PIVOTAL authors observed a hazard ratio (HR) of 0.85 (95% CI: 0.73, 1.00) for their primary composite outcome over the full 42-month follow-up period for the high-dose vs low-dose arm, although in Macdougall et al15 appears to show no difference (HR=RR=1) in their secondary endpoint of all-cause mortality after the first 12 months of follow-up. One possibility as to why our results did not match PIVOTAL more closely is that incident hemodialysis patients could be immediately randomized to a treatment protocol in the PIVOTAL trial, while the parametric g-formula requires 2 previous months of data to inform the models and simulations; this functionally limits us to patients with 3+ months on hemodialysis therapy, after low hemoglobin levels are likely to have been mostly corrected.28 If anemia treatments provide an initial boost to levels of hemoglobin, ferritin, and TSAT in previously untreated incident hemodialysis patients that dissipate once patients enter more of a steady-state, this may help explain why our effect sizes were smaller than in the PIVOTAL trial. 
Indeed, the majority of the 12-month differences in these laboratory outcomes in the PIVOTAL trial were observed within the first 3 months of follow-up.15 Another possible explanation is model misspecification in Step 1. Similarities in the trajectories of our natural course simulation vs the observed data () were mostly encouraging, as any departures may signify potential model misspecification. However, if we are consistently underestimating the effect of IV iron on intermediate outcomes (ie, hemoglobin, ferritin, TSAT), any biases in Step 1 models may affect predictions of ESA dose and mortality risk in the Step 2 simulation. A third possibility is that IV iron may have different effects on iron measures and survival in the generally healthier patients selected for the trial.19 There were clear differences in the DOPPS cohort vs PIVOTAL participants, many of which remained even after we attempted to restrict our sample to PIVOTAL-like patients (Table 3). While we were able to restrict on unambiguous lab cut-offs (eg, ferritin <400 ng/mL), we were limited in our ability to restrict on other more subjective criteria (eg, “life expectancy <12 months per the judgement of the investigator”). Finally, while randomized trials are considered the gold standard, our discrepancies may be in part due to the broader issue of trial reproducibility, particularly open-label trials,29 and the potential for overestimation of effects in the controlled environment of randomized trials, when the focus is on efficacy over effectiveness and only highly selected patients are enrolled.30 Some observational studies found that higher IV iron doses were associated with elevated risk of adverse events,7–9 and some did not.10–12 However, all of these studies considered IV iron as a static rather than dynamic treatment strategy; thus, we cannot expect to make a quantitatively fair comparison between those effect estimates and ours. 
Li et al31 used inverse probability weighted (IPW) estimation of marginal structural models to evaluate dynamic iron supplementation strategies and found – similar to our study – that patients under more intensive iron dosing strategies had higher mortality. Most of these studies were conducted in the US, where ferritin levels are much higher than in Europe.16 In the PIVOTAL trial,15 the ferritin threshold at which to discontinue IV iron in the proactive high-dose arm was 700 ng/mL, lower than the median value (800 ng/mL) observed in the US in February 2019,32 limiting generalizability of our analysis – and PIVOTAL itself – regarding optimal treatment for patients with ferritin >700 ng/mL.33 A key strength of our study was the ability to compare well-defined dynamic treatment strategies. Rather than ask whether patients who received >400 vs 200–399 mg of iron over a specified time period had better outcomes, our research question better reflects the complexities of clinical practice. This study design, in contrast to a randomized trial, is flexible to many potential interventions (eg, altering the ferritin/TSAT criteria) and inclusion criteria. 
Second, this method properly accounts for a treatment-confounder feedback loop (eg, ferritin → IV iron → ferritin),34 but without the possibility that unstable weights will drive results, as with IPW methods.2,25,27 Third, using a European cohort has two advantages: (1) we were able to adjust for C-reactive protein (CRP), a marker of inflammation with a strong positive association with both ferritin and mortality that is not routinely measured in the US;35 and (2) we avoided violations of positivity (when certain subgroups always or never receive the treatment),36 which would occur in other regions where IV iron dosing strategies are either more aggressive than the high-dose arm (US) or more conservative than the low-dose arm (Japan).16 Finally, while a small sample size can be augmented in Step 2,26 our large sample in the primary analysis allows for improved precision of the Step 1 coefficient estimation. Our study had some limitations shared by all parametric g-formula analyses. First, the parametric g-formula can account for time-dependent confounders, but only to the extent they are measured accurately. Second, under the “g-null paradox,”2 we may still observe an association seemingly due to a treatment effect when the causal null hypothesis is true, given a large enough sample size; however, there is no evidence this occurs in practice.37 Lastly, reliance on many parametric models creates more opportunity for bias, as misspecification in one model may reverberate throughout the simulation. As we were not able to replicate all PIVOTAL findings, the following concerns and obstacles to using large databases to mimic randomized trials should be appreciated. First, we were unable to narrow our cohort to a PIVOTAL-like population through restriction alone, despite attempts to implement the trial exclusion criteria. 
Second, while the maximum PIVOTAL trial follow-up was 42 months, the median DOPPS follow-up was 15 months, and so we focused on 1-year outcomes to avoid simulating follow-up beyond the limits of the empirical data. Third, we emulated a secondary endpoint of PIVOTAL – all-cause mortality – to avoid potential misclassification bias for nonfatal cardiovascular events reported in the DOPPS. Fourth, our intent-to-treat38 analysis assumed perfect adherence with the treatment strategy because IV medications are conveniently administered at each hemodialysis session 3 times/week. Indeed, PIVOTAL findings were very similar when analyzed per-protocol vs intent-to-treat because only 1% of patients had major protocol violations.15 Fifth, ferritin and TSAT measurements were regimented monthly in the PIVOTAL trial; the DOPPS data reflect real-world clinical practice, in which these labs are sometimes only assessed every 3 months. The basic principle of the non-parametric g-formula as an extension of standardization is appealing for many reasons; but the extensive modeling is, in practice, unlikely to fully account for the many unknown associations and interactions between variables.

Conclusion

It is challenging, and often not possible,39 to replicate clinical trial evidence using observational data. Because the hypothetical target trial we emulated was not identical to the published PIVOTAL trial, we may not necessarily expect the same answer to these slightly different research questions.1 While there may be inherent limitations in perfectly reproducing the results of a randomized trial which often represents a highly specialized population, this application demonstrates the value and flexibility of the parametric g-formula for comparing many variations of complex intervention strategies and generalizing results to a broader target population. Our results provide valuable evidence to nephrologists and hence hemodialysis patients, and illustrate a framework to evaluate treatment strategies that have not been tested in randomized trials.
