Literature DB >> 31718423

A causal modelling framework for reference-based imputation and tipping point analysis in clinical trials with quantitative outcome.

Ian R White^1,2, Royes Joseph¹, Nicky Best³.

Abstract

We consider estimation in a randomised placebo-controlled or standard-of-care-controlled drug trial with quantitative outcome, where participants who discontinue an investigational treatment are not followed up thereafter, and the estimand follows a treatment policy strategy for handling treatment discontinuation. Our approach is also useful in situations where participants take rescue medication or a subsequent line of therapy and the estimand follows a hypothetical strategy to estimate the effect of initially randomised treatment in the absence of rescue or other active treatment. Carpenter et al proposed reference-based imputation methods which use a reference arm to inform the distribution of post-discontinuation outcomes and hence to inform an imputation model. However, the reference-based imputation methods were not formally justified. We present a causal model which makes an explicit assumption in a potential outcomes framework about the maintained causal effect of treatment after discontinuation. We use mathematical argument and a simulation study to show that the "jump to reference", "copy reference" and "copy increments in reference" reference-based imputation methods, with the control arm as the reference arm, are special cases of the causal model with specific assumptions about the causal treatment effect. We also show that the causal model provides a flexible and transparent framework for a tipping point sensitivity analysis in which we vary the assumptions made about the causal effect of discontinued treatment. We illustrate the approach with data from two longitudinal clinical trials.

Entities: Chemical Disease Gene Species

Keywords: Causal inference; Clinical trial; Multiple imputation; Reference-based imputation; Sensitivity analysis; de facto estimand; missing data

Mesh：

Year: 2019 PMID： 31718423 PMCID： PMC7048106 DOI： 10.1080/10543406.2019.1684308

Source DB: PubMed Journal: J Biopharm Stat ISSN： 1054-3406 Impact factor: 1.051

Introduction

Missing outcome data represent a major threat to the validity of randomised controlled trials (RCTs), and appropriate analysis methods have been much discussed. An influential report showed that different analysis methods may target different estimands (different measures of treatment effect) and argued that specification of the estimand is an important part of trial design and should inform trial analysis and reporting (National Research Council 2010). Regulators have joined the call for estimands to be defined clearly, and the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) Steering Committee has endorsed the development of new regulatory guidance on the choice of estimands and sensitivity analysis in clinical trials (European Medicines Agency 2017). We consider two types of estimand considered by the National Research Council (2010): (E1) difference in outcome improvement at the planned endpoint if all participants had tolerated or adhered to trial protocol; (E2) difference in outcome improvement at the planned endpoint for all randomised participants. The former measures how treatment works in an ideal setting (efficacy), while the latter measures how treatment might work in practice (effectiveness). To encompass outcomes that measure harms of treatment, Carpenter et al. (2013) (henceforth CRK) proposed the broader terms de jure and de facto estimand for (E1) and (E2) respectively. The ICH E9(R1) draft addendum (European Medicines Agency 2017) refers to a “hypothetical strategy” and a “treatment policy strategy” in defining estimands (E1) and (E2) respectively. Sometimes investigators continue to collect data after treatment discontinuation. The use of such off-treatment data depends on the estimand (Permutt 2016). For the estimation of a de jure estimand, off-treatment data for participants who discontinued randomised treatment could be used in a complier average causal effect analysis (Dunn et al. 2003). In practice, however, off-treatment data are typically either not collected or excluded from the primary analysis, and the missing data are assumed to be missing at random (MAR): that is, it is assumed that participants who discontinued treatment would have benefited from continued treatment in the same way as those who remained on treatment. However, estimation of a de facto estimand ideally makes use of the off-treatment data, which should be collected where possible (National Research Council 2010). When all discontinuers are followed up and complete outcome data are obtained, the de facto estimand can be directly estimated by comparing observed means (Little and Kang 2015). This paper considers estimation of a de facto estimand for a quantitative outcome when off-treatment data are not collected. For participants who have discontinued treatment, this requires assumptions about whether and to what extent they continue to benefit from their previous treatment. Our approach is also relevant to the situation in which rescue treatment (over and above the per-protocol treatment regime for the control arm) is available for those who discontinue randomised treatment, but interest is in the effect attributable to the initially randomised treatment without the confounding effects of rescue medications (corresponding to estimand 6 in Mallinckrodt et al. (2012)), and data after rescue are either unavailable or are ignored. In the previous work on this topic, Little and Yau (1996) presented a multiple imputation (MI) approach that could incorporate a variety of alternative assumptions about the treatment effect after treatment discontinuation for the estimation of de facto estimands in RCTs. CRK presented a generic algorithm for MI of post-discontinuation outcome data. They assumed that post-discontinuation outcomes in a given trial arm behave in some way like outcomes in a reference arm (often the control arm), and proposed various specific methods for forming the imputation distribution. These methods have been called “reference-based imputation” (RBI) or “control-based imputation” methods. However, CRK did not theoretically justify these methods. In this paper, we assume that participants who have discontinued their randomised treatment receive treatment that is similar to that allocated to the control arm: thus for reference-based imputation, we take the reference to be the control treatment, where this is typically either placebo or standard of care. Specification of estimands is clarified by using counterfactual outcomes – outcomes that have not been or could not have been observed. Such counterfactuals are best described using potential outcomes notation (Angrist et al. 1996; Little and Rubin 2000). Our aims in this paper are first to propose and implement a causal model, using explicit assumptions about the causal effect of a previously discontinued treatment, and second to show that three of the RBI methods are special cases of the causal model, and hence to provide their theoretical justification. Implementing the causal model requires untestable assumptions, so we need sensitivity analyses to understand the impact of these assumptions on inferences and conclusions from the primary analysis. Kenward et al. (2001) described a principled approach to sensitivity analyses which varies a sensitivity parameter that quantifies deviations from the missing data assumption. A tipping point sensitivity analysis (e.g. Yan et al. (2009); Liublinska and Rubin (2014)) extends this approach by varying the sensitivity parameter until the conclusion from the primary analysis is overturned. The third aim of this paper is to propose a tipping point sensitivity analysis using the causal model. The paper is organised as follows: In Section 2, we set out notation and define the RBI methods. Section 3 contains the key new material: here we set out the causal model and discuss equivalence with RBI methods. In Section 4, we discuss implementation. In Section 5 we verify the equivalence of RBI and causal model estimates in a simulation study. In Section 6, we illustrate the proposed approach and demonstrate the tipping point analysis with two example data sets. We conclude with summary remarks in Section 7.

Reference-based imputation (RBI) methods

Notation

We consider a two-arm longitudinal RCT with quantitative outcome observations scheduled at baseline and at occasions after randomisation. Let be the random variable for the participant’s randomised treatment arm: for the active treatment arm and for the control arm. Let be the random variable for the participant’s outcome at visit . It is convenient to define , the vector of all outcomes up to and including visit ; , the vector of all outcomes after visit ; and , the vector of all outcomes. Let be the random variable for the participant’s last visit prior to discontinuing treatment, so . is observable for all but only observed for , because we assume no off-treatment data. We aim to impute the unobserved values of for : we stress that these are the outcomes that existed but were unobserved, not the outcomes that would have existed if treatment had been continued. We define the potential outcome at visit as the outcome that would have been observable if, possibly contrary to fact, a participant received active treatment for periods only. In particular, is the potential outcome if never treated, and is the potential outcome if always treated. We define , and as before. We let , the mean of the potential outcome at visit if active treatment was received for periods only. Similarly, we define , and . The variance-covariance matrix of the potential outcomes is with submatrices , and . We define the matrix of regression coefficients of potential outcomes after visit on those up to visit as , and the residual variance of the potential outcomes after visit given those up to visit as . Figure 1 illustrates this notation in the case of an adverse outcome which deteriorates (increases) in the absence of treatment and improves (decreases) in the presence of treatment. If treatment is discontinued at time then mean outcomes up to time are unaffected ( for ) but outcomes after time are worsened ( for ). The notation is summarised in supplementary appendix A.

Figure 1.

Notation illustrated. Lines indicate mean potential outcomes under three potential treatment scenarios. Circles indicate observable outcomes for a participant who discontinues treatment at visit .

Notation illustrated. Lines indicate mean potential outcomes under three potential treatment scenarios. Circles indicate observable outcomes for a participant who discontinues treatment at visit . This potential outcomes notation allows for only one type of treatment. We assume that the observed outcomes are not affected by other treatments. For the outcomes after treatment discontinuation, we assume either that rescue treatment (over and above the per-protocol treatment regime for the control arm) is not available or that interest is in the effect attributable to the initially randomised treatment without the confounding effects of rescue medications. The de jure estimand (estimand E1) at visit is . The de facto estimand (estimand E2) is the estimand of interest in this paper and is . Often, primary interest is in the last visit, .

Reference-based imputation

CRK proposed a generic MI algorithm for this setting: For each treatment arm, fit a multivariate normal model to all observed data, using a Bayesian approach with an improper prior and assuming MAR. The model should have unstructured mean and unstructured variance-covariance matrix. For each treatment arm, draw a mean vector and variance-covariance matrix from the posterior distribution. For each treatment arm and each possible treatment discontinuation visit , use the draws to build the hypothetical joint distribution of the outcomes up to time and the outcomes after time , using one of the methods described below. For each treatment arm and each observed treatment discontinuation visit , construct the imputation distribution of given . Sample from this conditional distribution, to create a “completed” data set. Repeat steps 2–4 times, resulting in imputed data sets. Analyse each imputed data set and combine the results using Rubin’s rules (Rubin 1987). To understand the assumptions behind the CRK algorithm, we express it using the potential outcomes notation. In step 1, the model is fitted to each treatment arm separately. In the control arm, the observed outcomes are . In the active treatment arm, the observed outcomes are , because we assume no off-treatment data. Hence, under MAR assumptions that we make explicit in Section 3.1, the multivariate normal model fitted to the control arm has mean and variance , and that fitted to the active treatment arm has mean and variance . In step 2, values of , , and are drawn from their posterior distributions. In step 3, the drawn values are used to build hypothetical joint distributions of . Specifically, for participants in the active treatment arm who discontinue treatment at time , a joint distribution is built for . CRK proposed using a multivariate normal distribution. Five methods are mainly distinguished by their choice of mean: Missing at random (MAR): mean = . Last mean carried forward (LMCF): mean = where is a row vector of length . Copy reference (CR): mean = . Jump to reference (J2R): mean = . Copy increments in reference (CIR): CRK proposed corresponding variance matrices. We simplify their description by observing that only the regression coefficient matrix and conditional variance matrix of the potential outcomes after visit given those up to visit are used in later steps. CRK set these to be and for MAR and LMCF, and and for J2R, CIR and CR. An approach that we call RBI alternative instead uses and for all RBI methods. In step 4, the joint distributions above are used to derive conditional distributions for . Under J2R, for example, this is The rest of the CRK algorithm follows standard MI methods, using the conditional distribution as the imputation model. The hypothetical joint distributions under each of these methods are written in the notation of this paper in supplementary appendix B. The corresponding imputation distributions for the subgroup for any are given under “Reference-based imputation methods” in Table 1. The imputation means are written as a selection term, reflecting how the subgroup differs from other participants, plus a term linearly related to the treatment effect up to time . This motivates our causal model in section 3, which relates causal treatment effects after to those up to .

Table 1.

Imputation distribution of for given randomisation , past and treatment discontinuation visit , under various reference-based imputation methods with control arm as reference (Carpenter et al. 2013) and under the causal model. is a ‘carry-forward’ matrix containing columns of zeroes and a final column of ones, so that is a column vector containing copies of .

	Imputation distribution
Method	Mean	Variance
Reference-based imputation methods
MAR	βt(tmax)Y≤t−μ≤t(t)+μ>t(tmax)	Ωt(tmax)
LMCF	βt(tmax)Y≤t−μ≤t(t)+Ctμ≤t(t)	Ωt(tmax)
J2R	βt(0)Y≤t−μ≤t(t)+μ>t(0)	Ωt(0)
CIR	βt(0)Y≤t−μ≤t(t)+Ctμ≤t(t)−μ≤t(0)+μ>t(0)	Ωt(0)
CR*	βt(0)Y≤t−μ≤t(t)+βt(0)μ≤t(t)−μ≤t(0)+μ>t(0)	Ωt(0)
Causal model
	βt(t)Y≤t−μ≤t(t)+Ktμ≤t(t)−μ≤t(0)+μ>t(0)	Ωt(t)

* The CR mean is more simply written , but the expression given here facilitates comparison with the other methods.

New causal model

In this section, we first set out the assumptions of the causal model, and then derive the imputation model.

Assumptions

Assumption A1. Randomisation is independent of potential outcomes: for all . Assumption A2. is missing at random (MAR). Assumption A2 states that the observed outcomes in the control arm are MAR. If there are no missing data before treatment discontinuation, then we can also write this for all . In other words, treatment discontinuation in the control arm does not relate to future untreated outcomes, given the past and present. Assumption A3. is MAR. Assumption A3 states that the counterfactual fully treated outcomes in the active arm are MAR. If there are no missing data before treatment discontinuation, then we can also write this for all . In other words, treatment discontinuation in the active arm does not relate to future counterfactual fully treated outcomes, given the past and present. We do not assume that the actual outcomes in the active arm, , are MAR. Indeed, this is unlikely to be true, since (if treatment is effective) treatment discontinuation causally affects actual future outcomes. Thus, treatment discontinuation is allowed to relate to future actual outcomes, given the past and present. Assumption A4. follows a linear regression for each . This assumption implies that the conditional mean of each future potential outcome () depends linearly on the past observed outcomes . We make no assumption of linearity in or , so that trajectories over time have no assumed form. A4 is true if follows a multivariate Normal distribution. The linear regression has mean and residual variance matrix . Assumption A5. . A5 states that treatment discontinuation at visit is unaffected by future partly treated potential outcomes. It appears similar to the equation A3, but the latter refers instead to future fully-treated potential outcomes. If there are no missing data before treatment discontinuation then a stronger assumption which implies both A3 and A5 is for all and all . Assumption A6. . A6 is an explicit assumption about how the maintained effect of treatment after discontinuation relates to the effect of treatment before discontinuation. Equivalently, is a matrix of sensitivity parameters: it is not identified by the data and must be specified by the user. Some suggestions for are made in Section 3.3. Our model makes no assumption about how the effect of active treatment changes over time while active treatment is continued. However, implicit in assumption A6 is that there is no delayed response to the control treatment: thus when a patient discontinues randomised treatment, we assume the effects of any treatments they switch to are similar to the effects they would have experienced had they received the control treatment from the start of the trial.

Modelling outcomes after treatment discontinuation

In this subsection, we consider an individual in the active arm who stops treatment at visit . We use the above assumptions to derive a model for this individual’s outcomes after treatment discontinuation, conditional on their history . In section 3.3 we take this model as an imputation model and compare it with the RBI imputation models. We write the conditional mean outcome after treatment discontinuation in this model as the sum of three terms: where the first term represents the difference between the subgroup who discontinue at visit and the whole group (“selection term”), the second term represents the treatment effect in the whole group (“maintained treatment effect”), and the third term is the untreated mean. We write the selection term as Using assumption A6, and substituting (1) and (3) into (2) gives the mean of the imputation model: For the variance, we approximate by which is valid when differences between drop-out patterns are small compared with the variation in the data, and otherwise conservative. By A1, this is . This imputation distribution is given under “Causal model” in Table 1.

Using the causal model

We need to fix three parameters in order to identify the causal model: , and . In most cases the post-discontinuation treatment effect may be assumed to depend only on the treatment effect at the discontinuation visit and not on treatment effects at earlier visits, and therefore has non-zero elements only in the final column; our software implementation below relies on this assumption. For tipping point sensitivity analyses, we consider two single-parameter causal models for the outcome at visit after discontinuation at visit : where are the times (on a suitable scale) of visits . The maintained treatment effect after treatment discontinuation is constant in model (5) but decays exponentially in model (6), being multiplied by for every unit of time, where . A combined model is Next, we need to fix , the matrix of regression coefficients of on . Assumptions A2 and A3 identify , the regression of on , and , the regression of on , respectively. We propose assuming either or . We call these “regression from reference” and “regression from active”, respectively. If all treatment effects are homogeneous (i.e. if does not vary between individuals for any ) then and both “regression from reference” and “regression from active” are valid. If we are willing to assume equal variance-covariance matrices across trial arms () then and “regression from reference” and “regression from active” give the same results. The same arguments and proposals apply for .

Comparison with reference-based imputation

From Table 1, RBI methods J2R, CIR and CR correspond to particular choices of the causal model, while the MAR and LMCF methods do not correspond to this causal model. is set to for J2R, for CIR, and for CR. This makes precise the statement of Mallinckrodt et al. (2012) that, under CIR, CR and J2R, the subgroup has the treatment effect at visit maintained, diminished and eliminated, respectively, at visit . Further, and are set to and . If the RBI alternative variance structures are used then the same equivalences apply, but with and .

Estimation

The CRK algorithm described in section 2 is easily adapted to impute under the causal model. Steps 1 and 2 are unchanged, and provide draws of , , and . Step 3 is skipped since the imputation distribution is directly derived from the causal model. Step 4 starts by imputing any missing data in the control arm under assumption A2, and any missing data in the active arm before treatment discontinuation under assumption A3. It then constructs the imputation distribution for active-arm data after treatment discontinuation using specification of , and as in Section 3.3. Steps 5 and 6 are unchanged. We describe implementation using the SAS macros developed by James Roger to perform MI under the RBI methods. These are available on the web page (on www.missingdata.org.uk) of the DIA working group for missing data. We modified the Part2A macro to impute under the causal model with where is a scalar that may vary between participants. This enables causal model (5) to be implemented by setting , the same for all participants. When interest is in the outcome at visit , causal model (6) can be implemented by setting , which varies across participants with different values of . The modified macro is available on the DIA working group web page and sample code is provided in supplementary appendix C. By default, the variance-covariance matrices in the two arms, and , are assumed equal, but the user can specify them to be unequal, which is the case we consider. Alternative implementations are given in supplementary appendix D.

Simulation

We performed a simulation study to verify equivalence of the RBI methods with the proposed causal model for estimating the treatment effect at the final visit and to assess the impact of mis-specification of and . Mis-specification of has no impact on bias and little impact on variance, and for brevity is not discussed.

Design

Details of the data generating mechanism for simulating the observed and unobserved data are given in appendix E. Briefly, we consider an RCT with one baseline observation and two post-baseline visits during the treatment period (that is, ). Some active-arm participants discontinue treatment after visit 1 and are not observed at time 2; all other participants continue randomised treatment and are fully observed. The mechanism for discontinuing treatment is either MCAR or MAR. The data distribution for the observed data has either or : the latter is designed to make choices of important. For each mechanism for simulating the observed data, we analysed the data in three ways, each with several different settings. For complete data, we generated the unobserved data using a maintained treatment effect parameter or and setting or . For causal model imputation, we imputed the missing data using the causal model assuming a maintained treatment effect parameter or , and setting , the assumed value of , equal to or . For RBI imputation, we imputed the missing data using the reference-based imputation methods CR, CIR and J2R with the variance-covariance matrix taken from the control arm or from the active arm. In all cases, we estimated the treatment effect from a linear regression of on randomised arm and baseline . With imputed data, standard errors were computed using Rubin’s rules.

Results

Table 2 displays the average estimated treatment difference at visit 2 for each data generating mechanism (columns) and each analysis method (rows).

Table 2.

Simulation study with or : estimates of treatment effect at visit 2 using complete data, causal model imputation and RBI imputation. in all cases. means .

	Data generating mechanisms for observed data
	MCAR		MAR
	β1(2)=β1(0)	β1(2)≠β1(0)	β1(2)=β1(0)	β1(2)≠β1(0)
A. Complete data generated with:
β1(1)=β1(0)
k=0.00	0.99	0.99	1.00	0.70
k=0.50	1.24	1.24	1.25	0.95
k=0.74	1.36	1.36	1.37	1.07
k=1.00	1.49	1.49	1.50	1.20
β1(1)=β1(2)
k=0.00	0.99	1.00	1.00	1.00
k=0.50	1.24	1.25	1.25	1.25
k=0.74	1.36	1.37	1.37	1.37
k=1.00	1.49	1.50	1.50	1.50
B. Causal model imputation with assumed β˜1(1) and k˜
β˜1(1)=β1(0)
k˜=0.00	1.00	1.00	1.00	0.71
k˜=0.50	1.24	1.25	1.25	0.96
k˜=0.74	1.36	1.37	1.37	1.08
k˜=1.00	1.49	1.50	1.50	1.21
β˜1(1)=β1(2)
k˜=0.00	1.00	1.00	1.00	1.00
k˜=0.50	1.24	1.25	1.25	1.25
k˜=0.74	1.36	1.37	1.37	1.37
k˜=1.00	1.49	1.50	1.50	1.50
C. RBI imputation with assumed variance-covariance and method
Variance-covariance matrix from control arm
J2R	1.00	1.00	1.00	0.71
CR	1.24	1.25	1.25	0.96
CIR	1.49	1.50	1.50	1.21
Variance-covariance matrix from active arm
J2R	1.00	1.00	1.00	1.00
CR	1.24	1.37	1.25	1.38
CIR	1.49	1.50	1.50	1.50

Note: Maximum Monte Carlo standard error

Simulation study with or : estimates of treatment effect at visit 2 using complete data, causal model imputation and RBI imputation. in all cases. means . Note: Maximum Monte Carlo standard error Comparing panels A (analysis of complete data) and B (analysis by causal model imputation) shows that the causal model imputation methods result in unbiased estimates when the assumed values of and agree with the true values of and . Comparing panels B and C (analysis by RBI imputation) shows that the RBI estimates with variance-covariance matrix drawn from the control arm (as in CRK) agree with specific cases of the causal model estimates with , and the RBI estimates with variance-covariance matrix drawn from the active arm (as in RBI alternative) agree with specific cases of the causal model estimates with . Specifically, J2R corresponds to , CIR corresponds to , and CR corresponds to the second element of which is 0.50 or 0.74 depending on the data generating mechanism. Comparing different choices of when the observed data had either no selection effect (i.e. under MCAR) or (that is, in the first three data generating mechanisms), we see that choice of does not affect estimates, as expected from Section 3.4. Sensitivity to choice of was observed in the fourth data generating mechanism (MAR with ): mean causal model estimates were reduced by 0.29 by assuming instead of . Sensitivity to choice of was the same for all values of (see Section 3.4): for example, assuming instead of reduced mean causal model estimates by 0.50 irrespective of the value of . Table 3 displays the average standard error (SE) (the average of the 1000 SEs) and the empirical SE (the sample standard deviation of the 1000 point estimates) for the treatment difference at the final visit. Empirical and average SEs for J2R and CIR are similar to those for the corresponding causal model estimates. The SEs for CR are slightly larger than those for the causal model with the second element of , because is estimated in CR while is an assumed value in the causal model. With MAR data, the larger average and empirical SEs due to using the variance-covariance matrix from the active arm rather than from the control arm arise mainly because there are no missing data in the control arm and heterogeneity is larger in the active arm than in the control arm. More importantly, as shown in Seaman et al. (2014), the results confirm that both RBI and causal model methods give (1) smaller empirical SEs than the estimator based on the complete data, and (2) larger average SEs (estimated using Rubin’s rules) than the empirical SEs of the methods and the empirical SEs based on the complete data. We comment on these observations in the discussion.

Table 3.

Simulation study: average standard error (empirical standard error) for the treatment difference at the final visit using complete data, causal model imputation and RBI imputation. and as in Table 2.

	Data generating mechanisms for observed data
	MCAR		MAR
	β1(2)=β1(0)	β1(2)≠β1(0)	β1(2)=β1(0)	β1(2)≠β1(0)
A. Complete data generated with:
β1(1)=β1(0)
k=0.00	0.276 (0.273)	0.318 (0.311)	0.260 (0.255)	0.295 (0.288)
k=0.50	0.272 (0.269)	0.315 (0.308)	0.261 (0.256)	0.298 (0.291)
k=0.74	0.302 (0.206)	0.342 (0.247)	0.311 (0.219)	0.351 (0.258)
k=1.00	0.270 (0.266)	0.313 (0.306)	0.262 (0.257)	0.301 (0.295)
β1(1)=β1(2)
k=0.00	0.276 (0.273)	0.318 (0.316)	0.260 (0.255)	0.292 (0.288)
k=0.50	0.272 (0.269)	0.315 (0.312)	0.261 (0.256)	0.296 (0.291)
k=0.74	0.271 (0.268)	0.314 (0.311)	0.262 (0.256)	0.298 (0.294)
k=1.00	0.270 (0.266)	0.313 (0.310)	0.262 (0.257)	0.300 (0.296)
B. Causal model imputation with assumed β˜1(1) and k˜
β˜1(1)=β1(0)
k˜=0.00	0.310 (0.168)	0.337 (0.189)	0.305 (0.171)	0.328 (0.181)
k˜=0.50	0.301 (0.190)	0.327 (0.226)	0.299 (0.189)	0.322 (0.214)
k˜=0.74	0.302 (0.206)	0.327 (0.249)	0.301 (0.206)	0.325 (0.237)
k˜=1.00	0.305 (0.226)	0.332 (0.277)	0.306 (0.227)	0.332 (0.267)
β˜1(1)=β1(2)
k˜=0.00	0.310 (0.168)	0.359 (0.187)	0.315 (0.188)	0.359 (0.206)
k˜=0.50	0.301 (0.190)	0.344 (0.224)	0.310 (0.204)	0.350 (0.236)
k˜=0.74	0.302 (0.206)	0.342 (0.247)	0.311 (0.219)	0.351 (0.258)
k˜=1.00	0.305 (0.226)	0.345 (0.275)	0.316 (0.239)	0.356 (0.285)
C. RBI imputation with assumed variance-covariance matrix and method
Variance-covariance matrix from control arm
J2R	0.310 (0.168)	0.337 (0.189)	0.305 (0.171)	0.328 (0.181)
CR	0.303 (0.192)	0.328 (0.229)	0.306 (0.200)	0.331 (0.226)
CIR	0.305 (0.226)	0.332 (0.277)	0.306 (0.227)	0.332 (0.267)
Variance-covariance matrix from active arm
J2R	0.310 (0.168)	0.359 (0.187)	0.315 (0.188)	0.359 (0.206)
CR	0.305 (0.192)	0.344 (0.249)	0.332 (0.236)	0.370 (0.283)
CIR	0.305 (0.226)	0.345 (0.275)	0.316 (0.239)	0.356 (0.285)

Note: Maximum Monte Carlo standard error

Simulation study: average standard error (empirical standard error) for the treatment difference at the final visit using complete data, causal model imputation and RBI imputation. and as in Table 2. Note: Maximum Monte Carlo standard error

Examples

We use two example data sets from randomised, double-blind, parallel-group studies comparing active treatment with placebo. The first is from a trial of 172 participants with major depressive disorders, taken from the DIA page of www.missingdata.org.uk, and used in the DIA working group to demonstrate various missing data related analytical methods. The outcome variable is the 17-item Hamilton Depression Rating Scale, HAMD17. The second, kindly supplied by Devan Mehrotra, is from a pain trial with a pain score as outcome. In the HAMD17 trial, 76% (64/84) and 74% (65/88) of the randomised participants completed the final (fourth) visit in the active and placebo arms, respectively. In the pain score trial, the completion rate at the final (sixth) visit was 70% (47/67) and 67% (36/54) in the active and placebo arms, respectively. In both trials, participants were not followed up after treatment discontinuation. The observed trajectory means and the frequency of dropout patterns in each trial are shown in Figure 2.

Figure 2.

HAMD17 and pain score data sets: observed mean profile according to the visit at which treatment was discontinued in the active and placebo arms.

Note: In the pain score data, four subjects in the active arm and two subjects in the placebo arm did not complete any post-baseline visit and were excluded from analysis.

HAMD17 and pain score data sets: observed mean profile according to the visit at which treatment was discontinued in the active and placebo arms. Note: In the pain score data, four subjects in the active arm and two subjects in the placebo arm did not complete any post-baseline visit and were excluded from analysis. We used the SAS 5 macros for implementing the RBI methods and causal models (Section 4). For the RBI methods, we assumed participants in the active arm were treated similarly to the placebo arm after discontinuing the active treatment. To construct the joint distribution of pre- and post-discontinuation active-arm data under the RBI methods, we first used the variance-covariance matrix from the placebo arm (RBI analyses) and then repeated the methods with the variance-covariance matrix from the active arm (RBI alternative analyses). Table 4 shows the estimated treatment effect on HAMD17 and pain score at the final visit from standard MI, MMRM and RBI methods. The standard MI and MMRM methods estimate the de jure estimand. These differ slightly for HAMD17 because of a small incompatibility between the imputation and analysis models: the imputation model uses all visits to estimate a common effect of the baseline covariate PoolInv, but the analysis model uses only the final visit. The RBI methods estimate the de facto estimand and show, as expected, treatment estimates of smaller magnitude than the de jure estimand, with J2R giving the smallest magnitude of treatment effect followed by CR. Using the variance-covariance matrix from the active arm rather than from the placebo arm gives slightly more conservative estimates.

Table 4.

HAMD17 and pain score data: estimated treatment effect at the final visit using standard multiple imputation with 100 imputations, mixed model for repeated measures (MMRM) and RBI methods.

Estimand &	HAMD17			Pain score
Method	Estimate 1	Std. error	p-value	Estimate 2	Std. error	p-value
De jure
Standard MI	−2.62	0.99	0.01	−0.88	0.39	0.03
MMRM	−2.58	1.03	0.01	−0.88	0.39	0.03
De facto
RBI: variance-covariance matrix from the placebo arm
J2R	−2.01	1.01	0.05	−0.64	0.40	0.11
CR	−2.22	0.99	0.03	−0.75	0.39	0.06
CIR	−2.30	0.99	0.02	−0.77	0.39	0.05
RBI alternative: variance-covariance matrix from the drug arm
J2R	−1.99	1.01	0.05	−0.60	0.39	0.13
CR	−2.20	0.99	0.03	−0.71	0.39	0.07
CIR	−2.28	0.99	0.02	−0.73	0.39	0.06

Monte Carlo standard error for MI methods is .

HAMD17 and pain score data: estimated treatment effect at the final visit using standard multiple imputation with 100 imputations, mixed model for repeated measures (MMRM) and RBI methods. Monte Carlo standard error for MI methods is . Monte Carlo standard error for MI methods is . We next demonstrate tipping point sensitivity analyses using causal models (5) and (6). In model (5), a fraction of the treatment effect is maintained at all visits after discontinuation. Figure 3 shows the de facto estimates and 95% CI over a range of from −0.5 to 2.5. As shown in the theory and the simulation results, the J2R and CIR estimates correspond to using the causal model with (no maintained treatment effect after discontinuation) and (fully maintained treatment effect after discontinuation), respectively. Values mean that the effect of treatment after discontinuation is harmful, while values mean that the effect of treatment after discontinuation is greater than before discontinuation. The tipping point analysis on HAMD17 shows that statistical significance is lost when (with variance-covariance from the placebo arm) or (with variance-covariance from the active arm). In both cases, this suggests that the de facto estimate of treatment effect on HAMD17 is non-significant only if any benefit of the active treatment is lost immediately following discontinuation. For the pain score trial, statistical significance is lost when or (depending on whether the variance-covariance matrix is taken from the placebo or the active arm, respectively). This suggests that, in order for the de facto estimate of treatment effect to be statistically significant, there would need to be a delayed benefit such that the treatment effect was greater after discontinuation than before discontinuation. In both trials, comparing Table 4 with Figure 3 shows that the MAR analyses give estimates of the de jure estimand that are numerically similar to the causal model estimates of the de facto estimand when values around are assumed.

Figure 3.

HAMD17 and pain score data sets: tipping point analysis for the estimated treatment effect at the final visit using causal model (5). The model has a constant treatment effect after treatment discontinuation, equal to fraction of the treatment effect at treatment discontinuation. The horizontal solid and dotted lines represent the treatment estimates and their pointwise 95% CI, respectively. The vertical solid line corresponds to such that p-value in the left-hand side of the line (tipping point). In model (6), the treatment effect decays exponentially after discontinuation. Here, for J2R and for CIR. We took visits as the timescale, so that in model (6). Figure 4 shows the de facto estimates of treatment effect at the final visit and its 95% CI from the causal model over a range of . This model does not accommodate the effect of treatment after discontinuation being either harmful or greater than before discontinuation, and because of the more limited range of , the tipping point is not reached: all results are statistically significant for HAMD17 and not significant for the pain score.

Figure 4.

HAMD17 and pain score data sets: tipping point analysis for the estimated treatment effect at the final visit using causal model (6). The model has the treatment effect decaying exponentially after treatment discontinuation, by a ratio for each visit. The horizontal solid and dashed lines represent the treatment estimates and their pointwise 95% CI, respectively. The tipping point is not attained in the range .

Discussion

We have considered longitudinal RCTs with quantitative outcomes in which participants who discontinue an active treatment are not followed up thereafter, but are assumed to receive a treatment similar to the control treatment. We have focused on estimating the effect of assignment to treatment in the actual treatment circumstances of the trial (de facto or treatment-policy estimand) rather than the treatment effect if all participants had tolerated or adhered to trial protocol (de jure or hypothetical estimand). We have proposed a generalised causal modelling approach to account for treatment discontinuation in the estimation of the de facto estimand. The proposed causal model makes an explicit assumption about the maintained causal effect of treatment after treatment discontinuation and provides flexibility to perform sensitivity analyses to the causal assumption. The causal model agrees with RBI methods in certain cases, and this provides a formal justification of these RBI methods. The proposed causal model specifies how much of the treatment effect is maintained after treatment discontinuation, which we represent by the matrix . We illustrated this with two examples of : equation (5) with the maintained treatment effect independent of time since discontinuation, and equation (6) with the maintained treatment effect decaying exponentially with visits since discontinuation. A simple extension would allow to depend on the reason for treatment discontinuation. Ideally, sponsors should justify the choice of in the trial protocol based on the nature of the trial and the treatments. The choice of regression slope in the imputation model, reflecting within-subject dependence of post-discontinuation outcomes on pre-discontinuation outcomes should similarly be pre-specified. It is hard to recommend a single choice and perhaps both and should be implemented. If the analyst is willing to assume equal variance-covariance matrices across trial arms then the situation is simpler and is the obvious choice. It is sensible to make the corresponding choices for the residual variance matrix . Our model has been presented for the case of active-arm treatment discontinuation, where subjects who discontinue do not then receive rescue medication over and above the per-protocol treatment regime for the control arm or when interest is in the effect attributable to the initially randomised treatment without the confounding effects of rescue medications. An unresolved problem is how to handle initiation of rescue medications when the confounding effects of rescue medications are of interest. The model can be extended to handle the control arm starting active treatment: an assumption like A6 still holds, but the matrix must be replaced by assumptions about how the treatment effect builds up over time. Assumption A6 implies that if treatment has no effect before discontinuation then it has no effect after discontinuation. This seems reasonable in general; if it was unreasonable in a particular trial, then a constant term could be added in assumption A6 and equation (1). Other assumptions are possible, such as a non-linear model. We have focussed on varying assumption A6, but we should also assess a number of other assumptions. The MAR assumptions A2 and A3, and the related assumption A5, could be made more plausible if the model could be extended to include further time-dependent covariates. Alternatively one could explore sensitivity to these assumptions by methods like those of Ratitch et al. (2013). It is less clear how to assess departures from the linearity assumption A4. All the methods we have considered – RBI methods, causal model and MMRM – make a multivariate Normal (MVN) assumption. Our key finding that the causal model and RBI methods are equivalent is valid even if the MVN assumption is false. However, failure of the MVN assumption risks causing bias in all the methods. The assumption can be checked in the observed data using standard methods. If the MVN assumption is correct for the observed data and the maintained treatment effect model (6) is correct, then the imputed data have the correct mean, and so the treatment effect in the imputed data is unbiased even if the MVN assumption is false for the unobserved data. If data were skewed, then it would be wise to consider a transformation before analysis. Our model applies to quantitative outcomes. Extension to other outcomes would be useful. The repeated-sampling variance of the estimated treatment effect tends to be smaller than the Rubin’s rules estimate of variance for a given (Table 3). The repeated-sampling variance can be approximated in practice using the delta method (Liu and Pang 2016; Oehlert 1992). Carpenter et al. (2014) argue that the repeated-sampling variance is not appropriate since it is typically smaller than the complete-data variance (to an extent which depends on the value of ). They also argue that the Rubin’s rules estimate of variance of the treatment effect is larger than the complete-data variance, because of the information lost due to the missing data, and this makes it an appropriate variance (Carpenter et al. 2014; Cro et al. 2019). We point out that the type I error rate is correct for the repeated-sampling variance and too small for the Rubin’s rules variance, meaning that the Rubin’s rules variance carries a loss of power; therefore, the repeated-sampling variance may be appropriate for a primary analysis. In summary, whilst MI is an attractive and powerful method for handling missing data in both experimental and observational studies, it is not always clear what estimand is being targeted or what assumptions are being made about how outcomes for subjects who discontinue randomised treatment relate to those who remain on study. The recent estimands debate (European Medicines Agency 2017) has led to a growing recognition that more complex estimation approaches that do not rely on randomisation may be needed to handle post-randomisation events that lead to missing data, and there are calls for causal inference methods to become more widely adopted (e.g. Akacha et al. (2017); Little and Kang (2015)). We join this call to encourage greater understanding and application of ideas from the causal inference literature to help support the definition and estimation of estimands of interest in a randomised clinical trial. We hope that this paper illustrates how a causal inference framework can provide clarity and rigour in stating estimands, stating assumptions, and performing estimation.

15 in total

Review 1. Causal effects in clinical and epidemiological studies via potential outcomes: concepts and analytical approaches.

Authors: R J Little; D B Rubin
Journal: Annu Rev Public Health Date: 2000 Impact factor: 21.981

2. On analysis of longitudinal clinical trials with missing data using reference-based imputation.

Authors: G Frank Liu; Lei Pang
Journal: J Biopharm Stat Date: 2015-09-29 Impact factor: 1.051

3. A taxonomy of estimands for regulatory clinical trials with discontinuations.

Authors: Thomas Permutt
Journal: Stat Med Date: 2015-12-17 Impact factor: 2.373

4. Missing data handling methods in medical device clinical trials.

Authors: Xu Yan; Shiowjen Lee; Ning Li
Journal: J Biopharm Stat Date: 2009-11 Impact factor: 1.051

5. A structured approach to choosing estimands and estimators in longitudinal clinical trials.

Authors: C H Mallinckrodt; Q Lin; I Lipkovich; G Molenberghs
Journal: Pharm Stat Date: 2012-09-10 Impact factor: 1.894

6. Sensitivity analysis for a partially missing binary outcome in a two-arm randomized clinical trial.

Authors: Victoria Liublinska; Donald B Rubin
Journal: Stat Med Date: 2014-05-20 Impact factor: 2.373

7. Estimands in clinical trials - broadening the perspective.

Authors: Mouna Akacha; Frank Bretz; Stephen Ruberg
Journal: Stat Med Date: 2016-07-19 Impact factor: 2.373

8. Estimating psychological treatment effects from a randomised controlled trial with both non-compliance and loss to follow-up.

Authors: G Dunn; M Maracy; C Dowrick; J L Ayuso-Mateos; O S Dalgard; H Page; V Lehtinen; P Casey; C Wilkinson; J L Vazquez-Barquero; G Wilkinson
Journal: Br J Psychiatry Date: 2003-10 Impact factor: 9.319

9. Analysis of longitudinal trials with protocol deviation: a framework for relevant, accessible assumptions, and inference via multiple imputation.

Authors: James R Carpenter; James H Roger; Michael G Kenward
Journal: J Biopharm Stat Date: 2013 Impact factor: 1.051

10. Information-anchored sensitivity analysis: theory and application.

Authors: Suzie Cro; James R Carpenter; Michael G Kenward
Journal: J R Stat Soc Ser A Stat Soc Date: 2018-11-16 Impact factor: 2.483

2 in total

1. Randomized double-blind placebo-controlled proof-of-concept trial of resveratrol for outpatient treatment of mild coronavirus disease (COVID-19).

Authors: Marvin R McCreary; Patrick M Schnell; Dale A Rhoda
Journal: Sci Rep Date: 2022-06-29 Impact factor: 4.996

2. Randomized Double-blind Placebo-controlled Proof-of-concept Trial of Resveratrol for Outpatient Treatment of Mild Coronavirus Disease (COVID-19).

Authors: Marvin R McCreary; Patrick M Schnell; Dale A Rhoda
Journal: Res Sq Date: 2021-09-13

2 in total