Literature DB >> 34303343

Statistical analysis of two arm randomized pre-post designs with one post-treatment measurement.

Fei Wan

Abstract

BACKGROUND: Randomized pre-post designs, with outcomes measured at baseline and after treatment, have been commonly used to compare the clinical effectiveness of two competing treatments. There is a vast, but often conflicting, amount of information in the current literature about the best analytic methods for pre-post designs. It is challenging for applied researchers to make an informed choice.
METHODS: We discuss six methods commonly used in literature: one way analysis of variance ("ANOVA"), analysis of covariance main effect and interaction models on the post-treatment score ("ANCOVAI" and "ANCOVAII"), ANOVA on the change score between the baseline and post-treatment scores ("ANOVA-Change"), repeated measures ("RM") and constrained repeated measures ("cRM") models on the baseline and post-treatment scores as joint outcomes. We review a number of study endpoints in randomized pre-post designs and identify the mean difference in the post-treatment score as the common treatment effect that all six methods target. We delineate the underlying differences and connections between these competing methods in homogeneous and heterogeneous study populations.
RESULTS: ANCOVA and cRM outperform other alternative methods because their treatment effect estimators have the smallest variances. cRM has comparable performance to ANCOVAI in the homogeneous scenario and to ANCOVAII in the heterogeneous scenario. In spite of that, ANCOVA has several advantages over cRM: i) the baseline score is adjusted as covariate because it is not an outcome by definition; ii) it is very convenient to incorporate other baseline variables and easy to handle complex heteroscedasticity patterns in a linear regression framework.
CONCLUSIONS: ANCOVA is a simple and the most efficient approach for analyzing pre-post randomized designs.
© 2021. The Author(s).


Keywords:  ANCOVA; ANOVA; Change score; Pre-post design; Repeated measures; Treatment effect

Year:  2021        PMID: 34303343      PMCID: PMC8305561          DOI: 10.1186/s12874-021-01323-9

Source DB:  PubMed          Journal:  BMC Med Res Methodol        ISSN: 1471-2288            Impact factor:   4.615


Background

Two arm parallel randomized trials have been widely used to compare the clinical effectiveness of competing treatments in improving patients’ health outcomes. In these trials, continuous outcomes of interest are routinely measured at baseline (the “baseline score”) and at one post-treatment time point (the “post-treatment score”). The primary purpose of designing a pre-post randomized study is to answer the scientific question of interest: is one treatment more effective than the other? To assess the difference in effectiveness between two treatments, we need to select a study endpoint and quantify a treatment effect. Common study endpoints include the post-treatment score, the change score from baseline to post-treatment, the percentage change from baseline, and the rate of change from baseline. The difference between the two arms on the selected study endpoint is defined as the treatment effect. Few studies have investigated the links between these different metrics of treatment effect in a randomized pre-post trial. These underlying connections are critical to understanding the equivalence among statistical methods that may appear very different at first sight. We need to be certain about the type of treatment effect each method targets and select the one that yields an unbiased and the most efficient estimator of the treatment effect of interest. A number of statistical methods are commonly used in analyzing pre-post trials. We can analyze the post-treatment score using a one way analysis of variance model (ANOVA) [1, 2], an analysis of covariance model adjusting for the baseline score (ANCOVAI) [2-7], or an ANCOVA including a baseline score by treatment interaction (ANCOVAII) [3, 4, 8-10]. We can also analyze the change score between baseline and post-treatment using ANOVA (ANOVA-Change) [11]. Alternatively, we can model the baseline and post-treatment scores jointly using repeated measures models (RM) and constrained repeated measures models (cRM) [10, 12-14].
Despite the simplicity and wide application of randomized pre-post designs, which method is the best analytic approach has been a debated topic, and many methodological studies have compared different statistical methods over the past decades [1-13]. However, it is challenging for applied researchers to evaluate this vast, but often conflicting, amount of information in the current literature and make an informed choice. In this study we aim to review ANOVA, ANCOVAI, ANCOVAII, ANOVA-Change, RM, and cRM from a practical standpoint, with a focus on delineating the differences and underlying connections between them. In section Methods, we first provide notation and assumptions for a typical pre-post design, define homogeneous and heterogeneous study populations, and discuss some common study endpoints and the associated metrics of treatment effects. We next analytically assess the differences and connections between these competing models in the homogeneous and heterogeneous scenarios by first describing each model using the same set of population mean, variance, and covariance parameters. In section Results, we compare the relative efficiency of these competing methods using three simulated weight loss trial examples (homogeneous data, heterogeneous data with a balanced design, heterogeneous data with an unbalanced design). In the last two sections, we discuss the results and give a recommendation on the best analytic approach for a randomized pre-post design.

Methods

A hypothetical weight loss trial and metrics of treatment effects

Notations

In a hypothetical two arm parallel weight loss trial comparing the effect of a new drug (“treatment”) and a placebo (“control”) in reducing participants’ body weights, we use $Y_{ijt}$ to denote the body weight of the ith subject (i = 1, 2, 3, …, $n_j$) in the jth treatment arm (j = 0, 1) at the tth time (t = $t_0$, $t_1$). $n_0$ and $n_1$ are the numbers of subjects in the control and treatment arms. We denote the mean baseline weights for the treatment and control arms by $\mu_{1t_0}$ and $\mu_{0t_0}$, respectively. Random allocation guarantees $\mu_{1t_0}=\mu_{0t_0}$, and we let $\mu_{t_0}$ denote the overall mean baseline weight. The mean weights of the treatment and control arms at time $t_1$ are denoted by $\mu_{1t_1}$ and $\mu_{0t_1}$, respectively (Fig. 1). We define homogeneous and heterogeneous study populations as follows. In the homogeneous scenario, $\sigma_0^2$ and $\sigma_1^2$ are the common variances of the baseline and post-treatment weights, and $\rho$ is the correlation coefficient between the baseline and post-treatment weights. In the heterogeneous scenario, $\sigma_0^2$ is the common variance of the baseline body weight in the control and treatment arms; randomization guarantees that the variances of the baseline weights in both arms are equal to $\sigma_0^2$. $\sigma_{01}^2$ and $\sigma_{11}^2$ are the variances of the post-treatment weight in the control and treatment arms, and $\rho_0$ and $\rho_1$ are the correlation coefficients between the baseline and post-treatment weights in the control and treatment arms, respectively. In practice, participants may respond to the treatment more heterogeneously, so the variability of the post-treatment weight tends to be larger in the treatment arm than in the control arm, and the correlation between pre- and post-treatment weights is usually stronger in the control arm than in the treatment arm, i.e., $\rho_0 > \rho_1$ and $\sigma_{01}^2 < \sigma_{11}^2$.
Fig. 1

Hypothetical two arm pre-post weight loss randomized trial

In the homogeneous scenario, every participant has the same variance and covariance structure for the baseline and post-treatment weights; in the heterogeneous scenario, the variance and covariance structures of the baseline and post-treatment weights differ between the treatment and control arms.

Metrics of treatment effect

We discuss the following three metrics of treatment effect commonly reported in pre-post trials.
Post-treatment score. The primary endpoint is the post-treatment weight measured at $t_1$. The difference in the mean post-treatment weights of the two arms, $\tau=\mu_{1t_1}-\mu_{0t_1}$, is defined as the treatment effect. For example, if τ = −10, we can interpret the result as “at the end of the trial, the mean weight was 10 pounds lower in the treatment group than in the control group.”
Change score. The primary endpoint is the change score calculated by subtracting the baseline weight from the post-treatment weight, i.e., $\Delta_{ij}=Y_{ijt_1}-Y_{ijt_0}$. The difference in the mean change scores of the two arms, $\tau_{\Delta}=(\mu_{1t_1}-\mu_{1t_0})-(\mu_{0t_1}-\mu_{0t_0})$, is the treatment effect. For example, if $\tau_{\Delta}=-10$, this difference is usually interpreted as “weight reductions were 10 pounds greater in the treatment group than in the control group.” Since randomization ensures $\mu_{1t_0}=\mu_{0t_0}$, it follows directly that $\tau_{\Delta}=\tau$. When we code “0” for $t_0$ and “1” for $t_1$, the mean change score for each arm can also be interpreted as the mean change rate per unit time, represented by the slopes in Fig. 1. Thus, the difference in slopes is also equivalent to τ. ANOVA and ANCOVA target τ, ANOVA-Change targets $\tau_{\Delta}$, and RM and cRM target the difference in slopes; nevertheless, we can compare these statistical methods, which target seemingly very different types of treatment effects, in a meaningful way because τ, $\tau_{\Delta}$, and the difference in slopes are equivalent in randomized pre-post designs.
Percent change. The primary endpoint is the percent change from the baseline weight, $100\times\Delta_{ij}/Y_{ijt_0}$, and the mean difference in the percent change between the two arms is defined as the treatment effect. Although the percent change is popular among clinical researchers, this metric has several drawbacks [1, 15, 16]: i) the percent change is a function of the ratio $Y_{ijt_1}/Y_{ijt_0}$; its distribution is highly skewed, analyzing it with normal-theory statistical methods is not justified, and non-parametric methods are generally less powerful; ii) the percent change is not a symmetric measure. For example, the mean weight of adults over 20 in the US is 197.8 pounds for men and 170.5 pounds for women, a mean difference of 27.3 pounds. Men weigh 16% (i.e., 100 × (197.8 − 170.5)/170.5) more than women, whereas women weigh 13.8% (i.e., 100 × (197.8 − 170.5)/197.8) less than men; the percentage differs depending on which sex is used as the divisor; iii) the percent change is not an additive measure. For example, if a participant’s weight increases by 10% in the first 6 months and falls by 10% over the next 6 months, the two percent changes do not cancel out: the participant’s weight at the end would be only 99% of the participant’s starting weight.
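The asymmetry and non-additivity of the percent change can be verified with a few lines of arithmetic; the snippet below is a quick numeric check of the figures quoted above.

```python
# Quick numeric check of the percent-change drawbacks discussed above.

# Asymmetry: the same 27.3-pound difference yields different percentages
# depending on which mean is used as the divisor.
men, women = 197.8, 170.5
assert round(men - women, 1) == 27.3
assert round(100 * (men - women) / women, 1) == 16.0   # men relative to women
assert round(100 * (men - women) / men, 1) == 13.8     # women relative to men

# Non-additivity: +10% followed by -10% does not return to baseline;
# the final weight is 99% of the starting weight.
start = 100.0
end = start * 1.10 * 0.90
assert round(end, 6) == 99.0
```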

Statistical models

In this section, we focus on six methods that estimate τ. We describe each statistical model using the same set of population mean, variance, and covariance parameters defined in section Methods, for the homogeneous and heterogeneous scenarios separately. For each method, we present the closed-form expressions of the point estimator of the treatment effect and its variance. It often goes unnoticed in practice that different statistical methods have different types of variances (i.e., conditional vs. unconditional variances) associated with their treatment effect estimators. For example, the OLS model-based variances for ANCOVA are conditional because OLS assumes the baseline weight is fixed. Generally speaking, the baseline weight is random because we rarely enroll participants into randomized trials based on predetermined values of the baseline weight. Thus, the unconditional variance and the corresponding unconditional inference are of greater interest because we want the findings derived from the current sample to be generalizable to the population of interest. We will discuss in detail whether the OLS model-based conditional inference (i.e., test statistics and p-values from standard statistical software) for ANCOVA is still valid for unconditional hypothesis testing, and the potential fixes we can use to draw valid unconditional inference if the usual OLS model-based inference is biased.

When the study population is homogeneous

Method 1: ANOVA-Post. We model the post-treatment body weight using the binary treatment indicator G (1 if in the treatment arm; 0 if in the control arm) as follows: $Y_{ijt_1}=\beta_0^{(1)}+\beta_1^{(1)}G_{ij}+\epsilon_{ij}^{(1)}$, where $\epsilon_{ij}^{(1)}$ is independently and identically distributed (i.i.d.) random error and $\beta_1^{(1)}$ represents the treatment effect τ. Model (1) is homoscedastic with a constant residual variance $\sigma_1^2$. We can fit an ordinary least squares (OLS) regression to estimate the coefficients and standard errors of model (1). The closed-form expressions of the OLS estimator $\hat{\beta}_{1,ols}^{(1)}$ and its “unconditional” variance, denoted by $\operatorname{var}(\hat{\beta}_{1,ols}^{(1)})$, are presented in Table 1. $\hat{\beta}_{1,ols}^{(1)}$ is estimated by the sample group mean difference in the post-treatment weight between the two arms and is unbiased for τ. The OLS model-based variance of $\hat{\beta}_{1,ols}^{(1)}$ assuming known $\sigma_1^2$ divides $\sigma_1^2$ by $\sum_{j=0}^{1}\sum_{i=1}^{n_j}(G_{ij}-\overline{G}_{..})^2$, where $\overline{G}_{..}$ is the overall mean of the treatment indicator. $\sigma_1^2$ is estimated by $\hat{\sigma}_1^2$ (Table 1), where $\hat{y}_{ijt_1}$ is the predicted value from model (1). We let $\hat{\operatorname{var}}_{ols}(\hat{\beta}_{1,ols}^{(1)})$ denote the OLS model-based variance estimator with $\hat{\sigma}_1^2$ substituted for $\sigma_1^2$, which is output by standard statistical software (Table 1). Since $\sum_{j=0}^{1}\sum_{i=1}^{n_j}(G_{ij}-\overline{G}_{..})^2=\frac{n_0 n_1}{n_0+n_1}$, it follows that $\hat{\operatorname{var}}_{ols}(\hat{\beta}_{1,ols}^{(1)})=\left(\frac{1}{n_0}+\frac{1}{n_1}\right)\hat{\sigma}_1^2$. It is well established that $\hat{\sigma}_1^2$ is unbiased for $\sigma_1^2$; thus, $\hat{\operatorname{var}}_{ols}(\hat{\beta}_{1,ols}^{(1)})$ is unbiased for $\operatorname{var}(\hat{\beta}_{1,ols}^{(1)})$. The usual OLS model-based inference (i.e., test statistics and the associated p-value) is valid for testing $H_0$ : τ = 0 unconditionally.
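As a sketch of Method 1 (the variable names and simulation parameters below are illustrative assumptions, not taken from the paper), the OLS fit of model (1) can be reproduced with the normal equations, and the slope on G recovers the sample group mean difference:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative homogeneous trial: post-treatment weights with true tau = -10.
n0, n1, tau = 200, 200, -10.0
y0 = rng.normal(185.0, 15.0, n0)         # control arm post-treatment weights
y1 = rng.normal(185.0 + tau, 15.0, n1)   # treatment arm post-treatment weights

# Model (1): y = b0 + b1*G + error, fit by ordinary least squares.
G = np.concatenate([np.zeros(n0), np.ones(n1)])
y = np.concatenate([y0, y1])
X = np.column_stack([np.ones_like(G), G])
b0_hat, b1_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# The OLS slope equals the sample group mean difference (Table 1).
assert np.isclose(b1_hat, y1.mean() - y0.mean())

# Model-based variance estimator: sigma_hat^2 / sum((G_ij - G_bar)^2),
# which simplifies to (1/n0 + 1/n1) * sigma_hat^2.
resid = y - X @ np.array([b0_hat, b1_hat])
sigma2_hat = resid @ resid / (n0 + n1 - 2)
var_b1 = sigma2_hat / np.sum((G - G.mean()) ** 2)
assert np.isclose(var_b1, (1 / n0 + 1 / n1) * sigma2_hat)
```

The two assertions check the closed-form identities stated in the text rather than any particular numeric output.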
Table 1

Estimators of treatment effect and variance estimators in a homogeneous study population

(Columns: model; estimator of treatment effect (τ); type of variance; true variance of treatment effect estimator; OLS model-based variance estimator.)

ANOVA-Post
  Estimator (type: U): $\hat{\beta}_{1,ols}^{(1)}=\overline{y}_{.1t_1}-\overline{y}_{.0t_1}$
  True variance: $\operatorname{var}\left(\hat{\beta}_{1,ols}^{(1)}\right)=\frac{\sigma_1^2}{n_0}+\frac{\sigma_1^2}{n_1}$
  OLS model-based variance estimator: $\hat{\operatorname{var}}_{ols}\left(\hat{\beta}_{1,ols}^{(1)}\right)=\frac{\hat{\sigma}_1^2}{\sum_{j=0}^{1}\sum_{i=1}^{n_j}\left(G_{ij}-\overline{G}_{..}\right)^2}$, where $\hat{\sigma}_1^2=\frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j}\left(y_{ijt_1}-\hat{y}_{ijt_1}\right)^2}{n_0+n_1-2}$

ANCOVA-Post I
  Estimator (type: C): $\hat{\beta}_{1,ols}^{(2)}=\left(\overline{y}_{.1t_1}-\overline{y}_{.0t_1}\right)-\hat{\beta}_{2,ols}^{(2)}\left(\overline{y}_{.1t_0}-\overline{y}_{.0t_0}\right)$
  True conditional variance: $\operatorname{var}\left(\hat{\beta}_{1,ols}^{(2)}\mid Y_{ijt_0}\right)=\left(\frac{1}{n_0}+\frac{1}{n_1}+\frac{\left(\overline{y}_{.1t_0}-\overline{y}_{.0t_0}\right)^2}{\sum_{j=0}^{1}\sum_{i=1}^{n_j}\left(y_{ijt_0}-\overline{y}_{.jt_0}\right)^2}\right)\sigma_{\epsilon^{(2)}}^2$, where $\sigma_{\epsilon^{(2)}}^2=\left(1-\rho^2\right)\sigma_1^2$
  OLS model-based variance estimator: $\hat{\operatorname{var}}_{ols}\left(\hat{\beta}_{1,ols}^{(2)}\mid Y_{ijt_0}\right)=\left(\frac{1}{n_0}+\frac{1}{n_1}+\frac{\left(\overline{y}_{.1t_0}-\overline{y}_{.0t_0}\right)^2}{\sum_{j=0}^{1}\sum_{i=1}^{n_j}\left(y_{ijt_0}-\overline{y}_{.jt_0}\right)^2}\right)\hat{\sigma}_{e^{(2)}}^2$, where $\hat{\sigma}_{e^{(2)}}^2=\frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j}\left(y_{ijt_1}-\hat{y}_{ijt_1}\right)^2}{n_0+n_1-4}$
  Unconditional variance (type: U): $\operatorname{var}\left(\hat{\beta}_{1,ols}^{(2)}\right)=\left(\frac{1}{n_0}+\frac{1}{n_1}\right)\left(1-\rho^2\right)\sigma_1^2$

RM
  Estimator (type: U): $\hat{\gamma}_{3,gls}^{(3)}=\left(\overline{y}_{.1t_1}-\overline{y}_{.1t_0}\right)-\left(\overline{y}_{.0t_1}-\overline{y}_{.0t_0}\right)$
  True variance: $\operatorname{var}\left(\hat{\gamma}_{3,gls}^{(3)}\right)=\left(\frac{1}{n_0}+\frac{1}{n_1}\right)\left(\sigma_1^2+\sigma_0^2-2\rho\sigma_0\sigma_1\right)$

cRM
  Estimator (type: U): $\hat{\gamma}_{3,gls}^{(4)}=\left(\overline{y}_{.1t_1}-\overline{y}_{.0t_1}\right)-\frac{\rho\sigma_0\sigma_1}{\sigma_0^2}\left(\overline{y}_{.1t_0}-\overline{y}_{.0t_0}\right)$
  True variance: $\operatorname{var}\left(\hat{\gamma}_{3,gls}^{(4)}\right)=\left(\frac{1}{n_0}+\frac{1}{n_1}\right)\left(1-\rho^2\right)\sigma_1^2$

ANOVA-Change
  Estimator (type: U): $\hat{\beta}_{1,ols}^{(5)}=\left(\overline{y}_{.1t_1}-\overline{y}_{.1t_0}\right)-\left(\overline{y}_{.0t_1}-\overline{y}_{.0t_0}\right)$
  True variance: $\operatorname{var}\left(\hat{\beta}_{1,ols}^{(5)}\right)=\left(\frac{1}{n_0}+\frac{1}{n_1}\right)\left(\sigma_1^2+\sigma_0^2-2\rho\sigma_0\sigma_1\right)$
  OLS model-based variance estimator: $\hat{\operatorname{var}}_{ols}\left(\hat{\beta}_{1,ols}^{(5)}\right)=\frac{\hat{\sigma}_{\epsilon^{(5)}}^2}{\sum_{j=0}^{1}\sum_{i=1}^{n_j}\left(G_{ij}-\overline{G}_{..}\right)^2}$, where $\hat{\sigma}_{\epsilon^{(5)}}^2=\frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j}\left(\Delta_{ij}-\hat{\Delta}_{ij}^{(5)}\right)^2}{n_0+n_1-2}$

Type: U = unconditional variance; C = conditional variance
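To make the efficiency comparison in Table 1 concrete, the true-variance formulas can be evaluated for assumed parameter values (the standard deviations, correlation, and sample sizes below are illustrative, not taken from the paper):

```python
# Evaluate the true-variance formulas from Table 1 under assumed
# (illustrative) parameter values for a homogeneous population.
n0, n1 = 100, 100
sd0, sd1, rho = 12.0, 14.0, 0.7   # baseline SD, post SD, pre-post correlation
k = 1 / n0 + 1 / n1

var_anova = k * sd1**2                                    # ANOVA-Post
var_ancova = k * (1 - rho**2) * sd1**2                    # ANCOVAI and cRM
var_change = k * (sd1**2 + sd0**2 - 2 * rho * sd0 * sd1)  # ANOVA-Change and RM

# ANCOVAI/cRM dominate: var_change - var_ancova = k*(sd0 - rho*sd1)^2 >= 0,
# and (1 - rho^2) <= 1, so var_ancova <= var_anova as well.
assert var_ancova <= var_anova
assert var_ancova <= var_change
```

With these particular values, the ANCOVA/cRM variance is roughly half the ANOVA-Post variance; the advantage grows as |ρ| increases.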

Method 2: ANCOVAI. We model the post-treatment weight using the binary treatment indicator G and the baseline weight $Y_{ijt_0}$: $Y_{ijt_1}=\beta_0^{(2)}+\beta_1^{(2)}G_{ij}+\beta_2^{(2)}Y_{ijt_0}+\epsilon_{ij}^{(2)}$, where $\epsilon_{ij}^{(2)}$ is i.i.d. random error. $\beta_1^{(2)}$ measures the treatment effect τ, and $\beta_2^{(2)}$ represents the slope of the pre-post association between $Y_{ijt_1}$ and $Y_{ijt_0}$. Model (2) has a common residual variance $\sigma_{\epsilon^{(2)}}^2$ and implicitly assumes that the two arms share the common baseline mean $\mu_{t_0}$. The coefficients and standard errors of model (2) are also estimated using an OLS regression. The OLS estimator $\hat{\beta}_{1,ols}^{(2)}$ is derived as the sample mean difference in the post-treatment weight adjusting for the sample mean difference in the baseline weight between the two arms; the group mean difference in the baseline weight can be seen as chance imbalance in a randomized trial. $\hat{\beta}_{1,ols}^{(2)}$ is unbiased for τ both conditional on $Y_{ijt_0}$ and unconditionally. The formulas of $\hat{\beta}_{1,ols}^{(2)}$ and its “unconditional” variance are listed in Table 1. However, OLS assumes that the baseline weight is fixed, so OLS targets the conditional variance $\operatorname{var}(\hat{\beta}_{1,ols}^{(2)}\mid Y_{ijt_0})$ instead of the unconditional variance $\operatorname{var}(\hat{\beta}_{1,ols}^{(2)})$. The formula of the conditional variance with a known common residual variance is presented in Table 1. Since $\sigma_{\epsilon^{(2)}}^2$ is generally unknown, it is estimated by the sample residual variance $\hat{\sigma}_{e^{(2)}}^2$, computed from the residuals $y_{ijt_1}-\hat{y}_{ijt_1}$, where $\hat{y}_{ijt_1}$ is the predicted value from model (2). We let $\hat{\operatorname{var}}_{ols}(\hat{\beta}_{1,ols}^{(2)}\mid Y_{ijt_0})$ denote the OLS model-based variance estimator with $\hat{\sigma}_{e^{(2)}}^2$ substituted for $\sigma_{\epsilon^{(2)}}^2$; it is reported by standard statistical software (e.g., “proc reg” in SAS), and its formula is presented in Table 1. Since we want to generalize our conclusions to a general population, in which the baseline weight can take values different from those collected in the current sample, we may wonder whether significance tests based on the model-based conditional variance, which treats the baseline weight as fixed, are comparable to unconditional inference, which treats the baseline weight as a random variable, for testing $H_0$ : τ = 0. To establish this equivalence, we need to show: i) $\hat{\sigma}_{e^{(2)}}^2$ is unbiased for $\sigma_{\epsilon^{(2)}}^2$; ii) the conditional variance is unbiased for the unconditional variance.
The first part is well established in a homoscedastic linear model. The second part holds because we can show that $\operatorname{var}(\hat{\beta}_{1,ols}^{(2)})=E\left[\operatorname{var}(\hat{\beta}_{1,ols}^{(2)}\mid Y_{ijt_0})\right]$ using the law of total variance and the fact that $\hat{\beta}_{1,ols}^{(2)}$ is unbiased for τ conditional on the baseline weight. That is, the unconditional variance of $\hat{\beta}_{1,ols}^{(2)}$ is the average of its conditional variance over the distribution of the baseline weight. Therefore, the usual model-based standard errors and associated p-values are valid for unconditional inference [3, 5, 17]. Method 3: RM. RM models the baseline and post-treatment weights ($Y_{ijt_0}$, $Y_{ijt_1}$) jointly using the binary treatment indicator G, the binary time factor T, and the time by treatment interaction G × T as follows: $Y_{ijt}=\gamma_0^{(3)}+\gamma_1^{(3)}G_{ij}+\gamma_2^{(3)}T_t+\gamma_3^{(3)}G_{ij}T_t+\epsilon_{ijt}^{(3)}$. When $t_0$ = 0 and $t_1$ = 1, $\gamma_0^{(3)}$ represents the mean baseline weight of the control arm, $\gamma_1^{(3)}$ represents the difference in the mean baseline weights of the treatment and control arms, $\gamma_2^{(3)}$ represents the mean change from baseline in the control arm, and $\gamma_3^{(3)}$ is generally interpreted as the difference between the treatment and control arms in the mean change from baseline in a unit time interval (“difference in differences”), also known as the difference in slopes. Random allocation gives $\gamma_1^{(3)}=0$, and it follows that $\gamma_3^{(3)}=\tau$. Thus, testing $H_0:\gamma_3^{(3)}=0$ is equivalent to testing $H_0$ : τ = 0. The generalized least squares (GLS) model with correlated outcomes is routinely used to estimate the coefficients and standard errors of model (3). The GLS estimator of the treatment effect $\hat{\gamma}_{3,gls}^{(3)}$ and its variance $\operatorname{var}(\hat{\gamma}_{3,gls}^{(3)})$ given known variance and covariance parameters are presented in Table 1. $\hat{\gamma}_{3,gls}^{(3)}$ is estimated by the sample mean difference in body weight change between the two arms and is unbiased for τ in a large sample. The variance and covariance parameters are generally unknown and need to be estimated using restricted maximum likelihood (REML); conventional maximum likelihood estimation (MLE) should be avoided.
The REML variance estimator is derived by plugging the REML estimators of the variance and covariance parameters into the formula of $\operatorname{var}(\hat{\gamma}_{3,gls}^{(3)})$. We use the Kenward and Roger method [18] (“ddfm=kenwardroger” in the SAS proc mixed procedure) to adjust for the potential finite sample bias in the REML variance estimator, which arises from its failure to incorporate the variability of the REML estimators of the variance and covariance parameters. This adjustment involves inflating the variance and covariance matrix and computing an adjusted approximate degrees of freedom. Method 4: cRM. By including the treatment main effect $\gamma_1^{(3)}$, model (3) allows the mean baseline weight to differ between the two arms. Liang and Zeger [8] proposed fixing $\gamma_1^{(3)}=0$ to force the treatment and control arms to have the same intercept; intuitively, cRM is more efficient than RM because it estimates one less parameter. Formally, we model the baseline and post-treatment weights jointly using the binary time factor T and the time by treatment interaction G × T: $Y_{ijt}=\gamma_0^{(4)}+\gamma_2^{(4)}T_t+\gamma_3^{(4)}G_{ij}T_t+\epsilon_{ijt}^{(4)}$. Interpretations of $\gamma_0^{(4)}$, $\gamma_2^{(4)}$, and $\gamma_3^{(4)}$ are the same as their counterparts in RM. The formulas of the GLS point estimator $\hat{\gamma}_{3,gls}^{(4)}$ and its variance are listed in Table 1. $\hat{\gamma}_{3,gls}^{(4)}$ is unbiased for τ asymptotically. The empirical or model-based variance estimate for cRM is derived using REML in the same way as for a regular RM model. Method 5: ANOVA-Change. We model the change score $\Delta_{ij}=Y_{ijt_1}-Y_{ijt_0}$ using the binary treatment indicator G as follows: $\Delta_{ij}=\beta_0^{(5)}+\beta_1^{(5)}G_{ij}+\epsilon_{ij}^{(5)}$, where $\epsilon_{ij}^{(5)}$ is i.i.d. random error. $\beta_0^{(5)}$ measures the mean change score in the control arm, and $\beta_1^{(5)}$ measures the treatment effect on the change score, $\tau_{\Delta}=(\mu_{1t_1}-\mu_{1t_0})-(\mu_{0t_1}-\mu_{0t_0})$. Since $\mu_{1t_0}=\mu_{0t_0}$ due to randomization at baseline, $\tau_{\Delta}$ reduces to τ. The closed-form expressions of $\hat{\beta}_{1,ols}^{(5)}$ and its variance are listed in Table 1. $\hat{\beta}_{1,ols}^{(5)}$ is derived as the sample mean difference in the change score between the two arms (“difference in differences”) and is unbiased for τ. The OLS model-based variance of $\hat{\beta}_{1,ols}^{(5)}$ assuming known residual variance $\sigma_{\epsilon^{(5)}}^2$ is given in Table 1; $\sigma_{\epsilon^{(5)}}^2$ is estimated by $\hat{\sigma}_{\epsilon^{(5)}}^2$, where $\hat{\Delta}_{ij}^{(5)}$ is the fitted value from model (5).
We let $\hat{\operatorname{var}}_{ols}(\hat{\beta}_{1,ols}^{(5)})$ denote the OLS model-based variance estimator with $\hat{\sigma}_{\epsilon^{(5)}}^2$ substituted for $\sigma_{\epsilon^{(5)}}^2$ (Table 1), which is reported by standard statistical software. Since $\sum_{j=0}^{1}\sum_{i=1}^{n_j}(G_{ij}-\overline{G}_{..})^2=\frac{n_0 n_1}{n_0+n_1}$, it follows that $\hat{\operatorname{var}}_{ols}(\hat{\beta}_{1,ols}^{(5)})=\left(\frac{1}{n_0}+\frac{1}{n_1}\right)\hat{\sigma}_{\epsilon^{(5)}}^2$. It is well established that $\hat{\sigma}_{\epsilon^{(5)}}^2$ is unbiased for $\sigma_{\epsilon^{(5)}}^2$, and thus the model-based variance estimator is unbiased for $\operatorname{var}(\hat{\beta}_{1,ols}^{(5)})$. The usual OLS model-based inference is valid for unconditional hypothesis testing.
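The closed-form point estimators for Methods 2 and 5 can be checked against each other on simulated data; the sketch below uses assumed simulation parameters (none of the numbers come from the paper's examples):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative homogeneous trial: (baseline, post) weights are bivariate
# normal with correlation rho; true treatment effect tau = -10.
n0, n1, tau = 500, 500, -10.0
mu_base, sd0, sd1, rho = 190.0, 12.0, 14.0, 0.7
cov = [[sd0**2, rho * sd0 * sd1], [rho * sd0 * sd1, sd1**2]]
ctrl = rng.multivariate_normal([mu_base, 186.0], cov, n0)
trt = rng.multivariate_normal([mu_base, 186.0 + tau], cov, n1)

pre = np.concatenate([ctrl[:, 0], trt[:, 0]])
post = np.concatenate([ctrl[:, 1], trt[:, 1]])
G = np.concatenate([np.zeros(n0), np.ones(n1)])

# Method 2 (ANCOVAI): regress post on G and the baseline score; the
# coefficient on G is the baseline-adjusted mean difference.
X = np.column_stack([np.ones_like(G), G, pre])
tau_ancova = np.linalg.lstsq(X, post, rcond=None)[0][1]

# Method 5 (ANOVA-Change): group mean difference in the change score.
delta = post - pre
tau_change = delta[G == 1].mean() - delta[G == 0].mean()

# Both estimators are unbiased for tau (generous simulation margin);
# ANCOVAI has the smaller sampling variance whenever sd0 != rho * sd1.
assert abs(tau_ancova - tau) < 3.0
assert abs(tau_change - tau) < 3.0
```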

When the study population is heterogeneous

Method 6: ANCOVAII. Different variance and covariance structures in the treatment and control arms suggest adding a baseline measurement by treatment interaction term to ANCOVA [2, 3, 9, 10]. To estimate τ using an interaction model, we first compute the mean centered baseline weight by subtracting the overall mean baseline weight from the individual baseline weights, i.e., $\tilde{Y}_{ijt_0}=Y_{ijt_0}-\overline{y}_{..t_0}$. We then model the post-treatment body weight using the binary treatment indicator G, the mean centered baseline weight $\tilde{Y}_{ijt_0}$, and the baseline weight by treatment interaction as follows: $Y_{ijt_1}=\beta_0^{(6)}+\beta_1^{(6)}G_{ij}+\beta_2^{(6)}\tilde{Y}_{ijt_0}+\beta_3^{(6)}G_{ij}\tilde{Y}_{ijt_0}+\epsilon_{ij}^{(6)}$, where the errors $\epsilon_{ij}^{(6)}$ are i.i.d. within each arm, with residual variances $\sigma_{\epsilon_0^{(6)}}^2$ and $\sigma_{\epsilon_1^{(6)}}^2$ in the control and treatment arms. $\beta_1^{(6)}$ measures the treatment effect, $\beta_2^{(6)}$ is the regression slope of the baseline body weight in the control arm, and $\beta_3^{(6)}$ measures the difference in the regression slopes of the baseline weight between the treatment and control arms. Model (6) is heteroscedastic because the error terms in the treatment and control arms have different residual variances. As presented in Table 2, the OLS estimator $\hat{\beta}_{1,ols}^{(6)}$ is the adjusted mean difference in the post-treatment body weights controlling for a weighted mean difference of the baseline body weights between the two arms, with unequal weighting coefficients for the two arms (i.e., $\hat{\beta}_{2,ols}^{(6)}+\hat{\beta}_{3,ols}^{(6)}$ for the treatment arm and $\hat{\beta}_{2,ols}^{(6)}$ for the control arm). $\hat{\beta}_{1,ols}^{(6)}$ is unbiased for τ. The conditional variance of $\hat{\beta}_{1,ols}^{(6)}$ incorporates the two different residual variances $\sigma_{\epsilon_0^{(6)}}^2$ and $\sigma_{\epsilon_1^{(6)}}^2$ (Table 2). Standard statistical software such as SAS does not output this conditional variance because OLS incorrectly assumes a common residual variance $\sigma_{\epsilon^{(6)}}^2$, which is a weighted average of $\sigma_{\epsilon_0^{(6)}}^2$ and $\sigma_{\epsilon_1^{(6)}}^2$.
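A sketch of fitting model (6) on simulated heterogeneous data follows; the arm-specific correlations and post-treatment variances are illustrative assumptions, not values from the paper. The coefficient on G after mean-centering the baseline estimates τ:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative heterogeneous trial: larger post-treatment variance and a
# weaker pre-post correlation in the treatment arm (rho0 > rho1, sd01 < sd11).
n0, n1, tau = 600, 600, -10.0
mu_base, sd_base = 190.0, 12.0
rho0, sd01 = 0.8, 13.0   # control arm
rho1, sd11 = 0.5, 18.0   # treatment arm

def simulate(n, mu_post, rho, sd_post):
    """Draw (baseline, post) pairs with the requested correlation."""
    base = rng.normal(mu_base, sd_base, n)
    post = (mu_post + rho * sd_post / sd_base * (base - mu_base)
            + rng.normal(0.0, sd_post * np.sqrt(1 - rho**2), n))
    return base, post

base0, post0 = simulate(n0, 186.0, rho0, sd01)
base1, post1 = simulate(n1, 186.0 + tau, rho1, sd11)

G = np.concatenate([np.zeros(n0), np.ones(n1)])
base = np.concatenate([base0, base1])
post = np.concatenate([post0, post1])

# Model (6): mean-center the baseline, then include the baseline-by-
# treatment interaction; beta1 (the coefficient on G) estimates tau.
base_c = base - base.mean()
X = np.column_stack([np.ones_like(G), G, base_c, G * base_c])
beta = np.linalg.lstsq(X, post, rcond=None)[0]
assert abs(beta[1] - tau) < 3.0  # unbiased for tau (generous margin)
```

Note that a valid variance for `beta[1]` would still require the heteroscedasticity-aware formulas of Table 2 (or a robust sandwich estimator), not the pooled OLS residual variance.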
Table 2

Estimators of treatment effect and variance estimators in a heterogeneous study population

ModelEstimator of treatment effect (τ)TypeTrue variance of treatment effect estimatorVariance estimator from OLS model
ANCOVA-Post II\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {\hat{\beta}}_{1, ols}^{(6)}=\left({\overline{y}}_{.1{t}_1}-\left({\hat{\beta}}_{2, ols}^{(6)}+{\hat{\beta}}_{3, ols}^{(6)}\right){\overline{\overset{\sim }{y}}}_{.1{t}_0}\right)-\left({\overline{y}}_{.0{t}_0}-{\hat{\beta}}_{2, ols}^{(6)}{\overline{\overset{\sim }{y}}}_{.0{t}_0}\right) $$\end{document}β^1,ols6=y¯.1t1β^2,ols6+β^3,ols6y~¯.1t0y¯.0t0β^2,ols6y~¯.0t0C

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(6)}|{\overset{\sim }{Y}}_{ij{t}_0}\right)=\left(\frac{1}{n_0}+\frac{{\overline{\overset{\sim }{y}}}_{.o{t}_0}^2}{\sum_{i=1}^{n_0}{\left(\tilde{y}_{{i}0{t}_0}-{\overline{\overset{\sim }{y}}}_{.0{t}_0}\right)}^2}\right)\ {\sigma}_{\epsilon_0^{(6)}}^2+\left(\frac{1}{n_1}+\frac{{\overline{\overset{\sim }{y}}}_{.1{t}_0}^2}{\sum_{i=1}^{n_0}{\left(\tilde{y}_{{i}1{t}_0}-{\overline{\overset{\sim }{y}}}_{.1{t}_0}\right)}^2}\right)\ {\sigma}_{\epsilon_1^{(6)}}^2 $$\end{document}varβ^1,ols6Y~ijt0=1n0+y~¯.ot02i=1n0y~i0t0y~¯.0t02σϵ062+1n1+y~¯.1t02i=1n0y~i1t0y~¯.1t02σϵ162

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \kern0.5em {\sigma}_{\epsilon_0^{(6)}}^2=\left(1-{\rho}_0^2\right){\sigma}_{01}^2 $$\end{document}σϵ062=1ρ02σ012, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \kern0.5em {\sigma}_{\epsilon_1^{(6)}}^2=\left(1-{\rho}_1^2\right){\sigma}_{11}^2 $$\end{document}σϵ162=1ρ12σ112

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(6)}|{\overset{\sim }{Y}}_{ij{t}_0}\right)=\left(\frac{1}{n_0}+\frac{1}{n_1}+\frac{{\overline{\overset{\sim }{y}}}_{.o{t}_0}^2}{\sum_{i=1}^{n_0}{\left(\tilde{y}_{{i}0{t}_0}-{\overline{\overset{\sim }{y}}}_{.0{t}_0}\right)}^2}+\frac{{\overline{\overset{\sim }{y}}}_{.1{t}_0}^2}{\sum_{i=1}^{n_0}{\left(\tilde{y}_{{i}1{t}_0}-{\overline{\overset{\sim }{y}}}_{.1{t}_0}\right)}^2}\right){\hat{\sigma}}_{\epsilon^{(6)}}^2 $$\end{document}var^olsβ^1,ols6Y~ijt0=1n0+1n1+y~¯.ot02i=1n0y~i0t0y~¯.0t02+y~¯.1t02i=1n0y~i1t0y~¯.1t02σ^ϵ62

$$ \hat{\sigma}_{\epsilon^{(6)}}^2=\frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j}\left(y_{ijt_1}-\hat{y}_{ijt_1}\right)^2}{n_0+n_1-5} $$

$$ \operatorname{var}\left(\hat{\beta}_{1,ols}^{(6)}\right)=\frac{1}{n_0}\left(1-\rho_0^2\right)\sigma_{01}^2+\frac{1}{n_1}\left(1-\rho_1^2\right)\sigma_{11}^2+\left(\rho_1\frac{\sigma_{11}}{\sigma_0}-\rho_0\frac{\sigma_{01}}{\sigma_0}\right)^2\frac{\sigma_0^2}{n_0+n_1} $$
ANCOVA I:
$$ \hat{\beta}_{1,ols}^{(7)}=\left(\bar{y}_{.1t_1}-\bar{y}_{.0t_1}\right)-\hat{\beta}_{2,ols}^{(7)}\left(\bar{y}_{.1t_0}-\bar{y}_{.0t_0}\right) $$

$$ \operatorname{var}\left(\hat{\beta}_{1,ols}^{(7)}\mid Y_{ijt_0}\right)=\left(\frac{1}{n_0}+\frac{\sum_{i=1}^{n_0}\left(y_{i0t_0}-\bar{y}_{.0t_0}\right)^2\left(\bar{y}_{.1t_0}-\bar{y}_{.0t_0}\right)^2}{\left[\sum_{j=0}^{1}\sum_{i=1}^{n_j}\left(y_{ijt_0}-\bar{y}_{.jt_0}\right)^2\right]^2}\right)\sigma_{\epsilon_0^{(7)}}^2+\left(\frac{1}{n_1}+\frac{\sum_{i=1}^{n_1}\left(y_{i1t_0}-\bar{y}_{.1t_0}\right)^2\left(\bar{y}_{.1t_0}-\bar{y}_{.0t_0}\right)^2}{\left[\sum_{j=0}^{1}\sum_{i=1}^{n_j}\left(y_{ijt_0}-\bar{y}_{.jt_0}\right)^2\right]^2}\right)\sigma_{\epsilon_1^{(7)}}^2 $$

$$ \sigma_{\epsilon_0^{(7)}}^2=\left(1-\rho_0^2\right)\sigma_{01}^2,\qquad \sigma_{\epsilon_1^{(7)}}^2=\left(1-\rho_1^2\right)\sigma_{11}^2 $$

$$ \hat{\operatorname{var}}_{ols}\left(\hat{\beta}_{1,ols}^{(7)}\mid Y_{ijt_0}\right)=\left(\frac{1}{n_0}+\frac{1}{n_1}+\frac{\left(\bar{y}_{.1t_0}-\bar{y}_{.0t_0}\right)^2}{\sum_{j=0}^{1}\sum_{i=1}^{n_j}\left(y_{ijt_0}-\bar{y}_{.jt_0}\right)^2}\right)\hat{\sigma}_{\epsilon^{(7)}}^2 $$

$$ \hat{\sigma}_{\epsilon^{(7)}}^2=\frac{\sum_{j=0}^{1}\sum_{i=1}^{n_j}\left(y_{ijt_1}-\hat{y}_{ijt_1}\right)^2}{n_0+n_1-4} $$

$$ \operatorname{var}\left(\hat{\beta}_{1,ols}^{(7)}\right)=\frac{1}{n_0}\left[\left(1-\rho_0^2\right)\sigma_{01}^2+\left(\left(\rho_1\frac{\sigma_{11}}{\sigma_0}-\rho_0\frac{\sigma_{01}}{\sigma_0}\right)p_1\right)^2\sigma_0^2\right]+\frac{1}{n_1}\left[\left(1-\rho_1^2\right)\sigma_{11}^2+\left(\left(\rho_1\frac{\sigma_{11}}{\sigma_0}-\rho_0\frac{\sigma_{01}}{\sigma_0}\right)p_0\right)^2\sigma_0^2\right] $$
cRM:
$$ \hat{\gamma}_{3,gls}^{(4)}=\left(\bar{y}_{.1t_1}-\bar{y}_{.0t_1}\right)-\left(\frac{\rho_0\sigma_0\sigma_{01}}{\sigma_0^2}\left(\bar{y}_{.1t_0}-\bar{y}_{..t_0}\right)-\frac{\rho_1\sigma_0\sigma_{11}}{\sigma_0^2}\left(\bar{y}_{.0t_0}-\bar{y}_{..t_0}\right)\right) $$

$$ \operatorname{var}\left(\hat{\gamma}_{3,gls}^{(4)}\right)=\frac{1}{n_0}\left[\left(1-\rho_0^2\right)\sigma_{01}^2+\left(\left(\rho_1\frac{\sigma_{11}}{\sigma_0}-\rho_0\frac{\sigma_{01}}{\sigma_0}\right)p_1\right)^2\sigma_0^2\right]+\frac{1}{n_1}\left[\left(1-\rho_1^2\right)\sigma_{11}^2+\left(\left(\rho_1\frac{\sigma_{11}}{\sigma_0}-\rho_0\frac{\sigma_{01}}{\sigma_0}\right)p_0\right)^2\sigma_0^2\right] $$
Estimators of treatment effect and variance estimators in a heterogeneous study population: We let var(β̂₁,ols⁽⁶⁾ | Ỹ_ijt₀) denote the OLS model-based conditional variance of β̂₁,ols⁽⁶⁾, which incorporates the assumed common residual variance σ²_ε⁽⁶⁾ (Table 2). Since σ²_ε⁽⁶⁾ is generally unknown, it is estimated by σ̂²_ε⁽⁶⁾, where ŷ_ijt₁ is the predicted value of y_ijt₁. We let var̂_ols(β̂₁,ols⁽⁶⁾ | Ỹ_ijt₀) denote the OLS model-based variance estimator with σ̂²_ε⁽⁶⁾ substituted for σ²_ε⁽⁶⁾ and the baseline scores treated as known constants (Table 2). This estimator is what standard statistical software reports (e.g., "proc reg" in SAS). To assess the validity of the model-based standard errors and p-values from a regular ANCOVA II model for unconditional inference, we need to examine: i) whether the model-based conditional variance is unbiased for the true conditional variance; and ii) whether the conditional variance is unbiased for the unconditional variance var(β̂₁,ols⁽⁶⁾). First, the unbiasedness of the σ̂²_ε⁽⁶⁾-based conditional variance depends on the relationship between the arm sizes and the arm-specific residual variances. It can be shown that in a balanced design (n₀ = n₁) the model-based conditional variance is nearly unbiased for the true conditional variance. When the design is unbalanced (n₀ ≠ n₁), it is biased: due to heteroscedasticity, it over-estimates the true conditional variance if the group with the larger residual variance has the larger sample size, and otherwise may under-estimate it [3, 4]. Second, the common mean baseline weight is generally unknown and must be estimated by the overall sample mean ȳ_..t₀, but ANCOVA treats it as fixed and fails to capture this additional variability in the conditional variances. As a result, the conditional variance under-estimates the unconditional variance by a term reflecting the sampling variability of ȳ_..t₀ [3]. Thus, the OLS model-based conditional inference is biased for unconditional hypothesis testing because of heteroscedasticity and the neglected sampling variability in ȳ_..t₀. To fix these two problems, we can replace var̂_ols with an adjusted heteroscedasticity-consistent (HC) variance estimator for valid unconditional inference, where the HC component is a heteroscedasticity-consistent estimator of the conditional variance [19] that can be output from standard software. HC variance estimators are consistent (i.e., unbiased in large samples).
Among all available HC variance estimators, HC2 was shown to have the best performance in finite samples [3, 4] (e.g., "HCCMETHOD = 2" in proc reg or "EMPIRICAL" in proc mixed, SAS). The adjustment term uses the OLS estimator of the interaction coefficient and the overall sample variance of the baseline body weight. It follows directly that the adjusted HC variance estimator is asymptotically unbiased for the unconditional variance, and we can construct a valid test of H₀: τ = 0 unconditionally. Method 7 (ANCOVA I): We model the post-treatment weight using the binary treatment G and the baseline weight, with separate random error terms in the control and treatment arms. Since these errors have different variances in general, model (7) is heteroscedastic, and the severity of heteroscedasticity is determined by the correlation coefficients, the variances of the post-treatment weights in the two arms, and whether the design is balanced. As shown in Table 2, the OLS estimator β̂₁,ols⁽⁷⁾ is an adjusted mean difference in the post-treatment weights, controlling for the mean difference of the baseline weights between the two arms with an equal weighting coefficient β̂₂,ols⁽⁷⁾ for the treatment and control arms. β̂₁,ols⁽⁷⁾ is unbiased for τ. The true conditional variance incorporates two different residual variances. Similar to ANCOVA II, the OLS model-based inference for ANCOVA I also mistakenly assumes a constant residual variance σ²_ε⁽⁷⁾, which is a weighted average of the two arm-specific residual variances. Since σ²_ε⁽⁷⁾ is unknown, it is estimated by σ̂²_ε⁽⁷⁾, where ŷ_ijt₁ is the predicted value of y_ijt₁ from model (7). The closed-form expressions of the OLS model-based conditional variance incorporating σ²_ε⁽⁷⁾ and of the OLS model-based variance estimator with σ̂²_ε⁽⁷⁾ substituted for σ²_ε⁽⁷⁾ are given in Table 2. Recall that standard statistical software reports the model-based estimator. To show that the model-based standard errors and p-values are valid for unconditional inference, we again need to examine: i) whether the model-based conditional variance is unbiased for the true conditional variance; and ii) whether the conditional variance is unbiased for the unconditional variance. As before, the unbiasedness of the σ̂²_ε⁽⁷⁾-based conditional variance depends on the relationship between the arm sizes and the arm-specific residual variances.
Asymptotically, when sample sizes are equal between the two arms, the model-based conditional variance is nearly unbiased for the true conditional variance in a balanced design [3]. When sample sizes are not equal between the two arms, it follows directly that the model-based conditional variance is biased due to heteroscedasticity: it may over-estimate the true conditional variance when the group with the larger residual variance has the larger sample size, and otherwise may under-estimate it [3, 4]. Thus, ANCOVA I is robust against heteroscedasticity in a balanced design, but not in an unbalanced design. Second, different from ANCOVA II, the conditional variance is unbiased for the unconditional variance because the ANCOVA I estimator does not involve the overall mean baseline weight. Thus, the model-based standard errors and p-values are valid for unconditional inference in a balanced design, but are biased in an unbalanced design solely because of heteroscedasticity. This bias can be easily corrected by replacing the model-based variance estimator with an HC variance estimator [4, 19], and the corrected ANCOVA I will provide valid unconditional inference. cRM: We model the baseline and post-treatment weights jointly using the binary time point T and the time-by-treatment interaction G × T. Noting that subjects in the treatment and control arms have different variance-covariance structures for the association between the pre- and post-treatment weights, we fit a heterogeneous-variance GLS model with a group-specific variance-covariance structure ("repeated/group=" in the SAS proc mixed procedure specifies a distinct variance-covariance structure for each treatment arm). The formulas of the cRM estimator and its variance are listed in Table 2. The GLS estimator γ̂₃,gls⁽⁴⁾ is asymptotically unbiased for τ. REML is used to derive the empirical or model-based variance estimator.
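To make the ANCOVA I point estimator concrete, the sketch below computes the closed form β̂₁⁽⁷⁾ = (ȳ.1t₁ − ȳ.0t₁) − β̂₂⁽⁷⁾(ȳ.1t₀ − ȳ.0t₀), using the fact that the OLS slope β̂₂⁽⁷⁾ equals the pooled within-arm regression slope of the post-treatment score on the baseline score. The toy data and function names are illustrative, not taken from the paper's SAS programs:

```python
# Sketch: closed-form ANCOVA I treatment effect estimate.
# beta1_hat = (post-treatment mean diff) - beta2_hat * (baseline mean diff),
# where beta2_hat is the pooled within-arm OLS slope of post on baseline.

def mean(xs):
    return sum(xs) / len(xs)

def ancova1_effect(base0, post0, base1, post1):
    """Adjusted mean difference (treatment - control) from ANCOVA I."""
    b0m, p0m = mean(base0), mean(post0)
    b1m, p1m = mean(base1), mean(post1)
    # pooled within-arm slope: within-group cross-products over sums of squares
    sxy = (sum((x - b0m) * (y - p0m) for x, y in zip(base0, post0))
           + sum((x - b1m) * (y - p1m) for x, y in zip(base1, post1)))
    sxx = (sum((x - b0m) ** 2 for x in base0)
           + sum((x - b1m) ** 2 for x in base1))
    beta2 = sxy / sxx
    return (p1m - p0m) - beta2 * (b1m - b0m)

# Toy example: identical baselines, treatment adds exactly 2 to each post score,
# so the adjusted difference is exactly 2.
effect = ancova1_effect([1, 2, 3], [1, 2, 3], [1, 2, 3], [3, 4, 5])
```

Because randomization balances baselines only on average, the second term corrects the raw post-treatment difference for any chance baseline imbalance.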

Results

All treatment effect estimators, except the ANOVA estimator, can be expressed as the mean difference in post-treatment measurements adjusted in some way for the chance imbalance in the baseline measurement between the two arms. Nonetheless, all estimators are unbiased for τ. To compare these competing methods, we evaluate the efficiency of the point estimators of treatment effect by comparing their "unconditional" variances. Since the hypothesis test of no treatment effect divides the point estimator by its standard error (i.e., the square root of its estimated variance) and rejects the null hypothesis when this ratio exceeds a given threshold, the method that produces an unbiased point estimate with the smallest unconditional variance is preferred: the standard error in the denominator of the test statistic determines the statistical power.

When study population is homogeneous

ANCOVA I is a more efficient alternative to ANOVA because its treatment effect estimator has a smaller unconditional variance (Table 1). This advantage of ANCOVA over ANOVA can also be seen from the fact that the residual error variance of ANCOVA I is less than that of ANOVA. As the correlation coefficient ρ becomes larger, the ANCOVA I estimator has smaller variance. Since the baseline and post-treatment scores are highly correlated in general, including the baseline score in ANCOVA I explains away some variability in the post-treatment score, reduces the residual variance, and thus yields a more efficient estimator of treatment effect than ANOVA. ANOVA-Change and RM have exactly the same point estimators of τ and thus the same variances (Table 1). To compare ANOVA-Change or RM with ANOVA, we can derive the difference Δ₁ between the unconditional variances of their treatment effect estimators. When ρ < σ₀/(2σ₁) (i.e., ρ < 1/2 when the baseline and post-treatment variances are equal), Δ₁ > 0 and ANOVA outperforms ANOVA-Change and RM because its estimator has smaller variance; when ρ > σ₀/(2σ₁), Δ₁ < 0 and ANOVA underperforms the other two methods. It can be shown that the differences between the unconditional variances of the ANOVA-Change or RM estimators and those of the ANCOVA I or cRM estimators are always nonnegative. Thus, ANOVA-Change and RM are less efficient than either ANCOVA I or cRM because their estimators have larger variances. Intuitively, ANCOVA I and cRM exploit the fact that the mean baseline weights in the two arms are equal in a randomized study, whereas ANOVA-Change and RM allow a baseline difference and must estimate an extra parameter. As shown in Table 1, the ANCOVA I and cRM estimators of τ are equivalent because their baseline adjustment weights coincide. However, ANCOVA I plugs in the OLS estimators, whereas cRM plugs in the REML estimators of the variance and covariance parameters; the numerical difference between the two becomes negligible as sample size increases. Because of this equivalence, the unconditional variances of the two estimators are equal [3]. As discussed previously, ANCOVA I is a conditional model assuming fixed baseline covariates. Even though the model-based variance estimates are conditional, they are unbiased for the unconditional variance, and thus the usual model-based conditional inference is still valid for unconditional hypothesis testing. ANCOVA I performs comparably to cRM [3, 17].
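These orderings can be checked numerically. The sketch below uses standard large-sample unconditional variance expressions for a homogeneous population, simplified to a common variance σ² at both time points and n subjects per arm; these simplified forms are an assumption for illustration, not a transcription of the paper's Table 1:

```python
# Large-sample unconditional variances of the treatment effect estimator
# (homogeneous population, common variance sigma2 at both time points,
# pre-post correlation rho, n subjects per arm).

def var_anova(sigma2, rho, n):
    return 2.0 * sigma2 / n                      # ignores the baseline entirely

def var_change(sigma2, rho, n):
    return 2.0 * sigma2 * (2.0 - 2.0 * rho) / n  # ANOVA-Change / RM

def var_ancova(sigma2, rho, n):
    return 2.0 * sigma2 * (1.0 - rho ** 2) / n   # ANCOVA I / cRM

sigma2, n = 15.0 ** 2, 90
for rho in (0.3, 0.5, 0.9):
    va = var_anova(sigma2, rho, n)
    vc = var_change(sigma2, rho, n)
    vy = var_ancova(sigma2, rho, n)
    # ANCOVA is never worse than either alternative at any rho
    assert vy <= min(va, vc)
```

With equal variances the change score beats ANOVA only when ρ > 1/2 (at ρ = 0.3 it is worse, at ρ = 0.9 it is better), while ANCOVA dominates both throughout, matching the comparison described above.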

When study population is heterogeneous

A heterogeneous study population justifies the inclusion of a treatment by baseline weight interaction term. Thus, ANCOVA II is the correctly specified model, whereas ANCOVA I is mis-specified. In this case, the "conditional" treatment effect is not constant across values of the baseline weight. The "marginal" treatment effect τ is simply the average of the conditional treatment effect over the distribution of the baseline weight and measures an overall treatment effect. As shown previously, both ANCOVA models can be used to estimate τ even though ANCOVA I is mis-specified. What, then, is the advantage of using the more complex interaction model over the main effect model? It turns out that the ANCOVA II estimator is more efficient than the ANCOVA I estimator [5]; only in a balanced design do their unconditional variances coincide, and the two ANCOVA models then perform comparably. Note that the OLS model-based variance estimates for ANCOVA I and II are both biased for the corresponding unconditional variances, but the HC variance estimators provide simple fixes. The ANCOVA II and cRM estimators of τ are equivalent (Table 2). The two methods differ only in how the variance and covariance parameters are estimated: ANCOVA II plugs in the OLS estimators, whereas cRM plugs in the REML estimators. The numerical difference between the ANCOVA II and cRM estimators becomes smaller as sample size increases. As discussed previously, standard statistical software such as SAS does not output the unconditional variance for ANCOVA II directly, and the usual OLS model-based standard errors and p-values are biased for unconditional inference in the heterogeneous scenario. The adjusted HC variance estimator fixes this bias. Corrected ANCOVA II provides valid unconditional inference and performs comparably to cRM. Another alternative for estimating the variances of the ANCOVA I and II estimators is the bootstrap [20].
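The efficiency comparison between the two ANCOVA estimators can be illustrated by evaluating their unconditional variance expressions directly (following the closed forms discussed in the Methods section; treat the exact expressions here as a sketch rather than a transcription of Table 2):

```python
# Unconditional variances of the ANCOVA II and ANCOVA I estimators in a
# heterogeneous population. b_j = rho_j * sigma_j1 / sigma_0 is arm j's
# regression slope of post-treatment on baseline weight.

def var_ancova2(n0, n1, rho0, rho1, s0, s01, s11):
    b0, b1 = rho0 * s01 / s0, rho1 * s11 / s0
    return ((1 - rho0 ** 2) * s01 ** 2 / n0
            + (1 - rho1 ** 2) * s11 ** 2 / n1
            + (b1 - b0) ** 2 * s0 ** 2 / (n0 + n1))

def var_ancova1(n0, n1, rho0, rho1, s0, s01, s11):
    b0, b1 = rho0 * s01 / s0, rho1 * s11 / s0
    p0, p1 = n0 / (n0 + n1), n1 / (n0 + n1)
    return ((1 / n0) * ((1 - rho0 ** 2) * s01 ** 2 + ((b1 - b0) * p1) ** 2 * s0 ** 2)
            + (1 / n1) * ((1 - rho1 ** 2) * s11 ** 2 + ((b1 - b0) * p0) ** 2 * s0 ** 2))

# Parameters mimicking the heterogeneous data example: sigma_0 = 14,
# sigma_01 = sigma_11 = 15, rho_0 = 0.9, rho_1 = 0.7.
args = (0.9, 0.7, 14.0, 15.0, 15.0)
balanced = (var_ancova1(90, 90, *args), var_ancova2(90, 90, *args))
unbalanced = (var_ancova1(60, 120, *args), var_ancova2(60, 120, *args))
```

Algebraically the difference var_ancova1 − var_ancova2 is proportional to (n₀ − n₁)², so the two coincide exactly in the balanced design and ANCOVA II is strictly more efficient otherwise, which is the behavior described above.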

Data example

No human data were used in this study. Instead, we simulated three weight loss trial data sets based on a published study [21], covering three scenarios: homogeneous data, and heterogeneous data with balanced and unbalanced designs. The baseline weights for the control and treatment arms were generated from a normal distribution with mean 88 kg and standard deviation 14 kg. Weights at 6 months after treatment in the control arm have mean 86 kg and standard deviation 15 kg, a ~2.3% decrease from baseline. The mean and standard deviation of body weight at the sixth month in the treatment arm are 83 kg and 15 kg, respectively, corresponding to a 5.7% decrease from baseline. In the homogeneous data, the correlation coefficient between the pre- and post-treatment weights is 0.9, and one hundred eighty subjects were assigned equally to the treatment and control arms. In the heterogeneous data, the correlation coefficient between the pre- and post-treatment weights is 0.9 in the control arm and 0.7 in the treatment arm. Sample sizes are (n0 = 90, n1 = 90) for the balanced design and (n0 = 60, n1 = 120) for the unbalanced design. We analyzed the data examples using the methods outlined in the Methods section. The results are reported in Table 3 (SAS programs are provided in Additional file 1).
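The data-generating mechanism just described can be sketched in a few lines. The seed and helper names below are illustrative, not taken from the paper's SAS programs:

```python
import random

def gen_arm(n, mu_pre, sd_pre, mu_post, sd_post, rho, rng):
    """Simulate (baseline, post-treatment) weight pairs from a bivariate normal."""
    pre, post = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mu_pre + sd_pre * z1
        # post given baseline: correlated component plus independent noise
        y = mu_post + sd_post * (rho * z1 + (1 - rho ** 2) ** 0.5 * z2)
        pre.append(x)
        post.append(y)
    return pre, post

rng = random.Random(2021)
# Homogeneous scenario: rho = 0.9 in both arms, 90 subjects per arm.
ctrl_pre, ctrl_post = gen_arm(90, 88, 14, 86, 15, 0.9, rng)
trt_pre, trt_post = gen_arm(90, 88, 14, 83, 15, 0.9, rng)
```

The heterogeneous scenarios follow the same construction with rho = 0.7 in the treatment arm and, for the unbalanced design, arm sizes 60 and 120.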
Table 3

Statistical analysis of the three simulated data examples

Scenario                        Method        Estimate   Standard error   p-value
Homogeneous                     ANOVA         −3.089     2.106            0.144
                                ANCOVAI       −2.422     0.955            0.0121
                                ANOVA-Change  −2.354     0.971            0.0163
                                RM            −2.354     0.971            0.0163
                                cRM           −2.434     0.944            0.0108
Heterogeneous (n0 = 90, n1 = 90)   ANCOVAI    −3.203     1.403a           0.0235
                                                         1.397b           0.0231
                                                         1.400d           n/a
                                   ANCOVAII   −3.165     1.333a           0.0187
                                                         1.402c           0.0252
                                                         1.397d           n/a
                                   cRM        −3.203     1.405            0.0241
Heterogeneous (n0 = 60, n1 = 120)  ANCOVAI    −3.416     1.415a           0.0167
                                                         1.279b           0.0083
                                                         1.281d           n/a
                                   ANCOVAII   −3.399     1.376a           0.0145
                                                         1.258c           0.0076
                                                         1.260d           n/a
                                   cRM        −3.396     1.262            0.0078

aOLS regression model-based standard error
bHC standard error for the ANCOVA I (main effect) model
cModified HC standard error for the ANCOVA II (interaction) model
dBootstrap standard error (n = 5000)

In the first data example, ANOVA produced the largest standard error and the largest p-value. ANOVA-Change and RM both outperformed ANOVA with much smaller standard errors and p-values, and ANCOVA I and cRM in turn outperformed ANOVA-Change and RM. Although ANCOVA I and cRM are equivalent when the sample size is large, there are still minor numerical differences between the two in finite samples. For the second data example, with a balanced design, Fig. 2a shows a strong baseline weight by treatment interaction. Both ANCOVA I and II have heteroscedastic errors by treatment arm (Fig. 2b and c). As shown in Table 3, the OLS model-based standard error of ANCOVA I is very similar to its HC and bootstrap standard errors; thus, heteroscedasticity does not bias the model-based standard error of ANCOVA I here. Although ANCOVA II is robust against heteroscedasticity in the balanced design, the OLS model-based standard error of ANCOVA II (s.e. = 1.333) is still not correct, because OLS fails to account for the variability of estimating the overall mean baseline weight. The adjusted HC standard error for ANCOVA II is 1.402, which is closer to the model-based and HC standard errors of ANCOVA I. The bootstrap standard errors for ANCOVA I and II are close to their HC or adjusted HC standard errors, which suggests that the HC and adjusted HC variances perform well in estimating the unconditional variances. The cRM estimate and its standard error are close to those from ANCOVA I and II.
Fig. 2

Diagnosis plots of ANCOVA main and interaction models in heterogeneous scenario. a Scatter plot of baseline and follow-up weights in balanced design. Black and red solid dots are data points in the treatment and control arms. Black and red solid lines are the regression slopes of baseline weight against follow-up weight in the treatment and control arms. b Boxplot of residuals from the treatment and control arms from I model in balanced design; c Boxplot of residuals from the treatment and control arms from II model in balanced design; d Scatter plot of baseline and follow-up weights in unbalanced design. Black and red solid dots are data points in the treatment and control arms. Black and red solid lines are the regression slopes of baseline weight against follow-up weight in the treatment and control arms. e Boxplot of residuals from the treatment and control arms from I model in an unbalanced design; f Boxplot of residuals from the treatment and control arms from II model in an unbalanced design

For the third example, with an unbalanced design, Fig. 2d also reveals a baseline weight by treatment interaction. Both ANCOVA models have heteroscedastic errors by treatment arm (Fig. 2e and f). The model-based standard errors of ANCOVA I and II are not valid: they were larger than the HC standard errors and thus overestimated the true conditional variances. Compared with ANCOVA I, ANCOVA II has a smaller HC standard error (and a smaller p-value) and is thus slightly more efficient. The adjusted HC standard error for ANCOVA II is very close to the model-based standard error for cRM. The bootstrap standard errors for ANCOVA I and II are very close to their HC or adjusted HC standard errors.

Discussion

In this study we compared the efficiency of six unbiased methods for analyzing pre-post designs. We found that ANCOVA and cRM are equally the most efficient methods among the alternatives in both homogeneous and heterogeneous scenarios. We focus on the scenario in which randomization is properly performed, so the competing methods all target the same causal quantity. When the treatment is not properly randomized or not randomized at all (e.g., in an observational study), the baseline score will not be balanced by design, and the competing methods may target different causal quantities. The debate over change-score analysis (or RM) versus ANCOVA in the non-randomized setting, generally known as Lord's paradox, is a well-known example [22, 23]. The majority of previous studies have examined only a homogeneous study population. In that setting, ANOVA is one of the least efficient approaches for analyzing pre-post designs because it does not use any baseline information. ANOVA-Change and RM incorporate the baseline score as part of the outcome, whereas ANCOVA I includes the baseline score as a covariate. ANCOVA I outperforms ANOVA-Change and RM because it exploits the fact that the baseline scores are balanced between the two arms in a randomized study; the change score is thus a less efficient use of the baseline score than including it as a covariate. Since we seldom can control the values of the baseline score in randomized trials, the OLS assumption that the baseline score is fixed casts doubt on the validity of ANCOVA for hypothesis testing [6, 12]. Crager proved that ANCOVA I is valid for unconditional inference in the homogeneous scenario [6]. This conclusion follows from the fact that the conditional variance of the ANCOVA I estimator is an unbiased estimate of its unconditional variance [3]. A few studies have investigated the heterogeneous scenario further [3, 4, 10, 12, 24].
Although the heterogeneity justifies including the baseline measurement by treatment interaction term, ANCOVA I and II are both unbiased. Yang and Tsiatis showed that the ANCOVA II estimator has a smaller unconditional variance than that of ANCOVA I except in a balanced design [9]. However, the OLS model-based variances of the ANCOVA I and II estimators, as reported by standard statistical software, are conditional variances, not unconditional variances. The OLS model-based standard errors and associated p-values for ANCOVA II are generally questionable for unconditional inference, and the model-based inference for ANCOVA I is biased only when the design is unbalanced [3, 4, 10, 24]. With the corrected HC variance estimators, both models provide valid unconditional inference. Choosing between ANCOVA I and II then becomes a trade-off between simplicity and some gain in efficiency. In the homogeneous setting, cRM was suggested as superior to ANCOVA I because the unconditional variance of the cRM estimator is smaller than the conditional variance of the ANCOVA I estimator [25]. Kenward et al. pointed out that such a direct comparison between conditional and unconditional variances is not meaningful: since the two estimators are equivalent, cRM coupled with REML and the Kenward-Roger adjustment performs almost identically to ANCOVA I in finite samples [17]. In the heterogeneous scenario, cRM is comparable to ANCOVA II [3]. In the presence of missing data, applied researchers often prefer cRM over ANCOVA because it can use all observed data, whereas ANCOVA uses only complete cases. However, imputation methods that exploit the strong pre-post correlation, such as weighting and regression imputation, can improve the statistical power of ANCOVA without biasing estimates, making it comparable to cRM [17]. Furthermore, ANCOVA has several advantages over cRM. First, an outcome should be a variable that can be influenced by the treatment, and the baseline measurement is certainly not an outcome by this definition.
It is conceptually more appropriate to include the baseline score as a covariate than to model it as an outcome [5]. Second, it is very convenient to include other baseline variables in a regression model for more efficient estimates of treatment effect. Third, it is easy to adjust for other patterns of heteroscedastic errors in an OLS regression. For example, we may expect larger variability in the post-treatment weights for subjects with larger baseline weights; cRM cannot handle this more complex type of heteroscedasticity easily, whereas HC variance estimators for ANCOVA are simple fixes readily implemented in statistical software.

Conclusion

Compared with the alternative methods, ANCOVA is a simple and the most efficient approach for analyzing a pre-post randomized design. When there is a baseline score by treatment interaction, we need to assess the heteroscedasticity of ANCOVA, particularly when the design is not balanced; HC variances should be used for valid inference when heteroscedasticity is present. Adding an interaction term in ANCOVA can gain some efficiency, and omitting it does not bias the results. Additional file 1.