The use of percentage change from baseline as an outcome in a controlled trial is statistically inefficient: a simulation study.

Abstract

BACKGROUND: Many randomized trials involve measuring a continuous outcome - such as pain, body weight or blood pressure - at baseline and after treatment. In this paper, I compare four possibilities for how such trials can be analyzed: post-treatment score alone; change between baseline and post-treatment; percentage change between baseline and post-treatment; and analysis of covariance (ANCOVA) with baseline score as a covariate. The statistical power of each method was determined for a hypothetical randomized trial under a range of correlations between baseline and post-treatment scores.
RESULTS: ANCOVA has the highest statistical power. Change from baseline has acceptable power when correlation between baseline and post-treatment scores is high; when correlation is low, analyzing only post-treatment scores has reasonable power. Percentage change from baseline has the lowest statistical power and is highly sensitive to changes in variance. Theoretical considerations suggest that percentage change from baseline will also fail to protect from bias in the case of baseline imbalance and will lead to an excess of trials with non-normally distributed outcome data.
CONCLUSIONS: Percentage change from baseline should not be used in statistical analysis. Trialists wishing to report this statistic should use another method, such as ANCOVA, and convert the results to a percentage change by using mean baseline scores.

Year: 2001 PMID: 11459516 PMCID: PMC34605 DOI: 10.1186/1471-2288-1-6

Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615

Background

Many randomized trials involve measuring a continuous outcome at baseline and after treatment. Typical examples include trials of pravastatin for hypercholesterolemia [1], exercise and diet for obesity in osteoarthritis patients [2] and acupuncture for pain in athletes with shoulder injuries [3]. In each trial, the outcome measure used to determine the effectiveness of treatment - cholesterol, body weight or shoulder pain - was measured both before treatment had started and after it was complete. Where there is a single post-treatment outcome assessment, there are four ways in which such data can be entered into the statistical analysis. One can use the baseline score solely to ensure baseline comparability and enter only the post-treatment score into analysis (I will describe this method as "POST"). Alternatively, one can analyze the change from baseline, either by looking at absolute differences ("CHANGE") or a percentage change from baseline ("FRACTION"). The most sophisticated method is to construct a regression model which adjusts the post-treatment score by the baseline score ("ANCOVA"). Figure 1 describes each of these methods in mathematical terms. Figure 2 gives examples of the results of each method described in ordinary language.
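The figures themselves are not reproduced in this text. As a rough guide only, the four methods can be written as follows. The notation is an assumption introduced here for illustration, not taken from the paper: x_i is the baseline score of patient i, y_i the post-treatment score, and g_i a group indicator (1 for treatment, 0 for control). The sign convention for change (baseline minus post-treatment) is likewise assumed.

```latex
% Assumed notation: x_i = baseline, y_i = post-treatment, g_i = group indicator.
\begin{align*}
\text{POST:}     &\quad \text{compare } y_i \text{ between groups} \\
\text{CHANGE:}   &\quad \text{compare } c_i = x_i - y_i \text{ between groups} \\
\text{FRACTION:} &\quad \text{compare } f_i = 100 \times \frac{x_i - y_i}{x_i} \text{ between groups} \\
\text{ANCOVA:}   &\quad y_i = \beta_0 + \beta_1 x_i + \beta_2 g_i + \varepsilon_i,
                   \quad \text{test } \beta_2 = 0
\end{align*}
```

In the first three methods the comparison between groups would typically be a t-test; in ANCOVA the treatment effect is the coefficient on the group indicator.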

Figure 1. Mathematical description of the four methods

Figure 2. Examples of the results of a trial analyzed by each method in ordinary language terms

Some trials assess outcome several times after treatment, a design known as "repeated measures." Each of the four methods described above can be used to analyze such trials by using a summary statistic such as a mean or an "area-under-curve" [4]. There are several more complex methods of analyzing such data including repeated measures analysis of variance and generalized linear estimation [5]. These methods are of particular value when the post-treatment scores have a predictable course over time (e.g. quality of life in late stage cancer patients) or when it is important to assess interactions between treatment and time (e.g. long-term symptomatic medication). This paper will concentrate on the simpler case where time is not an important independent variable.

The choice of which method to use can be determined by analysis of the statistical properties of each. An important criterion for a good statistical method is that it should reduce the rate of false negatives (β). The β of a statistical test is usually expressed in terms of statistical power (1-β). When a trial is designed, power is normally fixed, typically at 0.8 or 0.9, and the required amount of data (e.g. number of evaluable patients) is calculated. A method that requires fewer data to provide a given level of statistical power is described as efficient.

The characteristics of the four methods - POST, CHANGE, FRACTION and ANCOVA - have been studied by statisticians for some time [6, 7, 8]. In this paper, I aim to provide statistical data that can guide clinical research yet is readily comprehensible by non-statisticians. Accordingly, I will compare the methods using a hypothetical trial and express results in terms of statistical power.

Methods

All calculations and simulations were conducted using the statistical software Stata 6.0 (Stata Corp., College Station, Texas). I created a hypothetical pain trial with patients divided evenly between a treatment and a control group. The pain score for any individual patient was sampled from a normal distribution. The mean score at baseline was 50 mm on a visual analog scale of pain (VAS); after treatment, mean pain was expected to be 50 mm in controls and 45 mm in treated patients. The standard deviation of all scores was 10 mm. The text of the simulation is given in the appendix (appendix.doc).

I calculated the statistical power of the different methods of analysis for this trial given a sample size of 100 patients. As power varies according to the correlation between baseline and follow-up scores, a range of possible correlations was used. The power for POST, CHANGE and ANCOVA was calculated using the "sampsi" function of Stata, which derives power analytically using formulae developed by Frison and Pocock [6]. The power for FRACTION was calculated by the simulation described above. The simulation was first validated against Stata's results for POST and CHANGE at a correlation of 0.5. It was then conducted using 1000 repetitions, calculating t-tests for FRACTION at a range of correlations between 0.2 and 0.8. The number of repetitions in which p was less than 0.05 was counted.
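The simulation code is in the appendix and is not reproduced here. The Python sketch below is a stand-in written for illustration, not the author's Stata code. It assumes that baseline and post-treatment scores are bivariate normal with a common standard deviation, that groups are compared with an equal-variance two-sample t-test, and that percentage change is computed as 100 × (baseline − post) / baseline. The function names and the hard-coded critical value are hypothetical choices made here.

```python
import math
import random
import statistics

# Two-sided 5% critical value of Student's t with 98 degrees of freedom
# (50 patients per group); hard-coded to keep the sketch stdlib-only.
T_CRIT_98 = 1.984

def simulate_trial(rho, rng, n_per_group=50, sd=10.0, base_mean=50.0, effect=5.0):
    """Return [(baseline, post) for controls, (baseline, post) for treated]."""
    groups = []
    for post_mean in (base_mean, base_mean - effect):
        baseline, post = [], []
        for _ in range(n_per_group):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            baseline.append(base_mean + sd * z1)
            # Mixing z1 and z2 gives correlation rho with the baseline score.
            post.append(post_mean + sd * (rho * z1 + math.sqrt(1 - rho**2) * z2))
        groups.append((baseline, post))
    return groups

def significant(a, b):
    """Equal-variance two-sample t-test at the 5% level (assumes 98 df)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return abs(t) > T_CRIT_98

# Outcome entered into the t-test for each method (b = baseline, p = post).
OUTCOMES = {
    "POST": lambda b, p: p,
    "CHANGE": lambda b, p: b - p,
    "FRACTION": lambda b, p: 100 * (b - p) / b,
}

def estimate_power(rho, reps=1000, seed=1):
    """Proportion of simulated trials with p < 0.05, per method."""
    rng = random.Random(seed)
    hits = dict.fromkeys(OUTCOMES, 0)
    for _ in range(reps):
        (cb, cp), (tb, tp) = simulate_trial(rho, rng)
        for name, outcome in OUTCOMES.items():
            control = [outcome(b, p) for b, p in zip(cb, cp)]
            treated = [outcome(b, p) for b, p in zip(tb, tp)]
            hits[name] += significant(control, treated)
    return {name: count / reps for name, count in hits.items()}
```

Calling `estimate_power(0.5)` mirrors the validation step described above: the estimates for POST and CHANGE should land close to each other and close to the analytic value, up to simulation error of a percentage point or two.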

Results and Discussion

The true positive rates of the four statistical methods at different correlations are shown in table 1. These data are equivalent to statistical power, or 1-β. As has been previously reported [6], ANCOVA has the highest statistical power. CHANGE has acceptable power when correlation between baseline and post-treatment scores is high; when correlation is low, POST has reasonable power. FRACTION has poor statistical efficiency at all correlations.

Table 1

Statistical power of each method of analysis

Correlation	ρ = 0.2	ρ = 0.35	ρ = 0.5	ρ = 0.65	ρ = 0.8
POST	70.5%	70.5%	70.5%	70.5%	70.5%
FRACTION	45.1%	56.4%	67.0%	82.7%	97.1%
CHANGE	50.7%	59.2%	70.5%	84.8%	97.7%
ANCOVA	72.3%	76.1%	82.3%	90.8%	98.6%
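The POST, CHANGE and ANCOVA rows can be checked by hand. The sketch below assumes a normal approximation with known standard deviation and ignores the negligible opposite tail; whether this is exactly what "sampsi" computes is an assumption, although the resulting figures agree with the table to within rounding. With a common standard deviation σ and correlation ρ, the outcome being compared has standard deviation σ for POST, σ√(2(1−ρ)) for CHANGE, and σ√(1−ρ²) for ANCOVA. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def outcome_sd(method, rho, sd=10.0):
    """Standard deviation of the quantity compared between groups."""
    if method == "POST":
        return sd
    if method == "CHANGE":
        return sd * math.sqrt(2 * (1 - rho))
    if method == "ANCOVA":
        return sd * math.sqrt(1 - rho**2)
    raise ValueError(f"no closed form used here for {method!r}")

def power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test of a difference in means."""
    se = sd * math.sqrt(2 / n_per_group)           # standard error of the difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return NormalDist().cdf(abs(effect) / se - z_crit)

def power_table(correlations=(0.2, 0.35, 0.5, 0.65, 0.8), effect=5.0, n_per_group=50):
    """Power for each method and correlation, as fractions between 0 and 1."""
    return {
        method: [power(effect, outcome_sd(method, rho), n_per_group) for rho in correlations]
        for method in ("POST", "CHANGE", "ANCOVA")
    }
```

The formulas make the pattern in the table easy to see: the standard deviation for POST does not depend on ρ, CHANGE has a smaller standard deviation than POST only when ρ exceeds 0.5, and ANCOVA has a smaller standard deviation than both for any ρ strictly between 0 and 1.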

The power of FRACTION is also sensitive to changes in the characteristics of the baseline distribution. If the range of baseline values is large, the variance of FRACTION increases disproportionately and power falls. Simulations were repeated with the standard deviations and difference between groups doubled. There was no difference in the power of POST, CHANGE or ANCOVA. The power of FRACTION fell dramatically: at correlations of 0.2, 0.35, 0.5, 0.65 and 0.8 respectively, power was 18%, 24%, 33%, 45% and 63%.

It is arguable that the method of simulation is biased against FRACTION because the treatment effect is additive, that is, the simulation models an absolute 5 mm difference between groups. In theory, the difference between FRACTION and CHANGE should decrease if the treatment effect is proportional. The simulation was therefore repeated with the treatment group experiencing an average 10% decrease from baseline. Correlation between baseline and follow-up scores was varied randomly between 0.2 and 0.8. The p values from t-tests of FRACTION and CHANGE were directly compared over 1000 simulations: p values were lower for CHANGE approximately 65% of the time.

Theoretical considerations suggest two further disadvantages to FRACTION. First, because it incorporates both baseline and post-treatment scores, it would appear to control for any chance baseline imbalance between groups. However, this is not the case because of regression to the mean: FRACTION will create a bias towards the group with poorer baseline scores (the same is true for CHANGE; POST causes bias in the opposite direction). Second, because it is calculated using a ratio, it may cause outcome data to be non-normally distributed. In a bivariate normal distribution (such as a baseline and post-treatment score) any statistic using either variable alone or combining both by addition or subtraction will be normally distributed. There is no analytic reason why a statistic created by multiplying or dividing one variable by the other should necessarily have a normal distribution.
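The point about ratios can be illustrated numerically. The sketch below is not part of the original analysis: it draws baseline and post-treatment scores from an assumed bivariate normal distribution (mean 50, standard deviation 10, correlation 0.5, no treatment effect) and compares the sample skewness of CHANGE with that of FRACTION. A normal distribution has skewness zero, so a clearly non-zero value signals departure from normality. All names and parameter values are illustrative.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness: third central moment divided by the cubed SD."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / s**3

def simulate_outcomes(n=20000, rho=0.5, mean=50.0, sd=10.0, seed=1):
    """Return (change, fraction) lists for n simulated patients."""
    rng = random.Random(seed)
    change, fraction = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        baseline = mean + sd * z1
        post = mean + sd * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        change.append(baseline - post)                       # difference: stays normal
        fraction.append(100 * (baseline - post) / baseline)  # ratio: need not be normal
    return change, fraction

change, fraction = simulate_outcomes()
skew_change = skewness(change)
skew_fraction = skewness(fraction)
```

Under these assumptions `skew_change` should sit near zero, whereas `skew_fraction` should be visibly further from zero, because patients with low baseline scores produce disproportionately extreme percentage changes.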

Conclusion

Reporting a percentage change from baseline gives the results of a randomized trial in clinically relevant terms immediately accessible to patients and clinicians alike. This is presumably why researchers investigating issues such as the effects of medication on hot flashes [9], or of different chemotherapy regimes on quality of life [10], report this statistic. However, percentage change from baseline is statistically inefficient. Perhaps counterintuitively, it does not correct for imbalance between groups at baseline. It may also create a non-normally distributed statistic from normally distributed data. Percentage change from baseline should therefore not be used in statistical analysis. Trialists wishing to report percentage change should first use another method, preferably ANCOVA, to test significance and calculate confidence intervals. They should then convert results to percentage change by using mean baseline and post-treatment scores. For an example of this approach, see Crouse et al. [11].

The findings presented here reconfirm previously reported data suggesting that ANCOVA is the method of choice for analyzing the results of trials with baseline and post-treatment measurement. In cases where ANCOVA cannot be used, such as with small samples or where the assumptions underlying ANCOVA modeling do not hold, CHANGE or POST are acceptable alternatives, especially if baseline variables are comparable between groups (perhaps ensured by stratification) and if correlation between baseline and post-treatment scores is either high (for CHANGE) or low (for POST). The use of FRACTION should be avoided.
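As a worked example of the recommended conversion, the sketch below expresses an ANCOVA estimate and its confidence limits as a percentage of the mean baseline score. The exact formula is an assumption (one plausible reading of "using mean baseline scores"), and the estimate and confidence limits are hypothetical numbers chosen to resemble the simulated pain trial, not results from it.

```python
def to_percent_of_baseline(value_mm, baseline_mean_mm):
    """Express an absolute difference as a percentage of the mean baseline score."""
    return 100.0 * value_mm / baseline_mean_mm

# Hypothetical ANCOVA output: adjusted difference between groups and 95% limits, in mm.
baseline_mean = 50.0
effect, ci_low, ci_high = 5.0, 1.6, 8.4

summary = tuple(
    round(to_percent_of_baseline(v, baseline_mean), 1)
    for v in (effect, ci_low, ci_high)
)
```

Here a 5 mm adjusted difference on a 50 mm mean baseline would be reported as a 10% reduction, with the confidence limits converted in the same way. Significance testing stays on the ANCOVA scale; only the reporting is in percentage terms.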

Competing interests

Have you in the past five years received reimbursements, fees, funding, or salary from an organisation that may in any way gain or lose financially from the publication of this paper? No

Do you hold any stocks or shares in an organisation that may in any way gain or lose financially from the publication of this paper? No

Do you have any other financial competing interests? No

Are there any non-financial competing interests you would like to declare in relation to this paper? No

Pre-publication history

The pre-publication history for this paper can be accessed here:

Text of simulation

9 in total

1. Analysing repeated measurements data: a practical comparison of methods.

Authors: R Z Omar; E M Wright; R M Turner; S G Thompson
Journal: Stat Med Date: 1999-07-15 Impact factor: 2.373

2. Repeated measures in clinical trials: analysis using mean summary statistics and its implications for design.

Authors: L Frison; S J Pocock
Journal: Stat Med Date: 1992-09-30 Impact factor: 2.373

3. Analysis of serial measurements in medical research.

Authors: J N Matthews; D G Altman; M J Campbell; P Royston
Journal: BMJ Date: 1990-01-27

4. Randomised clinical trial comparing the effects of acupuncture and a newly designed placebo needle in rotator cuff tendinitis.

Authors: J Kleinhenz; K Streitberger; J Windeler; A Güssbacher; G Mavridis; E Martin
Journal: Pain Date: 1999-11 Impact factor: 6.961

5. Exercise and weight loss in obese older adults with knee osteoarthritis: a preliminary study.

Authors: S P Messier; R F Loeser; M N Mitchell; G Valle; T P Morgan; W J Rejeski; W H Ettinger
Journal: J Am Geriatr Soc Date: 2000-09 Impact factor: 5.562

6. Effect of the HMG-CoA reductase inhibitors on blood pressure in patients with essential hypertension and primary hypercholesterolemia.

Authors: N Glorioso; C Troffa; F Filigheddu; F Dettori; A Soro; P P Parpaglia; S Collatina; M Pahor
Journal: Hypertension Date: 1999-12 Impact factor: 10.190

7. A randomized trial comparing the effect of casein with that of soy protein containing varying amounts of isoflavones on plasma concentrations of lipids and lipoproteins.

Authors: J R Crouse; T Morgan; J G Terry; J Ellis; M Vitolins; G L Burke
Journal: Arch Intern Med Date: 1999-09-27

8. Megestrol acetate for the prevention of hot flashes.

Authors: C L Loprinzi; J C Michalak; S K Quella; J R O'Fallon; A K Hatfield; R A Nelimark; A M Dose; T Fischer; C Johnson; N E Klatt
Journal: N Engl J Med Date: 1994-08-11 Impact factor: 91.245

9. Gemcitabine plus best supportive care (BSC) vs BSC in inoperable non-small cell lung cancer--a randomized trial with quality of life as the primary outcome. UK NSCLC Gemcitabine Group. Non-Small Cell Lung Cancer.

Authors: H Anderson; P Hopwood; R J Stephens; N Thatcher; B Cottier; M Nicholson; R Milroy; T S Maughan; S J Falk; M G Bond; P A Burt; C K Connolly; M B McIllmurray; J Carmichael
Journal: Br J Cancer Date: 2000-08 Impact factor: 7.640
