An appraisal of the SD_IR as an estimate of true individual differences in training responsiveness in parallel-arm exercise randomized controlled trials.

Jacob T Bonafiglia¹, Andrea M Brennan¹, Robert Ross¹, Brendon J Gurd¹.

Abstract

Calculating the standard deviation of individual responses (SDIR) is recommended for estimating the magnitude of individual differences in training responsiveness in parallel-arm exercise randomized controlled trials (RCTs). The purpose of this review article is to discuss potential limitations of parallel-arm exercise RCTs that may confound/complicate the interpretation of the SDIR. To provide context for this discussion, we define the sources of variation that contribute to variability in the observed responses to exercise training and review the assumptions that underlie the interpretation of SDIR as a reflection of true individual differences in training responsiveness. This review also contains two novel analyses: (1) we demonstrate differences in variability in changes in diet and physical activity habits across an intervention period in both exercise and control groups, and (2) we examined participant dropout data from six RCTs and found that significantly (P < 0.001) more participants in control groups (12.8%) dropped out due to dissatisfaction with group assignment compared to exercise groups (3.4%). These novel analyses raise the possibility that the magnitude of within-subject variability may not be equal between exercise and control groups. Overall, this review highlights that potential limitations of parallel-arm exercise RCTs can violate the underlying assumptions of the SDIR and suggests that these limitations should be considered when interpreting the SDIR as an estimate of true individual differences in training responsiveness.

Keywords: Individual responses; SDIR; exercise training; individual variability

Year: 2019 PMID: 31325240 PMCID: PMC6642277 DOI: 10.14814/phy2.14163

Source DB: PubMed Journal: Physiol Rep ISSN: 2051-817X

Introduction

In 1999 Bouchard et al. (1999) published results from the HERITAGE Family Study demonstrating a wide range of peak oxygen consumption (VO2peak) responses across individuals completing an identical exercise training program. Subsequently, a substantial body of literature has emerged reporting variability in the observed pre–post training changes in VO2peak (Hautala et al., 2006; Vollaard et al., 2009; Sisson et al., 2009; Astorino and Schubert, 2014; Wolpern et al., 2015; Ross et al., 2015; Raleigh et al., 2016; Gurd et al., 2016; Bonafiglia et al., 2016; Montero and Lundby, 2017), peak work rate (Vollaard et al., 2009; Montero and Lundby, 2017), lactate threshold (Gurd et al., 2016; Bonafiglia et al., 2016), and other physiologically meaningful central (MacPherson et al., 2011; Astorino et al., 2016; Raleigh et al., 2018) and peripheral (Vollaard et al., 2009; McPhee et al., 2011; Edgett et al., 2016; Bonafiglia et al., 2017; deLannoy et al., 2017; Raleigh et al., 2018) adaptations. Importantly, although the existence of variability in the observed response to training cannot be questioned (illustrated in Figure 1), it remains unclear whether this variability can be attributed to an effect of exercise per se.

Figure 1

‘Classic’ illustration of variability in the observed responses to exercise training. Individual bars represent observed changes in cardiorespiratory fitness (CRF) for individual participants from a previously published randomized controlled trial (Ross et al., 2015). Observed responses to 24 weeks of a no‐exercise control period (A) or exercise training (B). The exercise training prescription was walking/jogging five times per week at an intensity of 50% baseline cardiorespiratory fitness until 180 (females) or 300 (males) kilocalories were expended.

In the last several years, biostatisticians in the field of exercise science have raised concerns regarding the experimental and statistical rigor required to appropriately analyze individual response heterogeneity (Atkinson and Batterham, 2015; Hopkins, 2015; Hecksteden et al., 2015; Ross et al., 2019; Atkinson et al., 2019). Specifically, although many reports have assumed that the variability in observed responses reflects true individual differences in training responsiveness (Hautala et al., 2006; Vollaard et al., 2009; Sisson et al., 2009; Astorino and Schubert, 2014; Wolpern et al., 2015; Ross et al., 2015; Raleigh et al., 2016; Gurd et al., 2016; Bonafiglia et al., 2016; Montero and Lundby, 2017), recent reviews have highlighted the importance of considering multiple sources of variation that can contribute to the observed variability in training responses and have questioned whether the existence of individual variability attributable to exercise has been convincingly demonstrated (Atkinson and Batterham, 2015; Hopkins, 2015; Hecksteden et al., 2015; Williamson et al., 2017; Hopkins, 2018; Hecksteden et al., 2018; Ross et al., 2019; Atkinson et al., 2019).
In parallel‐arm exercise randomized controlled trials (RCTs), the standard deviation of individual responses (SDIR), the amount by which the true effect of the treatment differs between individuals (Hopkins, 2015) (described in detail below), has been put forward as an appropriate and robust statistical means of quantifying the magnitude of individual differences in training responsiveness (Atkinson and Batterham, 2015). Importantly, there are potential limitations associated with parallel‐arm exercise RCTs that merit consideration when interpreting the SDIR. However, despite several exercise training studies utilizing the SDIR (Stock et al., 2016; Williamson et al., 2017; Phillips et al., 2017; Williamson et al., 2018; McLaren et al., 2018; Hammond et al., 2019; Walsh et al., 2019), the potential impact of these limitations has yet to be discussed in detail in the individual response literature. Thus, the purpose of the current review is to discuss the potential limitations in parallel‐arm exercise RCTs that may limit confidence when interpreting the SDIR. It is important to note that this review does not find fault in the mathematical logic underlying the SDIR. Further, we agree with previous reports (Atkinson and Batterham, 2015; Williamson et al., 2017; Atkinson et al., 2019) that calculating the SDIR is the only approach for determining whether interindividual variability can be attributed to an effect of exercise per se in parallel‐arm exercise RCTs. In this review, we highlight potential external and inherent limitations that may affect the data obtained from parallel‐arm exercise RCTs and consequently limit confident interpretation of the SDIR as an estimate of true individual differences in training responsiveness.
Given the recent focus on the application of personalized exercise‐based medicine (Buford et al., 2013; Ross et al., 2019), this review aims to better inform researchers in exercise science about the logic underlying the SDIR and the potential pitfalls associated with parallel‐arm exercise RCTs that may confound its use as an estimate of variability in training responsiveness attributable to exercise.

Sources of Variation Impacting an Individual’s Observed Response to Training

In this section, we discuss the different sources of variability that influence an individual’s observed value at a single time point (2.1) and observed pre–post change following an intervention (2.2). The terminology used in this section is a synthesis of terms derived from a series of previously published papers (Hopkins, 2000; Hopkins, 2000; Senn, 2001; Hopkins, 2004; Senn et al., 2010; Scharhag‐Rosenberger et al., 2012; Bouchard et al., 2012; Astorino and Schubert, 2014; Leifer et al., 2014; Bentley et al., 2014; Atkinson and Batterham, 2015; Hopkins, 2015; Hecksteden et al., 2015; Arnold et al., 2015; Ross et al., 2015; Raleigh et al., 2016; Gurd et al., 2016; Bonafiglia et al., 2016; Astorino et al., 2016; Senn, 2016; Montero and Lundby, 2017; deLannoy et al., 2017; Williamson et al., 2017; Cadore et al., 2017; Clarke et al., 2017; Alvarez et al., 2017; Williamson et al., 2018; Swinton et al., 2018; Hecksteden et al., 2018). We attempt to use the most common term(s) for each source of variability and provide a list of relevant terms with definitions and alternative names in Table 1.

Table 1

Synthesis of terms used in this paper and in the individual response literature.

Term used this paper	Defined on page	Articles using this term	Alternative names used in other articles
Observed value	2	(Hopkins, 2000; Leifer et al., 2014; Swinton et al., 2018)
True value	2	(Hopkins, 2000; Leifer et al., 2014; Atkinson and Batterham, 2015; Hecksteden et al., 2015; Swinton et al., 2018)
Typical error	2	(Hopkins, 2000; Hopkins, 2000; Hopkins, 2004; Bentley et al., 2014; Arnold et al., 2015; Raleigh et al., 2016; Gurd et al., 2016; Bonafiglia et al., 2016; Montero and Lundby, 2017; Williamson et al., 2017; Cadore et al., 2017; Alvarez et al., 2017; Alvarez et al., 2017; Swinton et al., 2018; Hecksteden et al., 2018)	Random error/noise (Hopkins, 2000; Hecksteden et al., 2015; Hecksteden et al., 2018)
			Technical error (Bouchard et al., 2012; Ross et al., 2015; deLannoy et al., 2017; Clarke et al., 2017)
			Coefficient of variation (TE expressed as a percentage of the mean; (Astorino and Schubert, 2014; Astorino et al., 2016; Hecksteden et al., 2018; Hopkins, 2000; Scharhag‐Rosenberger et al., 2012))
Technical error	2	(Hopkins, 2000; Atkinson and Batterham, 2015)	Measurement error (Hecksteden et al., 2015)
			Instrumentation error (Swinton et al., 2018)
Random day‐to‐day variability	2	Williamson et al. (2017)	Biological error/variability (Hopkins, 2000; Williamson et al., 2017; Swinton et al., 2018)
Observed change/response	3	(Hopkins, 2000; Leifer et al., 2014)
True change/response	3	(Leifer et al., 2014; Atkinson and Batterham, 2015)
Within‐subject variability (ΔWS)	3	(Hecksteden et al., 2015; Hecksteden et al., 2018)	Biological variability (Swinton et al., 2018)
			Within‐patient error (Senn, 2001; Senn, 2016)
			Random within‐subjects variability (Atkinson and Batterham, 2015; Williamson et al., 2017; Williamson et al., 2018)
Standard deviation of individual response (SD_IR; VΔTRUE)	4/5	(Atkinson and Batterham, 2015; Hopkins, 2015; Williamson et al., 2017; Williamson et al., 2018; Hecksteden et al., 2018)	Subject‐by‐training interaction (Atkinson and Batterham, 2015; Hecksteden et al., 2015; Williamson et al., 2017)
			Patient‐by‐treatment interaction (Senn, 2001; Senn et al., 2010; Senn, 2016)
			Individual responses; Individual trainability; Individual talent; Training responsiveness (Hecksteden et al., 2015)
			True individual differences (Hopkins, 2000; Atkinson and Batterham, 2015)
Variability in observed responses (SD_EX; SD_CON)	5/6	(Leifer et al., 2014; Hopkins, 2015; Hecksteden et al., 2015; Hecksteden et al., 2018)	Standard deviation in changes in interventions or controls (Atkinson and Batterham, 2015; Williamson et al., 2017; Williamson et al., 2018)
			Gross response variability (Hecksteden et al., 2015; Hecksteden et al., 2018)
Minimum clinically important difference (MCID)	7	(Atkinson and Batterham, 2015; Williamson et al., 2017; Williamson et al., 2018)	Smallest worthwhile difference/change (Hopkins, 2004; Hecksteden et al., 2015; Swinton et al., 2018; Hecksteden et al., 2018)

Typical error of measurement

Whenever a measurement is obtained, the observed value that results is influenced by both the individual’s true value and random measurement error. Random measurement error, or the typical error of measurement (TE), results from a combination of the technical error introduced by equipment and/or experimenter reliability and the random day‐to‐day variability in biological factors capable of altering the measured variable. Biological factors contributing to random day‐to‐day variability include factors that can affect an individual’s mental and/or physical state at the time of testing (e.g. behavioral and environmental factors including circadian rhythm, sleep patterns, diet, exercise, etc.; (Hopkins, 2000; Mann et al., 2014; Hecksteden et al., 2015; Ross et al., 2019; Swinton et al., 2018)). The equation below demonstrates that an individual’s observed value comprises both their true value (TRUE) and the TE (Leifer et al., 2014):

$$\text{Observed value} = \mathrm{TRUE} \pm \mathrm{TE} \tag{1}$$

Importantly, although both technical error and day‐to‐day biological variability will introduce “noise” into any measurement, this noise is expected to randomly affect the observed value. In other words, the noise introduced by TE will, over the course of repeated measurements, result in observed values that are normally distributed around an individual’s true value (Figure 2). Thus, taking the mean of several measurements at a single time point (e.g. before or after training) will increase the accuracy of the estimate of an individual’s true value (Hopkins, 2004; Hecksteden et al., 2015).
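The benefit of averaging repeated measurements can be illustrated with a small simulation. This is only a sketch under stated assumptions: TE is modelled as independent, normally distributed noise, and the true value (35.0) and TE (1.5) are hypothetical numbers chosen for illustration, not values from any cited study.

```python
import random
import statistics

def observed_value(true_value, te_sd, rng):
    """One observed value: the true value plus normally distributed typical error."""
    return true_value + rng.gauss(0.0, te_sd)

def mean_of_repeats(true_value, te_sd, n_repeats, rng):
    """Mean of n_repeats observed values taken at a single time point."""
    return statistics.fmean(
        observed_value(true_value, te_sd, rng) for _ in range(n_repeats)
    )

rng = random.Random(1)   # fixed seed so the sketch is reproducible
TRUE_VALUE = 35.0        # hypothetical true VO2peak (mL/kg/min)
TE_SD = 1.5              # hypothetical typical error, same units

# Spread of the estimate when testing once versus averaging four tests.
single = [mean_of_repeats(TRUE_VALUE, TE_SD, 1, rng) for _ in range(5000)]
averaged = [mean_of_repeats(TRUE_VALUE, TE_SD, 4, rng) for _ in range(5000)]

sd_single = statistics.stdev(single)
sd_averaged = statistics.stdev(averaged)
print(f"SD of single measurements:  {sd_single:.2f}")
print(f"SD of means of four repeats: {sd_averaged:.2f}")
```

Under these assumptions the spread of the averaged estimate shrinks by roughly the square root of the number of repeats, which is why repeated testing brings the estimate closer to the true value.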

Figure 2

Illustration of the random nature of typical error (TE) in the observed values of repeated measures distributed around the true value (vertical dashed line).

Within the context of a training intervention, an individual’s observed change incorporates both their true change (ΔT) from baseline (PRE) to end of training (POST) and the TE associated with both PRE and POST observed values (ΔTE):

$$\text{Observed change} = \Delta\mathrm{T} \pm \Delta\mathrm{TE} \tag{2}$$

It is important to emphasize that TE (both technical error and day‐to‐day biological variability) would be expected to introduce random noise into both PRE and POST measurements. Thus, while this random noise likely exerts minimal influence on the ability to detect group differences across a training intervention, it can influence an individual’s observed change following training (Hecksteden et al., 2015).
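To make the effect of TE on a change score concrete, the sketch below simulates an individual whose true change is zero. It assumes the PRE and POST errors are independent and normally distributed with the same SD; under that assumption the observed change scatters with an SD of about √2 × TE. The numbers used are hypothetical.

```python
import math
import random
import statistics

def observed_change(pre_true, true_change, te_sd, rng):
    """Observed POST minus PRE, with independent typical error at each test."""
    pre = pre_true + rng.gauss(0.0, te_sd)
    post = pre_true + true_change + rng.gauss(0.0, te_sd)
    return post - pre

rng = random.Random(2)   # fixed seed so the sketch is reproducible
TE_SD = 1.5              # hypothetical typical error
TRUE_CHANGE = 0.0        # this individual does not truly change

# Repeat the same PRE/POST comparison many times for the same individual.
changes = [observed_change(35.0, TRUE_CHANGE, TE_SD, rng) for _ in range(5000)]

sd_change = statistics.stdev(changes)
expected_sd = math.sqrt(2) * TE_SD
print(f"SD of observed change: {sd_change:.2f} (sqrt(2) x TE = {expected_sd:.2f})")
```

Even with no true change, single PRE and POST tests can therefore produce apparent "responses" of a few units in either direction.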

Within‐subject variability

Biological variability also has the potential to influence an individual’s true change following an exercise training intervention. Chronic changes in behavioral and/or environmental factors external to the prescribed exercise (e.g. changes in long‐term activity patterns or diet quality/quantity; reviewed by (Mann et al., 2014; Solomon, 2018)) can impact an observed change by augmenting or impairing an individual’s true response to an intervention (Senn, 2001; Hecksteden et al., 2015). Because variability in an individual’s mental/physical state could alter their true response to the same exercise intervention administered on different occasions, this source of variability is termed “within‐subject variability” (Table 1; Senn, 2001; Hecksteden et al., 2015). The existence of within‐subject variability requires that ΔT (from equation 2) be further delineated into true changes attributable to exercise (ΔTRUE) and true changes not‐attributable to exercise (i.e., changes attributable to within‐subject variability; ΔWS):

$$\text{Observed change} = \Delta\mathrm{TRUE} \pm \Delta\mathrm{WS} \pm \Delta\mathrm{TE} \tag{3}$$

Unlike TE, which is expected to have a random effect on observed changes (Figure 2) and remain constant regardless of the duration of an intervention, the impact of ΔWS on an individual’s observed change is expected to increase with longer interventions due to the potential for longer/more substantial behavioral/environmental changes.

The “noise” introduced by the typical error of measurement (TE) is expected to randomly affect observed values (Figure 2). In addition to TE in both PRE‐ and POST‐intervention measurements, changes in behavioral and/or environmental factors also affect an individual’s observed change to an intervention (termed within‐subject variability). Although the influence of TE on an individual’s observed change remains constant regardless of the length of the intervention, the influence of within‐subject variability is expected to increase with longer intervention durations.

Attempting to Isolate Individual Differences in Training Response: The SDIR

Although a repeated cross‐over exercise/control study can theoretically partition the multiple sources of variation that contribute to an individual’s observed change following training (Senn et al., 2010; Hecksteden et al., 2015), this experimental design is costly and time‐consuming. In contrast, estimating the standard deviation of individual responses (SDIR) in a parallel‐arm exercise RCT (i.e., one or more experimental arms and one control arm) has been championed as a more feasible approach to isolate the amount by which ΔTRUE differs between individuals (Atkinson and Batterham, 2015; Hopkins, 2015; Atkinson et al., 2019). In this section, we explore how differences in the standard deviations of change scores between the experimental and control arms of a parallel‐arm RCT are used to calculate the SDIR. We also highlight the assumptions that permit the SDIR to be interpreted as an estimate of true individual differences in training responsiveness.

Sources of between‐subject response variability within the exercise arm of an RCT

From this point forward, we will focus on the factors contributing to the variability in observed responses between individuals (i.e., interindividual variability/between‐subject variability in observed responses; Table 1). Within the exercise arm of a parallel‐arm RCT, the variability in observed responses can be quantified by calculating the standard deviation of the individual change scores (the standard deviation of observed responses to exercise; SDEX). Although the variability in the factors contributing to SDEX cannot be isolated for a single arm exercise intervention (Hecksteden et al., 2015), we can theoretically capture these factors using the following equation:

$$\mathrm{SD_{EX}} = \mathrm{V\Delta TRUE} \pm \mathrm{V\Delta WS_{EX}} \pm \mathrm{V\Delta TE_{EX}} \tag{4}$$

where VΔTRUE is the between‐subject variability in the true changes attributable to exercise (i.e., the magnitude of true individual differences in training responsiveness), VΔWSEX is the variability in the within‐subject variability within the exercise arm (i.e. the between‐subject variability in true changes not attributable to exercise), and VΔTEEX is the variability in the TE at PRE and POST within the exercise arm. As with the impact of ΔWS on an individual’s observed response (discussed in “Sources of Variation Impacting an Individual’s Observed Response to Training” section), VΔWSEX reflects variability in changes in behavioral/environmental factors external to the prescribed exercise that can either augment or impair individuals’ true responses (Senn, 2001; Hecksteden et al., 2015). Figure 3 presents variability in changes in behavioral factors in an EX group from a large RCT (Ross et al., 2013; Ross et al., 2015), which potentially demonstrates the existence of VΔWSEX and raises the possibility that variability in these behavioral factors contributed to the SDEX presented in Figure 1. Importantly, the component of variability within SDEX attributed to VΔWSEX and VΔTEEX is purported to occur randomly (Atkinson and Batterham, 2015; Williamson et al., 2017; Williamson et al., 2018).
This purported random nature of VΔWSEX has led it to be called “random within‐subjects variability” (Atkinson and Batterham, 2015; Williamson et al., 2017; Williamson et al., 2018). Similar to the effects of ΔTE and ΔWS, the effect of VΔTEEX on SDEX should remain constant regardless of the duration of intervention period while the impact of VΔWSEX on SDEX would be expected to increase with increasing intervention duration.

Figure 3

Histograms depicting variability in changes in behavioral factors that are known to influence overall health and fitness following the completion of 24 weeks of exercise training (EX) or a control period (CON). All data were collected from a previously published randomized controlled trial (Ross et al., 2015). Variability in changes in Canadian Healthy Eating Index Scores (A), sedentary time (B), energy intake (C), and total physical activity (D). The EX and CON groups presented in this figure are the same groups presented in Figure 1. See Ross et al. (2013) for more information regarding the measurement of these behavioral outcomes. SDCON and SDEX values represent the variability in observed responses to CON and EX, respectively. SDIR values were calculated using equation 8. Negative SDIR values reflect situations where SDCON exceeded SDEX, and SDIR was therefore calculated by switching SDCON and SDEX in equation 8. As recommended by Hopkins (2015), effect sizes of SDIR values (ESIR) were calculated by dividing SDIR values by baseline SD (see Hopkins (2015) for effect size category cut‐points). As previously recommended (Hopkins et al., 2009; Swinton et al., 2018; Hecksteden et al., 2018), minimum meaningful change (MMC) thresholds were determined by multiplying baseline SD by 0.2. The arrows indicate the mean observed response for each behavioral variable.

Because SDEX results from multiple sources of variability, inferences about the existence or magnitude of VΔTRUE cannot be made without quantifying the contributions of VΔWSEX and VΔTEEX. As discussed in the next subsection, a control group is needed to estimate the contribution of VΔWS and VΔTE on the variability in observed responses (Atkinson and Batterham, 2015). Thus, attempts to attribute variability in the observed responses to VΔTRUE in single‐arm exercise trials (i.e. lacking a control group) have been justifiably criticized (Atkinson and Batterham, 2015; Williamson et al., 2017).

Response variability within the control arm of an RCT and calculating SDIR

The fundamental assumption inherent to parallel‐arm exercise RCTs is that participants in the treatment and control (CON) groups differ only by the treatment they receive (i.e. standardized exercise training vs. usual care, respectively; (Hopkins, 2018)). Accordingly, it is assumed that the difference between SDEX (see equation 4 above) and the standard deviation of the observed responses to CON (SDCON) is the absence of VΔTRUE. Thus, the variability in the observed responses to CON (SDCON) can be captured with the following equation:

$$\mathrm{SD_{CON}} = \mathrm{V\Delta WS_{CON}} \pm \mathrm{V\Delta TE_{CON}} \tag{5}$$

where VΔWSCON and VΔTECON are the variability attributable to random within‐subject variability and TE, respectively. Similar to EX, there appears to be variability in changes in behavioral factors in CON (select behavioral factors from a large RCT (Ross et al., 2013; Ross et al., 2015) are presented in Figure 3) and this variability may contribute to SDCON (Figure 1A). If the only difference between EX and CON within a parallel‐arm RCT is the presence (or absence) of exercise, and we assume that variability in within‐subject variability and TE are equal between groups (i.e. VΔWSEX = VΔWSCON and VΔTEEX = VΔTECON), subtracting the variability of observed responses to CON (SDCON) from the variability in observed responses to EX (SDEX) should provide us with an estimate of VΔTRUE as follows:

$$\mathrm{SD_{EX}} - \mathrm{SD_{CON}} = (\mathrm{V\Delta TRUE} \pm \mathrm{V\Delta WS_{EX}} \pm \mathrm{V\Delta TE_{EX}}) - (\mathrm{V\Delta WS_{CON}} \pm \mathrm{V\Delta TE_{CON}}) \tag{6}$$

wherein VΔWSEX = VΔWSCON and VΔTEEX = VΔTECON; thus, (VΔWSEX ± VΔTEEX) and (VΔWSCON ± VΔTECON) cancel each other out resulting in the following (simplified) equation:

$$\mathrm{SD_{EX}} - \mathrm{SD_{CON}} = \mathrm{V\Delta TRUE} \tag{7}$$

The simplification of equation (6) to equation (7) and the underlying logic detailed above provide the foundation for the utility of the SDIR in parallel‐arm exercise RCTs. Specifically, the difference in variability between EX and CON reflects the variability that is attributable to true individual differences in training responsiveness (VΔTRUE).
It is important to reiterate that interpreting the SDIR as an estimate of VΔTRUE is based on the assumption that VΔWS and VΔTE are equal between EX and CON. Accordingly, if there is the potential that this assumption is violated, then caution should be applied when interpreting the SDIR. SDIR is calculated using the following equation (Atkinson and Batterham, 2015; Hopkins, 2015; Williamson et al., 2017):

$$\mathrm{SD_{IR}} = \sqrt{\mathrm{SD_{EX}}^{2} - \mathrm{SD_{CON}}^{2}} \tag{8}$$

Once the SDIR is calculated, confidence intervals and standardized effect sizes can be generated (Hopkins, 2015; Hopkins, 2018) and the magnitude of the SDIR can be interpreted relative to a minimal clinically important difference (MCID) (Atkinson and Batterham, 2015) or a smallest worthwhile change (SWC; typically 0.2 x baseline standard deviation) (Hopkins et al., 2009).

Based on the assumption that typical error (VΔTE) and within‐subject variability (VΔWS) do not differ between exercise and control arms in an RCT, the SDIR theoretically represents the magnitude of individual differences in training responsiveness (VΔTRUE) (equations (6), (7), (8)). If the assumptions of the SDIR are violated, then caution is warranted when interpreting the SDIR.
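The calculation described above can be sketched in a few lines. The sketch uses the formulation given in the cited sources (the square root of the difference between the squared SDs of the change scores) together with the sign convention described in the Figure 3 caption for cases where SDCON exceeds SDEX. The function names and the example numbers are hypothetical, and the confidence-interval step is not shown.

```python
import math

def sd_ir(sd_ex, sd_con):
    """SD of individual responses: sqrt(SD_EX^2 - SD_CON^2).

    When SD_CON exceeds SD_EX the two SDs are switched and the result is
    reported as a negative value (the convention described for Figure 3).
    """
    diff = sd_ex ** 2 - sd_con ** 2
    return math.copysign(math.sqrt(abs(diff)), diff)

def standardized_effect_size(sd_ir_value, baseline_sd):
    """Effect size of the SD_IR: SD_IR divided by the baseline SD."""
    return sd_ir_value / baseline_sd

def smallest_worthwhile_change(baseline_sd, factor=0.2):
    """Smallest worthwhile change, typically 0.2 x baseline SD."""
    return factor * baseline_sd

# Hypothetical change-score SDs and baseline SD, for illustration only.
positive = sd_ir(sd_ex=5.0, sd_con=3.0)
negative = sd_ir(sd_ex=3.0, sd_con=5.0)
print(positive, negative)
print(standardized_effect_size(positive, baseline_sd=8.0))
print(smallest_worthwhile_change(baseline_sd=8.0))
```

Comparing the SDIR with the smallest worthwhile change (or an MCID) is then a matter of checking whether the estimate, and ideally its confidence interval, exceeds that threshold.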

The Impact of Limitations in Parallel‐Arm Exercise RCT on the Interpretation of the SDIR

In the “Response variability within the control arm of an RCT and calculating SDIR” section, we discussed that interpreting the SDIR as an estimate of VΔTRUE requires that VΔWS and VΔTE are the same between EX and CON groups (i.e., VΔWSEX = VΔWSCON and VΔTEEX = VΔTECON). In this section, we highlight examples that violate this assumption. Specifically, we highlight external (“External limitations that may affect the interpretation of the SDIR” and “The potential influence of adherence and compliance to the prescribed exercise” sections) and inherent (“Inherent limitations that may affect the interpretation of the SDIR” section) limitations in the design of parallel‐arm exercise RCTs and suggest that these limitations reduce confidence when interpreting the SDIR as an estimate of VΔTRUE.

External limitations that may affect the interpretation of the SDIR

As stated in “Attempting to Isolate Individual Differences in Training Response: The SDIR” section, failure to consider SDCON is a major limitation that prevents inference about the existence and/or magnitude of VΔTRUE (Williamson et al., 2017). Although this section focuses on other external limitations that can occur in RCTs, the issues associated with not considering SDCON are briefly reiterated in the discussion (“Discussion” section) and have been discussed in previous articles (Atkinson and Batterham, 2015; Williamson et al., 2017; Ross et al., 2019; Atkinson et al., 2019). Even when SDCON is considered, there are external limitations in study design that can occur in parallel‐arm exercise RCTs that may violate the assumption that VΔWS and VΔTE are equal between EX and CON. It is important to acknowledge that these limitations represent deviations from standard guidelines for designing an RCT (Moher et al., 2010). For instance, using different equipment and/or experimenters to measure outcomes in EX vs. CON groups (Phillips et al., 2017) risks introducing differences in VΔTE between EX and CON groups. Additionally, study designs that allow for potential between‐group differences in behavioral/environmental factors (e.g., using different durations to separate baseline and follow up measures between EX and CON; collecting EX and CON at different sites (Phillips et al., 2017); etc.) risk introducing differences in VΔWS between groups. Non‐optimal RCT designs introduce the possibility that VΔTEEX ≠ VΔTECON and/or VΔWSEX ≠ VΔWSCON and therefore limit the utility of the SDIR to accurately estimate VΔTRUE (Atkinson et al., 2019).

The potential influence of adherence and compliance to the prescribed exercise

It is important to note that differences in training adherence (attending the prescribed number of training sessions) and compliance (completing the exercise sessions as prescribed; i.e. achieving the prescribed exercise intensity and/or duration) may also influence the variability in observed responses to exercise training (SDEX). This variability would not be attributable to either VΔTRUE or VΔWSEX, but would represent an additional source of variance in the observed response to an exercise intervention. We have modified equation 4 to include variability in adherence/compliance to exercise training (VΔAD):

$$\mathrm{SD_{EX}} = \mathrm{V\Delta TRUE} \pm \mathrm{V\Delta WS_{EX}} \pm \mathrm{V\Delta TE_{EX}} \pm \mathrm{V\Delta AD} \tag{9}$$

Importantly, variability in participant adherence/compliance to exercise training (VΔAD) further complicates the assumption that EX and CON only differ by VΔTRUE. Specifically, subtracting SDCON from SDEX would not isolate VΔTRUE but instead would result in the following (modified based on equation 7; see above):

$$\mathrm{SD_{EX}} - \mathrm{SD_{CON}} = \mathrm{V\Delta TRUE} \pm \mathrm{V\Delta AD} \tag{10}$$

The added complexity associated with VΔAD requires that trialists implement a standardized approach that considers participant adherence/compliance prior to calculating the SDIR (e.g., only include data from participants who completed >90% of supervised training sessions). We refer the reader to published articles that have discussed strategies to account for differences in participant adherence and compliance (Smart et al., 2015; Hecksteden et al., 2018).
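The consequence of variable adherence can be illustrated with a simulation. This is a sketch under explicit assumptions that are not taken from the source: each source of variability is an independent, normally distributed component (so variances add), adherence acts as one extra random component in the exercise arm only, and all SDs are hypothetical.

```python
import math
import random
import statistics

def simulate_arm(n, mean_change, component_sds, rng):
    """Observed change scores: mean change plus one normal draw per component."""
    return [
        mean_change + sum(rng.gauss(0.0, sd) for sd in component_sds)
        for _ in range(n)
    ]

def estimate_sd_ir(ex_changes, con_changes):
    """Signed SD_IR estimated from the variances of two sets of change scores."""
    diff = statistics.variance(ex_changes) - statistics.variance(con_changes)
    return math.copysign(math.sqrt(abs(diff)), diff)

rng = random.Random(3)   # fixed seed so the sketch is reproducible
N = 20000                # unrealistically large arms, to keep sampling noise small
V_TRUE, V_WS, V_TE, V_AD = 2.0, 1.5, 1.0, 2.0   # hypothetical component SDs

con = simulate_arm(N, 0.0, [V_WS, V_TE], rng)
ex_uniform_adherence = simulate_arm(N, 3.0, [V_TRUE, V_WS, V_TE], rng)
ex_variable_adherence = simulate_arm(N, 3.0, [V_TRUE, V_WS, V_TE, V_AD], rng)

sd_ir_uniform = estimate_sd_ir(ex_uniform_adherence, con)
sd_ir_variable = estimate_sd_ir(ex_variable_adherence, con)
print(f"SD_IR, uniform adherence:  {sd_ir_uniform:.2f} (true individual SD = {V_TRUE})")
print(f"SD_IR, variable adherence: {sd_ir_variable:.2f}")
```

In this toy model the SDIR recovers the true individual-difference SD only when adherence is uniform; with variable adherence the estimate absorbs the adherence component and overstates true individual differences, which is the concern raised above.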

Inherent limitations that may affect the interpretation of the SDIR

The impact of the external limitations discussed in “External limitations that may affect the interpretation of the SDIR” and “The potential influence of adherence and compliance to the prescribed exercise” sections can be eliminated, or at least reduced, by performing rigorously designed RCTs. However, even in rigorously controlled exercise RCTs, there may be inherent limitations that threaten the assumption that VΔTE and VΔWS are random, and thus are equal between EX and CON. Unlike drug trials that administer placebo to the CON group, participants cannot be blinded to their assigned group in exercise RCTs (Smart et al., 2015; Hecksteden et al., 2018). Non‐blinded group assignment risks introducing performance/participant preference bias (Halpern, 2003; Higgins et al., 2011), a type of bias that causes participants to alter their behavior during the course of an intervention based on the knowledge of, and potential preference toward/against, their assigned group (Halpern, 2003). Thus, it is possible that performance/preference bias results in differences in variability in behavioral changes between EX and CON (Figure 3), which violates the assumption that VΔWS is equal between groups. We have performed two novel analyses in an attempt to determine whether performance/participant preference bias exists in exercise RCTs. First, we synthesized dropout information from several large parallel‐arm exercise RCTs (Table 2). Interestingly, we found that despite similar dropout rates (P = 0.9), significantly more (P < 0.001) CON participants (12.8% of the CON group) dropped out due to dissatisfaction with their group assignment than EX participants (3.4% of the EX group; see Table 2).
This finding is consistent with the assertion that participants prefer to be assigned to EX over CON (Sluijs et al., 2006; Hertogh et al., 2010; Hecksteden et al., 2018) and raises the possibility that exercise RCTs inherently introduce performance/preference bias that may contribute to differences in VΔWS between groups.
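The two 2x2 chi-squared comparisons can be sketched from the pooled counts in Table 2. The sketch uses a plain Pearson chi-squared test without continuity correction, which is an assumption: the source does not state which variant or software was used. The helper name `chi_squared_2x2` is hypothetical. With one degree of freedom, the P value equals erfc(sqrt(chi2 / 2)).

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value); the p-value uses the 1-df identity
    P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))

# Counts pooled across the six trials (Table 2).
EX_TOTAL, EX_DROPOUTS, EX_DISSATISFIED = 966, 212, 33
CON_TOTAL, CON_DROPOUTS, CON_DISSATISFIED = 288, 64, 37

# Comparison 1: dropouts vs. completers, EX vs. CON.
stat_dropout, p_dropout = chi_squared_2x2(
    EX_DROPOUTS, EX_TOTAL - EX_DROPOUTS,
    CON_DROPOUTS, CON_TOTAL - CON_DROPOUTS,
)

# Comparison 2: among dropouts, dissatisfaction vs. any other reason.
stat_dissat, p_dissat = chi_squared_2x2(
    EX_DISSATISFIED, EX_DROPOUTS - EX_DISSATISFIED,
    CON_DISSATISFIED, CON_DROPOUTS - CON_DISSATISFIED,
)

print(f"Dropout rate:    chi2 = {stat_dropout:.3f}, P = {p_dropout:.2f}")
print(f"Dissatisfaction: chi2 = {stat_dissat:.1f}, P = {p_dissat:.2e}")
```

Under this assumption the first comparison is far from significant and the second is well below 0.001, in line with the values reported above.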

Table 2

Reasons for dropouts pooled across six large parallel‐arm exercise randomized controlled trials conducted in middle‐aged, overweight/obese adults free of cardiovascular disease and type 2 diabetes (b).

	Exercise Group	Control Group
Total Number of Participants	966	288
Reasons for Dropout
Dissatisfaction with group (a)	33 (3.4%)	37 (12.8%)
No contact	8 (0.8%)	3 (1.0%)
Time commitment	50 (5.2%)	6 (2.1%)
Other	121 (12.5%)	18 (6.3%)
Total Number of Dropouts	212 (21.9%)	64 (22.2%)

We performed 2x2 chi‐squared analyses on the proportion of dropouts (dropouts vs. completers) and the number of participants who dropped out due to dissatisfaction (dropouts due to dissatisfaction vs. dropouts not due to dissatisfaction) between EX and CON. References for the six randomized controlled trials: (Ross et al., 2000; Ross et al., 2004; Slentz et al., 2004; Church et al., 2007; Davidson et al., 2009; Ross et al., 2015, b). Percentages are relative to total number of participants within each group.

(a) Significant difference (P < 0.001) between groups.

(b) This table only includes dropout data from exercise and control groups. Groups that followed dietary interventions without a prescribed exercise intervention were excluded from this analysis.

Next, in an attempt to test the assumption that VΔWS is equal between EX and CON, and to try to understand the impact of non‐blinding/preference bias in exercise RCTs, we compared the variability in changes in select behavioral factors (parameters of physical activity and diet) from a large exercise RCT (Ross et al., 2013; Ross et al., 2015). Interestingly, we found that the variability in these factors differed between EX and CON groups with moderate–large SDIR effect sizes (Figure 3). Although this analysis is preliminary, it highlights the potential impact of non‐blinding on behavioral factors believed to contribute to VΔWS. Collectively, these analyses highlight the potential impact of non‐blinded group assignment in parallel‐arm exercise RCTs on data quality. Specifically, we believe these results suggest that inherent pitfalls associated with exercise RCTs violate the assumption that VΔWSEX = VΔWSCON.
In an attempt to improve the robustness of the SDIR in parallel‐arm exercise RCTs, trialists can use statistical approaches (e.g., outlier removal) to identify participants who may have deviated from the prescribed behaviors. However, it may prove difficult, if not impossible, to measure and account for all sources of VΔWS when attempting to calculate and interpret the SDIR. Parallel‐arm exercise RCTs containing external limitations may introduce between‐group differences in VΔTE and/or VΔWS, thus violating the assumptions that allow the SDIR to estimate VΔTRUE. Beyond avoidable external limitations, inherent limitations in parallel‐arm exercise RCTs (e.g., inability to blind participants) also risk violating the assumption that VΔWS is equal between EX and CON.
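The consequence of unequal VΔTE or VΔWS can be made explicit with a short worked decomposition. This is a sketch that assumes the sources of variation are independent and additive, and that SDIR is computed as the square root of the difference between the EX and CON variances of the change scores (Atkinson and Batterham, 2015). The EX and CON superscripts are notation introduced here for illustration.

```latex
\begin{align}
SD_{EX}^{2}  &= V_{\Delta TRUE} + V_{\Delta TE}^{EX} + V_{\Delta WS}^{EX} \\
SD_{CON}^{2} &= V_{\Delta TE}^{CON} + V_{\Delta WS}^{CON} \\
SD_{IR}^{2}  &= SD_{EX}^{2} - SD_{CON}^{2} \\
             &= V_{\Delta TRUE}
                + \left(V_{\Delta TE}^{EX} - V_{\Delta TE}^{CON}\right)
                + \left(V_{\Delta WS}^{EX} - V_{\Delta WS}^{CON}\right)
\end{align}
```

Under these assumptions, SDIR² reduces to VΔTRUE only when both bracketed terms are zero. Any between‐group difference in VΔTE or VΔWS is absorbed into the SDIR and inflates or deflates the estimate of VΔTRUE.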

Discussion

In the previous section, we discussed how limitations of parallel‐arm exercise RCTs may invalidate the assumption that VΔTE and VΔWS are equal between EX and CON due to: (1) non‐optimal RCT designs (“External limitations that may affect the interpretation of the SDIR” section), (2) variability in participant adherence/compliance to exercise training (“The potential influence of adherence and compliance to the prescribed exercise” section), and (3) inherent limitations (e.g., inability to blind participants to group assignment; “Inherent limitations that may affect the interpretation of the SDIR” section). Taken together, these limitations suggest that caution is warranted when interpreting the SDIR as an estimate of VΔTRUE in parallel‐arm exercise RCTs. It is important to note that the above‐mentioned limitations are specific to parallel‐arm exercise RCTs. RCTs that are devoid of these limitations (e.g., drug trials where participants can be blinded) may not violate the assumption that VΔTE and VΔWS are equal between EX and CON. Additionally, although acute exercise studies involve non‐blinded participants, these studies are relatively short (e.g., measurements collected at baseline and three hours after acute exercise (Egan and Zierath, 2013; Perry and Hawley, 2017)) and may not provide enough time for behavioral/environmental differences (i.e., factors contributing to VΔWS) to emerge between EX and CON. To our knowledge, only one acute exercise study has utilized the SDIR (Bonafiglia et al., 2019), highlighting acute exercise as a feasible model for exploring the existence and magnitude of VΔTRUE. Subsequent to establishing the existence of VΔTRUE, researchers can explore potential mechanisms that contribute to interindividual differences in training responsiveness (see “conceptual framework” in (Atkinson and Batterham, 2015)). 
It is also important to reiterate that the majority of previous reports examining individual responses to exercise training either have not included a CON group (Hautala et al., 2006; Vollaard et al., 2009; Astorino and Schubert, 2014; Wolpern et al., 2015; Raleigh et al., 2016; Gurd et al., 2016; Bonafiglia et al., 2016; Astorino et al., 2016; Montero and Lundby, 2017) or have not analyzed SDCON (Sisson et al., 2009; Ross et al., 2015). In the absence of SDCON, it is impossible to partition the contributions of VΔTRUE and VΔTE/VΔWS because the counterfactual (i.e., an estimate of what would have happened had a participant in EX been allocated to CON) remains unknown (Williamson et al., 2017). Although we suggest that caution is warranted when interpreting the SDIR, failing to consider SDCON represents a larger and more problematic issue in the individual response literature.
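To illustrate why SDCON is needed, the sketch below computes the SDIR from change scores in both groups. It assumes the commonly used definition, the square root of the difference between the EX and CON variances of the change scores (Atkinson and Batterham, 2015). The function name `sd_ir`, the sign‐preserving handling of a negative variance difference, and the example change scores are all hypothetical choices made for illustration. They are not data or code from any of the cited trials.

```python
import math
import statistics


def sd_ir(ex_changes, con_changes):
    """Estimate SD_IR from individual change scores in EX and CON.

    SD_IR = sqrt(SD_EX^2 - SD_CON^2). If the CON variance exceeds the EX
    variance, the square root of the absolute difference is returned with a
    negative sign (an assumed convention) rather than raising an error.
    """
    var_diff = statistics.variance(ex_changes) - statistics.variance(con_changes)
    return math.copysign(math.sqrt(abs(var_diff)), var_diff)


# Hypothetical change scores (post minus pre), for illustration only.
ex_changes = [0.0, 2.0, 4.0, 6.0, 8.0]    # sample variance = 10.0
con_changes = [1.0, 2.0, 3.0, 4.0, 5.0]   # sample variance = 2.5

estimate = sd_ir(ex_changes, con_changes)
print(f"SD_EX  = {statistics.stdev(ex_changes):.2f}")
print(f"SD_CON = {statistics.stdev(con_changes):.2f}")
print(f"SD_IR  = {estimate:.2f}")
```

Without `con_changes`, only SD_EX can be computed, and that value mixes VΔTRUE with VΔTE and VΔWS. The subtraction is what removes the variability that would have occurred without the exercise intervention.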

Conclusion and Future Directions

The SDIR statistic estimates whether variability in the observed responses to exercise training can be attributed to an effect of VΔTRUE per se (Atkinson and Batterham, 2015). However, external limitations and non‐blinded group assignment may confound the robustness of the SDIR. Therefore, we suggest that future studies consider the potential limitations in parallel‐arm exercise RCTs when interpreting the SDIR as an estimate of VΔTRUE. While the SDIR statistic is relevant to parallel‐arm exercise RCTs, there are other statistical approaches that are useful for clinical/applied settings. Specifically, there are several approaches for estimating whether an individual has benefited from an exercise intervention (Hopkins, 2000; Swinton et al., 2018; Hecksteden et al., 2018; Ross et al., 2019; Bonafiglia et al., 2019). Although these approaches are not able to determine why an individual has/has not benefited following an intervention, they provide information that can be used to guide individualized exercise prescription decision‐making (Bonafiglia et al., 2018). Therefore, although the SDIR is the only statistic able to assess the existence/magnitude of VΔTRUE in parallel‐arm exercise RCTs (Atkinson et al., 2019), different statistical approaches (Hopkins, 2000; Swinton et al., 2018; Hecksteden et al., 2018; Ross et al., 2019; Bonafiglia et al., 2019) can be used in future studies that wish to investigate the application of personalized exercise‐based medicine.

Conflict of Interest

The authors have declared that no conflicts of interest exist.

Data Availability Statement

The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, upon request.

68 in total

1. Differences in adaptations to 1 year of aerobic endurance training: individual patterns of nonresponse.

Authors: F Scharhag-Rosenberger; S Walitzek; W Kindermann; T Meyer
Journal: Scand J Med Sci Sports Date: 2010-06-18 Impact factor: 4.221

Review 2. Incidence of nonresponse and individual patterns of response following sprint interval training.

Authors: Brendon J Gurd; Matthew D Giles; Jacob T Bonafiglia; James P Raleigh; John C Boyd; Jasmin K Ma; Jason G E Zelt; Trisha D Scribbans
Journal: Appl Physiol Nutr Metab Date: 2015-11-03 Impact factor: 2.665

Review 3. Issues in the determination of 'responders' and 'non-responders' in physiological research.

Authors: Greg Atkinson; Philip Williamson; Alan M Batterham
Journal: Exp Physiol Date: 2019-06-09 Impact factor: 2.969

4. Refuting the myth of non-response to exercise training: 'non-responders' do respond to higher dose of training.

Authors: David Montero; Carsten Lundby
Journal: J Physiol Date: 2017-05-14 Impact factor: 5.182

5. High responders and low responders: factors associated with individual variation in response to standardized training.

Authors: Theresa N Mann; Robert P Lamberts; Michael I Lambert
Journal: Sports Med Date: 2014-08 Impact factor: 11.136

6. Effects of different doses of physical activity on cardiorespiratory fitness among sedentary, overweight or obese postmenopausal women with elevated blood pressure: a randomized controlled trial.

Authors: Timothy S Church; Conrad P Earnest; James S Skinner; Steven N Blair
Journal: JAMA Date: 2007-05-16 Impact factor: 56.272

7. Inter-Individual Responses of Maximal Oxygen Uptake to Exercise Training: A Critical Review.

Authors: Philip J Williamson; Greg Atkinson; Alan M Batterham
Journal: Sports Med Date: 2017-08 Impact factor: 11.136

Review 8. Sources of Inter-individual Variability in the Therapeutic Response of Blood Glucose Control to Exercise in Type 2 Diabetes: Going Beyond Exercise Dose.

Authors: Thomas P J Solomon
Journal: Front Physiol Date: 2018-07-13 Impact factor: 4.566

9. Precision exercise medicine: understanding exercise response variability.

Authors: Robert Ross; Bret H Goodpaster; Lauren G Koch; Mark A Sarzynski; Wendy M Kohrt; Neil M Johannsen; James S Skinner; Alex Castro; Brian A Irving; Robert C Noland; Lauren M Sparks; Guillaume Spielmann; Andrew G Day; Werner Pitsch; William G Hopkins; Claude Bouchard
Journal: Br J Sports Med Date: 2019-03-12 Impact factor: 13.800

10. Prevalence of Non-responders for Glucose Control Markers after 10 Weeks of High-Intensity Interval Training in Adult Women with Higher and Lower Insulin Resistance.

Authors: Cristian Álvarez; Rodrigo Ramírez-Campillo; Robinson Ramírez-Vélez; Mikel Izquierdo
Journal: Front Physiol Date: 2017-07-06 Impact factor: 4.566

6 in total

1. Hyperglycaemia is associated with impaired muscle signalling and aerobic adaptation to exercise.

Authors: Tara L MacDonald; Pattarawan Pattamaprapanont; Prerana Pathak; Natalie Fernandez; Ellen C Freitas; Samar Hafida; Joanna Mitri; Steven L Britton; Lauren G Koch; Sarah J Lessard
Journal: Nat Metab Date: 2020-07-20

Review 2. An appraisal of the SD_IR as an estimate of true individual differences in training responsiveness in parallel-arm exercise randomized controlled trials.

Authors: Jacob T Bonafiglia; Andrea M Brennan; Robert Ross; Brendon J Gurd
Journal: Physiol Rep Date: 2019-07

3. Dose-Response Matters! - A Perspective on the Exercise Prescription in Exercise-Cognition Research.

Authors: Fabian Herold; Patrick Müller; Thomas Gronwald; Notger G Müller
Journal: Front Psychol Date: 2019-11-01

4. Can non-responders be 'rescued' by increasing exercise intensity? A quasi-experimental trial of individual responses among humans living with pre-diabetes or type 2 diabetes mellitus in Canada.

Authors: Travis J Hrubeniuk; Danielle R Bouchard; Brendon J Gurd; Martin Sénéchal
Journal: BMJ Open Date: 2021-04-05 Impact factor: 2.692

5. Interindividual variability in response to protein and fish oil supplementation in older adults: a randomized controlled trial.

Authors: Caoileann H Murphy; Claire Connolly; Ellen M Flanagan; Kathleen A J Mitchelson; Elena de Marco Castro; Brendan Egan; Lorraine Brennan; Helen M Roche
Journal: J Cachexia Sarcopenia Muscle Date: 2022-02-21 Impact factor: 12.910

6. Estimating heterogeneity of physical function treatment response to caloric restriction among older adults with obesity.

Authors: Daniel P Beavers; Katherine L Hsieh; Dalane W Kitzman; Stephen B Kritchevsky; Stephen P Messier; Rebecca H Neiberg; Barbara J Nicklas; W Jack Rejeski; Kristen M Beavers
Journal: PLoS One Date: 2022-05-05 Impact factor: 3.240