K F Arnold (1,2), G T H Ellison (1,2), S C Gadd (2), J Textor (3), P W G Tennant (1,4), A Heppenstall (1,5), M S Gilthorpe (1,2)
1. Leeds Institute for Data Analytics, University of Leeds, Leeds, UK. 2. School of Medicine, University of Leeds, Leeds, UK. 3. Tumor Immunology Lab, Radboud University Medical Centre, Nijmegen, The Netherlands. 4. School of Healthcare, University of Leeds, Leeds, UK. 5. School of Geography, University of Leeds, Leeds, UK.
Abstract
'Unexplained residuals' models have been used within lifecourse epidemiology to model an exposure measured longitudinally at several time points in relation to a distal outcome. It has been claimed that these models have several advantages, including: the ability to estimate multiple total causal effects in a single model, and additional insight into the effect on the outcome of greater-than-expected increases in the exposure compared to traditional regression methods. We evaluate these properties and prove mathematically how adjustment for confounding variables must be made within this modelling framework. Importantly, we explicitly place unexplained residual models in a causal framework using directed acyclic graphs. This allows for theoretical justification of appropriate confounder adjustment and provides a framework for extending our results to more complex scenarios than those examined in this paper. We also discuss several interpretational issues relating to unexplained residual models within a causal framework. We argue that unexplained residual models offer no additional insights compared to traditional regression methods, and, in fact, are more challenging to implement; moreover, they artificially reduce estimated standard errors. Consequently, we conclude that unexplained residual models, if used, must be implemented with great care.
Within the field of lifecourse epidemiology, there is substantial interest in
modelling the relationship between an exposure x measured
longitudinally at several time points (i.e. $x_1, x_2, \dots, x_k$) and a subsequent outcome y measured once later
in life (hereafter referred to as a distal outcome); such a relationship can be
helpfully summarised in Figure
1(a) in the form of a directed acyclic graph (DAG).[1] DAGs are pictorial representations of hypothesised causal relationships
between variables in which: variables (nodes) are connected via unidirectional
arrows (directed edges), which represent direct causal relationships; and no
directed loops (i.e. circular paths) between variables are permitted. Nodes may be
either: endogenous, having at least one causally preceding variable represented in
the graph; or exogenous, having none.[2] All unexplained causes of the endogenous nodes in Figure
1(a) are represented by the variables $e_{x_1}, \dots, e_{x_k}$ and $e_y$, respectively. While there are many useful applications for DAGs
in epidemiologic research, perhaps the most beneficial is their ability to identify
suitable sets of covariates for removing bias due to confounding between an exposure
and outcome,[3,4] which occurs
whenever both variables share one or more common causes. For this reason, DAGs are
increasingly being used in epidemiology, as they provide a framework for estimating
the total causal effect of an exposure on an outcome.[4]
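The DAG terminology above can be made concrete with a minimal sketch. The encoding below (a dictionary mapping each node to its direct causes, with k = 3 and node names such as `e_x1`) is a hypothetical illustration and not part of the original analysis; it relies only on Python's standard `graphlib` module to check that no directed loops exist and to separate endogenous from exogenous nodes.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical encoding of a DAG like Figure 1(a) with k = 3:
# each node maps to the set of its direct causes (parents).
parents = {
    "x1": {"e_x1"},
    "x2": {"x1", "e_x2"},
    "x3": {"x1", "x2", "e_x3"},
    "y": {"x1", "x2", "x3", "e_y"},
}

def is_acyclic(graph):
    """Return True if the directed graph contains no directed loops."""
    try:
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False

# All nodes, including those that appear only as causes of other nodes.
nodes = set(parents) | {p for ps in parents.values() for p in ps}
# Endogenous nodes have at least one cause in the graph; exogenous nodes have none.
endogenous = {n for n in nodes if parents.get(n)}
exogenous = nodes - endogenous

print(is_acyclic(parents), sorted(endogenous), sorted(exogenous))
```

In this encoding the exogenous nodes are exactly the unexplained-cause terms, which mirrors how the error terms are drawn in the figure.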
Figure 1.
(a) Nonparametric causal diagram (DAG) representing the hypothesised data-generating process for k longitudinal measurements of exposure x (i.e. $x_1, x_2, \dots, x_k$) and one distal outcome y. The terms $e_{x_1}, \dots, e_{x_k}$ and $e_y$ represent all unexplained causes of $x_1, \dots, x_k$ and y, respectively, and are included to explicitly reflect uncertainty in all endogenous nodes (whether modelled or not). (b) Path diagrams depicting the k standard regression models that would be constructed to estimate the total causal effect of each of $x_1, x_2, \dots, x_k$ on y (i.e. equation (5)). For each model, only the final coefficient may be interpreted as a total causal effect; all other coefficients are greyed to illustrate that no such interpretation should be made for them. (c) Path diagrams depicting the UR model, consisting of k − 1 preparation regressions (i.e. equation (6)) and a final composite regression model (i.e. equation (7), with i = k).
Using a causal framework to (correctly) model the scenario in Figure 1(a) may also have additional utility
in identifying and quantifying important periods of change in the exposure that are
causally related to the outcome. However, one challenge to such applications is that
successive measurements of an exposure over time may be highly correlated with one
another and therefore likely to suffer collinearity when analysed in relation to a
distal outcome. Consequently, there has been extensive debate regarding the best way
to model these types of longitudinally measured variables; a recent review[5] of analytical and modelling techniques has identified a range of different
approaches, including z-score plots, regression with change scores, multilevel and
latent growth curve models, and growth mixture models. Nonetheless, one of the most
straightforward methods in use is a series of standard multivariable regression
models.
1.1 Standard regression method
When using this approach, each longitudinal measurement of the exposure variable
is treated as a separate entity that is subject to confounding by all previous
measurements of that variable – the total number of models needed therefore
being equal to the total number of time points at which the exposure has been
measured.
As an example, the simplest scenario would involve just two measurements of the exposure x (i.e. x1 and x2, measured at time points 1 and 2, respectively), and a distal outcome, y, where all variables are continuous in nature. Here, two standard regression models (denoted $\hat{y}_S^{(i)}$, for $i = 1, 2$) would need to be constructed to estimate the total causal effect of each distinct measurement of x on y, i.e.
$$\hat{y}_S^{(1)} = \hat{\alpha}_0^{(1)} + \hat{\alpha}_{x_1}^{(1)} x_1 \tag{1}$$
$$\hat{y}_S^{(2)} = \hat{\alpha}_0^{(2)} + \hat{\alpha}_{x_1}^{(2)} x_1 + \hat{\alpha}_{x_2}^{(2)} x_2 \tag{2}$$
Importantly, to estimate the total causal effect of
x1 on y in equation (1),
adjustment for x2 is inappropriate, as it lies on
the causal path between x1 and y
(i.e. x2 is a mediator); in fact, adjustment for
x2 might invoke bias in the causal
interpretation due to a phenomenon known as the ‘reversal paradox’.[5-7] In contrast, to estimate the
total causal effect of x2 on y in
equation (2), adjustment for x1
is appropriate, since it confounds the desired relationship
(i.e. x1 causally precedes both
x2 and y, potentially creating
a spurious relationship between them). For this reason, in either model, it is
only possible to interpret the coefficient of the last/most
recent measurement of x (the exposure) as a total causal effect,[1] which encompasses all direct and indirect causal pathways between an
exposure and outcome. No such interpretation is possible (nor should it be
attempted) for the coefficient of the earlier measurement of x
in equation (2), as it operates purely as a confounder.
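The distinction between the two standard regression models can be illustrated with a small simulation. This is only a sketch under stated assumptions: the linear data-generating process, the path coefficients (0.6, 0.3, 0.5), the sample size, and the helper `ols` are all hypothetical choices made for illustration, not values from any study discussed here.

```python
import numpy as np

def ols(y, *covariates):
    """Return OLS coefficients [intercept, b_1, ..., b_p] of y on the covariates."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical linear process with two exposure measurements: x1 -> x2 -> y and x1 -> y.
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 0.3 * x1 + 0.5 * x2 + rng.normal(size=n)

true_total_x1 = 0.3 + 0.6 * 0.5  # direct effect plus indirect effect via x2
true_total_x2 = 0.5              # x2 has no mediators in this process

a1 = ols(y, x1)      # as in equation (1): y on x1 only
a2 = ols(y, x1, x2)  # as in equation (2): y on x1 and x2

print("x1 coefficient, model (1):", a1[1], "(total effect of x1)")
print("x2 coefficient, model (2):", a2[2], "(total effect of x2)")
print("x1 coefficient, model (2):", a2[1], "(direct effect only; x2 is a mediator)")
```

Under these assumptions, the coefficient of x1 in the second model recovers only the direct path, which is why it should not be read as a total causal effect.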
1.2 Unexplained residuals method
To circumvent the need for multiple models, Keijzer-Veen[8] has suggested an alternative approach that would combine the information
contained within each of the two separate models (equations (1) and
(2)) into a single composite regression model using ‘unexplained
residuals’. As originally proposed,[9] such a model allows the researcher to quantify the total effects of both
the initial measurement of x (i.e. $x_1$) and subsequent change in x on the outcome y. The proposed approach contains two steps but is straightforward in principle.
First, the most recent measurement of x (i.e. x2) is regressed on the earlier measurement of x (i.e. x1):
$$x_2 = \hat{\gamma}_0^{(2)} + \hat{\gamma}_{x_1}^{(2)} x_1 + e_{x_2} \tag{3}$$
This produces a measure of each observation’s ‘expected’ value of x2 as predicted by its value of x1. The difference between the expected value of x2 (i.e. $\hat{x}_2 = \hat{\gamma}_0^{(2)} + \hat{\gamma}_{x_1}^{(2)} x_1$) and the observed value of x2 amounts to the residual term $e_{x_2}$. Put another way, $e_{x_2}$ represents the part of x2 ‘unexplained’ by x1.
Second, y is regressed on both the initial exposure x1 and subsequent residual term $e_{x_2}$:
$$\hat{y}_{UR}^{(2)} = \hat{\lambda}_0^{(2)} + \hat{\lambda}_{x_1}^{(2)} x_1 + \hat{\lambda}_{e_{x_2}}^{(2)} e_{x_2} \tag{4}$$
According to Keijzer-Veen et al.,[9] the key advantages of conducting regression using the composite ‘unexplained residuals’ (UR) model (4) are that:
1. The UR model produces the same estimated outcome values as the standard regression model in equation (2) (i.e. $\hat{y}_{UR}^{(2)} = \hat{y}_S^{(2)}$);
2. The estimated total effect sizes (coefficient values) produced by individual standard regression models (equations (1) and (2)) are equal to those estimated within the UR model (i.e. $\hat{\lambda}_{x_1}^{(2)} = \hat{\alpha}_{x_1}^{(1)}$ and $\hat{\lambda}_{e_{x_2}}^{(2)} = \hat{\alpha}_{x_2}^{(2)}$); thus, multiple coefficients in a single model may be interpreted;
3. The UR model provides additional insight (via the coefficient $\hat{\lambda}_{e_{x_2}}^{(2)}$ in equation (4)) into the effect of x increasing more than expected upon y; and
4. The initial exposure x1 and subsequent residual term $e_{x_2}$ are mathematically independent (i.e. orthogonal).
Succinctly, the two models $\hat{y}_S^{(2)}$ and $\hat{y}_{UR}^{(2)}$ are algebraically equivalent, but $\hat{y}_{UR}^{(2)}$ makes interpretation of the separate influence of the initial measurement of the exposure x (i.e. x1) and subsequent changes in x more straightforward than do (multiple) standard regression models $\hat{y}_S^{(1)}$ and $\hat{y}_S^{(2)}$.
Within the epidemiological literature, UR models have been used under a number of
different names. In addition to ‘regression with unexplained residuals’ (as
first proposed by Keijzer-Veen et al.[9-11]), other studies have
referred to: ‘unexplained residual regression’[12]; ‘method of unexplained residuals’[13]; ‘conditional linear regression’[12]; ‘conditional (regression) models’[5,14]; ‘regression with
conditional growth measures’[14]; ‘conditional growth models’[15-18]; ‘conditional weight models’[19]; and ‘conditional (regression) analysis’.[20-24] The terms ‘conditional
growth’ and ‘conditional size’ – and additional variations thereof – are also
commonly used to refer to the difference between observed and expected size
measurements.[5,15,18,25-39] To avoid further
confusion, the residual term representing the difference between the observed
and expected values of an exposure produced in the manner proposed by
Keijzer-Veen et al. (as in equation (3)) will be henceforth referred
to as the ‘unexplained residuals (UR) term’, and the models
themselves (as in equation (4)) will be referred to as
‘unexplained residuals (UR) models’.
Despite the numerous names given to these models, the process remains essentially the same as that first proposed. Indeed, several authors have extended the original model to examine scenarios involving several measurements of an exposure x (i.e. $x_1, x_2, \dots, x_k$); UR models in these extended applications thus include several UR terms.[5,12,13,16-41] In general, each UR term $e_{x_i}$ is derived from the regression of each measured value $x_i$ on all previous measurements $x_1, \dots, x_{i-1}$, for $2 \le i \le k$,[12,16,18-22,24,25,27,29,31-34,36,39,40] though some researchers have deviated from this procedure[13,26,35,37,41]; the outcome y is then regressed on x1 and all subsequent UR terms $e_{x_2}, \dots, e_{x_k}$.
Many researchers have further extended the original UR models by adjusting for
additional confounding variables (i.e. over and above the confounding of prior
measurements of the exposure), though there is, as yet, little consensus as to
whether or how such adjustments should be performed. For example, Horta et al.[16] made no adjustments for potential confounders when deriving their UR
terms, but did make adjustments within their composite UR model. In contrast,
Gandhi et al.[18] adjusted for just one potential confounder (gender) when creating their
UR terms, but also made further adjustments to the composite UR model (for
gender and other variables). Adair et al.[25] created their UR terms using site- and sex-stratified linear regressions
that were also adjusted for age, and made further adjustments for age, sex, and
study site in their subsequent composite UR models. Indeed, there are many other
examples of different approaches to confounder adjustment, but none of these
have been adequately and explicitly justified by the researchers concerned, even
though it appears that they did so in order to make causal inferences.
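The two-step procedure for the simplest case (two measurements, no additional confounders) can be sketched as follows. The simulated data, coefficient values, and the helper `ols` are hypothetical and serve only to check the four claimed properties numerically; this is not the code used by any of the cited studies.

```python
import numpy as np

def ols(y, *covariates):
    """OLS with an intercept; returns (coefficients, fitted values)."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef

rng = np.random.default_rng(1)
n = 5_000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)            # hypothetical coefficients
y = 0.3 * x1 + 0.5 * x2 + rng.normal(size=n)

# Step 1: regress x2 on x1 and keep the residual (the UR term).
_, x2_expected = ols(x2, x1)
e_x2 = x2 - x2_expected

# Step 2: regress y on x1 and the UR term (the composite UR model).
lam, y_ur = ols(y, x1, e_x2)

# Standard regression models for comparison.
a1, _ = ols(y, x1)
a2, y_s2 = ols(y, x1, x2)

checks = {
    "same fitted values as the standard model with x1 and x2": np.allclose(y_ur, y_s2),
    "UR coefficient of x1 equals x1 coefficient in the x1-only model": np.isclose(lam[1], a1[1]),
    "UR coefficient of e_x2 equals x2 coefficient in the two-covariate model": np.isclose(lam[2], a2[2]),
    "x1 and e_x2 are orthogonal": abs(np.dot(x1, e_x2)) < 1e-6,
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

These identities are algebraic consequences of least squares, so they hold in any sample (up to floating-point error) rather than only on average.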
2 Research aims
The potential impact of using alternative approaches to adjust for confounding when
constructing and using UR terms has yet to be fully evaluated. Indeed, Keijzer-Veen et al.[9] did not address confounding variables in their original paper, and there has
been little to no discussion or analysis of this issue by subsequent authors using
this approach. It therefore remains unclear whether UR models that include
confounders offer the same purported benefits as those lacking (or ignoring)
confounders, and there is no clear indication of how potential confounders should be
treated by analyses using these models. This is an issue of particular relevance to
researchers seeking to infer causality from individual coefficient estimates, since
inappropriate adjustment for covariates (which includes both the failure to adjust
for genuine confounders and the adjustment for mediators mistaken for confounders)
can lead to biased causal inferences. For this reason, UR models are likely to have
limited practical utility unless they are able to accommodate confounding variables
appropriately. The fact that UR models have not been developed or analysed within a
causal framework also creates uncertainty about their utility for making causal
inferences.
Therefore, the aims of the present study were to: (1) confirm that the approach
proposed by Keijzer-Veen et al. may be extended to a scenario involving
k longitudinal measurements of an exposure x
in the absence of any additional confounding; (2) determine whether it is possible
(and if so, how might it be possible) to adjust for additional confounders within
the UR modelling framework; (3) evaluate the benefits of UR models claimed by
Keijzer-Veen et al.; and (4) offer recommendations for future use of UR models. The
present study examines two very different types of potential confounders:
time-invariant (which require/provide measurements taken at a single time point and
remain constant across the lifecourse, e.g. sex); and time-varying (for which
measurements are collected at multiple time points across the lifecourse – usually
concurrent to measurements of the exposure – because the value of the variable may
change, e.g. socioeconomic position).
These aims are summarised in the DAGs presented in Figures 1(a), 2(a), and 3(a), which depict three general scenarios drawn from lifecourse epidemiology, each of which will be examined in the analyses that follow. Each DAG relates k longitudinally measured exposure variables (i.e. x measured at time points $1, 2, \dots, k$) to a distal outcome y (measured at some point either concurrent to or following k) under three very different circumstances: (1a) in the absence of any additional confounders; (2a) in the presence of an additional time-invariant confounder m; and (3a) in the presence of an additional time-varying confounder $m_1, m_2, \dots, m_k$. All DAGs are drawn forwardly saturated (i.e. where each node may
causally affect all future nodes), and all unexplained causes of endogenous nodes
are represented by the variable e and depicted as independent (i.e.
we assume no unobserved confounding). The explicit inclusion of these three DAGs in
Figures 1(a), 2(a), and 3(a) is intended not only to visually
illustrate each of the scenarios that will be examined, but also, importantly, to
situate the analyses that follow within a causal framework.
Figure 2.
(a) Nonparametric causal diagram (DAG) representing the hypothesised data-generating process for k longitudinal measurements of exposure x (i.e. $x_1, x_2, \dots, x_k$), one distal outcome y, and one time-invariant confounder m. The terms $e_m$, $e_{x_1}, \dots, e_{x_k}$ and $e_y$ represent all unexplained causes of m, $x_1, \dots, x_k$, and y, respectively, and are included to explicitly reflect uncertainty in all endogenous nodes (whether modelled or not). (b) Path diagrams depicting the k standard regression models that would be constructed to estimate the total causal effect of each of $x_1, x_2, \dots, x_k$ on y (i.e. equation (9)). For each model, only the final coefficient may be interpreted as a total causal effect; all other coefficients are greyed to illustrate that no such interpretation should be made for them. (c) Path diagrams depicting the UR model, consisting of k − 1 preparation regressions (i.e. equation (10)) and a final composite regression model (i.e. equation (11), with i = k).
Figure 3.
(a) Nonparametric causal diagram (DAG) representing the hypothesised data-generating process for k longitudinal measurements of exposure x (i.e. $x_1, x_2, \dots, x_k$), one distal outcome y, and k longitudinal measurements of one time-varying confounder $m_1, m_2, \dots, m_k$. The terms $e_{m_1}, \dots, e_{m_k}$, $e_{x_1}, \dots, e_{x_k}$ and $e_y$ represent all unexplained causes of $m_1, \dots, m_k$, $x_1, \dots, x_k$, and y, respectively, and are included to explicitly reflect uncertainty in all endogenous nodes (whether modelled or not). (b) Path diagrams depicting the k standard regression models that would be constructed to estimate the total causal effect of each of $x_1, x_2, \dots, x_k$ on y (i.e. equation (12)). For each model, only the final coefficient may be interpreted as a total causal effect; all other coefficients are greyed to illustrate that no such interpretation should be made for them. (c) Path diagrams depicting the UR model, consisting of 2(k − 1) preparation regressions (i.e. equations (13) and (14)) and a final composite regression model (i.e. equation (15), with i = k).
Sections 3 through 9, which follow, provide: the three key properties of UR models
that will be evaluated for the scenarios in Figures 1(a), 2(a), and 3(a) (§3); DAG-based and mathematical
examinations of the UR models for the scenarios given in Figure 1(a) (§4), 2(a) (§5), and 3(a) (§6); a discussion of several
interpretational issues that arise for UR models when placed within a causal
framework, including an evaluation of the claim that UR models provide greater
insight than standard regression methods (§7); an argument outlining how UR models
produce artificially reduced standard errors (SEs) and how this might be corrected
(§8); and recommendations for future use and interpretation of UR models,
particularly as these relate to the inclusion of confounders (§9).
3 Key properties of UR models
In the following sections, we evaluate the mathematical properties of the original UR
models after extending them to include k measurements of a
continuous exposure x: in the absence of any additional confounding
(§4); in the presence of a single additional time-invariant confounder
m (§5); and in the presence of a single additional time-varying
confounder m with sequential values $m_1, m_2, \dots, m_k$ (§6). These properties are:
Property (i): The outcome values predicted by the final standard regression model (for the final measurement of the exposure variable, $x_k$) are equal to those predicted by the composite UR model.
Property (ii): The estimated coefficient for x1 in the initial standard regression model (for the first measurement of the exposure variable, x1) is equal to the estimated coefficient for x1 in the composite UR model.
Property (iii): The estimated coefficient for each $x_i$ (for $2 \le i \le k$) in its individual standard regression model (i.e. for designated exposure $x_i$) is equal to the estimated coefficient for the corresponding UR term $e_{x_i}$ in the composite UR model.
From a causal inference perspective, only Properties (ii) and (iii) are meaningful,
since the focus is on individual coefficient estimates as opposed to predicted
outcomes. Nevertheless, we evaluate all three properties in Sections 4 through 6,
and leave discussion of interpretational issues until later in the paper (§7).
4 UR models: No confounders (Figure 1(a))
Before considering any additional confounding variables, we first consider the
straightforward scenario depicted in Figure 1(a). We provide: definitions of the
standard regression models, UR terms, and UR models (§4.1); an analysis of UR models
within a causal framework (§4.2); and arguments for why Properties (i)–(iii) are
upheld (§4.3).
4.1 Definitions
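We define the ordinary least-squares (OLS) regression model $\hat{y}_S^{(i)}$ for estimating the total causal effect of each measurement of the exposure variable $x_i$ (for $1 \le i \le k$) on y as:
$$\hat{y}_S^{(i)} = \hat{\alpha}_0^{(i)} + \hat{\alpha}_{x_1}^{(i)} x_1 + \hat{\alpha}_{x_2}^{(i)} x_2 + \dots + \hat{\alpha}_{x_i}^{(i)} x_i \tag{5}$$
A visual depiction of equation (5) is given in Figure 1(b). Because the relationship between each $x_i$ and y is confounded by all previous measurements of x (i.e. $x_1, \dots, x_{i-1}$), these covariates must be adjusted for. However, as discussed in Section 1, only the coefficient of the last/most recent measurement of x (i.e. $\hat{\alpha}_{x_i}^{(i)}$) may be interpreted as a total causal effect.
To create UR terms according to the process established by Keijzer-Veen et al.,[9] each measurement of the exposure $x_i$ is regressed on all previous measurements of x (for $2 \le i \le k$):
$$x_i = \hat{\gamma}_0^{(i)} + \hat{\gamma}_{x_1}^{(i)} x_1 + \dots + \hat{\gamma}_{x_{i-1}}^{(i)} x_{i-1} + e_{x_i} \tag{6}$$
The UR term $e_{x_i}$ thus represents the difference between the actual value of $x_i$ and the value of $x_i$ as predicted by all previous measurements of x.
Lastly, we define the UR model $\hat{y}_{UR}^{(i)}$ (for $1 \le i \le k$), which represents the outcome y as a function of the initial value of the exposure x1 and subsequent ‘unexplained’ increases $e_{x_2}, \dots, e_{x_i}$:
$$\hat{y}_{UR}^{(i)} = \hat{\lambda}_0^{(i)} + \hat{\lambda}_{x_1}^{(i)} x_1 + \hat{\lambda}_{e_{x_2}}^{(i)} e_{x_2} + \dots + \hat{\lambda}_{e_{x_i}}^{(i)} e_{x_i} \tag{7}$$
The composite UR model $\hat{y}_{UR}^{(k)}$ thus represents the outcome y as a function of the initial value of the exposure x1 and all subsequent ‘unexplained’ increases $e_{x_2}, \dots, e_{x_k}$. The UR modelling process is summarised in Figure 1(c), depicting the k − 1 regressions of $x_i$ on $x_1, \dots, x_{i-1}$ (equation (6)) and one composite UR regression model (equation (7), with i = k).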
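The definitions in this subsection extend the two-measurement procedure to k measurements, and can be sketched in code. The function names `fit`, `ur_design`, and `composite_ur`, together with the simulated data, are hypothetical and introduced only for illustration; the sketch assumes continuous variables and no additional confounders, as in Figure 1(a).

```python
import numpy as np

def fit(y, X):
    """OLS of y on an intercept plus the columns of X; returns (coefficients, residuals)."""
    D = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef, y - D @ coef

def ur_design(x):
    """Build the columns x_1, e_x2, ..., e_xk from an (n, k) array of exposure measurements."""
    cols = [x[:, 0]]
    for i in range(1, x.shape[1]):
        # Preparation regression: measurement i on all previous measurements.
        _, resid = fit(x[:, i], x[:, :i])
        cols.append(resid)
    return np.column_stack(cols)

def composite_ur(x, y):
    """Coefficients [intercept, x_1, e_x2, ..., e_xk] of the composite UR model."""
    coef, _ = fit(y, ur_design(x))
    return coef

# Hypothetical data with k = 4 correlated measurements.
rng = np.random.default_rng(2)
n, k = 2_000, 4
x = np.empty((n, k))
x[:, 0] = rng.normal(size=n)
for i in range(1, k):
    x[:, i] = 0.5 * x[:, :i].sum(axis=1) + rng.normal(size=n)
y = x @ np.array([0.2, 0.1, 0.3, 0.4]) + rng.normal(size=n)

lam = composite_ur(x, y)
# Final coefficient of each of the k standard regression models.
alpha_last = [fit(y, x[:, : i + 1])[0][-1] for i in range(k)]
print(np.round(lam[1:], 4))
print(np.round(alpha_last, 4))
```

The two printed vectors should coincide: one composite model reproduces the final coefficient of each of the k separate standard models.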
4.2 A causal framework
Within the causal framework provided by Figure 1(a), the unique properties of UR
models can be visualised. If we were naively to model $x_1, x_2, \dots, x_k$ simultaneously, only the coefficient of the final measurement $x_k$ could be interpreted as a total causal effect on y; the coefficients of $x_1, \dots, x_{k-1}$ would represent only the direct effects of each measurement on y, because all future measurements would fully mediate the respective relationship and all backdoor paths[1] would be blocked by preceding measurements. However, by modelling $x_1, e_{x_2}, \dots, e_{x_k}$ (as in a UR model), we encounter no mediation problems due to the fact that, by construction, the UR terms remain wholly independent of the other terms in the model. In fact, by placing the UR model in a causal framework, we are able to see that the UR terms $e_{x_2}, \dots, e_{x_k}$ are essentially instrumental variables (IVs)[42] for $x_2, \dots, x_k$, respectively, which have been produced by the modelling process (Note: The process has similarities with the two-stage least squares regression method,[43] a form of instrumental variable analysis commonly encountered in economics research).
All techniques based on linear regression, including UR models, assume that the causal relationships between variables are linear functions. If that is the case, we may parameterise a DAG (as in Figure 1(a)) by assigning a single coefficient to every arrow and assuming all variables to have a variance of one. The method of path coefficients[44] then allows us to determine the ‘true’ total causal effects in the data generating process. Take x2 as an example, where $k = 3$. The total effect of x2 on y encompasses the direct effect from $x_2 \to y$ and all indirect effects (of which there is only one in this scenario): $x_2 \to x_3 \to y$. We introduce the notation $p_{ba}$ to represent the coefficient of the arrow $a \to b$. Table 1 gives the total effects of x2 on y and of $e_{x_2}$ on y, with both total effects decomposed into their respective direct and indirect effects. From Table 1, we see that the total effect of x2 on y is equal to the total effect of $e_{x_2}$ on y; this is because there are no direct paths between $e_{x_2}$ and y, and all indirect paths pass through x2 (with $p_{x_2 e_{x_2}}$ being equal to one, as in Figure 1(c)).
Table 1.
Total effect of x2 on y estimated by a standard regression model compared to total effect of $e_{x_2}$ on y estimated by an equivalent UR model (Figure 1(a), with $k = 3$).

| Exposure | Path | Effect size | Total effect |
| --- | --- | --- | --- |
| $x_2$ | Direct: $x_2 \to y$ | $p_{y x_2}$ | $p_{y x_2} + p_{x_3 x_2} \cdot p_{y x_3}$ |
| | Indirect: $x_2 \to x_3 \to y$ | $p_{x_3 x_2} \cdot p_{y x_3}$ | |
| $e_{x_2}$ | Direct: n/a | | $p_{y x_2} + p_{x_3 x_2} \cdot p_{y x_3}$ |
| | Indirect: $e_{x_2} \to x_2 \to y$ | $p_{x_2 e_{x_2}} \cdot p_{y x_2}$ | |
| | Indirect: $e_{x_2} \to x_2 \to x_3 \to y$ | $p_{x_2 e_{x_2}} \cdot p_{x_3 x_2} \cdot p_{y x_3}$ | |
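The path-tracing logic behind Table 1 can be sketched as a short recursive function that sums, over every directed path from a source to a target, the product of the coefficients along that path. The numerical coefficients and the function name `total_effect` are hypothetical; only the structure (k = 3, with the UR term entering x2 with coefficient one) follows the scenario above.

```python
# Hypothetical path coefficients for a linear DAG with k = 3.
# Keys are (cause, effect) pairs.
coef = {
    ("x1", "x2"): 0.5, ("x1", "x3"): 0.2, ("x2", "x3"): 0.4,
    ("x1", "y"): 0.1, ("x2", "y"): 0.3, ("x3", "y"): 0.6,
    ("e_x2", "x2"): 1.0,  # the UR term enters x2 with coefficient one
}

def total_effect(coef, source, target):
    """Sum over all directed paths source -> ... -> target of the product of edge coefficients."""
    if source == target:
        return 1.0
    return sum(
        c * total_effect(coef, child, target)
        for (parent, child), c in coef.items()
        if parent == source
    )

te_x2 = total_effect(coef, "x2", "y")      # direct path plus the path via x3
te_e_x2 = total_effect(coef, "e_x2", "y")  # every path passes through x2
print(te_x2, te_e_x2)
```

Because the only arrow leaving the UR term has coefficient one and points into x2, the two totals coincide, which is the equality shown in the final column of Table 1.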
4.3 Covariate orthogonality and Properties (i)–(iii)
In addition to the graph-based approach in the preceding section, we are able to
prove mathematically that Properties (i)–(iii) are upheld for the scenario given
in Figure 1(a). In summary, these properties are:
Property (i): $\hat{y}_S^{(k)} = \hat{y}_{UR}^{(k)}$
Property (ii): $\hat{\alpha}_{x_1}^{(1)} = \hat{\lambda}_{x_1}^{(k)}$
Property (iii): $\hat{\alpha}_{x_i}^{(i)} = \hat{\lambda}_{e_{x_i}}^{(k)}$, for $2 \le i \le k$
Equations (5) to (7) are summarised in Table 2; the standard regression models $\hat{y}_S^{(i)}$ (for $1 \le i \le k$) and composite UR model $\hat{y}_{UR}^{(k)}$ (in which the UR terms have been produced via the regression of each measurement of x on all previous measurements, as in equation (6)) contained therein are guaranteed to satisfy Properties (i)–(iii). These properties of UR models rely crucially on all UR terms being orthogonal to all other covariates in the composite UR model $\hat{y}_{UR}^{(k)}$.
Table 2.
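For the scenario depicted in Figure 1(a), the standard regression model $\hat{y}_S^{(i)}$ necessary for estimating the total causal effect of each exposure $x_i$ on y, and the corresponding UR model $\hat{y}_{UR}^{(i)}$, for $1 \le i \le k$.

| | Standard regression model $\hat{y}_S^{(i)}$ | UR model $\hat{y}_{UR}^{(i)}$ |
| --- | --- | --- |
| $i = 1$ | $\hat{\alpha}_0^{(1)} + \hat{\alpha}_{x_1}^{(1)} x_1$ | $\hat{\lambda}_0^{(1)} + \hat{\lambda}_{x_1}^{(1)} x_1$ |
| $i = 2$ | $\hat{\alpha}_0^{(2)} + \hat{\alpha}_{x_1}^{(2)} x_1 + \hat{\alpha}_{x_2}^{(2)} x_2$ | $\hat{\lambda}_0^{(2)} + \hat{\lambda}_{x_1}^{(2)} x_1 + \hat{\lambda}_{e_{x_2}}^{(2)} e_{x_2}$ |
| $\vdots$ | $\vdots$ | $\vdots$ |
| $i = k$ | $\hat{\alpha}_0^{(k)} + \hat{\alpha}_{x_1}^{(k)} x_1 + \hat{\alpha}_{x_2}^{(k)} x_2 + \dots + \hat{\alpha}_{x_k}^{(k)} x_k$ | $\hat{\lambda}_0^{(k)} + \hat{\lambda}_{x_1}^{(k)} x_1 + \hat{\lambda}_{e_{x_2}}^{(k)} e_{x_2} + \dots + \hat{\lambda}_{e_{x_k}}^{(k)} e_{x_k}$ |

We illustrate this property, and explain how it is exploited to ensure Properties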
(i)–(iii) are upheld. Formal proofs are provided in online supplementary
Appendix 1.
In Table 2, note that each regression model (for both the standard and UR methods) contains one more covariate than the model preceding it. In the column of standard regression models, each row contains an additional $x_i$ term; in the column of UR models, each row contains an additional $e_{x_i}$ term.
Typically, the inclusion of an additional covariate in a regression model changes the coefficient(s) estimated for other covariates because their covariance would be nonzero. For example, the addition of x2 in $\hat{y}_S^{(2)}$ will undoubtedly change the estimated coefficient for x1 in $\hat{y}_S^{(2)}$ compared to $\hat{y}_S^{(1)}$, because x1 and
x2 are two measurements of the same variable and
thus will have a nonzero covariance (i.e. correlation ≠ 0). This nonzero
covariance is what is exploited by adjustment for confounders – if two
covariates did not covary, then adjustment would not be necessary in the first
place.
However, a UR model upholds Properties (ii) and (iii) specifically because its covariates do not covary. The addition of $e_{x_2}$ in $\hat{y}_{UR}^{(2)}$ does not change the estimated coefficient for x1 in $\hat{y}_{UR}^{(2)}$ compared to $\hat{y}_{UR}^{(1)}$ because x1 and $e_{x_2}$ are orthogonal (i.e. correlation = 0). This orthogonality is ensured as an artefact of OLS regression; because $e_{x_2}$ represents the residual term from the regression of x2 on x1 by definition (equation (6)), it is guaranteed to be orthogonal to x1.
In fact, it can easily be shown that all UR terms $e_{x_2}, \dots, e_{x_k}$ are orthogonal to one another by construction. For any UR term $e_{x_i}$, it holds that $e_{x_i}$ is orthogonal to $x_1, \dots, x_{i-1}$. Because preceding UR terms $e_{x_2}, \dots, e_{x_{i-1}}$ are themselves linear combinations of $x_1, \dots, x_{i-1}$ (equation (6)), it follows that $e_{x_i}$ is orthogonal to $e_{x_j}$, for $2 \le j < i$. Using this information, we can easily conclude that the addition of subsequent UR terms in the set of UR models in Table 2 will leave the coefficients of all other covariates unchanged. Thus, it only remains to be shown that the estimated coefficients for x1 and the UR terms $e_{x_2}, \dots, e_{x_k}$ are themselves equivalent to the coefficients for $x_1, x_2, \dots, x_k$ as estimated in their individual standard regression models, respectively.
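The orthogonality argument can be checked numerically. In the sketch below, the random-walk exposure data, the outcome coefficients, and the helper `ols` are hypothetical; the point is only that the raw measurements are strongly correlated, that x1 and the UR terms are mutually uncorrelated, and that the coefficient of x1 therefore does not move as further UR terms are added.

```python
import numpy as np

def ols(y, X):
    """OLS of y on an intercept plus the columns of X; returns (coefficients, residuals)."""
    D = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef, y - D @ coef

rng = np.random.default_rng(4)
n, k = 1_000, 4
# Hypothetical correlated measurements: each one is the previous plus noise.
x = np.cumsum(rng.normal(size=(n, k)), axis=1)
y = x @ np.array([0.2, 0.1, 0.3, 0.4]) + rng.normal(size=n)

# Columns: x_1, then the residual of each later measurement on all earlier ones.
ur = np.column_stack([x[:, 0]] + [ols(x[:, i], x[:, :i])[1] for i in range(1, k)])

corr_x = np.corrcoef(x, rowvar=False)    # raw measurements: nonzero off-diagonals
corr_ur = np.corrcoef(ur, rowvar=False)  # x_1 and UR terms: identity matrix

# Nested UR models with 1, 2, ..., k covariates.
nested = [ols(y, ur[:, :i])[0] for i in range(1, k + 1)]
coef_x1 = [c[1] for c in nested]

print(np.round(corr_x, 2))
print(np.round(corr_ur, 2))
print(np.round(coef_x1, 6))
```

The last line prints the same value k times, which is the numerical counterpart of the statement that adding UR terms leaves the other coefficients unchanged.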
Property (i):
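First, it must be noted that each UR model is nothing more than a reparameterisation of the corresponding standard regression model (i.e. $\hat{y}_S^{(i)}$ and $\hat{y}_{UR}^{(i)}$ for each row in Table 2). Each standard regression model $\hat{y}_S^{(i)}$ represents y as a linear function of $x_1, \dots, x_i$. In contrast, each UR model $\hat{y}_{UR}^{(i)}$ represents y as a linear function of $x_1, e_{x_2}, \dots, e_{x_i}$. However, each UR term $e_{x_j}$ is itself a linear function of $x_1, \dots, x_j$ (equation (6)), and thus it follows that the UR model itself is also a linear function of $x_1, \dots, x_i$. Because $\hat{y}_S^{(i)}$ and $\hat{y}_{UR}^{(i)}$ are both least-squares fits on covariates that span the same linear space, it follows that $\hat{y}_S^{(i)} = \hat{y}_{UR}^{(i)}$, thereby satisfying Property (i).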
Property (ii):
It is trivially true that the coefficients estimated for x1 in the first standard regression model $\hat{y}_S^{(1)}$ and corresponding UR model $\hat{y}_{UR}^{(1)}$ will be equal (i.e. $\hat{\alpha}_{x_1}^{(1)} = \hat{\lambda}_{x_1}^{(1)}$) because the models are themselves equivalent. All subsequent UR terms are orthogonal to x1 and to one another; therefore, it follows that the estimated coefficient of x1 will be equivalent for all UR models in Table 2 (i.e. $\hat{\lambda}_{x_1}^{(1)} = \hat{\lambda}_{x_1}^{(2)} = \dots = \hat{\lambda}_{x_1}^{(k)}$). This ensures that the coefficient of x1 in $\hat{y}_S^{(1)}$ (which represents the total effect of x1 on y) will be unchanged in the composite UR model (i.e. $\hat{\alpha}_{x_1}^{(1)} = \hat{\lambda}_{x_1}^{(k)}$).
Property (iii):
Lastly, we can show that the coefficient for each UR term in a UR model is equal to the estimated total effect of
the corresponding measurement x_i in the corresponding standard regression model. To this
end, we consider the standard regression model for x_i and the corresponding UR
model.
We may set these two equations equal to one another (due to Property (i)),
substitute the expansions for the UR terms (equation (6)) into the UR model and
rearrange, thereby producing equation (8).
From equation (8), it becomes clear
that the coefficient for x_i in the standard regression model and the coefficient for its UR term in the UR model are equal. Again, we invoke the property of orthogonality to
conclude that the estimated coefficient for each UR term will be equivalent for all UR models in Table 2 that contain it.
This ensures that the coefficient of x_i in its standard regression model (which represents the total effect of
x_i on y) will be
unchanged in the composite UR model.
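For the case k = 2, the substitution and rearrangement can be written out in full. The notation below is assumed for illustration only (α for standard regression coefficients, γ for the coefficients of the regression that generates the UR term, λ for UR model coefficients, and e_{x_2} for the UR term) and may differ from the symbols used in equations (5) to (8).

```latex
\begin{align*}
\hat{y}_{S}  &= \hat\alpha_0 + \hat\alpha_1 x_1 + \hat\alpha_2 x_2, \\
e_{x_2}      &= x_2 - (\hat\gamma_0 + \hat\gamma_1 x_1), \\
\hat{y}_{UR} &= \hat\lambda_0 + \hat\lambda_1 x_1 + \hat\lambda_2 e_{x_2} \\
             &= (\hat\lambda_0 - \hat\lambda_2 \hat\gamma_0)
              + (\hat\lambda_1 - \hat\lambda_2 \hat\gamma_1)\, x_1
              + \hat\lambda_2 x_2 .
\end{align*}
```

Setting the two fitted models equal (Property (i)) and matching the coefficients of x2 gives \hat\alpha_2 = \hat\lambda_2. Matching the coefficients of x1 gives \hat\lambda_1 = \hat\alpha_1 + \hat\alpha_2 \hat\gamma_1, i.e. the UR coefficient of x1 is its direct effect plus the part transmitted through x2.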
5 UR models: Time-invariant confounder (Figure 2(a))
We next consider the scenario in Figure 2(a), in which a time-invariant covariate m
confounds the relationship between each measurement of x and y. This section is structured similarly to
the preceding one. We provide: definitions of the standard regression models, UR
terms, and UR models, all adjusted for the confounder m based upon
the DAG in Figure 2(a)
(§5.1); an analysis of UR models within a causal framework (§5.2); arguments for why
Properties (i)–(iii) are upheld when the defined adjustments for m
have been made (§5.3); and a discussion regarding the implications of insufficient
adjustment for m (§5.4).
5.1 Definitions (with correct adjustment for m)
Using the DAG in Figure
2(a) as guidance, we extend the original definitions of the standard
regression models, UR terms, and UR models (equations (5) to
(7), respectively) to properly account for the confounding effect of
m, a time-invariant covariate.
We define the OLS regression model for estimating the total causal effect of each measurement x_i of
the exposure variable (for 1 ≤ i ≤ k) on y as the regression of y on x1, …, x_i and m (equation (9)).
Because the relationship between each x_i and
y is confounded by all previous measurements of
x (i.e. x1, …, x_(i−1)) and m, these covariates must be adjusted for
to obtain an inferentially unbiased estimate of the total causal effect of each
measurement of the exposure. As previously, only the coefficient of the
last/most recent measurement of x (i.e. x_i) may be interpreted as a total causal effect.
We further extend the process of Keijzer-Veen et al.[9] to create UR terms for this scenario. As is evident, the relationship
between each measurement x_i of the exposure variable
and all previous measurements is confounded by m (for 2 ≤ i ≤ k); thus, adjustment for m is necessary, and each UR term is the residual from the OLS regression of x_i on x1, …, x_(i−1) and m (equation (10)).
Therefore, the UR term represents the difference between the actual value of
x_i and the value of
x_i as predicted by all previous measurements
x1, …, x_(i−1), adjusted for the confounding effect of
m.
Finally, we define the UR model (for 1 ≤ i ≤ k) as the regression of y on x1, the UR terms for x2, …, x_i, and m (equation (11)); this model must also be adjusted for m,
since m confounds the relationship between
x1 and y.
The composite UR model thus represents the outcome y as a function of
the initial value of the exposure x1, all subsequent
‘unexplained’ increases, and the time-invariant confounder m.
As in the preceding section, visual depictions of the previous equations are
provided, with Figure
2(b) corresponding to equation (9) and Figure 2(c) corresponding to equation
(10) and equation (11) (with i = k).
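As a sketch, the three definitions of this section can be written as follows. The symbols are assumptions made for illustration (α, γ and λ for the coefficients of the standard regression model, the UR-term regression and the UR model, respectively, and e_{x_i} for the UR term of x_i); the equation numbers follow the references in the text.

```latex
\begin{align}
\hat{y}_{S}^{(i)}  &= \hat\alpha_0^{(i)} + \hat\alpha_m^{(i)} m
                     + \sum_{j=1}^{i} \hat\alpha_{x_j}^{(i)} x_j ,
                     && 1 \le i \le k \tag{9} \\
e_{x_i}            &= x_i - \Big( \hat\gamma_0^{(i)} + \hat\gamma_m^{(i)} m
                     + \sum_{j=1}^{i-1} \hat\gamma_{x_j}^{(i)} x_j \Big) ,
                     && 2 \le i \le k \tag{10} \\
\hat{y}_{UR}^{(i)} &= \hat\lambda_0^{(i)} + \hat\lambda_m^{(i)} m
                     + \hat\lambda_{x_1}^{(i)} x_1
                     + \sum_{j=2}^{i} \hat\lambda_{e_{x_j}}^{(i)} e_{x_j} ,
                     && 1 \le i \le k \tag{11}
\end{align}
```

In this notation, m appears both in the regression that generates each UR term and in the UR model itself, which is the two-stage adjustment the text describes.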
5.2 A causal framework
We may easily extend the reasoning from the previous scenario (§4.2) to explain
why the UR model (equation (11)) satisfies Properties
(i)–(iii) before resorting to mathematics, by considering the diagram in Figure 2(a) as a path
diagram. A regression model containing all of x1, …, x_k and m (as in equation (9), with i = k) would only allow for the
interpretation of the coefficient of x_k as a total
causal effect on y; the coefficients of x1, …, x_(k−1) would represent only the direct effects of each measurement on
y, because all future measurements would mediate the
respective relationship and all backdoor paths would be blocked by preceding
measurements (including m). Within the UR model, the
independence of all UR terms ensures no mediating paths are blocked, and the only backdoor
path between x1 and y is blocked by
m.
5.3 Covariate orthogonality and Properties (i)–(iii)
In addition to the graph-based approach in the preceding section (§5.2), we are
able to illustrate mathematically that adjustment for m both
when generating each UR term (equation (10)) and in the composite UR
model (equation (11)) will result in Properties (i)–(iii) being satisfied. Note that
the scenario depicted in Figure
2(a) is nearly indistinguishable, both visually and mathematically,
from the scenario in Figure
1(a). The confounder m (which affects
y and all measurements of x) could be
reimagined as variable x0; viewed in this way, the
need for its adjustment becomes clear and the proofs from the previous section
apply with only minor notational adjustments. Even though a distinction must be
drawn between exposure variables and confounding variables within a causal
framework, OLS regression treats both equivalently (i.e. as ‘independent
variables’). Therefore, we give a brief outline only of how the adjustments
deemed necessary by the causal diagram in Figure 2(a) will result in Properties
(i)–(iii) being upheld and attach the formal mathematical proofs in online
supplementary Appendix 2.
Equations (9) to (11), which are summarised in
Table 3, are
guaranteed to satisfy Properties (i)–(iii). As in the previous scenario (§4.3),
each regression model (for both the standard and UR methods) in Table 3 contains one
more covariate than the model preceding it – an additional
x term in the column of standard regression
models, and an additional UR term in the column of UR models. Proofs for the previous
scenario relied on the property of each UR term being orthogonal to all
preceding terms in the model. Adjustment for m when generating
each UR term (equation (10)) guarantees that this
property will be upheld, because it ensures that each UR term is orthogonal to m in addition to
all preceding measurements of x; this cannot be guaranteed without explicit adjustment for
m. Furthermore, adjustment for m in each
UR model in Table 3
ensures that the predicted outcome values of the standard regression and UR models are equal (Property (i)) for each row in Table 3.
Table 3.
For the scenario depicted in Figure 2(a), the standard
regression model necessary for estimating the total causal effect
of each exposure x_i on
y, and the corresponding UR model,
for 1 ≤ i ≤ k.
5.4 Incorrect adjustment for m
We have used the causal diagram in Figure 2(a) to argue for the necessity of
adjusting for a time-invariant confounder m during both stages
of the UR modelling process, and have demonstrated how such adjustments will
produce a composite UR model that satisfies Properties (i)–(iii), as
Keijzer-Veen et al. intended. We now consider the implications of insufficient
adjustment.
Without adjustment for m when generating each UR term,
the coefficients of the preceding measurements x1, …, x_(i−1) in the generating regression, and hence the UR term itself, will absorb the effect of the omitted
variable m on x_i, thereby biasing
the total effects estimated within the UR model (so-called ‘omitted variable
bias’). Further, it is evident that m confounds the
relationship between x1 and y, so
that failure to adjust for m in the composite UR model will
produce different predicted outcome values and bias the estimated coefficient of
x1.
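Both kinds of insufficient adjustment can be illustrated with a small simulation. This is a minimal sketch in Python (standard library only), assuming k = 2 and a hypothetical linear data-generating process consistent with Figure 2(a); the helper `ols`, the variable names, and all parameter values are assumptions made for illustration.

```python
import random

def ols(columns, y):
    """OLS of y on an intercept plus `columns`; returns (coefficients, residuals)."""
    n = len(y)
    X = [[1.0] + [col[r] for col in columns] for r in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        for r in range(p):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    resid = [y[r] - sum(b * v for b, v in zip(beta, X[r])) for r in range(n)]
    return beta, resid

# Hypothetical data: m confounds x1, x2 and y.
random.seed(2)
n = 2000
m = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.5 * c + random.gauss(0, 1) for c in m]
x2 = [0.4 * a + 0.5 * c + random.gauss(0, 1) for a, c in zip(x1, m)]
y = [0.3 * a + 0.5 * b + 0.6 * c + random.gauss(0, 1)
     for a, b, c in zip(x1, x2, m)]

total_x1 = ols([x1, m], y)[0][1]       # standard model: y ~ x1 + m
total_x2 = ols([x1, x2, m], y)[0][2]   # standard model: y ~ x1 + x2 + m

_, e2_ok = ols([x1, m], x2)            # UR term adjusted for m
_, e2_bad = ols([x1], x2)              # UR term NOT adjusted for m

ur_ok = ols([x1, e2_ok, m], y)[0]          # correct composite UR model
ur_bad_term = ols([x1, e2_bad, m], y)[0]   # m omitted when generating the UR term
ur_bad_model = ols([x1, e2_ok], y)[0]      # m omitted from the composite model

print("total effect of x1 (standard):", total_x1)
print("x1 coefficient, correct UR model:", ur_ok[1])
print("x1 coefficient, UR term unadjusted for m:", ur_bad_term[1])
print("x1 coefficient, composite model unadjusted for m:", ur_bad_model[1])
```

With only two measurements, the bias from an unadjusted UR term shows up in the coefficient of x1, because the unadjusted term is no longer orthogonal to m and x1 jointly; with more measurements the earlier UR terms would be affected in the same way.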
6 UR models: Time-varying confounder (Figure 3(a))
Finally, we consider the scenario in Figure 3(a), in which a time-varying covariate m confounds the relationship between each measurement of x and y.
In this section, we again provide: definitions of the standard regression models, UR
terms, and UR models, all adjusted for the confounder m based upon the DAG in Figure 3(a) (§6.1); an analysis of UR models
within a causal framework (§6.2); arguments for why Properties (i)–(iii) are upheld
when the defined adjustments for m have been made (§6.3); and a discussion regarding the implications
of insufficient adjustment for m (§6.4).
6.1 Definitions (with correct adjustment for m)
Using the DAG in Figure
3(a), we extend the original definitions of the standard regression
models, UR terms, and UR models (equations (5) to (7), respectively) to
properly account for the confounding effect of m, a time-varying covariate.
We define the OLS regression model for estimating the total causal effect of each measurement x_i of
the exposure variable (for 1 ≤ i ≤ k) on y as the regression of y on x1, …, x_i and m1, …, m_i (equation (12)).
The relationship between each x_i and
y is not only confounded by all previous values of the
exposure x1, …, x_(i−1) but also by the current measurement and all previous
measurements of the confounder m1, …, m_i. Therefore, adjustment for these variables is necessary to obtain an inferentially unbiased estimate of
the total causal effect of each measurement of the exposure. We reiterate that
only the coefficient of the last/most recent measurement of x
(i.e. x_i) may be interpreted as a total causal effect.
Extending the process of Keijzer-Veen et al.[9] to create UR terms for each measurement x_i of the exposure
in this scenario necessitates adjustment
for the current measurement and all previous measurements of the confounder
m1, …, m_i (for 2 ≤ i ≤ k), since these variables confound the relationship between each
measurement x_i of the exposure variable and all
previous measurements x1, …, x_(i−1); i.e. each UR term for the exposure is the residual from the OLS regression of x_i on x1, …, x_(i−1) and m1, …, m_i (equation (13)).
In this way, the UR term represents the difference between the observed value of
x_i and the value of
x_i as predicted by all previous measurements
x1, …, x_(i−1), adjusted for the confounding effects of
m1, …, m_i.
As we have demonstrated previously (§4.3, §5.3), UR models rely upon the
orthogonality of terms in the composite UR model. This necessitates the creation
of UR terms for each measurement of the time-varying confounding variable
m_i (for 2 ≤ i ≤ k) in a similar manner to that of the UR terms for the exposure (equation (13)). Each is derived from the OLS regression of
m_i on all previous values of the confounder
m1, …, m_(i−1), as well as all previous values of the exposure
x1, …, x_(i−1), which confound this relationship (equation (14)).
Thus, the UR term for the confounder has a similar interpretation to the original UR term
for the exposure, in that it represents the part of
m_i unexplained by all previous values
m1, …, m_(i−1), adjusted for the confounding effects of
x1, …, x_(i−1).
Lastly, we define the UR model (for 1 ≤ i ≤ k) as a function of the initial value of the confounder
m1 and its subsequent ‘unexplained’ increases,
and the initial value of the exposure
x1 and its subsequent ‘unexplained’ increases (equation (15)).
As previously, visual depictions of these equations are provided. Figure 3(b) corresponds to
the standard regression models given by equation (12); Figure 3(c) corresponds to the
regressions of x_i on all preceding
measurements of x and m (equation (13)),
the regressions of m_i on all preceding
measurements of x and m (equation (14)),
and one composite UR regression model (equation (15), with i = k).
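As a sketch, the four definitions of this section can be written as follows. The symbols are assumptions made for illustration (α, γ, δ and λ for the coefficients of the standard regression model, the two UR-term regressions and the UR model, respectively; e_{x_i} and e_{m_i} for the UR terms of the exposure and confounder); the equation numbers follow the references in the text.

```latex
\begin{align}
\hat{y}_{S}^{(i)}  &= \hat\alpha_0^{(i)}
                     + \sum_{j=1}^{i} \hat\alpha_{x_j}^{(i)} x_j
                     + \sum_{j=1}^{i} \hat\alpha_{m_j}^{(i)} m_j ,
                     && 1 \le i \le k \tag{12} \\
e_{x_i}            &= x_i - \Big( \hat\gamma_0^{(i)}
                     + \sum_{j=1}^{i-1} \hat\gamma_{x_j}^{(i)} x_j
                     + \sum_{j=1}^{i} \hat\gamma_{m_j}^{(i)} m_j \Big) ,
                     && 2 \le i \le k \tag{13} \\
e_{m_i}            &= m_i - \Big( \hat\delta_0^{(i)}
                     + \sum_{j=1}^{i-1} \hat\delta_{m_j}^{(i)} m_j
                     + \sum_{j=1}^{i-1} \hat\delta_{x_j}^{(i)} x_j \Big) ,
                     && 2 \le i \le k \tag{14} \\
\hat{y}_{UR}^{(i)} &= \hat\lambda_0^{(i)}
                     + \hat\lambda_{m_1}^{(i)} m_1
                     + \sum_{j=2}^{i} \hat\lambda_{e_{m_j}}^{(i)} e_{m_j}
                     + \hat\lambda_{x_1}^{(i)} x_1
                     + \sum_{j=2}^{i} \hat\lambda_{e_{x_j}}^{(i)} e_{x_j} ,
                     && 1 \le i \le k \tag{15}
\end{align}
```

Note the asymmetry between (13) and (14): the exposure at time i is regressed on confounder measurements up to and including time i, whereas the confounder at time i is regressed only on strictly earlier measurements, reflecting the temporal ordering described in the text.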
6.2 A causal framework
The similarities amongst the three causal scenarios depicted in Figures 1(a), 2(a), and 3(a) are evident, and shed
light on how the reasoning from the previous scenarios (§4.2 and §5.2) can be
extended to demonstrate why the UR model in equation (15)
satisfies Properties (i)–(iii). In a regression model containing all of
x1, …, x_k and m1, …, m_k (as in equation (12), with i = k), only the coefficient of x_k could
be interpreted as a total causal effect on y; the coefficients
of x1, …, x_(k−1) may only be interpreted as the direct effects of each
measurement of the exposure on y, because all future
measurements of both x and m would fully
mediate the respective relationship and all preceding measurements of
x and m would block all backdoor paths.
Within the UR model, however, the independence of all UR terms for both the
exposure and the confounder ensures no mediating paths are blocked, and the only backdoor
path between x1 and y is blocked by
m1.
6.3 Covariate orthogonality and Properties (i)–(iii)
In addition to the graph-based approach in the preceding section (§6.2), we can
illustrate mathematically that the standard regression models (equation (12)), UR terms for measurements
of the exposure (equation (13)) and confounder (equation
(14)), and composite UR model (equation (15), with i = k) satisfy Properties (i)–(iii). Although seemingly more
complex, the scenario depicted in Figure 3(a) also has very little to
distinguish it from the scenarios in Figures 1(a) and 2(a). The confounder
m1, being the only exogenous node on the graph,
could be imagined as variable x0, with all nodes
subsequent to x1 having an associated UR term.
Viewed as such, the necessity of adjusting for m1
and creating UR terms for both the exposure and the time-varying confounder
becomes apparent, as the causal diagram in Figure 3(a) is equivalent to that of
Figure 2(a) with
minor notational adjustments. Therefore, we provide only a brief outline of how
the adjustments deemed necessary by the causal diagrams in Figure 3(a) will result in Properties
(i)–(iii) being upheld; formal mathematical proofs are provided in online
supplementary Appendix 3.
Equations (12) to (15) are summarised in Table 4 and are
guaranteed to satisfy Properties (i)–(iii). In contrast to previous scenarios
(§4.3 and §5.3), each regression model (for both the standard and UR models)
contains two more covariates than the model preceding it. In
the column of standard regression models, each row contains an additional
x and m term;
in the column of UR models, each row contains an additional UR term for the exposure and an additional UR term for the confounder. Thus, for Properties (i)–(iii) to be upheld in each
UR model, these two additional terms must be orthogonal to one another
and to all preceding terms.
Table 4.
For the scenario depicted in Figure 3(a), the standard
regression model necessary for estimating the total causal effect
of each exposure x_i on
y, and the corresponding UR model,
for 1 ≤ i ≤ k.
Proving this is relatively straightforward. For any UR term for the confounder, it holds that the term is orthogonal to m1, …, m_(i−1) and x1, …, x_(i−1) by construction (equation (14)). Because preceding UR
terms for the exposure (equation (13)) and for the confounder (equation (14)) may be expressed as linear
combinations of these variables, it follows that the term is orthogonal to all preceding UR terms. Furthermore, for any UR term for the exposure, it holds that the term is orthogonal to x1, …, x_(i−1) and m1, …, m_i by construction (equation (13)). Because preceding UR
terms for the exposure (equation (13)) and the UR terms for the confounder up to and including that for m_i (equation (14)) may be expressed as linear
combinations of these variables, it follows that the term is orthogonal to all of them. Thus, we are able to conclude that the UR terms for m_i and x_i are orthogonal to one another and to all preceding terms in
any UR model; adjustment for all causally preceding measurements of both
m and x when generating UR terms for both
the confounder and the exposure ensures this orthogonality.
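The orthogonality claims for the time-varying case can also be checked numerically. This is a minimal sketch in Python (standard library only), assuming k = 2 and a hypothetical linear data-generating process consistent with Figure 3(a); the helper `ols`, the variable names, and all parameter values are assumptions made for illustration.

```python
import random

def ols(columns, y):
    """OLS of y on an intercept plus `columns`; returns (coefficients, residuals)."""
    n = len(y)
    X = [[1.0] + [col[r] for col in columns] for r in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        for r in range(p):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    resid = [y[r] - sum(b * v for b, v in zip(beta, X[r])) for r in range(n)]
    return beta, resid

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical data with temporal ordering m1 -> x1 -> m2 -> x2 -> y.
random.seed(4)
n = 1000
m1 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.5 * a + random.gauss(0, 1) for a in m1]
m2 = [0.6 * a + 0.3 * b + random.gauss(0, 1) for a, b in zip(m1, x1)]
x2 = [0.4 * b + 0.3 * a + 0.5 * c + random.gauss(0, 1)
      for a, b, c in zip(m1, x1, m2)]
y = [0.3 * b + 0.5 * d + 0.4 * a + 0.4 * c + random.gauss(0, 1)
     for a, b, c, d in zip(m1, x1, m2, x2)]

total_x1 = ols([x1, m1], y)[0][1]           # standard model for x1
total_x2 = ols([x1, x2, m1, m2], y)[0][2]   # standard model for x2

_, e_m2 = ols([m1, x1], m2)        # UR term for the confounder
_, e_x2 = ols([x1, m1, m2], x2)    # UR term for the exposure

ur = ols([m1, e_m2, x1, e_x2], y)[0]    # composite UR model
raw = ols([m1, m2, x1, e_x2], y)[0]     # raw m2 used instead of its UR term

print("e_m2 . e_x2:", dot(e_m2, e_x2), " e_m2 . x1:", dot(e_m2, x1))
print("x1: standard", total_x1, " UR", ur[3], " with raw m2", raw[3])
print("x2: standard", total_x2, " UR term coefficient", ur[4])
```

Replacing the UR term for the confounder with the raw measurement m2 changes the coefficient of x1, because m2 is correlated with x1; this is why UR terms are needed for the confounder as well as the exposure.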
6.4 Incorrect adjustment for m
The DAG in Figure 3(a)
demonstrates the necessity of adjusting for a time-varying confounder
m in the manner described in Section 6.1, and we have
demonstrated how such adjustments will produce a composite UR model that
satisfies Properties (i)–(iii). The implications of incorrect adjustment for a
time-varying confounder m in a UR model are similar to those of incorrect adjustment for
a time-invariant confounder m, which were previously outlined
in Section 5.4. Without adjustment for any of m1, …, m_i when constructing each UR term for the exposure,
the coefficients of x1, …, x_(i−1) in the generating regression, and hence the UR term itself, will absorb the effect of each omitted
variable on x_i; this will result in the coefficients
estimated in the composite UR model being unequal to the total effects of
the corresponding measurements of x in their respective standard regression
models.
The requirement of orthogonal covariates within the composite UR model also sheds
light on the necessity for generating UR terms for measurements of a time-varying confounder, if present. We
might easily imagine a scenario in which we considered only the original
covariates m1, …, m_k in the UR model. In such a scenario, the m terms would remain
correlated with each other and with x1; therefore,
the inclusion of subsequent m terms in the UR model would
necessarily change the coefficient estimates for x1
and all other covariates.
7 UR model interpretation
Having demonstrated that confounder adjustment within UR models is possible, we
consider the claim[9] that UR models offer additional insight via the coefficients for each UR term
(e.g. the coefficient of the UR term for x_i in equation (7), for 2 ≤ i ≤ k) into the effect of x increasing
more than expected upon y.
Consider again the simple example with two longitudinal measurements of a continuous
exposure x (i.e. x1 and
x2), outcome y, and no additional
confounders (i.e. Figure
1(a), with k = 2); the standard regression model (with
x2 as the specified exposure variable) regresses y on x1 and x2, and the
‘equivalent’ UR model regresses y on x1 and the UR term for x2.
It has been shown (§4.3) that the coefficient of x2 in the former and the coefficient of the UR term in the latter are equal, yet the former is interpreted as the total effect of a one-unit increase in
x2 on y, whereas the latter is (supposedly) interpreted as the total effect of a one-unit
higher than expected increase in x2
on y. If these two variables truly are distinct, their regression
coefficients should likewise be distinct. This issue has also been addressed by Tu
and Gilthorpe,[11] who have argued that the two coefficients are equivalent because adjustment
for x1 in the standard regression model amounts to testing the relation between y and the
part of x2
unexplained by
x1 (i.e. the unexplained residual). In fact, the two
coefficients are equal simply because they mean the same thing. The UR model does
not, therefore, offer any additional insight into the effect of higher than expected
change in x on the outcome.[15]
We also raise a more philosophical point, which speaks to the need for any model to
reflect accurately the underlying data-generation process of a given scenario. As an
artefact of OLS regression, the UR terms will always be mathematically independent
of the value of the initial measurement of the exposure and all subsequent
measurements. This is unlikely to be an accurate representation of real-world
exposure variables. Many of these, such as body size, exhibit a consistent,
cumulative presence that is only manifest at the discrete time points at which it is
measured; these measurements are thus distinct only as a result of the
discretisation of time within the measurement processes adopted. Moreover, in
auxological studies, the phenomenon of so-called compensatory (or ‘catch up’) growth
has been well documented, with accelerated growth being observed in individuals who
begin with a low value of some measure, e.g. birthweight.[45,46] Therefore, however convenient
and mathematically sound it may be to model data in a way that implies complete
statistical independence amongst an exposure variable’s initial value and its
subsequent measurements, this assumption is likely to be implausible and unrealistic
for most biological and social variables of interest to epidemiologists. This is a
weakness shared by all conditional approaches (of which UR models are one), which
has led several authors[47] to recommend that the results be considered alongside those produced by other
methods, rather than in isolation.
8 Standard error reduction
Finally, we address an important consequence of the use of UR models; namely, that
they underestimate the standard errors (SEs) of estimated coefficients, thereby
resulting in artificial precision of estimated effect sizes. Although focus on
statistical significance by way of p-values
and confidence intervals is not in and of itself justifiable within a causal
framework (as the focus is on effect size and likely functional
significance, e.g. the absolute risk posed or the potential for substantive
intervention), we consider it an important issue to address as a matter of clarity
for researchers seeking to use UR models.
To demonstrate, we have simulated 1000 non-overlapping random samples of 1000
observations from a multivariate normal distribution based upon the DAG in Figure 1(a) with
k = 2, using the ‘dagitty’ package (v. 0.2–2)[4,48] in R (v. 3.3.2).[49] Each sample was used to create: (1) the two standard regression models
necessary for estimating the total causal effect of each of x1 and x2 on y (equation (5)); (2) the UR term for x2, derived by regressing x2 on
x1 (equation (6)); and (3) the composite UR model
in which y is regressed on x1 and
the UR term (equation (7)). For each standard regression
model (for i = 1, 2), the reported SE of the regression coefficient for exposure
x_i is stored. For each composite UR model,
the SE of the regression coefficient for each of x1 and the UR term is stored in two forms: (1) as reported in the UR model summary
output; and (2) as estimated by bootstrapping 1000 samples and calculating the
standard deviation of the distribution of estimated coefficients. Additional details
relating to this simulation – including parameters and code – are located in online
supplementary Appendix 4. (Note: The specific correlation structure and parameter
values used to simulate the data are unimportant for the purposes of this
demonstration).
By definition, the SE of an estimated regression coefficient is a point estimate of
the standard deviation of an (infinitely) large sampling distribution of estimated
regression coefficients. We have shown that standard regression and UR models elicit
identical point estimates of the total causal effects of each measure of the
longitudinal exposure (§4); from this, it follows that the associated SEs should
themselves be equal.
Violin plots of the SEs estimated for each coefficient representing a total causal
effect across the 1000 simulations are displayed in Figure 4 for each method considered. As is
evident, the reported SEs within the UR models are reduced in
comparison to those within the first standard regression models (for designated
exposure x1) and equal to those within the final
standard regression models (for designated exposure x2).
This demonstrates an apparent paradox: the coefficient values are equivalent, yet
the associated SEs are unequal.
Figure 4.
Violin plots comparing the standard errors associated with equivalent
coefficients estimated in standard regression vs. UR models, for data
simulated based upon the scenario depicted in Figure 1(a) (with
k = 2). Horizontal bars within each distribution
represent the mean ± 1 standard deviation.
We argue that the apparent reduction in SEs achieved by using UR models is purely
artefactual and arises from the explicit conditioning on future measurements of
x within a UR model. In the standard regression analysis, the
only information within the data that is used to inform SE estimation lies in the
past (i.e. past measures of the exposure plus any confounders). In contrast, the UR
modelling process generates (orthogonal) residuals for the entire exposure period
and combines these into a single model, thereby using information within the data
that is from both the past and the future. If we possessed data
pertaining to any true independent causes of future measurements of the exposure,
such a method would indeed be valid; however, the UR terms are simply estimated
using prior measurements of the exposure. Moreover, due to the fact that they are
estimates, the UR terms themselves contain additional variation that is not
accommodated by traditional regression methods which assume covariates are measured
without error. Consequently, the SEs of estimated causal effect derived from UR
models are artefactually reduced and should not be inferred as robust. Indeed, when
the SEs within the UR models are estimated via bootstrapping, they are similar to
those within the standard regression models.
Comparing the two plots in Figure
4 offers clarity to this argument: (a) displays
differing distributions of the reported SEs for the coefficient
estimates of x1 (where conditioning on the future
information given by x2 reduces the standard error in
the UR model); whereas (b) displays the same distribution of the
reported SEs for the coefficient estimates of x2 and
its UR term (where the standard regression model correctly exploits all prior
information given by x1, as does the UR model). Although
the magnitude of bias in estimated SEs is small in this simulated example, it will
always be present due to the way in which UR models are constructed. Quantifying the
magnitude of this bias is not trivial and is beyond the scope of the present
study.
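The comparison of reported and bootstrapped SEs can be sketched on a single simulated sample. This is a minimal illustration in Python (standard library only) and is not the simulation described above, which used the ‘dagitty’ package in R; the helper `ols_se`, the sample size, the number of bootstrap replicates and all parameter values are assumptions chosen for illustration.

```python
import random

def ols_se(columns, y):
    """OLS with intercept; returns (coefficients, reported SEs, residuals)."""
    n = len(y)
    X = [[1.0] + [col[r] for col in columns] for r in range(n)]
    p = len(X[0])
    # Gauss-Jordan elimination on [X'X | I] yields [I | (X'X)^-1].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        d = A[i][i]
        A[i] = [a / d for a in A[i]]
        for r in range(p):
            if r != i:
                f = A[r][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    inv = [row[p:] for row in A]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [y[r] - sum(b * v for b, v in zip(beta, X[r])) for r in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - p)   # residual variance
    se = [(sigma2 * inv[i][i]) ** 0.5 for i in range(p)]
    return beta, se, resid

random.seed(3)
n = 400
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.6 * a + random.gauss(0, 1) for a in x1]
y = [0.3 * a + 1.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

_, se_s1, _ = ols_se([x1], y)        # standard model for x1: y ~ x1
_, _, e2 = ols_se([x1], x2)          # UR term for x2
_, se_ur, _ = ols_se([x1, e2], y)    # composite UR model: y ~ x1 + e2

def ur_coef_x1(idx):
    """Refit both stages of the UR process on one bootstrap resample."""
    bx1 = [x1[i] for i in idx]
    bx2 = [x2[i] for i in idx]
    by = [y[i] for i in idx]
    _, _, be2 = ols_se([bx1], bx2)
    return ols_se([bx1, be2], by)[0][1]

B = 300
boot = [ur_coef_x1([random.randrange(n) for _ in range(n)]) for _ in range(B)]
mean_b = sum(boot) / B
se_boot = (sum((b - mean_b) ** 2 for b in boot) / (B - 1)) ** 0.5

print("reported SE of x1, standard model:", se_s1[1])
print("reported SE of x1, UR model:      ", se_ur[1])
print("bootstrapped SE of x1, UR model:  ", se_boot)
```

The reported SE for x1 in the UR model is smaller than in the standard model because the UR term removes outcome variance from the residual while leaving the coefficient of x1 unchanged. The bootstrap regenerates the UR term in every resample, so the variability introduced by that first-stage regression is carried through to the SE.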
9 Conclusion
The mathematical appraisal of UR models that we have undertaken confirms that the
method proposed by Keijzer-Veen et al.[9] is capable of accommodating more than two longitudinal measurements of an
exposure variable and demonstrates how adjustment for confounding variables should
be made in this framework to uphold the property that the coefficients for x1 and the UR terms
estimated within a UR model are equal to the total effects for the corresponding measurements of x
estimated by their respective standard regression models. This
result will only be guaranteed to hold when adjustment for all
confounding variables has been made at both stages in the UR modelling process (i.e.
when generating UR terms for subsequent measurements of the exposure and in the
composite UR model). From a statistical perspective, adjustment for all preceding
variables (including confounders) ensures orthogonality amongst the covariates in a
composite UR model. Therefore, when the potential confounder is time-varying, it is
also necessary to generate UR terms for subsequent measurements of the confounder
itself and include these in the final composite models used.
As our proofs only consider one confounding variable, the causal framework provided
by DAGs should aid future researchers who wish to robustly extend UR models to
situations involving multiple, possibly causally linked, time-invariant and
time-varying confounders. Such a DAG will be useful in identifying confounders and
establishing the temporal ordering of variables, thereby ensuring that all preceding
variables are adjusted for when generating the necessary UR terms.
Although UR models can accommodate multiple measurements of an exposure variable in
addition to confounding variables, we have concerns about their practical
implementation. Although only one UR model need ultimately be presented, the
necessity of generating orthogonal covariates for that UR model requires that many
models be created; this has the potential to be quite substantial when multiple
confounders are considered. For an exposure x measured at
k points in time, the standard regression approach necessitates
k separate models for estimating the total causal effect of
each measurement on the outcome regardless of the number of
confounders. In the case of one time-invariant confounder (§5),
k models are also created (k − 1 models to generate all UR terms and 1 composite UR model); for a
time-varying confounder (§6), 2k − 1 models are created (i.e. 2(k − 1) models to generate all UR terms and 1 composite UR model). The
total number of models created by the UR process will always be either equal to or
greater than the total number of models created by the standard regression process.
If such a process offered real gains in insight into the scenario under
consideration, it may indeed be worth it; however, UR models offer no additional
insight compared to standard regression methods. Moreover, the inclusion of multiple
covariates that are explicitly conditional on one another within the same model also
results in artificially reduced standard error estimates, the extent of which has
yet to be fully evaluated; the issue can be avoided by bootstrapping, but such a
solution may be computationally intensive and require more programming skills than
those necessary for implementing the built-in regression functionalities in
statistical software packages. Previous research that has utilised UR models without
undertaking sufficient adjustment for confounders and correcting SEs via
bootstrapping should not be considered robust.
We therefore have strong reservations about the use and implementation of UR models
within lifecourse epidemiology, and suggest that researchers considering using them
should instead rely on standard regression methods, which produce the same results
but are much less likely to be mis-specified and misleading. However, for
researchers wishing to use these models, the hypothesised DAG or causal diagram
should be presented so that any readers and/or reviewers can confirm that sufficient
adjustment for confounders has been undertaken; moreover, SEs should be estimated
via bootstrapping and not simply reported as in the model output, as these have the
potential to be misleading. We support the recommendation of previous authors[47] that additional analytical approaches should be considered alongside
conditional approaches (e.g. UR models) in order to achieve robust causal
conclusions. For example, multilevel, latent growth curve, and growth mixture models
may be used to estimate the effects of growth across the lifecourse on a distal
outcome, and are more flexible than standard regression methods.[5] Moreover, the three G-methods[50,51] are explicitly grounded in a
causal framework and allow for the simultaneous consideration of multiple
measurements of a longitudinally measured exposure, as well as time-varying
confounding; these methods provide exciting avenues of research for lifecourse
epidemiologists.
Supplemental material, Appendix for Adjustment for time-invariant and
time-varying confounders in ‘unexplained residuals’ models for longitudinal data
within a causal framework and associated challenges in Statistical Methods in
Medical Research