Analysing trajectories of a longitudinal exposure: A causal perspective on common methods in lifecourse research.

Sarah C Gadd^1,2, Peter W G Tennant^1,3,4, Alison J Heppenstall^1,2,4, Jan R Boehnke⁵, Mark S Gilthorpe^1,3,4.

Abstract

Longitudinal data are commonly analysed to inform prevention policies for diseases that may develop throughout life. Common methods interpret the longitudinal data either as a series of discrete measurements or as continuous patterns. Some of the latter methods condition on the outcome, aiming to capture 'average' patterns within outcome groups, while others capture individual-level pattern features before relating these to the outcome. Conditioning on the outcome may prevent meaningful interpretation. Repeated measurements of a longitudinal exposure (weight) and later outcome (glycated haemoglobin levels) were simulated to match three scenarios: one with no causal relationship between growth rate and glycated haemoglobin, and two with a positive causal effect of growth rate on glycated haemoglobin. Two methods that condition on the outcome and one that does not were applied to the data in 1000 simulations. The interpretation of the two-step method matched the simulation in all causal scenarios, but that of the methods conditioning on the outcome did not. Methods that condition on the outcome do not accurately represent a causal relationship between a longitudinal pattern and outcome. Researchers considering longitudinal data should carefully determine whether they wish to analyse longitudinal data as a series of discrete time points or by extracting pattern features.

Substances: Glycated Hemoglobin A

Year: 2019 PMID: 31800576 PMCID: PMC6892534 DOI: 10.1371/journal.pone.0225217

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Lifecourse data comprise longitudinal data (repeated measurements) that span some or all of life. Analyses of lifecourse data are popular for informing preventative policies to improve population health and wellbeing [1]. For example, temporal patterns of growth (recorded in repeated measures of weight) throughout childhood might be related to risk of type-2 diabetes by age 40 years to target preventative measures at those with certain ‘high risk patterns’. To do this effectively, results from analyses must truly reflect a relationship between patterns of growth and diabetes. This may not be the case for some commonly used lifecourse methods.

Repeated measurements of a longitudinal exposure, such as weight throughout infancy, are usually correlated with each other, a phenomenon known as autocorrelation [2]. Therefore, they do not satisfy the requirement for independence of observations needed for many common statistical analyses [3]. Methods for capturing and analysing longitudinal exposures typically aim to describe how different patterns of the exposure (e.g. rate of adolescent weight growth) relate to the outcome. Alternatively, some methods aim to identify specific times or ‘critical periods’ during which the (causal) effect of the exposure is especially strong, or estimate the cumulative effect over multiple exposure times [4, 5]. Generalised methods (g-methods), such as marginal structural models, and methods that explicitly examine “lifecourse hypotheses” offer the most obvious solution to achieving these objectives, given their theoretical foundation within an explicit causal framework [4, 5]. G-methods are, however, very rarely used in applied research, perhaps due to perceived complexity [6]. Simpler and more common methods are less likely to incorporate causal thinking, focussing instead on estimating non-causal associations that are consequently less useful for informing policies and interventions [7, 8].

With the aid of simulations, this paper explains how a lack of causal thinking in analyses of longitudinal exposures in relation to later-life outcomes can lead to interpretational biases. Methods that can lead to these biases are compared with an alternative approach that avoids them. This alternative method, however, is not suitable for all situations; other methods, such as g-methods, would be necessary in the presence of time-varying confounding, which is not examined in this paper.

Methods

Data were simulated to represent the illustrative example of weight measured yearly from birth until age 2 years (the exposure) and diabetes diagnosed at age 40 years (the outcome) from percentage glycated haemoglobin (HbA1c) [9]. This is analogous to routinely-collected health data or data from birth cohorts. Three illustrative scenarios with different causal structures were simulated, matching the directed acyclic graphs in Fig 1. Each arrow specifies a direct causal relationship between variables. The absence of an arrow means there is no direct causal relationship, but there may still be a correlation. In Scenario A, birthweight causes HbA1c and there is no causal effect of growth rate on HbA1c. In Scenario B, weight₁ directly causes HbA1c and growth rate indirectly causes HbA1c through weight₁. In Scenario C, weight₂ directly causes HbA1c and growth rate indirectly causes HbA1c through weight₂. For ease of illustration, confounding (by e.g. genetics or in utero nutrition) was represented by a single unmeasured common cause of birthweight and growth (U).

Fig 1

Directed acyclic graph showing the structure of causal relationships between variables in simulated scenarios A, B and C. Growth represents the growth rate of an individual and is not simulated or measured in the scenario. U is an unknown and unmeasured variable. The age in years at which known variables are measured is shown in subscript. Arrows show the direction of causal relationships and numbers attached to these arrows show the correlations induced by them.

A total of 1000 datasets, each comprising 1000 observations, were simulated using R 3.4.3, exceeding the number required to achieve >99% accuracy for the parameters of interest [10]. Simulation code is available in the S1 Appendix. Each directed acyclic graph was converted into a covariance matrix of the weight and HbA1c variables using the parameters in Table 1 and the standardised path coefficients in Fig 1. Data were simulated with multivariate normal distributions.

Table 1

Parameters of latent variables and error terms used to simulate data in section 3.

Variable	Weight₀ (kg)	Weight₁ (kg)	Weight₂ (kg)	HbA1c (%)
Mean	4	8	12	5.8
Standard deviation	2	2	2	1

The path diagram used to generate observed variables from these is shown in Fig 1.
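The simulation step described above (a covariance matrix built from means, SDs and correlations, followed by multivariate normal draws) can be sketched in Python as follows. This is not the authors' R code, which is in the S1 Appendix. The means and SDs come from Table 1, but the correlation values (`R01`, `R02`, `R12`, `R0H`) are hypothetical stand-ins for the Fig 1 path coefficients, chosen only so that HbA1c relates to later weights through birthweight alone, loosely resembling Scenario A. All function names are illustrative.

```python
import math
import random

# Means and SDs from Table 1, ordered as weight0, weight1, weight2, HbA1c.
MEANS = [4.0, 8.0, 12.0, 5.8]
SDS = [2.0, 2.0, 2.0, 1.0]

# HYPOTHETICAL correlations (assumptions, not the Fig 1 path coefficients):
# HbA1c is correlated with later weights only through birthweight.
R01, R02, R12, R0H = 0.04, -0.15, 0.60, 0.70
CORR = [
    [1.0, R01, R02, R0H],
    [R01, 1.0, R12, R0H * R01],
    [R02, R12, 1.0, R0H * R02],
    [R0H, R0H * R01, R0H * R02, 1.0],
]


def covariance_from_correlation(corr, sds):
    """Scale a correlation matrix by the SDs to obtain a covariance matrix."""
    k = len(sds)
    return [[corr[i][j] * sds[i] * sds[j] for j in range(k)] for i in range(k)]


def cholesky(matrix):
    """Return lower-triangular L with L L^T = matrix (positive definite input)."""
    k = len(matrix)
    lower = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            partial = sum(lower[i][m] * lower[j][m] for m in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def simulate(n, means, cov, rng):
    """Draw n multivariate normal rows as mean + L z with z standard normal."""
    lower = cholesky(cov)
    k = len(means)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        rows.append([means[i] + sum(lower[i][m] * z[m] for m in range(i + 1))
                     for i in range(k)])
    return rows


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


cov = covariance_from_correlation(CORR, SDS)
rows = simulate(1000, MEANS, cov, random.Random(2019))
birthweight = [row[0] for row in rows]
hba1c = [row[3] for row in rows]
diabetes = [value > 6.5 for value in hba1c]  # dichotomisation threshold from the text
print(round(pearson(birthweight, hba1c), 2), sum(diabetes) / len(diabetes))
```

A hand-written Cholesky factorisation is used only to keep the sketch free of third-party dependencies; in practice a numerical library routine would be the natural choice.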

Linear growth was simulated for ease of interpretation. HbA1c was dichotomised into a binary variable at the National Institute for Health and Care Excellence threshold for diagnosing type-2 diabetes (HbA1c > 6.5%) [11]. The mean and standard deviation (SD) of simulated weight values, along with the correlation of each weight measure with HbA1c, were averaged across all simulations with 2.5th and 97.5th centiles depicting empirical 95% confidence intervals (CIs).

Data were analysed using three methods: z-score plots, multilevel models (outcome as covariate), and multilevel models (two-step).

Z-score plots are a simple, graphical approach that aims to identify exposure patterns that lead to an outcome [9, 12]. Weight at each age was standardised into z-scores using the time-specific sample mean and SD [11]. The mean z-scores in those who did and did not develop diabetes were then plotted at each age and connected. Z-score plots are often viewed and interpreted as ‘patterns’ of weight that ‘lead to’ the outcome [13]. The plots presented show mean values from all simulations, with 2.5th and 97.5th centiles depicting empirical 95% CIs.

The multilevel model (outcome as covariate) analysis involved fitting multilevel models of weight over time, with covariates for age, diabetes status, and an age-diabetes interaction term; in the model equation, i indexes observations and j individuals. Intercept and age coefficients were free to vary randomly between individuals. Age was centred at one year [14, 15]. A first-order autocorrelated error structure was specified to account for the effect of each weight measure on the subsequent measure. Multilevel models like this are typically interpreted from the coefficient of the interaction term; for example, a positive interaction between diabetes and time would be interpreted as meaning that an increased growth rate leads to diabetes.
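For reference, the outcome-as-covariate model described above can be written as follows. This is a reconstruction from the verbal description, assuming conventional multilevel notation; the symbols $\beta$, $u$, $e$, $\boldsymbol{\Sigma}_u$ and $\rho$ are assumptions and may differ from the authors' own equation. Here $\text{age}_{ij}$ denotes age centred at one year.

```latex
\begin{aligned}
\text{weight}_{ij} &= \beta_{0} + \beta_{1}\,\text{age}_{ij} + \beta_{2}\,\text{diabetes}_{j}
  + \beta_{3}\,\text{age}_{ij}\times\text{diabetes}_{j}
  + u_{0j} + u_{1j}\,\text{age}_{ij} + e_{ij},\\
(u_{0j},\,u_{1j})^{\top} &\sim N(\mathbf{0},\,\boldsymbol{\Sigma}_{u}),
  \qquad \operatorname{corr}(e_{ij},\,e_{i'j}) = \rho^{\,|i-i'|}.
\end{aligned}
```

In this notation the interaction coefficient $\beta_{3}$ is the quantity usually interpreted: it is the difference in mean growth rate between those with and without a later diabetes diagnosis.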
Coefficients for these interaction terms were recorded over the 1000 simulations to obtain a median and empirical 95% CIs.

The multilevel model (two-step) approach involved fitting two models (Eqs 2.1 and 2.2, in which i indexes observations and j individuals). The first (Eq 2.1) was a multilevel model of weight by age, with a first-order autocorrelated error structure, to estimate growth as depicted in each directed acyclic graph in Fig 1. The intercept and age coefficients were permitted to vary randomly across individuals and age was centred. The individual-level age coefficients were recorded, representing individuals’ growth rates. In the second step (Eq 2.2), a logistic regression model was fitted with diabetes as the outcome, the age coefficient (growth rate) as the exposure, and a birthweight covariate to adjust for its confounding influence [16]. The exponentiated model coefficients represent the change in odds of developing diabetes for each increase of 0.1 kg/year (selected due to the small growth rate). Exponentiated coefficients greater than one suggest that higher growth rates lead to diabetes. Coefficient point estimates for the growth rate exposure were recorded to obtain a median and empirical 95% CI from the 2.5th and 97.5th centiles over the 1000 simulations.

All multilevel models were fitted using the R package ‘nlme’ [17]. Any errors from the multilevel model (outcome as covariate) and multilevel model (two-step), such as failure to converge, were recorded, and the estimates from these datasets were discarded.
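The two-step approach described above can be written, under the same caveat, as the pair of models below. This is a reconstruction from the text rather than a copy of Eqs 2.1 and 2.2; the symbols $\gamma$, $v$, $\varepsilon$ and $\alpha$ are assumed notation.

```latex
\begin{aligned}
\text{Step 1:}\quad \text{weight}_{ij} &= (\gamma_{0} + v_{0j}) + (\gamma_{1} + v_{1j})\,\text{age}_{ij} + \varepsilon_{ij},
  \qquad \widehat{\text{growth}}_{j} = \hat{\gamma}_{1} + \hat{v}_{1j},\\
\text{Step 2:}\quad \operatorname{logit}\Pr(\text{diabetes}_{j} = 1) &= \alpha_{0} + \alpha_{1}\,\widehat{\text{growth}}_{j} + \alpha_{2}\,\text{weight}_{0j}.
\end{aligned}
```

With growth expressed in units of 0.1 kg/year, $\exp(\alpha_{1})$ is the odds ratio for growth rate reported in the results. The first step never sees the outcome, which is the feature that distinguishes this approach from the outcome-as-covariate model.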

Results

One dataset for each of scenarios A and B, and 20 datasets in scenario C, were discarded due to models failing to converge. The mean and SD of weight at each time, averaged across all remaining simulations for each scenario, are shown in Tables 2, 3 and 4, along with mean correlations of each weight measure with HbA1c. In scenario A, there was a large positive correlation at birth, decreasing to a small positive correlation at age 1, and a small negative correlation at age 2. In scenario B, there was a near-zero correlation at birth, increasing to a large positive correlation at age 1, and decreasing to a small positive correlation at age 2. In scenario C, there was a small negative correlation at birth, increasing to a small positive correlation at age 1, and a large positive correlation at age 2.

Table 2

Summary of simulated variables in scenario A.

	Weight₀ (kg)		Weight₁ (kg)		Weight₂ (kg)		HbA1c₄₀ (%)
	Mean	95%CI	Mean	95%CI	Mean	95%CI	Mean	95%CI
Mean	4.003	3.879, 4.129	8.003	7.889, 8.133	11.997	11.863, 12.117	5.800	5.737, 5.862
SD	2.000	1.910, 2.093	1.998	1.910, 2.090	2.000	1.911, 2.092	0.999	0.955, 1.043
Correlation with HbA1c	0.699	0.664, 0.729	0.029	-0.033, 0.091	-0.105	-0.167, -0.044	1

95%CI represents 95% empirical confidence intervals.

Table 3

Summary of simulated variables in scenario B.

	Weight₀ (kg)		Weight₁ (kg)		Weight₂ (kg)		HbA1c₄₀ (%)
	Mean	95%CI	Mean	95%CI	Mean	95%CI	Mean	95%CI
Mean	4.001	3.870, 4.124	7.999	7.876, 8.127	11.997	11.885, 12.118	5.801	5.741, 5.859
SD	2.000	1.912, 2.088	2.000	1.917, 2.084	1.999	1.915, 2.09	1.000	0.956, 1.044
Correlation with HbA1c	0.027	-0.034, 0.088	0.699	0.666, 0.731	0.229	0.169, 0.283	1

95%CI represents 95% empirical confidence intervals.

Table 4

Summary of simulated variables in scenario C.

	Weight₀ (kg)		Weight₁ (kg)		Weight₂ (kg)		HbA1c₄₀ (%)
	Mean	95%CI	Mean	95%CI	Mean	95%CI	Mean	95%CI
Mean	3.997	3.873, 4.113	8.003	7.882, 8.122	12.001	11.875, 12.122	5.801	5.738, 5.861
SD	2.001	1.919, 2.087	2.000	1.911, 2.087	2.003	1.915, 2.101	1.001	0.959, 1.047
Correlation with HbA1c	-0.106	-0.166, -0.043	0.229	0.167, 0.288	0.700	0.666, 0.731	1

95%CI represents 95% empirical confidence intervals.
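The empirical 95% confidence intervals reported in Tables 2 to 4 are the 2.5th and 97.5th centiles of an estimate across simulated datasets. A minimal Python sketch of that summary follows; the helper name `empirical_ci` and the stand-in estimates (sample means of a normal variable with the Table 1 birthweight parameters) are assumptions for illustration, not the authors' code.

```python
import random
import statistics


def empirical_ci(estimates):
    """Return the median with the 2.5th and 97.5th centiles of the estimates."""
    # n=40 gives cut points every 2.5%; the first and last are the CI limits.
    cuts = statistics.quantiles(estimates, n=40, method="inclusive")
    return statistics.median(estimates), cuts[0], cuts[-1]


# Hypothetical stand-in: one sample mean per simulated dataset of 1000 people.
rng = random.Random(2019)
sample_means = [
    statistics.fmean([rng.gauss(4.0, 2.0) for _ in range(1000)])
    for _ in range(1000)
]
median, lower, upper = empirical_ci(sample_means)
print(round(median, 3), round(lower, 3), round(upper, 3))
```

The `inclusive` method interpolates between order statistics in the same way as the default quantile definition in R, which is why it was chosen here.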

The z-score plots for each scenario are shown in Fig 2. In scenario A, the diabetic group has a much higher weight at birth and the points on the graph are far apart, and far from the overall mean (zero). The points converge over time until they meet, cross and begin to diverge between age 1 and 2 years. By age 2, the diabetic group has a lower mean weight z-score than the non-diabetic group. In scenario B, the points are close to the overall mean at birth, diverging substantially at age 1, before converging back towards the mean at age 2; the diabetic group always has a higher mean weight z-score than the non-diabetic group. In scenario C, the diabetic group starts with a slightly lower birthweight than the non-diabetic group, but its mean z-score increases over time while that of the non-diabetic group decreases, leading to a large difference at age 2.
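The quantities plotted in a z-score plot can be computed as in the following Python sketch: weight is standardised at each age, and the mean z-score is taken within each outcome group. The toy data are hypothetical (HbA1c depends on birthweight only, and weight at age 1 is generated independently), so the group means separate at birth and not at age 1, in line with the cross-sectional correlations. Function names are illustrative.

```python
import math
import random


def zscores(values):
    """Standardise values using their own sample mean and SD."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def group_mean_zscores(weights_by_age, diabetes):
    """Map each age to (mean z-score with diabetes, mean z-score without)."""
    result = {}
    for age, weights in weights_by_age.items():
        z = zscores(weights)
        cases = [zi for zi, d in zip(z, diabetes) if d]
        controls = [zi for zi, d in zip(z, diabetes) if not d]
        result[age] = (sum(cases) / len(cases), sum(controls) / len(controls))
    return result


# HYPOTHETICAL toy data: HbA1c depends on birthweight only, and weight at
# age 1 is generated independently of both (an assumption for illustration).
rng = random.Random(42)
n = 1000
weight0 = [rng.gauss(4.0, 2.0) for _ in range(n)]
weight1 = [rng.gauss(8.0, 2.0) for _ in range(n)]
hba1c = [5.8 + 0.35 * (w - 4.0) + rng.gauss(0.0, math.sqrt(0.51)) for w in weight0]
diabetes = [h > 6.5 for h in hba1c]

means = group_mean_zscores({0: weight0, 1: weight1}, diabetes)
gap_at_birth = means[0][0] - means[0][1]
gap_at_age_1 = means[1][0] - means[1][1]
print(round(gap_at_birth, 2), round(gap_at_age_1, 2))
```

Joining the two gaps with a line would give the appearance of groups "converging", even though no individual's growth was simulated to depend on the outcome.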

Fig 2

Z-score plots of weight from birth to age 2 years for scenarios A, B and C. Dotted lines show the group diagnosed with diabetes at age 40 and dashed lines those without a diagnosis. Error bars show empirical 95% confidence intervals.

Results from the multilevel models (outcome as covariate) are in Table 5 and Fig 3, which show the model-fitted regression lines and true mean weight values for the diabetic and non-diabetic groups at each time point. The model values do not always fit well with the mean values (see especially scenario B in Fig 3B) because the models were constrained to linearity (growth having been simulated as linear for simplicity), whereas the mean values in each outcome group change nonlinearly. In all scenarios, the coefficient of age is positive, confirming that weight increases from birth to age 2. For scenario A, the negative age-diabetes interaction term and shallower slope of increasing weight in the diabetes group (Fig 3A) suggest that those who developed diabetes grew slightly slower than those who did not develop diabetes. In scenario B, the small positive age-diabetes interaction term and slightly steeper slope (Fig 3B) suggest that those who developed diabetes grew slightly faster than those who did not develop diabetes. In scenario C, the large positive age-diabetes interaction term and steeper slope (Fig 3C) suggest that those who developed diabetes grew substantially faster than those who did not develop diabetes.

Table 5

Average parameter estimates from multilevel models of weight (outcome as covariate).

Parameter	Scenario A		Scenario B		Scenario C
	Mean	95% CI	Mean	95% CI	Mean	95% CI
Diabetes	0.729	0.566, 0.900	1.070	0.899, 1.25	0.939	0.769, 1.111
Age	4.327	4.223, 4.430	3.914	3.807, 4.021	3.670	3.563, 3.767
Diabetes*Age	-1.372	-1.572, -1.166	0.340	0.133, 0.573	1.375	1.166, 1.573
Intercept	7.823	7.737, 7.911	7.740	7.655, 7.821	7.770	7.690, 7.862
Intercept variance	0.481	0.348, 0.601	0.503	0.127, 0.816	0.099	0.000, 0.255
Age Variance	0.680	0.547, 0.799	0.897	0.714, 1.08	0.567	0.000, 0.711
Residual Variance	1.770	1.710, 1.836	1.727	1.54, 1.849	1.840	1.778, 1.908
Constant-Age Covariance	0.973	0.894, 0.988	0.268	0.02, 0.845	-0.729	-0.957, 0.579
Autocorrelation parameter	0.120	0.069, 0.169	0.042	-0.118, 0.134	0.160	0.119, 0.196

Fig 3

Fitted weight values from multilevel models (outcome as covariate) and average mean weight values for scenarios A, B and C. Dotted lines (fitted values) and circular points (average mean weight values) represent fitted values for the group with a diabetes diagnosis at age 40. Dashed lines (fitted values) and triangular points (average mean weight values) represent those without a diagnosis. The grey ribbon represents an empirical 95% confidence band around the fitted values.

Results from the multilevel models (two-step) are shown in Table 6. In scenario A, the odds ratio for growth rate was 1.000 (95% empirical CI: 0.943, 1.057), suggesting that the odds of diabetes were unaffected by growth rate. In scenario B, the odds ratio was 1.194 (95% empirical CI: 1.122, 1.316), suggesting that the odds of diabetes increased modestly with increasing growth rate. In scenario C, the odds ratio was 1.679 (95% empirical CI: 1.477, 2.191), suggesting the odds of diabetes increased substantially with increasing growth rate.

Table 6

Average parameter estimates from the logistic regression model of diabetes status on weight growth rate.

Parameter	Scenario A		Scenario B		Scenario C
	Mean	95% CI	Mean	95% CI	Mean	95% CI
Growth rate	1.000	0.943, 1.057	1.194	1.122, 1.316	1.679	1.477, 2.191
Weight₀	2.000	2.060, 2.745	1.000	1.149, 1.429	2.000	1.339, 2.265
Constant	6.030x10^-03	3.654x10^-04, 9.185x10^-02	9.309x10^-05	1.659x10^-06, 1.291x10^-03	2.221x10^-11	2.552x10^-16, 7.591x10^-09

Growth rate was estimated using a multilevel model of weight over age (agnostic to the outcome, diabetes status)
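The logic of the two-step analysis can be illustrated with a simplified Python sketch. It deliberately swaps the multilevel model of step 1 for per-individual least-squares slopes, so it ignores the shrinkage and autocorrelation handled by the authors' 'nlme' models, and it fits the step 2 logistic regression with a hand-written Newton-Raphson routine. The data-generating values are hypothetical and mimic a Scenario A-like structure in which HbA1c depends on birthweight only.

```python
import math
import random

AGES = [0.0, 1.0, 2.0]


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def ols_slope(ages, weights):
    """Least-squares slope of weight on age for one individual (kg/year)."""
    mean_a = sum(ages) / len(ages)
    mean_w = sum(weights) / len(weights)
    sxy = sum((a - mean_a) * (w - mean_w) for a, w in zip(ages, weights))
    sxx = sum((a - mean_a) ** 2 for a in ages)
    return sxy / sxx


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson logistic regression; each row of xs has a leading 1."""
    k = len(xs[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y in zip(xs, ys):
            p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            for i in range(k):
                grad[i] += (y - p) * x[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


# HYPOTHETICAL data: growth is independent of birthweight and of HbA1c.
rng = random.Random(7)
birthweight, slopes, diabetes = [], [], []
for _ in range(1000):
    w0 = rng.gauss(4.0, 2.0)
    growth = rng.gauss(4.0, 1.0)  # true growth rate in kg/year
    w1 = w0 + growth + rng.gauss(0.0, 0.5)
    w2 = w0 + 2.0 * growth + rng.gauss(0.0, 0.5)
    hba1c = 5.8 + 0.35 * (w0 - 4.0) + rng.gauss(0.0, math.sqrt(0.51))
    birthweight.append(w0)
    slopes.append(ols_slope(AGES, [w0, w1, w2]))  # step 1, outcome not used
    diabetes.append(1.0 if hba1c > 6.5 else 0.0)

mean_slope = sum(slopes) / len(slopes)
mean_bw = sum(birthweight) / len(birthweight)
# Step 2: growth in units of 0.1 kg/year, adjusted for (centred) birthweight.
design = [[1.0, (s - mean_slope) / 0.1, b - mean_bw]
          for s, b in zip(slopes, birthweight)]
beta = fit_logistic(design, diabetes)
odds_ratio_growth = math.exp(beta[1])
odds_ratio_birthweight = math.exp(beta[2])
print(round(odds_ratio_growth, 3), round(odds_ratio_birthweight, 3))
```

Because the slopes are estimated without reference to the outcome, the growth odds ratio in this toy setting should sit close to one, while the birthweight odds ratio should be clearly above one.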

Discussion

In Scenario A, we simulated no causal effect of growth rate on the risk of developing diabetes; birthweight causes HbA1c, but any pattern of growth thereafter is irrelevant. Neither the z-score plot nor the multilevel model (with the outcome as a covariate) reflects this; both would be erroneously interpreted as showing that slower growth leads to diabetes. Conversely, the coefficient from the two-step multilevel model correctly implies no effect of growth rate on diabetes risk.

In Scenario B, we simulated that weight₁ caused diabetes, which could be interpreted as growth causing HbA1c through weight₁. The z-score plot, however, suggests that faster growth up to age 1 and slower growth thereafter lead to diabetes. This does not reflect the causal relationship simulated, where higher growth only increased the risk of diabetes by increasing weight at age 1. Both the multilevel model (with the outcome as a covariate) and the two-step multilevel model reflect this more closely, suggesting that higher growth rates caused diabetes.

In Scenario C, we simulated that weight₂ caused diabetes, which could again be interpreted as growth causing HbA1c through weight₂ (and indirectly through weight₁). Here, the results from all three methods correctly suggest that higher growth rates cause diabetes.

The z-score plots (and common interpretation thereof) only reflected the simulated truth in one of the three scenarios, revealing that this is not a reliable approach for examining the causal effect of a longitudinal exposure on a distal outcome. This is because average weight z-scores at each time point are explicitly calculated and presented within groups of the outcome. By inappropriately conditioning on the outcome in an attempt to examine ‘average patterns’ of weight associated with diabetes, the method actually examines cross-sectional associations between weight and the outcome at each time point. This problem remains even if only one group (e.g. those with diabetes) is considered.
The value of each mean z-score (e.g. of weight) has no obvious causal meaning; instead, it reflects the size of the cross-sectional correlation between the exposure and the outcome at each time point. Because the standardisation process fixes the scale, time points with the strongest cross-sectional correlations will always appear most different, and those with the weakest correlations will always appear most similar. For example, in Scenario A, there was a strong positive correlation between birthweight and diabetes due to the causal effect of birthweight, and weaker correlations at ages 1 and 2 years as the contribution of birthweight to weight decreased. This is reflected by the z-score plot in Fig 2; the mean weight z-score values are farthest apart at birth and converge over time. In scenarios B and C, the strongest correlations are at ages 1 and 2 years respectively; the corresponding z-score plots are likewise farthest apart at these time points. The absolute value of each time point z-score should not therefore be joined or compared to the z-score values at other time points, because the ‘patterns’ that appear have no causal meaning and do not represent individual growth trajectories.

Inappropriate conditioning on the outcome also affects multilevel models where the outcome is included as a model covariate. The consequences are not identical to the z-score plot because the scale has not been fixed by standardisation and the correlation pattern is assumed to follow a specific parametric shape. In our example, the linearity constraint introduced misfit between the modelled regression lines and the mean weight values (Fig 3). In Scenario B this meant the model failed to highlight that the largest cross-sectional correlation between weight and diabetes occurred at age 1 year, and this explains the difference in interpretation with the corresponding z-score plot.
In scenarios A and C, models were similar enough to the average weight values to provide similar interpretations. Had we simulated nonlinear growth, however, the linearity constraint would likely have introduced further differences in interpretation compared with the z-score plot.

The multilevel model (two-step) approach is more robust than the other approaches because it does not involve conditioning on the outcome. Instead, exposure patterns are modelled and only in the second step are these related to the outcome. This approach genuinely treats the exposure as a longitudinal variable and should therefore be strongly favoured over approaches that condition on the outcome whenever there is an interest in the causal interpretation of a longitudinal exposure pattern. This method is not, however, without limitations.

First, because the second step of the multilevel model (two-step) approach treats the unobserved growth rate estimates as fully observed, it underestimates the standard errors (and the width of confidence intervals), even when attempts are made to address this [18]. Alternative latent variable methods, like latent growth curve models, growth mixture models, and autoregressive latent trajectory models, which retain the latent, or unobserved, nature of the pattern features, avoid this problem.

Second, two-step multilevel models and their constructed latent variable alternatives can still present some interpretational challenges from a causal inference perspective. By summarising the effect of multiple measurements that span a period into one or more average feature(s), such as growth rate, the causal contributions of each individual measurement occasion are lost, as too are any corresponding 'critical' period effects [19]. This places such methods in contrast to g-methods, where the focus is explicitly on estimating the causal effect of the exposure as measured at each time point. Whether the underlying feature (e.g. growth) or the individual measure (e.g. weight at age 1) is the 'true' cause cannot be distinguished statistically because they are simply different ways of describing the same information. In scenarios B and C, for instance, both 'growth rate' and the individual measures of weight at ages 1 and 2 years respectively appear to cause diabetes. Changing either would therefore change the risk of diabetes, and neither can be described as more or less responsible. The choice of whether to analyse a longitudinal exposure as a series of discrete measures or a summary feature (e.g. growth rate) may therefore be down to philosophical and/or contextual preferences regarding the question(s) posed. That said, since many pattern features like 'growth rate' span several measurement intervals, they are susceptible to time-varying confounding by any variables that are simultaneously caused by earlier measures while causing later measures, i.e. so-called intermediate confounders. In such situations, there may be no alternative to g-methods, which are currently unique for their compatibility with intermediate confounding [6].

It is important to note that, for illustrative purposes, this paper presents a simplified scenario in which there are no competing events or loss to follow up, both of which would be present in reality. Any differential loss to follow up or occurrence of competing events would (further) bias the results from all three methods examined in this paper [20].

Recommendations

Methods that condition on the outcome are not appropriate for examining the causal relationship between patterns of a longitudinal exposure and a later outcome, as they only describe the cross-sectional correlations at each time point. The apparent ‘patterns’ that are observed have no causal interpretation and should not be interpreted as individual exposure trajectories that cause the outcome. Alternative analytical strategies should seek to describe features of the exposure agnostic to the outcome, whether explicitly in two separate models or implicitly using latent variable methods. Researchers should however carefully consider whether pattern features or discrete measures are more appropriate, useful and/or interpretable ways to capture a specific 'exposure' in a specific context. If interested in the effect of exposures at specific 'critical' points in time then alternative methods are recommended [4, 5]. If a pattern feature is truly of interest, researchers should think very carefully about which pattern feature(s) are of interest before analysis. In the absence of a single, distinct and clearly identifiable causal feature it is tempting to consider summarising the ‘average’ of exposure ‘trajectories’ for individuals with different outcomes by conditioning on the outcome, but this risks highly misleading results. A longitudinal exposure—or pattern thereof—that spans a long period may be conflated with intermediate confounding and thus fail to describe the true causal process of interest. Features that occur at specific time periods that have a tangible real-world meaning may be best suited to the methods recommended, such as two-step multilevel models.

Conclusion

This paper explains how longitudinal data analyses that inappropriately condition on the outcome may lead to biased inferences about how exposure patterns affect later outcomes. Methods such as z-score plots and multilevel models with the outcome as a covariate do not create causally meaningful exposure 'patterns' and, as our simulations show, can be highly misleading. In lifecourse research, or whenever interested in the causal relationship between a longitudinal exposure and later outcome, we recommend avoiding methods that inappropriately condition on the outcome in favour of methods that capture patterns a priori, although the potential influence of intermediate confounding should be carefully considered.

Code used to simulate and analyse data.

(DOCX) Click here for additional data file. 16 Oct 2019 PONE-D-19-21843 Analysing trajectories of a longitudinal exposure: a causal perspective on common methods in lifecourse research PLOS ONE Dear Ms Gadd, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. We would appreciate receiving your revised manuscript by Nov 30 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. 
The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. We look forward to receiving your revised manuscript. Kind regards, Prof Stephen Z Levine, PhD Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. In your Methods section, it is reported that "Simulation code is available as supplementary material"; please correct this statement, as we note that the code is not provided as a supplementary file, but is instead deposited in GitHub. Additional Editor Comments (if provided): [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? 
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: No
Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This is an interesting paper, addressing a very important methodological issue in lifecourse epidemiology. I have some suggestions to improve the accessibility of this manuscript:

1. To make the comparison across different models/methodologies more explicit, I think it would be useful to spell out the model specifications in equations. Although the directed acyclic diagrams are useful for visualizing the causal relations between variables, equations help readers with the understanding of the statistical models used to estimate the causal relations.

2. Since DAGs are used to show the causal structure in the simulated data, I think it would be equally useful to use DAGs to show the implied causal structures within the statistical models (e.g. multilevel models etc), so readers could compare the DAGs to apprehend why some models are more "correct" than others in different scenarios.

3. Although the causal relations between the variables in the simulated data were shown in DAGs, I feel it would be useful to provide the R code in the appendix, so readers could examine the details of the simulations.

Reviewer #2: This is a clear, nicely written paper on an important topic. My comments are below:

Major comment:

1. The methods that condition on the outcome didn't perform as badly as I expected. My understanding is that this is because of the restriction to settings with no time-varying confounder-treatment feedback as stated in the discussion. I think it would be helpful to introduce this restriction / assumption earlier in the paper -- perhaps in the introduction -- so that the results don't get over-generalized.

Minor comment:

1. The simulated data creates a scenario in which no one is lost to follow-up or experiences a competing event. While this is a reasonable simplification to demonstrate the problems with the outcome-control methods, some readers may miss that there are further potential dangers associated with LTFU or competing events. A short review of these issues in the discussion would make the paper a bit more complete.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

28 Oct 2019

Reviewer #1 Questions

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: No

Author response: The data underlying findings in this manuscript was simulated using code made available on GitHub which we appreciate was not correctly indicated in the manuscript. Following on from this, we decided to include code as an appendix as GitHub hosting may not be permanent.

Author change: Changed “as supplementary material.” to “in the S1 appendix.” in Methods [p5]; Added “Supporting Information Captions¶ S1 Appendix: Code used to simulate and analyse data.” at end of manuscript [p19]. Supplementary material has been submitted in the file “S1_Appendix.docx”.

4. Is the manuscript presented in an intelligible fashion and written in standard English?
Reviewer #1: Yes

Reviewer #1 Comments to the Author

This is an interesting paper, addressing a very important methodological issue in lifecourse epidemiology. I have some suggestions to improve the accessibility of this manuscript:

1. To make the comparison across different models/methodologies more explicit, I think it would be useful to spell out the model specifications in equations. Although the directed acyclic diagrams are useful for visualizing the causal relations between variables, equations help readers with the understanding of the statistical models used to estimate the causal relations.

Author response: Thank you for this suggestion, we agree that this will be a useful addition to the paper to clarify the models used. We note that no equations have been provided for the Z-score plots as these do not use models.

Author change: Equation 1 added following the changed sentence: “The multilevel model (outcome as covariate) analysis involved fitting multi-level models of weight over time, with covariates for age, diabetes status, and an age-diabetes interaction term, defined by the following equations (where i indexes observations and j individuals):” in Methods [p6]; Equation 2 added following the altered sentence: “The multilevel model (two-step) approach involved fitting two models, defined by the following equations (where i indexes observations and j individuals):” in Methods [p6-7]. References to Equation 2.1 and 2.2 added in Methods [p7]: “The first (Equation 2.1) was a multilevel…” and “In the second step (Equation 2.2), a logistic…”

2. Since DAGs are used to show the causal structure in the simulated data, I think it would be equally useful to use DAGs to show the implied causal structures within the statistical models (e.g. multilevel models etc), so readers could compare the DAGs to apprehend why some models are more "correct" than others in different scenarios.

Author response: Thanks for this useful suggestion. While we agree that visual representations of models are often useful for explaining them, in this case, we have chosen not to include DAGs to represent the implied causal structures of the models for the following reasons:

1. A single model specification could correctly estimate a causal effect for a multitude of different DAGs. That is, a DAG (given a chosen exposure and outcome) implies one (or several) appropriate models, but a given model may also be appropriate for many DAGs. For example, the four DAGs shown in the Response to Reviewers document all imply the model D~A to find the total causal effect of A on D, but the model implies any of these DAGs (and many more with these four variables). We thought that including many implied DAGs for each model could lead to greater confusion than clarity and would certainly warrant a more detailed explanation that might only detract from the main message of the paper.

2. Some features of the models are not commonly understood as a feature of DAGs. In particular, this relates to the multilevel model with an outcome as covariate which contains an interaction term, indicating “effect modification” between the time and diabetes variables. While effect modification in DAGs has been discussed (https://www.ncbi.nlm.nih.gov/pubmed/17700242) it is not commonly implemented and so would not be a familiar component of a DAG to most readers.

In the presence of the newly included model equations to clarify their specification (as per comment 1) we felt that the inclusion of implied DAGs may create more confusion than the clarity it would seek to bring.

3. Although the causal relations between the variables in the simulated data were shown in DAGs, I feel it would be useful to provide the R code in the appendix, so readers could examine the details of the simulations.
Author response: Thank you for highlighting this issue – we have made appropriate changes to include the code as supplementary material as discussed in our response to Reviewer #1 Questions #3.

Reviewer #2 Questions

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #2: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: Yes

Reviewer #2 Comments to the Author

This is a clear, nicely written paper on an important topic. My comments are below:

Major comment:

1. The methods that condition on the outcome didn't perform as badly as I expected. My understanding is that this is because of the restriction to settings with no time-varying confounder-treatment feedback as stated in the discussion. I think it would be helpful to introduce this restriction / assumption earlier in the paper -- perhaps in the introduction -- so that the results don't get over-generalized.

Author response: Thank you for this useful comment, we agree that it is important to introduce this earlier in the paper for the reasons stated and we have made changes to do so.

Author changes: In the Introduction section [p4] the sentence “Approaches are suggested to avoid these biases” has been changed to “Such methods are compared to an alternative approach that avoids these biases. This method, however, is not suitable for all situations; other methods, such as g-methods, would be necessary in the presence of time-varying confounding, which is not examined in this paper.”

Minor comment:

1. The simulated data creates a scenario in which no one is lost to follow-up or experiences a competing event. While this is a reasonable simplification to demonstrate the problems with the outcome-control methods, some readers may miss that there are further potential dangers associated with LTFU or competing events. A short review of these issues in the discussion would make the paper a bit more complete.

Author response: This is another important restriction that should be highlighted and we are grateful to the reviewer for pointing it out.

Author changes: We have added the following paragraph to the discussion section [p15]: “It is important to note that, for illustrative purposes, this paper presents a simplified scenario in which there are no competing events or loss to follow up, both of which would be present in reality. Any differential loss to follow up or occurrence of competing events would (further) bias the results from all three methods examined in this paper [20].” Reference number 20 has been added to the reference list [p19].

Submitted filename: Response to reviewer comments.docx

31 Oct 2019

Analysing trajectories of a longitudinal exposure: a causal perspective on common methods in lifecourse research

PONE-D-19-21843R1

Dear Dr. Gadd,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Stephen Z Levine, PhD
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

7 Nov 2019

PONE-D-19-21843R1

Analysing trajectories of a longitudinal exposure: a causal perspective on common methods in lifecourse research

Dear Dr. Gadd:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff
on behalf of
Professor Stephen Z Levine
Academic Editor
PLOS ONE

14 in total

1. A life course approach to chronic disease epidemiology: conceptual models, empirical challenges and interdisciplinary perspectives.

Authors: Yoav Ben-Shlomo; Diana Kuh
Journal: Int J Epidemiol Date: 2002-04 Impact factor: 7.196

2. A critical evaluation of statistical approaches to examining the role of growth trajectories in the developmental origins of health and disease.

Authors: Yu-Kang Tu; Kate Tilling; Jonathan A C Sterne; Mark S Gilthorpe
Journal: Int J Epidemiol Date: 2013-09-14 Impact factor: 7.196

3. Public health policy, evidence, and causation: lessons from the studies on obesity.

Authors: Federica Russo
Journal: Med Health Care Philos Date: 2012-05

4. An introduction to g methods.

Authors: Ashley I Naimi; Stephen R Cole; Edward H Kennedy
Journal: Int J Epidemiol Date: 2017-04-01 Impact factor: 7.196

5. Selection Bias Due to Loss to Follow Up in Cohort Studies.

Authors: Chanelle J Howe; Stephen R Cole; Bryan Lau; Sonia Napravnik; Joseph J Eron
Journal: Epidemiology Date: 2016-01 Impact factor: 4.822

6. Trajectories of growth among children who have coronary events as adults.

Authors: David J P Barker; Clive Osmond; Tom J Forsén; Eero Kajantie; Johan G Eriksson
Journal: N Engl J Med Date: 2005-10-27 Impact factor: 91.245

Review 7. Causal inference in public health.

Authors: Thomas A Glass; Steven N Goodman; Miguel A Hernán; Jonathan M Samet
Journal: Annu Rev Public Health Date: 2013-01-07 Impact factor: 21.981

8. The design of simulation studies in medical statistics.

Authors: Andrea Burton; Douglas G Altman; Patrick Royston; Roger L Holder
Journal: Stat Med Date: 2006-12-30 Impact factor: 2.373

9. Model Selection of the Effect of Binary Exposures over the Life Course.

Authors: Andrew D A C Smith; Jon Heron; Gita Mishra; Mark S Gilthorpe; Yoav Ben-Shlomo; Kate Tilling
Journal: Epidemiology Date: 2015-09 Impact factor: 4.822

10. Joint modelling compared with two stage methods for analysing longitudinal data and prospective outcomes: A simulation study of childhood growth and BP.

Authors: A Sayers; J Heron; Adac Smith; C Macdonald-Wallis; M S Gilthorpe; F Steele; K Tilling
Journal: Stat Methods Med Res Date: 2016-07-11 Impact factor: 3.021

3 in total

1. Latent class regression improves the predictive acuity and clinical utility of survival prognostication amongst chronic heart failure patients.

Authors: John L Mbotwa; Marc de Kamps; Paul D Baxter; George T H Ellison; Mark S Gilthorpe
Journal: PLoS One Date: 2021-05-07 Impact factor: 3.240

2. Trajectory of body mass index and height changes from childhood to adolescence: a nationwide birth cohort in Japan.

Authors: Naomi Matsumoto; Toshihide Kubo; Kazue Nakamura; Toshiharu Mitsuhashi; Akihito Takeuchi; Hirokazu Tsukahara; Takashi Yorifuji
Journal: Sci Rep Date: 2021-11-26 Impact factor: 4.379

3. Analyses of 'change scores' do not estimate causal effects in observational data.

Authors: Peter W G Tennant; Kellyn F Arnold; George T H Ellison; Mark S Gilthorpe
Journal: Int J Epidemiol Date: 2022-10-13 Impact factor: 9.685
