Martijn van de Pol, Lyanne Brouwer.
Abstract
Biological processes exhibit complex temporal dependencies due to the sequential nature of allocation decisions in organisms' life cycles, feedback loops and two-way causality. Consequently, longitudinal data often contain cross-lags: the predictor variable depends on the response variable of the previous time step. Although statisticians have warned that regression models that ignore such covariate endogeneity in time series are likely to be inappropriate, this warning has received relatively little attention in biology. Furthermore, the resulting degree of estimation bias remains largely unexplored. We use a graphical model and numerical simulations to understand why and how regression models that ignore cross-lags can be biased, and how this bias depends on the length and number of time series. Ecological and evolutionary examples are provided to illustrate that cross-lags may be more common than is typically appreciated and that they occur in functionally different ways. We show that routinely used regression models that ignore cross-lags are asymptotically unbiased. However, this offers little relief, as conventional methods are biased for most realistically feasible time-series lengths. Furthermore, collecting time series on multiple subjects (such as populations, groups or individuals) does not help to overcome this bias when the analysis focusses on within-subject patterns (often the pattern of interest). Simulations, a literature search and a real-world empirical example together suggest that approaches that ignore cross-lags are likely biased in the direction opposite to the sign of the cross-lag (e.g. towards detecting density dependence of vital rates and against detecting life-history trade-offs and benefits of group living). Next, we show that multivariate (e.g. structural equation) models can dynamically account for cross-lags, and simultaneously address additional bias induced by measurement error, but only if the analysis considers multiple time series.
We provide guidance on how to identify a cross-lag and subsequently specify it in a multivariate model, which can be far from trivial. Our tutorials with data and R code of the worked examples provide step-by-step instructions on how to perform such analyses. Our study offers insights into situations in which cross-lags can bias analysis of ecological and evolutionary time series and suggests that adopting dynamical models can be important, as this directly affects our understanding of population regulation, the evolution of life histories and cooperation, and possibly many other topics. Determining how strong estimation bias due to ignoring covariate endogeneity has been in the ecological literature requires further study, also because it may interact with other sources of bias.
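The finite-sample bias described in the abstract can be illustrated with a minimal simulation sketch. This is not the authors' R code: the model form (Y_t = b·X_t + noise, X_{t+1} = d·Y_t + noise), the function names and the parameter values (d = -1, noise standard deviations 0.5 and 1) are assumptions chosen only for illustration. The true contemporaneous effect b is set to zero, so any non-zero average estimate from the static regression reflects bias.

```python
import random
import statistics


def simulate_series(n_steps, rng, b=0.0, d=-1.0, sd_x=0.5, sd_y=1.0):
    """Simulate one series with Y_t = b*X_t + noise and X_{t+1} = d*Y_t + noise.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    xs, ys = [], []
    y_prev = rng.gauss(0.0, sd_y)  # arbitrary start value for the unobserved Y_0
    for _ in range(n_steps):
        x = d * y_prev + rng.gauss(0.0, sd_x)  # cross-lag: X depends on the previous Y
        y = b * x + rng.gauss(0.0, sd_y)       # contemporaneous effect of X on Y
        xs.append(x)
        ys.append(y)
        y_prev = y
    return xs, ys


def ols_slope(xs, ys):
    """Slope of a static regression of Y on X (with intercept), ignoring the cross-lag."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def mean_static_estimate(n_steps, n_reps=2000, seed=1):
    """Average static-regression estimate of b over many simulated series (true b = 0)."""
    rng = random.Random(seed)
    slopes = [ols_slope(*simulate_series(n_steps, rng)) for _ in range(n_reps)]
    return statistics.fmean(slopes)


if __name__ == "__main__":
    for n_steps in (5, 20, 200):
        print(n_steps, round(mean_static_estimate(n_steps), 3))
```

Under these assumptions, the average estimate should be clearly positive for short series (opposite in sign to the negative cross-lag) and should shrink towards zero as the series lengthens, consistent with the statement that the static model is asymptotically unbiased.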
Keywords: Malurus elegans; covariate endogeneity; density dependence; group living; measurement error; structural equation model; time-series length; trade-off
Year: 2021 PMID: 34328638 PMCID: PMC9290935 DOI: 10.1111/1365-2656.13572
Source DB: PubMed Journal: J Anim Ecol ISSN: 0021-8790 Impact factor: 5.606
FIGURE 1. Graphical model illustrating that time-series data on X and Y are likely to be correlated when a cross-lag occurs, even if X does not affect Y. (a) A negative cross-lag between $X_t$ and $Y_{t-1}$ means that if by chance $Y_t$ is higher (lower) than average, then in the next time step both $X_{t+1}$ and $Y_{t+1}$ are predominantly expected to be lower (higher), due to the cross-lag and regression to the mean, respectively. Consequently, a negative cross-lag causes a positive correlation between $\Delta X$ and $\Delta Y$, which means that datapoints of (X, Y) are likely to move along the direction of the grey ellipse over time, also causing a directional pattern and correlation between X and Y. Green arrows originating from black points (X, Y) depict the most likely temporal trajectory to the observation in the next time step ($X_{t+1}$, $Y_{t+1}$), but other, less likely trajectories are possible, as indicated by thinner blue arrows for the top right datapoint. For comparison, (b) shows an example without a cross-lag (X and Y being uncorrelated random variables), in which there is regression to the mean for both X and Y but no directionality (the directions of the green arrows are diverse, meaning that $\Delta X$ and $\Delta Y$ are uncorrelated). Comparing an example X,Y-trajectory over (c) 5 and (d) 10 time steps illustrates how the directional orientation gradually disappears in longer time series, as chance effects cause the variation in X to increase over time, which dilutes the directionality caused by the cross-lag (shallower ellipse in (d) than in (c)). (e) Cross-sectional patterns (grey ellipse) of multiple time series have little directional orientation, despite each within-subject pattern (red, purple and yellow ellipses) being directional. (f) However, the cross-sectional pattern of a heterogeneous population (grey ellipse) often depends on the among-subject covariance rather than on within-subject patterns. See main text for additional explanation.
FIGURE 3. The sensitivity of estimation bias in the parameter of interest b to (a) the strength of the cross-lag, (b) the amount of among-subject covariance and (c) the strength of the contemporaneous effect size b, when using static and dynamic regression models (see legend) for all three example scenarios (row panels i–iii). The vertical dotted lines show the default values of the parameters used in other simulations and figures. Note that values of parameters b and d are shown in standardized units obtained by multiplying them by varY/varX and varX/varY, respectively (except in panels a-i and a-ii, as the absolute value of d can be interpreted as the probability of natal philopatry/emigration). For panels (a) and (b), we calculated the relative bias, while for panel (c) we present the absolute bias (as relative bias does not exist for b = 0). All results are the mean across estimates on 1,000 simulated datasets of multiple (100) subjects and 10 time steps; see Figure D in Supplementary Material D for single time-series datasets.
FIGURE 2. The (bias in) estimates of the parameter of interest b (contemporaneous effect of X on Y) as a function of time-series length, determined by (a, b) static and (c, d) dynamical regression models applied to simulated cross-lagged data with varying numbers of subjects (see legend). Panel rows reflect situations of (i) density dependence of vital rates, (ii) benefits of group living and (iii) trade-offs. For each situation, we considered analyses of single time series (e.g. density dependence in a single population) as well as analyses of multiple time series (10, 100 or 1,000 subjects) in the presence of among-subject covariance (see Boxes 1 and 2). Note that the x-axes are logarithmic and that the left and right y-axes show, respectively, the relative bias and the absolute value of the estimate of b for each panel.
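The contrast between the number of subjects and time-series length can be sketched as follows. This is a simplified illustration and not the simulation design behind the figures: it assumes no among-subject covariance, a true effect b = 0, a hypothetical cross-lag d = -1 and a static regression on within-subject-centred data. All function names and parameter values are assumptions. The two scenarios compared use the same total number of observations (10,000), split either into many short series or into a few long ones.

```python
import random


def simulate_subject(n_steps, rng, d=-1.0, sd_x=0.5, sd_y=1.0):
    """One subject's series with true b = 0: Y_t is pure noise, X_{t+1} = d*Y_t + noise.

    Parameter values are illustrative assumptions, not taken from the paper.
    """
    xs, ys = [], []
    y_prev = rng.gauss(0.0, sd_y)  # arbitrary start value for the unobserved Y_0
    for _ in range(n_steps):
        x = d * y_prev + rng.gauss(0.0, sd_x)  # cross-lag from previous Y to current X
        y = rng.gauss(0.0, sd_y)               # no contemporaneous effect of X on Y
        xs.append(x)
        ys.append(y)
        y_prev = y
    return xs, ys


def pooled_within_slope(n_subjects, n_steps, seed=1):
    """Static regression slope of Y on X after centring both within each subject."""
    rng = random.Random(seed)
    sxy = 0.0
    sxx = 0.0
    for _ in range(n_subjects):
        xs, ys = simulate_subject(n_steps, rng)
        mean_x = sum(xs) / n_steps
        mean_y = sum(ys) / n_steps
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sxx += sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Same total number of observations, split differently.
    print("2000 subjects x 5 steps:", round(pooled_within_slope(2000, 5), 3))
    print("20 subjects x 500 steps:", round(pooled_within_slope(20, 500), 3))
```

In this sketch, adding subjects makes the within-subject estimate more precise but does not move it towards the true value of zero, whereas lengthening each series does. This matches the qualitative point that multiple short time series do not remove the bias when within-subject patterns are analysed with a static model.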