
Modeling reciprocal effects in medical research: Critical discussion on the current practices and potential alternative models.

Satoshi Usami1, Naoya Todo2, Kou Murayama3,4.   

Abstract

Longitudinal designs provide a strong inferential basis for uncovering reciprocal effects or causality between variables. For this analytic purpose, a cross-lagged panel model (CLPM) has been widely used in medical research, but the use of the CLPM has recently been criticized in the methodological literature because parameter estimates in the CLPM conflate between-person and within-person processes. The aim of this study is to present some alternatives to the CLPM that can be used to examine reciprocal effects, and to illustrate potential consequences of ignoring the issue. A literature search, case studies, and simulation studies are used for this purpose. We examined more than 300 medical papers published since 2009 that applied cross-lagged longitudinal models, finding that in all studies only a single model (typically the CLPM) was fitted and potential alternative models were not considered to test reciprocal effects. In 39% of the studies, only two time points were used, which makes it impossible to test alternative models. Case studies and simulation studies showed that the CLPM and alternative models often produce different (or even inconsistent) parameter estimates for reciprocal effects, suggesting that research that relies only on the CLPM may draw erroneous conclusions about the presence, predominance, and sign of reciprocal effects. Simulation studies also showed that alternative models are sometimes susceptible to improper solutions, even when researchers do not misspecify the model.

Year:  2019        PMID: 31560683      PMCID: PMC6764673          DOI: 10.1371/journal.pone.0209133

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Collecting longitudinal data has become widely popular in medical research and other disciplines because of its statistical advantages over cross-sectional data. One of the biggest advantages of a longitudinal design is that it provides richer information for statistical inference aimed at uncovering reciprocal effects or causality between variables, answering questions such as how change (or growth, development) in one variable affects change in the other. More than 30 years ago, Nesselroade and Baltes [1] reviewed the benefits and drawbacks of using longitudinal data in psychology, noting that revealing causes (determinants) of intra-individual change is one of the major strengths of longitudinal data. Likewise, in the econometrics literature, Hsiao [2] argued that panel (i.e., longitudinal) data are effective for inferring dynamic relations between variables. One of the most common methods for addressing reciprocal effects in medical research is the cross-lagged panel model (CLPM; Duncan [3]; also known as a dynamic panel model, autoregressive cross-lagged model, cross-lagged path model, or cross-lagged regression model), especially after the CLPM was integrated into the framework of structural equation modeling (e.g., Finkel [4], Marsh and Yeung [5]). In these models, reciprocal effects are examined by testing the cross-lagged relations, that is, the effect of variable X on variable Y at a later time point after controlling for the previous level of Y (and vice versa). The CLPM is a simple and powerful model to test reciprocal effects, and thus it has been widely used. However, the application of the CLPM has also recently been criticized. Notably, Hamaker, Kuiper, and Grasman [6] criticized the use of the CLPM because the cross-lagged estimates in the CLPM conflate between-person and within-person processes, and so the results do not represent the actual within-person relations over time.
Between-person relations are the covariation of two variables in terms of individual differences (e.g., individuals with higher X tend to have higher Y relative to individuals with lower X), whereas within-person relations are the covariation of two variables within one person across time points or situations. These two types of relations are conceptually different. As such, the fact that estimates from the traditional CLPM conflate between-person and within-person relations means that the cross-lagged estimates from the CLPM are conceptually difficult to interpret. Indeed, the importance of disaggregation to examine within-person processes has been widely acknowledged in the methodological literature (Curran & Bauer [7]; Hamaker [8]; Hoffman & Stawski [9]). Relying on the CLPM may lead to erroneous conclusions regarding the presence, predominance, and sign of reciprocal effects as well as about causality. That said, the CLPM can be a possible option when stable between-person (trait) variances are negligible, or when researchers are not interested in uncovering within-person relations because predicting/forecasting outcomes is the main analytic purpose. To address this inherent problem with the CLPM, Hamaker et al. [6] proposed a random-intercepts CLPM (RI-CLPM) as a possible analytic option. As discussed later, in the RI-CLPM, individual differences are effectively controlled by the inclusion of a latent variable that represents a time-invariant (but person-variant) trait-like factor; this allows testing the reciprocal effects within individuals. If this model is extended to include measurement errors, the model is equivalent to a so-called (bivariate) stable trait, autoregressive trait, and state (STARTS) model (Kenny & Zautra [10, 11]). Usami, Murayama, and Hamaker [12] discussed the mathematical and conceptual relations between various cross-lagged models, including these models.
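The distinction between between-person and within-person relations can be made concrete with a small simulation. The following is a minimal sketch, not taken from the paper: the function name `between_within_correlations`, the data-generating values, and the use of person means as a stand-in for stable individual differences are all illustrative assumptions. The data are constructed so that a shared stable trait pushes X and Y in the same direction across persons, while occasion-specific fluctuations push them in opposite directions within persons.

```python
import numpy as np

def between_within_correlations(x, y):
    """Return (between-person, within-person) correlations.

    x, y: arrays of shape (n_persons, n_occasions).
    Between: correlation of person means.
    Within: correlation of deviations from each person's own mean.
    """
    x_mean = x.mean(axis=1)
    y_mean = y.mean(axis=1)
    between = np.corrcoef(x_mean, y_mean)[0, 1]
    x_dev = (x - x_mean[:, None]).ravel()
    y_dev = (y - y_mean[:, None]).ravel()
    within = np.corrcoef(x_dev, y_dev)[0, 1]
    return between, within

# Hypothetical data: values chosen only for illustration.
rng = np.random.default_rng(0)
n_persons, n_occasions = 500, 10
trait = rng.normal(size=n_persons)                 # stable individual differences
state = rng.normal(size=(n_persons, n_occasions))  # occasion-specific fluctuation
x = trait[:, None] + state + 0.3 * rng.normal(size=(n_persons, n_occasions))
y = trait[:, None] - state + 0.3 * rng.normal(size=(n_persons, n_occasions))

between, within = between_within_correlations(x, y)
print(f"between-person r = {between:.2f}, within-person r = {within:.2f}")
```

In this constructed example the two correlations have opposite signs, which is the kind of discrepancy that an analysis pooling both sources of variation cannot reveal.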
These recent studies are insightful and informative, providing applied medical researchers with a basis for thinking about how to test within-person reciprocal effects using longitudinal data. However, the arguments are limited mostly to mathematical and conceptual relations. As a result, we still know little about whether, when, and how the choice of different cross-lagged longitudinal models has substantive consequences for parameter estimates of (within-person) reciprocal effects in practice, leading researchers to draw different conclusions from the same data in medical sciences. The aim of the current manuscript is to show the importance of considering these alternative models and the potential problems in current practices to infer reciprocal effects. This is approached through a literature search, case studies, and statistical simulations. In the literature search, we first investigate the current common practice of longitudinal research in the medical literature, showing that medical researchers rely heavily and almost exclusively on the traditional CLPM, and do not consider potential alternative models. Such reliance on the traditional CLPM makes it difficult to infer within-person reciprocal effects. Then, with case studies and statistical simulations, we illustrate the potential danger of this common practice (i.e., applying only the CLPM), showing that it can result in mistaken conclusions about reciprocal effects. Finally, we provide some practical guidelines, hoping to help applied medical researchers who work on longitudinal data in the future.

Cross-lagged longitudinal models

In this paper, we focus on three cross-lagged longitudinal models: the (traditional) CLPM, the RI-CLPM, and the STARTS model. Below, following Usami et al. [12], we describe these models by emphasizing the commonalities and differences among them. Throughout the paper, we assume that researchers are interested in the reciprocal effect between two variables X and Y, although it is easy to expand the models to include more than two variables (e.g., when examining mediating effects of variables is a main focus of the research).

CLPM

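Let $x_{it}$ and $y_{it}$ be the measurements at time point $t$ ($1 \le t \le T$) for individual $i$ ($1 \le i \le N$). In the CLPM, $x_{it}$ and $y_{it}$ are first modeled as

$$x_{it} = \mu_{xt} + x^{*}_{it}, \qquad y_{it} = \mu_{yt} + y^{*}_{it}. \tag{1}$$

Here $\mu_{xt}$ and $\mu_{yt}$ are the temporal group means at time point $t$; $x^{*}_{it}$ and $y^{*}_{it}$ are temporal deviation terms from the temporal group means for individual $i$. With these equations, the trajectories of the temporal group mean are implicitly removed from the raw data. By definition, the deviations have a mean of zero. Then, $x^{*}_{it}$ and $y^{*}_{it}$ for $t \ge 2$ are modeled as

$$x^{*}_{it} = \beta_{xt}\, x^{*}_{i(t-1)} + \gamma_{xt}\, y^{*}_{i(t-1)} + d_{xit}, \qquad y^{*}_{it} = \beta_{yt}\, y^{*}_{i(t-1)} + \gamma_{yt}\, x^{*}_{i(t-1)} + d_{yit}, \tag{2}$$

where $\beta_{xt}$ and $\beta_{yt}$ are autoregressive parameters and $\gamma_{xt}$ and $\gamma_{yt}$ are cross-lagged regression parameters at time point $t$. For these parameters, time-invariance can also be assumed (by using $\beta_{x}$ and $\beta_{y}$, and $\gamma_{x}$ and $\gamma_{y}$) if the cross-lagged relations are assumed to be stable over time. Note that with $t = 1$, the initial terms $x^{*}_{i1}$ and $y^{*}_{i1}$ are modeled as exogenous variables (i.e., their variances and covariance are assumed). From the view of Granger causality (Granger [13]), estimates of cross-lagged regression parameters (the longitudinal relation between Y and X after controlling for the baseline X) are key for inferring reciprocal effects between the variables. The residuals $d_{xit}$ and $d_{yit}$ are usually assumed to be normally distributed and correlated:

$$\begin{pmatrix} d_{xit} \\ d_{yit} \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \omega^{2}_{xt} & \omega_{xyt} \\ \omega_{xyt} & \omega^{2}_{yt} \end{pmatrix} \right). \tag{3}$$

Here, $\omega^{2}_{xt}$ and $\omega^{2}_{yt}$ are time-variant residual variances and $\omega_{xyt}$ is a time-variant residual covariance. As with previous parameters, time-invariant residual variances and covariances can also be assumed (by using $\omega^{2}_{x}$, $\omega^{2}_{y}$, and $\omega_{xy}$). A path diagram of the CLPM is provided in Fig 1a.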
Fig 1. Path diagrams of cross-lagged models.

(a) the CLPM. (b) the RI-CLPM. (c) the STARTS model. Residual and error (co)variances and the variances and covariance of the trait factors are omitted for clarity of presentation. The variances and covariance of latent true scores at t = 1 (i.e., exogenous variables) are also omitted for the same purpose. Means of trait factors are set to zero in the RI-CLPM and the STARTS model. In all three cross-lagged models, means are modeled through temporal group means ($\mu_{xt}$ and $\mu_{yt}$).
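To make the CLPM equations concrete, the sketch below generates data from the cross-lagged relations with time-invariant parameters and recovers the autoregressive and cross-lagged parameters. It is an illustration under stated assumptions, not the estimation procedure used in the paper: the CLPM is normally fitted by structural equation modeling, whereas here pooled ordinary least squares is swapped in for brevity, the residuals are generated as uncorrelated, and the function names `simulate_clpm` and `fit_lagged_ols` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_clpm(n, n_waves, beta_x, beta_y, gamma_x, gamma_y, resid_sd, rng):
    """Generate deviation scores from the cross-lagged relations
    with time-invariant parameters (illustrative assumption)."""
    x = np.empty((n, n_waves))
    y = np.empty((n, n_waves))
    # First wave: exogenous (here independent standard normal, an assumption).
    x[:, 0] = rng.normal(size=n)
    y[:, 0] = rng.normal(size=n)
    for t in range(1, n_waves):
        d_x = rng.normal(scale=resid_sd, size=n)  # residuals, uncorrelated for simplicity
        d_y = rng.normal(scale=resid_sd, size=n)
        x[:, t] = beta_x * x[:, t - 1] + gamma_x * y[:, t - 1] + d_x
        y[:, t] = beta_y * y[:, t - 1] + gamma_y * x[:, t - 1] + d_y
    return x, y

def fit_lagged_ols(x, y):
    """Pooled OLS of each variable on both lagged variables."""
    # Remove the temporal group means, as in the first modeling step.
    x_dev = x - x.mean(axis=0)
    y_dev = y - y.mean(axis=0)
    lagged = np.column_stack([x_dev[:, :-1].ravel(), y_dev[:, :-1].ravel()])
    coef_x, *_ = np.linalg.lstsq(lagged, x_dev[:, 1:].ravel(), rcond=None)
    coef_y, *_ = np.linalg.lstsq(lagged, y_dev[:, 1:].ravel(), rcond=None)
    return {"beta_x": coef_x[0], "gamma_x": coef_x[1],
            "gamma_y": coef_y[0], "beta_y": coef_y[1]}

rng = np.random.default_rng(1)
x, y = simulate_clpm(n=20000, n_waves=3, beta_x=0.5, beta_y=0.4,
                     gamma_x=0.2, gamma_y=0.0, resid_sd=0.5, rng=rng)
estimates = fit_lagged_ols(x, y)
print({name: round(float(value), 3) for name, value in estimates.items()})
```

When the data really follow the CLPM, as here, the lagged regression recovers the generating values closely; the concern raised in the text arises when stable trait factors are also present.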

RI-CLPM

In the RI-CLPM (Hamaker et al. [6]), $x_{it}$ and $y_{it}$ are modeled as

$$x_{it} = \mu_{xt} + I_{xi} + x^{*}_{it}, \qquad y_{it} = \mu_{yt} + I_{yi} + y^{*}_{it}. \tag{4}$$

Again, $\mu_{xt}$ and $\mu_{yt}$ are the temporal group means. Notably, the model also includes $I_{xi}$ and $I_{yi}$, which are the defining characteristic of the RI-CLPM. These are (time-invariant) trait factors that represent an individual's trait-like deviations from the temporal group means. Trait factors $I_{xi}$ and $I_{yi}$ have means of 0 and variance–covariance matrix $V$. Because trait factor scores are accounted for, $x^{*}_{it}$ and $y^{*}_{it}$ represent, for each individual, temporal deviations from the means of that individual: they are the observations minus the expected scores of individual $i$ (i.e., $\mu_{xt} + I_{xi}$ and $\mu_{yt} + I_{yi}$). Accordingly, in the RI-CLPM, the time series $x^{*}_{it}$ and $y^{*}_{it}$ can be considered as within-person fluctuation. Due to this statistical property of the temporal deviations, at $t = 1$ the initial deviation terms ($x^{*}_{i1}$ and $y^{*}_{i1}$) are assumed to be uncorrelated with the trait factors. Using these within-person deviation terms, in the RI-CLPM the cross-lagged relations are modeled as in Eq 2 for $t \ge 2$. A path diagram of the RI-CLPM is provided in Fig 1b. Because the RI-CLPM accounts for trait factors and thereby separates stable between-person differences (i.e., trait factors) from within-person fluctuations over time, cross-lagged relations in the RI-CLPM can be considered as pertaining to a process that takes place at the within-person level. Therefore, in the RI-CLPM, $\gamma_{xt}$ and $\gamma_{yt}$ can be interpreted as quantities that express the extent to which the two variables influence each other within individuals. Because longitudinal data typically contain information about both within-person changes and stable individual differences, the CLPM, which does not account for trait factors (i.e., individual differences), fails to disaggregate these two components. As such, the CLPM provides inaccurate estimates of within-person reciprocal effects.
Note that when substituting the cross-lagged relations of Eq 2 into Eq 4, the trait factors, which are separated from the independent variables ($x^{*}_{i(t-1)}$ and $y^{*}_{i(t-1)}$), can be interpreted as random intercepts in the model. The model is named after this statistical fact. The CLPM is a special case of the RI-CLPM, found by letting $I_{xi} = 0$ and $I_{yi} = 0$ (i.e., trait factor variances are zero). The RI-CLPM requires two or more variables to have been measured at three or more time points, while the CLPM requires only two time points.
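The consequence of ignoring trait factors can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the paper's simulation design: data are generated from an RI-CLPM in which the true within-person cross-lagged effects are zero but the trait factors are correlated, and a CLPM-style lagged regression (pooled ordinary least squares, used here in place of structural equation modeling) is applied once to the observed scores and once to the true within-person deviations, which are only available because the data are simulated. The function names and all parameter values are hypothetical.

```python
import numpy as np

def simulate_ri_clpm(n, n_waves, beta, gamma, trait_cov, resid_sd, rng):
    """Observed scores = trait factor + within-person deviation
    (temporal group means set to zero for simplicity)."""
    traits = rng.multivariate_normal([0.0, 0.0], trait_cov, size=n)  # columns: I_x, I_y
    x_star = np.empty((n, n_waves))
    y_star = np.empty((n, n_waves))
    x_star[:, 0] = rng.normal(size=n)  # initial deviations, independent of the traits
    y_star[:, 0] = rng.normal(size=n)
    for t in range(1, n_waves):
        x_star[:, t] = (beta * x_star[:, t - 1] + gamma * y_star[:, t - 1]
                        + rng.normal(scale=resid_sd, size=n))
        y_star[:, t] = (beta * y_star[:, t - 1] + gamma * x_star[:, t - 1]
                        + rng.normal(scale=resid_sd, size=n))
    return traits[:, [0]] + x_star, traits[:, [1]] + y_star, x_star, y_star

def cross_lagged_ols(x, y):
    """Return (gamma_x, gamma_y): pooled OLS cross-lagged coefficients."""
    x_dev = x - x.mean(axis=0)
    y_dev = y - y.mean(axis=0)
    lagged = np.column_stack([x_dev[:, :-1].ravel(), y_dev[:, :-1].ravel()])
    coef_x, *_ = np.linalg.lstsq(lagged, x_dev[:, 1:].ravel(), rcond=None)
    coef_y, *_ = np.linalg.lstsq(lagged, y_dev[:, 1:].ravel(), rcond=None)
    return coef_x[1], coef_y[0]

rng = np.random.default_rng(2)
x, y, x_star, y_star = simulate_ri_clpm(
    n=20000, n_waves=4, beta=0.3, gamma=0.0,
    trait_cov=[[1.0, 0.6], [0.6, 1.0]], resid_sd=0.95, rng=rng)

naive_gamma_x, naive_gamma_y = cross_lagged_ols(x, y)             # traits ignored
true_gamma_x, true_gamma_y = cross_lagged_ols(x_star, y_star)     # traits removed
print(f"ignoring traits: {naive_gamma_x:.3f}, {naive_gamma_y:.3f}")
print(f"within-person:   {true_gamma_x:.3f}, {true_gamma_y:.3f}")
```

In this setup the regression on observed scores yields clearly positive cross-lagged coefficients although no within-person cross-lagged effect was generated, because the covariance between the trait factors is absorbed into the lagged paths.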

STARTS model

By extending the RI-CLPM to include measurement error, we obtain the STARTS model (Kenny & Zautra [10, 11]). In the (bivariate) STARTS model, $x_{it}$ and $y_{it}$ are decomposed into latent true scores $f_{xit}$ and $f_{yit}$ and measurement errors $\epsilon_{xit}$ and $\epsilon_{yit}$. That is,

$$x_{it} = f_{xit} + \epsilon_{xit}, \qquad y_{it} = f_{yit} + \epsilon_{yit}. \tag{5}$$

These measurement errors are usually assumed to be normally distributed and possibly correlated, that is,

$$\begin{pmatrix} \epsilon_{xit} \\ \epsilon_{yit} \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \psi^{2}_{xt} & \psi_{xyt} \\ \psi_{xyt} & \psi^{2}_{yt} \end{pmatrix} \right). \tag{6}$$

Here, $\psi^{2}_{xt}$ and $\psi^{2}_{yt}$ are measurement error variances, and $\psi_{xyt}$ is an error covariance. If needed, time-invariant measurement error (co)variances can be assumed. As in the RI-CLPM, $f_{xit}$ and $f_{yit}$ are modeled as

$$f_{xit} = \mu_{xt} + I_{xi} + x^{*}_{it}, \qquad f_{yit} = \mu_{yt} + I_{yi} + y^{*}_{it}. \tag{7}$$

Here, $x^{*}_{it}$ and $y^{*}_{it}$ are the terms expressing temporal deviation from the expected scores of individual $i$, after accounting for measurement error. Substituting Eq 7 into Eq 5 provides the specification of the STARTS model:

$$x_{it} = \mu_{xt} + I_{xi} + x^{*}_{it} + \epsilon_{xit}, \qquad y_{it} = \mu_{yt} + I_{yi} + y^{*}_{it} + \epsilon_{yit}. \tag{8}$$

As in Eq 2, temporal deviation terms are modeled as

$$x^{*}_{it} = \beta_{xt}\, x^{*}_{i(t-1)} + \gamma_{xt}\, y^{*}_{i(t-1)} + d_{xit}, \qquad y^{*}_{it} = \beta_{yt}\, y^{*}_{i(t-1)} + \gamma_{yt}\, x^{*}_{i(t-1)} + d_{yit}. \tag{9}$$

A path diagram of the STARTS model is provided in Fig 1c. Although measurement errors are not assumed in the CLPM or RI-CLPM, the STARTS model and the RI-CLPM share a common critical feature: the inclusion of trait factors. As such, like the RI-CLPM, cross-lagged parameters ($\gamma_{xt}$ and $\gamma_{yt}$) in the STARTS model reflect within-person reciprocal effects. The STARTS model requires two or more variables to have been measured at four or more time points. This means that we can compare the RI-CLPM and the STARTS model to determine which fits the data better so long as more than three waves are available. When observations may be influenced by measurement errors occurring for procedural reasons, accounting for measurement errors is desirable. However, the specification of measurement error when there is only one indicator variable (such as in the STARTS model) sometimes involves costs in terms of parameter estimation. Indeed, research has reported that the STARTS model often encounters estimation problems such as improper solutions and non-convergence.
Conceptually, one primary reason is the fact that unlike trait factor variances ($v^{2}$) and residual variances ($\omega^{2}_{xt}$ and $\omega^{2}_{yt}$), the contribution from measurement error variances ($\psi^{2}_{xt}$ and $\psi^{2}_{yt}$) is temporal: in the model-implied variance-covariance matrix, $\psi^{2}_{xt}$ (or $\psi^{2}_{yt}$) appears at time point $t$ only. Because of this, unstable estimates of some parameters (particularly autoregressive parameters) caused by some aspects of the research design (e.g., small sample size) can easily inflate the variances of the deviation terms ($x^{*}_{it}$ and $y^{*}_{it}$), increasing the risk of obtaining negative estimates of the measurement error variances. Therefore, previous studies have also proposed models that incorporate multiple indicators (rather than a single indicator) to represent latent variables (see Cole et al. [14]; Luhmann, Schimmack, & Eid [15]). In addition, research has also suggested the utility of a Bayesian approach to avoid unstable parameter estimation (Lüdtke, Robitzsch, & Wagner [16]).
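The point that measurement error variance enters the model-implied covariance matrix at a single time point only can be seen in a simplified case. The sketch below assumes a univariate STARTS model with time-invariant parameters and a stationary autoregressive process, which is a simplification of the bivariate model described above; the function name `starts_implied_cov` and the numerical values are hypothetical. Under these assumptions the covariance between occasions $t$ and $s$ is the trait variance, plus the state variance times $\beta^{|t-s|}$, plus the error variance when $t = s$.

```python
import numpy as np

def starts_implied_cov(n_waves, trait_var, state_var, beta, error_var):
    """Model-implied covariance matrix of one variable across waves,
    assuming a univariate, stationary STARTS model (illustrative)."""
    waves = np.arange(n_waves)
    lags = np.abs(np.subtract.outer(waves, waves))  # |t - s| for every pair of waves
    trait_part = trait_var * np.ones((n_waves, n_waves))  # constant in every cell
    state_part = state_var * beta ** lags                  # decays with the lag
    error_part = error_var * np.eye(n_waves)               # diagonal only
    return trait_part + state_part + error_part

cov_with_error = starts_implied_cov(4, trait_var=0.5, state_var=1.0,
                                    beta=0.4, error_var=0.3)
cov_without_error = starts_implied_cov(4, trait_var=0.5, state_var=1.0,
                                       beta=0.4, error_var=0.0)
# The two matrices differ only on the diagonal.
print(np.round(cov_with_error - cov_without_error, 3))
```

Because the error variance is informed only by the diagonal elements, while the trait and state components are informed by every element, sampling fluctuation in the off-diagonal structure can leave too little diagonal variance for the error term, which is one way to read the risk of negative variance estimates noted above.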

Review of the literature

Method

To investigate recent trends in the use of cross-lagged longitudinal models in medical research, we conducted a literature search through the UTokyo REsource Explorer (TREE; http://tokyo.summon.serialssolutions.com/) web search engine in June of 2017. TREE aggregates information from many major databases (e.g., Web of Science, PubMed, PsycINFO, Engineering Village, ERIC, JSTOR) and electronic journals under contract with The University of Tokyo. TREE summarizes this collection of information in a single search window, allowing us to perform a more comprehensive and efficient literature search than by using the individual databases separately. We first used the English keywords “cross lagged model” and “cross lagged relation”, searching English papers published since 2009 in medical journals. In addition, we limited our search to peer-reviewed papers; news items, book reviews, and doctoral dissertations were therefore not considered. We found 323 medical papers by this method. Of these, we excluded 52 papers that did not apply any cross-lagged longitudinal models to actual data, leaving us with 271 papers. Most of the excluded papers were review papers, statistical simulations, or methodological and statistical discussions. Table 1 lists the papers retained for the investigation (authors, publication year, journal, the number of time points). Full references are available in Table A in S1 File.
Table 1. The list of 271 papers that applied cross-lagged models.

| ID | Authors | Year | Journal | Number of time points (T) |
|---|---|---|---|---|
| 1 | Adachi & Willoughby | 2016 | Child Development | 4 |
| 2 | Andrade | 2014 | Journal of Adolescence | 2 |
| 3 | Arnett et al. | 2012 | Journal of Abnormal Child Psychology | 4 |
| 4 | Arnett et al. | 2016 | Journal of Child Psychology and Psychiatry | 10 |
| 5 | Ayalon et al. | 2016 | Psychology and Aging | 3 |
| 6 | Baams et al. | 2015 | Archives of Sexual Behavior | 3 |
| 7 | Baesemer et al. | 2016 | Journal of Abnormal Child Psychology | 8 |
| 8 | Banerjee et al. | 2011 | Child Development | 3 |
| 9 | Baydar & Akcinar | 2018 | Journal of Abnormal Child Psychology | 5 |
| 10 | Beaujean et al. | 2013 | Social Psychiatry and Psychiatric Epidemiology | 2 |
| 11 | Bekkhus et al. | 2011 | Journal of Abnormal Child Psychology | 4 |
| 12 | Bennett et al. | 2015 | Journal of Child Psychology and Psychiatry | 3 |
| 13 | Bentley et al. | 2013 | Quality of Life Research | 4 |
| 14 | Best et al. | 2015 | Journal of the American Geriatrics Society | 2 |
| 15 | Birkeland et al. | 2016 | International Archives of Occupational and Environmental Health | 2 |
| 16 | Bohlmann et al. | 2015 | Child Development | 3 |
| 17 | Bolhuis et al. | 2014 | Psychological Medicine | 2 |
| 18 | Bolhuis et al. | 2017 | Journal of the American Academy of Child and Adolescent Psychiatry | 2 |
| 19 | Bondü et al. | 2016 | Journal of Adolescence | 2 |
| 20 | Bonvanie et al. | 2016 | Pain | 2 |
| 21 | Bourque et al. | 2016 | Journal of the American Academy of Child & Adolescent Psychiatry | 4 |
| 22 | Boyes et al. | 2014 | Journal of Abnormal Child Psychology | 2 |
| 23 | Boylan et al. | 2010 | Journal of the American Academy of Child & Adolescent Psychiatry | 3 |
| 24 | Breeman et al. | 2015 | Journal of Abnormal Child Psychology | 3 |
| 25 | Brière et al. | 2014 | Comprehensive Psychiatry | 3 |
| 26 | Brinke et al. | 2017 | Journal of Abnormal Child Psychology | 4 |
| 27 | Brown et al. | 2011 | Journal of Aging and Health | 2 |
| 28 | Burns et al. | 2016 | Annals of Behavioral Medicine | 5 |
| 29 | Calvete et al_1 | 2015 | Journal of Child and Family Studies | 2 |
| 30 | Calvete et al_2 | 2015 | Journal of Adolescence | 3 |
| 31 | Chang & Shaw | 2016 | Child Psychiatry and Human Development | 2 |
| 32 | Chen et al. | 2012 | Journal of Child Psychology and Psychiatry | 4 |
| 33 | Chen et al. | 2015 | PLoS ONE | 2 |
| 34 | Cheng et al. | 2016 | Child: Care, Health and Development | 2 |
| 35 | Chi et al. | 2014 | AIDS and Behavior | 3 |
| 36 | Choi et al. | 2012 | Tobacco Control | 10 |
| 37 | Christensen & Knardahl | 2012 | Pain | 2 |
| 38 | Conway et al. | 2017 | Child Psychiatry and Human Development | 2 |
| 39 | Cooley et al. | 2018 | Journal of Abnormal Child Psychology | 3 |
| 40 | Cowlishaw et al. | 2013 | Ageing & Society | 2 |
| 41 | Crocetti et al. | 2016 | PLoS ONE | 6 |
| 42 | Crocetti et al. | 2017 | Child Development | 5 |
| 43 | Crosnoe et al. | 2012 | Journal of Health and Social Behavior | 2 |
| 44 | Dakanalis et al. | 2015 | European Child & Adolescent Psychiatry | 2 |
| 45 | Dakanalis et al. | 2016 | Journal of Clinical Psychology | 2 |
| 46 | Daniel et al. | 2014 | Journal of Adolescence | 3 |
| 47 | Daniel et al. | 2018 | Child Development | 3 |
| 48 | Danzo et al. | 2017 | Journal of Adolescence | 4 |
| 49 | Das & Sawin | 2016 | Archives of Sexual Behavior | 2 |
| 50 | De Laet et al. | 2014 | Child Development | 3 |
| 51 | de Leeuw et al. | 2011 | Pediatrics | 3 |
| 52 | de Wilde et al. | 2016 | Journal of Abnormal Child Psychology | 3 |
| 53 | Dempsy et al. | 2016 | Journal of Clinical Psychology in Medical Settings | 2 |
| 54 | Deschenes et al. | 2016 | Journal of Diabetes | 4 |
| 55 | Diamantopoulou et al. | 2011 | European Child & Adolescent Psychiatry | 5 |
| 56 | Ding et al_1 | 2014 | BMC Neuroscience | 2 |
| 57 | Ding et al_2 | 2014 | Behavioral and Brain Functions | 2 |
| 58 | Doane et al. | 2016 | Journal of Religion and Health | 3 |
| 59 | Eggers et al. | 2017 | AIDS and Behavior | 2 |
| 60 | Fabbri et al. | 2015 | Journals of Gerontology: Biological Sciences | 3 |
| 61 | Faller et al. | 2017 | Psycho-oncology | 2 |
| 62 | Fanti & Munoz Centifanti | 2014 | Child Psychiatry & Human Development | 2 |
| 63 | Fátima et al. | 2014 | Journal of Abnormal Child Psychology | 4 |
| 64 | Feldt et al. | 2016 | Scandinavian Journal of Work, Environment & Health | 5 |
| 65 | Fielder et al. | 2014 | Journal of Sex Research | 4 |
| 66 | Fletcher & Johnson | 2016 | Journal of Child and Family Studies | 2 |
| 67 | Flouri et al. | 2015 | Child: Care, Health and Development | 3 |
| 68 | Flouri et al. | 2016 | Journal of Abnormal Child Psychology | 3 |
| 69 | Flournoy et al. | 2016 | Child Development | 2 |
| 70 | Foti et al. | 2010 | American Journal of Psychiatry | 5 |
| 71 | Freedman et al. | 2015 | European Journal of Psychotraumatology | 2 |
| 72 | French et al. | 2014 | Child Development | 3 |
| 73 | Frijins et al. | 2010 | Journal of Adolescence | 4 |
| 74 | Fuller-Tyszkiewicz et al. | 2015 | Journal of Adolescence | 34 |
| 75 | Garbarski | 2014 | Journal of Health and Social Behavior | 9 |
| 76 | Garon-Carrier et al. | 2016 | Child Development | 3 |
| 77 | Gershoff et al. | 2012 | Child Development | 2 |
| 78 | Giard et al. | 2016 | Journal of Abnormal Child Psychology | 2 |
| 79 | Girard et al. | 2017 | European Child and Adolescent Psychiatry | 4 |
| 80 | Girard et al. | 2014 | PLoS ONE | 5 |
| 81 | Good et al. | 2017 | Psychiatry Research | 2 |
| 82 | Goodman et al. | 2014 | Infant Mental Health Journal | 3 |
| 83 | Greven et al. | 2012 | Journal of Child Psychology and Psychiatry | 2 |
| 84 | Greven et al. | 2011 | Journal of Abnormal Child Psychology | 2 |
| 85 | Gudmundsson et al. | 2015 | Acta Psychiatrica Scandinavica | 3 |
| 86 | Gutenbrunner et al. | 2018 | Journal of Abnormal Child Psychology | 3 |
| 87 | Hale et al. | 2011 | Journal of Child Psychology and Psychiatry | 3 |
| 88 | Hale III et al. | 2016 | European Child and Adolescent Psychiatry | 6 |
| 89 | Hall et al. | 2015 | PLoS ONE | 3 |
| 90 | Hallett et al. | 2010 | American Journal of Psychiatry | 2 |
| 91 | Hamama-Raz et al. | 2015 | European Journal of Cancer Care | 2 |
| 92 | Hannigan et al. | 2017 | Journal of Child Psychology and Psychiatry | 2 |
| 93 | Hanson et al. | 2016 | PLoS ONE | 4 |
| 94 | Hanson et al. | 2017 | Scandinavian Journal of Work, Environment & Health | 4 |
| 95 | Harlaar et al. | 2011 | Child Development | 2 |
| 96 | Harris et al. | 2015 | Child Development | 5 |
| 97 | Harvey et al. | 2016 | Journal of Abnormal Psychology | 4 |
| 98 | Henchoz et al. | 2014 | Quality of Life Research | 2 |
| 99 | Hiemstra et al. | 2013 | PLoS ONE | 5 |
| 100 | Hietanen et al. | 2016 | Ageing & Society | 4 |
| 101 | Hill et al. | 2013 | Journal of Adolescence | 2 |
| 102 | Hinnant et al. | 2013 | Child Development | 3 |
| 103 | Hipwell et al. | 2011 | Journal of Child Psychology and Psychiatry | 9 |
| 104 | Holmes et al. | 2016 | Journal of Abnormal Child Psychology | 4 |
| 105 | Hopwood et al. | 2010 | Psychological Medicine | 6 |
| 106 | Houkes et al. | 2011 | BMC Public Health | 3 |
| 107 | Howarth et al. | 2016 | Child Development | 4 |
| 108 | Huizink et al. | 2014 | Journal of Psychosomatic Obstetrics & Gynecology | 3 |
| 109 | Husby & Wichstrom | 2017 | Journal of Abnormal Child Psychology | 4 |
| 110 | Huyghebaert et al. | 2016 | International Journal of Stress Management | 2 |
| 111 | Ibrahim et al. | 2009 | Social Science & Medicine | 3 |
| 112 | In-Albon et al. | 2017 | Child Psychiatry and Human Development | 3 |
| 113 | Jackson & Cunningham | 2017 | Preventive Medicine | 5 |
| 114 | Jäggi et al. | 2016 | Journal of Adolescence | 4 |
| 115 | Jansen et al. | 2013 | Pediatrics | 4 |
| 116 | Kashdan et al. | 2014 | Archives of Sexual Behavior | 21 |
| 117 | Keijsers et al. | 2012 | Child Development | 3 |
| 118 | Keles et al. | 2017 | Journal of Abnormal Child Psychology | 3 |
| 119 | Kilian et al. | 2012 | Social Psychiatry and Psychiatric Epidemiology | 3 |
| 120 | Kim et al. | 2018 | Child Development | 3 |
| 121 | Kimonis et al. | 2015 | Journal of Abnormal Child Psychology | 3 |
| 122 | Kiviruusu et al. | 2016 | PLoS ONE | 4 |
| 123 | Klass et al. | 2017 | Psychological Medicine | 8 |
| 124 | Klimstra et al. | 2014 | Social Psychiatry and Psychiatric Epidemiology | 4 |
| 125 | Kochel et al. | 2012 | Child Development | 3 |
| 126 | Koen et al. | 2012 | Journal of Adolescent Health | 2 |
| 127 | Koleck et al. | 2017 | Quality of Life Research | 2 |
| 128 | Konttinen et al. | 2014 | International Journal of Obesity | 3 |
| 129 | Kuijpers et al. | 2015 | Journal of Child and Family Studies | 2 |
| 130 | Kuja-Halkola et al. | 2015 | Journal of Child Psychology and Psychiatry | 4 |
| 131 | Labhart et al. | 2017 | Behavioral Medicine | 2 |
| 132 | Lange et al. | 2017 | Child & Adolescent Mental Health | 5 |
| 133 | Lanz & Tagliabue | 2014 | Journal of Adolescence | 2 |
| 134 | Lavigne et al. | 2015 | Journal of Abnormal Child Psychology | 3 |
| 135 | Leadbeater & Jacqueline | 2015 | Journal of Abnormal Child Psychology | 7 |
| 136 | Leadbeater et al. | 2009 | Child Development | 4 |
| 137 | Lewis et al. | 2014 | European Child & Adolescent Psychiatry | 2 |
| 138 | Li & Zhang | 2015 | Social Science & Medicine | 3 |
| 139 | Liat et al. | 2009 | Child Development | 2 |
| 140 | Lifshitz-Vahav et al. | 2017 | Aging & Mental Health | 2 |
| 141 | Lindwall et al. | 2011 | Health Psychology | 2 |
| 142 | Liu et al. | 2016 | Journal of Health and Social Behavior | 2 |
| 143 | Loukas | 2009 | Journal of Abnormal Child Psychology | 2 |
| 144 | Lowe et al. | 2014 | Journal of Abnormal Psychology | 3 |
| 145 | Lucy et al. | 2013 | Journal of Adolescence | 2 |
| 146 | Luengo Kanacri et al. | 2017 | Child Development | 2 |
| 147 | Luo et al. | 2012 | Social Science & Medicine | 3 |
| 148 | Luyckx et al. | 2012 | Journal of Adolescent Health | 2 |
| 149 | Luyckx et al. | 2010 | Diabetes Care | 4 |
| 150 | Magee et al. | 2014 | Acta Pædiatrica | 3 |
| 151 | Mannering et al. | 2011 | Child Development | 2 |
| 152 | Marschall-Lévesque et al. | 2017 | Journal of Adolescent Health | 3 |
| 153 | Marshall et al. | 2014 | Child Development | 2 |
| 154 | Marsiglio et al. | 2014 | Journal of Child & Adolescent Trauma | 2 |
| 155 | Martinent & Nicolas | 2017 | International Journal of Stress Management | 2 |
| 156 | Martz et al. | 2016 | JAMA Psychiatry | 3 |
| 157 | Masquillier et al. | 2015 | AIDS and Behavior | 2 |
| 158 | Mauno et al. | 2011 | International Archives of Occupational and Environmental Health | 3 |
| 159 | McAdams et al. | 2014 | Journal of Adolescence | 3 |
| 160 | McAdams et al. | 2015 | Psychological Medicine | 3 |
| 161 | Meier et al. | 2015 | Family Practice | 2 |
| 162 | Micalizzi et al. | 2016 | Journal of Abnormal Child Psychology | 2 |
| 163 | Miller et al_1 | 2017 | Journal of Abnormal Child Psychology | 3 |
| 164 | Miller et al_2 | 2017 | Psychoneuroendocrinology | 3 |
| 165 | Mitchison et al. | 2015 | PLoS ONE | 5 |
| 166 | Moberg et al. | 2011 | Behavior Genetics | 2 |
| 167 | Mrug et al. | 2009 | Journal of Abnormal Child Psychology | 2 |
| 168 | Muratori et al. | 2016 | Comprehensive Psychiatry | 3 |
| 169 | Murphy et al. | 2017 | Journal of Clinical Psychology | 3 |
| 170 | Mustillo et al. | 2012 | Journal of Health and Social Behavior | 9 |
| 171 | Natsukai et al. | 2013 | Child Development | 2 |
| 172 | Neece et al. | 2012 | American Journal on Intellectual and Developmental Disabilities | 7 |
| 173 | Negriff et al. | 2015 | Journal of Adolescent Health | 3 |
| 174 | Newland et al. | 2015 | Journal of Child and Family Studies | 3 |
| 175 | Nielsen et al. | 2017 | International Archives of Occupational and Environmental Health | 2 |
| 176 | Nishiguchi et al. | 2016 | Psychiatry Research | 2 |
| 177 | Occhipinti et al. | 2015 | PLoS ONE | 6 |
| 178 | Olesen et al. | 2013 | BMC Psychiatry | 9 |
| 179 | Paek et al. | 2016 | Annals of Behavioral Medicine | 3 |
| 180 | Palosaari et al. | 2013 | Journal of Abnormal Psychology | 3 |
| 181 | Palosaari et al. | 2016 | Journal of Abnormal Child Psychology | 3 |
| 182 | Pastorelli et al. | 2016 | Journal of Child Psychology and Psychiatry | 2 |
| 183 | Patalay et al. | 2015 | Journal of Child Psychology and Psychiatry | 3 |
| 184 | Pearl et al. | 2014 | Journal of Child and Family Studies | 4 |
| 185 | Peter et al. | 2016 | Social Science & Medicine | 2 |
| 186 | Pettersson et al. | 2011 | BMC Public Health | 2 |
| 187 | Peyre et al. | 2016 | BMC Psychiatry | 2 |
| 188 | Pickard et al. | 2017 | Journal of the American Academy of Child and Adolescent Psychiatry | 3 |
| 189 | Poirier et al. | 2016 | European Child & Adolescent Psychiatry | 5 |
| 190 | Pössel & Black | 2014 | Journal of Clinical Psychology | 3 |
| 191 | Preckel et al. | 2013 | Journal of Adolescence | 3 |
| 192 | Priest et al. | 2017 | BMC Psychiatry | 2 |
| 193 | Rappe | 2009 | Journal of Abnormal Child Psychology | 2 |
| 194 | Rawal et al. | 2014 | Journal of Child Psychology and Psychiatry | 2 |
| 195 | Rhodes et al. | 2015 | Annals of Behavioral Medicine | 3 |
| 196 | Ribeiro et al. | 2011 | BMC Pediatrics | 2 |
| 197 | Richardson et al. | 2011 | Social Psychiatry and Psychiatric Epidemiology | 2 |
| 198 | Richie et al. | 2015 | Child Development | 5 |
| 199 | Richter et al. | 2015 | International Archives of Occupational and Environmental Health | 2 |
| 200 | Rivas-Drake et al. | 2017 | Child Development | 3 |
| 201 | Rommel et al. | 2015 | PLoS ONE | 3 |
| 202 | Ruttle et al. | 2015 | Psychoneuroendocrinology | 3 |
| 203 | Salihovic et al. | 2012 | Journal of Abnormal Child Psychology | 4 |
| 204 | Savage et al. | 2015 | Journal of the American Academy of Child & Adolescent Psychiatry | 4 |
| 205 | Senste et al. | 2017 | Journal of Abnormal Child Psychology | 3 |
| 206 | Seymour et al. | 2014 | Journal of Abnormal Child Psychology | 3 |
| 207 | Shaffer et al. | 2013 | Journal of Abnormal Child Psychology | 6 |
| 208 | Shields & Beaver | 2011 | Journal of Adolescent Health | 2 |
| 209 | Shimazu et al. | 2009 | Social Science & Medicine | 3 |
| 210 | Skalická et al. | 2015 | Child Development | 2 |
| 211 | Solberg et al. | 2016 | Psychological Medicine | 3 |
| 212 | Song et al. | 2012 | Healthcare Informatics Research | 3 |
| 213 | Spanos et al. | 2010 | Journal of Abnormal Psychology | 3 |
| 214 | Spilt et al. | 2014 | Child Development | 4 |
| 215 | Stavrakakis et al. | 2012 | Journal of Adolescent Health | 3 |
| 216 | Stinglhamber et al. | 2015 | PLoS ONE | 2 |
| 217 | Stratton et al. | 2014 | The Journal of Pain | 3 |
| 218 | Sturaro et al. | 2011 | Child Development | 4 |
| 219 | Sutin & Zonderman | 2012 | Psychological Medicine | 2 |
| 220 | Szabo et al. | 2014 | Journal of Crohn’s and Colitis | 4 |
| 221 | Tabri et al. | 2015 | Psychological Medicine | 104 |
| 222 | Tang et al. | 2009 | Ageing International | 2 |
| 223 | Taylor et al. | 2013 | Psychological Medicine | 2 |
| 224 | Taylor et al. | 2014 | Journal of Autism and Developmental Disorders | 2 |
| 225 | Telley et al. | 2015 | Journal of Health and Social Behavior | 3 |
| 226 | Teppers et al. | 2014 | Journal of Adolescence | 2 |
| 227 | Tiet et al. | 2010 | Journal of Child and Family Studies | 2 |
| 228 | Tiggelman et al. | 2015 | Quality of Life Research | 3 |
| 229 | Timmermans et al. | 2010 | Psychological Medicine | 7 |
| 230 | Ting-Lan & Bellmore | 2012 | Journal of Abnormal Child Psychology | 3 |
| 231 | Trucco et al. | 2014 | Journal of Child Psychology and Psychiatry | 2 |
| 232 | Tsai et al. | 2017 | Journal of Abnormal Child Psychology | 2 |
| 233 | Tseng et al. | 2015 | Journal of Abnormal Child Psychology | 3 |
| 234 | Tucker et al_1 | 2013 | Journal of Adolescent Health | 2 |
| 235 | Tucker et al_2 | 2013 | Journal of Adolescent Health | 4 |
| 236 | Usami et al. | 2015 | Multivariate Behavioral Research | 6 |
| 237 | Van Dorn et al. | 2017 | Psychological Medicine | 11 |
| 238 | van Dulmen et al. | 2012 | Journal of Adolescent Health | 3 |
| 239 | van Zalk & Tillfors | 2017 | Child and Adolescent Psychiatry and Mental Health | 3 |
| 240 | Vanhalst et al. | 2013 | Journal of Abnormal Child Psychology | 5 |
| 241 | Vaz et al. | 2014 | PLoS ONE | 2 |
| 242 | Vella et al. | 2017 | Medicine and Science in Sports and Exercise | 2 |
| 243 | Vitezova et al. | 2015 | Maturitas | 2 |
| 244 | von Salisch et al. | 2017 | Journal of Abnormal Child Psychology | 2 |
| 245 | von Stumm & Deary | 2013 | Psychology and Aging | 2 |
| 246 | Voss et al. | 2016 | European Journal of Ageing | 2 |
| 247 | Waller et al. | 2015 | Journal of Abnormal Child Psychology | 2 |
| 248 | Wang & Fredricks | 2014 | Child Development | 3 |
| 249 | Wang & Kenny_1 | 2014 | Journal of Abnormal Child Psychology | 3 |
| 250 | Wang & Kenny_2 | 2014 | Child Development | 2 |
| 251 | Wang et al. | 2012 | Child: Care, Health and Development | 2 |
| 252 | Webb et al. | 2016 | Journal of Adolescent Health | 5 |
| 253 | Weinstein et al. | 2017 | PeerJ | 2 |
| 254 | Welp et al. | 2016 | Critical Care | 3 |
| 255 | Whelan et al. | 2015 | Journal of Child Psychology and Psychiatry | 2 |
| 256 | Wichstrøm et al. | 2016 | Journal of Adolescence | 4 |
| 257 | Wickrama et al. | 2010 | Journal of Aging and Health | 3 |
| 258 | Williams et al. | 2011 | Child Development | 7 |
| 259 | Wolf et al. | 2016 | Psychological Medicine | 2 |
| 260 | Wolff | 2011 | Dyslexia | 3 |
| 261 | Wols et al. | 2015 | Journal of Adolescence | 2 |
| 262 | Wood et al. | 2012 | Child Development | 7 |
| 263 | Wouters et al. | 2016 | AIDS and Behavior | 2 |
| 264 | Yan & Dix | 2014 | Journal of Child Psychology and Psychiatry | 4 |
| 265 | Yu et al. | 2015 | Social Science & Medicine | 9 |
| 266 | Zahl et al. | 2017 | Pediatrics | 3 |
| 267 | Zavos et al. | 2012 | Behavior Genetics | 2 |
| 268 | Zhou et al. | 2014 | PLoS ONE | 3 |
| 269 | Zhou et al. | 2015 | Psychiatry Research | 3 |
| 270 | Zhu et al. | 2017 | Journal of Abnormal Child Psychology | 3 |
| 271 | van den Eijnden et al. | 2010 | Journal of Abnormal Child Psychology | 2 |

Results

Among the 271 papers, 106 (= 39%) collected longitudinal data at two time points, 89 (= 33%) collected data with three waves, 36 (= 13%) with four waves, 16 (= 6%) with five waves, and 24 (= 9%) at more than five time points. The proportion for two time points (= 39%) is close to the one reported by Hamaker et al. [6] (= 45%) in the field of psychology. With regard to the statistical analysis performed, 257 papers (= 95%) used the CLPM to analyze longitudinal data, and one paper used a model similar to the RI-CLPM (see Telley et al., 2015 in Table 1; this model does not assume autoregressive parameters). Other papers applied different models, such as an autoregressive latent trajectory model (Poirier et al., 2016), a latent change score model (LCS; Baydar and Akcinar, 2018; Natsukai et al., 2013; Occhipinti et al., 2015; Usami et al., 2015), a model similar to the latent curve model with structured residuals (Baams et al., 2015; Mustillo et al., 2012; Williams et al., 2011), or a fixed-effects regression model (Baesemer et al., 2016; a model similar to the LCS). For the mathematical and conceptual relations between these models, see Usami et al. [12]. Five papers used a multilevel-model framework (Arnett et al., 2016; Cooley et al., 2018; Daniel et al., 2018; Fuller-Tyszkiewicz et al., 2015; Kashdan et al., 2014) to account for individual differences in parameters of the cross-lagged model (see General discussion on this point). Note that no research applied the STARTS model, and few studies compared analysis results from different cross-lagged models (one exception is a methodological paper by Usami et al., 2015, which compared analysis results from the LCS model and the CLPM). These results indicate the heavy reliance on the traditional CLPM in the literature.
It is also important to note that alternative cross-lagged longitudinal models (e.g., the RI-CLPM and the STARTS model) require at least three time points (with a stability assumption; the STARTS model requires at least four time points with an instability assumption) to fit the model (for the ALT model, we need four time points with a stability assumption). Unfortunately, almost 40% of the papers collected data with only two time points, indicating many applied medical research implicitly precludes the option of using these alternative models.

Case studies

To compare analysis results based on different cross-lagged longitudinal models, we focused on the 165 papers that collected longitudinal data with more than two time points. Among these papers, we randomly selected 50 and, using the contact information provided in each paper, emailed the corresponding authors to request that they share their dataset for our research. In this contact, we emphasized that (1) our primary research purpose was simply to compare analysis results from different cross-lagged models, not to criticize their findings, (2) we would not provide any estimation results from the original paper or relevant information in the datasets, to prevent identification of the source of the paper, (3) we would not share the dataset with any other researchers, and (4) we did not need information about variables that are not relevant to cross-lagged analysis (e.g., personal information of participants). To increase response rates, we contacted the authors again after one month if we had not received a reply to the first contact. As a result, we received a total of 21 responses from the authors (response rate: 42%), and among them, five authors (from five different papers) granted us access to their datasets. We were unable to obtain permission from the authors of the other 16 papers, mainly because sharing with us might have violated the data sharing policy of their sources. To summarize the procedure for the case studies as well as the literature review so far, a flow diagram is provided in Fig 2. Among the five datasets, two were publicly available online without special permission from the authors, two were provided directly by the authors, and one was provided after a review of the data use agreement that we submitted.
Note that for one of the datasets we had access only to the sample means and sample (co)variances (rather than the raw data), which allowed us to estimate the parameters but not to fully account for missing data.
Fig 2

Flow diagram for literature review and case studies.

Among the five datasets, two have three time points and the others have more than three time points (the mean number of time points is 6.0). The average sample size of these datasets is large (= 2,741). In this paper, we do not give the exact number of participants and time points for each study, to prevent identification of the studies. While all five studies applied the CLPM, some of them specified the model in slightly different ways. Specifically, two studies assumed second-order autoregressive and cross-lagged parameters as well as first-order parameters. Another study assumed a mediator between two variables. In addition, one study assumed time-invariant parameters (i.e., stability), while the other four studies did not. To ensure the comparability of the results between datasets, in the current analysis we assume time-invariant parameters for autoregressive and cross-lagged coefficients (β and γ) and residual and error (co)variances (ω² and ψ²). In addition, neither second-order parameters nor external variables (e.g., mediators) were included in any of the analyses. This setup also means that the results reported in the current paper all differ from those reported in the original papers. Note that one study collected multi-group data and applied the CLPM using multi-group analysis. For this dataset, we assumed group-invariant parameters for autoregressive and cross-lagged coefficients as well as residual and error (co)variances (i.e., measurement invariance between groups) while setting no constraints on the difference of temporal means between groups. All analyses were conducted using Mplus version 7.4 (Muthén & Muthén [17]). However, we found improper solutions (i.e., a negative variance for a trait factor or a singular Hessian matrix) and non-convergence in four of the five datasets when using maximum likelihood (ML) estimation to fit the RI-CLPM or the STARTS model.
One potential reason is that large autoregressive parameters might have increased the risk of obtaining negative estimates of trait factor variances (as well as other variances) in these models. In such cases, we instead used Bayes estimation, based on a Markov chain Monte Carlo method under the assumption of non-informative priors. With Bayes estimation, we obtained parameter estimates successfully without any convergence problems. For a more detailed discussion of ML and Bayes estimation in terms of estimation problems in applying the STARTS model, see Lüdtke, Robitzsch, & Wagner [16]. Table 2 provides (unstandardized) autoregressive/cross-lagged parameter estimates and standard errors for the CLPM, the RI-CLPM, and the STARTS model. Except for the cross-lagged parameter estimates in Research 2, all autoregressive/cross-lagged parameter estimates with the CLPM were statistically significant with two-sided α = .05. This can be partly attributed to the large sample sizes in these datasets, which increased the statistical power.
Table 2

Autoregressive/Cross-lagged parameter estimates and standard errors from the RI-CLPM or the STARTS model to those from the CLPM.

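Parameter estimates

| Dataset | N | T | Estimation | Model | βx Est. | βx SE | γx Est. | γx SE | βy Est. | βy SE | γy Est. | γy SE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Research 1 | >1000 | 3 | ML | CLPM | 1.05 | 0.01 | 0.38 | 0.13 | 0.46 | 0.02 | 0.01 | 0.00 |
| Research 1 | >1000 | 3 | ML | RI-CLPM | 0.74 | 0.04 | 0.53 | 0.22 | 0.14 | 0.03 | 0.01 | 0.00 |
| Research 1 | >1000 | 3 | ML | STARTS | - | - | - | - | - | - | - | - |
| Research 2 | <1000 | 3 | Bayes | CLPM | 0.67 | 0.05 | -0.68 | 0.48 | 0.52 | 0.03 | 0.00 | 0.00 |
| Research 2 | <1000 | 3 | Bayes | RI-CLPM | 0.32 | 0.09 | -0.90 | 0.86 | 0.48 | 0.05 | 0.00 | 0.01 |
| Research 2 | <1000 | 3 | Bayes | STARTS | - | - | - | - | - | - | - | - |
| Research 3 | >1000 | >3 | Bayes | CLPM | 0.70 | 0.01 | 0.03 | 0.01 | 0.55 | 0.01 | 0.18 | 0.02 |
| Research 3 | >1000 | >3 | Bayes | RI-CLPM | 0.40 | 0.02 | 0.02 | 0.01 | 0.29 | 0.02 | 0.11 | 0.03 |
| Research 3 | >1000 | >3 | Bayes | STARTS | 1.69 | 0.04 | -0.08 | 0.02 | 0.86 | 0.03 | 0.30 | 0.05 |
| Research 4 | >1000 | >3 | Bayes | CLPM | 0.66 | 0.01 | -0.06 | 0.00 | 0.66 | 0.01 | -0.22 | 0.01 |
| Research 4 | >1000 | >3 | Bayes | RI-CLPM | 0.20 | 0.01 | 0.00 | 0.01 | 0.15 | 0.01 | -0.11 | 0.02 |
| Research 4 | >1000 | >3 | Bayes | STARTS | 0.89 | 0.03 | -0.01 | 0.02 | 0.91 | 0.03 | -0.10 | 0.04 |
| Research 5 | >1000 | >3 | Bayes | CLPM | 0.60 | 0.01 | 0.02 | 0.00 | 0.99 | 0.00 | 0.08 | 0.01 |
| Research 5 | >1000 | >3 | Bayes | RI-CLPM | 0.24 | 0.01 | 0.01 | 0.00 | 0.93 | 0.01 | 0.08 | 0.01 |
| Research 5 | >1000 | >3 | Bayes | STARTS | 0.76 | 0.02 | 0.01 | 0.00 | 1.02 | 0.00 | 0.09 | 0.02 |

Model fit indices

| Dataset | Estimation | Index | CLPM | RI-CLPM | STARTS |
|---|---|---|---|---|---|
| Research 1 | ML | AIC | 25774.04 | 25706.86* | - |
| Research 1 | ML | BIC | 25892.65 | 25847.04* | - |
| Research 1 | ML | RMSEA | 0.091 | 0.079* | - |
| Research 1 | ML | CFI | 0.953 | 0.969* | - |
| Research 1 | ML | SRMR | 0.094 | 0.092* | - |
| Research 2 | Bayes | DIC | 17439.74 | 17434.83* | - |
| Research 2 | Bayes | BIC | 17498.64* | 17508.56 | - |
| Research 3 | Bayes | DIC | 82280.84 | 81476.47 | 80266.74* |
| Research 3 | Bayes | BIC | 82422.04 | 81632.90 | 80681.61* |
| Research 4 | Bayes | DIC | 71558.99 | 70109.62* | 70160.40 |
| Research 4 | Bayes | BIC | 71679.28 | 70249.66* | 70323.92 |
| Research 5 | Bayes | DIC | 109428.88 | 107221.91 | 106354.77* |
| Research 5 | Bayes | BIC | 109635.95 | 107448.26 | 106602.48* |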

CLPM…cross-lagged panel model; RI-CLPM…random intercepts CLPM; STARTS…stable trait autoregressive trait and state; AIC…Akaike Information Criterion; BIC…Bayesian information criterion; RMSEA…root mean square error of approximation; CFI…comparative fit index; SRMR…standardized root mean square residual; DIC…deviance information criterion

Bold value indicates that associated parameters are statistically significant (p<.05); “-” indicates that estimates cannot be obtained due to the limited number of time points in the dataset

* indicates that associated model was preferred based on the corresponding model fit index

Although the RI-CLPM and the STARTS model also showed significant estimates in most cases, γx is not statistically significant in Research 4, while it is significant with the CLPM. Another difference is that the sign of γx in the STARTS model differed from that with the CLPM in Research 3. We also found notable differences in the magnitudes of parameter estimates among cross-lagged models. The RI-CLPM provided smaller autoregressive parameter estimates (βx and βy) than the CLPM did (approximately 0.49 times the size), while the STARTS model provided larger estimates on average (approximately 1.45 times the size). The relation between parameter estimates from different cross-lagged longitudinal models must depend in complicated ways on the magnitude of the parameter values and on research design factors (e.g., N and T), and we need to be careful when generalizing the findings. However, one potential explanation for the increased autoregressive parameters in the STARTS model is the dissociation of measurement errors in the model, because the autoregressive parameters are the major source of correlations (i.e., the variance–covariance matrix) between time points. For the RI-CLPM, in contrast, the decreased autoregressive parameter estimates may be a consequence of trait factors, which would explain a large portion of the correlations between time points.
The differences in estimates of autoregressive parameters between the RI-CLPM and the STARTS model also lead to differences between their cross-lagged parameter estimates and those found by the CLPM. In this case study, the RI-CLPM and the STARTS model showed smaller cross-lagged estimates (in absolute value, 0.66 and 0.62 times the size, respectively) than those with the CLPM. Although we need to be careful about the generalizability of findings, it is well known that the magnitude of within-cluster (in this case, within-person) relations (i.e., cross-lagged parameters in the RI-CLPM and the STARTS model) is smaller than that of between-cluster (in this case, between-person) relations when the between-cluster difference is larger than the within-cluster difference. The decreased cross-lagged effects could be explained by this so-called ecological fallacy (Robinson [18]). With regard to standard errors, interestingly, the standard errors of the estimates in the RI-CLPM and the STARTS model are, on average, 1.6 and 2.7 times, respectively, the size of those with the CLPM. These results indicate that the inclusion of parameters that are specific to these models (i.e., trait factor (co)variances in the RI-CLPM, and those plus error (co)variances in the STARTS model) leads to an increase in standard errors. In combination with the observed upward or downward changes in autoregressive and cross-lagged parameter estimates, these results indicate that the RI-CLPM and the STARTS model will produce substantially different results on statistical tests than the CLPM will. It is also important to note that, among the five datasets, the CLPM was chosen as the best model in terms of model fit only once, when the Bayesian Information Criterion was used in Research 2. This result indicates that many previous studies that applied only the CLPM may have drawn erroneous conclusions about the magnitude and presence of reciprocal effects.
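The distinction between between-person and within-person relations invoked above can be made concrete with a small deterministic example. The sketch below is ours and purely illustrative: the three persons and their scores are invented (they are not taken from any of the five datasets), and a simple regression slope stands in for a cross-lagged coefficient. It shows that a slope computed on pooled scores can differ, even in sign, from the slope computed on person-centered scores when between-person differences dominate.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: person -> (x, y) scores on three occasions.
# Persons differ strongly in level; within a person, x and y move oppositely.
people = {
    "p1": [(-1, 1), (0, 0), (1, -1)],
    "p2": [(4, 6), (5, 5), (6, 4)],
    "p3": [(9, 11), (10, 10), (11, 9)],
}

# Pooled slope: ignores which person each observation belongs to.
all_x = [x for obs in people.values() for x, _ in obs]
all_y = [y for obs in people.values() for _, y in obs]
pooled = slope(all_x, all_y)

# Within-person slope: center each person's scores on that person's own mean.
wx, wy = [], []
for obs in people.values():
    mean_x = sum(x for x, _ in obs) / len(obs)
    mean_y = sum(y for _, y in obs) / len(obs)
    wx += [x - mean_x for x, _ in obs]
    wy += [y - mean_y for _, y in obs]
within = slope(wx, wy)

print(f"pooled slope: {pooled:.3f}, within-person slope: {within:.3f}")
```

In this toy case the pooled slope is positive while the within-person slope is negative, which is the kind of discrepancy that motivates separating stable trait factors from temporal deviations.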
The results described here indicate the importance of comparing alternative models when testing for reciprocal effects, and the potential (in most cases, unintended) consequences of not considering multiple models. However, one might be concerned about the generalizability of the results due to the small number of studies (i.e., five) presented here. Another important issue is the improper solutions observed in two of the five datasets when applying the STARTS model. To address these issues more extensively, we conducted two statistical simulation studies, one focusing on the frequency of improper solutions and the other focusing on parameter estimates. Although the previous case studies indicated that these models could produce largely different parameter estimates, to the best of our knowledge, no previous research has performed statistical simulation that directly compared the parameter estimates (and associated standard errors) produced by different cross-lagged longitudinal models we discussed here (i.e., the CLPM, the RI-CLPM, and the STARTS model). In addition, although some past studies have examined the frequency of improper solutions, focusing especially on the STARTS model (e.g., Cole et al [14]; Lüdtke et al [16]), no studies have systematically investigated the differences of longitudinal models used and examined the potential impact of model misspecification. Our statistical simulation also aims to extend the previous studies by addressing these points.

Simulation study

Frequency of improper solutions

To systematically investigate the rate of improper solutions under various conditions, we performed Monte Carlo simulations in which both the data generation model and the analysis model were selected from the three models we have discussed, resulting in 9 (= 3 × 3) combinations of data generation and analysis models. This way, we can examine the potential influence of model misspecification (as well as of correct model specification) on improper solutions. For simplicity of the simulations, the stability of parameters was assumed. For data generation, we systematically changed the number of total participants (N = 200, 600, 1,000), the number of time points (T = 4, 6, 8), and the size of autoregressive parameters (β = βx = βy = 0.5, 0.7, 0.9). In this simulation, cross-lagged parameters γ were all fixed to 0.2. For the STARTS model, measurement error variances ψ² were varied across three levels. For the other models, ψ² is always set to zero. Variances of the temporal deviation terms at the first time point, which are equivalent to those of observations in the case of the CLPM, were fixed to 1 − ψ². The size of β reflects the determination coefficients in cross-lagged regressions. For models with trait factors (i.e., the RI-CLPM and the STARTS model), we posited a normal distribution for the trait factors, and their variances were set to half of those of the temporal deviation terms at the first time point (i.e., to (1 − ψ²)/2). Without loss of generality, the temporal group means were set to μx = μy = t − 1 for each time point. Correlation of the trait factors was set to 0.2. Correlation of temporal deviation terms at the first time point was set to 0.2, and in the STARTS model (time-invariant) correlations between measurement errors were set to 0.2. Finally, residual variances were fixed to , and correlation of residuals between variables was fixed to 0.2 for each time point.
We generated simulated data (200 trials for each combination) by crossing these factors, resulting in 81 (= 3(N) × 3(T) × 3(β) × 3(ψ²)) combinations of factors for each pair of data generation model and data analysis model. Each simulated dataset was analyzed by the three types of analysis models, and we counted the number of improper solutions, which were defined as (1) out-of-range parameter estimates (e.g., negative variance parameters) or (2) a singular approximate Hessian matrix after termination of iteration. The whole simulation procedure, including data generation and analysis, was conducted in R (R Core Team [19]) using the lavaan (Rosseel [20]) package with the ML estimation method. Simulation code is available in S1 File. Table 3 presents the marginal proportions (i.e., proportions after aggregating across all the other factors) of improper solutions observed with each data analysis model under each level of the factors we manipulated. We mainly inspected marginal proportions in order to have an overall grasp of the factors that relate to the frequency of improper solutions. When the CLPM was used for analysis, it did not show improper solutions under any conditions. When the CLPM was used for data generation, Table 3 shows that the RI-CLPM and the STARTS model showed very large proportions of improper solutions (in the range of 40%-100%). Notably, in the case of the STARTS model, which posits measurement error (co)variances and residuals, more than 90% of the results exhibited improper solutions.
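The data generation scheme described above can be sketched for the RI-CLPM case (ψ² = 0) as follows. This is a minimal illustration in plain Python, not the R/lavaan code in S1 File. The function names are ours, and the residual variance `resid_var` is an assumed placeholder value rather than the value used in the simulations; all other settings (trait variance of one half, correlations of 0.2, cross-lagged effects of 0.2, temporal means of t − 1) follow the text.

```python
import math
import random


def corr_pair(rng, var, rho):
    """Draw two zero-mean normals with common variance `var` and correlation `rho`."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    sd = math.sqrt(var)
    return sd * z1, sd * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)


def simulate_riclpm(n, t, beta, gamma=0.2, resid_var=0.5, seed=1):
    """Generate bivariate RI-CLPM data: a list of n persons, each a list of t (x, y) pairs.

    resid_var is an ASSUMED illustrative value, not taken from the paper.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        trait_x, trait_y = corr_pair(rng, 0.5, 0.2)  # stable trait factors
        dev_x, dev_y = corr_pair(rng, 1.0, 0.2)      # temporal deviations at t = 1
        rows = []
        for occ in range(t):
            if occ > 0:
                e_x, e_y = corr_pair(rng, resid_var, 0.2)
                # Autoregressive (beta) and cross-lagged (gamma) effects act on
                # the previous deviations; both are updated simultaneously.
                dev_x, dev_y = (beta * dev_x + gamma * dev_y + e_x,
                                beta * dev_y + gamma * dev_x + e_y)
            mean = occ  # temporal group mean t - 1, with t counted from 1
            rows.append((mean + trait_x + dev_x, mean + trait_y + dev_y))
        data.append(rows)
    return data


sample = simulate_riclpm(n=200, t=4, beta=0.5)
print(len(sample), len(sample[0]))
```

Dropping the trait factors from this sketch would give CLPM-type data, and adding independent measurement error with variance ψ² to each observation would give STARTS-type data.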
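The two bookkeeping steps described above, flagging a replication as improper and aggregating the flags into marginal proportions, can be sketched as follows. The function names and the four example replications are hypothetical, and only negative variances are checked as out-of-range estimates; the sketch is meant to clarify what "marginal" means here, not to reproduce the simulation code.

```python
from collections import defaultdict


def is_improper(variance_estimates, hessian_singular):
    """Flag a solution as improper: (1) any negative variance estimate,
    or (2) a singular approximate Hessian reported by the optimizer."""
    return hessian_singular or any(v < 0 for v in variance_estimates)


def marginal_proportions(results, factor):
    """Proportion of improper solutions at each level of `factor`,
    aggregating over all other design factors."""
    hits, total = defaultdict(int), defaultdict(int)
    for r in results:
        total[r[factor]] += 1
        hits[r[factor]] += r["improper"]
    return {level: hits[level] / total[level] for level in total}


# Hypothetical replications (values invented for illustration only).
results = [
    {"N": 200, "T": 4, "improper": is_improper([0.40, -0.02], False)},
    {"N": 200, "T": 6, "improper": is_improper([0.40, 0.10], False)},
    {"N": 600, "T": 4, "improper": is_improper([0.30, 0.05], True)},
    {"N": 600, "T": 6, "improper": is_improper([0.30, 0.05], False)},
]
print(marginal_proportions(results, "N"))
print(marginal_proportions(results, "T"))
```

Because each marginal proportion averages over every other factor, a factor can look unimportant marginally even if it matters in particular cells of the design.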
Table 3

Marginal proportions of improper solutions observed at each data analysis model under each level of the factors.

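| Data generation model / Analysis model | CLPM / CLPM | CLPM / RI-CLPM | CLPM / STARTS | RI-CLPM / CLPM | RI-CLPM / RI-CLPM | RI-CLPM / STARTS | STARTS / CLPM | STARTS / RI-CLPM | STARTS / STARTS |
|---|---|---|---|---|---|---|---|---|---|
| β = 0.5 | .00 | .66 | .95 | .00 | .01 | .98 | .00 | .00 | .71 |
| β = 0.7 | .00 | .59 | .99 | .00 | .08 | 1.00 | .00 | .00 | .89 |
| β = 0.9 | .00 | .67 | 1.00 | .00 | .05 | 1.00 | .00 | .20 | .94 |
| ψ² = 0.2 | .00 | .65 | .98 | .00 | .05 | .98 | .00 | .00 | .84 |
| ψ² = 0.5 | .00 | .64 | .99 | .00 | .04 | 1.00 | .00 | .00 | .82 |
| ψ² = 0.8 | .00 | .63 | .98 | .00 | .05 | 1.00 | .00 | .20 | .88 |
| N = 200 | .00 | .65 | .98 | .00 | .11 | 1.00 | .00 | .07 | .90 |
| N = 400 | .00 | .64 | .98 | .00 | .05 | 1.00 | .00 | .07 | .87 |
| N = 800 | .00 | .63 | .98 | .00 | .02 | .99 | .00 | .07 | .83 |
| N = 1600 | .00 | .64 | .98 | .00 | .01 | .99 | .00 | .07 | .78 |
| T = 4 | .00 | .64 | .98 | .00 | .10 | .98 | .00 | .01 | .75 |
| T = 6 | .00 | .63 | .98 | .00 | .03 | 1.00 | .00 | .09 | .81 |
| T = 8 | .00 | .64 | .99 | .00 | .01 | 1.00 | .00 | .11 | .98 |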
Interestingly, the manipulated factors, such as the number of total participants (N) and the number of time points (T), did not influence the results much. These results indicate that the impact of model misspecification dominates the risk of improper solutions, with the manipulated factors playing a much smaller role. The same pattern was observed with different data generation models. Model misspecification was the biggest cause of improper solutions, and the STARTS model in particular produced a higher number of improper solutions. One particularly important observation is that improper solutions were still observed in the STARTS model even when the model was correctly specified. Indeed, the proportion of improper solutions was unacceptably high, at more than 70%. Note that, even compared with previous investigations (Cole et al [14]; Lüdtke et al [16]), our simulations showed a larger number of improper solutions. This might be attributed to differences in the stability of measurements between the current simulations and the simulations in the previous studies. Instead of controlling the residual variances, the variances of all variables were set to 1 in the simulations of both Cole et al [14] and Lüdtke et al [16], while we did not do this in the current investigation. In most of the current simulation conditions, the variances of variables are implicitly assumed to increase over time, as is often the case with longitudinal data on developmental change and growth. The standard latent growth model (LGM) also implicitly makes that assumption (e.g., in a linear LGM, the variance of the true score increases over time). Thus, the relative impacts of trait factor variances, (time-invariant) measurement error variances, and residual variances on observations become smaller at later time points, increasing the risk of out-of-range estimates in these variance estimates.
Another important difference is that such previous investigations have considered a univariate (rather than bivariate) version of the STARTS model. The bivariate version of the STARTS model, which we simulated in the current study, might have a bigger risk of improper solutions caused by a singular Hessian matrix. For correctly specified models, the RI-CLPM showed smaller proportions of improper solutions than the STARTS model, especially when sample size and the number of time points were larger. However, the proportion of improper solutions was still not negligible (at 10–15%). Therefore, although the RI-CLPM and the STARTS model can be considered as alternatives to the CLPM when investigating within-person reciprocal effects, these models might be susceptible to improper solutions, especially in the presence of model misspecification.
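The remark above that the variances of the variables implicitly increase over time can be checked with the covariance recursion of a bivariate first-order process, Σ(t+1) = B Σ(t) Bᵀ + Ω, where B = [[β, γ], [γ, β]]. The sketch below is ours: the starting covariance matrix (unit variances, correlation 0.2) and the residual correlation follow the text, whereas the residual variance of 0.19 is an arbitrary illustrative value and not the one used in the simulations. Since the eigenvalues of B are β + γ and β − γ, the process is not stationary when β = 0.9 and γ = 0.2 (β + γ = 1.1 > 1), and the variance then grows without bound.

```python
def next_cov(cov, beta, gamma, resid_var, resid_corr=0.2):
    """One step of Sigma_{t+1} = B Sigma_t B' + Omega for B = [[beta, gamma], [gamma, beta]]."""
    (a, b), (_, d) = cov  # symmetric 2x2 matrix [[a, b], [b, d]]
    var_x = beta ** 2 * a + 2 * beta * gamma * b + gamma ** 2 * d + resid_var
    var_y = gamma ** 2 * a + 2 * beta * gamma * b + beta ** 2 * d + resid_var
    cov_xy = (beta * gamma * (a + d) + (beta ** 2 + gamma ** 2) * b
              + resid_corr * resid_var)
    return [[var_x, cov_xy], [cov_xy, var_y]]


def variance_path(beta, gamma, resid_var, t):
    """Variance of x at time points 1..t, starting from unit variances and correlation 0.2."""
    cov = [[1.0, 0.2], [0.2, 1.0]]
    path = [cov[0][0]]
    for _ in range(t - 1):
        cov = next_cov(cov, beta, gamma, resid_var)
        path.append(cov[0][0])
    return path


# resid_var = 0.19 is an ASSUMED illustrative value.
for var in variance_path(beta=0.9, gamma=0.2, resid_var=0.19, t=8):
    print(round(var, 3))
```

Under these assumed values the variance of x rises at every time point, so a trait factor with fixed variance accounts for a shrinking share of the observed variance at later waves.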

Statistical properties of estimates

To investigate the statistical properties of cross-lagged parameter estimates in each cross-lagged longitudinal model, we performed another Monte Carlo simulation. As in the previous simulation, the data generation model and analysis model were selected from the three types of models. For data generation, we systematically changed the number of total participants (N = 200, 600, 1,000), the number of time points (T = 4, 6, 8), and the size of autoregressive parameters (β = βx = βy = 0.5, 0.7) and cross-lagged parameters (γ = γx = γy = 0.0, 0.1, 0.2). Other parameters were the same as in the previous simulation. We generated simulated data (100 trials for each combination) by crossing these factors, resulting in 162 (= 3(N) × 3(T) × 2(β) × 3(γ) × 3(ψ²)) combinations of factors for each pair of data generation model and data analysis model. Each simulated dataset was analyzed by the three types of analysis models. In this simulation, when improper solutions (e.g., out-of-range parameter estimates or a singular approximate Hessian matrix) were observed, the results were discarded and the simulations were repeated until the total number of successful trials was 100 for each condition. The whole simulation procedure, including data generation and analysis, was conducted in R (R Core Team [19]) using the lavaan (Rosseel [20]) package with the ML estimation method. Simulation code is available in S1 File. From the results of the previous simulation, we expected a large proportion of improper solutions when applying the RI-CLPM and the STARTS model (especially when the analysis model was misspecified), which would indicate that the parameter estimates in these models might be substantially biased by discarding results with improper solutions. Therefore, we limited our attention here mainly to the differences in the standard errors of the cross-lagged parameter estimates between models.
Standard errors might be less influenced by the occurrence of improper solutions, given that improper solutions are mainly caused by the magnitude of point estimates (e.g., out-of-range parameter estimates or a singular approximate Hessian matrix) rather than the magnitudes of associated standard errors. Comparing the magnitudes of standard errors among models is useful because this might suggest the reason why inconsistent results are obtained among models in testing the statistical significance of cross-lagged parameters, as we will discuss later. Table 4 presents the marginal means (i.e., means aggregated across all of the other factors) of estimated standard errors for different data generation models and analysis models.
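The concern that discarding improper solutions may bias the retained point estimates can be illustrated with a stylized repeat-until-successful loop. In the sketch below, which is ours, a normal draw stands in for a fitted variance estimate; the true value of 0.05 and the sampling standard deviation of 0.10 are hypothetical. No model is actually fitted, so the example shows only the selection mechanism, not the size of the bias in the reported simulations.

```python
import random
import statistics


def retained_mean(true_var, se, n_keep=100, seed=7):
    """Repeat trials until n_keep proper (non-negative) estimates are collected.

    Returns the mean of the retained estimates and the number of attempts needed.
    """
    rng = random.Random(seed)
    kept, attempts = [], 0
    while len(kept) < n_keep:
        attempts += 1
        estimate = rng.gauss(true_var, se)  # stand-in for a fitted variance estimate
        if estimate >= 0:                   # negative (improper) estimates are discarded
            kept.append(estimate)
    return statistics.mean(kept), attempts


mean_kept, attempts = retained_mean(true_var=0.05, se=0.10, n_keep=2000)
print(f"mean of retained estimates: {mean_kept:.3f} after {attempts} attempts")
```

Because only the non-negative draws survive, the mean of the retained estimates lies above the true value, which is why point estimates obtained after discarding improper solutions should be interpreted with care.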
Table 4

Marginal means of standard errors estimated at each model.

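| Data generation model / Analysis model | CLPM / CLPM | CLPM / RI-CLPM | CLPM / STARTS | RI-CLPM / CLPM | RI-CLPM / RI-CLPM | RI-CLPM / STARTS | STARTS / CLPM | STARTS / RI-CLPM | STARTS / STARTS |
|---|---|---|---|---|---|---|---|---|---|
| γ = 0 | 0.02 | 0.03 | 0.08 | 0.01 | 0.03 | 0.12 | 0.02 | 0.03 | 0.41 |
| γ = 0.1 | 0.02 | 0.03 | 0.09 | 0.01 | 0.03 | 0.13 | 0.02 | 0.03 | 0.40 |
| γ = 0.2 | 0.02 | 0.03 | 0.10 | 0.01 | 0.03 | 0.15 | 0.02 | 0.03 | 0.42 |
| β = 0.5 | 0.02 | 0.03 | 0.06 | 0.01 | 0.03 | 0.09 | 0.02 | 0.03 | 0.32 |
| β = 0.7 | 0.02 | 0.03 | 0.12 | 0.01 | 0.03 | 0.18 | 0.02 | 0.03 | 0.50 |
| ψ² = 0.2 | 0.01 | 0.03 | 0.06 | 0.01 | 0.03 | 0.08 | 0.02 | 0.03 | 0.12 |
| ψ² = 0.4 | 0.02 | 0.03 | 0.08 | 0.01 | 0.03 | 0.13 | 0.02 | 0.03 | 0.30 |
| ψ² = 0.6 | 0.02 | 0.03 | 0.13 | 0.01 | 0.03 | 0.18 | 0.02 | 0.03 | 0.82 |
| N = 200 | 0.02 | 0.04 | 0.14 | 0.02 | 0.04 | 0.20 | 0.03 | 0.04 | 0.67 |
| N = 600 | 0.01 | 0.02 | 0.07 | 0.01 | 0.02 | 0.11 | 0.02 | 0.02 | 0.35 |
| N = 1000 | 0.01 | 0.02 | 0.06 | 0.01 | 0.02 | 0.09 | 0.01 | 0.02 | 0.22 |
| T = 4 | 0.02 | 0.04 | 0.11 | 0.02 | 0.04 | 0.16 | 0.02 | 0.04 | 0.53 |
| T = 6 | 0.02 | 0.02 | 0.08 | 0.01 | 0.02 | 0.12 | 0.02 | 0.03 | 0.39 |
| T = 8 | 0.01 | 0.02 | 0.08 | 0.01 | 0.02 | 0.12 | 0.02 | 0.02 | 0.32 |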
As in the five case studies, Table 4 shows that standard errors in the RI-CLPM and the STARTS model tend to be larger than those in the CLPM in most cases. Specifically, the standard errors were 1.3–2.6 times the size of those in the CLPM for the RI-CLPM and 3.3–38.7 times the size for the STARTS model. The standard errors decrease as T and N increase in all cross-lagged models. In addition, β and ψ², which relate to the (relative) magnitudes of measurement error variances, explain the magnitudes of the estimated standard errors in the STARTS model. Although these estimated standard errors in the RI-CLPM and the STARTS model might be somewhat biased by discarding the results with improper solutions, Table 4 provides an important suggestion for practice: when the true model is either the RI-CLPM or the STARTS model, standard errors with the CLPM tend to be smaller than those with the other models, indicating that (incorrectly) applying the CLPM without comparing alternative models runs a great risk of committing a Type I error when statistically testing for reciprocal effects. Tables 5, 6 and 7 show the marginal means of the proportions of models reaching consistent/inconsistent conclusions about the statistical significance of cross-lagged parameters for different data generation models and analysis models.
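The four outcome categories used in Tables 5 to 7 can be expressed as a small classification rule. The sketch below is ours and assumes a two-sided Wald z-test at α = .05; the function names are hypothetical. The usage lines plug in a cross-lagged estimate of 0.1 for every model, together with marginal mean standard errors from Table 4 (STARTS data generation), purely to show how a larger standard error alone can change the conclusion; any bias in the point estimates is ignored.

```python
def is_significant(est, se, z_crit=1.96):
    """Two-sided Wald z-test at alpha = .05 (normal approximation)."""
    return se > 0 and abs(est / se) > z_crit


def classify(est_a, se_a, est_b, se_b, name_a="CLPM", name_b="RI-CLPM"):
    """Return the agreement category for the same cross-lagged path in two models."""
    sig_a = is_significant(est_a, se_a)
    sig_b = is_significant(est_b, se_b)
    if sig_a and sig_b:
        return "both sig"
    if sig_a:
        return f"{name_a} only"
    if sig_b:
        return f"{name_b} only"
    return "both non-sig"


# Same hypothetical estimate (0.1), standard errors taken from Table 4.
print(classify(0.1, 0.02, 0.1, 0.03))                    # CLPM vs RI-CLPM
print(classify(0.1, 0.02, 0.1, 0.41, "CLPM", "STARTS"))  # CLPM vs STARTS
```

With identical point estimates, the first comparison falls in "both sig" and the second in "CLPM only", which mirrors the pattern that the simpler model is more often the only one to reach significance.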
Table 5

Marginal means of proportions that models suggest consistent/inconsistent conclusions about reciprocal relations (CLPM vs RI-CLPM).

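| Data generation model: outcome | CLPM: both non-sig | CLPM: RI-CLPM only | CLPM: CLPM only | CLPM: both sig | RI-CLPM: both non-sig | RI-CLPM: RI-CLPM only | RI-CLPM: CLPM only | RI-CLPM: both sig | STARTS: both non-sig | STARTS: RI-CLPM only | STARTS: CLPM only | STARTS: both sig |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| γ = 0 | 0.94 | 0.01 | 0.04 | 0.01 | 0.92 | 0.01 | 0.07 | 0.00 | 0.92 | 0.03 | 0.04 | 0.01 |
| γ = 0.1 | 0.01 | 0.00 | 0.46 | 0.53 | 0.07 | 0.02 | 0.38 | 0.53 | 0.31 | 0.02 | 0.51 | 0.16 |
| γ = 0.2 | 0.00 | 0.00 | 0.20 | 0.80 | 0.00 | 0.00 | 0.18 | 0.82 | 0.06 | 0.00 | 0.53 | 0.40 |
| β = 0.5 | 0.32 | 0.00 | 0.15 | 0.52 | 0.34 | 0.02 | 0.12 | 0.52 | 0.47 | 0.02 | 0.27 | 0.23 |
| β = 0.7 | 0.31 | 0.00 | 0.31 | 0.37 | 0.32 | 0.00 | 0.30 | 0.38 | 0.38 | 0.01 | 0.46 | 0.15 |
| ψ² = 0.2 | 0.31 | 0.00 | 0.25 | 0.43 | 0.32 | 0.02 | 0.21 | 0.45 | 0.38 | 0.02 | 0.33 | 0.27 |
| ψ² = 0.4 | 0.32 | 0.00 | 0.25 | 0.44 | 0.33 | 0.01 | 0.21 | 0.45 | 0.43 | 0.01 | 0.38 | 0.18 |
| ψ² = 0.6 | 0.32 | 0.00 | 0.21 | 0.47 | 0.33 | 0.01 | 0.20 | 0.45 | 0.48 | 0.02 | 0.38 | 0.12 |
| N = 200 | 0.32 | 0.00 | 0.36 | 0.32 | 0.38 | 0.02 | 0.29 | 0.32 | 0.56 | 0.02 | 0.32 | 0.10 |
| N = 600 | 0.32 | 0.00 | 0.19 | 0.49 | 0.31 | 0.01 | 0.20 | 0.49 | 0.39 | 0.01 | 0.39 | 0.21 |
| N = 1000 | 0.31 | 0.00 | 0.15 | 0.53 | 0.31 | 0.00 | 0.14 | 0.55 | 0.34 | 0.01 | 0.38 | 0.27 |
| T = 4 | 0.32 | 0.00 | 0.44 | 0.24 | 0.36 | 0.01 | 0.40 | 0.24 | 0.48 | 0.02 | 0.45 | 0.05 |
| T = 6 | 0.31 | 0.00 | 0.18 | 0.50 | 0.32 | 0.01 | 0.16 | 0.51 | 0.43 | 0.01 | 0.38 | 0.18 |
| T = 8 | 0.31 | 0.00 | 0.08 | 0.60 | 0.31 | 0.01 | 0.07 | 0.60 | 0.37 | 0.02 | 0.26 | 0.35 |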

“both non-sig” indicates that both models showed non-significant estimates for cross-lagged relations.

“RI-CLPM only” indicates that only the RI-CLPM showed significant estimates for cross-lagged relations.

“CLPM only” indicates that only the CLPM showed significant estimates for cross-lagged relations.

“both sig” indicates that both models showed significant estimates for cross-lagged relations.

Table 6

Marginal means of proportions that models suggest consistent/inconsistent conclusions about reciprocal relations (CLPM vs STARTS).

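| Data generation model: outcome | CLPM: both non-sig | CLPM: STARTS only | CLPM: CLPM only | CLPM: both sig | RI-CLPM: both non-sig | RI-CLPM: STARTS only | RI-CLPM: CLPM only | RI-CLPM: both sig | STARTS: both non-sig | STARTS: STARTS only | STARTS: CLPM only | STARTS: both sig |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| γ = 0 | 0.82 | 0.13 | 0.04 | 0.01 | 0.86 | 0.07 | 0.07 | 0.01 | 0.94 | 0.01 | 0.05 | 0.00 |
| γ = 0.1 | 0.01 | 0.00 | 0.77 | 0.22 | 0.08 | 0.01 | 0.81 | 0.09 | 0.32 | 0.00 | 0.67 | 0.01 |
| γ = 0.2 | 0.00 | 0.00 | 0.79 | 0.21 | 0.00 | 0.00 | 0.91 | 0.09 | 0.06 | 0.00 | 0.93 | 0.01 |
| β = 0.5 | 0.30 | 0.02 | 0.44 | 0.24 | 0.32 | 0.03 | 0.53 | 0.12 | 0.49 | 0.00 | 0.49 | 0.01 |
| β = 0.7 | 0.25 | 0.07 | 0.62 | 0.06 | 0.31 | 0.02 | 0.66 | 0.01 | 0.39 | 0.00 | 0.61 | 0.00 |
| ψ² = 0.2 | 0.23 | 0.08 | 0.42 | 0.26 | 0.28 | 0.06 | 0.52 | 0.14 | 0.39 | 0.01 | 0.59 | 0.02 |
| ψ² = 0.4 | 0.28 | 0.03 | 0.55 | 0.13 | 0.32 | 0.02 | 0.62 | 0.04 | 0.44 | 0.00 | 0.56 | 0.00 |
| ψ² = 0.6 | 0.31 | 0.01 | 0.62 | 0.06 | 0.34 | 0.00 | 0.64 | 0.02 | 0.50 | 0.00 | 0.50 | 0.00 |
| N = 200 | 0.29 | 0.03 | 0.57 | 0.11 | 0.37 | 0.03 | 0.55 | 0.05 | 0.58 | 0.00 | 0.42 | 0.00 |
| N = 600 | 0.27 | 0.05 | 0.52 | 0.16 | 0.29 | 0.03 | 0.62 | 0.07 | 0.40 | 0.00 | 0.59 | 0.01 |
| N = 1000 | 0.26 | 0.06 | 0.51 | 0.17 | 0.28 | 0.03 | 0.62 | 0.08 | 0.35 | 0.00 | 0.64 | 0.01 |
| T = 4 | 0.32 | 0.00 | 0.65 | 0.02 | 0.36 | 0.00 | 0.63 | 0.01 | 0.50 | 0.00 | 0.50 | 0.00 |
| T = 6 | 0.26 | 0.06 | 0.50 | 0.18 | 0.30 | 0.04 | 0.58 | 0.08 | 0.44 | 0.00 | 0.55 | 0.00 |
| T = 8 | 0.24 | 0.08 | 0.44 | 0.24 | 0.28 | 0.04 | 0.58 | 0.10 | 0.38 | 0.00 | 0.60 | 0.01 |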

“both non-sig” indicates that both models showed non-significant estimates for cross-lagged relations.

“STARTS only” indicates that only the STARTS showed significant estimates for cross-lagged relations.

“CLPM only” indicates that only the CLPM showed significant estimates for cross-lagged relations.

“both sig” indicates that both models showed significant estimates for cross-lagged relations.

Table 7

Marginal means of proportions that models suggest consistent/inconsistent conclusions about reciprocal relations (RI-CLPM vs STARTS).

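| Data generation model: outcome | CLPM: both non-sig | CLPM: STARTS only | CLPM: RI-CLPM only | CLPM: both sig | RI-CLPM: both non-sig | RI-CLPM: STARTS only | RI-CLPM: RI-CLPM only | RI-CLPM: both sig | STARTS: both non-sig | STARTS: STARTS only | STARTS: RI-CLPM only | STARTS: both sig |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| γ = 0 | 0.84 | 0.14 | 0.01 | 0.01 | 0.91 | 0.08 | 0.01 | 0.00 | 0.96 | 0.01 | 0.04 | 0.00 |
| γ = 0.1 | 0.44 | 0.03 | 0.34 | 0.19 | 0.44 | 0.01 | 0.45 | 0.10 | 0.82 | 0.00 | 0.17 | 0.01 |
| γ = 0.2 | 0.19 | 0.00 | 0.59 | 0.21 | 0.18 | 0.00 | 0.73 | 0.09 | 0.59 | 0.00 | 0.40 | 0.01 |
| β = 0.5 | 0.44 | 0.03 | 0.30 | 0.23 | 0.42 | 0.03 | 0.43 | 0.12 | 0.74 | 0.00 | 0.24 | 0.01 |
| β = 0.7 | 0.54 | 0.08 | 0.33 | 0.04 | 0.60 | 0.02 | 0.37 | 0.01 | 0.84 | 0.00 | 0.16 | 0.00 |
| ψ² = 0.2 | 0.45 | 0.11 | 0.20 | 0.24 | 0.47 | 0.06 | 0.33 | 0.14 | 0.70 | 0.01 | 0.27 | 0.02 |
| ψ² = 0.4 | 0.52 | 0.04 | 0.31 | 0.12 | 0.53 | 0.02 | 0.41 | 0.04 | 0.80 | 0.00 | 0.20 | 0.00 |
| ψ² = 0.6 | 0.51 | 0.02 | 0.43 | 0.05 | 0.53 | 0.01 | 0.45 | 0.01 | 0.86 | 0.00 | 0.14 | 0.00 |
| N = 200 | 0.63 | 0.05 | 0.23 | 0.09 | 0.64 | 0.03 | 0.29 | 0.05 | 0.88 | 0.00 | 0.12 | 0.00 |
| N = 600 | 0.45 | 0.06 | 0.34 | 0.15 | 0.48 | 0.03 | 0.43 | 0.07 | 0.78 | 0.00 | 0.21 | 0.01 |
| N = 1000 | 0.40 | 0.06 | 0.37 | 0.16 | 0.42 | 0.03 | 0.48 | 0.07 | 0.71 | 0.00 | 0.28 | 0.01 |
| T = 4 | 0.75 | 0.00 | 0.22 | 0.02 | 0.75 | 0.00 | 0.23 | 0.01 | 0.94 | 0.00 | 0.06 | 0.00 |
| T = 6 | 0.43 | 0.07 | 0.34 | 0.16 | 0.44 | 0.04 | 0.44 | 0.08 | 0.81 | 0.00 | 0.19 | 0.00 |
| T = 8 | 0.30 | 0.10 | 0.38 | 0.22 | 0.34 | 0.05 | 0.52 | 0.10 | 0.62 | 0.01 | 0.36 | 0.01 |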

“both non-sig” indicates that both models showed non-significant estimates for cross-lagged relations.

“STARTS only” indicates that only the STARTS showed significant estimates for cross-lagged relations.

“RI-CLPM only” indicates that only the RI-CLPM showed significant estimates for cross-lagged relations.

“both sig” indicates that both models showed significant estimates for cross-lagged relations.

From these tables, it is clear that different models tend to show inconsistent results (in terms of statistical significance) for cross-lagged parameters when γ ≠ 0. Notably, when they show different results, in most cases only the simpler model (the CLPM when compared with the RI-CLPM and the STARTS model; the RI-CLPM when compared with the STARTS model) showed a significant result. Note that the influences of T and N vary depending on the data generation models and analysis models, and that when γ = 0 the models tend to agree more frequently.
Note, however, that our simulations used relatively small values for trait factor variances and measurement error variances, and as in the previous simulation, this may have contributed to the result that simpler models were favored. For example, a larger β implies a relatively smaller impact of trait factor variances over time, which might have made parameter estimates somewhat unstable. Note that when the true model is either the RI-CLPM or the STARTS model, biased (mostly negatively biased) point estimates were observed even when models were correctly specified. Marginal means of (standardized) point estimates and corresponding biases for different data generation models and analysis models are provided in Table B in S1 File. Although the results for the RI-CLPM and the STARTS model may be biased as a consequence of discarding the results when improper solutions were produced, this simulation clearly demonstrates that statistical tests of cross-lagged effects can often show substantially inconsistent results, regardless of the number of participants or time points, especially when cross-lagged relations are actually present. One primary source of this should be the inflated standard errors of cross-lagged parameter estimates, as observed earlier. Tables 8 and 9 show the marginal means of the proportions of models preferred by information criteria (Akaike Information Criterion: AIC, and Bayesian Information Criterion: BIC) under different data generation models and analysis models.
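Table 8

Marginal means of the proportions of models preferred by Akaike Information Criterion.

Columns are labeled as data generation model / analysis model.

| Condition | CLPM / CLPM | CLPM / RI-CLPM | CLPM / STARTS | RI-CLPM / CLPM | RI-CLPM / RI-CLPM | RI-CLPM / STARTS | STARTS / CLPM | STARTS / RI-CLPM | STARTS / STARTS |
|---|---|---|---|---|---|---|---|---|---|
| γ = 0 | 0.91 | 0.08 | 0.00 | 0.46 | 0.53 | 0.00 | 0.00 | 0.98 | 0.02 |
| γ = 0.1 | 0.93 | 0.07 | 0.00 | 0.42 | 0.58 | 0.00 | 0.00 | 0.98 | 0.02 |
| γ = 0.2 | 0.95 | 0.04 | 0.00 | 0.41 | 0.59 | 0.00 | 0.00 | 0.99 | 0.01 |
| β = 0.5 | 0.92 | 0.07 | 0.01 | 0.09 | 0.91 | 0.00 | 0.00 | 0.98 | 0.02 |
| β = 0.7 | 0.94 | 0.06 | 0.00 | 0.78 | 0.22 | 0.00 | 0.00 | 0.99 | 0.01 |
| ψ2 = 0.2 | 0.93 | 0.06 | 0.01 | 0.35 | 0.65 | 0.00 | 0.00 | 0.99 | 0.00 |
| ψ2 = 0.4 | 0.93 | 0.06 | 0.00 | 0.44 | 0.56 | 0.00 | 0.00 | 0.99 | 0.01 |
| ψ2 = 0.6 | 0.92 | 0.08 | 0.00 | 0.50 | 0.50 | 0.00 | 0.00 | 0.97 | 0.03 |
| N = 200 | 0.94 | 0.06 | 0.00 | 0.46 | 0.54 | 0.00 | 0.00 | 0.98 | 0.01 |
| N = 600 | 0.93 | 0.07 | 0.00 | 0.42 | 0.58 | 0.00 | 0.00 | 0.98 | 0.02 |
| N = 1000 | 0.93 | 0.07 | 0.00 | 0.41 | 0.59 | 0.00 | 0.00 | 0.98 | 0.02 |
| T = 4 | 1.00 | 0.00 | 0.00 | 0.55 | 0.45 | 0.00 | 0.00 | 1.00 | 0.00 |
| T = 6 | 0.94 | 0.06 | 0.00 | 0.40 | 0.59 | 0.00 | 0.00 | 0.99 | 0.01 |
| T = 8 | 0.86 | 0.14 | 0.01 | 0.34 | 0.66 | 0.00 | 0.00 | 0.97 | 0.03 |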
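Table 9

Marginal means of the proportions of models preferred by Bayesian Information Criterion.

Columns are labeled as data generation model / analysis model.

| Condition | CLPM / CLPM | CLPM / RI-CLPM | CLPM / STARTS | RI-CLPM / CLPM | RI-CLPM / RI-CLPM | RI-CLPM / STARTS | STARTS / CLPM | STARTS / RI-CLPM | STARTS / STARTS |
|---|---|---|---|---|---|---|---|---|---|
| γ = 0 | 0.93 | 0.06 | 0.00 | 0.53 | 0.47 | 0.00 | 0.01 | 0.98 | 0.01 |
| γ = 0.1 | 0.95 | 0.05 | 0.00 | 0.49 | 0.51 | 0.00 | 0.00 | 0.98 | 0.01 |
| γ = 0.2 | 0.96 | 0.04 | 0.00 | 0.46 | 0.54 | 0.00 | 0.00 | 0.99 | 0.01 |
| β = 0.5 | 0.94 | 0.05 | 0.00 | 0.15 | 0.85 | 0.00 | 0.00 | 0.98 | 0.02 |
| β = 0.7 | 0.95 | 0.05 | 0.00 | 0.83 | 0.17 | 0.00 | 0.00 | 0.99 | 0.01 |
| ψ2 = 0.2 | 0.95 | 0.05 | 0.00 | 0.41 | 0.59 | 0.00 | 0.00 | 0.99 | 0.00 |
| ψ2 = 0.4 | 0.95 | 0.05 | 0.00 | 0.50 | 0.50 | 0.00 | 0.00 | 0.99 | 0.01 |
| ψ2 = 0.6 | 0.94 | 0.06 | 0.00 | 0.57 | 0.43 | 0.00 | 0.01 | 0.96 | 0.03 |
| N = 200 | 0.96 | 0.04 | 0.00 | 0.57 | 0.43 | 0.00 | 0.01 | 0.98 | 0.01 |
| N = 600 | 0.94 | 0.06 | 0.00 | 0.47 | 0.53 | 0.00 | 0.00 | 0.98 | 0.01 |
| N = 1000 | 0.94 | 0.06 | 0.00 | 0.44 | 0.56 | 0.00 | 0.00 | 0.98 | 0.01 |
| T = 4 | 1.00 | 0.00 | 0.00 | 0.66 | 0.34 | 0.00 | 0.01 | 0.99 | 0.00 |
| T = 6 | 0.96 | 0.04 | 0.00 | 0.45 | 0.55 | 0.00 | 0.00 | 0.99 | 0.01 |
| T = 8 | 0.88 | 0.11 | 0.00 | 0.37 | 0.63 | 0.00 | 0.00 | 0.97 | 0.03 |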
With both AIC and BIC, when the true model was the STARTS model, the RI-CLPM was preferred in most cases. When the true model was the RI-CLPM, the CLPM was often preferred. It should be noted again, however, that these results may be biased because we discarded the results with improper solutions and because our simulations used relatively small values for trait factor variances and measurement error variances.
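For readers less familiar with how such preferences arise, the following sketch shows the standard definitions of AIC and BIC and how the model with the smallest value is selected. The log-likelihoods, parameter counts, and sample size below are hypothetical values chosen for illustration; they are not estimates from the study, and the function names are our own.

```python
import math


def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: k ln(N) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * loglik


def preferred_model(fits, n_obs, criterion="aic"):
    """Return the name of the model with the smallest criterion value, plus all scores.

    fits maps a model name to (maximized log-likelihood, number of free parameters).
    """
    scores = {}
    for name, (loglik, n_params) in fits.items():
        if criterion == "aic":
            scores[name] = aic(loglik, n_params)
        elif criterion == "bic":
            scores[name] = bic(loglik, n_params, n_obs)
        else:
            raise ValueError("criterion must be 'aic' or 'bic'")
    return min(scores, key=scores.get), scores


# Hypothetical fit results (log-likelihood, free parameters); not from the paper.
fits = {
    "CLPM": (-5230.0, 20),
    "RI-CLPM": (-5221.0, 23),
    "STARTS": (-5220.5, 25),
}
best_aic, _ = preferred_model(fits, n_obs=1000, criterion="aic")
best_bic, _ = preferred_model(fits, n_obs=1000, criterion="bic")
print(best_aic, best_bic)
```

With these assumed numbers, AIC prefers the RI-CLPM while BIC prefers the CLPM, because BIC penalizes each additional parameter by ln(N) rather than 2. This is one reason the two criteria can favor simpler models to different degrees.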

General discussion

In this manuscript, we discussed the importance of considering alternative models such as the RI-CLPM and the STARTS model to infer reciprocal effects, and presented potential problems of applying the commonly used CLPM (specifically, the conflation of between-person and within-person effects). Through a literature search, case studies, and statistical simulations, we showed the current predominance of the CLPM for testing cross-lagged effects in the medical literature and demonstrated the risk of drawing inconsistent conclusions depending on the model tested. In addition, we showed the potential risk of improper solutions when applying alternative models (the STARTS model, in particular) with the ML method, especially when the model is misspecified. One important observation was that many researchers implicitly precluded the option of using the RI-CLPM or the STARTS model by collecting data from only two time points. Given the substantially different results obtained from different models, we recommend that applied researchers collect longitudinal data at more than two time points, even if the time lag between occasions is set to be optimal to effectively capture the theoretical process (see Dormann & Griffin [21] on this point). If we assume that parameters are not stable across time points, more than three time points are required to compare model fits between the RI-CLPM and the STARTS model. When data are collected from a larger number of time points, performing model selection based on model fit indices is an important step in minimizing the risk of drawing erroneous conclusions about reciprocal effects. Parameter estimation may be a serious obstacle, though, especially when applying the STARTS model.
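The conflation of between-person and within-person effects can be illustrated with a small simulation. The sketch below generates data from a simplified RI-CLPM-like process in which the true within-person cross-lagged effect is zero but the stable trait factors (random intercepts) of X and Y are correlated. It then fits an ordinary lagged regression to two waves, used here only as a simple stand-in for the CLPM rather than a full structural equation model. All parameter values and function names are assumptions for illustration, not those of the study.

```python
import numpy as np


def simulate_ri_clpm(n=20000, waves=2, beta=0.5, gamma=0.0, icorr=0.6, seed=0):
    """Simulate X and Y as stable trait (random intercept) + within-person process.

    Returns (observed, within), each of shape (n, waves, 2) with X in column 0
    and Y in column 1. Trait variances are 1 and trait correlation is icorr.
    """
    rng = np.random.default_rng(seed)
    trait_cov = np.array([[1.0, icorr], [icorr, 1.0]])
    traits = rng.multivariate_normal(np.zeros(2), trait_cov, size=n)
    # Rows: new X = beta*X + gamma*Y, new Y = gamma*X + beta*Y.
    lag = np.array([[beta, gamma], [gamma, beta]])
    within = np.empty((n, waves, 2))
    within[:, 0, :] = rng.standard_normal((n, 2))
    innov_sd = np.sqrt(1.0 - beta**2)  # keeps within-person variance at 1 when gamma = 0
    for t in range(1, waves):
        within[:, t, :] = within[:, t - 1, :] @ lag.T + innov_sd * rng.standard_normal((n, 2))
    observed = traits[:, None, :] + within
    return observed, within


def lagged_ols(data, t=1):
    """Regress Y at wave t on Y and X at wave t-1; return (autoregressive, cross-lagged)."""
    design = np.column_stack([np.ones(len(data)), data[:, t - 1, 1], data[:, t - 1, 0]])
    coef, *_ = np.linalg.lstsq(design, data[:, t, 1], rcond=None)
    return coef[1], coef[2]


observed, within = simulate_ri_clpm()
_, naive_cross = lagged_ols(observed)   # ignores stable traits
_, within_cross = lagged_ols(within)    # uses the true within-person components
print(round(naive_cross, 3), round(within_cross, 3))
```

Under these assumed values the naive cross-lagged coefficient is positive (roughly 0.08 in expectation) even though the within-person effect is zero, whereas the regression on the true within-person components recovers a value close to zero. In real data the within-person components are not observed, which is why models such as the RI-CLPM separate them explicitly.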
Although improving research design (e.g., by choosing an appropriate sample size) is important, choosing a different estimation strategy, such as Bayesian estimation (Lüdtke, Robitzsch, & Wagner [16]), and choosing a better specified analysis model via model selection seem to be more useful. Future research should more intensively investigate the utility of Bayesian estimation in applying various cross-lagged models. One potential limitation of the alternative models is the large number of improper solutions observed in our study. Although we acknowledge that the large number of improper solutions might be caused by the specific true parameter values used in our simulations, our results indicate that, when a researcher encounters improper solutions in applying the RI-CLPM and the STARTS model, this might suggest the possibility of model misspecification. This is especially the case in applying the RI-CLPM, because in this model, a dominant factor that caused improper solutions was model misspecification (Table 3). However, we also observed that alternative models produce improper solutions even when researchers do not misspecify the true model. Future studies should examine how this is caused and effective ways to address the problem. Some limitations should be noted. First, the RI-CLPM and the STARTS model assume that autoregressive and cross-lagged parameters are fixed across participants, but we could incorporate random slopes for these effects. This would allow investigating the possible individual differences in within-person reciprocal effects. Such a model can be easily implemented with a multilevel modeling framework (e.g., Bringmann et al. [22]; Schuurman, Ferrer, de Boer-Sonnenschein, & Hamaker [23]; both used the framework of the multilevel vector autoregression model). We suspect that such new models may be more susceptible to improper solutions given the increased number of parameters and complicated covariance structure.
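In practice, a researcher may want to screen fitted models for improper solutions before interpreting cross-lagged estimates. The sketch below assumes a common working definition (a negative variance estimate, or an estimated latent covariance matrix that is not positive semidefinite); the exact criteria used in the simulations above may differ, and the function and parameter names are hypothetical.

```python
import numpy as np


def improper_reasons(variances, latent_cov=None, tol=1e-8):
    """List reasons why a set of estimates looks like an improper solution.

    variances: dict mapping a parameter label to its estimated variance.
    latent_cov: optional estimated covariance matrix of latent factors.
    An empty list means no problem was detected under the assumed definition.
    """
    reasons = [f"negative variance: {name}" for name, v in variances.items() if v < 0]
    if latent_cov is not None:
        eigvals = np.linalg.eigvalsh(np.asarray(latent_cov, dtype=float))
        if eigvals.min() < -tol:
            reasons.append("latent covariance matrix is not positive semidefinite")
    return reasons


# Hypothetical estimates, for illustration only.
print(improper_reasons({"trait_x": -0.03, "error_x": 0.20}))
print(improper_reasons({"trait_x": 0.10}, latent_cov=[[1.0, 1.2], [1.2, 1.0]]))
```

A non-empty result does not by itself show that the model is misspecified, but, in line with the discussion above, it is a signal to consider alternative specifications or estimation methods.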
Future investigations should provide clearer insights into how researchers can choose the appropriate analysis model in practice. A second point relates to the extension of the current discussion to other statistical models. For example, medical researchers are often interested in testing mediation effects to understand the mechanism by which one variable influences another (e.g., Richiardi, Bellocco, & Zugna [24]; Ten Have & Joffe [25]; VanderWeele [26]), and these effects are often assessed in a longitudinal design (e.g., Huang & Yuan [27]; Preacher [28]). The issue raised in the current paper applies especially to longitudinal mediation models that include cross-lagged relations (e.g., a dynamic autoregressive mediation model; Maxwell, Cole, & Mitchell [29]). If researchers fail to account for stable individual differences, then the estimated mediation effects conflate between-person and within-person processes. The current discussion is useful for considering possible alternatives when evaluating longitudinal mediation effects, and investigating the statistical properties of estimates and the frequency of estimation problems would be intriguing topics for future research. Finally, although the current study focused only on the medical literature, future studies should examine common practices for testing reciprocal effects in other fields. This would give us more empirical insights into the similarities and differences in these cross-lagged models.
  15 in total

Review 1.  A review of causal estimation of effects in mediation analyses.

Authors:  Thomas R Ten Have; Marshall M Joffe
Journal:  Stat Methods Med Res       Date:  2010-12-16       Impact factor: 3.021

2.  Optimal time lags in panel studies.

Authors:  Christian Dormann; Mark A Griffin
Journal:  Psychol Methods       Date:  2015-08-31

Review 3.  Mediation Analysis: A Practitioner's Guide.

Authors:  Tyler J VanderWeele
Journal:  Annu Rev Public Health       Date:  2015-11-30       Impact factor: 21.981

Review 4.  Mediation analysis in epidemiology: methods, interpretation and bias.

Authors:  Lorenzo Richiardi; Rino Bellocco; Daniela Zugna
Journal:  Int J Epidemiol       Date:  2013-09-09       Impact factor: 7.196

5.  A critique of the cross-lagged panel model.

Authors:  Ellen L Hamaker; Rebecca M Kuiper; Raoul P P P Grasman
Journal:  Psychol Methods       Date:  2015-03

Review 6.  Advances in mediation analysis: a survey and synthesis of new developments.

Authors:  Kristopher J Preacher
Journal:  Annu Rev Psychol       Date:  2014-08-21       Impact factor: 24.137

7.  How to compare cross-lagged associations in a multilevel autoregressive model.

Authors:  Noémi K Schuurman; Emilio Ferrer; Mieke de Boer-Sonnenschein; Ellen L Hamaker
Journal:  Psychol Methods       Date:  2016-01-25

8.  Bayesian dynamic mediation analysis.

Authors:  Jing Huang; Ying Yuan
Journal:  Psychol Methods       Date:  2016-04-28

9.  More stable estimation of the STARTS model: A Bayesian approach using Markov chain Monte Carlo techniques.

Authors:  Oliver Lüdtke; Alexander Robitzsch; Jenny Wagner
Journal:  Psychol Methods       Date:  2017-11-27

10.  The trait-state-error model for multiwave data.

Authors:  D A Kenny; A Zautra
Journal:  J Consult Clin Psychol       Date:  1995-02