Cross-Lagged Panel Model in Medical Research: A Cautionary Note

Longitudinal designs provide a strong inferential basis for uncovering reciprocal effects or causality between variables. For this purpose, the cross-lagged panel model (CLPM) has been widely used in medical research, but its use has recently been criticized in the methodological literature because its parameter estimates conflate between-person and within-person processes. The aim of this study is to present some alternatives to the CLPM that can be used to examine reciprocal effects, and to illustrate the potential consequences of ignoring the issue, using a literature search, case studies, and simulation studies. We examined more than 300 medical papers published since 2009 that applied cross-lagged longitudinal models, finding that every study fitted only a single model (typically the CLPM) and did not consider potential alternative models for testing reciprocal effects. In 49% of the studies, only two time points were used, which makes it impossible to test such alternative models. Case studies and simulation studies showed that the CLPM often has worse model fit and markedly different estimates of cross-lagged parameters than the alternative models, suggesting that research relying on the CLPM alone may draw erroneous conclusions regarding the presence, predominance, and sign of reciprocal effects, as well as about causality.

To address this inherent problem with the CLPM, Hamaker et al 6 proposed a random-intercepts CLPM (RI-CLPM) as a possible analytic option. As discussed later, in the RI-CLPM, individual differences are controlled by including a latent variable that represents a time-invariant (but person-variant) trait-like factor; this allows reciprocal effects to be tested within individuals. If this model is extended to include measurement errors, it becomes equivalent to the so-called (bivariate) stable trait, autoregressive trait, and state (STARTS) model (Kenny & Zautra 10,11). Usami, Murayama, and Hamaker 12 discussed the mathematical and conceptual relations between various cross-lagged models, including these models.
These recent studies are insightful and informative, providing applied medical researchers with a basis for thinking about how to test reciprocal relations with longitudinal data.
However, the arguments are limited mostly to mathematical and conceptual relations. As a result, we still know little about whether, when, and how the choice among different cross-lagged longitudinal models has substantive consequences for parameter estimates of reciprocal effects in practice, leading researchers to draw different conclusions from the same data in medical sciences. The aim of the current manuscript is to show the practical implications and importance of considering these alternative models when investigating reciprocal effects. This is approached through a literature search, case studies, and simulation studies. In the end, we also provide some practical guidelines, hoping to help applied medical researchers who work with longitudinal data in the future.

Cross-Lagged Longitudinal Models
In this paper, we focus on three cross-lagged longitudinal models: the (traditional) CLPM, the RI-CLPM, and the STARTS model. Below, following Usami et al, 12 we describe these models, emphasizing their commonalities and differences. Throughout the paper, we assume that researchers are interested in the reciprocal relation between two variables X and Y, although it is easy to extend the models to include more than two variables (e.g., when examining the mediating effects of variables is a main focus of the research).
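
CLPM
In the CLPM, the observations $x_{it}$ and $y_{it}$ of individual $i$ at time point $t$ are first decomposed as

$$x_{it} = \mu_{xt} + x^*_{it}, \qquad y_{it} = \mu_{yt} + y^*_{it}. \tag{1}$$

Here $\mu_{xt}$ and $\mu_{yt}$ are the temporal group means at time point $t$; $x^*_{it}$ and $y^*_{it}$ are temporal deviation terms from the temporal group means for individual $i$. With these equations, the trajectories of the temporal group mean are implicitly removed from the raw data. By definition, the deviations have a mean of zero. Then, $x^*_{it}$ and $y^*_{it}$ for $t \ge 2$ are modeled as

$$x^*_{it} = \beta_{xt}\, x^*_{i(t-1)} + \gamma_{xt}\, y^*_{i(t-1)} + d_{xit}, \qquad y^*_{it} = \beta_{yt}\, y^*_{i(t-1)} + \gamma_{yt}\, x^*_{i(t-1)} + d_{yit}, \tag{2}$$
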
where $\beta_{xt}$ and $\beta_{yt}$ are autoregressive parameters and $\gamma_{xt}$ and $\gamma_{yt}$ are cross-lagged regression parameters at time point $t$. For these parameters, time-invariance can also be assumed (by using $\beta_x$ and $\beta_y$, and $\gamma_x$ and $\gamma_y$) if the cross-lagged relationships are assumed to be stable over time. Note that with $t = 1$, the initial observations $x_{i1}$ and $y_{i1}$ are modeled as exogenous variables.
From the view of Granger causality (Granger 13), estimates of cross-lagged regression parameters (the longitudinal relationship between $Y_{t-1}$ and $X_t$ after controlling for the baseline $X_{t-1}$) are key for inferring reciprocal relations between the variables. The residuals $d_{xit}$ and $d_{yit}$ are usually assumed to be normally distributed and correlated:

$$\begin{pmatrix} d_{xit} \\ d_{yit} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \omega^2_{xt} & \omega_{xyt} \\ \omega_{xyt} & \omega^2_{yt} \end{pmatrix} \right). \tag{3}$$

Here, $\omega^2_{xt}$ and $\omega^2_{yt}$ are time-variant residual variances and $\omega_{xyt}$ is a time-variant residual covariance. As with previous parameters, time-invariant residual variances and covariances can also be assumed (by using $\omega^2_x$, $\omega^2_y$, and $\omega_{xy}$). A path diagram of the CLPM is provided in Figure 1a.
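
The CLPM relations described above can be made concrete with a small simulation. The following is a minimal sketch, not the estimation procedure used in this paper: it assumes time-invariant parameters, unit residual variances, and hypothetical parameter values, and it replaces structural equation modeling with pooled ordinary least squares as a simple stand-in. The function names `simulate_clpm` and `lagged_ols` are illustrative only.

```python
import numpy as np

def simulate_clpm(n, T, beta_x, beta_y, gamma_x, gamma_y, omega_xy=0.2, seed=0):
    """Simulate deviations x*_it and y*_it from a time-invariant CLPM."""
    rng = np.random.default_rng(seed)
    resid_cov = np.array([[1.0, omega_xy], [omega_xy, 1.0]])  # assumed residual (co)variances
    x = np.empty((n, T))
    y = np.empty((n, T))
    # Initial observations (t = 1) are treated as exogenous.
    init = rng.multivariate_normal([0.0, 0.0], resid_cov, size=n)
    x[:, 0], y[:, 0] = init[:, 0], init[:, 1]
    for t in range(1, T):
        d = rng.multivariate_normal([0.0, 0.0], resid_cov, size=n)
        x[:, t] = beta_x * x[:, t - 1] + gamma_x * y[:, t - 1] + d[:, 0]
        y[:, t] = beta_y * y[:, t - 1] + gamma_y * x[:, t - 1] + d[:, 1]
    return x, y

def lagged_ols(x, y):
    """Pool all waves and regress each variable on both lagged variables."""
    # Centering each wave removes the temporal group means.
    xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
    lagged = np.column_stack([xc[:, :-1].ravel(), yc[:, :-1].ravel()])
    bx = np.linalg.lstsq(lagged, xc[:, 1:].ravel(), rcond=None)[0]
    by = np.linalg.lstsq(lagged, yc[:, 1:].ravel(), rcond=None)[0]
    # Columns of `lagged` are (x_{t-1}, y_{t-1}).
    return {"beta_x": bx[0], "gamma_x": bx[1], "gamma_y": by[0], "beta_y": by[1]}

# Hypothetical true values, chosen only for illustration.
x, y = simulate_clpm(n=5000, T=4, beta_x=0.5, beta_y=0.5, gamma_x=0.2, gamma_y=0.1)
est = lagged_ols(x, y)
print({k: round(float(v), 3) for k, v in est.items()})
```

When the data really are generated by a CLPM, as assumed here, the lagged regression recovers the autoregressive and cross-lagged parameters up to sampling error.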

RI-CLPM
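In the RI-CLPM (Hamaker et al 6), $x_{it}$ and $y_{it}$ are modeled as

$$x_{it} = \mu_{xt} + I_{xi} + x^*_{it}, \qquad y_{it} = \mu_{yt} + I_{yi} + y^*_{it}. \tag{4}$$

Again, $\mu_{xt}$ and $\mu_{yt}$ are the temporal group means. Critically, the model also includes $I_{xi}$ and $I_{yi}$, which are the defining characteristic of the RI-CLPM. These are (time-invariant) trait factors that represent stable individual differences, and the deviations $x^*_{it}$ and $y^*_{it}$ follow the same cross-lagged relations as in Equation 2. A path diagram of the RI-CLPM is provided in Figure 1b.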
Because the RI-CLPM includes trait factors and thereby separates stable between-person differences (i.e., trait factors) from within-person fluctuations over time, its cross-lagged relations pertain to a process that takes place at the within-person level. Therefore, in the RI-CLPM, $\gamma_x$ and $\gamma_y$ can be interpreted as quantities that express the extent to which the two variables influence each other within individuals. Longitudinal data typically contain information about both within-person changes and between-person differences; the CLPM, which does not account for trait factors (i.e., individual differences), fails to disaggregate these two components. As such, the CLPM provides inaccurate estimates of within-person reciprocal effects.
Note that substituting the cross-lagged relations of Equation 2 into Equation 4 shows that the trait factors, which are separate from the predictors ($x^*_{i(t-1)}$ and $y^*_{i(t-1)}$), can be interpreted as random intercepts in the model; the model is named after this fact. The CLPM is a special case of the RI-CLPM, obtained by setting $I_{xi} = 0$ and $I_{yi} = 0$ for all individuals. The RI-CLPM requires two or more variables to have been measured at three or more time points, whereas the CLPM requires only two time points.
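
To illustrate how ignoring trait factors can distort cross-lagged estimates, the following sketch generates data in which the within-person cross-lagged effects are exactly zero but the trait factors of X and Y are correlated, and then fits a CLPM-style lagged regression. All values (sample size, number of waves, autoregressive effect, trait correlation) are hypothetical, and pooled least squares is again used as a stand-in for the structural equation models discussed in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 20000, 4       # hypothetical sample size and number of waves
beta = 0.3            # within-person autoregressive effect (same for X and Y)
trait_corr = 0.6      # assumed correlation between the trait factors I_x and I_y

# Trait factors: time-invariant, person-specific, unit variance.
trait = rng.multivariate_normal(
    [0.0, 0.0], [[1.0, trait_corr], [trait_corr, 1.0]], size=n
)

def ar1(n, T, beta, rng):
    """Stationary AR(1) within-person deviations with unit innovation variance."""
    dev = np.empty((n, T))
    dev[:, 0] = rng.normal(scale=1.0 / np.sqrt(1.0 - beta**2), size=n)
    for t in range(1, T):
        dev[:, t] = beta * dev[:, t - 1] + rng.normal(size=n)
    return dev

# The two deviation series evolve independently: true cross-lagged effects are zero.
x = trait[:, [0]] + ar1(n, T, beta, rng)
y = trait[:, [1]] + ar1(n, T, beta, rng)

# CLPM-style pooled regression of x_t on x_{t-1} and y_{t-1} (no trait factors).
xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
lagged = np.column_stack([xc[:, :-1].ravel(), yc[:, :-1].ravel()])
beta_hat, gamma_hat = np.linalg.lstsq(lagged, xc[:, 1:].ravel(), rcond=None)[0]

print(f"autoregressive estimate: {beta_hat:.2f} (within-person value {beta})")
print(f"cross-lagged estimate:   {gamma_hat:.2f} (within-person value 0.0)")
```

Under these assumed values, the population regression coefficients work out to about 0.60 (autoregressive) and 0.11 (cross-lagged), so the lagged regression overstates stability and reports a cross-lagged effect that does not exist within individuals.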

STARTS model
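By extending the RI-CLPM to include measurement error, we obtain the STARTS model (Kenny & Zautra 10,11). In the (bivariate) STARTS model, $x_{it}$ and $y_{it}$ are decomposed into latent true scores $f_{xit}$ and $f_{yit}$ and measurement errors $\epsilon_{xit}$ and $\epsilon_{yit}$. That is,

$$x_{it} = f_{xit} + \epsilon_{xit}, \qquad y_{it} = f_{yit} + \epsilon_{yit}. \tag{5}$$

These measurement errors are usually assumed to be normally distributed and possibly correlated, that is,

$$\begin{pmatrix} \epsilon_{xit} \\ \epsilon_{yit} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \psi^2_{xt} & \psi_{xyt} \\ \psi_{xyt} & \psi^2_{yt} \end{pmatrix} \right). \tag{6}$$

Here, $\psi^2_{xt}$ and $\psi^2_{yt}$ are measurement error variances, and $\psi_{xyt}$ is an error covariance. If needed, time-invariant measurement error (co)variances can be assumed. As in the RI-CLPM, $f_{xit}$ and $f_{yit}$ are modeled as

$$f_{xit} = \mu_{xt} + I_{xi} + f^*_{xit}, \qquad f_{yit} = \mu_{yt} + I_{yi} + f^*_{yit}. \tag{7}$$

Here, $f^*_{xit}$ and $f^*_{yit}$ are temporal deviations from the expected scores of individual $i$ after accounting for measurement error.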
Substituting Equation 7 into Equation 5 provides the specification of the STARTS model:

$$x_{it} = \mu_{xt} + I_{xi} + f^*_{xit} + \epsilon_{xit}, \qquad y_{it} = \mu_{yt} + I_{yi} + f^*_{yit} + \epsilon_{yit}. \tag{8}$$

As in Equation 2, the temporal deviation terms are modeled as

$$f^*_{xit} = \beta_{xt}\, f^*_{xi(t-1)} + \gamma_{xt}\, f^*_{yi(t-1)} + d_{xit}, \qquad f^*_{yit} = \beta_{yt}\, f^*_{yi(t-1)} + \gamma_{yt}\, f^*_{xi(t-1)} + d_{yit}. \tag{9}$$

A path diagram of the STARTS model is provided in Figure 1c. In the STARTS model, cross-lagged relations are posited between latent true scores rather than between observed scores, which distinguishes it from the RI-CLPM and the CLPM. However, the STARTS model and the RI-CLPM share a common critical feature: the inclusion of trait factors. As such, as in the RI-CLPM, the cross-lagged parameters ($\gamma_{xt}$ and $\gamma_{yt}$) in the STARTS model reflect within-person reciprocal effects. The STARTS model requires two or more variables to have been measured at four or more time points. This means that we can compare the RI-CLPM and the STARTS model to determine which fits the data better, as long as more than three waves are available.
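
To see how the trait, autoregressive, and measurement error components each enter the covariance structure of the observed scores, the following sketch builds the model-implied covariance matrix for a simplified case. It assumes a single variable (a univariate STARTS model), time-invariant parameters, a stationary autoregressive process, and mutually independent components; the function name `starts_implied_cov` and all numerical values are hypothetical.

```python
import numpy as np

def starts_implied_cov(T, trait_var, beta, resid_var, error_var):
    """Model-implied covariance of x_i1..x_iT for a univariate, stationary STARTS model."""
    # Stationary variance of the autoregressive deviations f*_it.
    ar_var = resid_var / (1.0 - beta**2)
    # Cov(f*_it, f*_is) = ar_var * beta^|t - s|.
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    ar_cov = ar_var * beta**lags
    # The trait variance enters every element; the error variance enters the diagonal only.
    return trait_var + ar_cov + error_var * np.eye(T)

with_error = starts_implied_cov(T=4, trait_var=1.0, beta=0.5, resid_var=0.75, error_var=0.4)
without_error = starts_implied_cov(T=4, trait_var=1.0, beta=0.5, resid_var=0.75, error_var=0.0)

# The two matrices differ only on the diagonal.
print(np.round(with_error - without_error, 3))
```

Setting `error_var` to zero gives the corresponding univariate random-intercept structure, so in this simplified setting the measurement error variance is identified only from how the diagonal departs from the pattern implied by the off-diagonal elements.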
When observations may be influenced by measurement errors occurring for procedural reasons, accounting for measurement errors is desirable. However, the specification of measurement error when there is only one indicator variable (such as in the STARTS model) sometimes involves costs in terms of parameter estimation. Indeed, research has reported that the STARTS model often encounters estimation problems such as improper solutions and non-convergence. Conceptually, one primary reason is that, unlike trait factor variances ($v^2$) and residual variances ($\omega^2_t$), the contribution from measurement error variances ($\psi^2_t$) is specific to a single time point: in the model-implied variance-covariance matrix, $\psi^2_t$ appears at time point $t$ only. Because of this, unstable estimates of some parameters (particularly autoregressive parameters) caused by some aspects of the research design (e.g., small sample size) can easily inflate the variances of the deviation

Therefore, news items, book reviews, and doctoral dissertations were not considered.
We found 324 medical papers by this method. Of these, we excluded 53 papers that did not apply any cross-lagged longitudinal models to actual data, leaving us with 271 papers. Most of the excluded papers were review papers, statistical simulations, or methodological and statistical discussion. See Table 1 for the complete list of retained papers. These results indicate the heavy reliance on the traditional CLPM in the literature.

Result
It is also important to note that the alternative cross-lagged longitudinal models impose minimum requirements on the number of time points: the RI-CLPM and the STARTS model require at least three time points under a stability assumption (i.e., time-invariant parameters), the STARTS model requires at least four time points when stability is not assumed, and the ALT model requires four time points even under a stability assumption. The fact that about 40% of the papers collected data at only two time points means that a substantial share of applied medical research implicitly precludes the option of using these alternative models.

Case studies

Method
To compare analysis results based on different cross-lagged longitudinal models, we focused on the 165 papers that collected longitudinal data with more than two time points. Among them, we randomly selected 50 papers and, using the contact information provided in each paper, emailed the corresponding authors to request that they share their dataset to help our research. In this contact, we emphasized that (1) our primary research purpose was simply to compare analysis results from different cross-lagged models, not to criticize their findings, (2) we would not report any estimation results from the original paper or relevant information in the datasets, to prevent identification of the source paper, (3) we would not share the dataset with any other researchers, and (4) we did not need information about variables that are not relevant to the cross-lagged analysis (e.g., personal information of participants).
To increase the response rate, we contacted the authors again after one month if we had not received a reply to the first contact. As a result, we received a total of 21 responses from the authors (response rate: 42%), and among them, five authors (from five different papers) granted us access to their datasets. We were unable to obtain permission from the authors of the other 16 papers, mainly because sharing the data with us might have violated the data sharing policy of their sources. Among the five datasets, two were publicly available online without special permission from the authors, two were provided directly by the authors, and one was provided after a review of the data use agreement that we submitted. Note that one of the datasets gave us access only to the sample means and sample (co)variances (rather than the raw data), which allowed us to estimate the parameters but not to fully account for missing data.

Among the five datasets, two have three time points and the others have more than three time points (mean number of time points = 6.0). The average sample size of these datasets is large (mean N = 2,741). In this paper, we do not give the exact number of participants and time points for each study, to prevent identification of the studies. While all five studies applied the CLPM, some of them specified the model in slightly different ways.
Specifically, two studies assumed second-order autoregressive and cross-lagged parameters as well as first-order parameters. Another study assumed a mediator between two variables. In addition, one study assumed time-invariant parameters (i.e., stability), while the other four studies did not.
To ensure the comparability of the results between datasets, in the current analysis, we assume time-invariant parameters for autoregressive and cross-lagged coefficients ($\beta$ and $\gamma$) and residual and error (co)variances ($\omega^2$ and $\psi^2$). In addition, neither second-order parameters nor external variables (e.g., mediators) were included in any of the analyses.
This setup also means that the results reported in the current paper all differ from those reported in the original papers. Note that one study collected multi-group data and applied the CLPM using multi-group analysis. For this dataset, we assumed group-invariant parameters for autoregressive and cross-lagged coefficients as well as residual and error (co)variances (i.e., measurement invariance between groups) while setting no constraints on the difference of temporal means between groups.

We also found notable differences in the magnitudes of parameter estimates among cross-lagged models. The RI-CLPM provided smaller autoregressive parameter estimates

The differences in autoregressive parameter estimates for the RI-CLPM and the STARTS model also lead to differences between their cross-lagged parameter estimates and those from the CLPM. In this case study, the RI-CLPM and the STARTS model showed smaller cross-lagged estimates than the CLPM (in absolute value, 0.66 and 0.62 times the size, respectively). Although we need to be careful about the generalizability of these findings, it is well known that the magnitude of within-cluster (in this case, within-person) relations (i.e., cross-lagged parameters in the RI-CLPM and the STARTS model) is smaller than that of between-cluster (in this case, between-person) relations when the between-cluster difference is larger than the within-cluster difference. The decreased cross-lagged effects could be explained by this so-called ecological fallacy (Robinson 18).
With regard to standard errors, interestingly, the standard errors of $\gamma$ in the RI-CLPM and the STARTS model are, on average, 1.6 and 2.7 times as large, respectively, as those in the CLPM. These results indicate that the inclusion of parameters that are specific to these models (i.e., trait factor (co)variances in the RI-CLPM, and both trait factor and error (co)variances in the STARTS model) leads to an increase in standard errors. In combination with the observed upward or downward changes in autoregressive and cross-lagged parameter estimates, these results indicate that the RI-CLPM and the STARTS model will produce substantially different results on statistical tests than the CLPM will.
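
The combined effect of a smaller estimate and a larger standard error on a significance test can be shown with simple arithmetic. The sketch below is illustrative only: it assumes a hypothetical CLPM test statistic of z = 3.0 for a cross-lagged parameter and applies the average ratios reported above as if they held for a single parameter, which need not be true in any particular dataset. The function name `rescaled_z` is hypothetical.

```python
def rescaled_z(z_clpm, estimate_ratio, se_ratio):
    """Wald z statistic after scaling the estimate and its standard error."""
    # z = estimate / SE, so scaling both rescales z by estimate_ratio / se_ratio.
    return z_clpm * estimate_ratio / se_ratio

z_clpm = 3.0  # hypothetical CLPM result, significant at the 5% level

# Average ratios taken from the case studies: estimates 0.66 and 0.62 times
# the CLPM size, standard errors 1.6 and 2.7 times the CLPM size.
z_ri_clpm = rescaled_z(z_clpm, estimate_ratio=0.66, se_ratio=1.6)
z_starts = rescaled_z(z_clpm, estimate_ratio=0.62, se_ratio=2.7)

for label, z in [("CLPM", z_clpm), ("RI-CLPM", z_ri_clpm), ("STARTS", z_starts)]:
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"{label}: z = {z:.2f} ({verdict} at the two-sided 5% level)")
```

In this hypothetical case, an effect that is clearly significant under the CLPM falls below the conventional threshold under both alternative models.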
It is also important to note that, among the five datasets, the CLPM was chosen as

General Discussion
In this manuscript, we discussed limitations of the commonly-used CLPM