M G E Verdam1,2, F J Oort3, M A G Sprangers4.
Abstract
BACKGROUND: Patient-reported outcomes (PROs) are of increasing importance for health-care evaluations. However, the interpretation of change in PROs may be obfuscated due to changes in the meaning of the self-evaluation, i.e., response shift. Structural equation modeling (SEM) is the most widely used statistical approach for the investigation of response shift. Yet, non-technical descriptions of SEM for response shift investigation are lacking. Moreover, application of SEM is not straightforward and requires sequential decision-making practices that have not received much attention in the literature. AIMS: To stimulate appropriate applications and interpretations of SEM for the investigation of response shift, the current paper aims to (1) provide an accessible description of the SEM operationalizations of change that are relevant for response shift investigation; (2) discuss practical considerations in applying SEM; and (3) provide guidelines and recommendations for researchers who want to use SEM for the investigation and interpretation of change and response shift in PROs.
Keywords: Change; Health-related quality of life (HRQL); Patient-reported outcomes (PROs); Response shift; Structural equation modeling (SEM)
Year: 2021 PMID: 33550541 PMCID: PMC8068637 DOI: 10.1007/s11136-020-02742-9
Source DB: PubMed Journal: Qual Life Res ISSN: 0962-9343 Impact factor: 4.147
Fig. 1 A SEM model for physical (P), mental (M), and social (S) health. The squares at the bottom represent nine observed indicators, where p1 to p3 refer to the three measures of physical health (i.e., ‘nausea,’ ‘pain,’ and ‘fatigue’), m1 to m3 refer to the three measures of mental health (i.e., ‘anxiety,’ ‘sadness,’ and ‘happiness’) and s1 to s3 refer to the three measures of social health (i.e., ‘family relations,’ ‘friendships,’ and ‘work relations’). The solid single-headed arrows at the bottom of the squares represent the residual factors of each indicator variable. The circles at the top represent the underlying latent variables that measure everything that the indicators that load on that factor have in common [i.e., a physical (P), mental (M), and social (S) domain of HRQL]. Each arrow from a latent variable to an observed indicator represents a factor loading. The solid double-headed arrows between the latent variables represent common factor covariances.
Fig. 2 A longitudinal SEM model for the investigation of change and response shift in physical (P), mental (M), and social (S) health. This is the longitudinal SEM model of the same HRQL measurement as depicted in Fig. 1. The squares at the bottom represent the observed indicators, measuring physical (p1 to p3), mental (m1 to m3), and social (s1 to s3) aspects of health (see Fig. 1) at two occasions (T1 and T2). The solid single-headed arrows at the bottom of the squares represent the residual factors of each indicator variable. The dotted double-headed arrows represent the longitudinal relations between the residual factors, where only the residual factors of the same indicator are allowed to correlate. The circles at the top represent the underlying latent variables that measure everything that the indicators that load on that factor have in common [i.e., the physical (P), mental (M), and social (S) domains of HRQL, both at T1 and T2]. Each arrow from a latent variable to an observed indicator represents a factor loading. The solid double-headed arrows between the latent variables represent common factor covariances. The dotted double-headed arrow represents the (nine) longitudinal correlations between the common factors.
Illustration of response shift detection using the four-step structural equation modeling (SEM) procedure
| The SEM approach | General example | Illustrative example |
|---|---|---|
| Step 1: establishing a measurement model | ||
| The factor model is used to model the relationships between the observed variables and one or more underlying latent variables, where the underlying latent variables represent everything that the observed measures have in common (e.g., perceived health or HRQL). With longitudinal assessment, e.g., baseline (T1) and follow-up (T2), the same factor model can be applied at each measurement occasion to arrive at a longitudinal measurement model | Suppose we have patients’ scores on nine observed indicators that measure physical, mental, and social aspects of health. We use a three-factor model to represent the measurement structure of the data (see Fig. |
| Step 2: overall test of response shiftᵃ | ||
| To test for the presence of response shift, all model parameters associated with response shift (i.e., factor loadings and intercepts) are restricted to be equal across occasions, e.g., across baseline and follow-up. When these restrictions are not tenable, we continue to investigate which variable is affected by which type of response shift in Step 3 | In the illustrative example from Fig. |
| Step 3: response shift detectionᵃ | ||
| Pattern | Suppose that the common factor loading of the observed indicator ‘work relations’ on physical health is significantly different from zero at follow-up. This indicates that the scores on the subscale ‘work relations’ become (at least partly) indicative of physical health. It may be, for example, that ‘work relations’ is interpreted as related to physical functioning at follow-up (but not at baseline) | |
| | Suppose the common factor loading of the observed indicator ‘anxiety’ is larger at follow-up as compared to baseline. This indicates that patients’ scores on ‘anxiety’ become more important to the measure of mental health. It may be, for example, that anxiety due to uncertainty about the effectiveness of treatment and/or the course of the disease plays an increasingly important role for patients’ mental health | |
| | Suppose the intercept value of the observed indicator ‘nausea’ is lower at follow-up as compared to baseline. This indicates that the same physical health leads to lower scores on nausea at follow-up as compared to baseline. It may be, for example, that patients get used to the experience of nausea, and therefore rate the same health experience lower at follow-up as compared to baseline | |
| Step 4: true change assessment | ||
| |
Σ is a symmetric matrix that contains the variances and covariances of the observed variables (X); Λ is a matrix of common factor loadings, where ΛT1 and ΛT2 contain the factor loadings of baseline (T1) and follow-up (T2), respectively; Φ is a symmetric matrix of common factor (co)variances, where ΦT1 contains the common factor (co)variances at baseline, ΦT2 contains the common factor (co)variances at follow-up, ΦT1,T2 contains the common factor covariances across occasion, and ΦT1,T2 = ΦT2,T1; Θ is a matrix of residual (co)variances, where ΘT1 is a diagonal matrix with residual variances at baseline, ΘT2 is a diagonal matrix with residual variances at follow-up, ΘT1,T2 is a diagonal matrix with residual covariances across occasion, and ΘT1,T2 = ΘT2,T1; μ is a vector that contains the means of the observed variables; τ is a vector of intercept values, where τT1 and τT2 contain the intercepts at baseline and follow-up, respectively; κ is a vector of common factor means, where κT1 and κT2 contain the means of the common factors at baseline and follow-up, respectively. In Step 1, the following restrictions apply for identification purposes: diag(ΦT1) = I, diag(ΦT2) = I, κT1 = 0, and κT2 = 0. In Step 2, identification restrictions diag(ΦT1) = I and κT1 = 0 are sufficient, so that diag(ΦT2) and κT2 are free to be estimated.
ᵃ Although non-invariance of residual variances can be considered as a type of non-uniform recalibration (see [10]), the detection of non-uniform recalibration is not important for the investigation of mean change in the common factors and is therefore not considered here
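The symbols defined in the table note describe a covariance structure and a mean structure, but the equations themselves are not shown here. The block below is a reconstruction, assuming the model takes the usual linear factor-analytic form with the two occasions stacked; the block layout of Λ, Φ, Θ, τ, and κ is inferred from the definitions in the note and not copied from the original table.

```latex
\begin{aligned}
\operatorname{Cov}(X) &= \Sigma = \Lambda \Phi \Lambda^{\top} + \Theta, \\
\operatorname{E}(X)   &= \mu = \tau + \Lambda \kappa,
\end{aligned}
\qquad\text{with}\qquad
\Lambda = \begin{pmatrix} \Lambda_{T1} & 0 \\ 0 & \Lambda_{T2} \end{pmatrix},\;
\Phi = \begin{pmatrix} \Phi_{T1} & \Phi_{T1,T2} \\ \Phi_{T2,T1} & \Phi_{T2} \end{pmatrix},\;
\Theta = \begin{pmatrix} \Theta_{T1} & \Theta_{T1,T2} \\ \Theta_{T2,T1} & \Theta_{T2} \end{pmatrix},\;
\tau = \begin{pmatrix} \tau_{T1} \\ \tau_{T2} \end{pmatrix},\;
\kappa = \begin{pmatrix} \kappa_{T1} \\ \kappa_{T2} \end{pmatrix}.
```

Under this form, the overall test of response shift in Step 2 amounts to imposing \(\Lambda_{T1} = \Lambda_{T2}\) and \(\tau_{T1} = \tau_{T2}\) and checking whether these equality restrictions are tenable.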
Overview of practical considerations in application of the SEM approach for the detection of response shift
| | Decisions to be made | Recommendations |
|---|---|---|
| Know your measures | ||
| Step 1: establishing a measurement model | • Choose one of two procedures (i.e., model all measurement occasions simultaneously or separately) to arrive at a longitudinal measurement model • Modify the measurement model when model fit is not adequate, in order to obtain a well-fitting model • Decide which and how many modifications are necessary to obtain a substantively meaningful measurement model | • Specify the measurement model based on the structure of the questionnaire, previous research, and/or theory • In case of unclear or unknown structure, use exploratory analyses and substantive considerations to arrive at an interpretable and well-fitting measurement model • Combine substantive and statistical criteria to guide (re)specification to arrive at the most parsimonious, most reasonable, and best-fitting measurement model |
| Identification of possible response shift | ||
| Step 2: overall test of response shift; Step 3: detection of response shift | • Choose statistical criteria to guide response shift detection • Choose between competing response shifts • Decide when to stop searching for response shift | • Use the overall test of response shift to protect against false detection (i.e., type I error) • When possible, use an iterative procedure (where all possible response shifts are considered one at a time) to identify specific response shift effects • Alternatively, use statistical indices such as modification indices, expected parameter change, inspection of residuals, and/or Wald tests to guide response shift detection • Evaluate each possible response shift statistically (i.e., difference in model fit) and substantively (i.e., interpretation) in order to identify all meaningful effects • For more robust stopping criteria, use overall model-fit evaluation and evaluation of difference in model fit of the measurement model • Use different sequential decision-making practices in order to find confidence in robustness of results |
| Interpretation of response shift and ‘true’ change | ||
| Step 3: detection of response shift; Step 4: assessment of ‘true’ change | • Can the detected violations of invariance of model parameters be interpreted as response shifts? • Is there ‘true’ change? • What is the impact of response shift on the assessment of change? | • Detected effects can be substantively interpreted as response shift using substantive knowledge of the patient group, treatment, or disease trajectory • Compare common factor means across occasions to assess ‘true’ change in the target construct • To understand both ‘true’ change and response shift, consider possible other explanations for the detected effects and include, when available, possible explanatory variables (e.g., coping, adaptation, social comparison) • To understand the impact of response shift on change assessment, evaluate (1) the impact of response shift on observed change in the indicator variable using the decomposition of change [ |
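The recommendation above to evaluate the impact of response shift on observed change can be illustrated with a small sketch. It assumes the mean structure μ = τ + Λκ at each occasion, from which the observed mean change splits algebraically into an intercept part, a loading part, and a common-factor part. The function names and all numeric values are hypothetical, and the split shown is one algebraic arrangement, not necessarily the exact decomposition the authors cite.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def decompose_change(tau1, tau2, lam1, lam2, kappa1, kappa2):
    """Split observed mean change (mu2 - mu1) per indicator into three parts.

    Assumes mu_t = tau_t + Lambda_t @ kappa_t, so that
    mu2 - mu1 = (tau2 - tau1)                 # intercept differences
              + (Lambda2 - Lambda1) @ kappa2  # loading differences
              + Lambda1 @ (kappa2 - kappa1)   # change in common factor means
    """
    d_tau = [t2 - t1 for t1, t2 in zip(tau1, tau2)]
    d_lam = [[b - a for a, b in zip(r1, r2)] for r1, r2 in zip(lam1, lam2)]
    d_kappa = [k2 - k1 for k1, k2 in zip(kappa1, kappa2)]
    loading_part = matvec(d_lam, kappa2)
    factor_part = matvec(lam1, d_kappa)
    total = [a + b + c for a, b, c in zip(d_tau, loading_part, factor_part)]
    return {
        "intercept_part": d_tau,
        "loading_part": loading_part,
        "factor_part": factor_part,
        "observed_change": total,
    }


# Hypothetical one-factor example: physical health measured by
# 'nausea', 'pain', and 'fatigue' at baseline (T1) and follow-up (T2).
tau_t1 = [2.0, 3.0, 2.5]
tau_t2 = [1.7, 3.0, 2.5]          # lower 'nausea' intercept at follow-up
lam_t1 = [[0.8], [0.7], [0.6]]
lam_t2 = [[0.8], [0.9], [0.6]]    # larger 'pain' loading at follow-up
kappa_t1 = [0.0]                  # fixed to zero for identification
kappa_t2 = [0.5]

parts = decompose_change(tau_t1, tau_t2, lam_t1, lam_t2, kappa_t1, kappa_t2)
for name, values in parts.items():
    print(name, [round(v, 3) for v in values])
```

In this made-up example, the change in the common factor mean contributes to every indicator, while the intercept and loading parts are non-zero only for the indicators whose parameters differ across occasions.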
An overview of SEM-based model-fit indices for the evaluation of overall goodness of model fit and differences in model fit
| Description and interpretation | Advantages | Disadvantages |
|---|---|---|
| Fit statistic | ||
| Chi-square test | ||
| The chi-square value can be used to test the null hypothesis of ‘exact’ fit (i.e., that the specified model holds exactly in the population), where a significant chi-square value indicates a significant deviation between the model and data (and thus that the model does not hold exactly in the population) | • Has a clear interpretation • Provides a convenient decision rule | • Highly dependent on sample size, i.e., with increasing sample size and equal degrees of freedom the chi-square value increases • Tends to favor highly parameterized (i.e., complex) models |
| Root-mean-square error of approximation (RMSEA) | ||
| The RMSEA [ | • Confidence intervals are available • Can be used to provide a ‘test of close fit’ • Takes into account model complexity • Less sensitive to sample size | • When N or df is relatively small, the index is expected to be uninformative [ |
| Comparative fit index (CFI) | ||
| The CFI [ | • Relatively unaffected by sample size • Takes into account model complexity | • Does not provide a test of model fit • Sensitive to the size of the observed covariances |
| Expected cross-validation index (ECVI) | ||
| The ECVI [ | • Confidence intervals are available • Takes into account model complexity | • Does not provide a test of model fit • Cannot be used to evaluate model fit of a single model |
| The chi-square difference test | ||
| The chi-square values of two nested models (i.e., where the second model can be derived from the first model by imposing restrictions on model parameters) can be compared to test the difference in ‘exact’ fit. A significant difference in chi-square values indicates that the less restricted model fits the observed data significantly better than the more restricted model, or in other words, the more restricted model leads to a significant deterioration in model fit | • Has a clear interpretation • Provides a convenient decision rule | • Highly dependent on sample size (see above) • Tends to favor highly parameterized (i.e., complex) models |
| ECVI difference test | ||
| The difference in ECVI values of two nested models may be used to test the equivalence in approximate model fit, where a value that is significantly larger than zero indicates that the more restricted model has significantly worse approximate fit. Similarly, the difference between two models’ AICs (or BICs) can be used | • Takes into account model complexity | • Stringent evaluation of the performance of the ECVI for the comparison of nested models is lacking |
| CFI difference | ||
| It has been proposed that the difference between CFI values can be used to evaluate the difference in model fit between two nested models [ | • Simple to apply | • Cannot be used to test whether the difference in model fit is significant |
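To make the fit indices in the table above more concrete, the following sketch computes the chi-square difference test, the RMSEA, and the CFI from chi-square values and degrees of freedom. The formulas are the commonly used textbook definitions (some software uses N instead of N − 1 in the RMSEA), not formulas quoted from this article, and all input values are hypothetical. The chi-square tail probability uses a simple series that is adequate for illustration but loses precision for very small p-values.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate.

    Uses the series expansion of the regularized lower incomplete gamma
    function; intended for illustration, not for extreme tail values.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 10000):
        term *= z / (a + n)
        total += term
        if term < 1e-15 * total:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """Chi-square difference test for two nested models."""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


def rmsea(chi2, df, n):
    """Point estimate of the RMSEA, using N - 1 in the denominator."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index relative to a baseline (independence) model."""
    num = max(chi2_model - df_model, 0.0)
    den = max(chi2_model - df_model, chi2_baseline - df_baseline, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den


# Hypothetical values: measurement model versus the model with all
# loadings and intercepts restricted to be equal across occasions.
d_chi2, d_df, p_value = chi2_difference(85.0, 52, 60.0, 40)
print(f"chi-square difference = {d_chi2:.1f}, df = {d_df}, p = {p_value:.4f}")
print(f"RMSEA (restricted model) = {rmsea(85.0, 52, 200):.4f}")
cfi_free = cfi(60.0, 40, 900.0, 66)
cfi_restricted = cfi(85.0, 52, 900.0, 66)
print(f"CFI difference = {cfi_free - cfi_restricted:.4f}")
```

A significant chi-square difference here would suggest that the equality restrictions are not tenable, which in the four-step procedure would lead to a search for specific response shift effects.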