Literature DB >> 25101022

A general non-linear multilevel structural equation mixture model.

Abstract

In the past 2 decades latent variable modeling has become a standard tool in the social sciences. In the same time period, traditional linear structural equation models have been extended to include non-linear interaction and quadratic effects (e.g., Klein and Moosbrugger, 2000), and multilevel modeling (Rabe-Hesketh et al., 2004). We present a general non-linear multilevel structural equation mixture model (GNM-SEMM) that combines recent semiparametric non-linear structural equation models (Kelava and Nagengast, 2012; Kelava et al., 2014) with multilevel structural equation mixture models (Muthén and Asparouhov, 2009) for clustered and non-normally distributed data. The proposed approach allows for semiparametric relationships at the within and at the between levels. We present examples from the educational science to illustrate different submodels from the general framework.

Entities: Chemical Disease Gene Species

Keywords: latent variables; mixture distribution; multilevel; non-linear; semiparametric; structural equation modeling

Year: 2014 PMID： 25101022 PMCID： PMC4102910 DOI： 10.3389/fpsyg.2014.00748

Source DB: PubMed Journal: Front Psychol ISSN： 1664-1078

In the past 2 decades latent variable modeling has become a standard tool in the social sciences. Linear structural equation models have been extended to include non-linear interaction and quadratic effects (for a review see Schumacker and Marcoulides, 1998; Algina and Moulder, 2001; Marsh et al., 2004, 2006), and for the capability to model multilevel data structures (e.g., Rabe-Hesketh et al., 2004; Muthén and Asparouhov, 2009). However, a systematic combination of both non-linear structural equation modeling and multilevel modeling has not been implemented in a more general framework. In this article, we present a GNM-SEMM that combines recent semiparametric non-linear structural equation models (Kelava and Nagengast, 2012; Kelava et al., 2014) with multilevel structural equation mixture models (Muthén and Asparouhov, 2009) for clustered and non-Gaussian data. The proposed framework is capable of modeling non-linear parametric and semiparametric relationships at the within and at the between levels, and it allows non-normally distributed data to be considered. We will provide an empirical example from educational sciences to illustrate the applicability of the proposed framework. We will begin by providing an overview of current approaches for estimating non-linear structural equation models and current frameworks for multilevel structural equation (mixture) models.

1. Non-linear structural equation models

Numerous parametric approaches for the estimation of non-linear effects have been developed (for a review, see Schumacker and Marcoulides, 1998; Algina and Moulder, 2001; Marsh et al., 2004, 2006), including product indicator approaches (e.g., Kenny and Judd, 1984; Bollen, 1995; Jaccard and Wan, 1995; Ping, 1995; Jöreskog and Yang, 1996; Algina and Moulder, 2001; Marsh et al., 2004, 2006; Little et al., 2006; Kelava and Brandt, 2009), distribution analytic approaches (Klein and Moosbrugger, 2000; Klein and Muthén, 2007), Bayesian approaches (e.g., Arminger and Muthén, 1998; Lee et al., 2007), and method of moments based approaches (Wall and Amemiya, 2003; Mooijaart and Bentler, 2010). Whereas most product indicator approaches have been ad-hoc methods for the specification of non-linear interaction effects and have thus suffered from requiring complicated measurement models, recent distribution analytic and Bayesian approaches have tried to overcome the need for non-linear measurement models. Method-of-moments-based approaches (Wall and Amemiya, 2003; Mooijaart and Bentler, 2010) and some indicator approaches (Bollen, 1995; Jöreskog and Yang, 1996) have been proposed as methods that do not rely as heavily on the normality assumption of the latent variables as other approaches (e.g., the distribution analytic approaches). The relaxation of distributional assumptions may lead to a reduction in the threat of biased estimates for non-linear effects in situations in which data are non-normally distributed, but for most of these approaches, relaxing these assumptions is associated with a low power for detecting the effects (Schermelleh-Engel et al., 1998; Brandt et al., 2014). A different approach for modeling non-linear relations between latent variables is the use of semiparametric structural equation mixture models (SEMM; Arminger and Stein, 1997; Jedidi et al., 1997a,b; Dolan and van der Maas, 1998; Arminger et al., 1999; Muthén, 2001; Bauer and Curran, 2004; Bauer, 2005; Pek et al., 2009, 2011). Finite mixtures of linear structural equation models are used to approximate the unknown functional form of the non-linear relationship of the latent variables. Furthermore, by assuming mixtures, the SEMM approach relaxes the assumption of normally distributed latent variables and disturbances necessary in conventional structural equation models. Therefore, the SEMM approach is a flexible tool for predicting latent dependent variables when data are not normal, and when obtaining a strict parametric representation of the functional relation does not have the highest priority (for a discussion see Bauer, 2005). However, one drawback is that the linearity assumption of latent relationships and the normality assumption of the latent variables are relaxed simultaneously. This drawback can be manifested in the problem that observed non-normality in the data cannot be attributed to either non-normality of the latent variables or non-linearity between the latent variables. A way to overcome this problem is the specification of non-linear structural equation mixture models (NSEMM; Kelava et al., 2014) that allow distributional and linearity assumptions to be relaxed separately for the latent variables and their relationships. Although, the use of mixtures for modeling non-linear latent variable relationships (e.g., Curran et al., 1996; Dolan and van der Maas, 1998; Bauer and Curran, 2004; Bauer, 2005) or the non-normality of latent variables in the context of non-linear structural equation models (Lubke and Muthén, 2005; Lee et al., 2008; Yang and Dunson, 2010; Kelava and Nagengast, 2012; Brandt et al., 2014; Kelava et al., 2014) have received increased attention in recent years, systematic evaluations have been rare. As an additional limitation, all approaches presented so far have been strictly limited to single-level models and have not accounted for nested data structures.

2. Multilevel structural equation modeling

Nested data structures have been addressed with multilevel models for relationships between manifest variables (for an introduction see Snijders and Bosker, 1999; Hox, 2010). In the past 2 decades, researchers have proposed frameworks that are capable of modeling nested data structures in latent variable models (e.g., Muthén, 1994; Rabe-Hesketh et al., 2004; Muthén and Asparouhov, 2009). For example, these frameworks have included models that account for random effects on the within-level, multilevel path analysis (Heck and Thomas, 2000), or multilevel confirmatory factor analysis (Muthén, 1994). Furthermore, mixtures of distributions have been applied in latent growth curve modeling (Muthén and Asparouhov, 2009). So far, very limited psychometric developments have been proposed in the context of non-linear multilevel structural equation models that incorporate latent interaction effects. Leite and Zuo (2011) presented a product-indicator-based approach that allows for a specification of latent interactions on the between-level (e.g., at the school level). Their approach was a first attempt to extend the product-indicator approach for non-linear interaction effects in latent multilevel models. Products of between-level indicators are used for the specification of a measurement model of the between-level latent product variable. Focusing more generally on within-person processes in psychology (Molenaar, 2004; Molenaar and Campbell, 2009), Nagengast et al. (2013) adapted the unconstrained product indicator approach to account for latent interactions on the within-level. In predicting homework motivation, they found support for the latent interaction between homework expectancy and homework value at the within-student level. Despite these first successful adaptations, several problems that are associated with single-level non-linear structural equation modeling remain unsolved. First, the hitherto applied constrained and unconstrained product-indicator approaches for multilevel models are vulnerable to violations of distributional assumptions (normal distributions are typically assumed; for a discussion see Kelava et al., 2011). The specification of constrained and unconstrained product-indicator approaches strongly depends on the distributions involved (Kelava and Brandt, 2009), and biased estimates of the parameters and standard errors can be expected when specification errors occur (Kelava et al., 2008) or distributional assumptions are not met (Kelava and Nagengast, 2012). Hence, product-indicator approaches that are extended for multilevel data structures are even more vulnerable because more distributional assumptions on different levels have to be met. Second, the proposed extensions of single-level non-linear structural equation models specify a parametric non-linearity (by involving products of latent variables). Recently, a strong emphasis has been placed on the relaxation of this simple functional relationship, including mixtures of latent variables that also allow for non-normally distributed variables (e.g., Bauer, 2005; Kelava et al., 2014). Therefore, on the one hand there is a need for an optional specification of a semiparametric relationship of the latent variables (at the within and between levels) to better approximate the non-linear reality. On the other hand, there is a need for an optional specification of mixtures that can account for non-normality or heterogeneity across subpopulations. Third, the application of single-level non-linear structural equation modeling in substantive research has suffered from too many approaches that use the same distributional assumptions (see paragraphs above) and too few simulation studies that offer clear recommendations for the application of specific approaches (for an overview, see Kelava et al., 2011). Approaches that agree with regard to distributional assumptions may lead to contradictory results; that is, some approaches might suggest significant non-linear effects, whereas others might not. Substantive researchers cannot solve this kind of problem by referring to empirical data. Further information that is based on simulation studies (for single-level non-linear models see e.g., Brandt et al., 2014) is needed here. In total, there is a need for a framework that incorporates several special cases of multilevel modeling and that offers general as well as specific solutions for both substantive and methodological research in non-linear latent variable modeling. From a substantive standpoint, non-linear hypotheses (e.g., interactions) can be examined in more detail. From a methodological standpoint, the framework will foster the comparison of different kinds of estimators (e.g., MCMC, ML, or moment methods) in the context of different distributions. As a result of these considerations, in the next section, we will present a general non-linear multilevel structural equation mixture modeling (GNM-SEMM)framework that allows for the separate relaxation of distributional and linearity assumptions of the latent variables and their relationships on different levels of a nested data structure. We will provide several theoretical and practical examples to illustrate what is possible within the framework. In general, within this framework, it is possible to derive specific submodels that include crucial parts of the model as well as a combination of several aspects that have not been combined before.

3. A general non-linear multilevel structural equation mixture model

In this section, we will present a GNM-SEMM framework that allows for semiparametric latent non-linear effects on the within and the between levels. The framework presented here is similar to the general multilevel mixture model and notation presented by Muthén and Asparouhov (2009). Whereas Muthén and Asparouhov's (2009) model focuses only on linear relationships, the GNM-SEMM framework accounts for non-linear semiparametric relationships of the manifest and latent variables involved. This allows for a more precise modeling of latent variable relationships at different data levels while relaxing the linearity assumptions of standard latent multilevel frameworks (e.g., Rabe-Hesketh et al., 2004).

3.1. Observed and mixture variables

3.1.1. Definition

Let y be the score of the j-th (j = 1, …, J) observed (indicator) variable for individual i (i = 1, …, N) in a cluster k (k = 1, …, K). Note that the individual index i is cluster-specific. Its range depends on the cluster size N (e.g., the number of pupils in a given school k is denoted as N). Let z be the score of the l-th (l = 1, …, L) observed (indicator) variable for cluster k. The observed scores y and z could be realizations of dichotomous, ordered categorical, continuous normally distributed, or count variables. Categorical (mixture) variables are used for the definition of mixtures on the individual (within) and cluster (between) levels. Let C be an within-level latent categorical variable for individual i in cluster k, which takes values 1, …, C*. Let D be a between-level latent categorical variable for cluster k, which takes values 1, …, D*. Note that the number of latent classes on the within-level may be different across the latent classes on the between-level. Analogous to Rabe-Hesketh et al. (2004), Muthén (1984), and Muthén and Asparouhov (2009), for observed dichotomous and ordered categorical variables, the underlying normally distributed latent variables y* and z* are defined such that for a set of threshold parameters τ and τ, and categories s and s′, respectively, the following equations hold for each subject i in cluster k: where the vertical bar ·|· indicates a “conditional on” statement, and ↔ indicates an equivalence. For continuous normally distributed variables, y* = y and z* = z are assumed, and for count variables, y* = log(λ) and z* = log(λ) hold, where λ and λ are the expectations of the Poisson distribution. Additional assumptions regarding the mean and covariance structure will be made in the following subsections, which will specify the measurement and structural models on the within and between levels.

3.1.2. Example

Suppose that pupils from several schools take part in a math test. For a given pupil i from school k the score on a sub-task j from the math test is given by y. In addition, for school k, there is a score z that indicates the school's social problems (e.g., the degree of bullying reported by the principal). In Figure 1, two latent categorical variables C and D on the within-level (Level 1) and the between-level (Level 2), respectively, are introduced. These variables may account for heterogeneity that occurs in the scores on both levels. On Level 1, heterogeneity in the distribution of the math test may occur due to additional private lessons in math that some pupils received. On Level 2, heterogeneity may occur in the distribution of the school's social problems, for example, due to the general (unobserved) socioeconomic status of the neighborhood where the school is located. Furthermore, school k might belong to an unobserved group of schools D = d that explicitly prepared for the math test. This may then influence the distribution of the math scores.

Figure 1

Observed variable scores .

Observed variable scores . Figure 1 shows a diagram with the observed and mixture variables. At this stage, there is no model that can explain the relationship between the scores y and z and no measurement model that can describe the realizations of the scores. The mixtures are indicated by C and D.

3.2. Level 1 – within level

3.2.1. Measurement model

. Let y* be the J-dimensional vector for individual i in cluster k that includes scores for all dependent observed within variables. The measurement model is defined by a mixture distribution model where ν1 is a J-dimensional vector of latent intercepts, Λ1 is a J × m( loading matrix. η1 = (η11, …, η1)′ is an m-dimensional vector of variables including all latent exogenous and endogenous variables. f1(·) is a smooth polynomial function mapping the m-dimensional variable vector η1 to an m(-dimensional vector f1(η1). f1(η1) could be a vector that includes product variables [e.g., (η11, η12, η11 η12)′ or (η11, (η11)2, η12, (η12)2)′] (e.g., Schumacker and Marcoulides, 1998; Kelava et al., 2011) or splines (Freund and Hoppe, 2007). K1 is a J × Q( matrix with regression coefficients. x1 is a Q-dimensional vector of all observed unexplained (within) covariates that may have an additional influence on the indicator variables y*. g1(·) is a smooth polynomial function mapping the Q-dimensional vector of covariates to a Q(-dimensional vector g1(x1), and ϵ1 is a J-dimensional vector of residual variables with a zero mean vector and covariance matrix Θ1. For observed categorical variables y, a normality assumption for ϵ1 is equivalent to a probit regression for y on η1 and x1. Alternatively, for dichotomous variables y, ϵ1 can have a logistic distribution, resulting in a logistic regression. For count variables y, the residual ϵ1 is assumed to be zero. For normally distributed continuous variables y, the residual variable ϵ1 is assumed to be normally distributed. . Suppose that in the above-mentioned math test example, data for two additional constructs (attitude toward reading and the teaching strategies experienced by the student) were collected with three items for each construct. The measurement model [cp. Equation (3)] is illustrated in Figure 2, and accordingly, it assumes two latent factors η11 (attitude toward reading) and η12 (experienced teaching strategies). For didactical purposes, all schools here belong to one class D = 1, so that the index d can be omitted, and there is no between-level model. Furthermore, heterogeneity is assumed on the within-level such that each pupil i belongs to an unobserved class (mixture) C. The example measurement model derived from the framework above is a confirmatory factor mixture model that is given by y| = ν1 + Λ1η1 + ϵ1. The heterogeneity, which is implied by the mixture c, can be accounted for differently by the (statistical) model depending on the hypothesized population model: First, a non-normal distribution of the latent variables can be modeled as a mixture distribution. For example, attitude toward reading might not be normally distributed. A mixture distribution of η11 (with varying expectations and covariance structure for each mixture component c) could represent the non-normality (see Kelava et al., 2014). Second, the measurement model might be completely different for each unobserved subgroup (with varying factor loadings etc.). For example, some pupils might have poor reading skills, and hence, do not understand the items well enough. As a consequence, factor loadings in this subgroup may be lower (or residual variances may be larger) compared with other subgroups. and such differences may lead in turn to an observed heterogeneity.

Figure 2

A measurement model for subject . The mixture distribution is symbolized by the frame with dashed lines. It was assumed that all subjects belonged to one latent class D = 1 on the between-level so that the index d could be omitted.

3.2.2. Structural model

The structural model for the latent variable vector η1 is given for each subject i in cluster k by where α is an m-dimensional vector of intercepts, B1 is an m × m( loading matrix. F1(·) is a smooth polynomial function mapping the m-dimensional vector of latent variables η1 to an m(-dimensional vector F1(η1). Γ1 is an m × Q( matrix with regression coefficients. G1(·) is a smooth polynomial function mapping the Q-dimensional vector of covariates x1 to a Q(-dimensional vector G1(x1). Note that for identification purposes, vector G1(x1) has to be completely different from vector g1(x1). ζ1 is an m-dimensional vector of residual variables with zero mean vector and covariance matrix Ψ1.

3.2.3. Mixture part

The model for the latent categorical variable C is a multinomial logit model where a1 and b1 are regression coefficients, and h1(·) is again a smooth (e.g., polynomial) function. . In the following illustrative example, the math skills of pupil i from school k (η13) are predicted by the attitude toward reading (η11) and by experienced teaching abilities (η12; see also the example above). All three constructs are modeled as latent variables, which are measured with three indicator variables each. In addition, we assume that math skills can be predicted by gender, which is introduced into the model as an observed covariate (x11). For simplicity, the model is restricted to the within-level. Furthermore, it is assumed that there is unobserved heterogeneity due to a latent class C. Membership in one of the latent classes is predicted by a second observed covariate x12 (e.g., additional private math lessons). In contrast to an ordinary linear approximation of the relationship between the latent variables, the unknown and potentially curvilinear relationship is approximated by a latent spline model. Figure 3 illustrates the proposed model; the semiparametric spline model is indicated by the snake-type arrow.

Figure 3

Structural model for subject . Note that this figure shows only a single-level model; the index d is therefore omitted.

3.3. Level 2 – between (cluster) level

The multilevel (between) part of the model is conceptualized as follows. Each of the intercepts (ν1, α, a1) and slopes or loading parameters (Λ1, K1, B1, Γ1, b1) in Equations (3), (4), and (5) can be either a fixed coefficient or a random effect that varies across the observed clusters k.

3.3.1. Structural model

Let η2 be the U-dimensional vector of all such random effect variables and any additional between-level latent exogenous variables that explain these random effects and vary across the clusters. Note that η2 is different from η1 which is the individual-level latent variable vector. For a given cluster k, the between-level structural model for η2 is defined as where μ is a U-dimensional vector of intercepts, and B2 is a U × U( loading matrix. F2(·) is a smooth polynomial function mapping the U-dimensional vector of variables η2 to a U(-dimensional vector F2(η2). Γ2 is a U × V( matrix with regression coefficients. x2 is a V-dimensional vector of all observed unexplained between-level covariates that may have an additional influence on the variables in vector η2. Note that x2 is different from x1. G2(·) is a smooth polynomial function mapping the V-dimensional vector of between-level covariates x2 to a V(-dimensional vector G2(x2). ζ2 is a U-dimensional vector of residual variables with a zero mean vector and covariance matrix Ψ2. μ, B2, and Γ2 are fixed parameters. . Suppose that the model in Figure 3 is extended to allow for multilevel effects on the between-level (Level 2). In Figure 4 depicts a latent random intercept model that implies a school-specific intercept (α3) for school k when the math skills (η13) of a given pupil i are examined. In order to approximate a potentially non-normal distribution of the school-specific intercepts or to reveal a certain heterogeneity in the latent intercepts (i.e., average math skills), a latent mixture model with the latent categorical variable D is applied. This mixture reflects Level-2 heterogeneity that may stem from (unobserved) sources, for example, certain school characteristics that influence the average math skills in school k.

Figure 4

Structural model for subject .

3.3.2. Measurement model

Let z* be the L-dimensional vector for cluster k that includes scores on all observed between-level variables that are indicators of the latent variables in vector η2. For a given cluster k, the measurement model is defined by where ν2 is an L-dimensional vector of intercepts, Λ2 is an L × U( loading matrix. f2(·) is a smooth polynomial function mapping the U-dimensional vector of variables η2 to a U(-dimensional vector f2(η2). K2 is an L × V( matrix with regression coefficients. x2 is the V-dimensional vector of all observed unexplained between-level covariates that may have an additional influence on the indicator variables z*. g2(·) is a smooth polynomial function mapping the V-dimensional vector of between-level covariates x2 to a V(-dimensional vector g2(x2). Note that for identification purposes g2(x2) has to be completely different from G2(x2). ϵ2 is a L-dimensional vector of residual (mixture) variables with a zero mean vector and covariance matrix Θ2. ν2, Λ2, and K2 are fixed parameters.

3.3.3. Mixture part

The model for the between-level categorical variable D is also a multinomial logit regression where a2 and b2 are regression coefficients, and h2(·) is again a smooth (e.g., polynomial) function. . In this last example (see Figure 5, the random intercept model in Figure 4 has been expanded by adding two latent Level-2 predictor variables (η21 and η22) that may influence the average math-skill level, for example, structural problems and social problems in school. Besides the linear effects of the latent predictors, there is an interaction effect that models the hypothesis that high scores on both between-level predictors may lead to a particularly low (or high) average math-skill level. A potential heterogeneity of the latent predictors (e.g., a non-normal distribution) is taken into account by introducing a latent categorical variable D. In addition, a manifest predictor variable x21, for example, school size or school type (private or public), is included in the model to predict the latent class probability of D as described more generally in Equation (8).

Figure 5

Structural model for subject . The distribution of the between-level's predictors is approximated by a mixture of normal distributions. The latent categorical variable D is predicted by a between-level covariate x21.

3.4. Summary

In the model described by Equations (3) to (8), the latent variables on Level 1 (η1, ϵ1, and ζ1) and on Level 2 (η2, ϵ2, and ζ2) are conceptualized as variables stemming from mixtures on level 1 and level 2, respectively. The possibility of specifying within- and between-level mixture components is a result of introducing the latent categorical variables C and D on the individual and cluster levels, respectively. On the within-level, unobserved latent classes may refer to different subpopulations (within each cluster), for example, pupils with different socioeconomic backgrounds in a given school. On the between-level, latent mixtures additionally allow for a specification of heterogeneity across/between observed clusters, for example, heterogeneity that is caused by certain characteristics of the schools. Furthermore, due to the conceptualization of mixture variables, a semiparametric modeling of non-normally distributed latent variables is available (e.g., Yang and Dunson, 2010; Kelava and Nagengast, 2012; Kelava et al., 2014), or a simple semiparametric formulation of the latent relationships (e.g., Bauer, 2005) is possible. Finally, the implementation of general polynomial functions F1(·), f1(·), G1(·), and g1(·) allows for a flexible inclusion of different parametric or semiparametric relationships (e.g., interaction effects or splines; Hastie et al., 2009), which extends the opportunities to model non-linear effects (e.g., Guo et al., 2012; Song et al., 2013).

4. Model identification

As in any other latent variable framework, within the GNM-SEMM framework, the user must ensure that the specified model is identified. In this section, we will summarize important aspects that need to be considered even though model identification is not straightforward (cf. San Martín et al., 2011; Song et al., 2013). For the identification of the proposed model, four separate aspects need to be taken into account. However, the actual identification of a specific model needs to be examined individually. First, within each mixture component standard assumptions for non-linear structural equation models need to be met. This mainly implies that restrictions be placed on manifest scaling variables or latent exogenous variables (e.g., a necessary condition for the identification is to set one factor loading for each latent predictor variable or the latent predictors' variance to one). In addition, either the latent intercepts of the indicator variables or the latent intercepts of the latent variables may be estimated in a model. Note that when latent exogenous variables (e.g., η11, η12) are identified, their latent product terms (e.g., η11 η12) do not need product indicators for identification (cf. Klein and Moosbrugger, 2000). Second, regarding the inclusion of polynomial functions for the observed covariates, it is necessary that the vectors g1(x1) and G1(x1) on Level 1 and, respectively, the vectors g2(x2) and G2(x2) on Level 2 are completely different from each other. For example, a model including g1(x1) = G1(x1) = (x11, x211)' would not be identified because x11 would be a predictor in the measurement and structural models [see Equations (3) and (4)]. In this case, two effects of x11 would be estimated simultaneously on the right side of one regression equation, which would not be identified. The same holds for the polynomial functions of the latent variables. Again, f1(η1) and F1(η1) on Level 1 as well as f2(η2) and F2(η2) on Level 2 have to be unequal [see Equations (7) and (6)]. Otherwise, perfect collinearity would be the result, meaning that the covariates and latent variables, respectively, would have the same influence on the measurement and the structural models. Their impacts would not be separable. Furthermore, polynomial (semiparametric) functions should not include constants. Otherwise, latent intercepts in the measurement and structural models would not be identified. Third, on the between (cluster) level the inclusion of latent exogenous variables, which explain the variability in the random coefficients, requires measurement models (see Figure 5). The exogenous latent variables at Level 2 need to be identified as well according to identification rules, which are the same as in single-level structural equation models. Fourth, additional assumptions concerning the latent classes of the mixture components are required. For the identification of the discrete latent variables, (a) the unconditional probabilities in Equations (5) and (8) need to sum up to one. and (b), the ambiguity of mixture components due to the so-called label switching problem makes it necessary to impose additional (artificial) constraints or relabeling strategies e.g., restrictions on the mean structure or ordinality of mixture proportions (see Equations 15–19; Redner and Walker, 1984; Stephens, 2000; Kelava and Nagengast, 2012). Note that the identification of separate parts of a model (e.g., the measurement model and the structural model) does not automatically imply that the whole model is identified. General necessary and sufficient conditions to guarantee the identifiability of a latent variable model are difficult to establish. Hence, we focus primarily on the necessary identification conditions in this article.

5. Model estimation

Generally speaking, latent variable modeling offers a large variety of methods for the estimation of specified models. The choice of the best estimation method strongly depends on the distributional assumptions of the observed and latent variables, the given sample size, the type of specified model, potential confounders, and many more aspects. Just to mention a few large classes, these methods comprise maximum likelihood estimators (e.g., Jöreskog, 1973; Rabe-Hesketh et al., 2005; Muthén and Asparouhov, 2009), least squares methods (e.g., Joreskog and Goldberger, 1972; Browne, 1974, 1984), and methods of moments (e.g., Bentler, 1983), among others. For example, when applying a maximum likelihood estimator, in the well-known EM algorithm (Dempster et al., 1977), which treats latent variables as missing data, the likelihood L of the observed indicator vector y is given as: where f1(·), ψ1(·), and ψ2(·) are probability density functions for the observed variables y, and the latent variables η1 and η2, respectively (cf. Muthén and Asparouhov, 2009). Because the likelihood function L of the observed indicator vector y is not given in closed form in general, numerical integration can be utilized in the evaluation of the likelihood using both adaptive and non-adaptive quadrature. As an alternative, the likelihood could be directly optimized by applying a quasi-Newton algorithm. Both approaches of estimating parameters are very complex due to the non-linearity (for a discussion of latent interaction effects, see Klein and Moosbrugger, 2000). However, in recent years, the Bayesian framework has become very popular in latent variable modeling (e.g., Lee et al., 2004; Lee, 2007; Lee et al., 2007; Song et al., 2009). The main reason is that it provides flexible options for specifying and estimating models. Bayesian estimation methods and algorithms (e.g., Markov Chain Monte Carlo: MCMC) can handle numerous complex parametric, semiparametric, and non-parametric relationships and distributions, for example, latent mixture distributions (e.g., Yang and Dunson, 2010; Kelava and Nagengast, 2012), non-linear models (e.g., Lee et al., 2007; Guo et al., 2012; Song et al., 2013), and multilevel structures (e.g., Fox and Glas, 2001; Song and Lee, 2004). Referring to the proposed GNM-SEMM framework with its semiparametric functional forms and its capability of considering non-normally distributed variables, a Bayesian approach seems to be a viable way to estimate models. In this sense, we will provide general descriptions of the specifications of the variables' distributions and the selection of prior distributions. Parameter vectors are defined as follows: For the Level-1 parameters, let θ = (ν′1, vec(Λ1)′, vec(K1)′, vec(Θ1)′)′ for the measurement model, where vec(·) vectorizes all elements of a given matrix. For the structural model, let θ = (α′, vec(B1)′, vec(Γ1)′, vec(Ψ1)′)′, and for the mixture model part let θ = (a1, b′1)′. Analogously, for the Level-2 parameters, let θ = (ν′2, vec(Λ2)′, vec(K2)′, vec(Θ2)′)′ for the measurement model. For the structural model, let θ = (μ′, vec(B2)′, vec(Γ2)′, vec(Ψ2)′)′, and for the mixture model part let θ = (a2, b′2)′. Finally, let θ, θ, θ, θ, θ, and θ be the vectors that include all parameters from the defined model parts across all latent classes c = 1, …, C*, d = 1, …, D*, and observed clusters k = 1, …, K.

5.1. Specification of the variables' distribution

5.1.1. Level 1

For the Bayesian analysis, the j = 1, …, J indicator variables on Level 1 are specified as a cluster-specific mixture distribution. The single mixture is given as where μ(θ, θ, x1) is the vector of conditional expectations of y*, which are specified in Equation (3) and depend on the parameter vectors θ and θ, and on the covariate vector x1. Θ−11 is the precision matrix of the multivariate normal distribution of the measurement error variables (i.e., the inverse of the covariance matrix). The model implies a specific mean vector and covariance matrix for subjects stemming from a certain latent class c on Level 1 that is clustered in a latent class d on Level 2, which in turn is given for an observed cluster k. Within each cluster k, y* is a mixture of D* components, which model heterogenity in the observed clusters. Further, within in each mixture component d, y* is a mixture of C* components, which induce heterogenity on the individual level (C* may change across different latent classes on Level 2). The latent variables η1 on Level 1 are specified as with the vector μη(θ, x1) of conditional expectations for η1 that depend on the parameter vector θ and covariate vector x1 as specified in Equation (4) as well as in the precision matrix Ψ−11.

5.1.2. Level 2

Analogous to the specification of the variables' distributions on Level 1, the indicator vector z* is modeled as with the vector μ(θ, θ, x2) of conditional expectations for z* as specified in Equation (7) and precision matrix Θ−12. The unconditional indicator vector z* is composed of D* mixture components. Finally, the distribution of the latent variable vector η2, is given as with the vector of conditional expectations μη(θ, x2) specified in Equation (6) and precision matrix Ψ−12.

5.2. Specification of prior distributions

For the prior specification, informative or non-informative priors can be selected (Gelman et al., 2004). This selection is primarily based on the availability of prior knowledge. Because the application of non-informative priors may lead to suboptimal solutions (e.g., Lee et al., 2007), it may be necessary to analyze parts of the model (e.g., a confirmatory factor analysis for the Level-2 predictors) to obtain information about the parameters. Here, a very general description of the proposed model is provided. For a detailed description of priors see Gelman et al. (2004). The class probabilities Pr(C = c|D = d, x1) and Pr(D = d|x2) depend on the multinomial logit models given in Equations (5) and (8) and thus depend on the parameters in θ and θ. For these parameters, uninformative priors are suggested unless information about heterogeneity is available (see also Kelava and Nagengast, 2012). For each precision matrix of the mixture distributions defined above, that is for Θ−11, Θ−12 for the indicator variables, and for Ψ−11, Ψ−12 for the latent variables, a multivariate normal distribution is assumed within each component. Conjugate priors are then given for c = 1, …, C*, d = 1, …, D* as The hyperparameters ρ and the (positive definite) matrices Θ, Θ, Ψ, and Ψ of the Wishart distribution include parameter information that may stem from previous studies or knowledge about the parameters. For example, Ψ2d includes information about the variances and covariances of the random coefficients, and about the latent endogenous and exogenous variables on Level 2. This information may refer to estimates of the (co)variances for the latent exogenous variables retrieved from a separately estimated confirmatory factor analysis. The conjugate priors can be modified, for example, if the residual covariance matrix Θ2 on Level 2 is assumed to be diagonal, then each diagonal element Θ2 (j = 1, …, J) can be assumed to be inverse Gamma distributed, that is (Θ2)−1 ~ Gamma(αΘ, βΘ) (with hyperparameters α, β) (Kelava and Nagengast, 2012). Further information about the selection of priors for count or ordinal data can be found in Song et al. (2013). For the other parameters in θ, θ, θ, and θ, normally distributed priors are used within each mixture component. Though, the definition of some priors needs to be formulated recursively (cf. Kelava and Nagengast, 2012). For example, let ν1 be the j-th element of the vector ν1 (which specifies the intercept of the j-th variable in y*|), and let Θ1 be the j-th diagonal element in the matrix Θ1. Then for the latent classes c = 1, d = 1, the conjugate (normal) prior for ν1 is specified as with hyperparameters H0 and ν1k11 that include information about ν1. For all other latent classes, that is c > 1 or d > 1, the following prior is selected: with If parameters are constrained to be the same across mixture components (e.g., ν1 = ν1 and Θ1 = Θ1), Equations (15) to (19) simplify to For the other parameter matrices, that is for Λ1, K1, α, B1, Γ1 and so forth on Level 1 and ν2, Λ2, K2, μ, B2, Γ2 and so forth on Level 2, a specification corresponding to the formulation above given is straightforward when the appropriate precision matrices are used. In order to avoid the label-switching problem in a mixture distribution, only one of the parameter matrices needs to be formulated recursively.

6. Empirical example

In this section, we will provide an extensive illustration of the GNM-SEMM with an example that is based on data from the Program for International Student Assessment 2009 (PISA; Organisation for Economic Co-Operation and Development, 2010), which is publicly available under http://pisa2009.acer.edu.au/downloads.php. The sample was a German subsample of N = 1, 474 pupils from 226 schools who took a math test. Additional covariate information were available on the individual level as well as on the school level. As before, we predicted pupil's math skills (Math) with their general attitude toward reading (Att) and the teaching strategies they experienced (Strat). We further expected that pupil's average math skills (latent intercept of Math) would vary systematically across schools, and that this variation could be (partly) accounted for by Level-2 predictors with measurement errors, here, structural problems in school (Prob) and the schools's social environment (Soc). We will report the results for a model that accounted for different aspects of the general model. The example is not exhaustive with regard to all potential parameters within the GNM-SEMM framework, but it provides an indication of the flexibility of the proposed framework in accommodating different aspects of the data: A spline model on Level 1 described a semiparametric flexible relationship between Att, Strat, and Math. A random intercept for Math was explained by the Level-2 predictors Prob and Soc, and the interaction effect between them. Furthermore, a mixture model accounted for the non-normality of the latent predictors on Level 2 (heterogeneity).

6.1. Model formulation

In the following, we will provide the specified measurement and structural equations for the model. For reasons of clarity, we restricted the subscripts (k, c or d) in the model formulation to those model parameters that actually depended on the latent classes or the Level-2 model. Figure 6 presents a diagram of the model and its parameters.

Figure 6

Structural models and measurement models on the within-level (Level 1) and between-level (Level 2). On Level 1, the math skill (Math) of a pupil i is predicted by his/her general attitude toward reading (Att) and his/her experienced teaching strategies (Strat). The snake-type arrows indicate a flexible spline approximation of the latent variable relationship. On Level 2, the average math skills of pupils (latent intercept α3) in school k are explained by a nonlinear interaction between structural problems in the school (Prob) and the school's social environment (Soc). The non-normality of the latent predictors is approximated by a mixture distribution.

6.1.1. Structural models

The Level-1 structural model [cf. Equation (4)] for the i-th pupil in school k was given by where F11 and F22 both defined a latent cubic spline model with two knots at ξ1 = 2, ξ2 = 3 that approximated the (curvilinear) relationships between the variables (e.g., Hastie et al., 2009): Only the latent intercept α3 was assumed to vary across schools. The Level-2 structural model [cf. Equation (6)] for school k was given by with η2 = (Prob, Soc, α3)′ and F2(η2) = (Prob, Soc, Prob · Soc)′. The product term Prob · Soc implemented the interaction effect of the structural problems in school and the social environment. Because the non-normal distributions of the latent predictors were approximated by a mixture distribution, their expectations μ1 and μ2 were assumed to vary across the unobserved mixtures (Kelava and Nagengast, 2012).

6.1.2. Measurement models

For each of the latent variables between nine and 13 items were available; they were aggregated to three indicator variables for each latent variable (item parcels) for didactic purposes. It was assumed that the indicator variables were continuously distributed, resulting in an identity link function in the measurement model (y* = y and z* = z, respectively). On Level 1, the measurement model for pupil i in the k-th school [cf. Equation (3)] was given by where f1(η1) = (Att, Strat, Math)′. On Level 2, the measurement model [cf. Equation (7)] was given by where f2(η2) = (Prob, Soc)′. The factor loading matrices Λ1 and Λ2 were formulated with a simple structure (i.e., each item loaded on only one latent variable). The residual variables ϵ1 and ϵ2 were assumed to be mutually uncorrelated and normally distributed with zero mean vectors and (diagonal) covariance matrices Θ1 and Θ2, respectively.

6.1.3. Parameter constraints and identification

Besides employing the standard identification constraints for structural equation models, we restricted the measurement model parameters and the structural model parameters to be the same across schools except for the latent intercept α3. Due to the invariance of the measurement models for the latent predictors on Levels 1 and 2, in Equations (24) and (25) the non-linear effects in the polynomial spline model and the interaction effect in Equations (22) and (23) were identified. For the mixture model, we fit two latent classes (D = 1, 2).

6.2. Model estimation

To keep this example as simple as possible, missing data were assumed to be missing at random, and this was accounted for directly in the analysis by applying the Gibbs sampler (Gelman et al., 2004). The analysis of the latent multilevel model was implemented by using the R-project software (R Core Team, 2013) and the OpenBugs package (Lunn et al., 2009). Syntax for the empirical example can be obtained upon request from the authors.

6.2.1. Starting values and prior selection

Starting values for the measurement model parameters were based on the prior analyses conducted in Mplus Muthén and Muthén (1998–2010) for separate parts of the model. Informative priors were then selected in accordance with recommendations by Gelman et al. (2004) as well as Kelava and Nagengast (2012).

6.2.2. Bayesian analysis

For the analysis, three chains with 100,000 iterations each were generated. The first 75,000 iterations (burn in) were then discarded. As proposed by Gelman (1996), convergence of the estimation procedure was achieved when all (EPSR Estimated Potential Scale Reduction; Gelman, 1996) values were below 1.2, which occurred after about 60,000 iterations (see the Supplementary Material, Figure S1). Trace plots also indicated good convergence (see the Supplementary Material, Figure S2). Means, standard errors, t-values, and percentiles of the posterior distributions of the parameter estimates based on the last 25,000 iterations are reported in the next subsection.

6.3. Results

We will summarize the main results in this subsection. Detailed results for the estimated multilevel model are presented in Table 1. In the measurement models, the factor loadings were all significant and positive, thus indicating that the latent constructs were measured reliably.

Table 1

Mean parameter estimates, standard errors, .

	θ^¯	SE	t-value	Percentiles
				2.5%	50.0%	97.5%
LEVEL-1 PARAMETERS
Intercepts
ν₁₂₁	−1.078	0.076	−14.190	−1.229	−1.076	−0.933
ν₁₃₁	−0.409	0.072	−5.709	−0.557	−0.408	−0.275
ν₁₅₂	0.411	0.118	3.484	0.173	0.414	0.633
ν₁₆₂	−0.419	0.175	−2.399	−0.769	−0.411	−0.089
ν₁₈₃	0.058	0.018	3.317	0.023	0.058	0.092
ν₁₉₃	0.340	0.016	21.069	0.308	0.340	0.372
Factor loadings
λ₁₂₁	1.141	0.026	43.986	1.091	1.141	1.192
λ₁₃₁	0.997	0.024	40.814	0.951	0.996	1.047
λ₁₅₂	0.687	0.043	15.864	0.605	0.686	0.774
λ₁₆₂	1.213	0.064	18.849	1.091	1.210	1.343
λ₁₈₃	0.754	0.030	25.279	0.696	0.754	0.814
λ₁₉₃	0.553	0.027	20.437	0.501	0.553	0.607
Path coefficients
β₁₁	0.005	0.161	0.031	−0.312	0.004	0.320
β₁₂	0.009	0.117	0.079	−0.221	0.009	0.237
β₁₃	−0.005	0.029	−0.174	−0.061	−0.005	0.052
β₁₄	0.046	0.068	0.678	−0.088	0.046	0.180
β₁₅	−0.164	0.104	−1.580	−0.367	−0.165	0.037
β₂₁	−0.070	0.192	−0.366	−0.453	−0.060	0.287
β₂₂	0.079	0.080	0.992	−0.073	0.077	0.239
β₂₃	−0.017	0.012	−1.449	−0.040	−0.017	0.006
β₂₄	0.007	0.016	0.443	−0.025	0.007	0.039
β₂₅	0.018	0.029	0.620	−0.040	0.018	0.073
Means/intercepts of latent variables
α₁	2.856	0.022	129.972	2.813	2.856	2.899
α₂	2.700	0.019	143.505	2.663	2.700	2.737
(Co)variances of latent variables
ψ₁₁₁	0.506	0.025	20.357	0.459	0.506	0.556
ψ₁₂₁	0.072	0.012	5.866	0.048	0.072	0.097
ψ₁₂₂	0.250	0.019	13.076	0.215	0.250	0.291
ψ₁₃₃	0.041	0.003	15.847	0.036	0.041	0.046
Residual variances
θ₁₁₁	0.147	0.009	16.124	0.129	0.147	0.165
θ₁₂₂	0.198	0.012	16.607	0.176	0.198	0.222
θ₁₃₃	0.212	0.011	19.288	0.191	0.212	0.234
θ₁₄₄	0.212	0.014	14.708	0.184	0.213	0.241
θ₁₅₅	0.323	0.014	22.999	0.297	0.323	0.352
θ₁₆₆	0.219	0.019	11.251	0.181	0.219	0.257
θ₁₇₇	0.066	0.003	19.760	0.059	0.066	0.072
θ₁₈₈	0.047	0.002	20.197	0.042	0.047	0.052
θ₁₉₉	0.049	0.002	23.364	0.045	0.049	0.053
LEVEL-2 PARAMETERS
Latent class probabilities
P (D = 1)	0.532	0.255	2.082	0.069	0.537	0.955
P (D = 2)	0.468	0.255	1.835	0.045	0.463	0.931
Intercepts
ν₂₂₁	0.759	0.415	1.829	−0.063	0.767	1.550
ν₂₃₁	0.603	0.277	2.180	0.048	0.603	1.143
ν₂₅₂	−0.024	0.184	−0.129	−0.391	−0.020	0.331
ν₂₆₂	0.279	0.179	1.556	−0.077	0.281	0.614
Factor loadings
λ₂₂₁	1.029	0.206	4.999	0.635	1.027	1.439
λ₂₃₁	0.700	0.137	5.113	0.434	0.699	0.970
λ₂₅₂	1.002	0.091	11.071	0.828	1.001	1.180
λ₂₆₂	0.794	0.088	8.992	0.629	0.793	0.969
Path coefficients
β₃	0.558	0.101	5.512	0.381	0.550	0.776
β₄	0.442	0.108	4.072	0.261	0.432	0.680
β₅	−0.289	0.053	−5.483	−0.405	−0.285	−0.199
Means/intercepts of latent variables
μ₁₁	1.921	0.099	19.408	1.685	1.935	2.076
μ₂₁	1.938	0.081	24.063	1.739	1.950	2.062
μ₁₂	2.107	0.123	17.088	1.931	2.084	2.424
μ₂₂	2.091	0.104	20.076	1.955	2.071	2.367
μ₃	−0.365	0.162	−2.248	−0.686	−0.361	−0.059
(Co)variances of latent variables
ψ₂₁₁	0.291	0.049	5.959	0.207	0.287	0.396
ψ₂₂₁	0.007	0.024	0.304	−0.038	0.007	0.055
ψ₂₂₂	0.239	0.030	7.986	0.186	0.237	0.305
ψ₂₃₃	0.051	0.005	11.169	0.042	0.050	0.060
Residual variances
θ₂₁₁	0.415	0.059	6.970	0.305	0.413	0.539
θ₂₂₂	0.723	0.098	7.412	0.543	0.718	0.927
θ₂₃₃	0.366	0.046	7.898	0.280	0.364	0.461
θ₂₄₄	0.183	0.022	8.204	0.143	0.182	0.230
θ₂₅₅	0.130	0.017	7.763	0.100	0.129	0.165
θ₂₆₆	0.176	0.020	8.940	0.141	0.175	0.217

Mean parameter estimates, standard errors, . The results for the semiparametric approximation of the true relationships between the Level-1 latent variables Att, Strat, and Math are illustrated in Figure 7. The relationship between Math and Att resembled a cubic relationship; the subjects' Math scores slowly increased with increasing Att scores, whereby a stronger increase was found for Att scores greater than 3 and a slight decrease for Att scores greater than 4. The relationship between Strat and Math seemed to be slightly quadratic with the highest Math scores for medium Strat scores.

Figure 7

Semiparametric Level-1 relationships between . The gray crosses indicate the predicted slope with a predicted school-specific random intercept; the black line indicates the predicted Math score for the mean random intercept. In order to test the hypotheses on the cubic relationship for Att and the quadratic relationship for Strat, we estimated a model that changed Equation (22) to β1F(Att) = β11Att + β12Att2 + β13Att3 and β2F12(Strat) = β21 Strat + β22Strat2. Results for the structural parameters on the within-level can be found in Table 2. The parametric cubic relationship for Att was not significant (13 = 0.003, p = 0.745 for the cubic effect and 11 = − 0.045, p = 0.723 for the linear effect). The attitude toward reading did not significantly predict the math ability. The parametric model for Strat indicated a significant negative quadratic relationship (22 = −0.034, p = 0.037). This indicated that pupils' math skills were highest for those subjects who rated the experienced teaching strategies as average.

Table 2

Mean parameter estimates, standard errors, .

	θ^¯	SE	t-value	Percentiles
				2.5%	50.0%	97.5%
PATH COEFFICIENTS
β₁₁	−0.045	0.139	−0.324	−0.307	−0.045	0.232
β₁₂	0.005	0.055	0.097	−0.105	0.006	0.112
β₁₃	0.003	0.007	0.354	−0.012	0.003	0.017
β₂₁	0.154	0.082	1.877	0.000	0.157	0.329
β₂₂	−0.034	0.016	−2.086	−0.067	−0.034	−0.003

Mean parameter estimates, standard errors, . On Level 2, the random intercept factor α3 had a significant negative intercept (3 = −0.365, p = 0.024) and an unexplained variance across schools of 233 = 0.051. The linear effects of the predictors were significant with 3 = 0.558 (p < 0.001) for school problems (Prob) and 4 = 0.442 (p < 0.001) for social problems (Soc). The interaction effect was significant and negative with 5 = −0.289 (p < 0.001). Figure 8 illustrates the complex non-linear association between Prob, Soc, and the random intercept α3. The expected math level of a school with an average score on school and social problems was about 0.5 (E[α3|Prob = , Soc = ] = 0.461); the expected math level was higher in schools for which one of the problems was above average and the other was below average; and the math level decreased rapidly when both problems increased together.

Figure 8

Between-level: Three-dimensional illustration of the relationship between school problems (Prob), social problems (Soc), and the random intercept α.

Between-level: Three-dimensional illustration of the relationship between school problems (Prob), social problems (Soc), and the random intercept α. Finally, the results of the mixture model for the Level-2 predictors are illustrated in Figure 9. As can be inferred from Figure 9, the distribution of the latent variables was slightly non-normal. In this empirical example, the means of the latent variables in the two classes were marginally different (with means of about 11 ≈ 21 ≈ 1.9 in Class 1 and 12 ≈ 22 ≈ 2.1 in Class 2). Additional analyses may reveal the necessity to increase or decrease the number of latent classes (e.g., using the DIC). Here, the DIC was 14,780 for a model including the mixtures and 14,770 for a model without the mixture distribution. This indicates that a mixture may not have been necessary in this case.

Figure 9

Predicted slightly non-normal densities of the Level-2 predictors Prob and Soc obtained from a two-class solution.

7. Discussion

In this article, we presented a generalized non-linear multilevel structural equation mixture model (GNM-SEMM) framework. A key characteristic its ability to specify non-linear functional relationships between outcome variables on one side and latent predictors or manifest covariates on the other side by using semiparametric regression functions (e.g., splines; Freund and Hoppe, 2007; Hastie et al., 2009). This feature is given for both levels, the within and between (cluster) levels of nested data structures. Given that (multilevel) latent variable modeling frameworks are typically linear (Bollen, 1989; van der Linden and Hambleton, 1997; Rabe-Hesketh et al., 2004; Muthén and Asparouhov, 2011), the relaxation of the linearity assumption is a step forward toward a more realistic approximation of a non-linear world. It thus extends the hitherto available multilevel modeling frameworks. A second key characteristic is the ability to specify latent mixture distributions on both levels. As in recent semiparametric latent variables approaches (e.g., Bauer and Curran, 2004; Bauer, 2005; Kelava et al., 2014), this allows for an approximation of non-normally distributed latent predictor variables for a thorough introduction with regard to manifest variables, see McLachlan and Peel (2000). Again, the relaxation of a typical assumption that can be found in most applications of latent variable modeling allows for a more precise modeling of relationships for heterogeneous populations or distributions. A third key characteristic of the proposed approach is that it is flexible enough to specify a large number of special cases. For example, it offers the ability to approximate a non-normal distribution using mixture modeling and provides an easy way to interpret the parametric functional form of the latent variable relationship. As another example, it is possible to specify a non-linear latent variable relationship in one subpopulation but not in the other. The same is true for different levels. If functional forms of the relationships are unknown, semiparametric approximations of these relationships are also possible using mixtures. Taken together, these properties are desirable. Nevertheless, the identification and estimation of the models is a general issue. Additional assumptions have to be introduced as was exemplified in the sections before (see Level-1 section on the measurement model). Fortunately, these assumptions are standard identification assumptions in latent mixture, latent (non)linear, and (semi)parametric modeling, but researchers should be careful when specifying models. For example, multiple intercepts in spline models might lead to identification issues. However, the wide range of specifiable models offers a variety of adaptable estimators that could be applied from a theoretical standpoint. Bayesian MCMC, Newton-type algorithms, and adapted EM-Algorithms are just a few examples. In this paper, we also used a substantive example from educational science. A complex model was applied to data from the large-scale PISA study (Organisation for Economic Co-Operation and Development, 2010) illustrating several conditions that may occur in empirical data. First, an a priori unknown curvilinear relationship between the latent variables was identified on Level 1 using a semiparametric latent spline model. Second, the proposed mixture part on Level 2 could be used to control for the potential non-normality of the latent Level-2 predictors. In this example, only a slight indication of non-normality was visible. The model may have also been extended to include a mixture model on Level 1. Third, on Level 2 a latent random intercept modeled a school-dependent math skill, which allowed us to account for the clustering of the data. The random intercept was predicted by a latent non-linear interaction model. The model may be extended further, for example, to test the linearity assumption on Level 2 of the relationship between the latent variables apart from the interaction effect. Other random effects could also be included. In any case, the specification of these effects should be theory-driven. Finally, we want to mention two important considerations. The proposed model should be viewed as a general framework that includes a variety of different possible models. A model that includes all aspects as presented in the model section would be highly parameterized and may overfit the data. In each empirical situation, we recommend that the actual applied model be restricted to a simpler model that allows for an adequate but parsimonious representation of the data. A decision concerning the necessity to include different parts of the model depends on the hypothesized model (e.g., random factor loadings in a confirmatory factor model or a latent spline to predict a latent slope in the structural model) and on model comparisons. In the Bayesian framework, Bayesian indices/information criteria for model selection (e.g., the deviance information criterion, DIC: Spiegelhalter et al., 2002; Celeux et al., 2006; or the Bayes factor, Bernardo and Smith, 1994) are the primary model fit indices, although they only allow only for a model comparison to be made, and they are not absolute fit indices. In general, for (both frequentist and Bayesian) non-linear models there are no absolute fit indices (Klein and Schermelleh-Engel, 2010). Hence, a top-down (or bottom-up) strategy using information criteria may be a viable way to improve the model (i.e., to restrict the model to its necessary parts). An illustration of such a strategy for multilevel models in general can be found, for example, in West et al. (2007). Furthermore, we did not show how to implement the presented framework with statistical software. In this article, a Bayesian estimator was applied and implemented in OpenBugs, thus allowing us to analyze a complete but specific semiparametric non-linear multilevel model. Future research should improve this implementation so that it will be feasibly available within standard statistical latent variable software (e.g., Mplus) that can be directly applied to different models by the substantive researcher.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

12 in total

1. A method of moments technique for fitting interaction effects in structural equation models.

Authors: Melanie M Wall; Yasuo Amemiya
Journal: Br J Math Stat Psychol Date: 2003-05 Impact factor: 3.380

2. Bayesian analysis of two-level nonlinear structural equation models with continuous and polytomous data.

Authors: Xin-Yuan Song; Sik-Yum Lee
Journal: Br J Math Stat Psychol Date: 2004-05 Impact factor: 3.380

3. Structural equation models of latent interactions: evaluation of alternative estimation strategies and indicator construction.

Authors: Herbert W Marsh; Zhonglin Wen; Kit-Tai Hau
Journal: Psychol Methods Date: 2004-09

4. The integration of continuous and discrete latent variable models: potential problems and promising opportunities.

Authors: Daniel J Bauer; Patrick J Curran
Journal: Psychol Methods Date: 2004-03

5. Investigating population heterogeneity with factor mixture models.

Authors: Gitta H Lubke; Bengt Muthén
Journal: Psychol Methods Date: 2005-03

6. Semiparametric Bayesian analysis of structural equation models with fixed covariates.

Authors: Sik-Yum Lee; Bin Lu; Xin-Yuan Song
Journal: Stat Med Date: 2008-06-15 Impact factor: 2.373

7. Bayesian semiparametric analysis of structural equation models with mixed continuous and unordered categorical variables.

Authors: Xin-Yuan Song; Ye-Mao Xia; Sik-Yum Lee
Journal: Stat Med Date: 2009-07-30 Impact factor: 2.373

2. Modeling Psychological Attributes in Psychology - An Epistemological Discussion: Network Analysis vs. Latent Variables.

Authors: Hervé Guyon; Bruno Falissard; Jean-Luc Kop
Journal: Front Psychol Date: 2017-05-18

3. The Standardization of Linear and Nonlinear Effects in Direct and Indirect Applications of Structural Equation Mixture Models for Normal and Nonnormal Data.

Authors: Holger Brandt; Nora Umbach; Augustin Kelava
Journal: Front Psychol Date: 2015-11-30

4. Academic Stress and Adolescents Mental Health: A Multilevel Structural Equation Modeling (MSEM) Study in Northwest of Iran.

Authors: Zahra Hosseinkhani; Hamid-Reza Hassanabadi; Mahboubeh Parsaeian; Mehrdad Karimi; Saharnaz Nedjat
Journal: J Res Health Sci Date: 2020-10-31

4 in total