Literature DB >> 28474768

A heteroscedastic generalized linear model with a non-normal speed factor for responses and response times.

Abstract

In generalized linear modelling of responses and response times, the observed response time variables are commonly transformed to make their distribution approximately normal. A normal distribution for the transformed response times is desirable as it justifies the linearity and homoscedasticity assumptions in the underlying linear model. Past research has, however, shown that the transformed response times are not always normal. Models have been developed to accommodate this violation. In the present study, we propose a modelling approach for responses and response times to test and model non-normality in the transformed response times. Most importantly, we distinguish between non-normality due to heteroscedastic residual variances, and non-normality due to a skewed speed factor. In a simulation study, we establish parameter recovery and the power to separate both effects. In addition, we apply the model to a real data set.

Entities: Chemical Disease Gene Species

Keywords: factor analysis; heteroscedasticity; item response theory; non-linearity; non-normality; response time modelling

Mesh：

Year: 2017 PMID： 28474768 PMCID： PMC5434939 DOI： 10.1111/bmsp.12087

Source DB: PubMed Journal: Br J Math Stat Psychol ISSN： 0007-1102 Impact factor: 3.380

Introduction

One of the dominant approaches to the analysis of responses and response times is the generalized linear modelling framework (GLM; Molenaar, Tuerlinckx, & Van der Maas, 2015a; Ranger, 2013; Ranger & Ortner, 2012a). In this framework, a measurement model is specified for the responses to operationalize a normally distributed latent ability factor. For the response times, a measurement model is specified to operationalize a normally distributed latent speed factor. Special cases within the GLM differ in the exact way in which both measurement models are connected. By specifying a linear relation between the ability factor and the response times, the hierarchical models by Van der Linden (2007, 2009) and Thissen (1983; see also Furneaux, 1961) arise. By using a non‐linear relation between the latent ability factor and the response times, the distance–difficulty models for personality data arise (Ferrando & Lorenzo‐Seva, 2007a,b). As response times are typically bounded by zero and skewed (Luce, 1986), researchers have transformed the raw response times to enable linear modelling of them. Such transformations mainly serve the purpose of linearizing the relation between the response times and the underlying normally distributed speed factor, and of making the assumption of homoscedasticity of the residual variance in the linear response time model more plausible. Transformations that have been considered include the reciprocal (e.g., Ferrando & Lorenzo‐Seva, 2007b), square root (e.g., Aberson, Shoemaker, & Tomolillo, 2004), and logarithm (e.g., Thissen, 1983; Van der Linden, 2007). Assuming that the transformed response times are indeed normal, these transformations imply respectively an exponential, chi‐square, and log‐normal distribution for the untransformed response times. The log transformation is arguably the most widely used. However, as discussed by Klein Entink, Linden, and Fox (2009) and Ranger and Kuhn (2012), the implied log‐normal distribution does not always provide a satisfactory approximation for the observed response time distributions. To this end, Klein Entink et al. proposed a more flexible Box–Cox normal model for the response times. In these models, an extra parameter is estimated which reflects the degree of non‐normality in the log‐response times. In addition, various semi‐parametric approaches have been proposed, including a proportional hazards model (Loeys, Legrand, Schettino, & Pourtois, 2014; Ranger & Ortner, 2012b, 2013; Wang, Fan, Chang, & Douglas, 2013), a linear transformation model (Wang, Chang, & Douglas, 2013), and models for categorized response times (Ranger & Kuhn, 2012, 2013). Here, a new approach is outlined to account for departures from normality in the transformed response times. This approach is derived by noting that, in the linear model for the transformed response times, normality of the transformed responses times, , implies a normal speed factor, τ, and homoscedastic and normal residuals, ε. That is, departures from normality in will be solely due to (1) non‐normality in τ, (2) non‐normality in ε and/or (3) heteroscedastic ε. The approach by Klein Entink et al. (2009) focuses on (2). In the present paper, we provide a test on normality of based on (1) and (3). That is, we explicitly separate between non‐normality due to a non‐normal latent speed factor and between non‐normality due to heteroscedastic residuals.

Non‐normal speed factor

A non‐normal speed factor may arise if there are qualitative between‐subject differences in response speed. For instance, Meyer (2010) showed that if some subjects resort to rapid guessing for all items in a test, the transformed response time distribution will have a longer lower tail which can be captured by a two‐class between‐subject mixture model. Although Meyer estimated the mixture for each item separately, a more parsimonious way would be to introduce the mixture into the speed factor as a rapid guessing subject will have an overall higher speed level on all items as compared to a non‐guessing subject. The existence of such a between‐subject mixture will thus show up as non‐normality in the speed factor in the present approach. Another example of a between‐subject mixture of response speed is given by Van der Maas and Jansen (2003) who showed that children on a balance scale task adopt different solution strategies which differ in their execution time. Again, such heterogeneity causes the distribution of the speed factor to depart from normality.

Heteroscedastic residuals

In this paper, we use the term ‘heteroscedasticity’ to refer to the observation that the residual variances in a linear model are not constant across the predictors. In linear regression models, heteroscedasticity is a well‐studied phenomenon. Methods have been proposed to test the equality of the residual covariance matrix across the levels of the predictor variables (e.g., Anderson, 2006), and methods have been studied to account for possible violations (e.g., Brunner, Dette, & Munk, 1997). In addition, various approaches exist to assess, test or model heteroscedasticity in linear regression models, including diagnostic graphical approaches (e.g., Stevens, 2009, p. 90), statistical tests (e.g., Jarque & Bera, 1980), corrections (e.g., Long & Ervin, 2000), and approaches to model heteroscedasticity explicitly (e.g., Harvey, 1976). Interestingly, although the GLM for responses and response times is in essence also a linear regression model (but with a single latent predictor variable), tests on heteroscedasticity have not yet received attention. In GLMs for responses and response times, heteroscedasticity can arise for various reasons, including item deadlines (censoring), the use of an inappropriate transformation function for the untransformed response times, and the existence of item‐specific mixtures. Examples of item‐specific mixtures include item pre‐knowledge (McLeod, Lewis, & Thissen, 2003), post‐error slowing (Rabbitt, 1979), and rapid guessing (Wang & Xu, 2015) on some of the items but not on all.

Motivation

In the present paper, we will present a unified model that enables the simultaneous estimation of heteroscedasticity in the residuals and non‐normality in the speed factor. Such an endeavour is valuable for a number of reasons: Explicit statistical test. The assumption of normality can be tested using marginal tests like the test of Mardia (1970) on multivariate normality and that of Shapiro and Wilks (1965) on univariate normality. Such tests are marginal tests as they do not impose a latent variable structure on the data. We will show in the present study that by using a latent variable modelling framework for the responses and response times, the power to detect an effect is greater than the multivariate test by Mardia (1970). An alternative approach to the marginal tests, used by for instance Van der Linden, Breithaupt, Chuah, and Zhang (2007) and Van der Linden, Scrams, and Schnipke (1999) employs quantile–quantile plots. As discussed by Klein Entink et al. (2009), although valuable, quantile–quantile plots may be hard to interpret and they do not penalize for model complexity. Therefore, an explicit statistical test may be desirable. Affected person parameter estimates. As we will illustrate in this paper, results from response and response time analyses may be affected by the presence of non‐normality in the transformed response time distribution. That is, we find that non‐normality affects the ability and speed estimates in the tails of the latent ability and speed factor distributions. Although such a discrepancy may not always arise, it is valuable to have a tool at our disposal that can be used to test the normality assumption and compare the results across the normal and non‐normal models. If results do not differ importantly, one can safely rely on the normal modelling results. However, if results differ (as is the case in our illustration below), one may better rely on the results from the best‐fitting model, as these are likely to capture the patterns in the data better. Aberrant responses. The latent ability and latent speed factor estimates play an important role in the detection of aberrant responses (Marianti, Fox, Avetisyan, Veldkamp, & Tijmstra, 2014; Van der Linden & Guo, 2008; Van der Linden & van Krimpen‐Stoop, 2003) and in the selection of items in computerized adaptive testing (Van der Linden et al., 1999). As discussed above, for the respondents in the tails of the ability and speed distribution, the speed and ability estimates may be affected by non‐normality in the transformed response times. Therefore, for these respondents, it is important that the most appropriate estimates are used to prevent invalid inferences about aberrances or to prevent a suboptimal item selection procedure in the computerized adaptive testing procedure. In the case of severe non‐normality in the transformed response time distribution, the ability and speed factor estimates from the non‐normal models might thus be preferred. Using an appropriate transformation. Another motivation for having a statistical test on normality available within the model for responses and transformed response times is that it is often unclear which transformation (e.g., logarithmic, reciprocal, or square root) to adopt for the response times. That is, it is unclear what transformation makes the linearity and homoscedasticity assumptions plausible. Using the present approach, it can be investigated which transformation works best in normalizing the transformed response times. If normality does not hold for any of the transformations considered, the resulting violation of homoscedasticity and linearity can be explicitly taken into account in the model. Increasing power. In the present paper, the reason to focus on both heteroscedasticity and non‐normality in the speed factor is mainly to increase the resolution to detect departures from normality. As we will show in this paper, both effects are statistically well separable, meaning that the effects of a non‐normal speed factor can hardly be captured by heteroscedastic residuals, and vice versa. Therefore, as both effects pick up distinct patterns in the data, taking both effects into account increases the power to detect non‐normality. Personality items. Although the main focus of this paper is on ability measurement, some of the above motivations also hold for applications to personality items. That is, faking on personality items (Holden & Kroner, 1992) may be detected using the same procedures for aberrant responses as discussed above. In addition, research into responses and response times in personality have focused on the distance–difficulty hypothesis (Ferrando & Lorenzo‐Seva, 2007a,b) or the inverted‐U response time hypothesis (Kuiper, 1981). This hypothesis states that for personality items, if the distance between the item difficulty parameters and the latent ability parameters increases, response times decrease. As the models used to test this effect include a quadratic (or approximately quadratic) effect of the latent ability factor on the transformed response times (see Molenaar, Tuerlinckx, & van der Maas, 2015b), the transformed response times will be non‐normally distributed if the distance–difficulty hypothesis holds. The models put forward in this paper may thus provide an omnibus test for the distance–difficulty hypothesis from personality research.

Outline

The outline of this paper is as follows. First, the GLM approach to responses and response time modelling is outlined. Next, we introduce heteroscedastic residuals and a non‐linear speed factor. In a simulation study we show that the model is viable in terms of parameter recovery and the power to separate between different sources of non‐normality in the transformed response times. We then apply the model to a real data set and illustrate that the latent ability and latent speed factor estimates may be different in the normal and non‐normal models for respondents in the tails of the distributions. We conclude with a general discussion.

The generalized linear model for responses and response times

Let X denote the response of subject p on item i with transformed response time . As discussed above in the GLM framework, a measurement model is specified for both variables. To connect the two models, Van der Linden (2007, 2009) and Glas and van der Linden (2010) used a common distribution for the item parameters and a common distribution for the person parameters in the two models. In addition, Molenaar et al. (2015b), Ranger and Ortner (2012a), Van der Linden and Guo (2008), Wang, Change, et al. (2013), and Wang, Fan et al. (2013) specified a common distribution for the person parameters only. Here we follow Molenaar et al. (2015a), Ranger (2013), Thissen (1983), Furneaux (1961), and Ferrando and Lorenzo‐Seva (2007a,b) and specify a cross‐loading of the response times on the latent ability factor. That is, by assuming independence of X and conditional on the random person parameters, the joint likelihood factors as where f(·) is the likelihood function according to the measurement model of the responses, g(·) is the likelihood function according to the measurement model of the transformed response times including the cross‐loading on θ, and φ(·) is a standard normal density function for the latent ability parameters, , and the latent speed parameters, , respectively. We elaborate on choices for f(·) and g(·) below. For f(·) in equation (1), the measurement model of the responses, researchers have used the Rasch model (Loeys et al., 2014), the two‐parameter model (Molenaar et al., 2015a,b; Ranger & Ortner, 2012a; Thissen, 1983), the graded response model (Ranger, 2013) and the linear factor model (Ferrando & Lorenzo‐Seva, 2007b). Here we use the two‐parameter model, but we note that present undertaking is equally amenable to other generalized linear models (e.g., a graded response models in the case of Likert scale personality item scores). We thus specify where is the discrimination parameter which is expected to be strictly positive in practice and is the difficulty parameter. For g(·) in equation (1), the measurement model of the response times including the cross‐loading on θ, researchers have mainly used a normal model. However, others have used a proportional hazards model (Loeys et al., 2014; Wang, Fan et al. 2013), a linear transformation model (Wang, Change, et al., 2013), or a categorical model for discretized time (Ranger & Kuhn, 2012, 2013). There are many possible parameterizations of the normal model. Here we focus on where is the time intensity parameter, and is the residual variance, that is, the variance of conditional on τ. In equation (3), λθ ∈ (−∞, ∞) denotes the cross‐loading of the response times on the ability factor θ to connect the measurement model of the responses to the measurement model of the response times (see Furneaux, 1961; Molenaar et al., 2015a; Ranger, 2013; Thissen, 1983). In addition, in equation (3), models the scale of τ. Note that the model in equation (3) is equivalent to the hierarchical model of Van der Linden (2007, 2009) with fixed item effect (see Molenaar et al., 2015a; Ranger, 2013). That is, the correlation between τ and θ in the hierarchical model, ρ, can be retrieved from the parameters in equation (3) as follows: In addition, the variance of τ in the hierarchical model, , can be retrieved from the parameters in equation (3) as follows: Note that the cross‐loading from the present parameterization, λθ, is an unscaled version of the correlation parameter, ρ, from the hierarchical model. As the correlation parameter does not depend on the item, λθ does not depend on the item either.

A heteroscedastic generalized linear model with a skewed speed factor

Here, we extend the traditional model in equations (1), (2), (3) by allowing non‐normality in the speed factor and by allowing the residual variances to be heteroscedastic. In factor analysis, methods to account or test for heteroscedasticity include the generalized least squares procedure of Meijer and Mooijaart (1996), the distribution‐free method of Lewin‐Koh and Amemiya (2003), and the two‐stage least squares procedure of Bollen (1996). Here, we adopt the approach of Hessen and Dolan (2009) which is developed within the GLM framework as adopted in the present paper. Specifically, Hessen and Dolan proposed to model heteroscedasticity within the linear one‐factor model by modelling the variance of the indicators conditional on the common factor, σ2|η, by an exponential function (for a similar approach within regression modelling, see Harvey, 1976). Specifically, where η is the common factor, is an intercept parameter and is a heteroscedasticity parameter which accounts for the dependency of the conditional variance on η. That is, for δ1 = 0, the residual variance is homoscedastic, for δ1 > 0, the residual variance is increasing for increasing levels of η, and for δ1 < 0, the residual variance is decreasing for increasing levels of η. To introduce heteroscedastic residuals into the traditional GLM given by equation (3), the residual variances are explicitly conditioned on τ and modelled as in equation (6), where η = −λττ. This results in Note that we explicitly focus on heteroscedasticity across τ and not across θ. Various approaches exist by which non‐normal latent variables can be modelled. For instance, the latent distribution can be approximated using a constrained mixture approach (Schmitt, Mehta, Aggen, Kubarych, & Neale, 2006; Vermunt, 2004; Vermunt & Hagenaars, 2004), Johnson curves (Van den Oord, 2005), and splines (Woods, 2007). In addition, researchers have used the log‐beta distribution (Andersen & Madsen, 1977). Here, we use the skew normal distribution (Azzalini, 1985, 1986). The density of the skew normal distribution is given by where is a location parameter, is a scale parameter, and is a skewness parameter. That is, for ζ = 0 the density above reduces to the density of a normal distribution, for ζ > 0 the density is right‐skewed, and for ζ < 0 the density is left‐skewed. Our choice of the skew normal distribution is driven by the following considerations: As the skew normal density includes the normal distribution as special case for ζ = 0, the skew normal density enables a straightforward likelihood ratio test on normality with only one degree of freedom. If we had adopted a two‐class mixture distribution (which is certainly feasible), we would need test with at least two degrees of freedom (the mixing proportion and the mean of the second mixture component), probably lowering power. In addition, a likelihood ratio test is not feasible as a mixture distribution would require boundary constraints to arrive at a normal distribution. As the statistical properties of the likelihood ratio are well known, we considered this test desirable. However, a mixture distribution has some advantages over the skew normal distribution. We return to this point in Section 6. The skew normal density is unbounded, which is appropriate for the unbounded transformed response times. The beta distribution and the log‐normal distribution, for instance, are bounded and thus less suitable for this endeavour. The skew normal density is well developed within the present generalized linear framework (see Azevedo, Bolfarine, & Andrade, 2011; Molenaar, Dolan, & de Boeck, 2012). A linear factor model with a skew normal distribution for the factor is shown to be equivalent to a one‐factor model with quadratic factor loadings (Smits, Timmerman, & Stegeman, 2015). As such non‐linear factor models are well developed (e.g., Klein & Moosbrugger, 2000; McDonald, 1962), we can draw from the modelling tools available for these models and use them in our modelling approach (e.g., parameter estimation, factor score estimation). In addition, due to this equivalence, our modelling approach can be implemented in existing software, hopefully increasing the practical utility of the approach. To introduce a skew normal distribution for τ, we use the result of Smits et al. (2015) mentioned above. Specifically, Smits et al. show that a one‐factor model with a skew normal factor distribution given by equation (8) is equivalent to a model with quadratic factor loadings up to the first three sample moments.1 Here, we use this equivalence to specify the skew normal speed factor as a quadratic speed factor. That is, in the model subject to heteroscedasticity in equation (7) we additionally introduce the effect of non‐normality in τ by adding the quadratic effect of τ, that is, where is a shape parameter that corresponds to ζ in the skew normal density in equation (8). See Smits et al. for the exact relation between ζ and λζ. Note that the equivalence between the quadratic model in equation (9) and a model with a skew normal τ specified through the density function of τ in equation (1) only holds for the first three sample moments. However, Smits et al. show by means of the L 1 norm of the difference in the model‐implied data distributions that for values of ζ of practical interest, the differences in the two implied distributions are only minor if all moments are considered (i.e., as is the case with full‐information procedures as adopted here in equation (1)).

Estimation of the parameters

Together with the two‐parameter model for X the marginal log‐likelihood of the matrix of item responses, X, and the matrix of transformed response times, T , given the model parameters, ω = [α1,…,α, β1,…, β, ν1,…, ν, δ01, …, δ0, δ11, …, δ1, λθ, λτ, λζ] is given by where n denotes the number of items, N denotes the number of subjects, f(·) is given by equation (2) and g(·) is given by equation (9). All free parameters in the likelihood function in equation (10) can be estimated simultaneously using marginal maximum likelihood (MML) estimation (Bock & Aitkin, 1981). To this end, the integrals in the likelihood function are approximated by Gauss–Hermite quadrature (GHQ). We implemented the model in Mplus (Muthén & Muthén, 2007) using 15 nodes for each of the θ and τ dimensions. As the syntax to fit the model can become very large for an increasing number of items, n, we implemented an R script that generates the Mplus syntax to fit the model for any choice for n and number of nodes. In addition, we wrote some documentation which gives a guide to how to read the Mplus output. See http://www.dylanmolenaar.nl for the R script and the documentation.

Evaluating model fit

As for δ1 = 0 the residuals are homoscedastic and for λζ = 0 the speed factor is normal, inferences about the sources of non‐normality in the transformed response times can be based on models that do and do not include these parameters. Note that as both constraints do not involve a boundary constraint, a likelihood ratio test or fit indices like the Akaike information criterion (AIC; Akaike, 1974), Bayesian Information Criterion (BIC; Schwarz, 1978), and the saBIC (sample size adjusted BIC; Sclove, 1987) are suitable for model comparisons with and without heteroscedasticity and/or a skewed latent speed factor. However, the standard errors of these parameters might be biased as, within MML estimation, these are based on the assumption of an asymptotic normal parameter distribution which is likely to be violated for the parameter distributions of δ1 and λζ.

Simulation study

Design

To investigate parameter recovery, Type I error, and the power to distinguish between the different sources of non‐normality in the transformed response times, a simulation study was conducted. In this study we focus on four models: M1: het‐GLM‐skew. A heteroscedastic generalized linear model for responses and response times with a skew normal speed factor where λζ is a free parameter and δ1 are free parameters for all i. M2: het‐GLM. A heteroscedastic generalized linear model for responses and response times with a normal speed factor where λζ = 0 and δ1 are free parameters for all i. M3: GLM‐skew. A homoscedastic generalized linear model for responses and response times with a skew normal speed factor where λζ is a free parameter and δ1 = 0 for all i. M4: GLM. A homoscedastic generalized linear model for responses and response times with a normal speed factor where λζ = 0 and δ1 = 0 for all i (i.e., the traditional model given by equations (1), (2), (3)). We simulated data according to these four models. In this simulation study, we used the log transformation for the response times, that is, . In addition, we used N = 500 and n = 20. Difficulty parameters, β, were chosen at equally spaced values in the [−3, 3] interval. Discrimination parameters, α, were equal to 0.5 and 1.5 for the odd and even items, respectively. Time intensity parameters, ν, were equal to 2. In addition, λθ = 0.1 and λτ = 0.25 such that in the hierarchical model of Van der Linden (2007, 2009), the correlation between τ and θ equals .37 and the standard deviation of τ equals 0.27 (see equations (4) and (5)). The residual intercept parameters, δ0, were equal to −1.0. The values above already provide all parameters for the GLM. Next, for the GLM‐skew, λζ was chosen to equal −0.025. This choice for λζ given the choices for ν and λτ above corresponds to a ‘medium’ effect on the skewness of the speed factor distribution according to Molenaar, Dolan, and Verhelst (2010) and Smits et al. (2015).2 For the het‐GLM, data are generated using δ1 = −0.8 for all i. This choice for δ1 corresponds to a standardized value of δ1 στ = −0.8 × 0.27 = −0.22 which is a ‘medium’ effect according to Molenaar et al. (2010); standardizing δ1 is necessary as its value depends on the scale of τ. Finally, for the full model, het‐GLM‐skew, data are generated using both λζ = −0.025 and δ1 = −0.8 for all i. We conducted 100 replications for each true model. To each data set we fitted the four models discussed above. For all models we determined the AIC, BIC, saBIC, and the likelihood of the data given the model parameters. In addition, we conducted the multivariate normality test of Mardia (1970) to see how this test performs as compared to our modelling approach.

Results

Parameter recovery

Here, we focus on the recovery of the main parameters in the full model (M1: het‐GLM‐skew): the heteroscedasticity parameters, δ1, and the scale and shape parameters of the speed factor, λτ and λζ. First, Table 1 contains the mean parameter estimates, mean standard errors, and the root mean squared error (RMSE) of λτ, λζ, and δ1 in the full model for the different true models. As can be seen, for all parameters, the results seem acceptable, that is, the parameters are acceptably recovered in all cases with the mean standard errors close to the RMSE, indicating no bias. That is, in the cases that the speed factor is skew normal (i.e., M1: het‐GLM‐skew and M3: GLM‐skew), the mean estimates of λζ are equal to their true values (−0.025) and in the cases where the speed factor is normal (i.e., M2: het‐GLM and M4: GLM), the mean parameter estimates for λζ are 0. A similar pattern holds for the recovery of δ1. That is, in the cases where the residuals are heteroscedastic (i.e., M1: het‐GLM‐skew and M2: het‐GLM), the parameter estimates are close to their true value (−0.8), and in the cases that the residuals are homoscedastic (i.e., M3: GLM‐skew and M4: GLM), the parameter estimates for δ1 are close to 0. The results for λτ are also acceptable. Note that in the table, the results for δ1 are aggregated over items. To present the results concerning these parameter estimates for each item separated we created box plots of the parameter estimates for δ1 in the full model for the different true models in Figure 1.

Table 1

True model	λ_τ				λ_ζ				δ_1i
True model	True	mEst	mSE	RMSE	True	mEst	mSE	RMSE	True	mEst	mSE	RMSE
het‐GLM‐skew	0.250	0.248	0.011	0.013	−0.025	−0.025	0.007	0.007	−0.800	−0.811	0.299	0.300
het‐GLM	0.250	0.251	0.011	0.011	0.000	0.000	0.007	0.008	−0.800	−0.796	0.297	0.299
GLM‐skew	0.250	0.247	0.011	0.011	−0.025	−0.025	0.008	0.007	0.000	−0.007	0.300	0.296
GLM	0.250	0.250	0.010	0.010	0.000	0.000	0.007	0.007	0.000	−0.002	0.303	0.310

The results of δ1 are averaged over items. See Figure 1 for a graphical representation of the results for each item separately.

Figure 1

Box plots of the parameter estimates for δ1 in the full model (M1: het‐GLM‐skew) across replications for the different true models. The grey solid line denotes the true values.

True values, mean parameter estimates (mEst), mean standard errors (mSE), and root mean squared error (RMSE) of the λτ, λζ, and δ1 parameter estimates in the full model (M1: het‐GLM‐skew) for the different true models The results of δ1 are averaged over items. See Figure 1 for a graphical representation of the results for each item separately. Box plots of the parameter estimates for δ1 in the full model (M1: het‐GLM‐skew) across replications for the different true models. The grey solid line denotes the true values.

Power and Type I error

We established the power to detect the different effects by determining the non‐centrality parameter of the χ2 distribution under the less restricted model using the procedure outlined by Satorra and Saris (1985; see also Saris & Satorra, 1993) with a .05 level of significance. For the full model (M1: het‐GLM‐skew), we separate between the power to detect heteroscedastic residuals in the presence of a skew normal speed factor (i.e., the power to reject δ1 = 0 for all i where λζ is freely estimated) and the power to detect a skew normal speed factor in the presence of heteroscedasticity (i.e., the power to reject λζ = 0 where δ1 is freely estimated for all i). For M2: het‐GLM we only consider the power to detect heteroscedasticity (i.e., the power to reject δ1 = 0 for all i) and for M3: GLM‐skew we only consider the power to detect skewness in the speed factor (i.e., the power to reject λζ = 0). The results concerning the power to detect the different effects are in Table 2. As can be seen, in the full model (M1: het‐GLM‐skew) both effects can be well separated. That is, if both effects are in the data (i.e., the true model is M1: het‐GLM‐skew) these are detected with a power close to 1.00, and if one of the effects is in the data, that effect is detected with an acceptable power, and the other effect (which is not in the data) is detected with a power close to the level of significance (indicating that the Type I error rate is not increased). For the model with heteroscedasticity only (M2: het‐GLM), power and Type I error rate are also acceptable. That is, if there is heteroscedasticity in the data, it is well detected, and if the residuals are homoscedastic, power approaches the level of significance. For the model with a skew normal factor distribution only, the power to reject λζ = 0 is moderate (.68) if the data contain heteroscedasticity residuals as well (i.e., the true model is M1: het‐GLM‐skew). In addition, the Type I error rate seems increased. That is, if the data contain heteroscedastic residuals but no skewness in the speed factor distribution (i.e., the true model is M2: het‐GLM), the power to reject λζ = 0 is still .22.

Table 2

Power and non‐centrality parameter (ncp) of the likelihood ratio test to reject δ1 = 0 and λζ = 0 in the different models at the .05 level of significance

True model	M1: het‐GLM‐skew				M2: het‐GLM		M3: GLM‐skew
	δ_1i		λ_ζ		power	ncp	power	ncp
	power	ncp	power	ncp	power	ncp	power	ncp
M1: het‐GLM‐skew	1.00	141.27	0.95	12.95	1.00	134.15	0.68	5.83
M2: het‐GLM	1.00	142.36	0.08	0.23	1.00	143.57	0.22	1.44
M3: GLM‐skew	0.05	0.00a	0.93	11.98	0.05	0.00a	0.94	12.11
M4: GLM	0.06	0.59	0.05	0.00a	0.06	0.59	0.05	0.00a

These ncp values are fixed to 0 as their estimate was slightly negative (which can happen due to sampling fluctuations).

Power and non‐centrality parameter (ncp) of the likelihood ratio test to reject δ1 = 0 and λζ = 0 in the different models at the .05 level of significance These ncp values are fixed to 0 as their estimate was slightly negative (which can happen due to sampling fluctuations). With respect to the fit indices, we focused on the hit rates and false positive rates of the different models for the different effects in the data. That is, the hit rate is defined as the proportion of replications in which the true model is correctly identified as the best model according to the AIC, BIC and saBIC, and the false positive rate is defined as the proportion of replications in which the wrong model is identified as the best model. See Table 3 for the results. As can be seen, all fit indices perform acceptably in terms of the hit rates. In addition, the effects are statistically well separable, meaning that heteroscedastic residuals are seldom detected as skew normal factors or vice versa (i.e., the off‐diagonal proportions are close to zero for het‐GLM and GLM‐skew). This emphasizes the need to consider both effects: they pick up different aspects in the data.

Table 3

Hit rates and false positive rates for the AIC, BIC and saBIC for the estimated models across the different true models

	True model	Estimated models
	True model	het‐GLM‐skew	het‐GLM	GLM‐skew	GLM
AIC	het‐GLM‐skew	.99	.01	0	0
	het‐GLM	.23	.77	0	0
	GLM‐skew	0	0	.97	.03
	GLM	0	.01	.12	.87
BIC	het‐GLM‐skew	.73	.14	.05	.08
	het‐GLM	.02	.93	0	.05
	GLM‐skew	0	0	.82	.18
	GLM	0	0	.01	.99
saBIC	het‐GLM‐skew	.95	.05	0	0
	het‐GLM	.13	.87	0	0
	GLM‐skew	0	0	.94	.06
	GLM	0	0	.07	.93

Hit rates are in boldface (i.e., the proportion of replications in which the true model is correctly identified as the best model).

Hit rates and false positive rates for the AIC, BIC and saBIC for the estimated models across the different true models Hit rates are in boldface (i.e., the proportion of replications in which the true model is correctly identified as the best model). For each replication in the simulation study, we also conducted the test by Mardia (1970) on multivariate normality. Results indicate that non‐normality was rejected in 98, 99, 5 and 2% of the replications if the true model was M1: het‐GLM‐skew, M2: het‐GLM, M3: GLM‐skew and M4: GLM, respectively. Thus, normality was rejected if the data contained both effects, and the effect of heteroscedasticity only, but normality was not rejected when the data contained a skewed speed factor.

Illustration

Data

The data comprise the responses and response times of 668 Dutch high school students on 23 items from the ‘path finder’ test. This test resembles the embedded figures test (Witkin, 1950). In the test, respondents have to decide which of five simple figures occur within a more complex target figure. Two of the respondents were omitted from the analyses because they showed suspiciously small response times (1 s on most of the items). Following the simulation study, we focus on the log‐transformed response times. A time deadline of 40 s was imposed on all items. As a result, the observed response times showed ceiling effects which caused the log response time distribution to be non‐normal. To illustrate how this source of non‐normality can be detected and accounted for using the present approach we fit the different models as studied in the simulation study to the data. Next, we show how estimates for θ and τ differ between the present non‐normal models and the more traditional normal baseline model. Table 4 contains the results concerning the fit of the different models. As can be seen, the likelihood ratio test between each of the models and the full model (M1: het‐GLM‐skew) is only significant for the model with heteroscedasticity only, M2: het‐GLM, χ2(1) = .74, p = .39, indicating that dropping the skewness parameter, λζ, from the full model does not deteriorate model fit. As both the model with skewness only (M3: GLM‐skew) and the model without effects (M4: GLM) are associated with a significant likelihood ratio test, we choose M2: het‐GLM as the best‐fitting model. This conclusion is supported by the AIC, BIC, and saBIC. See Table 5 for the item parameter estimates within this model. As can be seen, δ1 < 0 for all i, indicating that the residual variance is smaller for subjects with lower levels of τ. This is likely due to the item deadline which results in less variance at the upper range of the log response time distribution.

Table 4

Likelihood ratio test with M1: het‐GLM‐skew and fit indices for the models considered in the illustration

	ℓ	LRT	df	AIC	BIC	saBIC
M1: het‐GLM‐skew	−20,914.49	–	–	42,065	42,596	42,221
M2: het‐GLM	−20,915.23	0.74	1	42,064	42,591	42,220
M3: GLM‐skew	−21,740.77	826.28	23	43,672	44,099	43,798
M4: GLM	−21,785.36	870.87	24	43,759	44,182	43,883

For the fit indices, best values are in bold.

Table 5

Parameters estimates (est) and standard errors (SE) for the parameters in M2: het‐GLM

α_i		β_i		ν_i		δ_0i		δ_1i
est	SE	est	SE	est	SE	est	SE	est	SE
0.65	0.10	−0.88	0.09	2.44	0.04	−0.44	0.05	−0.45	0.28
0.61	0.11	−0.02	0.08	2.51	0.03	−0.79	0.05	−0.23	0.33
0.80	0.11	−1.04	0.10	2.60	0.03	−0.73	0.05	−0.83	0.29
0.25	0.09	−1.06	0.09	2.71	0.03	−1.01	0.06	−0.65	0.38
0.36	0.12	1.04	0.09	3.04	0.03	−2.16	0.10	−4.25	0.58
0.54	0.13	0.96	0.09	2.75	0.04	−1.40	0.09	−2.73	0.40
0.15	0.09	0.49	0.08	3.02	0.03	−1.91	0.09	−2.56	0.60
0.40	0.11	0.71	0.09	2.98	0.03	−1.98	0.08	−3.70	0.84
0.36	0.12	1.11	0.09	2.95	0.03	−1.81	0.12	−2.79	1.00
0.39	0.09	0.25	0.08	2.93	0.03	−1.81	0.09	−3.31	0.96
0.20	0.09	0.60	0.08	2.92	0.03	−1.84	0.08	−2.52	0.51
0.22	0.12	1.21	0.09	2.80	0.03	−1.39	0.09	−3.61	0.55
0.53	0.11	0.19	0.08	2.73	0.03	−1.34	0.08	−1.50	0.38
0.48	0.11	0.39	0.08	2.85	0.04	−1.58	0.11	−2.54	0.57
0.53	0.11	0.70	0.09	2.88	0.04	−1.62	0.12	−3.34	0.93
0.52	0.10	0.16	0.08	2.94	0.03	−1.47	0.09	−1.81	0.59
0.40	0.12	0.90	0.09	2.93	0.02	−1.81	0.09	−3.28	1.20
0.36	0.11	0.62	0.08	2.93	0.03	−2.02	0.11	−4.37	0.85
0.66	0.17	1.05	0.10	2.82	0.03	−1.48	0.08	−3.20	0.63
0.29	0.10	0.81	0.09	2.79	0.03	−1.39	0.08	−3.22	0.52

Likelihood ratio test with M1: het‐GLM‐skew and fit indices for the models considered in the illustration For the fit indices, best values are in bold. Parameters estimates (est) and standard errors (SE) for the parameters in M2: het‐GLM To see how the effect of heteroscedasticity affects the results of the normal baseline model, we plot the estimates of θ and τ in the final model, M2: het‐GLM, against the estimates of respectively θ and τ in the baseline model, M4: GLM (see Figures 2 and 3). As can be seen, the estimates are highly comparable in the mid‐range of θ and τ. However, in the tails, the estimates diverge. This is because the non‐normality in the log response time distribution is most evident in the tails of the distribution (in this case, a thicker lower tail and a thinner upper tail). As M2: het‐GLM fits the data best according to the fit indices, the θ and τ estimates from this model can be better trusted than those from the baseline model. Especially in an applied setting where the interest is in identifying high‐ or low‐scoring respondents (e.g., personnel selection or admission for extra training for weak students), the θ estimates from the non‐normal approach might be the preferred basis for inferences.

Figure 2

Plot of estimates of θ in M2: het‐GLM (y‐axis) against M4: GLM (x‐axis). The solid grey line denotes a one‐to‐one correspondence.

Figure 3

Plot of estimates of τ in M2: het‐GLM (y‐axis) against M4: GLM (x‐axis). The solid grey line denotes a one‐to‐one correspondence.

Plot of estimates of θ in M2: het‐GLM (y‐axis) against M4: GLM (x‐axis). The solid grey line denotes a one‐to‐one correspondence. Plot of estimates of τ in M2: het‐GLM (y‐axis) against M4: GLM (x‐axis). The solid grey line denotes a one‐to‐one correspondence.

Discussion

In this paper, we have presented a method to test for departures from a normal distribution for the transformed response times within the GLM framework for responses and response times (e.g., Van der Linden, 2007). Most importantly, we distinguished between non‐normality due to a non‐normal speed factor and non‐normality due to heteroscedastic residual variances. Our simulation study showed that this distinction may be valuable as the power to detect violations of normality in the transformed response times due to a non‐normal speed factor is small for the more traditional marginal test of Mardia (1970) on multivariate normality. Molenaar et al. (2010) found a similar result within the one‐factor model. Our main motivation to adopt the present modelling approach was, first, to have a statistical tool available to study whether results from the traditional normal approaches are affected by non‐normality in the transformed response times; and second, to test for qualitative differences in response behaviour, including differences in solution strategies, guessing, etc. With respect to the latter motivation, differences in response behaviour may be detected because they will produce a mixture of two or more qualitatively different classes of respondents. In the present approach, the presence of different classes in a given data set can be detected in the speed factor distribution (indicating between‐subject differences) or in the residuals (indicating within‐subject differences). However, the exact number of classes cannot be determined. That is, violations of normality may indicate qualitative differences in the response behaviour of the respondents, but the results give no indication about the nature or number of classes. If qualitative differences in the response behaviour of the respondents result in a multi‐modal mixture, the present approach can still be used as a statistical test for the presence of such differences, but it cannot be used to actually model the bimodality in the distribution. In addition, if the mixture is perfectly multi‐modal (i.e., the mixture distribution is symmetrical), the present modelling approach will not be suitable as the observed data distribution will not be skewed. However, in less perfect cases (e.g., one mixture component is smaller than the other or one mixture component has smaller variance), the multi‐modality that arises will be characterized by skewness and our approach can be used to test for this effect, given that the effect size is large enough. Data S1. Application files. Click here for additional data file. Data S2. Building Mplus scripts. Click here for additional data file. Data S3. Simulation files. Click here for additional data file.

18 in total

1. Modelling non-normal data: The relationship between the skew-normal factor model and the quadratic factor model.

Authors: Iris A M Smits; Marieke E Timmerman; Alwin Stegeman
Journal: Br J Math Stat Psychol Date: 2015-11-14 Impact factor: 3.380

2. A Bivariate Generalized Linear Item Response Theory Modeling Framework to the Analysis of Responses and Response Times.

Authors: Dylan Molenaar; Francis Tuerlinckx; Han L J van der Maas
Journal: Multivariate Behav Res Date: 2015 Impact factor: 5.923