Literature DB >> 26648879

A Note on the Eigensystem of the Covariance Matrix of Dichotomous Guttman Items.

Clintin P Davis-Stober¹, Jean-Paul Doignon², Reinhard Suck³.

Abstract

We consider the covariance matrix for dichotomous Guttman items under a set of uniformity conditions, and obtain closed-form expressions for the eigenvalues and eigenvectors of the matrix. In particular, we describe the eigenvalues and eigenvectors of the matrix in terms of trigonometric functions of the number of items. Our results parallel those of Zwick (1987) for the correlation matrix under the same uniformity conditions. We provide an explanation for certain properties of principal components under Guttman scalability which have been first reported by Guttman (1950).

Entities: Chemical Disease Gene Species

Keywords: Guttman scale; Rasch model; dichotomous items; eigenvalues; eigenvectors; principal component analysis

Year: 2015 PMID： 26648879 PMCID： PMC4664651 DOI： 10.3389/fpsyg.2015.01767

Source DB: PubMed Journal: Front Psychol ISSN： 1664-1078

1. Introduction

Guttman scales form the conceptual foundation for modern Item Response Theory (IRT). For example, Guttman scales underlie the Rasch model (e.g., Andrich, 1985) as well as Mokken scales (e.g., van Schuur, 2003),—see Tenenhaus and Young (1985) and Lord and Novick (1968) for classic reviews and discussions of Guttman scaling. Under the auspices of understanding the principal component structure of unidimensional scales, Guttman (1950) derived several important properties relating to the correlation matrix of perfect dichotomous Guttman items. Later work by Zwick (1987) identified that the eigenvalues corresponding to this matrix can be written as simple functions of the number of items, under a set of uniformity conditions. In this brief note, we extend the results of Zwick (1987) by considering the covariance matrix of dichotomous Guttman items under these same uniformity conditions. We derive closed-form solutions for the eigenvalues and eigenvectors of this matrix, for any number of items. In particular, we provide expressions in terms of simple trigonometric functions of the number of items. These expressions lead to a simple explanation of the signing relationships among principal components for Guttman scales first described by Guttman (1950).

2. Main results

The core idea of a Guttman scale is that the set of items under consideration forms a unidimensional scale, i.e., if a person obtains a correct response to an item then this person would obtain a correct response to all “easier” items. Table 1 presents a matrix of response patterns conforming to a perfect Guttman scale for five items, with Item 5 being the most “difficult” and Item 1 being the “easiest.”

Table 1

An example of a perfect Guttman scale for five items.

Response pattern	Item 1	Item 2	Item 3	Item 4	Item 5
1	0	0	0	0	0
2	1	0	0	0	0
3	1	1	0	0	0
4	1	1	1	0	0
5	1	1	1	1	0
6	1	1	1	1	1

A value of 0 indicates an incorrect response to the item, a value of 1 indicates a correct response.

An example of a perfect Guttman scale for five items. A value of 0 indicates an incorrect response to the item, a value of 1 indicates a correct response. As in Zwick (1987), we consider the following two assumptions. First, we assume that all items are distinct, i.e., no two items produce identical responses for all possible response patterns. Second, we assume that the probability of obtaining each response pattern is , where n is the number of items, i.e., a uniform distribution over response patterns. This last assumption is rather strong, given that responses are typically modeled using a normal distribution. While we assume uniformly distributed response patterns primarily for mathematical tractability, we demonstrate later via simulations that our results approximate those obtained from a normal distribution under highly discriminating items that are equally spaced by difficulty. Under our assumptions, the covariance between any items i and j, with i ≤ j, is equal to the following: As one would expect, Equation (1) is closely related to the Pearson product-moment correlation, which, as described by Zwick (1987), is equal to: Parallel to Zwick (1987) and Guttman (1950), who handled the correlation matrix, we consider the n × n covariance matrix defined by Equation (1). We first provide the n distinct eigenvalues. Proposition 1. The covariance matrix σ (1), has its eigenvalues equal to (in decreasing order) The proof is in the Appendix. Note how i and n determine the period of the cosine term in the denominator of the right-hand side of Equation (3). From the same equation, the maximal eigenvalue for any fixed number of items n is equal to . Note that as n → ∞, . Also, the eigenvalues of the covariance matrix are very different from the eigenvalues of the Pearson product correlation matrix, which, as described by Zwick (1987), are equal to . The eigenvectors of the covariance matrix also have an elegant, closed-form expression. Proposition 2. For the covariance matrix σ (1), an eigenvector 1, 2, …, The proof is in the Appendix. Guttman (1950) derived a series of relationships on the eigenvector components of correlation matrices based on perfect, “error free” scales. Let sgn(x) be the sign function of the value x. Define a sign change of an eigenvector P as a value j such that sgn(P) ≠ sgn(P), j ∈ {1, 2, …, n}. As described in Guttman (1950), for n-many items there exists exactly one eigenvector with no sign changes, one eigenvector with a single sign change, one with two sign changes, and so on, with the eigenvector corresponding to the smallest eigenvalue having exactly n–1 sign changes. This symmetry can be seen in Table 2, which presents the eigenvectors in Equation (4) for n = 5. As made explicit by Equation (4), these sign changes result from the symmetry of the sine function as the values of i and m vary.

Table 2

This table presents the eigenvector components of the covariance matrix for .

P₁	P₂	P₃	P₄	P₅
12	32	1	32	12
32	32	0	-32	-32
1	0	–1	0	1
32	-32	0	32	-32
12	-32	1	-32	12

The eigenvectors are arrayed in descending order according to the corresponding eigenvalue, i.e., P.

This table presents the eigenvector components of the covariance matrix for . The eigenvectors are arrayed in descending order according to the corresponding eigenvalue, i.e., P.

3. Comparison to IRT data

In this section, we illustrate how our analytic results could be used to evaluate responses conforming to modern IRT models. We consider the well-known two parameter logistic (2PL) model, where the probability of a correct response to item i is defined as follows: where is the item discrimination parameter, b ∈ ℝ is the item difficulty parameter and θ ∈ ℝ is the person-specific ability parameter. From the perspective of the 2PL model, Guttman items are obtained by letting the a (item discrimination) parameter values become arbitrarily large (e.g., van Schuur, 2003), i.e., the probability of a test taker correctly answering an item given that their latent skill is higher (lower, resp.) than the item difficulty is 1 (0 resp.). Our results provide a new perspective on the item covariance and principal component structure of 2PL items under the idealized conditions of a Guttman scale. Indeed, one could consider the eigenvalues and eigenvectors in Equations (3–4) as an error-free ideal for such response data, under our assumption of a uniform distribution over response patterns. In the next section, we compare our results to simulated data that relax the assumption of a uniform distribution over response patterns. In the first simulation study, we compare our results to data generated from a Rasch model (Equation 5 with a = 1, i = 1, 2, …, n) where the person specific ability parameter, θ, is randomly drawn from a standard normal distribution. For the second simulation study, we consider a setup nearly identical to the first, with the exception that we consider large values of a for each item, i.e., high discrimination among items.

3.1. Simulation study 1

For this study, we considered six conditions comprised of: 4, 6, 8, 16, 32, and 64 test items. For each condition, the difficulty of the items, b, was equally spaced along the interval [−1, 1]. For each condition, we randomly sampled 5000 values of θ from a standard normal distribution (e.g., Anderson et al., 2007). We obtained simulated responses to the items by applying the sampled θ values, and item difficulties, b, to Equation (5), with a = 1 for all test items, i.e., a Rasch model. Thus, for each condition, we have 5000 simulated responses to the test items. For each condition, we computed the covariance matrix of the items using the 5000 simulated responses, i.e., we calculated the sample covariance of the 5000 responses. We then numerically calculated the eigenvalues of this covariance matrix. Figure 1 compares the eigenvalues obtained from the simulated data to the eigenvalues obtained from Equation (3), for each condition. It is interesting to note that the largest eigenvalue for the simulated data is always larger than the maximal eigenvalue obtained via Equation (3), this is similar to results obtained by Zwick (1987) within the context of the Guttman correlation matrix. In general, moving to a probabilistic response model (the Rasch model) and sampling the θ values from a normal distribution appears to yield covariance eigenvalues that greatly differ from those obtained in Equation (3). As we show in the next study, improving item discrimination will yield different results.

Figure 1

Each plot compares the eigenvalues obtained from Equation (3) to those obtained from simulated Rasch data under the assumption that θ ~ .

3.2. Simulation study 2

In this simulation study, we consider nearly identical conditions to the first, with the exception that the item discrimination parameters, a, are large in size, indicating excellent item discrimination. As in the previous study, we considered six conditions comprised of: 4, 6, 8, 16, 32, and 64 test items. For each condition, the difficulty of the items, b, was equally spaced along the interval [−1, 1]. As before, for each condition, we randomly sampled 5000 values of θ from a standard normal distribution. We obtained simulated responses to the items by applying the sampled θ values, and item difficulties, b, to Equation (5), with a = 3, i = 1, 2, …, n, indicating excellent item discrimination. As before, for each condition, we have 5000 simulated responses to the test items. For each condition, we computed the covariance matrix of the 5000 simulated responses and numerically calculated the eigenvalues of the generated covariance matrix for each condition. Figure 2 compares the eigenvalues from these simulated data to the eigenvalues obtained via Equation (3), for each condition. It is interesting to note that there is a much closer correspondence between the two sets of eigenvalues under these conditions. Further, this relationship becomes stronger as the number of equally spaced items increases, yielding nearly a perfect match to the maximal eigenvalue as the number of items reaches 32 and 64.

Figure 2

Each plot compares the eigenvalues obtained from Equation (3) to those obtained from simulated 2PL data with high item discrimination under the assumption that θ ~ .

Each plot compares the eigenvalues obtained from Equation (3) to those obtained from simulated 2PL data with high item discrimination under the assumption that θ ~ . This study illustrates that our analytic results, which are derived under the strong assumption of uniformly distributed response patterns, may be useful as an approximation even when the ability parameter is normally distributed. This approximation is best when the difficulty range of the items are within a single standard deviation of the mean and the items have excellent discriminability. As the range of the item difficulty increases and/or the variance of the ability parameter distribution shrinks, the approximation becomes much poorer. Our Matlab code for generating these graphs and exploring other configurations is available as an online supplement.

4. Conclusion

We derived closed-form solutions for the eigenvalues and eigenvectors of the covariance matrix of dichotomous Guttman items, under a uniform sampling assumption. We demonstrated that these eigenvalues and eigenvectors are simple trigonometric functions of the number of items, n. Our results parallel those of Zwick (1987), who examined the eigenvalues of the correlation matrix of dichotomous Guttman items under the same uniformity assumptions. It remains an open question whether the eigenvectors of the correlation matrix, as investigated by Zwick (1987), can also be solved for explicitly.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

1 in total

1. Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?

Authors: Karl Schweizer; Stefan Troche
Journal: Educ Psychol Meas Date: 2016-10-06 Impact factor: 2.821

1 in total