Literature DB >> 32210894

Bayesian Estimation of the DINA Model With Pólya-Gamma Gibbs Sampling.

Zhaoyuan Zhang¹, Jiwei Zhang², Jing Lu¹, Jian Tao¹.

Abstract

With the increasing demanding for precision of test feedback, cognitive diagnosis models have attracted more and more attention to fine classify students whether has mastered some skills. The purpose of this paper is to propose a highly effective Pólya-Gamma Gibbs sampling algorithm (Polson et al., 2013) based on auxiliary variables to estimate the deterministic inputs, noisy "and" gate model (DINA) model that have been widely used in cognitive diagnosis study. The new algorithm avoids the Metropolis-Hastings algorithm boring adjustment the turning parameters to achieve an appropriate acceptance probability. Four simulation studies are conducted and a detailed analysis of fraction subtraction data is carried out to further illustrate the proposed methodology.

Entities: Chemical Disease Gene

Keywords: Bayesian estimation; DINA model; Metropolis-Hastings algorithm; Pólya-Gamma Gibbs sampling algorithm; cognitive diagnosis models; potential scale reduction factor

Year: 2020 PMID： 32210894 PMCID： PMC7076190 DOI： 10.3389/fpsyg.2020.00384

Source DB: PubMed Journal: Front Psychol ISSN： 1664-1078

1. Introduction

Modeling the interaction between examinee's latent discrete skills (attributes) and items at the item level for binary response data, cognitive diagnosis models (CDMs) is an important methodology to evaluate whether the examinees have mastered multiple fine-grained skills, and these models have been widely used in a variety of the educational and psychological researches (Tatsuoka, 1984, 2002; Doignon and Falmagne, 1999; Maris, 1999; Junker and Sijtsma, 2001; de la Torre and Douglas, 2004; Templin and Henson, 2006; DiBello et al., 2007; Haberman and von Davier, 2007; de la Torre, 2009, 2011; Henson et al., 2009; von Davier, 2014; Chen et al., 2015). With the increasing complexity of the problems in cognitive psychology research, various specific and general formulations of CDMs have been proposed to deal with the practical problems. There are several specific CDMs, widely known among them, are the deterministic inputs, noisy “and” gate model (DINA; Junker and Sijtsma, 2001; de la Torre and Douglas, 2004; de la Torre, 2009), the noisy inputs, deterministic, “and” gate model (NIDA; Maris, 1999), the deterministic input, noisy “or” gate model (DINO; Templin and Henson, 2006) and the reduced reparameterized unified model (rRUM; Roussos et al., 2007). In parallel with the specific CDMs, the general CDMs have also made great progress, including the general diagnostic model (GDM; von Davier, 2005, 2008), the log-linear CDM (LCDM; Henson et al., 2009), and the generalized DINA (G-DINA; de la Torre, 2011). Parameter estimation has been a major concern in the application of CDMs. In fact, simultaneous estimations of items and examinee's latent discrete skills result in statistical complexities in the estimation task. Within a fully Bayesian framework, a novel and highly effective Pólya-Gamma Gibbs sampling algorithm (PGGSA; Polson et al., 2013) based on the auxiliary variables is proposed to estimate the commonly used DINA model in this paper. The PGGSA overcomes the disadvantages of Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970; Chib and Greenberg, 1995; Chen et al., 2000), which requires to repeatedly adjust the specification of tuning parameters to achieve a certain acceptance probability and thus increases the computational burden. More specifically, the Metropolis–Hasting algorithm depends on the variance (tuning parameter) of the proposal distribution and is sensitive to step size. If the step size is too small, the chain will take longer to traverse the target density. If the step size is too large, there will be inefficiencies due to a high rejection rate. In addition, the Metropolis-Hastings algorithm is relatively difficulty to sample parameters with monotonicity or truncated interval restrictions. Instead, it can improve the accuracy of parameter estimation by employing strong informative prior distributions to avoid violating the restriction conditions (Culpepper, 2016). The rest of this paper is organized as follows. Section 2 contains a short introductions of DINA model, its reparameterized form, and model identifications. A detailed implementation of PGGSA is shown in section 3. In section 4, four simulations focus on the performance of parameter recovery for the PGGSA, the results of comparing with the Metropolis-Hastings algorithm, the analysis of sensitivity of prior distributions for the PGGSA, the results of comparing with Culpepper (2015)'s Gibbs algorithm on the attribute classification accuracy and the estimation accuracy of class membership probability parameters. In addition, the quality of PGGSA is investigated using a fraction subtraction test data in section 5. We conclude the article with a brief discussion in section 6.

2. Models and Model Identifications

The DINA model focuses on whether the examinee i has mastered the k attribute, where i = 1, …, N, k = 1, …, K. Let α be a dichotomous latent attribute variable with values of 0 or 1 indicating absence or presence of a attribute, respectively. is a vector of K dimensional latent attributes for the ith examinee. Given the categorical nature of the latent classes, belongs to one of C = 2 attribute latent classes. If the ith examinee belongs to the cth classification, the attribute vector can be expressed as Considering a test consisting of J items, each item j is associated with a vector of K dimensional item attributes, , where Therefore, a matrix, {q}, can be obtained by the J item attribute vectors. The DINA model is conjunctive. That is, the examinee i must possess all the required attributes to answer the item j correctly. The ideal response pattern η can be defined as follows , where I(·) denotes the indicator function. The parameters for a correct response to item j when given η are denoted by s and g. The slipping parameter s and the guessing parameter g refer to the probability of incorrectly answering the item when η = 1 and the probability of correctly guessing the answer when η = 0, respectively. Let Y denote the observed item response for the ith examinee to response jth item, Y = 1 if the ith examinee correct answer the jth item, 0 otherwise. The parameters s and g are formally defined by The probabilities of observing response given attributes are represented by and.

2.1. The Reparameterized DINA Model

To describe the relationship between the attribute vector and the observed response, we can reexpress the DINA model as follow: where the model discrimination index can be defined as 1 − s − g = IDI (de la Torre, 2008). Based on the traditional DINA model, we reparameterize s and g from the probability scale to the logit scale (Henson et al., 2009; DeCarlo, 2011; von Davier, 2014; Zhan et al., 2017). That is, where logit(x) = log(x/(1 − x)). Therefore, the reparameterized DINA model (DeCarlo, 2011) can be written as where ς and β are the item intercept and interaction parameters, respectively.

2.2. The Likelihood Function of the Reparameterized DINA Model Based on the Latent Class

Suppose that the vector of item responses for the ith examinee can be denoted as Let the vector of intercept and interaction parameters for J items be and , where (ς1, …, ς) and (β1, …, β). Given the categorical nature of the latent classes, belongs to one of C = 2 attribute latent classes. For the ith examinee belonging to the cth classification, the attribute vector is expressed as According the Equation (4), the probability of observing that the ith examinee belonging to the cth latent class answers J items can be written as where = denotes the examinee i belongs to the cth latent class. p(Y = 1|, ς, β) is the probability that the examinee i in class c correctly answers the item j. Let π = p() be the probability of examinees for each class c, c = 1, …, C, and is C dimensional vector of class membership probabilities, where . Therefore, the probability of observing given item parameters , and class membership probabilities can be written as The likelihood function based on the latent class can be written as

2.3. Model Identification

The model identification is an important cornerstone for estimating parameters and practical applications. Chen et al. (2015); Xu and Zhang (2016), and Xu (2017) discuss the DINA model identification conditions. Gu and Xu (2019) further provide a set of sufficient and necessary conditions for the identifiability of the DINA model. That is, Condition 1: (1) The Q-matrix is complete under the DINA model and without loss of generality, we assume the Q-matrix takes the following form: where is the K × K identify matrix and Q* is a (J − K) × K submarix of Q. (2) Each of the K attributes is required by at least three items. Condition 2: Any two different columns of the submatrix Q* in (8) are distinct. Under the above two conditions, Gu and Xu (2019) give the following identifiability result. Theorem (Sufficient and Necessary Condition) Conditions 1 and 2 are sufficient and necessary for the identifiability of all the DINA model parameters.

3. Pólya-Gamma Gibbs Sampling Algorithm

Polson et al. (2013) propose a new data augmentation strategy for fully Bayesian inference in logistic regression. The data augmentation approach appeals to a new class of Pólya-Gamma distribution rather than Albert and Chib (1993)'s data augmentation algorithm based on a truncated normal distribution. Next, we introduce the Pólya-Gamma distribution. Definition: Let is a iid random variable sequences from a Gamma distribution with parameters λ and 1. That is, T ~ Gamma (λ, 1). A random variable W follows a Pólya-Gamma distribution with parameters λ > 0 and τ ∈ R, denoted W ~ PG(λ, τ), if where denotes equality in distribution. In fact, the Pólya-Gamma distribution is an infinite mixture of gamma distributions which provide the plausibility to sample from Gamma distributions. Based on Polson et al. (2013, p. 1341, Equation 7)'s Theorem 1, the likelihood contribution of the ith examinee to answer the jth item can be expressed as where p(W | 1, 0) is the conditional density of W. That is, W ~ PG(1, 0). The auxiliary variable W follows a Pólya-Gamma distribution with parameters (1, 0). Biane et al. (2001) provide proofs of Equation (10). In addition, Polson et al. (2013) further discuss Equation (10). Therefore, the full conditional distribution of , , given the auxiliary variables W can be written as where p(ς), p(β), and p() are the prior distributions, respectively. The joint posterior distribution based on the latent classes is given by where p(ς), p(β), and p(π) are the prior distributions, respectively. Step 1: Sampling the auxiliary variable W, given the item intercept and interaction parameters ς, β and = . According to Equation (10), the full conditional posterior distribution of the random auxiliary variable W is given by According to Biane et al. (2001) and Polson et al. (2013; p. 1341), the density function p(W|1, 0) can be written as Therefore, f(W | ς, β, = ) is proportional to Finally, the specific form of the full conditional distribution of W is as follows Next, the Gibbs samplers are used to draw the item parameters. Step 2: Sampling the intercept parameter ς for each item j. The prior distribution of ς is assumed to follow a normal distribution, that is, . Given , , , and , the fully condition posterior distribution of ς is given by where f(W | ς, β, = ) is equal to the following equation (the details see Polson et al., 2013; p. 1341) After rearrangement, the full conditional posterior distribution of ς can be written as follows Therefore, the fully condition posterior distribution of ς follow normal distribution with mean and variance Step 3: Sampling the interaction parameter β for each item j. The prior distribution of β is assumed to follow a truncated normal distribution to satisfy the model identification restriction (Junker and Sijtsma, 2001; Henson et al., 2009; DeCarlo, 2012; Culpepper, 2015). That is, Similarly, given , , , and , the full condition posterior distribution of β is given by Therefore, the fully condition posterior distribution of ς follow the truncated normal distribution with mean and variance Step 4: Sampling the attribute vector for each examinee i. Given , , , and , we can update the ith examinee's attribute vector from the following multinomial distribution where the probability that the attribute vector belongs to the cth(c = 1, …, C) class can be written as Step 5: Sampling the class membership probabilities . The prior of is assumed to follow a Dirichlet distribution. I.e., (π1, …, π) ~Dirichlet(δ0, …, δ0). The full condition posterior distribution of the class membership probabilities can be written as

4. Simulation Study

4.1. Simulation 1

4.1.1. Simulation Design

In this simulation study, the purpose is to assess the performance of the Pólya-Gamma Gibbs sampling algorithm. Considering the test length is J = 30, and the number of the attribute is set equal to K = 5. The Q-matrix is shown in Table 1, where the design of Q-matrix satisfies Gu and Xu (2019)'s DINA model identification conditions. For the true values of the class membership probabilities, we only consider the most general case that the class membership probabilities are flat though all class, i.e., c = 1, …, C, where C = 2. Next, two factors and their varied test conditions are simulated. (a) two sample sizes (N = 1000, 2000) are considered; (b) Following Huebner and Wang (2011) and Culpepper (2015), four noise levels are considered to explore the relationship between noise level and recovery by constraining the true values of the item parameters. For each item, (b1) low noise level (LNL) case: s = g = 0.1; the corresponding true values of reparameterized parameters are ζ = −2.1972, β = 4.3945; (b2) high noise level (HNL) case : s = g = 0.2; the corresponding true values of reparameterized parameters are ζ = −1.3863, β = 2.7726; (b3) slipping higher than guessing (SHG) case: s = 0.2, g = 0.1; the corresponding true values of reparameterized parameters are ζ = −2.1972, β = 3.5835; (b4) guessing higher than slipping (GHS) case: s = 0.1, g = 0.2; the corresponding true values of reparameterized parameters are ζ = −1.3863, β = 3.5835. Fully crossing the different levels of these two factors yield 8 conditions.

Table 1

The Q matrix design in the simulation study 1.

	Attribute Q(matrix)						Attribute Q(matrix)
Item	α₁	α₂	α₃	α₄	α₅	Item	α₁	α₂	α₃	α₄	α₅
1	1	0	0	0	0	16	0	1	0	1	0
2	0	1	0	0	0	17	0	1	0	0	1
3	0	0	1	0	0	18	0	0	1	1	0
4	0	0	0	1	0	19	0	0	1	0	1
5	0	0	0	0	1	20	0	0	0	1	1
6	1	0	0	0	0	21	1	1	1	0	0
7	0	1	0	0	0	22	1	1	0	1	0
8	0	0	1	0	0	23	1	1	0	0	1
9	0	0	0	1	0	24	1	0	1	1	0
10	0	0	0	0	1	25	1	0	1	0	1
11	1	1	0	0	0	26	1	0	0	1	1
12	1	0	1	0	0	27	0	1	1	1	0
13	1	0	0	1	0	28	0	1	1	0	1
14	1	0	0	0	1	29	0	1	0	1	1
15	0	1	1	0	0	30	0	0	0	1	1

The Q matrix design in the simulation study 1.

4.1.2. Priors

Based on the four noise levels, the corresponding four kinds of non-informative prior are used. I.e., where the purpose of using non-informative priors is to eliminate the influence of prior uncertainty on posterior inferences. Similarly, the non-informative Dirichlet prior distribution is employed for the class membership probabilities . I.e., (π1, …, π) ~ Dirichlet (1, …, 1).

4.1.3. Convergence Diagnostics

As an illustration of the convergence of parameter estimates, we only consider the low noise level (LNL) case and the number of examinees is 1,000. Two methods are used to check the convergence of parameter estimates. One is the “eyeball” method to monitor the convergence by visually inspecting the history plots of the generated sequences (Hung and Wang, 2012; Zhan et al., 2017), and another method is to use the Gelman-Rubin method (Gelman and Rubin, 1992; Brooks and Gelman, 1998) to check the convergence of parameter estimates. To implement the MCMC sampling algorithm, chains of length 20,000 with an initial burn-in period 10,000 are chosen. Four chains started at overdispersed starting values are run for each replication. The trace plots of Markov Chains for three randomly selected items and class membership probabilities are shown in Figure 1. In addition, the potential scale reduction factor (PSRF; Brooks and Gelman, 1998) values of all parameters are <1.1, which ensures that all chains converge as expected. The trace plots of PSRF values are shown in the simulation 2.

Figure 1

The trace plots of the arbitrarily selected item and class membership probability parameters.

4.1.4. Evaluation Criteria for Convergence and Accuracy of Parameter Estimations

The accuracy of the parameter estimates is measured by two evaluation criteria, i.e., Bias and Mean Squared Error (MSE). Let η be the interested parameter. Assume that M = 25 data sets are generated. Also, let be the posterior mean obtained from the mth simulated data set for m = 1, …, M. The Bias for parameter is defined as and the MSE for parameter is defined as For illustration purposes, we only show the Bias and MSE of , , and for the four noise levels based on 1,000 sample sizes in Figures 2, 3. In the four noise levels, the Bias of , , and are near the zero values. However, the MSE of and increase as the number of attributes required by the item increases. In the low noise level, the performances of the recovery for and are well-based on the results of MSE, and the MSE of and are <0.0250. The performances for the high noise level are worst in the four diagnosticity cases. Moreover, we find that when the item tests a attribute, the MSE of is not much different from that of . However, the MSE of is greater than that of when the item requires multiple attributes. The reason is due to a fact that the number of examinees for η = 1 is almost equal to that of η = 0 when the item tests a attribute, which is accurate for estimating the and . Along with the increase in the attributes required by the item, the number of examinees for η = 1 reduces and the number of examinees for η = 0 increases, thus resulting in the MSE of higher than that of . Note that the MSE of is dependent on the number of examinees for η = 1.

Figure 2

Figure 3

The MSE of intercept, interaction and class membership probability parameters under four different diagnosticity cases. The Q Matrix denotes the skills required for each item along the x axis, where the black square = “1” and white square = “0.” The α denotes the examinee who belongs to the cth latent class whether has mastered kth skill, where the black square = “1” for the presence of a skill and white square = “0” for the absence of a skill, . Note the MSE values are estimated from 25 replications.

The Bias of intercept, interaction and the latent class parameters under four different noise levels. The Q Matrix denotes the skills required for each item along the x axis, where the black square = “1” and white square = “0.” The α denotes the examinee who belongs to the cth latent class whether has mastered kth skill, where the black square = “1” for the presence of a skill and white square = “0” for the absence of a skill, . Note the Bias values are estimated from 25 replications. The MSE of intercept, interaction and class membership probability parameters under four different diagnosticity cases. The Q Matrix denotes the skills required for each item along the x axis, where the black square = “1” and white square = “0.” The α denotes the examinee who belongs to the cth latent class whether has mastered kth skill, where the black square = “1” for the presence of a skill and white square = “0” for the absence of a skill, . Note the MSE values are estimated from 25 replications. The average Bias and MSE for , , and based on eight different simulation conditions are shown in Table 2. The following conclusions can be obtained. (1) Given a noise level, when the number of examinees increases from 1,000 to 2,000, the average MSE for and show a decreasing trend. More specifically, when the number of examinees increases from 1,000 to 2,000, in the case of low noise level (LNL), the average MSE of decreases from 0.048 to 0.034, the average MSE of decreases from 0.0141 to 0.0107. In the case of high noise level (HNL), the average MSE of decreases from 0.0163 to 0.0117, the average MSE of decreases from 0.0254 to 0.0239. In the case of the slipping higher than the guessing (SHG), the average MSE of decreases from 0.0139 to 0.0078, the average MSE of decreases from 0.0172 to 0.0159. In the case of the guessing higher than the slipping (GHS), the average MSE of decreases from 0.0088 to 0.0041, the average MSE of decreases from 0.0198 to 0.0181. (2) Given a noise level, when the number of examinees increases from 1,000 to 2,000, In the case of four kinds of noises, the average MSE of are basically the same and close to 0 under the conditions of four noise levels. (3) Compared with the other three noise level, the average MSE of and are largest at high noise level. In summary, the Bayesian algorithm provides accurate estimates for , , and in term of various numbers of examinees.

Table 2

The average Bias and MSE for , , and .

	Number of examinees 1,000
	LNL (b1)	HNL (b2)	SHG (b3)	GHS (b4)
BIAS
ς	0.0023	0.0046	0.0042	−0.0039
β	−0.1077	−0.1016	−0.0235	−0.0248
π	−0.0000	−0.0000	−0.0000	−0.0000
MSE
ς	0.0048	0.0163	0.0139	0.0088
β	0.0141	0.0254	0.0172	0.0198
π	0.0000	0.0000	0.0000	0.0000
	Number of examinees 2,000
BIAS
ς	0.0089	0.0089	0.0023	−0.0020
β	−0.0890	0.0588	−0.0003	−0.0041
π	−0.0000	0.0000	−0.0000	0.0000
MSE
ς	0.004	0.0117	0.0078	0.0041
β	0.0107	0.0239	0.0159	0.0181
π	0.0000	0.0000	0.0000	0.0000

Note that the Bias and MSE denote the average Bias and MSE for the parameters. .

The average Bias and MSE for , , and . Note that the Bias and MSE denote the average Bias and MSE for the parameters. .

4.2. Simulation 2

In this simulation study, we compare MH algorithm and PGGSA from two aspects: the accuracy and convergence. We consider 1,000 examinees to answer 30 items, and the number of the attribute is set equal to K = 5. The true values of ζ and β are set equal to −2.1972 and 4.3945 for each item. The corresponding true values of s and g are equal to 0.1 for each item. The class membership probabilities are flat though all classes, i.e., c = 1, …, C, where C = 2. We specify the following non-informative priors to the PGGSA and MH algorithm: and (π1, …, π) ~ Dirichlet(1, …, 1). It is known that an improper proposal distribution for MH algorithm can seriously reduce the acceptance probability of sampling. Most of the posterior samples are rejected. Therefore, the low sampling efficiency is usually unavoidable, and the reduction in the number of valid samples may lead to incorrect inference results. In contrast, our PGGSA takes the acceptance probability as 1 to draw the samples from fully condition posterior distributions. The following proposal distributions for the intercept and interaction parameters are considered in the process of implementing MH algorithm. The sampling details of MH algorithm, see Appendix. Note that the class membership probabilities are updated through the same way for the PGGSA and MH algorithms. Case 1: Case 2: To compare the convergence of all parameters for the PGGSA and MH algorithm with different proposal distributions, the convergence of item and class membership probability parameters are evaluated by judging whether the values of PSRF are <1.1. From Figure 4, we find that the intercept, interaction and class membership probability parameters have already converged at the 5,000 step iterations for the PGGSA. The fastest convergence is the class membership probability parameters followed by intercept parameters. For the MH algorithm, some parameters do not converge after 5,000 step iterations for the proposal distributions with the variances of 0.1. The convergence of the proposal distributions with the variances of 1 is worse than the convergence of the proposal distributions with the variances of 0.1, even some parameters do not reach convergence at the end of the 10,000 step iterations. Moreover, the Bias and MSE are used to evaluate the performances of the two algorithms in Table 3. It has been proved that the selection of the proposal distribution has an important influence on the accuracy of parameter estimation. The process of finding the proper turning parameter is time consuming. In addition, we investigate the efficiency of the two algorithms from the perspective of the time consumed by implementing them. On a desktop computer [Intel(R) Xeon(R) E5-2695 V2 CPU] with 2.4 GHz dual core processor and 192 GB of RAM memory, PGGSA and MH algorithm, respectively consume 3.6497 and 4.7456 h when Markov chain are run for 20,000 iterations for a replication experiment, where MH algorithm is used to implement the Case 1. In summary, PGGSA is more effective than MH algorithm in estimating model parameters.

Figure 4

The trace plots of PSRF values for the simulation study 2.

Table 3

Evaluating accuracy of parameter estimation using the two algorithms in the simulation study 2.

	PGGSA		MH algorithm under Case 1		MH algorithm under Case 2
	Bias	MSE	Bias	MSE	Bias	MSE
ς	0.0023	0.0048	0.0016	0.0069	0.0021	0.0081
β	−0.1077	0.0141	−0.1042	0.0152	−0.1087	0.0174
π	−0.0000	0.0000	−0.0007	0.0005	−0.0004	0.0011

Note that the Bias and denote the average Bias and MSE for the parameters. .

The trace plots of PSRF values for the simulation study 2. Evaluating accuracy of parameter estimation using the two algorithms in the simulation study 2. Note that the Bias and denote the average Bias and MSE for the parameters. .

4.3. Simulation 3

This simulation study is to show that PGGSA is sufficiently flexible to recover various prior distributions for the item and class membership probability parameters. The simulation design is as follows: The number of the examinees is N = 1, 000, and the test length is J = 30, and the number of the attributes is set equal to K = 5. The true values of item intercept and interaction parameters are −2.1972 and 4.3945 for each item at low noise level. The class membership probabilities are flat though all classes, i.e., c = 1, …, C, where C = 2. The non-informative Dirichlet prior distribution is employed for the class membership probabilities . I.e., (π1, …, π) ~ Dirichlet (1, …, 1), and two kinds of prior distributions are considered for the intercept and interaction parameters: Informative prior: Type I: ζ ~ N(−2.1972, 0.5), β ~ N(4.3945, 0.5)I(β > 0); Type II: ζ ~ N(−2.1972, 1), β ~ N(4.3945, 1) I (β > 0); Non-informative prior: Type III: Type IV: PGGSA is iterated 20,000 times. The first 10,000 iterations are discarded as burn-in period. 25 replications are considered in this simulation study. The PSRF values of all parameters for each simulation condition are <1.1. The Bias and MSE of the , , and based on two kinds of prior distributions are shown in Table 4.

Table 4

Evaluating the accuracy of parameters based on different prior distributions in the simulation study 3.

Type of prior	Evaluation index	ς	β	π
Type I	Bias	0.0024	−0.1044	−0.0000
	MSE	0.0047	0.0134	0.0000
Type II	Bias	0.0026	−0.1059	−0.0000
	MSE	0.0047	0.0138	0.0000
Type III	Bias	0.0022	−0.1068	−0.0000
	MSE	0.0048	0.0140	0.0000
Type IV	Bias	0.0023	−0.1077	−0.0000
	MSE	0.0048	0.0141	0.0000

Note that the Bias and denote the average Bias and MSE for the parameters. .

Evaluating the accuracy of parameters based on different prior distributions in the simulation study 3. Note that the Bias and denote the average Bias and MSE for the parameters. .

4.3.1. Result Analysis

From Table 4, we find that the Bias and MSE of , and are almost the same under different prior distributions. More specifically, the Bias of ranges from 0.0022 to 0.026, ranges from −0.1077 to −0.1044, and the Bias of under the two kinds of prior distributions is equal to −0.0000. In addition, the MSE of ranges from 0.0047 to 0.0048, ranges from 0.0134 to 0.0141, and the MSE of under the two kinds of prior distributions is equal to −0.0000. This shows that the accuracy of parameter estimation can be guaranteed by PGGSA, no matter what the informative prior or non-informative distributions are chosen.

4.4. Simulation 4

The main purpose of this simulation study is to compare PGGSA and Culpepper (2015)'s Gibbs sampling algorithm (Geman and Geman, 1984; Tanner and Wong, 1987; Gelfand and Smith, 1990; Albert, 1992; Damien et al., 1999; Béguin and Glas, 2001; Sahu, 2002; Bishop, 2006; Fox, 2010; Chen et al., 2018; Lu et al., 2018) on the attribute classification accuracy and the estimation accuracy of class membership probability parameter (). The number of the examinees is N = 1, 000. Considering the test length is J = 30, and the number of the attribute is set equal to K = 5. The Q-matrix is shown in Table 1. Four noise levels are considered in this simulation, i.e., LNL, HNL, SHG, and GHS. The true values of item parameters under the four noise levels, see the simulation study 1. For the true values of the class membership probabilities, we only consider the most general case that the class membership probabilities are flat though all classes, i.e., c = 1, …, C, where C = 2. For the prior distributions of the two algorithms, we use the non-informative prior distributions to eliminate the influence of the prior distributions on the posterior inference. The non-informative Dirichlet prior distribution is employed for the class membership probabilities . I.e., (π1, …, π) ~ Dirichlet (1, …, 1), and the non-informative prior distributions of item parameters under the two algorithms based on the four noise levels are set as follows (LNL case): PGGSA: v.s. Gibbs algorithm: s ~ Beta(1, 1), g ~ Beta(1, 1) I (g < 1 − s); (HNL case): PGGSA: v.s. Gibbs algorithm: s ~ Beta(1, 1), g ~ Beta(1, 1) I (g < 1 − s); (SHG case): PGGSA: v.s. Gibbs algorithm: s ~ Beta(1, 1), g ~ Beta(1, 1) I (g < 1 − s); (GHS case): PGGSA: v.s. Gibbs algorithm: s ~ Beta(1, 1), g ~ Beta(1, 1) I (g < 1 − s). PGGSA and Gibbs algorithm are iterated 20,000 times. The first 10,000 iterations are discarded as burn-in period for the two algorithms. Twenty-five replications are considered for the two algorithms in this simulation study. The PSRF values of all parameters for each simulation condition are <1.1. Culpepper's the R “dina” package is used to implement the Gibbs sampling. The correct pattern classification rate (CPCR), the average attribute match rate (AAMR) are used as the evaluation criteria to evaluate the attributes. These statistics are defined as where represents examinee i′s estimated attribute patterns. Next, the evaluation results of the accuracy of the two algorithms for attribute patterns and class membership probability parameters are shown in Table 5.

Table 5

Evaluating accuracy of attribute and class membership probability parameter estimations using PGGSA and Gibbs algorithm in the simulation study 4.

		Attribute(α)		CMP(π)
Noise level	Algorithm	CPCR	AAMA	Bias	MSE
LNL	PGGSA	0.8740	0.9693	−0.0000	0.0000
	Gibbs	0.8722	0.9688	−0.0000	0.0000
HNL	PGGSA	0.5643	0.8696	−0.0000	0.0000
	Gibbs	0.5697	0.8718	−0.0000	0.0000
SHG	PGGSA	0.7480	0.9336	−0.0000	0.0000
	Gibbs	0.7429	0.9308	−0.0000	0.0000
GHS	PGGSA	0.8436	0.9310	−0.0000	0.0000
	Gibbs	0.8484	0.9338	−0.0000	0.0000

Note that the CMP denotes the class membership probability. Bias and MSE denote the average Bias and MSE for the class membership probability parameters.

Evaluating accuracy of attribute and class membership probability parameter estimations using PGGSA and Gibbs algorithm in the simulation study 4. Note that the CMP denotes the class membership probability. Bias and MSE denote the average Bias and MSE for the class membership probability parameters. In Table 5, we find that the results of the attributes classification accuracy (CPCR and AAMA criteria) are basically the same for PGGSA and Gibbs algorithm under four kinds of noise levels. More specifically, the values of CPCR and AAMA for two algorithms under the HNL case are lowest. At the LNL case, the values of CPCR and AAMA for two algorithms are the highest. In addition, the CPCR value for the SHG case is lower than the CPCR value for the GHS, while the corresponding AAMA values are basically the same for the SHG case and GHS case. This indicates that slipping parameters () have important influence on the CPCR. In term of the two algorithms, the Bias and MSE of the classification membership parameters () are basically the same and close to zero under the four noise levels.

5. Empirical Example

In this example, a fraction subtraction test data is analyzed based on Tatsuoka (1990), Tatsuoka (2002), and de la Torre and Douglas (2004). The middle school students of 2,144 take part in this test to response 15 fraction subtraction items, where five attributes are measured, including subtract basic fractions, reduce and simplify, separate whole from fraction, borrow from whole, and convert whole to fraction. We choose 536 of 2,144 students in this study. These students are divided into 25 latent classes based on the five attributes. The reparameterized DINA model is used to analyze the cognitive response data. The priors of parameters are also the same as the simulation 1. I.e., the non-informative priors are used in this empirical example analysis. To implement PGGSA, chains of length 20,000 with an initial burn-in period 10,000 are chosen. The PSRF is used to evaluate the convergence of each parameters. The trace plots of PSRF values for all parameters is shown in Figure 5. We find that the values of PSRF are <1.1.

Figure 5

The trace plots of PSRF values for the real data.

The trace plots of PSRF values for the real data. The Q matrix, the expected a posteriori (EAP) estimators of the item parameters, the corresponding standard deviation (SD), and 95% highest posterior density intervals (HPDIs) of these item parameters are shown in Table 6. Based on the Table 6, we transform intercept and interaction parameters into traditional slipping and guessing parameters to analyze item characteristics. We find that the expected a posteriori (EAP) estimations of the five items with the lowest slipping are item 3, item 8, item 9, item 10, and item 11 in turn. The EAP estimations of slipping parameters for the five items are 0.0461, 0.0585, 0.0708, 0.0754, and 0.1039. This shows that these items are not easy to slipping compared with the other ten items. In addition, the EAP estimations of five items with the highest guessing are item 3, item 2, item 8, item 10, and item 11 in turn. The EAP estimations of guessing parameters for the five items are 0.2271, 0.2143, 0.2069, 0.1790, and 0.1441. Furthermore, we find that items 3, 8, 10, and 11 have low slipping parameters and high guessing parameters, which indicates that these items are more likely to be guessed correctly.

Table 6

The Q matrix design and MCMC estimations of and .

	Attribute(QMatrix)					ς^			β^
Item	α₁	α₂	α₃	α₄	α₅	EAP	SD	HPDI	EAP	SD	HPDI
1	1	0	0	0	0	−2.3274	0.0277	[−2.4998, −1.9766]	3.3884	0.0662	[2.8484, 3.8721]
2	1	1	1	1	0	−1.2990	0.0225	[−1.5639, −1.0087]	3.4200	0.0947	[2.8714, 4.0615]
3	1	0	0	0	0	−1.2247	0.0276	[−1.5357, −1.0000]	4.2999	0.0294	[3.9575, 4.4999]
4	1	1	1	1	1	−1.8944	0.0358	[−2.2841, −1.5472]	3.8815	0.1217	[3.2857, 4.4977]
5	0	0	1	0	0	−1.7971	0.1042	[−2.4667, −1.2948]	2.9899	0.1145	[2.5007, 3.6131]
6	1	1	1	1	0	−2.3961	0.0113	[−2.4999, −2.1653]	3.7058	0.0817	[3.1377, 4.2461]
7	1	1	1	1	0	−2.1109	0.0322	[−2.4999, −1.8117]	4.3549	0.0223	[4.0401, 4.4998]
8	1	1	0	0	0	−1.3433	0.0409	[−1.7158, −1.0005]	4.1817	0.0558	[3.7427, 4.4999]
9	1	0	1	0	0	−1.6266	0.0566	[−2.0725, −1.1512]	4.2735	0.0384	[3.8794, 4.4998]
10	1	0	1	1	1	−1.5226	0.0246	[−1.8180, −1.2110]	4.1072	0.0796	[3.5678, 4.4999]
11	1	0	1	0	0	−1.7813	0.0681	[−2.3048, −1.2903]	4.0454	0.0884	[3.5121, 4.4999]
12	1	0	1	1	0	−2.3802	0.0119	[−2.4998, −2.1534]	4.2212	0.0481	[3.7945, 4.4994]
13	1	1	1	1	0	−1.8221	0.0399	[−2.2142, −1.4328]	3.5878	0.1009	[2.9818, 4.1937]
14	1	1	1	1	1	−2.4279	0.0058	[−2.4999, −2.2647]	3.8646	0.0982	[3.3310, 4.4741]
15	1	1	1	1	0	−2.4298	0.0060	[−2.4999, −2.2551]	4.0033	0.0765	[3.5339, 4.4946]

Note that α.

The Q matrix design and MCMC estimations of and . Note that α. The EAP estimations of the class membership probabilities, , c = 1, …, 32, and the corresponding SD and 95% HPDI are reported in Table 7. The top five classes that the majority of examinees are classified into these classes are respectively “11111,”“11101,”“11100,”“11110,” and “00010.”The estimation results show that 34.142% of the examinees have mastered all the five skills, and 11.119% of the examinees have mastered the four skills except the skill of borrow from whole, and the examinees who only have mastered the three skills of subtract basic fractions, reduce and simplify, separate whole from fraction account for 10.146%, and 9.680% of the examinees have mastered the four skills except the skill of convert whole to fraction, and the examinees who only have mastered a skill of skill of borrow from whole account for 2.001%. In addition, among the thirty-two classes, the class with the lowest number of the examinees is 0.429%. I.e., when the examinees have mastered the skills of subtract basic fractions, separate whole from fraction, borrow from whole, and convert whole to fraction, the proportion of examinees who do not master the skill of reduce and simplify is very low. According to the 1.743% and 0.429%, we find that the skill of reduce and simplify is easier to master than the other four skills.

Table 7

The posterior probability distribution of the latent class parameters for the Fraction Subtraction Test.

Latent classes					π^
α₁	α₂	α₃	α₄	α₅	EAP	SD	HPDI
0	0	0	0	0	1.909%	0.0003	[0.0000, 0.0542]
1	0	0	0	0	0.766%	0.0000	[0.0000, 0.0208]
0	1	0	0	0	1.743%	0.0002	[0.0000, 0.0504]
0	0	1	0	0	1.299%	0.0001	[0.0000, 0.0367]
0	0	0	1	0	2.001%	0.0002	[0.0000, 0.0533]
0	0	0	0	1	1.790%	0.0002	[0.0000, 0.0540]
1	1	0	0	0	0.677%	0.0000	[0.0000, 0.0190]
1	0	1	0	0	1.898%	0.0001	[0.0000, 0.0443]
1	0	0	1	0	0.756%	0.0000	[0.0000, 0.0203]
1	0	0	0	1	0.822%	0.0000	[0.0000, 0.0222]
0	1	1	0	0	1.162%	0.0001	[0.0000, 0.0339]
0	1	0	1	0	1.808%	0.0002	[0.0000, 0.0507]
0	1	0	0	1	1.943%	0.0003	[0.0000, 0.0567]
0	0	1	1	0	1.242%	0.0001	[0.0000, 0.0330]
0	0	1	0	1	1.165%	0.0001	[0.0000, 0.0328]
0	0	0	1	1	1.778%	0.0002	[0.0000, 0.0486]
1	1	1	0	0	10.146%	0.0039	[0.0002, 0.2029]
1	1	0	1	0	0.709%	0.0000	[0.0000, 0.0198]
1	1	0	0	1	0.764%	0.0000	[0.0000, 0.0205]
1	0	1	1	0	0.546%	0.0000	[0.0000, 0.0140]
1	0	1	0	1	1.782%	0.0001	[0.0000, 0.0419]
1	0	0	1	1	0.751%	0.0000	[0.0000, 0.0201]
0	1	1	1	0	1.326%	0.0001	[0.0000, 0.0370]
0	1	1	0	1	1.181%	0.0001	[0.0000, 0.0357]
0	1	0	1	1	1.675%	0.0002	[0.0000, 0.0473]
0	0	1	1	1	1.167%	0.0001	[0.0000, 0.0335]
1	1	1	1	0	9.680%	0.0002	[0.0667, 0.1264]
1	1	1	0	1	11.119%	0.0038	[0.0001, 0.2078]
1	1	0	1	1	0.688%	0.0000	[0.0000, 0.0195]
1	0	1	1	1	0.429%	0.0000	[0.0000, 0.0119]
0	1	1	1	1	1.119%	0.0001	[0.0000, 0.0320]
1	1	1	1	1	34.142%	0.0004	[0.2998, 0.3844]

Note that α.

The posterior probability distribution of the latent class parameters for the Fraction Subtraction Test. Note that α.

6. Conclusion

In this paper, a novel and effective PGGSA based on auxiliary variables is proposed to estimate the widely applied DINA model. PGGSA overcomes the disadvantages of MH algorithm, which requires to repeatedly adjust the specification of tuning parameters to achieve a certain acceptance probability and thus increases the computational burden. However, the computational burden of the PGGSA becomes intensive especially as the CDMs become more complex, when a large number of examinees or the items is considered, or a large number of the MCMC sample size is used. Therefore, it is desirable to develop a standing-alone R package associated with C++ or Fortran software for more extensive CDMs and large-scale cognitive assessment tests. In addition, Pólya-Gamma Gibbs sampling algorithm can be used to estimate many cognitive diagnosis models, which is not limited to the DINA model. These cognitive diagnostic models include DINO (Templin and Henson, 2006), Compensatory RUM (Hartz, 2002; Henson et al., 2009), and log-linear CDM (LCDM; von Davier, 2005; Henson et al., 2009) and so on. More specifically, first of all, the parameters of these cognitive diagnosis models are reparameterized, and then the logit link function is used to link these parameters with the response. Further, we can use Pólya-Gamma Gibbs sampling algorithm to estimate these reparameterized cognitive diagnosis models. Discussions of the reparameterized cognitive diagnosis models based on logit link function, see Henson et al. (2009).

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://cran.r-project.org/web/packages/CDM/index.html.

Author Contributions

JZ completed the writing of the article. ZZ and JZ provided article revisions. JZ, JL, and ZZ provided the key technical support. JT provided the original thoughts.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

10 in total

3 in total

Bayesian Estimation of the DINA Model With Pólya-Gamma Gibbs Sampling.

1. Introduction

2. Models and Model Identifications

2.1. The Reparameterized DINA Model

2.2. The Likelihood Function of the Reparameterized DINA Model Based on the Latent Class

2.3. Model Identification

3. Pólya-Gamma Gibbs Sampling Algorithm

4. Simulation Study

4.1. Simulation 1

4.1.1. Simulation Design

4.1.2. Priors

4.1.3. Convergence Diagnostics

4.1.4. Evaluation Criteria for Convergence and Accuracy of Parameter Estimations

4.2. Simulation 2

4.3. Simulation 3

4.3.1. Result Analysis

4.4. Simulation 4

5. Empirical Example

6. Conclusion

Data Availability Statement

Author Contributions

Conflict of Interest

1. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images.

2. Revisiting the 4-Parameter Item Response Model: Bayesian Estimation and Application.

3. Measurement of psychological disorders using cognitive diagnosis models.

4. A general diagnostic model applied to language testing data.

5. Identifiability of Diagnostic Classification Models.

6. The DINA model as a constrained general diagnostic model: Two variants of a model equivalency.

7. Statistical Analysis of Q-matrix Based Diagnostic Classification Models.

8. Bayesian Estimation of the DINA Q matrix.

9. Cognitive diagnosis modelling incorporating item response times.

10. The Sufficient and Necessary Condition for the Identifiability and Estimability of the DINA Model.

1. Exploratory Restricted Latent Class Models with Monotonicity Requirements under PÒLYA-GAMMA Data Augmentation.

2. Bayesian Analysis of Aberrant Response and Response Time Data.

3. Modeling Not-Reached Items in Cognitive Diagnostic Assessments.