Maria Bolsinova1, Benjamin Deonovic2, Meirav Arieli-Attali3, Burr Settles4, Masato Hagiwara5, Gunter Maris6.
Abstract
Adaptive learning and assessment systems support learners in acquiring knowledge and skills in a particular domain. Learners' progress is monitored as they solve items that match their level and target specific learning goals. Scaffolding and hints are powerful tools for supporting the learning process. One way of introducing hints is to leave hint use to the learner's choice: a learner who is certain of their response answers without hints, while a learner who is uncertain or does not know how to approach the item can request a hint. We develop measurement models for applications where such on-demand hints are available. These models take into account that hint use may be informative of ability but may also be influenced by other individual characteristics. Two modeling strategies are considered: (1) the measurement model is based on a scoring rule for ability that includes both response accuracy and hint use; (2) the choice to use hints and the response accuracy conditional on this choice are modeled jointly using Item Response Tree (IRTree) models. The properties of the different models and their implications are discussed. An application to data from Duolingo, an adaptive language learning system, is presented. Here, the best model is the scoring-rule-based model with full credit for correct responses without hints, partial credit for correct responses with hints, and no credit for all incorrect responses. The second dimension in the model accounts for individual differences in the tendency to use hints.
Keywords: adaptive learning systems; hints; item response theory; item response tree models; measurement; measurement theory; multidimensional nominal response model
Year: 2022 PMID: 35528271 PMCID: PMC9073638 DOI: 10.1177/01466216221084208
Source DB: PubMed Journal: Appl Psychol Meas ISSN: 0146-6216
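
The scoring rule singled out in the abstract (full credit for correct responses without hints, partial credit for correct responses with hints, no credit for incorrect responses) can be sketched as a mapping from each response to an ordered score. This is only an illustration: the numeric values 2, 1, and 0 and the function name `score_response` are assumptions, since the source specifies the ordering of credit but not the score values.

```python
def score_response(correct: bool, hint_used: bool) -> int:
    """Map one response to an ordered score category.

    Assumed values (not given in the source): 2 = full credit,
    1 = partial credit, 0 = no credit.
    """
    if not correct:
        # Incorrect responses get no credit, with or without a hint.
        return 0
    # Correct responses: partial credit with a hint, full credit without.
    return 1 if hint_used else 2


# Example: four possible (correct, hint_used) combinations.
scores = [
    score_response(correct=True, hint_used=False),
    score_response(correct=True, hint_used=True),
    score_response(correct=False, hint_used=True),
    score_response(correct=False, hint_used=False),
]
```

Under this coding, both kinds of incorrect response collapse into the same category, which is what distinguishes this rule from alternatives that order the categories differently.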
Figure 1. Item difficulties in the Duolingo Spanish-from-English data set, estimated separately from the responses in which no hints were used (x-axis) and from the responses in which hints were used (y-axis). The parameters were estimated for the two-parameter logistic model using marginal maximum likelihood in the R package mirt (Chalmers, 2012). The estimates do not lie on the identity line, which means that it is not reasonable to assume that item difficulty is independent of hint use.
Figure 2. Histogram of the tetrachoric correlations between hint use variables on different items in the Spanish-from-English Duolingo data set. Most of the correlations are positive and some of them are relatively high, which means that it might be beneficial to extend the scoring-rule-based model with an additional dimension which would account for these correlations.
Figure 3. Tree structure for responding to an item when hints are available on demand (Y: hint use, X: response accuracy).
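
One common way to work with such a tree is to recode each observed response into node-level pseudo-items, one per node. The sketch below assumes three nodes: the hint choice (Y), accuracy when the hint was used, and accuracy when the hint was not used. The function name `to_pseudo_items` and the use of `None` for nodes that were not visited are illustrative assumptions, not details taken from the source.

```python
from typing import Optional, Tuple


def to_pseudo_items(
    hint_used: bool, correct: bool
) -> Tuple[int, Optional[int], Optional[int]]:
    """Recode one response (Y = hint use, X = accuracy) into three node outcomes.

    Returns (node1, node2, node3), where
      node1: 1 if a hint was requested, 0 otherwise;
      node2: accuracy given that a hint was used (None if no hint was used);
      node3: accuracy given that no hint was used (None if a hint was used).
    """
    node1 = int(hint_used)
    node2 = int(correct) if hint_used else None
    node3 = int(correct) if not hint_used else None
    return node1, node2, node3


# Example: a correct response given after requesting a hint.
example = to_pseudo_items(hint_used=True, correct=True)
```

Each response contributes to exactly two nodes: the hint choice and one of the two accuracy nodes. The unvisited accuracy node is treated as missing.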
Model comparison results for scoring-rule-based models with different specifications and for IRTree models with different specifications of the second node (accuracy when the hint was used) and third node (accuracy when the hint was not used) of the tree. For the scoring-rule-based models, the order of categories for ability measurement is indicated (I: incorrect, C: correct, H−: without hints, H+: with hints); “no α” means that item-specific discrimination parameters are not included, “single δ” means that category-specific thresholds are not included, and “no η” means that the hint-use dimension is not included. Columns: npar, number of freely estimated parameters; AIC, average Akaike Information Criterion in the training data; BIC, average Bayesian Information Criterion in the training data; CVLL, average log-likelihood in the testing data. The best model based on AIC is the IRTree model with different abilities, slopes, and intercepts when hints are and are not used. The best model based on BIC and CVLL is the two-dimensional scoring-rule-based model with the highest scores for correct responses without hints, partial scores for correct responses with hints, and zero scores for all incorrect responses.
| Model | npar | AIC | BIC | CVLL |
|---|---|---|---|---|
| Scoring-rule-based models | ||||
| | 100 | 275,065 | 275,621 | −137,432 |
| | 198 | 273,548 | 274,649 | −136,867 |
| | 396 | 241,118 | 243,320 | −120,584 |
| | 496 | 210,563 | 213,322 | −105,273 |
| | 496 | 210,622 | 213,381 | −105,304 |
| ( | 496 | 210,522 | 213,280 | −105,254 |
| | 496 | 210,653 | 213,412 | −105,327 |
| ( | 496 | 210,754 | 213,512 | −105,361 |
| IRTree models | ||||
| Same ability, slope, and intercept | 397 | 211,454 | 213,662 | −105,729 |
| Same ability and intercept, different slopes | 496 | 210,975 | 213,733 | −105,543 |
| Same ability and slope, different intercepts | 496 | 210,545 | 213,304 | −105,283 |
| Same ability, different slopes and intercepts | 595 | 210,445 | 213,754 | −105,271 |
| Different abilities, slopes, and intercepts | 597 | 210,416 | 213,736 | −105,267 |
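
The selection stated in the caption can be checked directly from the tabulated values: lower AIC and BIC are better, and higher CVLL is better. In the sketch below the numbers are copied from the table, but the labels of the scoring-rule-based rows did not survive in this copy, so placeholders of the form "scoring-rule row k" (k = position in the table) are used as an assumption.

```python
# (label, npar, AIC, BIC, CVLL); scoring-rule labels are placeholders.
rows = [
    ("scoring-rule row 1", 100, 275065, 275621, -137432),
    ("scoring-rule row 2", 198, 273548, 274649, -136867),
    ("scoring-rule row 3", 396, 241118, 243320, -120584),
    ("scoring-rule row 4", 496, 210563, 213322, -105273),
    ("scoring-rule row 5", 496, 210622, 213381, -105304),
    ("scoring-rule row 6", 496, 210522, 213280, -105254),
    ("scoring-rule row 7", 496, 210653, 213412, -105327),
    ("scoring-rule row 8", 496, 210754, 213512, -105361),
    ("IRTree: same ability, slope, and intercept", 397, 211454, 213662, -105729),
    ("IRTree: same ability and intercept, different slopes", 496, 210975, 213733, -105543),
    ("IRTree: same ability and slope, different intercepts", 496, 210545, 213304, -105283),
    ("IRTree: same ability, different slopes and intercepts", 595, 210445, 213754, -105271),
    ("IRTree: different abilities, slopes, and intercepts", 597, 210416, 213736, -105267),
]

best_aic = min(rows, key=lambda r: r[2])[0]   # smallest AIC
best_bic = min(rows, key=lambda r: r[3])[0]   # smallest BIC
best_cvll = max(rows, key=lambda r: r[4])[0]  # largest held-out log-likelihood

print("AIC: ", best_aic)
print("BIC: ", best_bic)
print("CVLL:", best_cvll)
```

AIC favors the most heavily parameterized IRTree model, whereas BIC and CVLL agree on the same scoring-rule-based row, in line with the caption.
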
Figure 4. Comparison between the ability (θ) estimates under the selected model (the two-dimensional scoring-rule-based model with full credit for correct responses without hints, partial credit for correct responses with hints, and no credit for all incorrect responses) and under two-parameter logistic (2PL) models with (1) all correct responses receiving full credit (left panels) and (2) only correct responses without hints receiving full credit (right panels). The upper panels show the relationship between the estimates under the two models. The estimates are much more similar when the scoring-rule-based model is compared with the 2PL that ignores hint use than when it is compared with the 2PL that treats hint use as an incorrect response. The lower panels show how the difference in the estimates depends on the hint-use dimension (η): compared with the 2PL that ignores hint use, the θ estimate under the scoring-rule-based model is higher for persons with low η and lower for persons with high η; compared with the 2PL that treats all hint use as incorrect responses, the opposite pattern is present.