Jung Yeon Park, Klest Dedja, Konstantinos Pliakos, Jinho Kim, Sean Joo, Frederik Cornillie, Celine Vens, Wim Van den Noortgate.
Abstract
To obtain more accurate and robust feedback from students' assessment outcomes, communicate it to students, and optimize teaching and learning strategies, educational researchers and practitioners must critically reflect on whether existing data-analytic methods can retrieve the information contained in the database. This study compared the prediction performance of an item response theory method, specifically an explanatory item response model (EIRM), with that of six supervised machine learning (ML) methods for predicting students' item responses in educational assessments, taking student- and item-related background information into account. Each of the seven prediction methods was evaluated through cross-validation under three prediction scenarios: (a) unrealized responses of new students to existing items, (b) unrealized responses of existing students to new items, and (c) missing responses of existing students to existing items. The results of a simulation study and two real-life assessment data examples showed that using student- and item-related background information in addition to the item response data substantially increases prediction accuracy for new students or items. We also found that the EIRM is competitive with the best-performing ML methods in predicting student performance outcomes for the educational assessment datasets.
Keywords: Background information; Educational assessment; Explanatory item response model; Item response theory; Machine learning; Prediction performance
Year: 2022 PMID: 35819719 PMCID: PMC9275388 DOI: 10.3758/s13428-022-01910-8
Source DB: PubMed Journal: Behav Res Methods ISSN: 1554-351X
Presentation of the tuned parameters related to each method
| Method | Hyperparameters |
|---|---|
| Explanatory item response model (EIRM) | Not applicable |
| Decision tree (DT) | Minimum samples per leaf {5, 25, 50, 75, 100} |
| Random forest (RF) | Minimum samples per leaf {1, 2, 5}; number of trees: 200 |
| Gradient boosting (GB) | Maximum tree depth {3, 6}; learning rate {0.001, 0.01, 0.1}; number of estimators {100, 200} |
| k-Nearest neighbors (k-NN) | Number of neighbors {5, 10, 25, 50, 75, 100} |
| Quadratic discriminant analysis (QDA) | Not applicable |
| Multi-layer perceptron (MLP) classifier | Number of hidden layers {2, 3}; neurons per layer {10, 20, 25, 40, 50}; learning parameter α (L2 regularization term) {0.00001, 0.0001, 0.001, 0.01, 0.1} |
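The candidate values in the table can be written down as a set of search grids. The sketch below is only an illustration: the dictionary keys are made-up names, and exhaustive enumeration of all combinations is an assumption, since the table lists the candidate values but not the search procedure.

```python
from itertools import product

# Candidate values copied from the table above. The key names are
# illustrative assumptions, not identifiers taken from the study.
GRIDS = {
    "EIRM": {},  # no tuned hyperparameters
    "DT": {"min_samples_leaf": [5, 25, 50, 75, 100]},
    "RF": {"min_samples_leaf": [1, 2, 5], "n_trees": [200]},
    "GB": {
        "max_depth": [3, 6],
        "learning_rate": [0.001, 0.01, 0.1],
        "n_estimators": [100, 200],
    },
    "k-NN": {"n_neighbors": [5, 10, 25, 50, 75, 100]},
    "QDA": {},  # no tuned hyperparameters
    "MLP": {
        "n_hidden_layers": [2, 3],
        "neurons_per_layer": [10, 20, 25, 40, 50],
        "alpha": [0.00001, 0.0001, 0.001, 0.01, 0.1],
    },
}


def expand(grid):
    """Return every hyperparameter combination in a grid as a dict."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Number of candidate configurations per method under exhaustive search.
for method, grid in GRIDS.items():
    print(method, len(expand(grid)))
```

An empty grid expands to a single empty configuration, which corresponds to fitting the method once with no tuning.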
Fig. 1 A simple illustration example of a decision tree
Fig. 2 A schematic illustration of a multi-layer perceptron with input, hidden and output layers
Fig. 3 Illustrations of the three prediction scenarios
Fig. 4 An illustration of the data matrix construction for the ML methods
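One plausible way to build such a data matrix is to emit one row per observed student–item pair, with the student's and the item's background features concatenated and the item response as the label. The sketch below is an assumption about the general idea, not a reproduction of the figure; the function name, the toy features, and the 0/1 response coding are all hypothetical.

```python
def build_rows(responses, student_features, item_features):
    """Turn a student-by-item response table into one row per observed pair.

    responses: dict mapping (student_id, item_id) -> 0/1 score
    student_features, item_features: dict mapping id -> list of numbers
    Returns (X, y): feature rows and the matching response labels.
    """
    X, y = [], []
    for (student, item), score in sorted(responses.items()):
        # Concatenate student-side and item-side background information.
        X.append(student_features[student] + item_features[item])
        y.append(score)
    return X, y


# Hypothetical toy data: two students, two items, one response missing.
responses = {("s1", "i1"): 1, ("s1", "i2"): 0, ("s2", "i1"): 1}
student_features = {"s1": [14.0, 1.0], "s2": [15.0, 0.0]}
item_features = {"i1": [0.0], "i2": [1.0]}

X, y = build_rows(responses, student_features, item_features)
print(X)
print(y)
```

The unobserved pair `("s2", "i2")` simply produces no row, so missing responses need no special placeholder in this layout.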
Summary of the four simulated datasets
| Dataset description | Data size | Complexity |
|---|---|---|
| Small, 10% | Small (100 students, 10 items) | Noise: 10% |
| Normal, 10% | Typical (Normal) (1000 students, 100 items) | Noise: 10% |
| Normal, 30% | Typical (Normal) (1000 students, 100 items) | Noise: 30% |
| Normal, 60% | Typical (Normal) (1000 students, 100 items) | Noise: 60% |
In each dataset, tenfold CV was conducted for the three prediction scenarios (new students, new items, and student–item pairs)
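The three scenarios differ only in what is held out in each fold: whole students, whole items, or individual student–item pairs. A minimal sketch of that idea follows, assuming responses are identified by `(student, item)` tuples; the function names and the shuffling scheme are illustrative and not taken from the study.

```python
import random


def kfold_groups(ids, k, seed=0):
    """Shuffle ids and deal them into k roughly equal folds."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def scenario_splits(pairs, scenario, k=10, seed=0):
    """Yield (train, test) lists of (student, item) pairs for one scenario.

    scenario: "new_students", "new_items", or "pairs".
    """
    if scenario == "new_students":
        key = lambda p: p[0]   # hold out whole students
    elif scenario == "new_items":
        key = lambda p: p[1]   # hold out whole items
    elif scenario == "pairs":
        key = lambda p: p      # hold out individual responses
    else:
        raise ValueError(scenario)
    units = sorted({key(p) for p in pairs})
    for fold in kfold_groups(units, k, seed):
        held_out = set(fold)
        test = [p for p in pairs if key(p) in held_out]
        train = [p for p in pairs if key(p) not in held_out]
        yield train, test


# Toy example: 20 hypothetical students answering 10 hypothetical items.
pairs = [(s, i) for s in range(20) for i in range(10)]
for name in ("new_students", "new_items", "pairs"):
    train, test = next(scenario_splits(pairs, name, k=5))
    print(name, len(train), len(test))
```

In the first two scenarios no test student (or item) ever appears in the training fold, so a model can rely only on background information for the held-out units.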
Fig. 5 Summary of simulation study: AUPR
Fig. 6 Summary of simulation study: AUROC
Fig. 7 Summary of simulation study: MSE
Fig. 8 Results of post hoc tests after Friedman test (AUROC)
Average AUROC, AUPR, and MSE results from the two datasets
| Scenario | Method | Real data 1 AUROC | Real data 1 AUPR | Real data 1 MSE | Real data 2 AUROC | Real data 2 AUPR | Real data 2 MSE |
|---|---|---|---|---|---|---|---|
| New student scenario | EIRM | 0.709 (.007) | 0.737 (.015) | 0.215 (.003) | | | |
| New student scenario | RF | 0.723 (.009) | 0.752 (.016) | | 0.726 (.009) | 0.835 (.011) | 0.188 (.004) |
| New student scenario | GB | | | | 0.725 (.008) | 0.838 (.010) | 0.189 (.005) |
| New student scenario | DT | 0.704 (.010) | 0.733 (.017) | 0.217 (.004) | 0.700 (.005) | 0.813 (.012) | 0.197 (.004) |
| New student scenario | k-NN | 0.687 (.009) | 0.711 (.013) | 0.222 (.003) | 0.668 (.011) | 0.794 (.012) | 0.202 (.005) |
| New student scenario | QDA | 0.669 (.014) | 0.699 (.022) | 0.256 (.009) | 0.666 (.018) | 0.797 (.018) | 0.351 (.038) |
| New student scenario | MLP | 0.698 (.012) | 0.722 (.021) | 0.219 (.005) | 0.711 (.010) | 0.820 (.017) | 0.195 (.006) |
| New item scenario | EIRM | 0.691 (.056) | 0.719 (.093) | 0.232 (.033) | | | 0.226 (.061) |
| New item scenario | RF | | | | 0.653 (.061) | 0.784 (.106) | 0.225 (.059) |
| New item scenario | GB | | 0.741 (.082) | | 0.656 (.055) | 0.791 (.096) | 0.230 (.060) |
| New item scenario | DT | 0.662 (.045) | 0.694 (.088) | 0.227 (.021) | 0.587 (.050) | 0.734 (.112) | 0.247 (.070) |
| New item scenario | k-NN | 0.686 (.048) | 0.712 (.079) | 0.224 (.017) | 0.652 (.027) | 0.775 (.101) | |
| New item scenario | QDA | 0.608 (.074) | 0.646 (.083) | 0.298 (.060) | 0.545 (.097) | 0.722 (.124) | 0.432 (.123) |
| New item scenario | MLP | 0.664 (.051) | 0.694 (.105) | 0.228 (.031) | 0.656 (.050) | 0.790 (.095) | 0.223 (.058) |
| Student–item pair scenario | EIRM | 0.750 (.004) | 0.780 (.005) | 0.201 (.002) | | | |
| Student–item pair scenario | RF | | | | 0.730 (.009) | 0.841 (.006) | 0.187 (.002) |
| Student–item pair scenario | GB | | 0.783 (.004) | | 0.748 (.008) | 0.854 (.007) | 0.182 (.002) |
| Student–item pair scenario | DT | 0.712 (.004) | 0.746 (.004) | 0.214 (.001) | 0.701 (.009) | 0.815 (.009) | 0.196 (.003) |
| Student–item pair scenario | k-NN | 0.717 (.005) | 0.744 (.006) | 0.212 (.002) | 0.697 (.009) | 0.817 (.006) | 0.197 (.002) |
| Student–item pair scenario | QDA | 0.678 (.009) | 0.706 (.006) | 0.252 (.005) | 0.678 (.011) | 0.802 (.007) | 0.326 (.020) |
| Student–item pair scenario | MLP | 0.725 (.005) | 0.756 (.004) | 0.210 (.002) | 0.734 (.009) | 0.843 (.006) | 0.187 (.003) |
Best values are indicated in bold and standard deviations are given in parentheses
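For readers unfamiliar with the three metrics in the table, the following sketch computes them for binary responses and predicted probabilities. It is a plain-Python illustration under stated assumptions: AUPR is approximated here by average precision, which is one common estimator and not necessarily the one used in the study, and the toy labels and scores are invented.

```python
def auroc(y_true, y_score):
    """Probability that a random positive outranks a random negative.

    Tied scores count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Mean precision at each positive, ranked by descending score.

    A step-wise estimate of the area under the precision-recall curve;
    this simple version assumes there are no tied scores.
    """
    ranked = sorted(zip(y_score, y_true), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


def mse(y_true, y_score):
    """Mean squared error between 0/1 labels and predicted probabilities."""
    return sum((t - s) ** 2 for t, s in zip(y_true, y_score)) / len(y_true)


# Invented toy data: five responses and their predicted probabilities.
y_true = [1, 0, 1, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.2]

print(round(auroc(y_true, y_score), 3))              # 0.667
print(round(average_precision(y_true, y_score), 3))  # 0.806
print(round(mse(y_true, y_score), 3))                # 0.228
```

Higher is better for AUROC and AUPR, while lower is better for MSE, which is why the best method in a column of the table can differ by metric.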