Abstract
A substantial body of research has examined variables related to students' mathematics achievement using TIMSS. However, most studies have employed conventional statistical methods and have focused on a select few indicators instead of utilizing the hundreds of variables TIMSS provides. This study aimed to find a prediction model for students' mathematics achievement using as many TIMSS student and teacher variables as possible. Elastic net, the machine learning technique selected for this study, combines the advantages of LASSO and ridge: variable selection and handling of multicollinearity, respectively. A logistic regression model was also employed to predict TIMSS 2011 Korean 4th graders' mathematics achievement. Ten-fold cross-validation with mean squared error was employed to determine the elastic net regularization parameter. Among the 162 TIMSS variables explored, 12 student and 5 teacher variables were selected in the elastic net model, and the prediction accuracy, sensitivity, and specificity were 76.06, 70.23, and 80.34%, respectively. This study showed that the elastic net method can be successfully applied to large-scale educational data by selecting a subset of variables with reasonable prediction accuracy and by finding new variables that predict students' mathematics achievement. Variables newly found via machine learning can shed light on existing theories from a totally different perspective, which in turn can prompt the creation of new theories or complement existing ones. This study also examined the current scale development convention from a machine learning perspective.
Keywords: TIMSS; elastic net; machine learning; mathematics achievement; penalized regression; regularization
Year: 2018 PMID: 29599736 PMCID: PMC5862814 DOI: 10.3389/fpsyg.2018.00317
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
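For reference, the penalized logistic regression described in the abstract can be written as follows. This is a sketch in the parametrization commonly used by elastic net software, not a formula quoted from the paper: the mixing parameter α and the 1/n scaling are assumptions, while λ is the regularization parameter that the abstract says was chosen by ten-fold cross-validation.

```latex
\[
(\hat{\beta}_0, \hat{\beta}) =
\arg\min_{\beta_0,\,\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}
\Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\!\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
+ \lambda\Bigl[\, \alpha\,\lVert\beta\rVert_1
      + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2} \Bigr]
\]
```

With α = 1 the penalty reduces to LASSO (which sets some coefficients exactly to zero, giving variable selection), and with α = 0 it reduces to ridge (which shrinks correlated coefficients together, easing multicollinearity).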
Majority vote result with 2011 TIMSS Korean 4th graders' math.
| Observations | 9 | 99 | 684 | 1,771 | 1,737 | 34 | 4,334 |
Training and test data.
| Data ( | 995 | 1,358 |
| Training data ( | 696 | 951 |
| Test data ( | 299 | 407 |
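The counts in the training and test table can be checked with a few lines of arithmetic. The sketch below only uses the numbers shown in the table; the column names `col_1` and `col_2` are placeholders, since the column headers are not preserved here, and the roughly 70/30 split it reports is an observation from the counts, not a ratio stated in this record.

```python
# Counts copied from the training/test table; column labels are placeholders.
counts = {
    "col_1": {"all": 995, "train": 696, "test": 299},
    "col_2": {"all": 1358, "train": 951, "test": 407},
}

# Training and test counts should add up to the full data in each column.
for name, c in counts.items():
    assert c["train"] + c["test"] == c["all"], name

# Share of each column that went to training.
train_share = {name: c["train"] / c["all"] for name, c in counts.items()}

# Overall totals across both columns.
total_all = sum(c["all"] for c in counts.values())
total_train = sum(c["train"] for c in counts.values())
total_test = sum(c["test"] for c in counts.values())
overall_train_share = total_train / total_all

for name, share in train_share.items():
    print(f"{name}: {share:.3f} of rows in training")
print(f"overall: {total_train} train / {total_test} test "
      f"({overall_train_share:.3f} in training)")
```

Both columns come out close to 0.70, which is consistent with a split made separately within each group.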
Figure 1. Regularization parameter (λ) and corresponding coefficients.
Figure 2. Ten-fold CV results of five measure types.
Prediction accuracy, sensitivity, and specificity of five measures.
| Measure | Accuracy (%) | Sensitivity (%) | Specificity (%) |
| Misclassification | 73.23 | 56.67 | 84.39 |
| AUC | 73.83 | 59.07 | 83.77 |
| Deviance | 73.75 | 60.00 | 83.02 |
| MSE | 76.06 | 70.23 | 80.34 |
| MAE | 73.75 | 63.15 | 80.90 |
Selected variables, their labels, scales, and coefficients.
| No. | Variable | Label | Scale | Coefficient |
| | Intercept | | | −1.262 |
| 1 | ATBG05BC | Specialization/language-reading | 1 (yes), 0 (no) | −0.060 |
| 2 | ATBG06E | Parental support | 1 (very high) to 5 (very low) | −0.060 |
| 3 | ATBG06F | Parental involvement | 1 (very high) to 5 (very low) | −0.005 |
| 4 | ATBG07A | Sch/safe neighborhood | 1 (agree a lot) to 4 (disagree a lot) | −0.098 |
| 5 | ATBG10B | Interactions t/collaborate | 1 (never or almost never) to 4 (daily or almost daily) | 0.042 |
| 6 | ASBG04 | Amount of books in your home | 1(0–10), 2(11–25), 3(26–100), 4(101–200), 5(200+) | 0.214 |
| 7 | ASBG05E | Internet connection | 1 (yes), 0 (no) | 0.287 |
| 8 | ASBG05F | Car possession | 1 (yes), 0 (no) | 0.003 |
| 9 | ASBG06A | How often use computer home | 1 (everyday) to 4 (never) | −0.004 |
| 10 | ASBM02C | Teacher is easy to understand | 1 (agree a lot) to 4 (disagree a lot) | −0.030 |
| 11 | ASBM03A | Usually do well in math | 1 (agree a lot) to 4 (disagree a lot) | −0.230 |
| 12 | ASBM03B | Harder for me than for others | 1 (agree a lot) to 4 (disagree a lot) | 0.255 |
| 13 | ASBM03C | Just not good in math | 1 (agree a lot) to 4 (disagree a lot) | 0.103 |
| 14 | ASBM03D | Learn quickly in mathematics | 1 (agree a lot) to 4 (disagree a lot) | −0.024 |
| 15 | ASBM03E | Good at working out problems | 1 (agree a lot) to 4 (disagree a lot) | −0.163 |
| 16 | ASBGSCM | Math self-confidence (score) | | 0.070 |
| 17 | ASDGSCM | Math self-confidence (index) | 1 (confident) to 3 (not confident) | −0.287 |
Teacher variables are listed first, followed by student variables; within each group, variables appear in the order of the TIMSS questionnaires.
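One way to read the coefficients in the table above is as odds ratios. Since the model is a logistic regression, exponentiating a coefficient gives the multiplicative change in the odds of the positive class per one-unit increase in that variable, holding the others fixed. The sketch below does this for three coefficients copied from the table. It assumes the coefficients apply to the variables as coded in the Scale column, and because elastic net shrinks coefficients, these ratios are likely pulled toward 1 and should be read as rough indications only.

```python
import math

# Coefficients copied from the selected-variables table (three examples).
coefs = {
    "ASBG04": 0.214,    # amount of books in the home, 1 (0-10) to 5 (200+)
    "ASBG05E": 0.287,   # internet connection, 1 (yes) / 0 (no)
    "ASDGSCM": -0.287,  # math self-confidence index, 1 (confident) to 3 (not confident)
}

# exp(coefficient) = multiplicative change in odds per one-unit increase.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio {ratio:.3f}")
```

A ratio above 1 (for example, one step up in the books-at-home category) means higher odds of the positive class; a ratio below 1 (one step toward "not confident" on the self-confidence index) means lower odds.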
| Predicted as positive (Advanced) | Predicted as negative (Others) | |
| Actual + | ||
| Actual – |
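The cells of the confusion matrix are not preserved here, but the three reported rates can be related to such a matrix with a short calculation. The sketch below is an inference, not data from the source: it assumes that the test counts 299 and 407 from the training/test table are the actual positive (Advanced) and actual negative (Others) groups, respectively, and back-computes integer cell counts from the sensitivity and specificity reported for the MSE measure.

```python
def confusion_counts(n_pos, n_neg, sensitivity, specificity):
    """Back-compute integer confusion-matrix cells from reported rates."""
    tp = round(sensitivity * n_pos)  # actual positives predicted positive
    tn = round(specificity * n_neg)  # actual negatives predicted negative
    return {"TP": tp, "FN": n_pos - tp, "FP": n_neg - tn, "TN": tn}


def rates(c):
    """Accuracy, sensitivity, and specificity (in percent) from the cells."""
    total = c["TP"] + c["FN"] + c["FP"] + c["TN"]
    return {
        "accuracy": 100 * (c["TP"] + c["TN"]) / total,
        "sensitivity": 100 * c["TP"] / (c["TP"] + c["FN"]),
        "specificity": 100 * c["TN"] / (c["TN"] + c["FP"]),
    }


# Assumption: 299 test cases are actual positives and 407 are actual negatives.
cells = confusion_counts(n_pos=299, n_neg=407,
                         sensitivity=0.7023, specificity=0.8034)
result = rates(cells)

print(cells)
print({k: round(v, 2) for k, v in result.items()})
```

Under that assumption, the recomputed accuracy, sensitivity, and specificity round to the values reported for the MSE measure (76.06, 70.23, and 80.34%), which supports reading 299 as the size of the positive class in the test data.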