| Literature DB >> 36135602 |
Allan B I Bernardo1, Macario O Cordel2, Minie Rose C Lapinid3, Jude Michael M Teves2, Sashmir A Yap2, Unisse C Chua2.
Abstract
Filipino students performed poorly in the 2018 Programme for International Student Assessment (PISA) mathematics assessment, with more than 50% obtaining scores below the lowest proficiency level. Students from public schools also performed worse than their private school counterparts. We used machine learning approaches, specifically binary classification methods, to model the variables that best identified the poor-performing students (below Level 1) vs. better-performing students (Levels 1 to 6), using PISA data from a nationally representative sample of 15-year-old Filipino students. We analyzed data from students in private and public schools separately. Several binary classification methods were applied, and the best classification model for both the private and public school groups was the Random Forest classifier. The ten variables with the highest impact on the model were identified for the private and public school groups. Five variables were similarly important in the private and public school models. However, other distinct variables relating to students' motivations, family, and school experiences were important in identifying the poor-performing students in each school type. The results are discussed in relation to the social and social-cognitive experiences of students that relate to the socioeconomic contexts that differ between public and private schools.
Keywords: PISA; Philippines; machine learning; mathematics achievement; public vs. private schools; school type; socioeconomic differences
Year: 2022 PMID: 36135602 PMCID: PMC9504801 DOI: 10.3390/jintelligence10030061
Source DB: PubMed Journal: J Intell ISSN: 2079-3200
List of the considered ML models and the hyperparameters varied during the grid search. Hyperparameters define the complexity of each ML model and govern its learning performance during training.
| ML Models | Hyperparameters |
|---|---|
| Logistic Regression | solver: newton-cg, lbfgs, liblinear |
| MLP | hidden layer sizes: (10, 30, 10), (10, 30), (32, 32), (10, 10, 10, 10) |
| SVM | kernel: radial basis function, polynomial |
| Decision Tree | criterion: gini, entropy |
| Random Forest | criterion: gini, entropy |
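The grid search over models and hyperparameters described above can be sketched with scikit-learn. This is a minimal illustration, not the authors' code: the feature matrix `X` and binary labels `y` are synthetic stand-ins for the PISA student data, and only a subset of the grids in the table is shown.

```python
# Minimal grid-search sketch over three of the models in the table.
# Synthetic data stands in for the PISA student variables.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Model/grid pairs mirroring the table (illustrative, not the full grids).
searches = {
    "logreg": (LogisticRegression(max_iter=1000),
               {"solver": ["newton-cg", "lbfgs", "liblinear"]}),
    "tree":   (DecisionTreeClassifier(random_state=0),
               {"criterion": ["gini", "entropy"]}),
    "forest": (RandomForestClassifier(random_state=0),
               {"criterion": ["gini", "entropy"]}),
}

best = {}
for name, (model, grid) in searches.items():
    search = GridSearchCV(model, grid, cv=5, scoring="accuracy").fit(X, y)
    best[name] = (search.best_params_, round(search.best_score_, 3))
print(best)
```

Cross-validated accuracy is used as the selection criterion here because it is the metric the paper reports for choosing the best classifier.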
Figure 1. Illustration of a multilayer perceptron (top left), a Support Vector Machine with linearly separable data (bottom left), and a Random Forest (right) with four Decision Tree predictors.
Summary of the best validation performance per ML model after grid search. Text in bold indicates the best ML model performance for a specific metric and school type. For participants from both private and public schools, the best classifier in terms of accuracy is the Random Forest.
| School Type | ML Model | Precision | Recall | F1-Score | Acc | Hyperparameters (Optimal Values) |
|---|---|---|---|---|---|---|
| Private | Logistic regression | 0.63 | | | 0.74 | C: 1; penalty: l2; solver: newton-cg |
| Private | MLP | 0.67 | 0.56 | 0.61 | 0.73 | activation: relu; alpha: 0.005 |
| Private | SVM | 0.67 | 0.02 | 0.04 | 0.63 | C: 10; gamma: 1; kernel: rbf |
| Private | Decision tree | 0.54 | 0.54 | 0.54 | 0.72 | criterion: gini; max_depth: 12 |
| Private | Random forest | | 0.61 | 0.65 | | criterion: gini; max_depth: 20; max_features: log2; n_estimators: 500 |
| Public | Logistic regression | 0.81 | 0.75 | 0.78 | 0.75 | C: 1; penalty: l1; solver: liblinear |
| Public | MLP | 0.80 | 0.75 | 0.78 | 0.74 | activation: relu; alpha: 0.05 |
| Public | SVM | 0.75 | 0.76 | 0.75 | 0.70 | C: 100; gamma: 0.1; kernel: rbf |
| Public | Decision tree | 0.76 | 0.76 | 0.76 | 0.71 | criterion: gini; max_depth: 6 |
| Public | Random forest | | | | | criterion: gini; max_depth: 15 |
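The four validation metrics in the table can be reproduced in form (though not in value) with scikit-learn. The sketch below fits a Random Forest using the optimal hyperparameters reported for the private-school model; the data is synthetic, so the resulting scores are illustrative only.

```python
# Sketch: fit a Random Forest with the private-school model's reported
# optimal hyperparameters and compute the table's four validation metrics.
# Synthetic data; the scores will not match the published values.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=1)

rf = RandomForestClassifier(criterion="gini", max_depth=20,
                            max_features="log2", n_estimators=500,
                            random_state=1).fit(X_tr, y_tr)
pred = rf.predict(X_val)
metrics = {
    "precision": precision_score(y_val, pred),
    "recall": recall_score(y_val, pred),
    "f1": f1_score(y_val, pred),
    "accuracy": accuracy_score(y_val, pred),
}
print(metrics)
```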
Figure 2. (a) Area under the ROC curve (AUC) for the private (left) and public (right) school participants. The AUC score indicates how well separated classes 0 and 1 are in the Random Forest classifier. (b) Confusion matrix for the Random Forest classifier model for the private (left) and public (right) school participants. (c) A cursory look at the accuracy of the different ML models during the exhaustive search for the best hyperparameters for the private (left) and public (right) school participants. Note that the RF performs better than the other ML models in terms of performance consistency, regardless of the hyperparameters.
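The AUC and confusion-matrix evaluation shown in Figure 2 can be sketched with scikit-learn. As before, synthetic data replaces the PISA sample, so the numbers are illustrative.

```python
# Sketch of the Figure 2 evaluation: AUC from predicted probabilities and a
# 2x2 confusion matrix for the binary poor/better-performing classification.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=15, random_state=2)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=2)

rf = RandomForestClassifier(random_state=2).fit(X_tr, y_tr)
proba = rf.predict_proba(X_val)[:, 1]            # P(class 1) per student
auc = roc_auc_score(y_val, proba)                # separation of classes 0 and 1
cm = confusion_matrix(y_val, rf.predict(X_val))  # rows: true, cols: predicted
print(auc)
print(cm)
```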
Figure 3. Top 10 most significant variables (in descending order) in the Random Forest classifier model for (a) private school participants and (b) public school participants. Red bars represent direct relationships with identifying the poor-performing students, while blue bars represent inverse relationships. SHAP values represent the level of variable importance relative to other variables.
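The paper ranks variables with SHAP values. As a lighter-weight sketch that needs only scikit-learn, the Random Forest's built-in impurity-based feature importances can produce a similar descending top-10 ranking; note this is a stand-in for SHAP, not the paper's method, and the variable names below are made up.

```python
# Hedged sketch: rank a Random Forest's top 10 variables by importance.
# Impurity-based importances stand in for the paper's SHAP values;
# synthetic data and hypothetical variable names throughout.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=12, n_informative=5,
                           random_state=3)
names = [f"var_{i}" for i in range(X.shape[1])]  # hypothetical variable names

rf = RandomForestClassifier(n_estimators=200, random_state=3).fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1]  # descending importance
top10 = [(names[i], float(rf.feature_importances_[i])) for i in order[:10]]
print(top10)
```

Unlike SHAP, impurity-based importances give no sign (direct vs. inverse relationship), which is why the paper's red/blue bar distinction requires SHAP.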