Muhammad Adnan, Alaa Abdul Salam Alarood, M Irfan Uddin, Izaz Ur Rehman.
Abstract
The Coronavirus Disease 2019 (COVID-19) pandemic has increased the importance of Virtual Learning Environments (VLEs), prompting students to study from home. A tremendous amount of data is generated every day as students interact with VLEs to perform different activities and access learning material. To make this data useful, it must be processed and managed by an appropriate machine learning (ML) algorithm. The applications of ML algorithms are manifold, with Educational Data Mining (EDM) and Learning Analytics (LA) as their major fields. ML algorithms are commonly used to process raw data to discover hidden patterns and construct models that make future predictions, such as predicting students' performance, dropout, and engagement. In a VLE, however, it is important to select the most applicable ML algorithm to obtain the best performance results. In this study, we aim to improve the performance of those ML and deep learning (DL) algorithms that initially give inferior results in terms of accuracy, precision, recall, and F1 score. Several ML algorithms were applied to the Open University Learning Analytics (OULA) dataset to reveal which one offers the best results on these metrics. Two popular algorithms, Decision Tree (DT) and Feed-Forward Neural Network (FFNN), provided unsatisfactory results; they were selected and experimented with using various techniques, such as grid search cross-validation, adaptive boosting, extreme gradient boosting, early stopping, feature engineering, and dropping inactive neurons, to improve their performance scores. Moreover, we also determined the feature weights/importance in predicting students' study performance, leading to the design and development of an adaptive learning system.
The ML techniques and methods used in this research study can be used by instructors and administrators to optimize learning content and provide informed guidance to students, thus improving their learning experience and making it exciting and adaptive.
Keywords: Adaptive boosting; Cross validation; Grid search; Machine learning; Performance augmentation
Year: 2022 PMID: 35494796 PMCID: PMC9044349 DOI: 10.7717/peerj-cs.803
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
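As a companion to the tuning techniques named in the abstract, here is a minimal sketch of grid-search cross-validation for a Decision Tree using scikit-learn. The parameter grid and synthetic data are illustrative assumptions, not the study's actual settings.

```python
# Hypothetical sketch of grid-search cross-validation for a Decision Tree;
# parameter values are illustrative, not taken from the paper.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class data standing in for the OULA features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=4, random_state=0)

param_grid = {"max_depth": [3, 5, 10], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, cv=5, scoring="f1_macro")
search.fit(X, y)
print(search.best_params_)           # best hyper-parameter combination found
print(round(search.best_score_, 3))  # mean cross-validated macro-F1
```

The same pattern applies to any estimator whose hyper-parameters need tuning; only `param_grid` changes.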
Comparative analysis of studies using ML techniques on VLE and MOOC datasets.
| Authors | Objective | ML model | Performance & Evaluation |
|---|---|---|---|
| | Academic performance prediction by considering student heterogeneity | Naïve Bayes, J48, SMO, JRip | Naïve Bayes with best accuracy = 85% |
| | Using daily online activity to predict students' success | Decision Tree, Random Forest, Logistic Regression, Support Vector Machine | SVM with F1 score = 91%, accuracy = 87%, precision = 93%, recall = 89% |
| | Using predictive learning analytics to empower online teachers | Chi-square, Kruskal-Wallis H test | 55.9% |
| | Using recurrent neural networks to predict students' performance in multiple courses | RNN | Accuracy = 84.6% |
| | Deploying a deep learning model to predict students' performance | Deep Artificial Neural Network (DANN) | Accuracy: DANN = 84–93%, Logistic Regression = 79.82–85.60%, Support Vector Machine = 79.95–89.14% |
| | Using classification algorithms to predict students' dropout | Naïve Bayes (NB), Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting Tree (GBT) | Random Forest: accuracy = 86%, precision = 86%, F1 = 85%; GBT: recall = 86% |
| | Predicting students' dropout and success in MOOCs for professional learning | Boosted Logistic Regression, Stochastic Gradient Boosting, Random Forest, Naïve Bayes, Neural Network, Support Vector Machine, K-Nearest Neighbors, Classification Tree, Extreme Gradient Boosting | Stochastic Gradient Boosting, Neural Network and Random Forest with accuracy above 80% |
| | Predicting dropout and performance using ensemble, regression and deep learning techniques | Generalized Linear Model, Gradient Boosting Machine, Distributed Random Forest, Deep Learning | Deep Learning with 94% accuracy for dropout prediction and 96% for performance prediction |
| | Intervening with students to avoid dropout by leveraging deep learning | K-Nearest Neighbors, Support Vector Machine, Decision Tree, Deep Learning | Deep Learning with 98% accuracy |
| | Predicting learners' dropout in MOOCs based on an efficient algorithm and feature selection | Support Vector Machine, K-Nearest Neighbors, Decision Tree, Naïve Bayes, Logistic Regression, Random Forest | Random Forest with 97% accuracy using the RFE feature selection method |
| | Identification of learning styles using a predictive model | Neural Networks, Decision Tree, Random Forest, K-Nearest Neighbors | Decision Tree with the highest accuracy of 99%, precision = 99%, recall = 99%, F1 score = 99%, micro-precision = 99%, macro-precision = 98% |
| | Predicting students' performance in MOOCs with clickstream data | K-Nearest Neighbors, Artificial Neural Networks, Support Vector Machines | ANN with the highest accuracy of 96% |
| | Evaluating the performance of deep neural networks to predict students' dropout in MOOCs | Deep Neural Network (DNN) | DNN with 99% accuracy when using 64 neurons |
| | Improving the performance of predictive models by effective feature selection using unsupervised machine learning | Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN) | LSTM with best feature selection |
Figure 1: Workflow of the proposed machine learning based VLE architecture.
Performing one-hot encoding of the final result feature.
| Withdrawn | Fail | Pass | Distinction |
|---|---|---|---|
| 1 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 0 | 1 |
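The encoding shown in the table above can be reproduced in a few lines with pandas; this is a minimal sketch, assuming a hypothetical `final_result` column named after the OULA dataset's grade field.

```python
# Sketch of one-hot encoding the final result feature with pandas.
# The column name "final_result" is an assumption, not confirmed by the paper.
import pandas as pd

df = pd.DataFrame({"final_result": ["Withdrawn", "Fail", "Pass", "Distinction"]})
one_hot = pd.get_dummies(df["final_result"])
# get_dummies sorts columns alphabetically; reorder to match the table above.
one_hot = one_hot[["Withdrawn", "Fail", "Pass", "Distinction"]]
print(one_hot)
```

Each row contains a single 1 marking the grade, exactly as in the table.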
ML classifiers with inferior performance scores.
| Gaussian NB | Precision | Recall | f1-score | Support |
|---|---|---|---|---|
| Distinction | 0.32 | 0.48 | 0.38 | 464 |
| Fail | 0.35 | 0.37 | 0.36 | 927 |
| Pass | 0.62 | 0.46 | 0.53 | 2,067 |
| Withdrawn | 0.40 | 0.51 | 0.45 | 883 |
| Accuracy | | | 0.45 | 4,341 |
| Macro avg | 0.42 | 0.46 | 0.43 | 4,341 |
| Weighted avg | 0.49 | 0.45 | 0.46 | 4,341 |
| SVM | ||||
| Distinction | 0.52 | 0.17 | 0.26 | 519 |
| Fail | 0.46 | 0.11 | 0.18 | 928 |
| Pass | 0.64 | 0.55 | 0.59 | 2,000 |
| Withdrawn | 0.32 | 0.81 | 0.46 | 894 |
| Accuracy | | | 0.46 | 4,341 |
| Macro avg | 0.49 | 0.41 | 0.37 | 4,341 |
| Weighted avg | 0.52 | 0.46 | 0.44 | 4,341 |
| Bernoulli NB | ||||
| Distinction | 0.34 | 0.08 | 0.13 | 464 |
| Fail | 0.37 | 0.24 | 0.29 | 927 |
| Pass | 0.56 | 0.77 | 0.65 | 2,067 |
| Withdrawn | 0.41 | 0.38 | 0.39 | 883 |
| Accuracy | | | 0.50 | 4,341 |
| Macro avg | 0.42 | 0.37 | 0.37 | 4,341 |
| Weighted avg | 0.47 | 0.50 | 0.47 | 4,341 |
| KNN | ||||
| Distinction | 0.46 | 0.49 | 0.48 | 464 |
| Fail | 0.36 | 0.41 | 0.38 | 927 |
| Pass | 0.61 | 0.67 | 0.64 | 2,067 |
| Withdrawn | 0.47 | 0.27 | 0.34 | 883 |
| Accuracy | | | 0.52 | 4,341 |
| Macro avg | 0.48 | 0.46 | 0.46 | 4,341 |
| Weighted avg | 0.51 | 0.52 | 0.51 | 4,341 |
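Per-class tables like the ones above follow scikit-learn's `classification_report` layout; a toy sketch of how such a table is produced (the labels follow the paper's four grades, but the predictions are illustrative values, not the study's outputs):

```python
# Toy sketch of generating a precision/recall/F1 table with scikit-learn.
from sklearn.metrics import classification_report

labels = ["Distinction", "Fail", "Pass", "Withdrawn"]
y_true = ["Pass", "Pass", "Fail", "Withdrawn", "Distinction", "Pass"]
y_pred = ["Pass", "Fail", "Fail", "Withdrawn", "Pass", "Pass"]

# Prints per-class precision, recall, f1-score and support,
# plus accuracy, macro avg and weighted avg rows.
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```

In these reports the accuracy row carries a single value (aligned under f1-score) plus the total support, which is the layout used in the tables here.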
ML classifiers with superior performance scores.
| Gradient boosting | Precision | Recall | f1-score | Support |
|---|---|---|---|---|
| Distinction | 0.68 | 0.46 | 0.55 | 519 |
| Fail | 0.51 | 0.38 | 0.43 | 928 |
| Pass | 0.66 | 0.86 | 0.75 | 2,000 |
| Withdrawn | 0.61 | 0.47 | 0.53 | 894 |
| Accuracy | | | 0.63 | 4,341 |
| Macro avg | 0.61 | 0.54 | 0.56 | 4,341 |
| Weighted avg | 0.62 | 0.63 | 0.61 | 4,341 |
| Extra tree classifier | ||||
| Distinction | 0.65 | 0.43 | 0.52 | 464 |
| Fail | 0.51 | 0.40 | 0.45 | 927 |
| Pass | 0.67 | 0.85 | 0.75 | 2,067 |
| Withdrawn | 0.60 | 0.47 | 0.53 | 883 |
| Accuracy | | | 0.63 | 4,341 |
| Macro avg | 0.61 | 0.54 | 0.56 | 4,341 |
| Weighted avg | 0.62 | 0.63 | 0.62 | 4,341 |
| Random forest ‘gini’ | ||||
| Distinction | 0.66 | 0.48 | 0.56 | 464 |
| Fail | 0.56 | 0.45 | 0.50 | 927 |
| Pass | 0.69 | 0.86 | 0.77 | 2,067 |
| Withdrawn | 0.64 | 0.50 | 0.57 | 883 |
| Accuracy | | | 0.66 | 4,341 |
| Macro avg | 0.64 | 0.57 | 0.60 | 4,341 |
| Weighted avg | 0.65 | 0.66 | 0.65 | 4,341 |
| Random forest ‘entropy’ | ||||
| Distinction | 0.66 | 0.48 | 0.56 | 464 |
| Fail | 0.55 | 0.44 | 0.49 | 927 |
| Pass | 0.69 | 0.87 | 0.77 | 2,067 |
| Withdrawn | 0.65 | 0.48 | 0.56 | 883 |
| Accuracy | | | 0.66 | 4,341 |
| Macro avg | 0.64 | 0.57 | 0.59 | 4,341 |
| Weighted avg | 0.65 | 0.66 | 0.64 | 4,341 |
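The 'gini' vs. 'entropy' comparison in the last two tables can be sketched as follows; the data is synthetic and the scores illustrative, not the paper's results.

```python
# Hedged sketch comparing Random Forest split criteria, as in the two
# Random Forest tables above (synthetic data, illustrative scores only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

scores = {}
for criterion in ("gini", "entropy"):
    clf = RandomForestClassifier(n_estimators=100, criterion=criterion,
                                 random_state=0)
    scores[criterion] = cross_val_score(clf, X, y, cv=3,
                                        scoring="accuracy").mean()
    print(criterion, round(scores[criterion], 3))
```

As the two tables suggest, the criteria usually score very close to each other; the choice rarely dominates other hyper-parameters.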
Figure 2: Confusion matrix for the DT model.
Figure 3: Confusion matrix for the DT model after feature engineering.
Figure 4: Confusion matrix for the AdaBoost ensembling technique.
Figure 5: Confusion matrix for the XGBoost algorithm.
Figure 6: Feature importance/weights in predicting the final performance.
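Feature weights such as those plotted in Figure 6 are typically read from a fitted tree ensemble's `feature_importances_` attribute; a hedged sketch with hypothetical OULA-style feature names (the names and data are assumptions, not the paper's actual features):

```python
# Sketch of extracting feature weights from a fitted tree ensemble.
# Feature names are hypothetical OULA-style names, not the paper's list.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = ["sum_click", "score", "num_of_prev_attempts", "studied_credits"]
X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
# Importances sum to 1; sort descending to rank features.
for name, weight in sorted(zip(feature_names, clf.feature_importances_),
                           key=lambda t: -t[1]):
    print(f"{name}: {weight:.3f}")
```

Ranking features this way is what enables the adaptive-system design step mentioned in the abstract.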
Figure 7: XGBoost log loss performance.
Figure 8: XGBoost classification error performance.
Figure 9: DFFNN training accuracy/loss and validation accuracy/loss.
Figure 10: DFFNN early stopping during the training process.
Figure 11: DFFNN early stopping after dropping inactive neurons.
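The early-stopping behaviour shown in Figures 10 and 11 can be illustrated with scikit-learn's `MLPClassifier`, which stands in here for the paper's DFFNN; the hyper-parameters below are assumptions, not the study's configuration.

```python
# Sketch of early stopping for a feed-forward network. MLPClassifier
# stops training once the validation score stops improving for
# n_iter_no_change consecutive iterations.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(32, 16), early_stopping=True,
                    validation_fraction=0.2, n_iter_no_change=10,
                    max_iter=500, random_state=0)
net.fit(X, y)
print(net.n_iter_)  # iterations actually run before training stopped
```

The same idea, holding out a validation split and stopping when its loss plateaus, is what Figures 10 and 11 visualise for the DFFNN.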
DFFNN performance score for distinction, fail, pass and withdrawn grades.
| DFFNN performance score | Precision | Recall | f1-score | Support |
|---|---|---|---|---|
| D | 0.57 | 0.51 | 0.54 | 471 |
| F | 0.50 | 0.44 | 0.47 | 946 |
| P | 0.67 | 0.83 | 0.74 | 2,036 |
| W | 0.59 | 0.38 | 0.46 | 888 |
| Accuracy | | | **0.62** | 4,341 |
| Macro avg | 0.58 | 0.54 | 0.55 | 4,341 |
| Weighted avg | 0.61 | 0.62 | 0.60 | 4,341 |
Note:
Bold value is the accuracy score of the DFFNN model for distinction, fail, pass and withdrawn grades
After feature engineering, DFFNN performance for complete/incomplete grades.
| DFFNN performance score for complete/incomplete grades | Precision | Recall | f1-score | Support |
|---|---|---|---|---|
| Complete | 0.78 | 0.90 | 0.83 | 2,526 |
| Incomplete | 0.82 | 0.64 | 0.72 | 1,815 |
| Accuracy | | | **0.79** | 4,341 |
| Macro avg | 0.80 | 0.77 | 0.78 | 4,341 |
| Weighted avg | 0.80 | 0.79 | 0.79 | 4,341 |
Note:
Bold is the accuracy score of the DFFNN model for complete and incomplete grades
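The feature-engineering step implied by the note above, collapsing the four final grades into binary complete/incomplete labels, can be sketched as follows; the mapping is an assumption consistent with the reported support counts.

```python
# Sketch of relabelling the four grades into a binary target.
# The mapping (Pass/Distinction -> Complete, Fail/Withdrawn -> Incomplete)
# is an assumption, not explicitly stated in this excerpt.
grade_map = {"Pass": "Complete", "Distinction": "Complete",
             "Fail": "Incomplete", "Withdrawn": "Incomplete"}

grades = ["Pass", "Withdrawn", "Distinction", "Fail", "Pass"]
binary = [grade_map[g] for g in grades]
print(binary)
# ['Complete', 'Incomplete', 'Complete', 'Incomplete', 'Complete']
```

Merging classes this way gives the model a simpler, better-balanced target, which is consistent with the jump from 0.62 to roughly 0.79 weighted scores in the two DFFNN tables.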
Figure 12: DFFNN training history for complete/incomplete grades.
Figure 13: DFFNN early stopping for complete/incomplete grades.
Figure 14: DFFNN early stopping and dropping inactive neurons for complete/incomplete grades.
Figure 15: FFNN receiver operating characteristic (ROC) with AUC score.
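A ROC curve with its AUC score, as in Figure 15, can be produced with scikit-learn; a minimal sketch on toy binary scores (the labels and scores are illustrative, not the study's predictions):

```python
# Sketch of computing ROC points and AUC for the binary
# complete/incomplete task (toy scores only).
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]              # 1 = Complete, 0 = Incomplete
y_score = [0.9, 0.2, 0.7, 0.5, 0.4, 0.1, 0.8, 0.55]  # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(round(auc, 3))
```

Plotting `fpr` against `tpr` yields the curve in Figure 15; the AUC summarises it as a single number.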
ML classifiers with average performance scores.
| Decision tree | Precision | Recall | f1-score | Support |
|---|---|---|---|---|
| Distinction | 0.42 | 0.44 | 0.43 | 464 |
| Fail | 0.40 | 0.42 | 0.41 | 927 |
| Pass | 0.66 | 0.63 | 0.64 | 2,067 |
| Withdrawn | 0.46 | 0.47 | 0.46 | 883 |
| Accuracy | | | 0.53 | 4,341 |
| Macro avg | 0.49 | 0.49 | 0.49 | 4,341 |
| Weighted avg | 0.54 | 0.53 | 0.53 | 4,341 |
| Logistic Regression | | | | |
| Distinction | 0.67 | 0.32 | 0.43 | 519 |
| Fail | 0.48 | 0.31 | 0.38 | 928 |
| Pass | 0.63 | 0.85 | 0.72 | 2,000 |
| Withdrawn | 0.54 | 0.47 | 0.50 | 894 |
| Accuracy | | | 0.59 | 4,341 |
| Macro avg | 0.58 | 0.49 | 0.51 | 4,341 |
| Weighted avg | 0.58 | 0.59 | 0.57 | 4,341 |
| AdaBoost | | | | |
| Distinction | 0.52 | 0.52 | 0.52 | 464 |
| Fail | 0.46 | 0.40 | 0.43 | 927 |
| Pass | 0.67 | 0.79 | 0.73 | 2,067 |
| Withdrawn | 0.55 | 0.40 | 0.46 | 883 |
| Accuracy | | | 0.60 | 4,341 |
| Macro avg | 0.55 | 0.53 | 0.53 | 4,341 |
| Weighted avg | 0.58 | 0.60 | 0.59 | 4,341 |
| FFNN | ||||
| Distinction | 0.57 | 0.51 | 0.54 | 471 |
| Fail | 0.50 | 0.44 | 0.47 | 946 |
| Pass | 0.67 | 0.83 | 0.74 | 2,036 |
| Withdrawn | 0.59 | 0.38 | 0.46 | 888 |
| Accuracy | | | 0.62 | 4,341 |
| Macro avg | 0.58 | 0.54 | 0.55 | 4,341 |
| Weighted avg | 0.61 | 0.62 | 0.60 | 4,341 |