Yudish Teshal Badal, Roopesh Kevin Sungkur.
Abstract
The outbreak of COVID-19 has caused significant disruption in all sectors and industries around the world. To curb the spread of the novel coronavirus, the learning process and the modes of delivery had to be altered. Most courses are delivered either traditionally, face-to-face, or through a blended approach using online learning platforms. In addition, researchers and educational specialists around the globe have long had a keen interest in predicting a student's performance from information such as previous exam results and experience. With the upsurge in the use of online learning platforms, students' interactions, such as participation in discussion forums, could be included when building a predictive model. The aims of the research are to provide a predictive model to forecast students' performance (grade/engagement) and to analyse the effect of the online learning platform's features. The model created in this study uses machine learning techniques to predict the final grade and engagement level of a learner. The quantitative analysis and processing of student data showed that the Random Forest classifier outperformed the other classifiers evaluated. Accuracies of 85% and 83% were recorded for grade and engagement prediction respectively, using attributes related to student profile and interaction on a learning platform.
Keywords: Machine learning; Online learning platform; Predictive analysis; Random forest; Student engagement
Year: 2022 PMID: 36097545 PMCID: PMC9452868 DOI: 10.1007/s10639-022-11299-8
Source DB: PubMed Journal: Educ Inf Technol (Dordr) ISSN: 1360-2357
Criteria for research questions
| Criteria | Details |
|---|---|
| Population | University students |
| Intervention | Using machine learning algorithms for predicting performance |
| Context | Academic institution specialised in pedagogy; use of secondary data (student data, examination results, online learning platform and files) |
| Outcome | Prediction accuracy of machine learning algorithms and correlation analysis |
Research questions
| # | Research Question | Description |
|---|---|---|
| RQ 1 | How accurate are the machine learning algorithms at predicting students’ performance (grade and engagement)? | The compiled dataset will be fed to 7 machine learning algorithms which are at the forefront of the analytics community. The best one can be a deciding factor for an education analytics framework. |
| RQ 2 | What are the important attributes in predicting the students’ grade? | The students’ dataset will consist of multiple attributes such as age, certificates obtained, experience, activities and so on. The attributes retained for prediction will be identified through analysis. |
| RQ 3 | Can an adaptable predictive modelling framework be developed for student performance and engagement? | The framework should cater for new features in online learning and predict the performance and engagement of a student. |
Fig. 1 Structure of decision tree algorithm (Hafeez et al., 2021)
Fig. 2 An example of a random forest structure considering multiple
Fig. 3 Transformation of data into a higher dimension with the kernel function (Theobald, 2017)
Fig. 4 Decision boundary distance in SVM (Raschka & Mirjalili, 2017)
Fig. 5 Basic machine learning workflow (Landset et al., 2015)
Fig. 6 Ralph Kimball's bottom-up approach to DWH design (Kimball & Ross, 2013)
Variables for performance metrics (Michelucci, 2019)
| Variables | Definition |
|---|---|
| True positives (TP) | Instances correctly predicted as belonging to a class |
| False positives (FP) | Instances predicted as belonging to a class when in fact they do not |
| True negatives (TN) | Instances correctly predicted as not belonging to a class |
| False negatives (FN) | Instances predicted as not belonging to a class when in fact they do |
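The four variables above combine into the metrics reported later in this record (accuracy, precision, recall and F-measure). The following is a minimal sketch using the standard textbook definitions; the class name `ConfusionCounts`, the helper names and the example counts are illustrative assumptions, not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for one class, named after the variables in the table above."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 rather than raising when the denominator is zero.
    return numerator / denominator if denominator else 0.0


def accuracy(c: ConfusionCounts) -> float:
    # Share of all predictions that were correct.
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def precision(c: ConfusionCounts) -> float:
    # Share of positive predictions that were correct.
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # Share of actual positives that were found.
    return _ratio(c.tp, c.tp + c.fn)


def f1_score(c: ConfusionCounts) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return _ratio(2 * p * r, p + r)


# Hypothetical counts, for illustration only (not results from the study).
counts = ConfusionCounts(tp=30, fp=10, tn=50, fn=10)
print(accuracy(counts), precision(counts), recall(counts), f1_score(counts))
```

For a multi-class target such as a grade, these values would typically be computed per class and then averaged; the source does not state which averaging scheme was used.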
Fig. 7 General research approach for machine learning (Kamiri & Mariga, 2021)
Fig. 8 Research design derived from research questions
Fig. 9 Machine learning architectural design to evaluate student performance
Fig. 10 Web scraper architecture for retrieving discussion forum information
Fig. 11 CSV format generated from database
Fig. 12 Proposed framework for predicting student performance and engagement
Dataset after feature encoding for predicting student’s grade
| # | Feature Name | Description | Data type |
|---|---|---|---|
| 1 | Age | Student's current age | Ordinal |
| 2 | Rural or urban | Student living in urban or rural area | Nominal |
| 3 | Gender | Male or Female | Nominal |
| 4 | Work | Student's work status (No, work in rural or urban area) | Nominal |
| 5 | Marital status | Married or single | Nominal |
| 6 | No of Attempt | Number of times student attempted exam | Ordinal |
| 8 | Work experience count | Number of work experiences | Ordinal |
| 9 | Student's total work days | Total duration of work experience in days | Ordinal |
| 10 | No of certificate in Training Centre | Number of certificates obtained in training centres | Ordinal |
| 11 | Masters at University | Number of master's degrees obtained | Ordinal |
| 12 | Undergraduate at University | Number of bachelor's degrees or diplomas awarded | Ordinal |
| 13 | Sum No of days at university | Total number of days spent in a tertiary institution as a student | Ordinal |
| 14 | Connected | Student has connected at least once to the LMS | Ordinal |
| 15 | Learning object menu | Page containing list of learning objects | Ordinal |
| 16 | Learning object 1 | Video, animation, audio file, PDF file, etc. | Ordinal |
| 17 | Learning object 2 | Video, animation, audio file, PDF file, etc. | Ordinal |
| 18 | Learning object 3 | Video, animation, audio file, PDF file, etc. | Ordinal |
| 19 | Learning object 4 | Video, animation, audio file, PDF file, etc. | Ordinal |
| 20 | Learning object 5 | Video, animation, audio file, PDF file, etc. | Ordinal |
| 21 | Learning object 6 | Video, animation, audio file, PDF file, etc. | Ordinal |
| 22 | Learning object 7 | Video, animation, audio file, PDF file, etc. | Ordinal |
| 23 | Learning object 8 | Video, animation, audio file, PDF file, etc. | Ordinal |
| 24 | Learning object 9 | Video, animation, audio file, PDF file, etc. | Ordinal |
| 25 | Learning object 10 | Video, animation, audio file, PDF file, etc. | Ordinal |
| 26 | Number of post created | Number of posts initiated by student | Ordinal |
| 27 | Sum of Number of words | Total number of words submitted by student for all posts created | Ordinal |
| 28 | Sum of Number of characters | Total number of characters submitted by student for all posts created | Ordinal |
| 29 | Participate in discussion | Total number of discussions participated | Ordinal |
| 30 | Sum of Number of words in discussion | Total number of words submitted by student in discussions | Ordinal |
| 31 | Sum of Number of characters in discussion | Total number of characters submitted by student in discussions | Ordinal |
| 32 | MCQ attempted | Student attempted MCQs (Yes or No) | Ordinal |
| 33 | Duration in hours | Time taken to submit MCQs | Ordinal |
| 34 | MCQ Score | Score obtained in online MCQ | Ordinal |
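The table lists several nominal features (for example gender and marital status) that have to be turned into numbers before a classifier can use them. The source does not say which encoding scheme was applied, so the sketch below simply illustrates two common options, integer label encoding and one-hot encoding. The function names and the sample records are hypothetical.

```python
from typing import Dict, List


def fit_label_encoder(values: List[str]) -> Dict[str, int]:
    """Map each distinct category to an integer code, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def one_hot(value: str, mapping: Dict[str, int]) -> List[int]:
    """Return a one-hot vector for `value` using a fitted mapping."""
    vector = [0] * len(mapping)
    vector[mapping[value]] = 1
    return vector


# Hypothetical records using two nominal features named in the table.
records = [
    {"gender": "Female", "marital_status": "Single"},
    {"gender": "Male", "marital_status": "Married"},
    {"gender": "Female", "marital_status": "Married"},
]

# Label encoding: one integer column per nominal feature.
gender_codes = fit_label_encoder([r["gender"] for r in records])
encoded_gender = [gender_codes[r["gender"]] for r in records]

# One-hot encoding: one binary column per category.
marital_codes = fit_label_encoder([r["marital_status"] for r in records])
encoded_marital = [one_hot(r["marital_status"], marital_codes) for r in records]

print(encoded_gender)
print(encoded_marital)
```

Label encoding keeps the feature count unchanged but imposes an arbitrary order on the categories; one-hot encoding avoids that at the cost of extra columns.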
Functional and non-functional requirements
| Functional Requirements | |
|---|---|
| FR 1 | The system should be able to accept a CSV file with all data |
| FR 2 | The system should allow the selection of ML algorithm for grade prediction |
| FR 3 | The system should allow the selection of ML algorithms for student engagement level prediction |
| FR 4 | The system should apply the mean, mode, median and MICE imputation techniques |
| FR 5 | The system should be able to apply feature encoding |
| FR 6 | The system should be able to remediate imbalanced classes using SMOTE |
| FR 7 | The system should be able to discard irrelevant attributes |
| FR 8 | The system should evaluate the model’s accuracy, precision, recall and F1 score |
| FR 9 | The system should be able to normalise the dataset |
| FR 10 | The system should evaluate the best hyperparameters |
| FR 11 | The system should display the best accuracy of the model in percentage |
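Two of the preprocessing requirements, simple imputation (FR 4) and normalisation (FR 9), can be sketched with the Python standard library alone. This is an illustrative sketch, not the system described in the source: the function names and the sample column are hypothetical, min-max scaling is an assumed choice because the source does not name a normalisation method, and MICE and SMOTE are left out because they need dedicated libraries.

```python
import statistics
from typing import List, Optional


def impute(values: List[Optional[float]], strategy: str = "mean") -> List[float]:
    """Fill missing entries (None) with the mean, median or mode of the observed ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def min_max_normalise(values: List[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical column with missing values, for illustration only.
mcq_scores = [10.0, None, 20.0, 20.0, None, 40.0]

filled = impute(mcq_scores, strategy="median")
print(filled)
print(min_max_normalise(filled))
```

To avoid leaking information from held-out data, the fill value and the minimum and maximum would normally be computed on the training split only and then reused on the test split.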
Fig. 13 Web application for student performance and engagement prediction
Fig. 14 Confusion matrix for grade prediction using RF
Fig. 15 Average of evaluation metrics per fold: grade prediction using RF
Performance metrics for grade prediction using RF
| Metric | Value |
|---|---|
| Average Accuracy | 85.13% |
| Average Precision | 85.14% |
| Average Recall | 85.12% |
| Average F-Measure | 84.94% |
| Imputation Technique | MICE |
| Hyperparameters | |
| max_depth | None |
| max_features | auto |
| n_estimators | 1000 |
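The hyperparameter names above match those of scikit-learn's `RandomForestClassifier`, although the source does not name the library, so that is an assumption here. Under that assumption, the sketch below shows how the reported grade-prediction settings might be passed to the classifier. In scikit-learn, `max_features="auto"` meant `"sqrt"` for classifiers and is no longer accepted by recent releases, so the helper substitutes `"sqrt"`. The helper names and `random_state` are illustrative and not from the study.

```python
# Hyperparameters as reported in the table above for grade prediction.
REPORTED_GRADE_RF_PARAMS = {
    "max_depth": None,
    "max_features": "auto",
    "n_estimators": 1000,
}


def to_current_sklearn_params(params: dict) -> dict:
    """Return a copy with max_features='auto' replaced by its classifier equivalent 'sqrt'."""
    updated = dict(params)
    if updated.get("max_features") == "auto":
        updated["max_features"] = "sqrt"
    return updated


def build_grade_classifier(random_state: int = 0):
    """Build an unfitted random forest with the reported settings (assumes scikit-learn)."""
    # Imported lazily so the parameter handling above works without scikit-learn installed.
    from sklearn.ensemble import RandomForestClassifier

    return RandomForestClassifier(
        random_state=random_state,  # assumption: fixed seed for repeatability
        **to_current_sklearn_params(REPORTED_GRADE_RF_PARAMS),
    )


print(to_current_sklearn_params(REPORTED_GRADE_RF_PARAMS))
```

The returned estimator would then be fitted on the encoded and imputed feature matrix; the training data itself is not part of this record.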
Fig. 16 Confusion matrix for engagement prediction using RF
Fig. 17 Average of evaluation metrics per fold: engagement prediction using RF
Performance metrics for engagement prediction using RF
| Metric | Value |
|---|---|
| Average Accuracy | 83.88% |
| Average Precision | 84.31% |
| Average Recall | 83.86% |
| Average F-Measure | 83.51% |
| Hyperparameters | |
| max_depth | 10 |
| max_features | auto |
| n_estimators | 100 |
Fig. 18 Evaluation of ML models for grade prediction
Fig. 19 Evaluation of ML models for engagement prediction
Fig. 20 Drop in the 2nd fold for average of evaluation metrics per fold: engagement prediction using MLP