Muhammad Bilal, Muhammad Omar, Waheed Anwar, Rahat H Bokhari, Gyu Sang Choi.
Abstract
Educational Data Mining is widely used for predicting students' performance. It is a challenging task because a plethora of features related to demographics, personality traits, socio-economic status, and environment may affect students' performance. Such features may vary with the level of study, program offered, nature of subject, and geographical location. This study attempted to predict the final-semester results of students studying Doctor of Veterinary Medicine (DVM) based on their pre-admission academic achievements, demographics, and first-semester performance. The imbalanced data led to non-generic prediction models, so the imbalance was addressed through the synthetic minority oversampling technique. Among five prediction models, the Support Vector Machine performed best, with 92% accuracy. The decision tree model identified key features affecting students' performance. The analysis led to the conclusion that marks obtained in Biology, Islamiat, and Urdu at Matric level and in English at Intermediate level affected the students' performance in their final semester. The findings provide useful information for predicting students' performance and guidelines for academic institutes' management on improving students' achievement. It is speculated that adoption of digital transformation may help reduce the difficulty faced in data collection and analysis.
Year: 2022 PMID: 35869103 PMCID: PMC9307570 DOI: 10.1038/s41598-022-15880-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Proposed approach for student performance prediction and feature extraction.
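The oversampling step named in the abstract can be illustrated with a minimal sketch of the synthetic minority oversampling idea: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbours. This is not the authors' implementation; the function name `smote_sketch`, its parameters, and the example values are assumptions made for illustration only.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from minority-class feature vectors.

    minority: list of equal-length numeric feature vectors (hypothetical data).
    k: number of nearest minority neighbours considered for each base sample.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up percentage marks for four minority-class students (not study data).
minority = [[60.0, 70.0], [62.0, 71.0], [65.0, 75.0], [58.0, 68.0]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point is an interpolation between two existing minority samples, it always stays within the range of the observed minority values.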
Dataset variables and their metadata.
| No | Features’ type | Features with description | Category | Values |
|---|---|---|---|---|
| 1 | Demographic | Gender | Categorical | Male/Female |
| 2 | Demographic | Father’s Profession | Categorical | Nature of work |
| 3 | Demographic | Hafiz E Quran (the person has memorized the Holy Quran) | Categorical | Yes/No |
| 4 | Demographic | Domicile (the residence area of the person) | Categorical | Area Name |
| 5 | Demographic | Quota (admission based on open merit or local domicile) | Categorical | Open/BWP |
| 6 | Demographic | FSc Board Name (name of intermediate Board) | Categorical | Board Name |
| 7 | Demographic | Entry Test Name (admission test mandatory for admission) | Categorical | NAT/MCAT |
| 8 | Demographic | Accommodation (whether the student lives in a hostel) | Categorical | Yes/No |
| 9 | Demographic | Year of Birth (year in which the applicant was born) | Numeric | Year |
| 10 | Demographic | FSc Passing Year (Intermediate passing year, 12 years of education) | Numeric | Year |
| 11 | Academic | FSc Percentage (Percentage marks in Intermediate, 12 years of education) | Numeric | Percentage |
| 12 | Academic | Entry Test Percentage | Numeric | NAT or MCAT Percentage |
| 13 | Academic | FSc Urdu Percentage (Percentage marks in Urdu subject in intermediate) | Numeric | Percentage |
| 14 | Academic | FSc English Percentage (Percentage marks in English subject in intermediate) | Numeric | Percentage |
| 15 | Academic | FSc Islamic Education Percentage (Percentage marks in Islamic Education subject in intermediate) | Numeric | Percentage |
| 16 | Academic | FSc Pak Studies Percentage (Percentage marks in Pak Studies subject in intermediate) | Numeric | Percentage |
| 17 | Academic | FSc Physics Percentage (Percentage marks in Physics subject in intermediate) | Numeric | Percentage |
| 18 | Academic | FSc Chemistry Percentage (Percentage marks in Chemistry subject in intermediate) | Numeric | Percentage |
| 19 | Academic | FSc Biology Percentage (Percentage marks in Biology subject in intermediate) | Numeric | Percentage |
| 20 | Academic | Matric Urdu Percentage (Percentage marks in Urdu subject in matric) | Numeric | Percentage |
| 21 | Academic | Matric English Percentage (Percentage marks in English subject in matric) | Numeric | Percentage |
| 22 | Academic | Matric Islamic Education Percentage (Percentage marks in Islamic Education subject in matric) | Numeric | Percentage |
| 23 | Academic | Matric Pak Studies Percentage (Percentage marks in Pak Studies subject in matric) | Numeric | Percentage |
| 24 | Academic | Matric Mathematics Percentage (Percentage marks in Mathematics subject in matric) | Numeric | Percentage |
| 25 | Academic | Matric Physics Percentage (Percentage marks in Physics subject in matric) | Numeric | Percentage |
| 26 | Academic | Matric Chemistry Percentage (Percentage marks in Chemistry subject in matric) | Numeric | Percentage |
| 27 | Academic | Matric Biology Percentage (Percentage marks in Biology subject in matric) | Numeric | Percentage |
| 28 | Academic | Matric Percentage (Percentage marks in Matric, 10 years of education) | Numeric | Percentage |
| 29 | Academic | SGPA (first semester SGPA percentage) | Numeric | Percentage |
| 30 | Academic | SGPA (final semester SGPA; 0/1 for binary classification models, where 0 indicates SGPA < 3.00 and 1 indicates SGPA ≥ 3.00) | Categorical | 0/1 (dependent variable) |
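As a rough illustration of how the variables above might be prepared for a binary classifier, the sketch below binarizes the final-semester SGPA at the 3.00 threshold given in the table and one-hot encodes two of the categorical features. The table does not state how the authors encoded categorical values, so the one-hot step, the helper names, and the example record are all assumptions.

```python
def binarize_sgpa(sgpa, threshold=3.0):
    """Return 1 when the final-semester SGPA is >= threshold, otherwise 0."""
    return 1 if sgpa >= threshold else 0


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over the given categories."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


# Made-up student record for illustration only (not taken from the dataset).
record = {"Gender": "Female", "Quota": "Open", "FinalSGPA": 3.2}

features = (
    one_hot(record["Gender"], ["Male", "Female"])
    + one_hot(record["Quota"], ["Open", "BWP"])
)
label = binarize_sgpa(record["FinalSGPA"])
```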
Students’ performance prediction models based on 15-fold cross-validation results.
| Metric | Decision tree (%) | Random forest (%) | Support vector machine (%) | K-nearest neighbours (%) | Logistic regression (%) |
|---|---|---|---|---|---|
| Precision | 80 | 87 | | 81 | 72 |
| Recall | 80 | 86 | | 70 | 72 |
| Accuracy | 80 | 86 | **92** | 67 | 72 |
Top result values are in bold.
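For readers who want to see how the metrics in this table are defined, the following sketch splits sample indices into k folds and computes precision, recall, and accuracy for one set of binary predictions. It is only an illustration: the function names are hypothetical, the folds are contiguous and unshuffled, and precision and recall are computed for the positive class (label 1), whereas the reported figures may have been averaged across classes.

```python
def k_fold_indices(n_samples, k=15):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def binary_metrics(y_true, y_pred):
    """Return (precision, recall, accuracy) with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy


# Hypothetical sample count and labels, used only to exercise the functions.
folds = list(k_fold_indices(32, k=15))
precision, recall, accuracy = binary_metrics([1, 1, 1, 0], [1, 1, 0, 0])
```

In a full evaluation, a model would be trained on each `train_idx`, scored on the matching `test_idx`, and the per-fold metrics averaged.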
Figure 2. Hierarchical model of a decision tree, where the label "high" indicates that a student had an SGPA of at least 3.00 in the final semester results.
Decision rules derived from a decision tree, where values are % marks in different subjects.
| Sr. No | IF conditions | THEN class |
|---|---|---|
| 1 | MatricBioPct ≤ 68.72 AND FscEngPct ≤ 81.59 AND MatricIslPct ≤ 77.54 AND MatricBioPct ≤ 67.45 | Class 0 |
| 2 | MatricBioPct ≤ 68.72 AND FscEngPct ≤ 81.59 AND MatricIslPct ≤ 77.54 AND MatricBioPct > 67.45 | Class 1 |
| 3 | MatricBioPct ≤ 68.72 AND FscEngPct ≤ 81.59 AND MatricIslPct > 77.54 AND FscEngPct ≤ 58.49 | Class 0 |
| 4 | MatricBioPct ≤ 68.72 AND FscEngPct (between 77.54 and 81.59) | Class 1 |
| 5 | MatricBioPct ≤ 68.72 AND FscEngPct > 81.59 | Class 1 |
| 6 | MatricBioPct > 68.72 AND MatricBioPct ≤ 88.97 AND MatricIslPct ≤ 80.73 AND FscEngPct < OR > 62.83 | Class 1 |
| 7 | MatricBioPct > 68.72 AND MatricBioPct ≤ 88.97 AND MatricIslPct > 80.73 AND MatricUrduPct < OR > 74.31 | Class 1 |
| 8 | MatricBioPct > 68.72 AND MatricBioPct > 88.97 AND MatricIslPct ≤ 90.09 | Class 1 |
| 9 | MatricBioPct > 68.72 AND MatricBioPct > 88.97 AND MatricIslPct > 82.18 | Class 0 |
| 10 | MatricBioPct > 68.72 AND MatricBioPct > 88.97 AND MatricIslPct > 91.78 | Class 1 |
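Decision rules of this kind translate directly into nested conditionals. The sketch below transcribes only rules 1, 2, and 5 from the table, whose conditions are self-contained; all other inputs return `None` rather than a guessed class. The function name and argument names are hypothetical, and the example inputs are made up.

```python
def classify_partial(matric_bio_pct, fsc_eng_pct, matric_isl_pct):
    """Apply rules 1, 2 and 5 from the table; return None if none of them applies.

    Returns 1 (final SGPA >= 3.00), 0 (final SGPA < 3.00), or None.
    """
    if matric_bio_pct <= 68.72:
        if fsc_eng_pct > 81.59:
            return 1  # rule 5
        if matric_isl_pct <= 77.54:
            # Rules 1 and 2 differ only in the second Biology threshold.
            return 0 if matric_bio_pct <= 67.45 else 1
    return None  # covered by rules not transcribed in this sketch


# Made-up percentage marks for illustration.
low_bio = classify_partial(60.0, 70.0, 70.0)       # matches rule 1
mid_bio = classify_partial(68.0, 70.0, 70.0)       # matches rule 2
strong_eng = classify_partial(60.0, 90.0, 70.0)    # matches rule 5
not_covered = classify_partial(75.0, 70.0, 70.0)   # outside rules 1, 2, 5
```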