Yazan A Alsariera, Yahia Baashar, Gamal Alkawsi, Abdulsalam Mustafa, Ammar Ahmed Alkahtani, Nor'ashikin Ali.
Abstract
Student performance is crucial to the success of tertiary institutions. In particular, academic achievement is one of the metrics used in rating top-quality universities. Despite the large volume of educational data available, accurately predicting student performance remains challenging. The main reason for this is the limited research on the various machine learning (ML) approaches. Accordingly, educators need to explore effective tools for modelling and assessing student performance while recognizing weaknesses to improve educational outcomes. This work investigated the existing ML approaches and key features for predicting student performance. Related studies published between 2015 and 2021 were identified through a systematic search of various online databases. Thirty-nine studies were selected and evaluated. The results showed that six ML models were mainly used: decision tree (DT), artificial neural networks (ANNs), support vector machine (SVM), K-nearest neighbor (KNN), linear regression (LinR), and Naive Bayes (NB). Our results also indicated that ANN outperformed the other models and had higher accuracy levels. Furthermore, academic, demographic, internal assessment, and family/personal attributes were the most predominant input variables (i.e., predictive features) used for predicting student performance. Our analysis revealed a growing number of studies in this domain and a broad range of ML algorithms applied. At the same time, the extant body of evidence suggested that ML can be beneficial in identifying and improving various academic performance areas.
Year: 2022 PMID: 35586111 PMCID: PMC9110122 DOI: 10.1155/2022/4151487
Source DB: PubMed Journal: Comput Intell Neurosci
PICO framework for developing research questions.
| PICO criteria | Description |
|---|---|
| Population | Male/female students; above 17 years; all educational levels. |
| Intervention | Machine learning (ML) algorithms. |
| Context | Academic institutions; university; college; high school. |
| Outcome | Model accuracy; key predictive features and models. |
Figure 1. Articles screening and selection flowchart.
Figure 2. Number of publications per year.
Figure 3. World map showing the distribution of studies per country.
Figure 4. Distribution of machine learning approaches in student's performance.
Attributes used in the prediction of student's performance.
| Attribute category | Attributes | Frequency | Study reference |
|---|---|---|---|
| Demographic | Gender; age; nationality; place of birth; marital status; guardian; address; transport | 21 | [ |
| Academic | CGPA; stage ID; grade ID; section ID; topic; semester; program; attendance; final grade | 20 | [ |
| Internal assessment | Coursework; assignments; quizzes; lab test; midterms; examinations; daily study time; plagiarism counts; virtual learning access; group presentation; personal report | 15 | [ |
| Family/personal | Parent status; parent survey; parent satisfaction; family size; parent education; parent job; income; travel time; study time; free time; health | 12 | [ |
| Behavioral | Raised hands; visited resources; announcement view; discussion | 5 | [ |
| Communication | Messages; emails; response time; login/logout time; time spent; number of words; voting system | 4 | [ |
| Psychological | Personality; motivation; contextual influences; learning strategies; socioeconomic status; approach to learning | 2 | [ |
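The frequencies in the attribute table can be easier to compare as shares of the selected studies. The following is a minimal sketch, not part of the original review: it assumes that each frequency counts studies out of the 39 selected ones (the table does not say so explicitly), and the names `attribute_frequency` and `usage_share` are hypothetical.

```python
# Frequencies transcribed from the attribute table above.
attribute_frequency = {
    "Demographic": 21,
    "Academic": 20,
    "Internal assessment": 15,
    "Family/personal": 12,
    "Behavioral": 5,
    "Communication": 4,
    "Psychological": 2,
}

# Number of selected studies reported in the abstract.
# ASSUMPTION: each frequency is a count of studies out of these 39.
TOTAL_STUDIES = 39


def usage_share(frequency, total_studies=TOTAL_STUDIES):
    """Return the percentage of selected studies that used each category."""
    return {
        name: round(100 * count / total_studies, 1)
        for name, count in frequency.items()
    }


shares = usage_share(attribute_frequency)

# Print categories from most to least frequently used.
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {pct:5.1f}%")
```

Because one study can use several attribute categories, the shares are not expected to sum to 100%.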
Main classifiers used in the selected studies.
| Algorithm | Average accuracy (%) | Study |
|---|---|---|
| Artificial neural network (ANN) | 85.9 | [ |
| Decision tree (DT) | 85 | [ |
| Support vector machine (SVM) | 83.4 | [ |
| K-nearest neighbor (KNN) | 80.7 | [ |
| Naive Bayes (NB) | 83 | [ |
| Linear regression (LinR) | 55.5 | [ |
Accuracy results for decision tree (DT).
| Study | Year | Predictive features | Accuracy (%) |
|---|---|---|---|
| [ | 2016 | Student ID, graduation GPA, high school score, general aptitude test (GAT), educational attainment test (EAT), and courses | 80 |
| [ | 2019 | Final examination, continuous assessment, schooling marks, quizzes, assignments, class test, and midterm examinations | 98.2 |
| [ | 2019 | Gender, school name, travel time, age, hobbies, health details, and address | 97.9 |
| [ | 2019 | Student demographics, student grades, subjects, school-related information, and social activities | 95.8 |
| [ | 2019 | Gender, age, family size, health, marital status, work status, school grade, university type, faculty type, scholarship, transportation, traveling time, credit hours, study time, and GPA | 66 |
| [ | 2020 | Gender, age, address location, parent job, travel time, study time, free time, failures, activities, health, and absence | 72.26 |
Accuracy results for linear regression (LinR).
| Study | Year | Predictive features | Results |
|---|---|---|---|
| [ | 2015 | Total playing time, number of videos played, number of rewinds, number of pauses, number of fast forwards, and number of slow play rate use | Accuracy = 76.2% |
| [ | 2016 | Course-specific subdata | RMSE = (0.63, 0.72), Precision = 26.86% |
| [ | 2018 | Exercises, homework, and quizzes | pMSE = 198.68, pMAPC = 0.81 |
| [ | 2018 | Number of views/post of student, course information, student information, submitted assignments, and progress of assignments | Accuracy = 50% |
| [ | 2018 | Summative evaluation attributes | Accuracy = 69% |
| [ | 2020 | Gender, age, parent education, family size, test preparation, father job, mother job, absent days, parent status, travel time, and academic scores | — |
| [ | 2020 | Final grades | — |
Accuracy results for artificial neural networks (ANNs).
| Study | Year | Predictive features | Accuracy |
|---|---|---|---|
| [ | 2015 | Gender, location, type of school, high school score, CGPA, number of credits, and results | 84.6% |
| [ | 2016 | Test mark, class and lab performance, attendance, assignment, study time, previous result, family education, living area, drug addiction, affair, social media, and final year results | 88% |
| [ | 2016 | Online quizzes, email communication, content creation, and content interaction | 98.3% |
| [ | 2018 | Grades, gender, nationality, place of birth, section ID, topic, raised hand, discussion, class in 1st and 2nd terms, attendance, and parent satisfaction | 85.4% |
| [ | 2018 | Gender, attendance, results, economic status, and parental education | - |
| [ | 2019 | Gender, CGPA, English, Chinese, math, science, and proficiency test | 84.8% |
| [ | 2019 | Gender, content score, time spent, homework score, and attendance | 80.5% |
| [ | 2019 | CourseID, total of learning sessions, length of sessions, total of assessments of semester 1, grades, quizzes, and emails sent | 97.4% |
| [ | 2019 | Gender, nationality, place of birth, StageID, GradeID, SectionID, topic, semester, relation, raised hands, discussion, parent survey and satisfaction, and attendance | 73.5% |
| [ | 2020 | Gender, age, address location, parent job, travel time, study time, free time, failures, activities, health, and absence | 64.40% |
| [ | 2021 | Gender, region, educational level, age range, neighborhood crime rate (IMD), number of previous participations in the course, enrolled credits, disability, final exam result (passed/failed), and number of interactions with any of the online course contents throughout the courses | 78.20% |
| [ | 2020 | Gender, content score, time spent, number of entries to content, homework score, attendance, and archived courses | 80.47% |
| [ | 2021 | 123 variables | 82.10% (high) |
| [ | 2021 | 116 features for the production and 84 for the learning phase | 80.76% and 86.57% |
Accuracy results for Naive Bayes.
| Study | Year | Predictive features | Accuracy (%) |
|---|---|---|---|
| [ | 2015 | Attendance, internal grade, computer skills, school level, mobile, tuition, type of school, type of board, and gender | 65.1 |
| [ | 2016 | Age, section, program, method, place of birth, transport, subject, motivation level, homework, tuition, parent education, attendance, communication, GPA, quiz, assignment, lab test, and final exam | 86 |
| [ | 2017 | List of subjects and grades | 83.6 |
| [ | 2018 | Gender, age, admission, attendance, study mode, program, education status, book resources, and quiz | 72.4 |
| [ | 2018 | CGPA, high risk, coursework, examination, plagiarism count, campus access, and off-campus access | 90 |
| [ | 2015 | Number of views/post of student, course information, student information, submitted assignments, and progress of assignments | 96.9 |
Accuracy results for K-nearest neighbor.
| Study | Year | Predictive features | Accuracy (%) |
|---|---|---|---|
| [ | 2017 | Gender, age, knowledge score, skill score, CGPA, group heterogeneity, and label class | 95.5 |
| [ | 2017 | School, gender, address, family size, parent status, parent job, guardian, support, activities, nursery, internet, and romantic relationship | 93 |
| [ | 2018 | Parent income, semester, family members, and CGPA | 95.8 |
| [ | 2019 | Nationality, gender, place of birth, parent responsibility, stages, grades, SectionID, topic, attendance, semester, raised hand, visited resource, discussion, and parent satisfaction | 69 |
| [ | 2019 | Gender, age, school, address, parent status, parent education, parent job, family size, guardian, travel time, and study time | 88 |
| [ | 2020 | Absence, virtual learning access, voting system result, presentation result, and personal report result | 74 |
Accuracy results for support vector machine (SVM).
| Study | Year | Predictive features | Accuracy (%) |
|---|---|---|---|
| [ | 2016 | Attendance, class time, class length, instructor knowledge, instructor appearance, performance, assignments, exams, course materials, communication, motivation, learning outcomes, and grades | 91.3 |
| [ | 2018 | Specialization, subject, programming skills, analytical skills, personal details, memory, workshops, certifications, and sports | 90.3 |
| [ | 2019 | Gender, race, grades, and subjects | 77 |
| [ | 2019 | Gender, nationality, place of birth, relation, StageID, SectionID, GradeID, topic, semester, raised hands, visited resources, announcement view, discussion, parent satisfaction, and attendance | 66 |
| [ | 2019 | Motivation, personality, learning strategies, socio-economic status, learning approach, and psychosocial influences | 90 |
| [ | 2019 | Performance, subjects, parental status, family size, location, and address | 79.4 |
| [ | 2020 | Gender, age, address location, parent job, travel time, study time, free time, failures, activities, health, and absence | 71.2 |
Figure 5. Prediction accuracy categorized by methods from 2015 to 2021.
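The per-study accuracy tables can be summarized per method with a few lines of code. The sketch below is an illustration only: it transcribes the accuracy values listed in the tables above and takes a simple unweighted mean. The record does not state how the averages in the main classifier table were computed, so these figures need not match that table. Rows without a single accuracy value (missing entries, error-metric-only results, or two values in one row) are left out, which is a choice made here rather than the authors' procedure. The names `reported_accuracy` and `summarize` are hypothetical.

```python
from statistics import mean

# Accuracy values (%) transcribed from the per-study tables above.
# ASSUMPTION: rows with no single accuracy figure are skipped.
reported_accuracy = {
    "DT": [80, 98.2, 97.9, 95.8, 66, 72.26],
    "LinR": [76.2, 50, 69],
    "ANN": [84.6, 88, 98.3, 85.4, 84.8, 80.5, 97.4,
            73.5, 64.40, 78.20, 80.47, 82.10],
    "NB": [65.1, 86, 83.6, 72.4, 90, 96.9],
    "KNN": [95.5, 93, 95.8, 69, 88, 74],
    "SVM": [91.3, 90.3, 77, 66, 90, 79.4, 71.2],
}


def summarize(values):
    """Return count, unweighted mean, minimum, and maximum accuracy."""
    return {
        "n": len(values),
        "mean": round(mean(values), 1),
        "min": min(values),
        "max": max(values),
    }


summary = {method: summarize(vals) for method, vals in reported_accuracy.items()}

# Print methods ordered by mean reported accuracy, highest first.
for method, stats in sorted(summary.items(), key=lambda kv: kv[1]["mean"], reverse=True):
    print(f"{method:<5} n={stats['n']:<3} mean={stats['mean']:5.1f} "
          f"range={stats['min']}-{stats['max']}")
```

An unweighted mean treats every study equally regardless of dataset size or task, and the wide minimum-to-maximum ranges show how strongly reported accuracy depends on the dataset and features used.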