Abstract
Educational institutions abruptly implemented online higher education to cope with social distancing restrictions in 2020, which led to an increase in student failure. This negative impact has made the analysis of online higher education a critical issue for educational systems. Early identification of students at risk, by predicting their performance, is one strategy for addressing this issue, and computational techniques are expected to help with this task. However, improving prediction accuracy and selecting the best model remain open goals. The objective of this work is to describe two experiments that use student grades from an online higher education program to build and apply three classifiers to predict student performance. The three classifiers, a Probabilistic Neural Network, a Support Vector Machine, and a Discriminant Analysis, have proved efficient in the literature. I applied the leave-one-out cross-validation method, evaluated their performance with five criteria, and compared the results through statistical analysis. The analysis of the five performance criteria supports the decision on which model to apply given particular prediction goals. The results allow timely identification of students at risk of failure for early intervention and predict which students will succeed.
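Leave-one-out cross-validation, the evaluation protocol named in the abstract, trains a model on every record except one, predicts the held-out record, and repeats this once per record. The minimal sketch below shows only that loop; it does not reproduce the paper's PNN, SVM, or DA. The `nearest_centroid` classifier and the two-feature grade vectors are hypothetical stand-ins.

```python
from typing import Callable, List

# Hypothetical classifier signature: (training features, training labels, query) -> label.
Classifier = Callable[[List[List[float]], List[str], List[float]], str]


def nearest_centroid(train_x: List[List[float]], train_y: List[str], query: List[float]) -> str:
    """Stand-in classifier (not from the paper): pick the class whose mean vector is closest."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda label: sum((c - q) ** 2 for c, q in zip(centroids[label], query)),
    )


def leave_one_out_predictions(x: List[List[float]], y: List[str], classify: Classifier) -> List[str]:
    """Hold out each record in turn, train on the rest, and predict the held-out record."""
    predictions = []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(classify(train_x, train_y, x[i]))
    return predictions


# Hypothetical grade vectors (two assessment scores per student) and outcomes.
grades = [[30.0, 20.0], [35.0, 25.0], [40.0, 30.0], [80.0, 70.0], [85.0, 75.0], [90.0, 80.0]]
outcomes = ["Fail", "Fail", "Fail", "Pass", "Pass", "Pass"]
predicted = leave_one_out_predictions(grades, outcomes, nearest_centroid)
print(predicted)
```

Each record is predicted by a model that never saw it, so the collected predictions can be tallied into a confusion matrix. With n records the loop fits n models, which stays affordable for subject-level data sets of a few hundred students such as these.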
Keywords: Discriminant analysis; Online higher education; Probabilistic neural network; Student performance prediction; Support vector machine
Year: 2022 PMID: 35603317 PMCID: PMC9110636 DOI: 10.1007/s10639-022-11106-4
Source DB: PubMed Journal: Educ Inf Technol (Dordr) ISSN: 1360-2357
Description of the data sets
| Subject | Set | Total enrolment | Pass | Fail | Attrition | Average final score |
|---|---|---|---|---|---|---|
| Logical-mathematical thinking (DPLM) | 1 | 180 | 51 | 99 | 30 | 36.09 |
| Project planning (DPP) | 2 | 165 | 45 | 56 | 64 | 29.69 |
| Situational diagnosis (DS) | 3 | 178 | 61 | 65 | 52 | 38.70 |
| Computer Fundamentals (CFC) | 4 | 169 | 42 | 69 | 58 | 28.35 |
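The description table can be checked with a few lines of arithmetic: in every row, Pass + Fail + Attrition equals Total enrolment, and the share of students who did not pass (failed or dropped out) follows directly. The sketch below uses counts copied from the table; the function name `non_pass_rates` is hypothetical and the script is an illustration only.

```python
from typing import Dict, Tuple

# (total enrolment, pass, fail, attrition) per subject, copied from the table above.
SUBJECTS: Dict[str, Tuple[int, int, int, int]] = {
    "DPLM": (180, 51, 99, 30),
    "DPP": (165, 45, 56, 64),
    "DS": (178, 61, 65, 52),
    "CFC": (169, 42, 69, 58),
}


def non_pass_rates(rows: Dict[str, Tuple[int, int, int, int]]) -> Dict[str, float]:
    """Percentage of enrolled students who failed or dropped out, per subject."""
    rates = {}
    for name, (total, passed, failed, dropped) in rows.items():
        # Consistency check: the three outcomes should account for every enrolment.
        if passed + failed + dropped != total:
            raise ValueError(f"{name}: outcome counts do not add up to the enrolment")
        rates[name] = 100 * (failed + dropped) / total
    return rates


for subject, rate in non_pass_rates(SUBJECTS).items():
    print(f"{subject}: {rate:.1f}% failed or dropped out")
```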
Statistics of the BD1 (692 records) and BD2 (488 records) data sets
| Subject | BD1 U1 scores | BD1 final scores | BD2 U1 scores | BD2 final scores |
|---|---|---|---|---|
| DPLM | 57.20 | 36.09 | 68.65 | 43.31 |
| DPP | 36.31 | 29.69 | 59.31 | 48.50 |
| DS | 56.44 | 38.70 | 78.93 | 54.13 |
| CFC | 42.43 | 28.35 | 65.64 | 43.90 |
Fig. 1 Perfectly linearly separable case. Note: adapted from James et al. (2013)
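Fig. 1 illustrates the idea behind the SVM: when two classes are linearly separable, many hyperplanes w·x + b = 0 separate them, and the maximum-margin classifier picks the one farthest from the closest training points. The sketch below only evaluates that distance for a given hyperplane; it does not train an SVM. The toy points and both candidate hyperplanes are hypothetical.

```python
import math
from typing import Sequence


def min_geometric_margin(w: Sequence[float], b: float, points, labels) -> float:
    """Smallest value of y_i * (w . x_i + b) / ||w|| over the training points.

    A positive result means w . x + b = 0 separates the classes; a hard-margin
    SVM chooses w and b to make this value as large as possible.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Hypothetical, linearly separable toy data: labels are -1 and +1.
points = [(0, 0), (1, 0), (0, 1), (3, 3), (4, 3), (3, 4)]
labels = [-1, -1, -1, 1, 1, 1]
margin_a = min_geometric_margin((1.0, 1.0), -3.5, points, labels)  # boundary x1 + x2 = 3.5
margin_b = min_geometric_margin((1.0, 0.0), -2.0, points, labels)  # boundary x1 = 2
print(round(margin_a, 4), round(margin_b, 4))
```

Both candidates separate the classes, but the first leaves the larger gap, so a maximum-margin classifier would prefer it. An actual SVM solves an optimisation problem for w and b and, for data that are not separable, tolerates some margin violations.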
Fig. 2 Diagram of a PNN applied to the first data set of this study
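Fig. 2 shows the PNN used on the first data set. In Specht's general design, a PNN has a pattern layer with one Gaussian kernel per training record, a summation layer that combines the kernel outputs of each class, and an output layer that picks the class with the largest value. The sketch below follows that structure under assumptions not taken from the paper: equal class priors and misclassification costs (so per-class averages are compared), a single smoothing parameter `sigma`, and hypothetical grade vectors.

```python
import math
from collections import defaultdict
from typing import Dict, List


def pnn_class_scores(train_x: List[List[float]], train_y: List[str],
                     query: List[float], sigma: float) -> Dict[str, float]:
    """Summation-layer output: mean Gaussian kernel activation per class."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for x, label in zip(train_x, train_y):
        # Pattern layer: one Gaussian unit centred on each training record.
        sq_dist = sum((q - v) ** 2 for q, v in zip(query, x))
        totals[label] += math.exp(-sq_dist / (2 * sigma ** 2))
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


def pnn_predict(train_x: List[List[float]], train_y: List[str],
                query: List[float], sigma: float = 10.0) -> str:
    """Output layer: the class with the largest summation-layer score."""
    scores = pnn_class_scores(train_x, train_y, query, sigma)
    return max(scores, key=scores.get)


train_x = [[20.0, 30.0], [25.0, 35.0], [80.0, 85.0], [90.0, 75.0]]
train_y = ["Fail", "Fail", "Pass", "Pass"]
print(pnn_predict(train_x, train_y, [30.0, 30.0]))  # query lies near the Fail records
print(pnn_predict(train_x, train_y, [85.0, 80.0]))  # query lies near the Pass records
```

The value of `sigma` must suit the scale of the inputs: with grades spanning tens of points, a value far below the typical distance between records makes every kernel output underflow to zero, so choosing it (for example by cross-validation) is part of fitting a PNN.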
Confusion matrix
| Actual class | Predicted: Fail | Predicted: Pass | Total |
|---|---|---|---|
| Fail | TP | FN | P |
| Pass | FP | TN | N |
| Total | | | P + N |
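The confusion matrix treats Fail as the positive class, so a correctly flagged at-risk student counts as a true positive. A minimal sketch of tallying the four cells from actual and predicted labels; the function name and the sample labels are hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[str], predicted: Sequence[str],
                     positive: str = "Fail") -> Dict[str, int]:
    """Tally TP, FN, FP and TN, treating `positive` (Fail by default) as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts


actual = ["Fail", "Fail", "Pass", "Pass", "Fail"]
predicted = ["Fail", "Pass", "Fail", "Pass", "Fail"]
cells = confusion_counts(actual, predicted)
print(cells)  # {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 1}
```

The row totals of the matrix follow as P = TP + FN (actual Fail) and N = FP + TN (actual Pass).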
Evaluation measures for classifier performance
| Measure | Formula | Description |
|---|---|---|
| Accuracy | (TP + TN) / (P + N) | Percentage of all cases that the model labels correctly, i.e., the recognition rate |
| Recall | TP / (TP + FP) | Quality of the model in terms of the number of positive cases correctly labelled out of all cases labelled positive (commonly termed precision) |
| Sensitivity | TP / P | Number of positive cases correctly identified out of the total number of positive cases |
| Specificity | TN / N | Number of negative cases correctly identified out of the total number of negative cases |
| F-measure | 2 × Recall × Sensitivity / (Recall + Sensitivity) | Combines the Sensitivity and Recall measures to avoid the impact of imbalanced data |
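The five measures can be computed directly from the four confusion-matrix cells. Two points are worth noting. The description given for Recall corresponds to TP / (TP + FP), which most references call precision, while Sensitivity is TP / P. The fifth measure, described as combining Sensitivity and Recall, is computed here as their harmonic mean (an F-measure); that formula is an inference, but it reproduces the reported values up to rounding. The counts below are hypothetical, chosen so that the rounded outputs match the SVM percentages reported for DPLM in BD1; the paper's actual confusion matrices are not shown here.

```python
from typing import Dict


def classifier_measures(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """The five evaluation measures, as percentages, with Fail as the positive class."""
    p, n = tp + fn, fp + tn       # actual positives (Fail) and negatives (Pass)
    recall = tp / (tp + fp)       # this paper's "Recall"; usually called precision
    sensitivity = tp / p          # share of actual Fail cases that are flagged
    specificity = tn / n          # share of actual Pass cases that are recognised
    return {
        "accuracy": 100 * (tp + tn) / (p + n),
        "recall": 100 * recall,
        "sensitivity": 100 * sensitivity,
        "specificity": 100 * specificity,
        # Harmonic mean of Recall and Sensitivity (an F-measure).
        "f_measure": 100 * 2 * recall * sensitivity / (recall + sensitivity),
    }


# Hypothetical counts for 180 records; not taken from the paper.
measures = classifier_measures(tp=114, fn=17, fp=11, tn=38)
print({name: round(value, 2) for name, value in measures.items()})
```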
Imbalance ratio by data set
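The ratio values themselves are not reproduced in this section. A common definition of the imbalance ratio is the size of the majority class divided by the size of the minority class, so 1.0 means perfectly balanced classes. The sketch below applies that definition to the Pass and Fail counts of the data-set description table, ignoring attrition; both the definition and the choice of counts are assumptions and may not match how the paper computed its ratios.

```python
from typing import Dict, Tuple


def imbalance_ratio(count_a: int, count_b: int) -> float:
    """Majority-class size over minority-class size; 1.0 means balanced classes."""
    return max(count_a, count_b) / min(count_a, count_b)


# (Pass, Fail) counts from the data-set description table; attrition is ignored here.
PASS_FAIL: Dict[str, Tuple[int, int]] = {
    "DPLM": (51, 99),
    "DPP": (45, 56),
    "DS": (61, 65),
    "CFC": (42, 69),
}
ratios = {subject: imbalance_ratio(*counts) for subject, counts in PASS_FAIL.items()}
print({subject: round(r, 2) for subject, r in ratios.items()})
```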
Performance of SVM, PNN, and DA by data set, using BD1 and BD2. The Size column gives the total number of records per subject used
| Exp | Data set | Size | Accuracy SVM | Accuracy PNN | Accuracy DA | Recall SVM | Recall PNN | Recall DA | Sensitivity SVM | Sensitivity PNN | Sensitivity DA | Specificity SVM | Specificity PNN | Specificity DA | F-measure SVM | F-measure PNN | F-measure DA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BD1 | DPLM | 180 | 84.44 | 81.67 | 77.22 | 91.20 | 94.55 | 97.87 | 87.02 | 79.39 | 70.23 | 77.55 | 87.75 | 95.91 | 89.06 | 86.31 | 81.78 |
| BD1 | DPP | 165 | 94.55 | 94.55 | 93.33 | 98.25 | 98.25 | 98.25 | 94.12 | 94.12 | 92.44 | 95.65 | 95.65 | 91.30 | 96.14 | 96.14 | 95.24 |
| BD1 | DS | 178 | 84.91 | 86.03 | 75.98 | 92.59 | 92.73 | 98.72 | 84.03 | 85.71 | 64.71 | 86.66 | 86.66 | 98.33 | 88.11 | 89.08 | 78.17 |
| BD1 | CFC | 169 | 85.80 | 85.80 | 85.80 | 93.22 | 100 | 100 | 87.30 | 80.95 | 80.95 | 81.39 | 100 | 100 | 90.16 | 89.47 | 89.47 |
| BD1 | Mean | | 87.43 | 87.01 | 83.08 | 93.82 | 96.38 | 98.71 | 88.12 | 85.04 | 77.08 | 85.31 | 92.52 | 96.39 | 90.87 | 90.25 | 86.17 |
| BD1 | Median | | 85.36 | 85.92 | 81.51 | 92.91 | 96.40 | 98.49 | 87.16 | 83.33 | 75.59 | 84.03 | 91.70 | 97.12 | 89.61 | 89.28 | 85.63 |
| BD2 | DPLMb | 150 | 81.33 | 80.67 | 73.33 | 88.42 | 91.86 | 96.92 | 83.17 | 78.22 | 62.38 | 77.55 | 85.71 | 95.91 | 85.71 | 84.49 | 75.90 |
| BD2 | DPPb | 101 | 92.07 | 90.10 | 90.10 | 96.15 | 92.59 | 96.00 | 89.29 | 89.29 | 85.71 | 95.55 | 91.11 | 95.55 | 92.59 | 90.91 | 90.57 |
| BD2 | DSb | 126 | 78.91 | 81.25 | 74.22 | 85.96 | 86.67 | 84.31 | 72.06 | 76.47 | 63.24 | 86.66 | 86.66 | 86.66 | 78.40 | 81.25 | 72.27 |
| BD2 | CFCb | 109 | 79.92 | 79.92 | 79.92 | 86.21 | 97.83 | 89.29 | 75.76 | 68.18 | 75.76 | 81.39 | 97.67 | 86.04 | 80.65 | 80.36 | 81.97 |
| BD2 | Mean | | 83.03 | 82.99 | 79.39 | 89.19 | 92.24 | 91.63 | 80.07 | 78.04 | 71.77 | 85.29 | 90.29 | 91.04 | 84.34 | 84.25 | 80.18 |
| BD2 | Median | | 80.63 | 80.96 | 77.07 | 87.32 | 92.23 | 92.65 | 79.47 | 77.35 | 69.50 | 84.03 | 88.89 | 91.11 | 83.18 | 82.87 | 78.94 |
Exp: experiment; SVM: Support Vector Machine; PNN: Probabilistic Neural Network; DA: Discriminant Analysis; BD1: Database 1; BD2: Database 2
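The Mean and Median rows summarise each column over the four subjects of an experiment. As a check, Python's standard `statistics` module reproduces the BD1 accuracy summaries, up to rounding in the last digit, from the per-subject values copied below.

```python
from statistics import mean, median

# BD1 accuracy (%) per subject (DPLM, DPP, DS, CFC), copied from the results table.
bd1_accuracy = {
    "SVM": [84.44, 94.55, 84.91, 85.80],
    "PNN": [81.67, 94.55, 86.03, 85.80],
    "DA": [77.22, 93.33, 75.98, 85.80],
}
# Reported (Mean, Median) rows for the same columns.
reported = {"SVM": (87.43, 85.36), "PNN": (87.01, 85.92), "DA": (83.08, 81.51)}

summary = {model: (mean(values), median(values)) for model, values in bd1_accuracy.items()}
for model, (m, med) in summary.items():
    print(model, round(m, 3), round(med, 3), "reported:", reported[model])
```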
Statistical significance analysis based on the Friedman test by performance measure
| Measure | BD1 p-value | BD2 p-value |
|---|---|---|
| Accuracy | 0.0459 | 0.1482 |
| Recall | 0.0597 | 0.7788 |
| Sensitivity | 0.0495 | 0.2231 |
| Specificity | 0.3678 | 0.4412 |
| F-measure | 0.0439 | 0.3678 |
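The Friedman test ranks the classifiers within each data set and asks whether their average ranks differ more than chance would allow. The sketch below is a textbook version with the usual tie correction; with three classifiers the statistic has two degrees of freedom, for which the chi-square tail probability has a closed form, so no statistics library is needed. The function names are hypothetical, and the paper does not state which implementation or tie handling it used.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ascending ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def friedman_test(blocks: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Tie-corrected Friedman statistic and chi-square p-value.

    `blocks` has one row per data set and one column per classifier. The p-value
    uses the closed-form chi-square survival function, which exists for an even
    number of degrees of freedom (i.e. an odd number of classifiers).
    """
    n, k = len(blocks), len(blocks[0])
    ranked = [average_ranks(row) for row in blocks]
    rank_sums = [sum(row[j] for row in ranked) for j in range(k)]
    statistic = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n k (k^2 - 1)) over groups of tied values.
    ties = sum(t ** 3 - t for row in blocks for t in Counter(row).values())
    statistic /= 1 - ties / (n * k * (k * k - 1))
    df = k - 1
    if df % 2:
        raise ValueError("closed-form p-value only implemented for even degrees of freedom")
    half = statistic / 2
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return statistic, p_value


# BD1 accuracy (%): rows DPLM, DPP, DS, CFC; columns SVM, PNN, DA.
bd1_accuracy = [
    [84.44, 81.67, 77.22],
    [94.55, 94.55, 93.33],
    [84.91, 86.03, 75.98],
    [85.80, 85.80, 85.80],
]
statistic, p_value = friedman_test(bd1_accuracy)
print(f"chi-square = {statistic:.3f}, p = {p_value:.4f}")
```

On these BD1 accuracy values the sketch gives a statistic of about 4.91 and p ≈ 0.086, whereas the table reports 0.0459, so the paper presumably used a different implementation, tie handling, or data arrangement that this section does not describe.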
Statistical significance analysis based on Student's t-test by performance measure
| Measure | SVM-DA | PNN-DA | SVM-PNN |
|---|---|---|---|
| Accuracy | 0.0477 | 0.1814 | 0.9955 |
| Sensitivity | 0.0395 | 0.0914 | 0.2387 |
| F-measure | 0.0495 | 0.3114 | 0.7892 |
| Comparison | Hypothesis | Statement |
|---|---|---|
| SVM-DA | H0 | The SVM performance is equal to that of the DA |
| SVM-DA | H1 | The SVM performance is not equal to that of the DA |
| SVM-PNN | H0 | The SVM performance is equal to that of the PNN |
| SVM-PNN | H1 | The SVM performance is not equal to that of the PNN |
| PNN-DA | H0 | The PNN performance is equal to that of the DA |
| PNN-DA | H1 | The PNN performance is not equal to that of the DA |
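Each pair of hypotheses above sets a null hypothesis of equal performance against an alternative of different performance for two classifiers. The section does not say whether the t-test was paired or which samples it used, so the sketch below assumes a paired test over the same data sets, one common choice when models are evaluated on shared data. It computes only the t statistic and its degrees of freedom; the two-sided p-value then comes from the Student t distribution (for example via `scipy.stats.ttest_rel`, which performs the whole paired test). The function name is hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """t statistic for H0: the mean per-data-set difference between two models is zero."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least two values")
    diffs = [x - y for x, y in zip(a, b)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# BD1 accuracy (%) for DPLM, DPP, DS and CFC, copied from the results table.
svm = [84.44, 94.55, 84.91, 85.80]
da = [77.22, 93.33, 75.98, 85.80]
t, df = paired_t_statistic(svm, da)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

With only four data sets (three degrees of freedom) such a test has little power, and because the paper's exact design is not stated here, the result need not match the reported p-values.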