Firuz Kamalov, Hana Sulieman, David Santandreu Calonge.
Abstract
The COVID-19 pandemic has impelled the majority of schools and universities around the world to switch to remote teaching. One of the greatest challenges in online education is preserving the academic integrity of student assessments. The lack of direct supervision by instructors during final examinations poses a significant risk of academic misconduct. In this paper, we propose a new approach to detecting potential cases of cheating on the final exam using machine learning techniques. We treat the issue of identifying the potential cases of cheating as an outlier detection problem. We use students' continuous assessment results to identify abnormal scores on the final exam. However, unlike a standard outlier detection task in machine learning, the student assessment data requires us to consider its sequential nature. We address this issue by applying recurrent neural networks together with anomaly detection algorithms. Numerical experiments on a range of datasets show that the proposed method achieves a remarkably high level of accuracy in detecting cases of cheating on the exam. We believe that the proposed method would be an effective tool for academics and administrators interested in preserving the academic integrity of course assessments.
Year: 2021 PMID: 34347794 PMCID: PMC8336856 DOI: 10.1371/journal.pone.0254340
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
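The core idea of the abstract can be sketched in a few lines: predict each student's expected final-exam score from their continuous-assessment record, and flag students whose actual final score is anomalously far above the prediction. In the sketch below the predictor is a deliberately crude stand-in (the mean of the prior scores); the paper instead uses an LSTM, which takes the order of the scores into account. The quantile cutoff `q` and the toy data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def flag_suspect_finals(quiz_scores, final_scores, q=0.9):
    """Flag students whose final-exam score is anomalously high relative
    to their continuous-assessment record.

    Stand-in predictor: the expected final is the mean of the prior
    scores. The paper replaces this with an LSTM that respects the
    sequential nature of the assessment data."""
    residual = np.asarray(final_scores) - np.asarray(quiz_scores).mean(axis=1)
    cutoff = np.quantile(residual, q)  # simple residual cutoff as a stand-in
    return residual > cutoff

# Ten students with three quizzes each; the last student's final jumps by 35.
quizzes = np.full((10, 3), 60.0)
finals = np.array([61, 59, 60, 62, 58, 60, 61, 59, 60, 95.0])
flags = flag_suspect_finals(quizzes, finals)
```

Only the student whose final far exceeds their quiz record is flagged; students with finals near their quiz average fall below the cutoff.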
Fig 1. Both sequences of scores consist of the same values, but in a different order.
The steady progression of grades of Student 2 makes a score of 95 on the final exam seem plausible. On the other hand, the pattern of grades for Student 1 makes a grade of 95 on the final exam highly unexpected.
Fig 2. 1D kernel density estimate of a Gaussian distribution using various bandwidth values.
The neural network architecture for the proposed algorithm.
| | Hidden layer 1 | Hidden layer 2 | Hidden layer 3 |
|---|---|---|---|
| Type | LSTM | Fully connected | Fully connected |
| Dimension | 8 | 64 | 32 |
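The table specifies only the layer types and dimensions (LSTM with 8 units, then fully connected layers of 64 and 32 units). The forward pass can be traced end to end with plain numpy; the random weight initialisation, ReLU activations, and scalar output head below are assumptions added to make the sketch self-contained, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
HIDDEN, FC1, FC2 = 8, 64, 32  # dimensions from the table

def lstm_last_hidden(x_seq, Wx, Wh, b):
    """Run a single-layer LSTM over a scalar sequence; return the last hidden state."""
    h = np.zeros(HIDDEN)
    c = np.zeros(HIDDEN)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for x in x_seq:
        z = Wx * x + Wh @ h + b           # (4*HIDDEN,) gate pre-activations
        i, f, g, o = np.split(z, 4)       # input, forget, cell, output gates
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

# Randomly initialised weights, just to trace the shapes end to end.
Wx = rng.normal(0, 0.1, 4 * HIDDEN)
Wh = rng.normal(0, 0.1, (4 * HIDDEN, HIDDEN))
b = np.zeros(4 * HIDDEN)
W1 = rng.normal(0, 0.1, (FC1, HIDDEN)); b1 = np.zeros(FC1)
W2 = rng.normal(0, 0.1, (FC2, FC1));    b2 = np.zeros(FC2)
w_out = rng.normal(0, 0.1, FC2)         # scalar head (assumed) for the final score

scores = np.array([66, 67, 70, 72]) / 100.0  # a student's assessment sequence
h = lstm_last_hidden(scores, Wx, Wh, b)      # dim 8
a1 = np.maximum(0, W1 @ h + b1)              # fully connected, dim 64
a2 = np.maximum(0, W2 @ a1 + b2)             # fully connected, dim 32
prediction = w_out @ a2
```

The LSTM collapses the variable-length grade sequence into a fixed 8-dimensional state, which the two dense layers then map to a prediction.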
Fig 3. Representative samples of the simulated datasets used in our experiments.
The datasets capture different scenarios for the distribution of the grades. (a) A representative sample of Dataset 1 grades. The dataset consists of 91% normal and 9% anomalous grades. The normal grades consist of three quarters homogeneous grades and one quarter increasing grades. The anomalous grades rise sharply, by 35 points, on the final exam. (b) A representative sample of Dataset 2 grades. The dataset is similar to Dataset 1, but the anomalous grades rise less sharply, by 20 points, on the final exam. As a result, the outliers are harder to identify. (c) A representative sample of Dataset 3 grades. The dataset is similar to Dataset 2, but around 10% of the normal grades increase at an incremental pace so that the difference between the average prior score and the final exam score is the same as in the anomalous instances. As a result, it is even more challenging to identify the outlier scores. (d) A representative sample of Dataset 4 grades. The dataset is designed to simulate a scenario in which the final exam is easy and everyone receives a relatively high grade. The normal final exam scores are 10 points higher than the average of the prior assessments; the anomalous final exam scores are 25 points higher.
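The Dataset 1 recipe (91% normal grades, split three quarters homogeneous and one quarter increasing; 9% anomalous grades that jump 35 points on the final) can be simulated directly. The base-grade ranges, noise level, and number of assessments below are illustrative assumptions; the caption fixes only the proportions and the 35-point jump.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_dataset1(n_students=220, n_assessments=5):
    """Simulate Dataset 1: 91% normal grades, 9% anomalous grades."""
    n_anom = round(0.09 * n_students)
    n_norm = n_students - n_anom
    n_inc = n_norm // 4                       # one quarter of normals increase
    grades, labels = [], []
    for _ in range(n_norm - n_inc):           # homogeneous: flat around a base level
        base = rng.uniform(55, 75)
        grades.append(base + rng.normal(0, 2, n_assessments)); labels.append(0)
    for _ in range(n_inc):                    # steadily increasing normals
        start = rng.uniform(50, 60)
        g = start + np.linspace(0, 15, n_assessments) + rng.normal(0, 2, n_assessments)
        grades.append(g); labels.append(0)
    for _ in range(n_anom):                   # anomalous: +35-point jump on the final
        base = rng.uniform(50, 60)
        g = base + rng.normal(0, 2, n_assessments)
        g[-1] = base + 35
        grades.append(g); labels.append(1)
    return np.array(grades), np.array(labels)
```

A class of 220 then yields 20 anomalous records whose final exceeds their prior average by roughly 35 points, matching the scenario in panel (a).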
The mean and standard deviation of the TPR for the anomaly detection algorithms.
The results represent experiments on four datasets based on 20 simulated experiments. The proposed method (NewAlgo) produces the best overall results.
| | DS1 | DS2 | DS3 | DS4 | Overall |
|---|---|---|---|---|---|
| Naive | 1.000±0.000 | 0.780±0.112 | 0.490±0.158 | 1.000±0.000 | 0.818 |
| RobustCov | 0.975±0.043 | 0.420±0.194 | 0.020±0.068 | 0.450±0.470 | 0.466 |
| IsoForest | 0.240±0.132 | 0.000±0.000 | 0.000±0.000 | 0.015±0.036 | 0.065 |
| LOF | 0.040±0.058 | 0.045±0.074 | 0.055±0.086 | 0.055±0.092 | 0.049 |
| NewAlgo | 0.915±0.111 | 0.760±0.097 | 0.840±0.107 | 0.980±0.051 | 0.874 |
The mean and standard deviation of the FPR for the anomaly detection algorithms.
The results represent experiments on four datasets based on 20 simulated experiments.
| | DS1 | DS2 | DS3 | DS4 | Overall |
|---|---|---|---|---|---|
| Naive | 0.000±0.000 | 0.022±0.011 | 0.051±0.016 | 0.000±0.000 | 0.018 |
| RobustCov | 0.002±0.004 | 0.058±0.019 | 0.098±0.007 | 0.055±0.047 | 0.053 |
| IsoForest | 0.076±0.013 | 0.100±0.000 | 0.100±0.000 | 0.098±0.004 | 0.094 |
| LOF | 0.096±0.006 | 0.096±0.007 | 0.094±0.009 | 0.094±0.009 | 0.095 |
| NewAlgo | 0.008±0.011 | 0.024±0.010 | 0.016±0.011 | 0.002±0.005 | 0.013 |
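The TPR and FPR reported in these tables follow the standard definitions: TPR is the fraction of true cheating cases that the detector flags, and FPR is the fraction of honest cases that it flags by mistake. A minimal helper (the example arrays are made up for illustration):

```python
import numpy as np

def tpr_fpr(y_true, y_flag):
    """True positive rate and false positive rate for binary anomaly flags."""
    y_true = np.asarray(y_true, dtype=bool)
    y_flag = np.asarray(y_flag, dtype=bool)
    tp = np.sum(y_true & y_flag)    # cheaters correctly flagged
    fn = np.sum(y_true & ~y_flag)   # cheaters missed
    fp = np.sum(~y_true & y_flag)   # honest students wrongly flagged
    tn = np.sum(~y_true & ~y_flag)  # honest students correctly cleared
    return tp / (tp + fn), fp / (fp + tn)

# Two true cheaters among five students; the detector catches one of them
# and raises one false alarm.
tpr, fpr = tpr_fpr([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
```

A good detector pushes TPR toward 1 while keeping FPR near 0, which is exactly the trade-off the two tables compare across methods.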
The mean TPR for the anomaly detection algorithms.
The results represent experiments on four datasets based on 20 simulated experiments and a class size of 220. The proposed method (NewAlgo) produces the best overall results.
| | DS1 | DS2 | DS3 | DS4 | Overall |
|---|---|---|---|---|---|
| Naive | 1.000 | 0.755 | 0.515 | 1.000 | 0.818 |
| RobustCov | 0.972 | 0.402 | 0.022 | 0.267 | 0.416 |
| IsoForest | 0.280 | 0.008 | 0.002 | 0.038 | 0.082 |
| LOF | 0.092 | 0.053 | 0.038 | 0.042 | 0.056 |
| NewAlgo | 0.915 | 0.780 | 0.807 | 0.988 | 0.872 |
The scores of the (true) cheating cases and the outlier cases identified by the detection methods in the DS2 dataset.
| Method | Assessment | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cheating | Quiz 1 | 66 | 60 | 64 | 69 | 62 | 63 | 65 | 68 | 66 | 68 |
| | Quiz 3 | 67 | 69 | 62 | 68 | 69 | 64 | 67 | 64 | 66 | 68 |
| | Final | 89 | 86 | 84 | 88 | 87 | 87 | 81 | 80 | 88 | 83 |
| RobustCov | Quiz 1 | 50 | 51 | 50 | 50 | 50 | 50 | 51 | 52 | 50 | 63 |
| | Quiz 3 | 68 | 65 | 64 | 61 | 70 | 71 | 71 | 70 | 68 | 64 |
| | Final | 79 | 77 | 78 | 77 | 79 | 76 | 79 | 79 | 78 | 87 |
| IsoForest | Quiz 1 | 54 | 59 | 58 | 57 | 54 | 87 | 89 | 83 | 88 | 50 |
| | Quiz 3 | 57 | 55 | 51 | 51 | 51 | 89 | 85 | 89 | 89 | 56 |
| | Final | 51 | 50 | 58 | 54 | 52 | 81 | 87 | 89 | 80 | 60 |
| LOF | Quiz 1 | 69 | 78 | 79 | 82 | 87 | 89 | 88 | 53 | 59 | 53 |
| | Quiz 3 | 69 | 77 | 78 | 89 | 89 | 85 | 89 | 67 | 71 | 66 |
| | Final | 65 | 79 | 75 | 80 | 81 | 87 | 80 | 70 | 76 | 68 |
The true positive and false positive rates of the anomaly detection algorithms.
The results represent experiments on a single real-life dataset. The proposed method (NewAlgo) produces the best overall results.
| | Naive | RobustCov | IsoForest | LOF | NewAlgo |
|---|---|---|---|---|---|
| TPR | 0.67 | 0.33 | 0.33 | 0.67 | 1.00 |
| FPR | 0.06 | 0.08 | 0.08 | 0.06 | 0.04 |