Jef Vanderoost1,2, Rianne Janssen3, Jan Eggermont4, Riet Callens1,2, Tinne De Laet1,2.
Abstract
BACKGROUND AND HYPOTHESES: This study is the first to offer an in-depth comparison of elimination testing with the scoring rule of Arnold & Arnold (hereafter referred to as elimination testing with adapted scoring) and negative marking. It is motivated by the search for an alternative to negative marking that still discourages guessing, but puts students at less of a disadvantage because of non-relevant characteristics such as risk aversion, and does not result in grade inflation. The comparison is structured around seven hypotheses: in comparison with negative marking, elimination testing with adapted scoring leads to (1) a similar average score (no grade inflation); (2) students expressing their partial knowledge; (3) a decrease in the number of blank answers; (4) no gender bias in the number of blank answers; (5) a reduction in guessing; (6) a decrease in self-reported test anxiety; and finally (7) students preferring elimination testing with adapted scoring over negative marking.
Year: 2018 PMID: 30278049 PMCID: PMC6168139 DOI: 10.1371/journal.pone.0203931
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Exam design and number of selected students by course, master/scoring method, examination moment and gender.
| course | master | scoring method | examination moment | number of students | gender | |
|---|---|---|---|---|---|---|
| male | female | |||||
| 168 | 79 (47%) | 89 (53%) | ||||
| 179 | 89 (49%) | 90 (51%) | ||||
| 217 | 89 (41%) | 128 (59%) | ||||
| 119 | 57 (48%) | 62 (52%) | ||||
| 168 | 79 (47%) | 89 (53%) | ||||
| 179 | 89 (49%) | 90 (51%) | ||||
| 217 | 89 (41%) | 128 (59%) | ||||
| 119 | 57 (48%) | 62 (52%) | ||||
Test administration design for comparing negative marking and elimination testing with adapted scoring. The number of 1st and 2nd master students and their gender is indicated for the two examination moments of each course.
Response rate on online questionnaire per examination moment.
| master | course & scoring method | examination moment | all students | gender | |
|---|---|---|---|---|---|
| male | female | ||||
| 60 (35.7%) | 38.0% | 33.7% | |||
| 47 (26.3%) | 23.6% | 28.9% | |||
| 56 (25.8%) | 29.2% | 23.4% | |||
| 20 (16.8%) | 15.8% | 17.7% | |||
Response rate for the online questionnaire, divided over the different examination moments, for the 683 students of the sample studied in this paper, as specified in Table 1.
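The response rates can be reproduced from the counts in the two tables. The sketch below assumes that the four rows of the response-rate table follow the same order as the four group sizes in Table 1 (168, 179, 217, 119); the variable names are illustrative only.

```python
# Group sizes from Table 1 and respondent counts from the response-rate table.
# Assumption: both tables list the four groups in the same order.
enrolled = [168, 179, 217, 119]
respondents = [60, 47, 56, 20]

# Response rate per group, in percent, rounded to one decimal.
rates = [round(100 * r / n, 1) for r, n in zip(respondents, enrolled)]

# Overall response rate across the whole sample.
total_enrolled = sum(enrolled)
total_respondents = sum(respondents)
overall = round(100 * total_respondents / total_enrolled, 1)

print(rates)
print(total_respondents, total_enrolled, overall)
```

Under that assumption the per-group rates agree with the percentages reported in the table, and the group sizes add up to the 683 students mentioned in the caption.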
Fig 1. Example of a multiple-choice question for negative marking and elimination testing with adapted scoring with five alternatives (n = 5).
Answering patterns, knowledge levels, and corresponding scores for negative marking and elimination testing with traditional or adapted scoring for five alternatives (n = 5) with A the correct answer.
| answering pattern | [A B C D E] | negative marking: score | negative marking: knowledge level | elimination testing: traditional score | elimination testing: adapted score | elimination testing: knowledge level |
|---|---|---|---|---|---|---|
| no doubt | [1 0 0 0 0] | 1 | full knowledge | 1 | 1 | full knowledge |
| | [0 0 1 0 0] | −1/4 | misconception | −1/4 | −1/4 | partial misconception 1 |
| doubt two | [1 1 0 0 0] | - | - | 3/4 | 3/8 | partial knowledge 1 |
| | [0 1 1 0 0] | - | - | −2/4 | −1/4 | partial misconception 2 |
| doubt three | [1 1 0 1 0] | - | - | 2/4 | 1/6 | partial knowledge 2 |
| | [0 1 1 1 0] | - | - | −3/4 | −1/4 | partial misconception 3 |
| doubt four | [1 1 0 1 1] | - | - | 1/4 | 1/16 | partial knowledge 3 |
| | [0 1 1 1 1] | - | - | −1 | −1/4 | total misconception |
| blank | [0 0 0 0 0] | 0 | no knowledge | 0 | 0 | no knowledge |
The [A B C D E] column provides an example of the answering pattern: 1 corresponds to an alternative indicated by the student (as “could be” in elimination testing), while 0 corresponds to an alternative not indicated by the student (indicated as “cannot be” in elimination testing).
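The scores in the table can be reproduced with three small functions. This is a minimal sketch: the formulas are inferred from the tabulated values for n = 5 and are not quoted from the paper, and the function names are hypothetical. For adapted scoring, the values fit (n − k) / (k (n − 1)) when the correct answer is among the k alternatives kept, and −1/(n − 1) when it has been eliminated.

```python
from fractions import Fraction


def negative_marking(selected, correct, n):
    """One answer allowed: 1 if right, -1/(n-1) if wrong, 0 if blank (None)."""
    if selected is None:
        return Fraction(0)
    return Fraction(1) if selected == correct else Fraction(-1, n - 1)


def elimination_traditional(pattern, correct):
    """pattern[i] is 1 if alternative i is kept ("could be"), 0 if eliminated.

    Each eliminated distractor earns 1/(n-1); eliminating the correct
    answer costs 1. (Rule inferred from the table.)
    """
    n = len(pattern)
    score = Fraction(0)
    for i, kept in enumerate(pattern):
        if not kept:
            score += Fraction(-1) if i == correct else Fraction(1, n - 1)
    return score


def elimination_adapted(pattern, correct):
    """Adapted scoring as inferred from the table values."""
    n = len(pattern)
    k = sum(pattern)  # number of alternatives kept
    if k == 0:
        return Fraction(0)          # blank answer
    if not pattern[correct]:
        return Fraction(-1, n - 1)  # correct answer eliminated
    return Fraction(n - k, k * (n - 1))


# Rows of the table, with A (index 0) as the correct answer.
for pattern in ([1, 0, 0, 0, 0], [1, 1, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 0, 0]):
    print(pattern, elimination_traditional(pattern, 0), elimination_adapted(pattern, 0))
```

With this form of the adapted rule, keeping k alternatives at random has an expected score of zero, since (k/n) · (n − k) / (k (n − 1)) equals ((n − k)/n) · 1/(n − 1).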
Average score and standard deviation for different examination moments for both elimination testing with adapted scoring and negative marking.
| course | master | scoring method | examination moment | score | F-test | t-test |
|---|---|---|---|---|---|---|
| 14.98 (2.05) | 0.907 (0.510) | -0.673 (0.502) | ||||
| 14.07 (2.25) | 0.670 (0.016) | 3.031 (0.003) | ||||
| 15.13 (2.15) | ||||||
| 13.15 (2.75) | ||||||
| 13.17 (2.63) | 0.909 (0.507) | -4.935 (<0.001) | ||||
| 12.05 (3.22) | 1.558 (0.007) | -4.635 (<0.001) | ||||
| 14.53 (2.76) | ||||||
| 13.68 (2.58) |
Average grade and standard deviation for the different exams, examination moments, and scoring methods (negative marking and elimination testing with adapted scoring). The table additionally gives the results of the t-tests for the hypothesis “in comparison with negative marking, elimination testing with adapted scoring leads to a similar average score (no grade inflation)”. The t-tests and F-tests compare the two scoring methods (negative marking vs elimination testing with adapted scoring) at the same examination moment. Depending on the result of the F-test, a two-sided t-test for equal or unequal variances was used. Superscripts indicate levels of significance using the following coding
ns p > 0.05
* p < 0.05
** p < 0.01
*** p < 0.001.
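The two-step procedure described in the caption (an F-test on the variances, then a t-test for equal or unequal variances) can be sketched from summary statistics alone. The functions below are an illustration under stated assumptions, not the authors' analysis code: the names are hypothetical, only the test statistics are computed, and p-values would require the F and t distributions from a statistics library.

```python
import math


def variance_ratio(sd1, sd2):
    """F statistic for comparing two variances: ratio of squared SDs."""
    return sd1 ** 2 / sd2 ** 2


def t_statistic(mean1, sd1, n1, mean2, sd2, n2, equal_var=True):
    """Two-sample t statistic from means, SDs and group sizes.

    equal_var=True uses the pooled-variance (Student) form,
    equal_var=False uses the unequal-variance (Welch) form.
    """
    if equal_var:
        pooled = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se


# Standard deviations taken from two rows of the table (2.25 and 2.75).
print(round(variance_ratio(2.25, 2.75), 3))

# Group sizes below are hypothetical, chosen only to exercise the function.
print(round(t_statistic(10.0, 2.0, 50, 9.0, 2.0, 50), 3))
```

The variance ratio of the standard deviations 2.25 and 2.75 comes out close to the F value of 0.670 reported in the table; the small difference is expected because the tabulated standard deviations are rounded.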
Average score and standard deviation for different test moments and gender using scoring methods negative marking and elimination with adapted scoring.
| course | master | scoring method | examination moment | gender | |
|---|---|---|---|---|---|
| male | female | ||||
| 14.81 (2.25) | 15.14 (1.85) | ||||
| 13.94 (2.49) | 14.20 (2.00) | ||||
| 14.81 (2.07) | 15.35 (2.19) | ||||
| 12.66 (2.93) | 13.60 (2.52) | ||||
| 14.26 (3.16) | 14.77 (2.34) | ||||
| 13.44 (3.11) | 13.92 (1.91) | ||||
| 12.80 (2.49) | 13.42 (2.49) | ||||
| 11.29 (3.46) | 12.75 (2.83) | ||||
Average score and standard deviation for the different exams by master level/scoring method (negative marking and elimination testing with adapted scoring), examination moment, and gender.
Grade point average over different examination moments and master.
| master | examination moment | gender | |
|---|---|---|---|
| male | female | ||
| 72.09 (8.84) | 72.11 (8.16) | ||
| 70.73 (9.12) | 72.13 (9.66) | ||
| 69.63 (9.48) | 71.15 (9.66) | ||
| 64.82 (12.58) | 68.53 (9.40) | ||
The mean grade point average (%), with the standard deviation in brackets, of male and female students from the 1st and 2nd master participating in the different examination moments.
Multi-way ANOVA between grade point average and factors examination moment (EM), gender and master level.
| response: grade point average | ||
|---|---|---|
| 7.95 | 0.005 | |
| 4.45 | 0.035 | |
| 17.07 | <0.001 | |
| 1.47 | 0.226 | |
| 4.35 | 0.037 | |
| 1.66 | 0.198 | |
| 0.08 | 0.781 |
ANOVA Table for Type II tests. ANOVA model: grade point average ~ examination moment * gender * master. Degrees of freedom (numerator, denominator) for all factors: (1, 675). Superscripts indicate levels of significance using the following coding
ns p > 0.05
* p < 0.05
** p < 0.01
*** p < 0.001.
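Because the superscripts are not shown in the table cells here, the significance coding can be reapplied to the listed p-values. The helper below is a small sketch with a hypothetical name; it assumes numeric p-values and treats a value of exactly 0.05 as not significant, which the coding above leaves unspecified.

```python
def significance_code(p):
    """Map a numeric p-value to the coding used in the table notes."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


# Numeric p-values from the ANOVA table; the entry reported as "<0.001" is omitted.
for p in (0.005, 0.035, 0.226, 0.037, 0.198, 0.781):
    print(p, significance_code(p))
```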
Results of multiple linear regression for predicting exam score.
| | Paediatrics: β (SD) | t | p | Gynaecology: β (SD) | t | p |
|---|---|---|---|---|---|---|
| Intercept | 2.635 (1.101) | 2.393 | 0.017 | -5.187 (0.958) | -5.414 | <0.001 |
| grade point average | 0.170 (0.015) | 11.230 | <0.001 | 0.258 (0.014) | 18.929 | <0.001 |
| SM [negative marking] | 1.260 (1.265) | 0.996 | 0.320 | 0.440 (1.305) | 0.337 | 0.736 |
| gender [female] | -0.300 (0.932) | -0.321 | 0.748 | 3.377 (0.927) | 3.644 | <0.001 |
| examination moment (EM) [T2] | -2.974 (1.424) | -2.088 | 0.037 | -0.766 (1.194) | -0.642 | 0.521 |
| GPA | -0.013 (0.018) | -0.752 | 0.452 | 0.007 (0.018) | 0.396 | 0.692 |
| GPA | 0.006 (0.013) | 0.488 | 0.628 | -0.044 (0.013) | -3.317 | 0.001 |
| GPA | 0.031 (0.020) | 1.548 | 0.122 | 0.009 (0.017) | 0.507 | 0.613 |
| GPA | 0.011 (0.026) | 0.410 | 0.682 | -0.003 (0.026) | -0.129 | 0.897 |
| SM | 0.133 (0.236) | 0.563 | 0.574 | 0.071 (0.243) | 0.293 | 0.769 |
The table shows the regression coefficients (β), with the standard deviation in brackets, for both courses (Paediatrics and Gynaecology) (N = 683, R2 = 0.600 for Paediatrics and 0.706 for Gynaecology). GPA = grade point average, SM = scoring method, EM = examination moment. Superscripts indicate levels of significance using the following coding
ns p > 0.05
* p < 0.05
** p < 0.01
*** p < 0.001.
Percentage of students that show different answering patterns on at least one multiple-choice question.
| no doubt | blank | doubt | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| scoring method (master) | exam moment | tot | male | female | tot | male | female | tot | male | female | |
| 100 | 100 | 100 | 50.3 | 55.6 | 44.9 | 97.2 | 95.6 | 98.9 | |||
| 100 | 100 | 100 | 31.6 | 24.1 | 38.3 | 95.5 | 95.2 | 95.7 | |||
| 100 | 100 | 100 | 71.1 | 73.2 | 69.6 | - | - | - | |||
| 100 | 100 | 100 | 81.6 | 85.7 | 77.5 | - | - | - | |||
| 100 | 100 | 100 | 93.1 | 88.5 | 97.1 | - | - | - | |||
| 100 | 100 | 100 | 91.6 | 88.8 | 94.6 | - | - | - | |||
| 100 | 100 | 100 | 65.6 | 61.8 | 68.6 | 97.2 | 96.4 | 97.9 | |||
| 100 | 100 | 100 | 76.8 | 75.0 | 78.5 | 98.4 | 96.7 | 100 | |||
ETA is the abbreviation for elimination testing with adapted scoring, PED abbreviates Paediatrics, and GY Gynaecology. The answering patterns are defined in Table 3. Doubt is an aggregation of doubt two, doubt three, and doubt four. The full details for the different doubt answering patterns are available in S2 Table and S3 Table.
Average number of questions for which a student showed each answering pattern.
| scoring method (master) | exam moment | no doubt | blank | doubt | ||||
|---|---|---|---|---|---|---|---|---|
| male | female | male | female | male | female | |||
| 33.62 (4.68) | 33.75 (3.93) | 0.42 (0.89) | 0.71 (1.23) | 5.96 (4.50) | 5.54 (3.49) | |||
| 33.12 (4.42) | 32.90 (3.97) | 0.78 (0.96) | 0.83 (1.19) | 6.10 (4.20) | 6.27 (3.53) | |||
| 37.60 (2.41) | 37.86 (2.32) | 2.40 (2.41) | 2.14 (2.32) | - | - | |||
| 37.19 (2.58) | 37.03 (2.78) | 3.28 (2.73) | 2.84 (2.74) | - | - | |||
| 73.41 (5.68) | 71.60 (5.92) | 6.59 (5.68) | 8.40 (5.92) | - | - | |||
| 72.63 (6.12) | 71.48 (6.70) | 7.37 (6.12) | 8.52 (6.70) | - | - | |||
| 63.73 (11.66) | 65.05 (10.68) | 2.81 (3.67) | 2.55 (3.41) | 13.46 (9.96) | 12.41 (8.97) | |||
| 60.16 (10.64) | 62.65 (9.44) | 4.25 (4.90) | 3.29 (4.87) | 15.60 (8.99) | 14.06 (7.04) | |||
Average number of questions for which a student showed each answering pattern, with the standard deviation in brackets. ETA is the abbreviation for elimination testing with adapted scoring, PED abbreviates Paediatrics, and GY Gynaecology. Paediatrics exams have 40 questions and Gynaecology exams have 80 questions. The answering patterns are defined in Table 3. Doubt is an aggregation of doubt two, doubt three, and doubt four. The full details for the different doubt answering patterns are available in S2 Table and S3 Table.
Multi-way ANOVA between number of blank answers and factors gender, master/scoring method, examination moment and binned grade point average.
| Paediatrics | Gynaecology | |||
|---|---|---|---|---|
| response: number of blank answers | ||||
| 3.628 | 0.057 | 0.235 | 0.628 | |
| 0.038 | 0.846 | 5.446 | 0.020 | |
| 37.041 | <0.001 | 91.287 | <0.001 | |
| 150.683 | <0.001 | 218.196 | <0.001 | |
| 0.397 | 0.529 | 0.272 | 0.601 | |
| 0.086 | 0.918 | 0.689 | 0.502 | |
| 0.142 | 0.868 | 0.468 | 0.626 | |
| 0.176 | 0.675 | 0.009 | 0.925 | |
| 1.502 | 0.221 | 5.759 | 0.017 | |
| 15.373 | <0.001 | 5.169 | 0.006 | |
| 0.368 | 0.692 | 4.421 | 0.012 | |
| 0.051 | 0.821 | 0.122 | 0.727 | |
| 0.220 | 0.803 | 0.042 | 0.960 | |
| 0.379 | 0.685 | 0.019 | 0.981 | |
| 1.000 | 0.368 | 2.010 | 0.135 | |
ANOVA Table for Type II tests. ANOVA model: number of blank answers ~ examination moment * gender * scoring method*GPA_bin. Degrees of freedom (numerator, denominator) for all factors: (1,659) except for those in combination with binned grade point average (2,659). GPA_bin = binned grade point average, SM = scoring method, EM = examination moment. Superscripts indicate levels of significance using the following coding
ns p > 0.05
* p < 0.05
** p < 0.01
*** p < 0.001.
Results of multiple linear regression for predicting number of non-doubt answers.
| | Paediatrics: β (SD) | t | p | Gynaecology: β (SD) | t | p |
|---|---|---|---|---|---|---|
| Intercept | 15.125 (2.221) | 6.810 | <0.001 | 9.425 (4.257) | 2.214 | 0.027 |
| grade point average | 0.259 (0.031) | 8.484 | <0.001 | 0.778 (0.061) | 12.844 | <0.001 |
| scoring method [negative marking] | 15.110 (2.552) | 5.922 | <0.001 | 37.936 (5.796) | 6.545 | <0.001 |
| gender [female] | -1.936 (1.880) | -1.030 | 0.303 | -2.256 (4.118) | -0.548 | 0.584 |
| examination moment [T2] | -0.799 (2.873) | -0.278 | 0.781 | 20.821 (5.306) | 3.924 | <0.001 |
| GPA | -0.154 (0.035) | -4.351 | <0.001 | -0.417 (0.080) | -5.187 | <0.001 |
| GPA | 0.023 (0.026) | 0.911 | 0.363 | 0.037 (0.059) | 0.637 | 0.524 |
| GPA | 0.004 (0.040) | 0.100 | 0.920 | -0.313 (0.077) | -4.077 | <0.001 |
| GPA | -0.034 (0.476) | 0.868 | 0.386 | 0.3478 (0.118) | 2.956 | 0.003 |
| SM | 0.413 (0.052) | -0.651 | 0.515 | -2.191 (1.082) | -2.026 | 0.043 |
The table shows the regression coefficients (β) for both the courses (Paediatrics and Gynaecology) (N = 683, R2 = 0.444 for Paediatrics and 0.512 for Gynaecology). GPA = grade point average, SM = master/scoring method, EM = examination moment. Superscripts indicate levels of significance using the following coding
ns p > 0.05
* p < 0.05
** p < 0.01
*** p < 0.001.
Student responses on stress-related questions on elimination testing with adapted scoring and negative marking after receiving the exam score.
| question | N | mean | p-value t-test | mean male | mean female | p-value wilcoxon | |
|---|---|---|---|---|---|---|---|
| I felt unsafe because I was able to choose more than one answer in elimination testing with adapted scoring. | 180 | 2.83 | 0.058 | 2.47 | 3.15 | <0.001 | |
| Being able to choose more than one answer in elimination testing with adapted scoring felt very safe. | 183 | 3.45 | <0.001 | 3.58 | 3.34 | 0.13 | |
| Elimination testing with adapted scoring made me feel more relaxed, knowing that I can get a reasonable mark. | 182 | 2.59 | <0.001 | 2.84 | 2.37 | 0.002 | |
| My stress levels were high with elimination testing with adapted scoring. | 182 | 3.10 | 0.216 | 2.80 | 3.37 | <0.001 | |
| Having to choose just one answer in negative marking feels very risky. | 183 | 3.34 | <0.001 | 3.27 | 3.41 | 0.38 | |
| Being able to choose just one answer in negative marking feels very safe. | 181 | 2.98 | 0.787 | 2.99 | 2.97 | 0.91 | |
| It makes me feel more relaxed, knowing that I can get a reasonable mark. | 183 | 2.69 | <0.001 | 2.87 | 2.54 | 0.015 | |
| My stress levels were high with negative marking. | 183 | 3.63 | <0.001 | 3.37 | 3.87 | 0.001 | |
| I would be more stressed with negative marking than with elimination testing with adapted scoring. | 183 | 2.92 | 0.325 | 2.92 | 2.92 | 0.334 |
N indicates the number of responses. NM abbreviates negative marking and ETA elimination testing with adapted scoring. The 5-point Likert scale of the questionnaire was converted to a numeric scale as follows: Strongly agree = 5, Agree = 4, Neither agree nor disagree = 3, Disagree = 2, Strongly disagree = 1. Superscripts indicate levels of significance using the following coding
ns p > 0.05
* p < 0.05
** p < 0.01
*** p < 0.001.
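The Likert conversion described in the note can be sketched as a simple lookup followed by a mean. The t statistic against the neutral midpoint of 3 is an assumption about what the "p-value t-test" column tests; the note does not state the reference value. All names and the example responses are hypothetical.

```python
import math
import statistics

# Numeric coding of the 5-point Likert scale, as given in the table note.
LIKERT = {
    "Strongly agree": 5,
    "Agree": 4,
    "Neither agree nor disagree": 3,
    "Disagree": 2,
    "Strongly disagree": 1,
}


def to_numeric(responses):
    """Convert Likert labels to their numeric codes."""
    return [LIKERT[r] for r in responses]


def t_against_neutral(scores, neutral=3):
    """One-sample t statistic of the mean score against a reference value.

    Assumption: the neutral midpoint 3 is the reference value.
    """
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return (statistics.mean(scores) - neutral) / se


# Hypothetical responses, for illustration only.
answers = ["Strongly agree", "Agree", "Agree", "Neither agree nor disagree", "Disagree"]
scores = to_numeric(answers)
print(scores, statistics.mean(scores), round(t_against_neutral(scores), 3))
```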
Student responses on comparative statements on elimination testing with adapted scoring and negative marking after receiving the exam score.
| questions | N | mean | p-value t-test | mean male | mean female | p-value wilcoxon |
|---|---|---|---|---|---|---|
| Negative marking is more difficult than elimination testing with adapted scoring. | 182 | 2.99 | 0.891 | 3.08 | 2.91 | 0.28 |
| Negative marking will lead to a higher score compared to elimination testing with adapted scoring. | 175 | 2.80 | 0.009 | 2.67 | 2.92 | 0.13 |
| Negative marking will lead to a lower score compared to elimination testing with adapted scoring. | 174 | 2.87 | 0.074 | 2.98 | 2.78 | 0.17 |
| Elimination testing with adapted scoring will lead to a higher score compared to negative marking. | 175 | 2.79 | 0.005 | 2.93 | 2.66 | 0.081 |
| There is a higher chance of getting answers right with elimination testing with adapted scoring than with negative marking. | 182 | 3.62 | <0.001 | 3.62 | 3.62 | 0.84 |
| I would be more stressed with negative marking than with elimination testing with adapted scoring. | 183 | 2.92 | 0.325 | 2.92 | 2.92 | 0.97 |
| After taking all aspects into consideration, I prefer negative marking. | 183 | 2.85 | 0.109 | 2.77 | 2.93 | 0.34 |
| After taking all aspects into consideration, I prefer elimination testing with adapted scoring. | 183 | 3.16 | 0.074 | 3.26 | 3.08 | 0.28 |
| I expected a higher mark for negative marking. | 181 | 3.06 | 0.444 | 3.07 | 3.05 | 0.99 |
| I expected a higher mark for elimination testing with adapted scoring. | 181 | 2.71 | <0.001 | 2.62 | 2.79 | 0.27 |
| I expected to do equally as well for both (elimination testing with adapted scoring or negative marking) tests. | 179 | 2.80 | 0.008 | 2.83 | 2.77 | 0.73 |
| I prefer to be rewarded for knowing or guessing the answers exactly even though there is a penalty for answering or guessing incorrectly. | 179 | 3.41 | <0.001 | 3.56 | 3.28 | 0.11 |
| I prefer to be rewarded for demonstrating my partial and full knowledge rather than guessing what the right answer is. | 182 | 3.84 | <0.001 | 3.72 | 3.95 | 0.24 |
| I need more time to answer in elimination testing with adapted scoring compared to negative marking. | 183 | 4.40 | <0.001 | 4.52 | 4.29 | 0.29 |
N indicates the number of responses. The 5-point Likert scale of the questionnaire was converted to a numeric scale as follows: Strongly agree– 5, Agree– 4, Neither agree nor disagree– 3; Disagree– 2, Strongly disagree– 1. Superscripts indicate levels of significance using the following coding
ns p > 0.05
* p < 0.05
** p < 0.01
*** p < 0.001.