A Elizabeth Bond, Owen Bodger, David O F Skibinski, D Hugh Jones, Colin J Restall, Edward Dudley, Geertje van Keulen.
Abstract
Multiple-choice question (MCQ) examinations are increasingly used to assess theoretical knowledge in large class-size modules in many life science degrees. MCQ tests can objectively measure factual knowledge, ability and high-level learning outcomes, but may also introduce gender bias in performance, depending on topic, instruction, scoring and difficulty. The 'Single Answer' (SA) test, in which students choose one correct answer, is often used, but it does not let students demonstrate partial knowledge. Negative marking eliminates the chance element of guessing but may be considered unfair. Elimination testing (ET) is an alternative form of MCQ that discriminates between all levels of knowledge while rewarding demonstration of partial knowledge. Comparisons of performance and gender bias in negatively marked SA and ET tests have not yet been performed in the life sciences. Our results show that, under negative marking conditions, life science students were significantly advantaged by answering the MCQ test in elimination format rather than single answer format, because the elimination format rewards partial knowledge of topics. Importantly, we found no significant difference in performance between genders in either cohort for either MCQ test under negative marking conditions. Surveys showed that students generally preferred ET-style MCQ testing over SA-style testing. Students reported feeling more relaxed taking ET MCQ tests and more stressed when sitting SA tests, and disagreed that they were distracted by thinking about the best tactics for scoring high. Students agreed that ET testing improved their critical thinking skills. We conclude that appropriately designed MCQ tests do not systematically discriminate between genders. We recommend careful consideration in choosing the type of MCQ test, and propose applying negative scoring conditions to each test type to avoid introducing gender bias.
The student experience could be improved by incorporating elimination answering methods in MCQ tests, which reward both partial and full knowledge.
Year: 2013 PMID: 23437081 PMCID: PMC3577794 DOI: 10.1371/journal.pone.0055956
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Descriptive statistics of MCQ tests and student surveys.
| | L1 | L2 |
| Students drawn from degree courses | (Medical) Biochemistry, (Medical) Genetics, Biochemistry & Genetics (Joint Hon.), Biology, Marine Biology, Zoology | (Medical) Biochemistry, (Medical) Genetics, Biochemistry & Genetics (Joint Hon.) |
| Enrolled number of students | 198 | 45 |
| Number of participants (% of total) | 142 (72%) | 40 (88%) |
| Paired answer sheets | 136 | 40 |
| Number of Females | 74 | 14 |
| Number of Males | 62 | 26 |
| Number of MCQ per test | 25 | 25 |
| Number of survey respondents | 142 | 40 |
| Number of post-survey respondents | 76 | 17 |
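Scoring grid for SA and ET MCQ tests with negative marking. In SA the student indicates the one answer believed to be correct; in ET the student indicates the answers to be eliminated as incorrect.
| Student indicates: | Single Answer MCQ | Elimination Answer MCQ |
| Correct Answer | +4 marks | −4 marks |
| Incorrect Answer | −1 mark | +1 mark for each answer |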
| No Answer | 0 marks | 0 marks |
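The scoring grid can be read as two small scoring functions. The sketch below is illustrative only: the function names, the option labels and the five-option question in the usage note are assumptions that do not come from the source. Only the mark values are taken from the grid. It also assumes that marks within one ET question simply add up, so eliminating the correct answer together with one incorrect answer would give −4 + 1 = −3.

```python
def score_sa(chosen, correct):
    """Single Answer: +4 for the correct option, -1 for a wrong one, 0 if left blank."""
    if chosen is None:
        return 0
    return 4 if chosen == correct else -1


def score_et(eliminated, correct):
    """Elimination: +1 per incorrect option eliminated, -4 if the correct one is eliminated.

    Assumption (not stated in the grid): marks within one question are summed.
    """
    score = 0
    for option in set(eliminated):
        score += -4 if option == correct else 1
    return score
```

For a hypothetical question with options A to E and correct answer C, `score_et({"A", "B"}, "C")` gives 2 marks for partial knowledge, whereas a student with the same partial knowledge who guesses wrongly among the remaining options in SA format scores `score_sa("D", "C")`, which is −1.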
Figure 1. Mean (± standard error, SE) of overall test score performances in L1 (top) and L2 (bottom) SA and ET style MCQ assessments.
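As a reminder of what the plotted quantities are, the mean and its standard error can be computed from a list of test scores as sketched below. The function name and the example scores are hypothetical, and the sketch assumes the usual definition of SE as the sample standard deviation divided by the square root of the number of students.

```python
import math
import statistics


def mean_and_se(scores):
    """Return (mean, standard error of the mean) for at least two scores."""
    n = len(scores)
    mean = statistics.mean(scores)
    # statistics.stdev uses the sample (n - 1) standard deviation.
    se = statistics.stdev(scores) / math.sqrt(n)
    return mean, se


# Hypothetical scores, for illustration only.
mean, se = mean_and_se([52, 61, 47, 70, 58])
print(f"{mean:.1f} ± {se:.1f}")
```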
L1 and L2 student responses immediately after sitting the ET and SA tests, on a 6-item Likert scale (survey scores; L1 n = 142, L2 n = 40).
| Survey statements | L1 ET mean | L1 ET median | L1 ET P | L1 SA mean | L1 SA median | L1 SA P | L2 ET mean | L2 ET median | L2 ET P | L2 SA mean | L2 SA median | L2 SA P |
| There is no reward for random guessing | 2.606 | 2 | <0.001 | 2.268 | 2 | <0.001 | 2.949 | 3 | n.s. | 1.967 | 2 | <0.001 |
| Losing marks for guessing detracted | 3.152 | 3 | n.s. | 3.141 | 3 | n.s. | 2.925 | 3 | n.s. | 2.575 | 2 | 0.018 |
| Being able to choose more than one answer felt very safe | 2.167 | 2 | <0.001 | 2.500 | 2 | <0.001 | 2.025 | 2 | <0.001 | 2.222 | 2 | 0.002 |
| There is a high chance of getting answers right | 2.779 | 2 | 0.023 | 3.555 | 4 | <0.001 | 2.575 | 2 | 0.01 | 3.250 | 3 | n.s. |
| The answering options were confusing | 3.577 | 4 | <0.001 | 3.654 | 4 | <0.001 | 3.525 | 4 | 0.014 | 3.692 | 4 | 0.002 |
| I got distracted by thinking about the best tactics for getting a high mark | 3.657 | 4 | <0.001 | 3.686 | 4 | <0.001 | 3.425 | 4 | 0.021 | 3.400 | 3 | n.s. |
| It makes you think more about your answers | 2.333 | 2 | <0.001 | 2.265 | 2 | <0.001 | 2.462 | 2 | 0.003 | 2.275 | 2 | 0.003 |
| It made me feel more relaxed, knowing that I can get a reasonable mark | 2.686 | 2 | <0.001 | 3.314 | 4 | <0.001 | 2.450 | 2 | 0.009 | 3.475 | 4 | 0.031 |
| I could answer conservatively by hedging my bets | 2.547 | 2 | <0.001 | 3.788 | 4 | <0.001 | 2.600 | 2 | 0.033 | 3.875 | 4 | <0.001 |
| It was a fair test | 2.304 | 2 | <0.001 | 2.356 | 2 | <0.001 | 2.425 | 2 | 0.007 | 3.125 | 3 | n.s. |
| The test score will accurately reflect my knowledge | 2.971 | 3 | n.s. | 2.748 | 2 | 0.012 | 2.525 | 2 | 0.011 | 3.175 | 3 | n.s. |
| It enhanced my critical thinking skills | 2.859 | 3 | 0.047 | 2.926 | 3 | n.s. | 2.600 | 2 | 0.017 | 2.974 | 3 | n.s. |
| The questions were easy to answer | 3.123 | 3 | n.s. | 3.120 | 3 | n.s. | 2.825 | 3 | n.s. | 3.125 | 3 | n.s. |
| I was scared to answer some questions | 3.029 | 3 | n.s. | 2.583 | 2 | <0.001 | 3.100 | 3 | n.s. | 2.850 | 3 | n.s. |
| I was confident to answer some questions | 2.139 | 2 | <0.001 | 2.289 | 2 | <0.001 | 2.425 | 2 | 0.004 | 2.425 | 2 | 0.002 |
| It made me feel motivated | 2.942 | 3 | n.s. | 3.071 | 3 | n.s. | 2.846 | 3 | n.s. | 3.524 | 4 | 0.033 |
| My stress levels were high | 3.628 | 4 | <0.001 | 3.223 | 3 | 0.044 | 3.067 | 3 | n.s. | 3.000 | 3 | n.s. |
| It gave me confidence for the January exams | 2.759 | 2 | 0.003 | 2.857 | 3 | n.s. | 2.600 | 3 | n.s. | 2.867 | 3 | n.s. |
A P-score of <0.05 indicates a significant difference from the neutral response (Likert item 3); P-scores set in italics in the published table, which mark differences from neutral that are not significant, are not reproduced here.
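The record does not say which statistical test produced these P-scores. Purely as an illustration of what "a significant difference from the neutral response" means, the sketch below runs an exact two-sided sign test of Likert responses against the neutral item 3. The choice of the sign test, the function name and the sample responses are all assumptions, so the result should not be expected to reproduce the published values.

```python
from math import comb


def sign_test_vs_neutral(responses, neutral=3):
    """Exact two-sided sign test of Likert responses against a neutral item.

    Assumption: the sign test stands in for the source's unnamed test.
    Responses equal to the neutral item are dropped, as is usual for this test.
    """
    above = sum(1 for r in responses if r > neutral)
    below = sum(1 for r in responses if r < neutral)
    n = above + below
    if n == 0:
        return 1.0  # every response was neutral, so there is no evidence of a shift
    k = min(above, below)
    # Probability of k or fewer responses on the rarer side under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical responses: ten students all answering item 2.
p = sign_test_vs_neutral([2] * 10)
print(p < 0.05)
```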
Figure 2. Mean (± standard error, SE) of overall test score performances by gender: (top left) L1 SA MCQ, (top right) L1 ET MCQ, (bottom left) L2 SA MCQ, and (bottom right) L2 ET MCQ.
L1 and L2 student responses to comparative statements on ET and SA MCQ testing on a 6-item Likert scale, immediately after sitting the test (survey scores; L1 n = 142, L2 n = 40) and after receiving test and formal examination results (post survey scores; L1 n = 76, L2 n = 17).
| Comparison of ET and SA answering options | L1 survey mean | L1 survey median | L1 survey P | L1 post survey mean | L1 post survey median | L1 post survey P | L2 survey mean | L2 survey median | L2 survey P | L2 post survey mean | L2 post survey median | L2 post survey P |
| SA testing will lead to a higher score compared to ET | 3.661 | 4 | <0.001 | | | | 3.700 | 4 | 0.001 | | | |
| SA testing will lead to a lower score compared to ET | 2.896 | 3 | n.s. | | | | 2.567 | 3 | n.s. | | | |
| ET will lead to a higher score compared to SA | 2.518 | 2 | <0.001 | | | | 2.667 | 2.5 | n.s. | | | |
| There is a higher chance of getting answers right with ET than with SA | 2.526 | 2 | <0.001 | | | | 2.467 | 2 | 0.017 | | | |
| I was more stressed with SA testing than with ET | 2.661 | 2 | 0.008 | | | | 2.167 | 2 | 0.001 | | | |
| After taking all aspects into consideration, I prefer SA testing | 3.530 | 4 | <0.001 | 2.577 | 2 | 0.015 | 3.600 | 4 | 0.03 | 4.056 | 4 | 0.002 |
| After taking all aspects into consideration, I prefer ET testing | 2.600 | 2 | 0.002 | 3.282 | 3 | n.s. | 2.333 | 3 | n.s. | 1.667 | 1.5 | <0.001 |
| The results from both MCQ tests were as I expected | | | | 2.870 | 3 | n.s. | | | | 2.333 | 2 | 0.018 |
| I expected a higher mark for the elimination test | | | | 3.680 | 4 | <0.001 | | | | 2.167 | 2 | 0.002 |
| I expected a higher mark for the single answer test | | | | 3.539 | 3.5 | <0.001 | | | | 3.278 | 3 | n.s. |
| I expected to do equally as well for both MCQ tests | | | | 2.889 | 4 | 0.003 | | | | 3.539 | 3 | n.s. |
| I prefer to be rewarded for knowing or guessing the answers exactly | | | | 2.474 | 2 | <0.001 | | | | 3.611 | 4 | 0.047 |
| I prefer to be rewarded for demonstrating my partial and full knowledge | | | | 2.526 | 2 | <0.001 | | | | 1.667 | 2 | <0.001 |
| My revision for the voluntary MCQ tests was adequate | | | | 3.641 | 4 | <0.001 | | | | 3.500 | 4 | n.s. |
| I should have revised more for the voluntary MCQ test | | | | 2.308 | 2 | <0.001 | | | | 2.556 | 2.5 | n.s. |
A P-score of <0.05 indicates a significant difference from the neutral response (Likert item 3); P-scores set in italics in the published table, which mark differences from neutral that are not significant, are not reproduced here.
L1 and L2 student responses to ET and SA MCQ testing after having received test and formal examination results, on a 6-item Likert scale (post survey scores; L1 n = 76, L2 n = 17).
| Survey statements | L1 ET mean | L1 ET median | L1 ET P | L1 SA mean | L1 SA median | L1 SA P | L2 ET mean | L2 ET median | L2 ET P | L2 SA mean | L2 SA median | L2 SA P |
| There is no reward for random guessing | 2.077 | 2 | <0.001 | 2.321 | 2 | <0.001 | 2.667 | 2.5 | n.s. | 2.222 | 2 | 0.002 |
| Losing marks for guessing detracted | 2.641 | 2 | 0.002 | 2.949 | 3 | n.s. | 2.944 | 3 | n.s. | 3.167 | 3 | n.s. |
| There is a high chance of getting answers right | 3.308 | 3 | n.s. | 3.766 | 4 | <0.001 | 2.833 | 3 | n.s. | 3.471 | 3.5 | 0.033 |
| The answering options were confusing | 2.987 | 3 | n.s. | 3.474 | 3.5 | <0.001 | 4.167 | 4.5 | 0.002 | 3.778 | 4 | 0.023 |
| I got distracted by thinking about the best tactics for getting a high mark | 2.397 | 2 | <0.001 | 2.680 | 2 | 0.016 | 3.765 | 4 | 0.022 | 3.833 | 4 | 0.007 |
| It makes you think more about your answers | 2.810 | 3 | n.s. | 3.436 | 4 | 0.001 | 2.389 | 2 | 0.047 | 2.889 | 3 | n.s. |
| It made me feel more relaxed, knowing that I can get a reasonable mark | 2.641 | 2.5 | 0.001 | 3.846 | 4 | <0.001 | 2.278 | 2 | 0.028 | 3.444 | 3 | n.s. |
| I could answer conservatively by hedging my bets | 2.346 | 2 | <0.001 | 2.423 | 2 | <0.001 | 2.944 | 3 | n.s. | 3.389 | 3 | n.s. |
| It was a fair test | 2.974 | 3 | n.s. | 3.416 | 3 | 0.025 | 2.056 | 2 | 0.001 | 2.944 | 3 | n.s. |
| The test score will accurately reflect my knowledge | 2.923 | 3 | n.s. | 3.346 | 3 | 0.021 | 2.278 | 2 | 0.016 | 2.444 | 2 | 0.019 |
| It enhanced my critical thinking skills | 3.051 | 3 | n.s. | 3.321 | 3.5 | 0.013 | 2.722 | 3 | n.s. | 3.111 | 3 | n.s. |
| The questions were easy to answer | 3.397 | 3 | 0.006 | 3.115 | 3 | n.s. | 2.778 | 3 | n.s. | 3.167 | 3 | n.s. |
| I was scared to answer some questions | 2.731 | 2 | 0.038 | 3.321 | 4 | 0.016 | 3.556 | 3 | n.s. | 2.722 | 3 | n.s. |
| I was confident to answer some questions | 3.397 | 3.5 | 0.003 | 2.974 | 3 | n.s. | 1.889 | 2 | <0.001 | 2.833 | 3 | n.s. |
| My stress levels were high | 3.077 | 3 | n.s. | 2.577 | 2 | 0.004 | 3.611 | 4 | n.s. | 2.944 | 3 | n.s. |
| It gave me confidence for the January exams | 3.103 | 3 | n.s. | 3.359 | 3.5 | 0.009 | 2.722 | 3 | n.s. | 2.889 | 3 | n.s. |
| It was good preparation for the real ET exams in January | 2.872 | 3 | n.s. | 3.039 | 3 | n.s. | 2.222 | 2 | 0.023 | 2.722 | 3 | n.s. |
| Knowing my score now, I should have eliminated less answers as I was guessing too much | 3.885 | 4 | <0.001 | 3.103 | 3 | n.s. | 3.278 | 3 | n.s. | 3.278 | 3 | n.s. |
A P-score of <0.05 indicates a significant difference from the neutral response (Likert item 3); P-scores set in italics in the published table, which mark differences from neutral that are not significant, are not reproduced here.