
Item analysis and optimizing multiple-choice questions for a viable question bank in ophthalmology: A cross-sectional study.

Subrahmanya K Bhat¹, Kishan H L Prasad¹.

Abstract

Purpose: Multiple-choice questions (MCQs) are useful for assessing student performance because they cover a wide range of topics objectively. Their reliability and validity depend on how well they are constructed. Defective items detected by item analysis must be examined for item-writing flaws and optimized. The aim of this study was to evaluate MCQs for difficulty level, discriminating power, and functional distractors by item analysis, to analyze poor items for writing flaws, and to optimize them.
Methods: This was a prospective cross-sectional study involving 120 MBBS students taking a formative assessment in Ophthalmology. It comprised 40 single-response MCQs as part of a 3-h paper, for 20 marks. Items were categorized according to their difficulty index, discrimination index, and distractor efficiency using simple proportions, mean, standard deviation, and correlation. The defective items were analyzed for proper construction and optimized.
Results: The mean score of the study group was 13.525 ± 2.617. Mean difficulty index, discrimination index, and distractor efficiency were 53.22, 0.26, and 78.32, respectively. Among the 40 MCQs, 25 had no non-functioning distractor, 7 had one, 5 had two, and 3 had three. Of the 20 defective items, 17 were optimized and added to the question bank, two were added without modification, and one was dropped.
Conclusion: Item analysis is a valuable tool in detecting poor MCQs, and optimizing them is a critical step. The defective items identified should be optimized and not dropped, so that the content area covered by the defective item is not left out of the assessment.


Keywords: Difficulty index; discrimination index; item analysis; nonfunctioning distractor; optimization

Year: 2021 PMID: 33463588 PMCID: PMC7933874 DOI: 10.4103/ijo.IJO_1610_20

Source DB: PubMed Journal: Indian J Ophthalmol ISSN: 0301-4738 Impact factor: 1.848

Medical education across the world consists of the initial assessment of learners' needs, monitoring of teaching-learning activities, and certification of competence to award a degree and practise medicine in the context of society's needs.[1] Student learning, certification, and quality assurance are the aims of assessment.[2] Assessment also acts as a strong incentive for learning.[1,2,3] Properly constructed multiple-choice questions (MCQs) can assess the higher cognitive processes of Bloom's taxonomy, such as interpretation, analysis, and problem solving, instead of just recall of facts.[3,4,5] MCQs were found to be superior to modified essay questions in assessing higher-order skills.[6] In view of the widespread use of MCQs in the assessment of medical students, this study was undertaken to perform item analysis and to identify the difficulty and discrimination indices and distractor efficiency. Defective MCQs were analyzed and optimized based on item-writing guidelines.[1,7] The aims of this study were to evaluate MCQs for difficulty level and discriminating power with functional distractors by item analysis, and to analyze the items with poor indices for item-writing flaws and optimize them for a viable question bank in Ophthalmology.

Methods

This was a prospective, cross-sectional study involving MBBS students taking an Ophthalmology examination. Forty MCQs formed part of the final formative assessment, a 3-h question paper for 100 marks comprising long essays, short essays, short-answer questions, and MCQs. After permission was obtained from the Dean of the college and consent from the study participants, 120 students were assessed in Ophthalmology. The MCQs were chosen randomly and prevalidated by two departmental colleagues. All 40 items were of the single-response type, each with a stem, a key, and three distractors. Each correct response (key) was awarded half a mark; an incorrect response or unattempted item was awarded zero. The first 30 minutes of the examination were allotted to the MCQs. Strict invigilation was carried out to avoid malpractice. Faculty members evaluated the MCQs against the correct keys. After scoring, students were ranked in order of merit using Microsoft Excel 2010. The top third (40) of high achievers (H) and the bottom third (40) of low achievers (L) were used for item analysis, so that the groups were large enough to be representative and different enough to be meaningful.[1] Each item was analyzed for difficulty index (DIF I), discrimination index (DI), and distractor efficiency (DE) using the standard formulae.[1] DIF I is the percentage of students who answered the item correctly and is given by DIF I = [(H + L)/N] × 100. DI measures how effectively an item discriminates between low and high achievers and is given by DI = 2 × [(H − L)/N], where H is the number of students answering the item correctly in the high-achieving group, L is the number of students answering the item correctly in the low-achieving group, and N is the total number of students in the two groups (including non-responders).
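The grouping and the two formulae above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the six-student score list, and the tie-breaking rule (ties keep their original order) are hypothetical and do not come from the study, which used Microsoft Excel.

```python
def split_thirds(scores):
    """Rank students by score (highest first) and return the index lists
    of the top third (H group) and bottom third (L group).
    Assumption: ties are broken by original order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    k = len(scores) // 3
    return order[:k], order[-k:]


def item_indices(correct, high, low):
    """Return (DIF I, DI) for one item.
    correct[i] is True if student i answered the item correctly;
    unattempted items count as False, so non-responders stay in N."""
    h = sum(1 for i in high if correct[i])   # H: correct in high group
    l = sum(1 for i in low if correct[i])    # L: correct in low group
    n = len(high) + len(low)                 # N: students in both groups
    dif_i = (h + l) / n * 100                # DIF I = [(H + L)/N] x 100
    di = 2 * (h - l) / n                     # DI = 2 x [(H - L)/N]
    return dif_i, di


# Hypothetical data: six students, one item.
scores = [18, 16, 14, 12, 10, 8]
correct = [True, True, False, True, True, False]
high, low = split_thirds(scores)
dif_i, di = item_indices(correct, high, low)
print(dif_i, di)
```

With 120 students, `split_thirds` yields two groups of 40, so N = 80 as in the study design.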
Distractors chosen by fewer than 5% of students were considered nonfunctional distractors (NFDs).[1] DE was determined for each item from the number of NFDs in it and ranged from 0% to 100%. The DE of an item was taken as 100% if it had no NFD, and as 66.6%, 33.3%, and 0% if it contained 1, 2, or 3 NFDs, respectively.[8] The relationship between DIF I and DI was determined by the Pearson correlation coefficient (r) using an online calculator (https://www.socscistatistics.com/tests/pearson/). A P value of <0.05 was considered statistically significant. Items were categorized according to their DIF I, DI, and DE using simple proportions, means, standard deviations, and correlation. Data were interpreted as shown in Table 1.[1] The range of values for DIF I is 0%–100%. The range of values for DI is –1.0 to +1.0. The DI becomes negative if more students in the L group than in the H group answer correctly.
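The NFD rule and the resulting DE can be illustrated with a short Python sketch. The function name, the option labels, and the response counts below are hypothetical; the sketch also assumes that the 5% threshold is applied to all responses recorded for the item, which the text does not spell out.

```python
def distractor_efficiency(option_counts, key, threshold=0.05):
    """Return (number of NFDs, DE in percent) for one item.
    option_counts maps each option label to the number of students
    who chose it; key is the label of the correct option.
    Assumption: the 5% cut-off is taken over all counted responses."""
    total = sum(option_counts.values())
    nfd = sum(
        1
        for option, count in option_counts.items()
        if option != key and count / total < threshold
    )
    n_distractors = len(option_counts) - 1
    # Each functional distractor contributes an equal share of the DE,
    # so three distractors give 100, 66.7, 33.3, or 0 percent.
    de = (n_distractors - nfd) / n_distractors * 100
    return nfd, de


# Hypothetical item: key "a", one rarely chosen distractor ("d").
print(distractor_efficiency({"a": 60, "b": 30, "c": 25, "d": 5}, key="a"))
```

The function returns unrounded values, so one NFD gives 66.67% rather than the 66.6% quoted in the text.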

Table 1

The DIF I and DI interpretation

DIF I	Interpretation	DI	Interpretation
>70%	Easy	>0.35	Excellent
30%-70%	Good	0.2-0.35	Good
<30%	Difficult	<0.2	Poor
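
The cut-offs in Table 1 translate directly into two small lookup functions. This is a minimal sketch: the function names are hypothetical, and treating the boundary values (30%, 70%, 0.2, 0.35) as belonging to the "Good" band is an assumption, since the table does not state how exact boundary values are handled.

```python
def interpret_dif_i(dif_i):
    """Classify a difficulty index (percent) using the Table 1 bands.
    Assumption: exactly 30 and exactly 70 fall in the 'Good' band."""
    if dif_i > 70:
        return "Easy"
    if dif_i >= 30:
        return "Good"
    return "Difficult"


def interpret_di(di):
    """Classify a discrimination index using the Table 1 bands.
    Assumption: exactly 0.2 and exactly 0.35 fall in the 'Good' band.
    Negative values are also reported as 'Poor'."""
    if di > 0.35:
        return "Excellent"
    if di >= 0.2:
        return "Good"
    return "Poor"


print(interpret_dif_i(53.22), interpret_di(0.26))
```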


Results

The scores obtained by the study group ranged from 6.5 to 20, with a mean of 13.53 ± 2.62. The mean DIF I, DI, and DE were 53.22 ± 22.44, 0.26 ± 0.16, and 78.32 ± 32.11, respectively. The distribution of items by DIF I and by DI, with the corresponding DE and the actions taken, is shown in Tables 2 and 3, respectively. The scatter diagram [Fig. 1] shows the relationship between DIF I and DI. DI was poor for both easy and difficult items. The Pearson correlation between DI and DIF I was small and negative (r = –0.207) and not significant at P < 0.05 (P = 0.199). The relationship of NFDs with the mean DIF I and DI of items is shown in Table 4.
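
As a rough check on the reported correlation, the Pearson coefficient and its usual t statistic (with n − 2 degrees of freedom) can be computed in plain Python. The study itself used an online calculator; the function names here are hypothetical, and the per-item DIF I and DI values are not available in the text, so only the reported r and n = 40 are used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Reported values: r = -0.207 across n = 40 items.
t = t_statistic(-0.207, 40)
print(round(t, 2))
```

For r = –0.207 and n = 40, |t| is about 1.30, below the two-tailed 5% critical value of roughly 2.02 for 38 degrees of freedom, which agrees with the reported lack of significance.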

Table 2

Distribution of items according to DIF I with DE and actions proposed (n=40)

DIF I	Interpretation	No. of items (%)	DE	Action taken
≥70	Too easy	11 (27.5)	39.37	Optimized and stored
30-70	Good	24 (60)	91.65	1 item optimized and stored, others stored unaltered
<30	Too difficult	5 (12.5)	100	1 discarded, 3 optimized and stored, 1 stored unaltered

Table 3

Distribution of items according to DI with DE and action taken (n=40)

DI	Interpretation	No. of items (%)	DE	Action
<0.2	Poor	10 (25)	63.32	1 discarded, 2 stored unaltered, others optimized and stored
0.2-0.35	Good	15 (37.5)	84.42	Stored unaltered
>0.35	Excellent	13 (32.5)	87.17	1 optimized and stored, others stored unaltered

Figure 1

Relationship between Difficulty Index and Discrimination Index

Table 4

Relationship of Non-functioning distractors (NFDs) with DIF I and DI (n=40)

Parameter	Items with 0 NFDs	Items with 1 NFD	Items with 2 NFDs	Items with 3 NFDs
Number (%)	25 (62.5%)	7 (17.5%)	5 (12.5%)	3 (7.5%)
Mean DIF I (%)	30.4	58.03	74.25	94.17
Mean DI	0.287	0.254	0.245	0.05
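
The NFD counts in Table 4 can be cross-checked against the totals reported elsewhere in the paper. The short Python sketch below uses only the numbers given in the table and the DE values assigned per NFD count in the Methods (100%, 66.6%, 33.3%, 0%); the variable names are hypothetical.

```python
# Number of items by NFD count, from Table 4.
nfd_distribution = {0: 25, 1: 7, 2: 5, 3: 3}
# DE assigned to an item for each NFD count, from the Methods.
de_by_nfd = {0: 100.0, 1: 66.6, 2: 33.3, 3: 0.0}

total_items = sum(nfd_distribution.values())
total_distractors = total_items * 3  # three distractors per item
total_nfds = sum(n * count for n, count in nfd_distribution.items())
items_with_nfd = total_items - nfd_distribution[0]
nfd_percent = total_nfds / total_distractors * 100
mean_de = (
    sum(de_by_nfd[n] * count for n, count in nfd_distribution.items())
    / total_items
)

print(total_items, total_distractors, total_nfds, items_with_nfd)
print(nfd_percent, mean_de)
```

The totals come to 40 items, 120 distractors, 26 NFDs (about 21.7% of distractors), 15 items with at least one NFD, and a mean DE of about 78.32, matching the figures reported in the text.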

Item analysis showed 20 items (50%) to be defective, and the subject expert added the remaining 20 items to the question bank, with an item card for each item. Of the 20 defective items, 17 were found to have a flawed stem or distractors; these were optimized and added to the question bank. They comprised five easy items with normal DI, five easy items with poor DI, one easy item with negative DI, two difficult items with normal DI, one difficult item with poor DI, two items with acceptable difficulty but poor DI, and one item with good DI and DIF I but two NFDs. One difficult item and one item of acceptable difficulty with poor DI had no flaws and were added to the question bank without alteration. One difficult item with negative DI was dropped because another question in the same paper, based on the same data, gave a hint to the correct answer. Feedback on all difficult items was given to the faculty for corrective measures in teaching and in framing the items. All optimized items were tagged via the item card for future reassessment.

Discussion

Item analysis should be carried out regularly to create a good question bank, updated with recent advances, so that MCQs can be used effectively to evaluate the cognitive skills of medical students.[9,10,11,12,13,14,15,16] MCQs also give teachers feedback on their educational actions. Designing MCQs is a complicated and time-consuming process in a multidisciplinary and integrated curriculum. In this study, the mean DIF I, DI, and DE were 53.22 ± 22.44, 0.26 ± 0.16, and 78.32 ± 32.11, respectively, which were in the acceptable range; a few other studies reported similar results.[10,11,12,15]
In this study, 60% of items were of acceptable difficulty (DIF I = 30%–70%), 27.5% were easy, and 12.5% were difficult. Similar results were seen in a few studies.[9,12,14] Nearly 80% of the items were in the acceptable range in three studies.[10,11,17] The proportion was lower in the studies by Kheyami D (53.4%),[15] Shenoy PJ (40%),[18] Gajjar S (48%),[8] and Rajkumar P (46.6%).[13] In the last two studies, the acceptable range was defined as 30%–60%, unlike the 30%–70% used in our study and the others.
In this study, 28 (70%) items had good DI (>0.2). Similar findings were seen in a few studies,[14,15,16] but the proportion was lower than in the studies by Karelia BN (78%),[9] Kaur M (86%),[17] Shenoy PJ (80%),[18] Pande SS (75%),[12] Rao C (85%),[10] and Mozaffer RH (88%).[11]
In our study, 15 (37.5%) items had one or more NFDs, similar to the study by Kaur M (38%)[17] but fewer than in the studies by Mozaffer RH (58%),[11] Rajkumar P (43.3%),[13] and Shenoy PJ (65%).[18] Of 120 distractors, 26 (21.7%) were nonfunctional, which is comparable to previous studies.[11,13,17] A higher proportion of NFDs was reported by Shenoy PJ (33.3%),[18] and lower proportions by Gajjar S (11.3%)[8] and Rao C (5%).[10] Seven items with two or more NFDs and four items with one NFD and poor DI/DIF I were optimized and added to the question bank.
Three items with one NFD had normal DI, DIF I, and distractors and were added to the question bank without alteration. One item with two NFDs had good DI and DIF I, but its distractors were defective; it was optimized and added to the question bank. One easy item with multiple writing flaws, a negative DI, and three NFDs reaffirms the finding by Omer AA that flawed items affect high achievers more than low achievers.[19] Defective items identified by item analysis should be analyzed for flaws and optimized. They should not be blindly dropped; otherwise, some skills may be left out of the assessment.[9,10,11]
In early formative assessments, with fewer topics to be studied and short-term memory being assessed, the majority of students may answer an item correctly (DIF I >70%), which reduces the DI but reflects an efficient teaching-learning process. Decreased or negative DI can also occur when poor teaching-learning interaction leads to DIF I <30%. In this scenario, both high and low achievers answer the MCQs by guesswork. DI decreases when an item becomes too easy or too difficult. This was noted in this study, where maximum discrimination (0.4) was seen when the DIF I was 40%–60%, and the relationship was dome-shaped, as found in many studies.[9,12,14,15,16,20]
In our study, the Pearson correlation between DI and DIF I was small and negative (r = –0.207) and not significant at P < 0.05 (P = 0.199). Mitra et al. also found a moderate negative correlation (r = –0.325, significant at P < 0.01).[16] Karelia et al. showed a small positive correlation (r = 0.11, not significant at P < 0.05).[9] Two other studies showed a slight positive correlation significant at P < 0.01.[12,15]
In this study, items that were not acceptable because of poor DIF I, DI, or bad distractors were analyzed for writing flaws and optimized in keeping with the guidelines.[1,7] Some of the item-writing flaws were unique to Ophthalmology.
One writing flaw was the presence of two conflicting terms in the options that cannot co-exist, such as mydriasis and miosis, lid retraction and ptosis, or intumescent lens and shrunken lens, which suggests even to a low achiever that one of them must be chosen or excluded. Other flaws detected were 'All of the above' as the key, EXCEPT in the stem not capitalized and set in bold, options that were not uniform, abbreviations in the stem and options, and part of a word shared between stem and options, such as phacomorphic and phacoanaphylaxis. A few others were questions based on controversial information in the books recommended for undergraduates, and distractors that were unrelated and implausible.
One easy item (DIF I = 92.5%) with negative DI and three NFDs, which low achievers answered better, was 'Bitot spots are seen in a) Vitamin A deficiency b) Dry eye c) Sjogren syndrome d) Lagophthalmos.' It was optimized as 'Bitot spot can be seen in all the following EXCEPT a. Cirrhosis of the liver, b. Celiac disease, c. Dietary deficiency of vitamin A, d. Primary Sjogren syndrome', which tests the same cognitive domain in a better way. Defective items need to be appropriately reconstructed and validated, and feedback should be given to the faculty for corrective action.
The new Graduate Medical Regulation 2019 released by the Medical Council of India has made it mandatory to include MCQs in formative and summative assessment in a competency-based medical curriculum.[21] We hope item analysis will serve as a helpful tool for generating question banks at departmental and university levels that provide items with acceptable difficulty and discrimination indices.[5,6,7,8,9,10] Our study included only 40 MCQs. Periodic item analysis with more MCQs is necessary to validate a viable question bank in important subjects like Ophthalmology.

Conclusion

To conclude, item analysis is a valuable tool in detecting poor MCQs, and optimizing them is a critical step. It helps to identify poorly constructed items and directs attention to optimizing them, which improves the quality of the question bank. With nearly half of the items found to be less than optimal, efforts to optimize the stem and distractors are essential; without them, the purpose of the assessment is defeated. The findings also point toward the need for more frequent faculty development programs on creating standard MCQs, for effective pre-validation, and for a viable question bank in Ophthalmology.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.


1. Analysis of one-best MCQs: the difficulty index, discrimination index and distractor efficiency.

Authors: Mozaffer Rahim Hingorjo; Farhan Jaleel
Journal: J Pak Med Assoc Date: 2012-02 Impact factor: 0.781

2. Relationship between item difficulty and discrimination indices in true/false-type multiple choice questions of a para-clinical multidisciplinary paper.

Authors: Si-Mui Sim; Raja Isaiah Rasiah
Journal: Ann Acad Med Singapore Date: 2006-02 Impact factor: 2.473

3. Quantitative analysis of single best answer multiple choice questions in pharmaceutics.

Authors: Suha A Al Muhaissen; Anna Ratka; Amal Akour; Hatim S AlKhatib
Journal: Curr Pharm Teach Learn Date: 2018-12-27

4. Evaluation of Modified Essay Questions (MEQ) and Multiple Choice Questions (MCQ) as a tool for Assessing the Cognitive Skills of Undergraduate Medical Students.

Authors: Moeen-Uz-Zafar Khan; Badr Muhammad Aljarallah
Journal: Int J Health Sci (Qassim) Date: 2011-01

5. Item Analysis of Multiple Choice Questions at the Department of Paediatrics, Arabian Gulf University, Manama, Bahrain.

Authors: Deena Kheyami; Ahmed Jaradat; Tareq Al-Shibani; Fuad A Ali
Journal: Sultan Qaboos Univ Med J Date: 2018-04-04

6. Item analysis of in use multiple choice questions in pharmacology.

Authors: Mandeep Kaur; Shweta Singla; Rajiv Mahajan
Journal: Int J Appl Basic Med Res Date: 2016 Jul-Sep

7. Assessment of higher order cognitive skills in undergraduate education: modified essay or multiple choice questions? Research paper.

Authors: Edward J Palmer; Peter G Devitt
Journal: BMC Med Educ Date: 2007-11-28 Impact factor: 2.463

8. Item and Test Analysis to Identify Quality Multiple Choice Questions (MCQs) from an Assessment of Medical Students of Ahmedabad, Gujarat.

Authors: Sanju Gajjar; Rashmi Sharma; Pradeep Kumar; Manish Rana
Journal: Indian J Community Med Date: 2014-01
