Line Thellesen, Thomas Bergholt, Morten Hedegaard, Nina Palmgren Colov, Karl Bang Christensen, Kristine Sylvan Andersen, Jette Led Sorensen.
Abstract
BACKGROUND: To reduce the incidence of hypoxic brain injuries among newborns, a national cardiotocography (CTG) education program was implemented in Denmark. A multiple-choice question test was integrated as part of the program. The aim of this article was to describe and discuss the test development process and to introduce a feasible method for written test development in general.
Keywords: Cardiotocography; Continuing professional development; Fetal monitoring; Interprofessional; Multiple-choice question; Validity; Written assessment
Year: 2017 PMID: 28521768 PMCID: PMC5437628 DOI: 10.1186/s12909-017-0915-2
Source DB: PubMed Journal: BMC Med Educ ISSN: 1472-6920 Impact factor: 2.463
Fig. 1. Study design. Flowchart of the five sources of validity evidence and the participants involved
Fig. 2. Example of a multiple-choice question in a one-best-answer format
Psychometric properties. Proportion of correct answers, loglinear Rasch model fit, and differential item functioning (DIF) in the 30-item CTG test
| Item | Blueprint domain | Correct answers, pilot test participants (%) | Correct answers, CTG course participants (%) | Loglinear Rasch fit: observed | Loglinear Rasch fit: expected | Loglinear Rasch fit: P-value | DIF |
|---|---|---|---|---|---|---|---|
| Item1 | Indication | 81.4 | 97.7 | 0.350 | 0.346 | - | * |
| Item2 | Classification | 78.8 | 91.8 | 0.737 | 0.685 | - | - |
| Item3 | Classification | 82.2 | 92.9 | 0.795 | 0.751 | - | - |
| Item4 | Classification | 80.5 | 97.0 | 0.524 | 0.530 | - | - |
| Item5 | Equipment | 94.1 | 99.3 | 0.134 | 0.348 | - | - |
| Item6 | Management | 94.1 | 99.5 | 0.537 | 0.348 | - | - |
| Item7 | Indication | 74.6 | 93.9 | 0.466 | 0.372 | - | * |
| Item8 | Classification | 73.3 | 89.7 | 0.296 | 0.341 | - | * |
| Item9 | Classification | 57.6 | 70.0 | 0.153 | 0.242 | - | - |
| Item10 | Management | 86.4 | 92.1 | 0.278 | 0.342 | - | - |
| Item11 | Physiology | 72.9 | 95.6 | 0.371 | 0.345 | - | - |
| Item12 | Physiology | 80.5 | 96.7 | 0.633 | 0.414 | - | - |
| Item13 | Classification | 72.9 | 96.4 | 0.583 | 0.610 | - | - |
| Item14 | Management | 83.1 | 97.3 | 0.636 | 0.704 | - | - |
| Item15 | Management | 85.6 | 97.1 | 0.440 | 0.346 | - | - |
| Item16 | Physiology | 76.3 | 96.3 | 0.331 | 0.345 | - | - |
| Item17 | Physiology | 93.2 | 97.3 | 0.160 | 0.346 | - | - |
| Item18 | Physiology | 72.0 | 85.0 | 0.327 | 0.338 | - | + |
| Item19 | Physiology | 80.2 | 96.8 | 0.442 | 0.416 | - | + |
| Item20 | Classification | 77.1 | 95.7 | 0.724 | 0.646 | - | - |
| Item21 | Classification | 82.2 | 94.9 | 0.572 | 0.596 | - | - |
| Item22 | Physiology | 91.5 | 98.5 | 0.615 | 0.517 | - | - |
| Item23 | Management | 87.3 | 98.5 | 0.608 | 0.546 | - | - |
| Item24 | Management | 88.1 | 98.5 | 0.552 | 0.347 | - | - |
| Item25 | Classification | 71.2 | 93.5 | 0.481 | 0.451 | - | + |
| Item26 | Physiology | 60.2 | 98.5 | 0.445 | 0.347 | - | - |
| Item27 | Management | 93.2 | 96.9 | 0.479 | 0.346 | - | - |
| Item28 | Management | 66.1 | 79.0 | 0.159 | 0.218 | - | *+ |
| Item29 | Classification | 66.9 | 91.5 | 0.543 | 0.466 | - | - |
| Item30 | Management | 74.6 | 98.9 | 0.723 | 0.500 | - | - |
Symbols used in the table:
"-" denotes a non-significant P-value.
"*" denotes a P-value that indicates DIF concerning profession.
"+" denotes a P-value that indicates DIF concerning regions.
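As background to the Rasch columns in the table, the following is a minimal sketch of the basic dichotomous Rasch model, in which the probability of a correct answer depends only on the difference between a test-taker's ability and an item's difficulty. The loglinear Rasch model named in the table caption extends this basic form; the sketch does not reproduce the study's analysis, and the function name and the ability and difficulty values are hypothetical.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model.

    Both arguments are on the same logit scale. The probability is the
    logistic function of (ability - difficulty).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical values in logits, not estimates from the study.
p_matched = rasch_probability(ability=0.0, difficulty=0.0)    # ability equals difficulty
p_easy_item = rasch_probability(ability=1.0, difficulty=-2.0)  # item well below ability
print(p_matched, p_easy_item)
```

When ability equals difficulty the probability is 0.5, and it rises toward 1 as the item becomes easier relative to the test-taker, which is consistent with the high proportions of correct answers seen for most items among course participants.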
Sensitivity analysis
Mean test scores in the 30-item CTG test for groups with expected differentiated level of CTG knowledge and interpretive skills within each profession (pilot test participants)
Fig. 3. Standard setting in the 30-item CTG test using the Contrasting Groups method (pilot test participants)
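The Contrasting Groups method sets a passing score by comparing the score distributions of a group judged not yet competent with a group judged competent. The sketch below shows one simple way to operationalise the idea: pick the cut score that minimises the total number of misclassified test-takers. This is an illustrative variant, not necessarily the procedure used in the study, and the function name and all scores are hypothetical.

```python
def contrasting_groups_cut(non_competent, competent, max_score):
    """Return (cut, errors) for the lowest cut score with the fewest misclassifications.

    A test-taker passes when score >= cut. Misclassifications are
    non-competent test-takers who pass (false positives) plus
    competent test-takers who fail (false negatives).
    """
    best_cut, best_errors = 0, None
    # Try every possible cut, including max_score + 1 (nobody passes).
    for cut in range(max_score + 2):
        false_pos = sum(score >= cut for score in non_competent)
        false_neg = sum(score < cut for score in competent)
        errors = false_pos + false_neg
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


# Hypothetical scores on a 30-item test, not data from the study.
novices = [14, 17, 18, 20, 21, 22, 23]
experts = [21, 24, 25, 26, 27, 28, 29]
cut, errors = contrasting_groups_cut(novices, experts, max_score=30)
print(cut, errors)  # 24 1
```

Weighting false positives more heavily than false negatives in the error count would push the cut score upward, which is the direction of adjustment implied when a passing score is tuned to minimise false positives.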
Strengths and challenges in the test development process
| Strengths | |
|---|---|
| Project group | Consisted of professionals with profound content knowledge, a medical educationalist, and a statistician with experience in test development. |
| Test content | Based on nationally defined learning objectives, which generated relevant and coverable test content. |
| Test blueprint | Predefined and based on nationally developed learning objectives. |
| Test format | MCQs can test more than simple facts, are suitable for large groups, and are time- and cost-effective. They assess competences at the two lower levels of Miller's triangle. |
| Language | Predefined spelling and abbreviations ensured consistency in wordings and terms. |
| Proofreading | Several proofreaders. Proofreading of content, language and structure/format. |
| Pilot test participants | A large sample that partly represented the intended test-takers. |
| Pilot testing | Written and verbal feedback gave insight into the pilot participants’ thought processes during testing. |
| Standard setting | An acknowledged method was used. The passing score was adjusted to minimize false-positive values and was validated on initial test responses. |
| Psychometric properties | Evaluated on both pilot test responses and the responses from the real test-takers. |
| Test-takers | A high number of participants enabled the use of advanced statistical analyses such as Rasch analyses. |
| No. of options in each item | Three or four options were chosen, depending on the number of plausible distractors. |
| Challenges | |
| Test format | A written assessment cannot assess competences at the two higher levels of Miller's triangle. |
| Number of items | More items would be expected to increase reliability and would have allowed for the development of an item bank. |
| Item difficulty | More difficult items would be expected to increase reliability and would have made the test more challenging. |
| Pilot test participants | Medical and midwifery students did not represent the intended test-takers and lowered the percentage of correct answers. |
| Relations to other variables | There was no test available for comparison. |
| Context | The context of pilot testing and real testing differed; pilot participants did not attend a one-day teaching course prior to testing and the test was therefore more challenging than in the real setting. |
| Time devoted to assessment | More items and more difficult items require more time to be devoted to assessment in an education program. |
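The expectation in the table that more items would increase reliability can be illustrated with the Spearman-Brown prediction formula, which estimates the reliability of a lengthened test. This is a sketch under stated assumptions: the formula presumes that the added items are comparable to the existing ones, and the starting reliability of 0.70 is a hypothetical value, since no reliability coefficient is given in this section.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when test length is multiplied by length_factor.

    Spearman-Brown: r_new = k * r / (1 + (k - 1) * r), where k is the
    ratio of the new number of items to the current number of items.
    """
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical starting reliability for a 30-item test (not from the study).
current_reliability = 0.70
predicted = spearman_brown(current_reliability, length_factor=60 / 30)
print(round(predicted, 2))  # 0.82
```

Doubling the test from 30 to 60 items in this hypothetical case raises the predicted reliability from 0.70 to about 0.82, at the cost of the extra assessment time noted in the last row of the table.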