J Norcini¹, R Reshetar, R Lipner. ¹American Board of Internal Medicine, 3624 Market Street, Philadelphia, PA 19104, USA; jnorcini@ABIM.org.
Abstract
OBJECTIVE: To determine the impact of variability in answer key construction (i.e., option weighting) on total errors of measurement and to compare several designs for reducing this effect. METHODS: A video-based format that assesses the ability to interpret arteriograms is studied because it reproduces with high fidelity an important task faced by cardiologists. Responses given by highly qualified examinees are used to develop answer keys by the aggregate scoring method, and generalizability theory is applied to estimate the error variance in answer key construction. SAMPLE: Two hundred and two examinees who volunteered to participate in the study after taking their certifying examination in cardiovascular diseases. RESULTS: Scorer variability is smaller than case variability; however, using several scorers yields a sizeable reduction in measurement error. CONCLUSION: Although there is some error in the answer key construction process, its size is relatively small and it should not be a major impediment to the use of performance-based item formats.