Abstract
BACKGROUND: When available, empirical evidence should help guide decision-making. Following each administration of a learning assessment, data become available for analysis. For learning assessments, Kane's Framework for Validation can helpfully categorize evidence by inference (i.e., scoring, generalization, extrapolation, implications). Especially for test scores used within a high-stakes setting, generalization evidence is critical. While reporting Cronbach's alpha, inter-rater reliability, and other reliability coefficients for a single source of measurement error is somewhat common in pharmacy education, dealing with multiple concurrent sources of measurement error within complex learning assessments is not. Performance-based assessments (e.g., OSCEs) that use raters are inherently complex learning assessments. PRIMER: Generalizability Theory (G-Theory) can account for multiple sources of measurement error. G-Theory is a powerful tool that can provide a composite reliability (i.e., generalization evidence) for more complex learning assessments, including performance-based assessments. It can also help educators explore ways to make a learning assessment more rigorous if needed, as well as suggest ways to better allocate resources (e.g., staffing, space, fiscal). A brief review of G-Theory focused on pharmacy education is provided herein. MOVING FORWARD: G-Theory has been common and useful in medical education, though it has been used rarely in pharmacy education. Given the similarities in assessment methods among health professions, G-Theory should prove helpful in pharmacy education as well. Within this Journal and accompanying this Idea Paper, there are multiple reports that demonstrate the use of G-Theory in pharmacy education. © Individual authors.
Keywords: generalizability theory; pharmacy education; reliability; validation
Year: 2021 PMID: 34007684 PMCID: PMC8102977 DOI: 10.24926/iip.v12i1.2131
Source DB: PubMed Journal: Innov Pharm ISSN: 2155-0417
Overview of Kane’s Framework for Validation
Inference | Definition | Example |
--- | --- | --- |
Scoring | Translating an observed performance into an observed score | Scoring for one individual OSCE station |
Generalization | Generating and examining the total-scores from an entire exam | The total-score for an entire OSCE (over multiple stations) |
Extrapolation | Examining the total-score in relation to other real-world performances | An OSCE total-score's relationship to performance on APPEs and/or pharmacist licensing |
Implications | Exploring consequences of the test, including standard-setting | The passing score for an OSCE, and identification of who will need to remediate |
OSCE=objective structured clinical examination. APPE=Advanced Pharmacy Practice Experience
Examples of Extensions in Statistics
Simpler method | Extension | Description |
--- | --- | --- |
Correlation | Multivariable regression | From examining a bivariate association, to controlling for multiple (3+) variables |
Student's t-test | ANOVA | From comparing two groups, to comparing three or more groups |
Winsteps | Facets | From comparing persons versus items, to adding additional facets such as raters |
Simple ANOVA | Factorial ANOVA | From comparing main effects (within vs. between), to also including interaction effects |
CTT's inter-rater reliability (e.g., Cohen's kappa) | CTT's intraclass correlation | From comparing 2 (binary) outcomes, to comparing 3+ (ordinal) outcomes (e.g., ratings) |
CTT's internal consistency (e.g., Cronbach's alpha or KR-20) | Generalizability Theory | From characterizing one source of error between two test parameters (e.g., students and exam items), to multiple error sources with the addition of more test parameters such as raters or testing occasions |
ANOVA= analysis of variance, CTT= Classical Test Theory, KR-20= Kuder-Richardson formula #20
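As a baseline for the CTT row above, Cronbach's alpha can be computed directly from a student-by-item score matrix. The data below are hypothetical; this is only a minimal sketch of the single-error-source coefficient that G-Theory extends.

```python
import numpy as np

# Hypothetical data: 5 students (rows) x 4 exam items (columns).
scores = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [1, 2, 1, 2],
    [3, 3, 2, 3],
], dtype=float)

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item across students
total_var = scores.sum(axis=1).var(ddof=1)   # variance of students' total scores
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Note that alpha treats students-by-items as the only source of measurement error; it cannot separate, say, rater error from occasion error, which is exactly the gap G-Theory fills.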
Glossary of Terms for Generalizability Theory
Term | Definition |
--- | --- |
Facet | A set of similar conditions of assessment: a “variable”, a test parameter, a source of test-score variation (e.g., students, items, occasions, raters, stations). A facet is a “factor” in Analysis of Variance (ANOVA) language. |
Fixed facet | A finite facet that is held constant and will not be generalized beyond its observed conditions (in contrast to a random facet). |
Random facet | A facet with many possible conditions, of which only a sample is observed; the facet to generalize/extrapolate over in D-Studies. |
Levels | “Levels” is ANOVA language: each facet/factor has multiple configurations (e.g., an item scored with 4 levels: 1, 2, 3, or 4; one, two, or three raters; 10 or 15 stations in an OSCE; 50 or 100 items on an exam). |
G-Study | Generalizability Study: the initial analysis of data in the specified G-Theory design, estimating variance components for each facet and for facet interactions, to discriminate each one's contribution to score variance. |
D-Studies | Decision Studies: Extensions from a G-Study that use its estimated score variance to examine “what if” situations for their impact on reliability, to help decide on modifications to the next testing iteration (e.g., What if there were 3 raters instead of 2? What if there was 1 rater instead of 2? What if there were 10 stations instead of 6?). |
OSCE=objective structured clinical examination
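The G-Study/D-Study workflow above can be sketched for the simplest crossed design (students × items). The score matrix below is hypothetical, and the estimation follows the standard ANOVA expected-mean-square algebra for a persons-by-items design; treat it as an illustrative sketch rather than a replacement for dedicated G-Theory software.

```python
import numpy as np

# Hypothetical G-Study data: 6 students (persons, p) x 4 items (i), fully crossed.
X = np.array([
    [8, 9, 6, 7],
    [7, 7, 5, 6],
    [9, 9, 8, 8],
    [5, 6, 4, 5],
    [6, 8, 5, 7],
    [7, 8, 6, 6],
], dtype=float)
n_p, n_i = X.shape
grand = X.mean()

# ANOVA sums of squares and mean squares for the p x i design.
ss_p = n_i * ((X.mean(axis=1) - grand) ** 2).sum()
ss_i = n_p * ((X.mean(axis=0) - grand) ** 2).sum()
ss_pi = ((X - grand) ** 2).sum() - ss_p - ss_i
ms_p = ss_p / (n_p - 1)
ms_i = ss_i / (n_i - 1)
ms_pi = ss_pi / ((n_p - 1) * (n_i - 1))

# G-Study: variance components from expected mean squares.
var_pi = ms_pi                   # p x i interaction (confounded with residual error)
var_i = (ms_i - ms_pi) / n_p     # item-difficulty differences
var_p = (ms_p - ms_pi) / n_i     # universe-score (person) variance

# D-Study: "what if" the next exam used n_items items instead of 4?
def g_coefficient(n_items):
    """Relative G coefficient for a test averaged over n_items items."""
    return var_p / (var_p + var_pi / n_items)

for n in (1, 4, 8):
    print(f"{n} item(s): G = {g_coefficient(n):.3f}")
```

With `n_items` set to the observed number of items (here 4), the relative G coefficient reproduces Cronbach's alpha for these data; the D-Study loop then shows how reliability would change with fewer or more items.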
G-Theory Design Features
Design feature | Description |
--- | --- |
Crossed facet | Every facet is sampled at all levels with one another (e.g., in an OSCE, raters are crossed with students and crossed with stations). |
Nested facet | One or more facets occur only within certain instances of another facet (e.g., in an OSCE, raters are nested in stations and crossed with students). |
Balanced design | A design that has equal amounts for all facets (e.g., all exam occasions have same number of questions, all stations use same items, all occasions have same number of stations). |
Unbalanced design | A design with an unequal number within any facet (e.g., multiple quizzes with different numbers of items on each quiz, multiple exams have different numbers of items on each exam, an OSCE with different number of raters in various stations or different items used by OSCE raters within different stations). |
Univariate design | A conventional design with random facets as crossed or nested facets. This type of design makes up the vast majority of the literature. |
Multivariate design | An alternative to the popular univariate design, wherein one facet is fixed. At the time of this writing, only mGENOVA can analyze a multivariate design, and it offers 13 pre-determined designs. |
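One practical use of these design distinctions is running a D-Study over a nested design. The sketch below assumes hypothetical variance components (as if estimated by a prior G-Study) for an OSCE in which raters are nested in stations, then asks the "what if" resource-allocation questions (more stations vs. more raters per station). The component values and the function name are illustrative only.

```python
# Hypothetical variance components, assumed from a prior G-Study of an OSCE
# with a persons x (raters:stations) design, i.e., p x (r:s).
var_p = 1.00    # universe-score variance (persons)
var_ps = 0.40   # person-by-station interaction
var_prs = 0.60  # person-by-rater-within-station interaction (residual)

def g_coef(n_stations, n_raters):
    """Relative G coefficient for a p x (r:s) design, with error terms
    averaged over n_stations and n_raters raters per station."""
    rel_error = var_ps / n_stations + var_prs / (n_stations * n_raters)
    return var_p / (var_p + rel_error)

# D-Study "what ifs": trade off more stations vs. more raters per station.
for n_s, n_r in [(6, 1), (6, 2), (10, 1), (10, 2)]:
    print(f"{n_s} stations x {n_r} rater(s)/station: G = {g_coef(n_s, n_r):.3f}")
```

With these assumed components, adding stations improves the coefficient more than adding raters within stations, illustrating how a D-Study can guide staffing and space decisions before the next OSCE administration.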
Available Generalizability Theory Software Programs
Developer/Distributor | Platform | Handles unbalanced designs | Design type |
--- | --- | --- | --- |
IRDP (Neuchatel, Switzerland) | Windows | No | U |
McMaster University (Hamilton, ON, Canada) | Windows & Mac | Yes | U |
University of Iowa (Iowa City, IA, USA) | Windows | No | U |
 | Windows | Yes | U |
 | Windows | No | M |
U=univariate design, M=multivariate design