David A. Cook, Rose Hatala
Abstract
BACKGROUND: Simulation plays a vital role in health professions assessment. This review provides a primer on assessment validation for educators and education researchers. We focus on simulation-based assessment of health professionals, but the principles apply broadly to other assessment approaches and topics.
KEY PRINCIPLES: Validation refers to the process of collecting validity evidence to evaluate the appropriateness of the interpretations, uses, and decisions based on assessment results. Contemporary frameworks view validity as a hypothesis, and validity evidence is collected to support or refute the validity hypothesis (i.e., that the proposed interpretations and decisions are defensible). In validation, the educator or researcher defines the proposed interpretations and decisions, identifies and prioritizes the most questionable assumptions in making these interpretations and decisions (the "interpretation-use argument"), empirically tests those assumptions using existing or newly collected evidence, and then summarizes the evidence as a coherent "validity argument." A framework proposed by Messick identifies potential evidence sources: content, response process, internal structure, relationships with other variables, and consequences. Another framework proposed by Kane identifies key inferences in generating useful interpretations: scoring, generalization, extrapolation, and implications/decision. We propose an eight-step approach to validation that applies to either framework: (1) define the construct and proposed interpretation; (2) make explicit the intended decision(s); (3) define the interpretation-use argument and prioritize needed validity evidence; (4) identify candidate instruments and/or create/adapt a new instrument; (5) appraise existing evidence and collect new evidence as needed; (6) keep track of practical issues; (7) formulate the validity argument; and (8) make a judgment: does the evidence support the intended use?
Keywords: Content Evidence; Lumbar Puncture; Validation Framework; Validity Argument; Validity Evidence
Year: 2016 PMID: 29450000 PMCID: PMC5806296 DOI: 10.1186/s41077-016-0033-y
Source DB: PubMed Journal: Adv Simul (Lond) ISSN: 2059-0628
The classical validity framework
| Type of validitya | Definition | Examples of evidence |
|---|---|---|
| Content | Test items and format constitute a relevant and representative sample of the domain of tasks | Procedures for item development and sampling |
| Criterion (includes correlational, concurrent, and predictive validity) | Correlation between actual test scores and the “true” (criterion) score | Correlation with a definitive standard |
| Construct | Scores vary as expected based on an underlying psychological construct (used when no definitive criterion exists) | Correlation with another measure of the same construct |
aSome authors also include "face validity" as a fourth type of validity in the classical framework. However, face validity refers either to superficial appearances that have little merit in evaluating the defensibility of assessment [26, 59] (like judging the speed of a car by its color) or to influential features that are better labeled content validity (like judging the speed of a car by its model or engine size). We discourage use of the term "face validity."
The five sources of evidence validity framework
| Source of evidence | Definition | Examples of evidence |
|---|---|---|
| Content | “The relationship between the content of a test and the construct it is intended to measure” | Procedures for item sampling, development, and scoring (e.g., expert panel, previously described instrument, test blueprint, and pilot testing and revision) |
| Internal structure | Relationship among data items within the assessment and how these relate to the overarching construct | Internal consistency reliability |
| Relationships with other variables | “Degree to which these relationships are consistent with the construct underlying the proposed test score interpretations” | Correlation with tests measuring similar constructs |
| Response process | “The fit between the construct and the detailed nature of performance … actually engaged in” | Analysis of examinees’ or raters’ thoughts or actions during assessment (e.g., think-aloud protocol) |
| Consequences | “The impact, beneficial or harmful and intended or unintended, of assessment” [ | Impact on examinee performance (e.g., downstream effects on board scores, graduation rates, clinical performance, patient safety) |
See the following for further details and examples [20, 25, 26]
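The table's "internal structure" row lists internal consistency reliability as a typical evidence source; in practice this is most often reported as Cronbach's alpha, computed as α = (k / (k − 1)) × (1 − Σ item variances / variance of total scores) for k items. As a hypothetical illustration (the function name and example checklist ratings below are ours, not from the paper), the statistic can be sketched as:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for internal consistency reliability.

    scores: list of examinees, each a list of k item scores.
    """
    k = len(scores[0])

    def var(xs):
        # Population variance of a list of numbers
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Variance of each item's scores across examinees
    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    # Variance of each examinee's total score
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: five examinees rated on three checklist items (0-2 scale)
ratings = [
    [2, 2, 1],
    [1, 1, 1],
    [2, 1, 2],
    [0, 1, 0],
    [2, 2, 2],
]
print(f"alpha = {cronbach_alpha(ratings):.3f}")  # alpha ≈ 0.825
```

Values near or above 0.8 are conventionally read as acceptable internal consistency, though alpha alone does not establish the other evidence sources in the table.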
The validation inferences validity framework
| Validity inference | Definition (assumptions)a | Examples of evidence |
|---|---|---|
| Scoring | The score or written narrative from a given observation adequately captures key aspects of performance | Procedures for creating and empirically evaluating item wording, response options, scoring options |
| Generalization | The total score or synthesis of narratives reflects performance across the test domain | Sampling strategy (e.g., test blueprint) and sample size |
| Extrapolation | The total score or synthesis in a test setting reflects meaningful performance in a real life setting | Authenticity of context |
| Implications/decisions | Measured performance constitutes a rational basis for meaningful decisions and actions | See Table |
See Kane [10] and Cook et al. [12] for further details and examples
aEach of the inferences reflects assumptions about the creation and use of assessment results
Fig. 1 Key inferences in validation
A practical approach to validation
| 1. Define the construct and proposed interpretation |