Abstract
Test validity lies at the core of educational and psychological testing, but there are controversies about what test validity is and how test validation should proceed. This paper develops a taxonomy to redefine test validity with hierarchical levels. On the basis of testing foundation, the hierarchy includes operational, measurable, realizable, and useful levels, which result in testing consequence. With the help of a context-specific construct, different levels of test validity, and different types of score use, the proposed taxonomy offers more flexibility for test validation. It can also shed light on the interpretations of important testing concepts and help streamline test development. Real-life examples are given to demonstrate the usefulness of the taxonomy across different settings.
Keywords: construct domain; context specific; hierarchical taxonomy; score use; test validity
Year: 2018 PMID: 29962989 PMCID: PMC6013560 DOI: 10.3389/fpsyg.2018.00972
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Evidence for validity as defined by claims in each hierarchical level.
| Level | Claim | Description | Types of evidence |
|---|---|---|---|
| Operational | Population | The population of examinees is appropriately defined | Theoretical or expert analysis |
| Operational | Structure | The dimension and scale of the construct are appropriately defined | Theoretical or expert analysis |
| Operational | Content | The substantive content of the construct is appropriately operationalized | Qualitative (e.g., content analysis) |
| Measurable | Item and format | The items and format are appropriate for the construct domain | Item analysis and qualitative (e.g., think-aloud protocol) |
| Measurable | Specifications | The test specifications are appropriate for the construct domain | Expert analysis |
| Measurable | Method | The method used to analyze the data and derive the test score is appropriate | Theoretical or empirical (e.g., standard setting) |
| Realizable | Administration | The testing process is appropriately administered | Qualitative (e.g., independent observation) |
| Realizable | Fitness | The method or model fits the response data appropriately | Various fit indices or empirical analysis |
| Useful | Use validity | The score use (extended or joint) is appropriate | Empirical (e.g., correlation analysis) |
Usefulness of the taxonomy with different real-life examples.
| No. | Testing purpose and foundation | Operational level (construct domain): population | Operational level (construct domain): structure | Operational level (construct domain): content | Measurable level (instrument; method) | Administration concern | Score use |
|---|---|---|---|---|---|---|---|
| 1 | Classification and practical | Time: within year; place: regional; prerequisite | Unidimensional, criteria-referenced (certified or not) | Practice proficiency based on regulations or requirements | Criteria-driven cutoffs; multiple forms | Security; standardization | Basic (criteria-driven) |
| 2 | Academic prediction and theoretical | Age: 6–16 years; place: United States; time: since 2003 | Unidimensional and norm-referenced | Intelligence (verbal comprehension, perceptual reasoning, working memory, processing speed) | Composite score based on reliability; factor analysis | Balancing the wide coverage of content and test time | Extended or joint |
| 3 | Diagnosis and theoretical | Same as above | Multidimensional and categorical (mastery/partial/non-mastery) | Various dimensions (subtests): e.g., block design, digit span, vocabulary, picture concepts, arithmetic | Criteria-driven cutoffs | Same as above | Basic (criteria-driven) |
| 4 | Placement and practical | Time: current year; age: specific grade; place: regional | Unidimensional and categorical (levels of competencies) | Competencies based on course requirements | Criteria-driven cutoffs | Usually little | Basic (criteria-driven) |
| 5 | Placement and practical | Same as above | Unidimensional and norm-referenced | Same as above | CTT- or IRT-based | Same as above | Extended (selection-driven with other factors) |
| 6 | Admission and theoretical | Time: within year; place: non-native | Unidimensional and norm-referenced | English reading competencies | IRT-based | Security; standardization | Joint (e.g., with listening and speaking) |
| 7 | Admission and mixed | Same as above | Unidimensional and categorical (levels of competencies) | English writing competencies | Criteria-driven cutoffs; scoring consistency | Security; rater training | Same as above |