Chad E Campbell, Ross H Nehm.
Abstract
The growing importance of genomics and bioinformatics methods and paradigms in biology has been accompanied by an explosion of new curricula and pedagogies. An important question to ask about these educational innovations is whether they are having a meaningful impact on students' knowledge, attitudes, or skills. Although assessments are necessary tools for answering this question, their outputs are dependent on their quality. Our study (1) reviews the central importance of reliability and construct validity evidence in the development and evaluation of science assessments and (2) examines the extent to which published assessments in genomics and bioinformatics education (GBE) have been developed using such evidence. We identified 95 GBE articles (out of 226) that contained claims of knowledge increases, affective changes, or skill acquisition. We found that (1) the purpose of most of these studies was to assess summative learning gains associated with curricular change at the undergraduate level, and (2) a minority (<10%) of studies provided any reliability or validity evidence, and only one study out of the 95 sampled mentioned both validity and reliability. Our findings raise concerns about the quality of evidence derived from these instruments. We end with recommendations for improving assessment quality in GBE.
Year: 2013 PMID: 24006400 PMCID: PMC3763019 DOI: 10.1187/cbe.12-06-0073
Source DB: PubMed Journal: CBE Life Sci Educ ISSN: 1931-7913 Impact factor: 3.325
Table 1. Sources of evidence to consider when establishing or evaluating construct validity^a
| Source of validity evidence | Answers this question | Methodology example(s)^b | Example-related measurement standard(s)^c |
|---|---|---|---|
| A. Content | Does the assessment appropriately represent the specified knowledge domain? | Delphi Study; textbook analyses; expert survey; Rasch analysis | 1.6 |
| B. Substantive | Are the thinking processes thought to be used to answer the items the ones that were actually used? | “Think aloud” interviews during problem solving; cognitive task analysis | 1.8 |
| C. Internal structure | Do the items capture one dimension or construct? | Factor analysis; Rasch analysis | 1.11 |
| D. External structure | Does the construct represented in the assessment align with expected external patterns of association (convergent and/or discriminant)? | Correlation coefficients | 1.14 |
| E. Generalization | Are the scores derived from an assessment meaningful across populations and learning contexts? | Analyses of performance across a diversity of contexts (e.g., ethnicity, socioeconomic status); differential item functioning | 1.5 |
| F. Consequences | In what ways might the scores derived from the assessment lead to positive or negative consequences? | Studying the types of social consequences produced as a result of using test scores (e.g., passing a class, graduating from a program). | 1.22 and 1.24 |
^a Modified from Messick (1995) and Nitko and Brookhart (2010).
^b Methodology examples are based on both classical test theory and item response theory. For more information about these perspectives, their implicit assumptions, and how they may be used to gather validity and reliability evidence, see chapters in Educational Measurement (Brennan, 2006) and the Handbook of Test Development (Downing and Haladyna, 2006).
^c From AERA.
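
To make the external-structure row (D) concrete, the following is a minimal sketch of a Pearson correlation coefficient in plain Python. The function name `pearson_r` and any score lists passed to it are hypothetical illustrations and are not taken from the study. A high correlation with an established measure of the same construct would count toward convergent evidence, and a near-zero correlation with a measure of an unrelated construct would count toward discriminant evidence.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores.

    Hypothetical helper for illustration: x could hold scores from a new
    assessment and y scores from a comparison measure.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return cross / (dev_x * dev_y)
```

A single coefficient is only one piece of evidence; how large a correlation must be to support a validity claim depends on the construct and the comparison measure.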
Table 2. Sources of reliability evidence to consider when creating or evaluating an assessment^a
| Source of reliability evidence | Answers this question | Methodology example(s)^b | Related measurement standard(s)^c |
|---|---|---|---|
| A. Stability | How consistent are scores from one administration of the assessment to another? | Stability coefficient | 2.4 |
| B. Alternate forms | Are scores comparable when using similar items to assess the same construct? | Spearman-Brown double length formula: split half | 2.4 |
| C. Internal consistency | To what extent do the items on an assessment correlate with one another? | Coefficient alpha or Kuder-Richardson 20 | 2.4 |
| D. Reliability of raters | Is the assessment scored consistently by different raters? | Cohen's or Fleiss's kappa | 2.10 |
^a Modified from Nitko and Brookhart (2010).
^b See Table 1, footnote b.
^c Examples from AERA.
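
Three of the statistics named in the table can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and input layouts (rows as respondents, columns as items) are assumptions made here, not procedures described in the article. Population variances are used throughout, so for items scored 0 or 1 the alpha value coincides with Kuder-Richardson 20.

```python
from collections import Counter
from statistics import pvariance


def spearman_brown(r_half):
    """Row B: step a split-half correlation up to full test length."""
    return 2 * r_half / (1 + r_half)


def coefficient_alpha(scores):
    """Row C: coefficient alpha for a list of respondent rows.

    Each row holds one respondent's item scores (hypothetical layout).
    """
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def cohens_kappa(rater_a, rater_b):
    """Row D: chance-corrected agreement between two raters' category labels."""
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

These functions assume complete data and at least two items; alpha is undefined when total scores have no variance, and kappa is undefined when chance agreement equals 1.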
Figure 1. Percentage of articles that evaluated each educational level (secondary, undergraduate, and graduate) in all articles (n = 83).
Figure 2. Percentage of articles that assessed each learning target (cognitive, affective, and psychomotor) in all articles (n = 95).
Table 3. Articles containing keywords pertaining to education validity and reliability
| Author(s) | Valid | Interrater agreement |
|---|---|---|
| | X | — |
| | X | X |
| | — | X |
| | X | — |
| | X | — |
| | X | — |
| | — | X |
Table 4. Articles mentioning the importance of validity in natural sciences, but not education^a
| Author(s) |
|---|
| Ackovska and Madevska-Bogdanova |
| Farh and Lee, 2007 |
^a Articles are arranged alphabetically by last name of first author.