Christian D Wright, Sarah L Eddy, Mary Pat Wenderoth, Elizabeth Abshire, Margaret Blankenbiller, Sara E Brownell.
Abstract
Recent reform efforts in undergraduate biology have recommended transforming course exams to test at more cognitively challenging levels, which may mean including more cognitively challenging and more constructed-response questions on assessments. However, changing the characteristics of exams could result in bias against historically underserved groups. In this study, we examined whether and to what extent the characteristics of instructor-generated tests impact the exam performance of male and female and middle/high- and low-socioeconomic status (SES) students enrolled in introductory biology courses. We collected exam scores for 4810 students from 87 unique exams taken across 3 yr of the introductory biology series at a large research university. We determined the median Bloom's level and the percentage of constructed-response questions for each exam. Despite controlling for prior academic ability in our models, we found that males and middle/high-SES students were disproportionately favored as the Bloom's level of exams increased. Additionally, middle/high-SES students were favored as the proportion of constructed-response questions on exams increased. Given that we controlled for prior academic ability, our findings likely do not reflect differences in academic ability. We discuss possible explanations for our findings and how they might impact how we assess our students.
Year: 2016 PMID: 27252299 PMCID: PMC4909345 DOI: 10.1187/cbe.15-12-0246
Source DB: PubMed Journal: CBE Life Sci Educ ISSN: 1931-7913 Impact factor: 3.325
Best models include the interaction between student gender identity and the median weighted Bloom’s level of an exam and the interaction between SES status and the median weighted Bloom’s level of an examᵃ
| Rank | Modelᵇ | AICc | ΔAICc | ωᵢ |
|---|---|---|---|---|
| 1 | Cum.GPA + Time + Gender + SES + W.Diff + W.Blooms + SES*W.Blooms + Gender*W.Blooms | −41429.42 | 0.00 | 0.59 |
| 2 | Cum.GPA + Time + Gender + SES + W.Diff + W.Blooms + SES*W.Blooms + Gender*W.Blooms + Course | −41427.96 | 1.46 | 0.29 |
| 3 | Cum.GPA + Time + Gender + SES + W.Diff + W.Blooms + SES*W.Blooms | −41424.51 | 4.90 | 0.05 |
| 4 | Cum.GPA + Time + Gender + SES + W.Diff + W.Blooms + Gender*W.Blooms | −41423.36 | 6.06 | 0.03 |
| 5 | Cum.GPA + Time + Gender + SES + W.Diff + W.Blooms + SES*W.Blooms + Course | −41423.05 | 6.36 | 0.02 |
| 6 | Cum.GPA + Time + Gender + SES + W.Diff + W.Blooms + Gender*W.Blooms + Course | −41421.90 | 7.51 | 0.02 |
| 7 | Cum.GPA + Time + Gender + SES + W.Diff + W.Blooms | −41417.46 | 11.96 | 0.00 |
ᵃRelative ranking (from most support to least) of the six best models for predicting student exam performance using AICc model selection. Models that are informative (ΔAICc < 10) are shown, plus the next best model with ΔAICc > 10. The table shows only fixed-effect terms, but all models also include two random-effect terms: student and the instructor in whose class students were enrolled.
ᵇTime = exam number in a course; Cum.GPA = cumulative college GPA at the start of the introductory biology series; Gender = student’s gender identity; SES = student’s socioeconomic status; W.Diff = median weighted difficulty of an exam; W.Blooms = median weighted Bloom’s level of an exam; Course = the three courses that make up the introductory biology sequence.
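For readers unfamiliar with the ΔAICc and ωᵢ columns: ΔAICc is each model's AICc minus the best (lowest) AICc, and the Akaike weight ωᵢ is exp(−Δᵢ/2) normalized over all candidate models. A minimal sketch of that arithmetic, using the AICc values from the table above (last-digit differences from the published ΔAICc values reflect rounding in the reported AICc):

```python
import math

# AICc values for the seven candidate models in the table above.
aicc = [-41429.42, -41427.96, -41424.51, -41423.36, -41423.05, -41421.90, -41417.46]

# Delta AICc: difference from the best (lowest) AICc.
best = min(aicc)
delta = [a - best for a in aicc]

# Akaike weights: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2)
rel_lik = [math.exp(-d / 2) for d in delta]
total = sum(rel_lik)
weights = [r / total for r in rel_lik]

print([round(d, 2) for d in delta])    # [0.0, 1.46, 4.91, 6.06, 6.37, 7.52, 11.96]
print([round(w, 2) for w in weights])  # [0.59, 0.29, 0.05, 0.03, 0.02, 0.01, 0.0]
```

The recovered weights match the table's ωᵢ column to two decimals for the top-ranked models, which is what allows the authors to say the first two models carry essentially all of the support (ω₁ + ω₂ ≈ 0.88).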
Increasing the median weighted Bloom’s level of an exam disproportionately favors male students and middle/high-SES students relative to female or low-SES students, respectivelyᵃ
| Parameter | Relative variable importance | Model-averaged regression coefficient ± SE | P value |
|---|---|---|---|
| Intercept | NA | 0.647 ± 0.0199 | |
| Cum.GPA | 1.00 | 0.164 ± 0.00311 | |
| Course (reference level: course 1) | |||
| Course 2 | 0.33 | 0.00945 ± 0.0199 | 0.634 |
| Course 3 | 0.33 | −0.00416 ± 0.0148 | 0.779 |
| Time (reference level: time 1 (exam 1)) | | | |
| Time 2 (exam 2) | 1.00 | 0.0131 ± 0.00182 | |
| Time 3 (exam 3) | 1.00 | 0.0388 ± 0.00255 | |
| Time 4 (exam 4) | 1.00 | 0.0821 ± 0.00270 | |
| Student gender (reference level: male) | |||
| Female | 1.00 | −0.00295 ± 0.0115 | 0.798 |
| Student SES status (reference level: middle/high-SES) | |||
| Low-SES | 1.00 | 0.00931 ± 0.0149 | 0.531 |
| Exam characteristics | |||
| W.Diff | 1.00 | −0.197 ± 0.0113 | |
| W.Blooms | 1.00 | −0.168 ± 0.0173 | |
| Student identity × exam characteristics (reference level: male or middle/high-SES) | |||
| Female × W.Blooms | 0.92 | −0.0418 ± 0.0205 | |
| Low-SES × W.Blooms | 0.96 | −0.0628 ± 0.0263 |
ᵃThe outputs were produced via model averaging of all possible models using the MuMIn package in the program R. Although not shown, the models include two random-effects terms: (1|Stu.ID) + (1|Instr).
ᵇBolded p values are significant.
Figure 1. Increasing the median weighted Bloom’s level of an exam negatively impacts all students’ scores, but it disproportionately favors men over women and middle/high-SES students over low-SES students. The figure shows a point estimate of exam performance (percentage score) for (a) male and female students and (b) middle/high-SES and low-SES students, based on the model-averaged regression coefficients. The bars are the regression-model predictions of performance for two hypothetical students with an incoming GPA of 3.27 (the median GPA for all students in our data set) who are either (a) middle/high-SES students who identify as male or female or (b) male students who are classified as middle/high-SES or low-SES, both of whom took a moderately difficult exam with a median difficulty of 0.63 (on a scale of 0.33–1). Thus, these students differ from each other in only two ways: the median weighted Bloom’s level of the exam and either (a) their gender (male, unfilled bars; female, filled bars) or (b) their SES status (middle/high-SES, unfilled bars; low-SES, filled bars). The median weighted Bloom’s levels, on a scale of 0.33–1, used to calculate the low-, medium-, and high-Bloom’s-level exams were 0.36, 0.53, and 0.71, respectively. An asterisk indicates a significant difference between groups of students on a given exam. Brackets with percent scores indicate the magnitude of the difference in exam scores for the two students.
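The point estimates in Figure 1 can be approximated from the model-averaged coefficients in the table above. The sketch below is our illustration, not the authors' code: it assumes a simplified linear predictor on the proportion scale and drops the Time and Course terms, which the two hypothetical students share and which therefore cancel in the comparison:

```python
# Model-averaged fixed-effect coefficients from the table above.
INTERCEPT   = 0.647
B_GPA       = 0.164
B_FEMALE    = -0.00295
B_WDIFF     = -0.197
B_WBLOOMS   = -0.168
B_FEM_BLOOM = -0.0418   # Female x W.Blooms interaction

def predicted_score(gpa, w_diff, w_blooms, female):
    """Approximate exam score (as a proportion) for a middle/high-SES student.
    Simplification: Time/Course terms, shared by the compared students, are omitted."""
    score = INTERCEPT + B_GPA * gpa + B_WDIFF * w_diff + B_WBLOOMS * w_blooms
    if female:
        score += B_FEMALE + B_FEM_BLOOM * w_blooms
    return score

# Figure 1's hypothetical students: GPA 3.27, exam difficulty 0.63.
for blooms in (0.36, 0.53, 0.71):  # low, medium, high Bloom's-level exams
    male = predicted_score(3.27, 0.63, blooms, female=False)
    female = predicted_score(3.27, 0.63, blooms, female=True)
    gap = (male - female) * 100  # percentage points
    print(f"Bloom's {blooms:.2f}: gender gap = {gap:.1f} points")
```

At Bloom's levels of 0.36, 0.53, and 0.71, the predicted male-female gap grows from about 1.8 to 2.5 to 3.3 percentage points, mirroring the widening gaps in Figure 1a: the interaction term means the penalty for female students scales with the exam's Bloom's level.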
Best model includes the interaction between SES status and the percentage of constructed-response questions on an examᵃ
| Rank | Modelᵇ | AICc | ΔAICc | ωᵢ |
|---|---|---|---|---|
| 1 | Cum.GPA + Time + Gender + SES + W.Diff + Percent CR + SES*Percent CR + Course | −41285.44 | 0.00 | 0.61 |
| 2 | Cum.GPA + Time + Gender + SES + W.Diff + Percent CR + SES*Percent CR + Gender*Percent CR + Course | −41283.64 | 1.81 | 0.25 |
| 3 | Cum.GPA + Time + Gender + SES + W.Diff + Percent CR + SES*Percent CR | −41281.96 | 3.49 | 0.11 |
| 4 | Cum.GPA + Time + Gender + SES + W.Diff + Percent CR + SES*Percent CR + Gender*Percent CR | −41280.15 | 5.29 | 0.03 |
| 5 | Cum.GPA + Time + Gender + SES + W.Diff + Percent CR + Course | −41268.62 | 16.82 | 0.00 |
ᵃRelative ranking (from most support to least) of the four best models for predicting student exam performance using AICc model selection. Models that are informative (ΔAICc < 10) are shown, plus the next best model with ΔAICc > 10. The table shows only fixed-effect terms, but all models also include two random-effect terms: student and the instructor in whose class students were enrolled.
ᵇTime = exam number in a course; Cum.GPA = cumulative college GPA at the start of the introductory biology series; Gender = student’s gender identity; SES = student’s socioeconomic status; W.Diff = median weighted difficulty of an exam; Percent CR = percentage of constructed-response questions on an exam; Course = the three courses that make up the introductory biology sequence.
Increasing the number of constructed-response questions on an exam disproportionately benefits middle/high-SES students, but not male students, relative to low-SES and female students, respectivelyᵃ
| Parameter | Relative variable importance | Model-averaged regression coefficient ± SE | P value |
|---|---|---|---|
| Intercept | NA | 0.515 ± 0.0238 | |
| Cum.GPA | 1.00 | 0.165 ± 0.00311 | |
| Course (reference level: course 1) | |||
| Course 2 | 0.85 | 0.0677 ± 0.0381 | 0.0759 |
| Course 3 | 0.85 | 0.0159 ± 0.0247 | 0.521 |
| Time (reference level: time 1 (exam 1)) | | | |
| Time 2 (exam 2) | 1.00 | 0.0116 ± 0.00183 | |
| Time 3 (exam 3) | 1.00 | 0.0257 ± 0.00250 | |
| Time 4 (exam 4) | 1.00 | 0.0745 ± 0.00269 | |
| Student gender (reference level: male) | |||
| Female | 1.00 | −0.0252 ± 0.00341 | |
| Student SES status (reference level: middle/high-SES) | |||
| Low-SES | 1.00 | −0.00503 ± 0.00589 | 0.393 |
| Exam characteristics | |||
| W.Diff | 1.00 | −0.243 ± 0.0114 | |
| Percent CR | 1.00 | 0.0789 ± 0.00668 | |
| Student identity × exam characteristics (reference level: male or middle/high-SES) | |||
| Female × percent CR | 0.29 | −0.000607 ± 0.00273 | 0.824 |
| Low-SES × percent CR | 1.00 | −0.0278 ± 0.00645 |
ᵃThe outputs were produced via model averaging of all possible models using the MuMIn package in the program R. Although not shown, the models include two random-effects terms: (1|Stu.ID) + (1|Instr).
ᵇBolded p values are significant.
Figure 2. Increasing the number of constructed-response questions on an exam positively impacts all students’ exam scores, benefiting male and female students equally yet disproportionately favoring middle/high-SES students over low-SES students. The figure shows a point estimate of exam performance (percentage score) for (a) male and female students and (b) middle/high-SES and low-SES students, based on the model-averaged regression coefficients. The bars are the regression-model predictions of performance for two hypothetical students with an incoming GPA of 3.27 (the median GPA for all students in our data set) who are either (a) middle/high-SES students who identify as male or female or (b) male students who are classified as middle/high-SES or low-SES, both of whom took a moderately difficult exam with a median difficulty of 0.63 (on a scale of 0.33–1). Thus, these students differ from each other in only two ways: the percentage of constructed-response questions on the exam and either (a) their gender (male, unfilled bars; female, filled bars) or (b) their SES status (middle/high-SES, unfilled bars; low-SES, filled bars). The percentages of constructed-response questions, on a scale of 0–1, used to calculate the all-restricted-response (RR), mixed restricted-/constructed-response (CR), and all-constructed-response exams were 0.00, 0.50, and 1.00, respectively. The + indicates a significant overall difference between two groups of students. An asterisk indicates a significant difference between groups of students on a given exam. Brackets with percent scores indicate the magnitude of the difference in exam scores for the two students.
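As with Figure 1, the SES gaps in Figure 2 can be approximated from the model-averaged coefficients in the second coefficient table. Again a hedged sketch, not the authors' code: it fixes the student as male, works on the proportion scale, and drops the shared Time and Course terms:

```python
# Model-averaged coefficients from the constructed-response table above.
INTERCEPT   = 0.515
B_GPA       = 0.165
B_LOW_SES   = -0.00503
B_WDIFF     = -0.243
B_CR        = 0.0789
B_LOWSES_CR = -0.0278  # Low-SES x percent CR interaction

def predicted_score(gpa, w_diff, pct_cr, low_ses):
    """Approximate exam score (as a proportion) for a male student.
    Simplification: Time/Course terms, shared by the compared students, are omitted."""
    score = INTERCEPT + B_GPA * gpa + B_WDIFF * w_diff + B_CR * pct_cr
    if low_ses:
        score += B_LOW_SES + B_LOWSES_CR * pct_cr
    return score

# Figure 2's hypothetical students: GPA 3.27, exam difficulty 0.63.
for cr in (0.0, 0.5, 1.0):  # all-RR, mixed, and all-CR exams
    gap = (predicted_score(3.27, 0.63, cr, low_ses=False)
           - predicted_score(3.27, 0.63, cr, low_ses=True)) * 100
    print(f"Percent CR {cr:.1f}: SES gap = {gap:.1f} points")
```

The predicted middle/high-SES advantage grows from about 0.5 percentage points on an all-restricted-response exam to about 3.3 points on an all-constructed-response exam, which is the pattern the table's title and Figure 2b describe.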