Abstract
Discipline-based education researchers have a natural laboratory: classrooms, programs, colleges, and universities. Studies that administer treatments to multiple sections, in multiple years, or at multiple institutions are particularly compelling for two reasons: first, the sample sizes increase, and second, the implementation of the treatments can be intentionally designed and carefully monitored, potentially negating the need for additional control variables. However, when studies are implemented in this way, the observations on students are not completely independent; rather, students are clustered in sections, terms, years, or other factors. Here, I demonstrate why this clustering can be problematic in regression analysis. Fortunately, nonindependence of sampling can often be accounted for with random effects in multilevel regression models. Using several examples, including an extended example with R code, this paper illustrates why and how to implement random effects in multilevel modeling. It also provides resources to promote implementation of analyses that control for the nonindependence inherent in many quasi-random sampling designs.
Year: 2018 PMID: 30142053 PMCID: PMC6234830 DOI: 10.1187/cbe.17-12-0280
Source DB: PubMed Journal: CBE Life Sci Educ ISSN: 1931-7913 Impact factor: 3.325
Glossary of terms used throughout the paper (terms are bolded in the text at first use)
| Term | Definition | Example/synonym/application |
|---|---|---|
| Complex model | Used to mean a model that includes all likely parameters, including parameters that explicitly test the hypothesis of interest | Example: Score ∼ SAT + Sex + Treatment + Sex*Treatment. This model includes SAT as a likely parameter that explains score, plus a Sex*Treatment interaction that tests whether the treatment has a disproportionate effect on students of different sexes. |
| Converge | The process of reaching a solution. Maximum likelihood estimation attempts to find the parameter values that maximize the likelihood function given the observations. If the parameter values cannot be found, the model will not converge. | When models do not converge, R will report a warning or an error message. |
| Estimate | Verb: A model estimates the effects of the parameters. Noun: An approximation of a parameter derived from a sample of individuals | The estimated coefficient in a regression estimates (i.e., approximates) the relationship between performance and SAT scores for a sample of college students. |
| Intraclass correlation (ICC) | The amount of clustering, or nonindependence, within a variable | ρ (“rho”); the ratio of between-cluster variance to total variance |
| Overfit model | A model that considers too many parameters; the penalty of adding an extra parameter vs. the additional variation explained has not been appropriately weighed. | A saturated model is an extreme example of an overfit model. A model does not have to be saturated to be overfit. |
| Parameter | The true value [of something] for all individuals in a population | The parameter is the true relationship between SAT scores and performance for all college students in the country. (A model estimates this value from data from a subset of the whole population; see “Estimate.”) |
| Pseudo-replication | Replication when replicates are not statistically independent | Example: if students are nested in sections, students are not independent, so observations on students exhibit pseudo-replication. |
| Quasi-random | When a study is randomized, but not at the level where observations are made | Example: treatments are randomized at the section level, but observations are made on student outcomes. This results in pseudo-replication. Contrast with randomized. |
| Saturated model | A model that includes as many parameters as data points | Saturated models should be avoided; see “Overfit model.” |
| Underfit model | A model that does not have enough variables to explain the data | Example: not including prescore as a predictor when modeling postscore |
| Variance | The square of the SD of a sample; describes how far each value in the data set is from the mean | σ² (where σ is the SD of the sample) |
| Variation | A general term describing the amount of variability in something; it is measured by various quantities, including variance. | Synonyms: spread, dispersion, scatter, variability |
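The glossary's definition of the ICC (between-cluster variance divided by total variance) can be made concrete with a small calculation. The following is a minimal sketch in base R on simulated data; the variable names, the simulated effect sizes, and the use of a one-way ANOVA (method-of-moments) estimator are assumptions for illustration and are not necessarily how the ICC is calculated in the paper's appendix code.

```r
# Simulated (hypothetical) data: 6 sections with 30 students each.
set.seed(1)
n_sections  <- 6
n_per_group <- 30
section <- factor(rep(LETTERS[1:n_sections], each = n_per_group))

# Section-level shifts (SD = 5) plus student-level noise (SD = 10).
section_shift <- rnorm(n_sections, mean = 0, sd = 5)
score <- 70 + section_shift[as.integer(section)] +
  rnorm(n_sections * n_per_group, mean = 0, sd = 10)

# One-way ANOVA table: row 1 is between sections, row 2 is within sections.
aov_table  <- anova(lm(score ~ section))
ms_between <- aov_table[["Mean Sq"]][1]
ms_within  <- aov_table[["Mean Sq"]][2]

# Method-of-moments variance components for a balanced design.
# The between-section component is truncated at zero if it comes out negative.
var_within  <- ms_within
var_between <- max((ms_between - ms_within) / n_per_group, 0)

# ICC = between-cluster variance / total variance.
icc <- var_between / (var_between + var_within)
print(round(icc, 3))
```

An ICC near 0 suggests that students in the same section are about as similar as students in different sections; larger values indicate stronger clustering.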
FIGURE 1. Workflow for multilevel modeling and a guide to using this paper. Each “decision point” in this figure represents a critical step in multilevel modeling and corresponds to a section in the paper and a figure and is illustrated with the extended example. R code for analyzing the data from the extended example can be found in Appendix 3 in the Supplemental Material (data are in Appendix 4). The code details calculating the ICC (helpful in determining whether random effects are needed), using model selection to select the appropriate random effect structure, and using model selection to select the appropriate fixed effects.
FIGURE 2. Experimental design in the extended example. Squares represent class days (note that there were more class days in the course and the spacing between interventions is not to scale). Experimental treatment and control are color coded as yellow and blue, respectively. Unique topic is indicated with the letters “Q,” “R,” and “S.” The pre- and posttests (black triangles) on each day were identical. Because each student took a posttest three times, student is a “repeated measure,” which warranted treating student as a random effect. Additionally, students were clustered in sections (and section was not synonymous with intervention), so section was tested as a possible random effect. The experimental design was unbalanced, such that section A received the control twice and the treatment once, whereas section B received the treatment twice and the control once.
FIGURE 3. Random effects are important to include when modeling data from (A) nested design studies and (B) repeated-measures studies. In both nested designs and repeated measures, the outcome is quantified on the student level. (Note that this is not always the case in DBER studies; e.g., a researcher may be interested in which courses have the highest proportion of women, in which case the outcome is likely measured on the section level.) In the nested design illustration (A), students are nested in sections, which are nested in courses, which are nested in years, which are nested in universities. Each of these nested levels, or clusters, is important to account for with random effects. Similarly, in the repeated-measures study (B), the outcome (students’ survey responses) is quantified on the student level, and students take the survey four times. In this case, it is important to account for the fact that students are repeated, so student 1 on survey 1 is not independent from student 1 on survey 2. Including a student random effect accounts for this nonindependence within a student.
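A nested design like the one described in this caption can be written in lme4 with the `/` operator. The sketch below is illustrative only: it assumes the lme4 package is installed, uses simulated data with just two nesting levels (sections within courses, rather than the full hierarchy in the figure), and all variable names and effect sizes are hypothetical.

```r
library(lme4)  # assumes the lme4 package is installed

# Simulated (hypothetical) data: 3 courses x 4 sections x 20 students.
set.seed(11)
d <- data.frame(
  course  = factor(rep(paste0("course", 1:3), each = 4 * 20)),
  section = factor(rep(rep(1:4, each = 20), times = 3))
)

# Section labels 1-4 are reused in every course, so a section is only
# identified by the combination of course and section.
sec_id      <- as.integer(interaction(d$course, d$section))
course_eff  <- rnorm(3, mean = 0, sd = 3)
section_eff <- rnorm(12, mean = 0, sd = 2)
d$score <- 70 + course_eff[as.integer(d$course)] + section_eff[sec_id] +
  rnorm(nrow(d), mean = 0, sd = 8)

# (1 | course/section) expands to a random intercept for course plus a
# random intercept for each section within a course.
mod_nested <- lmer(score ~ 1 + (1 | course/section), data = d)

# Number of clusters at each level. With so few courses, lme4 may print
# a singular-fit message; that is a message, not an error.
print(ngrps(mod_nested))
```

For the repeated-measures case, the analogous term would be a random intercept for student, for example `(1 | student)`, where `student` is a hypothetical identifier column.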
FIGURE 4. Researchers can build intuition to determine whether variables should be included as fixed effects or random effects by asking three questions and following this decision tree.
FIGURE 5. Depending on the structure of the data, a fixed effects–only model (A) may not be the best model. There are three main ways to specify random effects in multilevel models: (B) as random intercepts, in which the slope is the same for all groups, but the intercepts vary for each group; (C) as random slopes, in which the intercept is the same for all groups, but the slopes vary for each group; and (D) as random intercepts and random slopes, in which all groups are allowed to have different slopes and different intercepts. Illustrated here, students (points) are nested in six sections (illustrated by the six combinations of filled/unfilled symbols in different shapes). There are two treatments, illustrated by filled and unfilled symbols. The lines show how each approach models the relationship between PreScore and PostScore. In a fixed effects–only model (A), a single regression line is fit for the treatment (filled) and control (unfilled symbol), ignoring sections; in a random effects model (B–D), separate regression lines are fit for each section. In the random intercepts model (B), the intercepts are allowed to vary by section but not the slopes (thus, the lines are parallel); in the random slopes model (C), the slopes are allowed to vary for each section, but not the intercepts (thus, all the lines start at the same place); and in the random intercepts and random slopes model (D), both the intercepts and the slopes are allowed to vary for each section. The overall model fit in B–D is essentially the weighted average of the regression lines for each group and is not shown.
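The four model structures in panels A–D correspond to different formula terms in lme4. The following is a minimal sketch on simulated data, assuming the lme4 package is installed; the data, effect sizes, and the assignment of sections to treatments are hypothetical and are not taken from the paper.

```r
library(lme4)  # assumes the lme4 package is installed

# Simulated (hypothetical) data: 6 sections with 40 students each.
set.seed(42)
n_sections <- 6
n_students <- 40
d <- data.frame(
  Section  = factor(rep(LETTERS[1:n_sections], each = n_students)),
  PreScore = rnorm(n_sections * n_students, mean = 50, sd = 10)
)
# Treatment is assigned at the section level (assumption for illustration).
d$Treatment <- factor(ifelse(d$Section %in% c("A", "B", "C"),
                             "control", "treatment"))
section_intercept <- rnorm(n_sections, mean = 0, sd = 4)
d$PostScore <- 20 + 0.8 * d$PreScore +
  3 * (d$Treatment == "treatment") +
  section_intercept[as.integer(d$Section)] +
  rnorm(nrow(d), mean = 0, sd = 6)

# (A) Fixed effects only: sections are ignored.
mod_a <- lm(PostScore ~ PreScore + Treatment, data = d)

# (B) Random intercepts: each section gets its own intercept.
mod_b <- lmer(PostScore ~ PreScore + Treatment + (1 | Section), data = d)

# (C) Random slopes only: each section gets its own PreScore slope.
mod_c <- lmer(PostScore ~ PreScore + Treatment + (0 + PreScore | Section),
              data = d)

# (D) Random intercepts and random slopes.
mod_d <- lmer(PostScore ~ PreScore + Treatment + (1 + PreScore | Section),
              data = d)

# Section-specific deviations from the overall intercept and slope in (D).
# With only six sections, lme4 may print singular-fit or convergence
# messages for (C) and (D); these are not errors.
print(ranef(mod_d)$Section)
```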
FIGURE 6. Steps in random effect model selection, as recommended by Zuur and Burnham and Anderson (2002). This example was implemented in R.
FIGURE 7. R output from code in Figure 6: (A) step 3 of model selection (comparing AIC values for each model that was fit); (B) model 1, which did not include any random effects; and (C) the best-fitting multilevel model with student as the only random effect. Note that treatment was retained in the final model, indicating a treatment effect and supporting the hypothesis that the treatment is positively correlated with postscore.
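The AIC comparison of candidate random effect structures can be sketched as follows. This is not the paper's code or data: it is a simulation loosely modeled on the extended example (two sections, three posttests per student), it assumes the lme4 package is installed, and the mapping of topics to treatments is a guess made only so that section A receives the control twice and section B receives the treatment twice. All models are fit by maximum likelihood (`REML = FALSE`) so that their AIC values are on the same likelihood scale as the `lm` fit; the procedure in the paper may differ.

```r
library(lme4)  # assumes the lme4 package is installed

# Simulated (hypothetical) data: 60 students, each measured on 3 topics.
set.seed(7)
n_students <- 60
d <- expand.grid(Student = factor(seq_len(n_students)),
                 Topic   = factor(c("Q", "R", "S")))
d$Section <- factor(ifelse(as.integer(d$Student) <= n_students / 2, "A", "B"))

# Assumed design: section A is treated on topic R only;
# section B is treated on topics Q and S.
treated <- (d$Section == "A" & d$Topic == "R") |
           (d$Section == "B" & d$Topic != "R")
d$Treatment <- factor(ifelse(treated, "treatment", "control"))

d$PreScore <- rnorm(nrow(d), mean = 5, sd = 2)
student_intercept <- rnorm(n_students, mean = 0, sd = 2)
d$PostScore <- 4 + 0.5 * d$PreScore + 1.5 * treated +
  student_intercept[as.integer(d$Student)] +
  rnorm(nrow(d), mean = 0, sd = 1.5)

# Same fixed effects in every model; only the random effects differ.
m1 <- lm(PostScore ~ PreScore + Treatment, data = d)
m2 <- lmer(PostScore ~ PreScore + Treatment + (1 | Student),
           data = d, REML = FALSE)
m3 <- lmer(PostScore ~ PreScore + Treatment + (1 | Section),
           data = d, REML = FALSE)
m4 <- lmer(PostScore ~ PreScore + Treatment + (1 | Student) + (1 | Section),
           data = d, REML = FALSE)

# Lower AIC indicates a better trade-off between fit and complexity.
# With only two sections, singular-fit messages for m3 and m4 are expected.
aic  <- sapply(list(m1 = m1, m2 = m2, m3 = m3, m4 = m4), AIC)
best <- names(which.min(aic))
print(aic)
print(best)
```

Once a random effect structure is chosen, the same comparison can be repeated over candidate fixed effects while holding the random effects constant.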
Random effects can be implemented in regression models that model various types of outcome variables<sup>a</sup>
| Outcome data type | Example in DBER | Regression type (R function) | With random effect (R function) | R package | R syntax |
|---|---|---|---|---|---|
| Continuous | Exam points | Linear model (lm) | Linear mixed effects model (lmer) | lme4<sup>b</sup> | Mod <- lmer(outcome ~ predictor + (1 \| group), data = data) |
| Binary (0/1; yes/no) | Pass/fail | Binomial (glm, family = binomial) | Generalized linear mixed effects model (glmer) | lme4<sup>b</sup> | Mod <- glmer(outcome ~ predictor + (1 \| group), family = binomial, data = data) |
| Proportion | Proportion of classes attended | Binomial (glm, family = binomial) | Generalized linear mixed effects model (glmer) | lme4<sup>b</sup> | Mod <- glmer(cbind(numerator, denominator - numerator) ~ predictor + (1 \| group), family = binomial, data = data) |
| Count | Number of hand-raises | Poisson (glm, family = poisson) | Generalized linear mixed effects model (glmer) | lme4<sup>b</sup> | Mod <- glmer(outcome ~ predictor + (1 \| group), family = poisson, data = data) |
| Likert; categorical ordinal | Agree–neutral–disagree | Proportional odds or ordered logit (polr) | Cumulative link mixed model (clmm) | ordinal<sup>c</sup> | Mod <- clmm(as.factor(outcome) ~ predictor + (1 \| group), data = data) |
<sup>a</sup> Some of the most common types of discipline-based education research (DBER) outcome variables can be categorized as continuous, binary, proportion, count, or on a Likert scale. This table shows the most common types of data in DBER and the corresponding implementation of multilevel models in R, including a recommended R package and corresponding syntax for model specification. In the syntax column, group is a placeholder for the clustering variable (e.g., section or student); for a binomial model of a proportion, the two-column response gives the number of successes followed by the number of failures.
<sup>b</sup> Bates .
<sup>c</sup> Christensen, 2018.
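To show how the generalized rows of the table look in practice, the following sketch fits a binary outcome and a count outcome with a random intercept for section. It assumes the lme4 package is installed; the data are simulated, and the column names (`pass`, `hand_raises`, `section`) and effect sizes are hypothetical.

```r
library(lme4)  # assumes the lme4 package is installed

# Simulated (hypothetical) data: 12 sections with 30 students each.
set.seed(3)
n_sections <- 12
n_students <- 30
d <- data.frame(
  section   = factor(rep(seq_len(n_sections), each = n_students)),
  predictor = rnorm(n_sections * n_students)
)
section_effect <- rnorm(n_sections, mean = 0, sd = 0.8)
sec <- as.integer(d$section)

# Binary outcome (pass/fail) generated on the logit scale.
d$pass <- rbinom(nrow(d), size = 1,
                 prob = plogis(-0.2 + 0.9 * d$predictor + section_effect[sec]))

# Count outcome (hand-raises) generated on the log scale.
d$hand_raises <- rpois(nrow(d),
                       lambda = exp(0.5 + 0.3 * d$predictor +
                                    0.5 * section_effect[sec]))

# Without a random effect: ordinary logistic regression.
mod_glm <- glm(pass ~ predictor, family = binomial, data = d)

# With a random intercept for section.
mod_binary <- glmer(pass ~ predictor + (1 | section),
                    family = binomial, data = d)
mod_count  <- glmer(hand_raises ~ predictor + (1 | section),
                    family = poisson, data = d)

# Fixed-effect estimates are on the logit (binary) and log (count) scales.
print(fixef(mod_binary))
print(fixef(mod_count))
```

Note that R is case-sensitive, so the family must be written as `poisson`, not `Poisson`.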