Denise Kerkhoff1,2, Fridtjof W Nussbeck1,2.
Abstract
In educational psychology, observational units are oftentimes nested within superordinate groups. Researchers need to account for hierarchy in the data by means of multilevel modeling, but especially in three-level longitudinal models, it is often unclear which sample size is necessary for reliable parameter estimation. To address this question, we generated a population dataset based on a study in the field of educational psychology, consisting of 3000 classrooms (level-3) with 55000 students (level-2) measured at 5 occasions (level-1), including predictors on each level and interaction effects. Drawing from this data, we realized 1000 random samples each for various sample and missing value conditions and compared analysis results with the true population parameters. We found that sampling at least 15 level-2 units each in 35 level-3 units results in unbiased fixed effects estimates, whereas higher-level random effects variance estimates require larger samples. Overall, increasing the level-2 sample size most strongly improves estimation soundness. We further discuss how data characteristics influence parameter estimation and provide specific sample size recommendations.
Keywords: parameter estimation; power analysis; random effects model; sample size; three-level model
Year: 2019 PMID: 31164847 PMCID: PMC6536630 DOI: 10.3389/fpsyg.2019.01067
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
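The sampling scheme described in the abstract (a fixed number of classrooms, then a fixed number of students within each sampled classroom) can be sketched as a two-stage random draw. This is only a minimal illustration: the toy population, the function name `draw_sample`, and the rule of skipping classrooms that are too small are assumptions, not details taken from the study.

```python
import random

def draw_sample(population, n3, n2, rng):
    """Draw n3 classrooms, then n2 students within each sampled classroom.

    population: dict mapping classroom id -> list of student ids.
    ASSUMPTION: classrooms with fewer than n2 students are not eligible.
    """
    eligible = [c for c, students in population.items() if len(students) >= n2]
    classrooms = rng.sample(eligible, n3)
    return {c: rng.sample(population[c], n2) for c in classrooms}

rng = random.Random(2019)

# Hypothetical toy population: 300 classrooms with 15 to 25 students each.
population = {
    c: [f"class{c}_student{s}" for s in range(rng.randint(15, 25))]
    for c in range(300)
}

# One realization of the "15 students each in 35 classrooms" condition.
sample = draw_sample(population, n3=35, n2=15, rng=rng)
print(len(sample), "classrooms,", sum(len(v) for v in sample.values()), "students")
```

Repeating the draw 1000 times per condition and fitting the model to each draw would yield the replicate estimates that are compared with the population parameters.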
Investigated parameters of studies examining the impact of sample sizes on estimation results in two-level models.
| Authors (Year) | Simulation conditions: | Other conditions | |
|---|---|---|---|
| M | Missing values (singletons): 0%, 10%, 30%, 50%, MCAR and MAR | ||
| S | |||
| P | ICC: 0.1, 0.2, 0.3, 0.5, 0.7, ε2 | ||
| M | Logistic model, | Estimation: generalized linear mixed model and hierarchical linear model | |
| S | |||
| P | γ00 = -2.47, τ2 | ||
| M | |||
| S | |||
| P | ICC: 0.1, 0.2, 0.3, ε2 | ||
| M | – | ||
| S | |||
| P | ICC: 0.1, 0.2, 0.3, ε2 | ||
| M | Logistic model, | – | |
| S | |||
| P | ICC: 0.04, 0.17, 0.38, τ2 | ||
| M | Variable reliability: | ||
| S | 0.8, 0.9, 1.0 | ||
| P | ICC: 0.15, 0.30, ε2 | level 1 slopes | |
Selected results and recommendations of studies examining the impact of sample sizes on estimation results in three-level models.
| Authors (Year) | Results | Sample size recommendations |
|---|---|---|
| Higher MDESD/smaller power for higher-level | – | |
| Smaller MDESD/larger power for | ||
| Power ≥80% for level-1/level-2 | ||
| Fixed effects: | If the third level is incidental and not of research interest, a fixed-effects-only model is appropriate, as results are acceptable even for | |
| τ2 | ||
| coverage < 0.9 for level-3 predictors | ||
| REML gives better results than ML | ||
| – | ||
| Sample sizes should be large at the level where randomization takes place (i.e., level-2 randomization requires larger N2, level-3 randomization requires larger N3) | ||
| power ≥ 80% for | ||
| power ≥ 80% for | ||
| Numbers of measurements, adding a significant slope predictor or 25% dropout have negligible effect on power. | ||
| ES = 0.3: power ≥ 0.8 for | To increase power, more N3 units should be sampled, since the impact of N3 on power is larger than that of N2 | |
| ES = 0.4: power ≥ 0.8 for | ||
| ES = 0.5: power ≥ 0.8 for | ||
| Empirical power < theoretical power | ||
| Increasing N3 impacts power in all conditions | – | |
| N1 and N2 have non-substantial effect on power | ||
| Two-level analyses decrease power substantially | ||
| Including covariates increases power, e.g., | ||
| ES = 0.5, ICC3 = 0.1, ICC2 = 0.067: power = 89% with covariates (power = 66% without covariates) | ||
Investigated parameters of studies examining the impact of sample sizes on estimation results in three-level models.
| Authors (Year) | Simulation conditions: | Other conditions | |
|---|---|---|---|
| M | Moderator slope variation: Non-randomly varying or random | ||
| S | N3: 40, 80, N2: 5, 10, N1: 20 | ||
| P | ICC3 = 0.15, ICC2 = 0.08, variance explained by | ||
| M | Estimator: ML, REML or corrected REML; Model: three-level or misspecified as two-level model | ||
| S | N3: 4, 7, 10, N2: 15 to 25, N1: 10 to 20 | ||
| P | ICC3 = 0.05, 0.15, ICC2 = 0.2 ES = 0.1 ( | ||
| ε2 | |||
| M | – | ||
| S | |||
| P | ICC3 = 0.03, ICC2 = 0.02 | ||
| M | Dropout: 0–25%, early, throughout the study duration, or late | ||
| S | |||
| P | ICC3 = 0.18, ICC2 = 0.75, δ000 = 78.37/78.38 (model B), δ100 = -12.3/-10.56, | ||
| δ010 = -4.16, ε2 | |||
| M | – | ||
| S | |||
| P | ICC3 = 0.01, 0.05, 0.1, ICC2 = 0.4, 0.5, 0.6, δ001 = 0.3, 0.4, 0.5, δ000 = 0 | ||
| M | Misspecification (two-level analysis) | ||
| S | |||
| P | ICC3 = 0.1, 0.2, ICC2 = 0.067, 0.134, ES = 0.2, 0.5, variance explained by covariates: 50% | ||
Selected results and recommendations of studies examining the impact of sample sizes on estimation results in two-level models.
| Authors (Year) | Results | Sample size recommendations |
|---|---|---|
| τ2 | coverage: | |
| Fixed effects and r.e. variance component | ||
| Fixed effects and r.e. variance component | ||
| High number of groups | ||
| fixed effect: 93–95% | Standard error estimates not interpretable | |
| ε2 | Non-parametric approach is advised | |
| τ2 | ||
| Fixed and random effects | ||
| 15% underestimation of τ2 | ||
| fixed effect standard errors: 5–6% | ||
| ε2 | ||
| τ2 | ||
| Fixed effects overall | ||
| τ2 | ||
| Fixed effects and r.e. variance estimates | ||
| Fixed effect: 3–7% | ||
| τ2 | ||
| τ2 | ||
| Average power across all conditions: 0.192 | Power to detect CLI can be increased by sampling more N1 units rather than N2 units. N1 sample size is most important if lower-level effects are additionally of interest. | |
| CLI power mainly influenced by CLI effect size, N1, N2, and standard deviation of slopes. | ||
| power > 80% for | ||
ML growth curve results by Maulana et al. (2013).
| Influence on controlled motivation | ||
|---|---|---|
| Variable | Coefficient | |
| Fixed effects level 1 | ||
| | 0.0860 | |
| | 0.0205 | |
| | 0.0694 | |
| Fixed effects level 2 | ||
| | 0.0647 | |
| Fixed effects level 3 | ||
| subject taught | 0.0314 | 0.0711 |
| | 0.0781 | |
| Interaction effects | ||
| | 0.0107 | |
| time × subject taught | -0.0055 | 0.0106 |
| | 0.0081 | |
| | - | 0.0015 |
| | - | 0.0848 |
| Random variance level 3 | ||
| | 0.0073 | |
| intercept × time | 0.0001 | 0.0008 |
| | 0.0002 | |
| Random variance level 2 | ||
| | 0.0346 | |
| intercept × time | -0.0069 | 0.0040 |
| | 0.0006 | |
| Random variance level 1 | ||
| | 0.0123 | |
Percentage of students' data used for analysis for each missing value pattern.
| T0 | T1 | T2 | T3 | T4 | |
|---|---|---|---|---|---|
| COM | 100% | 100% | 100% | 100% | 100% |
| MCAR | R | R | R | R | R |
| DrOP2 | 100% | 100% | 100% | 80% | 80% |
| DrOP3 | 100% | 100% | 80% | 80% | 80% |
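The two dropout patterns in the table can be illustrated with a small sketch in which 20% of students lose all measurements from a given occasion onward (from T3 for DrOP2, from T2 for DrOP3). How dropouts were actually selected is not stated here, so choosing them completely at random is an assumption, as are the toy data and the helper names.

```python
import random

def apply_dropout(data, first_missing_wave, dropout_rate, rng):
    """Return a copy of data in which a random share of students drops out.

    data: dict mapping student id -> list of scores for T0..T4.
    Dropouts keep their scores before first_missing_wave and get None afterwards.
    ASSUMPTION: dropouts are selected completely at random.
    """
    students = list(data)
    n_drop = round(dropout_rate * len(students))
    dropouts = set(rng.sample(students, n_drop))
    result = {}
    for student, scores in data.items():
        if student in dropouts:
            n_missing = len(scores) - first_missing_wave
            result[student] = scores[:first_missing_wave] + [None] * n_missing
        else:
            result[student] = list(scores)
    return result

def observed_share(data, wave):
    """Share of students with an observed score at the given wave."""
    return sum(scores[wave] is not None for scores in data.values()) / len(data)

rng = random.Random(1)
# Hypothetical complete data: 100 students, 5 occasions.
complete = {s: [rng.gauss(3.5, 0.7) for _ in range(5)] for s in range(100)}

drop2 = apply_dropout(complete, first_missing_wave=3, dropout_rate=0.2, rng=rng)
drop3 = apply_dropout(complete, first_missing_wave=2, dropout_rate=0.2, rng=rng)

for name, data in [("DrOP2", drop2), ("DrOP3", drop3)]:
    print(name, [f"{observed_share(data, t):.0%}" for t in range(5)])
```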
Population characteristics and values reported by Maulana et al. (2013).
| | | | Fraction [Study fraction] in % |
|---|---|---|---|
| Influence | 0.42 [0.42] | 0.36 [0.36] | – |
| Motivation | 3.54 [3.55] | 0.73 [0.70] | – |
| Gender | |||
| girls | – | – | 55.67 [56] |
| boys | – | – | 44.33 [44] |
| Classtype | |||
| heterogeneous | – | – | 50.70 [52] |
| homogeneous | – | – | 49.30 [48] |
Results of the MLA for the population.
| Model: influence on controlled motivation | ||
|---|---|---|
| Variable | Coefficient | |
| intercept | 3.2110 | |
| time | 0.0529 | |
| influence | 0.1465 | |
| gender | 0.0083 | |
| classtype | 0.2845 | |
| classtype × time | 0.0100 | |
| gender × time | 0.0057 | |
| influence × classtype | -0.1419 | |
| time × time | -0.0038 | |
| intercept | 0.0045 | |
| time | 0.0002 | |
| intercept | 0.1926 | |
| time | 0.0009 | |
| residual | 0.2626 | |
Random effects variance components and ICC values for the multilevel model without predictors (empty model) for the population data.
| Variance component | ICC | |
|---|---|---|
| Level-1 (Residual) | ε2 | – |
| Level-2 Intercept | τ2 | ρlevel2 (a) = 0.44 ρlevel2 (b) = 0.39 |
| Level-3 Intercept | σ2 | ρclasses = 0.05 |
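The relation between the three variance components and the ICC values in the table can be sketched as follows. The two level-2 definitions are an assumption inferred from the reported numbers (0.44 − 0.39 = 0.05, the level-3 ICC): variant (a) is taken to include the level-3 variance in the numerator, variant (b) to exclude it. The variance values below are hypothetical and chosen only so that the output matches the tabled ICCs.

```python
def icc_three_level(eps2, tau2, sigma2):
    """Intraclass correlations for a three-level empty model.

    eps2:   level-1 residual variance
    tau2:   level-2 intercept variance
    sigma2: level-3 intercept variance
    ASSUMPTION: (a) = (tau2 + sigma2) / total, (b) = tau2 / total.
    """
    total = eps2 + tau2 + sigma2
    return {
        "level3": sigma2 / total,
        "level2_a": (tau2 + sigma2) / total,
        "level2_b": tau2 / total,
    }

# Hypothetical variance components (not reported in this section).
icc = icc_three_level(eps2=0.56, tau2=0.39, sigma2=0.05)
for name, value in icc.items():
    print(f"{name}: {value:.2f}")
```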
Effect sizes of predictors and random effects variance components.
| | Regression coefficient | Standard deviation of predictor variable | Effect size |
|---|---|---|---|
| Predictors level 1 | |||
| time | 0.05 | 3.63 | 0.75 |
| influence | 0.15 | 0.36 | 0.21 |
| Predictor level 2 | |||
| gender | 0.01 | – | 0.02 |
| Predictor level 3 | |||
| classtype | 0.28 | – | 0.56 |
| Interaction effects | |||
| time × time | -0.0038 | 37.51 | -0.55 |
| gender × time | 0.01 | 3.51 | 0.08 |
| classtype × time | 0.01 | 3.40 | 0.13 |
| influence × classtype | -0.14 | 0.33 | -0.18 |
| Level-2 intercept (a) | 0.20 | 0.44 | 1.76 |
| Level-2 intercept (b) | 0.20 | 0.39 | 1.59 |
| Level-3 intercept | 0.03 | 0.05 | 0.46 |
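The fixed-effect rows of this table can be checked against the population model with a short calculation. The formulas used here are an assumption, since they are not stated in this section: 2 · b · SD(x) / σe for continuous predictors and b / σe for dichotomous ones, where σe is the square root of the level-1 residual variance (0.2626). The coefficients are the four-decimal values from the population MLA table.

```python
import math

# Level-1 residual standard deviation from the population model (variance 0.2626).
SIGMA_E = math.sqrt(0.2626)

def effect_size(coefficient, sd_predictor=None):
    """Standardized effect size under the ASSUMED formulas:
    continuous predictor: 2 * b * SD_x / sigma_e; dichotomous predictor: b / sigma_e.
    """
    if sd_predictor is None:
        return coefficient / SIGMA_E
    return 2 * coefficient * sd_predictor / SIGMA_E

# name: (coefficient, SD of predictor or None if dichotomous, tabled effect size)
rows = {
    "time": (0.0529, 3.63, 0.75),
    "influence": (0.1465, 0.36, 0.21),
    "gender": (0.0083, None, 0.02),
    "classtype": (0.2845, None, 0.56),
    "time x time": (-0.0038, 37.51, -0.55),
    "gender x time": (0.0057, 3.51, 0.08),
    "classtype x time": (0.0100, 3.40, 0.13),
    "influence x classtype": (-0.1419, 0.33, -0.18),
}

computed = {name: effect_size(b, sd) for name, (b, sd, _) in rows.items()}
for name, value in computed.items():
    print(f"{name:22s} computed {value:6.2f}   tabled {rows[name][2]:6.2f}")
```

Under these assumptions the computed values agree with the tabled effect sizes to within about 0.01, which is consistent with rounding of the inputs. The random-effect rows are not covered by this sketch.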
Parameter estimation bias (peb) of samples without missing values for fixed effects.
| Level 1 | Level 2 | Level 3 | Interaction effects | ||||||
|---|---|---|---|---|---|---|---|---|---|
| N2/N3 | Intercept | Time | Influence | Gender | Classtype | time × time | classtype × time | gender × time | influence × classtype |
| 5/15 | 0.003 | 0.056 | -0.054 | 0.011 | 0.003 | ||||
| 5/35 | 0.005 | 0.003 | -0.040 | -0.087 | |||||
| 5/55 | 0.006 | 0.025 | -0.014 | ||||||
| 15/15 | -0.003 | 0.007 | 0.024 | 0.026 | 0.002 | 0.082 | 0.017 | ||
| 15/35 | -0.002 | 0.007 | 0.032 | 0.035 | 0.011 | -0.088 | 0.063 | ||
| 15/55 | -0.003 | 0.019 | 0.028 | 0.038 | 0.025 | 0.094 | 0.049 | ||
| 35/15 | < 0.001 | -0.012 | 0.002 | - | 0.012 | -0.009 | 0.034 | 0.056 | -0.012 |
| 35/35 | 0.001 | -0.011 | 0.010 | 0.012 | < 0.001 | 0.031 | 0.004 | ||
| 35/55 | < 0.001 | -0.006 | 0.011 | 0.015 | 0.001 | 0.045 | 0.004 | ||
Parameter estimation bias (peb) of samples without missing values for variances of random effects.
| N2/N3 | Level-3 intercept | Level-3 slope | Level-2 intercept | Level-2 slope | Level-1 residual |
|---|---|---|---|---|---|
| 5/15 | -0.067 | 0.002 | |||
| 5/35 | -0.066 | 0.060 | 0.007 | ||
| 5/55 | 0.063 | -0.060 | 0.090 | 0.011 | |
| 15/15 | -0.025 | -0.040 | 0.001 | ||
| 15/35 | 0.099 | -0.014 | -0.031 | < 0.001 | |
| 15/55 | 0.084 | -0.014 | -0.034 | 0.002 | |
| 35/15 | 0.010 | 0.004 | -0.014 | -0.001 | |
| 35/35 | -0.078 | -0.028 | 0.012 | -0.009 | -0.002 |
| 35/55 | -0.105 | -0.028 | 0.013 | -0.013 | -0.002 |
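The bias values in the two tables above compare replicate estimates with the true population parameters. A minimal sketch of such a measure is given below, assuming the usual relative-bias definition (mean estimate minus true value, divided by the true value). The exact formula is not given in this section, and the replicate estimates are hypothetical.

```python
from statistics import mean

def parameter_estimation_bias(estimates, true_value):
    """Relative bias: (mean of replicate estimates - true value) / true value.

    ASSUMPTION: this is the definition behind the tabled peb values.
    """
    if true_value == 0:
        raise ValueError("relative bias is undefined for a true value of zero")
    return (mean(estimates) - true_value) / true_value

# True population coefficient for classtype (from the population MLA table).
true_classtype = 0.2845

# Hypothetical estimates from three simulation runs (the study used 1000).
estimates = [0.30, 0.28, 0.29]

peb = parameter_estimation_bias(estimates, true_classtype)
print(f"peb = {peb:.3f}")
```

A positive value indicates overestimation of the population parameter, a negative value underestimation.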
FIGURE 1. Parameter estimation bias (peb) for level-2 predictor gender, cross-level interaction classtype × time, and level-3 random intercept variance, in relation to the standard deviation of their estimated parameter values over 1000 simulation runs for each sample size condition. Shades differentiate between numbers of students per class (N2), shapes differentiate between numbers of classrooms per sample (N3). As results are very similar between samples with and without missing values, plots do not differentiate between missing value patterns.
FIGURE 2. Statistical power for the time effect, level-1 predictor influence, level-3 predictor classtype, and the cross-level interaction of influence × classtype. The sample size condition is displayed as “level-2 sample size/level-3 sample size,” e.g., 5/15 corresponds to “5 students each in 15 sampled classrooms.” The missing value conditions (see section “Materials and Methods” and Table 6) are displayed by the type and shade of lines. The light dotted line marks the minimum sufficient power of 80%.
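In a simulation study of this kind, power for an effect is commonly estimated as the share of replications in which that effect is statistically significant. The sketch below assumes this definition and a significance level of 0.05; neither is stated in this section, and the p-values are hypothetical.

```python
def empirical_power(p_values, alpha=0.05):
    """Share of simulation runs in which the effect is significant at alpha.

    ASSUMPTION: power is estimated as the rejection rate across replications.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    return sum(p < alpha for p in p_values) / len(p_values)

# Hypothetical p-values for one effect from ten runs (the study used 1000).
p_values = [0.001, 0.03, 0.20, 0.04, 0.60, 0.01, 0.049, 0.002, 0.08, 0.03]

power = empirical_power(p_values)
sufficient = power >= 0.80  # 80% threshold marked by the dotted line in Figure 2
print(f"power = {power:.2f}, sufficient = {sufficient}")
```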