Carolyn Beth McNabb, Kou Murayama.
Abstract
Nested data structures create statistical dependence that influences the effective sample size and statistical power of a study. Several methods are available for dealing with nested data, including the summary-statistics approach and multilevel modelling (MLM). Recent publications have heralded MLM as the best method for analysing nested data, claiming benefits in power over summary-statistics approaches (e.g., the t-test). However, when cluster size is equal, these approaches are mathematically equivalent. We conducted statistical simulations demonstrating equivalence of MLM and summary-statistics approaches for analysing nested data and provide supportive cases for the utility of the conventional summary-statistics approach in nested experiments. Using statistical simulations, we demonstrate that losses in power in the summary-statistics approach discussed in the previous literature are unsubstantiated. We also show that MLM sometimes suffers from frequent singular fit errors, especially when intraclass correlation is low. There are indeed many situations in which MLM is more appropriate and desirable, but researchers should be aware of the possibility that simpler analysis (i.e., summary-statistics approach) does an equally good or even better job in some situations.
Keywords: Clustering; Hierarchical linear model; Mixed model; t-test
Year: 2021 PMID: 36246511 PMCID: PMC9559079 DOI: 10.1016/j.crneur.2021.100024
Source DB: PubMed Journal: Curr Res Neurobiol ISSN: 2665-945X
Fig. 1. Examples of nested data where clusters are nested within conditions (a and c) and conditions are nested within clusters (b and d). Images adapted from freepik.com and vecteezy.com.
Equivalence of multilevel model and summary-statistics approaches for analysing data with equal cluster sizes.
| | Dataset A | Dataset B |
|---|---|---|
| Nesting description | Clusters within conditions | Conditions within clusters |
| Model formula | Intensity ∼ Condition + (1 \| Cluster) | Intensity ∼ Condition + (1 + Condition \| Cluster) |
| Intercept | .77 (.22) | .49 (.20) |
| Slope | .83 (.31) | .49 (.20) |
| Intercept variance | .23 | .35 |
| Slope variance | | .25 |
| Correlation | | -.01 |
| Within cluster (residual) variance | .51 | .53 |
| Two sample | | |
| Paired samples | | |
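As a rough illustration of the summary-statistics approach behind the "Two sample" and "Paired samples" rows, the sketch below reduces each cluster to a mean and then runs an ordinary t-test on those means. It is not the authors' simulation code. The number of clusters `K`, cluster size `M`, `ICC`, `EFFECT`, and the seed are hypothetical values chosen for the example, total variance is fixed at 1, and the second design omits the random slope that appears in the Dataset B model.

```python
import math
import random
import statistics


def two_sample_t(a, b):
    """Pooled-variance t statistic for two independent sets of cluster means."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(pooled * (1 / na + 1 / nb))


def paired_t(diffs):
    """One-sample t statistic on within-cluster condition differences."""
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical settings (assumptions, not values from the paper).
rng = random.Random(2021)
K, M, ICC, EFFECT = 10, 8, 0.5, 0.8
SD_B, SD_W = math.sqrt(ICC), math.sqrt(1 - ICC)  # between / within SDs


def cluster_mean(shift):
    """Mean of one simulated cluster whose observations are shifted by `shift`."""
    u = rng.gauss(0.0, SD_B)  # random intercept for this cluster
    return statistics.mean(u + shift + rng.gauss(0.0, SD_W) for _ in range(M))


# Clusters nested within conditions: K separate clusters per condition,
# compared with a two-sample t-test on cluster means (df = 2K - 2).
control = [cluster_mean(0.0) for _ in range(K)]
treated = [cluster_mean(EFFECT) for _ in range(K)]
t_a = two_sample_t(control, treated)

# Conditions nested within clusters: every cluster contributes M observations
# per condition, so the cluster intercept cancels in the difference (df = K - 1).
diffs = []
for _ in range(K):
    u = rng.gauss(0.0, SD_B)
    mean_0 = statistics.mean(u + rng.gauss(0.0, SD_W) for _ in range(M))
    mean_1 = statistics.mean(u + EFFECT + rng.gauss(0.0, SD_W) for _ in range(M))
    diffs.append(mean_1 - mean_0)
t_b = paired_t(diffs)

print(f"two-sample t on cluster means: {t_a:.2f} (df = {2 * K - 2})")
print(f"paired t on cluster differences: {t_b:.2f} (df = {K - 1})")
```

Because every cluster has the same size here, each cluster mean carries the same weight, which is the equal-cluster-size condition under which the abstract describes the two approaches as equivalent.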
Fig. 2. Increased power and associated increases in Type I error rate with use of multilevel modelling (MLM) with log-likelihood test (purple) compared with summary-statistics approach (red) when number of clusters is few. Power (a) and Type I error rate (b) are shown for all 10,000 simulations, including data that resulted in singular fit or convergence warnings with MLM. Data are shown for small (solid circle), medium (solid triangle) and large (solid square) effect sizes (Cohen’s d) and intra-class correlation (ICC) corresponding to the majority of total variance being due to within-group variance (ICC = .1, dotted line) and more total variance being due to between-group variance (ICC = .5, dashed line). (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
Fig. 3. Proportion of total simulations resulting in singular fit errors when using restricted maximum likelihood estimation (REML) under conditions of varying effect size (Cohen’s d) and intra-class correlation (ICC). Plots show the proportion of simulations resulting in singular fits when a) ICC = .1 and b) ICC = .5.
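One way to see why singular fits become more frequent at low ICC is a small proxy simulation. The sketch below does not fit an MLM and does not reproduce the figure. It assumes a balanced random-intercept design with no condition effect, and it uses the ANOVA estimator of the between-cluster variance, (MSB - MSW) / m. For balanced data the REML estimate of that variance is this quantity truncated at zero, so a non-positive value is treated here as a stand-in for a boundary (singular) fit. The function names, cluster counts, and seed are hypothetical.

```python
import math
import random
import statistics


def between_variance_estimate(clusters):
    """ANOVA estimate of between-cluster variance for equal-sized clusters."""
    m = len(clusters[0])
    means = [statistics.mean(c) for c in clusters]
    msb = m * statistics.variance(means)  # between-cluster mean square
    msw = statistics.mean([statistics.variance(c) for c in clusters])  # within
    return (msb - msw) / m


def boundary_rate(n_clusters, cluster_size, icc, n_sims, seed=1):
    """Share of simulated datasets whose variance estimate is not positive."""
    rng = random.Random(seed)
    sd_b, sd_w = math.sqrt(icc), math.sqrt(1 - icc)  # total variance fixed at 1
    hits = 0
    for _ in range(n_sims):
        clusters = []
        for _ in range(n_clusters):
            u = rng.gauss(0.0, sd_b)  # random intercept for this cluster
            clusters.append([u + rng.gauss(0.0, sd_w) for _ in range(cluster_size)])
        if between_variance_estimate(clusters) <= 0:
            hits += 1
    return hits / n_sims


# Hypothetical design: 6 clusters of 5 observations, 2000 simulated datasets.
for icc in (0.1, 0.5):
    rate = boundary_rate(n_clusters=6, cluster_size=5, icc=icc, n_sims=2000)
    print(f"ICC = {icc}: proportion of boundary estimates = {rate:.3f}")
```

With a low ICC the between-cluster mean square is often smaller than the within-cluster mean square by chance, which pushes the variance estimate to the boundary. This matches the direction of the pattern described in the caption, though the exact proportions depend on the assumed design.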