Diego L Lorca-Puls, Andrea Gajardo-Vidal, Jitrachote White, Mohamed L Seghier, Alexander P Leff, David W Green, Jenny T Crinion, Philipp Ludersdorfer, Thomas M H Hope, Howard Bowman, Cathy J Price.
Abstract
This study investigated how sample size affects the reproducibility of findings from univariate voxel-based lesion-deficit analyses (e.g., voxel-based lesion-symptom mapping and voxel-based morphometry). Our effect of interest was the strength of the mapping between brain damage and speech articulation difficulties, as measured in terms of the proportion of variance explained. First, we identified a region of interest by searching on a voxel-by-voxel basis for brain areas where greater lesion load was associated with poorer speech articulation using a large sample of 360 right-handed English-speaking stroke survivors. We then randomly drew thousands of bootstrap samples from this data set that included either 30, 60, 90, 120, 180, or 360 patients. For each resample, we recorded effect size estimates and p values after conducting exactly the same lesion-deficit analysis within the previously identified region of interest and holding all procedures constant. The results show (1) how often small effect sizes in a heterogeneous population fail to be detected; (2) how effect size and its statistical significance vary with sample size; (3) how low-powered studies (due to small sample sizes) can greatly over-estimate as well as under-estimate effect sizes; and (4) how large sample sizes (N ≥ 90) can yield highly significant p values even when effect sizes are so small that they become trivial in practical terms. The implications of these findings for interpreting the results from univariate voxel-based lesion-deficit analyses are discussed.
Keywords: Deficit; Lesion; Lesion-symptom; Reproducibility; Speech production; Stroke; Voxel-based
Year: 2018 PMID: 29550526 PMCID: PMC6018568 DOI: 10.1016/j.neuropsychologia.2018.03.014
Source DB: PubMed Journal: Neuropsychologia ISSN: 0028-3932 Impact factor: 3.139
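The resampling procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic, the covariates are omitted, the effect size is a plain squared Pearson correlation, and the p value uses the Fisher z approximation rather than the voxel-wise test used in the study. All function and variable names (`pearson_r`, `approx_p`, `bootstrap`, `lesion_load`, `score`) are hypothetical, and 500 resamples per sample size is an arbitrary choice.

```python
import math
import random
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p(r, n):
    """Two-sided p value from the Fisher z approximation (not an exact test)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def bootstrap(x, y, n, n_resamples, rng):
    """Draw resamples of size n with replacement; return (R2, p) pairs."""
    pairs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(len(x)) for _ in range(n)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        pairs.append((r * r, approx_p(r, n)))
    return pairs


rng = random.Random(0)
# Synthetic stand-in for 360 patients (NOT the study data): a weak negative
# relation between lesion load and score, population R2 of roughly 0.12.
lesion_load = [rng.gauss(0, 1) for _ in range(360)]
score = [-0.35 * v + rng.gauss(0, math.sqrt(1 - 0.35 ** 2)) for v in lesion_load]

summary = {}
for n in (30, 60, 90, 120, 180, 360):
    pairs = bootstrap(lesion_load, score, n, 500, rng)
    r2 = [p[0] for p in pairs]
    summary[n] = {
        "min_r2": min(r2),
        "max_r2": max(r2),
        "prop_sig": sum(p[1] < 0.05 for p in pairs) / len(pairs),
    }

# One line per sample size: N, smallest R2, largest R2, proportion significant.
for n, s in summary.items():
    print(n, round(s["min_r2"], 2), round(s["max_r2"], 2), round(s["prop_sig"], 2))
```

With a weak underlying effect, the smallest samples should show the widest range of R2 estimates and the lowest proportion of significant resamples, which is the pattern the abstract summarises.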
Summary of demographic and clinical data for full sample.
| Variable | Mean | SD | Range |
|---|---|---|---|
| Age at stroke onset (years) | 54.4 | 12.9 | 17.2–86.5 |
| Age at testing (years) | 59.4 | 12.4 | 21.3–90.0 |
| Time post-stroke (years) | 4.9 | 5.2 | 0.2–36.0 |
| Education (years) | 14.5 | 3.2 | 10.0–30.0 |
| Lesion size (cm³) | 85.7 | 87.6 | 1.5–386.2 |

| Gender | Count |
|---|---|
| Males | 250 |
| Females | 110 |

| Task | Imp/Non | Mean | SD |
|---|---|---|---|
| Rep-N | 132/228 | 54.4 | 9.1 |
| Writt-PN | 105/255 | 58.6 | 8.7 |
| Recog-M | 37/323 | 53.9 | 7.0 |
| Sem-A | 36/324 | 56.6 | 6.1 |
| AW-P | 77/283 | 57.0 | 6.8 |
Imp/Non = number of patients with impaired/non-impaired performance.
Missing data: three patients.
Fig. 1. Design matrix. The design matrix for Analysis 1 is shown, where the columns represent the subject-specific independent variables (IVs), with one value for each subject, and the rows correspond to the dependent variable (DV) indexing the degree of structural abnormality in the fuzzy lesion images.
Fig. 2. Lesion overlap map and region of interest from Analysis 1. (A) Lesion overlap map for the full sample of 360 stroke patients, depicting voxels that were damaged in a minimum of 5 and a maximum of 215 patients. The colour scale indicates the number of patients with overlapping lesions at each given voxel. (B) In red, the region of interest identified in Analysis 1 (i.e. 549 voxels) where a significant association between lesion load and speech articulation abilities was found. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Statistical power in the region of interest.
| %tile | Statistic | N = 30 | N = 60 | N = 90 | N = 120 | N = 180 | N = 360 |
|---|---|---|---|---|---|---|---|
| 0 | Power | 98% | 100% | 100% | 100% | 100% | 100% |
| | R2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |
| | P | 0.999 | 0.999 | 0.999 | 0.999 | 0.404 | 0.093 |
| 10 | Power | 99% | 100% | 100% | 100% | 100% | 100% |
| | R2 | 0.01 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 |
| | P | 0.638 | 0.218 | 0.064 | 0.015 | 0.001 | 0.000 |
| 20 | Power | 63% | 100% | 100% | 100% | 100% | 100% |
| | R2 | 0.03 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 |
| | P | 0.400 | 0.093 | 0.022 | 0.004 | 0.000 | 0.000 |
| 30 | Power | 86% | 100% | 100% | 100% | 100% | 100% |
| | R2 | 0.06 | 0.07 | 0.08 | 0.08 | 0.09 | 0.10 |
| | P | 0.250 | 0.046 | 0.009 | 0.002 | 0.000 | 0.000 |
| 40 | Power | 92% | 100% | 100% | 100% | 100% | 100% |
| | R2 | 0.08 | 0.09 | 0.10 | 0.10 | 0.10 | 0.11 |
| | P | 0.158 | 0.025 | 0.004 | 0.001 | 0.000 | 0.000 |
| 50 | Power | 98% | 100% | 100% | 100% | 100% | 100% |
| | R2 | 0.11 | 0.11 | 0.11 | 0.11 | 0.11 | 0.11 |
| | P | 0.099 | 0.012 | 0.002 | 0.000 | 0.000 | 0.000 |
| 60 | Power | 100% | 100% | 100% | 100% | 100% | 100% |
| | R2 | 0.15 | 0.14 | 0.13 | 0.13 | 0.13 | 0.12 |
| | P | 0.060 | 0.006 | 0.001 | 0.000 | 0.000 | 0.000 |
| 70 | Power | 83% | 100% | 100% | 100% | 100% | 100% |
| | R2 | 0.18 | 0.16 | 0.15 | 0.14 | 0.14 | 0.13 |
| | P | 0.032 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 |
| 80 | Power | 96% | 100% | 100% | 100% | 100% | 100% |
| | R2 | 0.23 | 0.19 | 0.17 | 0.16 | 0.15 | 0.14 |
| | P | 0.015 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 |
| 90 | Power | 100% | 100% | 100% | 100% | 100% | 100% |
| | R2 | 0.30 | 0.23 | 0.21 | 0.19 | 0.18 | 0.16 |
| | P | 0.004 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 100 | Power | 99% | 100% | 100% | 100% | 100% | 100% |
| | R2 | 0.79 | 0.52 | 0.39 | 0.39 | 0.38 | 0.28 |
| | P | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
The table shows that in all but one case, more than 80% of the voxels comprising the region of interest from Analysis 1 had sufficient statistical power to detect a significant lesion-deficit association at a threshold of p < 0.05 after correction for multiple comparisons. %tile = percentile of the effect size (R2) distribution; Power = percentage of voxels within the region of interest from Analysis 1 that had sufficient statistical power to detect a significant lesion-deficit association at a statistical threshold of p < 0.05 after correction for multiple comparisons; R2 = R2 value (at a particular decile); P = p value (at a particular decile).
Brain regions where lesion load is associated with speech articulation abilities.
| Region | x | y | z | Z-score | PFWE-corr | Extent (voxels) | Cluster-level P |
|---|---|---|---|---|---|---|---|
| Post-Central | −60 | −16 | 12 | 5.8 | 0.000 | 549 | < 0.001 |
| | −52 | −14 | 24 | 4.7 | 0.009 | | |
| | −56 | −12 | 18 | 4.6 | 0.012 | | |
| Posterior Insula | −40 | −16 | 8 | 5.3 | 0.001 | | |
| Anterior SMG | −66 | −30 | 20 | 4.7 | 0.008 | | |
| WM | −48 | −24 | 26 | 4.6 | 0.010 | | |
The table shows representative (peak) voxels where a significant association between stroke damage and difficulties articulating speech was found. All were in the left hemisphere and the coordinates are reported in MNI space. SMG = supramarginal gyrus; WM = white matter; PFWE-corr = p value corrected (family-wise error correction) for multiple comparisons.
At a cluster-forming voxel-wise threshold of p < 0.05 FWE-corrected.
Fig. 3. Effect of interest. Visual illustration of the strength of the relationship between lesion load in the region of interest and nonword repetition scores, after factoring out variance explained by the covariates of no interest (i.e. a plot of the lesion load and nonword repetition residuals; Analysis 1).
Fig. 4. Differential sensitivity of effect sizes and p values to sample size. The figure highlights that, while the mean and median of the effect size distributions remained relatively constant across the different sample sizes, the mean and median of the p value distributions exhibited substantial and systematic variability. Box plots depict medians with interquartile ranges and whiskers represent the 5th and 95th percentiles. The crosses indicate the mean for each sample size. The horizontal dashed line in red signals the R2 value obtained in Analysis 1 (including data from all 360 patients), whereas the horizontal dashed line in blue shows the standard alpha level (i.e. 0.05). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
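One way to see why p values shift with sample size while R2 stays roughly constant is the standard relation between a partial R2 and its F statistic. The sketch below assumes that the reported R2 is the partial R2 for a single regressor of interest; the exact test statistic is not given in this section, and the symbols ν (error degrees of freedom) and k (number of estimated model parameters) are introduced here only for illustration.

$$
F(1,\nu) = \frac{R^2}{1 - R^2}\,\nu, \qquad \nu = N - k
$$

For R2 = 0.11 the ratio R2 / (1 − R2) is about 0.124, so under this assumption F grows in direct proportion to ν. The same effect size therefore yields an ever smaller p value as N increases.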
Fig. 5. Distribution of R2 and p values. (A) From left to right, the frequency (in intervals of 0.02) and probability distributions of effect sizes for each sample size. The vertical dotted lines indicate the boundary between non-significant (p ≥ 0.05; to the left) and significant (p < 0.05; to the right) R2 values. (B) From left to right, the frequency (in intervals of 0.05) and probability distributions of p values for each sample size.
Mean and median effect size of the significant and non-significant random data sets by sample size.
| Statistic | α | N = 30, s | N = 30, ns | N = 60, s | N = 60, ns | N = 90, s | N = 90, ns | N = 120, s | N = 120, ns | N = 180, s | N = 180, ns | N = 360, s | N = 360, ns |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Count | 0.05 | 2214 | 3786 | 4272 | 1728 | 5289 | 711 | 5747 | 253 | 5974 | 26 | 5999 | 1 |
| Count | 0.001 | 258 | 5742 | 1279 | 4721 | 2613 | 3387 | 3911 | 2089 | 5369 | 631 | 5997 | 3 |
| M | 0.05 | 0.26 | 0.07 | 0.16 | 0.04 | 0.13 | 0.03 | 0.12 | 0.02 | 0.12 | 0.01 | 0.11 | – |
| M | 0.001 | 0.45 | 0.12 | 0.24 | 0.09 | 0.18 | 0.07 | 0.15 | 0.06 | 0.12 | 0.05 | 0.11 | 0.02 |
| Mdn | 0.05 | 0.24 | 0.06 | 0.15 | 0.04 | 0.12 | 0.03 | 0.11 | 0.02 | 0.11 | 0.01 | 0.11 | – |
| Mdn | 0.001 | 0.43 | 0.11 | 0.23 | 0.09 | 0.17 | 0.08 | 0.14 | 0.06 | 0.12 | 0.05 | 0.11 | 0.03 |
| Min | 0.05 | 0.16 | 0.00 | 0.07 | 0.00 | 0.05 | 0.00 | 0.03 | 0.00 | 0.02 | 0.00 | 0.03 | 0.01 |
| Min | 0.001 | 0.38 | 0.00 | 0.19 | 0.00 | 0.12 | 0.00 | 0.09 | 0.00 | 0.06 | 0.00 | 0.03 | 0.01 |
| Max | 0.05 | 0.79 | 0.16 | 0.52 | 0.07 | 0.39 | 0.05 | 0.39 | 0.03 | 0.38 | 0.02 | 0.28 | 0.01 |
| Max | 0.001 | 0.79 | 0.38 | 0.52 | 0.19 | 0.39 | 0.12 | 0.39 | 0.09 | 0.38 | 0.06 | 0.28 | 0.03 |
For each summary statistic, the upper row indicates the corresponding value when the alpha threshold was set at 0.05, whereas the lower row indicates the corresponding value when the alpha threshold was set at 0.001. Count = the number of resampled data sets that generated significant or non-significant R2 values; s = significant (i.e. p < α); ns = not significant (i.e. p ≥ α); M = mean R2 value; Mdn = median R2 value; Min = minimum R2 value; Max = maximum R2 value.
Frequency of accurate and inaccurate effect size estimates by sample size and statistical significance.
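| N | Significant, > | Significant, = | Significant, < | Non-significant, > | Non-significant, = | Non-significant, < |
|---|---|---|---|---|---|---|
| 360 | 173 | 5686 | 140 | 0 | 0 | 1 |
| 180 | 556 | 4925 | 493 | 0 | 0 | 26 |
| 120 | 795 | 4430 | 522 | 0 | 0 | 253 |
| 90 | 1081 | 3887 | 321 | 0 | 0 | 711 |
| 60 | 1417 | 2855 | 0 | 0 | 421 | 1307 |
| 30 | 1873 | 341 | 0 | 0 | 2007 | 1779 |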
The table shows, for each sample size, the frequency with which effect size estimates reached statistical significance (i.e. p < 0.05) and fell within (=) or outside the 95% credible interval (i.e. 0.06–0.18) of the best estimate of the “true” population effect (i.e. R2 = 0.11). 95% CI = 95% credible interval; > = larger than the upper bound of 95% CI; < = smaller than the lower bound of 95% CI.
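The counts in the table above can be turned into proportions with a few lines of arithmetic. The sketch below reads the rows as sample sizes 360 down to 30 and the columns as significant (>, =, <) followed by non-significant (>, =, <). That reading is an assumption: it is inferred from the row totals, which match the significant and non-significant counts at α = 0.05 in the preceding table. The names `counts` and `derived` are hypothetical.

```python
# Counts copied from the table above. The row-to-sample-size mapping and the
# column order are assumptions inferred from the row totals, not stated labels.
counts = {
    360: (173, 5686, 140, 0, 0, 1),
    180: (556, 4925, 493, 0, 0, 26),
    120: (795, 4430, 522, 0, 0, 253),
    90: (1081, 3887, 321, 0, 0, 711),
    60: (1417, 2855, 0, 0, 421, 1307),
    30: (1873, 341, 0, 0, 2007, 1779),
}

derived = {}
for n, (s_gt, s_eq, s_lt, ns_gt, ns_eq, ns_lt) in counts.items():
    n_sig = s_gt + s_eq + s_lt
    total = n_sig + ns_gt + ns_eq + ns_lt
    derived[n] = {
        "total": total,
        "n_sig": n_sig,
        # Share of significant resamples whose R2 exceeded the upper CI bound.
        "sig_overestimates": s_gt / n_sig,
        # Share of all resamples that were significant and inside the CI.
        "sig_and_within_ci": s_eq / total,
    }

for n in sorted(derived):
    d = derived[n]
    print(n, d["n_sig"], round(d["sig_overestimates"], 2), round(d["sig_and_within_ci"], 2))
```

Under this reading, about 85% of the significant resamples at N = 30 overestimated the effect and only about 6% of all resamples were both significant and within the interval. At N = 360 the corresponding figures are about 3% and 95%.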