Martin Fungisai Gerchen, Peter Kirsch, Gordon Benedikt Feld
Abstract
Null hypothesis significance testing is the major statistical procedure in fMRI, but provides only a rather limited picture of the effects in a data set. When sample size and power is low relying only on strict significance testing may lead to a host of false negative findings. In contrast, with very large data sets virtually every voxel might become significant. It is thus desirable to complement significance testing with procedures like inferiority and equivalence tests that allow to formally compare effect sizes within and between data sets and offer novel approaches to obtain insight into fMRI data. The major component of these tests are estimates of standardized effect sizes and their confidence intervals. Here, we show how Hedges' g, the bias corrected version of Cohen's d, and its confidence interval can be obtained from SPM t maps. We then demonstrate how these values can be used to evaluate whether nonsignificant effects are really statistically smaller than significant effects to obtain "regions of undecidability" within a data set, and to test for the replicability and lateralization of effects. This method allows the analysis of fMRI data beyond point estimates enabling researchers to take measurement uncertainty into account when interpreting their findings.Entities:
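The abstract describes obtaining Hedges' g and its confidence interval from SPM t maps. As an illustrative sketch only (not the authors' SPM/MATLAB implementation), and assuming a standard two-sample design with equal-variance groups, the voxel-wise computation could look like this; the function name and CI construction (inverting the noncentral t distribution, with the bias correction applied to the bounds) are assumptions for illustration:

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq
from scipy.special import gammaln

def hedges_g_ci(t, n1, n2, conf=0.90):
    """Hedges' g and its CI from a two-sample t statistic.

    Cohen's d is recovered from t, the small-sample correction
    factor J turns d into Hedges' g, and the CI is obtained by
    inverting the noncentral t distribution.
    """
    df = n1 + n2 - 2
    scale = np.sqrt(1.0 / n1 + 1.0 / n2)
    d = t * scale  # Cohen's d implied by the t value
    # Exact bias-correction factor (often approximated as 1 - 3/(4*df - 1))
    J = np.exp(gammaln(df / 2.0) - 0.5 * np.log(df / 2.0) - gammaln((df - 1) / 2.0))
    g = J * d
    alpha = 1.0 - conf
    # Noncentrality parameters at which the observed t sits at the upper
    # and lower tail quantiles, converted back to the effect-size scale
    nc_lo = brentq(lambda nc: stats.nct.cdf(t, df, nc) - (1.0 - alpha / 2.0), -50, 50)
    nc_hi = brentq(lambda nc: stats.nct.cdf(t, df, nc) - alpha / 2.0, -50, 50)
    return g, J * nc_lo * scale, J * nc_hi * scale

# Example: t = 3.2 from a group comparison with 50 subjects per group
g, lo, hi = hedges_g_ci(3.2, 50, 50)
```

In practice the t values would be read voxel-wise from the SPM t map and the function applied elementwise.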
Keywords: Hedges' g; confidence interval; equivalence tests; functional magnetic resonance imaging; null hypothesis significance testing
Year: 2021 PMID: 34529303 PMCID: PMC8596945 DOI: 10.1002/hbm.25664
Source DB: PubMed Journal: Hum Brain Mapp ISSN: 1065-9471 Impact factor: 5.038
FIGURE 1. Data simulation. (a) Three effects located at I, II, and III (d = 0.28, d = 0.50, and d = 0.50, respectively) were generated for one fMRI slice in a simulated data set that compared two conditions (two-sample t-test; see supplement for details). (b) The left panel shows the effect size per voxel for a large sample (n = 500 per group) drawn from the simulated population. The plane cuts the 3D graph at voxel 200, where the effects were inserted, and the red line marks the maximum effect size. In the right panel, all effect sizes lying on the plane are shown with the 90% and 99.9% CIs added. The lower bound of the 99.9% CI at locations I, II, and III is above 0, indicating that these voxels would be significant in an uncorrected whole-brain t-test with α = 0.001. This means that in a large sample all three effects that were inserted into the data can be recovered. In addition, the maximum effect size (red line, effect at III) is larger than the 90% CI of the effect located at I, which, following the logic of equivalence testing, would allow the conclusion that the effect at I is smaller than the effect at III. The same is not true for effects III and II, as the red line cuts the 90% CI of effect II. (c) This panel shows the same as (b) but for a smaller sample (n = 50 per group). On the left it is evident that the estimated effect sizes are much noisier, a result of the smaller sample size. On the right it is evident that the CIs are also much wider, showing that the point estimate of the effect is much more uncertain. Consequently, only the effect at II is significant at the whole-brain threshold (p < .001). Importantly, we are also able to determine that most other voxels on this plane have effects that are smaller than the maximum effect found in the significant cluster of voxels.
However, there is a cluster of voxels at III that are not significantly different from 0, but that can also not be determined to be smaller than the effect present at II. Since the ground truth of the simulation is known, this makes sense. Our method enables the identification of such clusters in the whole brain and thereby allows deciding which brain areas can be excluded as a relevant driver of certain behaviors and which cannot. Of course, the chosen threshold (the peak voxel in this simulated case) will strongly influence the interpretation. Please see our use cases for indications of useful thresholds.
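The three-way decision sketched above (significant vs. demonstrably smaller than a reference vs. undecidable) can be written as a small helper. This is a hypothetical distillation of the caption's logic, not the authors' code; the function name and the exact rule for the undecidable case are illustrative assumptions:

```python
def classify_voxel(ci_lo, ci_hi, g_ref):
    """Classify a voxel by its effect-size CI relative to a reference ES.

    'significant' : the CI lower bound lies above 0
    'smaller'     : the CI upper bound lies below the reference effect,
                    so inferiority to the reference can be concluded
    'undecidable' : neither significant nor demonstrably smaller, i.e.
                    the CI still contains the reference g_ref
    """
    if ci_lo > 0:
        return "significant"
    if ci_hi < g_ref:
        return "smaller"
    return "undecidable"
```

Applied voxel-wise with, say, the median ES of the significant clusters as `g_ref`, the "undecidable" voxels form the yellow maps shown in Figures 2 and 3.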
FIGURE 2. "Maps of Undecidability": One-Sample t-test. Results for a monetary incentive delay task in a sample of n = 32 participants with Alcohol Use Disorder and n = 35 healthy controls. (a) Activation (p < .05 whole-brain FWE corrected) for the main effect of the anticipation of monetary reward compared to the anticipation of verbal feedback in the whole sample, showing strong activation in the bilateral striatum. (b) Map of ES g for the activations shown in (a). (c) Areas of undecidability in yellow mark voxels for which the 90% CI of the ES included the median effect size (g = 0.66) of the significant clusters. For comparison, uncorrected activation (p < .001 unc.) is shown in red. In this specific example the areas of undecidability largely overlap with the uncorrected activation and are just slightly more spatially restricted. Please note that this correspondence depends on the exact reference value chosen and the properties of the specific data set for a given analysis. Reanalyzed data from Becker et al. (2017).
FIGURE 3. "Maps of Undecidability": Two-Sample t-test. Results of the group comparison for the monetary incentive delay task comparing participants with Alcohol Use Disorder and healthy controls. (a) Activation for the group comparison (AUD > HC) based on ROI analyses in the left and right nucleus accumbens (p < .025 FWE in each of the two ROIs). Participants with Alcohol Use Disorder showed stronger reactions in the nucleus accumbens than healthy controls; see Becker et al. (2017) for further details and discussion. (b) Map of ES g for the activations shown in (a). (c) Areas of undecidability in yellow mark voxels for which the 90% CI of the ES included the median ES (g = 0.7289) of the significant clusters. For comparison, uncorrected activation (p < .001 unc.) is shown in red. In this example the uncorrected activation is very restricted, and the areas of undecidability are rather large and extend well beyond it. Reanalyzed data from Becker et al. (2017).
FIGURE 4. Replication of Effects. Results from the encoding phase of an episodic memory task are shown. (a) Activation (p < .05 whole-brain FWE corrected) for the contrast encoding > control in the reference sample of n1 = 54 participants. (b) Activation (p < .05 whole-brain FWE corrected) for the contrast encoding > control in the replication sample of n2 = 82 participants. Both samples were acquired in the same project with the same protocol but at different sites. (c) Map of ES g for the activations shown in (a). (d) Yellow marks voxels where the 90% CI of the ES in the replication sample includes the ES of the corresponding voxel in the reference sample, which we define as a replication of the original effect size. Red circles: area where the effect was significant in both samples but the reference ES did not replicate. Please note that only voxels where the reference effect size was g > 0 are shown. Data from Gerchen and Kirsch (2017).
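The replication criterion of panel (d), under which the reference-sample ES must fall inside the replication sample's CI and only positive reference effects are considered, could be expressed voxel-wise as follows; this is a hypothetical sketch of that rule, with the function name chosen for illustration:

```python
import numpy as np

def replication_mask(g_ref, rep_ci_lo, rep_ci_hi):
    """True where the reference-sample effect size lies inside the
    replication sample's CI, restricted to positive reference effects."""
    g_ref = np.asarray(g_ref, dtype=float)
    inside = (np.asarray(rep_ci_lo) <= g_ref) & (g_ref <= np.asarray(rep_ci_hi))
    return inside & (g_ref > 0)

# Three example voxels: replicated, overshooting CI, negative reference ES
mask = replication_mask([0.5, 0.8, -0.1], [0.2, 0.9, -0.3], [0.7, 1.2, 0.2])
```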
FIGURE 5. Lateralization of Effects. Results for a written statement presentation task are shown. (a) Activation (p < .05 whole-brain FWE corrected) for the main effect of written statement presentation in N = 30 healthy participants. (b) Map of ES g for the activations shown in (a). (c) Regions (yellow) that cannot be assumed to be smaller than the 75th percentile ES in the reference cluster (red) including Broca's area. Red circles: area in the right inferior frontal gyrus whose effects cannot be shown to be smaller than those of the reference cluster, suggesting contralateral effects in our data. (d) Regions (yellow) that cannot be assumed to be smaller than the 75th percentile ES in the reference cluster (red) in the left ventral occipito-temporal cortex. Red circles: inferior effects in the corresponding right ventral occipito-temporal cortex, suggesting lateralization of effects in our data. Unpublished data by M.F. Gerchen.