Literature DB >> 25048024

A review of fMRI simulation studies.

Marijke Welvaert1, Yves Rosseel1.   

Abstract

Simulation studies that validate statistical techniques for fMRI data are challenging due to the complexity of the data. Therefore, it is not surprising that no common data generating process is available (i.e. several models can be found to model BOLD activation and noise). Based on a literature search, a database of simulation studies was compiled. The information in this database was analysed and critically evaluated focusing on the parameters in the simulation design, the adopted model to generate fMRI data, and on how the simulation studies are reported. Our literature analysis demonstrates that many fMRI simulation studies do not report a thorough experimental design and almost consistently ignore crucial knowledge on how fMRI data are acquired. Advice is provided on how the quality of fMRI simulation studies can be improved.

Entities:  

Mesh:

Year:  2014        PMID: 25048024      PMCID: PMC4105464          DOI: 10.1371/journal.pone.0101953

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Twenty years ago, functional magnetic resonance imaging (fMRI) was established as a method to measure brain activity [1], [2]. In these past twenty years, this technique has been used increasingly and has pioneered the search to map and connect the brain that has caused a world-wide collaboration of scientists from different disciplines. Engineers and physicists are intrigued by the acquisition of the fMRI data, while physicians and psychologists are challenged to adapt their behavioural experimental protocols to the scanner environment. Last but not least, the analysis of fMRI data has been, and still is, a topic of numerous discussions among statisticians. The latter are confronted with the fact that the data acquired through fMRI have no ground truth. This ground truth is needed to ensure validation of the statistical methods that are used to analyse the data and to assess statistical properties such as sensitivity, specificity, bias and robustness. Great efforts to establish this ground truth have gone into the development of mechanical models [3], while direct measuring of the neural activity with intracranial EEG (iEEG) offers another solution [4]. However, for most studies iEEG may not be feasible and simulations may be the only realistic approach to establish the ground truth of fMRI data. NeuroImage, one of the flagship journals in the neuroimaging community, celebrated the 20th anniversary of the first fMRI publications with a special volume that consisted of 103 reviews about the early beginnings, developments in acquisition, software, processing and methodology, and prospectives for the future of fMRI [5]. Although the advances in statistical methods for fMRI data are discussed in several of these reviews, simulations an sich are not mentioned. In general, it appears that simulation studies are still not standard practice for fMRI methods validation. A possible explanation is that it can be quite challenging to simulate fMRI data. Not only is the coupling between the neural activity and the Blood Oxygenation Dependent Level (BOLD) not completely understood [6], fMRI data are also characterised by a great deal of noise coming from multiple sources [7]. Consequently, no common data generating process for fMRI data is available and the data generation in fMRI simulation studies is mostly defined ad hoc. The goal of this review is to provide an overview of the most common data generation methods used in fMRI simulation studies. An established and accepted data generating process does not yet exist and therefore an investigation of the existing published models is called for. In particular, the validity of these data generating methods is analysed and the overall reporting and conduct of fMRI simulation studies is critically reviewed. The rest of the paper is organised as follows: In the Methods section the article selection criteria are reported that were applied to establish a database of fMRI simulation studies literature, and the focus points of the article evaluation are discussed. The Results section focuses on different aspects of the simulation studies, namely, the goals of the studies, the experimental design under investigation, the simulation parameters and the data generation models. Finally, in the Discussion, best practice recommendations are provided to increase the reliability and generalisability of fMRI simulation studies.

Materials and Methods

Article selection

Articles were selected from the Web of Science database using the following query: “fmri AND simulation AND (statistics OR data analysis)”. By excluding articles labelled as reviews or proceedings, this search resulted in 318 hits (Result as of January, 1st 2013). All these articles were manually inspected on content and relevance. This screening resulted in excluding articles based on the following criteria: the conducted simulations were for another modality (e.g. PET, EEG, MEG, …); no time series were simulated (e.g. inference methods are often validated on simulated statistical maps); non-human fMRI was simulated; and no simulation study was conducted (e.g. papers presenting simulation software). After exclusion, the remaining 119 articles were taken into account in this analysis. Full bibliographic details of our sample can be found in the Supporting Information (Table S1). These articles were published in 39 peer-reviewed academic journals (Table 1) over a period of 16 years (Figure 1). In this sample, most simulation studies were published in NeuroImage (37), Human Brain Mapping (11), IEEE Transactions on Medical Imaging (10), Magnetic Resonance Imaging (7), IEEE Transaction on Biomedical Engineering (6) and the Journal of Magnetic Resonance Imaging (6).
Table 1

Overview of journals in the survey. Full details of the included studies can be found in the supporting information (Table S1).

Journal titleNumber of articles
NeuroImage37
Human Brain Mapping11
IEEE Transactions on Medical Imaging10
Magnetic Resonance Imaging7
IEEE Transactions on Biomedical Engineering6
Journal of Magnetic Resonance Imaging6
Magnetic Resonance in Medicine4
Other38
Figure 1

Overview of number of articles for each publication year included in the survey.

During our article selection, we focused on simulation studies conducted to validate or compare analysis procedures for BOLD-fMRI data. In order to perform this validation, a data generating process results in artificial data that reflect to some degree the characteristics of real measured fMRI data. From a statistical perspective, scanning parameters that influence magnetic properties of the data (e.g. flip angle) are of less importance since they mainly have an effect on the signal-to-noise ratio. For instance, when these scanning parameters are optimised, the baseline signal might increase while the noise level decreases. The crucial aspect is to determine the components in the data that are expected to have an effect on the data analysis and model these components while generating the simulated fMRI data.

Article evaluation

In the present study, we analysed the sections describing the simulation study for the selected papers. Where necessary the appendices or supplementary materials were also included and whenever there was still missing information after screening these sections, the whole paper was searched for this information. Only the reported methodology was evaluated (i.e. no authors were contacted for more information). There might be a discrepancy between the conducted and reported simulation studies (e.g. not all details are mentioned), however, to ensure reproducible science all critical elements should be reported. It may not always be feasible to report everything in the main text, but academic journals allow for crucial content to be described in appendices or through online supplementary materials. For each study we evaluated the goal of the simulation study, the simulation parameters and the data generating process. In the case that multiple simulation studies were present in the article, this information was retrieved from the most complex case that was described. In the Results section, summarised results are presented. For a detailed results list on the individual study level, the reader is referred to Table 2.
Table 2

Detailed results for the analysis of the fMRI simulation database, the ID numbers refer to the references in Table S1.

IDAuth.JournalYearModelDesigndim.nSrepparVparJHRFmHRFvtypeNoise modelNoise corr.
1Afshin-PourHum Brain Map2011Mutual Informationblock4Dno30yesnogammayessyntheticGaussiantemporal
2AllenNeuroImage2012ICArest3Dyes1yesyescanonicalnosyntheticRiciannone
3AndradeHum Brain Map2001Cortical Surface Mappingrest2Dno500yesnosquare wavenosyntheticGaussianspatial
4BackfriederPhys Med Bio1996PCAblock3Dno1yesnosquare wavenosyntheticGaussiannone
5BaiStat Sin2008ICAblock4Dno1noyessinusoidalnosyntheticUniformnone
+physiological
6BellecMagn Res Imag2009Parametric Bootstrapblock4Dyes1yesyescanonicalyessyntheticGaussian + driftnone
+physiological
7BellecNeuroImage2010cluster analysisrest3Dyes1yesnononesyntheticGaussiantemporal
8BirnNeuroImage2004GLMER1Dno1yesyesgammayessyntheticGaussiannone
+motion
9BiswalJ Comp Ass Tom1999ICArest3Dno1noyessinusoidalnosyntheticGaussiannone
10BrezgerJ Roy Stat Soc C2007spatial smoothingblock3Dno1nonosquare wavenosyntheticGaussianspatial
& temporal
11CabellaBraz J Phys2008waveletsER1Dno nonocanonicalnosyntheticGaussiannone
12CabellaPhys A2009waveletsblock1Dno yesyescanonicalnosyntheticGaussiannone
13CalhounJ VLSI2006ICArest1Dno30yesnosinusoidalnosyntheticGaussiannone
14CalhounNeuroImage2004GLMblock1Dno1yesnocanonicalyessyntheticGaussiannone
15CalhounNeuroImage2005Spatio-temporalrest3Dno100yesnosinusoidalnosyntheticGaussianspatial
16CasanovaNeuroImage2008BOLD estimationER1Dno200yesnocanonicalnosyntheticGaussian + drifttemporal
17CasanovaPhysio Meas2009BOLD estimationER3Dno200yesnocanonicalnosyntheticGaussian + driftnone
18ChenIEEE T Biomed Eng2006ICAblock3Dno1noyessquare wavenosyntheticGaussiannone
19ChenMagn Res Imag2004ICAblock3Dno1yesnononesyntheticGaussiannone
20ChenBrain Topo2003cluster analysisblock3Dno1yesnosquare wavenosyntheticGaussiannone
21ChenNeuroImage2003t-testblock4Dno1noyessquare wavenohybrid
22ChurchillPlos One2012preprocessingblock3Dno500yesnocanonicalnosyntheticGaussianspatial
23De MartinoNeuroImage2008classificationblock4Dno1yesyescanonicalyessyntheticGaussiantemporal
24De MazièreJ Magn Res2007non-parametricblock1Dno yesyescanonicalnosyntheticGaussiantemporal
25Den DekkerIEEE T Med Imag2009LRTblock1Dno1000yesyescanonicalnosyntheticGaussiantemporal
26DescoHum Brain Map2001waveletsblock3Dno4yesyessquare wavenosyntheticGaussianspatial
27DeshpandeIEEE T Biomed Eng2010Granger causalityrest1Dno yesnocanonicalnosyntheticGaussiantemporal
28DesmondJ Neurosc Meth2002GLMblock1Dyes1000yesyessquare wavenosyntheticGaussiannone
29DimitriadouArt Intel Med2004cluster analysisblock3Dno1yesyessquare wavenosyntheticGaussiannone
30EspositoCurr Opinion Neuro2011ICArest4Dno2yesnononesyntheticGaussiannone
31FadiliMed Imag Anal2001cluster analysisblock3Dno1yesyesPoissonnosyntheticGaussian + driftnone
+physiological
32SunIEEE T Inf Biomed2010GLMblock4D1yesnosquare wavenohybrid
33GavrilescuNeuroImage2002Granger causalityrest1Dno200nonogammayessyntheticGaussiannone
34GoebelMagn Res Imag2003adaptive thresholdingblock3Dno500noyessquare wavenosyntheticGaussianspatial
35GorgolewskiFront Hum Neurosc2012GLMER1Dno yesyescanonicalnosyntheticGaussiantemporal
36GrinbandNeuroImage2008Bayesian inferenceER3DnoNAyesnocanonicalyessyntheticGaussiannone
37GrovesNeuroImage2009ICAER4Dno1yesyesgammayeshybrid
38GuNeuroImage2001cluster analysisnone3Dyes1noyesestimatednohybrid
39GuoStat Int2010ICArest3Dyes1yesyessinusoidalnosyntheticGaussiannone
40GuoNeuroImage2008cluster analysisblock3Dno100yesnosquare wavenosyntheticGaussianspatial
41HellerNeuroImage2006ICAblock4Dno10yesnogammayeshybrid
42HuNeuroImage2005STAP algorithmblock4Dno7yesnosquare wavenohybrid
43HuangIEEE T Biomed Eng2009cluster analysisblock3Dno1yesnogammayeshybrid
44JahanianMagn Res Imag2004connectivityblock4Dno15yesnocanonicalnosyntheticGaussiannone
45JoelMagn Res Med2011BOLD estimationER1Dno100yesyesBalloonnosyntheticGaussiannone
46JohnstonNeuroImage2008denoisingER3Dno1yesnosquare waveyeshybrid
47KadahIEEE T Biomed Eng2004mixed effectsblock1Dno500noyessquare wavenosyntheticGaussianspatial
& temporal
48KangJ Am Stat Ass2012connectivityblock1D1yesnogammayessyntheticnonenone
49KimMagn Res Imag2008GLMER3Dno1yesyescanonicalnosyntheticGaussiantemporal
50KimInt J Imag Sys Tech2011fractal scalingER3Dno1yesyesgammanohybrid
51LeeNeuroImage2008GLMblock3Dno1nonosquare wavenosyntheticGaussiannone
52LeeIEEE T Med Imag2011ICAblock3Dno100yesyessquare wavenosyntheticUniform + drifttemporal
+physiological
53LeeJ Am Stat Ass2011ICAblock3D1nonogammayessyntheticGaussiannone
54LeiNeuroImage2010ICAER4D1yesnocanonicalyeshybrid
55LeVanHum Brain Map2009GLMER3Dyes400yesnosquare wavenosyntheticGaussianspatial
56LiaoIEEE T Med Imag2005GLMER3Dyes1000yesyessquare wavenosyntheticGaussianspatial
+Chi-square
57LiaoMag Res Med2006ICArest3Dno1nonosinusoidalyessyntheticsuper Gaussianspatial
58LiaoIEEE T Med Imag2008ICArest4Dno1yesyessinusoidalyessyntheticsuper Gaussianspatial
59LindquistHum Brain Map2008cluster analysisER3Dno1yesnosquare waveyessyntheticGaussiannone
60LindquistNeuroImage2007PCA/ICAblock2Dno30nonocanonicalnosyntheticmixture Gaussiantemporal
61LinMag Res Med2005t-testblock4Dno50yesnosquare wavenohybrid
62LinNeuroImage2003ICArest3Dno1yesyescanonicalnosyntheticGaussiannone
63LinHum Brain Map2010spatial smoothingblock3Dno1000nonosquare wavenosyntheticGaussiannone
64LiIEEE T Med Imag2012change-point theoryblock3Dno1nonosquare wavenosyntheticGaussiantemporal
65LiJ Roy Stat Soc B2011GLMblock3Dno1000yesyessquare wavenosyntheticGaussianspatial
66LoganNeuroImage2004residual analysisblock1Dno1000yesnocanonicalnosyntheticGaussiannone
67LohStat Sin2008ICAblock3Dno50yesnocanonicalnosyntheticGaussiannone
68LongHum Brain Map2009GLMblock1Dno yesyessquare wavenosyntheticGaussian + driftnone
69LoweJ Comp Ass Tom1999cluster analysisER3Dno1yesnogammanosyntheticGaussiannone
70LuJ Mag Res Imag2006spatio-temporalblock3Dno1yesnocanonicalnosyntheticGaussiannone
71LuoInt J Neural Sys2006correlationblock1Dno1000yesyesgammanosyntheticGaussian + driftnone
72MacIntoshHum Brain Map2003BOLD estimationER1Dno1000yesnocanonicalnosyntheticGaussian + driftnone
73MarrelecHum Brain Map2003ICArest3Dyes1yesnosinusoidalnosyntheticsuper Gaussiannone
74MoosmannInt J Psychophysio2008spectral analysisrest3Dno1000yesyessinusoidalyessyntheticGaussiantemporal
75MüllerJ Mag Res Imag2007LRTblock3Dno1nonosquare wavenosyntheticGaussiannone
76NanIEEE T Med Imag1999spatio-temporalblock3Dno1nonosinusoidalnohybrid
77NganMag Res Imag2001spatial decompositionblock4Dyes500yesyessquare wavenosyntheticGaussiannone
78ParkNeuroImage2012t-testblock1Dno yesyessquare wavenosyntheticGaussiannone
79ParrishMag Res Med2000GLMblock3Dno100yesnosquare wavenohybrid
80PendseNeuroImage2009connectivityER1Dno1000yesyescanonicalnosyntheticGaussiannone
81PennyNeuroImage2011BOLD estimationER3Dno80yesyesgammayessyntheticGaussian + driftnone
82PuthusseryIEEE T Biomed Eng2010spatio-temporalblock3Dno1noyesgammanohybrid
83QuirósNeuroImage2010spatio-temporalblock3Dno1nonogammanohybrid
84QuirósNeuroImage2010conditional maximisationER1Dno100yesyescanonicalnosyntheticGaussiannone
85RodriguezNeuroImage2010connectivityER1Dno25yesyescanonicalyessyntheticGaussiantemporal
86RyaliNeuroImage2011cluster analysisrest3Dno500nonononesyntheticGaussianspatial
87SalliIEEE T Med Imag2001connectivityblock1Dno1000nonononesyntheticGaussiantemporal
88SatoNeuroImage2006Support Vector Machineblock3Dyes100noyessquare wavenosyntheticGaussiannone
89SatoJ Neuro Meth2008Granger causalityrest1Dno200yesyescanonicalyessyntheticGaussiantemporal
+physiological
90SchippersNeuroImage2011ICArest3Dno50noyesnonesyntheticGaussiannone
91SchmithorstJ Mag Res Imag2009ICArest4Dyes1noyesnonesyntheticGaussiannone
92SchmithorstJ Mag Res Imag2004LRTblock1DnoNAnonosquare wavenosyntheticRiciannone
93SijbersMed Imag2004GLMblock1DnoNAnonosquare wavenosyntheticRiciannone
94SijbersAdv Con IVS2005LRTblock1Dno yesnosquare wavenosyntheticGaussiantemporal
95SijbersIEEE T Med Imag2005connectivityblock1Dno20yesyescanonicalnosyntheticGaussiannone
96StephanNeuroImage2008GLMER3Dno yesyescanonicalyessyntheticGaussiannone
97SturzbecherPhys Med Bio2009permutation testsblock3DnoNAnonosquare wavenosyntheticGaussianspatial
& temporal
98SucklingHum Brain Map2004cluster analysisblock4Dno1nonocanonicalnosyntheticGaussiannone
99SunMed Bio Eng Comp2009cluster analysisblock4Dno1nonocanonicalnosyntheticGaussiannone
100TabelowIEEE T Med Imag2008spatial smoothingblock3Dno1yesnocanonicalnosyntheticGaussianspatial
& temporal
101ThompsonJ Mag Res Imag2006STAP algorithmblock3Dno15nonosquare wavenohybrid
102ThompsonJ Mag Res Imag2004STAP algorithmblock3Dno100nonosquare wavenohybrid
103VahdatNeural Comp2012ICArest3Dyes800yesyescanonicalnosyntheticRicianspatial
104Valdés-SosaPhil T Roy Soc Lond B2005connectivityrest1DnoNAnonononesyntheticGaussiantemporal
105ValenteMag Res Imag2009ICAblock4Dno1yesyescanonicalyeshybrid
106VincentIEEE T Med Imag2010adaptive mixture modellingER3Dno100yesyescanonicalnosyntheticGaussian + driftnone
107VisscherNeuroImage2003GLMblock4Dyes1noyesgammanosyntheticGaussiantemporal
108WagerNeuroImage2005robust regressionrest1Dno2000yesyesnonesyntheticGaussiannone
109WangNeuroImage2009GLMblock4Dyes1yesyescanonicalyeshybrid
110WeedaHum Brain Map2009GLMblock3Dno1000yesyessquare wavenosyntheticGaussiannone
111WeedaNeuroImage2011connectivityrest4Dno100yesnononesyntheticGaussiannone
112WorsleyNeuroImage1997Canonical Variates Analysisblock4Dno100nonocanonicalyessyntheticGaussianspatial
& temporal
113XieNeurocomp2009dimension estimationblock1Dno20000yesnosquare wavenosyntheticGaussiantemporal
114YueStat Int2010spatial smoothingblock3Dno1nonocanonicalnosyntheticGaussiannone
115ZhangJ Multi Anal2010classificationER1Dno1000yesnocanonicalnosyntheticGaussiantemporal
116ZhangAnn Stat2008BOLD estimationER4Dno1yesyesestimatednosyntheticGaussian + drifttemporal
117ZhangComp Stat Dat Anal2008BOLD estimationER4Dno500yesyescanonicalnosyntheticGaussian + drifttemporal
118ZhangIEEE T Biomed Eng2011cluster analysisblock3Dno1noyessquare waveyessyntheticGaussiannone
119ZhangNeuroImage2012BOLD estimationblock1Dyes100yesyescanonicalyessyntheticGaussiantemporal

ID - paper identification number; Auth. - first author; dim. - data dimension; nS - Multiple subjects?

rep - Number of replications; parV - Parameter variation?; parJ - Parameter justification; HRFm - HRF model; HRFv - HRF variation?; Noise corr. - Noise correlations.

ID - paper identification number; Auth. - first author; dim. - data dimension; nS - Multiple subjects? rep - Number of replications; parV - Parameter variation?; parJ - Parameter justification; HRFm - HRF model; HRFv - HRF variation?; Noise corr. - Noise correlations.

Results

Study goals

Simulation studies are conducted to evaluate statistical models based on a given experimental design. For each article we assessed which statistical technique was validated. Six categories of statistical models were distinguished (see Figure 2, left panel). Most simulation studies are conducted for signal decomposition models like Principal Components Analysis (PCA), Independent Component Analysis (ICA) and Wavelet analysis. This group of methods is closely followed by General Linear Model (GLM) analysis, Likelihood Ratio Tests (LRT) and -tests. 11.8% of the simulation studies investigate properties of classification techniques using for example Support Vector Machines or cluster analysis. Methods that are less represented in our sample are connectivity analyses or preprocessing methods like motion correction and spatial smoothing. All studies that did not use any of the previous methods were gathered in a “rest” category. In this category are included, for example, HRF estimation methods, spatio-temporal models, bootstrapping and nonparametric techniques.
Figure 2

Statistical models investigated in the selected articles.

Experimental design

The methods described above are validated using a given experimental design (Table 3a). The majority of simulation studies report using a block design for the generation of the BOLD activity. When this design is not used, modelled activation is based on an event-related design or it concerns a resting-state study.
Table 3

Proportions of (a) experimental designs, (b) dimensions of the simulated data, and (c) the use of correlated noise reported in the selected articles.

a. Experimental designs
BlockEvent-relatedResting-state
58.0%21.8%20.2%
b. Data dimensions
1D2D3D4D
28.6%1.7%48.7%21.0%
c. Noise correlations
NoneTemporalSpatialBoth
58.0%24.0%13.0%5.0%

Simulation parameters

The general goal of a simulation study is to research a certain outcome (e.g. power, bias, …) under several conditions (e.g. noise level, HRF variability, …). The most common method to achieve this goal is by conducting a Monte Carlo experiment. The simulation reports in our database were evaluated on the dimensions of the simulated data, the number of replications and parameter variation.

Data dimensions

fMRI data have in essence four dimensions (i.e. coordinates in an xyz-space and time). However, the majority of articles in our sample published results for 3D data where time series are simulated for all voxels in a single slice (Table 3b), while one fifth considered full 4D fMRI data. On the other hand, many of the articles reported simulating fMRI time series only with no spatial context. In this case, mass-univariate techniques were mostly evaluated that also regard fMRI data as being multiple measurements of single time series. A very small proportion considered two-dimensional data. This was reported exclusively in an ICA validation context, where the fMRI data are organised as voxels timepoints.

Replications

The overall majority of the selected articles considered single-subject data, while the remaining articles simulated data for multiple subjects. In these last studies, the number of subjects that was simulated corresponded typically with sample sizes reported in real fMRI studies (e.g. 4 to 20 subjects) and the data for these subjects were mostly simulated once (with a few notable exceptions, see Figure 3, right panel). For the single-subject simulation studies, the number of repetitions was higher in the majority of the studies, while about one third of the articles reported 1 replication of the simulated data for each setting of the manipulated parameters. It should be noted that simulating 3D or 4D datasets without any spatial correlations is equal to the simulation of fMRI time series with replications where is the number of voxels. This was true for 22 of the 37 studies that reported using 1 replication. However, for the remaining studies conclusions are based on 1 realisation of the data. Two studies reported simulating time series just once for each setting of the simulation parameters.
Figure 3

Overview of the number of replications for single-subject and multi-subject simulations.

Parameter variation

Other possible parameters taken into account in the simulations were, for example, strength of the modelled activation, number of time points, noise level, repetition time (TR), etc. The relevance of a simulation study is highly dependant on the representativeness of these chosen parameter values. To ensure that the parameters are characteristic for fMRI data, it is recommended that a range of values is evaluated. Additionally, a justification is expected on why specific values of certain parameters are chosen. In our sample both requirements were assessed (see Table 4 for an overview). A study was classified as using varying parameters as soon as more than one value of a specific parameter was considered. Whenever a reason for choosing a specific parameter value was reported, the simulation study was evaluated as positive on the justification of the chosen parameters. About one third of the studies reported a variation in the values and gave a justification for their choices. Frequently reported variations were several noise levels and activation strengths that were taken into account. As for the choices of the values, authors mostly justified these as being realistic values in real fMRI data or being estimated from real data. However, about one third of the studies reporting variation of the parameters did not give any justification, ten percent did justify the choice of the parameter values but only used one specific value for each parameter, while one fifth of the studies in our sample did neither.
Table 4

Proportions of studies reporting (a) parameter variation and justification of the chosen parameter values and (b) whether HRF variability was taken into account.

a. Parameter variation and justification
Justification of value
Parameter variationNoYes
No20.2%10.9%
Yes32.8%36.1%
b. HRF variation
NoYes
77.3%22.7%

Data generation models

Of all simulation studies investigated, 84% were pure synthetic simulations while the other 16% adopted a hybrid simulation strategy. In hybrid simulations, a resting-state dataset is acquired and synthetically generated activation is added to these data. As such, knowledge of the ground truth is assured while the noise is representative for real data. However, manipulation of the noise in the simulated fMRI data is not possible and replicating the data will be a costly process. Therefore, in most simulations the fMRI data are generated completely artificially. All synthetic simulation studies adopted an additive data generation model (e.g. [8]) in which three main components can be distinguished: (1) a baseline signal, (2) BOLD activation and (3) noise. However, half of the studies did not report using a baseline for the data, so we could assume that this is zero for these studies. For the other half, 47% used a static baseline, for example a constant when simulating time series and a template slice or volume that was repeated for each time point in the case of simulating 3D or 4D fMRI data. A minority of the studies (3%) used a varying baseline, meaning that the baseline values were varied over time, e.g. to model thermal shifts [9].

BOLD response

An important component in the simulated fMRI data is the BOLD response because this signal defines the ground truth in the simulation studies. Despite the fact that the coupling between the neural activation and the BOLD response is still not completely understood [6], several models are available to generate a haemodynamic response function (HRF). See Figure 4 (left) for an overview of the models used in the selected articles. Those methods are, for example, a gamma function [10], [11], a difference of two gamma functions, also known as the canonical HRF [12], [13] or the Balloon model [14], [15]. The different shapes of these models are illustrated in Figure 4 (right). Nevertheless, one third of the reported simulation studies disregarded any BOLD characteristics and chose a square wave (i.e. a boxcar function) to represent the BOLD activation in the simulated fMRI data. When no experimental task was simulated, resting-state activation was predominantly modelled as a set of sinusoidal functions, although a few of the selected studies did not simulate any BOLD activation. The shape of the HRF varies immensely from brain region to brain region and also from subject to subject. One fifth of the simulation studies reported modelling this variation in the HRF parameters, while the majority considered a fixed HRF in all simulations (Table 4b).
Figure 4

Overview of the different HRF functions used in the simulation studies (left) and illustration of the BOLD response shapes as the result of a block design fMRI experiment for the different HRF models (right, source: [24]).

Noise model

Noise is not only characteristic for fMRI data but also ensures generalisability of the conclusions based on simulations. All simulation studies incorporated some noise generating process (see Figure 5, left panel, for an overview). The vast majority of the synthetic simulation studies (i.e. 75%) selected the noise randomly from a Gaussian distribution. An additional 9% added also some drift function to this noise, while about 7% of the studies considered a skewed noise distribution (e.g. Rician or super Gaussian distribution). The remainder of the studies used a very specific noise model (for example by adding physiological noise, using a uniform distribution or adding motion correlated noise), because they focused on the effects of these noise sources. fMRI noise is also known to be spatially and temporally correlated. However, the majority of the selected articles did not report modelling any correlations in the noise (Table 3c). Temporal correlation was almost exclusively modelled as an auto-regressive autocorrelation process. Typically this process was of order 1, but there are exceptions that used a model order of 3 or 4. Spatial correlations were typically created by spatial smoothing of the generated noise. A small fraction of the studies modeled both spatial and temporal correlations.
Figure 5

Overview of the noise models in the synthetic simulation studies.

Discussion

Whenever statistical models are validated based on simulations, the model that is used for the data generation is of utmost importance. In this paper, a survey was conducted to list currently used data generation models. Based on 119 research articles we described the simulation type, use and justification of simulation parameters and the different components in the fMRI data generating process. The survey results showed that current fMRI simulation studies sometimes lack a thorough experimental manipulation. The parameters in the simulation study (e.g. noise level, TR, HRF delay, etc.) are not always varied, while representative values of some of these parameters are not known. Using the results from iEEG could guide many of these parameter choices and make simulation more realistic in general. Further, the number of replications is a major topic of concern. We observed that the conclusions of some of the simulation studies were based on only one replication of the random data generating process. When a simulation study is used to evaluate the expectation of random variables, the external validity of the study is threatened if only a few replications of the data generating process are used.

Model-based versus data-based simulation

In about 60% of the synthetic simulation studies, the fMRI data were generated based on a model similar to the model being validated (e.g. generating time series from a VAR model to evaluate Granger causality). As such, the simulation is entirely model-based and the assumptions of the model under investigation are completely met. Consequently, the conclusions of these simulation studies give only partial information on the applicability of these models as an analysis tool for fMRI data, since fMRI data generally do not meet the assumptions of most statistical models. A better practice would be to start from the data themselves and to define a data generating process that models the different sources that are present in fMRI data. By using data-based simulations, the properties of the analysis techniques can be assessed in more realistic circumstances. In this context, it should be noted that the data generating process used in most current simulation studies is not compatible with the knowledge on how fMRI data are constructed. For instance, it is well-known that the BOLD response is the result of a haemodynamic coupling to neural activity. Although the precise dynamics are perhaps still debatable, there is consensus about the BOLD signal being a delayed response with varying dynamics over the brain regions and between subjects. However, about one third of the reported simulation studies in our database did not model any of these characteristics and used a simple boxcar function to distinguish stimulus induced activation from rest (Figure 4). About the same number did model the slow emergence of the BOLD signal by using a canonical HRF, but only a small fraction (i.e. two studies) did also model BOLD nonlinearities by means of the Balloon model. In the case of spontaneous neural activation (for example in resting-state studies), BOLD fluctuations were mostly modelled through sinusoidal functions with frequencies that are commonly observed in resting-state studies. However, describing these spontaneous fluctuations by sinusoids stems from the tradition to use ICA to analyse these data and is again more compatible with the model under investigation than being representative for the data. Further, variability of the BOLD response was taken into account only in about one fifth of the simulation studies (Table 4b). With regard to modelling BOLD activation, in a data-based simulation context at least some form of HRF should be used that takes into account the basic characteristics of the BOLD signal, while any variation of the parameters of this model will enhance the generalisability of the simulation results. The generation of fMRI noise alsocauses a discrepancy between simulated and real fMRI data. The noise in fMRI consists of several sources [7], [16], for example thermal noise, motion related noise, physiological noise and task-related noise. Nevertheless, the vast majority of simulation studies investigated here have only used a white Gaussian noise model to generate fMRI noise ignoring its multiple-source character. In some cases, spatial or temporal correlations are added. Again, this noise model is consistent with many of the statistical models for fMRI data (e.g. GLM). Unfortunately, the Gaussian noise model only accounts for a fraction of the noise in real data. One solution is to use hybrid simulations in which using real noise acquired in a resting-state study increases the realistic character of the simulated data. However, it is impossible to manipulate noise related parameters and unwanted activation in resting-state data can influence the simulation results. Moreover, multiple replications (i.e. acquiring resting-state data from multiple subjects) are costly. Perhaps the better solution is to model more than only Gaussian noise (i.e. thermal noise) and also include, as has been demonstrated in several simulation studies, motion noise, physiological noise, signal drift, etc. In some simulation studies, the results will not be altered under a full noise model. It may not always be necessary to include all noise sources (e.g. if a certain noise source is removed or the influence of a source is assumed to be equal in all conditions), but this should be motivated at least. To assure generalisability of the simulation results, a more complex noise model, compared to the one that is generally adopted now, might be imperative.

Recommendations for simulating fMRI data

Based on these results we present some recommendations to improve the reliability and generalisability of fMRI simulation studies. All parameters for which a value is chosen in the simulation experiments should be thoroughly justified. If a single value is not agreed upon, a range of values should be evaluated (see [8], [17]–[19] for some examples). The conditions in the simulation study, (e.g. statistical model, parameter values,…), have to be combined in an experimental design. The construction of this experimental design is essential [20]. Factors that can be considered in the experiment are, for example, variations of parameter levels, analysis methods and number of replications. The most complete design is the full-factorial design, although there might be reasons to adopt fractional designs. Based on the experimental design, the simulation experiment will have external validity (i.e. the results can be generalised beyond a given experiment). A Monte Carlo experiment has to be repeated to exclude random influences on the simulation results. Therefore, a sufficient number of replications of the experiment has to be performed. In the case of time series simulations, at least 10000 replications might be necessary, while for the simulation of 3D or 4D fMRI data a total of 100 might be enough. In general, the more replications, the better. For example, [19] generated 10000 replications of 3D datasets, and [17] simulated 4D multi-subject datasets to represent twin data using 500 replications of each paired dataset. In practice, this number can be limited due to time or computational constraints. When in doubt, the convergence of the results should be tested. The simulated task-related activation signal should reflect known properties of the BOLD response. This includes, but is not limited to, response delay, nonlinearities and inter-region and -subject variability. Either the canonical HRF or the Balloon model can be used (see [21] for an example using the Balloon model). fMRI noise is partially white (i.e. system noise) and this part can be modelled by random Gaussian noise. However, in addition one should account for (residual) motions, heart rate and respiratory rate fluctuations, task-related noise and spatial and temporal correlations (see, for example, [8], [22], [23]). If either the BOLD model or the noise model is simplified, this should be duly motivated.

Conclusion

The use of simulation studies to validate statistical techniques for fMRI data should be highly encouraged, because simulation experiments are a fast and cheap tool to assess the quality and applicability of the analysis techniques. However, our survey of the fMRI simulation literature raised several concerns with respect to simulation studies as they are conducted now. The observed decrease in the number of fMRI simulation studies in recent years is troubling. Furthermore, it was demonstrated that the data generating process used to simulate fMRI data is often model-based and parameter variation in the data generating process is not implemented in a standard manner. A possible reason for the absence of a common fMRI data generation model might be the lack of established software packages. Current simulation studies are mainly conducted using in-house software routines that have no common programming language and are not widely available. Recently, developments to fill this gap have resulted in the release of software packages that provide a flexible and fast framework for fMRI simulations [24], [25]. Using these software packages can be an important step in the right direction. Additionally, by taking into account the different sources present in fMRI data and adopting a complete simulation design with sufficient replications, conclusions from fMRI simulation studies can be expected to be more reliable. Researchers that conduct fMRI simulation studies are encouraged to consider the recommendations presented in this paper in order to increase the reliability and generalisability of the conclusions from simulation studies. Bibliographic details of the selected articles. (PDF) Click here for additional data file.
  23 in total

1.  Simulation of human respiration in fMRI with a mechanical model.

Authors:  Jared R Brosch; Thomas M Talavage; John L Ulmer; John A Nyenhuis
Journal:  IEEE Trans Biomed Eng       Date:  2002-07       Impact factor: 4.538

2.  Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation.

Authors:  K K Kwong; J W Belliveau; D A Chesler; I E Goldberg; R M Weisskoff; B P Poncelet; D N Kennedy; B E Hoppel; M S Cohen; R Turner
Journal:  Proc Natl Acad Sci U S A       Date:  1992-06-15       Impact factor: 11.205

3.  Nonlinear estimation of the BOLD signal.

Authors:  Leigh A Johnston; Eugene Duff; Iven Mareels; Gary F Egan
Journal:  Neuroimage       Date:  2007-12-03       Impact factor: 6.556

4.  Non-extensive entropy and the extraction of BOLD spatial information in event-related functional MRI.

Authors:  M J Sturzbecher; W Tedeschi; B C T Cabella; O Baffa; U P C Neves; D B de Araujo
Journal:  Phys Med Biol       Date:  2008-12-10       Impact factor: 3.609

5.  Dynamics of blood flow and oxygenation changes during brain activation: the balloon model.

Authors:  R B Buxton; E C Wong; L R Frank
Journal:  Magn Reson Med       Date:  1998-06       Impact factor: 4.668

6.  Correlation and heritability in neuroimaging datasets: a spatial decomposition approach with application to an fMRI study of twins.

Authors:  Joonkoo Park; Kerby Shedden; Thad A Polk
Journal:  Neuroimage       Date:  2011-07-07       Impact factor: 6.556

7.  Intrinsic signal changes accompanying sensory stimulation: functional brain mapping with magnetic resonance imaging.

Authors:  S Ogawa; D W Tank; R Menon; J M Ellermann; S G Kim; H Merkle; K Ugurbil
Journal:  Proc Natl Acad Sci U S A       Date:  1992-07-01       Impact factor: 11.205

8.  Bootstrap generation and evaluation of an fMRI simulation database.

Authors:  Pierre Bellec; Vincent Perlbarg; Alan C Evans
Journal:  Magn Reson Imaging       Date:  2009-06-30       Impact factor: 2.546

9.  Comparing dynamic causal models using AIC, BIC and free energy.

Authors:  W D Penny
Journal:  Neuroimage       Date:  2011-07-27       Impact factor: 6.556

10.  Identifying neural drivers with functional MRI: an electrophysiological validation.

Authors:  Olivier David; Isabelle Guillemain; Sandrine Saillet; Sebastien Reyt; Colin Deransart; Christoph Segebarth; Antoine Depaulis
Journal:  PLoS Biol       Date:  2008-12-23       Impact factor: 8.029

View more
  13 in total

1.  Passive BCI based on drowsiness detection: an fNIRS study.

Authors:  M Jawad Khan; Keum-Shik Hong
Journal:  Biomed Opt Express       Date:  2015-09-22       Impact factor: 3.732

2.  Case for fMRI data repositories.

Authors:  Satish Iyengar
Journal:  Proc Natl Acad Sci U S A       Date:  2016-06-29       Impact factor: 11.205

3.  Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates.

Authors:  Anders Eklund; Thomas E Nichols; Hans Knutsson
Journal:  Proc Natl Acad Sci U S A       Date:  2016-06-28       Impact factor: 11.205

4.  Anatomical, functional and molecular biomarker applications of magnetic resonance neuroimaging.

Authors:  Christina H Liu
Journal:  Future Neurol       Date:  2015-01-01

5.  NeuMapper: A scalable computational framework for multiscale exploration of the brain's dynamical organization.

Authors:  Caleb Geniesse; Samir Chowdhury; Manish Saggar
Journal:  Netw Neurosci       Date:  2022-06-01

6.  From correlation to causation: Estimating effective connectivity from zero-lag covariances of brain signals.

Authors:  Jonathan Schiefer; Alexander Niederbühl; Volker Pernice; Carolin Lennartz; Jürgen Hennig; Pierre LeVan; Stefan Rotter
Journal:  PLoS Comput Biol       Date:  2018-03-26       Impact factor: 4.475

Review 7.  Meta-Analysis of the Structural Equation Models' Parameters for the Estimation of Brain Connectivity with fMRI.

Authors:  Joan Guàrdia-Olmos; Maribel Peró-Cebollero; Esteve Gudayol-Ferré
Journal:  Front Behav Neurosci       Date:  2018-02-15       Impact factor: 3.558

8.  A validation framework for neuroimaging software: The case of population receptive fields.

Authors:  Garikoitz Lerma-Usabiaga; Noah Benson; Jonathan Winawer; Brian A Wandell
Journal:  PLoS Comput Biol       Date:  2020-06-25       Impact factor: 4.475

9.  Identifying the neural representation of fast and slow states in force field adaptation via fMRI.

Authors:  Andria J Farrens; Fabrizio Sergi
Journal:  IEEE Int Conf Rehabil Robot       Date:  2019-06

10.  A method to compare the discriminatory power of data-driven methods: Application to ICA and IVA.

Authors:  Yuri Levin-Schwartz; Vince D Calhoun; Tülay Adalı
Journal:  J Neurosci Methods       Date:  2018-10-30       Impact factor: 2.390

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.