
The reporting of observational clinical functional magnetic resonance imaging studies: a systematic review.

Qing Guo1, Melissa Parlar2, Wanda Truong3, Geoffrey Hall4, Lehana Thabane5, Margaret McKinnon6, Ron Goeree7, Eleanor Pullenayegum5.   

Abstract

INTRODUCTION: Complete reporting assists readers in confirming the methodological rigor and validity of findings and allows replication. The reporting quality of observational functional magnetic resonance imaging (fMRI) studies involving clinical participants is unclear.
OBJECTIVES: We sought to determine the quality of reporting in observational fMRI studies involving clinical participants.
METHODS: We searched OVID MEDLINE for fMRI studies in six leading journals between January 2010 and December 2011. Three independent reviewers abstracted data from articles using an 83-item checklist adapted from the guidelines proposed by Poldrack et al. (Neuroimage 2008; 40: 409-14). We calculated the percentage of articles reporting each item of the checklist and the percentage of reported items per article.
RESULTS: A random sample of 100 eligible articles was included in the study. Thirty-one items were reported by fewer than 50% of the articles and 13 items were reported by fewer than 20% of the articles. The median percentage of reported items per article was 51% (ranging from 30% to 78%). Although most articles reported statistical methods for within-subject modeling (92%) and for between-subject group modeling (97%), none of the articles reported observed effect sizes for any negative finding (0%). Few articles reported justifications for fixed-effect inferences used for group modeling (3%) and temporal autocorrelations used to account for within-subject variances and correlations (18%). Other under-reported areas included whether and how the task design was optimized for efficiency (22%) and distributions of inter-trial intervals (23%).
CONCLUSIONS: This study indicates that substantial improvement in the reporting of observational clinical fMRI studies is required. Poldrack et al.'s guidelines provide a means of improving overall reporting quality. Nonetheless, these guidelines are lengthy and may be at odds with strict word limits for publication; creating a shortened version of Poldrack's checklist that contains the most relevant items may be useful in this regard.

Year:  2014        PMID: 24755843      PMCID: PMC3995931          DOI: 10.1371/journal.pone.0094412

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

In the past decade, the use of functional MRI (fMRI) studies in cognitive neuroscience has increased a great deal [1], [2]. Given that fMRI is increasingly applied to the study of clinical disorders (e.g., [3]–[8]), and considering the vulnerability of clinical participants, there is an ethical imperative for scientists to apply rigorous methodology and to provide adequate reporting. Rigorous methodology is required in order to uphold the promises typically made to participants during the consent process, namely that the study will help investigators to understand their conditions. Complete reporting with sufficient details permits readers to ensure the methodological rigor of a study [9], consider the validity of findings [10]–[14], and extend and replicate the findings [9]–[13], [15]–[17]. In particular, recent evidence indicates that, overall, the fMRI literature lacks key details in its methods sections, such as sample size calculations, whether temporal autocorrelations were modeled, descriptions of slice-timing and motion correction, slice order and coverage of functional brain images [18], and related parameter estimates (i.e., effect size and variance components) in its results sections [19]. Standard guidelines have been developed to aid authors in reporting their research, such as the Consolidated Standards for Reporting Trials (CONSORT) [10] and the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) initiative [9]. Recently, Poldrack and his colleagues have proposed guidelines specifically for reporting fMRI studies [14]. Although many authors have suggested endorsing the guidelines proposed by Poldrack et al. in reporting fMRI studies to improve the quality, transparency and consistency of results [2], [18], [20], [21], few systematic reviews have been conducted to appraise the quality of reporting based on these guidelines.
Although a study by Carp (2012) recently examined adherence to Poldrack et al.'s guidelines in randomly selected fMRI studies published since 2007, it included few studies involving clinical populations. Thus, the reporting quality in clinical fMRI studies remains unclear. Given the unique challenges (e.g., technical, interpretive, and methodological) that confront clinical fMRI studies, reporting details on design, subject characteristics, analyses and interpretation is suggested to enhance reproducibility of results in this subset of fMRI studies. Therefore, we expect that reporting in clinical fMRI studies is different from that of the overall fMRI literature. Moreover, based on our experience and anecdotal evidence that the majority of fMRI studies are observational (i.e., the type of study is not designed to randomize participants to test efficacy and safety of any therapeutic intervention), these studies are less scrutinized than randomized clinical trials with experimental interventions; for example, randomized trials have to be registered with clinicaltrials.gov. Therefore, we aimed to systematically evaluate the quality of reporting in observational fMRI studies involving clinical human participants (i.e., individuals who either have a disease or are at risk of developing a disease) using a checklist adapted from the guidelines proposed by Poldrack et al. In this study, we set out to address the following two questions: (1) what percentage of articles reported each item of the fMRI-specific guideline, and (2) what percentage of items was reported per article?

Methods

Search Strategy and Eligible Journals

We searched OVID MEDLINE in January 2012 using keyword search terms (e.g., functional magnetic resonance imaging) combined with the acronym (i.e., fMRI) for articles published in 2010 and 2011, in the English language, and involving human participants. Compared with journals in general, top journals are cited more frequently (e.g., higher impact factors (IF)) and more scrutinized prior to publication (e.g., lower manuscript acceptance rates). Furthermore, studies have indicated that high IF and low manuscript acceptance rates of journals are associated with higher methodological rigor of articles published in the journals [22]–[26]. In this study, we further constrained our selection to six leading journals. From the Journal Citation Report 2010, we selected four journals with a high IF in the category “Neurosciences”, namely, Neuron (IF 14.9), Nature Neuroscience (IF 14.2), Brain (IF 9.2), and Journal of Neuroscience (IF 7.3); one journal with the highest impact factor in the category “Neuroimaging” (NeuroImage, IF 5.94); and one journal that contributes a great number of articles in fMRI studies [18] and has a high impact factor (Proceedings of the National Academy of Sciences of the United States of America, IF 9.8). More details on the search strategy can be found in Table S1. Duplicate articles were removed.

Eligibility Criteria for Studies and Study Selection

We included articles that were peer-reviewed, full reports of observational fMRI studies involving human clinical participants and that used a block, event-related, or mixed design for the fMRI paradigm. We excluded articles that were published only in abstract form or any that were only editorials, letters, comments or reviews. Genetic, resting-state observational fMRI studies, fMRI studies other than observational studies (e.g., randomized clinical trials), and studies of connectivity were also excluded. As studies of connectivity aim to identify and quantify the correlations between brain regions [27], these studies have a different reporting focus vis-à-vis fMRI data analyses. For example, they report the Psycho-Physiological Interaction analyses to estimate effective connectivity or functional coupling rather than data preprocessing steps, which were demonstrated to have significant impacts on the quality of data and the reliability and interpretation of fMRI results [28], [29]. However, the reporting essentials for effective connectivity studies have not been reflected in the currently available guidelines, including the one proposed by Poldrack et al. As our study aimed to evaluate the quality of reporting based on Poldrack et al.'s guidelines, we therefore excluded this type of study to ensure consistency. In this study, we set a target sample size of 100 articles that had to meet the predefined inclusion and exclusion criteria. We therefore randomly selected and assessed the eligibility of articles among the unique citations, which were identified from the initial search strategy after the duplicates were removed, until 100 eligible articles were reached.

Data Extraction

We created an electronic data extraction form containing 83 items adapted from the guidelines proposed by Poldrack et al. [14] to assess the reporting of study articles, which we piloted using a random selection of four studies reviewed by three independent reviewers (QG, MP, and WT). Through the pilot testing, we modified the abstraction form by deleting three items (Unwarping of B0 distortions; Describe any data quality control measures; any additional operations, e.g., masking out parts of the image) from Poldrack et al.'s original checklist. The reason for excluding these three items was that we found assessing them required too much subjectivity, meaning that biases among reviewers' judgments were very high. Excluding them meant we were better able to achieve a common perception and interpretation of definitions among items we did evaluate, and hence increased between-reviewer agreement. The observed percentage of agreement on judgments between any two reviewers was 0.78 or higher. Final abstraction forms were devised prior to use (see Table S2). The data were extracted from each article and any online supplements. Items were answered with “Reported”, “Not Reported”, or “Not Applicable”. Three authors (QG, MP, and WT), blinded to each other's assessments, abstracted the reporting of each article independently. Instead of all three raters reviewing all articles, we decided to have two reviewers rate each article. To determine the number of articles needed to be evaluated by the second reviewer to ensure a desired level of reliability, we performed a sample size calculation [30], [31]. The sample size of 50 was chosen so as to estimate the kappa for the inter-rater agreement within a margin of error of 0.3 with 95% confidence, assuming that the true kappa would be 0.6 or more and that the proportion of agreements by chance was 0.7 or less (see File S2). 
The first reviewer (QG) evaluated all 100 articles, of which 50 articles were randomly selected for the second reviewer (MP), and the other 50 articles were given to the third reviewer (WT) for abstraction; each article was therefore rated by two reviewers. After completion of independent assessments, any disagreements between any pair of reviewers (i.e., QG and MP; QG and WT) were resolved by discussion between the two reviewers, and if necessary, by involving the third reviewer or an expert (GH) until consensus was reached. The raw data collected from the 100 studies are available in the online Supporting Information (see File S4).
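The random split of the 100 articles between the second and third reviewers can be sketched as follows. This is only an illustration: the article does not describe the randomization mechanism, so the function name `assign_second_reviewers`, the article IDs 1 to 100, and the seed are all assumptions.

```python
import random

def assign_second_reviewers(article_ids, reviewers=("MP", "WT"), seed=2012):
    """Randomly split the articles evenly between the two secondary reviewers.

    Hypothetical helper: the seed and ID scheme are illustrative assumptions,
    not details taken from the study.
    """
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(article_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        reviewers[0]: sorted(shuffled[:half]),
        reviewers[1]: sorted(shuffled[half:]),
    }

# The first reviewer (QG) rates everything; each article gets one extra rater.
assignment = assign_second_reviewers(range(1, 101))
print({name: len(ids) for name, ids in assignment.items()})
```

With this design every article is rated by exactly two reviewers, and the two 50-article subsets do not overlap.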

Statistical Analysis

We calculated the percentage of studies that reported each evaluation item and a 95% confidence interval (CI) using an exact binomial method [32]. We then estimated the median, minimum and maximum percentages of reported items for each article. Inter-rater agreement was assessed using the prevalence-adjusted bias-adjusted kappa (PABAκ) coefficient [33]. When the prevalence of a rating is very high or low, the value of kappa may indicate a low level of agreement while the observed percentage of agreement is high, known as the kappa paradox [34]. Hence, we used prevalence-adjusted bias-adjusted kappa [33] to address this paradox and to better interpret the inter-rater agreement. Kappa coefficient results were interpreted based on the scale as proposed by Byrt [35]: 0.00 or less (No agreement), 0.01–0.20 (Poor agreement), 0.21–0.40 (Slight agreement), 0.41–0.60 (Fair agreement), 0.61–0.80 (Good agreement), 0.81–0.92 (Very good agreement), 0.93–1.00 (Excellent agreement). We performed a sample size calculation to determine the number of articles to be included in the extraction and analysis. A sample size of 100 was chosen so that with 95% confidence, we would be able to quantify the true percentage of articles that reported each item to within 10% (see File S1). All statistical analyses were conducted using the SAS 9.2 software (Cary, NC).
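The quantities named in this section can be illustrated with a short standard-library Python sketch (the study itself used SAS 9.2, so this is not the authors' code). It assumes the "exact binomial method" is the Clopper-Pearson interval, treats each item as a two-category rating (Reported vs. Not Reported) so that PABAκ reduces to 2·Po − 1, and uses a normal approximation for the worst-case margin of error; File S1 may use a different calculation. All function names are hypothetical.

```python
from math import comb, sqrt

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _solve_increasing(f, target, iters=60):
    """Bisection on [0, 1] for an increasing function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x 'reported' articles out of n."""
    # Lower bound: P(X >= x | p) = alpha / 2
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: P(X <= x | p) = alpha / 2
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper

def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 table: a, d = agreements; b, c = disagreements."""
    n = a + b + c + d
    po = (a + d) / n
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (po - pe) / (1 - pe)

def pabak(a, b, c, d):
    """Prevalence-adjusted bias-adjusted kappa for two categories: 2*Po - 1."""
    return 2 * (a + d) / (a + b + c + d) - 1

def worst_case_margin(n, z=1.96):
    """Normal-approximation half-width of a 95% CI at p = 0.5."""
    return z * sqrt(0.25 / n)

print(exact_binomial_ci(92, 100))   # interval for an item reported by 92/100
print(cohen_kappa(90, 5, 5, 0))     # negative despite 90% raw agreement
print(pabak(90, 5, 5, 0))           # agreement after the adjustment
print(worst_case_margin(100))       # just under 0.10 for n = 100
```

The made-up 2x2 table (90 joint "Reported" ratings, 10 disagreements, no joint "Not Reported" ratings) shows the kappa paradox: raw agreement is 90%, yet Cohen's kappa is slightly negative because almost all ratings fall in one category, whereas the prevalence-adjusted value depends only on the observed agreement.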

Results

Study Selection

After removing the duplicates, the initial search strategy identified 1120 unique articles. We screened the articles in a random order for eligibility until the quota of 100 eligible articles was reached. To reach this target, we assessed 1100 articles (see Figure S1 for a flow diagram). The list of the 100 eligible articles is included in File S3.

Study Characteristics

Among the 100 eligible articles published in six leading journals in 2010 and 2011, about 60% came from the journal NeuroImage. The majority of study designs were cross-sectional (94%). The funding source was reported in 78% of the citations, and came primarily from two or more different sources (77%) rather than from industry alone (1%). Fifty-three percent of included articles were published in 2010 and the remaining forty-seven percent in 2011. The median total number of subjects was 34 (first quartile (Q1) = 26, third quartile (Q3) = 48), ranging from 8 to 126, and most studies (79%) had a sample size of no more than 50 (see Table 1).
Table 1

Characteristics of Included fMRI Studies (Information Extracted from Each Article).

Study Feature | All articles (n = 100): Median (Q1, Q3) or %
Publication Journal |
  Neuron | 2
  Nature Neuroscience | 1
  Proceedings of the National Academy of Sciences of the United States of America | 4
  Brain | 22
  Journal of Neuroscience | 13
  Neuroimage | 58
Publication Year |
  2010 | 53
  2011 | 47
Study Design |
  Case-control | 0
  Cohort | 6
  Cross-sectional | 94
Number of Subjects | 34 (26, 48)
  Up to 10 | 2
  10–50 | 77
  51–100 | 17
  More than 100 | 4
Funding Sources |
  Completely funded by industry | 1
  Others | 77
  Not reported | 22

Note: Q1 = first quartile or 25th percentile, Q3 = third quartile or 75th percentile.

Items Commonly Reported

Of the 83 items, 22 items were reported by 85% or more of the 100 included articles. Specifically, all of the studies reported sample sizes. Most studies further described the manufacturer, field strength and model name of the scanner and the pulse sequence type (98%), statistical methods used for group modeling (97%), subjects' characteristics such as age and gender (94%), statistical methods used for within-subject modeling (92%), eligibility criteria on selecting subjects (91%), and whether statistical inferences were corrected for multiple comparisons (90%). Similarly, 86% of the articles reported how regions of interest (ROIs) were defined. Of 86 articles that reported analyses not conducted on the whole brain, 80 (93%) explained how regions were determined (see Tables 2–10).
Table 2

Percentage of articles reported each item, inter-rater agreement on the item and whether the item should be included in future shortened checklist relating to “Experimental Design”.

Item No | Description | % Reported (95% CI) | PABAκ (95% CI) | Item Selection*
1a | Described number of blocks, trials, experimental units per session or per subject | 92 (84, 96) | 0.90 (0.77, 0.97) | Included
1b | Stated length of each trial and interval between trials described | 81 (71, 88) | 0.76 (0.60, 0.87) | Included
1c# | If ISIs are variable, reported the mean and range of ISIs and how they were distributed (n = 39) | 23 (11, 39) | 0.76 (0.60, 0.87) | Included
1d# | If block designs, specified the length of blocks (n = 73) | 79 (67, 87) | 0.72 (0.55, 0.84) | Included
1e# | If event-related designs, stated whether the design was optimized for efficiency, and if so, stated how (n = 35) | 22 (10, 40) | 0.70 (0.53, 0.83) | Included
1f# | If mixed design, stated correlation between block and event regressors (n = 2) | 50 (1, 98) | 0.94 (0.83, 0.99) | Included
2a | Stated task instructions on what subjects were asked to do | 92 (84, 96) | 0.92 (0.80, 0.98) | Included
2b | Described what the Stimuli were and how many there were | 69 (58, 77) | 0.72 (0.55, 0.84) | Included
2c | Stated whether specific stimuli repeated across trials | 49 (38, 59) | 0.46 (0.26, 0.63) | Included
3 | If the experiment had multiple conditions, stated what the specific planned comparisons were, or whether an omnibus ANOVA test was used | 89 (81, 94) | 0.90 (0.77, 0.97) | Included

Abbreviations: ISIs, inter-stimulus intervals; ANOVA, analysis of variance.

#The conditional item which is needed to report when the condition is met.

*To identify whether the item should be included in future shortened checklist. If excluded, the reasons for exclusion are given.

Table 10

Percentage of articles reported each item, inter-rater agreement on the item and whether the item should be included in future shortened checklist relating to “Figures and Tables”.

Item No | Description | % Reported (95% CI) | PABAκ (95% CI) | Item Selection*
17a | Stated the statistical map that the figure or table is based upon (e.g., Z, t, p) | 95 (88, 98) | 0.84 (0.69, 0.93) | Included
17b | Provided the thresholds used to create the image or figure (e.g., intensity and cluster extent) | 71 (61, 79) | 0.60 (0.41, 0.75) | Included
18 | Underlying anatomical image stated (e.g., average anatomy, template image) | 26 (17, 35) | 0.66 (0.48, 0.79) | Included
19a | Locations in stereotactic space provided | 73 (63, 81) | 0.80 (0.64, 0.90) | Included
19b | Provided statistics for each cluster including maximum and cluster extent | 51 (40, 61) | 0.86 (0.72, 0.94) | Included
19c | Provided source of anatomical labels (e.g., atlas, automated labeling method) | 67 (56, 76) | 0.62 (0.43, 0.76) | Included

*To identify whether the item should be included in future shortened checklist. If excluded, the reasons for exclusion are given.


Items Not Commonly Reported

Among the 83 items, a total of 31 items were reported by no more than 50% of the included articles; 13 items were reported by fewer than 20% of the articles. Critically, and in sharp contrast to Poldrack's guidelines, none of the studies reported observed effect sizes if they failed to reject the null hypothesis. Only one article (3%, 1/31) provided justifications for using fixed-effect inferences for group modeling. Other items that were insufficiently reported included slice-timing and motion corrections (12/100), temporal autocorrelation modeling used to account for within-subject variances and correlations (18/100), whether and how the task design was optimized for efficiency if it was an event-related design (22%, 8/35), distributions of inter-stimulus intervals (ISIs) when the ISIs were variable (23%, 9/39), statistical methods for repeated measurements (24/100), and smoothness and resolution element (RESEL) count if family-wise error (FWE) was found by random field theory (RFT) (25%, 1/4). Moreover, only six articles (28%, 6/21) described whether variances were assumed equal among groups if there were more than two groups. Of the 35 articles that reported percent signal changes, 12 (34%, 12/35) explained how scaling factors were determined. Similarly, 45% (45/100) of the articles stated how signal was extracted within ROIs.

Reported Items per Article

The median (minimum, maximum) percentage of reported items per article was 51% (30%, 78%). The inter-rater agreement was very good (PABAκ >0.8) for 31 items, good (0.6< PABAκ ≤0.8) for 31 items, fair (0.4

Specifics on Reported Items

Manuscript quality hinges not only on whether an item was reported, but also on the specifics of the method that was used. Here we describe manuscripts' methodological choices regarding software, spatial smoothing, temporal filtering and thresholding for statistical significance. Seventy-eight percent of the articles reported a version of the software package used in fMRI data analyses (see Table 5), and 98% reported using at least one software package. Of the 98 articles, 71.4% used SPM, 11.2% used FSL, and 10.2% used BrainVoyager (Table 11). The packages used by fewer than 10 articles include AFNI (7.1%), MATLAB (6.1%) and XBAM (1.0%). Many software packages were reported with a version; SPM5 was the most commonly used (43.9%, 43/98), followed by SPM2 (17.3%, 17/98), SPM8 (8.2%, 8/98), and FSL with no version (6.1%, 6/98). No version of XBAM was specified (see Table 11 for details).
Table 5

Percentage of articles reported each item, inter-rater agreement on the item and whether the item should be included in future shortened checklist relating to “Data Preprocessing”.

Item No | Description | % Reported (95% CI) | PABAκ (95% CI) | Item Selection*
8a | Stated the version number or date of last application for each piece of software used | 78 (68, 85) | 0.76 (0.60, 0.87) | Included
8b | Specified differences in any subjects who required different processing operations or settings in the analysis (n = 78) | 3 (1, 10) | 0.60 (0.42, 0.75) | Excluded due to much subjectivity. For example, if the study states that all subjects received same operations or settings, this item would not be applicable. If there is no indication of this, it is difficult to decide under what condition this item is expected to be reported.
9a | Specified order of preprocessing operations | 26 (17, 35) | 0.70 (0.53, 0.83) | Included
9b | Stated reference slice and interpolation type for slice timing correction | 9 (4, 16) | 0.94 (0.83, 0.99) | Included
9c | Stated reference scan, image similarity metric, type of interpolation used, degrees-of-freedom, and ideally optimization method for motion correction | 15 (8, 23) | 0.74 (0.58, 0.86) | Included

*To identify whether the item should be included in future shortened checklist. If excluded, the reasons for exclusion are given.

Table 11

The use of software packages and versions.

Type of Software | Frequency (Reporting Articles, N = 98) | %
AFNI (no version) | 7 | 7.1
BrainVoyager | 10 | 10.2
  BrainVoyager2.1 | 1 | 1.0
  BrainVoyager2000 | 1 | 1.0
  BrainVoyagerQX1.10.4 | 1 | 1.0
  BrainVoyagerQX1.9 | 1 | 1.0
  BrainVoyagerQX2 | 1 | 1.0
  BrainVoyagerQX (no version) | 3 | 3.1
  BrainVoyager (no version) | 2 | 2.1
FSL | 11 | 11.2
  FSL3.3 | 2 | 2.1
  FSL4.1 | 1 | 1.0
  FSL4.1.4 | 1 | 1.0
  FSL5.9.2 | 1 | 1.0
  FSL (no version) | 6 | 6.1
MATLAB | 6 | 6.1
  MATLAB6 | 1 | 1.0
  MATLAB6.5 | 1 | 1.0
  MATLAB7.2 | 1 | 1.0
  MATLAB (no version) | 3 | 3.1
SPM | 70 | 71.4
  SPM2 | 17 | 17.3
  SPM5 | 43 | 43.9
  SPM8 | 8 | 8.2
  SPM99 | 1 | 1.0
  SPM (no version) | 1 | 1.0
XBAM (no version) | 1 | 1.0

Abbreviations: AFNI, Analysis of Functional NeuroImages; FSL, FMRIB Software Library; SPM, Statistical Parametric Mapping; XBAM, Brain Activation Mapping.

Spatial smoothing reduces noise and hence increases the signal-to-noise ratio while reducing the resolution of data [36], [37]. Therefore, it is important to specify the extent to which spatial smoothing has been applied. Specifically, the size of the smoothing kernel determines how much the data are smoothed, which has an effect on the extent of within-subject variability of estimates [38]. Reporting smoothing parameters helps readers to determine the balance between improving the sensitivity and maintaining the resolution of the functional image. As can be seen in Table 12, the majority of studies reported using spatial smoothing (88/100), with 95.5% (84/88) specifying a type of kernel. The widths of smoothing kernel ranged from 3 mm to 12 mm with a median width of 8 mm. The most frequent kernel width was 8 mm (42%, 37/88). Other common widths included 6 mm (29.5%, 26/88), 9 mm (8%, 7/88), and 10 mm (5.7%, 5/88). The widths used by fewer than 5 studies were 5 mm, 12 mm, 4 mm, 4.2 mm and 3 mm. None of the studies justified their choices of smoothing kernel.
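To make the kernel widths concrete, the sketch below converts a full-width-at-half-maximum (FWHM) value to the standard deviation of the corresponding Gaussian, using the standard relation FWHM = 2·sqrt(2·ln 2)·σ, and recomputes the median kernel width from the frequencies listed in Table 12. It assumes that the kernels are Gaussian and that the frequency rows list every reported width; the names `fwhm_to_sigma`, `WIDTH_COUNTS` and `median_width` are illustrative.

```python
import math
from statistics import median

def fwhm_to_sigma(fwhm_mm):
    """Standard deviation (mm) of a Gaussian kernel with the given FWHM (mm)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

# Kernel width (mm) -> number of articles, copied from Table 12.
WIDTH_COUNTS = {8: 37, 6: 26, 9: 7, 10: 5, 5: 4, 12: 3, 4: 2, 4.2: 1, 3: 1}

def median_width(counts):
    """Median of the widths after expanding each width by its frequency."""
    expanded = [width for width, n in counts.items() for _ in range(n)]
    return median(expanded)

print(median_width(WIDTH_COUNTS))       # median reported kernel width in mm
print(round(fwhm_to_sigma(8.0), 2))     # sigma in mm for an 8 mm FWHM kernel
```

The conversion shows why the kernel width matters for resolution: an 8 mm FWHM kernel has a standard deviation of roughly 3.4 mm, which is comparable to a typical voxel edge.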
Table 12

The use of spatial smoothing, temporal filtering, and between-subject inference.

Parameter | Frequency (Reporting Articles) | %
Spatial Smoothing | |
Use of Spatial Smoothing (N = 100) | 88 | 88
Type of Kernel (N = 88) | 84 | 95.5
Width of Smoothing Kernel (FWHM, N = 88) | |
  8 mm | 37 | 42.0
  6 mm | 26 | 29.5
  9 mm | 7 | 8.0
  10 mm | 5 | 5.7
  5 mm | 4 | 4.5
  12 mm | 3 | 3.4
  4 mm | 2 | 2.3
  4.2 mm | 1 | 1.1
  3 mm | 1 | 1.1
  Median (min, max) | 8 mm (3 mm, 12 mm) |
Justification for the Chosen Smoothing Kernel | 0 | 0
Temporal Filtering | |
Use of Temporal Filtering (N = 100) | 61 | 61
Type of Filtering (N = 60) | |
  High-pass | 57 | 95
  Low-pass | 1 | 1.7
  Band-pass | 2 | 3.3
Filter Cut-off (second) | |
  High-pass: Median (min, max) | 128 s (2.8 s, 318 s) |
  Low-pass: Median (min, max) | 6.7 s (6.7 s, 6.7 s) |
Between-subject Inference | |
Use of Per-voxel (height) Threshold (N = 100) | 78 | 78
Size of Per-voxel Threshold (N = 78) | |
  p<0.001 | 25 | 32.1
  p<0.05 | 24 | 30.8
  p<0.01 | 13 | 16.7
  p<0.005 | 12 | 15.4
  Others | 11 | 14.1
Use of Cluster-extent Threshold (N = 100) | 63 | 63
Size of Cluster-extent Threshold (mm³) | |
  Median (min, max) | 184 (3, 5625) |
Use of Formal Corrections for Multiple Comparison | 81 | 81
Methods Used for Formal Corrections (N = 81) | |
  Family-Wise Error | 23 | 28.4
  False Discovery Rate | 22 | 27.2
  Monte Carlo Simulation | 15 | 18.5
  Gaussian Random Field Theory | 4 | 4.9
  Other Methods | 4 | 4.9
  Not Reported | 13 | 16.1

Abbreviation: FWHM, Full Width at Half Maximum.

As with spatial smoothing, temporal filtering aims to increase the signal-to-noise ratio. Since most of the noise in fMRI is low frequency, high-pass filtering improves the ratio better than low-pass filtering, and is almost as good as band-pass filtering [36], [39]. Specifying the filter cut-off parameter helps readers understand the temporal filtering process. Most studies (61/100) reported whether temporal filtering was used. Of the 60 studies that reported actual use of temporal filtering, most (95%, 57/60) used high-pass filtering. Only a few studies used low-pass (1.7%, 1/60) and band-pass (3.3%, 2/60) temporal filtering. Forty-eight studies reported the filter cut-off, among which the high-pass filtering cut-off ranged from 2.8 s to 318 s with a median and mode value of 128 s, compared to low-pass filtering with a single cut-off value of 6.7 s.

The threshold for statistical significance in voxel- or cluster-level analysis controls the type I error rate [40], and many papers have suggested using formal correction methods [40]–[45]. Of the 100 included studies, 78% reported the use of a per-voxel (or height) threshold. The most common per-voxel threshold was p<0.001 (32.1%, 25/78), followed by p<0.05 (30.8%, 24/78), p<0.01 (16.7%, 13/78), and p<0.005 (15.4%, 12/78). More than half of the studies (63/100) reported using a cluster-extent threshold. The size of the cluster-extent threshold ranged from 3 mm³ to 5625 mm³ with a median threshold of 184 mm³. The majority of studies (81%, 81/100) reported using corrections for multiple testing; among these studies, around 16.1% (13/81) did not report which correction method was used. Among the studies that reported a method, the correction methods included Family-wise Error (28.4%, 23/81), False Discovery Rate (27.2%, 22/81), Monte Carlo Simulation (18.5%, 15/81), Gaussian Random Field Theory (4.9%, 4/81) and several others (4.9%, 4/81).
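The difference between the two most frequently reported families of correction can be illustrated with a minimal sketch. It uses the Bonferroni rule as a simple stand-in for family-wise error control and the Benjamini-Hochberg step-up procedure for false discovery rate control; these are textbook procedures chosen for illustration, not necessarily the implementations used by the reviewed studies (imaging packages typically rely on random field theory or simulation for FWE). The p-values are invented.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject test i if p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose p-value ranks at or below k_max.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

# Invented p-values for ten hypothetical voxels or regions.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(pvals)), "rejections with Bonferroni")
print(sum(benjamini_hochberg(pvals)), "rejections with Benjamini-Hochberg")
```

Because the two approaches control different error rates and can reject different sets of tests on the same data, a report that states only "corrected for multiple comparisons" leaves the effective threshold ambiguous.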

Discussion

This study identified some reporting practices in observational clinical fMRI studies that met expectations and other areas where reporting was less than adequate. In particular, only one quarter of the items from the recommended reporting guidelines by Poldrack et al. (2008) were reported adequately. Indeed, only one half of recommended items were routinely reported in each article. Moreover, one third of the items were reported by less than half of the articles. Less adequately reported items were distributed across the categories: experimental design, inter-subject registration and smoothing, data preprocessing, statistical modeling, and statistical inference on ROI analysis. These results indicate that substantial room for improvement exists in the reporting of observational clinical fMRI studies. Specifically, improvement in reporting important details is recommended in areas such as observed effect sizes in the results section when study results are negative, justifications for fixed-effect inferences used for group modeling, and temporal autocorrelation matrix used to account for within-subject variance and correlations. As effect sizes observed from statistically significant regions overestimate true effect sizes [46], [47], including values from non-significant regions (e.g., those that are identified from similar previous studies) would help provide a more realistic range of effect size estimates and reduce the risk of bias arising from reporting on active regions only. Given the existence of temporal autocorrelation in fMRI time series, incorporating an autocorrelation structure increases the accuracy of variance estimates. Reporting temporal autocorrelation estimates enables proper power analyses based on the method proposed by Mumford and Nichols [48]. 
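The claim that effect sizes taken only from significant regions overestimate the true effect can be illustrated with a small simulation. This is a toy sketch under stated assumptions: every region shares one true effect, estimates are normally distributed around it with a known standard error, and a region counts as "significant" when its z-score exceeds 1.96. The parameter values and the function name `simulate_selection_bias` are made up for illustration.

```python
import random
from statistics import mean

def simulate_selection_bias(true_effect=0.3, se=0.2, n_regions=10000,
                            z_crit=1.96, seed=0):
    """Compare the mean estimate over all regions with significant ones only."""
    rng = random.Random(seed)  # fixed seed for a reproducible illustration
    estimates = [rng.gauss(true_effect, se) for _ in range(n_regions)]
    # Keep only estimates that pass the one-sided significance cut-off.
    significant = [e for e in estimates if e / se > z_crit]
    return mean(estimates), mean(significant), len(significant)

all_mean, sig_mean, n_sig = simulate_selection_bias()
print(f"mean over all regions:         {all_mean:.3f}")
print(f"mean over significant regions: {sig_mean:.3f} (n = {n_sig})")
```

Averaging over all regions recovers the true effect, whereas averaging over the significant subset must exceed the significance cut-off (here 1.96 × 0.2 = 0.392) and therefore overstates it. This is the bias that reporting effect sizes for non-significant regions is intended to reduce.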
Whereas findings from fixed-effect inferences particularly reflect the cohort of subjects studied, random-effect inferences generalize findings to the population at large from which the study sample was drawn [49]. The current recommendation is to use random-effect inferences for between-subject group modeling and fixed-effect inferences for single-subject modeling. Providing justifications for using fixed-effects for group modeling would enhance understanding and interpretation. This study differed substantially from the one existing review of fMRI reporting [18] in the number of items, definitions of items, study population and study design. For example, although Carp's study used a single reviewer, we conducted a systematic review by using a duplicate abstraction, measuring inter-rater agreement and resolving disagreements through consensus. Moreover, our study focused on observational studies with clinical participants; in contrast, Carp evaluated fMRI studies in general which may not capture many studies involving clinical participants. There are also some notable differences in results between the two studies. For example, in the current study around one-third reported the distribution of inter-trial intervals, compared to one-twelfth in Carp's study. About one half reported the number of subjects rejected from analyses with reasons for rejection in our study, which is one quarter greater than that of Carp's study. Similarly, less than one-third of the articles in our study reported the following four methodological items but still showed better reporting than those in Carp's study: how potentially confounding variables were matched across groups for group comparisons, whether autocorrelations were modeled, whether equal variance was assumed across groups for multiple group designs, and the number of RESELs and image smoothness for studies using FWE correction. 
Unfortunately, we are unable to identify the specific factors associated with these differences between the current study and Carp's study; the factors might be the type of clinical participants involved, the impact factors of the journals, or the exclusion of studies of connectivity. Future research may be helpful in this regard by comparing reporting quality among studies with versus without clinical participants, in journals with high versus low impact factors, and including versus excluding studies of connectivity. Despite these differences, both studies found that some of the same important items are frequently absent from published reports, indicating that incomplete reporting challenges the evaluation, understanding and interpretation of study findings, and limits the use of results for synthesis, e.g., for meta-analyses. Complete reporting becomes particularly important for studies involving clinical populations, where ensuring methodological rigor is necessary to uphold investigators' promises to their participants that their participation will help society to better understand the nature of their condition. Our findings point towards the need for substantial improvement in this regard. In several other fields of health research, it has been demonstrated that journals adopting standard reporting guidelines (e.g., the CONSORT statement) have better quality of reporting than those that do not [50]–[52]; thus, the use of guidelines in the fMRI literature may help improve the quality of reporting as well. Implementation of the guidelines for reporting fMRI studies proposed by Poldrack and his colleagues (2008) does face some challenges. Firstly, authors often have strict word limits and the current guidelines are lengthy, making it important to identify which items are most essential.
Secondly, some items are relevant to the quality of reporting of observational clinical studies but are not covered in Poldrack et al.'s guidelines (for example, sample size calculations in the methods section, characteristics of clinical participants, and participant flow diagrams to better understand potential bias due to non-participation [53]). Since reporting guidelines are evolving documents [54], we suggest dividing the list of items that should be reported into those that are essential, which should be placed in the manuscript itself, and those that are helpful to report, which can be included as online supplements. Some methodological parameters have more impact than others [28], [55] and hence should be considered essential items. Some journals (e.g., Nature) have recently removed space limitations on methods sections; however, since this is not a widespread practice, it would still be useful to distinguish between essential and helpful items. In addition to text-based reporting, some items can be reported in the form of source code (e.g., for data collection and statistical analyses) [56] and machine-readable information compatible with different imaging analysis packages [57]. Our recommendation for creating a list of essential items is not intended to supplant the existing guidelines but is rather a suggestion to consider during the next update of the guidelines. We hope that our suggestions will lead to more discussion and future consensus regarding what is in fact essential to report in the manuscript itself for observational clinical fMRI studies. For example, consensus could be reached through a consensus meeting involving a variety of experts in this area, in a similar way to how the standard CONSORT guideline was created. Involving journal editors in the process and having their endorsement of the guidelines would encourage researchers to comply with the new standards. The present study has several limitations.
First, the findings of this study reflect the quality of reporting of observational clinical fMRI studies published in six top neuroscience journals between 2010 and 2011, and may not apply to journals in general. Most likely, these results overestimate true rates of reporting. Second, several items on the checklist used for evaluation in this systematic review involve subjectivity. However, using duplicate review and resolving disagreements by consensus helped to reduce differences in interpretation between reviewers.

Conclusion

This study has highlighted under-reported areas in observational fMRI studies involving clinical participants and points towards a need for improvement. Adherence to the guidelines for fMRI studies proposed by Poldrack and his colleagues could help improve the quality of reporting. Considering that the guidelines are evolving and need continual updates, we suggest constructing a checklist that captures the essential items to report, to accommodate practical needs, and promoting adherence to the reporting guidelines through the means proposed above.

Supporting Information

Flow Diagram of Citation Selection Process. (DOC)
PRISMA 2009 Checklist. (DOC)
Sample size calculation for estimating a single proportion with a level of confidence. (DOC)
Sample size calculation for estimating a Cohen's kappa coefficient with a given precision. (DOC)
List of 100 eligible studies. (DOC)
Raw data collected from the 100 studies. (XLS)
Search strategy for Ovid Medline database. (DOC)
Data extraction form containing 83 items adapted from Poldrack et al.'s checklist. (DOC)

Table 3

Percentage of articles reporting each item, inter-rater agreement on the item, and whether the item should be included in a future shortened checklist, for items relating to “Study Subjects”.

| Item No | Description | % Reported (95% CI) | PABAκ (95% CI) | Item Selection* |
| --- | --- | --- | --- | --- |
| 4a | Stated number of subjects | 100 (96, 100) | 1.00 (0.93, 1.00) | Included |
| 4b | Stated age (mean and range) | 92 (84, 96) | 0.90 (0.77, 0.97) | Included |
| 4c | Stated handedness | 64 (53, 73) | 0.98 (0.89, 0.99) | Included |
| 4d | Stated number of males or females | 95 (88, 98) | 0.90 (0.77, 0.97) | Included |
| 4e | Stated inclusion and exclusion criteria | 91 (83, 95) | 0.86 (0.72, 0.94) | Included |
| 4f | If any subjects were scanned but then rejected from analysis after data collection, stated numbers and reasons for rejection | 52 (41, 62) | 0.82 (0.67, 0.92) | Included |
| 4g# | For group comparisons, stated what variables (if any) were equated across groups (n = 90) | 70 (59, 79) | 0.56 (0.37, 0.71) | Included |
| 5 | Stated which IRB approved the protocol | 94 (87, 97) | 0.94 (0.83, 0.99) | Included |
| 6 | Stated how behavioral performance was measured (e.g., response time, accuracy) | 56 (45, 65) | 0.34 (0.14, 0.52) | Excluded due to much subjectivity and low inter-rater agreement. For example, some standard tools (e.g., E-Prime, Fiber-Optic-Button box) measure response timing and accuracy. If these tools are cited, is it safe to assume that the behavioral performance is measured? If not, what minimum details are required to report so as to score it as ‘reported’? Is this item required to report in every study? If not, under what condition? |

Abbreviations: IRB, institutional review board.

# Conditional item, which needs to be reported only when the stated condition is met.

*Indicates whether the item should be included in a future shortened checklist. If excluded, the reasons for exclusion are given.
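As a rough guide to reading the two numeric columns in these tables, the sketch below computes a proportion with a 95% confidence interval and a prevalence-adjusted bias-adjusted kappa (PABAK, which appears to be the statistic in the PABAκ column). It rests on assumptions: the Wilson score interval is used here only as one common choice, since this section does not state which interval method the authors applied, and the PABAK formula shown is the two-category form, 2 × observed agreement − 1. Function names and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method, see lead-in)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def pabak(agreements: int, n_rated: int) -> float:
    """PABAK for two raters and two categories: 2 * observed agreement - 1."""
    if n_rated <= 0 or not 0 <= agreements <= n_rated:
        raise ValueError("need n_rated > 0 and 0 <= agreements <= n_rated")
    return 2.0 * agreements / n_rated - 1.0


if __name__ == "__main__":
    # Hypothetical counts: 64 of 100 articles report an item;
    # two reviewers agree on 45 of 50 doubly abstracted articles.
    low, high = wilson_interval(64, 100)
    print(f"reported: 64% ({100 * low:.0f}, {100 * high:.0f})")
    print(f"PABAK: {pabak(45, 50):.2f}")
```

Because the interval method is an assumption, the bounds produced by this sketch need not match the tabulated values exactly.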

Table 4

Percentage of articles reporting each item, inter-rater agreement on the item, and whether the item should be included in a future shortened checklist, for items relating to “Image Properties”.

| Item No | Description | % Reported (95% CI) | PABAκ (95% CI) | Item Selection* |
| --- | --- | --- | --- | --- |
| 7a | Provided manufacturer, field strength (in Tesla) and model name of MRI system | 98 (92, 99) | 0.96 (0.86, 0.99) | Included |
| 7b | Gave number of experimental sessions and volumes acquired per session | 50 (39, 60) | 0.78 (0.62, 0.88) | Included |
| 7c | Stated pulse sequence type (e.g., gradient/spin echo, EPI/spiral) | 98 (92, 99) | 1.00 (0.93, 1.00) | Included |
| 7d | Stated field of view, matrix size, slice thickness, inter-slice skip | 36 (26, 46) | 0.76 (0.60, 0.87) | Included |
| 7e | Provided acquisition orientation (axial, sagittal, coronal, oblique) | 71 (61, 79) | 0.90 (0.77, 0.97) | Included |
| 7f | Stated whether it is on the whole brain. If not, state area of acquisition | 65 (54, 74) | 0.90 (0.77, 0.97) | Included |
| 7g | Stated order of acquisition of slices (sequential or interleaved) | 21 (13, 30) | 0.82 (0.67, 0.92) | Included |
| 7h | Stated TE, TR and flip angle | 86 (77, 92) | 0.92 (0.80, 0.98) | Included |

Abbreviations: EPI, Echo Planar Imaging; TE, echo time; TR, repetition time.

*Indicates whether the item should be included in a future shortened checklist. If excluded, the reasons for exclusion are given.

Table 6

Percentage of articles reporting each item, inter-rater agreement on the item, and whether the item should be included in a future shortened checklist, for items relating to “Inter-subject Registration and Smoothing”.

| Item No | Description | % Reported (95% CI) | PABAκ (95% CI) | Item Selection* |
| --- | --- | --- | --- | --- |
| 10a | Illustrated the voxels presented in all subjects using “mask image” | 16 (9, 24) | 0.68 (0.51, 0.81) | Included |
| 10b | Described transformation model (linear/affine, nonlinear), type of any non-linear transformations (polynomial, discrete cosine basis), number of parameters (e.g., 12 parameter affine), regularization, image-similarity metric, and interpolation method | 18 (11, 26) | 0.70 (0.53, 0.83) | Included |
| 10c | Stated object anatomical image information used for transformation to Atlas | 42 (32, 52) | 0.46 (0.26, 0.63) | Included |
| 10d | Stated if anatomical MRI is co-planar with functional acquisition | 36 (26, 46) | 0.80 (0.65, 0.90) | Included |
| 10e | Stated if functional acquisition is co-registered to anatomical | 47 (36, 57) | 0.82 (0.67, 0.92) | Included |
| 10f# | If functional acquisition is co-registered to anatomical, stated how (n = 47) | 27 (15, 42) | 0.50 (0.31, 0.66) | Included |
| 10g | Provided Atlas/target information | 87 (78, 92) | 0.66 (0.48, 0.79) | Included |
| 10h | Stated brain image template space, name, modality and resolution (e.g., “FSL's MNI Avg152, T1 2×2×2 mm”, “SPM2's MNI gray matter template 2×2×2 mm”) | 16 (9, 24) | 0.64 (0.46, 0.78) | Included |
| 10i | Stated coordinate space (typically MNI, Talairach, or MNI converted to Talairach) | 85 (76, 91) | 0.84 (0.69, 0.93) | Included |
| 10j# | If MNI is converted to Talairach, stated the method used (e.g., Brett's mni2tal) (n = 13) | 61 (31, 86) | 0.86 (0.72, 0.94) | Included |
| 10k | Stated clearly how anatomical locations (e.g., gyral anatomy, Brodmann areas) were determined (e.g., paper atlas, Talairach Daemon, manual inspection of individual's anatomy, etc.) | 61 (50, 70) | 0.68 (0.50, 0.81) | Included |
| 11 | Described size and type of smoothing kernel (e.g., for a group study, “12 mm FWHM Gaussian smoothing applied to ameliorate differences in inter-subject localization”; for single subject fMRI “6 mm FWHM Gaussian smoothing used to reduce noise”) | 84 (75, 90) | 0.96 (0.85, 0.99) | Included |

Abbreviations: MRI, magnetic resonance imaging; MNI, Montreal Neurological Institute space.

# Conditional item, which needs to be reported only when the stated condition is met.

*Indicates whether the item should be included in a future shortened checklist. If excluded, the reasons for exclusion are given.
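Item 11 asks for the size of the smoothing kernel as a full-width at half-maximum (FWHM) in millimetres, whereas software often parameterizes a Gaussian by its standard deviation. The sketch below shows the standard conversion, sigma = FWHM / (2·sqrt(2·ln 2)), and why the voxel size must also be known to reproduce the kernel. The function names and the 2 mm voxel size are illustrative assumptions, not values from any reviewed article.

```python
from math import log, sqrt

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma, i.e. roughly 2.355 * sigma.
FWHM_TO_SIGMA = 1.0 / (2.0 * sqrt(2.0 * log(2.0)))


def fwhm_to_sigma_mm(fwhm_mm: float) -> float:
    """Convert a Gaussian kernel FWHM in millimetres to a standard deviation in millimetres."""
    if fwhm_mm <= 0:
        raise ValueError("FWHM must be positive")
    return fwhm_mm * FWHM_TO_SIGMA


def fwhm_to_sigma_voxels(fwhm_mm: float, voxel_size_mm: float) -> float:
    """Convert a FWHM in millimetres to a standard deviation in voxel units."""
    if voxel_size_mm <= 0:
        raise ValueError("voxel size must be positive")
    return fwhm_to_sigma_mm(fwhm_mm) / voxel_size_mm


if __name__ == "__main__":
    # Hypothetical example: a 6 mm FWHM kernel applied to 2 mm isotropic voxels.
    print(round(fwhm_to_sigma_mm(6.0), 3), round(fwhm_to_sigma_voxels(6.0, 2.0), 3))
```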

Table 7

Percentage of articles reporting each item, inter-rater agreement on the item, and whether the item should be included in a future shortened checklist, for items relating to “Statistical Modeling”.

| Item No | Description | % Reported (95% CI) | PABAκ (95% CI) | Item Selection* |
| --- | --- | --- | --- | --- |
| 12 | For novel methods not described in a separate paper, provided description and validation of method in the text or an appendix (n = 2) | 50 (1, 98) | 0.88 (0.74, 0.96) | Excluded. Given that methods are continually developing, it involves much subjectivity as to whether or not the reported methods are novel. |
| 13a | Stated statistical model and estimation method for both intra-subject and group modeling | 92 (84, 96) | 0.80 (0.65, 0.90) | Included |
| 13b | Stated block- or epoch-based or event-related model | 97 (91, 99) | 0.92 (0.80, 0.98) | Included |
| 13c | Specified hemodynamic response function | 58 (47, 67) | 0.76 (0.60, 0.87) | Included |
| 13d | Clearly stated additional regressors used (e.g., temporal derivatives, motion, behavioral covariates) | 53 (42, 63) | 0.58 (0.39, 0.73) | Included |
| 13e | Stated any orthogonalization of regressors | 7 (2, 13) | 0.86 (0.72, 0.94) | Included |
| 13f | Stated drift modeling or high-pass filtering (e.g., “DCT with cut off of X seconds”; “Gaussian-weighted running line smoother, cut-off 100 seconds”, or “cubic polynomial”) | 55 (44, 64) | 0.74 (0.57, 0.86) | Included |
| 13g | Described autocorrelation model (e.g., AR(1), AR(1)+WN, or arbitrary autocorrelation function) | 18 (11, 26) | 0.80 (0.64, 0.90) | Included |
| 13h | Defined contrast for task or stimulus conditions | 90 (82, 95) | 0.90 (0.77, 0.97) | Included |
| 14a | Stated statistical model, estimation method and inference type for group modeling (e.g., mixed, random or fixed effects) | 97 (91, 99) | 0.90 (0.77, 0.97) | Included |
| 14b# | If fixed effects inference used for group modeling, provided the justification (n = 31) | 3 (1, 16) | 0.46 (0.26, 0.63) | Included |
| 14c | If the group has more than 2-levels, described the levels and assumptions of the model (e.g., are variances assumed equal between groups) (n = 21) | 28 (11, 52) | 0.60 (0.41, 0.75) | Included |
| 14d | Stated methods used for repeated measures to account for within subject correlation in group modeling | 24 (16, 33) | 0.66 (0.48, 0.79) | Included |

Abbreviations: DCT, discrete cosine transform; AR(1), first-order Autoregressive Model; WN, white noise.

# Conditional item, which needs to be reported only when the stated condition is met.

*Indicates whether the item should be included in a future shortened checklist. If excluded, the reasons for exclusion are given.

Table 8

Percentage of articles reporting each item, inter-rater agreement on the item, and whether the item should be included in a future shortened checklist, for items relating to “Statistical Inference on Statistic Image (thresholding)”.

| Item No | Description | % Reported (95% CI) | PABAκ (95% CI) | Item Selection* |
| --- | --- | --- | --- | --- |
| 15a | Stated type of search region for analysis, and the volume in voxels or CC | 54 (43, 64) | 0.60 (0.41, 0.75) | Included |
| 15b# | If not whole brain, stated how region was determined (n = 86) | 93 (85, 97) | 0.58 (0.39, 0.73) | Included |
| 15c# | If the threshold used for inference and the threshold used for visualization in figures differ, stated and listed each (n = 49) | 44 (30, 59) | 0.56 (0.37, 0.71) | Included |
| 15d | Stated if inferences are corrected for multiple comparisons | 90 (82, 95) | 0.80 (0.64, 0.90) | Included |
| 15e# | If correction is limited to a small volume, stated the method for selecting the region (n = 73) | 72 (60, 82) | 0.54 (0.35, 0.70) | Included |
| 15f# | Labeled “uncorrected” if no formal multiple comparisons method is used (n = 76) | 84 (74, 91) | 0.80 (0.64, 0.90) | Included |
| 15g | Stated if it is voxel-wise significance | 49 (38, 59) | 0.54 (0.35, 0.70) | Included |
| 15h | Stated if inferences are corrected for FWE or FDR | 50 (39, 60) | 0.78 (0.62, 0.89) | Included |
| 15i# | Listed the smoothness in mm FWHM and the RESEL count if FWE found by random field theory (n = 45) | 25 (1, 80) | 0.70 (0.52, 0.83) | Included |
| 15j# | Provided details of parameters for simulation if FWE found by simulation (e.g., AFNI AlphaSim) (n = 7) | 57 (18, 90) | 0.62 (0.43, 0.76) | Included |
| 15k# | If not a standard method, specified the method for finding significance (n = 12) | 100 (73, 100) | 0.72 (0.55, 0.84) | Included |
| 15l | Stated cluster-defining threshold (e.g., P = 0.001) | 51 (40, 61) | 0.44 (0.24, 0.61) | Included |
| 15m | Stated the corrected cluster significance level (e.g., “Statistic images were assessed for cluster-wise significance using a cluster-defining threshold of P = 0.001; the 0.05 FWE-corrected critical cluster size was 103”) | 55 (44, 64) | 0.42 (0.22, 0.59) | Included |
| 15n# | Provided smoothness and RESEL count if significance determined with random field theory (n = 8) | 12 (1, 52) | 0.96 (0.85, 0.99) | Included |
| 15o | Stated correction for multiple planned comparisons based upon each voxel | 14 (7, 22) | 0.44 (0.24, 0.61) | Included |
| 15p# | Stated observed effect size for any failure to reject the null hypothesis (e.g., lack of activation in a particular region) (n = 1) | 0 (0, 3) | 0.98 (0.89, 0.99) | Included |

Abbreviations: CC, cubic centimeter; FWE, family-wise error; FDR, false discovery rate; FWHM, full-width at half-maximum; RESEL, resolution element.

# Conditional item, which needs to be reported only when the stated condition is met.

*Indicates whether the item should be included in a future shortened checklist. If excluded, the reasons for exclusion are given.
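Items 15d and 15h ask whether, and how, inferences were corrected for multiple comparisons. As a generic illustration of why the type of correction matters, the sketch below contrasts a Bonferroni correction (which controls the family-wise error rate, FWE) with the Benjamini-Hochberg procedure (which controls the false discovery rate, FDR) on a small list of p-values. This is not the random-field-theory or simulation-based correction typically used in imaging software; the function names and p-values are hypothetical.

```python
def bonferroni_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject tests whose p-value is at most alpha / m (family-wise error control)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values: list[float], q: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure (false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k such that the k-th smallest p-value is <= k * q / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Hypothetical p-values for ten tests.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
    print("Bonferroni rejections:", sum(bonferroni_reject(p)))
    print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg_reject(p)))
```

The same data can yield different sets of significant results under the two procedures, so a report that says only "corrected" leaves the threshold ambiguous.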

Table 9

Percentage of articles reporting each item, inter-rater agreement on the item, and whether the item should be included in a future shortened checklist, for items relating to “Statistical Inference on ROI Analysis”.

| Item No | Description | % Reported (95% CI) | PABAκ (95% CI) | Item Selection* |
| --- | --- | --- | --- | --- |
| 16a | Described how ROIs were defined (e.g., functional or anatomical localizer) | 86 (77, 92) | 0.54 (0.35, 0.70) | Included |
| 16b | Described how signal was extracted within ROI (e.g., average parameter estimates, FIR deconvolution) | 45 (35, 55) | 0.46 (0.26, 0.63) | Included |
| 16c# | If percent signal change reported, described how scaling factor was determined (n = 35) | 34 (19, 52) | 0.52 (0.32, 0.68) | Included |
| 16d | Stated if percent signal change is relative to voxel-mean, or whole-brain mean | 16 (9, 24) | 0.66 (0.48, 0.79) | Included |

Abbreviations: ROI, region of interest; FIR, finite impulse response.

# Conditional item, which needs to be reported only when the stated condition is met.

*Indicates whether the item should be included in a future shortened checklist. If excluded, the reasons for exclusion are given.
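Items 16c and 16d matter because a percent signal change depends on the baseline it is scaled by. The following minimal sketch, using a hypothetical function name and made-up intensities, shows the same effect expressed relative to a voxel mean and relative to a whole-brain mean; actual software may define the scaling factor differently.

```python
def percent_signal_change(effect: float, baseline_mean: float) -> float:
    """Express an effect (in raw signal units) as a percentage of a baseline mean."""
    if baseline_mean == 0:
        raise ValueError("baseline mean must be non-zero")
    return 100.0 * effect / baseline_mean


if __name__ == "__main__":
    # Hypothetical values: the same 2-unit effect, two different baselines.
    effect = 2.0
    voxel_mean = 400.0
    whole_brain_mean = 500.0
    print("relative to voxel mean:", percent_signal_change(effect, voxel_mean))
    print("relative to whole-brain mean:", percent_signal_change(effect, whole_brain_mean))
```

Without knowing which baseline was used, percent signal change values from different articles cannot be compared directly.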

  51 in total

1.  The persistence of underpowered studies in psychological research: causes, consequences, and remedies.

Authors:  Scott E Maxwell
Journal:  Psychol Methods       Date:  2004-06

Review 2.  Does the CONSORT checklist improve the quality of reports of randomised controlled trials? A systematic review.

Authors:  Amy C Plint; David Moher; Andra Morrison; Kenneth Schulz; Douglas G Altman; Catherine Hill; Isabelle Gaboury
Journal:  Med J Aust       Date:  2006-09-04       Impact factor: 7.738

3.  Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors.

Authors:  An-Wen Chan; Douglas G Altman
Journal:  BMJ       Date:  2005-01-28

Review 4.  Citation analysis for measuring the value of scientific publications: quality assessment tool or comedy of errors?

Authors:  D Schoonbaert; G Roelants
Journal:  Trop Med Int Health       Date:  1996-12       Impact factor: 2.622

Review 5.  The secret lives of experiments: methods reporting in the fMRI literature.

Authors:  Joshua Carp
Journal:  Neuroimage       Date:  2012-07-10       Impact factor: 6.556

6.  Bias, prevalence and kappa.

Authors:  T Byrt; J Bishop; J B Carlin
Journal:  J Clin Epidemiol       Date:  1993-05       Impact factor: 6.437

7.  Methodological rigor and citation frequency in patient compliance literature.

Authors:  J T Bruer
Journal:  Am J Public Health       Date:  1982-10       Impact factor: 9.308

8.  On the plurality of (methodological) worlds: estimating the analytic flexibility of FMRI experiments.

Authors:  Joshua Carp
Journal:  Front Neurosci       Date:  2012-10-11       Impact factor: 4.677

9.  Guidance for developers of health research reporting guidelines.

Authors:  David Moher; Kenneth F Schulz; Iveta Simera; Douglas G Altman
Journal:  PLoS Med       Date:  2010-02-16       Impact factor: 11.069

10.  Why current publication practices may distort science.

Authors:  Neal S Young; John P A Ioannidis; Omar Al-Ubaydli
Journal:  PLoS Med       Date:  2008-10-07       Impact factor: 11.069
