Literature DB >> 35573038

Of differing methods, disputed estimates and discordant interpretations: the meta-analytical multiverse of brain volume and IQ associations.

Jakob Pietschnig1, Daniel Gerdesmann1,2, Michael Zeiler3, Martin Voracek4.   

Abstract

Brain size and IQ are positively correlated. However, multiple meta-analyses have led to considerable differences in summary effect estimations, thus failing to provide a plausible effect estimate. Here we aim at resolving this issue by providing the largest meta-analysis and systematic review so far of the brain volume and IQ association (86 studies; 454 effect sizes from k = 194 independent samples; N = 26 000+) in three cognitive ability domains (full-scale, verbal, performance IQ). By means of competing meta-analytical approaches as well as combinatorial and specification curve analyses, we show that most reasonable estimates for the brain size and IQ link yield r-values in the mid-0.20s, with the most extreme specifications yielding rs of 0.10 and 0.37. Summary effects appeared to be somewhat inflated due to selective reporting, and cross-temporally decreasing effect sizes indicated a confounding decline effect, with three quarters of the summary effect estimations according to any reasonable specification not exceeding r = 0.26, thus contrasting effect sizes were observed in some prior related, but individual, meta-analytical specifications. Brain size and IQ associations yielded r = 0.24, with the strongest effects observed for more g-loaded tests and in healthy samples that generalize across participant sex and age bands.
© 2022 The Authors.

Entities:  

Keywords:  in vivo brain volume; intelligence; meta-analysis; multiverse analysis; specification curve analysis; systematic review

Year:  2022        PMID: 35573038      PMCID: PMC9096623          DOI: 10.1098/rsos.211621

Source DB:  PubMed          Journal:  R Soc Open Sci        ISSN: 2054-5703            Impact factor:   3.653


Introduction

Associations between brain volume and intelligence have attracted the interest of the scientific community at least since the early 1800s [1]. This potential association is important for our understanding of (human) intelligence because brain volume is considered to be among the best-replicated correlates of psychometric g (i.e. general cognitive ability; [2]). In this vein, larger brains have been assumed to accommodate more neurons and glial cells which in turn may allow for more complex and quicker information processing. Other researchers have argued that larger brains provide surplus brain tissue that serves as a type of brain reserve, which may protect against a deterioration of cognitive abilities due to ageing- or fitness-related factors (for an overview, see [3]). However, to determine the potential relevance of these mechanisms, it is important to obtain a plausible estimate of the brain volume and IQ association. Early empirical studies had to rely on proxies for intelligence (e.g. educational achievement) and brain size measures (such as head height, width, circumference or composites of it; e.g. [4]), which introduced considerable statistical noise in the data, thus leading to imprecise estimates of associations. Even with the advent of the first psychometric tests in the early 1900s, the issue of reliable brain volume measurement pertained, thus failing to alleviate validity problems of the observed associations. It was only the development of neuroimaging methods, such as magnetic resonance imaging (MRI), that allowed reliable assessments of in vivo brain volumes and consequently their link with intelligence. While the direction of the seemingly positive brain size and intelligence link was frequently reproduced, the observed strength varied considerably. The first available report of MRI-assessed brain volume and IQ correlations in healthy men suggested that this association explained an impressive 25% of variance (r = 0.51; [5]), thus representing a large effect according to the well-established effect size classification of Cohen [6]. However, subsequent replications appeared to yield predominantly effects in the small-to-moderate range. Several narrative reviews have been published that aimed at clarifying the brain-volume and IQ link [7-12]. Although all of them concluded that there is overwhelming evidence for a positive relationship, the strength and consequently meaning of the effect remained unclear. However, this is unsurprising because narrative reviews (i) lack formalized procedures that would allow systematic syntheses of effect sizes or influences of potential moderators and (ii) are vulnerable to various forms of bias (e.g. [13], pp. 301–302). Systematic numerical effect syntheses by means of meta-analytical approaches are often considered to represent a means to provide definite conclusions about a given research question (i.e. assuming that they have been adequately performed) because they can address both moderator effects and estimate potentially confounding dissemination biases. However, meta-analytic researchers that investigate an identical research question may arrive at surprisingly different conclusions. It has been demonstrated that design and analytical choices as well as researcher degrees of freedom (decisions that are being made that affect e.g. the specification of inclusion criteria or the selected statistical approach) may lead to substantially varying results from meta-analyses although they are based on similar databases and examine identical research questions (see the multiverse and specification-curve approach to meta-analysis introduced by [14], adopting analogous approaches for primary data analysis, i.e. multiverse analysis, by [15], and specification-curve-analysis, by [16,17]). Typically, there are a variety of reasonable choices that need to be made in regard to any type of study (for an overview, see [18]) which different researchers may (dis)agree about. For meta-analyses in this context, it matters most which studies are included and how they are analysed [14]. Researchers typically have conceptual or methodological reasons to prefer certain specifications over others. However, other researchers may apply different but equally reasonable selection criteria and analysis approaches which necessarily will lead to different summary effect estimations. One prominent example of the effects that such differing methodological choices can have is the widely received but heavily criticized paper that seemingly showed larger death-tolls of hurricanes with female as opposed to those with male names [19] in the prestigious journal PNAS. Although the methodological choices of this study had been well-justified, they represented an extreme specification which represented a single possibility out of a total of 1728 reasonable specifications (i.e. based on a number of published opinions about how these data should have been analyzed) to analyze these data (i.e. under the assumption that all possible meaningful kinds of data to analyze and how to do it had been identified in this study), only 37 of which (i.e. 2.1%) would have led to significantly deadlier hurricanes with female names [16,17]. Such observations have led researchers to argue that data analyses in general and research syntheses in particular should not only rely on a single reasonable specification which may represent only one of many justifiable approaches to treat and analyze data. Providing a distribution of these reasonable approaches allows a multi-faceted evaluation of potential moderating variables in terms of how they affect an effect size in terms of accuracy and strength. Naturally, researchers may be unaware of all potentially reasonable ways to analyze their data (i.e. both in terms of which data to analyze and how to do so) because they may be unaware about influential factors such as moderating variables. By means of a brute-force approach, all possible (but not necessarily reasonable) ways to calculate meta-analytical summary effects can inform a researcher about potentially meaningful specifications that she may have missed. This means that clustering of summary effects of certain data subsets within the framework of a combinatorial meta-analysis may allow an identification of previously unobserved moderators ([20]; for discussion and an application, [14]). Moreover, the time-point a certain meta-analysis is performed at can affect its outcomes because of the so-called decline effect (i.e. effect sizes of early studies investigating a given research question are more often than not inflated, thus leading to systematic decreases of effect strengths over time; [21,22]). So far three meta-analyses have been published about the association between brain size and IQ. They are an illustrative example of the influences on meta-analytical results in terms of which studies have been included as well as when and how a meta-analysis was performed. The first formal meta-analysis about in vivo brain size and IQ associations was published in 2005 [23], yielding evidence for a moderate effect of r = 0.33 (i.e. explaining about 11% of variance; k = 33 independent samples, N = 1530). About a decade later, this account was updated in the wake of another meta-analysis [24] which included an extension of the search strategy to further literature databases and the revision of the inclusion criteria (i.e. extending the inclusion from associations of only healthy to patient-based samples as well). This investigation yielded a considerably lower summary effect, indicating a small-to-moderate association between brain volume and IQ of r = 0.24 (i.e. explaining about 6% of variance; k = 120 independent samples, N = 6778). Moreover, an application of several dissemination bias assessment methods indicated that this summary effect was likely an inflated estimate due to confounding dissemination bias. Finally, in another meta-analysis [25], a subset of study effects that had been reported in Pietschnig et al. [24] was synthesized anew. This reanalysis of a selection of studies, focusing merely on a part of the extant research evidence (i.e. about a fifth of the available independent effect sizes, representing a mere fifth of the evidence that was available in 2015), yielded a moderate effect of r = 0.31 (explaining about 10% of variance; k = 32 independent samples, N = 1758), although the authors concluded that the true association with highly g-loaded tests (i.e. g-loadings indicate how close a given test is related with the general factor of intelligence; henceforth referred to as: g-ness) should amount to about r = 0.40 (i.e. corresponding to about 16% of explained variance). Importantly, the analysis of Gignac & Bates [25] represents a comparatively small data subset based on one particular (out of many reasonable) specifications from the data of Pietschnig et al. [24]. These inconsistent results of the meta-analytical effect estimates illustrate how different procedural choices in conducting a meta-analysis are. In fact, the inconsistencies between results of these very three meta-analyses have recently been highlighted as a textbook example of specification-dependent outcomes [14]. The considerable differences of the observed summary effects have obviously meaningful implications for the meaning of the brain volume and IQ link. For instance, when assuming 16% of explained variance of the brain volume and IQ association (i.e. corresponding to the interpretation of [25]), this would mean that 2.4 IQ points of this difference are merely due to their difference in brain size, when observing two individuals with a 10 IQ point difference. But when assuming 6% of explained variance (i.e. corresponding to the estimate of [24]), only 0.9 IQ points are explained by this correlation. Obviously, such inconsistencies in effect estimations are undesirable. Although brain volume and IQ have consistently been shown to be positively linked in prior research syntheses, the strength and therefore meaningfulness of this link remains unresolved. In the present case, these differing estimates may be predominantly attributed to two components. On the one hand, a larger estimate of the first published meta-analysis [23] compared to subsequent updates could be attributed to the decline effect, because published primary study effects and consequently meta-analytical summary effects tend to decrease over time [22]. However, a decline effect can be ruled out as the sole cause for these differences because the most recent meta-analysis about this topic [25] yielded a larger effect than the previous estimate. On the other hand, these inconsistencies may be attributable to the different (reasonable) choices that researchers make when conceptualizing their study (or, in certain cases, analysing their data). In these past three meta-analyses, several differences in terms of such choices can be identified. For instance, the inclusion criteria differed considerably in terms of sample characteristics (healthy versus patients; children/adolescents versus adults). Similarly, the analyses differed in terms of their methodological approaches. While two meta-analyses [23,25] used the so-called psychometric approach as developed by Hunter & Schmidt (e.g. [26]), the third one used the approach as introduced by Hedges & Olkin [27]. These two approaches most notably differ in terms of their underlying philosophy. While the former focuses on potential summary effect underestimations due to suboptimal sampling and measurement inaccuracies in the primary studies, the latter focuses on effect inflation and confounding bias. Consequently, in the former approach larger summary effects are typically obtained because individual study effects are corrected for artefacts such as range restriction or unreliability prior to the effect synthesis. The latter approach yields smaller summary effects because uncorrected estimates are synthesized. It is evident that these differences in how to synthesize which primary study data necessarily lead to differing results. However, evidently any individual choice of a certain way to synthesize the available data may be criticized in its own right, even if this choice has been reasonably justified, thus leaving the meaning of contradictory findings of isolated meta-analytical summary estimates unresolved. Novel methods provide a means to explore the multiverse of different design choices by allowing the assessment of a large number of (reasonable) specifications for the effect size synthesis in any given research question, thus providing a range of plausible estimates instead of an isolated point estimate (see [14]). In the present preregistered meta-analysis of the associations between in vivo brain volume and cognitive abilities (intelligence, IQ), we aimed at resolving the ambiguity of available effect estimates by (i) updating the available meta-analytical data base, (ii) assessing subgroup analysis- and meta-regression-based influences of moderators, (iii) investigating evidence of dissemination bias and (iv) providing a range of effect estimates based on a large number of (reasonable) effect syntheses based on evidence from combinatorial, multiverse analysis, and specification-curve approaches to meta-analysis (see [28], for a similarly designed research synthesis).

Methods

The present study was preregistered at the Open Science Framework (OSF; https://osf.io/r6gnk). Study materials, R-codes, and all data are available at https://osf.io/y6msp/.

Inclusion and exclusion criteria

In order to be included in the present meta-analysis the primary studies were required to fulfil four criteria. First, they had to assess the association between in vivo brain volume and IQ. Second, in vivo brain volume has had to be measured by either MRI or CT. Studies had to provide measurements of the whole brain volume (TBV or ICV). If both were reported, TBV was preferred over ICV. Associations with partial brain area volume were excluded. Third, intelligence had to be measured with psychometric intelligence tests. Fourth, in cases of data dependencies, studies with the most comprehensive account of parameters were preferred. If no such hierarchy could be established, the earliest published account was coded. Effect sizes were coded separately according to intelligence domain (full-scale, verbal and performance IQ). We used two different analysis methods when multiple subtest correlations within a certain domain were reported. First, in line with standard approaches, we selected those estimates that were judged to be conceptually more closely related to the respective domain (e.g. preferring verbal comprehension over working memory correlations for verbal IQ analyses). Second, we used robust variance estimations (RVE; [29]) to account for data dependencies while retaining the information of all coded information. In cases where potentially eligible studies did not provide sufficient information to calculate effect sizes, corresponding study authors were contacted. When no response was received, the respective study was excluded (when effect sizes were reported to be non-significant in published papers, but no correlation coefficient was given and no response was received, coefficients were set to zero according to standard meta-analytical procedures; [30], pp. 408–409; in our study 30 out of 454 effect sizes).

Literature search

To update the so far most comprehensive available data set of IQ and brain size correlations [24], we first searched the online literature databases PubMed, ISI Web of Science, Scopus and Google Scholar using the string: (brain size AND intelligen*) OR (brain volume AND intelligen*) OR (brain size AND IQ) OR (brain volume AND IQ). Second, we used a forward citation search of all three published meta-analyses on the subject [23-25]. Third, we conducted an extensive search for unpublished accounts in grey literature databases, search engines and repositories, sources dedicated to theses and dissertations, conference materials and registries for active studies. Finally, reference lists of all identified studies were handsearched for additional potentially eligible studies. Individual search strings and covered databases are available from https://osf.io/7z2u3/. Sample sex ratio, sample type (healthy versus patient samples), test instrument, publication year, mean sample age, participant numbers, effect size estimates and volumetric as well as IQ test standard deviations, number of corrections to the correlation coefficient (e.g. controlling for body height or mass), publication status (i.e. published versus grey literature versus personal communication), sample age (children/adolescents versus adults), brain volume measurement type (TBV or ICV), IQ domain and the assumed g-loadedness of the respectively used intelligence test (g-ness: fair versus good versus excellent; these categories correspond to the three rating criteria of [25], indicating a rank score based on included test number, dimensions and correlation with g; testing time was presently not used as an evaluation criterion; for details refer to [25]) were recorded independently by two researchers (D.G., M.Z.). A coding book containing data sheets and a coding manual explaining all variables and their categories is available at https://osf.io/fr8g7/. Inconsistencies were resolved by discussion with a third independent coder (J.P.). All searches were updated on April 14, 2021.

Synthesis methods

Meta-analytical calculations were performed in R by means of the metafor package [31].

Hedges and Olkin meta-analysis

A random-effects meta-analysis in the tradition of Hedges & Olkin [27] was conducted based on independent effect sizes. Effect syntheses were calculated based on Fisher's z-transformed values to account for the skewed distributions of Pearson rs. For ease of interpretation, results were back-transformed into the r-metric prior to reporting. Some researchers have criticized this procedure because it may introduce a substantial upward bias (see, [26]). Therefore, we conducted a sensitivity analysis with correlations corrected for this negative bias [32]. Effect sizes were weighted according to study precision. Precision of the effect size estimates is illustrated by 95% confidence intervals (CI) based on the Knapp-Hartung adjustment [33,34]. In sensitivity analyses, we conducted leave-one-out analyses to evaluate influences of individual studies on the overall effect size and assessed potential influences of outliers following the approach of Viechtbauer & Cheung [35].

Psychometric meta-analysis

Within the framework of psychometric meta-analyses, researchers aim at accounting for potential measurement inaccuracy and sampling errors. Following the specifications of two previous meta-analyses [23,25], we accounted for direct range restriction only (i.e. using the Case II formula of [36], to correct correlations and the approach of [37], to estimate their standard errors; see, [38]), because (i) reliabilities of MRI and IQ measures are typically high and (ii) reliabilities were infrequently reported [23,25]. Here, we used the n-weighted Hunter and Schmidt estimator (HS) to calculate effect syntheses.

Robust variance estimation

By means of this approach, we were able to include more than one effect size per sample within a particular study and intelligence domain, following current recommendations [39]. Data dependencies within domain-specific analyses (i.e. full-scale versus verbal versus performance IQ) were modelled using robust variance estimation within meta-regressions, thus allowing for including a maximum amount of available information of primary studies (i.e. inclusion of multiple effect sizes from identical participants; this is reasonable because full-scale, verbal and performance IQ results are typically highly intercorrelated), while avoiding inappropriate effect size weighting due to dependent data [40]. We used a correlated-effects model because data dependence was primarily caused by correlations between participants' domain intelligence scores from different (sub)domains. We used τ2 (random-effects maximum likelihood) and ω2 by means of simplistic methods of moments estimators to estimate weights [41]. Fisher's z-transformed correlation coefficients were used for analyses.

Moderator analyses

Subgroup analyses

Influences of categorical variables were examined in a series of mixed-effects subgroup analyses. We assessed group differences according to sample type (healthy versus clinical), age (children/adolescents versus adults), sex (men versus women) and g-ness. Furthermore, another two supplemental subgroup analyses were performed to (i) further corroborate results of bias analyses (i.e. published in a peer-reviewed journal versus obtained from grey literature or through personal communication) and (ii) assess possible influences of different brain volume measurement types (i.e. total brain volume versus intracranial volume).

Meta-regressions

Single regressions

Initially, a series of single regressions of effect sizes on study publication years was calculated for each IQ domain, to assess evidence for a potential decline effect. Differences of associations between intelligence domains were evaluated by means of a RVE-based meta-regression. For each RVE-regression, we ran one model with and one model without intercept to examine differences of coefficients compared to full-scale IQ and regression-based effect estimates for each domain, respectively.

Multiple regressions

Subsequently, theory-guided hierarchical multiple precision-weighted mixed-effects meta-regressions were calculated for each domain. In three blocks we included (i) g-ness (only for full-scale IQ analyses; fair/good versus excellent), publication year, publication status; (ii) male ratio, mean sample age and (iii) study goal (i.e. assessment of IQ and brain volume association was the primary study goal versus not), number of included covariates in study. Model fit between blocks was compared by means of likelihood ratio tests. In supplemental analyses, we repeated these calculations only in studies that reported total brain volume (results omitted for brevity).

Dissemination bias

We used several dissemination bias detection methods to assess potential influences of summary effect-biasing artefacts. The use of multiple detection methods is sensible because different methods have been shown to not be equally sensitive to different bias scenarios and sources [42,43]. Results of bias analyses were based on published data only (excepting direct comparisons) and we focused on healthy samples (i.e. following the specifications of [25], and [23]), because bias may be expected to be stronger in neurotypical samples (i.e. because in non-neurotypical samples, expectations of certain effect strength observations may be lower and reporting of effect sizes therefore less dependent on conforming to preceding observations). Fisher's r-to-z transformed correlation coefficients were used (excepting p-value-based analyses) and analyses were carried out separately for full-scale, verbal and performance IQ. First, we provide power-enhanced funnel plots (i.e. sunset plots, see [44]; using the R package metaviz, [45]) comprising estimates for the median power of all studies, necessary true effect estimates for a median power reduction to 33% or 66%, results of the test of excess significance (TES; observed random-effects summary effects were used for study power calculations; [46]), as well as the R-Index for expected replicability [47]. Second, we used Sterne & Egger's regression approach [48] by regressing effect sizes on the standard normal deviate effect size of the inverse standard error (p-values < 0.10 were considered indicative of bias). Third, the trim-and-fill method [49] was applied to detect potential funnel plot asymmetry. Resulting bias-adjusted estimates were interpreted in terms of a sensitivity analysis rather than a corrected summary effect estimate. Fourth, we used three novel methods that are based on the distribution of published p-values and provide a means to estimate summary effect sizes and detect evidence for p-hacking and dissemination bias. The methods p-curve [50], p-uniform [51] and p-uniform* [52] are based on the idea that p-values follow a uniform distribution when the true investigated effect of any given research question is zero. In p-curve and p-uniform, only significant p-values are included in the analyses. This makes sense, because significant effects should have an identical publication probability. Observed right-skewness of these distributions are interpreted to be indicative of a true non-zero effect while observed left-skewness indicates confounding effects of p-hacking. Examinations of p-curves allow consequently a formal test of p-hacking by examining the shape of the observed p-value distribution (p-curve) or comparing observed with expected conditional p-values (p-uniform). By minimizing a loss function resulting from inspection of the observed p-value distribution (p-curve) or assessing the value for which the conditional p-value distribution is uniform (p-uniform), summary effects can be estimated. Effect-size estimations that are based on p-uniform* rely on both significant and non-significant p-values (here, it is assumed that publication probabilities are identical within, but not across, groups). P-uniform has been shown to be more precise than both p-curve and p-uniform in the presence of non-trivial between-studies heterogeneity and allows assessment of the extent of unobserved heterogeneity [52]. P-uniform and p-uniform* were calculated by means of the R package puniform [53] and the R code available from www.p-curve.com/Supplement. Fifth, we used two selection-model approaches that were either based on the four identical weight functions for p-values, as used in Pietschnig et al. [24]), or on the standard error of the effect sizes [54]. By means of these approaches, the robustness of the summary effect can be assessed when primary study effects are weighted according to them being more or less significant (p-values were weighted according to the approach of [55]) and more or less accurate [54]. We used metasens [56] to conduct the Copas and Shi analysis. Sixth, we inspected the overlap between two DerSimonian-Laird estimator-based confidence intervals by means of the approach of Henmi & Copas [57]. This method can be seen as a type of sensitivity analysis in which conventionally calculated confidence intervals are compared with those of a hybrid estimation resulting from the use of fixed-effect weights but random-effects heterogeneity estimates. Large discrepancies between effect and confidence interval estimates are considered to be indicative of bias, although there is no consensus about a suitable hard-and-fast decision criterion. Seventh, we directly assessed potential bias by regressing effect sizes on publication type (published versus unpublished effects). Moreover, we assessed cross-temporal changes in effect size in another meta-regression model, in order to assess the evidence for a potentially confounding decline effect in the data, as evidenced in a predecessor meta-analysis on this topic [24]. Finally, we conducted two cumulative meta-analyses according to (i) sample size and (ii) publication year to further illustrate potential influences of small-study effects and cross-temporal changes.

Exploring the multiverse

To untangle specific analytical design choices (i.e. reflecting researcher degrees of freedom) from the general pattern of brain size and IQ associations, we explored the multiverse of possible (reasonable) meta-analytic design specifications [14] using the R Code available from https://osf.io/kqgey/.

Combinatorial meta-analysis

To this end, we first used a combinatorial meta-analysis to examine estimates for a large random selection of any possible subset of the 2 − 1 possible (not necessarily reasonable) combinations of the available data [20]. This approach can be interpreted as a sensitivity analysis by allowing the identification of outlier studies that overproportionally influence effect estimations. Consequently, 2123 and 271 (about 10 undecillions and 2 undecillions) combinations were possible for an exhaustive selection of full-scale IQ subsets for healthy and patient samples, respectively (270 and 245 as well as 249 and 233 combinations were possible for verbal and performance IQ). For our analyses, we randomly drew 100 000 subsets out of these possible combinations to illustrate outlier influences in GOSH plots (Graphic Display of Heterogeneity). Specifically, GOSH plots allow a visual inspection of summary effect distributions and their associated between-studies heterogeneity when any kind of (un)reasonable specification has been used. Moreover, numerical inspections of dispersion values (e.g. interquartile ranges) and the effect distribution permit an evaluation of the influence of moderating variables (i.e. narrow intervals and symmetrical distributions indicated well-interpretable summary effects).

Specification-curve meta-analysis

By means of another method, we examined several possible reasonable specifications. In our meta-analytic specification-curve analyses (see [14]), we introduced three which factors: (1) sample age: adults versus children/adolescents versus either; (2) sample type: healthy versus patient versus either; (3) g-ness (for full-scale IQ only): fair or good versus excellent versus either; and two how factors: (1) effect size type: non-transformed Pearson rs versus r-to-z-transformed effect size versus small sample bias corrected rs versus range departure corrected rs; (2) analysis approach: Hedges-Olkin random-effects estimation versus Hunter-Schmidt effect estimation versus unweighted estimation versus RVE). Consequently, these potentially influential analytic choices yielded 3 × 3 × 3 = 27 ways for which data to analyze and 4 × 4 = 16 ways for how to do it, thus totaling 27 × 16 = 432 reasonable specifications. In our further analyses in the subsets of verbal and performance data only, this number was reduced to 144 reasonable specifications because g-ness was not relevant in these cases. Specifications that comprised less than two effect sizes were dropped from analyses.

Final sample

We included effect sizes of 86 studies from Pietschnig et al. [24] ([58] and [59] were presently excluded because (i) of duplicate effect size publication and (ii) brain-volume measurements had not been performed in vivo) as well as 57 newly identified studies. Because more recent studies were based on larger samples, this update more than tripled the included participant numbers compared to Pietschnig et al. [24], thus representing a crucial expansion of the empirical knowledge base. In all, we extracted 454 effect sizes including 194 independent samples for full-scale, 115 for verbal, and 82 for performance IQ associations (Ns = 26 764; 7667; and 5984; respectively). The majority of samples comprised healthy participants and about a third comprised patients (ks = 123 versus 71; 70 versus 45; 49 versus 33; Ns = 23 403 versus 5361; 5440 versus 2237; 4162 versus 1858 for healthy and patient samples for full-scale, verbal and performance IQ, respectively). A PRISMA flowchart for the study inclusion process is provided in figure 1 (for study characteristics, table 1).
Figure 1

PRISMA flowchart.

Table 1

Characteristics of included studies. Note. NA = info not available; Review: 1 = included in McDaniel [23], 2 = included in Pietschnig et al. [24], 3 = included in Gignac & Bates [25], 4 = included in present update; Reporting: reported = published in a journal article, grey = published as thesis/dissertation, PC = result obtained via personal communication; FSIQ = full-scale IQ; Type of test: IQ assessment used in study; subtest abbreviations: arith = arithmetic, bd = block design, com = comprehension, ds = digit symbol, inf = information, lm = logical memory, lns = letter-number sequencing, mr = matrix reasoning, obj = object assembly, pc = picture completion, pic = picture arrangement, sim = similarities, span = digit span (b stand for backwards), ss = symbol search, ss p + f = spatial span forwards and backwards, vpa = verbal pair associates; domain indices of the Wechsler scales are abbreviated as follows: POI = perceptual organization index, PRI = perceptual reasoning index, PSI = processing speed index, VCI = verbal comprehension index, WMI = working memory index; full information explaining all abbreviations are available in the codebook and data files in supplemental materials. Published study outcomes with r = exactly 0 represent correlations set to zero, because no eligible numerical value was available.

studyyearreviewsample typemean agemale ratio (%)reportingIQ domaintestnr
Yeo et al. [60]19872, 4patients38.4034.00reportedFSIQWAIS410.007
Yeo et al. [60]19872, 4patients38.4034.00reportedverbalWAIS verbal410.12
Yeo et al. [60]19872, 4patients38.4034.00reportedperformanceWAIS performance410.06
Willerman et al. [5]19911, 2, 3, 4healthy18.900.00reportedFSIQWAIS-R: voc, sim, bd, pc200.33
Willerman et al. [5]19911, 2, 3, 4healthy18.90100.00reportedFSIQWAIS-R: voc, sim, bd, pc200.51
Andreasen et al. [61]19931, 2, 3, 4healthy38.000.00reportedFSIQWAIS-R300.44
Andreasen et al. [61]19931, 2, 3, 4healthy38.00100.00reportedFSIQWAIS-R370.40
Andreasen et al. [61]19932, 4healthy38.000.00reportedverbalWAIS-R verbal300.43
Andreasen et al. [61]19932,4healthy38.00100.00reportedverbalWAIS-R verbal370.33
Andreasen et al. [61]19932, 4healthy38.000.00reportedperformanceWAIS-R performance300.30
Andreasen et al. [61]19932, 4healthy38.00100.00reportedperformanceWAIS-R performance370.43
Raz et al. [62]19931, 2, 3, 4healthy43.8059.00reportedfluidCFIT290.43
Raz et al. [62]19932, 4healthy43.8059.00reportedverbalExtended Vocabulary (V3)290.10
Castellanos et al. [63]19941, 2, 4healthy12.10100.00reportedverbalWISC-R: voc460.33
Harvey et al. [64]19942, 4patients35.6038.00reportedverbalNART260.38
Harvey et al. [64]19942, 4patients31.1077.00reportedverbalNART480.24
Harvey et al. [64]19942, 4healthy31.6055.00reportedverbalNART340.69
Jones et al. [65]19942, 4healthy31.7064.00reportedverbalNART or WAIS-R verbal670.30
Wickett et al. [66]19941, 2, 3, 4healthy25.000.00reportedFSIQMAB400.40
Wickett et al. [66]19942, 4healthy25.000.00reportedverbalMAB verbal400.44
Wickett et al. [66]19942, 4healthy25.000.00reportedperformanceMAB performance400.28
Bigler [67]19952, 4patients29.4071.00reportedFSIQWAIS-R72−0.03
Egan et al. [68]19951, 2, 3, 4healthy22.50100.00reportedFSIQWAIS-R400.31
Egan et al. [68]19952, 4healthy22.50100.00reportedverbalWAIS-R verbal400.21
Egan et al. [68]19952, 4healthy22.50100.00reportedperformanceWAIS-R performance400.22
Haier et al. [69]19952, 4patients26.3954.00reportedFSIQWAIS-R280.65
Kareken et al. [70]19951, 2, 4healthy27.6663.00PCFSIQWAIS-R680.30
Kareken et al. [70]19954patients29.7563.00reportedverbalCOWA, Animal Naming, Boston Naming, Token Test, WRAT: Reading680.36
Kareken et al. [70]19954healthy27.6663.00reportedverbalCOWA, Animal Naming, Boston Naming, Token Test, WRAT: Reading680.24
Kareken et al. [70]19954patients29.7563.00reportedperformanceWAIS-R: bd; Benton Line Orientation, Geometric Figure Drawings680.18
Kareken et al. [70]19954healthy27.6663.00reportedperformanceWAIS-R: bd; Benton Line Orientation, Geometric Figure Drawings680.26
Raz et al. [71]19952, 4patients35.2077.00reportedFSIQWPPSI-R + BCS11−0.24
Reiss et al. [72]19952, 4healthy11.2842.00PCFSIQWISC-R or SB or BSID870.00
Reiss et al. [72]19952, 4patients10.8035.00reportedFSIQWISC-R or SB or BSID510.25
Reiss et al. [73]19961, 2, 4healthy10.600.00PCFSIQunknown FS570.37
Reiss et al. [73]19962, 4healthy10.10100.00PCFSIQunknown FS120.52
Blatter et al. [74]19972, 4patientsNANAreportedverbalWAIS-R verbal220.57
Blatter et al. [74]19972, 4patientsNANAreportedperformanceWAIS-R performance210.47
Mori et al. [75]19972, 4patients70.2038.00reportedFSIQWAIS-R600.40
Mori et al. [75]19972, 4patients70.2038.00reportedverbalWAIS-R verbal600.37
Mori et al. [75]19972, 4patients70.2038.00reportedperformanceWAIS-R performance600.37
Paradiso et al. [76]19972, 3, 4healthy24.8053.00reportedFSIQWAIS-R620.38
Paradiso et al. [76]19972, 4healthy24.8053.00reportedverbalWAIS-R: voc620.27
Paradiso et al. [76]19972, 4healthy24.8053.00reportedperformanceWAIS-R: bd620.32
Paradiso et al. [76]19974healthy24.8053.00reportedverbalWAIS-R: span620.11
Flashman et al. [77]19981, 2, 4healthy27.0053.00reportedFSIQWAIS-R900.25
Flashman et al. [77]19982, 4healthy27.0053.00reportedverbalWAIS-R verbal900.16
Flashman et al. [77]19982, 4healthy27.0053.00reportedperformanceWAIS-R performance900.26
Gur et al. [78]19991, 2, 3, 4healthy25.000.00reportedFSIQWAIS-R: voc, bd; CVLT, JLO400.40
Gur et al. [78]19991, 2, 3, 4healthy27.00100.00reportedFSIQWAIS-R: voc, bd; CVLT, JLO400.39
Gur et al. [78]19992, 4healthy25.000.00reportedverbalWAIS-R: voc; CVLT400.40
Gur et al. [78]19992, 4healthy27.00100.00PCverbalWAIS-R: voc; CVLT400.00
Gur et al. [78]19994healthy25.000.00reportedperformanceWAIS-R: bd; JLO400.57
Gur et al. [78]19994healthy27.00100.00reportedperformanceWAIS-R: bd; JLO400.35
Leonard et al. [79]19992, 4patients43.00100.00PCverbalWAIS-R verbal370.00
Leonard et al. [79]19992, 4healthy42.00100.00PCverbalWAIS-R verbal330.00
Leonard et al. [79]19992, 4patients43.00100.00PCperformanceWAIS-R performance370.00
Leonard et al. [79]19992, 4healthy42.00100.00PCperformanceWAIS-R performance330.00
Tan et al. [80]19991, 2, 3, 4healthy22.000.00reportedfluidCFIT540.62
Tan et al. [80]19991, 2, 3, 4healthy22.00100.00reportedfluidCFIT490.28
Warwick et al. [81]19992, 4patients21.600.00PCverbalQuick IQ Test [82]110.00
Warwick et al. [81]19992, 4healthy21.500.00PCverbalQuick IQ Test [82]130.00
Warwick et al. [81]19992, 4patients21.80100.00PCverbalQuick IQ Test [82]100.00
Warwick et al. [81]19992, 4patients21.80100.00PCverbalQuick IQ Test [82]100.00
Warwick et al. [81]19992, 4healthy21.50100.00PCverbalQuick IQ Test [82]250.00
Warwick et al. [81]19992, 4patients21.63100.00reportedverbalQuick IQ Test [82]450.31
Warwick et al. [81]19992, 4patients21.550.00reportedverbalQuick IQ Test [82]240.53
Garde et al. [83]20001, 2, 3, 4healthy80.700.00PCFSIQWAIS220.22
Garde et al. [83]20001, 2, 3, 4healthy80.70100.00PCFSIQWAIS460.07
Isaacs et al. [84]20002, 4healthy7.7573.00PCFSIQWISC-III11−0.03
Isaacs et al. [84]20002, 4healthy7.7538.00PCFSIQWISC-III80.55
Isaacs et al. [84]20002, 4healthy7.7573.00PCverbalWISC-III verbal11−0.04
Isaacs et al. [84]20002, 4healthy7.7538.00PCverbalWISC-III verbal80.57
Isaacs et al. [84]20002, 4healthy7.7573.00PCperformanceWISC-III performance11−0.18
Isaacs et al. [84]20002, 4healthy7.7538.00PCperformanceWISC-III performance80.35
Kumra et al. [85]20002, 4patients12.3081.00PCFSIQWISC-III or WISC-R or WAIS: voc, bd270.00
Kumra et al. [85]20002, 4patients14.4057.00PCFSIQWISC-III or WISC-R or WAIS: voc, bd440.00
Lawson et al. [86]20002, 4patientsNANAreportedFSIQWISC-III or WPPSI-R or DAS or SB or GMDS470.43
Pennington et al. [87]20001, 2, 4healthy19.0644.00reportedFSIQWISC-R or WAIS-R FS360.31
Pennington et al. [87]20002, 4healthy16.9758.00reportedFSIQWISC-R or WAIS-R FS960.42
Schoenemann et al. [88]20001, 2, 3, 4healthy23.200.00PCfluidRSPM720.21
Schoenemann et al. [88]20002, 4healthy23.200.00reportedverbalMAB vocabulary360.12
Wickett et al. [89]20001, 2, 3, 4healthy24.97100.00reportedFSIQMAB680.35
Wickett et al. [89]20002, 4healthy24.97100.00reportedverbalMAB verbal680.33
Wickett et al. [89]20002, 4healthy24.97100.00reportedperformanceMAB performance680.31
Castellanos et al. [90]20012, 4patients9.700.00reportedFSIQWISC-R or WISC-III: voc, bd400.36
Coffey et al. [91]20012, 4healthy74.8538.00reportedverbalVerbal fluecy Task319−0.06
Coffey et al. [91]20012, 4healthy74.8538.00reportedperformanceWAIS-R: bd3180.06
Aylward et al. [92]20021, 2, 4healthyNA100.00PCFSIQunknown FS46−0.13
Aylward et al. [92]20021, 2, 4healthyNANAPCFSIQunknown FS300.08
Aylward et al. [92]20022, 4patients18.8087.00reportedFSIQunknown FS670.10
Aylward et al. [92]20022, 4patients18.8087.00reportedverbalunknown verbal670.08
Aylward et al. [92]20022, 4healthy18.9092.00reportedverbalunknown verbal83−0.01
Aylward et al. [92]20022, 4patients18.8087.00reportedperformanceunknown performance670.10
Aylward et al. [92]20022, 4healthy18.9092.00reportedperformanceunknown performance830.09
MacLullich et al. [93]20021, 2, 3, 4healthy67.80100.00reportedfluidRSPM950.39
MacLullich et al. [93]20022, 4healthy67.80100.00reportedverbalNART970.30
Nosarti et al. [94]20021, 2, 4healthy14.9065.00PCFSIQunknown FS420.37
Shapleske et al. [95]20021, 2, 3, 4healthy33.30100.00PCFSIQunknown FS230.13
Collinson et al. [96]20032, 4healthy16.4060.00PCFSIQWISC-R or WAIS-R22−0.13
Collinson et al. [96]20032, 4patients16.8067.00PCFSIQWISC-R or WAIS-R32−0.27
Collinson et al. [96]20032, 4patients16.8067.00PCverbalWISC-R or WAIS-R verbal32−0.28
Collinson et al. [96]20032, 4healthy16.4060.00PCverbalWISC-R or WAIS-R verbal22−0.09
Collinson et al. [96]20032, 4patients16.8067.00PCperformanceWISC-R or WAIS-R performance32−0.19
Collinson et al. [96]20032, 4healthy16.4060.00PCperformanceWISC-R or WAIS-R performance22−0.17
Giedd [97]20031, 2, 4healthyNA0.00PCFSIQunknown FS80.46
Giedd [97]20031, 2, 4healthyNA100.00PCFSIQunknown FS70.17
Giedd [97]20031, 2, 4healthyNA0.00PCFSIQunknown FS7−0.67
Giedd [97]20031, 2, 4healthyNA100.00PCFSIQunknown FS70.67
Giedd [97]20031, 2, 4healthyNA0.00PCFSIQunknown FS390.34
Giedd [97]20031, 2, 4healthyNA100.00PCFSIQunknown FS630.27
Kesler et al. [98]20032, 4patients26.1652.00reportedFSIQWAIS-R250.47
Kesler et al. [98]20032, 4patients26.1652.00reportedverbalWAIS-R verbal250.57
Yurgelun-Todd et al. [99]20032, 4healthy14.600.00reportedFSIQShipley total240.20
Yurgelun-Todd et al. [99]20032, 4healthy14.50100.00reportedFSIQShipley total130.26
Yurgelun-Todd et al. [99]20032, 4healthy14.600.00reportedverbalShipley verbal240.17
Yurgelun-Todd et al. [99]20032, 4healthy14.50100.00reportedverbalShipley verbal130.19
Yurgelun-Todd et al. [99]20034healthy14.600.00reportedverbalWAIS-III: span240.19
Yurgelun-Todd et al. [99]20034healthy14.50100.00reportedverbalWAIS-III: span130.55
Yurgelun-Todd et al. [99]20034healthy14.600.00reportedperformanceWAIS-III: ds240.07
Yurgelun-Todd et al. [99]20034healthy14.50100.00reportedperformanceWAIS-III: ds130.48
Frangou et al. [100]20041, 2, 4healthy15.0550.00reportedFSIQWISC-III or WAIS-III400.41
Isaacs et al. [101]20042, 4healthy15.900.00PCFSIQWechsler FS380.24
Isaacs et al. [101]20042, 4healthy15.90100.00PCFSIQWechsler FS380.27
Isaacs et al. [101]20042, 4healthy14.8650.00PCFSIQWechsler FS160.49
Isaacs et al. [101]20042, 4healthy15.600.00PCverbalWechsler verbal380.20
Isaacs et al. [101]20042, 4healthy15.900.00PCperformanceWechsler performance380.21
Isaacs et al. [101]20042, 4healthy15.90100.00PCperformanceWechsler performance380.15
Ivanovic et al. [102,103]20041, 2, 4healthy18.000.00reportedFSIQWAIS-R490.37
Ivanovic et al. [102,103]20041, 2, 4healthy18.00100.00reportedFSIQWAIS-R470.55
Ivanovic et al. [102,103]20042, 4healthy18.000.00reportedverbalWAIS-R verbal490.33
Ivanovic et al. [102,103]20042, 4healthy18.00100.00reportedverbalWAIS-R verbal470.55
Ivanovic et al. [102,103]20042, 4healthy18.000.00reportedperformanceWAIS-R performance490.38
Ivanovic et al. [102,103]20042, 4healthy18.00100.00reportedperformanceWAIS-R performance470.52
Rojas et al. [104]20042, 3, 4healthy43.6247.00PCFSIQWAIS-R or WAIS-III170.31
Rojas et al. [104]20042, 4patients30.3087.00PCFSIQWAIS-R or WAIS-III150.07
Rojas et al. [104]20042, 4patients30.3087.00PCverbalWAIS-R or WAIS-III verbal150.30
Rojas et al. [104]20042, 4healthy43.6247.00PCverbalWAIS-R or WAIS-III verbal170.19
Rojas et al. [104]20042, 4patients30.3087.00PCperformanceWAIS-R or WAIS-III performance150.15
Rojas et al. [104]20042, 4healthy43.6247.00PCperformanceWAIS-R or WAIS-III performance170.27
Toulopoulou et al. [105]20042, 4patients42.2350.00reportedFSIQWAIS-R2010.28
Toulopoulou et al. [105]20042, 4patients42.2350.00reportedverbalWAIS-R verbal2010.28
Waiter et al. [106]20042, 4healthy15.50100.00PCFSIQWISC-III-R or WAIS-IV160.13
Waiter et al. [106]20042, 4patients15.40100.00PCFSIQWISC-III-R or WAIS-IV16−0.06
Waiter et al. [106]20042, 4patients15.40100.00PCverbalWISC-III-R or WAIS-IV verbal16−0.17
Waiter et al. [106]20042, 4healthy15.50100.00PCverbalWISC-III-R or WAIS-IV verbal160.20
Waiter et al. [106]20042, 4patients15.40100.00PCperformanceWISC-III-R or WAIS-IV performance160.10
Waiter et al. [106]20042, 4healthy15.50100.00PCperformanceWISC-III-R or WAIS-IV performance160.23
Antonova et al. [107]20052, 4patients40.4960.00PCverbalWAIS-III: voc440.16
Antonova et al. [107]20052, 4healthy33.7258.00PCverbalWAIS-III: voc430.24
Lodygensky et al. [108]20052, 4healthy8.4257.00PCFSIQWISC-R210.46
Lodygensky et al. [108]20052, 4patients8.5853.00PCFSIQWISC-R600.35
Thoma et al. [109]20052, 3, 4healthy23.50100.00reportedFSIQRPM, TrailsAB, WAIS-R: voc, bd, ds; VMRT, COWA190.27
Debbané et al. [110]20062, 4healthy15.1043.00PCFSIQWISC-III or WAIS-III410.16
Debbané et al. [110]20062, 4patients16.7037.00PCFSIQWISC-III or WAIS-III430.16
Rojas et al. [111]20062, 4healthy21.41100.00PCFSIQWAIS-III or WISC-III230.46
Rojas et al. [111]20062, 4patients20.79100.00PCFSIQWAIS-III or WISC-III240.30
Rojas et al. [111]20062, 4patients20.79100.00PCverbalWAIS-III or WISC-III verbal240.28
Rojas et al. [111]20062, 4healthy21.41100.00PCverbalWAIS-III or WISC-III verbal230.55
Rojas et al. [111]20062, 4patients20.79100.00PCperformanceWAIS-III or WISC-III performance240.31
Rojas et al. [111]20062, 4healthy21.41100.00PCperformanceWAIS-III or WISC-III performance230.09
Staff et al. [112]20061, 2, 4healthy79.5061.00PCfluidRSPM102−0.10
Staff et al. [112]20062, 4healthy79.5061.00PCverbalNART102−0.14
Voelbel et al. [113]20062, 4healthy10.77100.00PCFSIQWISC-III13−0.11
Voelbel et al. [113]20062, 4patients10.16100.00PCFSIQWISC-III380.02
Voelbel et al. [113]20062, 4patients10.08100.00PCFSIQWISC-III12−0.14
Voelbel et al. [113]20062, 4patients10.16100.00PCverbalWISC-III verbal380.08
Voelbel et al. [113]20062, 4patients10.08100.00PCverbalWISC-III verbal120.23
Voelbel et al. [113]20062, 4healthy10.77100.00PCverbalWISC-III verbal13−0.15
Voelbel et al. [113]20062, 4healthy10.77100.00PCperformanceWISC-III performance130.06
Voelbel et al. [113]20062, 4patients10.08100.00PCperformanceWISC-III performance12−0.48
Voelbel et al. [113]20062, 4patients10.16100.00PCperformanceWISC-III performance38−0.02
Wozniak et al. [114]20062, 4healthy12.4046.20PCFSIQWISC-III or WISC-IV130.59
Wozniak et al. [114]20062, 4patients12.3050.00PCFSIQWISC-III or WISC-IV140.41
Chiang et al. [115]20072, 4patients29.2045.00reportedverbalWAIS verbal39−0.02
Chiang et al. [115]20072, 4healthyNANAreportedverbalWAIS verbal16−0.44
Chiang et al. [115]20072, 4patients29.2045.00reportedperformanceWAIS performance390.10
Chiang et al. [115]20072, 4healthyNANAreportedperformanceWAIS performance160.41
DeBoer et al. [116]20072, 4healthy10.50NAPCFSIQWISC-III or WISC-IV20−0.55
DeBoer et al. [116]20072, 4patients10.75NAPCFSIQWISC-III or WISC-IV210.25
DeBoer et al. [116]20072, 4patients10.75NAPCverbalWISC-III or WISC-IV: VCI210.30
DeBoer et al. [116]20072, 4healthy10.50NAPCverbalWISC-III or WISC-IV: VCI20−0.20
DeBoer et al. [116]20072, 4patients10.75NAPCperformanceWISC-III or WISC-IV: POI210.38
DeBoer et al. [116]20072, 4healthy10.50NAPCperformanceWISC-III or WISC-IV: POI20−0.22
Doernte [117]20074healthy58.500.00greyFSIQHAWIE-R: sim, info, bd, pc18−0.23
Doernte [117]20074healthy58.50100.00greyFSIQHAWIE-R: sim, info, bd, pc170.18
Doernte [117]20074patients59.100.00greyFSIQHAWIE-R: sim, info, bd, pc12−0.02
Doernte [117]20074patients59.10100.00greyFSIQHAWIE-R: sim, info, bd, pc23−0.01
Fine et al. [118]20072, 4healthy40.1045.00PCFSIQWASI44−0.11
Fine et al. [118]20072, 4healthy10.4763.00PCFSIQWASI240.23
Luders et al. [119]20072, 3, 4healthy28.4845.00reportedFSIQWAIS-R620.28
Nakamura et al. [120]20072, 3, 4healthy40.8090.00PCFSIQWAIS-III440.38
Nakamura et al. [120]20072, 4patients40.6090.00PCFSIQWAIS-III430.32
Nakamura et al. [120]20072, 4patients40.6090.00PCverbalWAIS-III verbal440.26
Nakamura et al. [120]20072, 4healthy40.8090.00PCverbalWAIS-III verbal440.40
Nakamura et al. [120]20072, 4patients40.6090.00PCperformanceWAIS-III performance440.34
Nakamura et al. [120]20072, 4healthy40.8090.00PCperformanceWAIS-III performance430.29
Narr et al. [121]20074healthy28.2446.20reportedFSIQWAIS630.36
Schottenbauer et al. [122]20072, 3, 4healthy34.320.00PCFSIQWAIS-R220.60
Schottenbauer et al. [122]20072, 3, 4healthy37.77100.00PCFSIQWAIS-R350.33
Schottenbauer et al. [122]20072, 4patients40.960.00PCFSIQWAIS-R690.34
Schottenbauer et al. [122]20072, 4patients39.64100.00PCFSIQWAIS-R2050.28
Schottenbauer et al. [122]20072, 4patients40.900.00PCverbalWAIS-R: voc680.43
Schottenbauer et al. [122]20072, 4healthy34.320.00PCverbalWAIS-R: voc220.54
Schottenbauer et al. [122]20072, 4patients39.66100.00PCverbalWAIS-R: voc2020.28
Schottenbauer et al. [122]20072, 4healthy37.77100.00PCverbalWAIS-R: voc350.38
Schottenbauer et al. [122]20072, 4patients40.900.00PCperformanceWAIS-R: bd680.29
Schottenbauer et al. [122]20072, 4healthy34.320.00PCperformanceWAIS-R: bd220.30
Schottenbauer et al. [122]20072, 4patients39.65100.00PCperformanceWAIS-R: bd2030.17
Schottenbauer et al. [122]20072, 4healthy37.77100.00PCperformanceWAIS-R: bd350.17
Schumann et al. [123]20072, 4healthy13.10100.00reportedFSIQWASI220.41
Schumann et al. [123]20072, 4healthy13.10100.00reportedverbalWASI verbal220.38
Schumann et al. [123]20072, 4healthy13.10100.00reportedperformanceWASI performance220.25
Amat et al. [124]20082, 3, 4healthy31.5056.00PCFSIQWAIS-R27−0.11
Amat et al. [124]20082, 4healthy31.5056.00PCverbalWAIS-R verbal27−0.29
Amat et al. [124]20082, 4healthy31.5056.00PCperformanceWAIS-R performance270.18
Choi et al. [125]20084healthy21.6054.30reportedFSIQWAIS-R1640.35
Ebner et al. [126]20082, 4patients34.5268.00PCverbalMWT-B440.15
Ebner et al. [126]20082, 4healthy32.4551.00PCverbalMWT-B37−0.13
Raz et al. [127]20082, 4healthy51.1143.00PCfluidCFIT (form 2)550.18
Raz et al. [127]20082, 4patients59.7525.00PCfluidCFIT (form 2)32−0.02
Raz et al. [127]20082, 4patients59.7525.00PCverbalVocabulary Test (V2 & V3)310.15
Raz et al. [127]20082, 4healthy51.1143.00PCverbalVocabulary Test (V2 & V3)550.13
Castro-Fornieles et al. [128]20092, 4patients14.508.00PCverbalWISC-R: voc120.11
Castro-Fornieles et al. [128]20092, 4healthy14.6011.00PCverbalWISC-R: voc90.43
Castro-Fornieles et al. [128]20092, 4patients14.508.00PCperformanceWISC-R: bd120.38
Castro-Fornieles et al. [128]20092, 4healthy14.6011.00PCperformanceWISC-R: bd90.55
Miller et al. [129]20092, 4healthy9.2533.00reportedFSIQWJIII (GIA)120.23
Miller et al. [129]20092, 4healthy12.08NAreportedfluidWJIII: thinking ability11−0.11
Miller et al. [129]20092, 4patients16.5363.00reportedFSIQWJIII (GIA)16−0.30
Miller et al. [129]20092, 4healthy12.08NAreportedverbalWJIII: Cog verbal ability11−0.65
Miller et al. [129]20092, 4healthy9.25NAreportedverbalWJIII: Cog verbal ability50.84
Miller et al. [129]20092, 4patients16.53NAreportedverbalWJIII: Cog verbal ability60.76
Qiu et al. [130]20092, 4healthy10.5053.00PCFSIQWISC-III or WISC-IV660.26
Qiu et al. [130]20092, 4patients10.4057.00PCFSIQWISC-III or WISC-IV470.26
Qiu et al. [130]20092, 4patients10.4057.00PCverbalWISC-III or WISC-IV: VCI470.21
Qiu et al. [130]20092, 4healthy10.5053.00PCverbalWISC-III or WISC-IV: VCI660.35
Qiu et al. [130]20092, 4patients10.4057.00PCperformanceWISC-III or WISC-IV: POI470.20
Qiu et al. [130]20092, 4healthy10.5053.00PCperformanceWISC-III or WISC-IV: PRI660.12
Shenkin et al. [131]20092, 3, 4healthy78.4029.00reportedFSIQMHT, RSPM, Verbal fluency, lm990.21
Shenkin et al. [131]20092, 4healthy78.4029.00reportedverbalCOWA (verbal fluency)1070.13
Van Leeuwen et al. [132]20092, 4healthy9.1050.00reportedfluidRSPM2090.20
Van Leeuwen et al. [132]20092, 4healthy9.1050.00reportedverbalWISC-III: comp2090.33
Van Leeuwen et al. [132]20092, 4healthy9.1050.00reportedperformanceWISC-III: POI2090.28
Van Leeuwen et al. [132]20094healthy9.1050.00reportedperformanceWISC-III: PSI2090.12
Weniger et al. [133]20092, 4patients32.000.00PCFSIQHAWIE-R100.02
Weniger et al. [133]20092, 3, 4healthy33.000.00PCFSIQHAWIE-R250.15
Weniger et al. [133]20092, 4patients32.000.00PCFSIQHAWIE-R130.27
Weniger et al. [133]20092, 4patients32.000.00PCverbalHAWIE-R verbal130.35
Weniger et al. [133]20092, 4healthy33.000.00PCverbalHAWIE-R verbal250.00
Weniger et al. [133]20092, 4patients32.000.00PCverbalHAWIE-R verbal10−0.17
Weniger et al. [133]20092, 4patients32.000.00PCperformanceHAWIE-R performance100.23
Weniger et al. [133]20092, 4healthy33.000.00PCperformanceHAWIE-R performance250.24
Weniger et al. [133]20092, 4patients32.000.00PCperformanceHAWIE-R performance130.16
Zeegers et al. [134]20092, 4patients3.7291.00reportedFSIQunknown FS210.06
Zeegers et al. [134]20092, 4patients3.4492.00reportedFSIQunknown FS100.73
Betjemann et al. [135]20102, 4healthy11.4052.00reportedverbalWISC-R verbal1420.14
Betjemann et al. [135]20102, 4healthy11.4052.00reportedperformanceWISC-R performance1420.42
Hermann [136]20102, 3, 4healthy33.3442.00PCFSIQWechsler FS670.31
Hermann [136]20102, 4patients36.0935.00PCFSIQWechsler FS770.21
Hermann [136]20102, 4patients36.0935.00PCverbalWechsler verbal770.28
Hermann [136]20102, 4healthy33.3442.00PCverbalWechsler verbal670.23
Hermann [136]20102, 4patients36.0935.00PCperformanceWechsler performance770.09
Hermann [136]20102, 4healthy33.3442.00PCperformanceWechsler performance670.33
Hogan et al. [137]20102, 4healthy68.6953.00PCfluidRSPM2340.11
Hogan et al. [137]20102, 4healthy68.6953.00PCverbalNART2350.00
Isaacs et al. [138]20102, 4healthy15.750.00PCFSIQWISC-III or WAIS-III240.00
Isaacs et al. [138]20102, 4healthy15.75100.00reportedFSIQWISC-III or WAIS-III260.36
Isaacs et al. [138]20102, 4healthy15.750.00PCverbalWISC-III or WAIS-III verbal240.00
Isaacs et al. [138]20102, 4healthy15.75100.00reportedverbalWISC-III or WAIS-III verbal260.48
Isaacs et al. [138]20102, 4healthy15.750.00PCperformanceWISC-III or WAIS-III performance240.00
Isaacs et al. [138]20102, 4healthy15.75100.00reportedperformanceWISC-III or WAIS-III performance260.19
Lange et al. [139]20102, 4healthy10.880.00reportedFSIQWASI1660.22
Lange et al. [139]20102, 4healthy10.95100.00reportedFSIQWASI1430.23
Wallace et al. [140]20102, 4healthy11.8048.00reportedFSIQWASI6490.14
Wallace et al. [140]20102, 4healthy11.8048.00reportedverbalWASI verbal6490.13
Wallace et al. [140]20102, 4healthy11.8048.00reportedperformanceWASI performance6490.14
Ashtari et al. [141]20112, 3, 4healthy18.50100.00reportedFSIQWRAT-III140.57
Ashtari et al. [141]20112, 4patients19.30100.00reportedFSIQWRAT-III140.29
Chen et al. [142]20114healthy22.5644.00reportedFSIQWASI270.02
Chen et al. [142]20114patients23.0746.70reportedFSIQWASI300.68
Chen et al. [142]20114patients23.0227.00reportedFSIQWASI370.41
Kievit et al. [143]20112, 3, 4healthy21.1036.00PCFSIQWAIS-III800.29
Kievit et al. [143]20112, 4healthy21.1036.00PCverbalWAIS-III verbal800.23
Tate et al. [144]20112, 4patients81.7043.00PCFSIQShipley1940.00
Aydin et al. [145]20122, 4healthy15.10100.00reportedFSIQWISC-R300.40
Aydin et al. [145]20122, 4healthy15.10100.00reportedverbalWISC-R verbal300.26
Aydin et al. [145]20122, 4healthy15.10100.00reportedperformanceWISC-R performance300.34
Burgaleta et al. [146]20122, 3, 4healthy19.8844.00reportedFSIQ9 tests from APM, DAT-AR-5, PMR-R1000.17
Bigler et al. [147]20134patients10.6658.00reportedperformanceWISC-IV: PSI470.00
Bigler et al. [147]20134patients10.6768.00reportedperformanceWISC-IV: PSI320.00
Bigler et al. [147]20134patients10.1458.00reportedperformanceWISC-IV: PSI270.00
Royle et al. [148]20132, 3, 4healthy72.47100.00reportedFSIQWAIS-III: ins, span, mr, bd, ss, ds2930.26
Royle et al. [148]20132, 3, 4healthy72.600.00reportedFSIQWAIS-III: ins, span, mr, bd, ss, ds3270.27
Royle et al. [148]20134healthy72.47100.00reportedverbalWAIS-III: lns2930.10
Royle et al. [148]20134healthy72.600.00reportedverbalWAIS-III: lns3270.22
Royle et al. [148]20134healthy72.47100.00reportedverbalWAIS-III: span b2930.11
Royle et al. [148]20134healthy72.600.00reportedverbalWAIS-III: span b3270.23
Royle et al. [148]20134healthy72.47100.00reportedperformanceWAIS-III: bd2930.25
Royle et al. [148]20134healthy72.600.00reportedperformanceWAIS-III: bd3270.25
Royle et al. [148]20134healthy72.47100.00reportedperformanceWAIS-III: mr2930.14
Royle et al. [148]20134healthy72.600.00reportedperformanceWAIS-III: mr3270.18
Royle et al. [148]20134healthy72.47100.00reportedperformanceWAIS-III: ds2930.22
Royle et al. [148]20134healthy72.600.00reportedperformanceWAIS-III: ds3270.33
Royle et al. [148]20134healthy72.47100.00reportedperformanceWAIS-III: ss2930.17
Royle et al. [148]20134healthy72.600.00reportedperformanceWAIS-III: ss3270.34
Zelko et al. [149]20134healthy14.9053.00reportedFSIQWAIS or WISC: voc, sim, pc, bd, arith, ds, ss360.25
Zelko et al. [149]20134patients14.6049.00reportedFSIQWAIS or WISC: voc, sim, pc, bd, arith, ds, ss1080.23
Zelko et al. [149]20134patients14.6049.00reportedverbalWAIS or WISC: voc, sim1080.23
Zelko et al. [149]20134healthy14.9053.00reportedverbalWAIS or WISC: voc, sim360.04
Zelko et al. [149]20134patients14.6049.00reportedverbalWAIS or WISC: arith, span1080.26
Zelko et al. [149]20134healthy14.9053.00reportedverbalWAIS or WISC: arith, span360.33
Zelko et al. [149]20134patients14.6049.00reportedperformanceWAIS or WISC: pc, bd1080.21
Zelko et al. [149]20134healthy14.9053.00reportedperformanceWAIS or WISC: pc, bd360.30
Zelko et al. [149]20134patients14.6049.00reportedperformanceWAIS or WISC: cod, sym1080.09
Zelko et al. [149]20134healthy14.9053.00reportedperformanceWAIS or WISC: cod, sym36−0.12
Bjuland et al. [150]20144healthy20.3042.00reportedFSIQWAIS-III600.36
Bjuland et al. [150]20144patients20.1041.00reportedFSIQWAIS-III430.56
Bjuland et al. [150]20144patients20.3041.00reportedverbalWAIS-III: VCI420.44
Bjuland et al. [150]20144patients20.3041.00reportedverbalWAIS-III: WMI420.54
Bjuland et al. [150]20144patients20.3041.00reportedperformanceWAIS-II: POI430.48
Bjuland et al. [150]20144patients20.3041.00reportedperformanceWAIS-II: PSI430.48
Grunewaldt et al. [151]20144patients10.1734.80reportedFSIQWISC-III210.00
Grunewaldt et al. [151]20144patients10.1734.80reportedverbalWISC-III: WMI210.00
Jenkins et al. [152]20144healthy11.7041.70reportedFSIQWASI, WISC-III or WPPSI-R: voc, mr1020.19
MacDonald et al. [153]20144healthy11.60100.00reportedFSIQWASI1420.23
MacDonald et al. [153]20144healthy11.300.00reportedFSIQWASI1610.22
MacDonald et al. [153]20144healthy11.60100.00reportedverbalWASI verbal1420.13
MacDonald et al. [153]20144healthy11.300.00reportedverbalWASI verbal1610.18
MacDonald et al. [153]20144healthy11.60100.00reportedperformanceWASI performance1420.29
MacDonald et al. [153]20144healthy11.300.00reportedperformanceWASI performance1610.19
McCoy et al. [154]20144patients13.00100.00reportedFSIQWISC-IV (GAI)100.59
McCoy et al. [154]20144patients13.000.00reportedFSIQWISC-IV (GAI)160.62
Zhu et al. [155]20144healthy20.4141.00reportedFSIQWAIS-R (Chinese)3160.10
Boberg et al. [156]20154healthy8.0054.00greyFSIQWISC-IV (Swedish)100.69
Boberg et al. [156]20154healthy8.2035.80greyFSIQWISC-IV (Swedish)90.00
Boberg et al. [156]20154healthy8.3050.00greyFSIQWISC-IV210.00
Grazioplene et al. [157]20154healthy26.2051.00reportedFSIQWAIS-IV: voc, sim, mr, bd2850.28
Grazioplene et al. [157]20154healthy23.50100.00reportedFSIQWAIS-IV: voc, sim, mr, bd1070.08
Grazioplene et al. [157]20154healthy21.7054.00reportedFSIQWASI1250.04
Grazioplene et al. [157]20154healthy26.2051.00reportedverbalWAIS-IV: voc, sim2850.18
Grazioplene et al. [157]20154healthy21.7054.00reportedverbalWASI verbal1250.04
Grazioplene et al. [157]20154healthy23.50100.00reportedverbalWAIS-III: voc, sim1070.10
Grazioplene et al. [157]20154healthy26.2051.00reportedperformanceWAIS-IV: mr, bd2850.30
Grazioplene et al. [157]20154healthy21.7054.00reportedperformanceWASI performance1250.04
Grazioplene et al. [157]20154healthy23.50100.00reportedperformanceWAIS-III: mr, bd1070.04
Lefebvre et al. [158]20154healthy17.0083.00reportedFSIQWechsler3660.23
Lefebvre et al. [158]20154patients16.6088.00reportedFSIQWechsler3280.04
Lefebvre et al. [158]20154patients16.6088.00reportedverbalunknown verbal2410.08
Lefebvre et al. [158]20154healthy17.0083.00reportedverbalunknown verbal2970.22
Lefebvre et al. [158]20154patients16.6088.00reportedperformanceunknown performance2410.17
Lefebvre et al. [158]20154healthy17.0083.00reportedperformanceunknown performance2970.18
Paul et al. [159]20154healthy24.570.00reportedverbalReading Span, Rotation Span, Symmetrie Span900.25
Paul et al. [159]20154healthy24.07100.00reportedverbalReading Span, Rotation Span, Symmetrie Span1210.18
Walters et al. [160]20154patients17.32100.00reportedFSIQWAIS or WISC: voc, mr1780.19
Ballester-Plane et al. [161]20164patients25.1067.00reportedFSIQRCPM300.73
Ballester-Plane et al. [161]20164patients25.1067.00reportedverbalPPVT-III300.71
Ballester-Plane et al. [161]20164patients25.1067.00reportedperformanceWASI performance300.72
Bohlken et al. [162]20164healthy32.6942.00reportedFSIQWAIS-III: ds, bd, arith., span, inf1640.26
Bohlken et al. [162]20164healthy32.7042.00reportedverbalWAIS-III: inf1640.18
Bohlken et al. [162]20164healthy32.7042.00reportedverbalWAIS-III: arith1640.26
Bohlken et al. [162]20164healthy32.7042.00reportedverbalWAIS-III: span1640.00
Bohlken et al. [162]20164healthy32.7042.00reportedperformanceWAIS-III: bd1640.31
Bohlken et al. [162]20164healthy32.7042.00reportedperformanceWAIS-III: ds1640.12
Ferreira et al. [163]20164healthy45.1049.00reportedverbalWAIS-III: voc730.36
Ferreira et al. [163]20164healthy45.1049.00reportedverbalWAIS-III: inf730.50
Ferreira et al. [163]20164healthy45.1049.00reportedperformanceWAIS-III: bd730.33
Gregory et al. [164]20164healthy14.7057.00reportedFSIQConditional Exclusion, Emotional Differentiation, Emotional Identification, Face Memory, Letter N-Back, Line Orientation, Matrix Reasoning, Verbal Reasoning, Visual Object Learning, Word Memory; WRAT: Reading6620.24
Monson et al. [165]20164patients7.5050.00reportedFSIQWASI1340.26
Gregory et al. [164]20164patients7.5050.00reportedverbalWASI verbal1340.11
Gregory et al. [164]20164patients7.5050.00reportedperformanceWASI performance1340.31
Nikolaidis et al. [166]20164healthy21.1534.00reportedfluidRAPM, Shipley Abstraction, Letter Sets, Spatial Relations, Paper Folding, Form Boards710.44
Nikolaidis et al. [166]20164healthy21.1534.00reportedverbalVisual Short-Term Memory, Spatial Working Memory, Running Span710.13
Paul et al. [159]20164healthy24.570.00reportedfluidBOMAT, Number Series, Letter Sets900.14
Paul et al. [159]20164healthy24.07100.00reportedfluidBOMAT, Number Series, Letter Sets1210.13
Treit et al. [167]20164patients12.5053.00reportedFSIQWRIT or WISC500.21
Treit et al. [167]20164healthy11.9048.00reportedFSIQWRIT or WISC660.09
Amaral et al. [168]20174healthy3.00100.00reportedFSIQMSEL490.35
Amaral et al. [168]20174patients3.08100.00reportedFSIQMSEL19−0.18
Amaral et al. [168]20174patients3.13100.00reportedFSIQMSEL1100.01
Arhan et al. [169]20174healthy9.2046.00reportedFSIQWISC-R (Turkish)460.51
Arhan et al. [169]20174healthy9.2046.00reportedverbalWISC-R: voc460.71
Arhan et al. [169]20174healthy9.2046.00reportedverbalWISC-R: sim460.54
Arhan et al. [169]20174healthy9.2046.00reportedverbalWISC-R: inf460.38
Arhan et al. [169]20174healthy9.2046.00reportedverbalWISC-R: comp460.45
Arhan et al. [169]20174healthy9.2046.00reportedverbalWISC-R: arith460.24
Arhan et al. [169]20174healthy9.2046.00reportedverbalWISC-R: span460.31
Arhan et al. [169]20174healthy9.2046.00reportedperformanceWISC-R: pc460.77
Arhan et al. [169]20174healthy9.2046.00reportedperformanceWISC-R: pic460.24
Arhan et al. [169]20174healthy9.2046.00reportedperformanceWISC-R: bd460.04
Arhan et al. [169]20174healthy9.2046.00reportedperformanceWISC-R: obj460.04
Arhan et al. [169]20174healthy9.2046.00reportedperformanceWISC-R: ds460.27
Martinez et al. [170]20174healthy19.600.00reportedverbalDAT-VR, PMA-V, Reading Span, Letter Memory, Keep Track, Flanker (verbal + numerical), Simple Recognition verbal task400.28
Martinez et al. [170]20174healthy20.20100.00reportedverbalDAT-VR, PMA-V, Reading Span, Letter Memory, Keep Track, Flanker (verbal + numerical), Simple Recognition verbal task40−0.04
Martinez et al. [170]20174healthy19.600.00reportedspatialDAT-SR, PMA-S, Rotation of Solid Figures, Dot Matrix, 2-Backl task, spatial Simon task, spatial Simple Recognition400.39
Martinez et al. [170]20174healthy20.20100.00reportedspatialDAT-SR, PMA-S, Rotation of Solid Figures, Dot Matrix, 2-Backl task, spatial Simon task, spatial Simple Recognition400.24
Ritchie et al. [171]20174healthy92.1045.00reportedfluidWAIS-III: ds, WMS-R: Logical Memory Story A, Phonemic Verbal Fluency [172]340.23
Ritchie et al. [171]20174healthy92.1045.00reportedperformanceWAIS-III: ds340.19
van der Linden et al. [173]20174healthy28.820.00reportedFSIQPenn Progressive Matrices, Peabody Vocabulary, Oral Reading Recognition, List Sorting, Picture Sequence Memory, Penn Line Orientation, Dimensional Change Card Sorting, Word Memory, Salthouse Pattern Comparison, Flanker Task5030.26
van der Linden et al. [173]20174healthy28.82100.00reportedFSIQPenn Progressive Matrices, Peabody Vocabulary, Oral Reading Recognition, List Sorting, Picture Sequence Memory, Penn Line Orientation, Dimensional Change Card Sorting, Word Memory, Salthouse Pattern Comparison, Flanker Task3930.25
Vreeker et al. [174]20174healthy44.6049.00reportedFSIQWAIS-III (Dutch): info, srith., bf, ds1600.28
Annink et al. [175]20184patients9.7948.00reportedFSIQWISC-III (Dutch)520.43
Jensen et al. [176]20184healthy24.9159.00PCFSIQWAIS-III (Danish): voc, sim, bd, mr560.30
Jensen et al. [176]20184patients24.6957.40PCFSIQWAIS-III (Danish): voc, sim, bd, mr540.14
Jensen et al. [176]20184healthy24.9159.00PCverbalWAIS-III (Danish): voc, sim560.30
Jensen et al. [176]20184patients24.6957.40PCverbalWAIS-III (Danish): voc, sim540.09
Jensen et al. [176]20184healthy24.9159.00PCperformanceWAIS-III (Danish): bd, mr560.19
Jensen et al. [176]20184patients24.6957.40PCperformanceWAIS-III (Danish): bd, mr540.20
Lammers et al. [177]20184patients72.0061.00reportedFSIQPaired Associate Learning, Verbal Recognition Memory, Spatial Span Length, Simple Reaction Time, TrailAB, Grooved Pegboard task2430.18
Mankovsky et al. [178]20184patients62.3033.00reportedverbalRAVLT: immediate recall, delayed recall, delayed recognition + WAIS-III: span930.02
Mankovsky et al. [178]20184patients62.3033.00reportedperformanceTrails (A), SCWT (I + II), WAIS-II: ds930.08
Nygaard et al. [179]20184patients18.9660.00reportedFSIQWASI: voc, mr820.30
Sreedharan et al. [180]20184patients10.8066.00reportedFSIQWISC (Malayalam translation)300.00
Takeuchi et al. [181]20184healthy20.8058.00reportedFSIQTanaka B Intelligence Scale13190.07
Tozer et al. [182]20184patients70.0165.00reportedFSIQspan, lm, Visual Reproduction, BIRT Memory and Information Processing Battery, Speed of Information Processing, ds, Grooved Pegboard, trails, Verbal Fluency, WCST1180.23
Tozer et al. [182]20184patients70.0165.00reportedperformanceunknown1150.28
Ahn et al. [183]20194patients32.9742.00reportedFSIQK-WAIS-R380.00
Ahn et al. [183]20194patients32.9742.00reportedverbalWAIS-R verbal380.00
Ahn et al. [183]20194patients32.9742.00reportedperformanceWAIS-R performance380.00
Bathelt et al. [184]20194healthy9.9354.00reportedFSIQWASI-II: Reasoning; AWMA: Digit Recall, Backward Digit Recall, Dot Matrix, Mr X630.07
Bathelt et al. [184]20194patients9.3564.70reportedFSIQWASI-II: Reasoning; AWMA: Digit Recall, Backward Digit Recall, Dot Matrix, Mr X1390.02
Cox et al. [185]20194healthy63.13100.00reportedFSIQcomposite of 4 tests39000.21
Cox et al. [185]20194healthy63.130.00reportedFSIQcomposite of 4 tests41920.26
de Zwarte et al. [186]20194patients27.4960.00reportedFSIQWAIS-III (Dutch): inf, arith, bd, ds5160.29
de Zwarte et al. [186]20194patients52.8532.00reportedFSIQGIT (short)850.06
Elliott et al. [187]20194healthy45.0048.00reportedFSIQWAIS-IV5960.35
Elliott et al. [187]20194healthy22.2347.00reportedFSIQShipley11630.12
Elliott et al. [187]20194healthy20.2647.00reportedFSIQWASI: voc, mr5150.16
Hiraiwa et al. [188]20194patients9.4352.00reportedFSIQWISC-IV270.34
van Haren et al. [189]20194healthy12.7453.00reportedFSIQWISC-III or WAIS-III: voc, inf, bd, pic400.34
van Haren et al. [189]20194patients13.7730.00reportedFSIQWISC-III or WAIS-III: voc, inf, bd, pic400.53
van Haren et al. [189]20194patients14.5256.00reportedFSIQWISC-III or WAIS-III: voc, inf, bd, pic630.39
Cadenas-Sanchez et al. [190]20204patients10.0060.00reportedFSIQK-BIT100−0.03
Cadenas-Sanchez et al. [190]20204patients10.0060.00reportedverbalDelayed non-Match to Sample Task1000.06
Corley et al. [191]20204healthy79.4053.10reportedFSIQWAIS-III: mr, bd, ss, ds; WMS-III: spatial span f + b, vpa, lm; Four-Choice Reaction Time, Visual Inspection Time, NART, WTAR, PVF3580.19
Corley et al. [191]20204healthy79.4053.10reportedverbalNART, WTAR, PVF3580.08
Corley et al. [191]20204healthy79.4053.10reportedverbalWMS-III: vpa, lm; WAIS-III: ds (b)3580.16
Corley et al. [191]20204healthy79.4053.10reportedperformanceWAIS-III: mr, bd; WMS-III: spatial span f + b3580.12
Corley et al. [191]20204healthy79.4053.10reportedperformanceWAIS-III: ss, ds; Four Choice Reaction Time, Visual Inspection Time3580.29
de Zwarte et al. [186]20204patients32.1446.00reportedFSIQmostly abbreviated Wechsler Scales9680.20
de Zwarte et al. [186]20204patients27.4841.00reportedFSIQmostly abbreviated Wechsler Scales5070.20
Elias [192]20204patients69.02100.00greyverbalWTAR490.35
Laliberté Durish [193]20204patients12.3062.40greyFSIQWASI-II3040.19
Laliberté Durish [193]20204patients12.5051.90greyFSIQWASI-II161−0.07
Laliberté Durish [193]20204healthy39.6043.00reportedFSIQVerbal Learning, ds, Conditional Exclusion, Spatial Working Memory, Facial Memory, TrailAB, Continuous Performance, Letter-Number Seq, Balloon Analog, Oral Word Association, Emotional Recognition12160.12
Mitchell et al. [194]20204healthy22.3038.00reportedFSIQMAB (5 subtests); WAIS-R: ds10970.25
Williams et al. [195]20204patients14.5487.74reportedFSIQWechsler3020.08
Williams et al. [195]20204healthy14.5480.40reportedFSIQWechsler3520.31
Yankowitz et al. [196]20204healthy13.1072.70reportedFSIQDAS (GCA) or WISC-IV or WASI (I or II)2160.38
Yankowitz et al. [196]20204patients13.0082.10reportedFSIQDAS (GCA) or WISC-IV or WASI (I or II)2400.05
Yankowitz et al. [196]20204healthy20.60100.00reportedFSIQWISC-III or WASI-I or WAIS (R or III)890.35
Yankowitz et al. [196]20204patients16.60100.00reportedFSIQWISC-III or WASI-I or WAIS (R or III)860.25
Hedderich et al. [197]20214patients26.7057.40reportedFSIQabb. WAIS-III (German)970.38
Naef et al. [198]20214patients26.7138.60reportedFSIQshort form of WAIS-IV440.14
PRISMA flowchart. Characteristics of included studies. Note. NA = info not available; Review: 1 = included in McDaniel [23], 2 = included in Pietschnig et al. [24], 3 = included in Gignac & Bates [25], 4 = included in present update; Reporting: reported = published in a journal article, grey = published as thesis/dissertation, PC = result obtained via personal communication; FSIQ = full-scale IQ; Type of test: IQ assessment used in study; subtest abbreviations: arith = arithmetic, bd = block design, com = comprehension, ds = digit symbol, inf = information, lm = logical memory, lns = letter-number sequencing, mr = matrix reasoning, obj = object assembly, pc = picture completion, pic = picture arrangement, sim = similarities, span = digit span (b stand for backwards), ss = symbol search, ss p + f = spatial span forwards and backwards, vpa = verbal pair associates; domain indices of the Wechsler scales are abbreviated as follows: POI = perceptual organization index, PRI = perceptual reasoning index, PSI = processing speed index, VCI = verbal comprehension index, WMI = working memory index; full information explaining all abbreviations are available in the codebook and data files in supplemental materials. Published study outcomes with r = exactly 0 represent correlations set to zero, because no eligible numerical value was available.

Results

A summary effect of r = 0.23 for full-scale IQ was observed (k = 194; I2 = 60.70; 95% CI [0.21; 0.26]), when all available independent effect sizes were synthesized by means of the Hedges & Olkin approach. Summary effects were somewhat smaller when associations were limited to verbal (r = 0.20; k = 115; I2 = 43.84; 95% CI [0.16; 0.23]) or performance IQ domains (r = 0.20; k = 82; I2 = 27.66; 95% CI [0.17; 0.24]). The Hunter & Schmidt-typed synthesis of artefact-corrected coefficients was broadly consistent with the results of the Hedges & Olkin approach, but unsurprisingly yielded somewhat larger effects for full-scale (r = 0.26; k = 116; I2 = 61.28; 95% CI [0.21; 0.031]), verbal (r = 0.22; k = 50; I2 = 50.28; 95% CI [0.15; 0.29]) and performance IQ (r = 0.25; k = 45; I2 = 30.42; 95% CI [0.20; 0.31]). Of note, these analyses were based on fewer observations compared to the other approaches, because necessary information for corrections (e.g. within-sample standard deviations of IQ scores) had not been reported. Results from the robust variance estimation-based approach were virtually identical with the Hedges & Olkin analyses showing small-to-moderate associations for full-scale (r = 0.23; k = 203; I2 = 54.85; 95% CI [0.20; 0.25]), verbal (r = 0.21; k = 141; I2 = 48.34; 95% CI [0.18; 0.24]) and performance IQ (r = 0.21; k = 110; I2 = 32.41; 95% CI [0.18; 0.24]). This pattern of results remained virtually identical when analyses were limited to healthy (neurotypical) samples, although effect sizes were somewhat larger in this subgroup ranging from r = 0.19 to 0.24 for the Hedges & Olkin, 0.23 to 0.29 for the Hunter & Schmidt, and 0.20 to 0.24 for the RVE method (left side of table 2; figure 2 for a forest plot of the Hedges & Olkin analysis). Patient samples also showed non-trivial small-to-moderate effects, although they were slightly weaker than those of healthy samples (excepting Hedges & Olkin, as well as RVE analyses for verbal IQ; right part of table 2; figure 3 for a forest plot of the Hedges & Olkin-typed analysis).
Table 2

Summary effects of healthy and patient samples according to three different analysis approaches. Note. In the RVE approach, the number of synthesized effect sizes is followed by the number of independent samples in parentheses; I2 = percentage of variability due to variability of true effects; LCI = lower bound of 95% confidence interval; UCI = upper bound of 95% confidence interval.

healthy samples
patient samples
knI2rLCIUCIknI2rLCIUCI
full-scale IQ
 Hedges & Olkin approach12323 40355.750.240.220.2771536160.430.210.160.26
 psychometric meta-analysis69905732.760.290.240.3347377372.110.200.110.29
 robust variance estimation (RVE)128 (121)24 54353.040.240.220.2775 (69)718556.340.200.150.27
verbal IQ
 Hedges & Olkin approach70544047.470.190.140.2345223735.950.220.160.28
 psychometric meta-analysis31234944.660.230.150.311994852.330.190.060.32
 robust variance estimation (RVE)93 (75)826252.650.200.160.2548 (46)255039.320.220.160.28
performance IQ
 Hedges & Olkin approach49416231.960.210.170.2533185823.660.190.120.25
 psychometric meta-analysis28219215.230.280.220.331787935.300.200.080.32
 robust variance estimation (RVE)74 (59)809531.690.220.190.2636 (32)207833.900.190.130.25
Figure 2

Forest plot of healthy samples (Hedges & Olkin model).

Figure 3

Forest plot of patient samples (Hedges & Olkin model).

Forest plot of healthy samples (Hedges & Olkin model). Forest plot of patient samples (Hedges & Olkin model). Summary effects of healthy and patient samples according to three different analysis approaches. Note. In the RVE approach, the number of synthesized effect sizes is followed by the number of independent samples in parentheses; I2 = percentage of variability due to variability of true effects; LCI = lower bound of 95% confidence interval; UCI = upper bound of 95% confidence interval. Results of leave-one-out analyses did not show any meaningful influences of single leverage points on effect-size estimations for all three IQ domains in overall analyses or subsets. Outlier analyses revealed a maximum of three potential leverage points in the datasets (see, electronic supplementary material, S1). Because recalculations of effect estimates when omitting these data points did not lead to meaningful changes in summary effect calculations (the largest observed influence of any of these outliers led to changes in the third decimal place of the summary effect in either domain), all subsequent analyses were performed without excluding these data points.

Subgroup analyses

No significant group differences were observed between healthy and patient-based samples in full-scale (k = 194; Q(1) = 1.06; p = 0.304), verbal (k = 115; Q(1) = 0.75; p = 0.385) or performance IQ effects (k = 82; Q(1) = 0.41; p = 0.520). However, we assessed healthy (neurotypical) and patient-based samples separately in the subsequent subgroup analyses to allow meaningful comparisons with the results of previous accounts (i.e. [23-25]). Positive meaningful associations between IQ and brain volume were observable within all investigated subsets (i.e. regardless of sample type: healthy versus patient samples or IQ domain: full-scale versus verbal versus performance IQ; excepting a trivial positive full-scale IQ and brain volume associations for grey literature findings in patient; note that another three out of a total of 65 summary effects failed to reach nominal significance), thus generalizing across publication status, sample age, volumetric measurement type (i.e. total brain versus intracranial volume), sex, and g-ness (see, table 3). Somewhat numerically larger effects were consistently observed in healthy compared to patient samples for full-scale and performance, but not for verbal IQ.
Table 3

Subgroup analyses for full-scale, verbal, and performance IQ. Note. Grey literature results were excluded from subgroup calculations for verbal and performance IQ because of low cell frequencies; TBV = Total brain volume; ICV = intracranial volume.

healthy samples
patient samples
knI2rLCIUCIknI2rLCIUCI
full-scale IQ
 total12323 40355.750.240.220.2771536160.430.210.160.26
 publication statusk = 123; Q(2) = 2.08; p = 0.353k = 71; Q(2) = 6.72; p = 0.035
  reported7421 82666.780.250.220.2847384267.970.240.180.31
  grey literature57524.260.10−0.340.50450058.890.05−0.170.27
  personal communication44150222.540.210.140.2820101930.040.170.090.25
 agek = 123; Q(1) < 0.01; p = 0.969k = 71; Q(1) = 2.50; p = 0.114
  children/adolescents56432633.300.240.190.2936256954.460.170.100.24
  adults6719 07765.720.240.210.2835279261.470.250.180.32
 volumetric measurement typek = 94; Q(1) = 0.91; p = 0.339k = 62; Q(1) = 3.34; p = 0.068
  TBV7817 89148.860.240.210.2745391358.160.190.130.25
  ICV16444880.080.280.190.3617107768.400.310.180.43
sexk = 60; Q(1) = 0.10; p = 0.751k = 18; Q(1) = 4.44; p = 0.035
  men36613712.910.260.220.291273522.150.160.060.26
  women2459940.020.260.230.296160<0.010.330.150.49
g-nessk = 106; Q(2) = 23.69; p < 0.001k = 63; Q(2) = 0.38; p = 0.829
  fair g-ness756359.000.330.160.4726291.940.42−1.001.00
  good g-ness4818 30956.700.190.160.2230304562.320.200.130.27
  excellent g-ness5133644.070.310.270.3431140445.860.220.140.30
verbal IQ
 total70544047.470.190.140.2345223735.950.220.150.28
 publication statusk = 70; Q(1) = 0.85; p = 0.358k = 44; Q(1) = 2.39; p = 0.302
  reported42433654.480.200.140.2521131662.360.260.150.37
  grey literature1490.350.080.58
  personal communication28110433.790.150.07.242387214.680.190.110.26
 agek = 70; Q(1) = 0.49; p = 0.484k = 45; Q(1) = 7.11; p = 0.008
  children/adolescents26213128.060.210.140.28136930.010.110.020.20
  adults44330951.640.180.120.2332154433.840.250.180.32
 volumetric measurement typek = 58; Q(1) = 3.47; p = 0.062k = 36; Q(1) = 4.19; p = 0.004
  TBV45429640.380.170.120.2123121327.720.170.080.25
  ICV1376564.640.300.150.431375654.600.310.190.43
 sexk = 35; Q(1) = 0.08; p = 0.772k = 15; Q(1) = 2.36; p = 0.125
  men21102128.080.230.150.315126<0.010.380.120.60
  women14632<0.010.250.170.3210443<0.010.230.130.32
Performance IQ
 total49416231.960.210.170.2533185823.660.190.120.25
 publication statusk = 49; Q(1) = 2.65; p = 0.104k = 33; Q(1) = 0.81; p = 0.370
  reported28354950.320.230.180.2816113554.520.210.100.32
  grey literature
  personal communication21613<0.010.170.100.23171206<0.010.160.080.23
 agek = 49; Q(1) = 0.53; p = 0.469k = 33; Q(1) = 1.84; p = 0.175
  children/adolescents23207532.430.230.160.291376724.610.140.030.24
  adults26208729.680.200.150.2520109136.900.220.130.31
 volumetric measurement typek = 38; Q(1) = 2.33; p = 0.127k = 25; Q(1) = 0.708; p = 0.400
  TBV32370043.730.210.160.26161012<0.010.160.090.23
  ICV6180<0.010.290.170.40960979.250.26<0.010.48
 sexk = 25; Q(1) = 0.05; p = 0.820k = 9; Q(1) = 5.01; p = 0.025
  men1671717.440.240.160.3163306.210.11−0.070.28
  women9429<0.010.250.170.32391<0.010.270.130.40
Subgroup analyses for full-scale, verbal, and performance IQ. Note. Grey literature results were excluded from subgroup calculations for verbal and performance IQ because of low cell frequencies; TBV = Total brain volume; ICV = intracranial volume. There was only one significant difference between subset summary effects in healthy participants, indicating stronger associations with tests that were deemed to possess excellent compared to those that possess good g-ness. For patient samples, publication type in full-scale IQ, volumetric measurement in verbal IQ, and sex in performance IQ showed significantly larger effects for published, intracranial volume and female samples. However, these patient-based findings should be taken with a grain of salt because of low sample numbers (for details, see rightmost columns of table 3). Of note, between-studies variation remained moderate-to-large according to widely accepted classifications (0–25% suggest trivial, 25–50% small, 50–75% moderate and 75–100% large heterogeneity; see [199]), indicating potential further sources for explaining unobserved heterogeneity.

Meta-regressions

Single regressions

Primary study publication years negatively predicted effect sizes of brain size associations with full-scale IQ in the total and healthy samples (bs = −0.004 and −0.005; ps = 0.028 and 0.006, respectively), but not in patient samples (b < 0.001; p = 0.979). In the verbal and performance IQ domains, publication years again showed consistently negative influences on associations in total and healthy samples, although associations failed to reach nominal statistical significance (bs range: −0.006 to ≥0.001; p range: 0.055 to 0.883; effect declines for healthy samples by domain are illustrated in the electronic supplementary material, S1). Regression coefficients for patient samples were inconsistent in signs (ps > 0.475). In our RVE-based comparisons of intelligence domains for our total samples (i.e. full-scale versus verbal versus performance IQ), no significant differences were observable between associations (all ps > 0.05, when referencing to full-scale IQ). When removing the intercept from the model, regression coefficients yielded similar values for full-scale, performance and verbal IQ (rs = 0.22, 0.21 and 0.23, respectively; all ps < 0.001). These outcomes were similar when running regressions on either healthy or patient samples only. No significant domain differences in association strength were observed in either group (all ps > 0.05) and once more full-scale, verbal and performance IQ showed consistently significant coefficients for healthy and patient samples (rs = 0.24 and 0.20, 0.20 and 0.23, 0.24 and 0.18, respectively).

Multiple regressions

In our hierarchical multiple meta-regressions, g-ness, primary study publication years and primary study publication status emerged as the most meaningful predictors of associations with full-scale IQ in healthy samples in our first block, explaining 46.18% of variance. Positive associations with g-ness indicated stronger effects for more highly g-loaded tests, thus corroborating our results from the subgroup analyses. Negative associations with publication years pointed towards declining effect sizes over time and effects were bigger when they had been published than when they had been obtained from the grey literature or personal communications. Neither addition of male ratio and mean age in our second nor study goal and number of corrections within studies in our third block led to improved model fit (top left of table 4).
Table 4

Theory-guided multiple hierarchical meta-regressions by sample type and IQ domain. Note. SE b = standard error of regression coefficient; g-ness: 0 = fair/good, 1 = excellent; NoC = Number of controlled variables; Publication status: 0 = published, 1 = unpublished (i.e. obtained from grey literature or personal communications); Study goal: 0 = report of correlation was not focus of study, 1 = report of correlation was focus of study; if likelihood ratio tests of block 2 versus block 1 did not yield significant results, block 3 was compared with block 1; all VIFs < 1.88.

healthy samples
patient samples
BLCIUCISE bpBLCIUCISE bp
full-scale IQ
 block 1k = 104; R2 = 46.18k = 62; R2 = 15.72
 g-ness0.0640.0200.1090.0230.0050.027−0.1040.1580.0650.680
  publication year−0.008−0.012−0.0050.002<0.0010.001−0.0080.0100.0040.781
  publication status−0.096−0.172−0.0200.0380.014−0.124−0.2500.0030.0630.055
 block 2k = 104; R2 = 58.94; χ2(2) = 2.179; p = 0.336k = 62; R2 = 25.09; χ2(2)=4.013; p = 0.135
  g-ness0.0690.0240.1130.0220.0030.028−0.1030.1590.0650.666
  publication year−0.008−0.012−0.0050.002<0.0010.001−0.0070.0100.0040.753
  publication status−0.098−0.175−0.0210.0390.013−0.115−0.2400.0100.0620.070
  male ratio−0.016−0.0860.0530.0350.647−0.212−0.4460.0230.1170.076
  mean age0.001≥0.0010.0020.0010.121−0.001−0.0040.0020.0020.650
 block 3k = 104; R2 = 58.73; χ2(4)=4.791; p = 0.310k = 62; R2 = 36.67; χ2(4)=9.894; p = 0.042
  g-ness0.0660.0220.1100.0220.0040.058−0.0790.1960.0690.399
  publication year−0.007−0.011−0.0030.0020.0010.001−0.0080.0100.0040.797
  publication status−0.092−0.178−0.0060.0430.037−0.107−0.2320.0180.0620.091
 male ratio−0.014−0.0830.0560.0350.694−0.267−0.502−0.0320.1170.027
  mean age0.001−0.0010.0020.0010.299≥0.001−0.0040.0030.0020.908
  study goal−0.019−0.0740.0370.0280.509−0.117−0.2470.0130.0650.078
  NoC−0.014−0.0340.0060.0100.1620.020−0.0310.0710.0250.443
verbal IQ
 block 1k = 67; R2 = 10.30k = 41; R2 = <0.01
  publication year−0.006−0.012≥0.0010.0030.046−0.002−0.0100.0070.0040.639
  publication status−0.049−0.1440.0470.0480.313−0.038−0.1660.0900.0630.552
 block 2k = 67; R2 = 43.72; χ2(2) = 5.024; p = 0.081k = 41; R2 = 26.81; χ2(2) = 2.173; p = 0.337
  publication year−0.006−0.011≥0.0010.0030.039−0.002−0.0100.0070.0040.658
  publication status−0.036−0.1290.0570.0470.443−0.031−0.1620.1000.0650.629
  male ratio−0.008−0.1310.1160.0620.901−0.070−0.2930.1530.1100.527
  mean age−0.002−0.004≥0.0010.0010.0250.002−0.0020.0060.0020.291
 block 3k = 67; R2 = 74.19; χ²(4)=13.649; p = 0.009k = 41; R2 = 39.13; χ²(4)=4.083; p = 0. 395
  publication year−0.006−0.011−0.0010.0030.022−0.004−0.0140.0060.0050.466
  publication status−0.086−0.1850.0140.0500.0910.029−0.1330.1910.0800.718
  male ratio−0.027−0.1450.0910.0590.651−0.061−0.2870.1660.1110.590
  mean age−0.001−0.003<0.0010.0010.1200.002−0.0020.0060.0020.315
  study goal0.045−0.0420.1320.0440.308−0.066−0.2460.1140.0890.463
  noc−0.051−0.087−0.0150.0180.0060.032−0.0280.0920.0300.290
performance IQ
 block 1k = 47; R2 < 0.01k = 31; R2 = 41.92
  publication year−0.003−0.0080.0030.0030.3310.004−0.0060.0140.0050.449
  publication status−0.057−0.1530.0400.0480.245−0.043−0.1840.0980.0690.539
 block 2k = 47; R2 = 33.65; χ²(2)=3.800; p = 0.150k = 31; R2 = 61.61; χ²(2)=1.579; p = 0.454
  publication year−0.003−0.0080.0030.0030.3210.004−0.0060.0140.0050.434
  publication status−0.055−0.1510.0410.0480.255−0.035−0.1800.1100.0700.624
  male ratio−0.032−0.1530.0890.0600.602−0.074−0.3220.1730.1200.542
  mean age−0.002−0.004≥0.0010.0010.0310.001−0.0020.0050.0020.510
 block 3k = 47; R2 = 26.54; χ2(4) =7.162; p = 0.128k = 31; R2 = 99.86; χ2(4) = 14.259; p = 0.007
  publication year−0.001−0.0070.0050.0030.665<0.001−0.0090.0100.0050.937
  publication status−0.030−0.1410.0800.0550.5810.093−0.0580.2430.0730.216
  male ratio−0.038−0.1560.0810.0590.524−0.080−0.2890.1290.1010.438
  mean age−0.001−0.0030.0010.0010.480−0.001−0.0040.0030.0020.745
  study goal−0.076−0.1740.0230.0490.128−0.111−0.2650.0440.0750.152
  NoC−0.016−0.0490.0180.0170.3590.0860.0260.1450.0290.006
Theory-guided multiple hierarchical meta-regressions by sample type and IQ domain. Note. SE b = standard error of regression coefficient; g-ness: 0 = fair/good, 1 = excellent; NoC = Number of controlled variables; Publication status: 0 = published, 1 = unpublished (i.e. obtained from grey literature or personal communications); Study goal: 0 = report of correlation was not focus of study, 1 = report of correlation was focus of study; if likelihood ratio tests of block 2 versus block 1 did not yield significant results, block 3 was compared with block 1; all VIFs < 1.88. Effects on full-scale IQ in patient samples were on the whole weaker and showed a single nominally significant effect of male ratio in block three, indicating smaller associations in men than in women (top right of table 4). However, this finding should be interpreted with caution, because male ratio did not emerge as a significant predictor in any other of our analyses and may be driven by untypically small full-scale IQ correlations of exclusively male patient samples, as evident from our subgroup estimates (table 3). For verbal IQ, publication year once more negatively predicted effect sizes, but was only significant for healthy samples. In healthy samples, model fit significantly improved when number of corrections within primary studies was included, indicating larger associations when fewer corrections were used. In patient samples, no included predictor yielded significant associations in any block of the model (centre rows of table 4). For performance IQ, neither variables in the first nor in the subsequent blocks showed significant influences on effect sizes in healthy samples. A similar pattern emerged for patient samples, showing no significant influences in the initial two blocks. Block 3 showed significantly improved model fit compared to Block 1, which was driven by a positive association between the number of corrections within primary studies, indicating smaller associations when fewer corrections were used (bottom of table 4). First, a power-enhanced funnel plot [44] was used to visually inspect full-scale IQ effects of healthy samples. The funnel plot appeared to be fairly symmetrical, but figure 4 shows that most studies had lower power than desirable, yielding a median power estimate of 60.7%. The TES did not provide evidence for confounding dissemination bias (p = 0.152), but the replicability index was low (52.5%), thus indicating the presence of inflated results. Median power for verbal and performance IQ was suboptimal as well, yielding 31.2% and 45.9%. Similar to full-scale IQ, TES-results did not indicate dissemination bias in either verbal (p = 0.431) or performance IQ (p = 0.931), but replicability indices were once again low (13.9% and 38.3%, respectively).
Figure 4

Power-enhanced funnel plot of published healthy sample effect sizes for full-scale IQ.

Power-enhanced funnel plot of published healthy sample effect sizes for full-scale IQ. Second, Sterne & Egger's regressions showed significant evidence for funnel-plot asymmetry in full-scale IQ (p < 0.001), thus pointing toward confounding dissemination bias. Although there was no bias evidence for verbal IQ (p = 0.265), results for performance IQ were once again significant (p = 0.011). Third, trim-and-fill analyses for full-scale IQ indicated 20 missing studies on the left side of the summary effect, leading to an adjusted effect estimate of r = 0.21. Although this adjusted estimate should not be used as a correction of the observed summary effect, it serves as a sensitivity analysis that is presently indicative of summary effect inflation. Similar results were observed for verbal and performance IQ were 7 and 11 effect sizes were detected to be missing on the left-hand side, thus yielding adjusted effects of 0.16 and 0.17, respectively (see electronic supplementary material, S1). Fourth, p-curve did not indicate any evidence for confounding p-hacking (no evidence for right skew in binomial or continuous tests for full and half p-curves; all ps < 0.001), and primary studies were on average sufficiently powered to indicate evidential value of the observed non-null effect (all ps > 0.999 for binomial and continuous tests against 33% power). p-curve-based effect estimations yielded a similar effect to the above reported traditional estimate (r = 0.24). Results from p-uniform analyses did not indicate confounding bias either (p = 0.975), but yielded a somewhat larger summary effect (r = 0.26; 95% CI [0.22; 0.28]). The estimate based on p-uniform* showed a somewhat smaller summary effect than the other two p-value-based estimation methods (r = 0.22; 95% CI [0.18; 0.26]). Both verbal and performance IQ showed no evidence for p-hacking (all ps < 0.001), and the available primary studies appeared to provide evidential value (all ps > 0.748). Effect estimates were similar to traditional calculations yielding r = 0.21 and 0.24, respectively. Results from p-uniform did not indicate bias in either domain (ps > 0.768) and effect estimates were similar to p-curve-based values for verbal (r = 0.22; 95% CI [0.15; 0.29]) and performance IQ (r = 0.23; 95% CI [0.16; 0.31]). Once more, p-uniform*-based estimates were lower than those of p-uniform yielding r = 0.18 (95% CI [0.11; 0.25]) for verbal and r = 0.23 (95% CI [0.15; 0.29]) for performance IQ. Fifth, full-scale IQ-adjusted parameters (i.e. based on the four different weight functions from [200]) did not vary considerably (ranging from r = 0.23 to 0.25) and were broadly consistent with the observed unadjusted r = 0.25 in.this subset. Results for verbal and performance IQ conformed to this observation, showing rather small variations between different weight function estimates (r range for verbal IQ: 0.15–0.19, r range for performance IQ: 0.20–0.22) and corresponding broadly to the unadjusted rs = 0.20 and 0.23. By contrast, the selection model of Copas & Shi [54] provided some evidence for dissemination bias in full-scale IQ, indicating 44 missing studies on the left side of the observed summary effect and suggesting an adjusted summary effect of r = 0.20. No missing effect sizes were detected for verbal IQ, leading to a virtually identical summary effect of r = 0.20, but 11 effects were estimated to be missing on the left side of the performance IQ summary effect, thus leading to an adjustment to r = 0.18. Sixth, confidence intervals estimates for full-scale IQ of the Henmi & Copas [57] approach differed somewhat from the conventional DerSimonian-Laird estimation (95% CIs [0.17; 0.27] versus [0.22; 0.28]). For verbal and performance IQ, confidence intervals showed similar patterns (95% CIs [0.10; 0.23] versus [0.14; 0.26] and [0.14; 0.26] versus [0.18; 0.28], for conventional and Henmi-Copas estimates, respectively). Seventh, in a subgroup analysis published summary effects did not differ significantly from unpublished summary effects (i.e. effect sizes from grey literature or personal communications) for full-scale IQ (Q(1) = 1.720; p = 0.190). However, as expected, published effects were larger than unpublished ones (rs = 0.25 versus 0.21, respectively). Results for verbal and performance IQ domains were virtually identical, yielding no nominally significant differences, but consistently larger published summary effects (table 3 for parameters). Finally, visual inspections of cumulative forest plots of full-scale IQ associations show a clear pattern of systematically decreasing effect sizes over time, thus conforming to our findings of significant negative influences of study years on effect sizes (figure 5). When cumulating results according to sample sizes, a less unequivocal picture emerged, although larger samples appeared to show weaker effects than smaller samples, thus conforming to the above evidence that suggests the presence of some confounding bias in the present data (figure 6). Virtual identical pictures for both publication years and sample sizes result from cumulating verbal and performance IQ data (see electronic supplementary material, S1).
Figure 5

Cumulative forest plot of healthy sample effects according to publication year for full-scale IQ.

Figure 6

Cumulative forest plot of healthy sample effects according to sample size for full-scale IQ.

Cumulative forest plot of healthy sample effects according to publication year for full-scale IQ. Cumulative forest plot of healthy sample effects according to sample size for full-scale IQ.

Multiverse meta-analyses

Combinatorial meta-analyses

Results of combinatorial meta-analyses did not show evidence for substantial deviations of the majority of possible combinations from the estimated summary effects. For healthy samples, the interquartile range of the summary-effect distribution for full-scale IQ analyses merely amounted to a difference of 0.02 in r values (Q1 = 0.24; Q3 = 0.26; figure 7). Visual inspection of GOSH-plots for the verbal and performance IQ analyses did not indicate systematic patterns either, although interquartile ranges were somewhat larger than for full-scale IQ, most likely owing to the smaller number of included effect sizes (interquartile ranges = 0.03 and 0.03; Q1 = 0.17 and 0.20; Q3 = 0.20 and 0.23; figures 8 and 9). Results of combinatorial meta-analyses for patient samples showed similar patterns and are included in electronic supplementary material, S1.
Figure 7

GOSH-plot of 100 000 randomly sampled healthy subsets of all possible combinations for full-scale IQ.

Figure 8

GOSH-plot of 100 000 randomly sampled healthy subsets of all possible combinations for verbal IQ.

Figure 9

GOSH-plot of 100 000 randomly sampled healthy subsets of all possible combinations for performance IQ.

GOSH-plot of 100 000 randomly sampled healthy subsets of all possible combinations for full-scale IQ. GOSH-plot of 100 000 randomly sampled healthy subsets of all possible combinations for verbal IQ. GOSH-plot of 100 000 randomly sampled healthy subsets of all possible combinations for performance IQ.

Specification-curve meta-analyses

These indicated that virtually any reasonable specification in any domain consistently leads to a positive small-to-moderate association between brain volume and IQ. For full-scale IQ, summary effects ranged from a minimum of r = 0.10 to a maximum effect of r = 0.37. As can be seen in figure 10, most specifications yielded values in the r = 0.20 range. These specifications tended to provide comparatively precise estimates (i.e. effects with narrow confidence intervals), while estimates at the lower and upper end of the effect distribution were less precise. Verbal and performance IQ specification curves were consistent with these observations, almost invariably yielding positive small-to-moderate effects, mostly in the r = 0.20 range (figures 11 and 12; minimum rs = 0.11 and <0.01; maximum rs = 0.33 and 0.32, respectively).
Figure 10

Descriptive meta-analytic specification-curve plot (see [14]) of summary effects from all reasonable specifications for full-scale IQ. Note. The top panel shows summary effects with 95% confidence intervals according to effect strength. The middle panel indicates the number of samples within respective subsets. The bottom panel indicates the respective ‘which’ and ‘how’ factors that were used for the respective effect estimation. Warmer colours indicate lower effect precision (i.e. larger confidence intervals).

Figure 11

Descriptive meta-analytic specification-curve plot (see [14]) of summary effects from all reasonable specifications for verbal IQ. Note. The top panel shows summary effects with 95% confidence intervals according to effect strength. The middle panel indicates the number of samples within respective subsets. The bottom panel indicates the respective ‘which’ and ‘how’ factors that were used for the respective effect estimation. Warmer colours indicate lower effect precision (i.e. larger confidence intervals).

Figure 12

Descriptive meta-analytic specification-curve plot (see [14]) of summary effects from all reasonable specifications for performance IQ. Note. The top panel shows summary effects with 95% confidence intervals according to effect strength. The middle panel indicates the number of samples within respective subsets. The bottom panel indicates the respective ‘which’ and ‘how’ factors that were used for the respective effect estimation. Warmer colours indicate lower effect precision (i.e. larger confidence intervals).

Descriptive meta-analytic specification-curve plot (see [14]) of summary effects from all reasonable specifications for full-scale IQ. Note. The top panel shows summary effects with 95% confidence intervals according to effect strength. The middle panel indicates the number of samples within respective subsets. The bottom panel indicates the respective ‘which’ and ‘how’ factors that were used for the respective effect estimation. Warmer colours indicate lower effect precision (i.e. larger confidence intervals). Descriptive meta-analytic specification-curve plot (see [14]) of summary effects from all reasonable specifications for verbal IQ. Note. The top panel shows summary effects with 95% confidence intervals according to effect strength. The middle panel indicates the number of samples within respective subsets. The bottom panel indicates the respective ‘which’ and ‘how’ factors that were used for the respective effect estimation. Warmer colours indicate lower effect precision (i.e. larger confidence intervals). Descriptive meta-analytic specification-curve plot (see [14]) of summary effects from all reasonable specifications for performance IQ. Note. The top panel shows summary effects with 95% confidence intervals according to effect strength. The middle panel indicates the number of samples within respective subsets. The bottom panel indicates the respective ‘which’ and ‘how’ factors that were used for the respective effect estimation. Warmer colours indicate lower effect precision (i.e. larger confidence intervals).

Discussion

In this quantitative research synthesis, we show that positive associations of in vivo brain volume with IQ are highly reproducible. This link is consistently observable regardless of which empirical studies are included in a formal meta-analysis and how they are analysed. Results of our analyses convergently indicate that the effect strength must be assumed to be small-to-moderate in size, with the best available estimates for healthy participants in full-scale IQ ranging from r = 0.24 (uncorrected; approximately 6% explained variance) to 0.29 (corrected approximately 8% explained variance). Effects for full-scale IQ appear to be stronger and more systematically related to moderators compared to verbal and performance IQ. However, these three intelligence domains are highly intercorrelated and their correlation with IQ test results are to be seen as manifestations of a largely similar true effect across domains. We, therefore, focus on full-scale IQ findings of healthy samples in our discussion, unless indicated otherwise.

Comparisons with previous meta-analyses

The strengths of the observed summary effects in the present meta-analysis correspond closely to those identified by Pietschnig et al. [24], although the number of participants in this updated analysis is more than three times larger. The observed association for full-scale IQ in healthy samples (i.e. corresponding to selection criteria of the meta-analyses from [25], and [23]) resulted in an estimate of r = 0.24 (95% CI [0.22; 0.27]), thus indicating considerably lower associations than those reported by Gignac & Bates [25]) and McDaniel [23]). Key characteristics of the available meta-analyses are summarized in table 5.
Table 5

Characteristics of available meta-analyses on the in vivo brain volume and intelligence link. Note. k = number of independent samples in analysis; summary effect = best estimate according to authors of meta-analysis; when both Hedges & Olkin- as well as Hunter & Schmidt-typed analyses were performed, both estimates are provided, respectively.

McDaniel [23]Pietschnig et al. [24]Gignac & Bates [25]present study
sample typehealthy sampleshealthy and patient sampleshealthy sampleshealthy and patient samples
K3714832194
N15308034230526 764
meta-analytic approachHunter & Schmidt-typedHedges & Olkin-typedHunter & Schmidt-typedBoth
summary effect (r)0.330.24 (healthy r = 0.24; patient r = 0.20)0.400.23/0.26 (healthy r = 0.24/0.29; patient r = 0.21/0.20)
meta-analysis overlapall studies of McDaniel [23] includedsubset of Pietschnig et al. [24]all studies of Pietschnig et al. [24] included
Characteristics of available meta-analyses on the in vivo brain volume and intelligence link. Note. k = number of independent samples in analysis; summary effect = best estimate according to authors of meta-analysis; when both Hedges & Olkin- as well as Hunter & Schmidt-typed analyses were performed, both estimates are provided, respectively. It could be argued that these inconsistencies are to a certain extent due to the differing methodological focus of the used analyses because both meta-analyses of Gignac & Bates [25] and McDaniel [23] reported values that were corrected for direct range restriction. However, when we respecified our analyses to apply identical methods, full-scale IQ associations for healthy samples once more led to a lower estimate, yielding r = 0.29. This indicates that the reported estimates of prior Hunter & Schmidt-based syntheses were inflated (i.e. even before accounting for dissemination bias). This idea is supported by our analyses of individual data subsets that used the very same specifications as these prior studies. For instance, Gignac & Bates [25] showed that IQ assessments with higher g-ness (i.e. reflecting abilities that are more closely related to psychometric g, thus providing a better representation of cognitive abilities) yielded larger associations than less g-loaded assessments. They concluded that the most salient estimate of the brain volume and IQ association averages r = 0.40 (i.e. corresponding to about 16% of explained variance), based on a specific subset of effect sizes that should provide the most credible results (i.e. using healthy samples, tests with excellent g-ness and attenuation-corrected effect sizes only). None of the reasonable specifications that were included in our specification curve analysis yielded a summary effect that was larger than r = 0.37. Importantly, this most extreme upper value of all possible specifications was based on the very same inclusion criteria as the specification that is supposed to represent the best operationalization of this association according to Gignac & Bates [25], healthy samples, excellent g-ness, range departure corrected, Hunter & Schmidt estimator), excepting sample age (this uppermost value was based on children/adolescents only; the same specification with all ages yielded r = 0.34, corresponding to 11% of explained variance). This is important for a number of reasons. First, it shows that the specification that was chosen by Gignac & Bates [25] leads to estimates in the extreme upper tail of the distribution of reasonable summary effects. Besides yielding uncharacteristically large values, these estimates have large confidence intervals (i.e. representing higher effect volatility), because they are based on comparatively small sample numbers. Results from our combinatorial meta-analyses showed that at least 75% (i.e. the bottom three quartiles) of results yielded values below r = 0.26. Second, these findings suggest that the estimate reported in Gignac & Bates [25] must be considered to have been inflated, even when one was to assume that this extreme specification yields the most salient estimate for the brain volume and IQ association (i.e. the summary effect in [25], exceeds the upper threshold of any estimate of the present summary effect distribution). Third, the lower summary effects in the present analyses compared to the earlier estimate of Gignac & Bates [25], when identical specifications were used, indicate that the studies that were added in the present update of the literature reported lower correlations, thus conforming to a decline effect [21,22]. Consistent with this interpretation, publication years of primary studies predicted brain volume and IQ associations negatively, indicating decreasing effect sizes over time. Cross-temporally declining effect sizes have been demonstrated to be prevalent in psychological science in general and intelligence research in particular, especially when initial study sample sizes are small [22]. This means that early and small n (=imprecise) primary study reports represent more often than not overestimates of the brain size and IQ association, thus having led to inflated meta-analytic summary effects. The presently observed effect declines and comparatively large effect estimates of early small-n studies (e.g. [5]) are consistent with the decline effect and its assumed drivers.

Moderators

It is unsurprising that effects were typically stronger in healthy than in patient samples because the included patients suffered from different conditions that are likely to impair cognitive functioning (e.g. autism, brain traumas, schizophrenia) which is bound to introduce statistical noise into the data. Therefore, effects of moderators were substantially weaker and less unequivocal for patients than for healthy samples. Consistent with Gignac & Bates [25], there were stronger associations with highly g-loaded tests compared to fairly g-loaded ones in healthy participants (uncorrected rs = 0.31 versus 0.19; Q(2) = 23.69; p < 0.001), but not in patient samples. These results were supported by the findings from our regression analyses where larger g-ness positively predicted effect sizes of healthy participants. Within any examined subgroup, correlations that had been reported within publications were numerically larger than those that had been obtained through personal communications or from the grey literature. This suggests that correlations were selectively reported in the published literature although only differences in full-scale IQ associations of healthy samples reached nominal significance. This observation is consistent with effect inflation because larger associations are more likely than smaller ones to be numerically reported in the literature (numerically stronger effects are more likely to become significant—depending on sample sizes and accuracy—and therefore more likely to be published), thus potentially leading to inadequate assumptions of the readers about the effect strength. This finding is supported by results from our regression analyses that showed weaker effects of unpublished than published effect sizes. This suggests that the reported effects in the brain size and intelligence literature are more often inflated than not, thus conforming to results from Pietschnig et al. [24]. In a similar vein, publication years were negatively related to effect sizes, thus indicating a confounding decline effect [21] and conforming to cross-temporally decreasing effect sizes as reported in an earlier meta-analysis [24]. The only further moderator with consistent directions in terms of the observed association appeared to be measurement type which consistently yielded larger estimates for intracranial than for total brain volume, although these differences did not reach nominal significance (except for verbal IQ in patient samples). There were no consistent patterns in regard to age or sex in subgroup or regression analyses, thus conforming to a previous account that indicated that brain volume and IQ associations generalize over participant age bands and sex ([24]; but see [23], for conflicting findings). Three of our formal methods for detecting dissemination yielded significant bias indications for both full-scale and performance IQ (Sterne & Egger's regression, Trim-and-Fill analysis, Copas & Shi's method), while only one method (Trim-and-Fill analysis) indicated bias in verbal IQ. The evidence for bias was stronger for full-scale than performance IQ. It should be noted, that both Sterne and Egger's regression, as well as the Trim-and-Fill analysis, are funnel plot asymmetry-based methods and consequently particularly sensitive for the detection of small-sample effects. This means that the detected bias seems to be rooted in the correspondingly large error variance of underpowered (i.e. small sample size) studies and is consistent with previously raised concerns about suboptimal power in neuroscientific research [201]. Viewed from this perspective, declining effect sizes over time appear to be somewhat reconciliatory, because this may well mean that average study power has increased in this field (or at least in studies addressing this research question). The low observed replicability indices for all three domains further corroborate the evidence for effect inflation. Similarly, results of our effect estimations by means of p-value-based methods support the evidence for confounding dissemination bias, as previously observed in regard to this research question [24]. This interpretation is consistent with larger effects from published sources than from those that were obtained from the grey literature or personal communications, although these differences only reached nominal significance in meta-regressions, but not subgroup analyses. The present findings contrast the conclusions of Gignac & Bates [25] who did not identify bias evidence in their analysis. This discrepancy may be due to two different causes. On the one hand, Gignac & Bates [25] included unpublished results in the publication bias detection analyses (i.e. results that [24], had obtained from the grey literature or through personal communications with authors), which (i) prevent potential bias from detection and (ii) are conceptually unsuitable to be used in p-curve and p-uniform analyses [50,51]. On the other hand, different methods of dissemination bias detection are not equally sensitive for different forms of bias, thus necessitating a triangulation of methods for bias estimation according to current recommendations [42]. Relying on comparatively few and conceptually similar detection methods (i.e. publication bias tests of two p-value-based methods; p-curve and p-uniform; Henmi-Copas approach) may have contributed to the non-detection of bias evidence in this past meta-analysis [25], particularly because these methods are not suitable to detect small-sample effects. Although the present findings indicate a presence of confounding publication bias, this should not be interpreted as evidence against a brain volume and IQ link. As pointed out above, these associations appear to generalize across numerous potential moderators and replicate well in terms of the identified direction. However, confounding dissemination bias suggests that the obtained summary effects in many primary studies (and even some meta-analyses) represent inflated estimates of the true association. However, it needs to be acknowledged that the future development of more reliable methods for assessing IQ on the one or in vivo brain volume on the other hand may lead to larger correlation estimates in primary studies. Nonetheless, the strength of the brain volume and IQ association must be considered to be small-to-medium-sized at best.

Significance of the observed effect

On the one hand, the strength of the observed summary effect suggests that effects of mere neuron numbers, glial cells, or brain reserve are unlikely candidates for the explanation of between-individuals intelligence differences. On the other hand, the effect is clearly non-trivial and has turned out to be remarkably reproducible in terms of its positive direction across a large number of primary studies. Consequently, brain volume should not be seen as a supervenient (i.e. one-to-one) but rather an isomorphic (i.e. many-to-one) proxy of human intelligence. This may mean that brain volume in its own right is too coarse of a measure to reliably predict intelligence differences. It seems likely that examining the role of functional aspects (e.g. white matter integrity) and more fine-grained structural elements (e.g. cortical thickness; see [2]) may help in further clarifying the neurobiological bases of human intelligence.

Conclusion

In the present meta-analysis, we show evidence for modestly sized associations between in vivo brain volume and IQ. The effect appears to be stronger when more g-loaded test instruments are used and when full-scale IQ, rather than verbal or performance IQ domains, are assessed. This link appears to be remarkably robust in terms of other potentially moderating variables, generalizing across age, sex or primary study properties, such as study goal and number of variables, that have been controlled for. Importantly, examination of all possible reasonable specifications in regard to how meta-analytical calculations were performed and which studies had been included in the analysis showed consistently positive associations that ranged from r = 0.10 to 0.37, with a majority of specifications yielding values around r = 0.24. Although in the available literature this link replicated well across studies, some allowance must be made for overestimations of summary effects due to confounding dissemination biases. This indicates that the observed summary effect estimates must be considered to be inflated, thus representing an upper threshold of the true brain size and IQ association.
  143 in total

1.  Volumetric magnetic resonance imaging study of the brain in subjects with sex chromosome aneuploidies.

Authors:  M M Warwick; G A Doody; S M Lawrie; J N Kestelman; J J Best; E C Johnstone
Journal:  J Neurol Neurosurg Psychiatry       Date:  1999-05       Impact factor: 10.154

2.  Improved tests for a random effects meta-regression with a single covariate.

Authors:  Guido Knapp; Joachim Hartung
Journal:  Stat Med       Date:  2003-09-15       Impact factor: 2.373

3.  Confidence intervals for random effects meta-analysis and robustness to publication bias.

Authors:  Masayuki Henmi; John B Copas
Journal:  Stat Med       Date:  2010-10-20       Impact factor: 2.373

4.  Increasing Transparency Through a Multiverse Analysis.

Authors:  Sara Steegen; Francis Tuerlinckx; Andrew Gelman; Wolf Vanpaemel
Journal:  Perspect Psychol Sci       Date:  2016-09

5.  Intra-individual variability in neurocognitive function in schizophrenia: relationships with the corpus callosum.

Authors:  Ji-In Ahn; Seung-Taek Yu; Gyhye Sung; Tai-Kiu Choi; Kang-Soo Lee; Minji Bang; Sang-Hyuk Lee
Journal:  Psychiatry Res Neuroimaging       Date:  2018-11-10       Impact factor: 2.376

6.  Individual differences in the dominance of interhemispheric connections predict cognitive ability beyond sex and brain size.

Authors:  Kenia Martínez; Joost Janssen; José Ángel Pineda-Pardo; Susanna Carmona; Francisco Javier Román; Yasser Alemán-Gómez; David Garcia-Garcia; Sergio Escorial; María Ángeles Quiroga; Emiliano Santarnecchi; Francisco Javier Navas-Sánchez; Manuel Desco; Celso Arango; Roberto Colom
Journal:  Neuroimage       Date:  2017-04-13       Impact factor: 6.556

7.  Heterogeneity of brain lesions in pediatric traumatic brain injury.

Authors:  Erin D Bigler; Tracy J Abildskov; Joann Petrie; Thomas J Farrer; Maureen Dennis; Nevena Simic; H Gerry Taylor; Kenneth H Rubin; Kathryn Vannatta; Cynthia A Gerhardt; Terry Stancin; Keith Owen Yeates
Journal:  Neuropsychology       Date:  2013-07       Impact factor: 3.295

8.  Brain volume, asymmetry and intellectual impairment in relation to sex in early-onset schizophrenia.

Authors:  Simon L Collinson; Clare E Mackay; Anthony C James; Digby J Quested; Tania Phillips; Neil Roberts; Timothy J Crow
Journal:  Br J Psychiatry       Date:  2003-08       Impact factor: 9.319

9.  Basal ganglia volume and shape in children with attention deficit hyperactivity disorder.

Authors:  Anqi Qiu; Deana Crocetti; Marcy Adler; E Mark Mahone; Martha B Denckla; Michael I Miller; Stewart H Mostofsky
Journal:  Am J Psychiatry       Date:  2008-11-17       Impact factor: 18.112

10.  Brain structural differences between 73- and 92-year olds matched for childhood intelligence, social background, and intracranial volume.

Authors:  Stuart J Ritchie; David Alexander Dickie; Simon R Cox; Maria Del C Valdés Hernández; Ruth Sibbett; Alison Pattie; Devasuda Anblagan; Paul Redmond; Natalie A Royle; Janie Corley; Susana Muñoz Maniega; Adele M Taylor; Sherif Karama; Tom Booth; Alan J Gow; John M Starr; Mark E Bastin; Joanna M Wardlaw; Ian J Deary
Journal:  Neurobiol Aging       Date:  2017-10-16       Impact factor: 4.673

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.