
Proportional intracranial volume correction differentially biases behavioral predictions across neuroanatomical features, sexes, and development.

Elvisha Dhamala¹, Leon Qi Rong Ooi², Jianzhong Chen³, Ru Kong⁴, Kevin M Anderson⁵, Rowena Chin⁵, B T Thomas Yeo⁶, Avram J Holmes⁷.

Abstract

Individual differences in brain anatomy can be used to predict variations in cognitive ability. Most studies to date have focused on broad population-level trends, but the extent to which the observed predictive features are shared across sexes and age groups remains to be established. While it is standard practice to account for intracranial volume (ICV) using proportion correction in both regional and whole-brain morphometric analyses, in the context of brain-behavior predictions the possible differential impact of ICV correction on anatomical features and subgroups within the population has yet to be systematically investigated. In this work, we evaluate the effect of proportional ICV correction on sex-independent and sex-specific predictive models of individual cognitive abilities across multiple anatomical properties (surface area, gray matter volume, and cortical thickness) in healthy young adults (Human Connectome Project; n = 1013, 548 females) and typically developing children (Adolescent Brain Cognitive Development study; n = 1823, 979 females). We demonstrate that ICV correction generally reduces predictive accuracies derived from surface area and gray matter volume, while increasing predictive accuracies based on cortical thickness in both adults and children. Furthermore, the extent to which predictive models generalize across sexes and age groups depends on ICV correction: models based on surface area and gray matter volume are more generalizable without ICV correction, while models based on cortical thickness are more generalizable with ICV correction. Finally, the observed neuroanatomical features predictive of cognitive abilities are unique across age groups regardless of ICV correction, but whether they are shared or unique across sexes (within age groups) depends on ICV correction. 
These findings highlight the importance of considering individual differences in ICV, and show that proportional ICV correction does not remove the effects of cranial volume from anatomical measurements and can introduce ICV bias where previously there was none. ICV correction choices affect not just the strength of the relationships captured, but also the conclusions drawn regarding the neuroanatomical features that underlie those relationships.


Keywords: Behavioral prediction; Cortical surface area; Cortical thickness; Development; Gray matter volume; Intracranial volume; Neuroanatomy; Proportional correction; Sex differences


Year: 2022 PMID: 35843514 PMCID: PMC9425854 DOI: 10.1016/j.neuroimage.2022.119485

Source DB: PubMed Journal: Neuroimage ISSN: 1053-8119 Impact factor: 7.400

Introduction

A primary goal of research in the brain sciences is to establish the relationship between neurobiological features and behavioral traits, allowing for both the understanding and prediction of individual differences across health and disease (Yarkoni and Westfall 2017; Kohoutová et al., 2020; Bzdok and Yeo 2017). While there is an extensive history of work linking specific neuroanatomical properties with focused areas of cognition and behavior, only recently have large-scale collaborative efforts begun to provide the power necessary for data-driven discovery science (Somerville et al., 2018; Van Essen et al. 2013; Casey et al., 2018; Alexander et al., 2017; Holmes et al., 2015; Sudlow et al., 2015; Satterthwaite et al., 2014). Mounting evidence suggests that core features of brain anatomy are predictive of human behavior, but vary widely across populations and change across development within individuals (Bethlehem et al., 2021). Although this work has traditionally taken a cross-sectional, group-level approach, there is a growing understanding of the importance of accurately translating predictive models across both adult and developmental populations (Rosenberg, Casey, and Holmes 2018). To date, however, there is little empirical data on how the choice of both anatomical features and associated covariates may impact predictive accuracy or model generalizability, particularly within distinct demographic groups. Across the cerebral cortex, individual variability in anatomical features, including surface area, gray matter volume, and cortical thickness, is predictive of a diverse set of behavioral traits, ranging from cognition (Seidlitz et al., 2018) to personality and mental health (Ooi et al., 2022). The establishment of meaningful imaging-based predictive models requires accurate and reliable measurements (Ge et al., 2017).
Although the within-sample predictions derived from brain anatomy generally account for less variance than those based on patterns of function and/or connectivity (Mansour et al., 2021; Dhamala et al., 2021; Ooi et al., 2022), anatomical estimates are highly reliable (Holmes et al., 2015), highlighting their potential utility for brain-behavior predictive modeling. Relationships between neuroanatomical properties and cognitive abilities vary across the sexes (Gur et al., 1999; Gur and Gur 2016) and between healthy and clinical populations (Ehrlich et al., 2012; Hartberg et al., 2011) throughout the lifespan (Krogsrud et al., 2021). These studies, while crucial for the establishment of the neuroanatomical correlates of behavior, have largely focused on univariate analyses, leaving much to be understood about the multivariate associations that exist throughout the brain. The volume of the cranium, typically referred to as intracranial volume (ICV) or estimated total intracranial volume, was historically thought to increase during development and remain stable throughout adulthood (Matsumae et al., 1996), with larger volumes in males than in females throughout the lifespan (De Bellis et al. 2001; Cosgrove et al., 2007). More recent studies have confirmed greater ICV in males as well as changes in ICV throughout the lifespan (Caspi et al., 2020; Mills et al., 2016). ICV shows significant increases throughout childhood and adolescence (Mills et al., 2016; Dong, Castellanos, et al. 2020), followed by gradual increases in early adulthood until the fourth decade of life, after which it begins to decrease (Caspi et al., 2020). These changes in ICV parallel shifts in cortical expansion (Hill et al., 2010), myelination (Grydeland et al., 2019), structure-function coupling (Baum et al., 2020), and functional maturation of association networks (Dong et al., 2021) throughout the lifespan (Sydnor et al., 2021).
Across development, males exhibit larger ICV relative to females, along with a steeper rate of change during childhood and adolescence, as well as higher reduction rates after the fifth decade of life (Caspi et al., 2020). When investigating neuroanatomical properties (and their relationships to behavior) across the sexes and in different age groups, it is standard to correct for variations in ICV (Pintzka et al., 2015; Buckner et al., 2004), as corrected properties are assumed to be more valid than uncorrected measures (Sanfilipo et al., 2004), providing regional anatomical estimates unbiased by global shifts in head size across the population. However, different ICV correction methods can unintentionally introduce biases pertaining to cranial volume into the regional anatomical estimates. Critically, ICV itself is also related to behavioral and psychological constructs of interest, including cognition (Van Loenhoud et al. 2018; MacLullich et al., 2002). Accordingly, the use of ICV correction may influence relationships between neuroanatomical properties and other variables of interest, an effect that could vary in impact across populations. The presence of sex and/or gender differences in brain anatomy (e.g., Joel et al., 2015; Chekroud et al., 2016) and associated brain-behavior relationships has been a subject of pointed debate in the field (Ingalhalikar et al., 2014; Wierenga et al., 2019; Gur and Gur 2016). Several groups have investigated multivariate brain-behavior relationships in males and females (Dhamala et al., 2022; Jiang et al., 2020, et al. 2020). While some data suggest the presence of dissociable functional connections and neuroanatomical features underlying cognitive abilities in men and women (Jiang et al., 2020, et al. 2020), other work has revealed that both sexes rely on shared functional connections (Dhamala et al., 2022).
However, this prior work has largely neglected the possible differential impact of ICV correction across the sexes, which could serve to amplify or mask group-level predictive features. Along these lines, recent work indicates that ICV correction methods can reduce both univariate sex differences and the accuracy of multivariate sex prediction based on gray matter volume (Sanchis-Segura et al., 2019; Sanchis-Segura et al., 2020). Meanwhile, in clinical populations, ICV correction can alter the relationships captured between regional neuroanatomical properties and behaviors (Voevodskaya et al., 2014). Although these data suggest ICV correction can influence both the results and their subsequent interpretation, it remains to be established whether these effects are consistent between sexes. Moreover, given the unique trajectories of ICV and regional neuroanatomy during development and adulthood (Caspi et al., 2020; Voevodskaya et al., 2014), it is likely that ICV correction will have differential effects across the lifespan. In the current study, we sought to uncover the extent to which a widely used ICV correction method, proportion correction, might differentially bias brain-behavior predictions and associated interpretations across diverse populations. To directly address this open question, we investigated the sex-independent and sex-specific effects of accounting for individual differences in ICV using proportional corrections on predictions of cognition based on surface area, gray matter volume, and cortical thickness in healthy young adults from the Human Connectome Project (HCP) and typically developing children from the Adolescent Brain Cognitive Development (ABCD) dataset. 
First, examining the differences in predictive accuracy based on ICV-uncorrected and ICV-(proportion)-corrected neuroanatomical measures, we demonstrate that ICV correction reduces predictive accuracies achieved by sex-independent and sex-specific models based on surface area and gray matter volume but increases predictive accuracies achieved by models based on cortical thickness. Second, evaluating the effects of ICV correction on model generalizability across sexes and datasets, we determine that predictive models based on uncorrected measures of surface area and gray matter volume are more generalizable than their corrected counterparts. Conversely, models based on ICV-corrected measures of cortical thickness are more generalizable than their uncorrected counterparts. Third, investigating the influence of ICV correction on the associations identified between neuroanatomical features and cognitive abilities, we reveal that distinct neuroanatomical features are associated with cognition across children and adults regardless of ICV correction, but those associations are shared across sexes for uncorrected measures of surface area and gray matter volume and for ICV-corrected measures of cortical thickness. Collectively, these results highlight the differential effects of ICV correction on behavioral predictions across neuroanatomical features, sexes, and age groups. Based on these findings, we speculate that ICV carries behaviorally relevant information, and this must be taken into consideration when developing predictive models to capture brain-behavior relationships across distinct populations that are likely to differ in ICV.

Methods

An overview of our experimental workflow is shown in Fig. 1. The methods used in this study build upon those previously described in our prior work (Dhamala et al., 2021; Dhamala et al., 2022; Anderson et al., 2021; Ooi et al., 2022) to perform novel analyses investigating effects of intracranial volume correction on sex-independent and sex-specific predictive modeling of behavioral traits in distinct populations.

Fig. 1.

Experimental Workflow.

(A) Dataset: Healthy young adults from the Human Connectome Project dataset, and typically developing children from the Adolescent Brain Cognitive Development dataset were included in the study. (B) Behavioral Data: Cognitive scores were compiled for each subject based on NIH Toolbox Cognitive Battery Task Scores. Total, Crystallized, and Fluid composites, as well as individual task scores within the Crystallized and Fluid domains were considered. (C) Neuroanatomical Features: Each subject’s native surface space was projected onto the 400-region Schaefer parcellation, and the T1-weighted anatomical image was used to extract regional surface area, gray matter volume, and cortical thickness for each of the 400 regions of interest. For each subject, these regional measures were either left uncorrected, or proportionally corrected for total intracranial volume. (D) Linear ridge regression models were trained to predict individual cognitive scores based on each of the neuroanatomical features. Males and females from each of the datasets were split into train (66%) and test sets. For each dataset, sex-independent models were trained on both male and female subjects, while sex-specific models were trained separately for each sex. All models employed three-fold cross-validation to optimize the regularization parameter. Sex-independent models were evaluated on sex-independent test sets from both datasets, and sex-specific models were evaluated on male- and female-specific test sets from both datasets.

Datasets

We considered healthy young adult participants from the Human Connectome Project (HCP) – Young Adult S1200 release (Van Essen et al. 2013). The HCP dataset is a community-based sample of twins, siblings, and unrelated individuals who were assessed on a comprehensive set of neuroimaging and behavioral batteries. After pre-processing quality control of imaging data, as described previously (Li et al., 2019; Kong et al., 2021; Ooi et al., 2022), we filtered participants based on availability of anatomical scans and behavioral scores of interest (Fig. 1A). Our final HCP sample comprised 1013 adults (548 females; 22–37 years old). Although the term gender is used in the HCP Data Dictionary, the term sex is used in this article because the database collected self-reported biological sex information as opposed to gender identification. We also considered typically developing children from the Adolescent Brain Cognitive Development (ABCD) 2.0.1 release (Casey et al., 2018). The ABCD dataset is a large community-based sample of children and adolescents who were assessed on a comprehensive set of neuroimaging, behavioral, and developmental batteries. After pre-processing quality control of imaging data, as described previously (Ooi et al., 2022; Chen et al., 2022), we filtered participants based on availability of anatomical scans and behavioral scores of interest (Fig. 1B–C). Our final ABCD sample comprised 1823 children (979 females; 9–10 years old).

Image acquisition and processing

Minimally processed T1-weighted anatomical images (0.7 mm isotropic for HCP; 1.0 mm isotropic for ABCD) were used for the analyses. Details about the acquisition protocols and processing pipelines are described elsewhere for HCP (Glasser et al., 2013; Marcus et al., 2013) and ABCD (Hagler et al., 2019). Briefly, for HCP, the T1-weighted images underwent gradient distortion correction, followed by alignment to the MNI template and the anterior commissure/posterior commissure (AC/PC) axis while preserving the original brain size and shape. Then, an initial robust brain extraction was performed, and a field map was used to remove readout distortion, producing the minimally processed T1-weighted image (Glasser et al., 2013; Marcus et al., 2013). For ABCD, the T1-weighted images underwent gradient distortion correction and bias field correction. Next, the images were aligned to an average reference brain in standard space and to the anterior commissure/posterior commissure axis, producing the minimally processed T1-weighted image (Hagler et al., 2019).

Behavioral data

The NIH Toolbox Cognition Battery is an extensively validated battery of neuropsychological tasks used to assess language, executive function, episodic memory, processing speed, and working memory based on seven individual test instruments (Fig. 1B) (Carlozzi et al., 2017; Gershon et al., 2013; Heaton et al., 2014; Mungas et al., 2014; Weintraub et al., 2013; Weintraub et al., 2014; Zelazo et al., 2014; Zelazo and Bauer 2013). Initial factor analysis of the individual task scores yields three composite scores: total, crystallized, and fluid. Broadly, crystallized cognition represents language abilities, while fluid cognition represents executive function, episodic memory, processing speed, and working memory. These composite scores tend to be more reliable and stable but may fail to capture variability in individual tasks (Heaton et al., 2014). Accordingly, all individual task scores as well as the composite scores were used in the analyses.

Neuroanatomical features

For each participant, the native fs_LR32k surface space was projected onto the 400-region Schaefer parcellation (Schaefer et al., 2018) using HCP workbench, and the T1-weighted anatomical image was used to extract cortical surface area, cortical gray matter volume, and cortical thickness for each of the 400 regions of interest (ROIs) using FreeSurfer 6.0's mris_anatomical_stats (Dale et al., 1999) (Fig. 1C). Measures of intracranial volume (ICV) were obtained from FreeSurfer's estimated total intracranial volume. Surface area, gray matter volume, and thickness were proportionally corrected for individual differences in ICV by dividing each subject's raw regional values by that subject's ICV. Both ICV-uncorrected and ICV-corrected anatomical measures were used in the analyses. Regional measures were also summarized at the network level by computing the sum of the surface areas and gray matter volumes and the average of the cortical thickness measures across all parcels within a network. Correlations between network-level measures of neuroanatomical features and ICV, as well as between ICV and the cognitive scores, age and the cognitive scores, and age and ICV, were computed in a sex-specific manner for each dataset using Pearson's correlation.
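A minimal NumPy sketch of the two feature-construction steps described above: proportional correction (dividing every regional value by the subject's ICV) and the network-level summaries (sums for surface area and gray matter volume, means for thickness) that are then correlated with ICV. The function names and inputs are hypothetical illustrations; the study itself derived these values from FreeSurfer outputs on the 400-parcel Schaefer atlas.

```python
import numpy as np


def proportional_icv_correction(regional, icv):
    """Divide each subject's regional measures by that subject's ICV.

    regional: (n_subjects, n_regions) surface area, volume, or thickness values.
    icv: (n_subjects,) estimated total intracranial volume.
    """
    regional = np.asarray(regional, dtype=float)
    icv = np.asarray(icv, dtype=float)
    return regional / icv[:, None]  # broadcast one ICV value across each subject's row


def network_summary(regional, network_labels, how):
    """Collapse parcels into networks: how='sum' (area, volume) or how='mean' (thickness)."""
    reducer = {"sum": np.sum, "mean": np.mean}[how]
    regional = np.asarray(regional, dtype=float)
    labels = np.asarray(network_labels)
    networks = np.unique(labels)
    columns = [reducer(regional[:, labels == k], axis=1) for k in networks]
    return networks, np.column_stack(columns)


def correlate_with_icv(summary, icv):
    """Pearson r between each network-level column and ICV."""
    return np.array([np.corrcoef(summary[:, j], icv)[0, 1] for j in range(summary.shape[1])])
```

For example, `network_summary(proportional_icv_correction(thickness, icv), labels, "mean")` would give corrected network-level thickness, where `thickness`, `icv`, and `labels` (a parcel-to-network assignment) are hypothetical arrays.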

Predictive modelling

Linear ridge regression models and more complex deep learning algorithms achieve comparable predictive accuracies for behavioral traits based on neuroimaging data (He et al., 2020), but linear ridge regression models are less computationally expensive and more interpretable. In this study, sex-independent and sex-specific linear ridge regression models were trained to predict each behavioral score (individual task scores and composite scores) based on each anatomical measure (intracranial volume, surface area, gray matter volume, thickness) (Fig. 1D). Separate models were trained for ICV-uncorrected and ICV-corrected measures of surface area, gray matter volume, and thickness, as well as for each of the two datasets (HCP and ABCD). Sex-independent models were trained on data from all subjects, while male- and female-specific models were trained only on data from males and females, respectively. For each model, data were randomly shuffled and split into 100 distinct train (66%) and test sets without replacement. For the HCP data, family structure was considered when splitting the data such that related participants were placed either in the train or the test set but not split across both. Similarly, for the ABCD data, imaging site was considered such that all participants from a given site were placed either in the train or the test set but not split across both. The regularization parameter was optimized using three-fold cross-validation within the training set. Family structure and imaging site were accounted for in the cross-validation in the same way as in the initial train-test split. Once optimized, sex-independent models were evaluated on test sets from both datasets while sex-specific models were evaluated on test sets from both datasets and across both sexes. This was repeated for the 100 distinct train-test splits to obtain a distribution of performance metrics.
The accuracy of each model is defined as the correlation between the true and predicted behavioral scores for each split. Average accuracy was computed by taking the mean across the 100 distinct train-test splits.
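The train-test procedure above can be sketched in plain NumPy. This is an illustrative reimplementation under stated assumptions, not the authors' released code: ridge is solved in closed form with an unpenalized intercept, whole groups (family IDs for HCP, site IDs for ABCD) are assigned to either train or test so that no group straddles both, the 66% split is approximate because groups are added whole, splits are drawn independently rather than being guaranteed distinct, and the alpha grid passed in is a hypothetical choice. Accuracy is the Pearson correlation between observed and predicted test scores, averaged across splits.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; centering leaves the intercept unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, x_mean, y_mean


def ridge_predict(model, X):
    w, x_mean, y_mean = model
    return (X - x_mean) @ w + y_mean


def pearson_r(a, b):
    return float(np.corrcoef(a, b)[0, 1])


def group_split(groups, train_frac, rng):
    """Add whole shuffled groups to the train set until it holds about train_frac of subjects."""
    shuffled = rng.permutation(np.unique(groups))
    target, count, train_groups = train_frac * len(groups), 0, []
    for g in shuffled:
        if count >= target:
            break
        train_groups.append(g)
        count += int(np.sum(groups == g))
    train = np.isin(groups, train_groups)
    return train, ~train


def group_folds(groups, k, rng):
    """Assign whole groups to k cross-validation folds after shuffling."""
    shuffled = rng.permutation(np.unique(groups))
    fold_of = {g: i % k for i, g in enumerate(shuffled)}
    return np.array([fold_of[g] for g in groups])


def select_alpha(X, y, groups, alphas, k, rng):
    """Pick the alpha with the highest mean validation correlation across k group-aware folds."""
    folds = group_folds(groups, k, rng)
    mean_r = []
    for alpha in alphas:
        rs = [pearson_r(ridge_predict(ridge_fit(X[folds != f], y[folds != f], alpha),
                                      X[folds == f]), y[folds == f]) for f in range(k)]
        mean_r.append(np.mean(rs))
    return alphas[int(np.argmax(mean_r))]


def repeated_split_accuracy(X, y, groups, alphas, n_splits=100, train_frac=0.66, k=3, seed=0):
    """Mean test-set Pearson r across repeated group-aware train-test splits."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_splits):
        train, test = group_split(groups, train_frac, rng)
        alpha = select_alpha(X[train], y[train], groups[train], alphas, k, rng)
        model = ridge_fit(X[train], y[train], alpha)
        accs.append(pearson_r(ridge_predict(model, X[test]), y[test]))
    return float(np.mean(accs)), accs
```

Keeping the per-split accuracies (`accs`) matters because the permutation test and the model comparisons described next both operate on the full distribution over splits, not only on its mean.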

Model significance

All models were evaluated on whether they performed better than chance using null distributions of performance, as previously described (Dhamala et al., 2021; Dhamala et al., 2022; Parkes et al., 2021). For each predictive model, the behavioral score was randomly permuted 10,000 times. Each permutation was used to train and test a null model using a regularization parameter randomly selected from the set of parameters chosen for the original model. The prediction accuracy from each of the original model's 100 train-test splits was then compared to the median prediction accuracy of the null distribution. The p-value for the model's significance is defined as the proportion of the 100 original models with prediction accuracies less than or equal to the median performance of the null models. In other words, a model is considered significant if it performed better than the median null performance for more than 95 of the 100 original models. The p values were then corrected for multiple comparisons across all cognitive scores using the Benjamini-Hochberg False Discovery Rate (q = 0.05) procedure (Benjamini and Hochberg 1995).
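The significance rule and the FDR step can be written compactly. The sketch below assumes the null accuracies (one per permutation) have already been obtained by refitting models on permuted scores; it shows only the p-value definition given above and a standard Benjamini-Hochberg step-up procedure. Function names are hypothetical.

```python
import numpy as np


def permutation_p_value(original_accs, null_accs):
    """Fraction of the original splits whose accuracy does not exceed the null median."""
    return float(np.mean(np.asarray(original_accs) <= np.median(null_accs)))


def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of p-values declared significant by the BH step-up procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    passes = p[order] <= q * np.arange(1, m + 1) / m  # rank-specific thresholds
    reject = np.zeros(m, dtype=bool)
    if passes.any():
        k = np.nonzero(passes)[0].max()  # largest rank that meets its threshold
        reject[order[: k + 1]] = True    # reject it and every smaller p-value
    return reject
```

With 100 splits, a p-value below 0.05 means at most 4 splits fell at or below the null median, which matches the "more than 95 of the 100" reading in the text.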

Model comparisons

Models trained on ICV-uncorrected versus ICV-corrected anatomical measures were compared to evaluate significant differences in performance using a two-tailed exact test of differences (MacKinnon 2009), as previously described (Dhamala et al., 2021). The p-value for the model comparison is defined as the proportion of the 100 paired train-test splits in which one model's prediction accuracy is less than or equal to that of the other model. In other words, one model is considered significantly better than the other if it outperformed the other on more than 95 of the 100 paired splits. The p values were then corrected for multiple comparisons across all cognitive scores using the Benjamini-Hochberg False Discovery Rate (q = 0.05) procedure (Benjamini and Hochberg 1995).
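The paired comparison can be sketched in the same way. The helper below computes the proportion described above for each direction over matched splits; interpreting "two-tailed" as checking both directions is an assumption on our part and may differ in detail from the authors' implementation.

```python
import numpy as np


def exact_test_of_differences(acc_a, acc_b):
    """Proportion of paired splits in which model A's accuracy is <= model B's."""
    acc_a = np.asarray(acc_a, dtype=float)
    acc_b = np.asarray(acc_b, dtype=float)
    if acc_a.shape != acc_b.shape:
        raise ValueError("accuracies must come from the same paired splits")
    return float(np.mean(acc_a <= acc_b))


def compare_models(acc_a, acc_b, alpha=0.05):
    """Check both directions; a small proportion means one model won almost every split."""
    p_a = exact_test_of_differences(acc_a, acc_b)  # small if A beats B on nearly every split
    p_b = exact_test_of_differences(acc_b, acc_a)  # small if B beats A on nearly every split
    if p_a < alpha:
        return "A better", p_a
    if p_b < alpha:
        return "B better", p_b
    return "no difference", min(p_a, p_b)
```

For instance, `compare_models(acc_uncorrected, acc_corrected)` (hypothetical arrays of 100 per-split accuracies) returns which model, if either, won on more than 95 of the paired splits, before FDR correction.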

Model generalizability

For sex-independent models, models trained on a given dataset were evaluated on both datasets. For each anatomical modality, an average model prediction accuracy was computed for each train/test dataset combination by taking the mean prediction accuracy across all cognitive scores. For sex-specific models, models trained on a given sex from a given dataset were evaluated on both sexes from both datasets. For each anatomical modality, an average model prediction accuracy was computed for each train/test dataset and sex combination by taking the mean prediction accuracy across all cognitive scores. Model generalizability is defined as the accuracy obtained when a given model is evaluated on a population (i.e., a given sex and/or dataset) that is different from the population on which it was trained. This is distinct from model accuracy, which is defined as the accuracy obtained when evaluating the model on a held-out test set from the same population as the training set. Model generalizability was also computed separately for each of the three cognitive domains: total (composite score only), crystallized (composite score and individual task scores), and fluid (composite score and individual task scores). Percent difference (% difference) between the ICV-uncorrected ($r_{\text{uncorrected}}$) and ICV-corrected ($r_{\text{corrected}}$) prediction accuracies was calculated as follows:

Feature weights

Raw feature weights obtained from the linear ridge regression models were transformed using the Haufe transformation (Haufe et al., 2014) to increase their interpretability and reliability (Tian and Zalesky 2021; Chen et al., 2022). For each train-test split, the raw feature weights, $W$, the covariance of the input data (anatomical modality) from that train set, $\Sigma_X$, and the covariance of the output data (behavioral score) from that train set, $\Sigma_y$, were used to compute the Haufe-transformed feature weights, $A$, as follows:

$$A = \Sigma_X W \Sigma_y^{-1}$$

These Haufe-transformed feature weights were then averaged across the 100 splits to obtain a mean regional feature weight. To compare pairs of regression models, correlations between mean regional feature weights were evaluated using Pearson's correlation. Absolute regional feature weights were mapped to the network level by assigning each Schaefer cortical parcel to one of 17 networks from the Yeo 17-network parcellation (Yeo et al., 2011) to generate network-level feature weights. Divergence in feature weights between models was evaluated using exact tests of differences.
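A NumPy sketch of the Haufe transformation for a single-output ridge model, following the definitions above. It computes $\Sigma_X W$ as $X_c^\top (X_c W)/(n-1)$, which equals the covariance-matrix product without forming the full 400 × 400 matrix. Using the variance of the observed training scores for $\Sigma_y$ follows the description in this section; Haufe et al. (2014) state the transform in terms of the model's estimated output, so treat this detail as an assumption. Function names are hypothetical.

```python
import numpy as np


def haufe_transform(X_train, w, y_train):
    """Activation pattern A = Sigma_X @ w / var(y) for one behavioral score."""
    Xc = X_train - X_train.mean(axis=0)
    n = X_train.shape[0]
    sigma_x_w = Xc.T @ (Xc @ w) / (n - 1)  # same as np.cov(X_train, rowvar=False) @ w
    sigma_y = np.var(y_train, ddof=1)       # scalar output variance (single-output model)
    return sigma_x_w / sigma_y


def mean_weights(per_split_weights):
    """Average Haufe-transformed weights across train-test splits."""
    return np.mean(np.stack(per_split_weights), axis=0)


def weight_similarity(a, b):
    """Pearson correlation between two models' mean regional weight maps."""
    return float(np.corrcoef(a, b)[0, 1])
```

The matrix-free form is only a computational convenience; for a few hundred parcels, `np.cov` would be equally practical.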

Data and code availability

All data used in this study are openly available and can be accessed directly from the HCP (https://www.humanconnectome.org/study/hcp-young-adult) and ABCD (https://abcdstudy.org/) websites. Code used to generate the results presented here is available on GitHub (https://github.com/elvisha/neuroanatomical-predictions-of-behaviour).

Results

Intracranial volume is uniquely related to brain anatomy across populations

Regional uncorrected and proportion-corrected measures of surface area, gray matter volume, and cortical thickness were summarized at the network-level by taking the sum of the surface areas and gray matter volumes and average of cortical thickness measures across all parcels within a network. These uncorrected and proportion-corrected network summaries of the anatomical properties were then correlated with total intracranial volume (ICV) to evaluate network-specific relationships between the neuroanatomical features and ICV, as shown in Fig. 2.

Fig. 2.

Total surface area and gray matter volume, and average cortical thickness across distinct cortical networks exhibit correlations with intracranial volume.

Correlation (Pearson’s correlation coefficient) between network-level uncorrected (A) and proportion-corrected (B) neuroanatomical properties (total surface area, total gray matter volume, and average cortical thickness) and total intracranial volume across both datasets (HCP and ABCD). SalVenAttn – Salience/Ventral Attention; DorsAttn – Dorsal Attention; SomMot – Somatomotor. Networks are ordered from heteromodal (left) to unimodal (right).

Within the HCP dataset (Fig. 2A), correlations between uncorrected surface area and ICV ranged between 0.55 and 0.80 (mean ± standard deviation = 0.73 ± 0.06), with somewhat stronger relationships present in heteromodal association cortices (0.77 ± 0.02) than in unimodal somatosensory/motor (somato/motor) cortices (0.68 ± 0.07). Similarly, correlations between uncorrected gray matter volume and ICV ranged between 0.59 and 0.80 (0.75 ± 0.05) with somewhat stronger relationships in heteromodal association cortices (0.78 ± 0.02) than in unimodal cortices (0.70 ± 0.06). However, for uncorrected cortical thickness, correlations were generally weaker and ranged between 0.06 and 0.24 (0.16 ± 0.05), and somewhat stronger relationships were observed in unimodal (0.19 ± 0.05) than in association cortices (0.14 ± 0.04). Within the ABCD dataset (Fig. 2A), correlations between ICV and uncorrected surface area or gray matter volume were generally comparable across unimodal and heteromodal cortices. Correlations between uncorrected surface area and ICV ranged between 0.51 and 0.74 (0.67 ± 0.06), while those between uncorrected gray matter volume and ICV ranged between 0.59 and 0.77 (0.69 ± 0.05). However, correlations between uncorrected cortical thickness and ICV exhibited the opposite pattern to that observed in HCP: correlations ranged between −0.02 and 0.30 (0.10 ± 0.08) with stronger relationships in heteromodal association cortices (0.14 ± 0.07) than in unimodal somato/motor cortices (0.05 ± 0.06). These data are consistent with prior work indicating a staggered maturation of cortical gray matter across development, in which the unimodal somato/motor and visual territories develop prior to the heteromodal association areas (Dong et al., 2021; Sydnor et al., 2021). Across both datasets, correlations between all proportion-corrected neuroanatomical features and ICV were negative and comparable across unimodal and heteromodal cortices (Fig. 2B).
In HCP, the correlations ranged between −0.38 and −0.59 (−0.47 ± 0.05) for surface area, −0.35 and −0.51 (−0.43 ± 0.05) for gray matter volume, and −0.91 and −0.95 (−0.94 ± 0.01) for cortical thickness. In ABCD, they ranged between −0.18 and −0.41 (−0.27 ± 0.06) for surface area, −0.13 and −0.37 (−0.24 ± 0.07) for gray matter volume, and −0.87 and −0.93 (−0.91 ± 0.02) for cortical thickness. These data suggest that proportion correction is unsuccessful in removing all of the variance related to individual differences in ICV for surface area and gray matter volume, and instead introduces additional information about ICV into cortical thickness measures. Similar results were observed when correlations between ICV and the neuroanatomical properties were evaluated in a sex-specific manner in HCP (Figure S1A, S2A) and ABCD (Figure S1B, S2B). These data suggest that ICV is differentially related to structural organization of the cortex during childhood and adulthood. Moreover, correcting for individual differences in ICV using the proportion correction method can induce distinct effects in populations across the lifespan and inadvertently introduce information about intracranial volume into the neuroanatomical measures.
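The pattern above has a simple arithmetic reading that can be checked on synthetic data. Dividing a measure by ICV removes its ICV dependence only if the measure scales in exact proportion to ICV; a measure that is nearly independent of ICV (as uncorrected thickness roughly is here) acquires a strong negative correlation with ICV after division, because the ratio then varies mainly with 1/ICV. The sketch below is a hypothetical illustration with made-up distributions (lognormal ICV with about 10% spread, thickness independent of ICV with about 2% spread, and an area-like measure scaling as ICV^b with b = 1 or an assumed b = 0.8). It uses no HCP or ABCD data and is not the authors' analysis.

```python
import numpy as np


def simulate_proportional_bias(n=5000, seed=0):
    """Return (r_raw, r_corrected) with ICV for three hypothetical measures."""
    rng = np.random.default_rng(seed)
    # Synthetic ICV in mm^3 with roughly 10% between-subject spread.
    icv = rng.lognormal(mean=np.log(1.5e6), sigma=0.10, size=n)
    # Hypothetical thickness (mm): independent of ICV, roughly 2% spread.
    thickness = rng.lognormal(mean=np.log(2.5), sigma=0.02, size=n)
    # Hypothetical areas scaling with ICV**b, for b = 1 (proportional) and b = 0.8.
    noise = rng.lognormal(mean=0.0, sigma=0.05, size=(2, n))
    area_b1 = 1e-3 * icv * noise[0]
    area_b08 = 1e-3 * icv**0.8 * noise[1]

    def r_with_icv(values):
        return float(np.corrcoef(values, icv)[0, 1])

    measures = [("thickness", thickness), ("area_b1", area_b1), ("area_b0.8", area_b08)]
    return {name: (r_with_icv(v), r_with_icv(v / icv)) for name, v in measures}
```

Under these assumptions, corrected thickness should correlate strongly and negatively with ICV, the exactly proportional measure should lose its ICV correlation, and the sublinear measure should flip from a positive to a moderately negative correlation, qualitatively resembling the uncorrected versus proportion-corrected contrast in Fig. 2.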

Intracranial volume is distinctly related to abilities across cognitive domains

Sex-specific correlations between ICV and the ten cognitive scores (three composite, seven individual task) were computed (Figure S3A). Across both datasets and sexes, correlations between ICV and the total and crystallized composite scores ranged between 0.16 and 0.24. Similar correlations ranging from 0.14 to 0.22 were observed with individual task scores within the crystallized domain. Weaker relationships were identified between ICV and the fluid composite (0.08 – 0.18) as well as individual fluid task scores (0.00 – 0.15), with the exception of the Working Memory (List Sort) score (0.13 – 0.19). Sex-specific correlations between age and the ten cognitive scores were also computed (Figure S3B). In HCP, males and females exhibited mostly negative relationships between age and fluid abilities, with correlations ranging from −0.18 to 0.03. Males exhibited positive relationships between age and crystallized abilities (0.11 – 0.23), while females exhibited no clear relationships (−0.04 – 0.03). In ABCD, both sexes exhibited positive relationships between age and cognition, with correlations ranging between 0.06 and 0.36. Finally, sex-specific correlations between age and ICV were also analyzed (Figure S3C). Weak relationships were observed in HCP (r = 0.02 in males, r = −0.14 in females) and ABCD (r = 0.06 in males, r = 0.07 in females). All observed correlations between ICV, cognitive scores, and age are small by conventional standards and muted relative to correlations between ICV and regional neuroanatomical properties. Sex-independent and sex-specific models were then trained to predict the cognitive scores from ICV in both HCP (adults) and ABCD (children). Prediction accuracies obtained by these models are shown in Fig. 3. In HCP and ABCD, sex-independent and sex-specific models successfully predicted total and crystallized composite scores, as well as task scores within the crystallized domain (corrected p < 0.05).
Across both datasets, results were mixed for sex-independent and sex-specific predictions of fluid abilities, and many of these predictions did not exceed chance levels at the corrected p < 0.05 threshold.

Fig. 3.

Intracranial volume predicts cognitive abilities.

Prediction accuracies (Pearson’s correlation coefficient between observed and predicted scores) for sex-independent and sex-specific models predicting cognitive scores in HCP (top) and ABCD (bottom) based on ICV. Sex-independent (left), male-specific (middle), and female-specific (right) predictions are shown. Black asterisks (*) denote that the model performed above chance levels based on permutation tests (corrected p < 0.05). The shape of the violin plots indicates the entire distribution of values, dashed lines indicate the median, and dotted lines indicate the interquartile range.

Relationships between brain volume and general intelligence have been reported previously (Gignac and Bates 2017). Here, in both children and adults, crystallized and fluid domains of cognition are differentially related to and predicted by ICV, suggesting that ICV carries behaviorally relevant information that is partially distinct across cognitive domains. The opposite relationships observed between age and cognitive scores in children and adults also suggest that, even within the relatively narrow age range of each dataset, there may be underlying associations between age and cognition. However, the lack of clear relationships between age and ICV suggests that these age-cognition relationships are unlikely to be driven by ICV.

Intracranial volume correction differentially biases prediction accuracies across neuroanatomical features

Sex-independent models were trained to predict ten distinct behavioral scores using either ICV-uncorrected or ICV-corrected anatomical measures of surface area, gray matter volume, or cortical thickness in both HCP (adults) and ABCD (children). Prediction accuracies obtained by these models are shown in Fig. 4.

Fig. 4.

Accounting for intracranial volume reduces predictive accuracies of cognition based on surface area and gray matter volume, and increases predictive accuracies based on cortical thickness.

Prediction accuracies (Pearson’s correlation coefficient between observed and predicted scores) for sex-independent models predicting cognitive scores in HCP (A) and ABCD (B). Predictions based on surface area (left), gray matter volume (middle), and cortical thickness (right) using ICV-uncorrected (green) and ICV-corrected (orange) anatomical properties are shown. Green and orange asterisks (*) denote that the model performed above chance levels based on permutation tests (corrected p < 0.05). Black asterisks (*) denote that model performance was significantly different between the ICV-uncorrected and ICV-corrected predictions based on exact tests for differences (corrected p < 0.05). The shape of the violin plots indicates the entire distribution of values, dashed lines indicate the median, and dotted lines indicate the interquartile range.

Sex-independent models based on uncorrected and corrected measures of surface area, gray matter volume, and cortical thickness successfully predicted (corrected p < 0.05) total composite scores and scores within the crystallized domain in both datasets, but generally predicted scores within the fluid domain successfully only in HCP. Models trained to predict cognitive scores in HCP achieved higher mean prediction accuracies based on uncorrected measures of surface area (r = 0.190 for ICV-uncorrected, r = 0.142 for ICV-corrected) and gray matter volume (r = 0.200 for ICV-uncorrected, r = 0.153 for ICV-corrected), and corrected measures of cortical thickness (r = 0.119 for ICV-uncorrected, r = 0.174 for ICV-corrected) (Fig. 4A). Similarly, models trained to predict cognitive scores in ABCD yielded higher within-dataset mean prediction accuracies based on uncorrected measures of surface area (r = 0.114 for ICV-uncorrected, r = 0.064 for ICV-corrected) and gray matter volume (r = 0.123 for ICV-uncorrected, r = 0.087 for ICV-corrected), and corrected measures of cortical thickness (r = 0.106 for ICV-uncorrected, r = 0.142 for ICV-corrected) (Fig. 4B). While this general trend was evident across analyses, most of these differences were non-significant at the level of the individual cognitive scores being predicted (corrected p > 0.05). The one notable exception was the crystallized domain in ABCD, where models were significantly more accurate (corrected p < 0.05) when using uncorrected measures of surface area or gray matter volume than when using their corrected counterparts. Previous work has demonstrated that transformation of neurobiological variables can strengthen or weaken the brain-behavior associations captured by predictive models (Li et al., 2019).
Our current findings show that ICV correction can similarly strengthen or weaken relationships between neuroanatomical features and individual cognitive abilities, with distinct effects on prediction accuracy in children and adults.

Effects of intracranial volume correction on prediction accuracies differ across sexes and age groups

Sex-specific models were trained to predict ten distinct behavioral scores using either ICV-uncorrected or ICV-corrected anatomical measures of surface area, gray matter volume, or cortical thickness in both HCP and ABCD (Fig. 5).

Fig. 5.

Accounting for intracranial volume differentially impacts predictive accuracies across surface area, gray matter volume, and cortical thickness in a sex-specific manner.

Prediction accuracies (Pearson’s correlation coefficient between observed and predicted scores) for sex-specific models predicting cognitive scores in HCP males (A), HCP females (B), ABCD males (C), and ABCD females (D). Predictions based on surface area (left), gray matter volume (middle), and cortical thickness (right) using ICV-uncorrected (green) and ICV-corrected (orange) anatomical properties are shown. Green and orange asterisks (*) denote that the model performed above chance levels based on permutation tests (corrected p < 0.05). Black asterisks (*) denote that model performance was significantly different between the ICV-uncorrected and ICV-corrected predictions based on exact tests for differences (corrected p < 0.05). The shape of the violin plots indicates the entire distribution of values, dashed lines indicate the median, and dotted lines indicate the interquartile range.

In females, sex-specific models based on uncorrected and corrected measures of surface area, gray matter volume, and cortical thickness generally predicted total composite scores and scores within the crystallized domain above chance (corrected p < 0.05) in both datasets. In males, sex-specific models generally predicted total composite scores and scores within the crystallized domain above chance (corrected p < 0.05) in both datasets when using uncorrected measures of surface area or gray matter volume, or ICV-corrected measures of cortical thickness. Models trained to predict cognitive scores in HCP males achieved higher mean prediction accuracies when based on uncorrected measures of surface area (r = 0.149 for ICV-uncorrected, r = 0.053 for ICV-corrected) and gray matter volume (r = 0.146 for ICV-uncorrected, r = 0.061 for ICV-corrected), and corrected measures of cortical thickness (r = 0.019 for ICV-uncorrected, r = 0.119 for ICV-corrected) (Fig. 5A). Similarly, models trained to predict cognitive scores in HCP females achieved higher mean prediction accuracies when based on uncorrected measures of surface area (r = 0.188 for ICV-uncorrected, r = 0.123 for ICV-corrected) and gray matter volume (r = 0.194 for ICV-uncorrected, r = 0.141 for ICV-corrected), and corrected measures of cortical thickness (r = 0.128 for ICV-uncorrected, r = 0.174 for ICV-corrected) (Fig. 5B). Models trained to predict cognitive scores in ABCD demonstrated similar trends. Higher mean prediction accuracies were achieved in males using models based on uncorrected measures of surface area (r = 0.093 for ICV-uncorrected, r = −0.003 for ICV-corrected) and gray matter volume (r = 0.106 for ICV-uncorrected, r = 0.035 for ICV-corrected), and ICV-corrected measures of cortical thickness (r = 0.066 for ICV-uncorrected, r = 0.111 for ICV-corrected) (Fig. 5C).
Likewise, in females, higher mean prediction accuracies were achieved using models based on uncorrected measures of surface area (r = 0.152 for ICV-uncorrected, r = 0.071 for ICV-corrected) and gray matter volume (r = 0.140 for ICV-uncorrected, r = 0.092 for ICV-corrected), and ICV-corrected measures of cortical thickness (r = 0.079 for ICV-uncorrected, r = 0.162 for ICV-corrected) (Fig. 5D). While differences between models based on uncorrected and corrected measures were generally non-significant at the level of individual cognitive scores, there were two noteworthy exceptions. First, in both ABCD males and females, models trained on uncorrected measures of surface area and gray matter volume significantly outperformed those trained on corrected measures when predicting cognitive scores within the crystallized domain (corrected p < 0.05). Second, in ABCD females, models predicting the Working Memory task score achieved significantly higher accuracies (corrected p < 0.05) when using uncorrected measures of surface area and gray matter volume, and ICV-corrected measures of cortical thickness. Finally, within-dataset prediction accuracies were typically numerically higher in females than in males in both HCP and ABCD. In line with previous work, these data highlight the presence of differential brain-behavior predictive relationships across the sexes (Dhamala et al., 2022; Jiang et al., 2020, et al. 2020). These results also emphasize that the effects of ICV correction differ not only across age groups, but also across sexes within a given age group.

Intracranial volume differentially influences generalizability of predictive models across neuroanatomical features, sexes, and age groups

Sex-independent models trained on each dataset were evaluated across both datasets. Mean prediction accuracies (across all cognitive scores) obtained by these models when evaluated within and between datasets are shown in Fig. 6A. Prediction accuracies obtained for each cognitive score when evaluating the models between datasets are shown in Figure S4.

Fig. 6.

Intracranial volume correction reduces generalizability of models based on surface area and gray matter volume but increases generalizability of models based on cortical thickness.

Generalizability of sex-independent (A) and sex-specific (B) models across sexes (males and females) and datasets (HCP and ABCD). Mean prediction accuracies across all 10 cognitive scores based on surface area (left), gray matter volume (middle), and cortical thickness (right) using raw anatomical properties are shown in the top panels, and predictions using ICV proportion-corrected anatomical properties are shown in the bottom panels. The populations that the models were trained on are shown along the rows, and the populations that the models were tested on are shown along the columns.

Sex-independent models trained to predict cognitive scores in HCP (Figs. 6A, S4A) were more generalizable to ABCD based on uncorrected measures of surface area (r = 0.099 for ICV-uncorrected, r = 0.007 for ICV-corrected) and gray matter volume (r = 0.093 for ICV-uncorrected, r = 0.013 for ICV-corrected), and corrected measures of cortical thickness (r = −0.044 for ICV-uncorrected, r = 0.079 for ICV-corrected). Similarly, models trained to predict cognitive scores in ABCD (Figs. 6A, S4B) were more generalizable to HCP based on uncorrected measures of surface area (r = 0.144 for ICV-uncorrected, r = −0.023 for ICV-corrected) and gray matter volume (r = 0.118 for ICV-uncorrected, r = −0.009 for ICV-corrected), and corrected measures of cortical thickness (r = −0.036 for ICV-uncorrected, r = 0.147 for ICV-corrected). Sex-specific models trained on each sex from each dataset were evaluated across both sexes and both datasets. Mean prediction accuracies obtained by these models when evaluated within and between sexes and datasets are shown in Fig. 6B. Prediction accuracies obtained for each cognitive score when evaluating the models between datasets are shown in Figures S5–S8. In brief, sex-specific models exhibited overall trends in generalizability similar to those described for the sex-independent models above. Male- and female-specific models were more generalizable across sexes within datasets than they were across datasets when based on uncorrected measures of surface area or gray matter volume, or corrected measures of cortical thickness (Fig. 6B). Moreover, sex-specific models based on ICV-corrected measures of surface area or gray matter volume, or uncorrected measures of cortical thickness, generally achieved negative mean prediction accuracies when evaluated between datasets (Fig. 6B).
Given the unique relationships between ICV and cognition across the different cognitive domains (total, crystallized, and fluid) outlined in Figure S3, we next examined each cognitive domain separately (see Figure S9 for the sex-independent models, and Figures S10–S12 for the sex-specific models). Sex-independent models trained to predict total cognition were more generalizable across datasets based on uncorrected measures of surface area (Figure S9A, left panel) and gray matter volume (Figure S9A, center panel), and corrected measures of cortical thickness (Figure S9A, right panel). Unsurprisingly, similar results were obtained for models trained to predict crystallized (Figure S9B) and fluid (Figure S9C) abilities, albeit with lower prediction accuracies for fluid abilities. Sex-specific models exhibited results similar to the sex-independent ones across each cognitive domain. Models predicting total (Figure S10), crystallized (Figure S11), and fluid (Figure S12) abilities were more generalizable to the opposite sex within datasets than they were to either sex in the opposite dataset when based on uncorrected measures of surface area and gray matter volume, or corrected measures of cortical thickness. Models typically did not generalize between datasets (i.e., achieved negative prediction accuracies) when based on corrected measures of surface area and gray matter volume, or uncorrected measures of cortical thickness.

To quantify the effect of ICV correction, differences in accuracy between models based on the ICV-uncorrected and ICV-corrected measures were computed for each cognitive domain; these are shown in Fig. 7A for the sex-independent models and Fig. 7B for the sex-specific models. Relatedly, percent differences in accuracy were also computed and are shown in Figure S13. Consistent with the results described above, ICV correction reduced generalizability of models based on surface area (Fig. 7A–B, left panels) and gray matter volume (Fig. 7A–B, center panels), and increased generalizability of models based on cortical thickness (Fig. 7A–B, right panels). Moreover, in line with the generally lower accuracies for predictions of fluid abilities, numerical differences between the uncorrected and corrected measures were somewhat smaller for the fluid domain than for the total and crystallized domains, but percent differences were broadly comparable across the cognitive domains for sex-independent and sex-specific models. For sex-independent models, ICV correction had greater effects in all three cognitive domains on model generalizability across sexes and datasets than on model accuracy within a given sex and dataset. Models trained to predict total cognition exhibited smaller differences within datasets than between datasets for surface area (Fig. 7A, top left panel), gray matter volume (Fig. 7A, top center panel), and cortical thickness (Fig. 7A, top right panel). Similar patterns of larger effects of ICV correction for predictions across datasets than within datasets were observed for sex-independent predictions of crystallized (Fig. 7A, middle panels) and fluid abilities (Fig. 7A, bottom panels). Sex-specific models predicting total (Fig. 7B, top panels), crystallized (Fig. 7B, middle panels), and fluid (Fig. 7B, bottom panels) abilities exhibited similar trends, with stronger effects of ICV correction when evaluating model predictions across datasets and sexes than within datasets. Within datasets, predictions within and between sexes were generally similarly influenced by ICV correction in both HCP and ABCD.

Fig. 7.

Intracranial volume correction differentially reduces generalizability of models based on surface area and gray matter volume but increases generalizability of models based on cortical thickness across cognitive domains.

Difference in generalizability of sex-independent (A) and sex-specific (B) models across sexes (males and females) and datasets (HCP and ABCD). Differences between average prediction accuracies obtained with ICV-corrected and ICV-uncorrected measures of surface area (left), gray matter volume (middle), and cortical thickness (right) are shown for predictions of total (top), crystallized (middle), and fluid (bottom) abilities. Positive (warmer) values indicate that the ICV-corrected measures outperformed the ICV-uncorrected measures. The populations that the models were trained on are shown along the rows, and the populations that the models were tested on are shown along the columns.
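The quantities shown in Fig. 7 and Figure S13 reduce to simple arithmetic per train/test cell. The sketch below assumes the difference is corrected minus uncorrected accuracy (matching the caption's sign convention) and that percent difference is taken relative to the absolute uncorrected accuracy; the exact definition used for Figure S13 is not given in this section. The usage example reuses the mean HCP-to-ABCD accuracies of the sex-independent models reported in the text.

```python
def correction_effect(r_uncorrected, r_corrected):
    """Return (difference, percent difference) for one train/test cell.

    Positive values mean the ICV-corrected model outperformed the
    ICV-uncorrected one. Percent difference is relative to |r_uncorrected|
    (an assumed definition) and becomes unstable as r_uncorrected nears zero.
    """
    difference = r_corrected - r_uncorrected
    percent = 100.0 * difference / abs(r_uncorrected)
    return difference, percent


def correction_effect_matrix(uncorrected, corrected):
    """Apply correction_effect to every key (e.g., a (train, test) cell) of two accuracy maps."""
    return {key: correction_effect(uncorrected[key], corrected[key])
            for key in uncorrected}


if __name__ == "__main__":
    # Mean HCP -> ABCD accuracies of sex-independent models, as reported in the text.
    uncorrected = {"surface area": 0.099, "cortical thickness": -0.044}
    corrected = {"surface area": 0.007, "cortical thickness": 0.079}
    for feature, (diff, pct) in correction_effect_matrix(uncorrected, corrected).items():
        print(f"{feature}: difference = {diff:+.3f}, percent = {pct:+.1f}%")
```

For surface area this gives a difference of −0.092 (about −92.9%), while cortical thickness gives a positive difference of 0.123, mirroring the opposite directions of the ICV correction effect.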

Out-of-sample validation of predictive models using external datasets is typically considered the gold standard. These findings reveal that ICV correction more strongly affects out-of-sample predictions in external datasets than predictions on hold-out test sets within a dataset. These data also show that the correction affects males and females similarly. Although speculative, the stronger effect on cross-dataset predictions may reflect differences in brain-behavior relationships between development and adulthood.

Intracranial volume correction uniquely affects interpretations of brain-behavior relationships across neuroanatomical features and age groups

Regional Haufe-transformed feature weights were summarized at the network level based on the Yeo 17-network solution (Yeo et al., 2011). Absolute relative network-level feature weights are shown in Fig. 8 for the sex-independent models and in Figures S14–S15 for the sex-specific models. Within each dataset, measures of surface area and gray matter volume exhibit similar associations with individual cognitive abilities regardless of ICV correction. In HCP, surface area within the visual networks and gray matter volume within the default mode, language, and control networks are strongly associated with cognition (Fig. 8A, left and center panels). In ABCD, surface area within the default mode network, and to a lesser extent the somatomotor network, as well as gray matter volume in the visual, somatomotor, and dorsal attention networks are strongly associated with cognition (Fig. 8B, left and middle panels). Uncorrected measures of cortical thickness across a diverse set of networks are associated with cognitive abilities in both datasets (Fig. 8A–B, top right panels). However, with ICV correction, opposing gradients of associations emerge in HCP and ABCD (Fig. 8A–B, bottom right panels). In HCP, cortical thickness of regions within heteromodal association cortices is most strongly associated with cognitive scores, while regions within unimodal cortices are weakly associated. In ABCD, this pattern is reversed: cortical thickness of regions within unimodal somato/motor cortices is most strongly associated with cognitive scores, while regions within heteromodal association cortices are weakly associated.

Fig. 8.

The predictive relationships linking cognition with the anatomy of association and unimodal cortices across populations can be revealed or obscured through the use of intracranial volume correction.

Absolute relative network-level feature weights to predict each of the cognitive scores in HCP (A) and ABCD (B). Feature weights for models based on surface area (left), gray matter volume (middle), and cortical thickness (right) using ICV-uncorrected anatomical properties are shown in the top panels, and those using ICV-corrected anatomical properties are shown in the bottom panels. SalVenAttn – Salience/Ventral Attention; DorsAttn – Dorsal Attention; SomMot – Somatomotor. Networks are ordered from heteromodal (left) to unimodal (right).

These results emphasize that while ICV correction may not always alter network-level interpretations of brain-behavior relationships, as seen for surface area and gray matter volume, it can reveal relationships that are otherwise obscured, as seen for cortical thickness. The specific brain-behavior relationships we do capture, and the distinct patterns they exhibit in the two datasets, are in line with prior work demonstrating non-linear maturation trajectories for cortical expansion, cortical thinning, intracortical myelination, functional maturation, and structure-function coupling (Sydnor et al., 2021), in which unimodal somato/motor networks mature earlier in childhood, followed by heteromodal association cortices later in adolescence.

Intracranial volume correction differentially influences interpretations of brain-behavior relationships across sexes

Feature weights used to predict cognitive scores were extracted from the sex-specific models and Haufe-transformed. Regional surface area and gray matter volume feature weights to predict the Total Composite score are shown in Fig. 9A, and regional cortical thickness feature weights to predict the Total Composite score are shown in Fig. 10A. Across sexes and datasets, there are widespread positive associations throughout the whole brain between cognitive scores and uncorrected measures of surface area and gray matter volume (Fig. 9A). However, with ICV correction, HCP males and females demonstrated significantly weaker surface area (corrected p < 0.05) and gray matter volume (corrected p < 0.05) associations with cognition. While similar trends are present in ABCD males and females, the decrease in the strength of the associations is not significant. Regional feature weights of cortical thickness exhibit slightly different trends: in HCP, males and females exhibit diffuse positive and negative associations between uncorrected measures of cortical thickness and cognitive abilities, while in ABCD, males generally exhibit positive associations and females exhibit predominantly negative associations (Fig. 10A). However, across both sexes and datasets, there exist widespread strong negative associations between ICV-corrected measures of cortical thickness and cognition. These differences in associations between the uncorrected and ICV-corrected measures were not significant.

Fig. 9.

Regional surface area and gray matter volume associations with cognition are unique across age groups regardless of ICV correction, but they are shared across sexes without ICV correction and unique across sexes with ICV correction.

Regional feature weights (A) and the correlations between them (B) for sex-specific models based on surface area (left) and gray matter volume (right). Models trained on HCP males, HCP females, ABCD males, and ABCD females using ICV-uncorrected anatomical properties are shown in the top panels, and those using ICV-corrected anatomical properties are shown in the bottom panels. Feature weights to predict total cognition are shown in (A) on lateral left (left) and right (right) cortical surfaces. Heatmaps of correlations of regional feature weights are ordered along the rows and columns based on the populations the models were trained on in the following order: HCP Males, HCP Females, ABCD Males, ABCD Females. Within the blocks for each of those training sets, regional feature weights are ordered based on the cognitive scores being predicted as follows: Total Composite, Crystallized Composite, Reading Decoding, Vocabulary Comprehension, Fluid Composite, Visual Episodic Memory, Cognitive Flexibility (Card Sort), Inhibition (Flanker), Processing Speed, Working Memory (List Sorting).

Fig. 10.

Associations between regional cortical thickness and cognition are unique across age groups regardless of ICV correction, but they are unique across sexes without ICV correction and shared across sexes with ICV correction.

Regional feature weights (A) and the correlations between them (B) for sex-specific models based on cortical thickness. Models trained on HCP males, HCP females, ABCD males, and ABCD females using ICV-uncorrected anatomical properties are shown in the top panels, and those using ICV-corrected anatomical properties are shown in the bottom panels. Feature weights to predict total cognition are shown in (A) on lateral left (left) and right (right) cortical surfaces. Heatmaps of correlations of regional feature weights are ordered along the rows and columns based on the populations the models were trained on in the following order: HCP Males, HCP Females, ABCD Males, ABCD Females. Within the blocks for each of those training sets, regional feature weights are ordered based on the cognitive scores being predicted as follows: Total Composite, Crystallized Composite, Reading Decoding, Vocabulary Comprehension, Fluid Composite, Visual Episodic Memory, Cognitive Flexibility (Card Sort), Inhibition (Flanker), Processing Speed, Working Memory (List Sorting).

Correlations between the feature weights were also analyzed and are shown for surface area and gray matter volume in Fig. 9B, and for cortical thickness in Fig. 10B. Across datasets, there is little to no overlap in the features used to predict cognitive abilities based on uncorrected measures of surface area (average correlation between regional feature weights, r = 0.00) and gray matter volume (r = −0.01) (Fig. 9B, top panels). However, within datasets, male- and female-specific models rely on shared features to predict cognitive scores based on uncorrected measures of surface area (r = 0.81 for HCP, r = 0.81 for ABCD) and gray matter volume (r = 0.86 for HCP, r = 0.67 for ABCD). With ICV correction, there remains little to no overlap across datasets in the features used by models based on surface area (r = 0.00) or gray matter volume (r = 0.01), and correlations between sexes within datasets are also generally reduced for both surface area (r = 0.24 for HCP, r = 0.29 for ABCD) and gray matter volume (r = 0.24 for HCP, r = 0.31 for ABCD) (Fig. 9B, bottom panels). The opposite pattern is observed with cortical thickness: feature weights extracted from models using ICV-corrected measures are correlated across sexes (r = 0.80 for HCP, r = 0.78 for ABCD) but not across datasets (r = −0.20), while those extracted from models using uncorrected measures are less strongly correlated across sexes (r = 0.43 for HCP, r = 0.32 for ABCD) and uncorrelated across datasets (r = −0.04) (Fig. 10B). Correcting for ICV can reduce univariate sex differences in neuroanatomical properties as well as multivariate predictions of biological sex (Sanchis-Segura et al., 2019; Sanchis-Segura et al., 2020). Relationships between regional neuroanatomical properties and behaviors can also be altered by ICV correction in clinical populations (Voevodskaya et al., 2014).
Our findings provide additional evidence that ICV correction can influence interpretation of regional-level brain-behavior relationships, particularly between sexes within a dataset. Based on these data, we emphasize that the unique effects of ICV correction across populations throughout the lifespan bias not only the strength of the brain-behavior relationships we can capture, but also their interpretability.

Discussion

The application of predictive modeling in neuroimaging has provided foundational insights into the neurobiological correlates of behavior. While population-level associations between neuroanatomy and cognition have been extensively studied, prior predictive modeling work has not explicitly addressed the extent to which these relationships are shared across sexes and age groups. A standard practice when studying brain anatomy is to correct for individual differences in ICV using proportion correction, but the impact of this correction on brain-behavior predictions had not previously been examined. Here, we demonstrate that proportional ICV correction differentially biases behavioral predictions and the subsequent interpretations of the underlying brain-behavior relationships across neuroanatomical properties, sexes, and age groups. For both the ABCD (n = 1823; children) and HCP (n = 1013; adults) datasets, the size of individual cortical regions (in terms of surface area and gray matter volume) predicts behavioral traits within and between sexes and datasets with greater accuracy and generalizability when ICV correction is not implemented. The captured associations between behavioral traits and regional surface area or gray matter volume are unique across children and adults regardless of ICV correction, but unique across sexes with ICV correction and shared otherwise. Conversely, regional cortical thickness predicts behavioral traits with greater accuracy and generalizability when individual differences in ICV are corrected. The associations between behavioral traits and regional cortical thickness are also consistently unique across children and adults, but shared across sexes with ICV correction and unique otherwise. Taken together, these results reveal the differential effects of ICV correction on the accuracy, generalizability, and interpretability of behavioral predictions across neuroanatomical features, sexes, and age groups.
There are marked differences in head size between individuals, presenting a challenge for the measurement of regional volumes and surface areas. Established differences in gray matter volume and ICV between the sexes and throughout the lifespan (Caspi et al., 2020; De Bellis et al. 2001; Bethlehem et al., 2021) have led to widespread implementation of ICV correction when studying brain-behavior relationships across populations. In the present work, we described how proportional ICV correction influences the accuracy of behavioral predictions within a population, the generalizability of the predictions across populations, and interpretations of the underlying brain-behavior relationships. We began by quantifying how ICV relates to uncorrected neuroanatomical properties and cognitive domains. Here, we observed diverging gradients, across adults and children, in the network-specific relationships between ICV and uncorrected neuroanatomical properties. In adults, surface area and gray matter volume across all networks were strongly related to ICV, with stronger correlations in heteromodal association cortices than in unimodal somato/motor cortices. Cortical thickness was weakly related to ICV across all networks, but exhibited stronger correlations with ICV in unimodal somato/motor cortices. In children, relationships with ICV were comparably strong in unimodal somato/motor cortices and heteromodal association cortices for surface area and gray matter volume, while correlations with cortical thickness were weak overall but slightly stronger in heteromodal association cortices than in unimodal cortices. These data are consistent with extensive work in neurodevelopment establishing that unimodal cortices exhibit earlier cortical expansion and cortical thinning than heteromodal cortices (Sydnor et al., 2021). We also quantified relationships between ICV and proportion-corrected neuroanatomical properties.
In doing so, we observed that across both datasets and all networks, the relationships between ICV and corrected surface area and gray matter volume were inverted and reduced in magnitude. Meanwhile, correlations between ICV and proportion-corrected cortical thickness were strongly negative. We also noted that in adults and children, ICV is more strongly correlated with crystallized abilities than with fluid abilities, which is in agreement with existing work (Farias et al., 2012). Relatedly, ICV alone more successfully predicted crystallized abilities in adults and children, whereas predictions of fluid abilities were only sometimes better than chance levels. Collectively, these data reveal that relationships between ICV and cognitive domains exist during childhood and adulthood, but relationships between ICV and neuroanatomical features are unique across those populations. The analyses also demonstrate that proportional ICV correction does not entirely remove information pertaining to ICV from measures of surface area and gray matter volume, and actually introduces information about ICV into measures of cortical thickness. Therefore, this field-standard method may not be ideal when accounting for individual differences in brain volume.

The rapidly growing use of predictive modeling in neuroimaging to map brain-behavior relationships has yielded numerous important advances in recent years. Studies have investigated how preprocessing (Li et al., 2019), data transformation (Parkes et al., 2021), predictive algorithms (He et al., 2020), neuroimaging features (Dhamala et al., 2021; Greene et al., 2018), model translation (He et al., 2022), parcellation choices (Dhamala et al., 2021), sample sizes (Marek et al., 2022), and phenotype selection (Chen et al., 2022) can influence neuroimaging-based predictions of individual behaviors.
Unfortunately, these studies have largely relied on single datasets of healthy young adults to train and evaluate models, even though it is increasingly evident that models must be not only replicable and reliable within a dataset (Tian and Zalesky 2021), but also generalizable across datasets (Scheinost et al., 2019). In this study, we quantified the extent to which predictive models generalize across distinct populations and evaluated whether this generalizability is influenced by ICV correction. Having characterized how ICV relates to neuroanatomical features and cognition, we sought to determine how ICV correction differentially influences predictive models of behavior based on the size or thickness of cortical territories in adults and children. ICV correction reduced the accuracy (within sexes and/or datasets) and the generalizability (between sexes and/or datasets) of predictions based on surface area and gray matter volume, and increased the accuracy and generalizability of predictions based on cortical thickness. We speculate that these effects are driven by the inherent relationships that ICV has with both regional neuroanatomy and behavior. Because surface area and gray matter volume are more strongly correlated with ICV than cortical thickness is, and proportional correction reduces and inverts those relationships, ICV correction of surface area and gray matter volume removes behaviorally relevant information (captured by ICV), thereby impairing predictions. Conversely, proportional ICV correction of cortical thickness introduces ICV-related information into the measures, which may indirectly, and potentially artifactually, enhance predictions. This interpretation is further supported by our observation that ICV alone can predict cognitive behaviors at levels comparable to those achieved with uncorrected measures of surface area and gray matter volume and with corrected measures of cortical thickness.
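
The mechanism proposed above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the authors' analysis: it assumes a hypothetical area-like measure that scales with ICV to the 2/3 power (as a pure geometric surface would) and a hypothetical thickness-like measure that is independent of ICV. All distributions and constants are illustrative.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
n = 1000

# Hypothetical ICV values in mm^3; mean and SD are illustrative only.
icv = rng.normal(1.5e6, 1.5e5, n)

# Assumption: an area-like measure (arbitrary units) scales with ICV**(2/3),
# with multiplicative noise, so it grows more slowly than ICV itself.
area = icv ** (2 / 3) * rng.normal(1.0, 0.05, n)
# Assumption: a thickness-like measure (mm) is unrelated to ICV.
thickness = rng.normal(2.5, 0.1, n)

results = {}
for name, raw in [("area", area), ("thickness", thickness)]:
    proportional = raw / icv  # proportional ICV correction
    results[name] = (pearson_r(raw, icv), pearson_r(proportional, icv))
    print(f"{name:9s} r(raw, ICV) = {results[name][0]:+.2f}  "
          f"r(raw / ICV, ICV) = {results[name][1]:+.2f}")
```

Because the area-like measure grows more slowly than ICV, dividing by ICV leaves a weaker, negative correlation; because the thickness-like measure does not scale with ICV at all, the proportional measure essentially tracks 1/ICV and becomes strongly negatively correlated with it. These are the same directions as the effects of proportional correction described above.
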
These effects of ICV correction were more pronounced in predictions across datasets than within datasets. As noted above, ICV differs between children and adults, and these group differences may explain why ICV correction influences models more when their generalizability is evaluated across populations than within populations. Finally, the effects of ICV correction were surprisingly comparable across cognitive domains, even though the domains themselves are differentially related to and predicted by ICV. This suggests that the effect of ICV correction on the prediction of a behavioral trait is, to some extent, independent of the relationship between ICV and that trait. Therefore, even in the absence of an underlying relationship between ICV and the behavior of interest, researchers must be aware of the influence the correction may have on their predictive models and subsequent interpretations. This work provides a basis for further exploring whether ICV correction similarly influences predictions of non-cognitive behaviors, such as personality and mental health.

The use of open-access neuroimaging datasets has gained considerable popularity in recent years (Madan 2021; Bzdok and Yeo 2017). Several studies have used these large-scale datasets to model brain-behavior relationships, but whether and how those models can be interpreted is still debated (Tian and Zalesky 2021; Kohoutová et al., 2020). Here, we evaluated how ICV correction influences model interpretations of the neurobiological features that underlie individual cognitive abilities at both the network level and the regional level. At the network level, feature weights extracted from models based on surface area and gray matter volume were generally unchanged by ICV correction.
However, network-level weights for cortical thickness showed no interpretable trend without ICV correction, whereas a clear gradient of network contributions emerged with ICV correction. Across all neuroanatomical features, these network-level weights were generally shared across sexes and unique across age groups, regardless of ICV correction. At the regional level, the feature weights from uncorrected models suggest that relationships between neuroanatomical features and cognition are unique across age groups; within age groups, they appear shared across sexes for surface area and gray matter volume but unique across sexes for cortical thickness. With ICV correction, however, a different pattern emerges: relationships remain unique across age groups, but within age groups they are unique across sexes for surface area and gray matter volume and shared across sexes for cortical thickness. Above, we suggested that ICV correction inadvertently removes behaviorally relevant information from models based on surface area and gray matter volume and introduces it into models based on cortical thickness. The same mechanism may also explain the discrepancies in feature weights between uncorrected and ICV-corrected models. At the population level, males and females differ in ICV. However, this broad trend masks substantial individual variability and overlapping phenotypic distributions across the sexes. Predictions based on uncorrected measures of surface area and gray matter volume and on corrected measures of cortical thickness may rely heavily on ICV-related information shared across the sexes (within each age group), whereas those based on corrected measures of surface area and gray matter volume and on uncorrected measures of cortical thickness may rely more on sex-specific relationships that are not driven by ICV.
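
One common way to decide whether feature weights are "shared" or "unique" across groups, assumed here rather than taken from this section, is to convert each model's weights into Haufe-transformed activation patterns (the covariance between each feature and the model's predictions; Haufe et al., 2014) and correlate those patterns between groups. The sketch below uses simulated data, a closed-form ridge regression, and hypothetical function names (`fit_ridge`, `haufe_pattern`, `pattern_similarity`); it is not the authors' pipeline.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on centered data; returns (weights, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, x_mean, y_mean

def predict(model, X):
    w, x_mean, y_mean = model
    return (X - x_mean) @ w + y_mean

def haufe_pattern(X, y_pred):
    """Covariance between each feature (column of X) and the model's predictions."""
    Xc = X - X.mean(axis=0)
    yc = y_pred - y_pred.mean()
    return Xc.T @ yc / (len(yc) - 1)

def pattern_similarity(a, b):
    """Pearson correlation between two regional feature-importance vectors."""
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(1)
n_subjects, n_regions = 2000, 100
shared_w = rng.normal(size=n_regions)    # hypothetical weights shared by groups a and b
distinct_w = rng.normal(size=n_regions)  # hypothetical weights for an unrelated group c

def simulate_group(true_w):
    X = rng.normal(size=(n_subjects, n_regions))             # simulated regional features
    y = X @ true_w + rng.normal(scale=5.0, size=n_subjects)  # simulated cognitive score
    return X, y

patterns = {}
for name, true_w in [("group_a", shared_w), ("group_b", shared_w), ("group_c", distinct_w)]:
    X, y = simulate_group(true_w)
    model = fit_ridge(X, y)
    patterns[name] = haufe_pattern(X, predict(model, X))

print("a vs b:", round(pattern_similarity(patterns["group_a"], patterns["group_b"]), 2))
print("a vs c:", round(pattern_similarity(patterns["group_a"], patterns["group_c"]), 2))
```

When the underlying weights are shared, the two patterns correlate strongly; when they are unrelated, the correlation is near zero. In real data, such similarity estimates would likely also need a null distribution (for example, from permutations) before weights are labeled shared or unique.
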
Overall, these findings serve as a cautionary tale for researchers who use predictive modeling to identify complex multivariate brain-behavior relationships in healthy and clinical populations without considering how factors such as ICV correction may undermine their efforts and unintentionally bias their interpretations.

Although many researchers have studied age- and sex-related differences in cortical organization and cognitive abilities, most prior studies have focused on one or the other, or relied on univariate analyses (Cummings et al., 2020; Giedd and Rapoport 2010; Gong et al., 2011; Gur and Gur 2017; Jäncke 2018; Lenroot and Giedd 2010; Scheinost et al., 2015; Hagmann et al., 2010; Fair et al., 2009; Satterthwaite et al., 2015). Leveraging two large, open-access datasets, we quantified age- and sex-specific neurobiological correlates of cognition using multivariate predictive modeling approaches. In adults, the surface area of unimodal somato/motor regions and the gray matter volume and cortical thickness of heteromodal association regions were most strongly associated with cognition. In children, the opposite was observed: the surface area of heteromodal association regions and the gray matter volume and cortical thickness of unimodal somato/motor regions showed the strongest associations with cognition. Cortical surface area increases during childhood, reaching a global peak at around 9 years of age before slowly declining (Wierenga et al., 2014). Gray matter volume follows a similar trajectory but peaks between ages 11 and 14 (Gogtay and Thompson 2010). Global cortical thickness increases from birth until early childhood (Wang et al., 2019), before declining throughout adolescence and adulthood (Zhou et al., 2015).
Studies of surface area, gray matter volume, and cortical thickness have established a progression of cortical maturation along the somatomotor-association axis: maturation begins in unimodal somato/motor cortices and ends in heteromodal association cortices (Sydnor et al., 2021; Gogtay and Thompson 2010). The differential relationships we observe between neuroanatomical properties and individual cognitive abilities in children and adults suggest that brain-behavior relationships likely follow a developmental trajectory along the somatomotor-association axis similar to that of the neuroanatomical properties themselves. Given the cross-sectional nature of this study, our ability to draw conclusions about these trajectories is limited. However, future analyses incorporating longitudinal samples from ABCD will be able to capture these trajectories more definitively and further clarify best practices for ICV correction in phenotypic prediction across different modeling scenarios.

The present analyses reveal that proportional ICV correction does not sufficiently remove brain volume effects from neuroanatomical measures and can introduce ICV bias where there previously was none. Feature weights obtained from the predictive models reveal that the correction can also influence how we interpret relationships between neuroanatomical properties and cognition. This raises the question of whether the biases in prediction accuracy and interpretability introduced by the correction are generally beneficial or harmful. The use of predictive modeling in neuroscience and medicine has led to broader questions about whether accuracy or interpretability (or both) should be prioritized (Bzdok and Ioannidis 2019; Yarkoni and Westfall 2017). The present analyses suggest that if the goal is to obtain the most accurate behavioral predictions based on surface area or gray matter volume, uncorrected measures are likely the preferred option.
If instead the main purpose of a study is to generate an accurate model based on cortical thickness, ICV-corrected measures are likely preferable, although the improved phenotypic predictions may be artifactual. Finally, if the main focus is to determine the underlying relationships between neuroanatomical properties and behaviors, uncorrected features will reveal direct relationships, whereas corrected features will reveal relative associations that may or may not be biased by individual differences in ICV. Hence, these findings do not support or justify a single overarching recommendation to the field about whether ICV correction should be implemented. Rather, they highlight the need for future work to consider specific study goals, sample composition, possible correlates of ICV, and the potential downstream consequences of proportional ICV correction.

The findings of this study are subject to several limitations. First, these analyses focus only on one of the most widely used ICV correction methods: the proportion correction. Other correction methods include covariate regression, nonlinear modulation in voxel-based morphometry (Good et al., 2001), the power-corrected proportion (Liu et al., 2014), and the residuals adjustment (Arndt et al., 1991; Mathalon et al., 1993). Methods that rely on population-level information (i.e., covariate regression, the power-corrected proportion, and the residuals adjustment) must be fit on the training data separately within each cross-validation fold, for every train-test split, to prevent data leakage, and then applied to the corresponding test set. Consequently, a model corrected in this way may be less useful when generalized to a population that is distinct from the training set (e.g., a different sex or age group).
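
The fold-wise requirement can be illustrated with a minimal sketch, assuming regional measures stored as a subjects-by-regions NumPy array and ICV as a vector. The function names (`fit_residual`, `apply_residual`, `fit_power`, `apply_power`, `corrected_folds`) are hypothetical. The formulas follow the usual form of the residuals adjustment, x − b(ICV − mean ICV) with b from a linear fit of x on ICV, and of the power-corrected proportion, x / ICV^b with b from a log-log fit; covariate regression is not shown. In every fold, b and the mean ICV are estimated from training subjects only.

```python
import numpy as np

# Residuals adjustment: x_adj = x - b * (ICV - mean ICV), b from regressing x on ICV.
def fit_residual(X, icv):
    slopes = np.polyfit(icv, X, 1)[0]  # one slope per region (column of X)
    return slopes, icv.mean()

def apply_residual(X, icv, params):
    slopes, icv_mean = params
    return X - np.outer(icv - icv_mean, slopes)

# Power-corrected proportion: x_adj = x / ICV**b, b from regressing log(x) on log(ICV).
def fit_power(X, icv):
    return np.polyfit(np.log(icv), np.log(X), 1)[0]

def apply_power(X, icv, b):
    return X / icv[:, None] ** b

METHODS = {"residual": (fit_residual, apply_residual),
           "power": (fit_power, apply_power)}

def corrected_folds(X, icv, method="residual", k=5, seed=0):
    """Yield (train_idx, test_idx, X_train_adj, X_test_adj) for k-fold cross-validation.

    Correction parameters are estimated on the training fold only and then
    applied unchanged to the held-out fold, so test subjects never inform them.
    """
    fit, apply = METHODS[method]
    order = np.random.default_rng(seed).permutation(len(icv))
    folds = np.array_split(order, k)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        params = fit(X[train_idx], icv[train_idx])
        yield (train_idx, test_idx,
               apply(X[train_idx], icv[train_idx], params),
               apply(X[test_idx], icv[test_idx], params))

# Usage with simulated data (hypothetical ICV and regional volumes).
rng = np.random.default_rng(42)
icv = rng.normal(1.5e6, 1.5e5, 500)
X = 0.05 * icv[:, None] ** 0.8 * rng.lognormal(0.0, 0.05, (500, 3))

splits = {m: list(corrected_folds(X, icv, m)) for m in ("residual", "power")}
for method, folds in splits.items():
    _, _, X_train_adj, X_test_adj = folds[0]
    print(method, X_train_adj.shape, X_test_adj.shape)
```

Because the held-out fold is corrected with parameters estimated elsewhere, the same logic applies when a model trained on one sex or age group is tested on another: the correction parameters then come from a population that may differ from the one being predicted.
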
Prior work has also demonstrated that brain regions and networks exhibit differential areal scaling relative to total area (Reardon et al., 2018): larger brains show greater areal expansion in heteromodal association networks than in unimodal somato/sensory and limbic networks. While that work focused on regional and total surface area, similar patterns likely also exist for gray matter volume and cortical thickness relative to ICV. Nonlinear scaling of brain regions relative to the whole brain across brain sizes suggests that neither proportional ICV correction nor linear methods that regress out or residualize brain volume effects are likely to remove regional effects completely and equally across individuals with different brain sizes. Consequently, such approaches may also produce larger effects of ICV correction across sexes and age groups, whose brain volumes are likely to differ, and residualizing based on one population (e.g., adult males) is unlikely to generalize to another (e.g., female children). Second, this study assessed only the effects of proportional ICV correction, rather than proportional correction of the phenotype being used for prediction (e.g., dividing regional surface area by total surface area, or regional cortical thickness by average cortical thickness). Proportional measures of phenotypes (e.g., the proportion of total surface area occupied by a given region) can be of interest for reasons beyond correcting for differences in total measures. For example, abnormal enlargement or shrinkage of a given region or network relative to the whole brain may reflect underlying neurological or psychiatric illness and, in turn, influence behaviors and behavioral predictions.
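
The link between nonlinear scaling and proportional measures can be made concrete with a simple allometric model, assumed here for illustration: if a region scales as total^b, then its share of the total scales as total^(b − 1), so a proportional measure is independent of overall size only when b = 1. The sketch below estimates b with a log-log fit on simulated data; the region names, exponents, and constants are hypothetical and are not estimates from Reardon et al. (2018) or from this study.

```python
import numpy as np

def scaling_exponent(regional, total):
    """Slope b of log(regional) on log(total); b = 1 means isometric scaling."""
    b, _ = np.polyfit(np.log(total), np.log(regional), 1)
    return float(b)

rng = np.random.default_rng(7)
n = 1000
total_area = rng.normal(1.8e5, 1.8e4, n)  # hypothetical total surface area (mm^2)

# Hypothetical regions: one expands faster than the total (b = 1.2), one
# slower (b = 0.8); lognormal noise keeps the simulated areas positive.
regions = {
    "association": 2.5e-3 * total_area ** 1.2 * rng.lognormal(0.0, 0.03, n),
    "sensory": 0.3 * total_area ** 0.8 * rng.lognormal(0.0, 0.03, n),
}

results = {}
for name, area in regions.items():
    b = scaling_exponent(area, total_area)
    share = area / total_area  # proportional measure: fraction of total area
    r = float(np.corrcoef(share, total_area)[0, 1])
    results[name] = (b, r)
    print(f"{name:11s} b = {b:.2f}  r(share, total) = {r:+.2f}")
```

Under these assumptions, the share of the hyperallometric region rises with total size and the share of the hypoallometric region falls, so a proportional measure still carries information about overall size.
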
In the present analyses, we find differential contributions of networks to cognitive predictions, but we cannot assess whether the same networks would emerge with proportional neuroanatomical measures. Third, the two datasets used in this study cover a relatively narrow range of ages: HCP includes participants aged 22 to 37, whereas ABCD includes participants aged 9–10 years. Given this limited age range, we are unable to identify the network-level trajectories of brain-behavior relationships across the lifespan. Future analyses of these relationships in adolescents and older adults could supplement our findings and establish trajectories of associations between cognition and neuroanatomical organization in unimodal somatosensory and heteromodal association cortices. Fourth, structural and functional brain organization are influenced by both genetic (Anderson et al., 2021; Ge et al., 2017; Gu et al., 2021; Sabuncu et al., 2016) and environmental factors (Lenroot and Giedd 2008; Blakemore 2012; Tost et al., 2015; Tooley et al., 2021; Tooley et al., 2019). Likewise, individual cognitive abilities have known genetic and environmental influences (Tucker-Drob et al., 2013; Bartels et al., 2002; Rindermann et al., 2010; Krogsrud et al., 2021; Ge et al., 2017). Although the HCP and ABCD samples are considerably heterogeneous in terms of race/ethnicity, they are drawn entirely from the United States (US), so environmental factors, including trauma, socioeconomic status, and chemical exposures, are likely to be largely shared across participants in both datasets. Therefore, these predictive models and the brain-behavior relationships they capture may be specific to a US-centric population and may not generalize to other environmental contexts.
Recent work has also demonstrated that neuroimaging-based behavioral prediction models can exhibit biases across races/ethnicities (Li et al., 2022). Consequently, it is crucial to consider the complex interplay of ICV, demographics, and cognitive performance when developing models to capture brain-behavior relationships. Finally, we relied on a single dataset for each age group, and from each dataset we included only a subset of participants, selected on the basis of data quality and availability. The included adults and children are not necessarily representative of the general population, and we cannot rule out the possibility that the effects observed here are driven by differences in scanners, imaging parameters, or acquisition protocols among the included participants, rather than by age. Although all HCP data were collected at a single site using the same scanner, ABCD participants were scanned on GE and Siemens scanners across 19 sites, suggesting that our results are broadly generalizable across scanners and sites. Moreover, given that our findings align with established trajectories of developmental cortical maturation (Blakemore 2012; Stiles and Jernigan 2010; Casey et al., 2005), our results likely capture core age-related effects. Related work on behavioral prediction in ABCD based on functional connectivity has also shown that results obtained with a participant subset similar to ours are comparable to those obtained with larger samples and more liberal quality-control thresholds, or with similarly sized samples matched to the broader ABCD population on age, sex, family income, and behavior (Chen et al., 2022).

Conclusion

An understanding of the effects of data transformation on predictive models of brain-behavior relationships can enable the development of more accurate, generalizable, and interpretable models. In this work, we establish the differential impact of ICV correction on models of cognition based on cortical size and thickness in adults and children. Accuracy and generalizability were reduced with ICV correction for models based on size (surface area and gray matter volume) but increased for models based on thickness. The interpretability of the features these models relied on was also affected by ICV correction: brain-behavior associations were unique across children and adults regardless of the correction, but unique across sexes only for models based on ICV-corrected measures of cortical size and uncorrected measures of cortical thickness. Taken together, these findings emphasize that we must carefully consider individual differences in ICV when evaluating brain-behavior associations across populations, as those differences, and their potential interactions with demographics, environmental factors, and cognitive performance, may influence the strength and associated interpretability of the underlying relationships.

References (107 in total; 10 listed)

1. Intracranial capacity and brain volumes are associated with cognition in healthy elderly men.

Authors: A M J MacLullich; K J Ferguson; I J Deary; J R Seckl; J M Starr; J M Wardlaw
Journal: Neurology Date: 2002-07-23

2. Accelerated longitudinal cortical thinning in adolescence.

Authors: Dongming Zhou; Catherine Lebel; Sarah Treit; Alan Evans; Christian Beaulieu
Journal: Neuroimage Date: 2014-10-13

3. I. NIH Toolbox Cognition Battery (CB): introduction and pediatric data.

Authors: Sandra Weintraub; Patricia J Bauer; Philip David Zelazo; Kathleen Wallner-Allen; Sureyya S Dikmen; Robert K Heaton; David S Tulsky; Jerry Slotkin; David L Blitz; Noelle E Carlozzi; Richard J Havlik; Jennifer L Beaumont; Dan Mungas; Jennifer J Manly; Beth G Borosh; Cindy J Nowinski; Richard C Gershon
Journal: Monogr Soc Res Child Dev Date: 2013-08

4. Maximal brain size remains an important predictor of cognition in old age, independent of current brain pathology.

Authors: Sarah Tomaszewski Farias; Dan Mungas; Bruce Reed; Owen Carmichael; Laurel Beckett; Danielle Harvey; John Olichney; Amanda Simmons; Charles Decarli
Journal: Neurobiol Aging Date: 2011-05-04

5. Neuroimaging of the Philadelphia neurodevelopmental cohort.

Authors: Theodore D Satterthwaite; Mark A Elliott; Kosha Ruparel; James Loughead; Karthik Prabhakaran; Monica E Calkins; Ryan Hopson; Chad Jackson; Jack Keefe; Marisa Riley; Frank D Mentch; Patrick Sleiman; Ragini Verma; Christos Davatzikos; Hakon Hakonarson; Ruben C Gur; Raquel E Gur
Journal: Neuroimage Date: 2013-08-03

6. Brain Genomics Superstruct Project initial data release with structural, functional, and behavioral measures.

Authors: Avram J Holmes; Marisa O Hollinshead; Timothy M O'Keefe; Victor I Petrov; Gabriele R Fariello; Lawrence L Wald; Bruce Fischl; Bruce R Rosen; Ross W Mair; Joshua L Roffman; Jordan W Smoller; Randy L Buckner
Journal: Sci Data Date: 2015-07-07

7. The power-proportion method for intracranial volume correction in volumetric imaging analysis.

Authors: Dawei Liu; Hans J Johnson; Jeffrey D Long; Vincent A Magnotta; Jane S Paulsen
Journal: Front Neurosci Date: 2014-11-06

8. Morphometric Similarity Networks Detect Microscale Cortical Organization and Predict Inter-Individual Cognitive Variation.

Authors: Jakob Seidlitz; František Váša; Maxwell Shinn; Rafael Romero-Garcia; Kirstie J Whitaker; Petra E Vértes; Konrad Wagstyl; Paul Kirkpatrick Reardon; Liv Clasen; Siyuan Liu; Adam Messinger; David A Leopold; Peter Fonagy; Raymond J Dolan; Peter B Jones; Ian M Goodyer; Armin Raznahan; Edward T Bullmore
Journal: Neuron Date: 2017-12-21

9. Marked effects of intracranial volume correction methods on sex differences in neuroanatomical structures: a HUNT MRI study.

Authors: Carl W S Pintzka; Tor I Hansen; Hallvard R Evensmoen; Asta K Håberg
Journal: Front Neurosci Date: 2015-07-09

10. Waves of Maturation and Senescence in Micro-structural MRI Markers of Human Cortical Myelination over the Lifespan.

Authors: Håkon Grydeland; Petra E Vértes; František Váša; Rafael Romero-Garcia; Kirstie Whitaker; Aaron F Alexander-Bloch; Atle Bjørnerud; Ameera X Patel; Donatas Sederevicius; Christian K Tamnes; Lars T Westlye; Simon R White; Kristine B Walhovd; Anders M Fjell; Edward T Bullmore
Journal: Cereb Cortex Date: 2019-03-01