Genome-wide association analysis of 19,629 individuals identifies variants influencing regional brain volumes and refines their genetic co-architecture with cognitive and mental health traits.

Bingxin Zhao¹, Tianyou Luo¹, Tengfei Li^2,3, Yun Li^1,4,5, Jingwen Zhang⁶, Yue Shan¹, Xifeng Wang¹, Liuqing Yang⁷, Fan Zhou¹, Ziliang Zhu¹, Hongtu Zhu^8,9.

Abstract

Volumetric variations of the human brain are heritable and are associated with many brain-related complex traits. Here we performed genome-wide association studies (GWAS) of 101 brain volumetric phenotypes using the UK Biobank sample including 19,629 participants. GWAS identified 365 independent genetic variants exceeding a significance threshold of 4.9 × 10−10, adjusted for testing multiple phenotypes. A gene-based association study found 157 associated genes (124 new), and functional gene mapping analysis linked 146 additional genes. Many of the discovered genetic variants and genes have previously been implicated in cognitive and mental health traits. Through genome-wide polygenic-risk-score prediction, more than 6% of the phenotypic variance (P = 3.13 × 10−24) in four other independent studies could be explained by the UK Biobank GWAS results. In conclusion, our study identifies many new genetic associations at the variant, locus and gene levels and advances our understanding of the pleiotropy and genetic co-architecture between brain volumes and other traits.

Year: 2019 PMID: 31676860 PMCID: PMC6858580 DOI: 10.1038/s41588-019-0516-6

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

Regional brain volumes are heritable measures of brain functional and structural changes. Volumetric variations of the human brain are known to be phenotypically and genetically associated with heritable cognitive and mental health traits[1-5], and understanding the shared genetic influences on these traits is an active research area[6]. Individual variations in human brain volume are usually quantified by magnetic resonance imaging (MRI). In region of interest (ROI)-based analysis, whole brain MRIs are processed and annotated onto many pre-defined ROIs, and then regional volumetric phenotypes are generated to measure the structure of brain ROIs. Both twin and population-based studies have shown that these volumetric phenotypes can be highly or moderately heritable. The heritability of brain regions estimated from twin studies can be larger than 80%[7-12]. For example, the heritability of basal ganglia structures (putamen, caudate, pallidum) and limbic and diencephalic regions (hippocampus, amygdala, thalamus) was reported to range from 0.60 to 0.85[11]. Common genetic variants (typically single-nucleotide polymorphisms (SNPs)) can account for more than 50% of the phenotypic variation in the general population[13-17]. The SNP heritability[18] estimates of accumbens area, amygdala, putamen, pallidum, caudate, thalamus and hippocampus range from 0.40 to 0.54[15]. A highly polygenic or omnigenic[19,20] genetic architecture has been observed, which indicates that a large number of genetic variants influence regional brain volumes and their genetic contributions are widespread across the genome. Several genome-wide association studies (GWAS)[3,14,17,21-25] have been conducted to identify genetic risk variants for brain volumetric phenotypes.
However, except for the whole brain volume and volumes of a few specific ROIs (e.g., hippocampus in subcortical area[3,17,26]), GWAS of most brain volumetric phenotypes have been insufficiently powered: the largest discovery GWAS sample size, in Elliott et al.[14], was less than 10,000. This sample size is much smaller than those of recent GWAS of other heritable brain-related traits, such as cognitive function[27], neuroticism[28], and intelligence[29], where sample sizes ranged from 269,867 to 449,484. Given the polygenic nature of brain volumes, most of the genetic risk variants may remain undetected, and GWAS with larger sample sizes can uncover more associated variants and enrich the pleiotropy and genetic co-architecture with other traits. Recently, the UK Biobank (UKB[30]) study team has collected and released MRI data for more than 20,000 participants. In addition, publicly available imaging genetic datasets have also emerged from several other independent studies, including Philadelphia Neurodevelopmental Cohort (PNC[31]), Alzheimer’s Disease Neuroimaging Initiative (ADNI[32]), Pediatric Imaging, Neurocognition, and Genetics (PING[33]), and the Human Connectome Project (HCP[34]), among others. These datasets provide a new opportunity to perform better-powered GWAS of all ROI brain volumes. Here we downloaded the raw MRI data from these data resources and processed the data using consistent standard procedures via advanced normalization tools (ANTs[35,36]) to generate 101 regional (and total) brain volume phenotypes (referred to as ROI volumes), including total brain volume (TBV), gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF). We used 19,629 UKB individuals of British ancestry in the main discovery GWAS. Four other datasets with relatively small sample sizes (total sample size 2,192 after quality controls) were used to validate the UKB findings, and finally, a meta-analysis was performed to combine all the data.
We started our analysis of UKB data by estimating SNP heritability, which is the proportion of phenotypic variation that can be explained by the additive effects of all common autosomal variants[37]. Since the UKB MRI data were released at different time points, we organized them in two parts: the first part was released in 2017 (which we refer to as phase 1, n = 9,198), most of which has been analyzed in Elliott et al.[14], and the second part was released in 2018 (which we refer to as phase 2, n = 10,431). To detect any potential heterogeneity between the two phases, we compared the SNP heritability estimated in phase 2 data to those in phase 1 data. We then carried out GWAS to identify the associated genetic variants for each ROI volume. We performed gene-based association analysis via MAGMA[38] to uncover gene-level associations, and performed post-GWAS functional mapping and annotation (FUMA[39]) to explore the functional consequences of the significant genetic variants. We calculated the pairwise genetic correlation between ROI volumes and 50 brain-related complex traits by the linkage disequilibrium (LD) score regression (LDSC[40]). To confirm the robustness of UKB GWAS findings, we jointly analyzed the UKB GWAS results with those from PNC, ADNI, PING and HCP. We developed genome-wide polygenic risk scores (PRS) to assess the predictive ability of the UKB GWAS results on the four other datasets. GWAS summary statistics of the UKB sample and meta-analysis for the five studies have been made publicly available at https://med.sites.unc.edu/bigs2/data/gwas-summary-statistics/.

RESULTS

SNP heritability estimates of the two UKB phases.

In Supplementary Figure 1, we compare the SNP heritability (h²) estimated separately from UKB phase 1 and 2 data. The sample correlation coefficient of these estimates was 0.85, indicating a moderate to high level of agreement between the two phases in the degree of genetic contribution to ROI volumes. The mean h² across 101 ROI volumes was 0.41 for phase 1 and 0.37 for phase 2. The difference in mean h² was not significant (two-sided t-test, P = 0.12). Ten ROIs had h² estimates > 0.6 in both phases, including TBV, cerebellar vermal lobules VIII-X, cerebellar vermal lobules I-V, brain stem, left/right cerebellum exterior, left/right cerebellum white matter, and left/right putamen. The h² estimates from the combined data were highly correlated with those from phase 1 (correlation = 0.93) and phase 2 (correlation = 0.95) (Supplementary Figs. 2 and 3). The h² estimates and the corresponding 95% confidence intervals (CIs) are illustrated in Supplementary Figures 4-6. The h² estimates, standard errors, raw and Bonferroni-corrected P-values from the one-sided likelihood ratio tests are provided in Supplementary Table 1. In the combined data, h² of most ROIs was significant after Bonferroni correction for multiple testing (mean h² = 0.40, h² range = (0.12, 0.72), standard error = 0.15). SNP heritability estimates of left basal forebrain (h² = 0.10) and optic chiasm (h² = 0.06) were not significant. These h² estimates were comparable with previous results[14,15]. In addition, for each ROI, we examined the genetic correlation (gc) of its regional volumes collected in the two phases. The gc estimates were distributed around one, and the 95% CIs of the gc estimates covered one for most ROIs (Supplementary Table 1 and Supplementary Fig. 7). In summary, SNP heritability and genetic correlation analyses indicate that most ROI volumes are heritable and have a largely consistent genetic basis across the two phases of data.

Significant GWAS associations of 101 ROI volumes.

We carried out GWAS of the 101 ROI volumes using 8,944,375 genetic variants after genotyping quality controls. Manhattan and QQ plots of all 101 phenotypes are displayed in Supplementary Datasets 1 and 2, respectively. In the rest of this paper, we use 4.9 × 10−10 (that is, 5 × 10−8/101, additionally adjusted for all 101 GWAS performed) as the significance threshold for genetic variant-level associations unless otherwise stated. We found that 365 independent significant variants had 494 significant associations with 58 ROIs (Supplementary Tables 2 and 3) at the 4.9 × 10−10 significance level. Independent significant variants were defined as significant variants that were independent of other significant variants by FUMA (Methods). The number of associations for each ROI is displayed in Figure 1 and Supplementary Table 2. Left/right hippocampus, left/right putamen, and cerebellar vermal lobules VIII-X had at least 30 independent significant variants. The number of independent significant associations on each chromosome is shown in Supplementary Table 4. Chromosome 12 had the largest number of independent variant-level associations after weighting by chromosome length (Supplementary Fig. 8).
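
The phenotype-adjusted threshold quoted above is a Bonferroni-style division of the conventional genome-wide level by the number of GWAS performed. A minimal Python sketch of that arithmetic follows; the function and variable names are ours and serve only as illustration.

```python
# Conventional genome-wide significance level and the number of phenotypes,
# both taken from the text above.
GENOME_WIDE_ALPHA = 5e-8
N_PHENOTYPES = 101


def adjusted_threshold(alpha: float, n_tests: int) -> float:
    """Bonferroni-style adjustment: divide the per-GWAS level by the number of GWAS."""
    return alpha / n_tests


threshold = adjusted_threshold(GENOME_WIDE_ALPHA, N_PHENOTYPES)
print(f"{threshold:.2e}")  # 4.95e-10, reported in the text as 4.9 × 10−10
```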

Figure 1 ∣

Number of independent significant variant-level associations discovered in UKB GWAS (n = 19,629 subjects) at different significance levels.

The P-values are raw P-values of two-sided t-test statistics. The outer layer counts the number of associations for each ROI volume with P < 5 × 10−8, the middle layer counts the ones with P < 5 × 10−9, and inner layer counts P < 4.9 × 10−10. The 4.9 × 10−10 threshold corresponds to adjusting for testing multiple imaging phenotypes with Bonferroni correction.

Based on the pre-calculated LD structure from the 1000 Genomes reference panel[41], variants in LD with independent significant variants were identified and then (independent) lead variants and genetic risk loci were defined (Methods). The 494 independent significant variant-level associations were further characterized as 170 significant associations between genetic risk loci and ROI volumes (Supplementary Table 5). Brain stem, X4th ventricle, cerebellar vermal lobules VIII-X, cerebellar vermal lobules VI-VII, left/right putamen, left/right cerebellum exterior, left/right hippocampus, left/right lateral ventricle, left pallidum, TBV and WM had at least five associated loci (Supplementary Table 2). Each chromosome had at least one associated locus except for chromosomes 13, 21 and 22 (Supplementary Table 6). Results at significance thresholds 5 × 10−8 and 5 × 10−9 are also provided in the above tables and summarized in Supplementary Table 7. We also performed association analysis for 283,120 genetic variants on the X chromosome (Methods) but observed no significant association at the 4.9 × 10−10 significance level.

Concordance with previous GWAS results.

We performed association lookups for the 365 independent significant variants and their correlated variants in the NHGRI-EBI GWAS catalog[42]. We found that 166 independent significant variants (associated with 47 ROI volumes) have previously reported GWAS associations with other traits (Supplementary Table 8). Our results tagged many variants that were previously reported in GWAS of ROI volumes, including 19 variants in van der Meer et al.[3] for hippocampal subfield volumes, 12 in Hibar et al.[17] for subcortical brain region volumes, 6 in Chen et al.[43] for putamen volume, 4 in Bis et al.[25] for hippocampal volume, 2 in Hibar et al.[21] for hippocampal volume, 2 in Stein et al.[44] for brain structure, 2 in Ikram et al.[24] for intracranial volume, 1 in Furney et al.[45] for whole brain volume, and 1 in Baranzini et al.[46] for normalized brain volume (Supplementary Table 9). For the other traits, we highlighted previous associations of 46 variants with mental health disorders (such as schizophrenia, autism spectrum disorder (ASD), and depression), 98 with cognitive functions, 25 with educational attainment, 24 with neuroticism, 14 with Parkinson’s disease, 4 with reaction time, and 3 with Alzheimer’s disease. We observed more overlap with previous GWAS results when the significance threshold was relaxed to 5 × 10−8 (Supplementary Table 10). We also compared our results with those reported in Elliott et al.[14], who performed GWAS of 3,144 imaging phenotypes (including brain volume phenotypes processed by FreeSurfer[47]) using the UKB phase 1 data (n = 8,428). When both were corrected for the number of GWAS analyses performed, 26 of the 78 significant variants reported in Elliott et al.[14] were in LD (r2 ≥ 0.6) with our independent significant variants (Supplementary Table 11). When both were relaxed to the 5 × 10−8 significance threshold, 124 of their 616 significant variants were in LD with our independent significant variants.

Gene-based association analysis and functional mapping.

We performed gene-based association analysis with GWAS summary statistics for 18,796 candidate genes (Methods). We found 281 significant gene-level associations (P < 2 × 10−8, adjusted for multiple traits) between 157 genes and 55 ROIs (Supplementary Table 12). Our results replicated 33 genes discovered in previous studies, including FOXO3 in Baranzini et al.[46] for normalized brain volume, GATAD2B in Hibar et al.[48] for lentiform nucleus volume, GNA12 in Sprooten et al.[49] for white matter integrity, MCC in Kim and Webster[50] for brain cytoarchitecture, HMGA2 and HRK in Stein et al.[44] for brain structure, KANSL1, MAPT, STH and CENPW in Ikram et al.[24] for intracranial volume, GMNC, WNT3 and PDCD11 in Klein et al.[51] for intracranial volume, SLC44A5 in Furney et al.[45] for whole brain volume, MSRB3, BCL2L1, DCC and CRHR1 in Hibar et al.[17] for subcortical brain region volumes, LEMD3, WIF1 and ASTN2 in Bis et al.[25] for hippocampal volume, MAST4, FAM53B, METTL10 and FAF1 in van der Meer et al.[3] for hippocampal subfield volumes, DSCAML1 and KTN1 in Chen et al.[43] for putamen volume, and ZIC4, VCAN, PAPPA, DRAM1, DAAM1 and ALDH1A2 in Elliott et al.[14] for brain imaging measurements. We found that 124 genes were novel and had not been linked to ROI volumes previously (Supplementary Table 13). Of the 157 detected genes, 70 have previously been implicated in cognitive functions, intelligence, education, neuroticism, neuropsychiatric and neurodegenerative diseases/disorders, such as IGF2BP1[29,52], WNT3[27,28,53,54], PLEKHM[54-56], and AGBL2[28,54,57,58]. In particular, 47 of the 70 pleiotropic genes were novel for ROI volumes, and these findings thus uncover substantial gene-level pleiotropy between ROI volumes and these traits (Fig. 2).

Figure 2 ∣

Genes identified in gene-based association analysis of ROI volumes (n = 19,629 subjects) that have been linked to cognitive traits and mental health disorders in previous GWAS.

For each of the ROI-associated genes listed in the x-axis, we manually checked the previously reported associations on the NHGRI-EBI GWAS catalog (https://www.ebi.ac.uk/gwas/). The novel and previously reported genes of ROI volumes were labeled with two different colors (orange and green, respectively).

The independent significant variants were also annotated by functional consequences on gene functions (Supplementary Table 14 and Supplementary Fig. 9), and were subsequently mapped to genes according to physical position, expression quantitative trait loci (eQTL) association (for brain tissues), and 3D chromatin (Hi-C) interaction (Methods). Functional gene mapping yielded 505 significant associations for 279 genes and 53 ROIs (Supplementary Table 15). Of the 279 genes, 163 were not discovered in the above gene-based association analysis, which replicated more previous findings on ROI volumes, such as FBXW8 in Stein et al.[44] for brain structure, WNT16 in Zheng et al.[59] for cortical thickness, TBPL2 in Chen et al.[43] for putamen volume, FAT3 in Hibar et al.[17] for subcortical brain region volumes, FAM175B, LHPP, SLC4A10, RNFT2, TESC, FOXD2, DMRTA2, CDKN2C and DPP4 in van der Meer et al.[3] for hippocampal subfield volumes, and EPHA3, SLC39A8, BANK1, CHPT1, ACADM, FAM3C, L3HYPDH, JKAMP, and AQP9 in Elliott et al.[14] for brain imaging measurements. We found that 53 (41 new) of the 163 genes were associated with cognitive functions, intelligence, education, neuroticism, neuropsychiatric and neurodegenerative disorders, such as NT5C2[28,55,60,61], ADAM10[61,62], and GOSR1[27,55] (Supplementary Fig. 10). Particularly, 182 significant Hi-C interactions were observed in the Hi-C functional mapping analysis (Supplementary Table 16), which yielded 33 significant associations between 13 genes and 16 ROIs (Supplementary Table 17). Of the 13 genes, 5 were not mapped by physical position or eQTL association, such as C5orf64 for left pericalcarine. C5orf64 has been reported to be associated with cognitive functions and intelligence[27], education and math ability[55], as well as risk behaviors[63] and Alzheimer’s disease[64]. 
In addition, we explored the biological interpretations of our GWAS results by performing several enrichment and annotation analyses, including gene property analysis by MAGMA and chromatin-based annotation analysis by stratified LDSC[65] (Methods). To gain more insights into the biological mechanisms, we used DEPICT[66] and MAGMA to conduct gene set analysis (Methods). The results can be found in Supplementary Note and are summarized in Supplementary Tables 18-21. In general, though some positive results can be obtained from these analyses, the present GWAS still has limited power to infer the specific biological pathway(s) influencing brain ROI volumes, and future GWAS with larger sample size is needed to further explore the biological mechanisms of brain imaging phenotypes.

Joint analysis with four independent datasets.

To validate the UKB GWAS results, we repeated GWAS of 101 ROI volumes separately on data obtained from four other independent studies: PNC (n = 537), HCP (n = 334), PING (n = 461), and ADNI (n = 860). Due to the small sample sizes of these four datasets, the probability of replicating significant findings from the UKB was low. Instead, we checked whether the effect signs were concordant in the five studies and whether the P-values of top UKB risk variants decreased after meta-analysis (Methods). Smaller P-values after meta-analysis indicate similar variant effects in independent samples[67,68]. We carried out a joint analysis on 3,841,911 genetic variants that were present in all five sets of GWAS results. For the 7,310 significant associations (at the 4.9 × 10−10 significance level), 63.8% (4,666) had the same effect signs across the five studies, and 97.0% (7,090) had the same effect signs in at least four studies (including UKB). Specifically, the number of genetic variants that had the same effect sign as UKB was 6,823 (93.3%) for ADNI, 6,436 (88.0%) for HCP, 6,455 (88.3%) for PING, and 6,648 (91.0%) for PNC. An exact binomial test[69] showed significant non-random agreement in effect signs across all four studies (one-sided P < 2.2 × 10−16, null hypothesis: agreement has a probability of 0.5). Of the top 2,000 significant associations, 93.9% (1,877) had smaller P-values after meta-analysis, and 91.4% (6,678) of the 7,310 associations were enhanced. We then performed meta-analysis on all 8,944,375 UKB GWAS genetic variants (variants were allowed to be missing in the four independent datasets). Compared to the UKB GWAS results (Supplementary Table 2, Supplementary Fig. 11, and Supplementary Note), there were more significant associations after meta-analysis: 29,585 significant associations at the 5 × 10−8 significance level and 16,591 at the 4.9 × 10−10 significance level (Supplementary Table 22 and Supplementary Fig. 12).
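
The sign-concordance check can be illustrated with a one-sided exact binomial tail probability under the stated null (agreement probability 0.5). The sketch below is ours, not the authors' code: it assumes the test is applied per cohort to the concordance counts quoted above and treats the 7,310 associations as independent trials. Results are returned on the log10 scale because the probabilities are far smaller than the smallest positive floating-point number.

```python
from math import comb, log10


def log10_binomial_upper_tail(successes: int, trials: int) -> float:
    """log10 of P(X >= successes) for X ~ Binomial(trials, 0.5).

    The tail count is summed with exact integer arithmetic, then converted
    to a log10 probability to avoid floating-point underflow.
    """
    tail_count = sum(comb(trials, k) for k in range(successes, trials + 1))
    return log10(tail_count) - trials * log10(2)


# Concordance counts reported in the text (out of 7,310 associations).
N_ASSOCIATIONS = 7310
concordant = {"ADNI": 6823, "HCP": 6436, "PING": 6455, "PNC": 6648}

for cohort, count in concordant.items():
    print(cohort, round(log10_binomial_upper_tail(count, N_ASSOCIATIONS), 1))
```

Under these assumptions every cohort gives a log10 P-value far below log10(2.2 × 10−16), in line with the bound reported in the text.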

Genetic correlation with other traits.

We used the meta-analysis GWAS results to estimate the genetic correlation with other traits via LDSC. As positive controls, we first estimated the genetic correlation between several UKB ROI volumes (TBV, left/right thalamus proper, left/right caudate, left/right putamen, left/right pallidum, left/right hippocampus, left/right accumbens area) and their corresponding traits studied in the ENIGMA consortium[70]. The gc estimates were all significant (P < 4.13 × 10−6), and the average correlation was 0.95 (Supplementary Table 23). We then collected 50 sets of publicly available GWAS summary statistics (Supplementary Table 24) and calculated their pairwise genetic correlation with ROI volumes (Supplementary Table 25). We mainly focused on traits that showed evidence of pleiotropy in association lookups. There were 22 significant associations after adjusting for multiple testing by the Benjamini-Hochberg (B-H) procedure at the 0.05 level (Supplementary Table 26 and Supplementary Fig. 13). Significant genetic correlations linked 13 ROI volumes with general cognitive functions, education (education years, college completion), intelligence, numerical reasoning, reaction time, depressive symptoms, neuroticism, and bipolar disorder (BD) (Fig. 3), which matched our findings in variant and gene level lookups. In particular, TBV had positive correlations with cognitive functions, education, intelligence, and numerical reasoning (gc range = (0.20, 0.25), mean = 0.22, P-value range = (1.52 × 10−11, 3.45 × 10−5)). These results matched the previous finding that brain size has small but significant connections with cognitive performance[71]. Reaction time had negative correlations with left/right pallidum, left/right ventral DC, and WM (gc range = (−0.20, −0.13), P-value range = (3.80 × 10−7, 1.14 × 10−4)). The negative correlations between reaction time and WM volumes have been previously reported[72,73]. Further details can be found in Supplementary Note.
When the FDR level was relaxed to 0.1, suggestive evidence was observed for more brain-related traits, such as ASD and sleep traits (Supplementary Table 26 and Supplementary Fig. 14). In conclusion, our results confirm the significant genetic correlation among these traits and quantify the degree of their genetic overlaps.
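
For readers unfamiliar with the Benjamini-Hochberg step-up procedure used here, a minimal Python sketch is given below. The P-values are made-up toy numbers, not results from this study, and the function name is ours.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return booleans marking which hypotheses are rejected at the given FDR level."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k (step-up rule).
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Toy P-values for illustration only.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(sum(benjamini_hochberg(toy_p, fdr=0.05)))  # 2 rejections
print(sum(benjamini_hochberg(toy_p, fdr=0.10)))  # 7 rejections
```

Relaxing the level from 0.05 to 0.1 admits more findings in the toy example, which mirrors how additional suggestive correlations appear at the relaxed FDR level in the text.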

Figure 3 ∣

Selected pairwise genetic correlations between ROI volumes (n = 21,821 subjects) and other traits.

The pairwise genetic correlations were estimated and tested by LDSC (https://github.com/bulik/ldsc). Stars are significant associations after adjusting for multiple testing by the Benjamini-Hochberg procedure at 0.05 significance level. The y-axis lists the ROI volumes. The x-axis provides the name of cognitive or mental health traits, the consortium sharing the GWAS summary statistics, and the corresponding sample sizes (see Supplementary Table 24 for further information about these studies).

Predictive ability of the UKB GWAS results.

We examined the out-of-sample prediction power of the UKB GWAS summary statistics using polygenic risk scores prediction[74]. We first used a ten-fold cross-validation design to examine the prediction power within the UKB sample for seven ROIs, including thalamus proper, caudate, putamen, pallidum, hippocampus, accumbens area, and TBV (Methods). The polygenic profiles can explain 1.18%-3.93% phenotypic variance (P-value range = (7.88 × 10−210, 4.90 × 10−72)) for these ROIs. The largest R-squared 3.93% was observed on putamen. Next, we used ROI-derived profiles to carry out cross-trait prediction on brain-related traits including education, reaction time, numeric memory, and fluid intelligence. The largest R-squared of a single profile was 0.24% (P = 7.53 × 10−7), which occurred when using the TBV-derived profile to predict fluid intelligence. When putting the profiles of seven ROIs together in one multivariate model, the R-squared for predicting fluid intelligence can be improved to 0.52% (P = 1.89 × 10−9). These results are summarized in Supplementary Table 27. We then used the GWAS summary statistics of 19,629 UKB individuals to construct polygenic profiles on subjects in PNC, HCP, PING, and ADNI. We found that, for 11 ROIs (Fig. 4), the genetically predicted regional volume was significantly associated with the observed ROI volume in all four validation datasets after Bonferroni correction (that is, 101 × 4 = 404 tests), and can account for 1.17%-6.38% phenotypic variance (P-value range = (3.31 × 10−24, 1.68 × 10−5)) (Supplementary Table 28). For example, the R-squared of right putamen-derived profile was 6.38% in ADNI and 4.85% in PNC. Furthermore, 29 genetically predicted regional volumes were significant in at least three of the four datasets, 56 in at least two datasets, and 84 in at least one dataset (Supplementary Figs. 15-17). 
In summary, our within-UKB and out-of-UKB PRS analyses clearly indicate that UKB GWAS summary statistics of ROI volumes have widespread prediction power across ROIs. However, the R-squared can be low when predicting other brain-related complex traits. Such results are unsurprising because the genetic correlations among these traits were found to be small (though significant) in LDSC analysis.
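
In its general form, a polygenic profile is a weighted sum of an individual's allele counts, with weights taken from GWAS effect-size estimates. The Python sketch below shows only that general form; the effect sizes and genotypes are hypothetical, and the variant selection and other details of the actual procedure are described in the Methods rather than reproduced here.

```python
def polygenic_score(dosages, effect_sizes):
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele)."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size is needed per variant")
    return sum(beta * g for beta, g in zip(effect_sizes, dosages))


# Hypothetical effect sizes for three variants and dosages for two subjects.
toy_betas = [0.12, -0.05, 0.30]
toy_genotypes = {"subject_a": [0, 1, 2], "subject_b": [2, 2, 0]}

scores = {sid: polygenic_score(g, toy_betas) for sid, g in toy_genotypes.items()}
for sid, score in scores.items():
    print(sid, round(score, 2))
```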

Figure 4 ∣

Prediction accuracy (incremental R-squared) of polygenic risk scores constructed by UKB GWAS (n = 19,629 subjects) summary statistics on the four independent datasets.

The y-axis lists the ROI volumes (left/right cerebellum exterior, left/right putamen, left/right cerebellum white matter, left hippocampus, cerebellar vermal lobules VIII-X, X4th ventricle, right accumbens area and TBV). The x-axis lists the four independent cohorts (ADNI, HCP, PING and PNC). The displayed numbers are the proportions of phenotypic variation that can be additionally explained by polygenic risk scores, i.e., the incremental R-squared (see Methods for details of polygenic risk prediction).
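
Incremental R-squared is commonly computed as the R-squared of a model with covariates plus the polygenic score minus the R-squared of the covariates-only model. The Python sketch below illustrates that definition with a small ordinary least-squares routine and toy data; it is an assumption about the general approach, not the authors' implementation, and all names and numbers are ours.

```python
def r_squared(predictors, y):
    """R-squared of an ordinary least-squares fit of y on the predictor columns.

    predictors is a list of rows (one per subject); an intercept is added here.
    """
    rows = [[1.0] + list(row) for row in predictors]
    n, p = len(rows), len(rows[0])
    # Augmented normal equations [X'X | X'y].
    system = [
        [sum(rows[i][a] * rows[i][b] for i in range(n)) for b in range(p)]
        + [sum(rows[i][a] * y[i] for i in range(n))]
        for a in range(p)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for r in range(col + 1, p):
            factor = system[r][col] / system[col][col]
            for k in range(col, p + 1):
                system[r][k] -= factor * system[col][k]
    # Back substitution for the regression coefficients.
    beta = [0.0] * p
    for a in reversed(range(p)):
        known = sum(system[a][k] * beta[k] for k in range(a + 1, p))
        beta[a] = (system[a][p] - known) / system[a][a]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r_squared(covariates, prs, y):
    """Extra variance explained when the score is added to a covariates-only model."""
    full = [list(c) + [s] for c, s in zip(covariates, prs)]
    return r_squared(full, y) - r_squared(covariates, y)


# Toy data: one covariate and a made-up score for six subjects.
toy_cov = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
toy_prs = [0.5, -1.0, 0.3, 1.2, -0.7, 0.1]
toy_y = [2.0 * c[0] + s for c, s in zip(toy_cov, toy_prs)]
toy_increment = incremental_r_squared(toy_cov, toy_prs, toy_y)
print(round(toy_increment, 3))
```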

DISCUSSION

In this study, we presented GWAS of 101 ROI volumes using data of 19,629 UKB individuals. Our novel contributions include: (i) identification of many new genetic associations at variant, locus, and gene levels; (ii) insights into the genetic co-architecture of brain volume phenotypes and other brain-related complex traits; (iii) validation of the UKB results in independent studies; and (iv) assessment of the predictive power of UKB GWAS results. Significant (P < 4.9 × 10−10) associations were found for 58 of the 101 ROIs. With a larger sample size, the present study replicated many known genetic variants but also prioritized new ones. Compared to Elliott et al.[14], our GWAS not only discovered more genetic variants, but also enriched the degree of (statistical) pleiotropy[75] of the associated genes and characterized the shared genetic influences with cognitive and mental health traits. Our SNP heritability estimates are in line with the results of previous twin studies. For example, our results supported previous findings that the degree of genetic control varies across different regions within the brain[7,12,76,77]. We also confirmed that cortical ROIs have larger variability in their heritability estimates than subcortical and ventricular ROIs[11]. In addition, some subcortical ROIs, such as putamen, cerebellum white matter, and brain stem[11,78], were confirmed to be highly heritable. On the other hand, SNP heritability estimates of ROI volumes were found to be generally lower than estimates reported in twin studies[7-10]. This is expected[79] and may indicate that genetic influences cannot be fully captured by additive effects of common genetic variants[37]. Such gaps may inspire future work to explore the effects of rare genetic variants on ROI volumes and to better model the genetic variation of the brain. The present GWAS still faces some limitations.
First, the current GWAS sample size of ROI volumes (and many other brain imaging phenotypes) is still far from sufficient. The highly polygenic genetic architecture of ROI volumes requires a larger number of individuals to identify many weak causal variants. In the era of sharing GWAS summary statistics, well-powered GWAS is essential for ROI volumes to be linked to the genetic co-architecture atlas with other complex traits. For example, a recent study by Watanabe et al.[75], which gave a global overview of the genetic co-architecture of 2,965 traits, focused only on GWAS with sample sizes larger than 50,000, and the average sample size of the selected traits was 256,276. In our genetic correlation analysis, we obtained only a limited number of significant correlations, even though many pleiotropic genes were found in association lookups. In addition, ROI-derived PRS currently may have insufficient power to predict other brain-related traits. Therefore, we expect that GWAS of ROI volumes with larger sample sizes will become available and will further improve our understanding of the genetic overlaps with other traits. Besides increasing the sample size, combining genotyping data with external information, such as gene expression data[80], may also help elucidate causal mechanisms, improve prediction performance, and identify genetic connections among traits. Second, potential imaging artifacts, such as MRI hardware and software changes[81], may cause unwanted variation in downstream genetic analyses, especially when combining multi-site and multiple-phase neuroimaging data[82-84]. In the present GWAS, we confirmed that the pairwise genetic correlations between UKB phase 1 and phase 2 data were distributed around one, and verified that the UKB GWAS results had satisfactory prediction ability on four other independent datasets. However, we found that the SNP heritability estimates of the two phases of data were not perfectly harmonized.
The inadequate GWAS sample size may partially explain the variation in these heritability estimates, but it is also possible that artificial factors impaired the consistency of our results (see Table 1 of Smith and Nichols[82] for a list of common imaging batch effects). Future studies that integrate data from more sites and phases are expected to be batch effects-aware and to confirm the previous GWAS findings.

METHODS

GWAS participants and phenotypes.

We performed GWAS separately on five publicly available datasets: the UK Biobank (UKB, http://www.ukbiobank.ac.uk/resources/) study, the Human Connectome Project (HCP, https://www.humanconnectome.org/) study, the Pediatric Imaging, Neurocognition, and Genetics (PING, http://pingstudy.ucsd.edu/resources/genomics-core.html) study, the Philadelphia Neurodevelopmental Cohort (PNC, https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000607.v1.p1) study, and the Alzheimer’s Disease Neuroimaging Initiative (ADNI, http://adni.loni.usc.edu/data-samples/) study. The main GWAS made use of data from 19,629 individuals of British ancestry in the UKB study, and the four other GWAS were performed on individuals of European ancestry (see Supplementary Table 29 for a summary of the sample size of each GWAS). The raw MRI, covariates and genetic data were downloaded from each data resource. We processed the MRI data locally using consistent procedures via advanced normalization tools (ANTs, http://stnava.github.io/ANTs/) to generate ROI volume phenotypes for each dataset. The processing steps are detailed in Supplementary Note, and we removed three ROIs (X5th ventricle and left/right lesion) with missing rates > 99%. For each phenotype and continuous covariate variable, we further removed values that were more than five times the median absolute deviation away from the median value. All individuals were aged between 3 and 92 years. More information about study cohorts can be found in Supplementary Table 30 and the Supplementary Note.
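
The outlier rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the median absolute deviation (MAD) is used unscaled (the text does not say whether a consistency constant was applied), outlying values are simply dropped, and the numbers are toy values.

```python
from statistics import median


def remove_mad_outliers(values, n_mads=5.0):
    """Keep values within n_mads median absolute deviations of the median."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return [v for v in values if abs(v - center) <= n_mads * mad]


# Toy measurements: the last value lies far from the rest.
toy_values = [10, 11, 9, 10, 12, 8, 10, 40]
print(remove_mad_outliers(toy_values))  # [10, 11, 9, 10, 12, 8, 10]
```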

Heritability estimation and genome-wide association analysis.

We estimated the proportion of variation explained by all autosomal genetic variants in UKB using GCTA-GREML analysis[85] (http://cnsgenomics.com/software/gcta/). The adjusted covariates included age (at imaging), age-squared, sex, age-sex interaction, age-squared-sex interaction, TBV (for ROIs other than TBV itself), as well as the top 40 genetic principal components (PCs) provided by UKB[86] (Data-Field 22009). The heritability estimates were tested in one-sided likelihood ratio tests. For autosomal genetic variants, we performed association analysis for each ROI volume using PLINK[87] (https://www.cog-genomics.org/plink2/), adjusting for the same set of covariates as in the GCTA-GREML analysis. The marginal genetic effects were tested in two-sided t-tests. GWAS were also separately performed on PING, PNC, ADNI, and HCP data. In these four datasets, we adjusted for age, age-squared, sex, age-sex interaction, age-squared-sex interaction, TBV (for ROIs other than TBV itself), and the top ten genetic PCs estimated from the genetic variants. We also adjusted for Alzheimer’s disease status in the ADNI GWAS. To examine the genetic correlation between UKB phase 1 and phase 2 data, we performed GWAS separately on data from the two phases. For genetic variants on the X chromosome, we performed association analysis using XWAS[88] (version 3.0, http://keinanlab.cb.bscb.cornell.edu/content/xwas/). We coded male genotypes on the X chromosome as 0/2, and sex was included as a covariate in the model.
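To make the covariate set and the X-chromosome coding concrete, the sketch below builds one individual's covariate row and recodes a male X-chromosome genotype. It is illustrative only: the function names are hypothetical, sex is assumed to be coded 0/1, and the actual analyses were run in GCTA, PLINK and XWAS rather than in code like this.

```python
def covariate_row(age, sex, tbv=None, pcs=()):
    """Build the covariate vector for one individual.

    Order: age, age^2, sex, age*sex, age^2*sex, then TBV (omitted when
    the phenotype is TBV itself), then the genetic PCs.
    Assumption: sex is coded 0/1.
    """
    row = [age, age ** 2, sex, age * sex, age ** 2 * sex]
    if tbv is not None:
        row.append(tbv)
    row.extend(pcs)
    return row


def x_dosage(allele_count, is_male):
    """Code an X-chromosome genotype as an additive dosage.

    Males carry a single X copy, so their 0/1 allele count is coded
    as 0/2; female counts (0/1/2) are returned unchanged.
    """
    if is_male:
        if allele_count not in (0, 1):
            raise ValueError("male X allele count must be 0 or 1")
        return 2 * allele_count
    if allele_count not in (0, 1, 2):
        raise ValueError("female X allele count must be 0, 1 or 2")
    return allele_count
```

Coding males as 0/2 puts a hemizygous male on the same scale as a homozygous female, which is why sex is still kept as a separate covariate in the model.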

Genomic risk loci characterization and comparison with previous findings.

Genomic risk loci were defined using the FUMA online platform (version 1.3.4, http://fuma.ctglab.nl/). We input the UKB GWAS summary statistics obtained from PLINK. FUMA first identified independent significant variants, which were defined as variants with a P-value smaller than the predefined threshold and independent of other significant variants at r² < 0.6. Using these independent significant variants, FUMA then constructed LD blocks by tagging all variants that had a MAF ≥ 0.0005 and were in LD (r² ≥ 0.6) with at least one of the independent significant variants. These variants included those from the 1000 Genomes reference panel and may not have been included in the present study. Based on these independent significant variants, (independent) lead variants were also identified as those that were independent from each other (r² < 0.1). If LD blocks of independent significant variants were close to each other (<250 kb based on the closest boundary variants of LD blocks), they were merged into a single genomic locus. Thus, each genomic locus could contain more than one independent significant variant and more than one lead variant. Independent significant variants and all the tagged variants were subsequently searched by FUMA in the NHGRI-EBI GWAS catalog (version 2019-01-31, https://www.ebi.ac.uk/gwas/) to look for their reported associations (P < 9 × 10−6) with any traits.
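The locus-merging step described above can be illustrated with a short interval-merging sketch. It is not FUMA's implementation: the function name and the `(chromosome, start, end)` tuple layout are assumptions made for illustration, with positions in base pairs and block boundaries given by the outermost tagged variants.

```python
def merge_ld_blocks(blocks, max_gap=250_000):
    """Merge LD blocks on the same chromosome into genomic loci.

    blocks: iterable of (chromosome, start_bp, end_bp) tuples.
    Two blocks are merged when the distance between their closest
    boundaries is below max_gap (250 kb by default).
    """
    loci = []
    for chrom, start, end in sorted(blocks):
        same_chrom = bool(loci) and loci[-1][0] == chrom
        if same_chrom and start - loci[-1][2] < max_gap:
            # Close enough to the previous locus: extend its right edge.
            loci[-1][2] = max(loci[-1][2], end)
        else:
            loci.append([chrom, start, end])
    return [tuple(locus) for locus in loci]
```

Sorting first means each block only needs to be compared with the most recently built locus, so a chain of neighbouring blocks collapses into one locus even if the first and last blocks are more than 250 kb apart.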

Gene-based association analysis and functional annotation.

Gene-based association analysis was carried out for 18,796 protein-coding genes using MAGMA (v1.07, https://ctg.cncr.nl/software/magma/), which was also implemented in FUMA. Genetic variants were mapped to genes according to their physical positions, and then the gene-based P-values were calculated from the GWAS summary statistics of the mapped variants. Default MAGMA parameters were used, which mapped genetic variants to genes with no window around genes (window size = 0). In functional annotation and mapping analysis, variant-level signals were annotated with their biological functionality and then were linked to genes by a combination of positional, eQTL, and 3D chromatin interaction mappings. Specifically, independent significant variants and all the tagged variants were first annotated for functional consequences on gene functions (e.g., intergenic, intronic, exonic) using ANNOVAR[89] (version 2017-01-11). Functionally-annotated variants were then mapped to 35,808 candidate genes based on physical position on the genome (tissue/cell types for 15-core chromatin state: brain), eQTL associations (tissue types: GTEx[90] v7 brain, BRAINEAC[91], and CommonMind Consortium[92]) and chromatin interaction mapping (built-in chromatin interaction data: dorsolateral prefrontal cortex, hippocampus[93]; annotate enhancer/promoter regions: E053-E082 brain[94]). We used default values for all other parameters. For the detected genes, we performed lookups in the NHGRI-EBI GWAS catalog (version 2019-05-03) again to explore the previously reported associations with the same or other traits. We focused on traits including cognitive functions (such as general cognitive ability, cognitive performance, and empathy quotient), intelligence, educational attainment, math ability (such as highest math class taken and self-reported math ability), reaction time, neuroticism, neurodegenerative diseases (such as Alzheimer’s disease and Parkinson’s disease), and neuropsychiatric disorders (such as major depressive disorder, schizophrenia, and bipolar disorder).

Biological annotation and enrichment analyses.

For the 14 brain tissues (GTEx[90] v7), we performed gene property analysis via MAGMA. That is, for each candidate gene, we tested whether its tissue-specific expression levels can be linked to the strength of its association with ROI volumes. We also performed cell-type/tissue-specific chromatin-based annotation analysis using stratified LDSC (https://github.com/bulik/ldsc/wiki/Cell-type-specific-analyses). The cell-type/tissue-specific annotations of DNase I hypersensitivity and activating histone marks (H3K27ac, H3K4me3, H3K4me1, H3K9ac and H3K36me3) were from the Roadmap Epigenomics consortium[94] and the ENCODE project[95]. For each annotation, we tested whether it had an enriched contribution to per-SNP heritability, conditional on the other annotations. DEPICT (version 1 rel194, https://github.com/perslab/depict) and MAGMA gene set analyses were used to explore the biological pathways implicated by the UKB GWAS summary statistics. Specifically, DEPICT tested 10,968 reconstituted gene sets, and the GWAS summary statistics with P < 10−5 were used as input. The MAGMA gene set analysis examined 10,678 gene sets from the Molecular Signatures Database[96] (MSigDB, v6.2, http://software.broadinstitute.org/gsea/msigdb), including 4,761 curated gene sets and 5,917 Gene Ontology (GO) terms. All parameters in these analyses were set to their default values.

Meta-analysis of GWAS results.

We meta-analyzed the UKB, PING, PNC, ADNI, and HCP GWAS summary results using METAL (https://genome.sph.umich.edu/wiki/METAL) with the sample-size weighted approach. Since the sample sizes of the four other datasets were small, we removed the variants that were not present in the UKB data.
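For readers unfamiliar with the sample-size weighted scheme, the sketch below shows the standard form of the calculation for a single variant: each study's signed Z-score is weighted by the square root of its sample size, and the weighted sum is rescaled to a combined Z-score. The function name is hypothetical, the sample sizes passed in are up to the caller, and the sketch omits the allele alignment and other bookkeeping that METAL performs.

```python
from math import sqrt
from statistics import NormalDist


def sample_size_weighted_meta(z_scores, sample_sizes):
    """Combine per-study Z-scores for one variant.

    Weights are w_i = sqrt(N_i); the combined statistic is
    Z = sum(w_i * z_i) / sqrt(sum(w_i^2)).
    Returns the combined Z-score and its two-sided P-value.
    Z-scores must already be signed relative to the same effect allele.
    """
    weights = [sqrt(n) for n in sample_sizes]
    z = sum(w * zi for w, zi in zip(weights, z_scores))
    z /= sqrt(sum(w * w for w in weights))
    p = 2.0 * NormalDist().cdf(-abs(z))
    return z, p
```

Because the weights grow with the square root of the sample size, a large cohort such as UKB dominates the combined statistic, while small cohorts mainly adjust it when their direction of effect agrees or disagrees.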

Genetic correlation estimation with LDSC.

LD Hub (v1.9.1, http://ldsc.broadinstitute.org/ldhub/) was used to estimate the genetic correlation between several UKB ROI volumes and their corresponding traits studied in the ENIGMA consortium (http://enigma.ini.usc.edu/). The LDSC software (v1.0.0, https://github.com/bulik/ldsc) was then used to estimate the pairwise genetic correlation with 50 sets of collected GWAS summary statistics. In addition, for each ROI, we also examined the genetic correlation between its regional volumes collected in UKB phases 1 and 2. We used the pre-calculated LD scores provided by LDSC (https://data.broadinstitute.org/alkesgroup/LDSCORE/), which were computed using 1000 Genomes European data. We used HapMap3[97] variants and removed all variants in the major histocompatibility complex (MHC) region.

Polygenic scoring.

Polygenic profiles were created to examine the out-of-sample prediction power of the GWAS results. Specifically, we used PLINK to generate risk scores in testing data by summing across variants, weighted by their effect sizes estimated from training data. To account for the LD structure, two procedures were used: (i) LD-based pruning (window size 50, step 5, r² = 0.2); and (ii) posterior effect size estimation under a continuous shrinkage prior with an external LD reference panel[98] (https://github.com/getian107/PRScs). We tried five P-value thresholds for predictor selection in each of the two procedures: 1, 0.5, 0.05, 5 × 10−4 and 5 × 10−8. Thus, ten polygenic profiles were generated for each ROI volume, and we reported the best prediction power that can be achieved by a single profile of the ten. The association between polygenic profile and phenotype was estimated and tested in a linear regression model, adjusting for the effects of age and sex. The additional phenotypic variation that can be explained by the polygenic profile (i.e., the incremental R-squared) was used to measure the prediction power. For the UKB dataset, we randomly divided the 19,629 UKB individuals into ten folds, then used nine of these folds as training data to rerun GWAS, and created polygenic profiles on the individuals in the remaining fold, which served as testing data. We repeated this procedure ten times such that each fold served as the testing data exactly once. We examined seven ROIs including thalamus proper, caudate, putamen, pallidum, hippocampus, accumbens area, and TBV. For the first six ROIs, the volume was the sum of the volumes of the corresponding left and right ROIs. We then used these ROI-derived profiles to predict four brain-related traits: education (Data-Field: 845), reaction time (Data-Field: 20023), numeric memory (Data-Field: 4282), and fluid intelligence (Data-Field: 20016). We first assessed the cross-trait prediction ability of each profile, and then we selected the best profile for each ROI and put the seven profiles together in one model for multivariate analysis. Next, we used the UKB GWAS results to perform prediction on ADNI, PING, PNC and HCP data for all 101 ROI volumes. The prediction accuracy was evaluated on all samples in the four testing sets (with phenotype and genetic data available), not limited to individuals of European ancestry used in GWAS.
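The scoring and evaluation steps can be sketched in a few lines. The example below computes a polygenic score as a weighted sum of allele counts and then measures the incremental R-squared as the gain in R-squared when the score is added to a covariate-only linear model. It is a simplified, pure-Python illustration under stated assumptions: all function names are hypothetical, an intercept is added automatically, and the actual scores were produced with PLINK and PRS-CS rather than with this code.

```python
def polygenic_score(allele_counts, effect_sizes):
    """Weighted sum of allele counts for one individual."""
    return sum(g * b for g, b in zip(allele_counts, effect_sizes))


def _solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on columns plus intercept."""
    n = len(y)
    design = [[1.0] * n] + [list(c) for c in columns]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    beta = _solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, design)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2(scores, covariates, y):
    """Gain in R^2 from adding the polygenic score to the covariate model."""
    full = r_squared(list(covariates) + [scores], y)
    return full - r_squared(covariates, y)
```

Here `covariates` would hold one column each for age and sex, and `scores` one polygenic score per individual in the testing fold. Reporting the difference between the two R-squared values, rather than the full-model value, isolates the variance attributable to the polygenic profile.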

Reporting summary.

Further information on research design is available in the Life Sciences Reporting Summary linked to this article.

Data availability

The data used in this work were obtained from five publicly available datasets: the UK Biobank (UKB) study, the Human Connectome Project (HCP) study, the Pediatric Imaging, Neurocognition, and Genetics (PING) study, the Philadelphia Neurodevelopmental Cohort (PNC) study, and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. We used 50 sets of publicly available GWAS summary statistics from several GWAS databases. The data resources are summarized in Supplementary Table 24. All UKB and meta-analysis GWAS summary statistics of 101 ROI volumes can be found at: https://med.sites.unc.edu/bigs2/data/gwas-summary-statistics/.

Code availability

We made use of publicly available software and tools. All code used to generate the results reported in this paper is available upon request.

80 in total

1. The relationship between brain volumes and intelligence in bipolar disorder.

Authors: Annabel Vreeker; Lucija Abramovic; Marco P M Boks; Sanne Verkooijen; Annet H van Bergen; Roel A Ophoff; René S Kahn; Neeltje E M van Haren
Journal: J Affect Disord Date: 2017-07-06 Impact factor: 4.839

2. Conceptual and data-based investigation of genetic influences and brain asymmetry: a twin study of multiple structural phenotypes.

Authors: Lisa T Eyler; Eero Vuoksimaa; Matthew S Panizzon; Christine Fennema-Notestine; Michael C Neale; Chi-Hua Chen; Amy Jak; Carol E Franz; Michael J Lyons; Wesley K Thompson; Kelly M Spoon; Bruce Fischl; Anders M Dale; William S Kremen
Journal: J Cogn Neurosci Date: 2013-11-27 Impact factor: 3.225

3. Genetic and environmental influences on the size of specific brain regions in midlife: the VETSA MRI study.

Authors: William S Kremen; Elizabeth Prom-Wormley; Matthew S Panizzon; Lisa T Eyler; Bruce Fischl; Michael C Neale; Carol E Franz; Michael J Lyons; Jennifer Pacheco; Michele E Perry; Allison Stevens; J Eric Schmitt; Michael D Grant; Larry J Seidman; Heidi W Thermenos; Ming T Tsuang; Seth A Eisen; Anders M Dale; Christine Fennema-Notestine
Journal: Neuroimage Date: 2009-09-26 Impact factor: 6.556

4. Genetic and environmental influences on neuroimaging phenotypes: a meta-analytical perspective on twin imaging studies.

Authors: Gabriëlla A M Blokland; Greig I de Zubicaray; Katie L McMahon; Margaret J Wright
Journal: Twin Res Hum Genet Date: 2012-06 Impact factor: 1.587

5. What twin studies tell us about the heritability of brain development, morphology, and function: a review.

Authors: Arija G Jansen; Sabine E Mous; Tonya White; Danielle Posthuma; Tinca J C Polderman
Journal: Neuropsychol Rev Date: 2015-02-12 Impact factor: 7.444

6. Distinct Genetic Influences on Cortical and Subcortical Brain Structures.

Authors: Wei Wen; Anbupalam Thalamuthu; Karen A Mather; Wanlin Zhu; Jiyang Jiang; Pierre Lafaye de Micheaux; Margaret J Wright; David Ames; Perminder S Sachdev
Journal: Sci Rep Date: 2016-09-06 Impact factor: 4.379

7. Do regional brain volumes and major depressive disorder share genetic architecture? A study of Generation Scotland (n=19 762), UK Biobank (n=24 048) and the English Longitudinal Study of Ageing (n=5766).

Authors: E M Wigmore; T-K Clarke; D M Howard; M J Adams; L S Hall; Y Zeng; J Gibson; G Davies; A M Fernandez-Pujals; P A Thomson; C Hayward; B H Smith; L J Hocking; S Padmanabhan; I J Deary; D J Porteous; K K Nicodemus; A M McIntosh
Journal: Transl Psychiatry Date: 2017-08-15 Impact factor: 6.222

8. Beyond a bigger brain: Multivariable structural brain imaging and intelligence.

Authors: Stuart J Ritchie; Tom Booth; Maria Del C Valdés Hernández; Janie Corley; Susana Muñoz Maniega; Alan J Gow; Natalie A Royle; Alison Pattie; Sherif Karama; John M Starr; Mark E Bastin; Joanna M Wardlaw; Ian J Deary
Journal: Intelligence Date: 2015 Jul-Aug

9. Genome-wide association study of cognitive functions and educational attainment in UK Biobank (N=112 151).

Authors: G Davies; R E Marioni; D C Liewald; W D Hill; S P Hagenaars; S E Harris; S J Ritchie; M Luciano; C Fawns-Ritchie; D Lyall; B Cullen; S R Cox; C Hayward; D J Porteous; J Evans; A M McIntosh; J Gallacher; N Craddock; J P Pell; D J Smith; C R Gale; I J Deary
Journal: Mol Psychiatry Date: 2016-04-05 Impact factor: 15.992

10. Genome-wide association studies of brain imaging phenotypes in UK Biobank.

Authors: Lloyd T Elliott; Kevin Sharp; Fidel Alfaro-Almagro; Sinan Shi; Karla L Miller; Gwenaëlle Douaud; Jonathan Marchini; Stephen M Smith
Journal: Nature Date: 2018-10-10 Impact factor: 49.962
