Sahin Naqvi1,2, Yoeri Sleyp3, Hanne Hoskens3,4, Karlijne Indencleef4,5, Jeffrey P Spence6, Rose Bruffaerts7,8,9, Ahmed Radwan4,10, Ryan J Eller11, Stephen Richmond12, Mark D Shriver13, John R Shaffer14,15, Seth M Weinberg14,15,16, Susan Walsh11, James Thompson17, Jonathan K Pritchard6, Stefan Sunaert4,10, Hilde Peeters3, Joanna Wysocka18,19,20, Peter Claes21,22,23,24. 1. Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA, USA. naqvi@stanford.edu. 2. Departments of Genetics and Biology, Stanford University School of Medicine, Stanford, CA, USA. naqvi@stanford.edu. 3. Department of Human Genetics, KU Leuven, Leuven, Belgium. 4. Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium. 5. Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium. 6. Departments of Genetics and Biology, Stanford University School of Medicine, Stanford, CA, USA. 7. Department of Neurosciences, KU Leuven, Leuven, Belgium, Hasselt University, Hasselt, Belgium. 8. Neurology Department, University Hospitals Leuven, Leuven, Belgium, Hasselt University, Hasselt, Belgium. 9. Biomedical Research Institute Hasselt University Hasselt Belgium, Hasselt University, Hasselt, Belgium. 10. Department of Imaging and Pathology, Translational MRI, KU Leuven, Leuven, Belgium. 11. Department of Biology, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA. 12. Applied Clinical Research and Public Health, School of Dentistry, Cardiff University, Cardiff, UK. 13. Department of Anthropology, Pennsylvania State University, State College, PA, USA. 14. Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA. 15. Department of Oral and Craniofacial Sciences, Center for Craniofacial and Dental Genetics, University of Pittsburgh, Pittsburgh, PA, USA. 16. Department of Anthropology, University of Pittsburgh, Pittsburgh, PA, USA. 17. Department of Psychology, George Mason University, Fairfax, VA, USA. 18. Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA, USA. wysocka@stanford.edu. 19. Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA. wysocka@stanford.edu. 20. Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA, USA. wysocka@stanford.edu. 21. Department of Human Genetics, KU Leuven, Leuven, Belgium. peter.claes@kuleuven.be. 22. Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium. peter.claes@kuleuven.be. 23. Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium. peter.claes@kuleuven.be. 24. Murdoch Children's Research Institute, Melbourne, Victoria, Australia. peter.claes@kuleuven.be.
Abstract
Evidence from model organisms and clinical genetics suggests coordination between the developing brain and face, but the role of this link in common genetic variation remains unknown. We performed a multivariate genome-wide association study of cortical surface morphology in 19,644 individuals of European ancestry, identifying 472 genomic loci influencing brain shape, of which 76 are also linked to face shape. Shared loci include transcription factors involved in craniofacial development, as well as members of signaling pathways implicated in brain-face cross-talk. Brain shape heritability is equivalently enriched near regulatory regions active in either forebrain organoids or facial progenitors. However, we do not detect significant overlap between shared brain-face genome-wide association study signals and variants affecting behavioral-cognitive traits. These results suggest that early in embryogenesis, the face and brain mutually shape each other through both structural effects and paracrine signaling, but this interplay may not impact later brain development associated with cognitive function.
Evidence from model organisms and clinical genetics suggests coordination between the developing brain and face, but the role of this link in common genetic variation remains unknown. We performed a multivariate genome-wide association study of cortical surface morphology in 19,644 individuals of European ancestry, identifying 472 genomic loci influencing brain shape, of which 76 are also linked to face shape. Shared loci include transcription factors involved in craniofacial development, as well as members of signaling pathways implicated in brain-face cross-talk. Brain shape heritability is equivalently enriched near regulatory regions active in either forebrain organoids or facial progenitors. However, we do not detect significant overlap between shared brain-face genome-wide association study signals and variants affecting behavioral-cognitive traits. These results suggest that early in embryogenesis, the face and brain mutually shape each other through both structural effects and paracrine signaling, but this interplay may not impact later brain development associated with cognitive function.
The human cerebral cortex forms the outer layer of gray matter of the brain and underpins cognitive function. It is characterized by complex folding patterns varying between species and individuals[1,2]. Family- and twin-based studies indicate substantial heritability of brain shape[3,4], and a recent genome-wide association study (GWAS) found that brain shape is highly polygenic with genetic correlations to a broad range of neuropsychiatric disorders and behavioral-cognitive phenotypes[5]. These studies focused on pre-defined, univariate measures of brain shape, such as total or regional surface area, extracted from structural magnetic resonance imaging (MRI) scans[6], which cannot capture morphological complexities of the cortical surface. We recently developed a data-driven approach to phenotyping complex, multidimensional traits[7]; this multivariate approach, when applied to facial surface images, revealed numerous loci with no previously known role in human face shape variation[7,8]. Here, we implemented this approach to discover associations between common genetic variants and brain shape, using MRI data from middle-aged participants in the UK Biobank (UKB) free of disease diagnosis.In addition to sharing complex morphologies, the development of the brain and face is highly integrated due to shared developmental lineage, spatial proximity, and signaling crosstalk between both structures[9]. Early in embryonic development, the rostral end of the ectodermally-derived neural tube gives rise to the forebrain, which in turn gives rise to the cerebrum that encompasses the cerebral cortex[10]. Just before forebrain formation, a subset of neuroepithelial cells within the neural folds give rise to facial progenitor cells called cranial neural crest cells (CNCCs)[11]. Following specification, CNCCs undergo an epithelial-to-mesenchymal transition and migrate ventrally[12], giving rise to most of the craniofacial skeleton and connective tissue[13]. Early brain growth rates can modulate both positioning and outgrowth of the facial prominences[14,15], as well as induce flexion and bone deposition of CNCC-derived basicranial bones[16,17] and neurocranial sutures[18,19], respectively. Finally, paracrine factors secreted by either the developing forebrain[20-23] or CNCCs[24-26] modulate the face or brain development, respectively.These physical and molecular interactions have been detailed by studies in developing chick and mouse embryos, but are also supported by widespread co-occurrence of neurodevelopmental and craniofacial malformations in rare human syndromes[27]. This phenomenon was noticed by Demyer et al.[28] in 1964, when he coined the phrase “the face predicts the brain” to describe correlations between the severity of brain and face malformations in holoprosencephaly patients. While in some cases this co-occurrence may be caused by pleiotropic gene functions, a number of human syndromes have been mapped to genes functioning in brain-face crosstalk through paracrine signaling[29-31]. Nonetheless, close developmental links between face and brain are underappreciated; whether and how they extend to common human genetic variation influencing brain and face shape is unknown.
RESULTS
Multivariate GWAS of brain shape
We adapted our previously published data-driven phenotyping approach[7] to brain shape, as measured by MRI scans of 19,644 individuals in UKB. Participants included were of primarily European ancestry, such that results do not pertain to cross-population brain shape differences . We focused on the mid-cortical surface (midway between the white-grey matter interface and the pial surface with the cerebrospinal fluid, as extracted using FreeSurfer[6]), which we refer to as brain shape. Using mid-cortical surfaces represented by a mesh of three-dimensional vertices, the method segments brain shape in a global-to-local manner, yielding brain segments at different hierarchical levels of scale. Within each segment, principal component analysis (PCA) is used to describe effects in multivariate shape-space explaining between-individual variation, and canonical correlation analysis (CCA) is used to define, for each variant, the linear combination of PCs maximally associated with single nucleotide polymorphism (SNP) dosage. Unsurprisingly, GWAS of left and right hemispheres from the same individuals showed highly concordant results (Supplementary Fig. 1); we therefore performed subsequent analyses using left-right hemisphere averaged data.Applying this pipeline to UKB MRI data defined 285 hierarchical segments (Fig. 1, Supplementary Table 1), decomposing brain shape into different levels of detail, from larger brain segments with integrated variation, to more smaller brain segments with local effects. Each hierarchical level is a bipartition of its parent; the first level consisted of the entire brain, while the second and third levels segmented the whole brain into halves and quadrants, respectively, and the final, ninth level resulted in numerous smaller segments (Figure 1b, right). Many smaller segments from the seventh hierarchical level onwards were discarded due to small surface areas, resulting in fewer total segments than the 511 (29 - 1) expected. Nevertheless, the ninth hierarchical level yielded a substantial number (74) of retained segments; a tenth level would contribute few additional segments (Supplementary Fig. 2). The segmentation broadly agreed with the commonly-used Desikan-Killiany[32], Destrieux[33], and Glasser[34] brain atlases (Supplementary Fig. 3). Prior to GWAS, we adjusted for covariates including total brain volume, height, BMI, sex, and population structure, as well as performing standard SNP filtering and quality control (Methods). Applying linkage disequilibrium score regression (LDSC)-based heritability estimation to each segment’s GWAS (see online Methods, Supplementary Note for details on extension to multivariate traits) yielded intercept values close to 1 (range across segments 0.987–1.007, mean 1.001, Supplementary Table 1), indicating minimal confounding by population structure or cryptic relatedness. In total, we conducted 285 multivariate GWAS using CCA, each corresponding to one segment. 38,630 SNPs showed genome-wide significant (P < 5 x 10−8) association with brain shape in at least one segment; of these, 23,413 reached study-wide significance (P < 2.07 x 10−10 correcting for the number of effective GWAS, estimated by permutation, Methods) in at least one segment. Collapsing these SNPs into independent signals based on linkage disequilibrium and distance yielded 472 and 242 loci reaching genome- and study-wide significance, respectively (Supplementary Table 2). Most of the 472 loci showed effects on multiple segments (305/472, 65%), and many showed effects on multiple quadrants (158/472, 33%) (Figure 1, Supplementary Table 2), consistent with global-to-local effects at multiple levels of brain shape. Masking of associations from progressively higher hierarchical levels revealed that segments from higher levels contributed a substantial fraction of associations – for example, segments beyond the first three levels contributed 169 and 55 loci reaching genome-wide and study-wide significance, respectively (Extended Data Fig. 1). Associations between the 472 loci and brain shape were depleted from the frontal lobe segments (except for the most anterior orbitofrontal cortex) and enriched in the occipital and temporal lobe segments (Supplementary Fig. 4), mostly in agreement with point-wise heritability estimates (Extended Data Fig. 2).
Fig. 1:
Multivariate genome-wide association study (GWAS) of brain shape.
a, Upstream processing of UK Biobank (UKB) magnetic resonance imaging (MRI) images. b, in the polar dendrogram on the left, each concentric ring of filled circles corresponds to a hierarchical level (labeled i-ix) shown on the right, and the filled circle colors correspond to the respective segments in the same hierarchical level. c, Ideogram showing genomic locations and regional effects of 472 genome-wide significant loci for brain shape. Circles and diamonds represent associations passing the study-wide or genome-wide significance thresholds, colors represent broad regions of the brain with the indicated effects.
Extended Data Fig. 1
Number of additional brain shape loci contributed by hierarchical levels.
For all genome-wide (left) or study-wide (right) significant associations, associations with all segments in hierarchical levels up to the indicated number were masked, and the number of remaining associations was assessed.
Extended Data Fig. 2
Point-wise SNP heritability estimates across the mid-cortical surface.
Colors represent the total SNP heritability (computed by a linear mixed model approach, see Methods) at each point on the mid-cortical surface, represented by a set of three-dimensional coordinates in each individual.
We assessed the overlap between the 472 loci and previous GWAS of brain surface areas or subcortical volumes[5,35-39]. The 472 loci recapitulated 27-78% of the associations reported in previous studies; the highest overlap of 78% was with a recent study of univariate brain surface area[5], the phenotype most comparable to the shape measures studied here (Table 1). Of the 472 loci, 121 overlapped with those reported in previous studies on brain surface area or subcortical volume, while 351 represent previously undescribed associations with brain morphology. To assess the reproducibility of the 472 loci on the same shape measures, we analyzed MRI data from the Adolescent Brain Cognitive Development (ABCD) study[40]. Of the 472 loci, 466 were tested for replication (Methods). At 5% FDR, we replicated at least one associated segment for 305 of 466 (65.4%) loci, and 2,645 of 3,586 (73.8%) locus-segment combinations (Supplementary Table 3). We observed consistent rates when subdividing based on the hierarchical level of the segments being replicated, albeit with a slight decrease in replication rate at higher levels (Extended Data Fig. 3). These replication rates are notable given the substantial age difference of the ABCD cohort (9–10 years versus 40–70 years in UKB). The high reproducibility of GWAS results between the two cohorts suggests that despite the known continued growth and morphological changes of the brain throughout adolescence and into adulthood[41], many of the observed associations with brain shape originate during development and are maintained throughout life.
Table 1.
Overlap between previous GWAS of brain surface areas or subcortical volumes with brain shape GWAS in this study.
‘Subcortical combined’ refers to a combined set of loci from four studies of subcortical volume measures[35–38].
Study
# loci tested
# lead SNP P < 5x10-8
# proxy SNP P < 5x10-8
% overlap
Subcortical combined[31–34]
65
15
18
27.6
Grasby et al[5]
301
195
236
78.4
Zhao et al[35]
494
212
273
55
Extended Data Fig. 3
Replication rates in the ABCD cohort by hierarchical level.
Only segments in the indicated hierarchical level were considered, and all loci (left) or locus-segment pairs (right) reaching genome-wide significance in those segments were tested for replication in the ABCD cohort at a 5% FDR.
We next used Functional Mapping and Annotation of GWAS (FUMA)[42] and the Genomic Regions Enrichment of Annotations Tool (GREAT)[43] to identify pathways enriched among genes near the 472 loci, as well as curated gene panels used to guide disease diagnoses[44] to identify disease associations (Methods). As expected, we found strong enrichment for brain-specific processes (neurogenesis, axonogenesis, neuron differentiation, nervous system development, neuron projection guidance), morphogenesis-related processes (anatomical structure morphogenesis, animal organ morphogenesis), and neurodevelopmental disorders (intellectual disability, malformations of cortical development, ciliopathies). We also observed a weak enrichment of terms related to formation and closure of the neural tube, suggesting that early developmental events impact adult brain shape. Surprisingly, we also observed strong enrichment of terms related specifically to CNCC development and migration, as well as weaker enrichment of broader terms encompassing skeletal system development, chondrogenesis, and osteogenesis (Supplementary Data 1). Furthermore, we found strong and weak enrichments for craniosynostosis (premature closer of the cranial bone sutures) and clefting gene panels, respectively. These enrichments suggest a link between variation in brain shape and craniofacial skeletal development.
Loci affecting both brain and face shape
To more directly test for sharing of genetic effects between brain and face shape, we intersected the 472 loci described in this study with 203 loci previously associated with face shape in European-ancestry individuals through a similar, open-ended phenotyping approach[8]. Thirty-seven of the loci for brain shape are linked to (r2 > 0.2) at least one of the face shape loci, significantly above random expectation (P = 2.03 x 10−22, OR = 10.6) and greater than the overlap with other traits that have similar numbers of genome-wide significant associations in the NCBI-EBI GWAS Catalog[45] (Extended Data Fig. 4). Identifying signals showing genome-wide significant association with one of brain or face shape and suggestive (P < 5 x 10−7) association with the other resulted in 76 brain-face shared loci (Figure 2a).
Extended Data Fig. 4
Overlap between genome-wide significant brain shape loci and genome-wide significant loci from 430 other studies.
GWAS hits (number on x-axis) for other studies were obtained from the NCBI-EBI GWAS Catalog, and P-values (left, y-axis) and odds ratios (right, y-axis) for significance of overlap with regions in LD (> 0.2) with brain shape loci were computed using bedtools’ fisher function (see Methods). Note that relative to other traits with equivalent numbers of GWAS hits, face shape shows overlap with brain shape loci greater in both significance and magnitude.
Fig. 2:
Loci affecting both brain and face shape.
a, Miami plot of GWAS for brain (top) and face (bottom) shape. For each SNP, p-values aggregated across all brain or face segments are plotted. All 76 loci reaching genome-wide (P < 5 x 10−8) significance in one study and genome-wide suggestive (P < 5 x 10−7) significance in the other are highlighted by unfilled circles. Right-tailed, one-sided P-values were computed based on canonical correlation analysis (CCA) chi-squared statistics; exact p-values are available in Supplementary Table 4. Loci near candidate genes highlighted in the text and in b and c are labeled, generally on the side where they show greater significance of association. b, Expression (in transcripts per million, TPM) of candidate genes near brain-face shared loci in cranial neural crest cells (CNCCs) of different passages, representing different stages of maturation, from early (P1) to late (P4) and their chondrocyte (Chond. D9) derivatives[52] (left), and three dimensional forebrain organoids at various stages of differentiation[53] (right), further sorted into glial or neuronal lineages or profiled as whole organoids. c, Regional phenotypic effects of four candidate loci, showing effects of linked single nucleotide polymorphisms (SNPs) on brain (left) or face (right) shape. Segments shown are of hierarchical level v, −log10(p-values) are normalized to the maximum at each locus. Full face and brain images from all 76 brain-face shared loci corresponding to all hierarchical levels can be found online (see Data Availability)
Genes near the 76 brain-face shared loci were strongly enriched for disease associations, including “skeletal disorders” and “hearing and ear disorders”, consistent with the contribution of CNCCs to craniofacial skeleton and ear structures. We next manually scanned the 76 brain-face shared loci for genes with known roles in craniofacial or brain development from human syndromes and/or mouse knockouts (Supplementary Table 4). We observed that many of the shared brain-face loci include genes encoding transcription factors (TFs) involved in neural crest formation and/or craniofacial skeletal development. Some of those TFs (for example DLX5/6, SOX9, ZEB2, ZIC2, ZIC3, TCF4) have known functions in both neural crest and brain development, and this pleiotropy may account for the shared brain-face genetic signals. However, other shared brain-face signals are associated with TFs thought to function primarily during neural crest rather than brain development, and whose mutation causes specific craniofacial defects; those TFs include ALX1 and ALX4 (associated with frontonasal dysplasias[46,47]), TWIST1 (associated with Saethre-Chotzen Syndrome[48,49]), PAX3 (associated with Waardenburg syndrome[50]), and TFAP2B (associated with CHAR syndrome[51]). Consistent with the primary role of these TFs in facial development, transcriptome analysis showed high expression in in-vitro derived human CNCCs and their chondrocyte derivatives[52], but low/no expression in either glia or neurons of human forebrain organoids spanning a range of developmental stages[53] (Figure 2b). These observations suggest that genetic variants affecting key craniofacial TFs have a greater than previously appreciated impact on brain shape.Interactions between face and brain can be architectural, with the forebrain acting as a structural support for facial development, and facial skeletal structures flexing to accommodate early brain growth[54]. However, these interactions can also involve paracrine signaling, with fibroblast growth factor (FGF), Hedgehog, and bone morphogenetic protein (BMP) pathways known to mediate the signaling from the developing brain to the face[20-22]. Interestingly, genes encoding members of all three pathways, FGF (FGF2, FGF13, FGF18, SPRY2), Hedgehog (PTCH1), and BMP (BMP2, BMP4) are among the shared brain-face loci. For example, mutations in PTCH1, encoding the receptor for the sonic hedgehog ligand, cause holoprosencephaly[55], a congenital, structural forebrain anomaly with associated craniofacial malformations. Conversely, CNCCs secrete anti-BMP signaling molecules that modulate forebrain development[24,25]; expression of these BMP antagonists is dependent on the SIX family TFs, whose perturbation in CNCCs leads to both craniofacial malformations and secondary pre-otic brain defects[56]. SIX1/4 is also among the 76 brain-face shared loci (Figure 2a). Furthermore, genes linked to other signaling pathways, including Wnt (DAAM1, DAAM2, TNKS, AHI1, FBXW11, MCC) and transforming growth factor beta (LEMD3, PPP2R3A), are among the shared brain-face loci. Not unexpectedly, and in contrast to craniofacial TFs, signaling pathway ligands, receptors and regulators are variably expressed between in-vitro derived CNCCs and brain organoids (Figure 2b).Phenotypically, these highlighted loci largely affect brain shape in the frontal and temporal lobes, and face shape in the forehead and nose, as exemplified by PAX3 and ALX1 (Figure 2c), consistent with the physical proximity of the frontonasal prominence and the forebrain during development. Phenotypic effects distinct from this pattern include effects of variants near BMP4 and DLX6 on jaw and chin morphology, consistent with their known roles in mandibular development[57,58], and effects of variants near PTCH1 on occipital lobe morphology (Figure 2c). Together, these results suggest that both cell-intrinsic mechanisms and paracrine signaling pathways contribute to the substantial number of loci with shared associations with brain and face shape.
Genome-wide sharing of signals with neuropsychiatric disorders and behavioral-cognitive traits
We next asked whether the brain-face overlap among genome-wide significant loci held across the genome, also considering GWAS of neuropsychiatric disorders and behavioral-cognitive traits. LDSC can estimate genetic correlations between univariate traits using signed summary statistics[59]. However, this approach is not applicable to unsigned statistics yielded by CCA. We therefore applied an alternative method of assessing genome-wide sharing of signals between two GWAS, summarizing SNP p-values within approximately independent LD blocks and computing Spearman correlations between the two summarized profiles (Methods). When applied to pairs of univariate GWAS, the Spearman correlation method was largely concordant with, albeit generally smaller in magnitude than, unsigned estimates of LDSC-estimated genetic correlations (Extended Data Fig. 5), indicating that it is a conservative, robust measure for quantifying genome-wide sharing of GWAS signals.
Extended Data Fig. 5
Comparison of LDSC genetic correlations and Spearman correlation between pairs of univariate traits.
Each point represents a pair of univariate traits (of all those considered in this study, see Methods), while the x- and y-axes indicate the absolute value of the LDSC-estimated genetic correlation and the estimated genome-wide sharing of effects by the Spearman correlation method. Point colors and shapes indicate significance (P < 0.05) from LDSC or the Spearman correlation method, respectively. Exact p-values are provided in Supplementary Table 6.
We first assessed sharing of association signals between 63 face segments and 285 brain segments (Supplementary Table 5). All four main facial quadrants, representing shape variation within the forehead, nose, lower face (mandible and cheeks) and philtrum, respectively, showed the most sharing with frontal lobe segments, particularly the most anterior portions such as the rostral prefrontal cortex, and the least sharing with parietal lobe segments (Figure 3a). Furthermore, among the facial quadrants, the forehead and nose showed more sharing with frontal lobe segments than the philtrum and lower face. These genome-wide correlations are consistent with the phenotypic effects of top brain-face shared loci (Figure 2c, Supplementary Fig. 5).
Fig. 3:
Genome-wide sharing of signals with neuropsychiatric disorders and behavioral-cognitive traits.
Genome-wide sharing of signals between any two given GWAS was assessed by Spearman correlation of linkage disequlibrium block-average SNP −log10(p-values) (Methods). a, Spearman correlations between GWAS of indicated facial quadrants and brain segments. b, Spearman correlations between GWAS of selected neuropsychiatric disorders, behavioral-cognitive traits, subcortical volume measures and brain segments, or select immune traits. All brain segments in a,b are from hierarchical level v segmentation, with the exception of Hippocampus, where hierarchical level vi segmentation shows a strong correlation of shape of the hippocampal region with volume. c, Spearman correlations between shape effects on the full brain (left) or face (right) with the indicated traits. * 5% false discovery rate (FDR) based on bootstrapped p-value (Methods). Images of brain-trait correlations at all six hierarchical levels can be found online (see Data Availability). Abbreviations: ADHD, Attention Deficit Hyperactivity Disorder ;GEN, generalized epilepsy; JME, juvenile myoclonic epilepsy; ICV, intracranial volume; T1D, type 1 diabetes; SSC, systemic sclerosis; RA, rheumatoid arthritis.
We next assessed sharing of signals with other brain-related traits. We used publicly-available genome-wide summary statistics for a range of neuropsychiatric disorders, behavioral-cognitive traits, and subcortical brain volumes from studies other than UKB, since our Spearman correlation measure does not control for sample overlap (Supplementary Table 6). As approximate negative controls, we used four immune-related diseases shown to have minimal genetic correlation with schizophrenia and bipolar disorder[60]. Subcortical volumes showed the most sharing with brain shape in the corresponding regions, but the magnitude of these correlations was relatively low (on par with sharing between brain and face shape), indicating that our multivariate GWAS approach detects effects beyond those resulting from changes in relative subcortical volume (Figure 3b). We found that disorders with primarily developmental etiology showed substantial sharing with brain shape in regions previously linked to these disorders. For instance, schizophrenia and attention deficit hyperactivity disorder (ADHD) showed sharing with shape variation in the primary auditory[61,62] and prefrontal cortex regions[63], respectively. In contrast, we did not observe this association for Alzheimer’s disease, caused by plaque buildup and neurodegeneration much later in life. Behavioral-cognitive traits such as intelligence, neuroticism and worry showed broader patterns of sharing with brain shape, reflecting the involvement of distributed cortical regions in these traits[64-66] (Figure 3b). Sharing between brain shape and the immune diseases was generally lower than with neuropsychiatric disorders, behavioral-cognitive traits, or subcortical volumes, but reached significance for type 1 diabetes and rheumatoid arthritis (Figure 3c). This overlap may be because these immune traits have genetic correlation with brain-related traits other than those tested previously (schizophrenia and bipolar disorder), as suggested by a significant genetic correlation between rheumatoid arthritis and intelligence (Extended Data Fig. 6).
Extended Data Fig. 6
Genetic correlations between RA (rheumatoid arthritis) and univariate brain-related traits.
Finally, we compared the degree to which face shape shares signals with neuropsychiatric disorders, behavioral-cognitive traits, and subcortical volumes. Brain shape shares significant (FDR 5%) signal with most neuropsychiatric traits, as well as all behavioral-cognitive and subcortical volume traits analyzed. In contrast, face shape does not show significant sharing with any of the neuropsychiatric disorders or behavioral-cognitive traits, and significant but weaker sharing with the subcortical volume measures (Figure 3c). To confirm these patterns using univariate approaches, we performed GWAS on the most heritable individual PCs of full brain or face shape and computed genetic correlations using LDSC. Although genetic correlation estimates were noisy due to low heritability of univariate shape GWAS, they agreed with our Spearman correlation measure, finding non-zero genetic correlations between both brain and face shape and subcortical volumes, and between brain shape and both autism spectrum disorder and bipolar disorder (Extended Data Fig. 7). Thus, the substantial sharing of signals between brain and face shape (Figure 3a) appears to be mostly independent of neuropsychiatric disorder risk and behavioral-cognitive traits, perhaps because mutual influences of face and brain shape on each other involve phenotypic effects on brain shape distinct from those influencing neuropsychiatric disorder risk and behavioral-cognitive traits.
Extended Data Fig. 7
Genetic correlations between the most heritable brain (top two rows) or face (bottom two rows) shape PCs and other traits.
Points (center of error bars) represent estimated genetic correlations (rg) between the top ten shape PCs (for segment 1, the full brain or face) with heritability z-score > 3 and each of the indicated univariate traits using LD score regression. Error bars represent 95% confidence intervals. *, 5% FDR for indicated PC; +, 10% FDR.
Cell-types influencing brain and face shape
Our results thus far suggest that a substantial fraction of brain shape variation is underpinned by face shape, but that these observed effects are largely independent of effects shared between brain shape and other cognitive traits. To test this idea further, we sought to identify the cell-types most enriched for heritability of brain shape, face shape, and other cognitive traits. Partitioning heritability into cell-type-specific functional annotations via stratified LD score regression (S-LDSC) can prioritize trait-relevant cell-types, but was developed for univariate traits[67]; we thus sought to extend the theoretical framework of S-LDSC to multivariate traits such as our brain and face shape GWAS. We proved that when applying unstratified LDSC[59] to χ2 statistics obtained from multivariate traits with independent dimensions and further corrected for dimensionality, the LDSC-estimated heritability equals the average heritability of the component univariate traits (see Methods, Supplementary Note), a proof that we validated through heritability estimation of each PC making up the full face (Extended Data Fig. 8). By extension, heritability enrichments obtained by applying S-LDSC on multivariate, corrected χ2 statistics partitioned by annotation represent the average heritability enrichment for each component univariate trait (see Methods, Supplementary Note).
Extended Data Fig. 8
SNP heritability of individual face shape PCs and multivariate face shape estimated by LDSC.
Points (center of error bars) represent estimated SNP heritability of each PC. Error bars represent 95% confidence intervals. The red line represents the mean heritability of all 70 PCs, and the blue line indicates the heritability obtained by applying LDSC to corrected χ2 statistics from the multivariate CCA GWAS using all 70 PCs.
We collected genome-wide data on open chromatin (inferred from Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq)) and active regulatory regions (inferred from chromatin immunoprecipitation followed by sequencing (ChIP-seq) of histone marks) from a variety of cell-types and tissues, including in-vitro derived CNCCs and their chondrocyte derivatives[52,68], embryonic craniofacial tissue at different stages of development[69], neuronal and glial cells from three-dimensional forebrain organoids at various differentiation stages[53], and both fetal and adult brain tissue[70]. We quantified brain and face shape heritability enrichments for these cell-type-specific annotations (Supplementary Data 2). Face shape showed significant (5% FDR) heritability enrichment specific to regulatory regions in craniofacial cell-types (mean Z-score 4.58) (Figure 4a). Brain shape showed significant and comparable heritability enrichments for regulatory regions in craniofacial cell-types and tissues, brain organoids, and fetal brain tissue (mean Z-scores 4.23, 3.23, 3.33, respectively) (Figure 4b). Within brain organoids, the strongest enrichments were for early-stage glial cells and whole organoids (mean Z-score 4.11) (Extended Data Fig. 9), consistent with an important role for radial glial cells in corticogenesis and in agreement with enrichments of brain surface area heritability[5]. The strong enrichments for craniofacial cell-types, which were more significant than organoid enrichments in the orbitofrontal and medial temporal lobes (Supplementary Fig. 6), suggest that heritability shared between brain and face shape is mediated primarily by CNCCs and their derivatives early in embryonic development. Consistent with this idea, quantifying brain shape heritability enrichments after removing the 76 brain-face shared loci resulted in decreased enrichment for CNCCs (Z-score difference −0.68) and slightly increased enrichment for early-stage glial cells (Z score difference 0.23) (Extended Data Fig. 10).
Fig. 4:
Partitioned heritability enrichments based on cell-type-specific regulatory annotations.
Heritability enrichment Z-scores, as estimated by stratified linkage disequilibrium score regression (S-LDSC), of a) multivariate shape for the first 7 face segments, b) multivariate shape for the first 7 brain segments, excluding segment 4 which had low heritability, c) neuropsychiatric disorders, d) behavioral cognitive traits, and e) subcortical volume measures. Heritability enrichments were estimated for annotations based on open chromatin (based on Assay for Transposase-Accesible Chromatin using sequencing (ATAC-seq)), regulatory regions (based on chromatin immune precipitation followed by sequencing (ChIP-seq) of multiple histone modifications), or a combination of the two. Annotations for the indicated samples, representing in-vitro-derived cell-types, primary tissues, or a combination of both (see Methods for source papers), were added to the S-LDSC baseline model, and the resulting Z-score was scaled by column to visualize relative enrichments between traits. * 5% FDR based on unscaled Z-scores. Trait abbreviations as in Figure 3, with AN representing anorexia nervosa.
Extended Data Fig. 9
Partitioned heritability enrichments for brain shape with respect to stage- and cell-type-specific brain organoid open chromatin.
S-LDSC coefficient Z-scores and heritability fold-enrichment for annotations corresponding to the indicated cell-type and differentiation day were computed as described in Methods. Regression lines represent the linear best fit with intercept and organoid differentiation day as dependent variable, and grey areas represent 95% confidence intervals. P-values are from a two-tailed F-test.
Extended Data Fig. 10
Partitioned heritability enrichments for brain shape with respect to open chromatin in CNCCs or early glial organoid cells, with or without 76 brain-face shared loci.
S-LDSC Z-scores were calculated using full brain shape as the trait and the most enriched craniofacial (top) or brain organoid (bottom) ATAC-seq dataset as annotations. Z-scores were re-estimated (blue) after removing all SNPs in the same approximately independent LD block as one of the 76 brain-face shared loci (see Methods for details).
Finally, we quantified heritability enrichments for neuropsychiatric disorders, behavioral-cognitive traits, and subcortical volumes. Neuropsychiatric disorders and behavioral-cognitive traits showed enrichment patterns distinct from those of brain shape, with significant enrichment for both fetal and adult brain tissue (mean Z-scores 2.17 and 2.64, respectively), and broad enrichment across stages and cell-types of brain organoids (mean Z-score 2.46). In contrast to brain shape, these traits showed no enrichment for craniofacial cell-types or tissues (mean Z-score −0.92) (Figure 4c). Subcortical volumes showed mixed enrichment patterns, with some regions (amygdala, caudate) similar to those of multivariate brain shape and others (putamen) closer to those of neuropsychiatric disorders and behavioral-cognitive traits. These results suggest that while much of the shared genetic variation between brain and face shape is mediated by regulatory regions in CNCCs and their craniofacial derivatives, variation in these regions does not appear to impact neuropsychiatric disorder risk or other behavioral-cognitive traits.
DISCUSSION
Here, we applied multivariate phenotyping to discover numerous loci underlying common variation in brain shape. While these loci broadly implicate known pathways in brain development, the precise mechanisms by which they modulate brain shape are unknown, suggesting further avenues of investigation. As part of our study, we extended techniques for estimating genome-wide and partitioned heritability, originally developed for univariate traits, to multivariate traits. We anticipate that these and similar extensions will become increasingly useful with the greater availability of high-dimensional imaging or morphological data in large sample sizes.We found a striking convergence of common genetic variation affecting brain and face shape, at least in part mediated by regulatory regions active in CNCCs and their derivatives. These observations suggest a larger than previously appreciated role of the face in shaping development of the brain and its morphological variation between individuals. However, these shared genetic effects do not appear to significantly impact neuropsychiatric disorder risk or cognitive functions. Our results are therefore consistent with a model whereby CNCCs and their derived cranial structures significantly influence brain shape through both physical interactions and paracrine signaling early in embryogenesis, but later shaping of cortical morphology, through processes such as the folding of the cortical surface[71], has a greater impact on cognitive traits. Nevertheless, we cannot exclude the possibility that future GWAS of cognitive traits shows significant overlap with brain-face shared genetic effects, perhaps due to alternative trait definitions or to greater statistical power.A number of developmental mechanisms could mediate shared brain-face genetics. One potential contribution comes from the common neuroepithelial origins of the two structures, with genes influencing growth, patterning and cell fate decisions within the neural plate ultimately affecting cell allocation within distinct parts of the brain and face; examples of such neural plate genes within brain-face shared loci include ZIC2 and ZIC3
[72-74]. Another potential mechanism entails common genetic variation modulating expression of genes with independent roles in both brain and face development. SOX9, encoding a TF with key functions in neural crest development and chondrogenesis, but which is also required for gliogenesis[75], is an attractive candidate for this mechanism. Nonetheless, the primary impact of brain-face shared genetic effects on facial regions from the frontonasal prominence and anterior forebrain regions of the brain suggests additional, proximity-based mechanisms, which can be either structural, or mediated by paracrine signaling. While brain and face development must be tightly coordinated, the brain is thought to have greater structural effects on craniofacial development than the reverse, as the forebrain can serve as structural support for facial development[54] as well as induce flexion of the basicranium and bone deposition at coronal sutures through growth-dependent tensile forces[17,18,54]. However, we find multiple brain-face shared loci near TFs with known, cell-intrinsic roles in, and expression specific to, CNCCs and their derivatives. Furthermore, mutations in these TFs result in malformations of the frontal facial skeleton, such as coronal synostosis (TWIST1)[48,49] or fronotonasal dysplasias (ALX1 and ALX4)[46,47]. One explanation for these results is that these TFs control regulatory programs ultimately modulating the ability of the craniofacial skeleton to respond to and accommodate brain growth, causing subtle changes in brain shape. It is also possible, however, that these TFs exert some phenotypic effects on brain shape by regulating expression of signaling ligands secreted from the face. For example, CNCCs secrete BMP antagonists that modulate forebrain development by blocking BMP and FGF production in the anterior neural ridge[24,25]. BMP antagonist production in CNCCs is regulated by the SIX family TFs[56], with SIX1/4 lying near a shared brain-face GWAS signal (Figure 2a). In the reverse direction, studies in chick embryos have shown that Fgf, Shh, and BMP ligands are secreted by the forebrain and regulate the formation of the frontonasal ectodermal zone (FEZ), a signaling center that in turn patterns the frontonasal prominence of the developing face [20-22,76]. Notably, our study implicates all three of these signaling pathways, nominating specific ligands and receptors whose modulation may be associated with the brain-face crosstalk. Furthermore, our study nominates other pathways, such as Wnt and TGF-beta, for roles in paracrine brain-face signaling. Altogether, we uncovered common genetic variants yielding numerous candidate molecular players whose diverse mechanistic roles in mediating brain-face interactions during development can be examined in future studies.Relationships of facial shape with cognitive and personality traits fascinated humans since ancient times, from the Ancient Greeks, who introduced ‘physiognomy’ to describe a practice of assessing one’s personality from facial appearance[77], through the Vedic traditions of Samudrika Shastra[78] and to the Chinese art of face reading[79]. The concept of physiognomy was revived in the 18th century by Johan Kaspar Lavater, and later lead to a related pseudoscientific theory, phrenology, popularized by Franz Josef Gall. Both theories have a troubled history, as they have been used to justify racial discrimination as well as eugenic theories[80,81]. While the original formation of physiognomy has been debunked, modern studies have found correlations between facial width-to-height ratios and aggressive tendencies[82], with regrettable renewed efforts in using machine learning approaches to detect such correlations raising serious ethical concerns[83,84]. Our results argue that while the ancient human intuition of a close relationship between the face and brain has genetic support at the morphological level, there does not appear to be genetic evidence for the supposed predictive value of face shape in behavioral-cognitive traits, which formed the core of physiognomy and related theories.
METHODS
UK Biobank data preprocessing
The UK Biobank project (UKB) encompasses ~500,000 British volunteers with informed consent containing genetics, non-imaging variables and brain imaging data acquired using a fixed protocol[85]. Hereby, brain T1-weighted magnetic resonance imaging (MRI) scans of the UKB, as well as genotyping and covariate information (e.g. sex, age, height, weight, among others), were used as the discovery dataset. We utilized release v1.5 (August 2018) which holds a cohort of 21,780 subjects. This cohort was composed of an adult population (40 to 70 years old, mean of 60 years old), with slightly more females than males (51.6% vs. 48.4% respectively), a predominantly self-reported white British ancestry (97.1%), and an average body mass index (BMI) of 26.6.For 21,780 subjects, we processed raw MRI data for a surface-based analysis of the cortex using the following four-step procedure. Further details for each step are provided in Supplementary Note, section ‘UK Biobank data processing.’First, the cortical surfaces were segmented and reconstructed from the MRI volumetric data using recon-all (FreeSurfer[86]
v.6.0.0;
URL section). In this step 20,409 images were processed successfully.Second, to obtain a minimally preprocessed pipeline similar to the one of the Human Connectome Project (HCP – URL Section), the Connectivity Informatics Technology Initiative file format) (CITIFY, URL section) was used to convert FreeSurfer’s recon-all command output to a HCP-style file format and structure[87].Third, from the CIFTIFY output, we selected the mid-cortical surface of the left and right hemisphere, which is the surface that runs at the mid-distance between the white surface (at the interface between gray and white matter) and the pial surface (the external cortical surface)[88]. The mid-cortical surface does not over or under-represent gyri or sulci[89], but is otherwise an arbitrary choice.Fourth, as quality control for each hemisphere, we checked the resulting mid-cortical surfaces for mesh artifacts in a semi-automatic manner. All images passed this quality control, yielding 20,407 processed images.For the list of 20,407 subjects with preprocessed images, we selected genomic data from the UK Biobank, which consisted of the version 3 (March 2018) imputed SNP genotypes, imputed to the Haplotype Reference Consortium and merged UK10K and 1000 Genomes (phase 3) panels. See Supplementary Note, section ‘UK Biobank data processing’ for more details on filtering of SNPs and individuals based on ancestry and relatedness. This resulted in 9,705,931 filtered SNPs for GWAS analysis on 19,670 unrelated subjects of European descent.For the list of 19,670 subjects with preprocessed brain and genetic data, we collected the following list of covariates to control for during statistical testing: genetic sex, age, age-squared, height, weight, diastolic blood pressure, systolic blood pressure, and the first 20 genetic PCs. Furthermore, the following imaging specific parameters were also included, following Elliot et al.[90]: volumetric scaling from T1 head image to standard space, XYZ-position of brain mask in scanner co-ordinates, Z-position of table/coil in scanner co-ordinates, date of attending assessment center, and assessment center (coded as a dummy variable for each of the 21 centers). See Supplementary Note, section ‘UK Biobank data processing’ for more details on covariate-based filtering individuals. Next, to symmetrize brain shape, the right hemisphere was reflected to the side of the left hemisphere, by changing the sign of the x-coordinate for all of the 29,759 3D vertices on the surface of the right hemisphere. We performed a generalized Procrustes superimposition (GPA)[91], thus eliminating differences in position, orientation, and scale (measured by centroid size) of all left and right hemispheres pooled together. We computed the symmetric brain component as the vertex-wise averaged brain surface of paired and superimposed left and right hemispheres. This resulted in a final discovery dataset of 19,644 subjects containing preprocessed MRI image data on the mid-cortical symmetrized surface, 9,705,931 imputed SNPs, and 54 covariates.
ABCD study data preprocessing
The Adolescent Brain Cognitive Development Study (ABCD) (URL section) is a longitudinal study following brain development and health through adolescence[40]. A total of 11,411 MRI scans with additional information on sex and age were available from the data release of April 2019 and of those 11,393 images were processed successfully using the four-step imaging preprocessing described above.In total 10,627 individuals from the ABCD dataset provided with genetic data on 517,724 SNP variants. These were imputed via the Odyssey[92] pipeline using the SHAPEIT4[93] and IMPUTE5[94] workflow to phase and impute respectively. The Haplotype Reference Consortium (HRC)[95] reference panel was used for imputation. Quality control prior to phasing and imputation includes using the McCarthy Group’s Imputation preparation program (URL section) to check and fix strand, alleles, position, and reference/alternative problems as well as removing ambiguous A/T and G/C SNPS with minor allele frequencies greater than 0.4. See Supplementary Note, section ‘ABCD study data preprocessing’ for more details on phasing, imputation, and ancestry-based selection. These steps resulted in a final replication dataset of 4,470 individuals with preprocessed MRI image data, representing brain shape, 15.3M imputed SNPs and 7 covariates (sex, age and the first 5 genetic PCs). The minimum and maximum age of this final replication dataset, was 8.9 years and 11 years, respectively, with a mean age of 9.9 years. 46.5% are female and 53.5% are male.
Auxiliary traits GWAS summary statistics
We collected publicly available genome-wide summary statistics for 26 auxiliary traits encompassing neuropsychiatric disorders[96-101], behavioral-cognitive traits[102-104], subcortical volume measures[36-38], and immune-related disorders[105-108] with limited genetic correlation with schizophrenia or bipolar disorder[60]. In Supplementary Table 6, we provide links to relevant publications and URLs for these summary statistics.
Point-wise SNP-heritability estimation of the mid-cortical surface
For each of the 29,759 vertices of the averaged mid-cortical 3D surfaces in UKB we computed a multivariate (X, Y and Z coordinate per vertex) narrow-sense heritability from common SNP variants using a linear mixed model (LMM). A genomic relationship matrix (GRM) modelled as random effects in the LMM was computed from LD-pruned SNPs (Plink 1.9, 50 variant window-size, 5 variant step size, 0.2 r2). The first 10 genetic PCs and additional covariates (sex, age, height, weight, diastolic and systolic blood pressure) were modelled as fixed effects in the LMM. We used the open-source software SNPLib (URL Section)[109], whose implementation is equivalent to GCTA[110] for a homogenous population.
Global-to-local (G2L) segmentation of the mid-cortical surface
UKB served as discovery cohort using a data-driven global-to-local (G2L) segmentation of brain shape similar to previous work on face shape[7,111]. First, the superimposed and symmetrized mid-cortical surfaces were corrected using a partial least-squares regression (PLSR, function plsregress from Matlab 2019b) for all UK Biobank covariates listed above, augmented with centroid size to eliminate allometric effects of size on brain shape[91]. Second, pair-wise structural connections based on the RV-coefficient[112] between each pair of 3D surface vertices generated a squared similarity matrix (29,759 x 29,759). Third, a Laplacian transformation was applied to enhance similarities prior to eigendecomposition of this squared matrix. Finally, within the eigen spectral map, K-means++ clustering was used to group highly correlated vertices that segment the brain into separate modules. This was done in a bifurcating hierarchical manner using eight levels, resulting in a total of 511 hierarchically linked facial segments, with 1, 2, 4, 8, 16, 32, 64, 128, 256 non-overlapping modules at levels 0, 1, 2, 3, 4, 5, 6, 7, 8. In contrast to our work on facial shape[7,111], we removed segments with fewer vertices than 1% of the total vertex count. This resulted in 285 segments across eight levels as depicted in Figure 1. While the precise choice of number of hierarchical levels and vertex cutoff is arbitrary, the number of additional segments followed an “elbow” trajectory, with few segments being retained at hierarchical levels greater than the ninth (Extended Data Fig. 2). Segmentation depth and cutoff criteria were determined prior to performing GWAS. See Supplementary Note, section ‘Global-to-local (G2L) segmentation of the mid-cortical surface’ for details on the multivariate phenotyping approach within each brain segment.
Overlap of brain atlases with G2L segmentation
We investigated the overlap of brain segments at each of the eight levels from our G2L segmentation with brain regions from three commonly used brain atlases (Desikan Killiany (34 distinct gyral based regions)[32], Destrieux (74 distinct gyral and sulcal based regions)[33], and the Glasser (180 distinct multi-modal based regions)[34]). See Supplementary Note, section ‘Overlap of brain atlases with G2L segmentation’ for details on computing overlap between our brain segments and brain atlases.
G2L multivariate genome-wide discovery
The global-to-local phenotyping partitioned brain shape into overlapping (across different hierarchical levels), as well as non-overlapping (within a single hierarchical level) segments, each of which was represented by a different subset of mid-cortical surface vertices and spanned by multiple dimensions of variation (principal components). See Supplementary Note, section ‘G2L multivariate genome-wide discovery’ for details on the canonical correlation (CCA)-based approach used to discover SNP-phenotype associations.A significance threshold of P ≤ 5 x 10−8 was used to declare “genome-wide significance”, which corresponds to a Bonferroni correction for 1 million independent tests in a European-ancestry cohort[113]. Due to 285 multivariate GWAS runs, the multiple comparisons burden was magnified. Therefore, we also determined a more stringent threshold for declaring “study-wide significance” which accounts for the effective number of independent tests. In a first instance, the number of eigenvalues larger than one of a pairwise multivariate correlation (RV-coefficient) matrix (285 x 285)[114], determined a total of 210 independent tests. In a second instance, following the procedure in Kanai et al.[115], we obtained an empirical estimate of the number of independent tests using the 472 lead SNPs representing the genome-wide significant independent loci, to keep the estimations computationally tractable. See Supplementary Note, section ‘G2L multivariate genome-wide discovery’ for details on empirical estimation of the number of independent tests.
Peak detection, overlap and annotations
We observed 38,630 SNPs and 23,413 SNPs at the level of genome-wide and study-wide significance, respectively. These were clumped into 472 (genome-wide) and 243 (study-wide) independent loci in three steps, detailed in Supplementary Note, section ‘Peak detection, overlap and annotations’.To study functional enrichment for genes near the 472 genome-wide lead SNPs, we performed gene ontology (GO) analysis using GREAT[43] (v4.0.4) and FUMA[42] (v1.3.6) using default settings. GO terms that were significant by both binomial and hypergeometric tests (False Discovery Rate (FDR) q-value < 0.05) across three or two windows were reported as strongly and weakly enriched respectively.In determining overlap between lead SNPs from different GWAS, we used a similar strategy: two lead SNPs tag the same genetic locus if they are within 10kb of each other or if they are within 1Mb of each other and with r2 > 0.2. For quantifying overlap between the 472 brain shape loci and 430 other studies from the NCBI-EBI GWAS Catalog, we defined LD blocks of 0.2 around the 472 loci using Plink 1.9, and then calculated odds ratio and P for the overlap between these blocks and any given GWAS using bedtools v2.27.1, fisher function.In determining brain-face shared loci, we first considered the 472 genome-wide lead SNPs from the brain GWAS and looked for any SNP within 10kb or within 1Mb and LD > 0.2 of these lead SNPs with at least a genome-wide suggestive (P ≤ 5 x 10−7) association with face shape[111]. This resulted in 57 loci with evidence of association in brain and face shape. Then we took the 203 genome-wide lead SNPs reported in the face GWAS[111], and clumped them if two lead SNPs were within 10kb or within 10Mb with r2 > 0.01. For the resulting 197 independent genome-wide facial lead SNPs we selected any SNP within 10kb or within 1Mb and with r2 > 0.2 with at least suggestive (P ≤ 5 x 10−7) association with brain shape. This resulted in another 54 loci with evidence of association in brain and face shape and together with the previous 57 loci they were clumped (within 10kb or within 1Mb and r2 > 0.2) into a final set of 76 brain-face shared loci. We manually identified candidate genes in the vicinity of the 76 brain-face shared loci. For each locus, we first considered all genes within 500kb of the lead SNP. We primarily relied on evidence for these genes’ involvement in a human craniofacial or neurodevelopmental syndrome, or for evidence of craniofacial or neurodevelopmental defects in knockouts of their orthologs in mice. We also considered associations with Gene Ontology (GO) terms related to craniofacial development, neurodevelopment, or skeletal system development. In some cases (i.e. SOX9, where enhancer-promoter interactions over 1Mb have been described[52]), we extended the window to within 750kb of the lead SNP.
ABCD replication testing
The ABCD study data was used for replication, with UKB discovery cohort used as a phenotyping reference. First, the GPA superimposed and symmetrized mid-cortical shapes were corrected for sex, age, and the first five genetic PCs, augmented with centroid size to eliminate allometric effects of size on brain shape[91] using PLSR. Second, the PLSR residuals that were centered on average brain shape of the ABCD study were added to average brain shape of UKB. Third, the corrected and re-centered brain shapes were segmented using the G2L segmentation and projected onto the principal components of the UKB segments.For each discovery lead SNP in a particular brain segment the replication panel was projected onto the latent shape trait of the lead SNP. This generated univariate projection scores as phenotypes[116] to test for in the replication panel that are equivalent to the latent shape traits or phenotypes in the discovery panel. See Supplementary Note, section ‘ABCD replication testing’ for details on replication testing and FDR thresholds[117].
Clinical gene-panel overlap
Gene panels were downloaded from the Genomics England PanelApp website. Only panels used for clinical interpretation in the 100,000 Genomes project were selected (provided by PanelApp[44]). The clinical gene-panels were merged in disease (sub)categories according to the 100,000 Genomes Project criteria (e.g. the clinical gene panel “Intellectual Disability” belongs to the sub-category “Neurodevelopmental Disorders”, which is part of the “Neurology and Neurodevelopment” disease category). Only genes with high confidence for gene-disease association were included in the clinical gene panels. See Supplementary Note, section ‘Clinical gene-panel overlap’ for details on calculation of gene set overlaps and significance.
Expression analyses of candidate genes at brain-face overlapping loci
Gene expression levels (log2(TPM) values) for three-dimensional forebrain organoids and purifying neuronal or glial lineages were obtained from Trevino et al.[53] (GSE132403). Raw RNA-seq reads from CNCCs at passages 1-4, as well as day 9 chondrocytes derived from P4 CNCCs, were obtained from Long et al.[52] (GSE145327), and TPM values were quantified using kallisto v0.44.0[118] with sequence-biased bias correction.
LD score regression SNP-heritability for multivariate traits
In the Supplementary Note, we provide a general proof showing that when applying LDSC to summary statistics of a multivariate GWAS, albeit with a small correction to the resulting χ2 statistics, the heritability estimated by the LDSC slope is equal to , which is a D-dimensional generalization of heritability for genetic and phenotypic covariance matrices ∑ and ∑ , respectively. When the dimensions of the multivariate trait are either genetically or phenotypically uncorrelated, this expression simplifies to the average SNP-heritability across dimensions. Similarly, when applying stratified LD Score regression (S-LDSC), one obtains enrichments for partitioned average heritability. We further show that is an appropriate multivariate generalization of heritability since it satisfies the following four properties: 1) invariance to units of measurement, 2) coordinate-free, 3) linear in ∑, and 4) maximized with a value of 1 when ∑ = ∑.Thus, for brain and face shape, we applied LDSC and S-LDSC using the published software (URL section) to corrected χ2 statistics from GWAS of each brain or face segment. We used unmodified chi-squared values for the univariate traits analyzed (including indicated cases where we performed individual, univariate GWAS for each brain and face shape PC). While using unmodified chi-square values results in a small bias, we used unmodified statistics for consistency with previous studies. We limited S-LDSC analyses to traits with SNP-heritability Z-scores > 7, as in Finucaine et al.[67]
Functional annotations for stratified LD score regression
We downloaded a range of publicly-available cell-type and sample-specific annotations representing open chromatin and/or active regulatory regions. Specifically, we obtained data on open chromatin (all ATAC-seq peaks) from brain organoids[53], fetal brain tissue[119], and CNCCs and derived chondrocytes[52]. ATAC-seq reads from Long et al. were mapped to hg19 with bowtie2[120] with default settings, and peaks were called using MACS2[121] with default settings. Annotations for active regulatory regions (based on a range of epigenomic marks) were obtained from CNCCs[68], embryonic craniofacial tissues[69], fetal and adult brain tissue[70], and broad groupings of cell-types[67]. For CNCCs[68], we combined all regions annotated as enhancers (weak, intermediate, strong) or promoters (weak and strong); For embryonic craniofacial tissues, we combined all regions with the following annotations from the 25-state chromHMM model: ‘Enh,’ ‘TxReg,’ ‘PromD1,’ ‘PromD2,’ ‘PromU,’ ‘TssA.’ For fetal and adult brain tissue, we combined all regions with the following annotations from the 15-state chromHMM model: ‘1_TssA,’ ‘2_TssAFlnk,’ ‘7_Enh,’ ‘6_EnhG.’ Each annotation was individually added to the baselineLD model from Finucaine et al. The resulting S-LDSC output (heritability fold-enrichment magnitude and significance, as well as coefficient Z-scores) are provided in Supplementary Data 2. When quantifying heritability enrichments with brain-face shared loci removed, we removed all SNPs within the same approximately independent LD block[122] as any of the 76 brain-face shared loci and re-computed LD scores.
Quantifying sharing of signals between pairs of GWAS
To assess the extent to which genome-wide profiles of association were shared between a pair of GWAS, we computed a Spearman correlation between two vectors of LD-block organized association p-values. First, genome-wide SNPs were selected to overlap with the HapMap3 SNPs[123] and SNPs within the major histocompatibility complex region were removed. Second, we organized SNPs within 1,725 blocks in the human genome that can be treated as approximately independent in individuals of European ancestries[122]. For every LD block we computed the mean SNP −log10(p-value), and then computed a rank-based Spearman correlation using the averaged association value (n=1,725) for each LD block. A standard error of the Spearman correlation was estimated using statistical resampling with 100 bootstrap cycles with replacement from the 1,725 LD blocks.
URLs
UK biobank: https://www.ukbiobank.ac.uk/about-biobank-uk/Human Connectome Project: http://www.humanconnectomeproject.org/Adolescent Brain Cognitive Development (ABCD) study: https://abcdstudy.org/about/SNPLIB:
https://github.com/jiarui-li/SNPLIBFreesurfer:
https://surfer.nmr.mgh.harvard.edu/Ciftify:
https://github.com/edickie/ciftify
and
https://www.nitrc.org/projects/cifti/Conte69
Atlas:
http://brainvis.wustl.edu/wiki/index.php//Caret:Atlases/Conte69_AtlasMcCarthy Tools. https://www.well.ox.ac.uk/~wrayner/tools/LDSC: https://github.com/bulik/ldsc/wiki
Number of additional brain shape loci contributed by hierarchical levels.
For all genome-wide (left) or study-wide (right) significant associations, associations with all segments in hierarchical levels up to the indicated number were masked, and the number of remaining associations was assessed.
Point-wise SNP heritability estimates across the mid-cortical surface.
Colors represent the total SNP heritability (computed by a linear mixed model approach, see Methods) at each point on the mid-cortical surface, represented by a set of three-dimensional coordinates in each individual.
Replication rates in the ABCD cohort by hierarchical level.
Only segments in the indicated hierarchical level were considered, and all loci (left) or locus-segment pairs (right) reaching genome-wide significance in those segments were tested for replication in the ABCD cohort at a 5% FDR.
Overlap between genome-wide significant brain shape loci and genome-wide significant loci from 430 other studies.
GWAS hits (number on x-axis) for other studies were obtained from the NCBI-EBI GWAS Catalog, and P-values (left, y-axis) and odds ratios (right, y-axis) for significance of overlap with regions in LD (> 0.2) with brain shape loci were computed using bedtools’ fisher function (see Methods). Note that relative to other traits with equivalent numbers of GWAS hits, face shape shows overlap with brain shape loci greater in both significance and magnitude.
Comparison of LDSC genetic correlations and Spearman correlation between pairs of univariate traits.
Each point represents a pair of univariate traits (of all those considered in this study, see Methods), while the x- and y-axes indicate the absolute value of the LDSC-estimated genetic correlation and the estimated genome-wide sharing of effects by the Spearman correlation method. Point colors and shapes indicate significance (P < 0.05) from LDSC or the Spearman correlation method, respectively. Exact p-values are provided in Supplementary Table 6.
Genetic correlations between RA (rheumatoid arthritis) and univariate brain-related traits.
Genetic correlations between the most heritable brain (top two rows) or face (bottom two rows) shape PCs and other traits.
Points (center of error bars) represent estimated genetic correlations (rg) between the top ten shape PCs (for segment 1, the full brain or face) with heritability z-score > 3 and each of the indicated univariate traits using LD score regression. Error bars represent 95% confidence intervals. *, 5% FDR for indicated PC; +, 10% FDR.
SNP heritability of individual face shape PCs and multivariate face shape estimated by LDSC.
Points (center of error bars) represent estimated SNP heritability of each PC. Error bars represent 95% confidence intervals. The red line represents the mean heritability of all 70 PCs, and the blue line indicates the heritability obtained by applying LDSC to corrected χ2 statistics from the multivariate CCA GWAS using all 70 PCs.
Partitioned heritability enrichments for brain shape with respect to stage- and cell-type-specific brain organoid open chromatin.
S-LDSC coefficient Z-scores and heritability fold-enrichment for annotations corresponding to the indicated cell-type and differentiation day were computed as described in Methods. Regression lines represent the linear best fit with intercept and organoid differentiation day as dependent variable, and grey areas represent 95% confidence intervals. P-values are from a two-tailed F-test.
Partitioned heritability enrichments for brain shape with respect to open chromatin in CNCCs or early glial organoid cells, with or without 76 brain-face shared loci.
S-LDSC Z-scores were calculated using full brain shape as the trait and the most enriched craniofacial (top) or brain organoid (bottom) ATAC-seq dataset as annotations. Z-scores were re-estimated (blue) after removing all SNPs in the same approximately independent LD block as one of the 76 brain-face shared loci (see Methods for details).
Authors: Lachlan T Strike; Narelle K Hansell; Baptiste Couvy-Duchesne; Paul M Thompson; Greig I de Zubicaray; Katie L McMahon; Margaret J Wright Journal: Cereb Cortex Date: 2019-03-01 Impact factor: 5.357
Authors: Peter Claes; Jasmien Roosenboom; Julie D White; Tomek Swigut; Dzemila Sero; Jiarui Li; Myoung Keun Lee; Arslan Zaidi; Brooke C Mattern; Corey Liebowitz; Laurel Pearson; Tomás González; Elizabeth J Leslie; Jenna C Carlson; Ekaterina Orlova; Paul Suetens; Dirk Vandermeulen; Eleanor Feingold; Mary L Marazita; John R Shaffer; Joanna Wysocka; Mark D Shriver; Seth M Weinberg Journal: Nat Genet Date: 2018-02-19 Impact factor: 38.330
Authors: Julie D White; Karlijne Indencleef; Sahin Naqvi; Ryan J Eller; Hanne Hoskens; Jasmien Roosenboom; Myoung Keun Lee; Jiarui Li; Jaaved Mohammed; Stephen Richmond; Ellen E Quillen; Heather L Norton; Eleanor Feingold; Tomek Swigut; Mary L Marazita; Hilde Peeters; Greet Hens; John R Shaffer; Joanna Wysocka; Susan Walsh; Seth M Weinberg; Mark D Shriver; Peter Claes Journal: Nat Genet Date: 2020-12-07 Impact factor: 38.330
Authors: Chun Chieh Fan; Robert Loughnan; Carolina Makowski; Diliana Pecheva; Chi-Hua Chen; Donald J Hagler; Wesley K Thompson; Nadine Parker; Dennis van der Meer; Oleksandr Frei; Ole A Andreassen; Anders M Dale Journal: Nat Commun Date: 2022-05-03 Impact factor: 17.694
Authors: Rose Bruffaerts; Dorothy Gors; Alicia Bárcenas Gallardo; Mathieu Vandenbulcke; Philip Van Damme; Paul Suetens; John C van Swieten; Barbara Borroni; Raquel Sanchez-Valle; Fermin Moreno; Robert Laforce; Caroline Graff; Matthis Synofzik; Daniela Galimberti; James B Rowe; Mario Masellis; Maria Carmela Tartaglia; Elizabeth Finger; Alexandre de Mendonça; Fabrizio Tagliavini; Chris R Butler; Isabel Santana; Alexander Gerhard; Simon Ducharme; Johannes Levin; Adrian Danek; Markus Otto; Jonathan D Rohrer; Patrick Dupont; Peter Claes; Rik Vandenberghe Journal: Brain Commun Date: 2022-07-18
Authors: Sahin Naqvi; Hanne Hoskens; Franziska Wilke; Seth M Weinberg; John R Shaffer; Susan Walsh; Mark D Shriver; Joanna Wysocka; Peter Claes Journal: Annu Rev Genomics Hum Genet Date: 2022-04-28 Impact factor: 9.340