Literature DB >> 34516910

Atypical genomic cortical patterning in autism with poor early language outcome.

Michael V Lombardo^1,2, Lisa Eyler^3,4, Tiziano Pramparo⁵, Vahid H Gazestani⁵, Donald J Hagler^6,7, Chi-Hua Chen³, Anders M Dale^6,7,8, Jakob Seidlitz^9,10, Richard A I Bethlehem^2,11, Natasha Bertelsen^1,12, Cynthia Carter Barnes⁵, Linda Lopez⁵, Kathleen Campbell^5,13, Nathan E Lewis^14,15,16, Karen Pierce⁵, Eric Courchesne⁵.

Abstract

Cortical regionalization develops via genomic patterning along anterior-posterior (A-P) and dorsal-ventral (D-V) gradients. Here, we find that normative A-P and D-V genomic patterning of cortical surface area (SA) and thickness (CT), present in typically developing and autistic toddlers with good early language outcome, is absent in autistic toddlers with poor early language outcome. Autistic toddlers with poor early language outcome are instead specifically characterized by a secondary and independent genomic patterning effect on CT. Genes involved in these effects can be traced back to midgestational A-P and D-V gene expression gradients and different prenatal cell types (e.g., progenitor cells and excitatory neurons), are functionally important for vocal learning and human-specific evolution, and are prominent in prenatal coexpression networks enriched for high-penetrance autism risk genes. Autism with poor early language outcome may be explained by atypical genomic cortical patterning starting in prenatal development, which may detrimentally affect later regional functional specialization and circuit formation.

Entities: Chemical

Year: 2021 PMID： 34516910 PMCID： PMC8442861 DOI： 10.1126/sciadv.abh1663

Source DB: PubMed Journal: Sci Adv ISSN： 2375-2548 Impact factor: 14.136

INTRODUCTION

It is widely accepted that autism spectrum disorder (ASD) is etiologically and clinically highly heterogeneous (). This heterogeneity is theorized to manifest as a complex multiscale cascade from a diverse genetic architecture that converges onto a common set of downstream hierarchical mechanisms linked to the domains of early social-communication and restricted repetitive behaviors (). Within the context of precision medicine (), the major priorities for the field are to best understand how this complex multiscale cascade takes place for individuals with differing clinical outcomes and to isolate what could the common set of downstream mechanisms be for these individuals (, , ). Regarding different clinical outcomes in ASD, perhaps the most understudied yet most important distinction is between those with relatively intact and good language levels and those who are minimally verbal or have very poor early language outcome (). ASD individuals with poor language represent a sizeable proportion of the early diagnosed population and are the most in need of intervention to facilitate better outcomes (). However, because language level is a key ingredient in helping to facilitate better outcomes, available early interventions may be least effective for these types of individuals (). A better understanding of the underlying biology behind this good versus poor language distinction may be key to developing new individualized interventions that may be more effective at facilitating better outcomes. Thus, a key question looms about whether good versus poor language in ASD represents a biologically distinct subtype with different multiscale biological cascades from genomics, up to neural phenotypes, and through to behavior. If good versus poor language signals a biologically distinct subtype, what is the common downstream explanation for how diverse genetic mechanisms lead to altered brain and behavioral phenotypic development? At the nexus of this puzzle, our prior functional magnetic resonance imaging (fMRI) work showed that the neural systems responsible for language respond differently between good (i.e., ASD Good) and poor (i.e., ASD Poor) early language outcome subtypes (). Linked to this functional abnormality are different large-scale patterns of activity in blood leukocyte gene coexpression modules (). This work suggests that early cortical functional specialization for language is lacking in the ASD Poor subtype and that large-scale functional genomic signal may explain this type of pathology. This large-scale functional genomic signal can be characterized as an omnigenic () array of genes that are typically broadly expressed across many organs and tissues, including the brain, and are highly active during prenatal periods of development (). This prenatal enrichment is key because a large proportion of broadly expressed ASD risk genes remarkably show peak expression during early prenatal periods when processes such as cell proliferation, differentiation, neurogenesis, and migration are highly prominent (, ). If these genes affect proliferation, differentiation, neurogenesis, and migration, it follows then that macroscale structural features of the developing cerebral cortex that are predicated on these processes, such as surface area (SA) and cortical thickness (CT) (–), may also be substantially altered in ASD Poor versus Good language outcome subtypes. The prenatal actions of broadly expressed genes may also be important for telling us about some emergent consequences of such perturbations on how the cortex is genomically patterned and thus regionally and functionally differentiated. It is well established that during prenatal periods, the cortex is patterned by gene expression gradients that follow anterior-posterior (A-P) and dorsal-ventral (D-V) axes (, –). This prenatal genomic patterning is the beginning of cortical arealization processes that allow different cortical regions to develop their own cellular, functional, and circuit identities (, ) and thus aid in later regional functional specialization. In the adult brain, this genomic gradient patterning is still evident and correlates with large-scale gradient patterning of structural and functional features (, , –). Cortical arealization or patterning may also be atypical in ASD. Functional connectome gradient organization is altered in ASD (). Case control comparisons of gene expression in postmortem cortical tissue have found dysregulation of cortical patterning genes and attenuation of gene expression differences in frontal versus temporal cortex (–). WNT signaling is also known to affect cortical patterning (, ), and WNT signaling abnormalities are also identified in ASD (, , , ), particularly within broadly expressed ASD risk genes (). Therefore, if broadly expressed genes prenatally affect proliferation, differentiation, neurogenesis, and migration processes differently in the ASD Good versus Poor subtypes, could this explain the lack of functional specialization seen in prior work [e.g., (, )]? Here, we investigated these questions by examining how early variability in morphometric measures of the cerebral cortex such as CT and SA is patterned by large-scale variability in gene expression measured in blood leukocytes. We find that CT and SA associations with large-scale gene expression patterns are different in ASD Poor versus Good early language outcome subtypes. This difference can be described as the absence of normative genomic patterning of CT and SA in the ASD Poor subtype along A-P and D-V gradients and the establishment of a second unique type of patterning of CT specific to the ASD Poor subtype. These A-P and D-V genomic patterning effects on CT and SA comprise many of the same genes involved in actual prenatal A-P and D-V gene expression gradients and prenatal cell types predicted to be involved in SA and CT (, , ). Consequently, these atypical genomic cortical patterning effects have important functional consequences, with enrichments in genes important for vocal learning, human evolution, and known ASD-associated genomic mechanisms.

RESULTS

Enlargements of cortical volume and SA in the ASD Poor language subtype

In this study, we examined a cohort of n = 123 toddlers (mean age in months = 27.82, SD = 9.32) with and without ASD [ASD, n = 76; TD (typically developing), n = 47] with both a T1-weighted structural MRI scan and a blood sample that was used to examine gene expression in blood leukocyte cells (see Materials and Methods for sample description and table S1 for characterization of the sample). The n = 76 ASD toddlers were split into ASD Poor (n = 38) and ASD Good (n = 38) early language outcome subtypes using 1 standard deviation cutoffs on Mullen expressive and receptive language (EL and RL) T-scores as in previous studies (see Materials and Methods) (, ). One of the most robust findings on early structural brain development in ASD is the on-average effect of early brain overgrowth in the first years of life (). Thus, we started by examining whether there are subtype differences on global measures such as total cortical volume (CV), SA, and mean CT. Statistical models controlling for age and sex identified a group effect for total CV and total SA but no effect of group for mean CT (table S2). The group effects for CV and SA were explained by enlargements in ASD Poor versus TD (Fig. 1). Upon examining regional-level SA or CT effects while adjusting for global differences, we find no evidence of SA or CT group differences for any region within the GCLUST parcellation (table S3). These results generally indicate that ASD Poor explains the on-average effect of early brain overgrowth in ASD. These effects are restricted to CV and SA and are not apparent in regional measures after global differences are accounted.

Fig. 1.

Subtype differences in total CV and total SA.

Standardized effect sizes (Cohen’s d) are shown for each pairwise group comparison for total CV (A), total SA (B), and mean CT (C). Asterisks indicate statistically significant pairwise group comparisons that survive false discovery rate (FDR) q < 0.05. Good, ASD Good language subtype; Poor, ASD Poor language subtype; TD, typically developing.

Subtype differences in total CV and total SA.

Normative associations between gene expression and SA or CT are preserved in TD and ASD Good but are absent in ASD Poor language subtypes

To identify large-scale associations between gene expression and regional SA or CT, we used weighted gene coexpression network analysis (WGCNA) to reduce expression data of 14,426 genes highly expressed in blood leukocytes to 21 coexpression modules (table S4). Coexpression modules were summarized by the module eigengene and were input into a partial least squares (PLS) analysis to test for large-scale multivariate associations with SA or CT phenotypes from the GCLUST parcellation, which is sensitive to genomic effects on SA and CT A-P and D-V gradients (, ). This analysis allowed us to identify statistically significant multivariate relationships between gene coexpression modules and SA or CT and then allowed for examination of how the relationship manifests across brain regions and coexpression modules and also how these relationships manifest in each group (see Materials and Methods for more details). For SA, PLS identified one statistically significant latent variable (LV) pair (SA LV1: d = 3.99, P = 0.0001, split-half Pucorr = 0.01, and Pvcorr = 0.06), which explains 36% of the covariance between SA and gene expression. A highly similar result was obtained with a PLS on vertex-wise data (fig. S1). However, a PLS model using vertex-wise data explained far less percentage of covariance (17%) than the GCLUST-parcellated PLS model (fig. S2). This indicates that the PLS model on GCLUST-parcellated features is more sensitive for highlighting associations between gene expression and SA. To decompose how this multivariate relationship manifests across coexpression modules and groups, in Fig. 2A, we show which coexpression modules have “non-zero” relationships in each group. These “non-zero modules” are the coexpression modules of highest importance, as they have 95% confidence intervals (CIs) estimated by bootstrapping that do not include a correlation of 0 and are thus indicative of stable coexpression modules highly contributing to the SA LV1 relationship. In contrast, coexpression modules that we dub as “zero modules” are those whereby the 95% CIs include a correlation of 0 and thus do not reliably contribute to the overall SA LV1 relationship.

Fig. 2.

Normative associations between gene expression and SA or CT are absent in ASD Poor.

Normative associations between gene expression and SA or CT are absent in ASD Poor.

(A) PLS correlations for each gene coexpression module (rows) and each group (columns). Modules with a black outline are non-zero modules where the correlation between gene expression and SA is significantly non-zero, as indicated by 95% bootstrap CIs not encompassing a correlation of 0. These non-zero modules are the strongest contributors to the PLS relationship. All nonoutlined cells are zero modules that are not sufficiently correlated to SA in a non-zero way (e.g., 95% bootstrap CIs include a correlation of 0). (B) BSRs for each brain region in the GCLUST parcellation. Regions with red BSRs have correlations that manifest in the directionality shown in heatmaps in (A). Brain regions with blue BSRs have correlations where the directionality is flipped relative to the heatmaps in (A). Stronger BSRs indicate regions that are more important in driving the SA LV1 relationship. (C) Similarity in PLS correlations between groups. In these scatterplots, each dot is a coexpression module, and the x and y axes indicate the PLS correlations for different groups. Dots colored in dark red and dark blue indicate the non-zero modules (red for positive correlations and blue for negative correlations), while gray dots indicate zero modules. The scatterplots with the orange outline indicate that the PLS SA LV1 relationship manifests similarly for TD and ASD Good. (D to F) The same as (A) to (C) except that they show the association between gene coexpression modules and CT (CT LV1). Non-zero modules for SA LV1 account for a good majority of all genes analyzed (68%) and were highly enriched for broadly expressed genes [enrichment odds ratio (OR) = 3.48, P = 1.90 × 10−71; table S5]. These two observations are compatible with predictions from the omnigenic theory of complex traits, in which variance in complex traits such as ASD is exerted en masse by a large majority of genes that affect the primary tissue of relevance and by genes that are broadly expressed across many organs and tissues, but which also have impact on the brain (). These effects are also in line with similar results observed for large-scale gene expression associations with functional imaging phenotypes in ASD early language outcome subtypes (). Figure 2A also shows that non-zero modules are highly similar for ASD Good and TD groups, whereas hardly any non-zero modules are present for ASD Poor. This similarity between ASD Good and TD can be quantified as a significant positive correlation in the PLS correlation values for these groups (r = 0.55, P = 0.008) (Fig. 2C). This result indicates that the SA LV1 relationship manifests similarly in TD and ASD Good groups. In contrast, there is a lack of correlation between ASD Poor and the other groups (ASD Poor–ASD Good: r = 0.27, P = 0.23; ASD Poor–TD: r = −0.41, P = 0.06) (Fig. 2C). Therefore, SA LV1 can be described as a large-scale SA–gene expression relationship that likely reflects a normative phenomenon present in TD and that is also preserved in the ASD Good subtype. However, this normative SA–gene expression relationship is absent in the ASD Poor subtype. PLS analysis applied to GCLUST CT data isolated two statistically significant LV pairs (CT LV1: d = 4.30, P = 0.0001, split-half Pucorr = 0.04, and Pvcorr = 0.01; CT LV2: d = 3.09, P = 0.0001, split-half Pucorr = 0.02, and Pvcorr = 0.05), explaining 37 and 19% of the covariance between CT and gene expression, respectively. PLS analysis on vertex-wise data produced similar results (fig. S1), but again, it was not as good as the PLS on GCLUST-parcellated features, as indicated by less percentage of covariance explained (CT LV1 = 17%; CT LV2 = 11%) (fig. S2). Similar to SA LV1, non-zero modules for CT LV1 comprise a large majority of all genes examined (65%), are enriched for broadly expressed genes (OR = 2.96, P = 4.43 × 10−43; table S5), and thus are compatible with predictions about omnigenic effects exerted by broadly expressed genes. The relationships are also highly similar for ASD Good and TD, but not ASD Poor (Fig. 2, D and F), which indicates that CT LV1 mostly pertains to a normative relationship preserved across TD and ASD Good, but which is absent in ASD Poor.

Atypical association between gene expression and CT, specific to ASD Poor language subtype

In contrast to CT LV1, the non-zero modules for CT LV2 are almost exclusively relevant for the ASD Poor subtype, comprise about 48% of all genes examined, do not show specific enrichment for broadly expressed genes (table S5), and do not show strong correlations between groups (Fig. 3, A and B). These results indicate that CT LV2 captures a relationship that is specific to ASD Poor. Furthermore, because PLS LVs are orthogonal to each other, CT LV2 is an independent relationship capturing effects that appear primarily in the ASD Poor subtype. While CT LV2’s non-zero modules appear to be somewhat overlapping to CT LV1, the way these modules can affect CT is sometimes opposite to the directionality shown for CT LV1. For example, superior parietal cortex in CT LV1 has negative brain bootstrap ratio (BSR) values (Fig. 2E), while in CT LV2, the BSR values are strong and positive (Fig. 3C). This reversal in BSR values indicates different directionality of the gene expression–CT relationship. In other brain regions such as the language-sensitive left hemisphere perisylvian, middle temporal, and inferior parietal cortices, CT LV1 shows BSRs that are close to 0 in CT LV1 (Fig. 2E), indicating little to no importance of these regions for LV1. However, the BSRs in CT LV2 for these regions are very strong (either blue- or red-colored BSRs in Fig. 3C), indicating that these regions are of strong importance for the CT LV2 relationship. These observations further indicate how CT LV2 captures a specific and independent type of genomic association with CT that is present primarily in ASD Poor.

Fig. 3.

Atypical association between gene expression and CT that is specific to the ASD Poor subtype.

Atypical association specific to ASD Poor between gene coexpression modules and CT, as described in CT LV2. (A) CT LV2 PLS correlations for each gene coexpression module (rows) and each group (columns). Modules with a black outline are non-zero modules where the correlation between gene expression and SA is significantly non-zero, as indicated by 95% bootstrap CIs not encompassing a correlation of 0. Nearly all of the non-zero modules for CT LV2 are present for the ASD Poor subtype. (B) Lack of similarity in CT LV2 PLS correlations between groups. In these scatterplots, each dot is a coexpression module, and the x and y axes indicate the PLS correlations for different groups. Dots colored in dark red and dark blue indicate the non-zero modules (red for positive correlations and blue for negative correlations), while gray dots indicate zero modules. (C) CT LV2 BSRs for each brain region in the GCLUST parcellation. Regions with red BSRs have correlations that manifest in the directionality shown in heatmaps in (A). Brain regions with blue BSRs have correlations where the directionality is flipped relative to the heatmaps in (A). Stronger BSRs indicate regions that are more important in driving the CT LV2 relationship.

Atypical association between gene expression and CT that is specific to the ASD Poor subtype.

A-P and D-V gradient patterning of SA and CT are atypical in the ASD Poor language subtype

We next investigated how large-scale genomic variability patterns SA and CT cortical phenotypes. The patterning of PLS BSR values (Fig. 4A) can be used to answer this question. Brain BSRs can be interpreted as pseudo Z-statistics computed for each brain region and indicate not only the importance of each brain region for the LV in question but also the directionality through which gene expression is associated with SA and CT. It is visually evident from Fig. 4A that BSR patterning is not uniform across cortical regions and varies considerably along A-P and D-V axes. With a two-cluster solution previously identified by Chen and colleagues (, ) to be the genetically parcellated A-P and D-V axes of SA and CT (Fig. 4B), we confirm that BSRs highly differ along these A-P and D-V clusters (Fig. 4D). This indicates that the relationship between gene expression and SA or CT at one pole of the A-P or D-V axes is different relative to the other pole.

Fig. 4.

Cortical patterning along genetic similarity gradients.

Cortical patterning along genetic similarity gradients.

(A) Brain BSR maps for SA LV1 (top), CT LV1 (middle), and CT LV2 (bottom). (B) Coarse two-cluster A-P and D-V genetic similarity partitions identified by Chen and colleagues (, ). (C) Rank ordering of regions by hierarchical genetic similarity found by Chen and colleagues (, ). The rank ordering here defines a genetic similarity gradient for how SA or CT varies across brain regions. Areas rank numbered close together are more genetically similar than regions numbered farther apart. (D) BSRs are differentiated along A-P and D-V axes (SA LV1, top; CT LV1, middle; CT LV2, bottom). (E) BSRs co-vary with genetic similarity gradients (SA LV1, top; CT LV1, middle; CT LV2, bottom). Perhaps even more notable than these differences between binary A-P and D-V partitions is that BSRs also covary along continuous A-P and D-V genetic similarity gradients. After ordering regions by genetic similarity gradients found by Chen and colleagues (, ) (Fig. 4C), we find that BSRs are highly correlated with the ordering along this axis of genetic similarity between regions (Fig. 4E). This indicates that large-scale blood leukocyte gene coexpression relationships with SA and CT reveal how the cortex is genomically patterned to promote the development of cortical regionalization and areal identity (). Because SA LV1 and CT LV1 are normative effects primarily relevant for TD and ASD Good, but not ASD Poor, these results indicate that normative genomic patterning of the cortex does not occur in the ASD Poor subtype. Conversely, CT in the ASD Poor subtype may be patterned in a different way given that CT LV2 was primarily relevant to this subtype and given that the BSR patterning is reversed for CT LV2 compared to CT LV1 (Fig. 3). Given the evidence of focal laminar patches throughout the cortex in ASD (), it will be important for future work to investigate further how these phenomena may be relevant to atypical CT patterning, particularly in the ASD Poor subtype. The use of the GCLUST parcellation does not appear to bias the emergence of these A-P, D-V, or genetic similarity gradients. In a vertex-wise PLS, we find that the patterning of brain BSRs follow the same types of gradients for SA LV1, CT LV1, and CT LV2 (fig. S3). Unlike the genomic patterning effects, we also found that patterning of the group differences in effect size for CT and SA does not follow similar A-P and D-V gradients (fig. S4). This result suggests that these cortical patterning effects are not simply effects that can be seen as on-average group differences in SA or CT, or biases due to the parcellation scheme, and point more toward the specific importance of how the underlying genomic mechanisms act to pattern SA and CT across the cortex.

Genes involved in gradient patterning of SA and CT follow similar gradients of gene expression during prenatal development

Because cortical regionalization begins in early prenatal periods from A-P and D-V gradient patterning of gene expression (, , ), we next assessed whether genes from SA and CT non-zero modules encompass many of the same genes that play important prenatal roles in the genomic gradient patterning of the cortex. Using the prenatal RNA sequencing (RNA-seq) data from the Development PsychENCODE dataset, we used sparse principal components analysis (PCA) () to identify A-P (PC1) and D-V (PC2) gene expression gradients and the most important genes contributing to those gradients from 12 regions of prenatal cortical tissue sampled from 12 to 24 weeks after conception (e.g., midgestation) (Fig. 5, A to C). SA LV1 and CT LV1 non-zero modules are highly enriched for genes that comprise the prenatal A-P and D-V gradients (Fig. 5D). Enrichments were also seen for CT LV2, but unlike SA LV1 and CT LV1, the enrichments were apparent for both zero and non-zero modules (Fig. 5D). These results suggest that the genes responsible for the normative SA LV1 and CT LV1 relationships are also genes in prenatal periods that act to initialize the regionalization and patterning of cortex along A-P and D-V axes. Because SA LV1 and CT LV1 relationships are largely absent in the ASD Poor subtype, this result suggests that the atypical genomic patterning of SA and CT in this subtype could stem from perturbations in earlier prenatal development.

Fig. 5.

Enrichment between PLS non-zero modules and genes involved in prenatal A-P and D-V expression gradients and prenatal cell types.

(A) Cortical brain areas sampled from 12 to 24 weeks after conception from the Development PsychENCODE RNA-seq dataset from Li and colleagues (). Adjustment-for-confounds PCA () was used to isolate (B) A-P (PC1) and (C) D-V (PC2) expression gradients. (D) −log10 P values for enrichment tests of non-zero and zero modules for SA LV1, CT LV1, and CT LV2 for genes isolated from PC1 and PC2. (E to G) Enrichments in prenatal cell types for SA LV1 (E), CT LV1 (F), and CT LV2 (G). Asterisks mark enrichments at FDR q < 0.01. MFC, medial prefrontal cortex; V1C, primary visual cortex; OFC, orbitofrontal cortex; DFC, dorsolateral refrontal cortex; M1C, primary motor cortex; VFC, ventrolateral prefrontal cortex; A1C, primary auditory cortex; ITC, inferior temporal cortex; STC, superior temporal cortex; IPC, posterior inferior parietal cortex; S1C, primary somatosensory cortex.

Enrichment between PLS non-zero modules and genes involved in prenatal A-P and D-V expression gradients and prenatal cell types.

Genes expressed in prenatal progenitor cell types explain SA associations, while genes expressed in later differentiated excitatory neurons explain CT associations

The evidence that SA and CT non-zero modules are enriched for genes that are important for midgestational A-P and D-V expression gradients leaves open the question of what prenatal cell types might explain such effects. The radial unit hypothesis () suggests that symmetric cell division in progenitor cell types (e.g., radial glia) in the ventricular zone leads to a substantial proliferation of radial units that then each become their own cortical columns and thus leads to substantial expansion of SA. Variation in this proliferative process in different parts of the ventricular zone protomap regulates regional differences in SA (, ). Therefore, progenitor cells are the primary cell types expected to explain SA effects. Programmed cell death could also be another mechanism regulating SA () and could implicate microglia involvement in SA. In contrast, CT is likely regulated by asymmetric cell division leading to more neurons within particular cortical columns () and intermediate progenitor (IP) cell types (). CT is also heavily influenced by dendritic arborization (). While arborization changes over development due to a variety of factors such as experience-dependent pruning, CT and the trajectory it follows over development are also known to be heavily influenced by genetic factors even in middle-aged adults, suggesting that individual differences in CT have a genetic and neurodevelopmental origin (). These ideas would support the prediction that relatively later differentiated cell types (compared to progenitor cells), such as excitatory neurons, could explain CT effects. Given that cell type markers from midgestational periods are available (), we next asked whether specific prenatal cell type markers are enriched for genes from SA and CT non-zero modules. In notable agreement with the prediction that progenitor cells explain SA effects (, ), we find that SA LV1 non-zero modules show enrichments for all progenitor cell types—ventricular and outer radial glia (vRG and oRG), cycling progenitors in S and G2M phases of cell cycle (PgS and PgG2M), and IPs. Several non-neuronal cells also show SA LV1 enrichments, including oligodendrocyte precursors (OPCs), endothelial cells (End), and microglia (Mic) (Fig. 5E and table S6). In contrast to these cell type enrichments, there is little evidence of enrichment of genes specific to later differentiated excitatory [maturing excitatory (ExM), migrating excitatory (ExN), maturing excitatory upper enriched (ExM-U), excitatory deep layer 1 (ExDp1), and excitatory deep layer 2 (ExDp2)] and inhibitory [interneuron caudal ganglion eminence (InCGE) and interneuron medial ganglion eminence (InMGE)] neurons. For CT LV1 and LV2, we identify enrichments for vRG and IP progenitor cell types that are compatible with the hypothesized effects of IP cells on CT (). However, CT LV1 and LV2 primarily show a different cell type enrichment profile from SA LV1, through the marked presence of enrichments with several types of excitatory neurons (Fig. 5, F and G, and table S6). This result indicates a notable contrast between the SA LV1 enrichment profile of primarily progenitor cell types and is compatible with the radial unit and protomap hypotheses (), differential SA and CT GWAS (genome wide association studies) enrichments (), and other viewpoints regarding contributors to CT (). These results also highlight the effects of non-neuronal cell types such as microglia cells. Microglia enrichments are present and particularly strong for SA LV1 and CT LV1 non-zero modules. This effect may have implications for programmed cell death and pruning explanations () and may be relevant to ideas behind ASD-relevant broadly expressed genes and their particularly strong effects on non-neuronal cell types such as microglia ().

Non-zero modules are enriched for genes involved in vocal learning

The results so far suggest that SA and CT non-zero modules are highly prenatally relevant for establishing cortical patterning and regionalization and implicate several cell types that may be of mechanistic importance to different ASD early language outcome subtypes. However, are the SA and CT non-zero modules also functionally relevant for processes that are essential for language development? Our prior work showed that PLS non-zero modules associated with speech-related fMRI response () were highly enriched for differentially expressed (DE) genes in Area X from a songbird model of vocal learning (). To test whether similar enrichments held up for SA and CT non-zero modules, we ran enrichment tests with vocal learning DE genes from Hilliard and colleagues (). We find similar types of enrichments between DE songbird vocal learning genes and PLS non-zero modules in SA LV1 (OR = 2.02, P = 1.05 × 10−4) and CT LV1 (OR = 1.90, P = 9.61 × 10−4) but not zero modules (P > 0.08) (Fig. 6, A to C, and table S6). For CT LV2, enrichments were present at false discovery rate (FDR) q < 0.05 (but not FDR q < 0.01) for both non-zero (OR = 1.62, P = 0.006) and zero modules (OR = 1.61, P = 0.017). These effects suggest that many genes responsible for vocal learning in songbirds are conserved and highly represented specifically within SA and CT non-zero modules that are relevant for groups with relatively intact language (e.g., TD and ASD Good).

Fig. 6.

Enrichments between PLS non-zero modules and songbird vocal learning or human-specific genes.

(A to C) Enrichments between DE songbird vocal learning genes and non-zero and zero modules for SA LV1 (A), CT LV1 (B), and CT LV2 (C). (D to F) Enrichments between human-specific genes and non-zero and zero modules for SA LV1 (D), CT LV1 (E), and CT LV2 (F). Asterisks mark enrichments at FDR q < 0.01. HS, human-specific.

Enrichments between PLS non-zero modules and songbird vocal learning or human-specific genes.

Genes within SA non-zero modules are specifically enriched in human-specific genes

Language is a uniquely human ability, and there is some evidence that genes implicated in human-specific evolution are also relevant for autism (, ). In prior work, we found that PLS non-zero modules associated with speech-related fMRI response () were enriched for DE genes in the cortex of humans versus nonhuman primates (i.e., “human-specific” genes). Given that cortical SA is a phenotype that is markedly expanded in human evolution, and much more so than CT, we investigated the hypothesis of whether SA non-zero modules would be specifically enriched for human-specific genes. Using three lists of human DE genes in prenatal, early postnatal, and adulthood periods (), we find that SA LV1 non-zero modules are specifically enriched for prenatal and adulthood human-specific genes (prenatal OR = 1.86, P = 1.93 × 10−3; adulthood OR = 1.97, P = 1.02 × 10−5) (Fig. 6D and table S6). In contrast, no such enrichments are found with genes relevant to CT LV1 or LV2 (Fig. 6, E and F). In addition to DE genes, we also examined genes that are targets of human-accelerated regions (HARs) or human-gained (HGE) or human-lossed enhancer (HLE) regions (). However, no enrichments for SA or CT were identified for HAR, HGE, and HLE genes (Fig. 6, D to F). These results expand on the notion that human-specific genes are of relevance to ASD by showing that the normative genomic mechanisms associated with SA are also genes of importance for human-specific evolution. Given that the SA LV1 relationship is absent in ASD Poor, this suggests that the loss of these normative associations may allow for early SA expansion and possibly early brain overgrowth for ASD Poor.

Non-zero modules are enriched for ASD-associated genes that affect prenatal development

Next, we asked whether SA and CT non-zero modules were relevant for known autism-associated genomic mechanisms. SA LV1 and CT LV1 non-zero or zero modules are not enriched for rare de novo protein-truncating variants (dnPTVs) () or other genes that are annotated as autism-associated in SFARI Gene (). However, CT LV2 non-zero modules were enriched for SFARI ASD genes (table S6). Thus, at the level of ASD risk gene mutations, CT LV2 was the only feature showing enrichments with non-zero modules. This could be compatible with the nature of CT LV2 being mostly specific to the ASD Poor subtype. At the level of genes with evidence of ASD-dysregulated expression from postmortem cortical tissue, we find that both CT LV1 and LV2 non-zero modules were enriched for ASD up-regulated genes (). In contrast, genes from cortically down-regulated coexpression modules () were highly enriched with genes from SA LV1 non-zero modules (table S6). This result shows an interesting contrast between CT and genes that show up-regulated expression versus SA and genes that show down-regulated expression in ASD. Non-zero modules from SA LV1, CT LV1, and CT LV2 are also enriched for coexpression modules that are highly transcriptionally active during prenatal periods and that contain many high-penetrance ASD-related mutations (Fig. 7, A to C, and table S6). This is compatible with the idea that broadly expressed genes can interact and affect key ASD risk genes, particularly in prenatal periods (, ). Downstream targets of highly penetrant genes such as FMR1 and CHD8 were also enriched in non-zero modules from SA LV1, CT LV1, and CT LV2. However, not all of these enrichments are specific to autism-associated genes. Genes differentially expressed in schizophrenia () were also significantly enriched in non-zero modules across SA LV1, CT LV1, and CT LV2.

Fig. 7.

Enrichment between PLS non-zero modules and autism-associated genes.

Enrichment between PLS non-zero modules and autism-associated genes.

(A to C) Enrichments between different autism-associated gene lists and non-zero and zero modules for SA LV1 (A), CT LV1 (B), and CT LV2 (C). (D to F) Enrichments between DE genes in specific cell types in autism and non-zero and zero modules for SA LV1 (D), CT LV1 (E), and CT LV2 (F). Asterisks mark enrichments at FDR q < 0.01. SCZ DE, DE genes in schizophrenia; BD DE, DE genes in bipolar disorder. Last, we examined enrichments with cell type–specific DE genes in autism (). Here, we found that only SA LV1 non-zero modules are enriched for DE genes in microglia cells (Fig. 7D). No other comparisons for DE cell types were statistically significant. See Fig. 7 and table S6 for a summary of autism-associated enrichments. The fact that non-zero modules are devoid of enrichments in most DE genes from specific cell types is compatible with the notion that these genes are of primary relevance for early prenatal periods and will not be a highly discoverable DE signal in postmortem ASD tissue. To aid future work examining specific genes of interest, we focused on identifying high-confidence ASD risk genes (annotated as the “high-confidence” category 1 list in SFARI Gene) that are also SA-relevant and prenatally relevant progenitor and A-P patterning genes (i.e., the intersection of SFARI ASD, SA non-zero modules, PC1 A-P genes, prenatal progenitor cell types, and ASD prenatal coexpression modules). SON and BAZ2B were identified, and these genes play roles in splicing, cell cycle, transcriptional regulation, and chromatin remodeling. For CT LV1 genes, we next searched for high-confidence ASD risk genes that were also prenatally relevant excitatory and D-V patterning genes (i.e., the intersection of SFARI ASD, CT LV1 non-zero modules, PC2 D-V genes, prenatal excitatory cell types, and ASD prenatal coexpression modules). Here, we find ASD high-confidence genes of ATRX, AUTS2, and BCL11A. In a similar search within CT LV2 non-zero modules of prenatal relevance to excitatory neurons and D-V patterning, we identified ATRX, AUTS2, BCL11A, CACNA1E, and MEIS2 as high-confidence ASD risk genes. A common theme of all these CT-relevant genes is their role in chromatin modification and remodeling (with the exception of CACNA1E) and their links to syndromes causing intellectual disability. In addition, with the exceptions of BCL11A and CACNA1E, all SA- and CT-relevant high-confidence genes listed here fall into the broadly expressed gene list, highlighting the importance of these high-impact genes in ASD biology ().

DISCUSSION

These findings represent a substantial enhancement to the mechanistic and clinical precision of our understanding of the prenatal brain basis behind different types of ASD. The evidence here solidifies the idea that the ASD Poor subtype is biologically distinct (, ) by revealing how large-scale functional prenatal genomic signal is differentially associated with structural cortical phenotypes such as CT and SA. Going beyond the idea of whether these subtypes are biologically distinct, we also need an explanation for how diverse genetic mechanisms affecting ASD individuals may converge onto a common atypical downstream biological process and to understand when this process manifests as different during development. The current work gives the first insights into how to answer this question for the critical ASD Poor subtype. The prenatal genomic patterning of the cerebral cortex is the key explanation behind how diverse genetic mechanisms affect brain development for the ASD Poor subtype. Normative genomic patterning gradients emerge in the first and second trimesters of prenatal development along A-P and D-V axes and allow cortical areas to develop distinct cellular, functional, and circuit-level identities (, –). Prenatal cortical arealization processes are critical for circuit formation, maturation of distinct types of neurophysiological response, and the development of regional- and network-level functional specialization that occurs with postnatal experience (). These processes are explained by genetic variation and manifest after birth as patterned effects in CT and SA phenotypes measured with structural MRI data (, ). This normative prenatal genomic cortical patterning effect is absent in the ASD Poor subtype. In addition to the lack of normative genomic cortical patterning of SA and CT, CT in ASD Poor is patterned in a unique manner (e.g., CT LV2) and potentially indicative of a separate route through which genomic pathology affects CT and penetrates up through to the poor language clinical phenotype. There are some caveats and limitations that are necessary to address to interpret the present findings. First, the sample size of this study is moderate to above average for what is typical in most toddlerhood brain imaging and gene expression studies (, ), and this is the first to relate MRI phenotypes such as SA and CT in toddlerhood to large-scale gene expression activity. Thus, future work replicating these findings with larger samples is needed. Within the context of gene expression studies of brain tissue, sample sizes are typically much smaller than the current work and deal with RNA quality that is much lower than what is typical in studies using blood samples. In addition, postmortem brain tissue studies typically have much larger age ranges spanning toddlerhood to adulthood, suggesting that there is much larger age-related heterogeneity in studies of brain tissue compared to blood. Thus, a relative strength of the current work compared to postmortem brain tissue studies is the restricted age range to the early toddler years, which helps enhance sensitivity for very early developmental effects. Combining these two caveats with the rare ability to make stratifications, the current study is ahead of the norms typical for the context of gene expression studies in patients with ASD. Second, it is notable that we did not compare ASD Poor to a non-ASD comparison group with language and/or developmental delay (LD/DD). In prior work, we have shown how the developmental clinical trajectories for a non-ASD LD/DD group are in fact different from the ASD Poor group, suggesting that ASD Poor is developmentally and behaviorally distinguished from LD/DD (). We also showed in prior work that speech-related fMRI response in ASD Poor was distinctly different from a non-ASD LD/DD comparison group (), which again supports the idea that ASD Poor is not simply just a reflection of LD or DD. In the current study, we did not have enough concurrent blood samples and MRI data from enough LD/DD participants for a sufficient comparison group. Future work should attempt to collect these data as a further comparison to ASD Poor to better understand whether the atypical genomic cortical patterning effects are indeed specific to ASD Poor. Third, it is important to clarify that while there is some utility in using blood gene expression to relate to neurodevelopmental mechanisms in autism, there are limitations in how far it can go in highlighting mechanisms that can only be identified in brain tissue. For example, brain-specific genes cannot be adequately assessed in blood, and thus, the findings here do not represent the contributions of such important genomic mechanisms and how they might affect neural phenotypes such as cortical patterning. Furthermore, using blood will not be able to capture tissue-specific effects regarding different isoforms, splicing, and/or epigenetic mechanisms. Despite these limitations, complex traits are theorized to be largely underpinned by omnigenic effects and include genes that are broadly expressed across several tissues other than the tissue of relevance (). Applied to autism, it is known that the genomic landscape of autism includes many genes that are broadly expressed across many tissues and have strong regulatory impact (). Given the inaccessibility of brain tissue in living patients, blood may be a key in vivo window into how some of these types of broadly expressed and regulatory genomic mechanisms affect complex cortical phenotypes in an omnigenic fashion (). In conclusion, in the face of large heterogeneity in the ASD population, the current work indicates that individuals with poor versus good early language outcome are explained by distinct genomic mechanisms that cascade to shape cortical phenotypes and later clinical outcomes. A common downstream impact of the diverse genomic mechanisms found in this work is the emergent effect of atypical genomic patterning of the cerebral cortex in the ASD Poor subtype. This atypical genomic cortical patterning effect points to early prenatal periods and the importance of an omnigenic signal driven by broadly expressed genes. The functional consequences of atypical genomic patterning of the cortex may be the curtailed development of molecular cortical arealization processes that prohibit canonical circuit formation and later regional functional specialization that is likely necessary for facilitating better outcomes in these individuals.

MATERIALS AND METHODS

Participants

This study was approved by the Institutional Review Board at the University of California, San Diego. Parents provided written informed consent according to the Declaration of Helsinki and were paid for their participation. Identical to the approach used in our earlier studies (, , ), toddlers were recruited through two mechanisms: community referrals (e.g., website) or a general population–based screening method called Get SET Early () that allowed for the prospective study of ASD beginning at 12 months based on a toddler’s failure of the CSBS-DP (Communication and Symbolic Behavior Scales Developmental Profile) Infant-Toddler Checklist (, ). All toddlers were tracked from an intake assessment around 12 months and followed roughly every 12 months until 3 to 4 years of age. All toddlers, including normal control participants, participated in a series of tests collected longitudinally across all visits, including the Autism Diagnostic Observation Schedule (Module T, 1, or 2) (), the Mullen Scales of Early Learning (), and the Vineland Adaptive Behavior Scales (). All testing occurred at the University of California, San Diego Autism Center of Excellence. Stratification of ASD Poor versus ASD Good was made on the basis of Mullen EL and RL T scores. An ASD toddler was classified as ASD Poor if both Mullen EL and RL T scores at the final outcome assessment was below 1 SD of the T score norm of 50 (i.e., T < 40). ASD Good labels were made if the toddler had either Mullen EL or RL T scores within 1 SD or above the normative T score of 50 (i.e., T ≥ 40). A total of n = 123 toddlers had T1 structural MRI and gene expression data available. From these 123 toddlers, n = 76 ASD individuals were examined and were split into the two language outcome subtypes—ASD Poor, n = 38 (32 males and 6 females; mean age at MRI scan = 29.01 months, SD at MRI scan = 7.22, range = 12 to 50 months); ASD Good, n = 38 (28 males and 10 females; mean age at MRI scan = 29.02 months, SD at MRI scan = 9.55, range = 14 to 46 months); and TD, n = 47 (25 males and 22 females; mean age at MRI scan = 25.91 months, SD at MRI scan = 10.44, range = 13 to 46 months). ASD subtypes and TD did not statistically differ in age at the time of scanning [F(2,120) = 1.62, P = 0.20]. For more demographic and phenotypic information, please see table S1.

Blood sample collection, leukocyte capture, RNA extraction, quality control, and sample preparation

Four to six milliliters of blood were collected into EDTA-coated tubes from toddlers on visits when they had no fever, cold, flu, infections or other illnesses, or use of medications for illnesses 72 hours before blood draw. Blood samples were passed over a LeukoLOCK filter (Ambion, Austin, TX, USA) to capture and stabilize leukocytes and immediately placed in a −20°C freezer. Given the role of the immune system in autism () as well as interactions between the brain and the immune system (), immune cells in blood such as leukocytes were specifically examined. This choice also allows for constraint on the cell types for which RNA might arise from in blood because whole blood is a bulk sample and RNA could potentially come from many different cell types (e.g., leukocytes and platelets). Total RNA was extracted following standard procedures and the manufacturer’s instructions (Ambion, Austin, TX, USA). LeukoLOCK disks (Ambion, catalog no. 1933) were freed from RNAlater, and TRI Reagent (Ambion, catalog no. 9738) was used to flush out the captured lymphocyte and lyse the cells. RNA was subsequently precipitated with ethanol and purified though washing and cartridge-based steps. The quality of mRNA samples was quantified by the RNA integrity number (RIN), values of 7.0 or greater were considered acceptable (), and all processed RNA samples passed RIN quality control. Quantification of RNA was performed using NanoDrop (Thermo Fisher Scientific, Wilmington, DE, USA). Samples were prepped in 96-well plates at the concentration of 25 ng/μl.

Gene expression and data processing

RNA was assayed at Scripps Genomic Medicine (La Jolla, CA, USA) for labeling, hybridization, and scanning with the Illumina BeadChips pipeline (Illumina, San Diego, CA, USA) per the manufacturer’s instruction. All arrays were scanned with the Illumina BeadArray Reader and read into Illumina GenomeStudio software (version 1.1.1). Raw data were exported from Illumina GenomeStudio, and data preprocessing was performed using the lumi package () for R (www.R-project.org) and Bioconductor (www.bioconductor.org) (). Raw and normalized data are part of larger sets deposited in the Gene Expression Omnibus (GEO) database (GSE42133; GSE111175). A larger primary dataset of blood leukocyte gene expression was available from 383 samples from 314 toddlers with the age range of 1 to 4 years old. The samples were assayed using the Illumina microarray platform on three batches. The datasets were combined by matching the Illumina Probe ID and probe nucleotide sequences. The final set included a total of 20,194 gene probes. Quality control analysis was performed to identify and remove 23 outlier samples from the dataset. Samples were marked as outlier if they showed low signal intensity (average signal 2 SD lower than the overall mean), deviant pairwise correlations, deviant cumulative distributions, deviant multidimensional scaling plots, or poor hierarchical clustering, as described elsewhere (). The high-quality dataset included 360 samples from 299 toddlers. High reproducibility was observed across technical replicates (mean Spearman correlation of 0.97 and median of 0.98). Thus, we randomly removed one of each of two technical replicates from the primary dataset. From the participants in the larger primary dataset, n = 123 also had MRI data, and thus, a total of n = 105 from the Illumina HT12 platform along with n = 18 from the Illumina WG6 platform were used in this study. Batch was not asymmetrically distributed across one subgroup more than another, as chi-square analyses on the contingency table between subgroup and batch show no effect [χ2(4) = 0.84, P = 0.93]. ASD subtypes and TD toddlers also did not statistically differ in age at the time of blood sampling [F(2,120) = 1.27, P = 0.28]. The 20,194 probes were then collapsed to 14,426 genes based on picking the probe with maximal mean expression across samples. Data were quantile-normalized and then adjusted for batch effects, sex, and RIN. These batch-, sex-, and RIN-adjusted data were used in all further downstream analyses. We also checked for differences in proportion estimates of different leukocyte cell types (i.e., neutrophils, B cells, T cells, natural killer cells, and monocytes) using the CellCODE deconvolution method () but found no evidence of differences across groups for any cell type (see table S7).

Weighted gene coexpression network analysis

We reduced the number of features in the gene expression dataset from 14,426 genes down to 21 modules of tightly coexpressed genes. This data reduction step was achieved using WGCNA, implemented within the WGCNA library in R (). Correlation matrices estimated with the robust correlation measure of biweight midcorrelation were computed and then converted into adjacency matrices that retain the sign of the correlation. These adjacency matrices were then raised to a soft power of 16 (fig. S5). This soft power was chosen by finding the first soft power where a measure of R2 scale-free topology model fit saturates. The soft power–thresholded adjacency matrix was then converted into a topological overlap matrix (TOM) and then a TOM dissimilarity matrix (e.g., 1-TOM). The TOM dissimilarity matrix was then input into agglomerative hierarchical clustering using the average linkage method. Gene modules were defined from the resulting clustering tree, and branches were cut using a hybrid dynamic tree cutting algorithm (deepSplit parameter = 4) (fig. S5). Modules were merged at a cut height of 0.2, and the minimum module size was set to 100. Only genes with a module membership of r > 0.2 were retained within modules. For each gene module, a summary measure called the module eigengene was computed as the first PC of the scaled (standardized) module expression profiles. We also computed module membership for each gene and module. Module membership indicates the correlation between each gene and the module eigengene (see table S4). Genes that could not be clustered into any specific module are left within the M0 module, and this module was not considered in any further analyses. Further WGCNA analyses were run separately within each group to check for preservation of detected modules across groups at a soft power threshold of 16. These analyses all indicated high levels of preservation (Zsummary > 10) () for all detected modules for each pairwise group comparison (fig. S6).

MRI data acquisition and analyses

Imaging data were collected on a 1.5-T General Electric MRI scanner during natural sleep at night; no sedation was used. Structural MRI data were collected with a T1-weighted IR-FSPGR (inversion recovery fast-spoiled prepared gradient recalled) sagittal protocol [TE (echo time) = 2.8 ms, TR (repetition time) = 6.5 ms, flip angle = 12°, bandwidth = 31.25 kHz, field of view = 24 cm, and slice thickness = 1.2 mm]. Cortical surface reconstruction was performed using FreeSurfer v5.3 (http://surfer.nmr.mgh.harvard.edu/) (–), which uses routinely acquired T1-weighted MRI volumes (), includes tools for estimation of brain morphometry measures such as CT and SA (, ), and enables interparticipant alignment via nonlinear, surface-based registration to an average brain, driven by cortical folding patterns (). FreeSurfer has been validated for use in children () and used successfully in large pediatric studies (, ). Total CV, SA, and mean CT were computed on the basis of the Desikan-Killiany parcellation. Regional SA and CT values were computed from a 12-region parcellation reported by Chen and colleagues (, ) based on genetic similarity in monozygotic twins. This parcellation scheme, known as GCLUST, is highly relevant for our purposes here because the parcellations are based on genetic patterning and has also been effectively used in developmental samples (). Thus, GCLUST should help increase statistical power while also minimizing multiple comparisons. The GCLUST parcellation is also important as it can be used to leverage information about genetic similarity gradients (e.g., rank ordering of regions by fuzzy clustering) in further analyses. The two-cluster A-P or D-V partitions found by Chen and colleagues (, ) are also relevant in further analyses for A-P and D-V gradient questions. For all 12 regions of the SA and CT GCLUST parcellation, global effects were controlled for by dividing SA values by the mean SA, and for CT, we subtracted the mean CT from each region, as was done in prior papers using this parcellation scheme (, ).

MRI–gene expression association analysis

To assess multivariate MRI–gene expression relationships, we used PLS analysis (). PLS is widely used in the neuroimaging literature, particularly when explaining multivariate neural responses in terms of multivariate behavioral patterns of variation or a design matrix. Given that the current dataset is massively multivariate both in terms of MRI and gene expression datasets, we used PLS to elucidate how variation in SA or CT covaries with gene expression as measured by module eigengene values of coexpression modules. PLS allows for identifying these relationships by finding latent MRI–gene expression variable pairs (LV) that maximally explain covariation in the dataset and that are uncorrelated with other MRI–gene expression LV pairs. The strength of such covariation is denoted by the singular value (d) for each brain–gene expression LV, and hypothesis tests are made via using permutation tests on the singular values. Furthermore, identifying brain regions that most strongly contribute to each LV pair is achieved via bootstrapping, whereby a BSR is created for each region, and represents the reliability of that region for contributing strongly to the LV pattern identified. The brain BSR is roughly equivalent to a Z-statistic and can be used to threshold data to find voxels that reliably contribute to an LV pair. The PLS analyses reported here were implemented within the plsgui MATLAB toolbox (www.rotman-baycrest.on.ca/pls/). Here, we ran two separate PLS analyses, one on SA and another on CT. Neuroimaging data entered into the PLS analyses come from the 12-region GCLUST parcellations for SA and CT. Because the TD group differed in the proportion and males versus females compared to the ASD groups, we used a linear model to remove the effect of sex from the SA and CT data. These SA and CT data with the sex effect removed were input into the PLS analysis. For gene expression data, we input module eigengene values for all 21 coexpression modules. For statistical inference on identified MRI–gene expression LV pairs, a permutation test was run with 10,000 permutations. To identify reliably contributing regions for MRI–gene expression LVs and to compute 95% CIs on MRI–gene expression correlations, bootstrapping was used with 10,000 resamples. Gene coexpression modules whereby 95% CIs do not encompass 0 are denoted as non-zero association modules. All other modules where 95% CIs include 0 are denoted as “zero” modules. In addition, we ran 10,000 split-half resamples whereby the correlation between brain and gene expression saliences (Ucorr and Dcorr) was computed between the 2 split-halves. These correlations between split-half saliences were then compared to the null distribution from 10,000 permutations to compute P values (Pucorr and Pdcorr), which statistically test the reliability of salience patterns in split-half resamples (). From the PLS results, we tested whether groups show similar correlation patterns across modules. To test this question, we computed Pearson correlations on the PLS correlation values for all pairwise group comparisons. Groups with similar PLS correlations will show statistically significant correlations. We also used the BSRs from the PLS analysis to identify whether BSRs covary along the genetic similarity gradients and A-P and D-V partitions found by Chen and colleagues (, ). Pearson correlations were used to identify correlations with genetic similarity gradients, while independent-sample t tests were used to compare A-P and D-V partitions. All PLS analyses were computed on vertex-wise data and GCLUST-parcellated data. This analysis used the same parameters (10,000 permutations and 10,000 bootstrap resamples) as the GCLUST analysis and was computed on the sex and mean SA- or CT-adjusted vertex-wise data. A-P and D-V distinctions were assessed on vertex-wise brain BSR data with violin boxplots (fig. S3) separated by A-P or D-V partitions. Genetic similarity gradient organization was assessed by first computing the median BSR for each GCLUST parcel and then computing the correlation between median BSR and the genetic similarity gradient rank ordering from GCLUST. Similarity in gene coexpression module PLS correlations between GCLUST and this vertex-wise analysis was also computed as Pearson correlations from the PLS correlation values for each module and group. Last, to assess which model (GCLUST or vertex-wise) was a better model, we assessed which model had the highest percentage covariance explained. Comparison between GCLUST and vertex-wise PLS results can be seen in figs. S1 to S3.

Gene set enrichment analyses

We analyze enrichment between genes from PLS non-zero and zero modules and a host of other gene lists defined by a variety of criteria (see below for details). For these gene set enrichment analyses, we used a custom R code written by M.V.L. (https://github.com/mvlombardo/utils/blob/master/genelistOverlap.R) that computes hypergeometric P values and enrichment ORs. The background pool for these enrichment tests was always set to 14,426. After all enrichment tests were computed, results are interpreted only if the enrichment was statistically significant after FDR correction for multiple comparisons at a threshold of FDR q < 0.01.

Prenatal gene expression gradients and cell types

To assess gradients in prenatal gene expression, we used RNA-seq data from the Development PsychENCODE dataset (http://development.psychencode.org) (). The data used were already preprocessed as described by Li and colleagues () (e.g., normalized and batch effects removed) and summarized to RPKM (reads per kilobase million). Sample data from all 12 available cortical regions from 12 to 22 weeks after conception were used to capture the midgestational window of interest. Before running the analysis, we removed low-expressing genes with log2(RPKM) below 2. The primary analysis to identify expression gradients was an adjustment-for-confounds PCA (AC-PCA) (), which allowed for adjustment due to repeat measurements from the same donor across sampled brain regions. Rank ordering of regions by A-P and D-V axes was used to statistically confirm that PC1 and PC2 components follow A-P and D-V gradients. Subsets of the most important genes for the top two PCs were identified with a sparse AC-PCA analysis, whereby the sparsity parameter, c2, was selected on the basis of a grid search with 10-fold cross validation. These PC1 and PC2 gene sets were used in enrichment tests with PLS non-zero or zero modules. We also examined enrichments between PLS non-zero and zero modules and prenatal cell types identified from single-cell RNA-seq on midgestational prenatal brain tissue (). These cell types included several classes of progenitor cells (vRG, oRG, PgS, PgG2M, and IPs), excitatory neurons (ExN, ExM, ExM-U, ExDp1, and ExDp2), inhibitory neurons (InCGE and InMGE), and other non-neuronal cell types (OPCs, Per, End, and Mic).

Tissue-specific enrichments

To better understand how genes expressed in blood leukocytes could be brain relevant, we annotated gene coexpression modules based on enrichments in genes known from expression across multiple tissues to be either broadly expressed or brain specific. Both of these categories contain genes that are expressed in cortical tissue but differ in the pattern of expression across other non-neuronal tissues. To define these lists, we downloaded transcript per million (TPM) normalized gene expression from 10,259 samples across 26 tissues from the GTEx dataset (www.gtexportal.org) (). In addition to brain and nerve tissue, the dataset included transcriptome data from 24 non-neuronal tissues, including the following: adipose, adrenal gland, blood vessel, breast, blood, skin, colon, esophagus, heart, liver, lung, salivary gland, muscle, ovary, pancreas, pituitary, prostate, small intestine, spleen, stomach, testis, thyroid, uterus, and vagina. We next defined a gene expressed in a tissue if it met two criteria. First, the gene TPM expression level was ≥3 in at least half of the samples from the tissue. Second, the median expression of the gene was equal or larger than its 25th percentile expression in GTEx cortex samples. The second criterion was included to account for the differences in the base expression level of the genes and their dosage-dependent translation and function. Broadly expressed genes were defined as genes that were expressed in ≥50% of non-neuronal tissues (i.e., tissues other than brain and nerve). The broadly expressed and brain-specific genes included genes that were expressed in the adult cortex based on the GTEx dataset.

Vocal learning enrichments

To test for enrichment between PLS non-zero modules and gene sets of functional relevance for language processes, we examined genes that are differentially expressed in a songbird vocal learning model. Songbirds are often used as animal models relevant for the vocal learning component of language (). We investigated enrichments with DE genes taken from a microarray dataset of Area X of songbirds (). To identify DE genes between singing and nonsinging birds, we reanalyzed this dataset (GEO accession ID: GSE34819) using limma (), and DE genes were identified if they passed Storey FDR q < 0.05 (). These DE genes were also used for enrichment tests in our prior work examining gene expression relationships with language-relevant functional neural phenotypes measured with fMRI ().

Human-specific enrichments

Given the uniquely human nature of language, we also tested hypotheses regarding enrichments with genes that are transcriptionally different in the cortical tissue between humans and other nonhuman primates across prenatal, early postnatal, and adult periods (). In addition, we also examined enrichments with genes linked to HARs, HGEs in prenatal and adult tissue, and HLEs ().

Autism-associated enrichments

Ample evidence suggests that prenatal periods are critical for ASD (, ). To test enrichment with prenatal ASD-associated coexpression modules, we used coexpression modules from a study that analyzed the Allen Institute BrainSpan dataset (). Parikshak and colleagues () analyzed only cortical regions from BrainSpan and identified M2 and M3 as prenatally active and enriched for rare protein-truncating variants with high penetrance for ASD. We also tested enrichments with gene lists known to be associated with ASD, either from genetic evidence or evidence from cortical transcriptomic dysregulation. In particular, we examined a list of 102 rare dnPTVs associated with ASD (); genes listed as ASD-associated in SFARI Gene (https://gene.sfari.org) in categories S, 1, 2, and 3 (downloaded on 16 July 2020) (); and DE genes and cortical coexpression modules measured from ASD postmortem frontal and temporal cortex tissue (, ). To contrast ASD DE genes to genes that are DE in other psychiatric diagnoses that are genetically correlated with autism, we also used DE genes in schizophrenia and bipolar disorder from the same study that identified ASD DE genes (). To go beyond DE genes identified in bulk tissue samples, we also examined ASD DE genes identified in specific cell types, particularly excitatory (ASD DE Excitatory) and inhibitory (ASD DE Inhibitory) neurons, microglia (ASD DE Microglia), astrocytes (ASD DE Astrocyte), oligodendrocytes (ASD DE Oligodendrocyte), and endothelial (ASD DE Endothelial) cells (). Last, we also tested for enrichments with known downstream targets of highly penetrant mutations known to be associated with ASD—FMRP and CHD8. For each, we had lists of downstream targets for two independent studies (–), where the overlap for FMRP targets was 3.71 and 27.61% for CHD8 targets.

73 in total

1. lumi: a pipeline for processing Illumina microarray.

Authors: Pan Du; Warren A Kibbe; Simon M Lin
Journal: Bioinformatics Date: 2008-05-08 Impact factor: 6.937

2. A new initiative on precision medicine.

Authors: Francis S Collins; Harold Varmus
Journal: N Engl J Med Date: 2015-01-30 Impact factor: 91.245

Review 3. Insights into the Neural and Genetic Basis of Vocal Communication.

Authors: Genevieve Konopka; Todd F Roberts
Journal: Cell Date: 2016-03-10 Impact factor: 41.582

4. Molecular microcircuitry underlies functional specification in a basal ganglia circuit dedicated to vocal learning.

Authors: Austin T Hilliard; Julie E Miller; Elizabeth R Fraley; Steve Horvath; Stephanie A White
Journal: Neuron Date: 2012-02-09 Impact factor: 17.173

5. A pediatric structural MRI analysis of healthy brain development from newborns to young adults.

Authors: Jacob Levman; Patrick MacDonald; Ashley Ruyan Lim; Cynthia Forgeron; Emi Takahashi
Journal: Hum Brain Mapp Date: 2017-09-12 Impact factor: 5.038

6. A perturbed gene network containing PI3K-AKT, RAS-ERK and WNT-β-catenin pathways in leukocytes is linked to ASD genetics and symptom severity.

Authors: Vahid H Gazestani; Tiziano Pramparo; Srinivasa Nalabolu; Benjamin P Kellman; Sarah Murray; Linda Lopez; Karen Pierce; Eric Courchesne; Nathan E Lewis
Journal: Nat Neurosci Date: 2019-09-23 Impact factor: 24.884

Review 7. Big data approaches to decomposing heterogeneity across the autism spectrum.

Authors: Michael V Lombardo; Meng-Chuan Lai; Simon Baron-Cohen
Journal: Mol Psychiatry Date: 2019-01-07 Impact factor: 15.992

8. The genetic architecture of the human cerebral cortex.

Authors: Katrina L Grasby; Neda Jahanshad; Jodie N Painter; Lucía Colodro-Conde; Janita Bralten; Derrek P Hibar; Penelope A Lind; Fabrizio Pizzagalli; Jason L Stein; Paul M Thompson; Sarah E Medland; Christopher R K Ching; Mary Agnes B McMahon; Natalia Shatokhina; Leo C P Zsembik; Sophia I Thomopoulos; Alyssa H Zhu; Lachlan T Strike; Ingrid Agartz; Saud Alhusaini; Marcio A A Almeida; Dag Alnæs; Inge K Amlien; Micael Andersson; Tyler Ard; Nicola J Armstrong; Allison Ashley-Koch; Joshua R Atkins; Manon Bernard; Rachel M Brouwer; Elizabeth E L Buimer; Robin Bülow; Christian Bürger; Dara M Cannon; Mallar Chakravarty; Qiang Chen; Joshua W Cheung; Baptiste Couvy-Duchesne; Anders M Dale; Shareefa Dalvie; Tânia K de Araujo; Greig I de Zubicaray; Sonja M C de Zwarte; Anouk den Braber; Nhat Trung Doan; Katharina Dohm; Stefan Ehrlich; Hannah-Ruth Engelbrecht; Susanne Erk; Chun Chieh Fan; Iryna O Fedko; Sonya F Foley; Judith M Ford; Masaki Fukunaga; Melanie E Garrett; Tian Ge; Sudheer Giddaluru; Aaron L Goldman; Melissa J Green; Nynke A Groenewold; Dominik Grotegerd; Tiril P Gurholt; Boris A Gutman; Narelle K Hansell; Mathew A Harris; Marc B Harrison; Courtney C Haswell; Michael Hauser; Stefan Herms; Dirk J Heslenfeld; New Fei Ho; David Hoehn; Per Hoffmann; Laurena Holleran; Martine Hoogman; Jouke-Jan Hottenga; Masashi Ikeda; Deborah Janowitz; Iris E Jansen; Tianye Jia; Christiane Jockwitz; Ryota Kanai; Sherif Karama; Dalia Kasperaviciute; Tobias Kaufmann; Sinead Kelly; Masataka Kikuchi; Marieke Klein; Michael Knapp; Annchen R Knodt; Bernd Krämer; Max Lam; Thomas M Lancaster; Phil H Lee; Tristram A Lett; Lindsay B Lewis; Iscia Lopes-Cendes; Michelle Luciano; Fabio Macciardi; Andre F Marquand; Samuel R Mathias; Tracy R Melzer; Yuri Milaneschi; Nazanin Mirza-Schreiber; Jose C V Moreira; Thomas W Mühleisen; Bertram Müller-Myhsok; Pablo Najt; Soichiro Nakahara; Kwangsik Nho; Loes M Olde Loohuis; Dimitri Papadopoulos Orfanos; John F Pearson; Toni L Pitcher; Benno Pütz; Yann Quidé; Anjanibhargavi Ragothaman; Faisal M Rashid; William R Reay; Ronny Redlich; Céline S Reinbold; Jonathan Repple; Geneviève Richard; Brandalyn C Riedel; Shannon L Risacher; Cristiane S Rocha; Nina Roth Mota; Lauren Salminen; Arvin Saremi; Andrew J Saykin; Fenja Schlag; Lianne Schmaal; Peter R Schofield; Rodrigo Secolin; Chin Yang Shapland; Li Shen; Jean Shin; Elena Shumskaya; Ida E Sønderby; Emma Sprooten; Katherine E Tansey; Alexander Teumer; Anbupalam Thalamuthu; Diana Tordesillas-Gutiérrez; Jessica A Turner; Anne Uhlmann; Costanza Ludovica Vallerga; Dennis van der Meer; Marjolein M J van Donkelaar; Liza van Eijk; Theo G M van Erp; Neeltje E M van Haren; Daan van Rooij; Marie-José van Tol; Jan H Veldink; Ellen Verhoef; Esther Walton; Mingyuan Wang; Yunpeng Wang; Joanna M Wardlaw; Wei Wen; Lars T Westlye; Christopher D Whelan; Stephanie H Witt; Katharina Wittfeld; Christiane Wolf; Thomas Wolfers; Jing Qin Wu; Clarissa L Yasuda; Dario Zaremba; Zuo Zhang; Marcel P Zwiers; Eric Artiges; Amelia A Assareh; Rosa Ayesa-Arriola; Aysenil Belger; Christine L Brandt; Gregory G Brown; Sven Cichon; Joanne E Curran; Gareth E Davies; Franziska Degenhardt; Michelle F Dennis; Bruno Dietsche; Srdjan Djurovic; Colin P Doherty; Ryan Espiritu; Daniel Garijo; Yolanda Gil; Penny A Gowland; Robert C Green; Alexander N Häusler; Walter Heindel; Beng-Choon Ho; Wolfgang U Hoffmann; Florian Holsboer; Georg Homuth; Norbert Hosten; Clifford R Jack; MiHyun Jang; Andreas Jansen; Nathan A Kimbrel; Knut Kolskår; Sanne Koops; Axel Krug; Kelvin O Lim; Jurjen J Luykx; Daniel H Mathalon; Karen A Mather; Venkata S Mattay; Sarah Matthews; Jaqueline Mayoral Van Son; Sarah C McEwen; Ingrid Melle; Derek W Morris; Bryon A Mueller; Matthias Nauck; Jan E Nordvik; Markus M Nöthen; Daniel S O'Leary; Nils Opel; Marie-Laure Paillère Martinot; G Bruce Pike; Adrian Preda; Erin B Quinlan; Paul E Rasser; Varun Ratnakar; Simone Reppermund; Vidar M Steen; Paul A Tooney; Fábio R Torres; Dick J Veltman; James T Voyvodic; Robert Whelan; Tonya White; Hidenaga Yamamori; Hieab H H Adams; Joshua C Bis; Stephanie Debette; Charles Decarli; Myriam Fornage; Vilmundur Gudnason; Edith Hofer; M Arfan Ikram; Lenore Launer; W T Longstreth; Oscar L Lopez; Bernard Mazoyer; Thomas H Mosley; Gennady V Roshchupkin; Claudia L Satizabal; Reinhold Schmidt; Sudha Seshadri; Qiong Yang; Marina K M Alvim; David Ames; Tim J Anderson; Ole A Andreassen; Alejandro Arias-Vasquez; Mark E Bastin; Bernhard T Baune; Jean C Beckham; John Blangero; Dorret I Boomsma; Henry Brodaty; Han G Brunner; Randy L Buckner; Jan K Buitelaar; Juan R Bustillo; Wiepke Cahn; Murray J Cairns; Vince Calhoun; Vaughan J Carr; Xavier Caseras; Svenja Caspers; Gianpiero L Cavalleri; Fernando Cendes; Aiden Corvin; Benedicto Crespo-Facorro; John C Dalrymple-Alford; Udo Dannlowski; Eco J C de Geus; Ian J Deary; Norman Delanty; Chantal Depondt; Sylvane Desrivières; Gary Donohoe; Thomas Espeseth; Guillén Fernández; Simon E Fisher; Herta Flor; Andreas J Forstner; Clyde Francks; Barbara Franke; David C Glahn; Randy L Gollub; Hans J Grabe; Oliver Gruber; Asta K Håberg; Ahmad R Hariri; Catharina A Hartman; Ryota Hashimoto; Andreas Heinz; Frans A Henskens; Manon H J Hillegers; Pieter J Hoekstra; Avram J Holmes; L Elliot Hong; William D Hopkins; Hilleke E Hulshoff Pol; Terry L Jernigan; Erik G Jönsson; René S Kahn; Martin A Kennedy; Tilo T J Kircher; Peter Kochunov; John B J Kwok; Stephanie Le Hellard; Carmel M Loughland; Nicholas G Martin; Jean-Luc Martinot; Colm McDonald; Katie L McMahon; Andreas Meyer-Lindenberg; Patricia T Michie; Rajendra A Morey; Bryan Mowry; Lars Nyberg; Jaap Oosterlaan; Roel A Ophoff; Christos Pantelis; Tomas Paus; Zdenka Pausova; Brenda W J H Penninx; Tinca J C Polderman; Danielle Posthuma; Marcella Rietschel; Joshua L Roffman; Laura M Rowland; Perminder S Sachdev; Philipp G Sämann; Ulrich Schall; Gunter Schumann; Rodney J Scott; Kang Sim; Sanjay M Sisodiya; Jordan W Smoller; Iris E Sommer; Beate St Pourcain; Dan J Stein; Arthur W Toga; Julian N Trollor; Nic J A Van der Wee; Dennis van 't Ent; Henry Völzke; Henrik Walter; Bernd Weber; Daniel R Weinberger; Margaret J Wright; Juan Zhou
Journal: Science Date: 2020-03-20 Impact factor: 63.714

9. Cell cycle networks link gene expression dysregulation, mutation, and brain maldevelopment in autistic toddlers.

Authors: Tiziano Pramparo; Michael V Lombardo; Kathleen Campbell; Cynthia Carter Barnes; Steven Marinero; Stephanie Solso; Julia Young; Maisi Mayo; Anders Dale; Clelia Ahrens-Barbeau; Sarah S Murray; Linda Lopez; Nathan Lewis; Karen Pierce; Eric Courchesne
Journal: Mol Syst Biol Date: 2015-12-14 Impact factor: 11.429

10. Hierarchy of transcriptomic specialization across human cortex captured by structural neuroimaging topography.

Authors: Joshua B Burt; Murat Demirtaş; William J Eckner; Natasha M Navejar; Jie Lisa Ji; William J Martin; Alberto Bernacchia; Alan Anticevic; John D Murray
Journal: Nat Neurosci Date: 2018-08-06 Impact factor: 24.884

4 in total

1. Neural responses to affective speech, including motherese, map onto clinical and social eye tracking profiles in toddlers with ASD.

Authors: Yaqiong Xiao; Teresa H Wen; Lauren Kupis; Lisa T Eyler; Disha Goel; Keith Vaux; Michael V Lombardo; Nathan E Lewis; Karen Pierce; Eric Courchesne
Journal: Nat Hum Behav Date: 2022-01-03

2. A predictive ensemble classifier for the gene expression diagnosis of ASD at ages 1 to 4 years.

Authors: Bokan Bao; Javad Zahiri; Vahid H Gazestani; Linda Lopez; Yaqiong Xiao; Raphael Kim; Teresa H Wen; Austin W T Chiang; Srinivasa Nalabolu; Karen Pierce; Kimberly Robasky; Tianyun Wang; Kendra Hoekzema; Evan E Eichler; Nathan E Lewis; Eric Courchesne
Journal: Mol Psychiatry Date: 2022-10-20 Impact factor: 13.437

3. Association between resting-state functional brain connectivity and gene expression is altered in autism spectrum disorder.

Authors: Stefano Berto; Alex H Treacher; Emre Caglayan; Danni Luo; Jillian R Haney; Michael J Gandal; Daniel H Geschwind; Albert A Montillo; Genevieve Konopka
Journal: Nat Commun Date: 2022-06-09 Impact factor: 17.694

4. Rethinking Our Concepts and Assumptions About Autism.

Authors: Michael V Lombardo; Veronica Mandelli
Journal: Front Psychiatry Date: 2022-06-03 Impact factor: 5.435

4 in total