Literature DB >> 34469739

Full-length isoform transcriptome of the developing human brain provides further insights into autism.

Kevin K Chau¹, Pan Zhang², Jorge Urresti², Megha Amar², Akula Bala Pramod², Jiaye Chen², Amy Thomas², Roser Corominas², Guan Ning Lin², Lilia M Iakoucheva³.

Abstract

Alternative splicing plays an important role in brain development, but its global contribution to human neurodevelopmental diseases (NDDs) requires further investigation. Here we examine the relationships between splicing isoform expression in the brain and de novo loss-of-function mutations from individuals with NDDs. We analyze the full-length isoform transcriptome of the developing human brain and observe differentially expressed isoforms and isoform co-expression modules undetectable by gene-level analyses. These isoforms are enriched in loss-of-function mutations and microexons, are co-expressed with a unique set of partners, and have higher prenatal expression. We experimentally test the effect of splice-site mutations and demonstrate exon skipping in five NDD risk genes, including SCN2A, DYRK1A, and BTRC. Our results suggest that the splice site mutation in BTRC reduces translational efficiency, likely affecting Wnt signaling through impaired degradation of β-catenin. We propose that functional effects of mutations should be investigated at the isoform- rather than gene-level resolution.

Entities: Chemical

Keywords: alternative splicing; autism risk gene; autism spectrum disorder; co-expression module; human brain development; isoform transcriptome; neurodevelopmental disease; protein interaction network; splice-site mutations; splicing isoform expression

Mesh：

Substances：

Year: 2021 PMID： 34469739 PMCID： PMC8437376 DOI： 10.1016/j.celrep.2021.109631

Source DB: PubMed Journal: Cell Rep Impact factor: 9.423

INTRODUCTION

More than 95% of multi-exon human genes undergo alternative splicing (AS) and/or use alternative promoters to increase transcriptomic and proteomic diversity, with an estimated average of five to seven isoforms transcribed per gene (Pan et al., 2008; Steijger et al., 2013; Wang et al., 2008). AS is highly specific, and expression of isoforms is often restricted to certain organs, tissues, or cell types (Barbosa-Morais et al., 2012; Sapkota et al., 2019; Shalek et al., 2013; Trapnell et al., 2010). In addition, many isoforms are expressed only during specific developmental periods (Kalsotra and Cooper, 2011). Alternatively spliced isoforms encoded by the same gene can also be expressed at different levels in the same tissue or during the same developmental period (Wang et al., 2008). The developing human brain has one of the highest frequencies of AS events (Calarco et al., 2011; Melé et al., 2015; Raj and Blencowe, 2015; Yeo et al., 2004). Many of the processes that occur during neural development, including cell fate determination, neuronal migration, axon guidance, and synaptogenesis, are controlled by differentially expressed alternatively spliced isoforms (Grabowski, 2011; Kim et al., 2013; Li et al., 2007). Several recent studies, including one by us, began to investigate isoform-level transcriptome dysregulation in psychiatric diseases (Gandal et al., 2018; Li et al., 2018; Parikshak et al., 2016). However, spatiotemporal full-length isoform transcriptome of the developing human brain remains relatively unexplored. Integration of the brain gene transcriptome with genetic data from exome and whole-genome sequencing studies have provided important insights into neurodevelopmental diseases (NDDs) (Li et al., 2018; Lin et al., 2015; Parikshak et al., 2013; Satterstrom et al., 2020; Willsey et al., 2013). Most of the recent work in this area has been focused on understanding the effect of mutations at gene-level resolution, whereas the isoform-specific effect of loss-of-function (LoF) mutations in the context of brain development has not yet been fully investigated. It is important to map LoF mutations to transcripts because protein isoforms encoded by different transcripts have drastically different protein interaction capabilities. As we have demonstrated previously, the majority of the isoforms encoded by the same gene share less than a half of their interacting partners in the human interactome network (Yang et al., 2016). This observation points to striking functional differences between splicing isoforms that are not accounted for by the majority of the existing gene-level studies. In addition, our recent work demonstrated that isoform-level networks of autism risk genes and copy number variants provide better resolution and depth around disease proteins (Corominas et al., 2014). To better understand how NDD risk mutations dysregulate normal brain development, we analyzed the temporal isoform transcriptome of the developing human brain using the BrainSpan RNA-seq dataset from the PsychEncode Consortium (Li et al., 2018), summarized to full-length isoforms, as described previously (Gandal et al., 2018). We identified hundreds of differentially expressed isoforms (DEIs) and dozens of isoform co-expression modules at brain developmental periods starting from fetal to adult. Compared with the gene-level transcriptome, the full-length isoform transcriptome provides more meaningful insights and paints a more complete picture of neurodevelopmental processes. Importantly, many DEIs and isoform co-expression modules were undetectable by gene-level analyses. Mapping autism spectrum disorder (ASD) risk mutations to DEIs revealed that ASD LoF-affected isoforms have higher prenatal expression, more frequently carry microexons, and are preferentially involved in key neuronal processes compared with non-affected isoforms. Furthermore, isoform co-expression modules with splicing-related and synaptic functions were enriched in LoF-affected isoforms, implicating these functions in NDDs. Finally, we experimentally tested the effect of several splice-site LoF mutations and demonstrated that they cause exon skipping to produce novel isoforms with altered biological properties. Our study makes a strong case for investigation of disease mutations at full-length isoform- rather than gene-level resolution.

RESULTS

Construction, quality control, and validation of the full-length isoform transcriptome of the developing human brain

To investigate global patterns of isoform expression across brain development, we analyzed the temporal full-length isoform transcriptome of the developing human brain (Figure S1). We used the BrainSpan RNA-seq dataset from the PsychEncode Consortium (Li et al., 2018), summarized to full-length isoforms, as described previously (Gandal et al., 2018). After rigorous quality control, which included sample outlier detection through weighted gene co-expression network analysis (WGCNA) (Oldham et al., 2012) and detection of confounding variables with surrogate variable analyses (Leek and Storey, 2007; Figures S2 and S3; STAR Methods), we obtained expression profiles for 100,754 unique full-length isoforms corresponding to 26,307 brain-expressed human genes, resulting in ~3.8 isoforms/gene. To experimentally validate the quality of the BrainSpan isoform transcriptome, we used quantitative RT-PCR (qRT-PCR) to estimate the relative expression difference of 44 unique isoforms of 32 genes between two independent RNA samples that were age-, sex-, and brain region-matched to the samples from BrainSpan (STAR Methods). The relative qRT-PCR isoform expression values in independent samples of the frontal lobe of a 22-week-old fetus and the cerebral cortex of a 27-year-old adult were compared with the values obtained from BrainSpan. We observed a positive correlation (R2 = 0.46) between experimental and BrainSpan-derived values for these isoforms despite using independent samples for validation (Table S1; Figure S4A; STAR Methods). This correlation was comparable with our previous validation values from other PsychEncode datasets (R2 = 0.48 for ASD and R2 = 0.51 for SCZ [schizophrenia]) (Gandal et al., 2018).

Differential isoform expression reveals distinct signals relative to differential gene expression

We recently demonstrated that isoform-level changes capture larger disease effects than gene-level changes in the context of three major psychiatric disorders (Gandal et al., 2018). Here we investigated the role of full-length isoform expression in the context of normal brain development. We performed differential expression analysis among all pairs of adjacent developmental periods as well as between pooled prenatal (P02–P07) and pooled postnatal (P08–P13) (PrePost) samples, yielding sets of differentially expressed genes (DEGs) and DEIs (STAR Methods; Tables S2 and S3). We observed the largest number of DEGs and DEIs in the P06/P07 (late mid-fetal/late fetal) and P07/P08 (late fetal/neonatal) developmental periods, supporting critical brain remodeling right before and after birth (Figure 1A). In P06/P07, 8.3% of genes and 20.3% of isoforms (converted to gene identifiers) were differentially expressed, whereas in P07/P08, 13.2% of genes and 20.4% of isoforms (converted to gene identifiers) were differentially expressed (Table S4). Overall, 48.4% of genes and 64.9% of isoforms (converted to gene identifiers) were differentially expressed between PrePost periods. These results indicate that the expression levels of over half of all isoforms change significantly between the prenatal and postnatal periods, suggesting profound transcriptomic remodeling during brain development.

Figure 1.

Differential gene and isoform expression analyses

(A) Number of significant DEGs and DEIs in the adjacent brain developmental periods and in prenatal versus postnatal periods. Isoform identifiers were summarized to gene identifiers for simplicity of comparison. Shaded areas represent identifiers shared between gene and isoform datasets, whereas unshaded bars represent genes (red) or isoforms (turquoise) unique to each dataset.

(B) Effect size (absolute log2 fold change) distribution of DEGs (red) and DEIs (turquoise) of combined data (top) and per developmental period (bottom). Average absolute effect sizes for genes and isoforms are marked by corresponding colored vertical lines, and differences were tested using two-sample t tests (*FDR < 0.05).

(C) Enrichment of cell types and literature-curated gene sets among genes and isoforms unique to each dataset (unshaded sets from a panel). Fisher’s exact test was used to calculate p values.

(D) Gene Ontology (GO) enrichment of DEGs and DEIs unique to each dataset (unshaded sets from a panel). Three adjacent periods are shown as examples (P04/P05, P07/P08, and P08/P09). DEIs are enriched in nervous system-related processes.

In addition to the greater fraction of DEIs between adjacent and PrePost periods, we also observed significantly increased effect sizes (absolute log2 fold changes) among DEIs compared with DEGs, overall and in nearly every developmental period (Figure 1B). This suggests that the levels of differential expression were more pronounced at the full-length isoform level relative to the gene level, consistent with previous results obtained from postmortem brains of individuals with NDDs (Gandal et al., 2018). Thus, the full-length isoform transcriptome is likely to provide additional information about brain development that is missed by the gene transcriptome. To better understand the biological basis of brain transcriptome differences at the gene and isoform levels, we performed enrichment analyses of unique non-overlapping DEGs and DEIs (lightly shaded subsets from Figure 1A) using cell type and literature-curated gene lists (Figure 1C). We used a published gene-level single-cell RNA-seq dataset (Zhong et al., 2018) for cell type enrichment and NDD-related gene lists for gene enrichment analyses in each period as well as in the PrePost dataset (STAR Methods). Overall, DEGs captured weaker enrichment signals than DEIs, potentially because of smaller DEG dataset sizes. Among cell types, DEIs were enriched significantly in excitatory neuron markers, especially in the prenatal to early childhood developmental periods (Fisher’s exact test, max Bonferroni-adjusted p < 1E–09, odds ratio [OR] = 2.39–3.29, min. 95% confidence interval [CI] = 2.07, max 95% CI = 3.98 for P02/P03–P09/P10) (Figure 1C, left panel). The DEIs from almost all periods were also enriched in post-synaptically expressed genes as well as FMRP (Fragile X Mental Retardation Protein) and CHD8 targets, with the most significant enrichment during P06/P07 (late mid-fetal/late fetal). Interestingly, the DEIs from only P04/P05 (early mid-fetal) were enriched in autism risk genes (Satterstrom et al., 2020) (Fisher’s exact test, Bonferroni-adjusted p = 0.005, OR = 3.88, 95% CI = 2.11–3.68) (Figure 1C, right panel), and this signal was not observed at the gene level. The mid-to-late fetal developmental period has been identified previously as critical for ASD pathogenesis (Parikshak et al., 2013; Willsey et al., 2013; Lin et al., 2015). The Gene Ontology (GO) enrichment analyses in P04/P05, P07/P08, and P08/P09 demonstrated stronger enrichment of DEI in neurodevelopment-relevant processes compared with DEGs (Figure 1D; Tables S5 and S6). For example, “neuron projection development,” “brain development,” and “nervous system development” were enriched among DEIs but not among DEGs. In contrast, DEGs were enriched in basic biological function-related processes, such as “mitotic cell cycle,” “metabolic processes,” “protein targeting,” and “localization.” This suggests that the full-length isoform transcriptome provides better biological insights into brain development than the gene transcriptome.

DEIs affected by autism LoF mutations have higher prenatal expression

To improve our understanding of the effect of NDD mutations on brain development, we mapped rare de novo LoF variants identified in the largest autism exome sequencing study (Satterstrom et al., 2020) to the full-length isoform transcriptome. 12,111 ASD case variants and 3,588 control variants were processed through Ensembl’s Variant Effect Predictor (VEP) and filtered for consequences likely to result in LoF of the affected gene or isoform (STAR Methods; Table S7). In total, 1,132 ASD case and 262 control variants fit this criterion, affecting 4,050 isoforms from 1,189 genes. At the isoform level, 3,128 isoforms were affected by ASD case variants (ASD LoF), 848 isoforms by control variants (control LoF), and 74 isoforms by both. We also defined a dataset of isoforms that were not affected by ASD variants (non-affected by ASD LoF) as an internal control. In every prenatal developmental period as well as in the pooled prenatal dataset, expression of the ASD LoF-affected isoforms was found to be significantly higher than control LoF-affected isoforms or non-affected by ASD LoF isoforms (Mann-Whitney test, Benjamini-Hochberg-adjusted p ≤ 0.05) (Figure 2A). This suggests that the potential decrease or loss of expression of these highly expressed isoforms in the normal prenatal human brain as a result of LoF mutation may contribute to ASD pathogenesis.

Figure 2.

Analyses of isoforms affected by rare de novo ASD LoF variants

(A) Mean expression of isoforms affected by case rare de novo ASD LoF variants (affected by ASD LoF) is significantly higher in prenatal periods compared with those affected by control LoF mutations or to non-affected isoforms.

(B) Proportion of protein-coding isoforms of high-risk ASD genes from Satterstrom et al. (2020), uniquely differentially expressed at isoform level, affected (red) or not affected (blue) by rare de novo ASD LoF variants.

(D) Expression profiles of affected and non-affected isoforms of four ASD risk genes across development, demonstrating higher prenatal expression of some affected isoforms.

(E) Schematic of alternatively regulated micro-exons (top panel), proportion of all brain-expressed genes with alternatively regulated microexons (bottom left), and proportion of all brain-expressed isoforms with alternatively regulated microexons (bottom right).

*p ≤ 0.1, **p ≤ 0.05, ***p ≤ 0.01.

We then selected genes with DEIs between adjacent development periods for which at least one isoform is ASD LoF-affected and at least one other isoform is non-affected by ASD LoF; 26 genes of 102 Satterstrom genes satisfied this criterion (Figure 2B). Hierarchical clustering of the isoforms from these genes based on expression values identified a prenatally expressed cluster consisting largely of the ASD LoF-affected isoforms (Figure 2C). A higher fraction of LoF-affected isoforms carried microexons (i.e., short exons 3–27 bp in length) compared with non-affected isoforms (permutation test, n = 1,000 permutations, p = 0.04) (Figure 2D), recapitulating previous findings at the gene level and in agreement with the important role of microexons in autism (Irimia et al., 2014; Li et al., 2015). The affected and non-affected isoforms of some genes (KMT2C, MBD5, and PTK7) had opposite developmental trajectories, whereas for other genes (GABRB3), the affected isoforms were highly expressed throughout brain development (Figure 2E). It is likely that a LoF mutation that affects a highly prenatally expressed isoform can severely disrupt early brain development and lead to NDD. In the more general case, we also found that ASD LoF-affected isoforms were more highly expressed throughout development (p < 0.0001, Wilcoxon test), although we had not observed any specific temporal patterns (Figure S4B). Overall, mapping of NDD risk mutations onto the full-length isoform transcriptome could help us to better understand their functional effects in the context of brain development.

Isoform co-expression modules capture distinct trajectories of brain development

To understand how brain development is regulated at the full-length isoform level, we carried out WGCNA and isoform co-expression network analysis (Langfelder and Horvath, 2008; STAR Methods). This analysis identified modules of genes and isoforms with highly correlated expression profiles across all BrainSpan samples. We identified a total of 8 gene and 55 isoform co-expression modules by analyzing the gene and isoform transcriptomes (Tables S8 and S9). Gene and isoform networks followed scale-free topology and had similarly low soft-thresholding betas (two for gene and three for isoform) selected to allow for module detection (Figures S4C and S4D). The hierarchical clustering of modules by eigengenes demonstrated that each gene co-expression module closely clustered with a corresponding isoform co-expression module (Figure 3A). Further characterization of these gene/isoform module pairs via GO annotations showed overlapping functions and pathways (Tables S10 and S11). For example, gene module gM2 and isoform module iM2 were enriched for GO terms related to synaptic transmission. This indicates that the isoform co-expression network recapitulates functional aspects of the gene co-expression network.

Figure 3.

Gene and isoform co-expression analyses

(A) Association of gene and isoform co-expression modules clustered by module eigengene with developmental periods (top). Linear regression beta coefficients were calculated using linear mixed-effects models. Module enrichment in cell type and literature curated gene sets (bottom) was calculated using Fisher’s exact test.

(B) Module eigengene expression profiles across brain development for modules most significantly associated with each cell type: astrocytes, iM25; oligodendrocytes, iM6; microglia, iM36; NPCs, iM10; excitatory neurons, iM2; interneurons, iM17.

(D) Gene (top panel) and isoform (bottom panel) co-expression modules affected by case and control ASD LoF mutations. Normalized effect rate per module is shown. Significance was calculated by permutation test (1,000 permutations, *FDR ≤ 0.05).

(E) Gene-level and isoform-level co-expressed protein interaction networks for the KMT2A gene from the gM1 and iM1 turquoise modules. Only edges in the top 10% of expression PCCs that are also supported by gene-level protein interactions are retained.

This hierarchical clustering of modules by eigengenes also revealed isoform modules that are completely unique relative to the gene-level modules (Figure 3A). We further performed a Pearson correlation analysis between the isoform modules and the gene modules by their respective eigengene expression and identified isoform modules with low Pearson correlation coefficients (PCCs) with any gene module (Figure S4E). This set of isoform modules with unique expression trajectories includes those related to chemical synaptic transmission and neurogenesis (iM18, PCC = 0.13); kinase activity (iM51, PCC = 0.27); cell adhesion, neuron fate, and cerebral cortex regionalization (iM40, PCC = 0.34); behavior, cognition, learning, and memory (iM13, PCC = 0.38); and chromatin, neuron organization, and migration (iM31, PCC = 0.3) as examples. These results identify co-expression modules with unique developmental trajectories and biological functions that were not detected by gene co-expression analyses. To relate each co-expression module with brain developmental periods, we calculated module-period associations using linear mixed effects models (STAR Methods). We found modules that were significantly associated with several developmental periods (Figure 3A, top panel); iM1 was significantly associated with prenatal periods P02 (false discovery rate [FDR]-adjusted p = 0.009), P03 (FDR-adjusted p = 0.003), and P04 (FDR-adjusted p = 0.008), whereas iM10 and iM39 were associated with P02 (FDR-adjusted p = 6.59E-04 and FDR-adjusted p = 0.026, respectively). Functional GO analyses of these modules demonstrated that iM1 was enriched in splicing functions and iM10 in mitosis and cell cycle-related processes, whereas iM39 was enriched in embryonic development; all functions were related to early fetal brain development (Table S11). The developmental trajectory of the iM10 module has a clear peak at early developmental periods that levels down after birth, and both modules have unique GO functions along with intermediate to low correlation with gene modules (Figure S4E). Several other modules (gM4, iM35, iM7, and iM38) were strongly associated with late fetal period P07 (gM4: FDR-adjusted p = 1.78E–09; iM35: FDR-adjusted p = 8.23E–04; iM7: FDR-adjusted p = 3.83E–04; iM38: FDR-adjusted p = 0.009). These modules were enriched for angiogenesis and extracellular matrix organization GO functions (Table S11) and had high PCC and overlapping GO functions with gene modules (Figure S4E). The analysis of cell-type markers extracted from single-cell sequencing studies (STAR Methods) identified modules that were significantly enriched in specific cell types (Figure 3A, center panel). For example, iM10, which was associated with a very early P02 period, was also enriched in neuroprogenitors (NPCs), the cells that give rise to other neuronal cell populations and are often found very early in brain development. Likewise, iM2 was primarily associated with postnatal periods and strongly enriched in excitatory neurons, a mature neuronal population. Interestingly, the cluster of modules that was strongly associated with the late fetal P07 period (gM4, iM35, iM7, and iM38), was enriched in microglia, or innate immune cells of the brain, that peak around late mid-fetal to late fetal development. Furthermore, isoform module eigengene trajectories captured the appropriate signals from each cell type, with NPCs steadily decreasing and neuronal cell types increasing from prenatal to postnatal brain development (Figure 3B). To ensure that the observed cell-type-specific signatures are driven by isoform expression rather than cellular composition, we performed decomposition analyses of bulk BrainSpan RNA-seq data with fetal brain single-cell RNA-seq from two previous studies (Polioudakis et al., 2019; Zhong et al., 2018). We observed that cell type composition only explained a relatively small portion of total variance compared with sample age (i.e., developmental period) (Figures S5A and S5B). Analysis of curated gene lists in the context of co-expression modules identified gM1/iM1 as being enriched in ASD risk genes and CHD8 target and functionally constrained and mutation-intolerant (pLI > 0.99) genes (Figure 3A, bottom panel). The same modules were significantly associated with prenatal periods and were enriched in RNA processing and splicing GO functions (Figure 3C, left panel). Another module enriched in ASD risk genes was iM19, annotated with chromatin and histone-related GO functions. This is consistent with previous observations regarding enrichment of chromatin modifier genes among ASD risk genes (De Rubeis et al., 2014). Analyses of isoform co-expression modules broaden our knowledge of the developing human brain at the full-length isoform transcriptome level.

LoF-affected co-expression modules point to dysregulation of RNA splicing and synaptic organization

We next investigated enrichment of rare de novo ASD variants from cases and controls (Satterstrom et al., 2020), and identified co-expression modules that were significantly affected by LoF case but not control mutations (Table S12; STAR Methods). We observed three modules significantly affected by case ASD variants: one gene module (gM1) and two isoform modules (iM1 and iM30) (Figure 3D). Unsurprisingly, gM1 and iM1 clustered together and were enriched in similar GO functions related to RNA processing and splicing, including non-coding RNA splicing (Figure 3C). This agrees with the already demonstrated crucial role of splicing dysregulation in ASD (Gandal et al., 2018; Parikshak et al., 2016). Functional enrichment of isoform co-expression module iM30 pointed to dysregulation of synapse organization, dendrite development, and neuronal projection pathways (Figure 3C), which are strongly implicated in ASD (Iakoucheva et al., 2019; Pinto et al., 2014). The highest PCC of iM30 with any gene module is 0.62, and its developmental trajectory and GO functions are distinct compared with gene modules (Figure S4E). Thus, isoform modules reflect processes implicated previously in ASD and point to specific isoforms (rather than genes) that can contribute to this dysregulation. To demonstrate how isoform co-expression modules could be useful for future studies, we built isoform co-expressed protein-protein interaction (PPI) networks for the gM1 and iM1 modules (Figure S5C). The isoform network was focused on ASD risk genes that had at least one isoform affected by an LoF mutation, and the edges that had gene-level PPI information (because of scarcity of isoform-level PPIs) were filtered for the top 10% PCC (STAR Methods). Clearly, gM1 had fewer connections than iM1, and iM1 highlighted some interesting isoform co-expressed PPIs that were not detectable from the gene-level network. For example, 9 genes from this module (ARID1B, CHD8, KMD5B, KMT2A, MED13L, PCM1, PHF12, POGZ, and TCF4) had at least one ASD LoF-affected isoform and at least one that was not affected by mutation. These isoforms were co-expressed and interacted with different partner isoforms. For example, ASD LoF-affected and non-affected isoforms of the KMT2A gene had shared as well as unique protein interaction partners (Figure 3E). This could lead to different networks being disrupted as a result of ASD mutation, and these networks were not observed at the gene level, with only one KMT2A partner (CREBBP) in the gene network. Another interesting observation from co-expressed PPI networks was that LoF-affected isoforms tended to have higher correlation with corresponding partners than non-affected isoforms (Mann-Whitney test, p = 1.53E–05), suggesting potentially greater functional effects on networks.

De novo splice-site mutations of NDD risk genes cause exon skipping, partial intron retention, or have no effect on isoforms

One type of LoF mutation is mutations that affect splice sites directly. Here we experimentally investigated the effect of de novo splice site mutations identified by exome sequencing studies in four NDD risk genes (DYRK1A, SCN2A, DLG2, and CELF2) to better understand their functional effects. All highly prenatally expressed isoforms of these genes were found in iM1. We used an exon trapping assay (STAR Methods) to test the following de novo splice site mutations: SCN2A (chromosome 2 [chr2]:166187838, A:G, acceptor site; Fromer et al., 2014), DYRK1A (chr21: 38865466, G:A, donor site; O’Roak et al., 2012), DLG2 (chr11: 83194295, G:A, donor site; Fromer et al., 2014), and CELF2 (chr10: 11356223, T:C, donor site; Xu et al., 2011). The mutation in SCN2A caused out-of-frame exon skipping and potential inclusion of 30 new amino acids into the translated protein before ending with a premature stop codon that most likely will result in nonsense-mediated decay (NMD) (Figure 4A). In contrast, the mutation in DYRK1A caused in-frame exon skipping, potentially producing a different variant of the same protein, and, thus, is expected to have milder or perhaps no functional effects (Figure 4B). In the case of DLG2, the mutation affected a splice site adjacent to exon 5, which is alternatively spliced in the wild-type (WT) isoforms (Figure 4C). We constructed a minigene that includes exon 5 together with the preceding exon 4 and observed that exon 5 was constitutively spliced out from our construct independent of the presence of mutation. However, the mutation caused partial (i.e., 65-bp) intron inclusion downstream of exon 4. At the translational level, this mutation would likely result in a truncated protein one residue after the end of exon 4 because of a premature stop codon. Finally, the CELF2 mutation affected an alternative splice site that also mapped to an exonic region of another alternatively spliced isoform. When cloned into the exon-trapping vector, the isoform generated from the WT minigene included the isoform carrying a longer exon with mutation (Figure 4D). Thus, after introducing the mutation, no difference between WT and mutant constructs was observed. This is not surprising, given the fact that the splice site mutation behaves like an exonic missense mutation in the isoform predominantly expressed from our construct. These results suggest that mutations could affect different isoforms of the same gene by different mechanisms; i.e., splice site mutation in one isoform could represent a missense mutation in another isoform.

Figure 4.

Functional effect of the de novo splice site mutations from individuals with NDDs

(A–D) Minigene assays demonstrate the effect of splice-site mutations in four genes: SCN2A (A), DYRK1A (B), DLG2 (C), and CELF2 (D). A schematic of the cloned minigenes, the expected splicing patterns, and the effects of the mutations are shown below the gel image. Numbers denote base pairs. M, molecular marker; E, exon.

(E) Expression profiles across brain development of the brain-expressed isoforms transcribed by these four genes, annotated with module memberships; highly overlapping expression profiles are unlabeled for readability.

Further analysis of expression profiles of the brain-expressed isoforms transcribed by these genes (Figure 4E) suggested that highly prenatally expressed isoforms (SCN2A-201, DYRK1A-001, DLG2-016, and CELF2-201) were most likely targets for the “pathogenic” effect of mutations. Furthermore, given distinct co-expressed PPI partners of affected versus non-affected isoforms (Figure S5D) in most cases, the effect of mutation would be propagated onto different networks, affecting different signaling pathways. Our experiments showcased different scenarios of the effect of splice site mutations and confirmed the need to investigate their functional effects at isoform- rather than gene-level resolution.

The splice site mutation in BTRC reduces its translational efficiency

As shown above, mutations can affect different protein interaction networks depending on the splicing isoform they affect and whether the affected isoform is expressed at specific periods of brain development. Next we investigated in greater detail how specific mutations mapped to different isoforms may disrupt downstream signaling pathways. For this, we selected three full-length isoforms (BTRC-001, BTRC-002, and BTRC-003) of an ASD risk gene, BTRC (beta-transducin repeat containing, also known as β-TrCP or FBXW1A) (Ruzzo et al., 2019), based on their availability from our previous study (Yang et al., 2016; Figure 5A). Two de novo mutations, one missense (chr10:103285935,G-A; Ruzzo et al., 2019) and one splice site (chr10: 103221816, G:A, donor site; De Rubeis et al., 2014), were identified in individuals with ASD and none in controls, making BTRC one of 69 high-confidence ASD risk genes with genome-wide significance (0.05 < FDR ≤ 0.1; Ruzzo et al., 2019). We demonstrated that the splice site BTRC mutation caused in-frame exon 4 (78 bp) skipping in the exon trapping assay (Figure 5B). To further test the effect of this mutation on different BTRC transcripts, we generated additional constructs by inserting abridged introns surrounding exon 4 into the coding sequences (CDSs) of two isoforms, BTRC-001 and BTRC-002 (Figure 5C; STAR Methods). The third isoform, BTRC-003, did not carry exon 4, and its structure and size are identical to BTRC-001 after exon 4 is skipped. We also generated the mutant constructs BTRC-001Mut and BTRC-002Mut carrying the mutation in the abridged intron (Figure 5A). RT-PCR following exon trapping assays on the full-length CDS as well as WT and mutant constructs with abridged introns confirmed the correct sizes of all constructs and validated the exon skipping event because of splice site mutation (Figure 5C). The expression levels of the mutant constructs were comparable with the WT. Furthermore, western blot confirmed the expected sizes of the protein products from the WT and mutant constructs (Figure 5D). The splice site mutation significantly reduced the amount of protein produced from mutant isoforms, suggesting their decreased translational efficiency (Figure 5E). Higher amounts of protein products from all constructs with abridged introns compared with CDSs were consistent with previous observations of increased translational efficiency of RNAs produced by splicing compared with their intron-less counterparts (Diem et al., 2007). Further, BTRC-001 and BTRC-002 are highly expressed relative to the non-affected BTRC-003 (Figure 5F) and are co-expressed and interact with non-overlapping sets of protein partners in the co-expressed PPI networks (Figure S5E). This suggests that mutations in different isoforms could affect different cellular networks.

Figure 5.

The de novo autism splice-site mutation causes exon skipping in BTRC isoforms and reduces their translational efficiency

(A) The exon structure of three splicing isoforms of the BTRC gene showing positions of the cloned abridged introns and the splice-site mutation; numbers denote base pairs.

(B) Minigene assays demonstrate exon 4 skipping as a result of the splice-site mutation. The assays show the results of RT-PCR performed using total RNA from HeLa cells transfected with BTRC minigene constructs; numbers denote base pairs.

(C) Splicing assays with full-length constructs carrying abridged introns, confirm exon skipping observed in the minigene assays.

(D) Immunoblot (IB) of whole-cell lysates of HeLa cells transfected with different BTRC minigene constructs and an empty vector, as indicated. Membranes were probed to observe BTRC overexpression and investigate expression of p-β-catenin, Cul1, and SKP1. β-Actin was used as a loading control. IP was performed with the antibody recognizing the V5 tag, and proteins were detected by IB with p-β-catenin, Cul1, SKP1, and V5 antibodies. The splice-site mutation causes reduced translational efficiency of the BTRC_1Mut and BTRC_2Mut mutant isoforms compared with their WT counterparts. A schematic of the Skp1-Cul1-BTRC ubiquitin protein ligase complex is shown at the bottom.

(E) Quantification of protein pull-down with V5 IP using ImageJ software. The band intensity values were normalized to WT expression levels. Error bars represent 95% CI based on 3 independent experiments. On average, a 40% reduction of BTRC protein expression is observed as a result of a mutation. Consequently, a reduction of the corresponding BTRC binding partners (p-β-catenin, Cul1, and SKP1) is also observed.

(F) Expression profiles of brain-expressed BTRC isoforms show higher expression of ASD-affected BTRC-001 and BTRC-002.

Numbers denote base pairs (A–C) or kilodaltons (D). *p < 0.05, **p ≤ 0.01, ***p ≤ 0.001.

Next we investigated binding properties of all isoforms using co-immunoprecipitation (coIP) (Figure 5D). The BTRC gene encodes a protein of the F box family and is a component of the SCF (Skp1-Cul1-F box protein) E3 ubiquitin-protein ligase complex. One of the well-known substrates of this complex is β-catenin (CTNNB1). The SCF complex ubiquitinates and regulates degradation of β-catenin, an essential component of the Wnt signaling pathway (Winston et al., 1999). Wnt plays key roles in cell patterning, proliferation, polarity, and differentiation during embryonic development of the nervous system (Ciani and Salinas, 2005), and it has been implicated consistently in ASD (Urresti et al., 2021; Iakoucheva et al., 2019; Kwan et al., 2016). β-Catenin and Cul1 carry de novo mutations identified in individuals with NDD (Satterstrom et al., 2020). The interaction of BTRC with its partners Cul1, Skp1, and β-catenin demonstrated reduced binding with mutant BTRC, potentially suggesting a shortage of the SCF ligase complexes (Figures 5D and 5E). In agreement with previous observations, we found that BTRC only binds to the phosphorylated form of β-catenin (Winston et al., 1999). This suggests that the amount of protein complex is strongly dependent on the availability of BTRC protein, which is reduced significantly because of splice-site mutation. Thus, our results indicate that the BTRC splice-site mutation causes exon skipping in BTRC isoforms and reduces the translational efficiency of the resulting protein product. This, in turn, decreases the amount of SCF protein ligase complexes available for β-catenin ubiquitination. We hypothesize that this may lead to impaired degradation of β-catenin, its cellular accumulation, and upregulation of Wnt signaling as a result of this ASD risk mutation. Further studies in neuronal cells are needed to test this hypothesis.

DISCUSSION

Recent large-scale whole-exome and whole-genome sequencing studies have greatly facilitated discovery of the genetic causes of neurodevelopmental disorders. One of the bottlenecks in translating these findings into molecular mechanisms is our limited understanding of the transcriptional and translational programs governing brain development. The brain is one of the most complex human organs with the highest number of alternatively spliced events (Melé et al., 2015; Raj and Blencowe, 2015). Thus, knowledge regarding its splicing repertoire is crucial for future translational studies of brain diseases. Several earlier studies have addressed the effects of NDD mutations at exon-level resolution. For example, Uddin et al. (2014) identified highly expressed critical exons with de novo ASD mutations that were enriched in cases compared with controls. A more recent study correlated exons affected by ASD LoF mutations with phenotypes of affected individuals, and found that those with mutations in the same exon had more similar phenotypes (Chiang et al., 2021). However, none of the previous studies investigated the effects of LoF mutations in the context of the full-length isoform transcriptome. We demonstrated previously that integration of genetic data with isoform-level co-expression and protein interaction networks is crucial for improving our understanding of the molecular mechanisms of neurodevelopmental disorders (Corominas et al., 2014; Gandal et al., 2018; Lin et al., 2015, 2017). The importance of isoform-level networks was further emphasized by our earlier observation that protein products encoded by different splicing isoforms of the same gene share less than half of their interaction partners (Yang et al., 2016). These studies underscore the importance of the brain isoform transcriptome for future studies of NDDs. Here we analyzed the full-length isoform transcriptome of the developing brain and demonstrated its utility for investigating LoF mutations implicated in autism. We demonstrated that brain differential isoform expression analysis identified a fairly large set of DEIs that were not detected by the gene-level analysis. Furthermore, DEIs captured more relevant functions than DEGs in the context of brain development. Processes such as neuron projection development, axon development, and head, brain, and nervous system development were primarily supported by DEIs uniquely identified only through analysis of the isoform transcriptome. By mapping LoF mutations from autism cases and controls onto the full-length isoform transcriptome, we found that ASD LoF-affected isoforms had significantly higher prenatal expression than non-affected isoforms or isoforms affected by control mutations. The expression trajectories of affected and non-affected isoforms across brain development were remarkably different for some of the autism risk genes. For example, two LoF-affected isoforms of KMT2C histone lysine methyltransferase, a high-confidence ASD risk gene (De Rubeis et al., 2014; Iossifov et al., 2015; O’Roak et al., 2011) were highly expressed prenatally and had opposing temporal trajectories compared with non-affected isoforms that were highly expressed postnatally (Figure 2D). Similar developmental trajectories were observed for some of the LoF-affected isoforms of PTK7 and MBD5 genes. This demonstrates that future studies of these and other genes with similar properties should focus on affected isoforms with high prenatal expression rather than on all available isoforms because they may be more relevant to brain development. In general, we consistently gain additional information from the full-length isoform transcriptome across various types of analyses. At the level of co-expression, isoform co-expression modules provide important insights into neurodevelopment and on how it may be disrupted by autism mutations. Importantly, many isoform modules, especially those with a low PCC with gene modules, have unique and distinct developmental trajectories and biological functions that are not identified through gene-level analyses (Figure S4E). Many of these distinct isoform modules were annotated with important and ASD-relevant functions, such as translation (iM21), protein/histone demethylation (iM47), chromatin organization and neuron migration (iM31), and cell adhesion, neuron fate, and cerebral cortex regionalization (iM40). Other isoform modules that have a high PCC with gene modules could also provide important insights. The isoform module iM1 was significantly enriched in isoforms affected by case LoF mutations (Figure 3D). Functionally, it was enriched in RNA splicing and processing pathways that have been implicated previously in ASD (Parikshak et al., 2016). iM1 was significantly associated with prenatal developmental periods; enriched in interneurons, microglia, and NPCs; and enriched in CHD8 target genes and mutation-intolerant genes and was also one of a few modules enriched in ASD risk genes. Given all of these lines of evidence, it clearly is a very important module for further investigation. Using isoforms from the iM1 module, we experimentally investigated the functional effects of splice site-disrupting LoF mutations in five genes. The results demonstrated exon skipping or disruption of normal splicing patterns but not in all cases. A more detailed analysis at the isoform level suggested that not all isoforms were affected by mutations. For example, at least one known isoform of the BTRC gene did not carry an exon with mutation and therefore was not expected to be affected by it. We next demonstrated that a BTRC mutation decreased translational efficiency of the affected isoforms because a lower amount of the resulting protein was observed (Figure 5D). This, in turn, lead to reduced interaction between BTRC and its protein partners, potentially disrupting Wnt signaling (Figure 5E). Because β-catenin is a substrate of the BTRC-Cul1-Skp1 ubiquitin ligase complex, shortage of this complex may lead to impaired ubiquitination and degradation of β-catenin and its neuronal accumulation. Interestingly, transgenic mice overexpressing β-catenin have enlarged forebrains, arrest of neuronal migration, and dramatic disorganization of the layering of the cerebral cortex (Chenn and Walsh, 2002). It would be interesting to investigate whether individuals carrying the de novo BTRC splice-site mutation have similar brain abnormalities. Typically, mutations affecting essential splice sites are automatically classified as LoF mutations when considering gene-level analyses. Here we demonstrate that this is not always the case and that splice-site mutations affecting one isoform of a gene may serve as a missense mutation in another isoform that carries a longer exon spanning the splice site, like in the case of CELF2 (Figure 4D). Thus, depending on where, when, at what level, and which isoform of the gene is expressed, the functional effect of the same mutation may differ dramatically. In addition, the mutation could also be “silent” when the isoform is highly expressed but does not carry an exon affected by a specific mutation. This suggests that the effects of mutations should be investigated at isoform- rather than gene-level resolution, and the expression levels of splicing isoforms in disease-relevant tissues should be taken into consideration to better guide hypotheses regarding potential mechanisms of disease and future treatments. Although our study provided further insights into functional effects of NDD mutations at the level of the full-length isoform transcriptome, it has several limitations that warrant discussion. The BrainSpan developmental brain transcriptome was sequenced using short-read sequencing technologies; therefore, full-length isoform assignment, especially for low-abundance transcripts, is less reliable compared to the long-read sequencing. Long-read RNA-seq technologies could offer large improvements in the accuracy of full-length isoform identification and discovery. When it becomes less cost prohibitive, creating a long-read-sequenced isoform transcriptome of the developing human brain would be invaluable for the scientific community. Our study directly demonstrates the value of using full-length transcriptome data for future human disease studies and for investigating functional effects of disease-associated mutations.

STAR★METHODS

RESOURCE AVAILABILITY

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Lilia M lakoucheva (lilyak@ucsd.edu).

Materials availability

This study did not generate new unique reagents.

Data and code availability

Raw isoform-level RNA sequencing (RNA-seq) BrainSpan data are available at PsychENCODE Capstone Data Collection, https://doi.org/10.7303/syn12080241. The processed summary-level BrainSpan data are available at http://Resource.PsychENCODE.org. The code used for isoform RNA-seq data analysis generated during this study is available from GitHub (https://github.com/IakouchevaLab/Isoform_BrainSpan), https://doi.org/10.5281/zenodo.5091071. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Brain transcriptome data

RNA-Seq datasets quantified at the gene and isoform levels were downloaded from PsychENCODE Knowledge Portal, PEC Capstone Collection, Synapse ID: syn8466658 (https://www.synapse.org/#!Synapse:syn12080241). RNA-Seq from post-mortem brain tissue of 57 donors aged between 8 weeks post-conception to 40 years, across a number of different brain regions, for a total of 606 samples, has been carried out as previously described (Li et al., 2018; Figure S1B).

Cell lines

HeLa cells were cultured using DMEM media (GIBCO) supplemented with 10% FBS (GIBCO) and 1x penicillin/streptomycin (GIBCO). Cells were maintained in 10 cm dishes at 37°C and 5% CO2 until 60%–80% confluency reached. When confluent, cells were subcultured in 1:10 – 1:20 dilution. The cells from passage 3-20 were used for the lipofection experiments.

METHOD DETAILS

R version 3.6.0 was used throughout this analysis. Downstream bioinformatics analysis is outlined in Figure S1A.

Processing of RNA-Seq gene and transcript BrainSpan data

RNA-seq data processing was performed as described (Gandal et al., 2018). Briefly, FASTQs were trimmed for adaptor sequence and low base call quality (Phred score < 30 at ends) using cutadapt (v1.12). Trimmed reads were then aligned to the GRCH37.p13 (hg19) reference genome via STAR (2.4.2a) using comprehensive gene annotations from Gencode (v19). BAM files were produced in both genomic and transcriptome coordinates and sorted using samtools (v1.3). Gene and isoform-level quantifications were calculated using RSEM (v1.2.29). Quality control metrics were calculated using RNA-SeQC (v1.1.8), featureCounts (v1.5.1), PicardTools (v1.128), and samtools (v1.3.1). Subsequently, TPM matrices for both gene and transcript datasets were filtered for TPM ≥ 0.1 in at least 25% of samples, yielding a total of 100,754 isoforms corresponding to 26,307 genes. Sample connectivity analysis was performed to detect sample outliers as previously described (Oldham et al., 2012). In brief, biweight mid-correlation was calculated among sample expression vectors in both filtered datasets. These values were converted into connectivity Z-scores. 55 samples were identified as having sample connectivity Z-scores ≤ −2, and were removed from downstream analysis, resulting in 551 final samples. Surrogate variable analysis (SVA) was performed to remove latent batch effects in the data, taking into consideration age, brain region, sex, ethnicity, and study site (Leek, 2014; Leek and Storey, 2007). The number of surrogate variables was chosen to minimize apparent batch effects while avoiding overfitting based on evidence from principal components analysis and relative log expression (Figures S2 and S3). 16 surrogate variables were found to be sufficient for downstream analysis of both gene and transcript data.

Validation of isoform expression with RT-qPCR

RT-qPCR was used to estimate the relative expression difference of 44 unique isoforms of 32 genes between two independent RNA samples that were age, sex and brain region-matched to the samples from the BrainSpan; these results were then compared to the computationally-assigned BrainSpan values (Table S1; Figure S4A). RNA from a frontal lobe tissue sample of a 22 weeks old male (fetal brain), and RNA from cerebral cortex tissue sample of a 27 years old male (adult brain) (AMSBIO, UK), corresponding to P06 (late mid-fetal) and P13 (young adult) in the BrainSpan data, was used. The isoforms were selected to encompass all possible ranges of differential expression - from high differential expression (high log2FC, e.g., > 4), to low or no differential expression (low log2FC, e.g., < 1) between the same isoforms from two samples from different time points. To further discriminate between isoform expression and absolute gene expression, isoforms were selected based on the presence of unique exons, for which the primers were designed (Table S1). The majority of selected isoforms were expressed with > 1 TPM values to ensure that they were detectable by the RT-qPCR. However, we also included 12 isoform pairs with at least one isoform being lowly expressed (< 1 TPM) to ensure that validation covers both highly and lowly expressed isoforms. This resulted in a total of 44 unique isoforms from 32 genes. Primers were designed using unique exon-exon junctions, specific for each of the selected isoforms. 3 μg of RNA using SuperScript II Kit (Invitrogen) were reverse transcribed to cDNA, following manufacturer’s instructions. Then, the cDNA was diluted ten times to use as a template for the RT-qPCR reaction. SYBR Green II Master Mix (Invitrogen) was used for the RT-qPCR reaction, performed in a CFX Connect 96X Thermal Cycler, using standard parameters for SYBR Green. Relative expression between each isoform in the two samples was calculated by normalizing each expression value against two housekeeping genes (RPL28 and MRSP36) as control using QIAGEN control primers, and the ΔΔt method was applied using the CFX Manager Software. Comparison of these relative expression values against the BrainSpan computational expression values resulted in positive correlation (Figure S4A).

Differential gene and isoform expression analysis

Differential gene and isoform expression analysis was performed using the limma (v3.40.6) R package (Ritchie et al., 2015). Relevant covariates (brain region, sex, ethnicity, and study site) and surrogate variables were included in the linear model as fixed effects. The duplicateCorrelation function was used to fit the donor identifier as a random effect to account for the nested expression measurements due to multiple brain regions derived from the same donor. Genes and isoforms with an absolute fold change of ≥ 1.5 and FDR-adjusted p value of ≤ 0.05 between adjacent developmental periods, or between prenatal and postnatal periods (PrePost) were defined as significantly differentially expressed.

Cell type and literature curated gene sets enrichment analyses

Fisher’s exact tests were performed on gene lists and isoform lists (converted to gene identifiers) against curated gene lists: mutational constraint genes (Mut. Const. Genes) (Samocha et al., 2014), FMRP target genes (Darnell et al., 2011), high-risk ASD genes (Satterstrom ASD) (Satterstrom et al., 2020), CHD8 target genes (Wilkinson et al., 2015), synaptic genes (Synaptome DB) (Pirooznia et al., 2012), genes intolerant to mutations (Pli_0.99) (Lek et al., 2016), syndromic and rank 1 and 2 ASD risk genes (SFARI_S_1_2) (https://gene.sfari.org/). Cell types were extracted from a recent gene-level single cell sequencing study (Zhong et al., 2018).

Gene Ontology (GO) functional enrichment analyses

Functional enrichment analysis was performed using the gprofiler2 v0.1.5 R package (Raudvere et al., 2019). Ensembl gene or isoform (converted to gene) identifiers were used to test for enrichment in two Gene Ontology categories, Biological Processes (BP) and Molecular Functions (MF). Enrichment p values were Benjamini-Hochberg corrected for multiple hypothesis testing, and overly general terms (i.e., terms with more than 1,000 members) were filtered out.

Rare de novo ASD loss-of-function variants

Rare de novo variant data was downloaded from Satterstrom et al. (2020), and was processed using Ensembl’s Variant Effect Predictor v96 (McLaren et al., 2016) using human genome version GRCh37 to annotate variants for predicted functional consequences. Loss-of-function (LoF) variants were defined as those impacting essential splice donor/acceptor sites, frameshift insertions or deletions, predicted start losses, and predicted stop gains.

Weighted gene/isoform co-expression network analyses (WGCNA)

Co-expression networks were constructed using the WGCNA (v1.68) R package (Langfelder and Horvath, 2008). Relevant covariates (brain region, sex, ethnicity, and study site) and surrogate variables were first regressed out of both gene and isoform expression datasets using linear mixed effects models. Each transformed expression matrix was then tested for scale-free topology to estimate a soft-thresholding power, beta. We used beta = 2 for gene co-expression and beta = 3 for isoform co-expression networks, and signed networks were constructed blockwise using a single block for the gene network and three blocks for the isoform network with deepSplit = 2 and minModuleSize = 20 for module detection in both networks. Both networks followed scale-free topology (Figures S4C and S4D).

Co-expression module characterization

Module eigengene and developmental period association analysis was performed using linear mixed effects models, considering fixed effects (age, brain region, sex, ethnicity, and study site) and random donor effects to account for multiple brain region samples per donor. Module enrichment analysis was performed using Fisher’s exact tests against curated gene lists; isoform identifiers within modules were converted to gene identifiers for this purpose. Gene Ontology functional enrichment analysis for modules was performed using gprofiler2 with queries ordered by module membership (kME) rank.

Decomposition of bulk BrainSpan data with fetal brain single cell RNA-seq

Bisque (Jew et al., 2020) was used to decompose bulk RNA-seq BrainSpan data to estimate the cell type proportions. Single-cell RNA-seq data of human fetal prefrontal cortex from two studies (Polioudakis et al., 2019; Zhong et al., 2018) was used to compute the reference profile. BrainSpan data was compared to the reference profile to estimate the cell type proportion in each sample. Variance partition (Hoffman and Schadt, 2016) was used to investigate cell type proportions (Figures S5A and S5B).

Variant impact analysis for co-expression modules

We quantified the impact of rare de novo LoF case and control variants on co-expression modules by mapping variants to the members of each module. To account for gene and isoform sizes in each module, we calculated the non-overlapping length, in base pairs, of the members of each module and normalized the module impact rates by these genomic coverages. Further, to control for total mutation rate, we scaled the impact rate by the total number of variants. We further added a scaling factor of 1,000,000 to make the numbers more manageable, in a similar fashion to a TPM calculation. Differences in the impact rates between case and control variants for each module were tested using permutation test with 1,000 iterations of module member resampling (controlling for length and GC content, ± 10% for each attribute) (Table S12). Modules impacted by significantly more case mutations were identified.

Integration of protein-protein interaction networks with co-expression modules

Gene-level PPI network data was manually curated and filtered for physical and co-complex interactions extracted from Bioplex (Huttlin et al., 2015), HPRD (Keshava Prasad et al., 2009), Inweb (Li et al., 2017), HINT (Das and Yu, 2012), BioGRID (Chatr-Aryamontri et al., 2017), GeneMANIA (Zuberi et al., 2013), STRING (Szklarczyk et al., 2017), and CORUM (Ruepp et al., 2010). To build co-expressed PPI networks, gene and isoform modules were first filtered for connections (i.e., edges) supported by the gene-level PPIs; isoform edges were retained if corresponding gene edges were supported by PPIs. Subsequently, networks were filtered to only retain edges supported by the top 10% of co-expression PCCs between genes or isoforms. Genes or isoforms without any connections were removed from the networks.

Minigenes cloning

The following genes impacted by rare de novo splice site mutations identified in patients with NDDs were selected for the experiments: SNC2A (chr2:166187838, A:G, acceptor site) (Fromer et al., 2014); DYRK1A (chr21: 38865466, G:A, donor site) (O’Roak et al., 2012), CELF2 (chr10: 11356223, T:C, donor site) (Xu et al., 2011), DLG2 (chr11: 83194295, G:A, donor site) (Fromer et al., 2014) and BTRC (chr10: 103221816, G:A, donor site) (De Rubeis et al., 2014). The exons of these genes that are likely impacted by splice site mutations, together with the ~1kb of their flanking intronic sequence, were cloned. The constructs were cloned into pDESTSplice exon trapping expression vector (Kishore et al., 2008). The site-directed mutagenesis by two-step stich PCR was performed to introduce the mutation affecting the splice site. The minigenes were generated by PCR-amplifying the desired sequences from genomic DNA (Clontech). Primers were designed for each minigene, and attB sites were added at the 5′ end of the primers. The sequences of the primers were as follows: (1) SCN2A; Fw: GGAAGCTATGTTTAGCCAGGATACATTTGG, Rv: CCAGATGATGTCCCCTCCCTACATAGTCC; (2) DYRK1A: Fw: GTTGGGAAAATTTCCCCCTATTTAAGC, Rv: CCCAGAGGCTTAATAAAGTATGGACC; (3) CELF2: Fw: GGAGTTGGAATGACAGACGTTCACA TGC, Rv: CCGCTGTGGGCTGAGGATCAGTTTCC; (4) DLG2: Fw: GAGGTTCAGAGACATTCAATTCCC, Rv: CTTGATGCTGTCCAGATAATGC; (5) BTRC: Fw: GGGCCTCAGAATGACACAGTACG, Rv: GAACTTGCGTTTCTTGTTTTTGCC. After PCR amplification, amplicons were loaded in a 1% low EEO agarose gel (G-BioSciences) and purified using the QIAquick Gel Extraction Kit (QIAGEN) following manufacturer’s instructions. Purified amplicons were subcloned into the pDON223.1 expression vector using the BP-Gateway System (Invitrogen). At least six different clones for each minigene were sequenced to verify correct sequences of the minigenes. The clone with the desired sequence and highest DNA concentration was used for subcloning into the pDESTSplice expression vector (Addgene) using the LR-Gateway System (Invitrogen).

Exon trapping and RT-PCR

HeLa cells were seeded at 2x105 cells per well in 6-well plates (Falcon). After 24h, cells were transfected using Lypofectamine 3000 (Invitrogen) following manufacturer’s instructions, and then harvested after additional 24h. RNA was purified using the RNAeasy Mini Kit (QIAGEN) following manufacturer’s instructions. Two μg of RNA was used to generate cDNA using the SuperScript III First Strand kit (Invitrogen), and PCR was carried out. In the case of exon trapping assays, we used primers specific for the rat insulin exons constitutively present in the pDESTSplice vector: Fw: CCTGCTGGCCCTGCTCA, Rv: TAGTTGCAGTAGTTCTCCAGTTGG. In the case of the BTRC RT-PCR, we used primers specific for 5′ and 3′ sequences of the BTRC gene. Amplicons were loaded into the agarose gel (G-BioSciences) and visualized using Gel-Doc XR+ Imaging System (Bio-Rad).

Co-Immunoprecipitation and Western Blot

HeLa cells were harvested and rinsed once with ice-cold 1xPBS, pH 7.2, and lysed in immunoprecipitation lysis buffer (20 mM Tris, pH 7.4, 140 mM NaCl, 10% glycerol, and 1% Triton X-100) supplemented with 1xEDTA-free complete protease inhibitor mixture (Roche) and phosphatase inhibitor cocktails-II, III (Sigma Aldrich). The cells were centrifuged at 16,000xg at 4°C for 30min, and the supernatants were collected. Protein concentration was quantified by modified Lowry assay (DC protein assay; Bio-Rad). The cell lysates were resolved by SDS-PAGE and transferred onto PVDF Immobilon-P membranes (Millipore). After blocking with 5% nonfat dry milk in TBS containing 0.1% Tween 20 for 1hr at room temperature, membranes were probed overnight with the appropriate primary antibodies. They were then incubated for 1h with the species-specific peroxidase-conjugated secondary antibody. Membranes were developed using the Pierce-ECL Western Blotting Substrate Kit (Thermo Scientific). For immunoprecipitation experiments, samples were lysed and quantified as described above. Then, 3 mg of total protein was diluted with immunoprecipitation buffer to achieve a concentration of 3 mg/ml. A total of 30μl of anti-V5-magnetic beads-coupled antibody (MBL) was added to each sample and incubated for 4h at 4°C in tube rotator. Beads were then washed twice with immunoprecipitation buffer and three more times with ice cold 1xPBS. The proteins were then eluted with 40μl of 2xLaemli buffer. After a short spin, supernatants were carefully removed, and SDS-PAGE was performed. The following primary antibodies were used: anti-V5 (1:1000; Invitrogen), anti-β-catenin (1:1000; Abcam), anti-p-βcatenin (1:1000; Cell Signaling), anti-Cul1 (1:1000; Abcam), anti-SKP1 (1:1000; Cell Signaling), and anti-βactin (1:10000; Thermo Scientific).

QUANTIFICATION AND STATISTICAL ANALYSIS

In all Figures, the “*” represents statistically significant result, with significance defined as p < 0.05 unless otherwise stated. All p-values underwent FDR multiple hypothesis correction where appropriate unless otherwise stated. Gene Ontology enrichment analyses were performed using gProfiler, with overly general terms (terms with more than 1,000 members) filtered out; enrichment analyses of curated gene lists were performed with Fisher’s exact tests. Additional statistical analyses details including p values could be found in Figure legends and in the Results section of the manuscript.

Expression validation by RT-qPCR

Expression quantification was validated through RT-qPCR of 44 unique isoforms from 32 genes for two age-, sex-, and region-matched brain samples. Relative qPCR expression was compared against the BrainSpan expression level for each of one 22-week-old male frontal lobe fetal sample and one 27-year-old male cerebral cortex adult sample. Expressions of each isoform were compared using Pearson correlation analysis. Results can be found in Table S1 and Figure S4A.

Variant impact analysis

Co-expression modules were analyzed for their rate of impact by LoF variants. This metric (Variants per Million, VPM) was calculated for each module as the number of LoF variants impacting the module members divided by the non-overlapping length of member genes/isoforms, scaled by the total number of LoF variants times 1,000,000. Statistical analysis of these module VPMs was performed using permutation tests for each module, comparing VPM of case variants against VPM of control variants, with n = 1,000 permutations. Genes and isoforms were sampled in each permutation controlling for length and GC content, ± 10% for each attribute. Results of this analysis can be found in Figure 3D and Table S12. Co-expression modules were characterized by their association with specific developmental periods. This was done using linear mixed effects models, fixing age, brain region, sex, ethnicity, and study site and designating random donor effects to account for multiple samples per donor in the dataset. Modules were further characterized through module enrichment analysis by Fisher’s exact tests against curated gene lists and Gene Ontology enrichment with gProfiler, with modules ordered by their kME rank. Results can be found in Figure 3 and Tables S10 and S11.

KEY RESOURCES TABLE

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Antibodies
anti-V5-tag magnetic beads	MBL International Cat# M167-9	RRID: AB_10795284
anti-V5 antibody	Thermo Fisher Scientific Cat# R960-25	RRID: AB_2556564
anti-β-catenin	Abcam Cat# ab22656	RRID: AB_447227
anti-p-βcatenin	Cell Signaling Technology Cat# 9561	RRID: AB_331729
anti-Cul1	Abcam Cat# 75812	RRID: AB_1310104
anti-SKP1	Cell Signaling Technology Cat#2156	RRID: AB_2270271
anti-βactin	Thermo Scientific Cat# MA5-15739	RRID: AB_10979409
Biological samples
Human fetal brain total RNA	AMSBIO	CAT #636526
Human adult brain total RNA	AMSBIO	CAT #540143
Critical commercial assays
QIAquick Gel Extraction Kit	QIAGEN	CAT #28704
Gateway BP Clonase II Enzyme mix	Invitrogen	CAT# 11789020
Gateway LR Clonase Enzyme Mix	Invitrogen	CAT# 11791019
RNAeasy Mini Kit	QIAGEN	CAT# 74104
Pierce-ECL Western Blotting Substrate Kit	Thermo Scientific	CAT# 32209
SuperScript III First Strand system	Invitrogen	CAT# 18080051
Deposited data
BrainSpan gene/isoform expression data	PsychEncode Consortium	syn12080241, syn8241756
Processed summary-level BrainSpan data	PsychEncode Consortium	syn8466658
Original analysis code	GitHub	10.5281/zenodo.5091071
Single-Cell RNA-seq data	Polioudakis et al., 2019; Zhong et al., 2018	0.1016/j.neuron.2019.06.011;10.1038/nature25980
Large exome sequencing variant data	Satterstrom et al., 2020	10.1101/484113
Mutational constraint genes	Samocha et al., 2014	10.1038/ng.3050
FMRP target genes	Darnell et al., 2011	10.1016/j.cell.2011.06.013
High risk ASD genes	Satterstrom et al., 2020	10.1016/j.cell.2019.12.036
CHD8 target genes	Wilkinson et al., 2015	10.1038/tp.2015.62
SynaptomeDB	Pirooznia et al., 2012	10.1093/bioinformatics/bts040
Mutation-intolerant genes	Lek et al., 2016	10.1038/nature19057
SFARI	https://gene.sfari.org/	N/A
Bioplex	Huttlin et al., 2015	10.1016/j.cell.2015.06.043
HPRD	Keshava Prasad et al., 2009	10.1093/nar/gkn892
InWeb	Li et al., 2017	10.1038/nmeth.4083
HINT	Das and Yu, 2012	10.1186/1752-0509-6-92
BioGRID	Chatr-Aryamontri et al., 2017	10.1093/nar/gkw1102
GeneMANIA	Zuberi et al., 2013	10.1093/bioinformatics/btu671
STRING	Szklarczyk et al., 2017	10.1093/nar/gkw937
CORUM	Ruepp et al., 2010	10.1093/nar/gkp914
Experimental models: Cell lines
HeLa cells	ATCC	CAT #CRL-12401
Oligonucleotides
All oligonucleotides used in the study are listed in Table S1 and in the STAR Methods section	N/A	N/A
Recombinant DNA
pDESTSplice plasmid	Addgene	CAT #32484
pDON223.1 plasmid	Invitrogen	10.1101/gr.2505604
Software and algorithms
R v3.6.3	The R Foundation	https://cran.r-project.org/
Surrogate Variable Analysis	Leek and Storey, 2007	https://bioconductor.org/packages/release/bioc/html/sva.html
limma v3.44.3	Ritchie et al., 2015	https://bioconductor.org/packages/release/bioc/html/limma.html
WGCNA: Weighted Correlation Network Analysis v1.69	Langfelder and Horvath, 2008	https://cran.r-project.org/web/packages/WGCNA/index.html
Variant Effect Predictor v96	McLaren et. al. 2016	https://github.com/Ensembl/ensembl-vep
gProfiler	Raudvere et al., 2019	https://biit.cs.ut.ee/gprofiler/
Bisque	Jew et al., 2020	10.1038/s41467-020-15816-6
variancePartition	Hoffman and Schadt, 2016	10.1186/s12859-016-1323-z
Custom code	N/A	10.5281/zenodo.5091071

69 in total

1. PYM binds the cytoplasmic exon-junction complex and ribosomes to enhance translation of spliced mRNAs.

Authors: Michael D Diem; Chia C Chan; Ihab Younis; Gideon Dreyfuss
Journal: Nat Struct Mol Biol Date: 2007-11-18 Impact factor: 15.369

2. Integrative functional genomic analyses implicate specific molecular pathways and circuits in autism.

Authors: Neelroop N Parikshak; Rui Luo; Alice Zhang; Hyejung Won; Jennifer K Lowe; Vijayendran Chandran; Steve Horvath; Daniel H Geschwind
Journal: Cell Date: 2013-11-21 Impact factor: 41.582

3. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations.

Authors: Brian J O'Roak; Laura Vives; Santhosh Girirajan; Emre Karakoc; Niklas Krumm; Bradley P Coe; Roie Levy; Arthur Ko; Choli Lee; Joshua D Smith; Emily H Turner; Ian B Stanaway; Benjamin Vernot; Maika Malig; Carl Baker; Beau Reilly; Joshua M Akey; Elhanan Borenstein; Mark J Rieder; Deborah A Nickerson; Raphael Bernier; Jay Shendure; Evan E Eichler
Journal: Nature Date: 2012-04-04 Impact factor: 49.962

4. RBFOX and PTBP1 proteins regulate the alternative splicing of micro-exons in human brain transcripts.

Authors: Yang I Li; Luis Sanchez-Pulido; Wilfried Haerty; Chris P Ponting
Journal: Genome Res Date: 2015-01 Impact factor: 9.043

Review 5. Wnt signaling networks in autism spectrum disorder and intellectual disability.

Authors: Vickie Kwan; Brianna K Unda; Karun K Singh
Journal: J Neurodev Disord Date: 2016-12-05 Impact factor: 4.025

6. Cell-Type-Specific Profiling of Alternative Translation Identifies Regulated Protein Isoform Variation in the Mouse Brain.

Authors: Darshan Sapkota; Allison M Lake; Wei Yang; Chengran Yang; Hendrik Wesseling; Amanda Guise; Ceren Uncu; Jasbir S Dalal; Andrew W Kraft; Jin-Moo Lee; Mark S Sands; Judith A Steen; Joseph D Dougherty
Journal: Cell Rep Date: 2019-01-15 Impact factor: 9.423

7. Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations.

Authors: Brian J O'Roak; Pelagia Deriziotis; Choli Lee; Laura Vives; Jerrod J Schwartz; Santhosh Girirajan; Emre Karakoc; Alexandra P Mackenzie; Sarah B Ng; Carl Baker; Mark J Rieder; Deborah A Nickerson; Raphael Bernier; Simon E Fisher; Jay Shendure; Evan E Eichler
Journal: Nat Genet Date: 2011-05-15 Impact factor: 38.330

8. Protein interaction network of alternatively spliced isoforms from brain links genetic risk factors for autism.

Authors: Roser Corominas; Xinping Yang; Guan Ning Lin; Shuli Kang; Yun Shen; Lila Ghamsari; Martin Broly; Maria Rodriguez; Stanley Tam; Shelly A Trigg; Changyu Fan; Song Yi; Murat Tasan; Irma Lemmens; Xingyan Kuang; Nan Zhao; Dheeraj Malhotra; Jacob J Michaelson; Vladimir Vacic; Michael A Calderwood; Frederick P Roth; Jan Tavernier; Steve Horvath; Kourosh Salehi-Ashtiani; Dmitry Korkin; Jonathan Sebat; David E Hill; Tong Hao; Marc Vidal; Lilia M Iakoucheva
Journal: Nat Commun Date: 2014-04-11 Impact factor: 14.919

9. Assessment of transcript reconstruction methods for RNA-seq.

Authors: Josep F Abril; Pär G Engström; Felix Kokocinski; Tamara Steijger; Tim J Hubbard; Roderic Guigó; Jennifer Harrow; Paul Bertone
Journal: Nat Methods Date: 2013-11-03 Impact factor: 28.547

10. Accurate estimation of cell composition in bulk expression through robust integration of single-cell information.

Authors: Brandon Jew; Marcus Alvarez; Elior Rahmani; Zong Miao; Arthur Ko; Kristina M Garske; Jae Hoon Sul; Kirsi H Pietiläinen; Päivi Pajukanta; Eran Halperin
Journal: Nat Commun Date: 2020-04-24 Impact factor: 14.919

5 in total

1. Discovery of eQTL Alleles Associated with Autism Spectrum Disorder: A Case-Control Study.

Authors: Allison R Hickman; Bradley Selee; Rini Pauly; Benafsh Husain; Yuqing Hang; Frank Alex Feltus
Journal: J Autism Dev Disord Date: 2022-06-23

Review 2. Neurodevelopmental disorders, immunity, and cancer are connected.

Authors: Ruth Nussinov; Chung-Jung Tsai; Hyunbum Jang
Journal: iScience Date: 2022-05-30

3. Complex regulation of Gephyrin splicing is a determinant of inhibitory postsynaptic diversity.

Authors: Fabrice Ango; Eric Allemand; Raphaël Dos Reis; Etienne Kornobis; Alyssa Pereira; Frederic Tores; Judit Carrasco; Candice Gautier; Céline Jahannault-Talignani; Patrick Nitschké; Christian Muchardt; Andreas Schlosser; Hans Michael Maric
Journal: Nat Commun Date: 2022-06-18 Impact factor: 17.694

4. Single-cell transcriptome identifies molecular subtype of autism spectrum disorder impacted by de novo loss-of-function variants regulating glial cells.

Authors: Nasna Nassir; Asma Bankapur; Bisan Samara; Abdulrahman Ali; Awab Ahmed; Ibrahim M Inuwa; Mehdi Zarrei; Seyed Ali Safizadeh Shabestari; Ammar AlBanna; Jennifer L Howe; Bakhrom K Berdiev; Stephen W Scherer; Marc Woodbury-Smith; Mohammed Uddin
Journal: Hum Genomics Date: 2021-11-21 Impact factor: 4.639

5. acorde unravels functionally interpretable networks of isoform co-usage from single cell data.

Authors: Sonia Tarazona; Ana Conesa; Angeles Arzalluz-Luque; Pedro Salguero
Journal: Nat Commun Date: 2022-04-05 Impact factor: 17.694

5 in total