Literature DB >> 35845835

DeepND: Deep multitask learning of gene risk for comorbid neurodevelopmental disorders.

Ilayda Beyreli¹, Oguzhan Karakahya¹, A Ercument Cicek^1,2.

Abstract

Autism spectrum disorder and intellectual disability are comorbid neurodevelopmental disorders with complex genetic architectures. Despite large-scale sequencing studies, only a fraction of the risk genes was identified for both. We present a network-based gene risk prioritization algorithm, DeepND, that performs cross-disorder analysis to improve prediction by exploiting the comorbidity of autism spectrum disorder (ASD) and intellectual disability (ID) via multitask learning. Our model leverages information from human brain gene co-expression networks using graph convolutional networks, learning which spatiotemporal neurodevelopmental windows are important for disorder etiologies and improving the state-of-the-art prediction in single- and cross-disorder settings. DeepND identifies the prefrontal and motor-somatosensory cortex (PFC-MFC) brain region and periods from early- to mid-fetal and from early childhood to young adulthood as the highest neurodevelopmental risk windows for ASD and ID. We investigate ASD- and ID-associated copy-number variation (CNV) regions and report our findings for several susceptibility gene candidates. DeepND can be generalized to analyze any combinations of comorbid disorders.

Entities: Chemical

Keywords: autism; comorbidity; deep learning; genome-wide association; graph convolution; intellectual disability; node classification; semisupervised learning

Year: 2022 PMID： 35845835 PMCID： PMC9278518 DOI： 10.1016/j.patter.2022.100524

Source DB: PubMed Journal: Patterns (N Y) ISSN： 2666-3899

Introduction

Autism spectrum Disorder (ASD) is a common neurodevelopmental disorder with a complex genetic architecture in which around a thousand risk genes have a role. Large consortia efforts have been paving the way for understanding the genetic, functional, and cellular aspects of this complex disorder via large-scale exome-2, 3, 4, 5, 6, 7, 8 and genome-9, 10, 11, 12, 13, 14 sequencing studies. The latest and also most comprehensive study to date analyzed ∼36,000 samples (6,430 trios) to pinpoint 102 risk genes (false discovery rate [FDR] ≤ 0.1). Overwhelming evidence suggests that genetic architectures of neuropsychiatric disorders overlap.16, 17, 18 For instance, out of the twenty five SFARI category I ASD risk genes (i.e., highest risk), only five are solely associated with ASD. Genes like chromodomain helicase DNA-binding protein 8 (CHD2), SCN2A, and ARID1B are associated with six neurodevelopmental disorders. Intellectual disability (ID) is one of such comorbid disorders that manifests itself with impaired mental capabilities. Reminiscent of ASD, ID also has a complex genetic background with hundreds of risk genes involved and is identified by rare de novo disruptive mutations observed in whole-exome- and genome-sequencing studies.19, 20, 21, 22, 23, 24, 25 ASD and ID are frequently observed together. In 2018, the CDC reported that of children with ASD were also diagnosed with ID and were borderline. They also share a large number of risk genes. Despite these similarities, Robinson et al. also point to differences in genetic architectures and report that intelligence quotient (IQ) positively correlates with family history of psychiatric disease and negatively correlates with de novo disruptive mutations. Yet, the shared functional circuitry behind is mostly unknown. The current lack of understanding on how comorbid neuropsychiatric disorders relate mostly stems from the incomplete profiling of individual genetic architectures. Statistical methods have been used to assess gene risk using excess genetic burden from case-control and family studies, which have recently been extended to work with multiple traits. Yet, these tools work with genes with observed disruptive mutations (mainly de novo). It is often of interest to use these as prior risks and obtain a posterior gene-interaction-network-adjusted risk that can also assess risk for genes with no prior signal. Network-based computational gene risk-prediction methods come in handy for (1) imputing the insufficient statistical signal and providing a genome-wide risk ranking and (2) finding out the affected cellular circuitries such as pathways and networks of genes.,31, 32, 33, 34, 35, 36, 37, 38, 39 While these methods have helped unravel the underlying mechanisms, they have several limitations. First, by design, they are limited to work with a single disorder. In order to compare and contrast comorbid disorders such as ASD and ID using these tools, one approach is to bag the mutational burden observed for each disorder, assuming two are the same. However, disorder-specific features are lost as a consequence. The more common approach is to perform independent analyses per disorder and intersect the results. Unfortunately, this approach ignores valuable source of information coming from the shared genetic architecture and loses prediction power as per-disorder analyses have less input (i.e., samples, mutation counts) and less statistical power.,, Second, current network-based gene-discovery methods can work with one or two integrated gene-interaction networks.,,,, This means that numerous functional interaction networks (e.g., co-expression, protein interaction, etc.) are disregarded, which limits and biases the predictions. Gene co-expression networks that model brain development are a promising source of diverse information regarding gene risk but currently cannot be fully utilized, as the signal coming from different networks cannot be deconvoluted. Usually, investigating such risky neurodevelopmental windows is an independent downstream analysis., Should this process be integrated within the risk-assessment framework, it has the potential to provide valuable biological insights and also to improve the performance of the genome-wide risk assessment task. Here, we address these challenges with a cross-disorder gene-discovery algorithm (Deep Neurodevelopmental Disorders [DeepND]). For the first time, DeepND analyzes comorbid neurodevelopmental disorders simultaneously over multiple gene co-expression networks and explicitly learns the shared and disorder-specific features using multitask learning. Thus, the predictions for the disorders depend on each other’s genetic architecture. The proposed DeepND architecture uses graph convolution to extract associations between genes from gene co-expression networks. This information is processed by a mixture-of-experts model that can self learn critical neurodevelopmental time windows and brain regions for each disorder etiology, which makes the model interpretable. We provide a genome-wide risk ranking for each disorder and show that the prediction power is improved in both single- (single disorder) and multitask settings. DeepND identifies the prefrontal cortex brain region and from early- to late-mid-fetal and from mid-childhood to young adulthood periods as the highest neurodevelopmental risk windows for both disorders. Finally, we investigate frequent ASD and ID associated copy-number-variation regions and pinpoint risk genes that are disregarded by other algorithms. The software has been released at http://github.com/ciceklab/deepnd. This neural network architecture can easily be generalized to other disorders with a shared genetic component and can be used to prioritize focused functional studies and possible drug targets.

Results

Using a deep-learning framework that combines graph convolutional neural networks with a mixture-of-experts model, we perform a genome-wide risk assessment for ASD and ID simultaneously in a multitask learning setting and detect neurodevelopmental windows that are informative for risk assessment for each disorder. Our results point to the shared disrupted functionalities and novel risk genes that provide a road map to researchers who would like to understand the ties between these two comorbid disorders.

Genome-wide risk prediction for ASD and ID

To have a model of evolving gene interactions throughout brain development, we extracted 52 gene co-expression networks from the BrainSpan dataset, using hierarchical clustering of brain regions and a sliding-window approach. These networks are used to assess the network-adjusted posterior risk for each gene for which disease agnostic the prior risk features such as the probability of loss-of-function intolerance (pLI) is given. Note that one can use disease-specific features such as mutation burden in case-control or family studies. However, current knowledge on ground-truth risk genes mostly rely on these data, and using these as features would cause a leak in the training procedure. The deep neural network architecture (DeepND) performs a semi-supervised learning task, which means a set of positively and negatively labeled genes are required as ground truth per disorder. As also done in Krishnan et al., we obtained 594 ASD-positive genes (Table S1) from the SFARI dataset (http://gene.sfari.org), which are categorized into three evidence levels based on the strength of evidence in the literature. For ID, we curated a ground-truth ID risk-gene set of 580 genes using landmark review studies on ID gene risk,47, 48, 49, 50, 51 (Table S2). We generated three evidence level sets similar to the ASD counterpart: E1, E2, and E3 sets where each set includes genes that are recurrently indicated in multiple studies. As for the negatively labeled genes, we used 1,074 non-mental-health-related genes for both disorders, which were curated by Krishnan et al., after discarding 181 of them that are shown to be related to other neurodevelopmental disorders, such as epilepsy, schizophrenia, or bipolar disorder (Table S1). DeepND uses the multitask-learning paradigm where multiple tasks are solved concurrently (i.e., genome wide risk assessment for ASD and ID). Thus, the network learns a shared set of weights for both disorders and also disorder-specific set of weights (Figure 1). First, the model inputs pLI for each gene and, using fully connected layers, produces an embedded feature set per gene. The set of weights learned in these layers are affected by the ground-truth labels for both disorders and, thus, are shared. Then, the architecture branches out to two single-task layers, with one per disorder (Figure 1, blue for ASD and yellow for ID). For each single-task branch, these embeddings are input to 52 graph convolutional neural networks (GCNs). Each GCN processes a co-expression network that represents a neurodevelopmental window and extracts network-adjusted gene risk signatures (i.e., embeddings), (experimental procedures). Finally, these embeddings are fed into a fully connected gating network along with the prior risk features. The gating network assigns a weight to each GCN that is proportional to the informativeness of the embedding coming from each neurodevelopmental window. Thus, the model also learns which windows are important for prediction of ASD/ID risk genes. This also means that they are important in the etiology of the disorder. In the end, each disorder-specific subnetwork produces a genome-wide ranking of genes being associated with that disorder along with risk probabilities. To quantify the contribution of the co-analyzing comorbid disorders (i.e., multitask) as opposed to individual analysis (i.e., single task), we also present our results of DeepND when it is run on a single-task mode (DeepND-ST). In this mode, the fully connected layers that generate embeddings are removed, and the feature (i.e., pLI) is directly fed into the GCNs (experimental procedures). We show that the genome-wide risk assessment of DeepND is robust, precise, and sensitive and substantially improves the state of the art both in terms of performance and interpretability of the predictions.

Figure 1

System model

System model of the proposed deep-learning architecture for genome-wide cross-disorder risk assessment (DeepND). The algorithm takes the following information as inputs: (1) gene-specific features such as pLI, (2) disorder-specific ground-truth genes that are labeled as positive with varying levels of evidence based on a literature search, and (3) non-psychiatric genes that are labeled as negative. The unique gene identifiers are passed through fully connected multitask layers that learn shared weights for both disorders and produces a new feature representation affected by both losses. This new representation is then input to the starting point of the single-task layers. Graph convolutional neural networks (GCNs) each process one of fifty-two gene co-expression networks that represent different brain regions and neurodevelopmental time windows. The outputs of GCNs are then weighted by the gating network using gene pLIs to learn which networks are informative for the gene risk assessment (shade of the network indicates importance). Thus, DeepND learns which neurodevelopmental windows confer more risk for each disorder’s etiology using the gating network (mixture of experts). The final output is a genome-wide risk-probability ranking per disorder, which are then used for various downstream analyses to understand the underlying functional mechanisms and to compare/contrast both disorders. The single-task layers are exclusively trained with the ground-truth genes of the disorder they belong. Thus, they learn only disorder-specific parameters and disorder-specific networks that implicate risk.

System model System model of the proposed deep-learning architecture for genome-wide cross-disorder risk assessment (DeepND). The algorithm takes the following information as inputs: (1) gene-specific features such as pLI, (2) disorder-specific ground-truth genes that are labeled as positive with varying levels of evidence based on a literature search, and (3) non-psychiatric genes that are labeled as negative. The unique gene identifiers are passed through fully connected multitask layers that learn shared weights for both disorders and produces a new feature representation affected by both losses. This new representation is then input to the starting point of the single-task layers. Graph convolutional neural networks (GCNs) each process one of fifty-two gene co-expression networks that represent different brain regions and neurodevelopmental time windows. The outputs of GCNs are then weighted by the gating network using gene pLIs to learn which networks are informative for the gene risk assessment (shade of the network indicates importance). Thus, DeepND learns which neurodevelopmental windows confer more risk for each disorder’s etiology using the gating network (mixture of experts). The final output is a genome-wide risk-probability ranking per disorder, which are then used for various downstream analyses to understand the underlying functional mechanisms and to compare/contrast both disorders. The single-task layers are exclusively trained with the ground-truth genes of the disorder they belong. Thus, they learn only disorder-specific parameters and disorder-specific networks that implicate risk. We benchmark the performances of DeepND and other gene-discovery algorithms for neurodevelopmental disorders. Note that these tools are mainly designed for ASD. However, to the best of our knowledge, there are no such algorithms designed for ID or for other neurodevelopmental disorders. Thus, we compare them against DeepND in terms of both ASD and ID. However, one can primarily focus on their ASD performance. First, we compare DeepND and the state-of-the-art algorithm by Krishnan et al. using their experimental settings. Evaluating performances of the algorithms using E1 genes through 5-fold cross shows that DeepND achieves a median area under receiver operating curve (AUC) of and a median area under the precision recall (AUPR) of for ASD, which correspond to and improvements over the evidence-weighted support vector machine (SVM) algorithm of Krishnan et al., respectively (Figures 2A and 2B). Similarly for ID, DeepND achieves a median AUC of and a median AUPR of , which correspond to improvements of and , respectively. We observe that even the single-task mode DeepND-ST performs better than Krishnan et al.’s algorithm in all settings other than ID’s AUPR. As expected, the multitask setting of DeepND performs better than DeepND-ST and leads to an improved AUC (up to ) and AUPR (up to ) for both disorders.

Figure 2

Evaluation of DeepND genome-wide risk assessment for ASD and ID

(A and B) The AUC distributions of Krishnan et al., DeepND, and DeepND-ST for ASD and ID genome-wide risk assessments. Every point corresponds to the performance on a test fold in the repeated cross validation setting (B). The AUPR curve distributions are for the same methods as in (A). Center line: median; box limits: upper and lower quartiles; whiskers: 1.5× interquartile range; points: outliers.

(C) Matthews correlation coefficient (MCC) between the rankings of each method and the ground-truth genes are shown for varying rank-percentage-threshold values (x axis). This ranking threshold p sets the top p% of a ranking as positive predictions. Methods compared are DeepND, DeepND - ST, evidence-weighted SVM of Krishnan et al., DAWN (only compared for ASD due to low intersection ratio with ground-truth gene set of ID), DAMAGES score of Zhang and Shen, and random-forest-based method of Duda et al. Results for ASD and ID shown when (1) E1 genes are used as the true risk genes and (2) E1 + E2 genes are considered as the true risk genes.

(D) PR curves to compare the same set of methods in (C).

Evaluation of DeepND genome-wide risk assessment for ASD and ID (A and B) The AUC distributions of Krishnan et al., DeepND, and DeepND-ST for ASD and ID genome-wide risk assessments. Every point corresponds to the performance on a test fold in the repeated cross validation setting (B). The AUPR curve distributions are for the same methods as in (A). Center line: median; box limits: upper and lower quartiles; whiskers: 1.5× interquartile range; points: outliers. (C) Matthews correlation coefficient (MCC) between the rankings of each method and the ground-truth genes are shown for varying rank-percentage-threshold values (x axis). This ranking threshold p sets the top p% of a ranking as positive predictions. Methods compared are DeepND, DeepND - ST, evidence-weighted SVM of Krishnan et al., DAWN (only compared for ASD due to low intersection ratio with ground-truth gene set of ID), DAMAGES score of Zhang and Shen, and random-forest-based method of Duda et al. Results for ASD and ID shown when (1) E1 genes are used as the true risk genes and (2) E1 + E2 genes are considered as the true risk genes. (D) PR curves to compare the same set of methods in (C). We also compare the ranking performance DeepND with the following methods from the literature using a rank-percentage-threshold-based method: Krishnan et al., DAWN, DAMAGES score, and the random forest-based method of Duda et al. That is, we obtain the final ranking of each method and mark p% of the top genes predicted as risk genes. We calculate the Matthews correlation coefficient of these marked lists with the ground truth. We vary p by and obtain the plots in Figure 2C. Results show that for varying threshold levels, the DeepND curve always dominates all others, indicating that it achieves the highest correlation with the ground truth both when we use E1 and E1 + E2 genes as positive ground truth. Comparing the precision-recall performance of the same set of methods shows that the PR curve of DeepND dominates all others in all settings and for both disorders (Figure 2D). Finally, we compare DeepND with an ensemble learner, forecASD. This method uses the outputs of other methods including the ones we compared above and learns to predict ASD/ID risk genes in a semi-supervised setting. However, labels used by forecASD are also used by the voter methods whose outputs are used as features, which suggests information leakage during the training process and that the performance might be overly optimistic. Comparing DeepND and forecASD shows that despite this advantage of forecASD, DeepND performs slightly worse or on par with respect to rank-percentage-threshold curve and PR curve comparisons in ASD (see Figure S1). DeepND performs better than forecASD in all settings for ID. We also considered adding DeepND as an additional voter into the ensemble of forecASD to check if it benefits the final performance. For both disorders and in all settings, forecASD that uses DeepND as a voter outperforms both DeepND and original forecASD. This shows the informativeness of DeepND predictions.

Critical neurodevelopmental windows for ASD and ID risk

We observe that the top percentile genes in the DeepND ASD and ID rankings have a high overlap with a Jaccard index of . The top 3 deciles also have relatively high overlap with Jaccard indices 0.66, and 0.69, respectively (Figure S2). To further investigate the shared genetic component, we focus on the spatiotemporal neurodevelopmental windows that are deemed important by DeepND for an accurate ranking of risk genes for both disorders. The neural network analyzes co-expression networks that represent 13 neurodevelopmental time windows (from embryonic period to late adulthood; Figures 3A and 4) and brain-region clusters (prefrontal and motor-somatosensory cortex [PFC-MSC]; mediodorsal nucleus of the thalamus and cerebellar cortex [MDCBC]; primary visual and superior temporal cortex [V1C-STC]; striatum, hippocampus, amygdala [SHA]) generated using Brainspan RNA-Seq dataset in accordance with Willsey et al. (experimental procedures; Figure 3B).

Figure 3

Spatiotemporal brain regions and heatmaps for the probabilities assigned to each spatiotemporal window by DeepND

The BrainSpan dataset, which models the spatiotemporal gene expression of human neurodevelopment, is used to obtain gene co-expression networks. This dataset contains samples from 12 brain regions and spans 15 time points from early fetal period to late adulthood.

(A) We generate 13 neurodevelopmental time windows using a sliding window of length 3, which provides sufficient data in each window as was also done by Willsey et al.

(B) We obtain 4 brain-region clusters based on transcriptional similarity during fetal development that also reflect topographical closeness and functional segregation. The brain regions considered are as follows. HIP, hippocampal anlage (for (1)–(2)), hippocampus (for (3)–(15)); OFC, orbital prefrontal cortex; DFC, dorsal prefrontal cortex; VFC, ventral prefrontal cortex; MFC, medial prefrontal cortex; M1C, primary motor cortex; S1C, primary somatosensory cortex; IPC, posterior inferior parietal cortex; A1C, primary auditory cortex; STC, superior temporal cortex; ITC, inferior temporal cortex; V1C, primary visual cortex; AMY, amygdala; STR, striatum; MD, mediodorsal nucleus of the thalamus; CBC, cerebellar cortex.

(C) Heatmaps show which spatiotemporal windows lead to assignment of higher risk probabilities to top-percentile genes for ASD (left) and ID (right). The numbers in boxes are softmaxed outputs of each respective GCN, averaged for top-percentile genes and then normalized. The weights assigned to each co-expression network by the MoE lets each GCN learn to make a better prediction.

Spatiotemporal brain regions and heatmaps for the probabilities assigned to each spatiotemporal window by DeepND The BrainSpan dataset, which models the spatiotemporal gene expression of human neurodevelopment, is used to obtain gene co-expression networks. This dataset contains samples from 12 brain regions and spans 15 time points from early fetal period to late adulthood. (A) We generate 13 neurodevelopmental time windows using a sliding window of length 3, which provides sufficient data in each window as was also done by Willsey et al. (B) We obtain 4 brain-region clusters based on transcriptional similarity during fetal development that also reflect topographical closeness and functional segregation. The brain regions considered are as follows. HIP, hippocampal anlage (for (1)–(2)), hippocampus (for (3)–(15)); OFC, orbital prefrontal cortex; DFC, dorsal prefrontal cortex; VFC, ventral prefrontal cortex; MFC, medial prefrontal cortex; M1C, primary motor cortex; S1C, primary somatosensory cortex; IPC, posterior inferior parietal cortex; A1C, primary auditory cortex; STC, superior temporal cortex; ITC, inferior temporal cortex; V1C, primary visual cortex; AMY, amygdala; STR, striatum; MD, mediodorsal nucleus of the thalamus; CBC, cerebellar cortex. (C) Heatmaps show which spatiotemporal windows lead to assignment of higher risk probabilities to top-percentile genes for ASD (left) and ID (right). The numbers in boxes are softmaxed outputs of each respective GCN, averaged for top-percentile genes and then normalized. The weights assigned to each co-expression network by the MoE lets each GCN learn to make a better prediction. Network analyses of the risk genes (A) The co-expression relationships between top 30 ASD and ID risk genes in PFC-MSC 4–6 region. This is found as the informative co-expression network for DeepND and is also indicated in the literature as an important window for ASD. Only links for at least 0.95 absolute correlation and genes with at least one connection are shown. (B) The protein-protein interactions between the top-percentile risk genes are shown. The protein-protein interactions (PPIs) are obtained from the tissue-specific PPI network of frontal cortex in the DifferentialNet database. Only interactions between the top-percentile risk genes are shown. We investigate which neurodevelopmental windows more confidently distinguish disorder risk genes. Figure 3C shows normalized average probabilities assigned to top-percentile risk genes (experimental procedures). We observe that the networks of the PFC-MSC brain region, spanning several time windows from early- to mid-fetal periods and from early childhood to young adulthood periods consistently are better predictors for ASD and ID risk. The periods 3–5 and 4–6 were also previously indicated as two of the highest risk regions and were subject to network analyses for ASD gene discovery., DeepND also captures strong signals from these windows, in addition to period 11–13, for both ASD and ID. We see that V1C-STC brain regions are second to PFC-MSC and MDCBC regions, providing the most subtle signal. We assess the empirical of significance of these findings via a permutation test. That is, we run DeepND in the exact same settings but shuffle the labels of the risk genes, keeping the same number of positively and negatively labeled genes. For each of the squares in Figure 3C, we obtain a distribution of average probabilities (background distribution) and check if the actual informativeness of that window is higher than the average background value. See these distributions in Figures S4 and S5 for ASD and ID, respectively. We observe that informativeness of PFC-MSC windows are the most significant compared with other brain regions and that MD-CBC results are consistently insignificant. Overall, we observe that not only earlier time windows but also later time windows are important for both etiologies. Figure S3 shows the heatmaps of DeepND-ST for ASD and ID. The weakest source of information is the MDCBC 2-4 network. We observe roughly 12,500 links between top-percentile ASD and ID genes. The network contains close to 37.5 m links. On the other hand, PFC-MSC 4–6 contains close to 785,000 edges. Yet, there exists close to 13,000 links between top-percentile ASD and ID genes, which is the reason behind DeepND focusing on this window as its top predictor. Visualization of the top 30 genes for each disorder in the PFC-MSC 4–6 network with only very high co-expression links () is provided in Figure 4A.

Figure 4

Network analyses of the risk genes

(A) The co-expression relationships between top 30 ASD and ID risk genes in PFC-MSC 4–6 region. This is found as the informative co-expression network for DeepND and is also indicated in the literature as an important window for ASD. Only links for at least 0.95 absolute correlation and genes with at least one connection are shown.

(B) The protein-protein interactions between the top-percentile risk genes are shown. The protein-protein interactions (PPIs) are obtained from the tissue-specific PPI network of frontal cortex in the DifferentialNet database. Only interactions between the top-percentile risk genes are shown.

Enrichment analysis of the predicted risk genes

In addition to the prediction performance benchmark above, we also evaluate the enrichment of our ASD and ID gene risk rankings in gene lists, which are shown to be related to these disorders. That is, while these gene sets are not ground-truth sets, they have been implicated as being associated with the etiology of either disorder. Thus, enrichment of members of these sets in the higher deciles of the genome-wide risk ranking of DeepND is an indication of the wellness of the ranking and also provides a means of comparing and contrasting the disrupted circuitries affected by ASD and ID. These lists are (1) targets of transcription regulators like CHD8, FMRP,,, RBFOX,,, and TOP1, (2) susceptibility pathways like WNT signaling and MAPK signaling, and (3) affected biological processes and molecular functions like post-synaptic density complex, and histone modification., The first decile of ASD risk genes has the highest enrichment in all categories. All enrichments are significant with respect to Binomial test (Figure 5; experimental procedures). We observe the same trend for ID that the top decile of the genome-wide risk ranking is the most enriched in most categories but the p values are more subtle. CHD8 is the highest ASD risk gene known to date with the highest mutation burden in large ASD cohorts, and with downstream functional analysis. Accordingly, DeepND ranks CHD8 as the 37th top risk gene whereas Krishnan et al. ranks it 1,943th genome wide. While it has a solid association to ASD etiology, its ties to ID are not well established. Bernier et al. reports that 9 out of 15 ASD probands with mutations in CHD8 also have ID. In accordance, DeepND places CHD8 as the 53rd highest risk gene for ID.

Figure 5

The enrichment of our ASD and ID gene risk rankings in various disease-related gene lists

The enrichment of our ASD and ID gene risk rankings in various disease-related gene lists (i.e., each panel) are shown: (1) ASD- and/or ID-related transcription regulators, (2) ASD- and/or ID-related pathways, and (3) ASD- and/or ID-related biological functions or protein complexes. While these do not fully contain ground-truth genes, they have been indicated in the literature as being enriched with risk genes for either disorder. Percentage of genes in the corresponding gene set (x axis) that occurred within each decile of the genome-wide risk ranking per ASD (blue) and ID (yellow) are shown. The gene sets used are as follows: (1) targets of RBFOX (splice); (2) targets of RBFOX (splice target); (3) targets of FMRP (all peak); (4) targets of CHD8; (5) targets of TOP1; (6) WNT pathway; (7) MAPK signaling pathway; (8) GTPase regulator activity; (9) post-synaptic density complex genes; (10) synaptic genes; and (11) histone-modifier genes.

The enrichment of our ASD and ID gene risk rankings in various disease-related gene lists The enrichment of our ASD and ID gene risk rankings in various disease-related gene lists (i.e., each panel) are shown: (1) ASD- and/or ID-related transcription regulators, (2) ASD- and/or ID-related pathways, and (3) ASD- and/or ID-related biological functions or protein complexes. While these do not fully contain ground-truth genes, they have been indicated in the literature as being enriched with risk genes for either disorder. Percentage of genes in the corresponding gene set (x axis) that occurred within each decile of the genome-wide risk ranking per ASD (blue) and ID (yellow) are shown. The gene sets used are as follows: (1) targets of RBFOX (splice); (2) targets of RBFOX (splice target); (3) targets of FMRP (all peak); (4) targets of CHD8; (5) targets of TOP1; (6) WNT pathway; (7) MAPK signaling pathway; (8) GTPase regulator activity; (9) post-synaptic density complex genes; (10) synaptic genes; and (11) histone-modifier genes. We also perform an untargeted enrichment analysis of the top-percentile predictions using the Enrichr system., Unsurprisingly, we find that for both disorders, nervous system development is the top enriched Biological Process GO term (Fisher exact test, p = 1.48 × 10-12 for ASD and p – 3.95 × 10-14 for ID), which indicates that it is a shared function that is disrupted in the etiology of both disorders (Table S3). As for the GO Molecular Function enrichment, two disorders overall have similar terms, but their top terms are different. The top function affected for ASD is histone-lysine N-methyltransferase activity, indicating a disruption in the transcription-factor activity, whereas for ID, the top affected function is voltage-gated cation channel activity, which indicates a disruption in synaptic activity. As transcription-factor activity is highly affected in both disorders, we further investigate if there are any master transcription regulators upstream that regulate the high-risk genes for ASD and ID in the ChEA database. We find that SMARCD1 is the top transcription factor, with 82 of its targets coinciding with the top-percentile DeepND-predicted ASD risk genes (Fisher’s exact test, p = 1.14 × 10-20) and 99 of its targets coinciding with the top-percentile DeepND-predicted ID risk genes (Fisher’s exact test, p = 7.18 × 10-18). Sixty-two targets are shared among ASD and ID. Note that 204 genes overlap in top-percentile ASD and ID risk genes. This means that 30% of the shared set of genes among two disorders are targeted by SMARCD1 (Table S3). When the union set of the top-percentile genes for ASD and ID are considered (312 genes), we find that the top transcription factor targeting these is again SMARCD1 (82 out of 2,071 targets; Fisher’s exact test, p = 1.24 × 10-21). These results indicate that SMARCD1 might be playing a role in the convergent etiologies of these comorbid disorders as a shared upstream chromatin modeler.

Other interactions between ASD and ID genes

We further investigate the protein-protein interactions (PPIs) between the top-percentile risk genes in tissue-specific PPI networks for frontal cortex obtained from the DifferentialNet database using the NetworkAnalyst system. This analysis reveals several hub proteins such as HECW2, EP300, and CREBBP1, as shown in Figure 4B. In this list, the HECW2 gene stands out as it has the highest degree in this network and has very low prior risk for both ASD (TADA p = 0.95) and ID (extTADA Q = 0.97). Yet, it is in the top percentile for ID risk. Note that DeepND did not use any PPI network information in its reasoning and yet was able to identify this hub protein, which has been linked with ASD and ID via de novo disruptive mutations in simplex families.

Evaluation of novel predictions and identification of candidates within ASD- and ID-associated CNV regions

Recurrent copy-number variations in several regions of the genome are associated with ASD and ID etiology. However, these are large regions, and it is not clear which genes in particular are driver genes. We investigate the DeepND risk ranking of genes within (1) six regions that frequently harbor ASD-related copy-number variations (CNVs) (16p11.2, 15q11-13, 15q13.3, 1q21.1, and 22q11) and (2) six regions that were reported to harbor mental-retardation-related CNVs (16p11.2, 1q21.1, 22q11.21, 22q11.22, 16p13, and 17p11.2). Note that these CNVs in these regions might confer risk for both disorders (Table S4). DeepND highly ranks several candidate genes for ASD and ID that (1) are within these CNV regions, (2) have low prior risk (e.g., E2 or lower, low TADA p value, etc.), and (3) low posterior risk assigned by other algorithms (Table S4). We discuss some of them below. NIPA2 is an ASD E3-E4 gene that encodes a magnesium transporter and is located on 15q11-13. It does not participate in any of the relevant gene sets (e.g., risk pathways), and it has a low TADA (p = 0.768). Although it is ranked in the 5th decile for ASD, it is listed in the top decile by DeepND for ID. Its linkage to Prader-Willi syndrome by Goytain et al. also suggests that NIPA2 might is an important candidate for ID. DeepND links MICAL3, which is in the 22q11.21–22 CNV region, to ASD and ID. It is a gene related to actin and Rab GTPase binding, ranked as the 191st for ASD risk and 147th for ID risk despite having ASD-TADA and not being in relevant risk-gene groups. Krishnan et al. rank it 11,665th, and DAMAGES score ranks it 1,916th. DAWN provides no ranking as it is not co-expressed with other networks of interest for DAWN. While disruption in the Rab GTPase cycle was shown to cause ID, there are no established ties between MICAL3 and ASD/ID. Moreover, MICAL family genes are related to cytoskeletal organization, which recently has been deemed an important molecular function in ASD etiology., It is also an important task to pinpoint genes that might seem related to a neurodevelopmental disorder but actually are not. This also corresponds to untangling the genetic architectures of comorbid disorders. One example that DeepND pinpoints is ZBTB20. Although ZBTB20 is not located within investigated CNV regions, it is regarded as an important risk gene for ASD. It is a CHD8 target and is listed as an E1 gene for ASD with TADA . DAWN and Krishnan et al. rank it as ∼500th ASD risk gene. However, DeepND consistently ranks it in the last decile with 0.252 probability of being an ASD risk gene. DeepND predicts that there is somewhat a higher chance for ZBTB20 to be a candidate risk gene for ID, though it is still in the 5th decile. This gene acts as a transcriptional repressor; it plays a role in various processes such as postnatal growth. Its shown relation to Primrose syndrome, which is specifically characterized by ID rather than other symptoms of ASD,, also suggests that ZBTB20 has a closer relation to ID. Yet, more mutational evidence is required for a more concrete assessment. Finally, DeepND ranks LMTK2 as the 2nd and 7th highest risk gene for ASD and ID, respectively, while other algorithms do not even list it in the top 1,000 (Table S1). It has very low TADA Q values for both disorders. However, it is a target of CHD8 and FMRP and is also a post-synaptic gene. It is a gene that plays a role in nerve growth factor (NGF)-TrkA signaling and in spermatogenesis. With this background in the literature, we think this could be a novel new candidate for both disorders identified by DeepND.

Generalization of DeepND’s performance for other diseases and disorders

We investigated whether DeepND generalizes to conditions other than ASD and ID. We investigate whether gene prediction performance increases in the multitask setting. That is, for the disease/disorder pairs mentioned below, we compared the genome-wide gene risk assessment performance of DeepND (1) in the single-task setting (DeepND-ST—separately for each disease/disorder type) and (2) in the multitask setting (DeepND). First, we co-analyzed ID and attention-deficit/hyperactivity disorder (ADHD). As stated by Faraone et al., in contrast to the rich literature on familial co-transmission of ADHD and several other disorders, such is lacking for ID and ADHD. Only a few studies suggest comorbidity: an increased risk of ADHD for people with ID, and lower IQs in people with ADHD.76, 77, 78 We use the exact same setup we used to co-analyze ASD and ID for this analysis, as described in the experimental procedures. The sole difference is the positive ground-truth risk genes of ADHD, which is compiled from the following review. We find 32 × 101 genes (see Table S6). We observe that in the multitask setting, the median AUC and AUPR values increase by 15% and 0.2% for ADHD, respectively. While the median AUPR for ID increases substantially by 17%, the median AUC slightly decreases by 5%. Thus, we see improvement in three out of four metrics (see Table S7). The consistent improvements in ADHD results are important, as the number of known risk genes for ADHD is relatively small. DeepND bringing the median AUC value from close to random 0.49 up to 0.64 is a leap forward in gene risk assessment of ADHD. Second, we investigated whether DeepND can be useful outside the domain of neurodevelopmental disorders. We co-analyzed breast and ovarian cancers. In addition to the risk created by the environmental factors, both breast and ovarian cancers are known to be hereditary with a large number of common risk genes/factors. The positive ground-truth gene sets for each cancer are curated from the following sources.80, 81, 82 We ended up with 142 and 58 positive risk genes (E1) for breast and ovarian cancers, respectively. We selected 300 random genes to generate a common negative risk gene set. The lists of these genes are given in Table S5. We used the following PPI networks: Human Protein Reference Database (HPRD), BioGRID, and DifferentialNet. All other settings (e.g., early stop condition, features, learning rates) are same as in the ASD-ID analysis (see learning and the cross-validation setting of DeepND). We observed that the results of DeepND improved the median DeepND-ST performance by ∼20% in terms of AUC and ∼12% in terms of AUPR for both cancer types, as shown in Table S6’s AUC and AUPR distributions. See Table S5 for gene-wise risk scores. These results show that DeepND can work with diseases/disorders other than neurodevelopmental disorders. It was able to capture the information from the overlapping genetic risk architecture and improve the gene risk prediction for both cancers.

Discussion

Neurodevelopmental disorders have been challenging geneticists and neuroscientists for decades with complex genetic architectures harboring hundreds of risk genes. Tracing inherited rare and de novo variation burdens has been the main driver of risk-gene discovery. However, overlapping genetic components and confounding clinical phenotypes make it hard to pinpoint disorder-specific susceptible genes and to understand differences. For instance, Satterstrom et al. pinpoint 102 ASD risk genes with FDR <10% with the largest ASD cohort to date covering nearly 6,500 trios. Yet, they still need to manually segregate these risk genes into two groups: (1) 53 ASD predominant risk genes that are distributed across a spectrum of ASD phenotypes and (2) 49 neurodevelopmental delay risk genes causing impaired cognitive, social, and motor skills. Thus, comorbidity is a further obstacle to be reckoned with in addition to identifying individual susceptible genes. Nevertheless, the shared risk genes and biological pathways offer opportunities for computational risk-assessment methods that were not explored before. So far, only disorder-specific analyses were possible by design of the network-based gene-discovery algorithms. These are limited in power due to distinct datasets that lead to limited cohort sizes. Here, we proposed a novel approach that can co-analyze comorbid neurodevelopmental disorders for gene risk assessment. The method is able to leverage the shared information using a multitask learning paradigm for the first time for this task. DeepND learns both a shared and a disorder-specific set of weights to calculate the genome-wide risk for each disorder. DeepND is a multitask deep learner that uses state-of-the-art techniques such as graph convolution and mixture of experts to learn non-linear relationships of genes on 52 brain co-expression networks. The model is also interpretable, as it is able to learn which neurodevelopmental windows (i.e., networks) provide more information for distinguishing high-risk genes and thus are important for understanding disease etiology. Our benchmarks show these techniques enable DeepND to outperform existing gene-discovery methods. We focus on ASD and ID in this study and identify similarities such as shared affected pathways and neurodevelopmental windows and differences such as regulatory relationships and novel risk genes. We think the findings in this paper will help guiding neuroscientists researching ASD and ID in prioritizing downstream functional studies. DeepND is not an algorithm specific to these disorders, though. It can easily be extended to consider other comorbid disorders such as schizophrenia and epilepsy by adding similar single-task layers for each disorder. It can also be used for other comorbid disorders that are not related to neurodevelopment at all. We demonstrated the advantage of being able to employ multiple co-expression networks and pointed to cases where, for instance, DAWN was not able to capture relationships between genes as they are limited with a single co-expression network. For the clarity of discussion and interpretability, we focused on only networks produced from the BrainSpan dataset. However, DeepND can also employ any combination of other types of gene-interaction networks such as protein-interaction networks, as also explained in generalization of DeepND’s performance for other diseases and disorders. While the BrainSpan dataset and co-expression network-based analyses have been frequently used to understand the relationships between genes in the context of ASD,33, 34, 35, 36,,,,,85, 86, 87 the dataset has several limitations: it contains post-mortem tissue samples from distinct individuals of different ages. The time intervals we use to generate the co-expression networks are not equal, and there is an inherent sample variability. Moreover, these samples belong to neurotypical individuals and do not convey the biology (i.e., gene relationships) of an individual with autism. These factors potentially limit the prediction power of DeepND. Yet, as of today, BrainSpan is one of the most useful resources with which to base our analyses. One alternative to BrainSpan is the BrainVar dataset. This dataset also contains healthy samples from various time points in the neurodevelopment. Yet, it contains only samples from the dorsolateral prefrontal cortex region. We foresee that with the ongoing interest in single-cell RNA sequencing (RNA-seq) studies, we are going to have access to better models of gene interactions during neurodevelopment with autism and other comorbid disorders in the near future.

Experimental procedures

Resource availability

Lead contact

Further information and requests for resources should be directed to the lead contact, A. Ercument Cicek (cicek@cs.bilkent.edu.tr).

Materials availability

This study did not generate any new unique materials.

Ground-truth risk-gene sets

Our genome-wide gene risk-prediction algorithm is based on a semi-supervised learning framework in which some of the samples (i.e., genes) are labeled (as ASD/ID risk gene or not) and are used to learn a ranking for the unlabeled samples. We obtain the labels for ASD from SFARI dataset. These genes are classified into three evidence levels indicating the quality of the evidence (E1–E3, E1 indicating the highest risk). The list contains 185 E1 genes (from SFARI category I), 200 E2 genes (from SFARI category II), and 470 E3 genes (from SFARI category III) as positively labeled ASD risk genes. We also obtain 893 non-mental-health-related genes from OMIM and Krishnan et al. as negatively labeled genes. The list and corresponding evidence levels are listed in Table S1. In the loss calculations during training of the model, genes in these categories are assigned the following weights: E1 (1.0), E2 (0.50), E3/E4 (0.25) and negative (1.0) genes. The performance is evaluated only on E1 genes for the sake of compatibility with other methods. For ID, we rely on review studies from the literature that provide in-depth analyses and lists of ID risk genes. We considered the known-gene lists from 2 landmark review studies as our base ground-truth ID risk-gene list., We divide this set into 3 parts (i.e., E1, E2, and E3) with respect to evidence obtained from 4 other studies.48, 49, 50, 51 The genes that are indicated by 5 or more studies are assigned to the highest-risk class, E1, and the genes that are indicated by 4 studies are assigned to the second-highest-risk class, E2. The remaining genes are assigned to the third risk-gene class, E3. See Table S2 for a detailed breakdown of evidence for each gene. We use the same set of negative genes as ASD, which are non-mental-health-related genes. Consequently, we have 123 × 101, 224 × 102, and 232 × 103 genes and 893 negative genes for ID. See Table S1 for a complete list of ground-truth genes for both disorders. The weights we use per gene class are as follows: E1 (1.0), E2 (0.50), E3 (0.25), and negative (1.0) genes. For the sake of consistency, we also report the performance of the model on E1 genes for ID.

Gene risk features

The only gene risk feature we use is the gene pLI, for both disorders. See Table S1 for the pLI of each gene obtained from Satterstrom et al. Note that neurodevelopmental risk genes are known to be conserved and that pLI has been used to quantify autism risk in the literature. Yet, the ground-truth labels described in ground-truth risk-gene sets are assigned independently and have no ties to this information.

Gene co-expression networks

We used the BrainSpan dataset of the Allen Brain Atlas, in order to model gene interactions through neurodevelopment and generated a spatiotemporal system of gene co-expression networks. This dataset contains 57 post-mortem brains (16 regions) that span 15 consecutive neurodevelopmental periods from 8 post-conception weeks to 40 years. To partition the dataset into developmental periods and clusters of brain regions, we follow the practice in Willsey et al. and refer the reader to this paper for details. Note that this is the same grouping also employed by Norman and Cicek and Liu et al., Brain regions were hierarchically clustered according to their transcriptional similarity, and four clusters were obtained (Figure 3B): (1) V1C-STC, (2) PFC-MSC, (3) SHA, and (4) MDCBC. In the temporal dimension, 13 neurodevelopmental windows (Figure 3A) were obtained using a sliding-window approach (i.e., [1-3], [3-5], , [13-15]). A spatiotemporal window of neurodevelopment and its corresponding co-expression network are denoted by the abbreviation for its brain-region cluster followed by the time window of interest, e.g. “PFC-MSC(1–3)” represents interactions among genes in the region PFC-MSC during the time interval [1-3]. Using the above-mentioned partitioning, we obtained 52 spatiotemporal gene co-expression networks, each of which contain 25,825 nodes representing genes. An undirected unweighted edge between two nodes is created if their absolute Pearson correlation coefficient is greater than or equal to 0.8 in the related partition of BrainSpan data. The reason for picking 0.8 as our threshold is as follows: we observed that, overall, a lower threshold results in an AUC and AUPR performance increase, and we found out that 0.7 and 0.8 had very close median performances and that 0.8 was the smallest threshold that let us fit all 52 gene co-expression networks into the eight graphics processing units (GPUs) we used (total memory of 114 GB). Instead of using binary co-expression networks, we also tested using weighted co-expression networks where we use the co-expression values as the edge weights. We found that binary edges provide slightly better median performance and opted to use binary co-expression networks.

DeepND model

Problem formulation

Each 52 co-expression network j described in gene co-expression networks is represented as a graph , where the vertex set contains genes in the human genome and denotes the binary adjacency matrix. Note that n = 25,825. Let be the feature matrix for disorder D where each row is a list of d features associated with gene (and node) . In our application, is a one-hot-encoded vector, uniquely representing the gene i. We also use pLI as a feature that represents how constrained the gene i is (i.e., ) an . Let be an l dimensional vector, where [i] = 1 if the the node i is a risk gene for ASD, and [i] = 0 if the gene is non-mental-health related, , l < n. Note that, in this semi-supervised learning task, only the first l genes out of n have labels. is defined similarly for ID, using its ground-truth risk and non-risk gene sets. The goal of the algorithm is to learn a function . denotes , and denotes .

GCN model

CNNs have revolutionized the computer-vision field by significantly improving the state of the art by extracting local patterns on grid-structured data. Applying the same principle on arbitrarily structured graph data has also been a success.89, 90, 91 While all these spectral approaches have proven useful, they are computationally expensive. Kipf and Welling have proposed an approach (GCN) to approximate the filters as a Chebyshev expansion of the graph Laplacian and let them operate on the 1-hop neighborhood of each node. This fast and scalable approach extracts a network-adjusted feature vector for each node (i.e., embedding) that incorporates high-dimensional information about each node’s neighborhood in the network. The convolution operation of DeepND is based on this method. Given a gene co-expression network , GCN inputs the normalized adjacency matrix with self loops (i.e., ) and the feature vector , for gene i and for disorder D. d = 1 in this application and is shared among ASD and ID. Then, the first layer embedding produced by GCN is computed using the following propagation rule: = , where is the weight matrix at the input layer to be learned, is the rectified linear unit function, and is the normalized version of a diagonal matrix where . We pick = 4 in this application. Each subsequent layer k is defined similarly as follows: = . Thus, the output of a k-layered GCN, for gene i on co-expression network j, is denoted as follows: = . In this application, we use two GCN layers and . The final layer is softmax to produce probabilities for the positive class. That is, the output of a GCN model j, for a gene i, is = .

Mixture-of-experts model

Mixture-of-experts (MoE) model is an ensemble machine-learning approach that aims to find out informative models (experts) among a collection. Specifically, MoE inputs the features and assigns weights to the outputs of the experts. In DeepND architecture, individual experts are the 52 GCNs that operate on 52 gene co-expression networks, as explained above (). For every gene i, MoE also inputs . It produces a weight of length 52, which is passed through a softmax layer (i.e., . The outputs of the GCNs are weighted by this network, and the weighted sum is used to produce a risk probability for every gene i using softmax. That is, . The GCNs produce 52 values (one per co-expression network), which are concatenated to produce . Finally, the following dot product is used to produce the risk probability of gene i for disorder D: .

Multitask learning model

The above-mentioned GCN and MoE cascade inputs , , and the co-expression networks along with labels for a single disorder to predict the risk probability for every gene. Thus, it corresponds to the single-task version of DeepND (i.e., DeepND-ST.) On the other hand, DeepND is designed to work concurrently with multiple disorders (i.e., ASD and ID.) DeepND employs one DeepND-ST cascade per disorder and puts a multilayer perceptron (MLP) as a precursor to two DeepND-ST subnetworks. The weights learned on these subnetworks are only affected by the back-propagated loss of the corresponding disorder, and hence, these are single-task parts of the architecture. On the contrary, the weights learned on the MLP part are affected by the loss back propagated from both subnetworks. Thus, this part corresponds to the multitask component of the DeepND architecture. The MLP layer inputs and passes it through a fully connected layer followed by Leaky ReLU activation (negative slope = -1.5) to learn a weight matrix and output a dimensional embedding to be input DeepND-ST instead of . That is, .

Learning and the cross-validation setting of DeepND

To evaluate both DeepND-ST and DeepND approaches, we use a 5-fold cross-validation scheme. All labeled genes are uniformly and randomly distributed to the folds. At each training iteration, we leave one fold for validation and one fold for testing. We train the model on the remaining three folds of the labeled genes. For all the genes in the left-out fold, their feature vectors are nullified, and their labels are masked and not included in the training loss calculation when input to training in order to prevent information leakage. The model is trained up to 1,000 epochs with early stop, which is determined with respect to the loss calculated on the validation fold using only E1 genes as the positives and all negative genes. Once the model converges, the test performance is reported on the test fold in the same manner as in the validation fold. The model uses cross-entropy loss (evidence weighted) and ADAM optimizer for updating the weights, which are initialized using Xavier initialization. DeepND-ST uses a fixed learning rate of 1 × 10-4 for both disorders. DeepND uses the learning rates of 7 × 10-4, 1 × 10-4, and 1 × 10-3 for the shared layer, ASD single-task layer, and ID single-task layer, respectively. To ensure proper convergence of DeepND, should one of the single-task subnetworks converge, the learning rate of that subnetwork and the shared layer are cut down twenty folds. This lets the yet-underfit subnetwork keep learning until early stop or the epoch limit. The above-mentioned procedure produces four results as for every left-out test fold as all remaining 4 folds are used as validation. We repeat this process 10 times with random initialization to obtain a total of 200 results for performance comparison (Figure 2). For ASD, positively labeled genes’ weights equal 1.00 (E1), 0.50 (E2), 0.25 (E3), and 0.25 (E4). For ID, positively labeled genes’ weights equal 1.00 (E1) and 0.50 (E2). For both disorders, negatively labeled genes have a weight of 1.00. Note that this procedure is in line with the setting of Krishnan et al. for fairness of comparison.

Genome-wide risk prediction for ASD and ID and comparison with other methods

We compare the performance of DeepND-ST and DeepND with other state-of-the-art network-based neurodevelopmental disorder gene risk-assessment methods from the literature that output a genome-wide risk ranking: Krishnan et al., DAWN, DAMAGES score, random-forest-based method of Duda et al., and forecASD. We run Krishnan et al.’s approach as described in their manuscript. This is an evidence-weighted SVM classifier that identifies risk genes based on similarity of network features. They use a human brain-specific functional interaction network to generate features as input. Note that the ground-truth gene set is the same as ours as well as the evidence weights for ASD. We perform a 5-fold cross-validation. That is, for each iteration, we train their SVM model on 80% of the labeled genes and evaluate the model on E1 genes and all negative genes in the left-out 20% of the labeled genes as suggested. We repeat this procedure 10 times. We post-process SVM outputs to produce risk probabilities using isotonic regression, which ensures that the gene ranking is preserved. We use the pLI value of each gene as the dependent variable. In a 10-fold cross-validation setting, we detect knots on the left-out fold and fit another isotonic-regression line to interpolate the knots. We use SVM output for all genes to produce gene risk-probability values for the corresponding disorder and produce the genome-wide risk ranking. We compare this method and DeepND-ST/DeepND with respect to (1) the AUC and AUPR distributions calculated on the left-out fold at each cross-validation iteration (Figures 2A and 2B) and (2) the Matthews correlation coefficient of the final predictions with the ground-truth (using a rank percentage threshold; Figures 2C and 2Ciii) PR curves (Figure 2D). DAWN is a hidden Markov-random-field-based approach that assigns a posterior, network-adjusted disorder risk score to every gene based on the guilt-by-association principle. It inputs transmission and de novo association analysis (TADA) p values as prior features along with a partial co-expression network to assess connectivity. We input the TADA p values to DAWN., The method also uses partial co-expression networks. We use two networks that the authors suggest as the most useful for this task in their manuscript. These represent prefrontal cortex/mid-fetal period (i.e., PFC-MSC 3–5 and PFC-MSC 4–6). We generate these networks using the RNA-seq data in the BrainSpan dataset. Note that DeepND utilizes the same dataset and uses these networks and 50 others. Instead of partial co-expression networks, DeepND uses co-expression networks. The DAMAGES score is a principal-component-analysis-based technique that assess the risk of genes based on (1) the similarity of their expression profiles in 24 specific mouse CNS cell types, (2) the enrichment of mutations in cases as opposed to controls, and (3) the pLI score of the gene. We directly obtained the risk ranking from Shen and Zhang. We also benchmark the random-forest-based method presented by Duda et al. The method inputs a combination of (1) gene-expression data from several microarray studies, (2) PPIs from multiple databases, (3) a quantitative physical interaction (PI) score for all protein isoform pairs in mouse, and (4) phenotype annotations from the Mouse Genome Informatics (MGI) database. They combine these modalities by a Bayesian network, and the final probabilities of functional interactions between gene pairs are represented with a matrix. Since this method does not support evidence weights for the ground-truth set, in the experimental setting, we used our ground-truth gene set for ASD after discarding the evidence information to obtain results. We compare all above mentioned methods with respect to (1) the Matthews correlation coefficient of the final predictions with the ground-truth (using a rank percentage threshold; Figures 2C and 2Cii) PR curves (Figure 2D). Apart from the previously mentioned methods, we also consider forecASD, which is a random-forest-based ensemble method. It inputs features derived from (1) BrainSpan transcriptome, (2) the STRING PPI network, and (3) the outputs of several previously published ASD gene prediction methods: Krishnan et al., DAWN, DAMAGES score, and TADA. These features are combined in a two-layer architecture. In the first layer, each derived feature set is used to train a random-forest branch. Then, in the second step, the results from the first layer are combined and are used to train the final ensemble classifier. We trained the original forecASD architecture with our ground-truth sets for ASD and ID and obtained their results. Then, in a separate run, we also added the final ranking of DeepND as an extra feature and retrained the model. We compared these two results with our final ranking using (1) the Matthews correlation coefficient of the final predictions with the ground-truth (using a rank percentage threshold) and (2) PR curves (Figure S1).

Enrichment analyses

We evaluate the enrichment of DeepND’s ASD and ID gene risk rankings in gene lists, which are known to be enriched in disorder risk genes in the literature. These lists are (1) targets of transcription regulators like CHD8, FMRP,, RBFOX1,, and TOP1, (2) susceptibility pathways like WNT signaling and MAPK signaling, and (3) affected biological processes and molecular functions like post-synaptic density complex, and histone modification., We use the binomial test to determine whether the top decile in the corresponding ranking significantly deviates from uniform enrichment (Figure 5). We perform a Gene Ontology term enrichment analysis of the top-percentile predictions using the Enrichr system, with respect to Biological Process and Molecular Function terms. As both analyses point to transcription-factor regulation, we investigate the connectivity of the high-risk genes in the ChEA database, which is a large repository that lists experimentally validated transcriptional regulation relationships in various organisms.

Spatiotemporal network analyses

We investigate which GCNs (i.e., neurodevelopmental windows) are better predictors of gene risk for each disorder. For each gene i, the average risk probability assigned by the corresponding GCN is calculated over all iterations such that i is in the test fold. The average values for top-percentile genes for each disorder are shown in Figure 3. To check whether obtained average risk probabilities per neurodevelopmental window that are assigned by that GCN are empirically significant, we performed a permutation test. First, we randomly redistribute the labels of the ground-truth genes while keeping the counts same. We obtain 100 randomized ground-truth sets. Keeping all other settings same, we train 100 DeepND models and obtain 100 ASD and 100 ID risk rankings. Using these rankings, we calculate the average risk probability assigned by each GCN and obtain two background distributions for each GCN: one for ASD and one for ID (Figures S4 and S5). We then compare the original average probability assigned by that GCN with the corresponding background distribution to assess significance.

Experimental setup

All DeepND models are trained and tested on a Super-Micro Super-Server 4029GP-TRT with 2 Intel Xeon Gold 6140 Processors (2.3 GHz, 24.75 M cache), 251 GB RAM, 6 NVIDIA GeForce RTX 2080 Ti (11 GB, 352 bit), and 2 NVIDIA TITAN RTX GPUs (24 GB, 384 bit). For DeepND-ST, we used 32,080 RTX and 1 TITAN RTX cards. For DeepND, we used 5 RTX 2080 and one TITAN RTX cards. The running time of DeepND depends on the number, size (number of nodes), and density (number of edges) of the networks used. Note that the number of input genes and the features are constant for each condition so that it is independent of the considered disease. When we used the 3 PPI networks to co-analyze breast and ovarian cancers, DeepND took approximately 3 h. When we used the 52 co-expression networks, it took approximately 16 h.

89 in total

1. Mutations in ZBTB20 cause Primrose syndrome.

Authors: Viviana Cordeddu; Bert Redeker; Emilia Stellacci; Aldo Jongejan; Alessandra Fragale; Ted E J Bradley; Massimiliano Anselmi; Andrea Ciolfi; Serena Cecchetti; Valentina Muto; Laura Bernardini; Meron Azage; Daniel R Carvalho; Alberto J Espay; Alison Male; Anna-Maja Molin; Renata Posmyk; Carla Battisti; Alberto Casertano; Daniela Melis; Antoine van Kampen; Frank Baas; Marcel M Mannens; Gianfranco Bocchinfuso; Lorenzo Stella; Marco Tartaglia; Raoul C Hennekam
Journal: Nat Genet Date: 2014-07-13 Impact factor: 38.330

2. A genome-wide association study of autism reveals a common novel risk locus at 5p14.1.

Authors: Deqiong Ma; Daria Salyakina; James M Jaworski; Ioanna Konidari; Patrice L Whitehead; Ashley N Andersen; Joshua D Hoffman; Susan H Slifer; Dale J Hedges; Holly N Cukier; Anthony J Griswold; Jacob L McCauley; Gary W Beecham; Harry H Wright; Ruth K Abramson; Eden R Martin; John P Hussman; John R Gilbert; Michael L Cuccaro; Jonathan L Haines; Margaret A Pericak-Vance
Journal: Ann Hum Genet Date: 2009-05 Impact factor: 1.670

3. Multiplex targeted sequencing identifies recurrently mutated genes in autism spectrum disorders.

Authors: Brian J O'Roak; Laura Vives; Wenqing Fu; Jarrett D Egertson; Ian B Stanaway; Ian G Phelps; Gemma Carvill; Akash Kumar; Choli Lee; Katy Ankenman; Jeff Munson; Joseph B Hiatt; Emily H Turner; Roie Levy; Diana R O'Day; Niklas Krumm; Bradley P Coe; Beth K Martin; Elhanan Borenstein; Deborah A Nickerson; Heather C Mefford; Dan Doherty; Joshua M Akey; Raphael Bernier; Evan E Eichler; Jay Shendure
Journal: Science Date: 2012-11-15 Impact factor: 47.728

4. Patterns and rates of exonic de novo mutations in autism spectrum disorders.

Authors: Benjamin M Neale; Yan Kou; Li Liu; Avi Ma'ayan; Kaitlin E Samocha; Aniko Sabo; Chiao-Feng Lin; Christine Stevens; Li-San Wang; Vladimir Makarov; Paz Polak; Seungtai Yoon; Jared Maguire; Emily L Crawford; Nicholas G Campbell; Evan T Geller; Otto Valladares; Chad Schafer; Han Liu; Tuo Zhao; Guiqing Cai; Jayon Lihm; Ruth Dannenfelser; Omar Jabado; Zuleyma Peralta; Uma Nagaswamy; Donna Muzny; Jeffrey G Reid; Irene Newsham; Yuanqing Wu; Lora Lewis; Yi Han; Benjamin F Voight; Elaine Lim; Elizabeth Rossin; Andrew Kirby; Jason Flannick; Menachem Fromer; Khalid Shakir; Tim Fennell; Kiran Garimella; Eric Banks; Ryan Poplin; Stacey Gabriel; Mark DePristo; Jack R Wimbish; Braden E Boone; Shawn E Levy; Catalina Betancur; Shamil Sunyaev; Eric Boerwinkle; Joseph D Buxbaum; Edwin H Cook; Bernie Devlin; Richard A Gibbs; Kathryn Roeder; Gerard D Schellenberg; James S Sutcliffe; Mark J Daly
Journal: Nature Date: 2012-04-04 Impact factor: 49.962

5. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations.

Authors: Brian J O'Roak; Laura Vives; Santhosh Girirajan; Emre Karakoc; Niklas Krumm; Bradley P Coe; Roie Levy; Arthur Ko; Choli Lee; Joshua D Smith; Emily H Turner; Ian B Stanaway; Benjamin Vernot; Maika Malig; Carl Baker; Beau Reilly; Joshua M Akey; Elhanan Borenstein; Mark J Rieder; Deborah A Nickerson; Raphael Bernier; Jay Shendure; Evan E Eichler
Journal: Nature Date: 2012-04-04 Impact factor: 49.962

6. Characterization of the proteome, diseases and evolution of the human postsynaptic density.

Authors: Alex Bayés; Louie N van de Lagemaat; Mark O Collins; Mike D R Croning; Ian R Whittle; Jyoti S Choudhary; Seth G N Grant
Journal: Nat Neurosci Date: 2010-12-19 Impact factor: 24.884

7. DAWN: a framework to identify autism genes and subnetworks using gene expression and genetics.

Authors: Li Liu; Jing Lei; Stephan J Sanders; Arthur Jeremy Willsey; Yan Kou; Abdullah Ercument Cicek; Lambertus Klei; Cong Lu; Xin He; Mingfeng Li; Rebecca A Muhle; Avi Ma'ayan; James P Noonan; Nenad Sestan; Kathryn A McFadden; Matthew W State; Joseph D Buxbaum; Bernie Devlin; Kathryn Roeder
Journal: Mol Autism Date: 2014-03-06 Impact factor: 7.509

Review 8. Advances in understanding - genetic basis of intellectual disability.

Authors: Pietro Chiurazzi; Filomena Pirozzi
Journal: F1000Res Date: 2016-04-07

9. Individual common variants exert weak effects on the risk for autism spectrum disorders.

Authors: Richard Anney; Lambertus Klei; Dalila Pinto; Joana Almeida; Elena Bacchelli; Gillian Baird; Nadia Bolshakova; Sven Bölte; Patrick F Bolton; Thomas Bourgeron; Sean Brennan; Jessica Brian; Jillian Casey; Judith Conroy; Catarina Correia; Christina Corsello; Emily L Crawford; Maretha de Jonge; Richard Delorme; Eftichia Duketis; Frederico Duque; Annette Estes; Penny Farrar; Bridget A Fernandez; Susan E Folstein; Eric Fombonne; John Gilbert; Christopher Gillberg; Joseph T Glessner; Andrew Green; Jonathan Green; Stephen J Guter; Elizabeth A Heron; Richard Holt; Jennifer L Howe; Gillian Hughes; Vanessa Hus; Roberta Igliozzi; Suma Jacob; Graham P Kenny; Cecilia Kim; Alexander Kolevzon; Vlad Kustanovich; Clara M Lajonchere; Janine A Lamb; Miriam Law-Smith; Marion Leboyer; Ann Le Couteur; Bennett L Leventhal; Xiao-Qing Liu; Frances Lombard; Catherine Lord; Linda Lotspeich; Sabata C Lund; Tiago R Magalhaes; Carine Mantoulan; Christopher J McDougle; Nadine M Melhem; Alison Merikangas; Nancy J Minshew; Ghazala K Mirza; Jeff Munson; Carolyn Noakes; Gudrun Nygren; Katerina Papanikolaou; Alistair T Pagnamenta; Barbara Parrini; Tara Paton; Andrew Pickles; David J Posey; Fritz Poustka; Jiannis Ragoussis; Regina Regan; Wendy Roberts; Kathryn Roeder; Bernadette Roge; Michael L Rutter; Sabine Schlitt; Naisha Shah; Val C Sheffield; Latha Soorya; Inês Sousa; Vera Stoppioni; Nuala Sykes; Raffaella Tancredi; Ann P Thompson; Susanne Thomson; Ana Tryfon; John Tsiantis; Herman Van Engeland; John B Vincent; Fred Volkmar; J A S Vorstman; Simon Wallace; Kirsty Wing; Kerstin Wittemeyer; Shawn Wood; Danielle Zurawiecki; Lonnie Zwaigenbaum; Anthony J Bailey; Agatino Battaglia; Rita M Cantor; Hilary Coon; Michael L Cuccaro; Geraldine Dawson; Sean Ennis; Christine M Freitag; Daniel H Geschwind; Jonathan L Haines; Sabine M Klauck; William M McMahon; Elena Maestrini; Judith Miller; Anthony P Monaco; Stanley F Nelson; John I Nurnberger; Guiomar Oliveira; Jeremy R Parr; Margaret A Pericak-Vance; Joseph Piven; Gerard D Schellenberg; Stephen W Scherer; Astrid M Vicente; Thomas H Wassink; Ellen M Wijsman; Catalina Betancur; Joseph D Buxbaum; Edwin H Cook; Louise Gallagher; Michael Gill; Joachim Hallmayer; Andrew D Paterson; James S Sutcliffe; Peter Szatmari; Veronica J Vieland; Hakon Hakonarson; Bernie Devlin
Journal: Hum Mol Genet Date: 2012-07-26 Impact factor: 6.150

10. Understanding multicellular function and disease with human tissue-specific networks.

Authors: Casey S Greene; Arjun Krishnan; Aaron K Wong; Emanuela Ricciotti; Rene A Zelaya; Daniel S Himmelstein; Ran Zhang; Boris M Hartmann; Elena Zaslavsky; Stuart C Sealfon; Daniel I Chasman; Garret A FitzGerald; Kara Dolinski; Tilo Grosser; Olga G Troyanskaya
Journal: Nat Genet Date: 2015-04-27 Impact factor: 38.330

1 in total

1. Prioritizing de novo autism risk variants with calibrated gene- and variant-scoring models.

Authors: Yuxiang Jiang; Jorge Urresti; Kymberleigh A Pagel; Akula Bala Pramod; Lilia M Iakoucheva; Predrag Radivojac
Journal: Hum Genet Date: 2021-09-22 Impact factor: 5.881

1 in total