
Evidence for 28 genetic disorders discovered by combining healthcare and research data.

Joanna Kaplanis¹, Kaitlin E Samocha¹, Laurens Wiel^2,3, Zhancheng Zhang⁴, Kevin J Arvai⁴, Ruth Y Eberhardt¹, Giuseppe Gallone¹, Stefan H Lelieveld², Hilary C Martin¹, Jeremy F McRae¹, Patrick J Short¹, Rebecca I Torene⁴, Elke de Boer⁵, Petr Danecek¹, Eugene J Gardner¹, Ni Huang¹, Jenny Lord^1,6, Iñigo Martincorena¹, Rolph Pfundt⁵, Margot R F Reijnders^2,7, Alison Yeung^8,9, Helger G Yntema⁵, Lisenka E L M Vissers⁵, Jane Juusola⁴, Caroline F Wright¹⁰, Han G Brunner^5,7,11,12, Helen V Firth^1,13, David R FitzPatrick¹⁴, Jeffrey C Barrett¹, Matthew E Hurles¹⁵, Christian Gilissen², Kyle Retterer⁴.

Abstract

De novo mutations in protein-coding genes are a well-established cause of developmental disorders[1]. However, genes known to be associated with developmental disorders account for only a minority of the observed excess of such de novo mutations[1,2]. Here, to identify previously undescribed genes associated with developmental disorders, we integrate healthcare and research exome-sequence data from 31,058 parent-offspring trios of individuals with developmental disorders, and develop a simulation-based statistical test to identify gene-specific enrichment of de novo mutations. We identified 285 genes that were significantly associated with developmental disorders, including 28 that had not previously been robustly associated with developmental disorders. Although we detected more genes associated with developmental disorders, much of the excess of de novo mutations in protein-coding genes remains unaccounted for. Modelling suggests that more than 1,000 genes associated with developmental disorders have not yet been described, many of which are likely to be less penetrant than the currently known genes. Research access to clinical diagnostic datasets will be critical for completing the map of genes associated with developmental disorders.

Year: 2020 PMID: 33057194 PMCID: PMC7116826 DOI: 10.1038/s41586-020-2832-5

Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962

Introduction

It has previously been estimated that ~42-48% of patients with a severe developmental disorder (DD) have a pathogenic de novo mutation (DNM) in a protein coding gene[1,2]. However, most of these patients remain undiagnosed despite the identification of hundreds of DD-associated genes. This implies that there are more DD relevant genes to find. Existing methods to detect gene-specific enrichments of damaging DNMs ignore much prior information about which variants are more likely to be disease-associated; missense variants and protein-truncating variants (PTVs) vary in their impact on protein function[3-6]. Known dominant DD-associated genes are strongly enriched in the minority of genes that exhibit strong selective constraint on heterozygous PTVs[7]. To identify additional DD-associated genes, we need to increase our power to detect gene-specific enrichments for damaging DNMs by both increasing sample sizes and improving our statistical methods. In previous studies of pathogenic Copy Number Variation, utilising healthcare data has been key to achieve larger sample sizes than would be possible in a research setting alone[8,9].

Identification of 285 DD-associated genes

Following clear consent practices and only using aggregate, de-identified data, we pooled DNMs in DD patients from three centres: GeneDx (a US-based diagnostic testing company), the Deciphering Developmental Disorders study, and Radboud University Medical Center. We performed stringent quality control on variants and samples to obtain 45,221 coding and splicing DNMs in 31,058 individuals (Supplementary Fig. 1; Supplementary Table 1), including data on 24,348 trios not previously published. These DNMs included 40,992 single nucleotide variants (SNVs) and 4,229 indels. The three cohorts have similar clinical characteristics, male/female ratios, enrichments of DNMs by mutational class, and prevalences of known disorders (Supplementary Fig. 2). To detect gene-specific enrichments of damaging DNMs, we developed a method named DeNovoWEST (De Novo Weighted Enrichment Simulation Test, https://github.com/queenjobo/DeNovoWEST). DeNovoWEST scores all classes of sequence variants on a unified severity scale based on empirically-estimated positive predictive values of being pathogenic (Supplementary Fig. 3–4). We perform two tests per gene: an enrichment test on all nonsynonymous DNMs and a test designed to detect genes likely acting via an altered-function mechanism, which combines a missense enrichment test with a missense clustering test. We then applied a Bonferroni multiple testing correction accounting for the number of genes (n=18,762) and two tests per gene. We first applied DeNovoWEST to all individuals in our cohort and identified 281 significantly enriched genes, 18 more than when using our previous method[1] (Supplementary Fig. 5; Fig. 1a). The majority (196/281; 70%) of these significant genes already had sufficient evidence of DD-association to be considered of diagnostic utility (as of late 2019) by all three centres, and we refer to them as “consensus” genes. 
54/281 of these significant genes were previously considered diagnostic by one or two centres (“discordant” genes). Applying DeNovoWEST to synonymous DNMs, as a negative control analysis, identified no significantly enriched genes (Supplementary Fig. 6).
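
The testing scheme described above can be illustrated with a small sketch. It is not the DeNovoWEST implementation (see the repository linked above for that); it only shows the general idea of a weighted, simulation-based enrichment test together with the Bonferroni threshold implied by 18,762 genes and two tests per gene. The family-wise error rate of 0.05, the per-class expected counts, the weights and the function names are all assumptions made for illustration.

```python
import math
import random

ALPHA = 0.05          # assumed family-wise error rate (not stated in the text above)
N_GENES = 18_762      # number of genes tested, from the text
TESTS_PER_GENE = 2    # enrichment test + missense enrichment/clustering test

bonferroni_threshold = ALPHA / (N_GENES * TESTS_PER_GENE)


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (fine for small lam)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulated_enrichment_p(observed_score, classes, n_sims=20_000, seed=1):
    """Estimate P(null severity score >= observed score) for one gene.

    `classes` is a list of (expected_dnm_count, weight) pairs, one per
    variant class; both values are hypothetical inputs in this sketch.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Null score: weighted sum of simulated DNM counts across classes.
        score = sum(weight * sample_poisson(rng, lam) for lam, weight in classes)
        if score >= observed_score:
            hits += 1
    return (hits + 1) / (n_sims + 1)  # add-one so p is never exactly zero


# Hypothetical gene: (expected count, weight) for PTVs and for missense DNMs.
gene_classes = [(0.02, 0.95), (0.10, 0.40)]
observed = 3 * 0.95 + 4 * 0.40  # 3 PTVs and 4 missense DNMs observed
p_value = simulated_enrichment_p(observed, gene_classes)
```

With the add-one estimator, the smallest p-value a run can report is 1/(n_sims + 1), so reaching a threshold near 10^-6 by simulation alone would need millions of draws per gene or an analytical shortcut.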

Figure 1

Results of DeNovoWEST analysis.

(a) Comparison of p-values using the new method (DeNovoWEST) versus the previous method (mupit)[1], run on the full cohort. Dashed lines indicate the threshold for genome-wide significance (one-sided, Bonferroni correction). Point size is proportional to the number of nonsynonymous DNMs in our cohort (nsyn). The number of genes that fall into each quadrant is annotated. (b) The number of missense and PTV DNMs in the novel genes. Point size is proportional to the -log10(p-value) from analysis of the undiagnosed subset. Point colour corresponds to which test p-value was more significant: non-synonymous enrichment test in blue (pEnrich), missense enrichment and clustering test in red (pMEC). (c) The distribution of significant p-values from analysis of the undiagnosed subset for discordant and novel genes; p-values for consensus genes come from the full cohort analysis. The number of genes in each p-value bin is coloured by diagnostic gene group (n = 285 significant genes; one-sided p-values, Bonferroni corrected). Green represents the remaining fraction of cases expected to have a pathogenic de novo coding mutation and grey is the fraction of cases that are likely to be explained by other factors. (d) The fraction of cases (n = 31,058) with a nonsynonymous mutation in each diagnostic gene group. (e) The fraction of cases with a nonsynonymous mutation in each diagnostic gene group split by sex (n = 13,636 female and 17,422 male). In all panels, black, blue and orange represent consensus, discordant and novel genes respectively.

To discover novel DD-associated genes with greater power, we applied DeNovoWEST to DNMs in patients without damaging DNMs in consensus genes (we refer to this subset as ‘undiagnosed’ patients) and identified 94 significant genes (Supplementary Fig. 7; Supplementary Table 2), of which 33 were putative ‘novel’ DD-associated genes. To ensure robustness to potential mutation rate variation between genes, we determined whether any of the putative novel DD-associated genes had significantly more synonymous variants in the Genome Aggregation Database[6] (gnomAD) of population variation than expected under our null mutation model (Supplementary Note). We identified 11/33 genes with a significant excess of synonymous variants. For these 11 genes we repeated the DeNovoWEST test, increasing the null mutation rate by the ratio of observed to expected synonymous variants in gnomAD. Five of these genes fell below our exome-wide significance threshold and were removed, leaving 28 novel genes, with a median of 10 nonsynonymous DNMs (Fig. 1b–c; Supplementary Table 3). There were 314 patients with nonsynonymous DNMs in these 28 genes (1.0% of our cohort); all these DNMs were inspected in IGV[10] and, of 198 for which experimental validation was attempted, all were confirmed as DNMs. The DNMs in these novel genes were distributed randomly across the three datasets (no genes with p < 0.001, heterogeneity test). Six of the 28 novel DD-associated genes are corroborated by OMIM entries or publications, including TFE3 which was described in two recent publications. We also investigated whether some synonymous DNMs might be pathogenic by disrupting splicing. We identified a small but significant enrichment of synonymous DNMs with high values of the splicing pathogenicity score SpliceAI[13] (≥ 0.8, 1.56-fold enriched, p = 0.0037, Poisson test; Supplementary Table 4). 
This enrichment corresponds to an excess of ~15 splice-disrupting synonymous DNMs in our cohort, of which six are accounted for by a recurrent synonymous DNM in KAT6B known to disrupt splicing[14]. Taken together, 25.0% of our cohort has a nonsynonymous DNM in one of the consensus or significant DD-associated genes (Fig. 1d). We noted significant sex differences in the autosomal burden of nonsynonymous DNMs (Supplementary Fig. 8). The rate of nonsynonymous DNMs in consensus autosomal genes was significantly higher in females than males (OR = 1.16, p = 4.4 × 10^-7, Fisher’s exact test; Fig. 1e), as noted previously[1]. However, the exome-wide burden of autosomal nonsynonymous DNMs in all genes was not significantly different between undiagnosed males and females (OR = 1.03, p = 0.29, Fisher’s exact test). This suggests the existence of subtle sex differences in the genetic architecture of DD, especially with regard to known and undiscovered disorders. This could include sex-biased contribution of polygenic, oligogenic and/or environmental modifiers of phenotypic variation and thus clinical ascertainment.
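
The mutation-rate robustness check described above (rescaling a gene's null mutation rate by its observed-to-expected synonymous ratio in gnomAD and re-testing) can be sketched with a plain Poisson enrichment test. This is a simplification: DeNovoWEST uses a weighted simulation test rather than a simple Poisson tail, and the expected count and synonymous ratio below are hypothetical values chosen only to show how a gene can drop below an exome-wide threshold after correction. The threshold again assumes a family-wise error rate of 0.05.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), summed directly over the tail."""
    if observed <= 0:
        return 1.0
    k = observed
    # Start from P(X = observed), computed in log space for stability.
    term = math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))
    total = 0.0
    while term > total * 1e-17:
        total += term
        k += 1
        term *= expected / k  # P(X = k) from P(X = k - 1)
    return min(total, 1.0)


observed_dnms = 10        # the median count reported for the novel genes
expected_dnms = 1.0       # hypothetical null expectation for this gene
syn_ratio = 150 / 100     # hypothetical observed/expected synonymous ratio in gnomAD

threshold = 0.05 / (2 * 18_762)  # assumed alpha of 0.05, two tests per gene

p_before = poisson_upper_tail(observed_dnms, expected_dnms)
p_after = poisson_upper_tail(observed_dnms, expected_dnms * syn_ratio)

print(f"before correction: p = {p_before:.2e}, significant = {p_before < threshold}")
print(f"after correction:  p = {p_after:.2e}, significant = {p_after < threshold}")
```

In this made-up case the gene passes the threshold under the uncorrected rate and fails it once the rate is inflated by 1.5, which is the kind of outcome that removed five of the 33 putative novel genes.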

Characteristics of novel DD-associated genes

Based on semantic similarity[15] between Human Phenotype Ontology terms, patients with DNMs in the same novel DD-associated gene were less phenotypically similar to each other, on average, than patients with DNMs in a consensus gene (p = 2.3 × 10^-11, Wilcoxon rank-sum test; Fig. 2a; Supplementary Figure 9). This suggests that these novel disorders less often result in distinctive and consistent clinical presentations, which may have made these disorders harder to discover via a phenotype-driven approach. Each of these novel disorders requires genotype-phenotype characterisation, which is beyond the scope of this study.

Figure 2

Properties of novel genes.

(a) The phenotypic similarity of patients with DNMs in novel and consensus genes. Random phenotypic similarity was calculated from random pairs of patients. Cases with DNMs in the same novel gene were less phenotypically similar than cases with DNMs in the same consensus gene (p = 2.3 × 10^-11, two-sided Wilcoxon rank-sum test). (b) Comparison of properties of consensus (n = 380) and novel (n = 28) DD genes known to be differential between consensus and non-DD genes (95% bootstrapped confidence intervals shown).

Overall, novel DD-associated genes encode proteins that have very similar functional and evolutionary properties to consensus genes (Fig. 2b; Supplementary Table 5). Despite the high-level functional similarity between known and novel DD-associated genes, nonsynonymous DNMs in the more recently discovered DD-associated genes are much more likely to be missense DNMs, and less likely to be PTVs (discordant and novel; p = 1.2 × 10^-25, chi-squared test). Fifteen (54%) of the 28 novel genes only had missense DNMs. Consequently, we expect that a greater proportion of the novel genes will act via altered-function mechanisms (e.g. dominant negative or gain-of-function). For example, the novel gene PSMC5 (DeNovoWEST p = 2.6 × 10^-15) had one inframe deletion and nine missense DNMs, eight of which altered two structurally important amino acids in the AAA+ ATPase domain, and so is likely to operate via an altered-function mechanism (Supplementary Fig. 10a–b). None of the novel genes exhibited significant clustering of de novo PTVs. We observed that missense DNMs were more likely to affect functional protein domains than other coding regions. We observed a 2.63-fold enrichment (p = 2.2 × 10^-68, G-test) of missense DNMs residing in protein domains among consensus genes and a 1.80-fold enrichment (p = 8.0 × 10^-5, G-test) in novel DD-associated genes, but no enrichment for synonymous DNMs (Supplementary Table 6). Four protein domain families in consensus genes were enriched for missense DNMs (Supplementary Table 7): ion transport protein (PF00520, p = 6.9 × 10^-4, G-test Bonferroni corrected), ligand-gated ion channel (PF00060, p = 4.0 × 10^-6), protein kinase domain (PF00069, p = 0.043), and kinesin motor domain (PF00225, p = 0.027). Missense DNMs in all four enriched domain families have previously been associated with DD (Supplementary Table 8)[16-18]. 
We observed a significant overlap between the 285 DNM-enriched DD-associated genes and a set of 369 previously described cancer driver genes[19] (overlap of 70 genes; p = 1.7 × 10^-49, logistic regression correcting for shet), as observed previously[20,21], as well as a significant enrichment of nonsynonymous DNMs in both overlapping and non-overlapping cancer genes (Supplementary Table 9). We observed 117 DNMs at 76 recurrent somatic mutations observed in at least three patients in The Cancer Genome Atlas (TCGA)[22]. By modelling the germline mutation rate at these somatic driver mutations, we found that recurrent nonsynonymous mutations in TCGA are enriched 21-fold in our cohort (p < 10^-50, Poisson test, Supplementary Fig. 11), whereas recurrent synonymous mutations in TCGA are not significantly enriched (2.4-fold, p = 0.13, Poisson test). This suggests that this observation is driven by the pleiotropic effects of these mutations in development and tumourigenesis, rather than hypermutability.

Recurrent mutations

We identified 773 recurrent DNMs (736 SNVs and 37 indels), observed in 2-36 individuals, which allowed us to interrogate systematically the factors driving recurrent germline mutation. We considered three potential contributory factors: (i) clinical ascertainment enriching for pathogenic mutations, (ii) greater mutability at specific sites, and (iii) positive selection conferring a proliferative advantage in the male germline[23]. We observed evidence that all three factors contribute, but not mutually exclusively. Clinical ascertainment drives the observation that 65% of recurrent DNMs were in consensus genes, a 5.4-fold enrichment compared to DNMs only observed once (p < 10^-50, proportion test). Hypermutability underpins the observation that 64% of recurrent de novo SNVs occurred at hypermutable CpG dinucleotides[24], a 2.0-fold enrichment over DNMs only observed once (p = 3.3 × 10^-68, chi-squared test). To assess the contribution of germline selection to recurrent DNMs, we initially focused on the 12 known germline selection genes, which all operate through activation of the RAS-MAPK signalling pathway[25,26]. We identified 39 recurrent DNMs in 11 of these genes, 38 of which are missense and all of which are known to be activating in the germline (see Supplement). As expected, given that hypermutability is not the driving factor for recurrent mutation in these genes, these 39 recurrent DNMs were depleted for CpGs relative to other recurrent mutations (6/39 vs 425/692, p = 3.4 × 10^-8, chi-squared test). Positive germline selection can increase the apparent mutation rate more strongly[23] than either clinical ascertainment (10-100X in our dataset) or hypermutability (~10X for CpGs). However, only a minority of the most highly recurrent mutations in our dataset are in genes that have been previously associated with germline selection. 
Nonetheless, several lines of evidence suggested that the majority of these most highly recurrent mutations are likely to confer a germline selective advantage. Based on the observations above, DNMs under germline selection should be more likely to be activating missense mutations, and should be less enriched for CpG dinucleotides. Extended Data Table 1 shows the 16 de novo SNVs observed nine or more times in our cohort, only two of which are in known germline selection genes. All but two of these 16 de novo SNVs cause missense changes, all but two of these genes cause disease by an altered-function mechanism, and these DNMs were depleted for CpGs relative to all recurrent mutations. Two of these genes with highly recurrent de novo SNVs, SHOC2 and PPP1CB, encode interacting proteins that regulate the RAS-MAPK pathway, and pathogenic variants in these genes are associated with a Noonan-like syndrome[27]. Moreover, two of these recurrent DNMs are in the same gene SMAD4, which encodes a key component of the TGF-beta signalling pathway, potentially expanding the pathophysiology of germline selection beyond the RAS-MAPK pathway. Confirming germline selection of these mutations will require deep sequencing of testes and/or sperm[26].
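
The CpG depletion comparison reported above (6/39 recurrent DNMs at CpGs in known germline selection genes versus 425/692 among other recurrent SNVs) is a 2×2 contingency test, and a minimal sketch of it follows. The counts come from the text; the use of a Yates continuity correction is an assumption, since the text says only "chi-squared test". With that correction the p-value comes out on the order of 10^-8, in line with the reported value.

```python
import math


def chi2_2x2(a, b, c, d, yates=True):
    """Chi-squared test (1 d.f.) on the table [[a, b], [c, d]].

    Returns (statistic, p_value). The Yates continuity correction is
    applied by default; that choice is an assumption of this sketch.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: known germline selection genes, other recurrent SNVs.
# Columns: CpG, non-CpG.
stat, p = chi2_2x2(6, 39 - 6, 425, 692 - 425)
print(f"chi-squared = {stat:.2f}, p = {p:.2e}")
```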

Extended Data Table 1

Recurrent Mutations.

De novo single nucleotide variants with nine or more recurrences in our cohort annotated with relevant information, such as CpG status, whether the impacted gene is a known somatic driver or germline selection gene, and diagnostic gene group (e.g. consensus). “Recur” refers to the number of recurrences. “Likely mechanism” refers to mechanisms attributed to this gene in the published literature.

Symbol	Chr	Position	Ref	Alt	Consequence	Recur	Likely mechanism	CpG	Somatic Driver Gene	Germline Selection Gene	DD status
PACS1	11	65978677	C	T	missense	36	activating	Yes	-	-	consensus
PPP2R5D	6	42975003	G	A	missense	22	dominant negative	-	-	-	consensus
SMAD4	18	48604676	A	G	missense	21	activating	-	Yes	-	consensus
PACS2	14	105834449	G	A	missense	13	dominant negative	Yes	-	-	discordant
MAP2K1	15	66729181	A	G	missense	11	activating	-	Yes	Yes	consensus
PPP1CB	2	28999810	C	G	missense	11	all missense/in frame	-	-	-	consensus
NAA10	X	153197863	G	A	missense	11	all missense/in frame	Yes	-	-	consensus
MECP2	X	153296777	G	A	stop gain	11	loss of function	Yes	-	-	consensus
CSNK2A1	20	472926	T	C	missense	10	activating	-	-	-	consensus
CDK13	7	40085606	A	G	missense	10	all missense/in frame	-	-	-	consensus
SHOC2	10	112724120	A	G	missense	9	activating	-	-	-	consensus
PTPN11	12	112915523	A	G	missense	9	activating	-	Yes	Yes	consensus
SMAD4	18	48604664	C	T	missense	9	activating	Yes	Yes	-	consensus
SRCAP	16	30748664	C	T	stop gain	9	dominant negative	Yes	-	-	consensus
FOXP1	3	71021817	C	T	missense	9	loss of function	Yes	-	-	consensus
CTBP1	4	1206816	G	A	missense	9	dominant negative	Yes	-	-	discordant

Incomplete penetrance and pre/perinatal death

Nonsynonymous DNMs in consensus or significant DD-associated genes accounted for half of the exome-wide nonsynonymous DNM burden associated with DD (Fig. 1b). Despite our identification of 285 significantly DD-associated genes, there remains a substantial burden of both missense and protein-truncating DNMs in unassociated genes (those that are neither significant in our analysis nor on the consensus gene list). This residual burden of protein-truncating DNMs is greatest in genes that are intolerant of PTVs in the general population (Supplementary Fig. 12), suggesting that more haploinsufficient (HI) disorders await discovery. We observed that PTV mutability (estimated from a null germline mutation model) was significantly lower in unassociated genes compared to DD-associated genes (p = 4.5 × 10^-68, Wilcoxon rank-sum test; Fig. 3a), which leads to reduced statistical power to detect DNM enrichment in unassociated genes, consistent with our hypothesis that many more HI disorders await discovery.

Figure 3

Factors influencing power.

(a) PTV mutability is significantly lower (p = 4.6 × 10^-68, two-sided Wilcoxon rank-sum test) in genes that are not significantly DD-associated (blue) than in DD-associated genes (red). Median depicted with a black horizontal line. (b) Distribution of PTV enrichment in significant, likely haploinsufficient, genes by category (118 consensus, 23 discordant, 8 novel genes). Lower and upper hinges correspond to first and third quartiles. Median depicted by a horizontal grey line. The upper and lower whiskers extend 1.5 times the inter-quartile range. (c) Comparison of PTV enrichment in our cohort vs the PTV to synonymous ratio in gnomAD, for genes that are significantly PTV-enriched in our cohort (without variant weighting; n = 156 genes). PTV enrichment bins labelled with log10(enrichment). Dashed line indicates regression. Confidence intervals are 95% of the rate ratio. (d) Overall PTV enrichment across genes grouped by likelihood of presenting with a structural malformation on prenatal ultrasound (145 low, 65 medium, 6 high genes). PTV enrichment is significantly higher for genes with a low likelihood compared to other genes (p = 4.6 × 10^-5, two-sided Poisson test). Poisson 95% confidence intervals shown.

A key parameter in estimating statistical power to detect novel HI disorders is the fold-enrichment of de novo PTVs expected in undiscovered HI disorders. We observed that novel DD-associated HI genes had significantly lower PTV enrichment compared to the consensus HI genes (p = 0.005, Wilcoxon rank-sum test; Fig. 3b). Two additional factors that could lower DNM enrichment, and thus power to detect a novel DD-association, are reduced penetrance and increased pre/perinatal death (due to spontaneous fetal loss, termination of pregnancy for fetal anomaly, stillbirth, or early neonatal death). To evaluate incomplete penetrance, we investigated whether HI genes with a lower enrichment of de novo PTVs in our cohort are associated with greater prevalences of PTVs in the general population. We observed a significant negative correlation (p = 0.031, weighted linear regression) between PTV enrichment in our cohort and the ratio of PTV to synonymous variants in gnomAD[6], suggesting that incomplete penetrance does lower de novo PTV enrichment in our cohort (Fig. 3c). Additionally, we observed that the fold-enrichment of de novo PTVs in consensus HI DD-associated genes in our cohort was significantly higher for genes with a low likelihood of presenting with a prenatal structural malformation (p = 4.6 × 10^-5, Poisson test; Fig. 3d), suggesting that pre/perinatal death decreases our power to detect some novel disorders (see Supplement for details).

Hundreds of DD genes not yet discovered

Downsampling of our cohort and repeating enrichment analyses showed that the discovery of DD-associated genes has not plateaued (Extended Data Fig 1a). Increasing sample sizes should result in the discovery of many novel DD-associated genes. To estimate how many haploinsufficient genes might await discovery, we modelled the likelihood of the observed distribution of de novo PTVs among genes as a function of varying numbers of undiscovered HI DD-associated genes and fold-enrichments of de novo PTVs in those genes. We found that the remaining PTV burden is most likely spread across ~1,000 genes with ~10-fold PTV enrichment (Extended Data Fig 1b). This fold enrichment is three times lower than in known HI DD-associated genes, suggesting that incomplete penetrance and/or pre/perinatal death is more prevalent among undiscovered HI genes. We modelled the missense DNM burden separately and also observed that the most likely architecture of undiscovered DD-associated genes is one that comprises over 1,000 genes with a substantially lower fold-enrichment than in currently known DD-associated genes (Supplementary Fig. 13).

Extended Data Figure 1

Exploring the remaining number of DD genes.

(a) Number of significant genes from downsampling full cohort and running DeNovoWEST’s enrichment test. (b) Results from modelling the likelihood of the observed distribution of de novo PTV mutations. This model varies the numbers of remaining haploinsufficient (HI) DD genes and PTV enrichment in those remaining genes. The 50% credible interval is shown in red and the 90% credible interval is shown in orange. Note that the median PTV enrichment in genes that are significant and known to operate via a loss-of-function mechanism (shown with an arrow) is 39.7.

We calculated that a sample size of ~350,000 parent-offspring trios would be needed to have 80% power to detect a 10-fold enrichment of de novo PTVs for an average gene. Using this inferred 10-fold enrichment among undiscovered HI genes, from our current data we can evaluate the likelihood that any gene i is an undiscovered HI gene, by comparing the likelihood of the number of de novo PTVs observed in each gene to have arisen from the null mutation rate or from a 10-fold increased PTV rate. Among the ~19,000 non-DD-associated genes, ~1,200 were more than three times more likely to have arisen from a 10-fold increased PTV rate, whereas ~7,000 were three times more likely to have no de novo PTV enrichment.
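
The per-gene comparison described above can be written as a Poisson likelihood ratio. As a sketch, and assuming the number of de novo PTVs in a gene follows a Poisson distribution with mean λ under the null mutation model, the ratio of the likelihood under a 10-fold increased rate to the likelihood under the null rate for k observed PTVs is 10^k × exp(-9λ). The function names and the expected counts below are hypothetical; the paper's actual model may differ in detail.

```python
import math


def log_poisson_pmf(k, lam):
    """Natural log of P(X = k) for X ~ Poisson(lam)."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def hi_likelihood_ratio(n_ptv, expected_ptv, fold=10.0):
    """Likelihood of n_ptv de novo PTVs under a `fold`-enriched rate,
    relative to the likelihood under the null rate `expected_ptv`."""
    enriched = log_poisson_pmf(n_ptv, fold * expected_ptv)
    null = log_poisson_pmf(n_ptv, expected_ptv)
    return math.exp(enriched - null)


# Hypothetical genes: (observed de novo PTVs, expected count under the null).
examples = {"gene_A": (1, 0.05), "gene_B": (0, 0.20), "gene_C": (0, 0.05)}

for name, (k, lam) in examples.items():
    ratio = hi_likelihood_ratio(k, lam)
    if ratio > 3:
        label = "favours 10-fold enrichment"
    elif ratio < 1 / 3:
        label = "favours no enrichment"
    else:
        label = "uninformative at this sample size"
    print(f"{name}: likelihood ratio = {ratio:.3f} ({label})")
```

Under this formulation, a gene with a low expected count and no observed PTVs (gene_C) falls between the two cut-offs, which is one way to see why many genes land in neither the ~1,200 nor the ~7,000 group.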

Discussion

In this study, we have presented evidence for 28 novel developmental disorders by developing an improved statistical test for mutation enrichment and applying it to a dataset of exome sequences from 31,058 parent-offspring trios. Most of the increased power to detect novel disorders comes from the increase in sample size, rather than the improved statistical test. These 28 novel genes account for 1.0% of our cohort, and their inclusion in diagnostic workflows will catalyse increased diagnosis of similar patients globally. The value of this study for improving diagnostic yield extends beyond these 28 novel genes; the total number of genes added to diagnostic workflows of the three participating centres (including newly validated discordant genes) ranged from 48-65 genes. We have shown that both incomplete penetrance and pre/perinatal death reduce our power to detect novel DDs postnatally, and hypothesise that one or both of these factors are operating more strongly among undiscovered DD-associated genes. In addition, we have identified a set of highly recurrent mutations that are strong candidates for novel germline selection mutations, which should result in a higher than expected disease incidence that increases dramatically with increased paternal age. Our study is approximately three times larger than a recent meta-analysis of DNMs from a collection of individuals with autism spectrum disorder, intellectual disability, and/or a developmental disorder[28]. We identified ~2.3 times as many significantly DD-associated genes as this previous study when using Bonferroni-corrected exome-wide significance (285 vs 124). In contrast to meta-analyses of published DNMs, the harmonised filtering of candidate DNMs across cohorts in this study should be more robust to cohort-specific differences in the sensitivity and specificity of detecting DNMs. 
We inferred indirectly that developmental disorders with higher rates of detectable prenatal structural abnormalities had greater pre/perinatal death. The potential size of this effect can be quantified from the recently published PAGE study of genetic diagnoses in a cohort of fetal structural abnormalities[29]. In this latter study, genetic diagnoses were not returned to participants during the pregnancy, and so genetic diagnostic information could not influence pre/perinatal death. In the PAGE study data, 69% of fetal abnormalities with a genetically diagnosable cause died perinatally or neonatally. This emphasises the substantial impact that pre/perinatal death can have on reducing the ability to discover novel DDs from postnatal recruitment alone, and motivates the integration of genetic data from prenatal, neonatal and postnatal studies in future studies. To empower our mutation enrichment testing, we estimated positive predictive values (PPV) of each DNM being pathogenic on the basis of their predicted protein consequence, CADD score[3], selective constraint against heterozygous PTVs across the gene (shet)[30], and, for missense variants, presence in a region under selective missense constraint[4]. These PPVs should also be informative for variant prioritisation in the diagnosis of dominant developmental disorders. Further work is needed to see whether these PPVs might be informative for recessive developmental disorders, and in other types of dominant disorders. More generally, we hypothesise that empirically-estimated PPVs based on variant enrichment in large datasets will be similarly informative in many other disease areas. We adopted a conservative statistical approach to identifying DD-associated genes. In two previous studies using the same significance threshold, we identified 26 novel DD-associated genes[1,31]. All 26 are now regarded as being diagnostic, and have entered routine clinical diagnostic practice. 
Had we used a significance threshold of FDR < 10% as used in Satterstrom, Kosmicki, Wang et al.[32], we would have identified 770 DD-associated genes. The FDR of an individual gene depends on the significance of the other genes being tested, so FDR values are not appropriate for assessing the significance of individual genes, but rather for defining gene-sets. There are 184 consensus genes that did not cross our significance threshold in this study. It is likely that many of these cause disorders that were under-represented in our study due to the ease of clinical diagnosis on the basis of distinctive clinical features or targeted diagnostic testing. These ascertainment biases will not impact the representation of novel DDs in our cohort. Our modelling suggested that likely over 1,000 DD-associated genes remain to be discovered, and that reduced penetrance and pre/perinatal death will reduce our power to identify these genes through DNM enrichment. Identifying these genes will require both improved analytical methods and greater sample sizes. As sample sizes increase, accurate modelling of gene-specific mutation rates becomes more important. In our analyses of 31,058 trios, we observed evidence that mutation rate heterogeneity among genes can lead to over-estimating the statistical significance of mutation enrichment based on an exome-wide mutation model. We advocate the development of more granular mutation rate models, based on large-scale population variation resources, that correct for all technical and biological complexities, to ensure that larger studies are robust to mutation rate heterogeneity. We anticipate that the variant-level weights used by DeNovoWEST will improve over time. As reference population samples, such as gnomAD[6], increase in size, weights based on selective constraint metrics (e.g. shet, regional missense constraint) will improve. Weights could also incorporate more functional information, such as expression in disease-relevant tissues. 
For example, we observe that DD-associated genes are significantly more likely to be expressed in fetal brain (Supplementary Fig. 14). Furthermore, novel metrics based on gene co-regulation networks can predict whether genes function within a disease-relevant pathway[33]. As a cautionary note, including more functional information may increase power to detect some novel disorders while decreasing power for disorders with pathophysiology different from known disorders. Our analyses also suggest that variant-level weights could be further improved by incorporating other variant prioritisation metrics, such as upweighting variants predicted to impact splicing, variants in particular protein domains, or variants that are somatic driver mutations during tumorigenesis. In developing DeNovoWEST, we explored applying both variant-level weights and gene-level weights in separate stages of the analysis; however, subtle but pervasive correlations between gene-level metrics (e.g. shet) and variant-level metrics (e.g. regional missense constraint, CADD) present statistical challenges to implementation. Finally, the discovery of less penetrant disorders can be empowered by analytical methodologies that integrate both DNMs and rare inherited variants, such as TADA[34]. Nonetheless, using current methods focused on DNMs alone, we estimated that ~350,000 parent-child trios would need to be analysed to have ~80% power to detect HI genes with a 10-fold PTV enrichment. Discovering non-HI disorders will need even larger sample sizes. Reaching this number of sequenced families will be impossible for an individual research study or clinical centre; it is therefore essential that genetic data generated as part of routine diagnostic practice are shared with the research community so that they can be aggregated to drive discovery of novel disorders and improve diagnostic practice.

Exploring the remaining number of DD genes.

Recurrent Mutations.

79 in total

1. Mutation-specific pathophysiological mechanisms define different neurodevelopmental disorders associated with SATB1 dysfunction.

Authors: Joery den Hoed; Elke de Boer; Norine Voisin; Alexander J M Dingemans; Nicolas Guex; Laurens Wiel; Christoffer Nellaker; Shivarajan M Amudhavalli; Siddharth Banka; Frederique S Bena; Bruria Ben-Zeev; Vincent R Bonagura; Ange-Line Bruel; Theresa Brunet; Han G Brunner; Hui B Chew; Jacqueline Chrast; Loreta Cimbalistienė; Hilary Coon; Emmanuèlle C Délot; Florence Démurger; Anne-Sophie Denommé-Pichon; Christel Depienne; Dian Donnai; David A Dyment; Orly Elpeleg; Laurence Faivre; Christian Gilissen; Leslie Granger; Benjamin Haber; Yasuo Hachiya; Yasmin Hamzavi Abedi; Jennifer Hanebeck; Jayne Y Hehir-Kwa; Brooke Horist; Toshiyuki Itai; Adam Jackson; Rosalyn Jewell; Kelly L Jones; Shelagh Joss; Hirofumi Kashii; Mitsuhiro Kato; Anja A Kattentidt-Mouravieva; Fernando Kok; Urania Kotzaeridou; Vidya Krishnamurthy; Vaidutis Kučinskas; Alma Kuechler; Alinoë Lavillaureix; Pengfei Liu; Linda Manwaring; Naomichi Matsumoto; Benoît Mazel; Kirsty McWalter; Vardiella Meiner; Mohamad A Mikati; Satoko Miyatake; Takeshi Mizuguchi; Lip H Moey; Shehla Mohammed; Hagar Mor-Shaked; Hayley Mountford; Ruth Newbury-Ecob; Sylvie Odent; Laura Orec; Matthew Osmond; Timothy B Palculict; Michael Parker; Andrea K Petersen; Rolph Pfundt; Eglė Preikšaitienė; Kelly Radtke; Emmanuelle Ranza; Jill A Rosenfeld; Teresa Santiago-Sim; Caitlin Schwager; Margje Sinnema; Lot Snijders Blok; Rebecca C Spillmann; Alexander P A Stegmann; Isabelle Thiffault; Linh Tran; Adi Vaknin-Dembinsky; Juliana H Vedovato-Dos-Santos; Samantha A Schrier Vergano; Eric Vilain; Antonio Vitobello; Matias Wagner; Androu Waheeb; Marcia Willing; Britton Zuccarelli; Usha Kini; Dianne F Newbury; Tjitske Kleefstra; Alexandre Reymond; Simon E Fisher; Lisenka E L M Vissers
Journal: Am J Hum Genet Date: 2021-01-28 Impact factor: 11.025

2. SPEN haploinsufficiency causes a neurodevelopmental disorder overlapping proximal 1p36 deletion syndrome with an episignature of X chromosomes in females.

Authors: Francesca Clementina Radio; Kaifang Pang; Andrea Ciolfi; Michael A Levy; Andrés Hernández-García; Lucia Pedace; Francesca Pantaleoni; Zhandong Liu; Elke de Boer; Adam Jackson; Alessandro Bruselles; Haley McConkey; Emilia Stellacci; Stefania Lo Cicero; Marialetizia Motta; Rosalba Carrozzo; Maria Lisa Dentici; Kirsty McWalter; Megha Desai; Kristin G Monaghan; Aida Telegrafi; Christophe Philippe; Antonio Vitobello; Margaret Au; Katheryn Grand; Pedro A Sanchez-Lara; Joanne Baez; Kristin Lindstrom; Peggy Kulch; Jessica Sebastian; Suneeta Madan-Khetarpal; Chelsea Roadhouse; Jennifer J MacKenzie; Berrin Monteleone; Carol J Saunders; July K Jean Cuevas; Laura Cross; Dihong Zhou; Taila Hartley; Sarah L Sawyer; Fabíola Paoli Monteiro; Tania Vertemati Secches; Fernando Kok; Laura E Schultz-Rogers; Erica L Macke; Eva Morava; Eric W Klee; Jennifer Kemppainen; Maria Iascone; Angelo Selicorni; Romano Tenconi; David J Amor; Lynn Pais; Lyndon Gallacher; Peter D Turnpenny; Karen Stals; Sian Ellard; Sara Cabet; Gaetan Lesca; Joset Pascal; Katharina Steindl; Sarit Ravid; Karin Weiss; Alison M R Castle; Melissa T Carter; Louisa Kalsner; Bert B A de Vries; Bregje W van Bon; Marijke R Wevers; Rolph Pfundt; Alexander P A Stegmann; Bronwyn Kerr; Helen M Kingston; Kate E Chandler; Willow Sheehan; Abdallah F Elias; Deepali N Shinde; Meghan C Towne; Nathaniel H Robin; Dana Goodloe; Adeline Vanderver; Omar Sherbini; Krista Bluske; R Tanner Hagelstrom; Caterina Zanus; Flavio Faletra; Luciana Musante; Evangeline C Kurtz-Nelson; Rachel K Earl; Britt-Marie Anderlid; Gilles Morin; Marjon van Slegtenhorst; Karin E M Diderich; Alice S Brooks; Joost Gribnau; Ruben G Boers; Teresa Robert Finestra; Lauren B Carter; Anita Rauch; Paolo Gasparini; Kym M Boycott; Tahsin Stefan Barakat; John M Graham; Laurence Faivre; Siddharth Banka; Tianyun Wang; Evan E Eichler; Manuela Priolo; Bruno Dallapiccola; Lisenka E L M Vissers; Bekim Sadikovic; Daryl A Scott; Jimmy Lloyd Holder; Marco Tartaglia
Journal: Am J Hum Genet Date: 2021-02-16 Impact factor: 11.025

3. Prediction of Neurodevelopmental Disorders Based on De Novo Coding Variation.

Authors: Julie C Chow; Fereydoun Hormozdiari
Journal: J Autism Dev Disord Date: 2022-05-20

4. Retrospective analysis of a clinical exome sequencing cohort reveals the mutational spectrum and identifies candidate disease-associated loci for BAFopathies.

Authors: Chun-An Chen; John Lattier; Wenmiao Zhu; Jill Rosenfeld; Lei Wang; Tiana M Scott; Haowei Du; Vipulkumar Patel; Anh Dang; Pilar Magoulas; Haley Streff; Jessica Sebastian; Shayna Svihovec; Kathryn Curry; Mauricio R Delgado; Neil A Hanchard; Seema Lalani; Ronit Marom; Suneeta Madan-Khetarpal; Margarita Saenz; Hongzheng Dai; Linyan Meng; Fan Xia; Weimin Bi; Pengfei Liu; Jennifer E Posey; Daryl A Scott; James R Lupski; Christine M Eng; Rui Xiao; Bo Yuan
Journal: Genet Med Date: 2021-11-30 Impact factor: 8.822

5. Genetic origins of schizophrenia find common ground.

Authors: Conrad O Iyegbe; Paul F O'Reilly
Journal: Nature Date: 2022-04 Impact factor: 49.962

6. MYT1L-associated neurodevelopmental disorder: description of 40 new cases and literature review of clinical and molecular aspects.

Authors: Juliette Coursimault; Anne-Marie Guerrot; Michelle M Morrow; Catherine Schramm; Francisca Millan Zamora; Anita Shanmugham; Shuxi Liu; Fanggeng Zou; Frédéric Bilan; Gwenaël Le Guyader; Ange-Line Bruel; Anne-Sophie Denommé-Pichon; Laurence Faivre; Frédéric Tran Mau-Them; Marine Tessarech; Estelle Colin; Salima El Chehadeh; Bénédicte Gérard; Elise Schaefer; Benjamin Cogne; Bertrand Isidor; Mathilde Nizon; Diane Doummar; Stéphanie Valence; Delphine Héron; Boris Keren; Cyril Mignot; Charles Coutton; Françoise Devillard; Anne-Sophie Alaix; Jeanne Amiel; Laurence Colleaux; Arnold Munnich; Karine Poirier; Marlène Rio; Sophie Rondeau; Giulia Barcia; Bert Callewaert; Annelies Dheedene; Candy Kumps; Sarah Vergult; Björn Menten; Wendy K Chung; Rebecca Hernan; Austin Larson; Kelly Nori; Sarah Stewart; James Wheless; Christina Kresge; Beth A Pletcher; Roseline Caumes; Thomas Smol; Sabine Sigaudy; Christine Coubes; Margaret Helm; Rosemarie Smith; Jennifer Morrison; Patricia G Wheeler; Amy Kritzer; Guillaume Jouret; Alexandra Afenjar; Jean-François Deleuze; Robert Olaso; Anne Boland; Christine Poitou; Thierry Frebourg; Claude Houdayer; Pascale Saugier-Veber; Gaël Nicolas; François Lecoquierre
Journal: Hum Genet Date: 2021-11-08 Impact factor: 4.132

7. Genetic testing for unexplained perinatal disorders. (Review)

Authors: Thomas Hays; Ronald J Wapner
Journal: Curr Opin Pediatr Date: 2021-04-01 Impact factor: 2.856

8. Non-coding region variants upstream of MEF2C cause severe developmental disorder through three distinct loss-of-function mechanisms.

Authors: Caroline F Wright; Nicholas M Quaife; Laura Ramos-Hernández; Petr Danecek; Matteo P Ferla; Kaitlin E Samocha; Joanna Kaplanis; Eugene J Gardner; Ruth Y Eberhardt; Katherine R Chao; Konrad J Karczewski; Joannella Morales; Giuseppe Gallone; Meena Balasubramanian; Siddharth Banka; Lianne Gompertz; Bronwyn Kerr; Amelia Kirby; Sally A Lynch; Jenny E V Morton; Hailey Pinz; Francis H Sansbury; Helen Stewart; Britton D Zuccarelli; Stuart A Cook; Jenny C Taylor; Jane Juusola; Kyle Retterer; Helen V Firth; Matthew E Hurles; Enrique Lara-Pezzi; Paul J R Barton; Nicola Whiffin
Journal: Am J Hum Genet Date: 2021-05-21 Impact factor: 11.025

9. A rare missense variant in the ATP2C2 gene is associated with language impairment and related measures.

Authors: Angela Martinelli; Mabel L Rice; Joel B Talcott; Rebeca Diaz; Shelley Smith; Muhammad Hashim Raza; Margaret J Snowling; Charles Hulme; John Stein; Marianna E Hayiou-Thomas; Ziarih Hawi; Lindsey Kent; Samantha J Pitt; Dianne F Newbury; Silvia Paracchini
Journal: Hum Mol Genet Date: 2021-06-09 Impact factor: 6.150

10. The CHD8/CHD7/Kismet family links blood-brain barrier glia and serotonin to ASD-associated sleep defects.

Authors: Mireia Coll-Tané; Naihua N Gong; Samuel J Belfer; Lara V van Renssen; Evangeline C Kurtz-Nelson; Milan Szuperak; Ilse Eidhof; Boyd van Reijmersdal; Isabel Terwindt; Jaclyn Durkin; Michel M M Verheij; Chang N Kim; Caitlin M Hudac; Tomasz J Nowakowski; Raphael A Bernier; Sigrid Pillen; Rachel K Earl; Evan E Eichler; Tjitske Kleefstra; Matthew S Kayser; Annette Schenck
Journal: Sci Adv Date: 2021-06-04 Impact factor: 14.957