Joanna Kaplanis, Kaitlin E Samocha, Laurens Wiel, Zhancheng Zhang, Kevin J Arvai, Ruth Y Eberhardt, Giuseppe Gallone, Stefan H Lelieveld, Hilary C Martin, Jeremy F McRae, Patrick J Short, Rebecca I Torene, Elke de Boer, Petr Danecek, Eugene J Gardner, Ni Huang, Jenny Lord, Iñigo Martincorena, Rolph Pfundt, Margot R F Reijnders, Alison Yeung, Helger G Yntema, Lisenka E L M Vissers, Jane Juusola, Caroline F Wright, Han G Brunner, Helen V Firth, David R FitzPatrick, Jeffrey C Barrett, Matthew E Hurles, Christian Gilissen, Kyle Retterer.
Abstract
De novo mutations in protein-coding genes are a well-established cause of developmental disorders [1]. However, genes known to be associated with developmental disorders account for only a minority of the observed excess of such de novo mutations [1,2]. Here, to identify previously undescribed genes associated with developmental disorders, we integrate healthcare and research exome-sequence data from 31,058 parent-offspring trios of individuals with developmental disorders, and develop a simulation-based statistical test to identify gene-specific enrichment of de novo mutations. We identified 285 genes that were significantly associated with developmental disorders, including 28 that had not previously been robustly associated with developmental disorders. Although we detected more genes associated with developmental disorders, much of the excess of de novo mutations in protein-coding genes remains unaccounted for. Modelling suggests that more than 1,000 genes associated with developmental disorders have not yet been described, many of which are likely to be less penetrant than the currently known genes. Research access to clinical diagnostic datasets will be critical for completing the map of genes associated with developmental disorders.
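The gene-specific test described in the abstract is simulation-based. As a rough illustration of that general idea only, and not of the DeNovoWEST procedure itself, the Python sketch below compares an observed count of de novo mutations in one gene with counts simulated under a null mutation rate. The function names, the per-chromosome mutation rate, and the observed count are hypothetical; only the cohort size of 31,058 trios comes from the abstract.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method.

    Suitable for the small expected counts typical of a single gene.
    """
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulated_enrichment_p(observed, rate_per_chrom, n_trios, n_sims=20_000, seed=0):
    """Empirical one-sided p-value for an excess of de novo mutations in a gene.

    Assumption: the null count is Poisson with mean 2 * n_trios * rate_per_chrom
    (two transmitted chromosomes per child, autosomal gene).
    """
    rng = random.Random(seed)
    expected = 2 * n_trios * rate_per_chrom
    hits = sum(sample_poisson(expected, rng) >= observed for _ in range(n_sims))
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sims + 1)


# Hypothetical gene: null rate of 1e-5 per chromosome, 6 de novo mutations observed.
p_value = simulated_enrichment_p(observed=6, rate_per_chrom=1e-5, n_trios=31_058)
print(f"empirical p-value: {p_value:.2e}")
```

The smallest p-value such a simulation can report is 1 / (n_sims + 1), so reaching a genome-wide significance threshold would need far more simulations or an analytic tail calculation.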
Year: 2020 PMID: 33057194 PMCID: PMC7116826 DOI: 10.1038/s41586-020-2832-5
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962
Figure 1. Results of DeNovoWEST analysis.
(a) Comparison of p-values from the new method (DeNovoWEST) and the previous method (mupit) [1], both run on the full cohort. Dashed lines indicate the threshold for genome-wide significance (one-sided, Bonferroni-corrected). Point size is proportional to the number of nonsynonymous DNMs in our cohort (nsyn). The number of genes that fall into each quadrant is annotated. (b) The number of missense and PTV DNMs in the novel genes. Point size is proportional to the -log10(p-value) from analysis of the undiagnosed subset. Point colour indicates which test gave the more significant p-value: the nonsynonymous enrichment test (pEnrich) in blue, or the missense enrichment and clustering test (pMEC) in red. (c) The distribution of significant p-values from analysis of the undiagnosed subset for discordant and novel genes; p-values for consensus genes come from the full cohort analysis. The number of genes in each p-value bin is coloured by diagnostic gene group (n = 285 significant genes; one-sided p-values, Bonferroni-corrected). Green represents the remaining fraction of cases expected to have a pathogenic de novo coding mutation and grey is the fraction of cases that are likely to be explained by other factors. (d) The fraction of cases (n = 31,058) with a nonsynonymous mutation in each diagnostic gene group. (e) The fraction of cases with a nonsynonymous mutation in each diagnostic gene group, split by sex (n = 13,636 female and 17,422 male). In all panels, black, blue and orange represent consensus, discordant and novel genes, respectively.
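To make the quadrants in panel (a) concrete, the following Python sketch shows how a Bonferroni threshold could be computed and how a gene could be assigned to a quadrant from its two p-values. The number of tests (20,000), the p-values, and the function names are hypothetical placeholders, not values from the paper.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def quadrant(p_new, p_old, threshold):
    """Label a gene by which of the two methods call it significant."""
    new_sig = p_new < threshold
    old_sig = p_old < threshold
    if new_sig and old_sig:
        return "both"
    if new_sig:
        return "new method only"
    if old_sig:
        return "previous method only"
    return "neither"


# Hypothetical setup: alpha = 0.05 spread over 20,000 tests.
threshold = bonferroni_threshold(0.05, 20_000)
print(f"threshold: {threshold:.2e}, -log10: {-math.log10(threshold):.2f}")

# Hypothetical gene that only the new method calls significant.
print(quadrant(p_new=1e-8, p_old=1e-4, threshold=threshold))
```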
Figure 2. Properties of novel genes.
(a) The phenotypic similarity of patients with DNMs in novel and consensus genes. Random phenotypic similarity was calculated from random pairs of patients. Cases with DNMs in the same novel gene were less phenotypically similar than cases with DNMs in the same consensus gene (p = 2.3 × 10⁻¹¹, two-sided Wilcoxon rank-sum test). (b) Comparison of consensus (n = 380) and novel (n = 28) DD genes on properties known to differ between consensus and non-DD genes (95% bootstrapped confidence intervals shown).
Recurrent Mutations.
De novo single nucleotide variants with at least 9 recurrences in our cohort, annotated with relevant information: CpG status, whether the affected gene is a known somatic driver or germline selection gene, and diagnostic gene group (e.g. consensus). “Recur” is the number of recurrences. “Likely mechanism” is the mechanism attributed to the gene in the published literature.
| Symbol | Chr | Position | Ref | Alt | Consequence | Recur | Likely mechanism | CpG | Somatic Driver Gene | Germline Selection Gene | DD status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PACS1 | 11 | 65978677 | C | T | missense | 36 | activating | Yes | - | - | consensus |
| PPP2R5D | 6 | 42975003 | G | A | missense | 22 | dominant negative | - | - | - | consensus |
| SMAD4 | 18 | 48604676 | A | G | missense | 21 | activating | - | Yes | - | consensus |
| PACS2 | 14 | 105834449 | G | A | missense | 13 | dominant negative | Yes | - | - | discordant |
| MAP2K1 | 15 | 66729181 | A | G | missense | 11 | activating | - | Yes | Yes | consensus |
| PPP1CB | 2 | 28999810 | C | G | missense | 11 | all missense/in frame | - | - | - | consensus |
| NAA10 | X | 153197863 | G | A | missense | 11 | all missense/in frame | Yes | - | - | consensus |
| MECP2 | X | 153296777 | G | A | stop gain | 11 | loss of function | Yes | - | - | consensus |
| CSNK2A1 | 20 | 472926 | T | C | missense | 10 | activating | - | - | - | consensus |
| CDK13 | 7 | 40085606 | A | G | missense | 10 | all missense/in frame | - | - | - | consensus |
| SHOC2 | 10 | 112724120 | A | G | missense | 9 | activating | - | - | - | consensus |
| PTPN11 | 12 | 112915523 | A | G | missense | 9 | activating | - | Yes | Yes | consensus |
| SMAD4 | 18 | 48604664 | C | T | missense | 9 | activating | Yes | Yes | - | consensus |
| SRCAP | 16 | 30748664 | C | T | stop gain | 9 | dominant negative | Yes | - | - | consensus |
| FOXP1 | 3 | 71021817 | C | T | missense | 9 | loss of function | Yes | - | - | consensus |
| CTBP1 | 4 | 1206816 | G | A | missense | 9 | dominant negative | Yes | - | - | discordant |
Figure 3. Factors influencing power.
(a) PTV mutability is significantly lower (p = 4.6 × 10⁻⁶⁸, two-sided Wilcoxon rank-sum test) in genes that are not significantly DD-associated (blue) than in DD-associated genes (red). Median depicted with a black horizontal line. (b) Distribution of PTV enrichment in significant, likely haploinsufficient, genes by category (118 consensus, 23 discordant, 8 novel genes). Lower and upper hinges correspond to the first and third quartiles. Median depicted by a horizontal grey line. The upper and lower whiskers extend up to 1.5 times the interquartile range from the hinges. (c) Comparison of PTV enrichment in our cohort with the PTV to synonymous ratio in gnomAD, for genes that are significantly PTV-enriched in our cohort (without variant weighting; n = 156 genes). PTV enrichment bins are labelled with log10(enrichment). Dashed line indicates regression. 95% confidence intervals of the rate ratio are shown. (d) Overall PTV enrichment across genes grouped by likelihood of presenting with a structural malformation on prenatal ultrasound (145 low, 65 medium, 6 high genes). PTV enrichment is significantly higher for genes with a low likelihood than for other genes (p = 4.6 × 10⁻⁵, two-sided Poisson test). Poisson 95% confidence intervals shown.
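PTV enrichment in these panels is a ratio of observed to expected PTV counts. As a minimal sketch of that quantity, the Python below computes the ratio for a single gene together with an exact one-sided Poisson tail probability. This is a simpler one-sample calculation than the two-sided group comparison used in panel (d), and the counts and function names are hypothetical.

```python
import math


def ptv_enrichment(observed, expected):
    """Ratio of observed to expected PTV de novo mutations."""
    return observed / expected


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected).

    Sums the tail directly from k = observed upwards, which avoids the
    cancellation error of computing 1 - CDF for very small p-values.
    """
    if observed <= 0:
        return 1.0
    log_term = -expected + observed * math.log(expected) - math.lgamma(observed + 1)
    term = math.exp(log_term)
    total, k = 0.0, observed
    while True:
        total += term
        k += 1
        term *= expected / k
        if term <= total * 1e-17:  # remaining terms are negligible
            break
    return min(1.0, total)


# Hypothetical gene: 12 PTV de novo mutations observed where 0.3 were expected.
print(f"enrichment: {ptv_enrichment(12, 0.3):.1f}")
print(f"one-sided p-value: {poisson_upper_tail(12, 0.3):.2e}")
```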
Extended Data Figure 1. Exploring the remaining number of DD genes.
(a) Number of significant genes from downsampling the full cohort and running DeNovoWEST’s enrichment test. (b) Results from modelling the likelihood of the observed distribution of de novo PTV mutations. This model varies the number of remaining haploinsufficient (HI) DD genes and the PTV enrichment in those remaining genes. The 50% credible interval is shown in red and the 90% credible interval is shown in orange. Note that the median PTV enrichment in genes that are significant and known to operate via a loss-of-function mechanism (shown with an arrow) is 39.7.