Eugene J Gardner, Elena Prigmore, Giuseppe Gallone, Petr Danecek, Kaitlin E Samocha, Juliet Handsaker, Sebastian S Gerety, Holly Ironfield, Patrick J Short, Alejandro Sifrim, Tarjinder Singh, Kate E Chandler, Emma Clement, Katherine L Lachlan, Katrina Prescott, Elisabeth Rosser, David R FitzPatrick, Helen V Firth, Matthew E Hurles.
Abstract
Mobile genetic elements (MEs) are segments of DNA that can copy themselves and other transcribed sequences through the process of retrotransposition (RT). In humans, several disorders have been attributed to RT, but the role of RT in severe developmental disorders (DD) has not yet been explored. Here we identify RT-derived events in 9738 exome-sequenced trios with DD-affected probands. We ascertain 9 de novo MEs, 4 of which are likely causative of the patients' symptoms (0.04% of probands), as well as 2 de novo gene retroduplications. Beyond identifying likely diagnostic RT events, we estimate the genome-wide germline ME mutation rate and selective constraint, and demonstrate that coding RT events have signatures of purifying selection equivalent to those of truncating mutations. Overall, our analysis represents a comprehensive interrogation of the impact of retrotransposition on protein-coding genes and a framework for future evolutionary and disease studies.
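The 0.04% figure is a simple diagnostic yield: likely causative de novo MEIs divided by the number of probands. The sketch below reproduces that arithmetic from the counts quoted above; the function name `diagnostic_yield` is a hypothetical helper, not part of the study's pipeline.

```python
# Diagnostic yield of de novo MEIs, using the counts quoted in the abstract.
n_trios = 9738          # exome-sequenced trios with DD-affected probands
n_diagnostic_mei = 4    # de novo MEIs judged likely causative

def diagnostic_yield(n_diagnostic: int, n_probands: int) -> float:
    """Return the fraction of probands carrying a likely diagnostic event (hypothetical helper)."""
    if n_probands <= 0:
        raise ValueError("n_probands must be positive")
    return n_diagnostic / n_probands

yield_fraction = diagnostic_yield(n_diagnostic_mei, n_trios)
print(f"{yield_fraction:.4%}")  # prints 0.0411%
```

Rounded to two decimal places this is the 0.04% reported above.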
Year: 2019 PMID: 31604926 PMCID: PMC6789007 DOI: 10.1038/s41467-019-12520-y
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
RT variant discovery in the DDD study
| RT class | Total sites cohort-wide | Mean sites per unaffected parent | Total de novo sites |
|---|---|---|---|
| Alu | 917 | 23.6 ± 4.2 | 7 |
| LINE-1 | 167 | 2.8 ± 1.5 | 2 |
| SVA | 45 | 0.2 ± 0.5 | 0 |
| *All MEIs* | *1129* | | *9* |
| Processed pseudogenes (PPGs) | 576 | 6.6 ± 1.7 | 2 |
| *All RT events* | *1705* | | *11* |
Quantification of the four classes of RT events discovered in this study. Rows in italic indicate totals across the classes listed above.
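The total rows of the table follow from the per-class values by addition, assuming the classes are disjoint sets of sites and that every class mean is taken over the same unaffected parents (both assumptions). The sketch below also shows why a combined mean can be obtained by linearity of expectation while a combined ± spread cannot: the spread of a sum depends on covariances between classes, which the table does not report. The dictionary layout and the `totals` helper are hypothetical.

```python
# Per-class values copied from the table above.
rt_classes = {
    "Alu":    {"sites": 917, "mean_per_parent": 23.6, "de_novo": 7},
    "LINE-1": {"sites": 167, "mean_per_parent": 2.8,  "de_novo": 2},
    "SVA":    {"sites": 45,  "mean_per_parent": 0.2,  "de_novo": 0},
    "PPG":    {"sites": 576, "mean_per_parent": 6.6,  "de_novo": 2},
}
MEI_CLASSES = ("Alu", "LINE-1", "SVA")

def totals(classes):
    """Return (total sites, combined mean per parent, total de novo sites) for the given classes."""
    sites = sum(rt_classes[c]["sites"] for c in classes)
    de_novo = sum(rt_classes[c]["de_novo"] for c in classes)
    # Linearity of expectation: the mean of a sum of per-parent counts is the sum of the means.
    # The spread of the total is NOT derivable here, because it depends on between-class covariances.
    mean = round(sum(rt_classes[c]["mean_per_parent"] for c in classes), 1)
    return sites, mean, de_novo

print("All MEIs:", totals(MEI_CLASSES))      # (1129, 26.6, 9)
print("All RT events:", totals(rt_classes))  # (1705, 33.2, 11)
```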
Fig. 1 The DDD RT call set. a–e Histograms of the total number of variants per individual for the four classes of RT events identified in the DDD cohort (Alu: blue; L1: green; SVA: orange; PPGs: red; combined RT events: grey), in bins of size one. f Allele frequency distributions for the RT classes depicted in a–e, in log10 allele frequency bins. g Insert size estimates provided by MELT for the MEI classes ascertained in this study, in log10 insert size bins. All plots include only variants from unaffected parents.
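The two binning schemes in the caption can be sketched as follows. This assumes diploid genotypes (allele frequency = allele count / 2N) and takes a log10 bin to be floor(log10(AF)); the study's exact bin edges are not given here, so both choices are assumptions, and the helper names and input counts are hypothetical.

```python
import math
from collections import Counter

def per_individual_histogram(counts_per_individual):
    """Bins of size one: map each per-individual variant count to how many individuals have it."""
    return dict(sorted(Counter(counts_per_individual).items()))

def log10_af_bin(allele_count, n_individuals):
    """Return floor(log10(AF)), assuming diploid genotypes so that AF = AC / (2N)."""
    af = allele_count / (2 * n_individuals)
    if not 0 < af <= 1:
        raise ValueError("allele frequency must be in (0, 1]")
    return math.floor(math.log10(af))

# Hypothetical per-parent Alu counts and allele counts, for illustration only.
print(per_individual_histogram([22, 24, 24, 23, 25]))    # {22: 1, 23: 1, 24: 2, 25: 1}
print(log10_af_bin(1, 1000), log10_af_bin(150, 1000))    # -4 -2
```

Bin -4 covers AF in [0.0001, 0.001), so a singleton among 1000 diploid individuals (AF = 0.0005) lands there.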
Fig. 2 Coding constraint on MEIs. a Cumulative consequence annotations for Alu, L1, and SVA MEIs in all samples analyzed (n = 28,132 individuals). Most variants identified in this study fell within noncoding sequence (either an enhancer or an intron). b Comparison of constraint between MEIs and SNVs in unaffected parents. To compare the impact of exonic and intronic Alu (blue) and all MEIs (grey) with various classes of SNVs (black), we used two metrics: the proportion of variants in genes identified as LoF intolerant by pLI score[22] (x-axis) and the proportion of variants identified in only one individual (i.e., singletons; y-axis). Error bars indicate 95% confidence intervals for a population proportion; confidence intervals were also calculated for SNVs but are too small to be visible at the resolution of this figure.
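Both metrics in panel b are proportions, so their error bars can be sketched with a confidence interval for a binomial proportion. The example below uses the Wald (normal-approximation) interval, which is an assumption; the caption does not name the exact interval. It also illustrates why the SNV error bars are invisible: the half-width shrinks with the square root of the sample size. All counts are hypothetical.

```python
import math

def proportion_ci(successes, total, z=1.96):
    """Wald (normal-approximation) CI for a proportion; z = 1.96 gives ~95% coverage, clipped to [0, 1]."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical counts: a small MEI class versus a large SNV class with the same singleton proportion.
for label, singletons, n in [("MEI-like", 40, 100), ("SNV-like", 400_000, 1_000_000)]:
    p, lo, hi = proportion_ci(singletons, n)
    print(f"{label}: p = {p:.3f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

With the same proportion (0.4), the interval is roughly 0.19 wide for 100 variants but only about 0.002 wide for a million. For small counts or proportions near 0 or 1, a Wilson or exact interval would be a safer choice than Wald.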
Confirmed germ-line de novo variants in the DDD study
| Insertion coord. | RT type | Genomic compartment | ENSEMBL gene ID | HGNC gene ID | pLI | DDG2P annotation | Decipher ID | Diagnostic? | Parental origin | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| chr3:9495459 | Alu | Exonic | ENSG00000168137 | | 1.000 | Confirmed, monoallelic | 280818 | True | Father | |
| chr5:176638159 | Alu | Exonic | ENSG00000165671 | | 1.000 | Confirmed, monoallelic | 259118 | True | Unknown | Included in Wright et al. |
| chr6:159190834 | Alu | Exonic | ENSG00000092820 | | 0.988 | None | 300984 | False | Unknown | |
| chr7:77552086 | Alu | Exonic | ENSG00000006576 | | 0.024 | None | 271388 | False | Father | |
| chr3:135913800 | Alu | Intronic | ENSG00000174579 | | 0.890 | None | 292325 | False | Unknown | |
| chr3:148614204 | Alu | Intronic | ENSG00000163751 | | <0.001 | None | 270426; 270428 | False | Unknown | Monozygotic twins |
| chr3:172480619 | Alu | Intronic | ENSG00000114346 | | <0.001 | None | 307591 | False | Unknown | |
| chr12:46246325 | L1 | Exonic | ENSG00000189079 | | 1.000 | Probable, monoallelic | 264759 | True | Unknown | |
| chr5:88100580 | L1 | Exonic | ENSG00000081189 | | 0.004 | Confirmed, monoallelic | 285645 | True | Unknown | |
| chr6:10847968 | Retrogene-SLC35F2 | Intergenic | N/A | N/A | N/A | N/A | 291670 | False | Unknown | |
| chr1:25074202 | Retrogene-SERINC5 | Intronic | ENSG00000169504 | | 0.009 | None | 301168 | False | Father | |
Relevant clinical and annotation information for the de novo MEIs and PPGs identified in this study. The location of each insertion event (Insertion coord.) is given in human build GRCh37 reference coordinates. A value of True in the Diagnostic? column indicates that, at the time of publication, the variant intersected a known DD gene and was deemed by the referring clinician likely to be involved in the patient’s phenotype. A value of False does not rule out that, with additional evidence, the gene may become associated with DD and the variant thus be deemed diagnostically relevant. Where applicable, ENSEMBL[59] gene IDs indicate the gene impacted by the insertion; for PPGs, this is not the gene from which the event is derived.
Fig. 3 RT-derived de novos in the DDD. We identified a total of nine de novo MEIs, four of which disrupted the protein-coding sequence of a known DD gene: a SETD5, b NSD1, c MEF2C, and d ARID2. Each panel shows a diagram of the affected gene (blue model) with the relevant insertion indicated by a colored bubble. To the right are PCR validations confirming the de novo status of each mutation; a positive result is indicated by a raised secondary band present only in the proband sample (red arrow). e Circos diagram and PCR results for the two germ-line de novo PPGs identified. For each de novo PPG, the diagram shows the donor gene (gene model), the location of the duplication as a PPG (directional arrow), and the new insertion site. Exons from the donor gene included in the PPG are indicated by brackets beneath the donor gene model. To confirm PPG presence, PCR was performed (Methods) on proband, paternal, and maternal gDNA (the sample in each lane is indicated by the pedigree). The band that represents the PPG is marked with a red arrow and was confirmed by capillary sequencing (Supplementary Fig. 12). Dashed lines indicate intergenic regions, all gene models are shown in sense orientation, and PPG gene diagrams are not to scale.
Fig. 4 Estimating enrichment of deleterious MEIs. Shown are the expected (black) and observed (red) numbers of de novo mutations in exons (a) and enhancers (b) for all genes, high-pLI genes (pLI > 0.9), and known monoallelic DD genes (MA DDG2P). Expectations are based on the Poisson distribution of 100 simulations using the neutral mutation rate (μ = 1.2 × 10⁻¹¹). P-values, also based on the Poisson distribution, assess the deviation of observed from expected de novo counts in exons and enhancers.
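A Poisson comparison of observed and expected de novo counts can be sketched as a one-sided upper-tail probability, P(X ≥ observed) for X ~ Poisson(expected), which asks how surprising an excess of de novo MEIs would be under neutrality. Treating the test as one-sided is an assumption made here because the figure concerns enrichment; the helper name and the input numbers are hypothetical, not values from the study.

```python
import math

def poisson_upper_tail(observed, expected):
    """One-sided P(X >= observed) for X ~ Poisson(expected): chance of at least this many de novos."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed <= 0:
        return 1.0
    # Sum the Poisson probability mass for 0 .. observed-1, then take the complement.
    below = sum(math.exp(-expected) * expected**k / math.factorial(k) for k in range(observed))
    return max(0.0, 1.0 - below)

# Hypothetical numbers: 4 observed de novo exonic MEIs against an expectation of 0.5.
print(f"P = {poisson_upper_tail(4, 0.5):.4g}")
```

This gives P ≈ 0.0018. Computing 1 minus the CDF loses precision for very small tail probabilities; in that regime the survival function from a statistics library (for example `scipy.stats.poisson.sf(observed - 1, expected)`) is the more robust option.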