Charles A Steward, Jolien Roovers, Marie-Marthe Suner, Jose M Gonzalez, Barbara Uszczynska-Ratajczak, Dmitri Pervouchine, Stephen Fitzgerald, Margarida Viola, Hannah Stamberger, Fadi F Hamdan, Berten Ceulemans, Patricia Leroy, Caroline Nava, Anne Lepine, Electra Tapanari, Don Keiller, Stephen Abbs, Alba Sanchis-Juan, Detelina Grozeva, Anthony S Rogers, Mark Diekhans, Roderic Guigó, Robert Petryszak, Berge A Minassian, Gianpiero Cavalleri, Dimitrios Vitsios, Slavé Petrovski, Jennifer Harrow, Paul Flicek, F Lucy Raymond, Nicholas J Lench, Peter De Jonghe, Jonathan M Mudge, Sarah Weckhuysen, Sanjay M Sisodiya, Adam Frankish.
Abstract
The developmental and epileptic encephalopathies (DEE) are a group of rare, severe neurodevelopmental disorders in which even the most thorough sequencing studies leave 60-65% of patients without a molecular diagnosis. Here, we explore the incompleteness of the transcript models used for exome and genome analysis as one potential explanation for this lack of diagnoses. To this end, we have updated the GENCODE gene annotation for 191 epilepsy-associated genes, using human brain-derived transcriptomic libraries and other data to build 3,550 putative transcript models. Our annotations increase the transcriptional 'footprint' of these genes by over 674 kb. Using SCN1A as a case study, because of its close phenotype/genotype correlation with Dravet syndrome, we screened 122 people with Dravet syndrome or a similar phenotype using a panel of exon sequences representing eight established genes, and identified two de novo SCN1A variants that, through improved gene annotation, can now be shown to reside within our newly annotated exons. These two molecular diagnoses (2 of 122 screened people, 1.6%) carry significant clinical implications. Furthermore, we identified an SCN1A variant associated with Dravet syndrome that was previously classified as intronic and now lies within a deeply conserved exon. Our findings illustrate the potential gains of thorough gene annotation in improving diagnostic yields for genetic disorders.
Keywords: Medical genomics; Molecular medicine
Year: 2019 PMID: 31814998 PMCID: PMC6889285 DOI: 10.1038/s41525-019-0106-7
Source DB: PubMed Journal: NPJ Genom Med ISSN: 2056-7944 Impact factor: 8.617
Fig. 1 Expression of transcripts. Cumulative distribution curves for the number of intron-supporting reads in pre-existing (GENCODE v20) versus updated (GENCODE v28) annotation. Distribution curves for overall transcripts, CDS, 5′ and 3′ UTR are given. The x-axis is in log10 scale.
Fig. 2 Variants in coding sequence of CDKL5. ENST00000379996 had previously been annotated in GENCODE and represents a known protein-coding transcript; coding exons 17–20 are shown (coding exons in black; UTR in grey). ENST00000623535 was annotated as part of this study and the transcript contains an alternative CDS based on the usage of a different C-terminus, linked to a 3′ UTR sequence that extends into an intron of ENST00000379996. This alternative 3′ UTR has strong support in polyAseq experiments and RNA-Seq assays across multiple tissues (not shown). The 170 bp of CDS added to the intronic region contains 4 ClinVar variants, listed here by their dbSNP ID alongside the consequences as presented by ClinVar. Bodian et al. recently reported rs1555955290 as a de novo frameshift variant in a child with early onset seizures. Variants rs1555955296 and rs863225289 are de novo nonsense variants submitted to ClinVar by private testing laboratories, both from people described as having early infantile epileptic encephalopathy 2. In contrast, nonsense variant rs1555955268 is currently classified as ‘likely benign’ by ClinVar; this is a privately submitted germline variant from an individual with an unspecified condition. Additional transcript models within the gene have been omitted for clarity.
Fig. 3 The updated SCN1A annotation identified 10 exons and five shifted splice junctions, increasing the genomic footprint of SCN1A transcription by ~3 kb. All features are described with respect to existing Ensembl model ENST00000303395 and numbered according to the scheme used in Table 1. For clarity, the features are shown as truncated models containing only the exons of specific interest (and certain features are present on multiple transcript models in the complete gene annotation). UTR sequences are shown in grey, coding or NMD regions in black. Features [1] and [2] represent previously unreported 5′ UTR sequences that have conservation and equivalent expression in mouse and chicken. Features [7] and [14] are cassette exons predicted to invoke NMD and contain the de novo variants identified in the study within patients one and two, respectively. Feature [9] is a cassette exon that is an ancient duplication of coding exon five, with which it is transcribed in a mutually exclusive manner; the clinical significance of this exon has been previously demonstrated by Tate et al. Feature [12] is a cassette exon predicted to invoke NMD. Intron and exon sizes are to approximate scale. Additional transcript models have been omitted for clarity.
Table 1 List of all features identified within SCN1A.
| Feature | Feature type | Feature length | Chr position (GRCh38) | Feature position | Transcript biotype | Transcript region | Feature conservation |
|---|---|---|---|---|---|---|---|
| 1 | Exon | 168 nt | chr2:166,149,047–166,149,214 | Terminal | Coding | 5′ UTR | Yes |
| 2 | Exon | 87 nt | chr2:166,126,924–166,127,010 | Internal | Coding | 5′ UTR | Partial - splice donor conserved |
| 3 | Exon | 73 nt | chr2:166,126,982–166,127,055 | Terminal | Coding | 5′ UTR | No |
| 4 | Exon | 111 nt | chr2:166,126,924–166,127,034 | Terminal | Retained intron | 5′ UTR | Yes |
| 5 | Splice acceptor | 46 nt extension | chr2:166,077,802–166,077,848 | Internal | Coding | 5′ UTR | No |
| 6 | Exon | 264 nt | chr2:166,071,623–166,071,886 | Internal | Processed transcript | n/a | No |
| 7 | Exon | 228 nt | chr2:166,060,640–166,060,867 | Internal | NMD | CDS | No |
| 8 | Splice acceptor | 4 nt extension | chr2:166,056,501–166,056,504 | Internal | NMD | CDS | Yes |
| 9 | Exon | 92 nt | chr2:166,053,039–166,053,130 | Internal | Coding | CDS | Partial - splice acceptor conserved |
| 10 | Splice acceptor | 3 nt truncation | chr2:166,045,325–166,045,327 | Internal | Coding | CDS | Yes |
| 11 | Splice donor | 16 nt extension | chr2:166,041,215–166,041,230 | Internal | NMD | CDS | Yes |
| 12 | Exon | 64 nt | chr2:166,007,230–166,007,293 | Internal | NMD | CDS | Yes |
| 13 | Intron | Skips 282 nt exon | chr2:166,002,471–166,002,752 | Internal | Coding | CDS | Yes |
| 14 | Exon | 66 nt | chr2:165,999,051–165,999,116 | Internal | NMD | CDS | No |
| 15 | Intron retention | Retains final intron of 1723 nt | chr2:165,992,413–165,994,147 | Terminal | Coding | CDS | Yes |
A single Ensembl transcript model is listed for all features; certain features are also present in other models. ‘Biotype’ details the functional effect of the feature as inferred by manual annotation. The alternative final exon within model ENST00000642141 was annotated as ‘non-coding’ due to the absence of polyadenylation data as per GENCODE guidelines; the functional status of this model is in reality unknown. ‘Feature conservation’ describes the annotation of a structurally identical feature in the mouse ortholog Scn1a.
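The re-classification of a variant from intronic to exonic amounts to an interval lookup against the updated feature coordinates. The following is a minimal Python sketch of that idea, not the pipeline used in the study. It parses the GRCh38 coordinate strings of the three NMD cassette exons from the table (features 7, 12 and 14), computes inclusive 1-based lengths, and tests whether a position falls inside any of them. The names `Feature`, `parse_feature` and `find_feature` are illustrative, and the query positions are hypothetical, not variants reported here.

```python
import re
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    """One annotated feature, with 1-based inclusive coordinates."""
    number: int
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Inclusive coordinates: both end points belong to the feature.
        return self.end - self.start + 1


# Accepts either an en dash or a hyphen between start and end.
_COORD = re.compile(r"^(chr\w+):([\d,]+)[–-]([\d,]+)$")


def parse_feature(number: int, coord: str) -> Feature:
    """Parse a string such as 'chr2:166,060,640–166,060,867'."""
    match = _COORD.match(coord.strip())
    if match is None:
        raise ValueError(f"Unrecognised coordinate string: {coord!r}")
    chrom, start, end = match.groups()
    return Feature(number, chrom,
                   int(start.replace(",", "")),
                   int(end.replace(",", "")))


def find_feature(chrom: str, pos: int,
                 features: List[Feature]) -> Optional[Feature]:
    """Return the first feature containing the position, or None."""
    for feature in features:
        if feature.chrom == chrom and feature.start <= pos <= feature.end:
            return feature
    return None


# Coordinates copied from the table (features 7, 12 and 14).
features = [
    parse_feature(7, "chr2:166,060,640–166,060,867"),
    parse_feature(12, "chr2:166,007,230–166,007,293"),
    parse_feature(14, "chr2:165,999,051–165,999,116"),
]

for f in features:
    print(f"Feature {f.number}: {f.length} nt")

# Hypothetical query positions, chosen only to exercise the lookup.
hit = find_feature("chr2", 166_060_700, features)
miss = find_feature("chr2", 166_000_000, features)
print(hit.number if hit else "not in a listed feature")
print(miss.number if miss else "not in a listed feature")
```

A linear scan is adequate for a handful of features; for genome-wide annotation an interval tree or a sorted-coordinate binary search would be the usual choice.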
Fig. 4 Variants in coding regions are associated with DS. a Pedigrees and Sanger sequencing traces of the two families with a de novo SCN1A variant in the identified poison exons. b The two transcripts containing the variants, relative to the full-length transcript. Red exons are coding, white exons are non-coding. c Variants are predicted to disrupt a hnRNP A1 recognition site.