A spotter's guide to SNPtic exons: The common splice variants underlying some SNP-phenotype correlations.

Niall Patrick Keegan^1,2,3, Sue Fletcher^1,2,4.

Abstract

BACKGROUND: Cryptic exons are typically characterised as deleterious splicing aberrations caused by deep intronic mutations. However, low-level splicing of cryptic exons is sometimes observed in the absence of any pathogenic mutation. Five recent reports have described how low-level splicing of cryptic exons can be modulated by common single-nucleotide polymorphisms (SNPs), resulting in phenotypic differences amongst different genotypes.
METHODS: We sought to investigate whether additional 'SNPtic' exons may exist, and whether these could provide an explanatory mechanism for some of the genotype-phenotype correlations revealed by genome-wide association studies. We thoroughly searched the literature for reported cryptic exons, cross-referenced their genomic coordinates against the dbSNP database of common SNPs, then screened out SNPs with no reported phenotype associations.
RESULTS: This method discovered five probable SNPtic exons in the genes APC, FGB, GHRL, MYBPC3 and OTC. For four of these five exons, we observed that the phenotype associated with the SNP was compatible with the predicted splicing effect of the nucleotide change, whilst the fifth (in GHRL) likely had a more complex splice-switching effect.
CONCLUSION: Application of our search methods could augment the knowledge value of future cryptic exon reports and aid in generating better hypotheses for genome-wide association studies.

Keywords: RNA splicing; cryptic exon; genome-wide association study; single-nucleotide polymorphism

Substances：
Nucleotides

Year: 2021 PMID： 34708937 PMCID： PMC8801146 DOI： 10.1002/mgg3.1840

Source DB: PubMed Journal: Mol Genet Genomic Med ISSN： 2324-9269 Impact factor: 2.183

INTRODUCTION

Since the first cryptic exon (CE), or pseudoexon, was discovered in humans in 1983 (Dobkin et al., 1983), there have been hundreds more reported examples of this splicing phenomenon. Most CEs are detected as the result of pathogenic deep intronic mutations that directly enhance the exon‐like characteristics of intron tracts not otherwise retained in mature transcripts. Because the sequences of most CEs have not evolved to preserve the open reading frame, CE inclusion typically introduces premature stop codons or frameshifts to the affected mRNA, resulting in non‐functional transcripts and/or nonsense‐mediated decay (NMD). The most common cause of CE pathogenesis is a single‐nucleotide variant (SNV) in the CE or its flanking splice site motifs, usually at one of the four bases of the CE terminal dinucleotides (Romano et al., 2013; Vaz‐Drago et al., 2017; Vorechovsky, 2010). Mutations that alter the binding motifs of other local splicing factors are also observed, but less frequently (Canson et al., 2020; Keegan, 2020; Tubeuf et al., 2020).

Some reports of CE pathogenesis have noted low‐level CE splicing in cells that do not carry a pathogenic mutation (Braun et al., 2013; Druhan et al., 2020; Will et al., 1993). In these cases, it appears that the pathogenic mutations are not ‘creating’ or ‘activating’ a CE, but rather, dramatically enhancing the inclusion of a CE that already exists. This raises the question of why these low‐frequency CEs have not been eliminated from the genome by selective pressure. Do they persist as subtle but useful regulators of gene expression, or are they merely tolerated as an unavoidable side effect of organismal complexity? Recent research indicates that at least some low‐spliced CEs are indeed functional and may be better described as ‘poison exons’, a spliceosomal tactic for committing unneeded transcripts to nonsense‐mediated decay and thus avoiding excess translation of the encoded protein (Anko et al., 2012). However, at the time of writing, only a few poison exons have been formally characterised in a limited range of genes (Carvill & Mefford, 2020; Thomas et al., 2020).

Regardless of whether a CE serves a functional role, it can be speculated that any change in its splicing characteristics will produce a phenotypic change in corresponding directionality and severity. At one end of this spectrum are those pathogenic mutations that greatly increase CE inclusion and produce an easily observable disease phenotype; whilst at the other end are so‐called ‘near‐neutral’ variants, so slight in their effect that they would defy characterisation in a single individual. It is only when these subtle variants occur frequently in a population that statistical analysis can measure the differences amongst the carriers of each variant, and thus separate the signal from the noise (Figure 1).

FIGURE 1

A general model of SNPtic exon splicing. A cryptic exon, or CE (dashed‐line box) is included in mature transcripts at frequencies that vary depending on the genotype of the carrier. Because the CE encodes a premature stop codon more than 55 nt from the final splice junction, mature transcripts that include the CE are targeted for nonsense‐mediated decay (NMD, grey circle) and are not translated. If a patient carries an SNV (C>G) that greatly increases CE inclusion, NMD predominates and little protein is translated, resulting in a rare but distinct disease phenotype. Conversely, through similar mechanisms, a common SNP (A>T) with a weak effect on CE splicing leads to a common but indistinct phenotype, which may only be measurable with a sufficiently powered genome‐wide association study.

Genome‐wide association studies (GWASes) have used this approach to identify thousands of correlations between common genetic variants and particular phenotypes or disease risk profiles; and most germline variants examined by these studies are single‐nucleotide polymorphisms (SNPs). A strict definition of the term ‘SNP’ refers only to germline one‐nucleotide substitutions, but conventional usage of the term, which we have adopted in this report, also encompasses small deletions and insertions, and typically only refers to variants observed in at least 1% of the haploid sample population. However, despite the great power of GWASes to discover SNP–phenotype correlations, deriving the aetiologies underlying these correlations has proved a much more challenging and laborious task (Cano‐Gamez & Trynka, 2020). Evidence indicates that the mechanism driving at least some SNP–phenotype associations is SNP‐driven modulation of cryptic splicing (Stein et al., 2015). However, the effect of SNPs on the splicing of cryptic exons specifically is underexamined in the literature.
This led us to investigate whether there may be published reports describing the components of CE–SNP pairs but not conceptually connecting them as components of a single phenomenon. The online resource dbSNP (Sherry et al., 2001), accessible both directly and via the UCSC Genome Browser (Kent et al., 2002), collates the locations and frequencies of millions of SNPs across the human genome, whilst GWAS Central (Beck et al., 2020) serves as an international repository for GWAS data. Both are freely accessible and easily searchable. Unfortunately, however, to our knowledge an equally comprehensive database of cryptic exons does not exist. We believe that this is largely due to the sporadic nature of cryptic exon discovery over the last four decades resulting in a lack of consistency in how they are reported. In this report, we outline our approach to discovering examples of cryptic exons likely to be subject to SNP‐associated differential splicing. In the interest of clarity, we will henceforth refer to this SNP‐associated differential splicing, and the cryptic exons it purportedly affects, as ‘SNPtic splicing’ and ‘SNPtic exons’, respectively; this novel term ‘SNPtic’ (pronounced SNIP‐tick) being a portmanteau of ‘SNP’ and ‘cryptic’. We believe that this technique will be of great use to researchers reporting new CEs in the future, who may find it substantially adds to the information content of their publications.

MATERIALS AND METHODS

Our strategy for search and analysis is outlined below. Henceforth, all references to the ‘effect’ of a SNP refer to the effect of the minor (least common) allele relative to that of the major allele.

1. Cryptic exon discovery. Using Google Scholar, we performed a thorough literature search for reported examples of cryptic exons, using the search terms ‘pseudoexon’, ‘cryptic exon’ and ‘deep intronic mutation’. For each resulting report, we used the details provided therein to derive the full genomic sequence, coordinates (GRCh38/hg38) and strand identity (+ or ‐) of each described cryptic exon, plus 20 nucleotides of flanking sequence at each end.

2. Cross‐search for common SNPs. The final dataset of cryptic exon coordinates was compiled into a BED file, uploaded to the UCSC Genome Browser data integrator as a custom track and cross‐searched against the track ‘dbSNP 153’, sub‐track ‘Common dbSNP 153’. Results from the cross‐search were exported into Microsoft Excel for further analysis.

Alternative to steps 1 and 2. Instead of following the methods described above, researchers investigating small numbers of cryptic exons may find it is easier to simply enable the ‘Common dbSNP 153’ track on the UCSC Genome Browser, and then perform serial BLAT searches of each sequence of interest whilst manually annotating the rsIDs of any coinciding SNPs.

3. Filtering. Because flanking AG‐GY terminal dinucleotides appear to be almost essential for U2‐type splicing, which predominates in the human transcriptome (Parada et al., 2014), we manually excluded any cryptic exon (and associated SNPs) that did not bear these dinucleotides in at least one SNP allele.

4. Search for GWAS phenotypes. We searched the rsID of each remaining SNP both in GWAS Central and in Google. For each GWAS Central search, we considered as ‘hits’ only those studies that reported a p‐value with baseline significance (p ≤ .05) and had a defined effect size for the searched SNP. This latter requirement was to ensure that the correct allele of the SNP was assigned to the correct phenotype. For Google results, we considered as ‘hits’ only those results that originated from peer‐reviewed literature in which the SNP was described as being of probable significance to a particular phenotype.

5. Prediction of SNP effect. The method of analysis for each SNP depended on its position relative to the CE splice sites. SNPs at or between positions −20 to +3 of the CE acceptor site, or at or between positions −3 to +6 of the donor site, were analysed for their effect on the Maximum Entropy (MaxEnt) score of the corresponding motif using the MaxEntScan web utility (Yeo & Burge, 2004). MaxEnt was chosen based on its well‐established efficacy: at the time of writing, the report by Yeo and Burge (2004) has been cited over 1600 times. However, there are numerous other splice motif scoring methods that perform comparably well (see Jian et al., 2014 for review). Most SNPs were classed as either ‘More inclusion’ if they increased a MaxEnt score or as ‘Less inclusion’ if they decreased a MaxEnt score. In cases where a SNP was predicted to alter the splicing ratio between two isoforms of a CE, it was classed as ‘Splice‐switching’. For other SNPs inside the CE, their effects were predicted using HExoSplice (Tubeuf et al., 2020), with a positive score indicating higher inclusion and a negative score indicating lower inclusion. For all other SNPs outside the CE, cryptic exon sequences corresponding to both SNP alleles were comparatively analysed via the SpliceAid 2 web utility (Piva et al., 2012). SpliceAid 2 automatically designates detected motifs as ‘enhancers’ or ‘silencers’ of exon inclusion, but in some cases the true effect of an RNA‐bound splice factor is dependent on its orientation to the putative exon (Fu & Ares, 2014). We therefore investigated the predicted effects of any altered splice factor motifs on a case‐by‐case basis to determine whether they were more likely to increase or decrease inclusion.

6. Categorisation. Each resulting cryptic exon/SNP pair was categorised as:
(a) A known SNPtic exon, if the association between the cryptic exon and the SNP had been explicitly characterised in a prior report;
(b) A probable SNPtic exon, if a prior report had linked the SNP with a particular phenotype, but had not investigated differential splicing of the cryptic exon as a cause of that phenotype; and
(c) A potential SNPtic exon, if the SNP was not significantly associated with a phenotype and had not been shown to directly affect CE splicing, but was still deemed a worthwhile candidate for further investigation due to its predicted effect on splicing of a known CE. To limit this category to the most likely examples, we included only those SNPs that altered the most highly conserved nucleotides of a cryptic exon splice motif, that is, −3 to +3 of the acceptor site or −3 to +6 of the donor site.

7. Final assessment. The expected phenotypic effects of each putative SNPtic exon were analysed and discussed, according to both prior research on the affected gene and the fundamental principles of U2‐type splicing. Additionally, the predicted changes to each gene's encoded protein sequence were calculated for each putative SNPtic exon using the ExPASy Translate Tool (Duvaud et al. 2021) and are provided as a Supplementary File in the online version of this report.

In devising this method, we were unable to account for the splicing impact of SNP‐associated changes on RNA folding as, to our knowledge, there is currently no generalised method for making these types of predictions. Since it has been shown that even single nucleotide changes can affect gene expression by altering RNA secondary structures (Ritz et al., 2012; Sabarinathan et al., 2013), these types of splicing effects may well exist; although another recent report indicated that the impact of SNPs on conserved RNA structures was minimal (Kalmykova et al., 2021).

GenBank IDs of studied genes: APC, NG_008481.4; ARSB, NG_007089.1; ATM, NG_009830.1; CSF1R, NG_012303.2; DMD, NG_012232.1; F8, NG_011403.2; FGB, NG_008833.1; GHRL, NG_011560.1; IL16, NG_029933.1; LHCGR, NG_008193.2; MYBPC3, NG_007667.1; NF1, NG_009018.1; OAS1, NG_011530.2; OTC, NG_008471.1; POC1B, NG_041783.1; TSFM, NG_016971.1.
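As an illustration only, the coordinate handling implied by the search, filtering and prediction steps can be sketched in Python. This is not the authors' pipeline, which used the UCSC Genome Browser, Excel and web utilities; the class and function names here are hypothetical. The sketch assumes exon coordinates are 1-based and inclusive, and it takes the OAS1 coordinates from Table 2 (hg19) purely as a worked example, whereas the search itself used GRCh38. MaxEnt scores are treated as externally supplied numbers, for instance copied from MaxEntScan.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

FLANK_NT = 20  # flanking sequence added at each end of the cryptic exon


@dataclass(frozen=True)
class CrypticExon:
    name: str
    chrom: str
    start: int   # 1-based genomic coordinate of the lowest exon base (inclusive)
    end: int     # 1-based genomic coordinate of the highest exon base (inclusive)
    strand: str  # "+" or "-"


def bed_line(ce: CrypticExon) -> str:
    """Return a 6-column BED line (0-based, half-open) padded by FLANK_NT."""
    bed_start = max(0, ce.start - 1 - FLANK_NT)
    bed_end = ce.end + FLANK_NT
    return "\t".join([ce.chrom, str(bed_start), str(bed_end), ce.name, "0", ce.strand])


def has_ag_gy(acceptor_dinucleotide: str, donor_dinucleotide: str) -> bool:
    """True if the flanking intron dinucleotides are AG (acceptor) and GT or GC (donor)."""
    return acceptor_dinucleotide.upper() == "AG" and donor_dinucleotide.upper() in {"GT", "GC"}


def splice_window(ce: CrypticExon, snp_pos: int) -> Tuple[str, Optional[int]]:
    """Assign a 1-based genomic SNP position to one of the analysis windows."""
    if ce.strand == "+":
        from_first = snp_pos - ce.start  # 0 at the first exon base (transcript order)
        from_last = snp_pos - ce.end     # 0 at the last exon base (transcript order)
    else:
        from_first = ce.end - snp_pos
        from_last = ce.start - snp_pos
    # Splice-site numbering has no zero: the acceptor runs ..., -1 | +1, ...
    # with +1 the first exon base; the donor runs ..., -1 | +1, ... with +1
    # the first intron base.
    acceptor = from_first + 1 if from_first >= 0 else from_first
    donor = from_last if from_last > 0 else from_last - 1
    if -20 <= acceptor <= 3:
        return ("acceptor motif", acceptor)
    if -3 <= donor <= 6:
        return ("donor motif", donor)
    if from_first >= 0 and from_last <= 0:
        return ("exon body", None)
    return ("flanking intron", None)


def inclusion_class(major_score: float, minor_score: float) -> str:
    """Class a SNP by the change in a splice-motif score (minor vs major allele)."""
    if minor_score > major_score:
        return "More inclusion"
    if minor_score < major_score:
        return "Less inclusion"
    return "No predicted change"  # hypothetical label, not used in the report


# Worked example using the hg19 coordinates listed for OAS1 in Table 2.
oas1 = CrypticExon("OAS1-2a", "chr12", 113_348_549, 113_348_652, "+")
print(bed_line(oas1))
print(splice_window(oas1, 113_348_654))
```

Under these assumptions the OAS1 SNP falls at donor position +2, which agrees with the description of rs116086311 as changing the donor dinucleotide from GC to GT.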

RESULTS AND DISCUSSION

In addition to six known SNPtic exons, our analysis also discovered five probable SNPtic exons and five potential SNPtic exons (Tables S1 and 1), each arising within a different gene. With one exception (OAS1‐2a, described below), the predicted reading frame effect of each CE inclusion was to introduce at least one premature stop codon more than 55 nt upstream of the transcript’s final exon junction, either within the putative SNPtic exon itself or within its flanking 3′ canonical exon, and would therefore be expected to induce NMD of the mature transcript (Zhang, Sun, et al., 1998; Zhang, Center, et al., 1998). There were no examples of a SNP of interest adding or altering a start or stop codon within a CE.
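The positional rule applied above, that a premature stop codon more than 55 nt upstream of the final exon junction is expected to trigger NMD, can be expressed as a short check. The following Python sketch is illustrative only and is not the procedure used in this report, where reading frames were assessed with the ExPASy Translate Tool. The function names, the toy sequence and the convention of measuring from the end of the stop codon to the first nucleotide of the final exon are all assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
NMD_MIN_DISTANCE_NT = 55  # threshold quoted in the text


def first_inframe_stop(mrna, cds_start):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    seq = mrna.upper().replace("U", "T")
    for i in range(cds_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def predicts_nmd(mrna, cds_start, final_junction):
    """Apply the 55-nt rule to a spliced mRNA.

    final_junction is the 0-based index of the first nucleotide of the final
    exon in the spliced mRNA (an assumed convention for this sketch).
    """
    stop = first_inframe_stop(mrna, cds_start)
    if stop is None:
        return False
    distance = final_junction - (stop + 3)  # nt between stop codon and junction
    return distance > NMD_MIN_DISTANCE_NT


# Toy transcript: start codon, four sense codons, a premature stop, then filler.
toy = "ATG" + "GCT" * 4 + "TAA" + "GCT" * 40
print(predicts_nmd(toy, cds_start=0, final_junction=80))
print(predicts_nmd(toy, cds_start=0, final_junction=60))
```

In the toy transcript the stop codon ends at index 18, so a final junction at index 80 lies 62 nt downstream and satisfies the rule, whereas a junction at index 60 lies only 42 nt downstream and does not.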

TABLE 1

Sixteen putative SNPtic exons and their associated phenotypes

SNPtic exon	SNPs	Expected effect	SNP phenotype	Exon high‐inclusion phenotype
ATM‐27a (Known)	rs609261 (NC_000011.10: g.108287407T>C)	Less inclusion	Lower cancer risk	Ataxia telangiectasia (poor coordination, prominent eye blood vessels and high cancer risk)
F8‐13a (Known)	rs781928603 (NC_000023.11: g.154947237_154947249del)	More inclusion	Mild haemophilia type A	Mild haemophilia type A
IL16‐6a (Known)	rs4778639 (NC_000015.10: g.81308110T>C)	More inclusion (no NMD)	Higher interleukin‐16 levels in blood	Unknown
LHCGR‐6a (Known)	rs68073206 (NC_000002.12: g.48721568A>C)	Splice‐switch (S > L)	Higher testosterone levels and higher androgen sensitivity index	Male pseudohermaphroditism
OAS1‐2a (Known) (rs116086311)	rs116086311 (NC_000012.12: g.112910849C>T), rs34137742 (NC_000012.12: g.112910856C>T)	More inclusion	Higher risk of encephalitis and paralysis if infected with West Nile virus (rs34137742)	Unknown. Other OAS1 mutations associated with higher risk of West Nile virus infection
TSFM‐2a (Known)	rs2014886 (NC_000012.12: g.57783654C>G)	More inclusion	PREDICTED: Higher risk of multiple sclerosis	Unknown. Other TSFM mutations associated with cardiomyopathy, encephalomyopathy and ataxia
APC‐11a (Probable)	rs2545162 (NC_000005.10: g.112822734G>A)	More inclusion	Higher colorectal cancer risk	Adenomatous polyposis (colon cancer)
FGB‐1a (Probable)	rs2227401 (NC_000004.12: g.154565229C>T)	Less inclusion	Higher blood fibrinogen levels	Afibrinogenemia (Persistent cerebral transient ischemic attacks, blood clots and 1/50th normal fibrinogen levels)
GHRL‐4a (Probable)	rs2075356 (NC_000003.12: g.10287125T>C)	Splice‐switch (L > S)	Decreases cancer risk and increases bulimia risk	Unknown; other GHRL mutations associated with metabolic dysregulation
MYBPC3‐12a (Probable)	rs10769255 (NC_000011.10: g.47345820C>A)	Less inclusion	Slightly higher cognitive performance	Hypertrophic cardiomyopathy
OTC‐9a (Probable)	rs5963419 (NC_000023.11: g.38412940T>A)	Less inclusion	Increased risk of bipolar disorder	Hyperammonemia leading to brain damage and death
ARSB‐6a (Possible)	rs337836 (NC_000005.10: g.78884913T>C)	More inclusion	PREDICTED: Shorter stature and higher risk profile for other symptoms.	Mucopolysaccharidosis Type VI (Skeletal abnormalities, hearing and vision loss and heart disease)
CSF1R‐15a (Possible)	rs11952821 (NC_000005.10: g.150060771G>A)	More inclusion	PREDICTED: Shorter stature and increased susceptibility to cognitive decline.	Early onset HDLS, skeletal dysplasia (dwarfism) and brain malformation
DMD‐2a (Possible)	rs145743673 (NC_000023.11: g.32863915T>C)	Splice‐switch (S > L)	PREDICTED: Asymptomatically lower dystrophin levels. May compound an existing BMD phenotype.	Duchenne muscular dystrophy, primarily due to DMD e8‐11 duplication
NF1‐36a (Possible)	rs35888506 (NC_000017.11: g.31324211C>T)	More inclusion	PREDICTED: Higher cancer risk	Unknown; other NF1 mutations cause neurofibromatosis type 1
POC1B‐9a (Possible)	rs11323565 (NC_000012.12: g.89461145del)	More inclusion	PREDICTED: Lower visual acuity	Reduced visual acuity and contrast, photophobia

Citations are shown in main text. GenBank IDs of studied genes: APC, NG_008481.4; ARSB, NG_007089.1; ATM, NG_009830.1; CSF1R, NG_012303.2; DMD, NG_012232.1; F8, NG_011403.2; FGB, NG_008833.1; GHRL, NG_011560.1; IL16, NG_029933.1; LHCGR, NG_008193.2; MYBPC3, NG_007667.1; NF1, NG_009018.1; OAS1, NG_011530.2; OTC, NG_008471.1; POC1B, NG_041783.1; TSFM, NG_016971.1.

Therefore, except where otherwise stated, we have assumed the following general precepts: (a) Splicing of a SNPtic exon into a transcript prevents translation of the transcript and triggers its decay via NMD, (b) leading to chronically lower levels of the full‐length mature transcript, (c) leading to chronically lower levels of the full‐length protein and (d) leading to the observed phenotypic differences amongst different genotypes of the relevant SNP. 
We have applied these assumptions accordingly in discussing each putative SNPtic exon in the sections that follow. Below we have identified each SNPtic exon according to the name of the gene and the intron in which it occurs, followed by the letter ‘a’ to distinguish it from the preceding canonical exon. Where two splice variants exist for a single cryptic exon, we have identified each variant as ‘S’ or ‘L’ depending on whether it is the shorter or longer variant, respectively.
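For readers who wish to handle these labels programmatically, the naming convention can be captured with a small parser. The Python sketch below is a hypothetical helper, not part of the published method. It assumes that gene symbols consist of capital letters and digits, and it accepts either an ASCII hyphen or the typographic hyphens that appear in typeset text.

```python
import re
from typing import NamedTuple, Optional


class SnpticExonName(NamedTuple):
    gene: str               # gene symbol, e.g. "LHCGR"
    intron: int             # intron in which the cryptic exon occurs
    variant: Optional[str]  # "S" (shorter), "L" (longer) or None


# Gene symbol, a hyphen, the intron number, the letter 'a', optional S/L suffix.
_NAME = re.compile(
    r"^(?P<gene>[A-Z0-9]+)[-\u2010\u2011](?P<intron>\d+)a(?P<variant>[SL])?$"
)


def parse_name(label: str) -> SnpticExonName:
    """Split a label such as 'LHCGR-6aL' into gene, intron and variant."""
    match = _NAME.match(label.strip())
    if match is None:
        raise ValueError(f"not a recognised SNPtic exon label: {label!r}")
    return SnpticExonName(match["gene"], int(match["intron"]), match["variant"])


print(parse_name("LHCGR-6aL"))
print(parse_name("MYBPC3-12a"))
```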

Known SNPtic exons

ATM‐27a

This CE in Ataxia‐Telangiectasia Mutated (ATM–OMIM #607585) was first discovered by Coutinho et al. (2005), who also described a longer variant that shared the same acceptor site. The short variant was subsequently characterised as a SNPtic exon (sans use of this term) by Kralovicova et al. (2016). Remarkably, even though this SNP only slightly weakened the CE’s acceptor site, Kralovicova and colleagues demonstrated that this was sufficient to cause a measurable decrease in the rate of its inclusion. This, in turn, led to a corresponding increase in translation of ATM protein; and since ATM is a tumour‐suppressor gene (Choi et al., 2016), it is likely that this elevated ATM level explains the lower cancer risk seen in carriers of the SNP.

F8‐13a

Unlike the other SNPs discussed in this report, which are germline substitutions of single nucleotides, the SNP in this case (rs781928603) is a variably sized poly‐T deletion with multiple reported alternative alleles. Although the summed frequencies of these alternative alleles exceed 1%, Jourdy et al. (2018) report only on the phenotype of the del13T variant, the global frequency of which is not precisely defined but estimated at well below 1%. This del13T allele is associated with a mild haemophilia type A phenotype in males, as it induces inclusion of a CE in transcripts of Coagulation Factor VIII (F8–OMIM #300841), an important blood clotting protein. However, despite being associated with increased F8‐13a inclusion, the del13T allele slightly decreases the MaxEnt score of the CE acceptor site. Jourdy and colleagues showed that the likely reason for the splicing enhancement is a decrease in 5′ silencer binding, although we suggest that shortening of the branch point AG‐exclusion zone may also be a contributing factor (Wimmer et al., 2020). Interestingly, inclusion of identical CE sequence, and a mild haemophilia type A phenotype, has also been reported to result from an enhancing mutation in the CE donor site (Dericquebourg et al., 2020), demonstrating that the major allele isoform of the F8‐13a acceptor site is functional.

IL16‐6a

This CE in Interleukin 16 (IL16–OMIM #603035) is unique amongst the putative SNPtic exons discussed in this report, as it is the only one not to introduce a premature stop codon into the mature transcript, and therefore is not expected to promote transcript degradation via NMD. The CE was discovered in the peripheral blood RNA of 23 individuals by Sakaguchi and Suyama (2021), via bioinformatic analysis of RNA‐Seq and whole‐genome sequence data, and was not linked with a disease phenotype. The SNP rs4778639 converts the IL16‐6a acceptor site dinucleotide from an AT to an AG and is therefore likely to be essential for splicing of the CE. This SNP was found by Sun et al. (2018) to significantly correlate with increased IL16 protein levels in blood. The CE arises in the terminal intron of IL16 and is predicted to introduce nine additional amino acids to the IL16 peptide (see Appendix S1). This insertion interrupts the PDZ3 domain of the precursor protein (Sakaguchi & Suyama, 2021) and constitutes a substantial increase in the size of the mature protein, which is typically only 121 amino acids long after caspase‐3 catalysis (Zhang, Sun, et al., 1998; Zhang, Center, et al., 1998). This would presumably have a marked effect on the 3D structure, export, multimeric assembly and CD4+ recruitment activity of mature IL16 (Richmond et al., 2014), yet the haploid frequency of the causative SNP (8.37%) indicates that it is not significantly deleterious, at least for heterozygous carriers. We would welcome any future research that elucidates the true in vivo behaviour of this novel potential protein isoform.

LHCGR‐6a(S/L)

Like ATM‐27a, the SNPtic exon in Luteinising Hormone/Choriogonadotropin Receptor (LHCGR–OMIM #152790) was discovered (Kossack et al., 2008) several years before the effect of its SNP was directly characterised (Liu et al., 2017). This cryptic exon bears two variants that have distinct donor sites but share an acceptor site. In their 2008 report, Kossack and colleagues detailed an SNV in LHCGR‐6a that significantly increased its frequency of inclusion, resulting in a male‐pseudohermaphroditism phenotype in the affected patients. The authors also showed significant inclusion of LHCGR‐6a from the reference allele and claimed that this demonstrated its status as a bona fide exon, a claim that appears to be supported by the high degree of conservation of LHCGR‐6a and its flanking regions (Figure 2b). However, at the time of writing, LHCGR‐6a has not yet been listed as a canonical exon of any official transcript variants on NCBI, and we have therefore continued to refer to it as a cryptic exon here.

FIGURE 2

Cryptic exons APC‐11a, LHCGR‐6a and POC1B‐9a exhibit high sequence conservation. Images were captured as screenshots from the UCSC Genome Browser (Kent et al., 2002). In descending order, displayed tracks are: Base position, dbSNP 153, input sequence, ‘GENCODE V37’ (aligned transcript variants) and ‘Cons 30 Primates’. The ‘Cons 30 Primates’ track, which is erroneously labelled as ‘Cons 30 Mammals’ in the browser, displays sequence conservation data from 30 non‐human primate species.

Liu et al. (2017) investigated the effects of the SNP rs68073206, located in the donor site of LHCGR‐6aL. Because this SNP substantially enhances this donor site, it might be expected that this would increase the NMD of inclusive transcripts and therefore be associated with a phenotype of lower male sexual development. Surprisingly, the authors discovered just the opposite: SNP carrier status was associated with higher levels of testosterone and higher androgen sensitivity, and inter‐genotype differences in transcript frequencies did not follow a simple ‘zero sum’ model. Part of the reason for these counterintuitive effects may be competition between the donor sites of the long and the short isoforms, as it is unclear how much of the SNP‐driven increase in LHCGR‐6aL splicing comes at the expense of LHCGR‐6aS splicing and how much at the expense of normal LHCGR splicing. The likely status of LHCGR‐6a as a highly conserved bona fide exon suggests that its splicing may play a more complex role in LHCGR autoregulation.

OAS1‐2a

This CE in 2′‐5′‐Oligoadenylate Synthetase 1 (OAS1–OMIM #164350) was identified in whole blood RNA sequence from eight healthy donors by Sakaguchi and Suyama (2021). The OAS1 gene plays an important role in the innate immune response to viruses, and a canonical splice site polymorphism near OAS1 exon 6 has been shown to increase the risk of West Nile virus infection (Lim et al., 2009). Sakaguchi and Suyama identified the rs116086311 SNP as causative of OAS1‐2a inclusion. The aetiology of this SNP is obvious, as it converts OAS1‐2a’s GC donor site dinucleotide to a much stronger GT. But whilst no phenotype associations have been discovered for rs116086311, a second SNP 3′ of the donor site, rs34137742, was found to be associated with a higher risk of encephalitis and paralysis following West Nile virus infection (Bigham et al., 2011). At first glance this seems counterintuitive: since the most powerful single‐nucleotide splice mutations tend to be those that alter an intron terminal dinucleotide, one might expect that the strongest association would be detected for rs116086311, with rs34137742 perhaps being identified as a weaker contributing factor. However, this phenotype association can be interpreted consistently with the general model of SNPtic exon splicing (Figure 1) once population genetics are considered. Firstly, the direct effect of rs34137742 is to remove a binding motif for SRSF9, a ubiquitously expressed serine‐rich splicing factor that silences upstream donor sites and enhances downstream donor sites (Cloutier et al., 2008). Loss of this motif would therefore be more permissive of OAS1‐2a splicing. Secondly, since the OAS1‐2a donor site dinucleotide is splice‐competent in both rs116086311 alleles (i.e. GC or GT), it is theoretically possible to observe a quantitative effect from rs34137742 in a population independently of their rs116086311 genotypes. Lastly, rs34137742 has a haploid frequency of over 10.3%, compared to less than 3.5% for rs116086311. 
This means that rs34137742 is likely to be much better represented in the sample group of any GWAS, making its phenotypic effects more easily discoverable at the population level even if they are milder than those of rs116086311 at an individual level. Although a disease risk phenotype has been established only for rs34137742, and an OAS1‐2a splicing effect only for rs116086311, we suggest that the reverse may also be true, and that these effects are a logical consequence of CE‐induced NMD of OAS1 transcripts.
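The frequency argument can be made concrete with a standard quantitative‐genetics approximation, which is not part of the original analysis. For an additive variant at Hardy‐Weinberg equilibrium, the trait variance it explains is proportional to 2p(1 − p)β², where p is the allele frequency and β the per‐allele effect. The Python sketch below uses the approximate haploid frequencies quoted above (10.3% and 3.5%) as illustrative inputs; the function names are hypothetical, and real GWAS power also depends on sample size, phenotype definition and linkage between the two SNPs.

```python
import math


def additive_variance(freq, beta):
    """Trait variance explained by an additive biallelic variant: 2p(1-p)*beta^2."""
    return 2.0 * freq * (1.0 - freq) * beta ** 2


def effect_ratio_for_equal_variance(freq_common, freq_rare):
    """How much larger the rarer variant's per-allele effect must be to
    explain the same variance as the more common variant."""
    return math.sqrt((freq_common * (1.0 - freq_common)) /
                     (freq_rare * (1.0 - freq_rare)))


# Approximate haploid frequencies from the text:
# rs34137742 (about 10.3%) and rs116086311 (about 3.5%).
ratio = effect_ratio_for_equal_variance(0.103, 0.035)
print(round(ratio, 2))
```

Under this simplified model, the per‐allele effect of rs116086311 would need to be about 1.65 times that of rs34137742 to explain the same share of trait variance, which is consistent with the weaker but more common SNP being the easier one to detect.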

TSFM‐2a

Unlike the other three ‘known’ SNPtic exons, the SNPtic exon in Ts Translation Elongation Factor, Mitochondrial (TSFM–OMIM #604723) does not have an associated phenotype and was discovered in the blood RNA of healthy individuals (Morrison et al., 2013). Morrison and colleagues suggest that this SNP may be a risk factor for multiple sclerosis (MS); but although both prior and subsequent research has supported a link between MS and other TSFM variants (Handel et al., 2010; Mo et al., 2019), at the time of writing, no such association has been demonstrated for this SNP. However, the authors did demonstrate that this SNP was almost entirely responsible for splicing of the SNPtic exon through conversion of the GC‐donor motif to a GT‐donor motif, though they also detected low levels of splicing even in C‐allele homozygotes, which fits with prior observations of U2‐spliced GC‐donor sites being functional but less efficient (Thanaraj & Clark, 2001). We also noted that this SNPtic exon was an exact match for 1 of the 10 CEs previously predicted by Sela et al. (2010). Other mutations in TSFM have been associated with cardiomyopathy, encephalomyopathy and ataxia (Smeitink et al., 2006; Emperador et al., 2016).

Other SNPtic exons in Sakaguchi and Suyama 2021

Sakaguchi and Suyama (2021) reported 116 new CEs discovered in publicly available RNAseq data. For two of these CEs, we found evidence in the literature supporting a SNP‐associated phenotype, and we have discussed these above as OAS1‐2a and IL16‐6a. We also noted an additional 17 CEs in the authors’ report where the causative variants corresponded to common SNPs, though we were not able to find any published phenotype associations for these SNPs, nor for any other SNPs within ±20 nt of their associated SNPtic exons. These examples are listed in Table 2, but as we have little to add to the original authors’ analysis of these 17 CEs, we instead refer interested readers to their report.

TABLE 2

SNPtic exons caused by common SNPs (≥1% haploid frequency) as reported by Sakaguchi and Suyama (2021)

Chr. (strand)	Gene	Start (hg19)	End (hg19)	SNP position (hg19)	rsID	Varnomen (GRCh38)
chr1‐	NOC2L	882,137	882,244	882,250	rs111463901	NC_000001.11:g.946870C>A
chr1+	RWDD3	95,702,899	95,703,016	95,702,898	rs80241359	NC_000001.11:g.95237342A>G
chr5‐	TBCA	77,026,223	77,026,280	77,026,221	rs75503375	NC_000005.10:g.77730396C>A
chr5‐	SRA1	139,932,741	139,932,889	139,932,740	rs112703681	NC_000005.10:g.140553155T>C
chr6+	ABRACL	139,354,886	139,354,992	139,354,992	rs62441851	NC_000006.12:g.139033855A>G
chr7‐	COA1	43,695,632	43,695,752	43,695,628	rs1859877	NC_000007.14:g.43656029C>T
chr10+	HSD17B7P2	38,654,838	38,654,939	38,654,940	rs2804645	NC_000010.11:g.38366012T>A
chr11‐	DHCR7	71,157,568	71,157,656	71,157,567	rs75686975	NC_000011.10:g.71446521G>A
chr12+	MGST1	16,503,692	16,503,788	16,503,789	rs9332891	NC_000012.12:g.16350855T>G
chr12+	OAS1	113,348,549	113,348,652	113,348,654	rs116086311	NC_000012.12:g.112910849C>T
chr14+	CRIP1	105,954,227	105,954,364	105,954,368	rs112661676	NC_000014.9:g.105488031G>A
chr15+	IL16	81,600,452	81,600,478	81,600,451	rs4778639	NC_000015.10:g.81308110T>C
chr16‐	CNOT1	58,662,843	58,663,002	58,662,841	rs28644182	NC_000016.10:g.58628937G>A
chr16‐	FANCA	89,829,046	89,829,201	89,829,201	rs9806894	NC_000016.10:g.89762793G>A
chr17+	STAT5A	40,440,948	40,441,015	40,441,014	rs74875201	NC_000017.11:g.42288996G>A
chr19+	CERS4	8,312,329	8,312,446	8,312,447	rs12977774	NC_000019.10:g.8247563A>G
chr21‐	LINC00158	26,758,995	26,759,072	26,758,994	rs13049048	NC_000021.9:g.25386681T>A
chr21‐	C21orf59	33,980,707	33,980,799	33,980,705	rs111323620	NC_000021.9:g.32608395G>A
chr21+	NDUFV3	44,326,950	44,327,012	44,327,013	rs73905782	NC_000021.9:g.42906903A>G
chr22+	APOBEC3D	39,419,690	39,419,852	39,419,853	rs6001388	NC_000022.11:g.39023848T>G

‘Start’ and ‘End’ coordinates refer to human genome assembly hg19, as per cited work. SNPtic exons OAS1‐2a and IL16‐6a are indicated with bold text. The APOBEC3D SNP is not shown in the cited work but is required for splicing in addition to the published variant (Narumi Sakaguchi 2021, Pers. Comm).
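The positional relationship between each SNP and its exon in Table 2 can be illustrated with a minimal Python sketch. It computes how far a SNP lies from the nearest exon boundary and checks whether it falls within a ±20 nt window, as used in the text above. The function names and the `flank` parameter are illustrative assumptions and not part of the published method; the four example rows are copied from Table 2 (hg19 coordinates), and offsets are expressed in genomic orientation without regard to strand.

```python
from typing import NamedTuple


class Table2Row(NamedTuple):
    gene: str
    start: int    # exon start (hg19), as listed in Table 2
    end: int      # exon end (hg19), as listed in Table 2
    snp_pos: int  # SNP position (hg19), as listed in Table 2


def offset_from_exon(start: int, end: int, pos: int) -> int:
    """Signed distance from pos to the exon in genomic orientation.

    Returns 0 if pos lies inside [start, end], a negative value if it
    lies before 'start', and a positive value if it lies after 'end'.
    """
    if pos < start:
        return pos - start
    if pos > end:
        return pos - end
    return 0


def within_window(start: int, end: int, pos: int, flank: int = 20) -> bool:
    """True if pos is inside the exon or within 'flank' nt of either boundary."""
    return abs(offset_from_exon(start, end, pos)) <= flank


# A few rows copied from Table 2 (thousands separators removed).
TABLE2_ROWS = [
    Table2Row("NOC2L", 882137, 882244, 882250),
    Table2Row("RWDD3", 95702899, 95703016, 95702898),
    Table2Row("ABRACL", 139354886, 139354992, 139354992),
    Table2Row("COA1", 43695632, 43695752, 43695628),
]

if __name__ == "__main__":
    for row in TABLE2_ROWS:
        off = offset_from_exon(row.start, row.end, row.snp_pos)
        print(row.gene, off, within_window(row.start, row.end, row.snp_pos))
```

For these rows, the SNPs sit at or within a few nucleotides of an exon boundary, which is consistent with variants that alter splice site motifs. A strand-aware version would additionally need to flip the sign of the offset for minus-strand genes to distinguish acceptor-side from donor-side positions.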


Probable SNPtic exons

APC‐11a

This CE in Adenomatous Polyposis Coli (APC–OMIM #611731) was first reported as a pathogenic inclusion by Spier et al. (2012). Remarkably, three unique donor site SNVs have been reported as being causative of pathogenic APC‐11a splicing (Nieminen et al., 2016; Spier et al., 2012). All three mutations caused a phenotype of familial adenomatous polyposis (FAP), a disease characterised by colon polyps and an elevated risk of colon cancer. Like LHCGR‐6a, the sequence in and surrounding APC‐11a is highly conserved (Figure 2a), supporting the case for this being an as yet unrecognised bona fide exon. The SNP rs2545162 is predicted to create a 3′ binding motif for MBNL1, an alternative splicing regulator that has been shown to consistently enhance the splicing of exons when it binds within ~200 nt 3′ of their donor sites (Konieczny et al., 2014; Wang et al., 2012). We would therefore expect the minor allele of this SNP to increase APC‐11a inclusion and be associated with a higher risk of FAP‐like symptoms. This prediction agrees with the findings of Hildebrandt et al. (2016), who found that rs2545162 was significantly associated with a higher risk of colorectal cancer.

FGB‐1a

This pathogenic CE in Fibrinogen Beta (FGB–OMIM #134830) was first predicted by Dear et al. (2006), who identified the causative mutation in a consanguineous family, and was later confirmed and further characterised by Davis et al. (2009). The authors determined that an SNV within the CE converted a silencer motif to an enhancer, thereby substantially increasing FGB‐1a inclusion. Consequently, the homozygous proband exhibited a phenotype of afibrinogenemia with recurrent transient ischemic attacks, whilst his two heterozygous children bore a milder phenotype of hypofibrinogenemia. The SNP rs2227401 is situated inside the CE and is predicted to silence its inclusion, so we would expect an associated phenotype opposite to afibrinogenemia. This is supported by two GWASes (de Vries et al., 2017; Kolz et al., 2009) that independently discovered an association between rs2227401 and higher levels of blood fibrinogen.

GHRL‐4a(S/L)

Like the LHCGR‐6a SNPtic exon, this CE in Ghrelin (GHRL–OMIM #605353) also consists of short and long variants, though in this case it is the donor site that is shared, with two unique acceptor sites (Seim et al., 2013). Seim and colleagues observed GHRL‐4a inclusion in multiple healthy cell types and elevated inclusion in prostate cancer cell lines. They also noted that the acceptor site of GHRL‐4aS appeared to be non‐canonical, with an AA terminal dinucleotide. However, the SNP rs2075356 converts this AA to a canonical AG. Given the haploid frequency of this SNP (11%) compared to the frequency of bona fide non‐AG acceptor sites (<0.1% as per Olthof et al., 2019 and Piovesan et al., 2019), we suggest that carriage of this SNP may be the more likely explanation for GHRL‐4aS splicing. The rs2075356 SNP has separately been linked with a decreased risk of certain forms of cancer (Pabalan et al., 2014) and an elevated risk of purging‐type bulimia nervosa (Ando et al., 2006). However, whilst the rs2075356 minor allele is likely to be essential for GHRL‐4aS splicing, the confounding effect of competition between the GHRL‐4aS and GHRL‐4aL acceptor sites makes it difficult to predict how it would change the total amount of GHRL‐4a splicing. This difficulty is compounded by the complex post‐translational processing of preproghrelin peptides and the varied roles they play in metabolic regulation. We therefore limit ourselves to suggesting that a focused investigation of the effects of rs2075356 may prove to be a fruitful line of research.

MYBPC3‐12a

This CE in Myosin‐Binding Protein C3 (MYBPC3–OMIM #600958) was discovered by Bagnall et al. (2018). In this case, the patient’s SNV converted the GC of the MYBPC3‐12a donor site to a stronger GT. The proband was one of a cohort of patients with hypertrophic cardiomyopathy, a disease characterised by overdevelopment of the muscle in the left ventricle of the heart, leading to a greatly elevated risk of arrhythmia and heart failure. Cardiac hypertrophy in general has also been associated with a higher risk of cognitive dysfunction in later life (Hayakawa et al., 2012). The SNP rs10769255 occurs inside MYBPC3‐12a and is predicted to silence its inclusion and thereby permit increased translation of full‐length MYBPC3. Surprisingly, in a subsequent GWAS, rs10769255 was found to correlate with higher performance in certain tests of cognitive ability (Lee et al., 2018). Although the difference in scores attributed to the SNP was quite small, it was nonetheless determined to be highly significant due to the study's large sample size. This phenotype could be explained as a mild inverse of the elevated cognitive decline risk typically associated with hypertrophic cardiomyopathy.

OTC‐9a

This CE in Ornithine Transcarbamylase (OTC–OMIM #300461) was first observed as a pathogenic inclusion by Engel et al. (2008), caused by a donor site SNV. Because OTC is a key component in the metabolic conversion of ammonia to urea, the OTC deficiency caused by pathogenic inclusion of OTC‐9a resulted in hyperammonemia, and was ultimately fatal to the affected patient, who died at a very young age due to severe cerebral oedema. Mutations with less severe effects on the quantity and function of OTC protein have been known to cause late‐onset OTC deficiency, which can manifest in previously asymptomatic patients as erratic behaviour, lethargy and hyperammonemia (Hidaka et al., 2020; Rush et al., 2014). The SNP rs5963419 is situated within this CE and is predicted to silence its inclusion. We might therefore expect this SNP to be associated with higher OTC protein levels and a benign ‘hypoammonemic’ phenotype, opposite to the severe hyperammonemia observed for pathogenic inclusion of OTC‐9a. However, to date the only positive GWAS correlation for rs5963419 is deleterious: its minor allele was found to be overrepresented in populations with bipolar disorder (Sklar et al., 2008). A possible explanation for this is that a higher level of neuronal OTC (Bernstein et al., 2017) in carriers of this SNP may elevate the conversion of ammonia to urea in some neurons, and therefore leave less ammonia available for the conversion of glutamate into glutamine by glutamine synthetase. This could in turn result in chronically higher neuronal glutamate levels, which have been associated with bipolar disorder (Gigante et al., 2012). If this SNP had an opposite mechanism of action––that is, it increased risk of bipolar disorder by reducing OTC levels––then there should also be a strong and obvious correlation between bipolar disorder and late‐onset hyperammonemia generally; yet we could find no reports of any such association in the literature.

Potential SNPtic exons

ARSB‐6a

This CE in Arylsulfatase B (ARSB–OMIM #611542) was discovered by Broeders et al. (2020) as a sporadic inclusion in both patient and healthy control RNAs from primary human fibroblasts treated with cycloheximide, an NMD inhibitor. Broeders and colleagues noted that the donor site of this CE, which bears a non‐canonical AT flanking dinucleotide in the reference sequence, was not predicted by any of the algorithms they tested. However, we observed that if the SNP rs337836 was present then this donor site dinucleotide would be converted to a canonical GT. Given that this SNP has a haploid frequency of 33%, we suggest that its presence or absence is the most likely explanation for differential ARSB‐6a splicing between individuals. Loss‐of‐function mutations in ARSB are typically causative of mucopolysaccharidosis type VI (MPS VI), a recessive inherited disorder with a spectrum of severity and a broad range of symptoms, including skeletal abnormalities, hearing loss, vision loss and heart disease. Broeders and colleagues showed compelling evidence that the immediate effect of ARSB‐6a inclusion is to induce NMD, as ARSB‐6a‐inclusive transcripts were almost undetectable in the RNA of cells not treated with cycloheximide. Therefore, the expected phenotype associations for this SNP would be analogous to sub‐clinical MPS VI. We speculate that these might include shorter stature and an elevated risk of sleep apnoea and heart disease. We also noted that this CE falls within the 3′ UTR of ARSB transcript variant ENST00000565165.2 (GENCODE), although its sequence does not show significant conservation.
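To give a rough sense of how often the rs337836 allele would be present in a sample of individuals, the following Python sketch converts a haploid (allele) frequency into the expected fraction of individuals carrying at least one copy. It assumes Hardy-Weinberg equilibrium for an autosomal locus, which is our simplifying assumption for illustration and is not a claim made in the cited reports; the function name is likewise hypothetical.

```python
def carrier_fraction(allele_freq: float) -> float:
    """Expected fraction of individuals with at least one copy of an allele.

    Assumes Hardy-Weinberg equilibrium at an autosomal locus: the chance of
    carrying zero copies is (1 - p)^2, so carriers are 1 - (1 - p)^2.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    return 1.0 - (1.0 - allele_freq) ** 2


if __name__ == "__main__":
    # Haploid frequencies quoted in the text: rs337836 (33%), rs2075356 (11%).
    for rsid, freq in [("rs337836", 0.33), ("rs2075356", 0.11)]:
        print(f"{rsid}: {carrier_fraction(freq):.1%} of individuals carry the allele")
```

Under this assumption, roughly 55% of individuals would carry at least one copy of a 33% allele, and roughly 21% would carry at least one copy of an 11% allele. Sporadic detection of a CE across both patient and control samples is therefore what one would expect if its splicing depended on a common allele of this kind.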

CSF1R‐15a

This CE in Colony‐Stimulating Factor 1 Receptor (CSF1R–OMIM #164770) was discovered by Guo et al. (2019), who observed it as a pathogenic splicing variant induced by an internal two‐nucleotide deletion. The consanguineous proband had a severe phenotype due to being homozygous for this allele, and their symptoms included hypotonicity, focal seizures, brain malformation and mild skeletal abnormalities. In cases of other monoallelic CSF1R loss‐of‐function mutations, a phenotype of ‘hereditary diffuse leukoencephalopathy with spheroids’ (HDLS) is often observed, a neurodegenerative disorder with adult onset and variable presentation. Although the SNP rs11952821 only slightly enhances the CSF1R‐15a acceptor site, it is comparable to the improvement induced by a SNP at the same position in ATM‐27a, which was demonstrated to have a significant splicing effect. We would therefore expect rs11952821 carriers to have elevated CSF1R‐15a inclusion leading to NMD and lower full‐length CSF1R translation, and an associated phenotype equivalent to very mild HDLS. Due to the variable presentation of classical HDLS, this phenotype could manifest as an increased general risk of neurodegenerative disease and/or a more severe prognosis when neurodegenerative symptoms are already present for other reasons.

DMD‐2a(S/L)

This CE in Dystrophin (DMD–OMIM #300377) was detected in a patient diagnosed with Duchenne muscular dystrophy (Ishibashi et al., 2006). The CE has short (S) and long (L) isoforms, with a shared donor site and two acceptor sites four nucleotides apart. Unusually, the causative mutation in this case was significantly distal on the same allele: a tandem duplication of DMD exons 8–11. The affected 3‐year‐old male (XY) patient had a characteristic Duchenne muscular dystrophy phenotype for his age, with extremely high serum creatine kinase and early signs of muscle weakness. However, because the exons 8–11 duplication already induces a reading frame shift in the DMD transcript, it is not possible to assign aspects of this patient’s phenotype to DMD‐2a splicing alone. The SNP rs145743673 weakens the acceptor site of the short CE isoform and strengthens that of the long isoform, and would therefore be expected to induce splice‐switching from the short to the long isoform. As with LHCGR‐6a and GHRL‐4a, we have refrained from predicting the effect of CE splice‐switching on total transcript and protein levels. It is possible that a GWAS could detect a correlation between rs145743673 and levels of dystrophin in normal individuals, though the rarity of the SNP (1.1%) would make this challenging, and any differences detected may be largely asymptomatic if the high variability of ‘normal’ dystrophin expression is any indication (Beekman et al., 2018).

NF1‐36a

This CE in Neurofibromin 1 (NF1–OMIM #613113) was first detected in the peripheral blood RNA of at least 17 healthy control individuals (Landrith et al., 2020). Although this splice variant is not yet associated with a phenotype, loss‐of‐function mutations in NF1 are typically causative of type 1 neurofibromatosis (NF1), which is characterised by ubiquitous benign nerve tumours, café‐au‐lait skin pigmentation, neurocognitive impairment and a greatly elevated risk of cancer. The SNP rs35888506 converts the NF1‐36a donor site dinucleotide from a GC to a stronger GT. It would therefore be expected to cause substantially higher inclusion of this CE, although low‐level splicing of the GC allele might also be observed. We predict an associated phenotype equivalent to very mild NF1, which may be detected as elevated cancer risk and elevated risk of neurocognitive impairment.
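Several of the SNPs discussed in this report act by changing a splice site terminal dinucleotide, for example GC to GT at a donor site (TSFM, MYBPC3‐12a, NF1‐36a), AT to GT at a donor site (ARSB‐6a) or AA to AG at an acceptor site (GHRL‐4aS). The following Python sketch labels such changes under a deliberately simple three‐tier scheme: GT donors and AG acceptors are "canonical", GC donors are "weak" (functional but less efficient, as per Thanaraj & Clark, 2001), and everything else is "non‐canonical". The function names and the tier labels are our own illustrative assumptions; the scheme ignores U12‐type introns and all sequence context beyond the dinucleotide, so it is not a substitute for a splice site scoring algorithm.

```python
CANONICAL = {"donor": "GT", "acceptor": "AG"}


def classify_dinucleotide(site: str, dinuc: str) -> str:
    """Label a splice site terminal dinucleotide as canonical, weak or non-canonical."""
    if site not in CANONICAL:
        raise ValueError("site must be 'donor' or 'acceptor'")
    dinuc = dinuc.upper().replace("U", "T")  # accept RNA or DNA notation
    if len(dinuc) != 2 or any(base not in "ACGT" for base in dinuc):
        raise ValueError("dinuc must be two nucleotides")
    if dinuc == CANONICAL[site]:
        return "canonical"
    if site == "donor" and dinuc == "GC":
        # GC donors are spliced by the major spliceosome, but less efficiently.
        return "weak"
    return "non-canonical"


def describe_change(site: str, ref: str, alt: str) -> tuple:
    """Return the (reference, alternate) labels for a dinucleotide change."""
    return classify_dinucleotide(site, ref), classify_dinucleotide(site, alt)


if __name__ == "__main__":
    examples = [
        ("NF1-36a", "donor", "GC", "GT"),
        ("ARSB-6a", "donor", "AT", "GT"),
        ("GHRL-4aS", "acceptor", "AA", "AG"),
    ]
    for exon, site, ref, alt in examples:
        before, after = describe_change(site, ref, alt)
        print(f"{exon} {site}: {ref} ({before}) -> {alt} ({after})")
```

In each of the three examples the alternate allele moves the site into the "canonical" tier, which matches the direction of effect predicted in the text: increased inclusion of the CE in carriers, with residual low‐level splicing expected only where the reference dinucleotide is a GC donor.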

POC1B‐9a

This CE in Proteome Of Centriole Protein 1B (POC1B–OMIM #614784) was detected in blood RNA from a compound heterozygous patient with adult‐onset symptoms of reduced visual acuity, reduced visual contrast and photophobia (Weisschuh et al., 2021). Pathogenic mutations to POC1B generally cause some form of retinopathy, although symptoms and age of onset are highly variable. In this case, the patient’s mutation destroyed the POC1B exon 7 donor site, resulting in variable skipping of exons 6 and 7 in addition to POC1B‐9a inclusion. Consequently, POC1B‐9a inclusion by itself cannot be definitively implicated in the proband’s symptoms. However, like LHCGR‐6a and APC‐11a, POC1B‐9a also exhibits high sequence conservation (Figure 2c), indicating that it may be a bona fide poison exon. Similar to the F8‐13a SNP, rs11323565 causes a length variation in the POC1B‐9a acceptor site poly‐T tract, extending it from 12T to 13T. But unlike F8‐13a, in this case an expansion of the poly‐T tract appears more likely to increase inclusion of the CE, as the change in AGEZ length is minimal. We would therefore predict that this SNP may be associated with diminished visual acuity in the elderly. Our comparison of POC1B‐9a with F8‐13a led us to note that length variations in acceptor site poly‐T tracts appear to have competing and contradictory effects on exon recognition, as such variants can simultaneously strengthen an acceptor splice motif whilst weakening branch point definition. We would welcome any further research towards reliably predicting the effects of these variants.

Conclusions and recommendations

Although we discovered only five new probable SNPtic exons, we were encouraged to observe that in four of these cases, the predicted splicing effect was generally consistent with the correlated phenotype, whilst the fifth (GHRL‐4a) was expected to cause complex splice‐switching and thus neither supported nor contradicted our model. We also highlighted an additional five possible SNPtic exons; their associated SNPs may prove worthwhile targets of future GWASes. A reviewer of this report observed that several of the SNPs in the ‘Probable’ and ‘Possible’ SNPtic exon categories fell outside of the highly conserved splice motif regions (as defined in step 5a of our search method), whilst this was true for only one (F8‐13a) in the ‘Known’ category. This discrepancy may be a consequence of the fact that, prior to this report, there had not been any general attempts to match SNPtic exons with population phenotypes. Consequently, only those SNPs with the most noticeable splicing effects have been characterised, and these primarily occur in the most highly conserved splice motif nucleotides. Whilst we hope these findings will be of interest, our primary goal in reporting them is to demonstrate proof of concept for the utility of our discovery method. In future, researchers reporting on new cryptic exons may apply this method for no cost greater than a few minutes expended on online database queries, and in doing so may discover better explanations for published results, or fruitful new lines of inquiry for their research. Antisense oligonucleotide‐based skipping of NMD‐inducing poison exons is already showing great promise for the treatment of heritable encephalopathies (Aziz et al., 2021), and it is possible that further discoveries of SNPtic exons will reveal additional novel antisense targets.
As innovations in RNA sequencing technology continue to accelerate the discovery of new cryptic exons and pseudoexons, so will grow the potential for making exciting new connections between this relatively small body of data and the vast number of SNP–phenotype associations already discovered by GWASes.

Addendum: Close to the time of publication we identified what appear to be two additional examples of known SNPtic exons, one in the gene Ras Homolog Family Member A (RHOA–OMIM #165390) (Medina et al., 2012) and one in the gene F‐Box Protein 38 (FBXO38–OMIM #608533) (Saferali et al., 2019). Although we could not include these examples in our analysis without further peer review, we wish to acknowledge the original reports as literature of interest.

CONFLICT OF INTEREST

The authors have declared no conflicts of interest.

ETHICAL COMPLIANCE

As our study exclusively used published and publicly available data, it did not require approval by an ethics committee.

84 in total

1. Human GC-AG alternative intron isoforms with weak donor sites show enhanced consensus at acceptor exon positions.

Authors: T A Thanaraj; F Clark
Journal: Nucleic Acids Res Date: 2001-06-15 Impact factor: 16.971

2. Abnormal splice in a mutant human beta-globin gene not at the site of a mutation.

Authors: C Dobkin; R G Pergolizzi; P Bahre; A Bank
Journal: Proc Natl Acad Sci U S A Date: 1983-03 Impact factor: 11.205

3. Reccurrent F8 Intronic Deletion Found in Mild Hemophilia A Causes Alu Exonization.

Authors: Yohann Jourdy; Alexandre Janin; Mathilde Fretigny; Anne Lienhart; Claude Négrier; Dominique Bozon; Christine Vinciguerra
Journal: Am J Hum Genet Date: 2018-01-18 Impact factor: 11.025

4. Possible role of preproghrelin gene polymorphisms in susceptibility to bulimia nervosa.

Authors: Tetsuya Ando; Gen Komaki; Tetsuro Naruo; Kenjiro Okabe; Masato Takii; Keisuke Kawai; Fujiko Konjiki; Michiko Takei; Takakazu Oka; Kaori Takeuchi; Akinori Masuda; Norio Ozaki; Hiroyuki Suematsu; Kenzo Denda; Nobuo Kurokawa; Kotarou Itakura; Chikara Yamaguchi; Masaki Kono; Tatsuyo Suzuki; Yoshikatsu Nakai; Aya Nishizono-Maher; Masanori Koide; Ken Murakami; Kiyohide Nagamine; Yuichiro Tomita; Kazuyoshi Ookuma; Kazumi Tomita; Eita Tonai; Akira Ooshima; Toshio Ishikawa; Yuhei Ichimaru
Journal: Am J Med Genet B Neuropsychiatr Genet Date: 2006-12-05 Impact factor: 3.568

5. Cloning of a novel insulin-regulated ghrelin transcript in prostate cancer.

Authors: Inge Seim; Amy A Lubik; Melanie L Lehman; Nadine Tomlinson; Eliza J Whiteside; Adrian C Herington; Colleen C Nelson; Lisa K Chopin
Journal: J Mol Endocrinol Date: 2013-02-15 Impact factor: 5.098

Review 6. Role of pseudoexons and pseudointrons in human cancer.

Authors: Maurizio Romano; Emanuele Buratti; Diana Baralle
Journal: Int J Cell Biol Date: 2013-09-24

7. AG-exclusion zone revisited: Lessons to learn from 91 intronic NF1 3' splice site mutations outside the canonical AG-dinucleotides.

Authors: Katharina Wimmer; Esther Schamschula; Annekatrin Wernstedt; Pia Traunfellner; Albert Amberger; Johannes Zschocke; Peter Kroisel; Yunjia Chen; Tom Callens; Ludwine Messiaen
Journal: Hum Mutat Date: 2020-03-11 Impact factor: 4.878

8. Conserved long-range base pairings are associated with pre-mRNA processing of human genes.

Authors: Svetlana Kalmykova; Marina Kalinina; Stepan Denisov; Alexey Mironov; Dmitry Skvortsov; Roderic Guigó; Dmitri Pervouchine
Journal: Nat Commun Date: 2021-04-16 Impact factor: 14.919

9. Clinical Characteristics of POC1B-Associated Retinopathy and Assignment of Pathogenicity to Novel Deep Intronic and Non-Canonical Splice Site Variants.

Authors: Nicole Weisschuh; Pascale Mazzola; Miriam Bertrand; Tobias B Haack; Bernd Wissinger; Susanne Kohl; Katarina Stingl
Journal: Int J Mol Sci Date: 2021-05-20 Impact factor: 5.923

1 in total

1. A spotter's guide to SNPtic exons: The common splice variants underlying some SNP-phenotype correlations.

Authors: Niall Patrick Keegan; Sue Fletcher
Journal: Mol Genet Genomic Med Date: 2021-10-28 Impact factor: 2.183
