Jannah Shamsani1, Stephen H Kazakoff1, Irina M Armean2, Will McLaren2, Michael T Parsons1, Bryony A Thompson3, Tracy A O'Mara1, Sarah E Hunt2, Nicola Waddell1, Amanda B Spurdle1.
Abstract
SUMMARY: Assessing the pathogenicity of genetic variants can be a complex and challenging task. Spliceogenic variants, which alter mRNA splicing, may yield mature transcripts that encode non-functional protein products, an important predictor of Mendelian disease risk. However, most variant annotation tools do not adequately assess spliceogenicity outside the native splice site and thus the disease-causing potential of variants in other intronic and exonic regions is often overlooked. Here, we present a plugin for the Ensembl Variant Effect Predictor that packages MaxEntScan and extends its functionality to provide splice site predictions using a maximum entropy model. The plugin incorporates a sliding window algorithm to predict splice site loss or gain for any variant that overlaps a transcript feature. We also demonstrate the utility of the plugin by comparing our predictions to two mRNA splicing datasets containing several cancer-susceptibility genes.
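The sliding window idea in the summary can be illustrated with a minimal sketch. It is not the plugin's code: the function names are hypothetical, and `toy_donor_score` is a stand-in for the MaxEntScan maximum entropy model, which is not reproduced here. The sketch assumes that every window of the motif length covering the variant position is scored and that the highest-scoring window is kept for the reference and alternate sequences. The window lengths of 9 and 23 are the donor and acceptor motif lengths given in the Fig. 1 caption.

```python
from typing import Callable, List, Tuple

DONOR_LEN = 9      # donor motif length, from the Fig. 1 caption
ACCEPTOR_LEN = 23  # acceptor motif length, from the Fig. 1 caption


def windows_overlapping(seq: str, pos: int, k: int) -> List[Tuple[int, str]]:
    """Return (start, k-mer) for every length-k window of seq covering 0-based index pos."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [(start, seq[start:start + k]) for start in range(first, last + 1)]


def best_window(seq: str, pos: int, k: int,
                score: Callable[[str], float]) -> Tuple[float, int, str]:
    """Score every window covering pos and return (score, start, k-mer) of the best one."""
    candidates = windows_overlapping(seq, pos, k)
    if not candidates:
        raise ValueError("no window of length k covers pos in this sequence")
    start, kmer = max(candidates, key=lambda c: score(c[1]))
    return score(kmer), start, kmer


def toy_donor_score(kmer: str) -> float:
    """Stand-in scorer (NOT MaxEntScan): 1.0 if GT follows the three exonic bases."""
    return 1.0 if kmer[3:5] == "GT" else 0.0


# Hypothetical example: a G>A substitution at index 6 removes the only GT
# that sits at the donor dinucleotide position of a covering window.
ref_seq = "AAACAGGTAAGTCCC"
alt_seq = ref_seq[:6] + "A" + ref_seq[7:]
ref_score, _, _ = best_window(ref_seq, 6, DONOR_LEN, toy_donor_score)
alt_score, _, _ = best_window(alt_seq, 6, DONOR_LEN, toy_donor_score)
diff = ref_score - alt_score  # positive diff: the alternate allele scores lower
```

Passing the scorer in as a callable keeps the window enumeration independent of the model, so a real MaxEntScan scoring function could be substituted without changing the rest of the sketch.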
Year: 2019 PMID: 30475984 PMCID: PMC6596880 DOI: 10.1093/bioinformatics/bty960
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (A) The MaxEntScan plugin provides scores for sequence motifs within the native splice sites and in other intronic and exonic regions. The native donor splice site is a 9-mer that overlaps the last three nucleotides of an exon and the first six nucleotides of the downstream intron. The native acceptor splice site is a 23-mer that overlaps the last 20 nucleotides of an intron and the first three nucleotides of the downstream exon.
(B) Variants that overlapped the native splice sites were assessed for native splice site loss, whilst variants outside the native splice sites were assessed for gain of a de novo or cryptic splice site. Spliceogenicity was assessed using the reference (ref), alternate (alt) and difference (diff = ref − alt) maximum entropy scores and the ENIGMA score thresholds. SNVs within the native splice site were assessed for splice site loss using the native splice site scores, whilst indels that overlapped the native splice sites were assessed for splice site loss using the MES-SWA function. Variants predicted to diminish splicing (diff > 0) were further classified as having a high (alt < 6.2), moderate (6.2 ≤ alt ≤ 8.5) or low (alt > 8.5) potential of disrupting native splice sites. High and moderate classifications may also be downgraded to moderate and low, respectively, when diff < 1.15. Creation of a de novo or cryptic splice site was assessed using the MES-SWA and MES-NCSS functions. Variants predicted to increase splicing (diff < 0) were further classified as having a high (alt > 8.5), moderate (6.2 ≤ alt ≤ 8.5) or low (alt < 6.2) potential of creating a de novo or cryptic splice site. Variants were only classified as having moderate potential if they could be shown to outcompete the nearest native splice site.
(C) Sensitivity (pink) and specificity (green) for spliceogenic predictions of 1116 BRCA1/2 and MMR variants made using the MaxEntScan plugin. Variants predicted to have a high or moderate potential of native splice site loss or de novo gain were expected to cause splicing aberrations. Spliceogenic predictions were compared with the results of reported in vitro splicing assays. Sensitivity measures the proportion of variants correctly predicted to cause splicing aberrations, whilst specificity measures the proportion of variants correctly predicted to retain normal splicing profiles (100% reflects a perfect prediction). The specificity for predicting normal splicing across the GT-AG donor and acceptor dinucleotides could not be calculated, as only one true negative result was identified in those regions.
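The thresholds in panel (B) can be written out as a short decision rule. The sketch below is an illustration and not the authors' code: the function and constant names are hypothetical, the cut-offs 6.2, 8.5 and 1.15 are taken from the caption, and the downgrade is assumed to apply whenever diff < 1.15 (the caption says only that it "may" apply, so it is exposed as a flag). The rule that a de novo site must outcompete the nearest native splice site is not modelled, because the caption does not say how such variants are otherwise classified.

```python
from typing import Optional

LOWER_CUTOFF = 6.2     # alt score cut-off from the caption
UPPER_CUTOFF = 8.5     # alt score cut-off from the caption
DOWNGRADE_DIFF = 1.15  # diff below which a loss call is downgraded (assumed to always apply)

_DOWNGRADE = {"high": "moderate", "moderate": "low", "low": "low"}


def classify_native_loss(ref: float, alt: float,
                         apply_downgrade: bool = True) -> Optional[str]:
    """Potential of a variant to disrupt a native splice site, or None if diff <= 0."""
    diff = ref - alt
    if diff <= 0:
        return None  # the variant is not predicted to diminish splicing
    if alt < LOWER_CUTOFF:
        tier = "high"
    elif alt <= UPPER_CUTOFF:
        tier = "moderate"
    else:
        tier = "low"
    if apply_downgrade and diff < DOWNGRADE_DIFF:
        tier = _DOWNGRADE[tier]
    return tier


def classify_de_novo_gain(ref: float, alt: float) -> Optional[str]:
    """Potential of a variant to create a de novo or cryptic site, or None if diff >= 0."""
    diff = ref - alt
    if diff >= 0:
        return None  # the variant is not predicted to increase splicing
    if alt > UPPER_CUTOFF:
        return "high"
    if alt >= LOWER_CUTOFF:
        return "moderate"
    return "low"
```

For example, under these assumptions a native site whose score drops from 9.0 to 3.0 is classified as high potential for loss, whereas a drop from 6.5 to 5.9 starts as high but is downgraded to moderate because the difference is below 1.15.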