A plugin for the Ensembl Variant Effect Predictor that uses MaxEntScan to predict variant spliceogenicity.

Jannah Shamsani1, Stephen H Kazakoff1, Irina M Armean2, Will McLaren2, Michael T Parsons1, Bryony A Thompson3, Tracy A O'Mara1, Sarah E Hunt2, Nicola Waddell1, Amanda B Spurdle1.   

Abstract

SUMMARY: Assessing the pathogenicity of genetic variants can be a complex and challenging task. Spliceogenic variants, which alter mRNA splicing, may yield mature transcripts that encode non-functional protein products, an important predictor of Mendelian disease risk. However, most variant annotation tools do not adequately assess spliceogenicity outside the native splice site and thus the disease-causing potential of variants in other intronic and exonic regions is often overlooked. Here, we present a plugin for the Ensembl Variant Effect Predictor that packages MaxEntScan and extends its functionality to provide splice site predictions using a maximum entropy model. The plugin incorporates a sliding window algorithm to predict splice site loss or gain for any variant that overlaps a transcript feature. We also demonstrate the utility of the plugin by comparing our predictions to two mRNA splicing datasets containing several cancer-susceptibility genes.
AVAILABILITY AND IMPLEMENTATION: Source code is freely available under the Apache License, Version 2.0: https://github.com/Ensembl/VEP_plugins.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2018. Published by Oxford University Press.

Year:  2019        PMID: 30475984      PMCID: PMC6596880          DOI: 10.1093/bioinformatics/bty960

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

RNA splicing is a tightly regulated process that involves the excision of non-coding intronic sequences from nascent precursor mRNA and the ligation of coding exons to produce mature transcripts ready for translation into protein. The splicing reaction is catalyzed by molecular machinery that recognizes short consensus sequences, called donor and acceptor sites, at the intron/exon boundaries. Variants that impact these sequences or other regulatory sequences may disrupt normal splicing and result in the synthesis of aberrant or non-functional transcript or protein products. The identification of such splicing defects remains a challenge, even though nearly a third of all pathogenic (disease-associated) variants are predicted to impact normal splicing (Lim et al., 2011; Sterne-Weiler et al., 2011). Variants that affect the highly conserved GT-AG dinucleotides in the native donor and acceptor splice sites are routinely assessed for spliceogenicity because they are generally presumed to cause severe splicing aberrations. However, variants outside of the native splice sites are often overlooked for their role in splicing. Although these are less likely to impact splicing, variants outside the native splice sites have been shown to abolish native splice sites and activate de novo or pre-existing cryptic splice sites (Houdayer et al.; Rodríguez-Balada et al., 2016; Sanz et al., 2010; Théry et al., 2011; Walker et al., 2010). To address this issue and enable the rapid assessment of complex sequence variants, we developed a plugin for the Ensembl Variant Effect Predictor (VEP) (McLaren et al., 2016) that encapsulates MaxEntScan (Eng et al., 2004; Yeo and Burge, 2004) and uses its functionality to generate splice prediction scores for any variant that overlaps a transcript feature. We demonstrate the utility of the plugin by comparing our spliceogenicity predictions to in vitro splicing results using the current score thresholds defined by the Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium (Spurdle et al.).

2 Approach

The core functionality of the MaxEntScan plugin is designed to provide three sets of scores that can be incorporated into a comprehensive framework to predict the fitness of a given sequence motif as either a donor or an acceptor splice site based on a maximum entropy model. First, the plugin provides the scores necessary to predict the loss of a native splice site for single nucleotide variants (SNVs). These are the scores for the reference and alternate sequence motifs, using 9-mers for native donor splice sites and 23-mers for acceptor splice sites, as described by Yeo and Burge (2004). Second, a sliding window algorithm called MES-SWA was added to assess deeper intronic, exonic and other types of variants, such as insertions and deletions (Vallée et al., 2016). A scoring window is slid across the reference or alternate sequence such that the reference or alternate allele moves from either the 9th position (donor) or 23rd position (acceptor) to the 1st position, and the highest score is captured as the most fit potential donor or acceptor splice site. To assess the impact of variants, reference comparison scores are also provided. For SNVs, the reference comparison scores are derived from the reference sequence in the same frame as the highest scoring k-mer containing the alternate allele. For all other variants, the frame of the highest scoring k-mer containing the reference sequence is used to derive the comparison score. Third, the plugin provides additional scores to assess whether a de novo donor or acceptor can out-compete the native splice site. This is achieved using the MES-NCSS function, which scores the nearest canonical donor and acceptor splice sites both upstream and downstream of the variant.
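The sliding window step for an SNV can be illustrated with a minimal Python sketch. This is not the plugin's code (the plugin is written for VEP and uses the MaxEntScan maximum entropy models): the function names are hypothetical, and `toy_donor_score` is a placeholder that stands in for the real model purely so the example runs.

```python
from typing import Callable, Dict, List, Tuple

DONOR_K = 9      # donor motif length, as stated in the text
ACCEPTOR_K = 23  # acceptor motif length, as stated in the text


def windows_over_position(seq: str, pos: int, k: int) -> List[Tuple[int, str]]:
    """Return (start, k-mer) for every k-mer of seq that covers 0-based index pos."""
    out = []
    # First window: variant is the last base; last window: variant is the first base.
    for start in range(pos - k + 1, pos + 1):
        if start >= 0 and start + k <= len(seq):
            out.append((start, seq[start:start + k]))
    return out


def best_window(seq: str, pos: int, k: int,
                score: Callable[[str], float]) -> Tuple[int, str]:
    """Pick the highest scoring k-mer among all windows covering pos."""
    return max(windows_over_position(seq, pos, k), key=lambda w: score(w[1]))


def swa_snv(ref_seq: str, pos: int, alt_base: str, k: int,
            score: Callable[[str], float]) -> Dict[str, object]:
    """Sliding window scores for an SNV.

    The reference comparison score is taken from the reference sequence
    in the same frame as the best alternate k-mer.
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    start, alt_kmer = best_window(alt_seq, pos, k, score)
    ref_kmer = ref_seq[start:start + k]
    alt, ref = score(alt_kmer), score(ref_kmer)
    return {"start": start, "ref_kmer": ref_kmer, "alt_kmer": alt_kmer,
            "ref": ref, "alt": alt, "diff": ref - alt}


def toy_donor_score(kmer: str) -> float:
    """Placeholder scorer (an assumption, NOT the maximum entropy model).

    Rewards a GT dinucleotide at motif positions 4-5, i.e. the first two
    intronic bases of a 9-mer donor motif.
    """
    return 1.0 if kmer[3:5] == "GT" else 0.0


# Hypothetical sequence: a C>T change at index 11 creates a GT dinucleotide.
example_ref = "CCCCCCCCCCGCCCCCCCCCC"
example = swa_snv(example_ref, 11, "T", DONOR_K, toy_donor_score)
```

A negative `diff` (reference minus alternate) indicates that the alternate allele scores higher than the reference in the same frame, which is the pattern the text associates with a possible de novo or cryptic site.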

3 Application examples–analysis of mRNA splicing datasets for cancer gene variants

We applied our plugin to 1116 variants in the BRCA1 and BRCA2 DNA damage repair genes and the MLH1, MSH2, MSH6 and PMS2 mismatch repair genes, which have been previously tested for possible splicing aberrations using mRNA assays. Data were collated from multiple publications and existing databases, summarized in Supplementary Table S1. An overview of the datasets is provided in the Supplementary Material. The utility of this plugin to predict variant spliceogenicity was analyzed based on two conditions: loss of a native splice site, or gain of a de novo splice site. Variants assessed in this analysis include SNVs, insertions and deletions within the native splice sites and other intronic and exonic regions (Fig. 1A).
Fig. 1.

(A) The MaxEntScan plugin provides scores for sequence motifs within the native splice sites and other intronic and exonic regions. The native donor splice site is a 9-mer that overlaps the last three nucleotides of an exon and the first six nucleotides of a downstream intron. The native acceptor splice site is a 23-mer that overlaps the last 20 nucleotides of an intron and the first three nucleotides of a downstream exon. (B) Variants that overlapped the native splice sites were assessed for native splice site loss whilst variants outside of the native splice sites were assessed for gain of a de novo or cryptic splice site. Spliceogenicity was assessed using the reference (ref), alternate (alt) and difference (diff; ref–alt) maximum entropy scores and the ENIGMA score thresholds. SNVs within the native splice site were assessed for splice site loss using the native splice site scores, whilst indels that overlapped the native splice sites were assessed for splice site loss using the MES-SWA function. Variants predicted to diminish splicing (diff > 0) were further classified as having a high (alt < 6.2), moderate (6.2 ≤ alt ≤ 8.5) or low (alt > 8.5) potential of disrupting native splice sites. High and moderate classifications may also be downgraded to moderate and low, respectively (diff < 1.15). Creation of a de novo or cryptic splice site was assessed using the MES-SWA and MES-NCSS functions. Variants predicted to increase splicing (diff < 0) were further classified as having a high (alt > 8.5), moderate (6.2 ≤ alt ≤ 8.5) or low (alt < 6.2) potential of creating a de novo or cryptic splice site. Variants were only classified as having moderate potential if they could be shown to outcompete the nearest native splice site. (C) Sensitivity (pink) and specificity (green) for spliceogenic predictions of 1116 BRCA1/2 and MMR variants made using the MaxEntScan plugin. 
Variants predicted to have a high or moderate potential of native loss or de novo gain were expected to cause splicing aberrations. Spliceogenic predictions were compared to the reported in vitro splicing assays. Sensitivity measures the proportion of variants correctly predicted to cause splicing aberrations, whilst specificity measures the proportion of variants correctly predicted to retain splicing profiles (100% reflects a perfect prediction). The specificity to predict normal splicing across the GT-AG donor and acceptor dinucleotides could not be calculated as only one true negative result was identified in those regions.
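The threshold logic described in panel (B) of the caption can be written out as a short Python sketch. The cut-offs (6.2, 8.5 and 1.15) come from the caption; everything else is an assumption made for illustration: the function names, the `"none"` label for variants that do not meet the diff condition, the reading of "outcompete" as the alternate score exceeding the nearest native site score, and the fallback to `"low"` when a mid-range de novo site does not outcompete it.

```python
LOW_T = 6.2      # lower score threshold from the caption
HIGH_T = 8.5     # upper score threshold from the caption
MIN_DIFF = 1.15  # diff below which a loss classification is downgraded


def classify_native_loss(ref: float, alt: float) -> str:
    """Potential of a variant to disrupt a native splice site."""
    diff = ref - alt
    if diff <= 0:
        return "none"  # assumption: not predicted to diminish splicing
    if alt < LOW_T:
        level = "high"
    elif alt <= HIGH_T:
        level = "moderate"
    else:
        level = "low"
    if diff < MIN_DIFF:
        # Downgrade high -> moderate and moderate -> low; low stays low.
        level = {"high": "moderate", "moderate": "low"}.get(level, level)
    return level


def classify_de_novo_gain(ref: float, alt: float, nearest_native: float) -> str:
    """Potential of a variant to create a de novo or cryptic splice site."""
    diff = ref - alt
    if diff >= 0:
        return "none"  # assumption: not predicted to increase splicing
    if alt > HIGH_T:
        return "high"
    if alt >= LOW_T:
        # Assumption: "outcompete" means scoring above the nearest native site.
        return "moderate" if alt > nearest_native else "low"
    return "low"
```

Here `ref` and `alt` would be the reference and alternate scores for the relevant motif, and `nearest_native` the score of the nearest canonical site; other user-defined thresholds could be substituted for the three constants.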

The current ENIGMA thresholds (https://enigmaconsortium.org) were used to classify variant spliceogenicity depending on variant type and location (Fig. 1B). The sensitivity of this plugin to predict splicing aberrations across different regions varied between 93.3% and 100%, and the overall sensitivity to predict splicing aberrations reached 98.7%, whilst the overall specificity to predict normal splicing reached 96.5% (Fig. 1C; Supplementary Table S2). In summary, the spliceogenicity predictions using the ENIGMA thresholds compared well with the observed in vitro splicing results. Other user-defined thresholds may also be applied to assess variant spliceogenicity beyond the genes assessed here. The MaxEntScan plugin provides a simple and flexible means of assessing variant spliceogenicity, regardless of the location.
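Sensitivity and specificity, as defined in the Fig. 1 caption, can be computed from paired predictions and assay results as in the sketch below. The function name and the input lists are hypothetical; the values are made up for illustration and are not taken from the datasets analysed here.

```python
from typing import List, Optional, Tuple


def sensitivity_specificity(
    predicted: List[bool], observed: List[bool]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (sensitivity, specificity).

    True means "splicing aberration" (predicted or observed in the assay).
    A value is None when it is undefined because its denominator is zero.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    tp = sum(p and o for p, o in zip(predicted, observed))
    tn = sum((not p) and (not o) for p, o in zip(predicted, observed))
    fp = sum(p and (not o) for p, o in zip(predicted, observed))
    fn = sum((not p) and o for p, o in zip(predicted, observed))
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Made-up example: four variants, one of each outcome.
sens, spec = sensitivity_specificity(
    predicted=[True, True, False, False],
    observed=[True, False, False, True],
)
```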

Funding

This work was supported by a QIMR Berghofer PhD scholarship (J.S.). N.W. is supported by a National Health and Medical Research Council of Australia Senior Research Fellowship (APP1139071). A.B.S. is supported by a National Health and Medical Research Council of Australia Senior Research Fellowship (APP1061779). T.A.O.’M. is supported by a National Health and Medical Research Council of Australia Early Career Fellowship (APP1111246). Ensembl receives majority funding from the Wellcome Trust (grant number WT108749/Z/15/Z) with additional funding for specific project components from the National Human Genome Research Institute (U41HG007823 and 2U41HG007234), the Biotechnology and Biological Sciences Research Council (BB/N019563/1 and BB/M011615/1), Open Targets, the Wellcome Trust (WT104947/Z/14/Z, WT200990/Z/16/Z, WT201535/Z/16/Z, WT108749/Z/15/A, WT212925/Z/18/Z), ELIXIR: the research infrastructure for life-science data and the European Molecular Biology Laboratory. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement n° 733161 (MultipleMS).

Conflict of Interest: none declared.

References

1.  Loss of exon identity is a common mechanism of human inherited disease.

Authors:  Timothy Sterne-Weiler; Jonathan Howard; Matthew Mort; David N Cooper; Jeremy R Sanford
Journal:  Genome Res       Date:  2011-07-12       Impact factor: 9.043

2.  Contribution of bioinformatics predictions and functional splicing assays to the interpretation of unclassified variants of the BRCA genes.

Authors:  Jean Christophe Théry; Sophie Krieger; Pascaline Gaildrat; Françoise Révillion; Marie-Pierre Buisine; Audrey Killian; Christiane Duponchel; Antoine Rousselin; Dominique Vaur; Jean-Philippe Peyrat; Pascaline Berthet; Thierry Frébourg; Alexandra Martins; Agnès Hardouin; Mario Tosi
Journal:  Eur J Hum Genet       Date:  2011-06-15       Impact factor: 4.246

3.  In silico, in vitro and case-control analyses as an effective combination for analyzing BRCA1 and BRCA2 unclassified variants in a population-based sample.

Authors:  Marta Rodríguez-Balada; Bàrbara Roig; Lourdes Martorell; Mireia Melé; Mònica Salvat; Elisabet Vilella; Joan Borràs; Josep Gumà
Journal:  Cancer Genet       Date:  2016-09-20

4.  A high proportion of DNA variants of BRCA1 and BRCA2 is associated with aberrant splicing in breast/ovarian cancer patients.

Authors:  David J Sanz; Alberto Acedo; Mar Infante; Mercedes Durán; Lucía Pérez-Cabornero; Eva Esteban-Cardeñosa; Enrique Lastra; Franco Pagani; Cristina Miner; Eladio A Velasco
Journal:  Clin Cancer Res       Date:  2010-03-09       Impact factor: 12.531

5.  Using positional distribution to identify splicing elements and predict pre-mRNA processing defects in human genes.

Authors:  Kian Huat Lim; Luciana Ferraris; Madeleine E Filloux; Benjamin J Raphael; William G Fairbrother
Journal:  Proc Natl Acad Sci U S A       Date:  2011-06-17       Impact factor: 11.205

6.  Detection of splicing aberrations caused by BRCA1 and BRCA2 sequence variants encoding missense substitutions: implications for prediction of pathogenicity.

Authors:  Logan C Walker; Phillip J Whiley; Fergus J Couch; Daniel J Farrugia; Sue Healey; Diana M Eccles; Feng Lin; Samantha A Butler; Sheila A Goff; Bryony A Thompson; Sunil R Lakhani; Leonard M Da Silva; Sean V Tavtigian; David E Goldgar; Melissa A Brown; Amanda B Spurdle
Journal:  Hum Mutat       Date:  2010-06       Impact factor: 4.878

7.  Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals.

Authors:  Gene Yeo; Christopher B Burge
Journal:  J Comput Biol       Date:  2004       Impact factor: 1.479

8.  Nonclassical splicing mutations in the coding and noncoding regions of the ATM Gene: maximum entropy estimates of splice junction strengths.

Authors:  Laura Eng; Gabriela Coutinho; Shareef Nahas; Gene Yeo; Robert Tanouye; Mahnoush Babaei; Thilo Dörk; Christopher Burge; Richard A Gatti
Journal:  Hum Mutat       Date:  2004-01       Impact factor: 4.878

9.  Adding In Silico Assessment of Potential Splice Aberration to the Integrated Evaluation of BRCA Gene Unclassified Variants.

Authors:  Maxime P Vallée; Tonya L Di Sera; David A Nix; Andrew M Paquette; Michael T Parsons; Russel Bell; Andrea Hoffman; Frans B L Hogervorst; David E Goldgar; Amanda B Spurdle; Sean V Tavtigian
Journal:  Hum Mutat       Date:  2016-04-15       Impact factor: 4.878

10.  The Ensembl Variant Effect Predictor.

Authors:  William McLaren; Laurent Gil; Sarah E Hunt; Harpreet Singh Riat; Graham R S Ritchie; Anja Thormann; Paul Flicek; Fiona Cunningham
Journal:  Genome Biol       Date:  2016-06-06       Impact factor: 13.583
