
eSkip-Finder: a machine learning-based web application and database to identify the optimal sequences of antisense oligonucleotides for exon skipping.

Shuntaro Chiba1, Kenji Rowel Q Lim2, Narin Sheri2, Saeed Anwar2, Esra Erkut2, Md Nur Ahad Shah2, Tejal Aslesh2, Stanley Woo2, Omar Sheikh2, Rika Maruyama2, Hiroaki Takano1, Katsuhiko Kunitake3, William Duddy4, Yasushi Okuno1,5, Yoshitsugu Aoki3, Toshifumi Yokota2.   

Abstract

Exon skipping using antisense oligonucleotides (ASOs) has recently proven to be a powerful tool for mRNA splicing modulation. Several exon-skipping ASOs have been approved to treat genetic diseases worldwide. However, a significant challenge is the difficulty in selecting an optimal sequence for exon skipping. The efficacy of ASOs is often unpredictable, because of the numerous factors involved in exon skipping. To address this gap, we have developed a computational method using machine-learning algorithms that factors in many parameters as well as experimental data to design highly effective ASOs for exon skipping. eSkip-Finder (https://eskip-finder.org) is the first web-based resource for helping researchers identify effective exon skipping ASOs. eSkip-Finder features two sections: (i) a predictor of the exon skipping efficacy of novel ASOs and (ii) a database of exon skipping ASOs. The predictor facilitates rapid analysis of a given set of exon/intron sequences and ASO lengths to identify effective ASOs for exon skipping based on a machine learning model trained by experimental data. We confirmed that predictions correlated well with in vitro skipping efficacy of sequences that were not included in the training data. The database enables users to search for ASOs using queries such as gene name, species, and exon number.
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2021        PMID: 34104972      PMCID: PMC8265194          DOI: 10.1093/nar/gkab442

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Exon skipping is a strategy that uses antisense oligonucleotides (ASOs) to exclude specific exons from the mature mRNA transcript of a given gene. ASOs are short nucleic acid analogs of diverse chemistry that recognize target mRNA sequences by base pairing. Once hybridized to their targets, ASOs act as steric blockers that prevent splicing factors and other critical proteins from accessing these sequences (1). Through this mechanism, ASOs can be designed to modulate splicing, for example, by targeting exonic splice enhancer sequences. Given its simplicity and versatility, exon skipping has become a promising treatment for various genetic disorders, particularly muscular dystrophies such as Duchenne muscular dystrophy (DMD) (1–3). Most cases of DMD are caused by large, out-of-frame deletions in the DMD gene, leading to an absence of the sarcolemma-stabilizing dystrophin protein in muscle cells (4–6). Exon skipping was adapted to make out-of-frame DMD mutations in-frame by removing incompatible exons from the final transcript. In this manner, exon skipping facilitates the production of a shorter but partially functional dystrophin protein in muscle, ameliorating DMD pathology. Recent years have seen the approval of four exon-skipping ASOs for DMD therapy by the U.S. Food and Drug Administration (FDA): eteplirsen (2016, Sarepta), golodirsen (2019, Sarepta), viltolarsen (2020, NS and NS Pharma), and casimersen (2021, Sarepta) (7–9). In addition, the FDA approved the first n-of-1 clinical trial with an exon-skipping ASO named milasen to treat a single patient with Batten's disease in 2018 (10). While these approvals support the outlook of exon skipping as a viable therapeutic strategy for genetic diseases, much remains to be improved, especially regarding efficacy.
For instance, eteplirsen restored dystrophin to at most about 1% of healthy levels after 180 weeks of treatment in DMD patients (7). Previous studies from our group demonstrate the utility of in silico methods to design more effective ASOs (11–14). In one study, we developed an ASO with 12-fold higher in vitro exon skipping efficacy than eteplirsen using an in silico predictive tool based on statistical modelling (12). Such work and others have since uncovered numerous factors that could influence the exon skipping efficacy of an ASO, including length, proximity to splice sites, target mRNA secondary structure, chemistry, and binding energy, among others (13,15–19), all of which are useful considerations in ASO design. However, previously developed online tools lack the capacity to simultaneously integrate many parameters critical to ASO design. To address this gap, we previously developed a computational method using a mathematical model based on 60 descriptor candidates as well as experimental data to design highly effective ASOs for exon skipping (13). Here, we improved this framework further using machine-learning algorithms and have developed eSkip-Finder, a web server to aid the design of effective ASOs for exon skipping. An overview of the web server is presented in Figure 1. One part of eSkip-Finder is a first-of-its-kind comprehensive database of exon skipping ASOs for DMD and other genes. This database was populated using published scientific literature and patents as sources, and contains information such as ASO chemistry, ASO sequence, and experimentally obtained skipping efficacies. The second part is a first-of-its-kind machine learning-based application to predict highly effective ASO sequences for exon skipping, based on a training set of 566 skipping values from 209 unique ASOs extracted from the database above.
Here, we describe the features of eSkip-Finder in-depth and outline the ways by which it can be used for the design of exon skipping ASOs.
Figure 1.

Overview of eSkip-Finder.


RESULTS

Construction of database

A database of exon-skipping ASOs and their skipping efficacy was built by manually collecting and curating research papers and patents written in English. The database compiles data on exon-skipping ASOs for various genes, including their sequence, target exon, chemistry, literature information, and experimental information such as the ASO concentration, the cell type used for testing, and the target species. The database statistics as of 15 April 2021 are shown in Supplementary Table S1. The complete dataset extracted for each ASO in the database is provided on the web server.

Predictive model of exon-skipping efficacy

We extracted skipping data from the database to prepare our training and test datasets, keeping only records that met the following criteria: (i) the absolute skipping efficacy was given as a numerical value; (ii) the ASO concentration used in the experiment was given; (iii) rhabdomyosarcoma (RD) cells were used in the experiment, to normalize experimental conditions; (iv) the skipping efficacy was not given as an EC50 value; and (v) the ASO targeted a single contiguous sequence (that is, it was not dual-targeting) in the dystrophin pre-mRNA. After filtering the database, 426 skipping values from 109 unique ASO sequences and 228 skipping values from 124 unique ASO sequences were obtained for phosphorodiamidate morpholino oligomer (PMO) and 2′-O-methyl oligonucleotide (2OMe) ASOs, respectively. Predictive models were built separately for PMO and 2OMe. We split the filtered data into a training set (90%) and a test set (10%), as shown in Supplementary Table S2, under two conditions: the training and test sets reproduced a similar distribution of skipping efficacy, and they did not share identical sequences, as shown in Supplementary Figure S1. We built a predictive model for the relative skipping efficacy of a target exon of dystrophin mRNA using the support vector regressor (SVR) implemented in scikit-learn version 0.23.2 (20). First, 32 features, tabulated in Table 1 and Supplementary Table S3, were prepared by feature engineering of the ASO and/or its target exon sequences; examples include the predicted binding score between the ASO and its target exon (21), the predicted local RNA structure at the target site (22), and the GC content of the ASO and target exon. We also included the ASO concentration used in experimental studies as a feature. More details on the features used are provided elsewhere (13). Each feature was standardized before fitting the model.
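The splitting and fitting steps described above can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the RBF kernel and the hyperparameter values are assumptions, and only the "no shared sequences" condition is enforced (matching the efficacy distribution between sets is not attempted here).

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in data: 20 hypothetical ASO sequences, each measured at
# 3 concentrations. Real features would be those listed in Table 1.
n_seqs, n_conc = 20, 3
seq_ids = np.repeat(np.arange(n_seqs), n_conc)      # one group label per row
conc = np.tile([1.0, 3.0, 10.0], n_seqs)            # ASO concentration (uM)
seq_feat = rng.normal(size=(n_seqs, 4))[seq_ids]    # per-sequence features
X = np.column_stack([conc, seq_feat])
y = 10 * np.log10(conc) + 5 * seq_feat[:, 0] + rng.normal(size=len(conc))

# Hold out about 10% of the sequences (not rows), so that the training and
# test sets never share an identical sequence.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.1, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=seq_ids))

# Standardize every feature, then fit a support vector regressor.
# Kernel and hyperparameters are assumed values, not those of the paper.
model = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=10.0, gamma="scale", epsilon=0.1),
)
model.fit(X[train_idx], y[train_idx])

test_r2 = model.score(X[test_idx], y[test_idx])
shared = set(seq_ids[train_idx]) & set(seq_ids[test_idx])
print(f"test R^2 = {test_r2:.2f}, shared sequences = {len(shared)}")
```

Putting the scaler inside the pipeline means the standardization statistics are learned from the training rows only, which avoids leaking test-set information into the fit.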
To select a small set of important features, we built SVR models for all possible combinations of fewer than seven features, always including the experimental ASO concentration as one of the selected features. The upper limit of six features was chosen according to the available computational resources. For each model, the hyperparameters C, gamma, and epsilon were optimized by grid search over 100 repeated splits of the training data into 80% used to build a model and 20% used to validate it, under the condition that the two parts did not share identical sequences. Finally, we selected the SVR model that yielded the highest average R² on the validation sets, as shown in Supplementary Figure S2; its features are given in Table 1. The selected models for PMO and 2OMe were applied to the test set, yielding R² values of 0.6 and 0.7, respectively, as shown in Figure 2. The correlation between experimental and predicted skipping efficacy was confirmed for various concentrations. The contribution of each feature to predictive performance (feature importance) was estimated by permutation importance (23). The importance of each feature was defined as the decrease in the R² value when that feature was randomly permuted in the test set. The feature importance calculation was repeated 100 times, and the averaged values are shown in Table 1. The current model focuses on predicting the relative skipping efficacy of ASOs. However, other parameters should also be considered when designing ASOs, one of which is the off-target effect. Other bioinformatics tools such as SKIP-E (https://skip-e.geneticsandbioinformatics.eu/) could complement the model in this respect.
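The exhaustive feature-subset search with grid-searched hyperparameters might look roughly like the following sketch. Everything here is illustrative: the data are synthetic, the candidate pool has 4 features rather than 32, the subset size is capped at 3 rather than 6, the splits are repeated 5 times rather than 100, and the grid values are assumptions.

```python
from itertools import combinations

import numpy as np
from sklearn.model_selection import GridSearchCV, GroupShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)

# Synthetic stand-in: column 0 is the ASO concentration, columns 1-4 are
# hypothetical candidate features.
n_seqs, n_conc = 15, 3
groups = np.repeat(np.arange(n_seqs), n_conc)       # sequence id per row
conc = np.tile([1.0, 3.0, 10.0], n_seqs)
cand = rng.normal(size=(n_seqs, 4))[groups]
X = np.column_stack([conc, cand])
y = 8 * np.log10(conc) + 6 * cand[:, 1] + rng.normal(scale=0.5, size=len(conc))

MAX_FEATURES = 3   # reduced from the paper's 6 to keep the sketch fast
N_REPEATS = 5      # reduced from the paper's 100 repeated splits
param_grid = {
    "svr__C": [1, 10],
    "svr__gamma": [0.1, 1.0],
    "svr__epsilon": [0.1, 1.0],
}
# Repeated 80/20 splits that keep all rows of a sequence on the same side.
cv = GroupShuffleSplit(n_splits=N_REPEATS, test_size=0.2, random_state=0)

best_r2, best_cols, best_params = -np.inf, None, None
for n_extra in range(MAX_FEATURES):                 # features besides concentration
    for extra in combinations(range(1, X.shape[1]), n_extra):
        cols = (0,) + extra                         # concentration always included
        search = GridSearchCV(
            make_pipeline(StandardScaler(), SVR()),
            param_grid,
            scoring="r2",
            cv=cv,
        )
        search.fit(X[:, cols], y, groups=groups)
        # best_score_ is the mean validation R^2 over the repeated splits.
        if search.best_score_ > best_r2:
            best_r2, best_cols, best_params = search.best_score_, cols, search.best_params_

print(best_cols, best_params, round(best_r2, 2))
```

The number of subsets grows combinatorially with the size of the candidate pool, which is consistent with the authors' remark that the upper limit was set by available computational resources.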
Table 1.

Selected features

Selected features for PMO
Name | Description | FI (a)
ASO concentration (b) | Concentration of oligomer used in the experiment | 0.64 ± 0.14
Exon v intron %GC after blocking by oligo | %GC in exon when blocked by oligo / %GC 5′ intron 200 bases upstream | 0.68 ± 0.15
dG (50BaseFlanksAroundTarget) | Predicted binding energy (21) of ASO to the target sequence plus 50-base flanks (13) | 0.66 ± 0.16
ACC_LAST15 | Predicted accessibility scores (22) of the 3′ end of the target (last 15 bases) | 0.32 ± 0.09
niscore_per_base | Cumulative NI score (24) divided by the number of exon bases | 0.18 ± 0.09
ACC_LAST8 | Predicted accessibility scores of the 3′ end of the target (last 8 bases) | 0.12 ± 0.07

Selected features for 2OMe
Name | Description | FI (a)
ASO concentration (b) | Concentration of oligomer used in the experiment | 0.11 ± 0.05
GCs (number of) | Total GCs in ASO sequence | 0.67 ± 0.20
ACP | Distance in bases from the splice acceptor site to the center of the target site (17) | 0.49 ± 0.21
%GC of exon when blocked by oligo | Total remaining %GCs of target exon sequence when blocked by ASOs | 0.46 ± 0.11

(a) The feature importance (FI) was calculated by the permutation importance method (23).

(b) The ASO concentration used in the experiment is always included as one of the features of the predictive model.
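As a rough illustration of how a feature importance of this kind can be obtained, the sketch below uses scikit-learn's `permutation_importance` with R² scoring and 100 repeats on held-out data. The model and data are synthetic stand-ins (one informative feature, one noise feature), not the eSkip-Finder models.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(2)

# Synthetic stand-in: feature 0 drives the response, feature 1 is pure noise.
X = rng.normal(size=(200, 2))
y = 5 * X[:, 0] + rng.normal(scale=0.5, size=200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

model = make_pipeline(StandardScaler(), SVR(C=10.0)).fit(X_train, y_train)

# Importance = drop in test-set R^2 when one feature column is shuffled,
# averaged over 100 random permutations.
result = permutation_importance(
    model, X_test, y_test, scoring="r2", n_repeats=100, random_state=0
)
for name, mean, std in zip(
    ["informative", "noise"], result.importances_mean, result.importances_std
):
    print(f"{name}: {mean:.2f} ± {std:.2f}")
```

Because the score drop is measured relative to the unpermuted model, values for different features need not sum to one.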

Figure 2.

Predictive performance of SVR models for PMO and 2OMe. Symbols represent oligomer concentration (c) given in μM used in the experiment. The coefficient of determination, R², was calculated by linear regression (black lines).
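For readers comparing this R² with other definitions, the following sketch shows how an R² from a fitted regression line can be computed, and that it equals the squared Pearson correlation. The numbers are hypothetical and serve only to illustrate the arithmetic; they are not data from the paper.

```python
import numpy as np

# Hypothetical experimental vs. predicted skipping efficacies (%).
experimental = np.array([5.0, 12.0, 20.0, 33.0, 41.0, 58.0])
predicted = np.array([10.0, 15.0, 18.0, 30.0, 35.0, 45.0])

# Least-squares line: predicted ~ slope * experimental + intercept.
slope, intercept = np.polyfit(experimental, predicted, deg=1)
fitted = slope * experimental + intercept

# R^2 of the fitted line = 1 - SS_res / SS_tot.
ss_res = np.sum((predicted - fitted) ** 2)
ss_tot = np.sum((predicted - predicted.mean()) ** 2)
r2_line = 1.0 - ss_res / ss_tot

# For simple linear regression this equals the squared Pearson correlation.
r2_pearson = np.corrcoef(experimental, predicted)[0, 1] ** 2

# R^2 against the identity line (predicted == experimental) can never exceed
# r2_pearson, because it also penalizes systematic offset and scale errors.
r2_identity = 1.0 - np.sum((experimental - predicted) ** 2) / np.sum(
    (experimental - experimental.mean()) ** 2
)

print(r2_line, r2_pearson, r2_identity)
```

An R² obtained from a regression line therefore measures how well predictions track experiments linearly, which suits a model that targets relative rather than absolute efficacy.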


Implementation

The selected predictive models (Figure 2 and Table 1) are implemented on the web server with scikit-learn (20). Features of local accessibility scores of target exon sequences and binding scores between ASOs and their target exons were calculated with the ViennaRNA Package (22) and RNAstructure (21). The dictionary of NI scores was retrieved from Ref. (24). The concentrations of ASOs were set to typical values, that is, 3 μM for PMO and 0.1 μM for 2OMe. The database was built using PostgreSQL.
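The structure-based and energy-based features require the external packages named above, but the simpler sequence-derived features in Table 1 can be sketched directly. The definitions below are one plausible reading of the table descriptions, not the authors' implementation: positions are assumed to be 0-based offsets from the first exon base, and the toy exon sequence is invented.

```python
def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (the ASO base order)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))


def gc_count(seq: str) -> int:
    """Number of G and C bases (cf. 'GCs (number of)' in Table 1)."""
    return sum(base in "GC" for base in seq)


def acp(start: int, length: int) -> float:
    """Distance from the splice acceptor site to the target-site center.

    Assumes the acceptor site sits immediately before exon position 0.
    """
    return start + length / 2


def gc_percent_unblocked(exon: str, start: int, length: int) -> float:
    """%GC of the exon bases not covered by the ASO (assumed interpretation)."""
    remaining = exon[:start] + exon[start + length:]
    return 100.0 * gc_count(remaining) / len(remaining)


# Toy 30-base exon and a 10-mer target site starting at offset 5.
exon = "GGCAUUCGAUCGGAUACCGUAGCUAGGCUA"
start, length = 5, 10
target = exon[start:start + length]
aso = reverse_complement(target)

print(aso, gc_count(aso), acp(start, length), gc_percent_unblocked(exon, start, length))
```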

Case study

Database search

The web server provides an intuitive interface for searching information on exon skipping efficacy, using queries such as gene name, species, and exon number.
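A search of this kind can be pictured as a parameterized query over a table of ASO records. The sketch below is purely illustrative: the real schema of eSkip-Finder is not described in the article, so the table name, column names, and rows are hypothetical placeholders, and an in-memory SQLite database stands in for PostgreSQL to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE aso_record (
        id INTEGER PRIMARY KEY,
        gene TEXT,
        species TEXT,
        exon INTEGER,
        chemistry TEXT,
        concentration_um REAL,
        efficacy_pct REAL
    )
    """
)

# Placeholder rows; the efficacy values are invented, not database entries.
placeholder_rows = [
    ("DMD", "human", 44, "PMO", 1.0, 10.0),
    ("DMD", "human", 44, "2OMe", 0.1, 20.0),
    ("DMD", "mouse", 23, "PMO", 1.0, 30.0),
    ("COL7A1", "human", 73, "PMO", 1.0, 40.0),
]
conn.executemany(
    "INSERT INTO aso_record "
    "(gene, species, exon, chemistry, concentration_um, efficacy_pct) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    placeholder_rows,
)


def search(conn, gene=None, species=None, exon=None):
    """Filter records by any combination of gene, species, and exon number."""
    clauses, params = [], []
    for column, value in (("gene", gene), ("species", species), ("exon", exon)):
        if value is not None:
            clauses.append(f"{column} = ?")   # column names come from a fixed list
            params.append(value)              # values are bound, never interpolated
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = (
        "SELECT gene, species, exon, chemistry, efficacy_pct "
        f"FROM aso_record{where} ORDER BY efficacy_pct DESC"
    )
    return conn.execute(sql, params).fetchall()


hits = search(conn, gene="DMD", species="human", exon=44)
print(hits)
```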

Prediction of the efficacy of exon-skipping ASOs

The web server provides a prediction of the relative exon-skipping efficacy of a target exon specified by a user as shown in Figure 3 under the following conditions: 3 μM of PMO or 0.1 μM of 2OMe introduced into cultured cells.
Figure 3.

Case study on predicting skipping ASOs for exon 44 of the dystrophin pre-mRNA. (A) Input image of the predictive model. A user specifies the length of the ASO and its chemistry (PMO or 2OMe). In addition to the target exon sequence, the upstream (200 bases) and downstream (200 bases) intron sequences of the target exon are required, as they are used to calculate features. (B) Output image. The relative exon-skipping efficacy is predicted by scanning the target exon sequence with a window whose size is the ASO length specified by the user. Moving averages over 15 bases are plotted with a dashed line. (C) Efficacy of dystrophin exon 44 skipping observed under identical experimental conditions (cell type used = healthy primary human myotubes, ASO chemistry = PMO, ASO length = 30, ASO concentration = 0.5 μM), as previously reported (15); these data were not included in the training dataset. The R² between predicted and experimental skipping efficacies was 0.7, as shown in Supplementary Figure S3.

In this case study, we targeted exon 44 of the dystrophin pre-mRNA using a single ASO, the chemistry and length of which were PMO and 30-mer, respectively. A user needs to input 200 bases of upstream and 200 bases of downstream intron sequence in addition to the target exon sequence, as this sequence information is required to calculate the features. The prediction took 79 s. We obtained the promising regions for exon 44 skipping, that is, the regions between 10–20 and 50–80. We found that these regions were indeed included in experimentally observed effective ASOs (15).

As a validation of predicting exon skipping efficiency beyond DMD, we present a test case of PMO-mediated exon 73 skipping of collagen type VII alpha 1 chain (COL7A1) (Supplementary Table S4) (25). Although the experimental conditions (e.g. ASO concentration) were different, we found that predicted and experimental values correlated well with each other, and the model correctly ranked the efficacy of the three PMOs, indicating a potential predictive ability of the current model for other genes. Currently, the amount of available experimental data on exon skipping for other genes is limited. To examine the applicability of our model to other genes, we plan to validate it further with various genes when sufficient data become available. We expect that adding various genes and oligo chemistries to the database will help expand the applicability of the predictive model further.
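The scanning step used in this case study (one candidate ASO per window position, followed by a 15-base moving average) can be sketched as follows. The scoring function is a deliberately trivial placeholder (GC fraction) standing in for the trained SVR, the exon is a toy sequence, and the handling of the curve edges is an assumption, since the article does not specify it.

```python
import numpy as np


def scan_exon(exon, aso_len, score_fn):
    """Score every window of length aso_len along the exon."""
    starts = np.arange(len(exon) - aso_len + 1)
    scores = np.array([score_fn(exon[s:s + aso_len]) for s in starts])
    return starts, scores


def moving_average(values, width=15):
    """Unweighted moving average; 'valid' mode drops the incomplete edges."""
    return np.convolve(values, np.ones(width) / width, mode="valid")


def toy_score(target_site):
    """Placeholder scorer (GC fraction); NOT the eSkip-Finder model."""
    return sum(base in "GC" for base in target_site) / len(target_site)


exon = "GCAU" * 25                      # toy 100-base exon
starts, scores = scan_exon(exon, aso_len=30, score_fn=toy_score)
smoothed = moving_average(scores, width=15)

best_start = int(starts[np.argmax(scores)])
print(len(starts), len(smoothed), best_start)
```

In practice `score_fn` would compute the selected features for each window and pass them to the fitted model; the smoothing then highlights regions where neighboring windows score consistently well, rather than single isolated peaks.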

DATA AVAILABILITY

The authors confirm that the data supporting the findings of this study are available within the article and/or in the supplementary material.
REFERENCES

Review 1.  Invention and Early History of Exon Skipping and Splice Modulation.

Authors:  Kenji Rowel Q Lim; Toshifumi Yokota
Journal:  Methods Mol Biol       Date:  2018

2.  Exonic sequences provide better targets for antisense oligonucleotides than splice site sequences in the modulation of Duchenne muscular dystrophy splicing.

Authors:  Annemieke Aartsma-Rus; Hellen Houlleberghs; Judith C T van Deutekom; Gert-Jan B van Ommen; Peter A C 't Hoen
Journal:  Oligonucleotides       Date:  2010-04

Review 3.  Viltolarsen for the treatment of Duchenne muscular dystrophy.

Authors:  R R Roshmi; T Yokota
Journal:  Drugs Today (Barc)       Date:  2019-10       Impact factor: 2.245

4.  Patient-Customized Oligonucleotide Therapy for a Rare Genetic Disease.

Authors:  Jinkuk Kim; Chunguang Hu; Christelle Moufawad El Achkar; Lauren E Black; Julie Douville; Austin Larson; Mary K Pendergast; Sara F Goldkind; Eunjung A Lee; Ashley Kuniholm; Aubrie Soucy; Jai Vaze; Nandkishore R Belur; Kristina Fredriksen; Iva Stojkovska; Alla Tsytsykova; Myriam Armant; Renata L DiDonato; Jaejoon Choi; Laura Cornelissen; Luis M Pereira; Erika F Augustine; Casie A Genetti; Kira Dies; Brenda Barton; Lucinda Williams; Benjamin D Goodlett; Bobbie L Riley; Amy Pasternak; Emily R Berry; Kelly A Pflock; Stephen Chu; Chantal Reed; Kimberly Tyndall; Pankaj B Agrawal; Alan H Beggs; P Ellen Grant; David K Urion; Richard O Snyder; Susan E Waisbren; Annapurna Poduri; Peter J Park; Al Patterson; Alessandra Biffi; Joseph R Mazzulli; Olaf Bodamer; Charles B Berde; Timothy W Yu
Journal:  N Engl J Med       Date:  2019-10-09       Impact factor: 91.245

5.  The influence of antisense oligonucleotide length on dystrophin exon skipping.

Authors:  P L Harding; A M Fall; K Honeyman; S Fletcher; S D Wilton
Journal:  Mol Ther       Date:  2007-01       Impact factor: 11.454

6.  Inference of splicing regulatory activities by sequence neighborhood analysis.

Authors:  Michael B Stadler; Noam Shomron; Gene W Yeo; Aniket Schneider; Xinshu Xiao; Christopher B Burge
Journal:  PLoS Genet       Date:  2006-09-28       Impact factor: 5.917

7.  The TREAT-NMD DMD Global Database: analysis of more than 7,000 Duchenne muscular dystrophy mutations.

Authors:  Catherine L Bladen; David Salgado; Soledad Monges; Maria E Foncuberta; Kyriaki Kekou; Konstantina Kosma; Hugh Dawkins; Leanne Lamont; Anna J Roy; Teodora Chamova; Velina Guergueltcheva; Sophelia Chan; Lawrence Korngut; Craig Campbell; Yi Dai; Jen Wang; Nina Barišić; Petr Brabec; Jaana Lahdetie; Maggie C Walter; Olivia Schreiber-Katz; Veronika Karcagi; Marta Garami; Venkatarman Viswanathan; Farhad Bayat; Filippo Buccella; En Kimura; Zaïda Koeks; Janneke C van den Bergen; Miriam Rodrigues; Richard Roxburgh; Anna Lusakowska; Anna Kostera-Pruszczyk; Janusz Zimowski; Rosário Santos; Elena Neagu; Svetlana Artemieva; Vedrana Milic Rasic; Dina Vojinovic; Manuel Posada; Clemens Bloetzer; Pierre-Yves Jeannet; Franziska Joncourt; Jordi Díaz-Manera; Eduard Gallardo; A Ayşe Karaduman; Haluk Topaloğlu; Rasha El Sherif; Angela Stringer; Andriy V Shatillo; Ann S Martin; Holly L Peay; Matthew I Bellgard; Jan Kirschner; Kevin M Flanigan; Volker Straub; Kate Bushby; Jan Verschuuren; Annemieke Aartsma-Rus; Christophe Béroud; Hanns Lochmüller
Journal:  Hum Mutat       Date:  2015-03-17       Impact factor: 4.878

8.  Identification of Novel Antisense-Mediated Exon Skipping Targets in DYSF for Therapeutic Treatment of Dysferlinopathy.

Authors:  Joshua J A Lee; Rika Maruyama; William Duddy; Hidetoshi Sakurai; Toshifumi Yokota
Journal:  Mol Ther Nucleic Acids       Date:  2018-10-11       Impact factor: 8.886

9.  Nonsequential Splicing Events Alter Antisense-Mediated Exon Skipping Outcome in COL7A1.

Authors:  Kristin A Ham; May Thandar Aung-Htut; Sue Fletcher; Steve D Wilton
Journal:  Int J Mol Sci       Date:  2020-10-18       Impact factor: 5.923
