Jasper Zuallaert (1,2), Fréderic Godin (2), Mijung Kim (1,2), Arne Soete (3,4), Yvan Saeys (4,5), Wesley De Neve (1,2)
1. Center for Biotech Data Science, Department of Environmental Technology, Food Technology and Molecular Biotechnology, Ghent University Global Campus, Songdo, Incheon, South Korea.
2. IDLab, Department for Electronics and Information Systems, Ghent University, Ghent, Belgium.
3. Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium.
4. Data Mining and Modeling for Biomedicine, VIB Inflammation Research Center, Ghent, Belgium.
5. Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.
Abstract
Motivation: During the last decade, improvements in high-throughput sequencing have generated a wealth of genomic data. Functionally interpreting these sequences and finding the biological signals that are hallmarks of gene function and regulation is currently done mostly with automated genome annotation platforms, which rely mainly on integrated machine learning frameworks to identify different functional sites of interest, including splice sites. Splicing is an essential step in the gene regulation process, and the correct identification of splice sites is a major cornerstone of a genome annotation system.
Results: In this paper, we present SpliceRover, a predictive deep learning approach that outperforms the state of the art in splice site prediction. SpliceRover uses convolutional neural networks (CNNs), which have been shown to obtain cutting-edge performance on a wide variety of prediction tasks. We adapted this approach to deal with genomic sequence inputs, and show that it consistently outperforms existing approaches, with relative improvements in prediction effectiveness of up to 80.9% when measured in terms of false discovery rate. However, a major criticism of CNNs concerns their 'black box' nature, as mechanisms to obtain insight into their reasoning processes are limited. To facilitate interpretability of the SpliceRover models, we introduce an approach to visualize the biologically relevant information learnt. We show that our visualization approach is able to recover features known to be important for splice site prediction (binding motifs around the splice site, presence of polypyrimidine tracts and branch points), as well as reveal new features (e.g. several types of exclusion patterns near splice sites).
Availability and implementation: SpliceRover is available as a web service. The prediction tool and instructions can be found at http://bioit2.irc.ugent.be/splicerover/.
Supplementary information: Supplementary data are available at Bioinformatics online.
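The abstract reports gains as a relative improvement in false discovery rate (FDR). As a minimal sketch of how such a figure can be computed, assuming the usual definition FDR = FP / (FP + TP) and a relative improvement taken against the baseline FDR, the snippet below may help. The function names and the counts are hypothetical and are not taken from the paper, whose exact evaluation protocol is not given in this abstract.

```python
def false_discovery_rate(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted splice sites that are not real: FP / (FP + TP)."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        raise ValueError("FDR is undefined when nothing is predicted positive")
    return false_positives / predicted_positives


def relative_improvement(baseline_fdr: float, new_fdr: float) -> float:
    """Relative reduction in FDR with respect to the baseline (assumed definition)."""
    if baseline_fdr <= 0:
        raise ValueError("baseline FDR must be positive")
    return (baseline_fdr - new_fdr) / baseline_fdr


# Hypothetical confusion-matrix counts, for illustration only.
baseline = false_discovery_rate(true_positives=900, false_positives=100)
improved = false_discovery_rate(true_positives=900, false_positives=25)
print(f"baseline FDR: {baseline:.3f}")
print(f"improved FDR: {improved:.3f}")
print(f"relative improvement: {relative_improvement(baseline, improved):.1%}")
```

Under this definition, a lower FDR is better, so a positive relative improvement means fewer false splice sites among the predictions.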
Authors: Seyedeh Neelufar Payrovnaziri; Zhaoyi Chen; Pablo Rengifo-Moreno; Tim Miller; Jiang Bian; Jonathan H Chen; Xiuwen Liu; Zhe He Journal: J Am Med Inform Assoc Date: 2020-07-01 Impact factor: 4.497
Authors: Junwon Lee; Han Jeong; Dongju Won; Saeam Shin; Seung-Tae Lee; Jong Rak Choi; Suk Ho Byeon; Helen J Kuht; Mervyn G Thomas; Jinu Han Journal: Transl Vis Sci Technol Date: 2022-06-01 Impact factor: 3.048
Authors: Somayah Albaradei; Arturo Magana-Mora; Maha Thafar; Mahmut Uludag; Vladimir B Bajic; Takashi Gojobori; Magbubah Essack; Boris R Jankovic Journal: Gene X Date: 2020-05-13