Ashraful Arefeen1, Xinshu Xiao2, Tao Jiang1,3,4. 1. Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA. 2. Department of Integrative Biology and Physiology, University of California, Los Angeles, CA 90095, USA. 3. Institute of Integrative Genome Biology, University of California, Riverside, CA 92521, USA. 4. Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
Abstract
MOTIVATION: Alternative polyadenylation (polyA) sites near the 3' end of a pre-mRNA create multiple mRNA transcripts with different 3' untranslated regions (3' UTRs). The sequence elements of a 3' UTR are essential for many biological activities such as mRNA stability, sub-cellular localization, protein translation, protein binding and translation efficiency. Moreover, numerous studies in the literature have reported the correlation between diseases and the shortening (or lengthening) of 3' UTRs. As alternative polyA sites are common in mammalian genes, several machine learning tools have been published for predicting polyA sites from sequence data. These tools either consider limited sequence features or use relatively old algorithms for polyA site prediction. Moreover, none of the previous tools consider RNA secondary structures as a feature to predict polyA sites. RESULTS: In this paper, we propose a new deep learning model, called DeepPASTA, for predicting polyA sites from both sequence and RNA secondary structure data. The model is then extended to predict tissue-specific polyA sites. Moreover, the tool can predict the most dominant (i.e. frequently used) polyA site of a gene in a specific tissue and relative dominance when two polyA sites of the same gene are given. Our extensive experiments demonstrate that DeepPASTA significantly outperforms the existing tools for polyA site prediction and tissue-specific relative and absolute dominant polyA site prediction. AVAILABILITY AND IMPLEMENTATION: https://github.com/arefeen/DeepPASTA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.