Ashraful Arefeen (1), Juntao Liu (2), Xinshu Xiao (3), Tao Jiang (1, 4, 5)
1. Department of Computer Science and Engineering, University of California, Riverside, CA, USA.
2. School of Mathematics, Shandong University, Jinan, Shandong, China.
3. Department of Integrative Biology and Physiology, University of California, Los Angeles, CA, USA.
4. Institute of Integrative Genome Biology, University of California, Riverside, CA, USA.
5. MOE Key Lab of Bioinformatics and Bioinformatics Division, TNLIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China.
Abstract
Motivation: The length of the 3' untranslated region (3' UTR) of an mRNA is essential for many biological activities, such as mRNA stability, sub-cellular localization, protein translation, protein binding and translation efficiency. Moreover, correlations between diseases and the shortening (or lengthening) of 3' UTRs have been reported in the literature. This length is largely determined by the polyadenylation cleavage site in the mRNA. Because alternative polyadenylation (APA) sites are common in mammalian genes, several tools have recently been published for detecting APA sites from RNA-Seq data or for performing shortening/lengthening analysis. These tools consider either at most two APA sites per gene or only APA sites in the last exon of a gene. However, a gene may have more than two APA sites, and an APA site may occasionally occur before the last exon. Furthermore, these tools do not integrate shortening/lengthening analysis with APA site detection.

Results: We propose a new tool, TAPAS, for detecting novel APA sites from RNA-Seq data. It can handle more than two APA sites in a gene as well as APA sites that occur before the last exon. The tool is based on an existing method for finding change points in time-series data. It adds filtering steps that remove change points that are likely to be false APA sites. It is further extended to identify APA sites that are differentially expressed between two biological samples, and genes whose 3' UTRs undergo shortening/lengthening events. Our extensive experiments on simulated and real RNA-Seq data demonstrate that TAPAS significantly outperforms existing tools for APA site detection or shortening/lengthening analysis.

Availability and implementation: https://github.com/arefeen/TAPAS.

Supplementary information: Supplementary data are available at Bioinformatics online.
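The abstract says that TAPAS builds on an existing change-point method for time-series data and then filters out change points that are unlikely to be real APA sites. The underlying intuition is that each APA site terminates a subset of transcripts, so read coverage along the 3' end of a gene tends to step down at that site. The specific change-point method is not named here, so the sketch below substitutes a generic technique: binary segmentation under a least-squares (piecewise-constant mean) cost, applied to per-base coverage. It follows this with a simple "coverage must drop downstream" filter.

This is a minimal illustration under stated assumptions, not TAPAS's implementation. The function names `binary_segmentation` and `filter_drops`, the parameters `penalty`, `min_len` and `min_drop_ratio`, and the toy coverage profile are all hypothetical. The sketch also assumes that positions run 5' to 3' (for example, a plus-strand gene).

```python
from typing import List, Sequence


def _sse(prefix: List[float], prefix_sq: List[float], i: int, j: int) -> float:
    """Sum of squared deviations from the mean of x[i:j], using prefix sums."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / n


def binary_segmentation(
    coverage: Sequence[float], penalty: float, min_len: int = 5
) -> List[int]:
    """Generic binary segmentation (NOT necessarily the method TAPAS uses).

    Recursively splits the coverage profile at the position that most reduces
    the within-segment squared error. A split is kept only if the reduction
    exceeds `penalty` (a hypothetical tuning parameter). Returns sorted indices
    k such that coverage[k] starts a new segment.
    """
    prefix, prefix_sq = [0.0], [0.0]
    for v in coverage:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    change_points: List[int] = []
    stack = [(0, len(coverage))]
    while stack:
        i, j = stack.pop()
        if j - i < 2 * min_len:  # too short to split into two min_len segments
            continue
        whole = _sse(prefix, prefix_sq, i, j)
        best_gain, best_k = 0.0, None
        for k in range(i + min_len, j - min_len + 1):
            gain = whole - _sse(prefix, prefix_sq, i, k) - _sse(prefix, prefix_sq, k, j)
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is not None and best_gain > penalty:
            change_points.append(best_k)
            stack.append((i, best_k))
            stack.append((best_k, j))
    return sorted(change_points)


def filter_drops(
    coverage: Sequence[float], change_points: List[int], min_drop_ratio: float = 0.2
) -> List[int]:
    """Hypothetical filter: keep a change point only if mean coverage falls by
    at least `min_drop_ratio` (relative) from the upstream to the downstream
    segment, as expected just past a polyadenylation site."""
    bounds = [0] + change_points + [len(coverage)]
    means = [sum(coverage[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]
    kept = []
    for idx, cp in enumerate(change_points):
        up, down = means[idx], means[idx + 1]
        if up > 0 and (up - down) / up >= min_drop_ratio:
            kept.append(cp)
    return kept


if __name__ == "__main__":
    # Toy 3' UTR coverage with two step-downs, i.e. two candidate APA sites.
    cov = [100.0] * 30 + [60.0] * 30 + [20.0] * 30
    cps = binary_segmentation(cov, penalty=1000.0)
    print(cps, filter_drops(cov, cps))  # [30, 60] [30, 60]
```

Binary segmentation naturally reports any number of change points per gene. This matches the abstract's point that a gene may have more than two APA sites. The drop filter discards change points where coverage rises, which this simple model cannot explain as polyadenylation. A real pipeline would need more careful, data-driven filters.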