James T Morton (1), Patricia Abrudan (2), Nathanial Figueroa (3), Chun Liang (4), John E Karro (5)

1. Department of Computer Science and Software Engineering, Miami University, Oxford, OH, USA. Electronic address: mortonjt@miamiOH.edu.
2. Department of Biology, Miami University, Oxford, OH, USA. Electronic address: abrudapa@miamiOH.edu.
3. Department of Computer Science and Software Engineering, Miami University, Oxford, OH, USA. Electronic address: figuernd@miamiOH.edu.
4. Department of Computer Science and Software Engineering, Miami University, Oxford, OH, USA; Department of Biology, Miami University, Oxford, OH, USA. Electronic address: liangc@miamiOH.edu.
5. Department of Computer Science and Software Engineering, Miami University, Oxford, OH, USA; Department of Microbiology, Miami University, Oxford, OH, USA; Department of Statistics, Miami University, Oxford, OH, USA. Electronic address: karroje@miamiOH.edu.
Abstract
BACKGROUND: mRNA polyadenylation, the addition of a poly(A) tail to the 3'-end of pre-mRNA, is critical to gene expression and regulation in eukaryotes. To understand the molecular mechanisms governing polyadenylation and related biological processes, poly(A) tails must be identified accurately in transcriptome sequencing data and distinguished from the artificial adapter sequences added during sequencing. However, annotating these tails is complicated by sequencing errors and post-transcriptional modifications. While determining that a tail is present in a given transcript fragment is straightforward, these complications make locating the tail boundaries challenging; conventional seed-and-extend algorithms struggle to locate poly(A) tail end-points accurately. Further, all existing tools we are aware of focus exclusively on trimming poly(A) tails and do not provide the detailed information needed to study the polyadenylation process.

RESULTS: We have created SCOPE++, an open-source tool for finding the precise boundaries of poly(A) tails and other homopolymers in raw mRNA sequence reads. Based on a Hidden Markov Model (HMM) approach, SCOPE++ accurately identifies specific homopolymer sequences in error-prone EST/cDNA data or RNA-Seq data at a speed appropriate for large sequence sets.

CONCLUSIONS: We demonstrate that our tool can precisely identify poly(A) tails with near-perfect accuracy at the speed required for high-throughput applications, providing a valuable resource for polyadenylation research.
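The abstract does not give the topology or parameters of the SCOPE++ model, so the following is only a minimal sketch of the general idea, not the tool's implementation. It assumes a hypothetical three-state, left-to-right HMM (transcript body, then poly(A) tail, then downstream sequence such as an adapter) with made-up emission and transition probabilities, and uses Viterbi decoding to place the tail boundaries. Because the tail state still assigns a small probability to non-A bases, a single miscalled base inside the tail does not end it, which is where a naive scan that stops at the first non-A would cut the tail short. The names `viterbi`, `tail_bounds`, `EMIT`, and `TRANS` are illustrative.

```python
import math

# Hypothetical left-to-right HMM (NOT the SCOPE++ model):
# transcript body -> poly(A) tail -> downstream sequence (e.g. adapter).
BODY, TAIL, DOWNSTREAM = 0, 1, 2
N_STATES = 3
BASES = "ACGT"

# Illustrative parameters, assumed rather than fitted to any data.
EMIT = {
    BODY: {b: 0.25 for b in BASES},
    TAIL: {"A": 0.94, "C": 0.02, "G": 0.02, "T": 0.02},  # tolerates miscalled bases
    DOWNSTREAM: {b: 0.25 for b in BASES},
}
TRANS = {  # a state may only stay put or advance to the right
    BODY: {BODY: 0.95, TAIL: 0.05},
    TAIL: {TAIL: 0.90, DOWNSTREAM: 0.10},
    DOWNSTREAM: {DOWNSTREAM: 1.0},
}
START = {BODY: 1.0}


def log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(read):
    """Return the most probable state path for a non-empty ACGT string."""
    n = len(read)
    score = [[float("-inf")] * N_STATES for _ in range(n)]
    back = [[0] * N_STATES for _ in range(n)]
    for s in range(N_STATES):
        score[0][s] = log(START.get(s, 0)) + log(EMIT[s][read[0]])
    for i in range(1, n):
        for s in range(N_STATES):
            best_prev, best = 0, float("-inf")
            for p in range(N_STATES):
                cand = score[i - 1][p] + log(TRANS[p].get(s, 0))
                if cand > best:
                    best_prev, best = p, cand
            score[i][s] = best + log(EMIT[s][read[i]])
            back[i][s] = best_prev
    # Trace back from the best-scoring final state.
    s = max(range(N_STATES), key=lambda t: score[n - 1][t])
    path = [s]
    for i in range(n - 1, 0, -1):
        s = back[i][s]
        path.append(s)
    path.reverse()
    return path


def tail_bounds(read):
    """Return the half-open (start, end) of the decoded poly(A) tail, or None."""
    read = read.upper()
    if not read:
        return None
    if any(b not in BASES for b in read):
        raise ValueError("read must contain only A, C, G, T")
    tail = [i for i, s in enumerate(viterbi(read)) if s == TAIL]
    return (tail[0], tail[-1] + 1) if tail else None


# A 20-nt tail with one miscalled C, followed by a made-up downstream sequence.
read = "GCTTGCGTCG" + "AAAAAAAACAAAAAAAAAAA" + "CTGTCTCTTGTC"
print(tail_bounds(read))  # (10, 30)
```

The left-to-right transitions force the decoded tail to be one contiguous run, so its boundaries can be read directly off the Viterbi path. A real tool would estimate the probabilities from data and model additional cases, such as reads without adapters or with other homopolymers.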