Andrew T Magis, Cory C Funk, Nathan D Price.
Abstract
The process of converting raw RNA sequencing data to interpretable results can be circuitous and time-consuming, requiring multiple steps. We present SNAPR, an RNA-seq mapping algorithm that streamlines this process. The algorithm uses a hash table approach to exploit the availability and power of high-memory machines. SNAPR can be run on a single library or on thousands of libraries, and it accepts compressed or uncompressed FASTQ and BAM files as input. In a single step it can output a sorted BAM file and individual read counts, detect gene fusions, and identify exogenous RNA species. SNAPR also performs native Phred score filtering of reads and is well suited to future sequencing platforms that generate longer reads. Using SNAPR, we show that data from hundreds of TCGA samples can be analyzed in a matter of hours while gene fusions and viral events are identified at the same time. Because the reference genome and transcriptome undergo periodic updates, and because integrating multiple data sets requires uniform parameters, there is a great need for a streamlined RNA-seq analysis process. We demonstrate that SNAPR meets this need efficiently and accurately, with the throughput required for high-volume analyses.
Year: 2015 PMID: 29270443 PMCID: PMC5736311 DOI: 10.1109/LLS.2015.2465870
Source DB: PubMed Journal: IEEE Life Sci Lett
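
The abstract names two mechanisms without detailing them: a hash table approach to read mapping and Phred score filtering of reads. The sketch below illustrates the general idea of both in Python and is not SNAPR's implementation. The function names, the seed length `k`, the mean-quality criterion, and the Phred+33 offset are all assumptions made for illustration; the abstract does not specify SNAPR's seed size, filtering rule, or scoring.

```python
from collections import Counter, defaultdict


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_quality(quality: str, min_mean: float = 20.0) -> bool:
    """Hypothetical filter: keep a read if its mean Phred score is high enough."""
    return mean_phred(quality) >= min_mean


def build_seed_index(reference: str, k: int) -> dict:
    """Hash table mapping every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read: str, index: dict, k: int):
    """Look up each k-mer of the read and vote for the implied read start.

    Returns the reference position with the most seed votes, or None if no
    seed from the read is present in the index.
    """
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            votes[pos - offset] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    reference = "TTGACCAGGATCCGATTACA"  # toy reference sequence
    index = build_seed_index(reference, k=4)
    read, quality = reference[5:15], "I" * 10  # read copied from position 5
    if passes_quality(quality):
        print(map_read(read, index, k=4))
```

Holding the whole index in memory is what makes each seed lookup a constant-time dictionary access, which is the trade-off the abstract alludes to when it mentions high-memory machines. A real aligner would additionally verify candidate positions by alignment and handle spliced reads, both of which this sketch omits.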