Krishna R Kalari, Asha A Nair, Jaysheel D Bhavsar, Daniel R O'Brien, Jaime I Davila, Matthew A Bockol, Jinfu Nie, Xiaojia Tang, Saurabh Baheti, Jay B Doughty, Sumit Middha, Hugues Sicotte, Aubrey E Thompson, Yan W Asmann, Jean-Pierre A Kocher.
Abstract
BACKGROUND: Although the costs of next-generation sequencing technology have decreased over the past years, there is still a lack of simple-to-use applications for comprehensive analysis of RNA sequencing data. There is no one-stop shop for transcriptomic genomics. We have developed MAP-RSeq, a comprehensive computational workflow that can be used to obtain genomic features from transcriptomic sequencing data for any genome.
Year: 2014 PMID: 24972667 PMCID: PMC4228501 DOI: 10.1186/1471-2105-15-224
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
MAP-RSeq installation and run time for QuickStart virtual machine
| Item | Size | Time |
| Download | 2.2 GB | ~20 minutes to download on consumer-grade internet |
| Unpacked size | 8 GB | - |
| Time to import into VM | - | ~10 minutes |
| VM boot | - | 3 minutes |
| Run time with sample data (chr22 only) | - | ~30 minutes |
MAP-RSeq installation and run time in a Linux environment
| Item | Size | Time |
| Download | 930 MB | ~10 minutes to download on consumer-grade internet |
| Install time | - | ~6 hours (mostly downloading and indexing references) |
| Unpacked size | 9 GB | - |
| Run time | - | Depends on the sample data used |
Wall clock times to run MAP-RSeq at different read counts
| Wall clock time | Number of reads |
| 118 minutes | 1000000 |
| 82 minutes | 500000 |
| 71 minutes | 200000 |
Figure 1. Flowchart of the MAP-RSeq workflow. High-level representation of the MAP-RSeq workflow for processing RNA-Seq data.
Figure 2. Screenshot of the MAP-RSeq output report (HTML). An example screenshot of the report produced by MAP-RSeq.
Figure 3. Correlation of gene counts reported by MAP-RSeq in comparison to counts simulated by BEERS. MAP-RSeq uses the HTSeq software to assign reads to genomic features. The intersection-nonempty mode of HTSeq was applied, and the query-name-sorted alignment (BAM) file along with the reference GTF file obtained from BEERS were provided as input to HTSeq for accurate assignment of paired-end reads to genomic features. Comparison of the gene counts (RPKM) obtained from MAP-RSeq with the counts simulated by BEERS for the respective genes yielded a Pearson correlation of 0.87. The genomic regions where gene expression reported by HTSeq did not completely correlate with simulated expression are explained by ambiguous reads, or by the two mates of a paired-end read mapping to different genomic features, which causes HTSeq to categorize the read as ambiguous.
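The comparison described in the Figure 3 caption can be illustrated with a minimal sketch. It assumes the standard RPKM definition (reads per kilobase of feature length per million mapped reads) and a plain Pearson correlation; the function names `rpkm` and `pearson` and the toy gene values are hypothetical and are not taken from MAP-RSeq or BEERS.

```python
import math


def rpkm(count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature length per million mapped reads."""
    return count * 1e9 / (feature_length_bp * total_mapped_reads)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: (gene length in bp, observed count, simulated count).
genes = [(2000, 100, 110), (1500, 40, 35), (3000, 300, 320), (800, 5, 9)]
total_observed = sum(g[1] for g in genes)
total_simulated = sum(g[2] for g in genes)

observed = [rpkm(c, length, total_observed) for length, c, _ in genes]
simulated = [rpkm(s, length, total_simulated) for length, _, s in genes]
r = pearson(observed, simulated)
```

In a real run, the observed counts would come from the HTSeq output and the simulated counts from the simulator's ground truth, joined on gene identifier before correlating.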
Alignment statistics from MAP-RSeq using simulated dataset from BEERS
| Total number of single reads | 4000000 |
| Reads used for alignment | 3999995 |
| Total number of reads mapped | 3851539 (96.3%) |
| Reads mapped to transcriptome | 3401468 (85.0%) |
| Reads mapped to junctions | 450071 (11.3%) |
| Reads contributing to gene abundance | 1395844 |
| Reads contributing to exon abundance | 11266392 |
| Number of SNVs identified | 6222 |
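The percentages in the alignment table can be checked with a short calculation. This sketch assumes that the denominator is the number of reads used for alignment (3999995), which the table does not state explicitly; the variable names and the helper `pct` are illustrative only.

```python
# Values copied from the alignment statistics table.
reads_used = 3_999_995      # reads used for alignment (assumed denominator)
mapped = 3_851_539          # total number of reads mapped
transcriptome = 3_401_468   # reads mapped to transcriptome
junctions = 450_071         # reads mapped to junctions


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Transcriptome and junction reads together account for all mapped reads.
components_sum_to_mapped = (transcriptome + junctions == mapped)

mapped_pct = pct(mapped, reads_used)
transcriptome_pct = pct(transcriptome, reads_used)
junctions_pct = pct(junctions, reads_used)
```

Under this assumption the computed values reproduce the 96.3%, 85.0%, and 11.3% reported in the table.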
Figure 4. Screenshots of gene and exon expression reports by MAP-RSeq. An example of the gene and exon expression counts from the output reports of MAP-RSeq.
Figure 5. Screenshot of a MAP-RSeq VCF file after VQSR annotation. An example of SNV data representation from MAP-RSeq runs.
Figure 6. Examples of SNVs called in RNA and DNA data for NA07347. An IGV screenshot of SNV regions for the 1000 Genomes sample NA07347: A) called in RNA at high read depth, compared with exome/DNA data, and B) called in RNA at low read depth, compared with exome/DNA data.
Figure 7. Fusion transcripts reported by MAP-RSeq. An example of the fusion transcripts output file from the MAP-RSeq workflow.