Alexander S Baras, Christopher J Mitchell, Jason R Myers, Simone Gupta, Lien-Chun Weng, John M Ashton, Toby C Cornish, Akhilesh Pandey, Marc K Halushka.
Abstract
Small RNA-seq for microRNAs (miRNAs) is a rapidly developing field where opportunities still exist to create better bioinformatics tools to process these large datasets and generate new, useful analyses. We built miRge to be a fast, smart small RNA-seq solution to process samples in a highly multiplexed fashion. miRge employs a Bayesian alignment approach, whereby reads are sequentially aligned against customized mature miRNA, hairpin miRNA, noncoding RNA and mRNA sequence libraries. miRNAs are summarized at the level of raw reads in addition to reads per million (RPM). Reads for all other RNA species (tRNA, rRNA, snoRNA, mRNA) are provided, which is useful for identifying potential contaminants and optimizing small RNA purification strategies. miRge was designed to optimally identify miRNA isomiRs and employs an entropy-based statistical measurement to identify differential production of isomiRs. This allowed us to identify decreasing entropy in isomiRs as stem cells mature into retinal pigment epithelial cells. Conversely, we show that pancreatic tumor miRNAs have similar entropy to matched normal pancreatic tissues. In a head-to-head comparison with other miRNA analysis tools (miRExpress 2.0, sRNAbench, omiRAs, miRDeep2, Chimira, UEA small RNA Workbench), miRge was faster (4- to 32-fold) and was among the top two methods in maximally aligning miRNA reads per sample. Moreover, miRge has no inherent limits to its multiplexing. miRge was capable of simultaneously analyzing 100 small RNA-seq samples in 52 minutes, providing an integrated analysis of miRNA expression across all samples. As miRge was designed for analysis of single as well as multiple samples, it is an ideal tool for both high- and low-throughput users. miRge is freely available at http://atlas.pathology.jhu.edu/baras/miRge.html.
Year: 2015 PMID: 26571139 PMCID: PMC4646525 DOI: 10.1371/journal.pone.0143066
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The benefits of collapsing reads in short RNA-seq data.
Collapsing identical reads is advantageous for miRNAs because the species length (17-24 bp) is less than the sequencing read length (50 bp), so many reads from the same miRNA are identical after trimming. Collapsing is not advantageous for mRNAs or DNA.
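The collapsing idea in this legend can be illustrated with a minimal Python sketch. It is not miRge's own code: the function name `collapse_reads` and the toy sequences are assumptions made for illustration, and the reads are assumed to be adapter-trimmed already.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse identical read sequences into a {sequence: count} mapping."""
    return Counter(reads)


# Hypothetical, already-trimmed toy reads (not taken from any real sample).
reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTGTGGTT",
    "TGAGGTAGTAGGTTGTGTGGTT",
    "AACCCGTAGATCCGAACTTGTG",
]

collapsed = collapse_reads(reads)
total_reads = sum(collapsed.values())   # reads before collapsing
unique_sequences = len(collapsed)       # sequences that still need alignment

for sequence, count in collapsed.most_common():
    print(sequence, count)
```

Only the unique sequences need to be aligned, while the stored counts keep the per-sequence abundance available for later quantification.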
Fig 2. miRge: multi-sample quantization of unique sequences followed by a single sequential annotation method for miRNA-seq analysis.
First, sequencing data undergoes a quality control and length filtering step. Sequences are optionally trimmed of adaptors, and unique sequences are quantitated per sample. The unique sequences identified across all samples then undergo 5 separate alignment steps against 4 libraries using Bowtie. Only reads > 25 bp are aligned to the hairpin miRNAs. The resulting data is organized, and miRge outputs several files, including a final miRNA-oriented data table in both absolute counts and RPM.
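The sequential annotation described in this legend can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the miRge implementation: exact substring matching stands in for Bowtie alignment, there is one pass per library rather than the 5 alignment steps in the figure, and all reference names and sequences (`miR-toy-5p`, `mir-toy`, `tRNA-toy`, `mRNA-toy`) are hypothetical.

```python
def annotate_sequentially(unique_seqs, libraries):
    """Assign each unique sequence to the first library that contains it.

    libraries: ordered list of (library_name, {ref_name: ref_seq}, min_len).
    Matching is exact substring search, a stand-in for Bowtie alignment.
    Sequences that match no library are returned as unassigned.
    """
    annotations = {}
    remaining = list(unique_seqs)
    for lib_name, refs, min_len in libraries:
        unmatched = []
        for seq in remaining:
            hit = None
            if len(seq) >= min_len:
                hit = next((name for name, ref in refs.items() if seq in ref), None)
            if hit is None:
                unmatched.append(seq)   # passed on to the next library
            else:
                annotations[seq] = (lib_name, hit)
        remaining = unmatched
    return annotations, remaining


# Hypothetical toy references; a real run would use miRge's customized libraries.
MATURE = {"miR-toy-5p": "TGAGGTAGTAGGTTGTATAGTT"}
HAIRPIN = {
    "mir-toy": "TGGGA" + MATURE["miR-toy-5p"]
    + "TTAGGGTCACACCCACCACTGGGAGATAACTATACAATCTACTGTC"
}
NCRNA = {"tRNA-toy": "GCATTGGTGGTTCAGTGGTAGAATTCTCGCC"}
MRNA = {"mRNA-toy": "ATGGCCAAGCTGACCTCTGAGGACATCGTGAAGCTGTAA"}

libraries = [
    ("mature miRNA", MATURE, 0),
    ("hairpin miRNA", HAIRPIN, 26),   # only reads > 25 nt are tried here
    ("noncoding RNA", NCRNA, 0),
    ("mRNA", MRNA, 0),
]

mature_read = MATURE["miR-toy-5p"]
hairpin_read = HAIRPIN["mir-toy"][20:48]   # 28 nt slice of the toy hairpin
ncrna_read = NCRNA["tRNA-toy"][5:28]       # 23 nt slice of the toy tRNA
unknown_read = "A" * 20                    # matches no toy reference

annotations, unassigned = annotate_sequentially(
    [mature_read, hairpin_read, ncrna_read, unknown_read], libraries
)
```

Because each sequence is removed from the pool as soon as it is annotated, the order of the libraries determines which annotation wins when a sequence could match more than one of them.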
Profiling and miRNA assignment across 7 methods in 3 separate samples.
Human Adipose Tissue (SRR772563)

| Method | Processing time | miRNA Reads | miRNAs | miRNAs >10 RPM |
|---|---|---|---|---|
| miRge | | 2,041,334 | 479 | 245 |
| miRExpress 2.0 | 3.5 min | 1,503,704 | 593 | 240 |
| omiRAs | 14 min | 1,672,612 | 458 | 238 |
| miRDeep2 | 13.5 min | 1,969,122 | 432 | 189 |
| sRNAbench | 7 min | 1,916,307 | 969 | 278 |
| Chimira | 2 min | | 804 | 268 |
| UEA small RNA Workbench | 3.6 min | 1,583,013 | 578 | 225 |

Human Beta Cell (SRR873410)

| Method | Processing time | miRNA Reads | miRNAs | miRNAs >10 RPM |
|---|---|---|---|---|
| miRge | | 26,169,405 | 884 | 306 |
| miRExpress 2.0 | 62 min | 16,386,290 | 878 | 260 |
| omiRAs | 55 min | 25,823,397 | 804 | 288 |
| miRDeep2 | 39.5 min | 19,949,196 | 489 | 196 |
| sRNAbench | 21 min | 23,755,866 | 598 | 276 |
| Chimira | 24.4 min | | 1,499 | 323 |
| UEA small RNA Workbench | (a) | | | |

Mouse Heart (SRR402445)

| Method | Processing time | miRNA Reads | miRNAs | miRNAs >10 RPM |
|---|---|---|---|---|
| miRge | | 8,783,714 | 519 | 274 |
| miRExpress 2.0 | 22.7 min | 6,939,148 | 742 | 247 |
| omiRAs | 16 min | 8,298,256 | 525 | 254 |
| miRDeep2 | 13 min | 6,336,341 | 529 | 216 |
| sRNAbench | 13 min | 7,696,386 | 927 | 294 |
| Chimira | 10.2 min | | 893 | 265 |
| UEA small RNA Workbench | 9 min | 4,497,946 | 583 | 226 |

Starting read counts: SRR772563 = 2,373,604 reads; SRR873410 = 33,233,648 reads; SRR402445 = 15,981,680 reads. Each method was run with the number of processing cores reported: miRge, 5 cores; miRExpress 2.0, 5 cores; omiRAs, 5 cores; miRDeep2, 1 core; sRNAbench, unknown; Chimira, unknown; UEA small RNA Workbench, 24 cores. Bold indicates fastest time and most miRNA reads.
a Unable to complete due to memory limitations.
Fig 3. Comparisons across 8 methods of miRNA identification.
The miRQC sample A RNA-seq Illumina data set was analyzed by 7 methods and compared to the original data. For each method, a histogram is given of log2 normalized miRNA read counts for 333 shared miRNAs. Pearson correlation was performed for each comparison and a scatter plot with loess curve is presented.
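The pairwise comparison in this legend (log2 counts of shared miRNAs, then Pearson correlation) might be sketched as below. The pseudocount of 1 before the log2 transform, the helper names, and the toy counts are all assumptions for illustration; the legend does not specify the normalization details.

```python
import math


def log2_counts(counts, pseudocount=1.0):
    """log2-transform counts; the pseudocount (an assumption) avoids log2(0)."""
    return {name: math.log2(c + pseudocount) for name, c in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def compare_methods(counts_a, counts_b):
    """Correlate log2 counts over the miRNAs reported by both methods."""
    shared = sorted(set(counts_a) & set(counts_b))
    log_a, log_b = log2_counts(counts_a), log2_counts(counts_b)
    r = pearson([log_a[m] for m in shared], [log_b[m] for m in shared])
    return shared, r


# Hypothetical read counts from two methods (not values from the paper).
method_a = {"miR-21-5p": 1023, "let-7a-5p": 255, "miR-122-5p": 15, "miR-1": 3}
method_b = {"miR-21-5p": 900, "let-7a-5p": 300, "miR-122-5p": 10, "miR-9": 7}

shared, r = compare_methods(method_a, method_b)
print(f"{len(shared)} shared miRNAs, Pearson r = {r:.3f}")
```

Restricting the comparison to shared miRNAs mirrors the legend's use of a common set, so that a miRNA missed by one method does not distort the correlation.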
Fig 4. The spectrum of miRNA entropy.
Kernel density estimates of the distribution of normalized miRNA entropy in two sample sets. A) As embryonic stem cells (ESCs) differentiate towards retinal pigment epithelial cells (RPE) the distribution of miRNA entropy is shifted towards more order (Spearman correlation coefficient 0.14, p>0.001). B) No significant difference in the distribution of miRNA entropy with respect to normal pancreas vs pancreatic adenocarcinoma is observed (Kolmogorov-Smirnov test p > 0.05).
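One way to picture a per-miRNA entropy measure is Shannon entropy over the read counts of a miRNA's isomiRs. The sketch below assumes that definition and assumes normalization by the maximum possible entropy, log2 of the number of observed isomiRs; the exact formula used by miRge is not given in this record, so treat both choices, and the function name `isomir_entropy`, as hypothetical.

```python
import math


def isomir_entropy(isomir_counts, normalize=True):
    """Shannon entropy (bits) of one miRNA's isomiR read-count distribution.

    With normalize=True the value is divided by log2(number of observed
    isomiRs), giving 0 for a single dominant form and 1 for a uniform spread.
    """
    counts = [c for c in isomir_counts if c > 0]
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0   # no reads, or a single isomiR: fully ordered
    probs = [c / total for c in counts]
    entropy = -sum(p * math.log2(p) for p in probs)
    if normalize:
        entropy /= math.log2(len(counts))
    return entropy


# Hypothetical isomiR counts for one miRNA in two conditions.
dominated = [97, 1, 1, 1]     # one isomiR dominates: lower entropy
spread = [25, 25, 25, 25]     # reads spread evenly: maximal entropy

print(isomir_entropy(dominated), isomir_entropy(spread))
```

Under this definition, a shift "towards more order" corresponds to miRNAs whose reads concentrate on fewer isomiRs, which lowers the entropy value.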
A comparison of common miRNA alignment methods.
| Method | miRge | sRNAbench | omiRAs | miRDeep2 | miRExpress | Chimira | UEA small RNA Workbench |
|---|---|---|---|---|---|---|---|
| Map to | Modified libraries | Genome or libraries | Genome | Genome or libraries | Hairpin | Hairpin | Genome and/or mature |
| Input | Fastq, Fastq.gz | Fastq, Fastq.gz, sra | Fastq, Fastq.gz | Fastq | Fastq | Fastq.gz, Fasta.gz | Fastq |
| Process multiple files | Yes | No | Yes (≤2GB) | No | No | Yes (≤2GB each) | Yes |
| Identify novel miRNAs | No | Yes | Yes | Yes | No | No | Yes |
| Identify other RNA species | Yes | Yes | Yes | Yes | No | No | No |
| Allows RNA edited IsomiRs | Yes | Yes | Yes | Yes | No | Yes | Yes |
| Incorporates miRNA SNPs | Yes | No | No | No | No | No | No |
| Visual outputs | Yes | Yes | Yes | Yes | Yes | Yes | No |
| Format | Stand-alone | Web based / Stand-alone | Web based | Stand-alone | Stand-alone | Web based | Stand-alone |
| Alignment Tool | Bowtie | Bowtie | Bowtie | Bowtie | Smith-Waterman algorithm | BLASTn | PatMaN |
a Executable version only.
b Smith-Waterman algorithm implemented using Single Instruction Multiple Data (SIMD) instructions.