Arun H Patil, Marc K Halushka.
Abstract
MicroRNAs and tRFs are classes of small non-coding RNAs known for their roles in translational regulation of genes. Advances in next-generation sequencing (NGS) have enabled high-throughput small RNA-seq studies, which require robust alignment pipelines. Our laboratory previously developed miRge and miRge2.0 as flexible tools to process sequencing data, annotate miRNAs and other small-RNA species, and predict novel miRNAs using a support vector machine approach. Although miRge2.0 is a leading analysis tool in terms of speed, with unique quantification and annotation features, it has a few limitations. We present miRge3.0, which provides additional features along with compatibility with newer versions of Cutadapt and Python. The revisions include the ability to process Unique Molecular Identifiers (UMIs) to account for PCR duplicates while quantifying miRNAs, correction of erroneous single-base substitutions in miRNAs with miREC, and an accurate mirGFF3-formatted isomiR tool. miRge3.0 also has speed improvements, benchmarked against miRge2.0, Chimira and sRNAbench. Finally, miRge3.0 output integrates into other packages for a streamlined analysis process, and the tool provides a cross-platform Graphical User Interface (GUI). In conclusion, miRge3.0 is our third-generation small RNA-seq aligner, with improvements in speed, versatility and functionality over earlier iterations.
Year: 2021 PMID: 34308351 PMCID: PMC8294687 DOI: 10.1093/nargab/lqab068
Source DB: PubMed Journal: NAR Genom Bioinform ISSN: 2631-9268
Figure 1. An overview of the miRge3.0 workflow. A sample or samples (FASTQ, FASTQ.gz) are processed through a number of user-selected steps, including quality control and adapter trimming. Identical reads are collapsed together and aligned to species-specific reference RNA libraries. Through multiple alignment steps, reads are assigned their appropriate RNA identity. Unaligned/unannotated reads can be sent to a predictive model to try to identify novel miRNAs. Annotated reads are output into a number of different files for downstream visualization and analysis.
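The read-collapsing step described in the Figure 1 caption can be illustrated with a minimal sketch. This is not miRge3.0's own code: the function name `collapse_reads` and the toy FASTQ records are assumptions made for illustration, and the sketch assumes adapter-trimmed input in plain four-line FASTQ records.

```python
from collections import Counter


def collapse_reads(fastq_lines):
    """Count identical sequences in an iterable of FASTQ lines.

    Hypothetical helper for illustration only. Assumes well-formed
    four-line records (header, sequence, '+', quality).
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record holds the sequence
            counts[line.strip()] += 1
    return counts


# Toy input: three reads, two of which share the same sequence.
demo_fastq = """@r1
TGAGGTAGTAGGTTGTATAGTT
+
IIIIIIIIIIIIIIIIIIIIII
@r2
TGAGGTAGTAGGTTGTATAGTT
+
IIIIIIIIIIIIIIIIIIIIII
@r3
TAGCTTATCAGACTGATGTTGA
+
IIIIIIIIIIIIIIIIIIIIII
""".splitlines()

collapsed = collapse_reads(demo_fastq)
for seq, n in collapsed.most_common():
    print(seq, n)
```

Collapsing first means each distinct sequence only has to be aligned once, with its count carried along, which is one plausible reason such a step helps run time on large samples.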
Figure 2. Comparison across tools and UMI analysis. (A) A FASTQ file with Qiagen UMIs showing few examples of duplication. (B) A FASTQ file with the 4N ligation adaptors showing significant counts of duplicated UMIs. (C) A FASTQ file of synthetic data showing some increase in UMIs. (D) The correlation between deduplicated and non-deduplicated miRNA RPM counts for Qiagen UMIs was very similar between repeated miRge3.0 runs (r² = 0.9996). (E and F) The correlation was lower for samples with 4N UMIs (r² = 0.95 and 0.81, respectively). (G) Run speed for seven samples ranging from 79 MB to 2.9 GB file sizes across four tools.
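The comparison in Figure 2D-F, deduplicated versus non-deduplicated RPM counts, can be sketched as follows. Everything here is an assumption for illustration: the helper names, the toy `(umi, mirna)` pairs, the choice to treat reads with the same UMI and the same miRNA as PCR duplicates, normalization to total miRNA reads, and the use of the squared Pearson correlation for r². The source does not specify how miRge3.0 implements any of these.

```python
from collections import defaultdict


def umi_counts(reads):
    """Return (raw, deduplicated) counts per miRNA from (umi, mirna) pairs."""
    raw = defaultdict(int)
    seen = defaultdict(set)
    for umi, mirna in reads:
        raw[mirna] += 1
        seen[mirna].add(umi)  # repeated UMIs for one miRNA count once
    dedup = {m: len(umis) for m, umis in seen.items()}
    return dict(raw), dedup


def rpm(counts):
    """Reads per million, normalized to the total of the given counts."""
    total = sum(counts.values())
    return {k: v * 1_000_000 / total for k, v in counts.items()}


def r_squared(xs, ys):
    """Squared Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Toy reads with made-up UMIs and miRNA labels.
reads = [
    ("AAAA", "miR-a"), ("AAAA", "miR-a"), ("CCCC", "miR-a"),
    ("GGGG", "miR-b"),
    ("TTTT", "miR-c"), ("TTTT", "miR-c"), ("ACGT", "miR-c"), ("TGCA", "miR-c"),
]

raw, dedup = umi_counts(reads)
raw_rpm, dedup_rpm = rpm(raw), rpm(dedup)
names = sorted(raw)
r2 = r_squared([raw_rpm[m] for m in names], [dedup_rpm[m] for m in names])
print(raw, dedup, round(r2, 4))
```

When few UMIs are duplicated, the two RPM vectors are nearly proportional and r² stays close to 1. Heavier duplication that affects miRNAs unevenly pulls r² down, which matches the direction of the difference the caption reports between Qiagen and 4N UMIs.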
miRNA annotations across tools
| Tissue/Cell | SRA references | Alignment tool | miRNA reads | Unique miRNAs | miRNAs > 10 RPM |
|---|---|---|---|---|---|
| T Cell CD8+ (Neonatal) | SRR1853808 | Chimira | 1 011 733 | 620 | 250 |
| T Cell CD8+ (Neonatal) | SRR1853808 | sRNAbench | 996 785 | 396 | 190 |
| T Cell CD8+ (Neonatal) | SRR1853808 | miRge2.0 | 1 004 944 | 317 | 216 |
| T Cell CD8+ (Neonatal) | SRR1853808 | miRge3.0 | 1 004 438 | 318 | 217 |
| Platelets | ERR747967 | Chimira | 2 644 772 | 970 | 280 |
| Platelets | ERR747967 | sRNAbench | 2 030 951 | 504 | 222 |
| Platelets | ERR747967 | miRge2.0 | 2 652 547 | 423 | 256 |
| Platelets | ERR747967 | miRge3.0 | 2 561 109 | 423 | 259 |
| Retinal pigment epithelium | SRR5127210 | Chimira | 5 951 602 | 1045 | 427 |
| Retinal pigment epithelium | SRR5127210 | sRNAbench | 5 274 077 | 911 | 318 |
| Retinal pigment epithelium | SRR5127210 | miRge2.0 | 5 802 230 | 777 | 372 |
| Retinal pigment epithelium | SRR5127210 | miRge3.0 | 5 742 333 | 777 | 371 |
| Cortical neuron | SRR5127204 | Chimira | 15 447 871 | 1433 | 325 |
| Cortical neuron | SRR5127204 | sRNAbench | 15 076 370 | 1033 | 243 |
| Cortical neuron | SRR5127204 | miRge2.0 | 15 353 234 | 873 | 296 |
| Cortical neuron | SRR5127204 | miRge3.0 | 15 316 966 | 875 | 295 |
| Cardiac fibroblast | SRR5127236 | Chimira | 7 759 550 | 1427 | 410 |
| Cardiac fibroblast | SRR5127236 | sRNAbench | 7 489 281 | 989 | 320 |
| Cardiac fibroblast | SRR5127236 | miRge2.0 | 7 674 927 | 844 | 367 |
| Cardiac fibroblast | SRR5127236 | miRge3.0 | 7 661 528 | 849 | 367 |
| Renal proximal epithelium | SRR5127209 | Chimira | 35 894 433 | 1830 | 395 |
| Renal proximal epithelium | SRR5127209 | sRNAbench | 34 961 340 | 1382 | 306 |
| Renal proximal epithelium | SRR5127209 | miRge2.0 | 35 748 118 | 1171 | 359 |
| Renal proximal epithelium | SRR5127209 | miRge3.0 | 35 697 299 | 1178 | 360 |
| Islet alpha cell | SRR1028924 | Chimira | – | – | – |
| Islet alpha cell | SRR1028924 | sRNAbench | 43 746 104 | 1177 | 240 |
| Islet alpha cell | SRR1028924 | miRge2.0 | 43 880 787 | 910 | 279 |
| Islet alpha cell | SRR1028924 | miRge3.0 | 43 770 112 | 913 | 279 |
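
The three summary columns in the table (miRNA reads, unique miRNAs, miRNAs > 10 RPM) could be derived from a per-miRNA count table roughly as sketched below. The function name `summarize`, the toy counts, and the choice to normalize RPM to total miRNA reads are assumptions. The source does not state which denominator was used, so this illustrates the idea rather than reproducing the published figures.

```python
def summarize(counts, rpm_cutoff=10.0):
    """Summarize a {miRNA: read count} mapping.

    Hypothetical helper: RPM is computed against the total miRNA reads
    in `counts`, which is an assumption about the normalization.
    """
    total = sum(counts.values())
    detected = {m: c for m, c in counts.items() if c > 0}
    if total == 0:
        above = []
    else:
        above = [
            m for m, c in detected.items()
            if c * 1_000_000 / total > rpm_cutoff
        ]
    return {
        "miRNA reads": total,
        "Unique miRNAs": len(detected),
        "miRNAs > 10 RPM": len(above),
    }


# Toy counts with made-up miRNA labels; miR-c falls below the cutoff
# and miR-d is not detected at all.
counts = {"miR-a": 900_000, "miR-b": 99_995, "miR-c": 5, "miR-d": 0}
summary = summarize(counts)
print(summary)
```

Applying an RPM cutoff removes very low-abundance annotations, which may explain why the tools differ far more in the "Unique miRNAs" column than in the "miRNAs > 10 RPM" column.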