Wenjing Kang, Marc R Friedländer.
Abstract
Next-generation sequencing now, for the first time, allows researchers to gauge the depth and variation of entire transcriptomes. However, now that rare transcripts present in cells at single copies can be detected, more advanced computational tools are needed to accurately annotate and profile them. microRNAs (miRNAs) are 22-nucleotide small RNAs (sRNAs) that post-transcriptionally reduce the output of protein coding genes. They have established roles in numerous biological processes, including cancers and other diseases. During miRNA biogenesis, the sRNAs are sequentially cleaved from precursor molecules that have a characteristic hairpin RNA structure. The vast majority of newly discovered miRNA genes are mined from small RNA sequencing (sRNA-seq) data, which can detect more than a billion RNAs in a single run. However, given that many of the detected RNAs are degradation products from all types of transcripts, the accurate identification of miRNAs remains a non-trivial computational problem. Here, we review the tools available to predict animal miRNAs from sRNA sequencing data. We present tools for generalist and specialist use cases, including prediction from massively pooled data or in species without a reference genome. We also present wet-lab methods used to validate predicted miRNAs, and approaches to computationally benchmark prediction accuracy. For each tool, we reference validation experiments and benchmarking efforts. Last, we discuss the future of the field.
Keywords: gene prediction; miRNA; microRNA; next-generation sequencing data
Year: 2015 PMID: 25674563 PMCID: PMC4306309 DOI: 10.3389/fbioe.2015.00007
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Figure 1. miRNA biogenesis and function. miRNAs are transcribed as primary transcripts or are sometimes derived from exons or introns of host transcripts. Characteristic hairpin RNA structures are recognized by Drosha and DGCR8 and cleaved out. The hairpin is exported to the cytosol and cleaved by Dicer, which is a part of the canonical RNA interference pathway, releasing three products: the two miRNA strands (the “mature” or “guide” strand and the “star” or “passenger” strand) and the terminal loop. The guide strand is then bound by an Argonaute protein, which is part of the miRNP effector complex. Once bound, the miRNA can bind to target sites, often located in the 3′ UTR of protein coding transcripts, and guide the effector complex to inhibit translation of the target, cause its degradation, or relocate it to subcellular foci, where it is no longer accessible to the translation machinery.
Tools for predicting animal miRNAs from sRNA-seq data.
| Tool | Algorithm | Mapping tool | Tested in plants | Performance comparison | Validated in wet-lab | Pre-process data | Quantifies expression | Target prediction | User interface |
|---|---|---|---|---|---|---|---|---|---|
| deepBlockAlign | Read block alignment | Not included | Yes | Langenberger et al. ( | No | No | No | No | Graphics, webserver |
| miRanalyzer | Random forest | Prefix tree | No | Hackenberg et al. ( | See below | Partial | Differential expression | miRanda and TargetScan | Graphics, webserver |
| miRanalyzer (update) | Random forest | Bowtie | Yes | An et al. ( | RT-PCR (Smith et al., | Yes | Differential expression | TargetSpy | Graphics, webserver, and standalone |
| miRCat | Rules-based | PatMaN | Yes | Moxon et al. ( | RT-PCR (Kohli et al., | Yes | Yes (mirprof), differential expression (colide) | PAREsnip | Graphics, webserver, and standalone |
| miRDeep | Bayesian | Megablast | No | An et al. ( | Northern blot (Friedländer et al., | No | Yes | No | No graphics, standalone |
| miRDeep2 | Bayesian | Bowtie | No | An et al. ( | Knock-down (Friedländer et al., | Yes | Yes | No | Graphics, standalone |
| miRDeep* | Bayesian | Bowtie (Java version) | No | An et al. ( | RT-PCR, knock-down (An et al., | Yes | Yes | TargetScan | Graphics, standalone (Java software) |
| MIReNA | Rules-based | Megablast | Yes | An et al. ( | Knock-down (Friedländer et al., | No | No | No | No graphics |
| miREvo | Bayesian | Bowtie | No | No | No | Yes | Yes | No | Graphics, standalone |
| miRExpress | Sequence homology | Custom mapping tools | No | No | No | Yes | Yes | No | No graphics, standalone |
| miRTRAP | Rules-based | Not included | No | An et al. ( | Knock-down (Friedländer et al., | No | No | No | No graphics |
| miRdentify | Feature scoring | Bowtie | No | Hansen et al., | RT-PCR (Hansen et al., | Yes | No | No | No graphics |
| MirPlex | Support vector machine | Not included | Yes | Mapleson et al. ( | Knock-out (Mapleson et al., | No | No | No | No graphics |
| MIRPIPE | Sequence homology | BLASTN | No | Kuenne et al. ( | No | Yes | Yes | No | Graphics, webserver, and standalone |
Algorithm: the core algorithm for identifying miRNAs. Mapping tool: software used to trace sequenced RNAs to the reference sequences. Tested in plants: whether the method has been benchmarked with plant data. Performance comparison: studies that have benchmarked the performance of the tool. Validated in wet-lab: studies that have validated predicted miRNA candidates with experimental methods. Given the overall number of miRNA studies, this list may not be exhaustive. Pre-process data: tools that prepare the FASTQ sequence data for the mapping and prediction steps. Quantifies expression: tools that report estimated miRNA abundances. In addition, some tools report miRNAs that are differentially expressed between samples. Target prediction: tools that predict targets of candidate miRNAs. User interface: whether the tool has a graphical user interface (as opposed to being operated from the command line), and whether it runs on a webserver (as opposed to being installed and run on a local machine).
Methods for miRNA validation.
| Method | Throughput | Pros | Cons |
|---|---|---|---|
| Northern blot analysis | Low | Length of transcripts observed, possibility of “double-band” | Work-intensive, lack of sensitivity |
| PCR-based methods | Low | Specific to transcript 3′end, sensitive | Costly for large-scale validation |
| Ectopic RNA hairpin expression | Low | miRNA biogenesis is directly tested | Work-intensive, impractical for large-scale validation |
| Association with Argonaute proteins | Low/high | Directly shows interaction with effector proteins | Method is not always specific for miRNAs |
| Inhibition of miRNA biogenesis pathways | Low/high | Directly shows dependence on biogenesis proteins | Knock-downs are transient and sometimes weak, generating knock-outs is time-consuming |
| Experimentally identified target sites | Low/high | Directly demonstrates target interaction or repression | Reporter assays are work-intensive |
| Conservation and population selection pressure | Sequence analysis | No wet-lab experiments required | Non-conserved miRNAs can be functional |
Sensitivity, specificity, and accuracy.
| | | miRNA state: genuine miRNA | miRNA state: not genuine miRNA |
|---|---|---|---|
| miRNA prediction | Positive | True positives (TP) | False positives (FP) |
| miRNA prediction | Negative | False negatives (FN) | True negatives (TN) |
| Formulas | Sensitivity or true positive rate | TP/(TP + FN) | |
| Formulas | Specificity or true negative rate | TN/(FP + TN) | |
| Formulas | Accuracy | (TP + TN)/(TP + FP + FN + TN) | |
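
The three formulas in the table can be computed directly from the four confusion-matrix counts. The following is a minimal Python sketch, not code from any of the reviewed tools: the `ConfusionCounts` class, the function names, and the example counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing miRNA predictions with the true miRNA state."""
    tp: int  # predicted positive, genuine miRNA
    fp: int  # predicted positive, not a genuine miRNA
    fn: int  # predicted negative, genuine miRNA
    tn: int  # predicted negative, not a genuine miRNA


def _ratio(numerator: int, denominator: int) -> float:
    # The measures are undefined when the relevant class is empty.
    if denominator == 0:
        raise ValueError("measure is undefined: denominator is zero")
    return numerator / denominator


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: TP / (TP + FN)."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: TN / (FP + TN)."""
    return _ratio(c.tn, c.fp + c.tn)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all calls that are correct: (TP + TN) / (TP + FP + FN + TN)."""
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)


# Hypothetical counts for illustration only (not taken from any benchmark).
example = ConfusionCounts(tp=80, fp=10, fn=20, tn=890)

print(f"sensitivity = {sensitivity(example):.3f}")
print(f"specificity = {specificity(example):.3f}")
print(f"accuracy    = {accuracy(example):.3f}")
```

With these made-up counts, sensitivity is 0.800, specificity is 0.989, and accuracy is 0.970. Because the non-miRNA class is much larger than the miRNA class in this example, accuracy stays high even though one in five genuine miRNAs is missed, which is one reason to report sensitivity and specificity alongside it.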