Claudia Paicu1,2, Irina Mohorianu2,3, Matthew Stocks2, Ping Xu3, Aurore Coince3, Martina Billmeier3, Tamas Dalmay3, Vincent Moulton2, Simon Moxon3.
Abstract
MOTIVATION: MicroRNAs are a class of ∼21-22 nt small RNAs that are excised from a stable hairpin-like secondary structure. They have important gene regulatory functions and are involved in many pathways including developmental timing, organogenesis and development in eukaryotes. There are several computational tools for miRNA detection from next-generation sequencing datasets. However, many of these tools suffer from high false positive and false negative rates. Here we present a novel miRNA prediction algorithm, miRCat2. miRCat2 incorporates a new entropy-based approach to detect miRNA loci, which is designed to cope with the high sequencing depth of current next-generation sequencing datasets. It has a user-friendly interface and produces graphical representations of the hairpin structure and plots depicting the alignment of sequences on the secondary structure.
Year: 2017 PMID: 28407097 PMCID: PMC5870699 DOI: 10.1093/bioinformatics/btx210
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Output of miRCat2 for a predicted sequence corresponding to hsa-mir-2110 (chromosome 10), depicting (A) precursor presence plots, (B) precursor secondary structure and (C) alignment of incident reads. (A) The x-axis shows each position along the miRNA hairpin; the y-axis shows the point abundance, calculated as the algebraic sum of the abundances of incident reads. (B) Precursor secondary structure, color-coded by nucleotide type (A: green, C: orange, G: red, T: black). (C) Alignment of incident reads on the precursor; the numbers on the right represent the raw read abundance. The last line presents the secondary structure in dot-bracket notation, together with its MFE.
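The point abundance plotted in panel (A) can be illustrated with a short sketch. This is not miRCat2's code; it assumes that a read is "incident" on a position when it overlaps that position, and that each read is given as a hypothetical `(start, end, abundance)` tuple with a 0-based start and an exclusive end relative to the precursor.

```python
from typing import List, Tuple

# A read is (start, end, abundance): 0-based start, exclusive end,
# both relative to the precursor. This layout is an assumption made
# for illustration, not the miRCat2 input format.
Read = Tuple[int, int, int]


def point_abundance(precursor_length: int, reads: List[Read]) -> List[int]:
    """Sum, for every precursor position, the abundances of the reads covering it."""
    profile = [0] * precursor_length
    for start, end, abundance in reads:
        # Clip the read to the precursor so out-of-range coordinates are ignored.
        for pos in range(max(start, 0), min(end, precursor_length)):
            profile[pos] += abundance
    return profile


# Two overlapping reads on an 8-nt toy precursor.
example_reads = [(0, 4, 10), (2, 6, 5)]
example_profile = point_abundance(8, example_reads)
print(example_profile)  # [10, 10, 15, 15, 5, 5, 0, 0]
```

Positions covered by both reads accumulate both abundances, which is why the profile peaks where the reads overlap.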
Performance comparison of benchmarked tools
| Animals | | | | | | | Plants | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Organism | Tool | High-conf. miRNAs | Low-conf. miRNAs | Novel predictions | Specificity (%) | Sensitivity (%) | Organism | Tool | High-conf. miRNAs | Low-conf. miRNAs | Novel predictions | Specificity (%) | Sensitivity (%) |
| (23 datasets) | miRCat2 | 159 | 83 | 72 | 78.6 (±9.1) | 30.6 (±3.3) | (7 datasets) | miRCat2 | 66 | 44 | 8 | 93.6 (±2.7) | 38.3 (±2.7) |
| | miRCat | 122 | 67 | 27 | 87.9 (±5.8) | 23.9 (±2.5) | | miRCat | 51 | 57 | 167 | 40.9 (±9) | 37.9 (±1.8) |
| | miRDeep2 | 149 | 61 | 14 | 94 (±2.7) | 26.5 (±4.5) | | miRPlant | 62 | 52 | 7 | 93.3 (±5.4) | 39.3 (±14.9) |
| | miReap | 148 | 108 | 227 | 52.3 (±14.3) | 32.5 (±7.4) | | miReap | 6 | 8 | 121 | 14.5 (±8.5) | 4.9 (±0.6) |
| (21 datasets) | miRCat2 | 147 | 25 | 23 | 90.5 (±7.5) | 39.8 (±3.2) | (14 datasets) | miRCat2 | 15 | 13 | 233 | 11.6 (±5) | 44.2 (±12.8) |
| | miRCat | 124 | 20 | 20 | 88.5 (±8.3) | 33.5 (±1.9) | | miRCat | 14 | 16 | 1204 | 2.7 (±1.1) | 48 (±4.8) |
| | miRDeep2 | 117 | 14 | 2 | 98.6 (±2) | 29.7 (±7.2) | | miRPlant | 11 | 7 | 45 | 30.3 (±7) | 28.9 (±13.1) |
| | miReap | 114 | 21 | 134 | 48.7 (±12.3) | 31.6 (±8.5) | | miReap | 4 | 5 | 1619 | 0.7 (±0.3) | 13.6 (±3.2) |
| (2 datasets) | miRCat2 | 141 | 145 | 42 | 93.6 (±2.4) | 88.6 (±2.3) | (2 datasets) | miRCat2 | N/A | 129 | 269 | 32.7 (±3.8) | 34.9 (±1.1) |
| | miRCat | 101 | 88 | 26 | 87.9 (±0.3) | 58.2 (±2.5) | | miRCat | N/A | 149 | 865 | 15.4 (±4.5) | 40.2 (±0.8) |
| | miRDeep2 | 120 | 111 | 27 | 89.7 (±1.3) | 71.5 (±3.0) | | miRPlant | N/A | 80 | 74 | 52 (±0.7) | 21.6 (±4.9) |
| | miReap | 137 | 132 | 43 | 86.2 (±0.2) | 82.9 (±0.2) | | miReap | N/A | 25 | 2243 | 1.2 (±0.3) | 6.8 (±0.8) |
Note: miRCat2 performs consistently well, with a good trade-off between specificity and sensitivity, whereas miRCat and miReap struggle in terms of specificity, especially in plants. miRDeep2/miRPlant have good specificity but lack sensitivity.
Fig. 2. Cumulative plots of log2 fold changes of control versus mutant datasets, calculated on the output of miRCat2, miRCat, miRDeep2/miRPlant and miReap, and on a control dataset formed of tRNAs and snoRNAs. We present results for H. sapiens [subplots (A) Dicer and (B) Drosha knock-out], M. musculus [subplot (C)], D. rerio [subplot (D)], A. thaliana [subplots (E) and (F)], S. lycopersicum [subplot (G)] and G. max [subplot (H)]. miRCat2 has the highest percentage of differentially expressed (DE) miRNAs in all but one of the experiments, where it ranks a close second to miRCat. (A) Homo sapiens wild-type versus Dicer knock-out. (B) Homo sapiens wild-type versus DROSHA knock-out. (C) Mus musculus wild-type versus DGCR8 knock-out. (D) Danio rerio wild-type versus Dicer knock-out. (E, F) Arabidopsis thaliana wild-type versus Dicer knock-down. (G) Solanum lycopersicum wild-type versus DCL1 knock-down. (H) Glycine max wild-type versus DCL1 knock-down.
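As a rough illustration of how a cumulative log2 fold change curve of this kind could be built, the sketch below computes per-locus log2 fold changes and their empirical cumulative distribution. Everything in it is an assumption for illustration: the caption does not give the normalization, the direction of the ratio (taken here as control over mutant) or any offset, so the `pseudocount` parameter and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def log2_fold_changes(
    control: Dict[str, float],
    mutant: Dict[str, float],
    pseudocount: float = 1.0,  # assumed offset to avoid division by zero
) -> Dict[str, float]:
    """log2(control / mutant) per locus; loci absent from the mutant count as 0."""
    return {
        name: math.log2((control[name] + pseudocount) / (mutant.get(name, 0.0) + pseudocount))
        for name in control
    }


def cumulative_curve(values: List[float]) -> List[Tuple[float, float]]:
    """Empirical cumulative distribution: (value, fraction of values <= value)."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


# Toy abundances for two hypothetical loci.
control_counts = {"locus_a": 7.0, "locus_b": 3.0}
mutant_counts = {"locus_a": 1.0, "locus_b": 3.0}

lfc = log2_fold_changes(control_counts, mutant_counts)
curve = cumulative_curve(list(lfc.values()))
print(lfc)    # {'locus_a': 2.0, 'locus_b': 0.0}
print(curve)  # [(0.0, 0.5), (2.0, 1.0)]
```

Under these assumptions, loci depleted in the biogenesis mutant have positive fold changes, so a tool whose predictions are mostly true miRNAs would give a curve shifted to the right of the tRNA/snoRNA control.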