Yin Lu, Alexander S Baras, Marc K Halushka.
Abstract
BACKGROUND: miRNAs play important roles in the regulation of gene expression. The rapidly developing field of microRNA sequencing (miRNA-seq; small RNA-seq) needs comprehensive, robust, user-friendly and standardized bioinformatics tools to analyze these large datasets. We present miRge 2.0, in which multiple enhancements were made towards these goals.
Keywords: Alignment; Small RNA-seq; isomiR; miRNA
Year: 2018 PMID: 30153801 PMCID: PMC6112139 DOI: 10.1186/s12859-018-2287-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Workflow of miRge 2.0, shown as a flow chart from input to output. The models for A-to-I editing sites in known miRNAs and for novel miRNA detection are newly added functions, while the original outputs are shown in the dashed box
Data sets for constructing the predictive model in human and mouse
| Tissue Type | SRA References in human | SRA References in mouse |
|---|---|---|
| Adrenal | SRR944031, SRR944034 | SRR3653309, SRR3653310 |
| Bladder | SRR333658, SRR333674 | SRR3652859, SRR3652860 |
| Blood | SRR837475, SRR837477 | SRR5241767, SRR5241768 |
| Brain Prefrontal Cortex | SRR1635903, ERR409900 | SRR3540303, SRR3540304 |
| Colon | SRR837839, SRR837842 | SRR1973865 |
| Epididymis | SRR384894 | NA |
| Heart | SRR553574, ERR038425 | SRR5832818, SRR5832819 |
| Kidney | SRR553575, ERR038420 | SRR3652244, SRR3652245 |
| Liver | ERR038413, ERR038410 | SRR5832837, SRR5832838 |
| Lung | SRR372648, SRR372650 | SRR5059366, SRR5059367 |
| Pancreas | ERR852097, ERR852099 | SRR1973869 |
| Placenta | SRR567637, SRR567638 | NA |
| Retina | ERR973611, ERR973613 | SRR1427160, SRR1427161 |
| Skeletal Muscle | SRR1635908, SRR1820680 | SRR3651659, SRR3651660 |
| Skin | SRR2174513, SRR2174517 | SRR3402126, SRR3402132 |
| Testes | SRR333680, SRR553576 | SRR1647951, SRR1647953 |
| Thyroid | SRR1291267, SRR1291269 | NA |
Fig. 2 Construction of the predictive model. a Building the predictive model consisted of data preparation, feature calculation, feature selection and machine learning model training. (Key parameters are in parentheses.) b Schematic diagram of generating a stable range of clustered sequences in a cluster. The sequences in the cluster were aligned against the assembled sequence. The probability of the major nucleotide at each position was computed. A threshold of 0.8 was selected to determine the stable range of the cluster sequence
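The stable-range step described in panel b can be sketched as follows. This is a minimal illustration, not the miRge 2.0 implementation: the function names, the gap character, the choice to divide by the total number of reads in the cluster, and the choice to report the longest contiguous run above the threshold are all assumptions made for the example. Only the 0.8 threshold comes from the caption.

```python
from collections import Counter


def major_nucleotide_probabilities(aligned_reads, gap="-"):
    """Per-column fraction of reads carrying the most common nucleotide.

    Assumption: every read is padded with `gap` to the width of the
    assembled sequence, and the denominator is the total read count,
    so columns covered by few reads score low.
    """
    if not aligned_reads:
        return []
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("aligned reads must all have the same length")
    total = len(aligned_reads)
    probs = []
    for col in range(width):
        counts = Counter(read[col] for read in aligned_reads if read[col] != gap)
        top = counts.most_common(1)[0][1] if counts else 0
        probs.append(top / total)
    return probs


def stable_range(probs, threshold=0.8):
    """Longest contiguous run of columns with probability >= threshold.

    Returns a half-open (start, end) pair, or None if no column qualifies.
    Taking the longest run is an assumption of this sketch.
    """
    best = None
    start = None
    # A sentinel below any threshold closes a run that reaches the last column.
    for i, p in enumerate(list(probs) + [float("-inf")]):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


# Toy cluster of five aligned reads (hypothetical data).
reads = [
    "-ACGTACGT-",
    "TACGTACGT-",
    "-ACGTACGTA",
    "-ACGTACGT-",
    "-ACGAACGT-",
]
probs = major_nucleotide_probabilities(reads)
print(stable_range(probs))
```

In this toy cluster the first and last columns are covered by a single read each, so they fall below the threshold and are excluded from the stable range.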
Annotation comparison of the first version of miRge, miRge 2.0, miRDeep2 and miRAnalyzer
| Tissue/Cell | SRA References | Alignment Tool | Processing time | miRNA Reads | Unique miRNAs | miRNAs > 10 RPM |
|---|---|---|---|---|---|---|
| Human Adipose Tissue | SRR772563 | miRge - mb | 35 s | 2,041,433 | 484 | 240 |
| | | miRge 2.0 - mb | 36 s | 2,039,835 | 477 | 238 |
| | | miRge 2.0 - MDB | 35 s | 2,034,710 | 390 | 220 |
| | | miRDeep2 | 9.3 min | 1,981,793 | 598 | 224 |
| | | miRAnalyzer | 30 s | 1,752,855 | 689 | 243 |
| Human Alpha Cell | SRR1028924 | miRge - mb | 14.6 min | 44,124,580 | 920 | 293 |
| | | miRge 2.0 - mb | 15.6 min | 43,880,855 | 911 | 279 |
| | | miRge 2.0 - MDB | 15.0 min | 43,752,598 | 583 | 261 |
| | | miRDeep2 | 56.0 min | 42,326,135 | 864 | 267 |
| | | miRAnalyzer | 18.4 min | 34,349,816 | 1124 | 281 |
| Human Beta Cell | SRR873410 | miRge - mb | 6.5 min | 26,196,298 | 896 | 297 |
| | | miRge 2.0 - mb | 6.6 min | 26,197,845 | 889 | 291 |
| | | miRge 2.0 - MDB | 6.5 min | 26,130,904 | 585 | 274 |
| | | miRDeep2 | 34.1 min | 23,280,604 | 754 | 273 |
| | | miRAnalyzer | 8.0 min | 14,240,669 | 1113 | 289 |
| Mouse Stomach Tissue | SRR3653378 | miRge - mb | 2.0 min | 7,063,128 | 804 | 457 |
| | | miRge 2.0 - mb | 2.3 min | 7,175,534 | 806 | 420 |
| | | miRge 2.0 - MDB | 2.2 min | 7,094,217 | 578 | 378 |
| | | miRDeep2 | 18.5 min | 6,738,987 | 748 | 387 |
| | | miRAnalyzer | 2.5 min | 6,818,220 | 1086 | 423 |
| Mouse Epididymal Epithelial Cell | SRR2075702 | miRge - mb | 3.0 min | 1,394,193 | 435 | 364 |
| | | miRge 2.0 - mb | 3.6 min | 1,387,591 | 411 | 290 |
| | | miRge 2.0 - MDB | 3.4 min | 1,381,670 | 360 | 271 |
| | | miRDeep2 | 24.4 min | 1,367,627 | 402 | 212 |
| | | miRAnalyzer | 3.0 min | 925,019 | 532 | 270 |
| Mouse B3 Cell | SRR2960463 | miRge - mb | 3.7 min | 9,515,760 | 604 | 322 |
| | | miRge 2.0 - mb | 3.9 min | 9,612,571 | 606 | 282 |
| | | miRge 2.0 - MDB | 3.9 min | 9,553,713 | 359 | 227 |
| | | miRDeep2 | 32.6 min | 8,321,228 | 487 | 251 |
| | | miRAnalyzer | 4.2 min | 6,856,264 | 819 | 289 |
Key: mb = miRBase; MDB = MirGeneDB. Starting read counts: SRR772563 = 2,373,604 reads; SRR1028924 = 82,497,527 reads; SRR873410 = 33,233,648 reads; SRR3653378 = 9,587,887 reads; SRR2075702 = 13,890,643 reads; SRR2960463 = 17,652,076 reads
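The "miRNAs > 10 RPM" column relies on reads-per-million normalization. A minimal sketch of that calculation is given below. The function names and the toy counts are hypothetical, and the choice of denominator (the sum of miRNA reads in the sample unless another total is passed in) is an assumption, since the table does not state which total was used.

```python
def reads_per_million(counts, total_reads=None):
    """Scale raw read counts to reads per million (RPM).

    Assumption: if `total_reads` is not given, the denominator is the
    sum of the counts passed in.
    """
    total = sum(counts.values()) if total_reads is None else total_reads
    if total <= 0:
        raise ValueError("total read count must be positive")
    return {name: count * 1_000_000 / total for name, count in counts.items()}


def count_above(rpm_values, cutoff=10.0):
    """Number of entries strictly above the RPM cutoff."""
    return sum(1 for value in rpm_values.values() if value > cutoff)


# Toy counts with placeholder miRNA names (not real data).
toy_counts = {"miR-a": 900_000, "miR-b": 99_990, "miR-c": 10}
toy_rpm = reads_per_million(toy_counts)
print(count_above(toy_rpm))
```

Here `miR-c` sits at exactly 10 RPM and is therefore not counted, because the column header uses a strict inequality.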
Fig. 3 A-to-I analysis. a The A-to-I proportion of the sites is strongly correlated with a reference dataset analysis, with an adjusted R² of 0.96 in the log-log plot. b The output of miRge 2.0 showing an illustrated heat map of miRNA A-to-I editing sites across colon tissue, primary colon cell, colon cancer tissue and colon cancer cells from multiple sources
Fig. 4 Model performance on the top 40 features for training and validation sets for human (a) and mouse (b) miRNA discovery. Each dot stands for the mean value of the Matthews correlation coefficient (MCC)
Top 21 features in human predictive model. Hairpin structural features are labeled in italics, while read compositional features are not
| Rank | Feature name | Description of the feature |
|---|---|---|
| 1 | | Number of bindings in the stable range of sequences |
| 2 | exactMatchRatio | The proportion of reads that are an exact match to the cluster sequence in the cluster |
| 3 | | Whether there is another stable range of sequences located at the other arm of precursor |
| 4 | | Minimum free energy (MFE) of the precursor |
| 5 | head_minus3_TemplateNucleotide_percentage | Proportion of genomic templated nucleotide at position −3 relative to the 5′ end of the stable range of the cluster sequences |
| 6 | | Number of hairpin loops in the precursor |
| 7 | | Stem length of the precursor |
| 8 | | Distance between the stable range of sequences and the terminal loop |
| 9 | | Number of bindings in the stable range of sequences divided by its length |
| 10 | headUnstableLength | 5′ unstable length of the cluster |
| 11 | | Whether there is another stable range of sequences located at the other arm of precursor |
| 12 | tail_plus2_A_percentage | Proportion of non-templated adenine (A) at position +2 relative to the 3′ end of the stable range of the cluster sequences |
| 13 | head_minus2_TemplateNucleotide_percentage | Proportion of genomic templated nucleotide at position −2 relative to the 5′ end of the stable range of the cluster sequences |
| 14 | | Number of bindings in the precursor hairpin |
| 15 | tail_plus1_A_percentage | Proportion of non-templated adenine (A) at position +1 relative to the 3′ end of the stable range of the cluster sequences |
| 16 | | Whether the stable range of sequences is located at the terminal loop of the precursor |
| 17 | tail_plus3_A_percentage | Proportion of non-templated adenine (A) at position +3 relative to the 3′ end of the stable range of the cluster sequences |
| 18 | tail_plus5_TemplateNucleotide_percentage | Proportion of genomic templated nucleotide at position +5 relative to the 3′ end of the stable range of the cluster sequences |
| 19 | tail_plus1_TemplateNucleotide_percentage | Proportion of genomic templated nucleotide at position +1 relative to the 3′ end of the stable range of the cluster sequences |
| 20 | | Number of interior loops in the precursor |
| 21 | head_minus1_TemplateNucleotide_percentage | Proportion of genomic templated nucleotide at position −1 relative to the 5′ end of the stable range of the cluster sequences |
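Two of the simpler features in the table can be illustrated with a short sketch: `exactMatchRatio` (rank 2) and the number of bindings in the stable range (rank 1). The function names, the read-count weighting, and the use of a dot-bracket string to represent the hairpin structure are assumptions of this example, not details confirmed by the table.

```python
def exact_match_ratio(read_counts, cluster_sequence):
    """Fraction of reads in a cluster identical to the cluster sequence.

    Assumption: `read_counts` maps each distinct read sequence to its
    read count, so the ratio is weighted by abundance.
    """
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    return read_counts.get(cluster_sequence, 0) / total


def bindings_in_range(dot_bracket, start, end):
    """Count paired bases inside the half-open range [start, end).

    Assumption: the precursor structure is given in dot-bracket
    notation, where '(' and ')' mark paired positions.
    """
    return sum(1 for symbol in dot_bracket[start:end] if symbol in "()")


# Toy cluster and toy hairpin (hypothetical data).
toy_cluster = {"ACGT": 6, "ACGTA": 3, "CGT": 1}
toy_structure = "((((....))))"

print(exact_match_ratio(toy_cluster, "ACGT"))
print(bindings_in_range(toy_structure, 0, 6))
```

Dividing the binding count by the length of the range gives a quantity of the kind described for rank 9.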
Predictive results of 32 human cell data in a test set by the human model
| Cell Type | SRA References | AUC | Precision | Recall | MCC |
|---|---|---|---|---|---|
| Fibroblast Aorta Adventitia | SRR5127206 | 0.995 | 0.983 | 0.963 | 0.945 |
| Smooth Muscle Cell Aorta | SRR5127217 | 0.994 | 0.981 | 0.961 | 0.938 |
| Astrocyte | SRR5127214 | 0.994 | 0.98 | 0.968 | 0.949 |
| Smooth Muscle Cell Bladder | SRR5127215 | 0.992 | 0.971 | 0.963 | 0.936 |
| Fibroblast Dermal (Adult) | SRR5127205 | 0.995 | 0.983 | 0.974 | 0.95 |
| Fibroblast Dermal (Neonatal) | SRR5127225 | 0.995 | 0.989 | 0.96 | 0.942 |
| Epithelium Keratinocyte (Adult) | SRR5127203 | 0.994 | 0.977 | 0.962 | 0.934 |
| Epithelium Keratinocyte (Neonatal) | SRR5127208 | 0.993 | 0.975 | 0.942 | 0.923 |
| Endothelial Aortic | SRR5139121 | 0.988 | 0.975 | 0.932 | 0.915 |
| Endothelial Umbilical vein | SRR5127213 | 0.993 | 0.981 | 0.954 | 0.926 |
| Epithelium Bronchial | SRR5127216 | 0.988 | 0.974 | 0.951 | 0.935 |
| Chondrocyte | SRR5127229 | 0.995 | 0.985 | 0.959 | 0.944 |
| Endothelial Microvascular | SRR5127201 | 0.991 | 0.973 | 0.957 | 0.945 |
| Fibroblast Cardiac | SRR5127236 | 0.992 | 0.983 | 0.945 | 0.94 |
| Melanocyte | SRR5127207 | 0.995 | 0.99 | 0.981 | 0.954 |
| Epithelium Mammary | SRR5127224 | 0.99 | 0.976 | 0.941 | 0.927 |
| Epithelium Prostate | SRR5127212 | 0.992 | 0.975 | 0.961 | 0.948 |
| Epithelium Renal Cortex | SRR5127204 | 0.988 | 0.966 | 0.948 | 0.927 |
| Epithelium Renal Proximal | SRR5127230 | 0.992 | 0.978 | 0.949 | 0.936 |
| Stromal cell Prostate | SRR5127226 | 0.991 | 0.976 | 0.963 | 0.94 |
| Myoblast Skeletal Muscle | SRR5127218 | 0.99 | 0.974 | 0.956 | 0.932 |
| Epithelium Intestinal | SRR5127223 | 0.994 | 0.985 | 0.973 | 0.957 |
| Myofibroblast | SRR5127220 | 0.991 | 0.987 | 0.965 | 0.95 |
| Smooth Muscle Cell Prostate | SRR5127222 | 0.991 | 0.978 | 0.961 | 0.943 |
| Neuron Dopaminergic | SRR5127234 | 0.982 | 0.963 | 0.922 | 0.916 |
| Neuron Cortical | SRR5127209 | 0.986 | 0.968 | 0.917 | 0.917 |
| Mesangial | SRR5127221 | 0.996 | 0.986 | 0.971 | 0.948 |
| Osteoblast | SRR5127233 | 0.997 | 0.986 | 0.955 | 0.946 |
| Fibroblast Periodontal ligament | SRR5127227 | 0.994 | 0.986 | 0.962 | 0.946 |
| Epithelium Renal | SRR5127235 | 0.992 | 0.989 | 0.954 | 0.936 |
| Epithelium Retinal Pigment | SRR5127210 | 0.994 | 0.988 | 0.974 | 0.959 |
| Skeletal Muscle Cell | SRR5127202 | 0.995 | 0.984 | 0.96 | 0.936 |
| Mean | | 0.992 | 0.98 | 0.956 | 0.939 |
| Std dev | | 0.003 | 0.007 | 0.014 | 0.012 |
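For readers unfamiliar with the metrics in this table, precision, recall and MCC can all be derived from a binary confusion matrix using their standard definitions. The sketch below uses a made-up confusion matrix and a hypothetical function name; it does not reproduce any row of the table.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall and Matthews correlation coefficient (MCC).

    Inputs are the four cells of a binary confusion matrix:
    true positives, false positives, false negatives, true negatives.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return precision, recall, mcc


# Toy confusion matrix (not taken from the paper).
precision, recall, mcc = classification_metrics(tp=90, fp=10, fn=10, tn=90)
print(precision, recall, mcc)
```

Unlike precision and recall, MCC uses all four cells of the confusion matrix, which makes it a useful single summary when the classes are imbalanced.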
Fig. 5 Venn diagram for novel miRNAs predicted by miRge 2.0, miRDeep2, and miRAnalyzer. a Overlapped novel miRNAs among the three tools. b The average basewise conservation scores across novel miRNAs. c The average quality score across novel miRNAs among the three tools