Sherin M ElGokhy, Mahmoud ElHefnawi, Amin Shoukry.
Abstract
BACKGROUND: MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that have been identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow because miRNAs are difficult to isolate by cloning, owing to their low expression, low stability, tissue specificity, and the high cost of the cloning procedure. Thus, computational identification of miRNAs from genomic sequences provides a valuable complement to cloning. Different approaches for identifying miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons.
Year: 2014 PMID: 24884968 PMCID: PMC4051165 DOI: 10.1186/1756-0500-7-286
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Main characteristics of the classifiers used in the proposed ensemble
| Triplet-SVM | SVM | A vector of 32 structure-sequence features | Human pre-miRNAs |
| MiPred | RF | A vector of 32 structure-sequence features, MFE and P-value | Human pre-miRNAs |
| Virgo | SVM | A vector of 512 structure-sequence features | Viral pre-miRNAs |
| EumiR | SVM | A vector of 512 structure-sequence features | Various eukaryotic pre-miRNAs |
| Ensemble-based | Committee classifier | A 4-dimensional vector (the outputs of the base classifiers) | Human pre-miRNAs |
Figure 1. The adopted architecture of the ensemble-based classification approach for miRNA prediction.
The neural network parameters
| Neuron1 | 0.8523 | 0.7190 | 0.0156 | 0.3914 | -0.0528 |
| Neuron2 | -0.4030 | -0.3190 | 0.7133 | 0.2558 | 0.8994 |
| Neuron3 | -0.03238 | -0.7238 | -0.2314 | -0.0992 | -0.8330 |
| Output | -0.0877 | -0.0166 | -1.1731 | -0.0336 | |
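The table above can be read as a small feed-forward network that combines the four base-classifier outputs into one ensemble score. The following is a minimal sketch under stated assumptions, not the authors' implementation: the record does not label the columns, so the code assumes that the first four values in each hidden-neuron row are input weights and the fifth is a bias, and that the output row holds three hidden-neuron weights followed by a bias. The activation function is also not given here; `tanh` is a placeholder default, and the names `HIDDEN`, `OUTPUT`, `neuron`, and `ensemble_score` are hypothetical.

```python
import math

# Values copied from the "neural network parameters" table.
# ASSUMPTION: each hidden row is (four input weights, bias);
# the output row is (three hidden-neuron weights, bias).
HIDDEN = [
    ([0.8523, 0.7190, 0.0156, 0.3914], -0.0528),
    ([-0.4030, -0.3190, 0.7133, 0.2558], 0.8994),
    ([-0.03238, -0.7238, -0.2314, -0.0992], -0.8330),
]
OUTPUT = ([-0.0877, -0.0166, -1.1731], -0.0336)


def neuron(weights, bias, inputs, activation):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(total)


def ensemble_score(base_outputs, activation=math.tanh):
    """Combine the four base-classifier outputs into one score.

    base_outputs: outputs of the four base classifiers, in the order the
    network expects (the order is not stated in this record, so it is an
    assumption of the caller).
    activation: placeholder; the record does not name the function used.
    """
    if len(base_outputs) != 4:
        raise ValueError("expected exactly four base-classifier outputs")
    hidden = [neuron(w, b, base_outputs, activation) for w, b in HIDDEN]
    out_weights, out_bias = OUTPUT
    return neuron(out_weights, out_bias, hidden, activation)
```

Passing a different `activation` (for example a logistic function) only changes how each weighted sum is squashed; the wiring of the 4-3-1 layout stays the same.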
Performance of ensemble-based classifier versus the other adopted classifiers
| Triplet-SVM | 78.5% | 63.9% | 89.2% | 72.6% | 57.2% | 80.7% |
| MiPred | 89% | 81.7% | 96.7% | 91.8% | 73.6% | 87.9% |
| Virgo | 74.3% | 66.4% | 73.4% | 58.9% | 76.2% | 80.1% |
| EumiR | 74.1% | 66.7% | 72.2% | 58.3% | 77.8% | 86.7% |
| Ensemble-based Classifier | 89.3% | 82.2% | 97% | 92.5% | 74% | 88.2% |
Figure 2. Performance of our ensemble-based classifier versus Triplet-SVM, MiPred, Virgo and EumiR.
Figure 3. Receiver operating characteristic (ROC) performance curve of the classifiers.
ROC-based evaluation metrics of the adopted and designed classifiers
| Triplet-SVM | 0.76 | 0.709 to 0.754 | 0.0121 |
| MiPred | 0.89 | 0.832 to 0.869 | 0.0103 |
| Virgo | 0.72 | 0.725 to 0.770 | 0.0118 |
| EumiR | 0.73 | 0.727 to 0.772 | 0.0117 |
| Ensemble-based Classifier | 0.9 | 0.836 to 0.872 | 0.0102 |
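For readers unfamiliar with ROC-based metrics such as the area under the curve (AUC), the sketch below shows one standard way to compute an AUC from classifier scores and true labels. It uses the rank-based (Mann-Whitney) formulation: the AUC equals the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half. This is an illustration only, not the procedure used to produce the values in the table; the function name `roc_auc` and the toy inputs are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (true pre-miRNA) or 0 (negative example).
    scores: classifier scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ordered correctly.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.3, 0.4, 0.1])
```

The pairwise loop is quadratic in the number of examples, which is fine for a small test set; a sort-based rank computation gives the same value for larger ones.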
Figure 4. Receiver operating characteristic (ROC) performance curve of our ensemble classifier and of MiPred using linear scan.
Figure 5. Marine metagenome pre-miRNA candidate.
Samples of the obtained prediction results
| Marine sequence 1 | 56 | gma-MIR393f and oan-miR-1353 | Glycine max and Ornithorhynchus anatinus |
| Marine sequence 9 | 28 | osa-miR5072 and age-miR-513c-1 | Oryza sativa and Ateles geoffroyi |
| Mine drainage sequence 1 | 45 | ppt-miR1215 and pdi-miR7720 | Physcomitrella patens and Brachypodium distachyon |
| Mine drainage sequence 18 | 49 | pma-miR-138b and osa-miR1851 | Petromyzon marinus and Oryza sativa |
| Mine drainage sequence 29 | 12 | hco-miR-5983 and sme-miR-2167 | Haemonchus contortus and Schmidtea mediterranea |
| Mine drainage sequence 35 | 29 | ppt-miR537d and hma-miR-3005 | Physcomitrella patens and Hydra magnipapillata |
| Mine drainage sequence 41 | 34 | gga-miR-6611 and mtr-miR5037a | Gallus gallus and Medicago truncatula |
| Mine drainage sequence 53 | 27 | aly-miR3444 and hsa-miR-4440 | Arabidopsis lyrata and Homo sapiens |
| Mine drainage sequence 67 | 71 | lja-miR7526f and cte-miR-96 | Lotus japonicus and Capitella teleta |
| Mine drainage sequence 72 | 26 | cel-miR-90 and dps-miR-2543a-1 | Caenorhabditis elegans and Drosophila pseudoobscura |
| Mine drainage sequence 88 | 62 | hsa-miR-3167 and bdi-miR7711 | Homo sapiens and Brachypodium distachyon |
| Groundwater sequence 1 | 12 | ssc-miR-486-2 and hsa-miR-661 | Sus scrofa and Homo sapiens |
| Groundwater sequence 10 | 14 | csi-miR3950 and cel-miR-87 | Citrus sinensis and Caenorhabditis elegans |
| Groundwater sequence 16 | 98 | mmu-miR-8112 and tgu-miR-2981 | Mus musculus and Taeniopygia guttata |
| Groundwater sequence 23 | 35 | rco-miR156h and hsa-miR-4483 | Ricinus communis and Homo sapiens |
| Groundwater sequence 37 | 29 | osa-miR531 and ggo-miR-760 | Oryza sativa and Gorilla gorilla |
| Groundwater sequence 50 | 54 | hsv1-miR-H17 and mmu-miR-5131 | Herpes Simplex Virus 1 and Mus musculus |