Dashuai Fan1, Yuangen Yao2, Ming Yi1.
Abstract
MicroRNAs (miRNAs) are short non-coding ribonucleic acid molecules that regulate gene expression. The computational identification of plant miRNAs is of great significance for understanding biological functions. In our previous studies, we first proposed and then further developed a set of knowledge-based energy features, which we used to construct two plant pre-miRNA prediction tools (plantMirP and riceMirP). However, these two tools cannot be used for miRNA prediction from NGS (Next-Generation Sequencing) data. To address this and to further improve prediction performance and accessibility, plantMirP2 has been developed. Based on the latest dataset, plantMirP2 achieves a promising performance: an AUC (Area Under Curve) of 0.9968, an accuracy of 0.9754, a sensitivity of 0.9675 and a specificity of 0.9876. Comparisons with other plant pre-miRNA tools show that plantMirP2 performs better. Finally, the webserver and stand-alone version of plantMirP2 are available.
Keywords: knowledge-based energy feature; microRNA; pre-miRNA; support vector machine
Year: 2021 PMID: 34440454 PMCID: PMC8392394 DOI: 10.3390/genes12081280
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
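The accuracy, sensitivity and specificity reported in the abstract follow the standard confusion-matrix definitions. The following is a minimal Python sketch of those definitions, using toy labels rather than any data from the study; the function names are illustrative and are not part of plantMirP2.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = pre-miRNA, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity (recall on positives) and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy example (made-up labels, for illustration only).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

Sensitivity measures the fraction of real pre-miRNAs that are recovered, while specificity measures the fraction of negative sequences that are correctly rejected.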
The differences in positive datasets of both tools.
| Positive Dataset | Species | PlantMirP (Release 21) | PlantMirP2 (Release 22.1) |
|---|---|---|---|
| Training | Arabidopsis thaliana | 325 | 326 |
| | Glycine max | 573 | 684 |
| | Oryza sativa | 592 | 604 |
| | Physcomitrella patens | 229 | 247 |
| | Medicago truncatula | 672 | 672 |
| | Sorghum bicolor | 205 | 205 |
| | Arabidopsis lyrata | 205 | 205 |
| | Zea mays | 166 | 168 |
| | Solanum lycopersicum | 77 | 112 |
| Testing | Remaining plant species | 3865 | 5323 |
Full features used in plantMirP2.
| NO. | Features | Description | Origin |
|---|---|---|---|
| 1–34 | Knowledge-based energy score1 | Calculated using the position-specific contact potentials of 2-mer pairs. | riceMirP |
| 35–39 | Knowledge-based energy score2 | Calculated using the distance-specific contact potentials of | plantMirP |
| 40–49 | The ratio of unpaired nucleotides in sub-regions 1–10 | The secondary structure was divided into 10 parts and the ratio of unpaired nucleotides in each part was calculated. | plantMirP |
| 50 | The size of the biggest bulge | The size of the biggest bulge in the secondary structure. A bulge contains at least three adjacent unpaired nucleotides. | plantMirP |
| 51 | n_stems/L | n_stems denotes the number of stems. A stem contains at least three continuous base pairs. L is the length of sequence. | plantMirP |
| 52 | n_loops/L | n_loops denotes the number of loops. | plantMirP |
| 53 | %(|G| + |C|) | (|G| + |C|)/L × 100. Here |X| denotes the number of base X in sequence. | miPred |
| 54–69 | %XY | |XY|/(L − 1) × 100. |XY| is the number of occurrences of dinucleotide XY in the sequence. | miPred |
| 70 | dG = MFE/L | MFE is the minimum free energy of the secondary structure. | miPred |
| 71 | MFE1 | (MFE/L)/%(|G| + |C|) | miPred |
| 72 | MFE2 | (MFE/L)/n_stems | miPred |
| 73 | dP = tot_bases/L | tot_bases is the number of base pairs in the secondary structure. | miPred |
| 74 | MFE3 | (MFE/L)/n_loops | microPred |
| 75–77 | |X − Y|/L | |X − Y| is the number of base pairs, (X − Y)∈[(A − U), (G − C), (G − U)] | microPred |
| 78–80 | %(X − Y)/n_stems | %(X − Y) = |X − Y|/n_stems × 100 | microPred |
| 81 | Avg_bp_stem1 | tot_bases/n_stems | microPred |
| 82 | pb/nb | paired nucleotide/unpaired nucleotide | miRD |
| 83 | MCPN | Maximum of consecutive paired nucleotides. | ZmirP |
| 84 | n_bulges/L | n_bulges is the total number of bulges in the secondary structure. | ZmirP |
| 85 | Avg_bp_stem2 | The ratio of number of base pairs to n_stems | ZmirP |
| 86 | MFE4 | dG/tot_bases | ZmirP |
| 87 | MFE5 | dG/n_bulges | ZmirP |
| 88–167 | | | milRP |
| 168–193 | Knowledge-based energy score3 | Calculated using the distance-dependent | milRP |
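Several of the simpler features in the table are defined by explicit formulas: %(|G| + |C|), %XY, dP and the ratio of unpaired nucleotides per sub-region. The sketch below shows one way to compute them in Python. It rests on assumptions not stated in the table: the secondary structure is given in dot-bracket notation (`(` and `)` for paired positions, `.` for unpaired positions), and the 10 sub-regions are contiguous windows of near-equal length. The function names are hypothetical and this is not the plantMirP2 implementation.

```python
from itertools import product


def gc_percent(seq):
    """Feature 53: (|G| + |C|) / L * 100."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) * 100


def dinucleotide_percent(seq):
    """Features 54-69: |XY| / (L - 1) * 100 for the 16 dinucleotides XY."""
    seq = seq.upper().replace("T", "U")  # treat DNA input as RNA
    counts = {a + b: 0 for a, b in product("ACGU", repeat=2)}
    for i in range(len(seq) - 1):  # overlapping windows of width 2
        pair = seq[i:i + 2]
        if pair in counts:  # skip windows containing ambiguous bases
            counts[pair] += 1
    denom = len(seq) - 1
    return {pair: n / denom * 100 for pair, n in counts.items()}


def base_pairs_per_length(structure):
    """Feature 73 (dP): number of base pairs divided by sequence length."""
    # Assumption: each '(' in dot-bracket notation opens exactly one base pair.
    return structure.count("(") / len(structure)


def unpaired_ratio_by_region(structure, parts=10):
    """Features 40-49: fraction of unpaired positions in each sub-region."""
    # Assumption: sub-regions are contiguous, near-equal-length windows.
    n = len(structure)
    ratios = []
    for i in range(parts):
        window = structure[i * n // parts:(i + 1) * n // parts]
        ratios.append(window.count(".") / len(window) if window else 0.0)
    return ratios


# Toy hairpin (made up for illustration, not taken from miRBase).
toy_seq = "GGGAAAUCCC"
toy_structure = "(((....)))"
toy_gc = gc_percent(toy_seq)
toy_dinuc = dinucleotide_percent(toy_seq)
toy_dp = base_pairs_per_length(toy_structure)
toy_unpaired = unpaired_ratio_by_region(toy_structure, parts=5)
```

Features that depend on stems, loops or bulges need a fuller parse of the base-pairing pattern and are left out of this sketch.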
Figure 1. Flowchart of plantMirP2.
Figure 2. Webpage description and function introduction of plantMirP2's webserver.
Figure 3. ROC (Receiver Operating Characteristic) curves and corresponding AUC (Area Under Curve) values of the 4-, 6-, 8- and 10-fold CVs (Cross-Validations) based on the training dataset and the independent testing dataset.
Figure 4. Venn diagram results for the top predictions of plantMirP2, plantMirP and riceMirP.
Figure 5. Based on the same training dataset and testing dataset, indicators of plantMirP, riceMirP and plantMirP2 were compared.
Figure 6. The ROC curves of plantMirP, riceMirP and plantMirP2.
Figure 7. Based on the miRBase (release 21) training dataset and the miRBase (release 22.1) testing dataset, indicators of plantMirP, riceMirP and plantMirP2 were compared.
Figure 8. Venn diagram results for the top predictions of plantMirP2, plantMirP and riceMirP based on the miRBase (release 21) training dataset and the miRBase (release 22.1) testing dataset.
Figure 9. Based on the same dataset of miPlantPreMat, indicators of miPlantPreMat and plantMirP2 were compared.
Figure 10. Based on the training and testing datasets from PlantMiRNAPred, the prediction accuracies of plantMirP, plantMirP2, PlantMiRNAPred, Triplet-SVM and microPred were compared.
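Figures 3 and 6 summarize classifiers by their ROC curves and AUC values. As a reminder of what the AUC measures, the sketch below computes it as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting one half. This is the standard rank-based definition, not code from plantMirP2, and the scores are toy values.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: P(score of a positive > score of a negative)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy decision scores (made up for illustration).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a sort-based variant would be preferable for datasets with thousands of sequences.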