Huiyu Zhang, Hua Wang, Yuangen Yao, Ming Yi.
Abstract
Rice microRNAs (miRNAs) are important post-transcriptional regulators that play vital roles in many biological processes, such as growth, development, and stress resistance. Identifying these molecules is the basis for dissecting their regulatory functions. Various machine learning techniques have been developed to identify precursor miRNAs (pre-miRNAs), but no tool has been implemented specifically for rice pre-miRNAs. This study aims to improve the prediction of rice pre-miRNAs by constructing novel features with high discriminatory power and training a model on species-specific data. PlantMirP-rice, a stand-alone random forest-based miRNA prediction tool, achieves a promising accuracy of 93.48% on independent (unseen) rice data. Comparisons with other competitive pre-miRNA prediction methods show that plantMirP-rice performs better than existing tools for classifying pre-miRNAs from rice and other plants.
Keywords: knowledge-based energy feature; microRNA; prediction; random forest; rice
Year: 2020 PMID: 32570706 PMCID: PMC7349308 DOI: 10.3390/genes11060662
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. The flowchart of feature extraction for knowledge-based energy score 1.
Full features used in plantMirP-rice (riceMirP)
| No. | Feature | Description | Origin |
|---|---|---|---|
| 1–34 | Energy score 1 | Obtained from position-dependent potentials with character pairs. | Novel |
| 35 | Energy score 2 | Obtained from distance-dependent potentials with 3-mer pairs. | plantMirP |
| 36–45 | Ratio of unpaired bases in sub-region | The secondary structure is divided into 10 parts, and the ratio in each part is calculated. | plantMirP |
| 46 | Size of biggest bulge | A bulge contains at least three adjacent unpaired bases. | plantMirP |
| 47 | n_loops/L | n_loops denotes the number of loops, L is the length of sequence. | plantMirP |
| 48 | n_stems/L | A stem consists of at least three continuous paired bases. | plantMirP |
| 49 | %(\|G\| + \|C\|) | (\|G\| + \|C\|)/L * 100, where \|X\| denotes the number of X in the sequence. | miPred |
| 50–65 | %XY | \|XY\|/(L − 1) * 100, where \|XY\| is the number of dinucleotides XY in the sequence. | miPred |
| 66 | dG | MFE/L, where MFE is the minimum free energy of the secondary structure. | miPred |
| 67 | MFE1 | (MFE/L)/%(\|G\| + \|C\|) | miPred |
| 68 | MFE2 | (MFE/L)/n_stems | miPred |
| 69 | dP = tot_bases/L | tot_bases is the number of base pairs in the secondary structure. | miPred |
| 70 | MFE3 | (MFE/L)/n_loops | microPred |
| 71–73 | \|X − Y\|/L | \|X − Y\| is the number of X − Y base pairs, (X − Y) ∈ [(A − U), (G − C), (G − U)] | microPred |
| 74 | Avg_bp_stem | tot_bases/n_stems, n_stems denotes the number of stems. | microPred |
| 75–77 | %(X − Y)/n_stems | %(X − Y) = \|X − Y\|/tot_bases | microPred |
| 78 | pb/nb | The ratio of paired nucleotides to unpaired nucleotides. | miRD |
| 79 | MCPN | Maximum number of consecutive paired nucleotides. | ZmirP |
| 80 | n_bulges/L | n_bulges is the total number of bulges in the secondary structure. | ZmirP |
| 81 | Avg_bp_stem | The ratio of number of base pairs to n_stems. | ZmirP |
| 82 | MFE4 | dG/tot_bases | ZmirP |
| 83 | MFE5 | dG/n_bulges | ZmirP |
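
Several of the sequence- and structure-based features in the table can be computed directly from a sequence and its dot-bracket secondary structure. The following is a minimal sketch, not the authors' implementation. The function names `parse_pairs` and `basic_features`, the dictionary keys, and the example input are hypothetical. The MFE value is assumed to be supplied by an external folding program. MCPN is interpreted here as the longest run of non-dot positions in the structure, which is an assumption about the original definition.

```python
from itertools import product


def parse_pairs(structure):
    """Return (i, j) index pairs from a dot-bracket string using a stack."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced dot-bracket structure")
    return pairs


def basic_features(seq, structure, mfe):
    """Compute a subset of the table's features (hypothetical key names).

    seq: RNA sequence; structure: dot-bracket string of the same length;
    mfe: minimum free energy, assumed to come from an external folding tool.
    """
    seq = seq.upper().replace("T", "U")
    L = len(seq)
    if L < 2 or L != len(structure):
        raise ValueError("sequence and structure must have equal length >= 2")

    pairs = parse_pairs(structure)
    tot_bases = len(pairs)  # number of base pairs, as defined in the table
    feats = {}

    # Feature 49: %(|G| + |C|)
    gc = (seq.count("G") + seq.count("C")) / L * 100
    feats["%G+C"] = gc

    # Features 50-65: %XY = |XY| / (L - 1) * 100, counting overlapping windows
    for x, y in product("ACGU", repeat=2):
        di = x + y
        count = sum(1 for k in range(L - 1) if seq[k:k + 2] == di)
        feats["%" + di] = count / (L - 1) * 100

    # Features 66, 67, 69: dG, MFE1, dP
    feats["dG"] = mfe / L
    feats["MFE1"] = feats["dG"] / gc if gc else 0.0
    feats["dP"] = tot_bases / L

    # Features 71-73: |X - Y| / L for A-U, G-C and G-U pairs
    kinds = {"A-U": {"AU", "UA"}, "G-C": {"GC", "CG"}, "G-U": {"GU", "UG"}}
    for name, members in kinds.items():
        n = sum(1 for i, j in pairs if seq[i] + seq[j] in members)
        feats["|" + name + "|/L"] = n / L

    # Feature 78: pb/nb, paired nucleotides over unpaired nucleotides
    paired = 2 * tot_bases
    unpaired = L - paired
    feats["pb/nb"] = paired / unpaired if unpaired else float("inf")

    # Feature 79: MCPN, taken here as the longest run of non-dot positions
    longest = run = 0
    for ch in structure:
        run = run + 1 if ch != "." else 0
        longest = max(longest, run)
    feats["MCPN"] = longest
    return feats


if __name__ == "__main__":
    # Toy hairpin for illustration only; not a real pre-miRNA.
    for key, value in basic_features("GCGAAAACGU", "(((....)))", -5.0).items():
        print(key, value)
```

Features that depend on stem, loop, or bulge counts (for example n_stems/L or MFE2) are omitted here because they depend on how those elements are delimited, which the table only partly specifies.
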
Figure 2. Receiver operating characteristic (ROC) curves of riceMirP prediction performance based on the training dataset of riceMirP.
Figure 3. Sensitivity (Se), specificity (Sp), accuracy (Ac), and Matthews correlation coefficient (MCC) of riceMirP based on the independent (unseen) testing dataset.
Figure 4. Comparison of riceMirP with plantMirP based on the training dataset of riceMirP.
Figure 5. Comparison of riceMirP with plantMirP based on the training dataset of plantMirP.
Figure 6. Accuracy (Ac), sensitivity (Se), and specificity (Sp) of riceMirP and miPlantPreMat based on the datasets from miPlantPreMat and PlantMiRNAPred.
Figure 7. Comparison of riceMirP with triplet-support vector machine (SVM), microPred, and PlantMiRNAPred based on the training and testing datasets from PlantMiRNAPred. The classification results reported previously are directly used for comparison [24].
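
The figures report sensitivity (Se), specificity (Sp), accuracy (Ac), and the Matthews correlation coefficient (MCC). As a reminder of how these are usually derived from a binary confusion matrix, here is a minimal sketch using the standard textbook definitions. The function name `classification_metrics` is hypothetical, and the counts in the usage example are made up for illustration; they are not results from the paper.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Standard binary metrics from confusion-matrix counts.

    tp/fn: real pre-miRNAs predicted as positive/negative;
    tn/fp: pseudo pre-miRNAs predicted as negative/positive.
    """
    total = tp + fn + tn + fp
    if total == 0:
        raise ValueError("at least one sample is required")

    se = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # specificity
    ac = (tp + tn) / total                     # accuracy

    # MCC is set to 0 when any marginal total is zero (a common convention).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Se": se, "Sp": sp, "Ac": ac, "MCC": mcc}


if __name__ == "__main__":
    # Illustrative counts only; not taken from the paper.
    print(classification_metrics(tp=90, fn=10, tn=80, fp=20))
```

MCC is useful alongside accuracy because it stays informative when the positive and negative classes are of unequal size.
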