Jun Meng, Dong Liu, Chao Sun, Yushi Luan.
Abstract
BACKGROUND: MicroRNAs (miRNAs) are a family of non-coding RNAs approximately 21 nucleotides in length that play pivotal roles at the post-transcriptional level in animals, plants and viruses. These molecules silence their target genes by degrading transcripts or suppressing translation. Studies have shown that miRNAs are involved in biological responses to a variety of biotic and abiotic stresses. Identifying these molecules and their targets can aid the understanding of regulatory processes. Recently, prediction methods based on machine learning have been widely used for miRNA prediction. However, most of these methods were designed for mammalian miRNA prediction, and few are available for predicting miRNAs in the pre-miRNAs of specific plant species. Although the complete Solanum lycopersicum genome has been published, only 77 Solanum lycopersicum miRNAs have been identified, far fewer than the estimated number. Therefore, it is essential to develop a prediction method based on machine learning to identify new plant miRNAs.
Year: 2014 PMID: 25547126 PMCID: PMC4310204 DOI: 10.1186/s12859-014-0423-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Original pre-miRNA and intercepted pre-miRNA of miR-166b. The mature miRNA is at the 3’ end and the miRNA* is at the 5’ end of the selected sequence. Each base has one of two states, match or mismatch. Each precursor contains at least one loop. The original pre-miRNA has 201 bases with an MFE of −76.92 kcal/mol, and the intercepted pre-miRNA has 138 bases with an MFE of −51.72 kcal/mol.
Selected pre-miRNA features
| Feature group | Number | Features |
|---|---|---|
| MFE-related | 9 | MFEI1², MFEI2², MFEI3³, MFEI4³, MFEI5⁴, MFEI6⁴, MFEI7⁵, MFEI8⁵, MFEI9⁵ |
| Sequence-related | 20 | %AA, %AC, etc.² (16), %G+C², Avg_mis_num⁴, Mis_num_begin⁵, Mis_num_end⁵ |
| Mfold-related | 6 | dS³, dS/L³, dH³, dH/L³, Tm³, Tm/L³ |
| Base-pair-related | 7 | \|A-U\|/L³, \|C-G\|/L³, \|G-U\|/L³, Avg_BP_Stem³, %(A−U)/n_stems³, %(G−C)/n_stems³ |
| Triple-related | 96 | A(((_S, A((._S, etc.¹ (32), A(((_begin_S, A((._begin_S, etc.⁵ (32), A(((_end_S, A((._end_S, etc.⁵ (32) |
| RNAfold-related | 14 | dP², dG², dD², dQ², dF², zP², zG², zD², zQ², zF², NEFE³, Freq³, Diversity³, Diff³ |

¹ Features extracted in triplet-SVM.
² Features extracted in miPred.
³ Features extracted in microPred.
⁴ Features extracted in plantMiRNAPred.
⁵ Features extracted in miPlantPreMat.
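
As a rough illustration of how some of the sequence-related and triplet-style features listed above could be computed, the sketch below derives dinucleotide percentages, G+C content, and triplet element counts from a sequence and its dot-bracket secondary structure. The function names are hypothetical. Treating ")" the same as "(" follows the usual triplet-SVM convention of distinguishing only paired from unpaired positions. The exact definitions of the _S, _begin_S and _end_S variants are not given here, so they are not reproduced.

```python
from itertools import product

NUCLEOTIDES = "ACGU"


def dinucleotide_frequencies(seq):
    """Percentage of each of the 16 dinucleotides over overlapping windows."""
    seq = seq.upper().replace("T", "U")
    counts = {a + b: 0 for a, b in product(NUCLEOTIDES, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip windows with ambiguous bases such as N
            counts[pair] += 1
            total += 1
    return {"%" + k: (100.0 * v / total if total else 0.0)
            for k, v in counts.items()}


def gc_content(seq):
    """Percentage of G and C bases in the sequence (%G+C)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def triplet_counts(seq, structure):
    """Count the 32 triplet elements: middle base + pairing state of 3 positions.

    Assumption: ')' is mapped to '(' so each position is paired or unpaired.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    states = structure.replace(")", "(")
    counts = {n + "".join(s): 0
              for n in NUCLEOTIDES for s in product("(.", repeat=3)}
    for i in range(1, len(seq) - 1):
        key = seq[i] + states[i - 1:i + 2]
        if key in counts:
            counts[key] += 1
    return counts


if __name__ == "__main__":
    toy_seq = "GGGAAACCC"          # toy hairpin, not a real pre-miRNA
    toy_structure = "(((...)))"
    print(dinucleotide_frequencies(toy_seq)["%GG"])
    print(gc_content(toy_seq))
    print(triplet_counts(toy_seq, toy_structure)["G((("])
```

The toy hairpin is only there to exercise the functions; real inputs would be a pre-miRNA sequence together with the structure predicted by a folding tool.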
Figure 2. Flow chart of the classification model miPlantPreMat for use with plant miRNAs. The chart shows the construction of the SVM classifier miPlantPreMat based on feature selection and sample selection.
Information gain of each attribute and SVM-RFE ranking
| Feature | Information gain | SVM-RFE rank | Feature | Information gain | SVM-RFE rank |
|---|---|---|---|---|---|
| | 0.78628 | 1 | | 0.09652 | 58 |
| | 0.77982 | 2 | | 0.0933 | 103 |
| | 0.75613 | 3 | | 0.07866 | 30 |
| | 0.68656 | 54 | | 0.07662 | 74 |
| | 0.66704 | 48 | | 0.072 | 13 |
| … | … | … | … | … | … |
| % | 0.12375 | 38 | | 0.07866 | 30 |
| | 0.1227 | 25 | | 0.07662 | 74 |
| | 0.11855 | 77 | | 0.072 | 13 |
| % | 0.11651 | 8 | %( | 0.07079 | 44 |
| | 0.11603 | 15 | | 0.06746 | 93 |
| | 0.11563 | 34 | % | 0.06041 | 28 |
| | 0.11034 | 139 | | 0.05969 | 101 |
| | 0.10372 | 127 | | 0.05779 | 53 |
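
To make the information gain column concrete, here is a minimal sketch of how the information gain of one numeric attribute with respect to a binary class label can be computed. It assumes the attribute is discretized by a single threshold chosen to maximize the gain; how the attributes were actually discretized is not stated here, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the samples at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    conditional = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - conditional


def best_information_gain(values, labels):
    """Largest gain over midpoints between consecutive distinct values.

    Assumption: a single binary split stands in for the discretization step.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max((information_gain(values, labels, t) for t in candidates),
               default=0.0)


if __name__ == "__main__":
    # Toy data: the attribute separates the two classes perfectly.
    print(best_information_gain([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))
```

An attribute that separates a balanced two-class sample perfectly reaches the maximum of 1 bit, while an attribute unrelated to the label scores close to 0.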
Figure 3. Flow chart of B-SVM-RFE feature selection. The feature subset was selected using B-SVM-RFE, a method that combines SVM-RFE with information gain, to obtain the final feature subset for miPlantPreMat.
Figure 4. Determination of the best feature subset. Two indicators, LooErrorRate and TestErrorRate, were used to evaluate candidate subsets. LooErrorRate was calculated with a 5-fold cross-validation model. TestErrorRate was calculated on independent training and testing sets with optimized parameters. The penalty coefficient c and the kernel function parameter g were obtained by grid search.
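
The parameter search described in this caption can be sketched as follows. This is a minimal outline under stated assumptions: `cv_error` is a hypothetical callback that would train an SVM with the given c and g and return its cross-validated error, and the power-of-two exponent grids are a commonly used default rather than values taken from this work.

```python
import itertools
import math
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def grid_search(cv_error, c_exponents, g_exponents):
    """Return (error, c, g) with the lowest error over c = 2^i, g = 2^j."""
    best = None
    for ce, ge in itertools.product(c_exponents, g_exponents):
        c, g = 2.0 ** ce, 2.0 ** ge
        err = cv_error(c, g)
        if best is None or err < best[0]:
            best = (err, c, g)
    return best


if __name__ == "__main__":
    # Stand-in error surface with its minimum at c = 2^3, g = 2^-5.
    # A real cv_error would average the SVM error over k_fold_indices(...).
    def toy_error(c, g):
        return (math.log2(c) - 3) ** 2 + (math.log2(g) + 5) ** 2

    print(grid_search(toy_error, range(-5, 16, 2), range(-15, 4, 2)))
```

Keeping the error estimate behind a callback leaves the search loop independent of any particular SVM library.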
LooErrorRate and TestErrorRate of SVM-RFE and B-SVM-RFE
| Number of features | SVM-RFE LooErrorRate (%) | SVM-RFE TestErrorRate (%) | B-SVM-RFE LooErrorRate (%) | B-SVM-RFE TestErrorRate (%) |
|---|---|---|---|---|
| 1 | 21.13 | 26.53 | 21.13 | 26.53 |
| 2 | 11.40 | 21.01 | 11.40 | 21.01 |
| 3 | 9.91 | 20.94 | 9.91 | 20.94 |
| … | … | … | … | … |
| 46 | 3.04 | 7.15 | 2.72 | 7.15 |
| 47 | 2.84 | 7.34 | 2.42 | 7.04 |
| 48 | 2.72 | 7.14 | 2.72 | 7.14 |
| … | … | … | … | … |
| 150 | 3.00 | 8.17 | 3.00 | 8.17 |
| 151 | 3.19 | 8.29 | 3.19 | 8.29 |
| 152 | 3.30 | 7.30 | 3.30 | 7.30 |
Classification results based on different feature subsets using three methods
| Model | Classifier | Feature selection | Number of features | SE (%) | SP (%) | ACC (%) | Gm (%) |
|---|---|---|---|---|---|---|---|
| miPlantPre | NBC | PCA | 76 | 92.2 | 92.6 | 92.4 | 92.4 |
| | | CFS | 20 | 93.9 | 97.8 | 95.8 | 95.8 |
| | | B-SVM-RFE | 47 | 93.8 | 98.6 | 96.2 | 96.2 |
| | | All features | 152 | 92.9 | 98.0 | 95.4 | 95.4 |
| | RF | PCA | 76 | 93.5 | 95.3 | 94.4 | 94.4 |
| | | CFS | 20 | 95.0 | 97.6 | 96.3 | 96.3 |
| | | B-SVM-RFE | 47 | 95.3 | 97.7 | 96.5 | 96.5 |
| | | All features | 152 | 95.3 | 97.7 | 96.5 | 96.5 |
| | SVM | PCA | 76 | 94.9 | 99.2 | 97.0 | 97.0 |
| | | CFS | 20 | 94.3 | 99.1 | 96.7 | 96.7 |
| | | B-SVM-RFE | 47 | 95.5 | 99.1 | 97.2 | 97.2 |
| | | All features | 152 | 93.9 | 98.5 | 96.2 | 96.2 |
| miPlantMat | NBC | PCA | 71 | 88.6 | 82.3 | 85.5 | 85.4 |
| | | CFS | 40 | 93.2 | 74.8 | 83.6 | 83.5 |
| | | B-SVM-RFE | 63 | 89.8 | 88.4 | 89.1 | 89.1 |
| | | All features | 152 | 91.7 | 79.3 | 85.5 | 85.3 |
| | RF | PCA | 71 | 93.2 | 73.2 | 83.2 | 82.6 |
| | | CFS | 40 | 89.2 | 89.1 | 89.2 | 89.2 |
| | | B-SVM-RFE | 63 | 89.7 | 88.6 | 89.2 | 89.2 |
| | | All features | 152 | 86.6 | 84.4 | 85.5 | 85.5 |
| | SVM | PCA | 71 | 88.6 | 84.3 | 86.4 | 86.4 |
| | | CFS | 40 | 90.6 | 87.5 | 89.1 | 89.1 |
| | | B-SVM-RFE | 63 | 92.9 | 88.7 | 90.8 | 90.8 |
| | | All features | 152 | 87.1 | 81.6 | 84.4 | 84.4 |
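
The four numeric result columns in the table above are consistent with sensitivity (SE), specificity (SP), accuracy (ACC) and the geometric mean of SE and SP (Gm). That reading is an inference from the arithmetic, since the column labels are not preserved here. Under that assumption, the following sketch shows the standard definitions computed from confusion-matrix counts; the function names are hypothetical.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Standard binary metrics from confusion-matrix counts (as fractions)."""
    se = tp / (tp + fn)                    # sensitivity: true positive rate
    sp = tn / (tn + fp)                    # specificity: true negative rate
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    gm = math.sqrt(se * sp)                # geometric mean of SE and SP
    return {"SE": se, "SP": sp, "ACC": acc, "Gm": gm}


def geometric_mean(se, sp):
    """Gm for SE and SP given on the same scale (fractions or percentages)."""
    return math.sqrt(se * sp)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(classification_metrics(tp=90, fn=10, tn=80, fp=20))
    # SE and SP from the miPlantMat / RF / PCA row of the table above.
    print(round(geometric_mean(93.2, 73.2), 1))
```

For the miPlantMat RF PCA row (93.2 and 73.2), the geometric mean rounds to 82.6 and the arithmetic mean to 83.2, which matches the last two values in that row. Accuracy equals that arithmetic mean only when the positive and negative test sets are the same size.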
Comparison of miPlantPre against other methods
| Method | Training set (positive) | Training set (negative) | Test set (positive) | Test set (negative) | Number of features | SE (%) | SP (%) | ACC (%) | Gm (%) |
|---|---|---|---|---|---|---|---|---|---|
| Triplet-SVM | 163 | 168 | 30 | 1000 | 32 | 93.30 | 88.10 | 90.66 | 90.66 |
| MiPred | 163 | 168 | 263 | 265 | 34 | 89.35 | 93.21 | 91.26 | 91.26 |
| miPred | 200 | 400 | 123 | 146 | 34 | 84.55 | 97.97 | 91.01 | 91.01 |
| miRabela | | | | | | 71.00 | 97.00 | 82.99 | 82.99 |
| microPred | | | | | 21 | 90.02 | 97.28 | 93.58 | 93.58 |
| plantMiRNAPred | | | | | 68 | 91.93 | 97.84 | 94.84 | 94.84 |
| miPlantPre | | | | | 47 | 95.50 | 98.82 | 97.16 | 97.16 |
The classification accuracy of four methods for the pre-miRNAs of several plant species and for the negative dataset
| Dataset | | | | |
|---|---|---|---|---|
| aly | 94.29 | 96.19 | 96.19 | 99.05 |
| ath | 91.75 | 90.72 | 92.78 | 96.91 |
| gma | 91.18 | 92.65 | 93.93 | 95.89 |
| mtr | 85.90 | 88.46 | 89.74 | 90.60 |
| osa | 92.31 | 95.10 | 95.10 | 95.10 |
| ppt | 88.44 | 91.16 | 97.96 | 98.64 |
| sbi | 93.38 | 97.79 | 96.99 | 98.53 |
| sly | 97.14 | 100.00 | 100.00 | 100.00 |
| zma | 89.74 | 97.44 | 97.44 | 98.29 |
| neg | 94.80 | 97.80 | 98.20 | 98.60 |
The classification results obtained using miPlantMat for various pre-miRNA datasets
| Dataset | | |
|---|---|---|
| aly | 89.46 | 9.46 |
| ath | 87.84 | 10.53 |
| gma | 89.50 | 13.36 |
| mtr | 87.67 | 12.22 |
| osa | 88.96 | 10.31 |
| ppt | 90.98 | 10.46 |
| sbi | 89.02 | 9.53 |
| sly | 89.87 | 8.36 |
| zma | 91.42 | 10.93 |
Figure 5. Number of predicted members and reported number in Solanum lycopersicum. The number of predicted members, where it is more than 4, and the corresponding reported number in Solanum lycopersicum.