Limin Jiang, Jingjun Zhang, Ping Xuan, Quan Zou.
Abstract
MicroRNAs (miRNAs) are short (21-24 nt) noncoding RNAs that play significant regulatory roles in cells. In the past few years, miRNA-related problems have become an active area of bioinformatics research because of the essential biological functions of miRNAs. miRNA-related bioinformatics analysis helps clarify several questions, including the functions of miRNAs and other genes, the regulatory network between miRNAs and their target mRNAs, and even biological evolution. Distinguishing miRNA precursors from other hairpin-like sequences is an essential step in detecting novel miRNAs. In this study, we employed a backpropagation (BP) neural network together with 98-dimensional novel features for miRNA precursor identification. Results show that the precision and recall of our method are 95.53% and 96.67%, respectively. Results further demonstrate that the total prediction accuracy of our method is nearly 13.17% higher than that of state-of-the-art miRNA precursor prediction software tools.
Year: 2016 PMID: 27635401 PMCID: PMC5011242 DOI: 10.1155/2016/9565689
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Corresponding training results with different numbers of nodes in the hidden layers.
| Hidden-layer nodes | Training times | Training errors | Hidden-layer nodes | Training times | Training errors |
|---|---|---|---|---|---|
| 11 | 43 | 9.57718 | 12 | 39 | 9.88418 |
| 13 | 17 | 9.42136 | 14 | 65 | 9.92537 |
| 15 | 34 | 9.88206 | 16 | 74 | 8.38658 |
| 17 | 48 | 7.82527 | 18 | 157 | 6.63468 |
| 19 | 7 | 9.46711 | 20 | 47 | 9.3627 |
Figure 1. Topology structure of the BP neural network.
Basic parameters of the classifier based on BP neural network.
| Setting | Value |
|---|---|
| Learning rate | 0.1 |
| Error bound | 0.0001 |
| Number of iterations | 1000 |
| Transfer function of hidden layer nodes | Tansig |
| Transfer function of output nodes | Purelin |
| Training function | Trainlm |
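The settings in this table can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The names Tansig, Purelin, and Trainlm match the MATLAB toolbox functions for a hyperbolic tangent sigmoid, a linear output, and Levenberg-Marquardt training; the sketch keeps the tanh hidden layer and linear output but swaps Levenberg-Marquardt for plain batch gradient descent to stay short. The function names `train_bp` and `predict`, the default of 18 hidden nodes, and the weight initialization are assumptions made for illustration.

```python
import numpy as np


def train_bp(X, y, n_hidden=18, lr=0.1, error_bound=1e-4, max_iter=1000, seed=0):
    """Train a one-hidden-layer network (tanh hidden units, linear output).

    Assumption: batch gradient descent on mean squared error is used here
    instead of Levenberg-Marquardt (Trainlm), purely to keep the sketch small.
    Returns the parameters and the last mean squared error that was evaluated.
    """
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    # Hypothetical initialization: small random weights, zero biases.
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, 1))
    b2 = np.zeros(1)
    target = np.asarray(y, dtype=float).reshape(-1, 1)
    mse = float("inf")
    for _ in range(max_iter):                 # "Number of iterations"
        hidden = np.tanh(X @ W1 + b1)         # Tansig-style hidden layer
        output = hidden @ W2 + b2             # Purelin-style linear output
        error = output - target
        mse = float(np.mean(error ** 2))
        if mse < error_bound:                 # "Error bound" stopping rule
            break
        # Backpropagate the squared-error gradient through both layers.
        grad_out = 2.0 * error / len(X)
        grad_hidden = (grad_out @ W2.T) * (1.0 - hidden ** 2)
        W2 -= lr * (hidden.T @ grad_out)
        b2 -= lr * grad_out.sum(axis=0)
        W1 -= lr * (X.T @ grad_hidden)
        b1 -= lr * grad_hidden.sum(axis=0)
    return (W1, b1, W2, b2), mse


def predict(params, X):
    """Return the raw network output for each row of X."""
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
```

With labels encoded as 0 and 1 (an assumption), a precursor call could be made by thresholding `predict(params, X)` at 0.5.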
Figure 2. Process flow of model generation and training.
Measurements for the classification problems.
| Actual result | Forecast result: P | Forecast result: N |
|---|---|---|
| P | TP | FN |
| N | FP | TN |
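The four counts in this table are enough to derive the measures reported in the later tables (SP, SE, Gm, ACC, precision, recall, MCC). The sketch below uses the standard textbook definitions, which is an assumption because the formulas are not reproduced in this record; the Gm definition as the square root of SE times SP does agree with the reported values (for feature set B, the square root of 67.89 × 68.25 is about 68.07). The function name `classification_metrics` is hypothetical.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fn, fp, tn):
    """Compute common binary-classification measures from confusion counts."""
    se = _ratio(tp, tp + fn)            # sensitivity, identical to recall
    sp = _ratio(tn, tn + fp)            # specificity
    precision = _ratio(tp, tp + fp)
    acc = _ratio(tp + tn, tp + fn + fp + tn)
    gm = math.sqrt(se * sp)             # geometric mean of SE and SP
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, mcc_den)
    return {
        "SE": se,
        "SP": sp,
        "precision": precision,
        "recall": se,
        "ACC": acc,
        "Gm": gm,
        "MCC": mcc,
    }
```

For example, `classification_metrics(tp=90, fn=10, fp=20, tn=80)` gives SE = 0.90, SP = 0.80, and ACC = 0.85; these counts are made up for illustration and are not taken from the study.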
Comparison of classification results based on different feature sets.
| Features | SP (%) | SE (%) | Gm (%) | ACC (%) |
|---|---|---|---|---|
| B | 67.89 | 68.25 | 68.07 | 68.00 |
| C | 92.74 | 76.42 | 84.19 | 88.03 |
| A + B | 91.79 | 90.41 | 91.10 | 91.31 |
| A + C | 94.03 | 80.85 | 87.19 | 89.67 |
| B + C | 96.12 | 85.21 | 90.50 | 92.49 |
| A + B + C | 96.33 | 86.51 | 91.29 | 93.42 |
Notes: A: energy feature and structural diversity; B: 32-dimensional triad structure characteristic; C: 64-dimensional n-gram frequency characteristics.
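As a rough illustration of feature set C, a 64-dimensional n-gram frequency vector is what one obtains from 3-grams over the four-letter RNA alphabet (4^3 = 64). The sketch below rests on that assumption; the exact n, normalization, and ordering used in the study are not stated in this record, and the function name `ngram_frequencies` is hypothetical.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet


def ngram_frequencies(seq, n=3):
    """Return relative frequencies of all len(ALPHABET)**n n-grams in seq.

    n-grams are ordered lexicographically over ALPHABET. Windows containing
    any other character (for example N) are skipped. DNA-style T is read as U.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=n)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - n + 1):
        kmer = seq[i:i + n]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    # Normalize by the number of valid windows; all zeros if there are none.
    return [counts[k] / total if total else 0.0 for k in kmers]
```

Calling `ngram_frequencies("ACGUACGU")` yields a 64-element list whose entries sum to 1, with the entry for `ACG` equal to 2/6 because that 3-gram fills two of the six windows.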
Figure 3. Different test results for varying sample quantities.
Figure 4. Different test results for same sample quantity.
Evaluation of the reference index.
| Sample set | Class | Training sample | Test sample | Output sample | Correct sample | Precision (%) | Recall (%) | Gm (%) |
|---|---|---|---|---|---|---|---|---|
| D | Positive | 553 | 138 | 128 | 124 | 96.0 | 90.0 | 93.43 |
| D | Negative | 1150 | 287 | 296 | 282 | 95.38 | 98.19 | |
| E | Positive | 552 | 138 | 136 | 128 | 94.10 | 92.82 | 93.98 |
| E | Negative | 552 | 138 | 140 | 130 | 92.87 | 94.12 | |
Note: D: sample set has different numbers of positive and negative samples; E: the sample set has equal numbers of positive and negative samples; correct sample: the number of correct predictions.
Comparison of the BP with alternative models.
| Model | ACC | Precision | Recall | MCC |
|---|---|---|---|---|
| BP | 95.53% | 96.00% | 96.67% | 0.8662 |
| GBDT | 94.27% | 94.76% | 96.87% | 0.8682 |
| LibSVM | 93.52% | 93.60% | 93.50% | 0.8510 |
| LibD3C | 93.52% | 93.50% | 93.50% | 0.8510 |
| Adaboost | 91.82% | 91.80% | 91.80% | 0.8120 |
| Random forest | 90.13% | 90.00% | 90.10% | 0.7720 |
| J48 | 87.78% | 87.70% | 87.80% | 0.7200 |
| String kernel SVM | 81.89% | 99.37% | 46.31% | 0.6002 |
Figure 5. Comparison results of different models.
Figure 6. Test comparison results for six different species.