Kai-Yao Huang, Tzong-Yi Lee, Yu-Chuan Teng, Tzu-Hao Chang.
Abstract
BACKGROUND: MicroRNAs (miRNAs) play a vital role in development, oncogenesis, and apoptosis by binding to mRNAs and regulating coding genes at the posttranscriptional level in mammals, plants, and insects. Recent studies have demonstrated that the expression of viral miRNAs is associated with the ability of the virus to infect a host. Identifying potential viral miRNAs from experimental sequence data is therefore valuable for deciphering virus-host interactions. Thus far, no predictive model specific to viral miRNA identification has been developed.
Year: 2015 PMID: 25708359 PMCID: PMC4331708 DOI: 10.1186/1471-2105-16-S1-S9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Characteristics of tools for identifying pre-miRNAs.
| Classifier | Used features | SN (%) | SP (%) | Reference |
|---|---|---|---|---|
| SVM | Each hairpin is encoded as a set of 32 triplet elements | 93.3 | 88.1 | Xue et al. |
| Random forest | 32 Triplet-SVM features and the minimum free energy of the secondary structure | 89.3 | 93.2 | Jiang et al. |
| SVM | 17 primary sequence features, 5 secondary structural features, and 7 normalized features | 84.5 | 97.9 | Ng and Mishra |
| RVKDE | 29 miPred features and 4 stem-loop features | 88.9 | 92.6 | Chang et al. |
| SVM | 29 miPred features, 4 RNAfold-related features, 6 Mfold-related features, 7 base-pair-related features, and 2 MFE-related features | 83.3 | 99.0 | Batuwita et al. |
| SVM | 8 triplet structural features, 8 base-pair group features, and 16 thermodynamic group features | 87.7 | 98.8 | Ding et al. |
| Naïve Bayes | 4 mononucleotide features, 16 dinucleotide features, 20 triplet structural features, consecutive paired bases, structural profile scoring, and normalized sequence-based total-pairing features | 89.8 | 91.5 | Jha et al. |
SN: sensitivity; SP: specificity
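
The "32 triplet elements" in the first row can be illustrated with a short sketch. It assumes the encoding usually attributed to Triplet-SVM: each interior nucleotide (A, C, G, or U) is combined with the paired or unpaired status of itself and its two neighbours, giving 4 × 8 = 32 possible elements whose frequencies form the feature vector. The function name, the toy sequence, and the dot-bracket structure are hypothetical, and the original tool may differ in details such as which positions are counted.

```python
from itertools import product

def triplet_features(seq, struct):
    """Return normalized frequencies of the 32 (nucleotide, local structure) triplets.

    seq    -- RNA sequence (A/C/G/U; T is converted to U)
    struct -- dot-bracket secondary structure of the same length
    Assumption: '(' and ')' are both treated as "paired".
    """
    if len(seq) != len(struct):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    paired = struct.replace(")", "(")
    # 4 middle nucleotides x 2^3 structure patterns = 32 keys
    keys = [n + "".join(p) for n in "ACGU" for p in product("(.", repeat=3)]
    counts = dict.fromkeys(keys, 0)
    for i in range(1, len(seq) - 1):
        key = seq[i] + paired[i - 1:i + 2]
        if key in counts:  # skips ambiguous bases such as N
            counts[key] += 1
    total = sum(counts.values()) or 1
    return {k: v / total for k, v in counts.items()}

# Toy hairpin: three base pairs closing a three-nucleotide loop
feats = triplet_features("GGGAAACCC", "(((...)))")
print(len(feats), feats["G((("])
```

In practice the structure string would come from an RNA folding program rather than being written by hand.
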
F-scores of the 54 features.
| Rank | F-score | Rank | F-score | Rank | F-score |
|---|---|---|---|---|---|
| 1 | 1.09 | 19 | 0.83 | 37 | 0.63 |
| 2 | 1.08 | 20 | 0.81 | 38 | 0.63 |
| 3 | 1.04 | 21 | 0.80 | 39 | 0.61 |
| 4 | 1.03 | 22 | 0.78 | 40 | 0.60 |
| 5 | 1.01 | 23 | 0.78 | 41 | 0.58 |
| 6 | 1.01 | 24 | 0.77 | 42 | 0.58 |
| 7 | 1.00 | 25 | 0.76 | 43 | 0.57 |
| 8 | 1.00 | 26 | 0.75 | 44 | 0.56 |
| 9 | 1.00 | 27 | 0.72 | 45 | 0.54 |
| 10 | 0.99 | 28 | 0.71 | 46 | 0.51 |
| 11 | 0.97 | 29 | 0.71 | 47 | 0.48 |
| 12 | 0.97 | 30 | 0.70 | 48 | 0.47 |
| 13 | 0.96 | 31 | 0.69 | 49 | 0.43 |
| 14 | 0.95 | 32 | 0.66 | 50 | 0.43 |
| 15 | 0.94 | 33 | 0.66 | 51 | 0.41 |
| 16 | 0.94 | 34 | 0.66 | 52 | 0.37 |
| 17 | 0.91 | 35 | 0.66 | 53 | 0.34 |
| 18 | 0.87 | 36 | 0.65 | 54 | 0.32 |
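
The record does not state how the F-scores were computed. A minimal sketch, assuming the Fisher-style F-score commonly used for feature selection with SVMs (squared distance of each class mean from the overall mean, divided by the sum of the within-class sample variances), is given below. The feature names `mfe` and `gc` and their values are hypothetical toy data, not values from the study.

```python
from statistics import mean, variance

def f_score(pos, neg):
    """Fisher-style F-score of one feature (assumed definition, see lead-in).

    pos, neg -- feature values for positive and negative samples
    """
    overall = mean(list(pos) + list(neg))
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1)
    return between / within if within else 0.0

def rank_features(pos_cols, neg_cols, k):
    """Return the names of the k features with the highest F-scores."""
    scores = {name: f_score(pos_cols[name], neg_cols[name]) for name in pos_cols}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: one discriminative feature, one uninformative feature
pos_cols = {"mfe": [1.0, 2.0, 3.0], "gc": [0.40, 0.50, 0.60]}
neg_cols = {"mfe": [4.0, 5.0, 6.0], "gc": [0.45, 0.50, 0.55]}
top = rank_features(pos_cols, neg_cols, k=1)
print(top)
```

Selecting the 40 highest-scoring features, as in the later tables, would correspond to calling `rank_features` with `k=40` on the full feature set.
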
Classification results of the SVM model.
| Negative dataset | TP | TN | FP | FN | SN | SP | ACC |
|---|---|---|---|---|---|---|---|
| Virus genome | 213 | 661 | 128 | 50 | 80.98% | 83.77% | 83.07% |
| Pseudo-8494 | 202 | 8403 | 91 | 61 | 76.80% | 98.92% | 98.26% |
| Human pre-miRNA | 204 | 1498 | 102 | 59 | 77.56% | 93.62% | 91.35% |
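
The percentages in this table follow from the four counts in each row. The sketch below assumes the count columns are, in order, true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), which is what the arithmetic suggests; the function name is illustrative.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # fraction of real pre-miRNAs recovered
    sp = tn / (tn + fp)                    # fraction of negatives correctly rejected
    acc = (tp + tn) / (tp + tn + fp + fn)  # fraction of all samples classified correctly
    return sn, sp, acc

# "Virus genome" row of the SVM table
sn, sp, acc = confusion_metrics(tp=213, tn=661, fp=128, fn=50)
print(f"SN={sn:.2%}  SP={sp:.2%}  ACC={acc:.2%}")
```

The computed values agree with the tabulated 80.98%, 83.77%, and 83.07% to within 0.01 percentage points, the difference being rounding versus truncation.
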
Classification results of the random forest model.
| Negative dataset | TP | TN | FP | FN | SN | SP | ACC |
|---|---|---|---|---|---|---|---|
| Virus genome | 215 | 669 | 120 | 48 | 81.74% | 84.79% | 84.03% |
| Pseudo-8494 | 198 | 8306 | 188 | 65 | 75.28% | 97.78% | 97.11% |
| Human pre-miRNA | 203 | 1464 | 136 | 60 | 77.18% | 91.50% | 89.47% |
Classification results of the SVM model using the 40 features with the highest F-scores.
| Negative dataset | TP | TN | FP | FN | SN | SP | ACC |
|---|---|---|---|---|---|---|---|
| Virus genome | 224 | 690 | 99 | 39 | 85.17% | 87.45% | 86.88% |
| Pseudo-8494 | 207 | 8389 | 105 | 56 | 78.70% | 98.76% | 98.16% |
| Human pre-miRNA | 211 | 1487 | 113 | 52 | 80.22% | 92.93% | 91.14% |
Classification results of the random forest model using the 40 features with the highest F-scores.
| Negative dataset | TP | TN | FP | FN | SN | SP | ACC |
|---|---|---|---|---|---|---|---|
| Virus genome | 219 | 686 | 103 | 44 | 83.26% | 86.94% | 86.02% |
| Pseudo-8494 | 201 | 8368 | 126 | 62 | 76.42% | 98.51% | 97.85% |
| Human pre-miRNA | 208 | 1473 | 127 | 55 | 79.08% | 92.06% | 90.23% |
Performance comparison with previous studies using a partial dataset.
| Positive/negative dataset | TP | TN | FP | FN | SN | SP | ACC | (SN + SP)/2 | MCC |
|---|---|---|---|---|---|---|---|---|---|
| 63/189 | 44 | 171 | 18 | 19 | 69.84% | 90.47% | 85.32% | 80.15% | 0.61 |
| 63/189 | 41 | 175 | 14 | 22 | 65.07% | 92.59% | 85.71% | 78.83% | 0.60 |
| 63/189 | 42 | 177 | 12 | 21 | 66.66% | 93.65% | 86.90% | 80.16% | 0.64 |
| 63/189 | 39 | 176 | 13 | 24 | 61.90% | 93.12% | 85.31% | 77.51% | 0.59 |
| 63/189 | 48 | 159 | 30 | 15 | 76.19% | 84.12% | 82.14% | 80.16% | 0.56 |
| 63/189 | 45 | 161 | 28 | 18 | 71.45% | 85.21% | 81.75% | 78.30% | 0.54 |
| 63/189 | 46 | 166 | 23 | 17 | 73.01% | 87.83% | 84.13% | 80.42% | 0.59 |
| 63/189 | 50 | 164 | 25 | 13 | 79.36% | 86.77% | 84.92% | 83.06% | 0.63 |
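
The last two columns of this table are consistent with the mean of sensitivity and specificity and with the Matthews correlation coefficient (MCC). The sketch below assumes those definitions and assumes the four count columns are TP, TN, FP, and FN in that order; neither assumption is stated in the record, and the function names are illustrative.

```python
from math import sqrt

def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal total is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# First row of the table: 63 positive and 189 negative samples
bac = balanced_accuracy(tp=44, tn=171, fp=18, fn=19)
m = mcc(tp=44, tn=171, fp=18, fn=19)
print(f"(SN + SP)/2 = {bac:.2%}, MCC = {m:.2f}")
```

Both measures are less sensitive than plain accuracy to the 1:3 imbalance between positive and negative samples, which is presumably why they are reported alongside it.
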
Performance comparison with previous studies using newly released data from miRBase.
| Positive/negative dataset | TP | TN | FP | FN | SN | SP | ACC | (SN + SP)/2 | MCC |
|---|---|---|---|---|---|---|---|---|---|
| 32/96 | 22 | 88 | 8 | 10 | 68.75% | 91.67% | 85.94% | 80.21% | 0.62 |
| 32/96 | 20 | 79 | 17 | 12 | 62.50% | 82.29% | 77.34% | 72.40% | 0.43 |
| 32/96 | 24 | 85 | 11 | 8 | 75.00% | 88.54% | 85.16% | 81.77% | 0.62 |
| 32/96 | 23 | 81 | 15 | 9 | 71.88% | 84.38% | 81.25% | 78.13% | 0.53 |
| 32/96 | 23 | 86 | 10 | 9 | 71.88% | 89.58% | 85.16% | 80.73% | 0.61 |
| 32/96 | 19 | 81 | 15 | 13 | 59.38% | 84.38% | 78.13% | 71.88% | 0.43 |
| 32/96 | 22 | 82 | 14 | 10 | 68.75% | 85.42% | 81.25% | 77.08% | 0.52 |
| 32/96 | 25 | 85 | 11 | 7 | 78.13% | 88.54% | 85.94% | 83.33% | 0.64 |
Figure 1. Web interface of ViralmiR.
Number of hairpin-like shapes and non-hairpin-like shapes in prediction results
| Model | True positive, hairpin-like | True positive, non-hairpin-like | False negative, hairpin-like | False negative, non-hairpin-like |
|---|---|---|---|---|
| SVM model | 210 (93%) | 14 (7%) | 9 (23%) | 30 (77%) |
| Random forest model | 208 (95%) | 11 (5%) | 11 (25%) | 33 (75%) |