Chenghai Xue, Fei Li, Tao He, Guo-Ping Liu, Yanda Li, Xuegong Zhang.
Abstract
BACKGROUND: MicroRNAs (miRNAs) are a group of short (approximately 22 nt) non-coding RNAs that play important regulatory roles. MiRNA precursors (pre-miRNAs) are characterized by their hairpin structures. However, a large number of similar hairpins can be folded in many genomes. Almost all current methods for computational prediction of miRNAs use comparative genomic approaches to identify putative pre-miRNAs from candidate hairpins. An ab initio method for distinguishing pre-miRNAs from sequence segments with pre-miRNA-like hairpin structures is lacking. Being able to classify real vs. pseudo pre-miRNAs is important both for understanding the nature of miRNAs and for developing ab initio prediction methods that can discover new miRNAs without known homology.
Year: 2005 PMID: 16381612 PMCID: PMC1360673 DOI: 10.1186/1471-2105-6-310
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Using the triplet elements to represent the local structure-sequence features of the hairpin. Each triplet element is composed of 3 contiguous sub-structures and the nucleotide type at the middle position. The occurrences of all 32 possible triplet elements are counted along a hairpin segment, forming a 32-dimensional vector, which is then normalized to serve as the input vector for the SVM.
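The encoding described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. It assumes the secondary structure is given in dot-bracket notation, that both bracket directions count as "paired", and that normalization means dividing each count by the total number of counted triplets; the caption specifies none of these details. The names `triplet_features` and `TRIPLET_KEYS` are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
STATES = "(."  # "(" = paired, "." = unpaired (assumed two-state encoding)

# 4 middle nucleotides x 2^3 structure patterns = 32 triplet elements.
TRIPLET_KEYS = [n + "".join(s) for n in NUCLEOTIDES for s in product(STATES, repeat=3)]


def triplet_features(sequence, structure):
    """Return a 32-dimensional vector of normalized triplet-element counts.

    `structure` is a dot-bracket string of the same length as `sequence`.
    Normalizing by the total count is an assumption of this sketch.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    seq = sequence.upper().replace("T", "U")
    # Treat "(" and ")" alike: only paired vs. unpaired matters here.
    states = structure.replace(")", "(")
    counts = dict.fromkeys(TRIPLET_KEYS, 0)
    # Slide a window of 3 positions; the key is the middle nucleotide
    # plus the paired/unpaired states of the 3 positions.
    for i in range(1, len(seq) - 1):
        key = seq[i] + states[i - 1:i + 2]
        if key in counts:  # skips ambiguous bases such as "N"
            counts[key] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in TRIPLET_KEYS]


features = triplet_features("GGAAACC", "((...))")
print(len(features))
```

The resulting list can be passed to any SVM library as one input row. The fixed ordering in `TRIPLET_KEYS` keeps the feature positions consistent across hairpins.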
Classification performance of the triplet-SVM classifier on test sets TE-C, CONSERVED-HAIRPIN and UPDATED.
| Test set | Type | Size | Accuracy (%) |
| TE-C | Real¹ | 30 | 93.3 |
| TE-C | Pseudo² | 1000 | 88.1 |
| CONSERVED-HAIRPIN | Pseudo² | 2444 | 89.0 |
| UPDATED | Real¹ | 39 | 92.3 |
¹ Real: real human pre-miRNAs.
² Pseudo: pseudo pre-miRNA hairpins.
Figure 2. The average appearance frequencies of the triplet elements in the two classes (real pre-miRNA vs. pseudo-miRNA hairpins).
The discriminative power of the top 15 triplet elements. The discriminative power of the triplet element features that distinguish pre-miRNAs from other similar hairpins is calculated using the F value, and the 15 most discriminative triplet elements are listed here. μ+, μ- and σ+, σ- are the means and standard deviations of the elements in the two classes, estimated from the training dataset.
| Triplet element | Pre-miRNAs μ+ | Pre-miRNAs σ+ | Other hairpins μ- | Other hairpins σ- | F value |
| | 0.121 | 0.042 | 0.063 | 0.032 | 0.792 |
| | 0.154 | 0.048 | 0.089 | 0.040 | 0.734 |
| | 0.006 | 0.011 | 0.025 | 0.030 | 0.475 |
| | 0.008 | 0.014 | 0.025 | 0.025 | 0.429 |
| | 0.007 | 0.011 | 0.021 | 0.023 | 0.397 |
| | 0.042 | 0.025 | 0.063 | 0.031 | 0.383 |
| | 0.009 | 0.011 | 0.019 | 0.017 | 0.353 |
| | 0.032 | 0.022 | 0.048 | 0.027 | 0.329 |
| | 0.011 | 0.012 | 0.020 | 0.016 | 0.316 |
| | 0.151 | 0.038 | 0.127 | 0.040 | 0.303 |
| | 0.013 | 0.013 | 0.022 | 0.019 | 0.295 |
| | 0.006 | 0.011 | 0.014 | 0.019 | 0.289 |
| | 0.007 | 0.010 | 0.014 | 0.015 | 0.266 |
| | 0.040 | 0.020 | 0.050 | 0.024 | 0.231 |
| | 0.119 | 0.030 | 0.105 | 0.034 | 0.230 |
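The caption names the F value but this record does not give its formula. As a hedged illustration only, the sketch below assumes F = |μ+ - μ-| / (σ+ + σ-) and reads the five numbers in each row as μ+, σ+, μ-, σ-, F. Both the formula and the column reading are assumptions. They approximately reproduce the tabulated F values, with small gaps that are plausibly due to the three-decimal rounding of the means and deviations. The function name `separation_score` is hypothetical.

```python
# First three rows of the table, read as (mu+, sigma+, mu-, sigma-, F).
rows = [
    (0.121, 0.042, 0.063, 0.032, 0.792),
    (0.154, 0.048, 0.089, 0.040, 0.734),
    (0.006, 0.011, 0.025, 0.030, 0.475),
]


def separation_score(mu_pos, sd_pos, mu_neg, sd_neg):
    """Assumed F value: gap between class means over the summed deviations."""
    return abs(mu_pos - mu_neg) / (sd_pos + sd_neg)


for mu_pos, sd_pos, mu_neg, sd_neg, tabulated in rows:
    score = separation_score(mu_pos, sd_pos, mu_neg, sd_neg)
    # Print the recomputed score next to the value listed in the table.
    print(f"recomputed={score:.3f} tabulated={tabulated:.3f}")
```

Under this reading, a feature ranks high when its class means are far apart relative to the spread within each class.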
Prediction accuracy on test set CROSS-SPECIES by SVM trained with human data.
| Species | # of pre-miRNAs | Accuracy (%) |
| | 36 | 94.4 |
| | 25 | 80.0 |
| | 13 | 84.6 |
| | 6 | 66.7 |
| | 73 | 95.9 |
| | 110 | 86.4 |
| | 71 | 90.1 |
| | 71 | 91.5 |
| | 96 | 94.8 |
| | 75 | 92.0 |
| | 5 | 100.0 |
| Total | 581 | 90.9 |
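The Total row can be checked against the per-species rows. The short sketch below assumes the overall accuracy is the mean of the per-species accuracies weighted by the number of pre-miRNAs, which this record does not state explicitly. The pairs are copied from the table in row order.

```python
# (number of pre-miRNAs, accuracy in %) for each species row, in table order.
rows = [
    (36, 94.4), (25, 80.0), (13, 84.6), (6, 66.7),
    (73, 95.9), (110, 86.4), (71, 90.1), (71, 91.5),
    (96, 94.8), (75, 92.0), (5, 100.0),
]

total_size = sum(n for n, _ in rows)
# Weight each accuracy by its set size so large sets count proportionally.
weighted_accuracy = sum(n * acc for n, acc in rows) / total_size

print(total_size, round(weighted_accuracy, 1))  # prints: 581 90.9
```

Both numbers match the Total row, so the 90.9% figure is consistent with a size-weighted average rather than a plain mean of the eleven percentages.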