Tapas Bhadra, Malay Bhattacharyya, Lars Feuerbach, Thomas Lengauer, Sanghamitra Bandyopadhyay.
Abstract
Predicting the transcription start sites (TSSs) of microRNAs (miRNAs) is important for understanding how these small RNA molecules, known to regulate translation and stability of protein-coding genes, are regulated themselves. Previous approaches are primarily based on genetic features, trained on TSSs of protein-coding genes, and have low prediction accuracy. Recently, a support vector machine-based technique has been proposed for miRNA TSS prediction that uses known miRNA TSSs for training the classifier along with a set of existing and novel CpG island-based features. Current progress in epigenetics research has provided genome-wide and tissue-specific reports about various phenotypic traits. We hypothesize that incorporating epigenetic characteristics into statistical models may lead to better prediction of primary transcripts of human miRNAs. In this paper, we have tested our hypothesis on brain-specific miRNAs by using epigenetic as well as genetic features to predict the primary transcripts. For this, we have used a sophisticated feature selection technique and a robust classification model. Our prediction model achieves an accuracy of more than 80% and establishes the potential of epigenetic analysis for in silico prediction of TSSs.
Year: 2013 PMID: 23826117 PMCID: PMC3691241 DOI: 10.1371/journal.pone.0066722
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Performance of the brain-tissue specific miRNA TSS prediction model with and without methylation-based features alongside the other features.
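| Feature Set | #Features | Classifier Performance | | |
| | | Criterion | Mean | Standard deviation |
| [label missing] | 371 | Accuracy (%) | 90.65 | 1.20 |
| | | Sensitivity (%) | 68.10 | 4.79 |
| | | Specificity (%) | 96.65 | 0.60 |
| | | Precision (%) | 84.35 | 2.74 |
| | | [symbol missing] | 0.70 | 0.04 |
| [label missing] | 385 | Accuracy (%) | 91.85 | 1.31 |
| | | Sensitivity (%) | 70.71 | 4.77 |
| | | Specificity (%) | 97.47 | 0.52 |
| | | Precision (%) | 88.07 | 2.75 |
| | | [symbol missing] | 0.74 | 0.04 |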
The two values reported for each criterion are the mean and the standard deviation of the respective performance metric.
Analysis of the importance of features by F-score.
| Summary | Feature Type | | | | | | | | | |
| Statistics | NM | NM-CG | NM-1 | NM-2 | NM-3 | NM-4 | CI | PL | S | MT |
| Minimum Rank | 3 | 7 | 16 | 7 | 3 | 11 | 4 | 130 | 1 | 94 |
| Maximum Rank | 385 | 373 | 77 | 352 | 382 | 385 | 53 | 253 | 43 | 300 |
| Average Rank | 207.7 | 118.82 | 51.5 | 155.69 | 165.33 | 228.61 | 26.92 | 185.5 | 15.33 | 160.93 |
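The page does not define the F-score used for this ranking. As a minimal sketch, assuming the common two-class Fisher-style F-score (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances), features could be scored and ranked as follows. The function names and the toy values are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature from its values in the positive and negative class."""
    overall = mean(pos + neg)
    # Numerator: squared distance of each class mean from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Denominator: sum of the within-class sample variances (n - 1 divisor).
    # A feature that is constant within both classes would give a zero here.
    within = variance(pos) + variance(neg)
    return between / within


def rank_features(features):
    """Map feature name -> rank, where rank 1 is the highest F-score."""
    scores = {name: f_score(pos, neg) for name, (pos, neg) in features.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


# Hypothetical toy data: (values at true TSSs, values at non-TSS positions).
toy_features = {
    "separating_feature": ([8.0, 9.0], [1.0, 2.0]),
    "uninformative_feature": ([5.0, 4.0], [5.0, 4.0]),
}
toy_ranks = rank_features(toy_features)
```

A feature whose class means are far apart relative to its within-class spread receives a high score and therefore a small rank number, which is how the minimum, maximum and average ranks in the table should be read.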
Analysis of the importance of features by VWMRmR feature selection.
| Summary | Feature Type | | | | | | | | | |
| Statistics | NM | NM-CG | NM-1 | NM-2 | NM-3 | NM-4 | CI | PL | S | MT |
| Minimum Rank | 2 | 2 | 55 | 7 | 10 | 2 | 4 | 212 | 1 | 16 |
| Maximum Rank | 385 | 280 | 372 | 380 | 385 | 381 | 124 | 301 | 9 | 341 |
| Average Rank | 206.06 | 110.48 | 188.25 | 214.25 | 228.48 | 203.49 | 44 | 270.25 | 4.33 | 149.64 |
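The summary rows of such a table can be derived from a per-feature ranking by grouping the ranks by feature type. The sketch below shows only that aggregation step. It does not implement VWMRmR, and the feature names and the individual ranks are hypothetical, chosen so that the result matches the S column above (1, 9, 4.33) under the assumption that S contains three features.

```python
from collections import defaultdict


def summarize_ranks(ranks, feature_types):
    """Group per-feature ranks by feature type and report min, max and average."""
    grouped = defaultdict(list)
    for name, rank in ranks.items():
        grouped[feature_types[name]].append(rank)
    return {
        ftype: {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for ftype, values in grouped.items()
    }


# Hypothetical ranks for three features of type S.
example_ranks = {"s_a": 1, "s_b": 3, "s_c": 9}
example_types = {"s_a": "S", "s_b": "S", "s_c": "S"}
summary = summarize_ranks(example_ranks, example_types)
```

A low average rank together with a low maximum rank, as for S, indicates that every feature of that type was ranked near the top, whereas a low minimum with a high maximum, as for NM, indicates a few strong features among many weak ones.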
Comparison of the performance of three existing gene TSS prediction algorithms along with our proposed method in predicting brain-tissue specific miRNA TSS.
| | Training | Classifier Performance based on the Features | | | | |
| Algorithm | Sample Type | Accuracy (%) | Sensitivity (%) | Specificity (%) | Precision (%) | [symbol missing] |
| [name missing] | Gene TSSs | 80.86 | 63.33 | 97.78 | 96.61 | 0.65 |
| [name missing] | Gene TSSs | 81.67 | 74.44 | 88.89 | 87.01 | 0.64 |
| [name missing] | Gene TSSs | 72.78 | 45.56 | 100 | 100 | 0.54 |
| Proposed method | miRNA TSSs | 87.22 | 81.11 | 93.33 | 92.41 | 0.75 |
The best mean values of the percentage accuracy, sensitivity, specificity and precision, and of the criterion in the last column, are shown in bold.
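The four named metrics follow directly from a confusion matrix. The counts behind the table are not given here, so the sketch below assumes 90 positive and 90 negative test samples, an assumption chosen only because it reproduces the last row. The symbol of the fifth criterion is missing from this copy; the Matthews correlation coefficient (MCC) is computed as a candidate because it is consistent with the tabulated value. This is a consistency check, not a statement of the authors' evaluation protocol.

```python
from math import sqrt


def confusion_metrics(tp, fn, tn, fp):
    """Percentage accuracy, sensitivity, specificity, precision, plus MCC."""
    total = tp + fn + tn + fp
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": 100 * (tp + tn) / total,
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "precision": 100 * tp / (tp + fp),
        # MCC is an assumed identity for the unnamed fifth criterion.
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Assumed counts (not from the source): 90 true TSSs and 90 non-TSSs.
last_row = confusion_metrics(tp=73, fn=17, tn=84, fp=6)
rounded = {name: round(value, 2) for name, value in last_row.items()}
```

With these assumed counts, the rounded values are 87.22, 81.11, 93.33, 92.41 and 0.75, matching the row trained on miRNA TSSs.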