Jianwei Li (1, 2), Yan Huang (1), Xiaoyue Yang (1), Yiran Zhou (2), Yuan Zhou (3)
Abstract
5-Methylcytosine (m5C) is a common nucleobase modification, and recent studies have shown that it is prevalent in cellular RNAs, including mRNA, tRNA and rRNA. With the rapid accumulation of m5C site data, it has become both feasible and important to build accurate models that predict m5C sites in silico. To this end, we developed RNAm5Cfinder, a web server that uses RNA sequence features and a machine learning method to predict RNA m5C sites in eight tissue/cell types from mouse and human. We confirmed the accuracy and usefulness of RNAm5Cfinder by independent tests: the comprehensive and cell-specific predictors identified generic and tissue-specific m5C sites with an area under the receiver operating characteristic curve (AUC) of at least 0.77 and 0.87, respectively. The RNAm5Cfinder web server is freely available at http://www.rnanut.net/rnam5cfinder.
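RNAm5Cfinder predicts m5C sites from RNA sequence features. Before any features can be computed, each candidate cytosine needs a fixed-length sequence context. The following is a minimal sketch of one common way to extract that context. The function name `candidate_windows`, the flank length of 20 nt, and the `N` padding character are all hypothetical choices for illustration. None of them are taken from the paper.

```python
def candidate_windows(seq: str, flank: int = 20, pad: str = "N") -> list[tuple[int, str]]:
    """Return (position, window) for every cytosine in an RNA sequence.

    Each window is centred on the C and extends `flank` nucleotides to each
    side; positions beyond the sequence ends are filled with `pad`, so every
    window has length 2 * flank + 1. The default flank of 20 (a 41-nt window)
    is a hypothetical choice, not a value taken from RNAm5Cfinder.
    """
    seq = seq.upper().replace("T", "U")  # also accept DNA-style input
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, base in enumerate(seq):
        if base == "C":
            # seq[i] sits at padded[i + flank], so the centred window
            # starts at padded[i] and ends at padded[i + 2 * flank].
            windows.append((i, padded[i : i + 2 * flank + 1]))
    return windows


print(candidate_windows("AUCGC", flank=2))
# [(2, 'AUCGC'), (4, 'CGCNN')]
```

Each window can then be turned into a feature vector and scored by a trained classifier.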
Year: 2018; PMID: 30470762; PMCID: PMC6251864; DOI: 10.1038/s41598-018-35502-4
Source DB: PubMed; Journal: Sci Rep; ISSN: 2045-2322; Impact factor: 4.379
Figure 1. Performance comparison between the RNAm5Cfinder comprehensive predictor and other available servers on the independent test.
Figure 2. Comparison between one-hot encoding and Feng’s encoding on the independent test.
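Figure 2 compares one-hot encoding with Feng’s encoding. In one-hot encoding, each nucleotide becomes a four-dimensional indicator vector, and the vectors for a window are concatenated. The following is a minimal sketch of one-hot encoding only; Feng’s encoding is not reproduced here. The `ACGU` column order and the all-zero vector for non-ACGU symbols are assumptions, not details from the paper.

```python
ALPHABET = "ACGU"  # assumed column order for the four indicator bits


def one_hot(window: str) -> list[int]:
    """Encode a sequence window as a flat 0/1 vector of length 4 * len(window).

    Each nucleotide contributes four bits, one per letter of ALPHABET; any
    other symbol (for example an 'N' used as padding) becomes four zeros.
    """
    vector = []
    for base in window.upper():
        vector.extend(1 if base == letter else 0 for letter in ALPHABET)
    return vector


print(one_hot("ACN"))
# [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

The resulting vectors are sparse and treat every nucleotide as equally distant from the others. An encoding built on nucleotide properties can instead give related bases similar representations.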
AUC of the different predictors on the independent test.
| Cell types | AUC |
|---|---|
| mouse_ESC | 0.902 |
| mouse_Heart | 0.772 |
| mouse_Kidney | 0.768 |
| mouse_Liver | 0.768 |
| mouse_Muscle | 0.767 |
| mouse_Small-Intestine | 0.769 |
| mouse_Brain | 0.775 |
| human_Hela | 0.765 |
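The tables report performance as AUC. AUC equals the probability that a randomly chosen positive site scores higher than a randomly chosen negative site, with ties counted as one half. It can therefore be computed from score ranks, as in the Mann-Whitney U statistic. The paper does not say how its AUC values were computed, so this pure-Python sketch simply follows the standard definition; `roc_auc` is a hypothetical helper name. Because AUC depends only on how positives rank against negatives, it does not change with the positive-to-negative ratio of a test set.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation.

    labels are 1 for positive and 0 for negative samples; tied scores receive
    their average rank, which counts a positive/negative tie as half a win.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of scores tied with the score at position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # number of positive-over-negative wins
    return u / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Sorting makes this O(n log n), so it remains practical for test sets with millions of negatives, such as the comprehensive test set.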
Figure 3. Results of the intra- and inter-tissue independent tests for each tissue-specific predictor. (A) Color indicates performance (AUC). (B) ESC, embryonic stem cell.
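In the intra- and inter-tissue tests of Figure 3, each tissue-specific predictor is scored on its own tissue's test set and on every other tissue's test set. This yields a predictor-by-tissue grid of AUC values. The following is a minimal sketch of that evaluation loop. The scorers are hypothetical stand-ins for trained predictors, and `cross_tissue_auc` and `pairwise_auc` are illustrative names, not part of RNAm5Cfinder.

```python
from typing import Callable

Scorer = Callable[[str], float]  # maps a sequence window to a predicted m5C score


def pairwise_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC by direct comparison of every positive/negative pair (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


def cross_tissue_auc(
    predictors: dict[str, Scorer],
    test_sets: dict[str, tuple[list[str], list[str]]],
) -> dict[tuple[str, str], float]:
    """Score every tissue's test set with every tissue's predictor.

    Keys are (predictor tissue, test tissue); diagonal entries are the
    intra-tissue tests and off-diagonal entries the inter-tissue tests.
    """
    matrix = {}
    for train_tissue, score in predictors.items():
        for test_tissue, (positives, negatives) in test_sets.items():
            matrix[(train_tissue, test_tissue)] = pairwise_auc(
                [score(s) for s in positives], [score(s) for s in negatives]
            )
    return matrix


# Hypothetical toy scorers and test sets, for illustration only.
toy_predictors = {"heart": lambda s: s.count("C"), "liver": lambda s: s.count("G")}
toy_tests = {"heart": (["CCC", "ACC"], ["AAA", "GGA"]), "liver": (["GGG"], ["CCA"])}
print(cross_tissue_auc(toy_predictors, toy_tests))
```

The pairwise AUC is quadratic in the number of samples. For test sets of the sizes reported in the sample-size table, a rank-based AUC would be the practical choice.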
Sample sizes of the training and test datasets for the comprehensive and tissue-specific predictors.
| Predictor | Training set (pos) | Training set (neg) | Test set (pos) | Test set (neg) |
|---|---|---|---|---|
| Comprehensive | 19,798 | 593,941 | 6,636 | 1,924,243 |
| ESCᵃ-specific | 3,440 | 103,201 | 828 | 299,610 |
| Heart-specific | 12,703 | 381,091 | 100 | 30,433 |
| Kidney-specific | 12,700 | 381,001 | 122 | 37,088 |
| Liver-specific | 11,937 | 358,111 | 125 | 37,844 |
| Muscle-specific | 11,826 | 354,781 | 118 | 36,519 |
| Small-Intestine-specific | 11,372 | 341,161 | 107 | 32,170 |
| Brain-specific | 19,141 | 424,231 | 472 | 155,409 |

pos, positive samples; neg, negative samples. The positive-to-negative ratio was set to 1:30 for the training sets and to 1:all for the test sets (all available negative samples were kept). For the test sets of the tissue-specific predictors, samples that had been used to train the predictors for other tissues were discarded.
ᵃESC, embryonic stem cell.
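The table note describes two sampling rules. The first is a 1:30 positive-to-negative ratio for training. The second is a 1:all ratio for testing, with samples used to train other tissues' predictors removed from the tissue-specific test sets. The following is a minimal sketch of those two rules. It assumes samples are identified by hashable IDs and that the caller has already split held-out positives and negatives from the training data. The function names, the fixed random seed, and the use of uniform random sampling for the negatives are all assumptions, since the note does not say how negatives were drawn.

```python
import random


def build_training_set(
    positives: list[str], negatives: list[str], ratio: int = 30, seed: int = 0
) -> list[tuple[str, int]]:
    """Label all positives 1 and draw `ratio` times as many negatives, labelled 0."""
    rng = random.Random(seed)  # fixed seed so the sampled negatives are reproducible
    n_neg = min(len(negatives), ratio * len(positives))
    sampled = rng.sample(negatives, n_neg)  # uniform sampling without replacement
    return [(s, 1) for s in positives] + [(s, 0) for s in sampled]


def build_test_set(
    positives: list[str], negatives: list[str], exclude: frozenset[str] = frozenset()
) -> list[tuple[str, int]]:
    """Keep every held-out positive and negative (the '1:all' ratio), except
    samples in `exclude`, e.g. those used to train other tissues' predictors."""
    return [(s, 1) for s in positives if s not in exclude] + [
        (s, 0) for s in negatives if s not in exclude
    ]


pos = ["p1", "p2"]
neg = [f"n{i}" for i in range(100)]
print(len(build_training_set(pos, neg)))  # 62: 2 positives + 60 negatives (1:30)
print(len(build_test_set(pos, neg, exclude=frozenset({"p1", "n0"}))))  # 100
```

Keeping every negative at test time makes the evaluation reflect the heavy imbalance a genome-wide scan would face. The 1:30 training ratio limits that imbalance during model fitting.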
Performance (AUC) of different machine learning algorithms.
| Algorithm | AUC |
|---|---|
| Logistic regression | 0.700 |
| Naïve Bayes | 0.686 |
| Decision tree | 0.726 |
| Random forest | 0.773 |