Shunian Xiang, Ke Liu, Zhangming Yan, Yaou Zhang, Zhirong Sun.
Abstract
N6-Methyladenosine (m6A) is the most common mRNA modification; it occurs in a wide range of taxa and is associated with many key biological processes. High-throughput experiments have identified m6A peaks and sites across the transcriptome, but transcriptome-wide studies of m6A sites are limited to a few species and tissue types. Computational prediction of mRNA m6A sites has therefore become an important strategy. In this study, we integrated multiple features of mRNA (flanking sequences, local secondary structure information, and relative position information) and trained an SVM classifier to predict m6A sites in mammalian mRNA sequences. Our method achieves strong performance in both cross-validation tests and rigorous independent dataset tests. The server also provides a comprehensive database of predicted transcriptome-wide m6A sites and curated m6A-seq peaks from the literature for both human and mouse, which can be queried and visualized in a genome browser. The RNAMethPre web server provides a user-friendly tool for the prediction and query of mRNA m6A sites, and is freely accessible for public use at http://bioinfo.tsinghua.edu.cn/RNAMethPre/index.html.
Year: 2016 PMID: 27723837 PMCID: PMC5056760 DOI: 10.1371/journal.pone.0162707
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. RNAMethPre Workflow.
Positive and negative datasets were obtained (Step 1). Features of the datasets were extracted to obtain 366-dimensional vectors for each site as training data. The SVM classifier was trained to generate the SVM model and the performance of the model was evaluated (Step 2). Human transcriptome-wide m6A sites were predicted and a web server was constructed (Step 3).
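As a rough illustration of the feature-extraction step, the sketch below one-hot encodes the nucleotides flanking a candidate site. It is not the authors' pipeline: the function name `one_hot_flank`, the window size, and the encoding are all assumptions made for the example, and the composition of the 366-dimensional vector is not given in this record.

```python
# Hypothetical sketch: one-hot encoding of the sequence flanking a candidate site.
# The window size and the encoding scheme are illustrative assumptions,
# not the feature definition used by RNAMethPre.
NUCLEOTIDES = "ACGU"


def one_hot_flank(seq, pos, flank=5):
    """Return a flat 0/1 vector for the window seq[pos-flank : pos+flank+1].

    Each position contributes four values (A, C, G, U). Positions that fall
    outside the transcript are encoded as all zeros.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    vector = []
    for i in range(pos - flank, pos + flank + 1):
        base = seq[i] if 0 <= i < len(seq) else None
        vector.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vector


# Example: encode a 5-nt window centred on the A of "GGACU".
features = one_hot_flank("GGACU", pos=2, flank=2)
```

A vector like this could be concatenated with structural and positional features before being passed to a classifier; how those other features would be computed is outside the scope of this sketch.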
Fig 2. Overall Performances of Mammalian Classifiers Based on 5-fold Cross-validation Tests.
(A) The ROC curve illustrating the performance for full transcript mode. (B) The ROC curve illustrating the performance for mature mRNA mode.
Fig 3. Performances of the Mammalian Classifiers on Independent Testing Datasets.
(A) ROC curve illustrating the performance on the unbalanced independent testing dataset in full transcript mode. (B) Precision-recall curve illustrating the performance on the unbalanced independent testing dataset in full transcript mode. (C) ROC curve illustrating the performance on the unbalanced independent testing dataset in mature mRNA mode. (D) Precision-recall curve illustrating the performance on the unbalanced independent testing dataset in mature mRNA mode.
Performance of RNAMethPre for various stringency thresholds and comparison with SRAMP.
| Mode | Confidence | Specificity | Sensitivity (RNAMethPre) | Sensitivity (SRAMP) | MCC (RNAMethPre) | MCC (SRAMP) |
|---|---|---|---|---|---|---|
| Mature mRNA | High | 90.0% | 46.8% | 44.0% | 0.311 | 0.293 |
| Mature mRNA | Moderate | 85.2% | 56.0% | 54.2% | 0.305 | 0.294 |
| Mature mRNA | Low | 80.0% | 63.8% | - | 0.298 | - |
| Full transcript | High | 93.0% | 64.0% | 50.3% | 0.496 | 0.405 |
| Full transcript | Moderate | 88.0% | 74.0% | 64.5% | 0.465 | 0.385 |
| Full transcript | Low | 83.0% | 81.0% | 72.8% | 0.435 | 0.414 |
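For readers unfamiliar with the metrics in this table, the following minimal sketch shows the standard definitions of sensitivity, specificity, and the Matthews correlation coefficient (MCC) computed from confusion-matrix counts. The counts in the example are invented for illustration and are not taken from the study.

```python
import math


def sensitivity(tp, fn):
    """Fraction of true sites that were predicted as sites."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-sites that were predicted as non-sites."""
    return tn / (tn + fp)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (the usual convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Illustrative counts only (not data from the paper): an unbalanced test set
# with 100 positives and 1000 negatives.
tp, fn, tn, fp = 64, 36, 930, 70
sens = sensitivity(tp, fn)   # 0.64
spec = specificity(tn, fp)   # 0.93
score = mcc(tp, fp, tn, fn)
```

Unlike sensitivity and specificity, MCC depends on the ratio of positives to negatives in the test set, so the same sensitivity and specificity can give different MCC values on datasets with different class balance.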
Comparison of RNAMethPre with the Existing Web Server SRAMP using Independent Unbalanced Datasets.
| Predictor | Mode | AUROC | AUPR |
|---|---|---|---|
| RNAMethPre | full transcript | 0.886 | 0.560 |
| SRAMP | full transcript | 0.891 | 0.523 |
| RNAMethPre | mature mRNA | 0.856 | 0.488 |
| SRAMP | mature mRNA | 0.797 | 0.312 |
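The two summary metrics in this table can be sketched in a few lines of plain Python. This is a generic illustration, not the evaluation code used in the study: AUROC is computed here as the probability that a randomly chosen positive outscores a randomly chosen negative, and the area under the precision-recall curve is approximated by average precision, which may differ slightly from the interpolation the authors used. Tied scores are not treated specially in `average_precision`.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy example with made-up prediction scores (1 = m6A site, 0 = non-site).
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 0]
toy_auroc = auroc(toy_scores, toy_labels)
toy_ap = average_precision(toy_scores, toy_labels)
```

On unbalanced datasets the precision-recall summary is usually the more sensitive of the two, because precision falls quickly when false positives are drawn from a much larger negative class.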
Fig 4. The user interface of the RNAMethPre web server.
Fig 5. The genome browser to visualize the query results.