Pengwei Xing¹, Ran Su², Fei Guo¹, Leyi Wei¹.
Abstract
N6-methyladenosine (m6A) refers to methylation of the adenosine nucleotide at the nitrogen-6 position. It plays an important role in a series of biological processes, such as splicing, mRNA export, nascent mRNA synthesis, nuclear translocation and translation. Since high-resolution mapping of m6A sites was established, numerous experiments have successfully characterized m6A sites within sequences. However, with the explosive growth of genomic sequences, identifying m6A sites by experimental methods is time-consuming and expensive. Thus, it is highly desirable to develop fast and accurate computational identification methods. In this study, we propose a sequence-based predictor called RAM-NPPS for identifying m6A sites within RNA sequences. It uses a novel feature representation algorithm based on multi-interval nucleotide pair position specificity (NPPS) and a support vector machine (SVM) classifier to construct the prediction model. Comparison results show that our proposed method outperforms the state-of-the-art predictors on three benchmark datasets covering three species, indicating the effectiveness and robustness of our method. Moreover, an online webserver implementing the proposed predictor has been established at http://server.malab.cn/RAM-NPPS/. It is anticipated to be a useful prediction tool to assist biologists in revealing the mechanisms of m6A site functions.
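The abstract names the feature encoding but not its exact formula. The following is a minimal Python sketch of one plausible reading, under stated assumptions: a nucleotide pair at interval ξ is taken to be (s[i], s[i + ξ + 1]); its position-specific value is assumed to be its frequency at position i in the positive (m6A) training set minus its frequency in the negative set; and "multi-interval" is assumed to mean concatenating the encodings for several ξ. None of these choices is confirmed by the source, and the names `train_npps`, `encode` and `encode_multi` are hypothetical. They are used here because, for 51-nt input windows (also an assumption), they yield 50 − ξ values per sequence, which agrees with the Dimension column of the ξ table.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical table layout: (pair position i, two-letter pair) -> value.
PairTable = Dict[Tuple[int, str], float]


def pair_frequencies(seqs: Sequence[str], xi: int) -> PairTable:
    """Frequency of each pair (s[i], s[i + xi + 1]) at each position i.

    Assumes all sequences share one length, as in fixed-window benchmarks.
    """
    n_positions = len(seqs[0]) - xi - 1
    table: PairTable = {}
    for i in range(n_positions):
        counts = Counter(s[i] + s[i + xi + 1] for s in seqs)
        for pair, count in counts.items():
            table[(i, pair)] = count / len(seqs)
    return table


def train_npps(pos: Sequence[str], neg: Sequence[str], xi: int) -> PairTable:
    """Assumed specificity score: positive frequency minus negative frequency."""
    f_pos = pair_frequencies(pos, xi)
    f_neg = pair_frequencies(neg, xi)
    return {k: f_pos.get(k, 0.0) - f_neg.get(k, 0.0)
            for k in f_pos.keys() | f_neg.keys()}


def encode(seq: str, table: PairTable, xi: int) -> List[float]:
    """One value per pair position, i.e. len(seq) - xi - 1 values.

    Pairs never seen in training are encoded as 0.0.
    """
    return [table.get((i, seq[i] + seq[i + xi + 1]), 0.0)
            for i in range(len(seq) - xi - 1)]


def encode_multi(seq: str, tables: Dict[int, PairTable],
                 xis: Sequence[int]) -> List[float]:
    """Assumed 'multi-interval' joining: concatenate encodings for each xi."""
    vec: List[float] = []
    for xi in xis:
        vec.extend(encode(seq, tables[xi], xi))
    return vec


# Toy usage with made-up 5-nt sequences (not from the benchmark data).
pos = ["ACGUA", "ACGUU"]
neg = ["UCGAA", "GCGAA"]
table = train_npps(pos, neg, xi=1)
print(encode("ACGUA", table, xi=1))  # -> [1.0, 1.0, -0.5]
```

Under these assumptions, `encode_multi(seq, tables, [0, 5])` on a 51-nt window would return 50 + 45 = 95 values, which would then be passed to the SVM.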
Year: 2017 PMID: 28440291 PMCID: PMC5404266 DOI: 10.1038/srep46757
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Overall framework of the proposed predictor.
Figure 2. Schematic workflow of the proposed feature encoding scheme.
Results of the proposed NPPS features for different values of the parameter ξ.
| ξ | Dimension | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|---|
| 0 | 50 | 77.35 | 77.66 | 77.51 | 0.5501 |
| 1 | 49 | 75.75 |  | 77.70 | 0.5544 |
| 2 | 48 | 75.29 | 78.50 | 76.89 | 0.5382 |
| 3 | 47 | 76.82 | 76.82 | 76.82 | 0.5363 |
| 4 | 46 | 75.67 | 77.58 | 76.63 | 0.5326 |
| 5 | 45 | 78.12 | 77.58 | 77.85 |  |
| 6 | 44 | 76.82 | 77.74 | 77.28 | 0.5455 |
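The tables report sensitivity (Sn), specificity (Sp), accuracy (Acc) and the Matthews correlation coefficient (MCC). The sketch below uses the standard confusion-matrix definitions of these metrics; the function name `metrics` and the counts in the usage line are illustrative only. In every complete row above, Acc equals (Sn + Sp) / 2 up to rounding, which is what these definitions give when the positive and negative sets have the same size.

```python
import math


def metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sn, Sp, Acc (as fractions) and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity: recall on m6A sites
    sp = tn / (tn + fp)                    # specificity: recall on non-sites
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0.0 when undefined
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Toy counts (hypothetical), not taken from the paper.
print(metrics(tp=8, fn=2, tn=6, fp=4))
```

Multiplying Sn, Sp and Acc by 100 gives the percentage form used in the tables.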
Predictive results of different features.
| Features | Sn (%) | Sp (%) | Acc (%) |
|---|---|---|---|
| NPPS features (ξ = 5) | 78.12 | 77.58 | 77.85 |
| Joined NPPS features | |||
| Joined NPPS features using MRMD | 76.36 | 79.42 | 77.89 |
| Joined NPPS features using RFE | 67.18 | 74.50 | 70.84 |
| Joined NPPS features using FSDI | 67.69 | 75.38 | 71.54 |
Performance comparison of different classifiers.
| Classifiers | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| Random Forest | 75.67 | 75.98 | 75.82 | 0.5165 |
| SVM |  |  |  |  |
Comparison of different methods for identifying m6A sites on the Saccharomyces cerevisiae dataset.
| Predictors | Sn (%) | Sp (%) | Acc (%) | MCC | Optimized parameters |
|---|---|---|---|---|---|
| RAM-NPPS |  |  |  |  | C = 2048, γ = 0.0001220703125 |
| M6A-HPCS | 77.35 | 67.41 | 72.38 | 0.45 | C = 8, γ = 0.0625 |
Comparison of different methods for identifying m6A sites on the Arabidopsis thaliana dataset.
| Predictors | Sn (%) | Sp (%) | Acc (%) | MCC | Optimized parameters |
|---|---|---|---|---|---|
| RAM-NPPS | 87.31 | 91.62 | 89.47 | 0.79 | C = 32, γ = 0.125 |
| Chen’s method | 68.78 | 100.00 | 84.39 | 0.72 | C = 0.5, γ = 0.0078125 |
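Every optimized parameter pair in the two comparison tables is a power of two (for example C = 2048 = 2^11 and γ = 0.0001220703125 = 2^−13 for RAM-NPPS, and C = 8 = 2^3 and γ = 0.0625 = 2^−4 for M6A-HPCS). This is consistent with an exhaustive search over a log2-spaced grid, but the source does not describe the tuning procedure. The following is therefore only a sketch of such a search. `score` is a hypothetical caller-supplied function, for instance the cross-validated accuracy of an RBF-kernel SVM trained with a given (C, γ), and the exponent ranges are assumptions.

```python
import itertools
import math
from typing import Callable, Iterable, Tuple


def grid_search(score: Callable[[float, float], float],
                c_exps: Iterable[int] = range(-5, 16),    # assumed range
                g_exps: Iterable[int] = range(-15, 4)     # assumed range
                ) -> Tuple[float, float, float]:
    """Return (best_score, C, gamma) over C = 2**a, gamma = 2**b."""
    best = None
    for a, b in itertools.product(c_exps, g_exps):
        c, gamma = 2.0 ** a, 2.0 ** b
        s = score(c, gamma)
        if best is None or s > best[0]:   # keep the first maximum on ties
            best = (s, c, gamma)
    return best


# Hypothetical stand-in scorer that peaks at C = 2**11, gamma = 2**-13.
def toy_score(c: float, gamma: float) -> float:
    return -(math.log2(c) - 11) ** 2 - (math.log2(gamma) + 13) ** 2


print(grid_search(toy_score))
```

With the default ranges the grid has 21 × 19 = 399 points; in practice `score` would wrap k-fold cross-validation on the training set.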