Wei Chen, Pengmian Feng, Xiaoming Song, Hao Lv, Hao Lin.
Abstract
As an essential post-transcriptional modification, N7-methylguanosine (m7G) regulates nearly every step of the mRNA life cycle. Accurate identification of m7G sites in the transcriptome will provide insights into the biological functions and mechanisms of this modification. Although m7G-methylated RNA immunoprecipitation sequencing (MeRIP-seq) has been proposed for this purpose, it is still not cost-effective for detecting m7G sites, so new identification methods are urgently needed. In this work, we developed the first computational predictor, called iRNA-m7G, to identify m7G sites in the human transcriptome. A feature fusion strategy was used to integrate both sequence- and structure-based features. In the jackknife test, iRNA-m7G obtained an accuracy of 89.88%. The superiority of iRNA-m7G for identifying m7G sites was also demonstrated by comparison with other methods. We hope that iRNA-m7G can become a useful tool for identifying m7G sites. A user-friendly web server for iRNA-m7G is freely accessible at http://lin-group.cn/server/iRNA-m7G/.
Keywords: N(7)-methylguanosine; RNA secondary structure; feature fusion; nucleotide chemical property; pseudo nucleotide composition
Year: 2019 PMID: 31581051 PMCID: PMC6796804 DOI: 10.1016/j.omtn.2019.08.022
Source DB: PubMed Journal: Mol Ther Nucleic Acids ISSN: 2162-2531 Impact factor: 8.886
Figure 1. Determining the Optimal Values for the Two Parameters w and λ of PseDNC
Predictive Results for Identifying m7G Sites by Using Different Features
| Features | Sn (%) | Sp (%) | Acc (%) | MCC | auROC |
|---|---|---|---|---|---|
| NPF | 88.12 | 90.15 | 89.14 | 0.78 | 0.899 |
| PseDNC | 81.92 | 87.99 | 84.95 | 0.70 | 0.841 |
| SSC | 73.11 | 78.71 | 75.91 | 0.52 | 0.776 |
| Fusion | 88.66 | 90.96 | 89.81 | 0.80 | 0.946 |
Sn, sensitivity; Sp, specificity; Acc, accuracy; MCC, Matthews correlation coefficient; auROC, area under the receiver operating characteristic curve; NPF, nucleotide property and frequency; PseDNC, pseudo nucleotide composition; SSC, secondary structure component.
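The four threshold-based metrics in the table follow from the counts of a binary confusion matrix. The sketch below uses the standard textbook definitions. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Return Sn, Sp and Acc as percentages, plus MCC, from confusion counts.

    tp/fn: true m7G sites predicted as positive/negative.
    tn/fp: non-m7G sites predicted as negative/positive.
    """
    sn = 100.0 * tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = 100.0 * tn / (tn + fp)                    # specificity (recall on negatives)
    acc = 100.0 * (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=90, fn=10, tn=80, fp=20)
print(metrics)
```

MCC stays informative when the two classes are imbalanced, which is one reason it is commonly reported alongside accuracy.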
Figure 2. Framework of Developing iRNA-m7G
An RNA sequence is converted into a feature vector by fusing three feature types: nucleotide property and frequency, pseudo nucleotide composition, and secondary structure component. A support vector machine is then used to build the classification model.
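Feature fusion as described here can be read as concatenating the per-feature vectors into one input vector for the classifier. The following is a minimal sketch under stated assumptions, not the authors' implementation. The chemical-property table (ring structure, functional group, hydrogen bonding) and the cumulative-frequency term follow a commonly used nucleotide encoding that this record does not spell out. The PseDNC and SSC blocks are zero-filled placeholders of arbitrary length, and the names `npf_encode` and `fuse` are hypothetical.

```python
# Assumed encoding (not given in this record): each nucleotide is described by
# (ring structure, functional group, hydrogen bonding) as three binary values.
CHEMICAL_PROPERTY = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def npf_encode(seq):
    """Encode each position as 3 chemical-property bits plus the cumulative
    frequency of that nucleotide up to and including the position."""
    counts = {}
    features = []
    for position, nt in enumerate(seq, start=1):
        counts[nt] = counts.get(nt, 0) + 1
        features.extend(CHEMICAL_PROPERTY[nt])
        features.append(counts[nt] / position)
    return features


def fuse(*blocks):
    """Concatenate several feature blocks into one flat feature vector."""
    fused = []
    for block in blocks:
        fused.extend(block)
    return fused


seq = "GAUC"                  # toy sequence, not from the benchmark data
npf = npf_encode(seq)         # 4 values per nucleotide
pse_dnc = [0.0] * 3           # placeholder for a PseDNC block (length arbitrary)
ssc = [0.0] * 2               # placeholder for an SSC block (length arbitrary)
vector = fuse(npf, pse_dnc, ssc)
print(len(vector), vector)
```

The fused vector would then be passed to a classifier such as a support vector machine. That step is left out here because it depends on an external library.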
Figure 3. The Receiver Operating Characteristic Curves of the Models Based on Different Features for Identifying m7G Sites
SSC is the abbreviation for secondary structure component, NPF is for nucleotide property and frequency, PseDNC is for pseudo nucleotide composition, and fusion is the combination of these three kinds of features. The auROC values are given in brackets.
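The auROC reported for each curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise formulation follows. The function name `au_roc` and the toy scores are hypothetical, and the quadratic loop is meant for illustration rather than large datasets.

```python
def au_roc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and
    negative scores; tied pairs count as half a win."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("auROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy scores and labels, chosen only to illustrate the calculation.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
print(au_roc(toy_scores, toy_labels))
```

Here three of the four positive-negative pairs are ranked correctly, so the result is 0.75.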
Performance Comparison of Different Classifiers for Identifying m7G Sites by the 10-Fold Cross-Validation Test
| Classifiers | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| Naive Bayes | 72.47 | 87.85 | 80.16 | 0.61 |
| Random Forest | 83.27 | 89.88 | 86.57 | 0.73 |
| LogitBoost | 81.38 | 86.23 | 83.81 | 0.68 |
| BayesNet | 77.19 | 87.04 | 82.12 | 0.65 |
| SVM | 88.66 | 90.96 | 89.81 | 0.80 |
Sn, sensitivity; Sp, specificity; Acc, accuracy; MCC, Matthews correlation coefficient; SVM, support vector machine.
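The 10-fold cross-validation named in the table title partitions the samples into ten folds and holds out each fold once for testing. The sketch below generates such index splits with the standard library only. The function name `k_fold_indices`, the seed, and the sample count are hypothetical. Setting `k` equal to the number of samples yields leave-one-out splits, which is how a jackknife test is usually understood in this setting.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 25 samples is an arbitrary illustrative size, not the benchmark size.
splits = list(k_fold_indices(25, k=10))
for train, test in splits[:2]:
    print(len(train), len(test))
```

Each classifier in the comparison would be trained on `train` and scored on `test` for every split, with the metrics pooled or averaged over the ten folds.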