Jidong Zhang, Pengmian Feng, Hao Lin, Wei Chen.
Abstract
N6-methyladenosine (m6A) plays important roles in a wide range of biological and physiological processes. Accurate identification of m6A sites is especially helpful for understanding their biological functions. Since wet-lab techniques are still expensive and time-consuming, there is an urgent need for computational methods that identify m6A sites from primary RNA sequences. Although some computational methods exist for identifying m6A sites, none is available for detecting m6A sites in microbial genomes. In this study, we developed a computational method for identifying m6A sites in the Escherichia coli genome. The accuracies obtained by the proposed method are >90% in both the 10-fold cross-validation test and the independent dataset test, indicating that the proposed method has high potential to become a useful tool for the identification of m6A sites in microbial genomes.
Keywords: N6-methyladenosine; machine learning method; microbial genome; nucleotide physicochemical properties; pseudo nucleotide composition
Year: 2018 PMID: 29867860 PMCID: PMC5960707 DOI: 10.3389/fmicb.2018.00955
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Predictive results of the 10-fold cross-validation test using different negative datasets for identifying m6A sites in E. coli.
| Negative dataset | Sn (%) | Sp (%) | Acc (%) | MCC |
| --- | --- | --- | --- | --- |
| Negative set 1 | 100.00 | 98.59 | 99.29 | 0.98 |
| Negative set 2 | 100.00 | 98.78 | 99.39 | 0.98 |
| Negative set 3 | 100.00 | 98.44 | 99.22 | 0.98 |
| Negative set 4 | 100.00 | 98.88 | 99.44 | 0.98 |
| Negative set 5 | 100.00 | 98.44 | 99.22 | 0.98 |
| Negative set 6 | 100.00 | 98.49 | 99.25 | 0.98 |
| Negative set 7 | 100.00 | 98.54 | 99.27 | 0.98 |
| Negative set 8 | 100.00 | 98.69 | 99.34 | 0.98 |
| Negative set 9 | 100.00 | 98.49 | 99.25 | 0.98 |
| Negative set 10 | 100.00 | 98.25 | 99.12 | 0.97 |
| Average | 100.00 | 98.56 | 99.28 | 0.98 |
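The four numeric columns in the table above appear to be sensitivity (Sn), specificity (Sp), accuracy (Acc) and the Matthews correlation coefficient (MCC). This reading is an assumption based on the figure caption and on the arithmetic of the rows, not something the page states. A minimal sketch of how such metrics are conventionally computed from confusion-matrix counts follows. The function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return Sn, Sp, Acc and MCC (as fractions) from confusion-matrix counts.

    tp/fn: positive samples predicted correctly / incorrectly
    tn/fp: negative samples predicted correctly / incorrectly
    """
    sn = tp / (tp + fn)                      # sensitivity (true positive rate)
    sp = tn / (tn + fp)                      # specificity (true negative rate)
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; return 0.0 by convention here
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical, balanced example: 100 positives and 100 negatives
metrics = binary_metrics(tp=100, fn=0, tn=98, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With equally sized positive and negative sets, Acc equals the mean of Sn and Sp, which is consistent with the rows of the table.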
Figure 1. The ROC curves of the 10-fold cross-validation test for identifying m6A sites in E. coli based on different negative datasets. The vertical axis is the true positive rate (Sn), while the horizontal axis is the false positive rate (1 − Sp).
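As an illustration of how an ROC curve of this kind is generally built, the sketch below sweeps a decision threshold over classifier scores and records (1 − Sp, Sn) at each step, then integrates the curve with the trapezoidal rule. The helper names `roc_points` and `auc`, and the toy labels and scores, are hypothetical. The study's actual scores and plotting procedure are not given on this page.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    labels: 1 for positive samples, 0 for negative samples
    scores: classifier scores, higher meaning "more likely positive"
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting a point
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data, for illustration only
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(toy_labels, toy_scores)
print(curve)
print(f"AUC = {auc(curve):.4f}")
```

A curve that hugs the top-left corner corresponds to high Sn at low 1 − Sp, and an area close to 1 indicates near-perfect separation of the two classes.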
Comparison of different parameters for identifying m6A sites in E. coli.
| Method | Sn (%) | Sp (%) | Acc (%) | MCC |
| --- | --- | --- | --- | --- |
| PseKNC | 65.74 | 60.29 | 63.02 | 0.26 |
| Secondary structure | 67.06 | 60.73 | 63.89 | 0.28 |
| Our method | 100.00 | 98.56 | 99.28 | 0.98 |