Hao Lv1, Fu-Ying Dao1, Zheng-Xing Guan1, Dan Zhang1, Jiu-Xin Tan1, Yong Zhang1, Wei Chen2, Hao Lin1.
Abstract
DNA N6-methyladenine (6mA) is a dominant form of DNA modification and is involved in many biological functions. Accurate genome-wide identification of 6mA sites can improve understanding of those functions. Experimental methods for detecting 6mA in eukaryotic genomes are laborious and expensive, so computational methods are needed to identify 6mA sites on a genomic scale, especially in plant genomes. This study therefore develops a machine learning-based method for predicting 6mA sites in the rice genome. We first used mono-nucleotide binary encoding to represent positive and negative samples, and then applied the Random Forest algorithm to classify 6mA sites. In fivefold cross-validation, the proposed method achieved an area under the receiver operating characteristic curve of 0.964 with an overall accuracy of 0.917. An independent dataset was also established to assess the generalization ability of the method; on it, the area under the receiver operating characteristic curve was 0.981, suggesting that the method performs well in predicting 6mA sites in the rice genome. Based on this computational method, we built a freely accessible web server named iDNA6mA-Rice at http://lin-group.cn/server/iDNA6mA-Rice.
Keywords: N6-methyladenine; cross-validation; mono-nucleotide binary encoding; random forest; web-server
Year: 2019 PMID: 31552096 PMCID: PMC6746913 DOI: 10.3389/fgene.2019.00793
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
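The mono-nucleotide binary encoding named in the abstract can be illustrated with a minimal Python sketch. It assumes the common one-hot convention in which each nucleotide becomes a 4-bit vector; the bit order, the function name `mnbe_encode`, and the all-zero handling of ambiguous bases are illustrative assumptions, not details given in this record.

```python
# Hypothetical one-hot table; the A/C/G/T bit order is an assumption.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def mnbe_encode(sequence):
    """Flatten a DNA sequence into a binary vector, four bits per nucleotide."""
    vector = []
    for base in sequence.upper():
        # Symbols outside A/C/G/T (for example N) map to all zeros (an assumption).
        vector.extend(ONE_HOT.get(base, (0, 0, 0, 0)))
    return vector


encoded = mnbe_encode("GAGG")
print(len(encoded), encoded)
```

A sequence of length L yields a vector of length 4L, which could then be passed to any classifier, such as the Random Forest mentioned in the abstract.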
Figure 1. Illustration of N6-methyladenine (6mA) modifications in DNA. The conversion of adenine to 6mA is mediated by methyl-transferases.
Figure 2. A flowchart used in this study.
Details of the three motifs in positive samples.
| Motifs | Numbers | Proportions (%) |
|---|---|---|
| GAGG | 26,300 | 17.08 |
| AGG | 24,264 | 15.76 |
| AG | 22,206 | 14.42 |
Figure 3. Nucleotide distribution preferences around 6mA and non-6mA sites. The upper half of the x-axis shows the nucleotide distribution in sequences containing 6mA sites, whereas the lower half shows the nucleotide distribution in sequences containing non-6mA sites.
Figure 4. Performance evaluation based on three features and their combinations.
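One of the three features compared here, KNFC with k = 2, 3, 4, can be sketched as follows. The sketch assumes KNFC denotes normalized k-mer (k-tuple nucleotide) frequencies concatenated over the listed k values; the function name `knfc` and the handling of non-ACGT symbols are hypothetical choices made for illustration.

```python
from itertools import product


def knfc(sequence, ks=(2, 3, 4)):
    """Return normalized k-mer frequencies for each k, concatenated in order."""
    sequence = sequence.upper()
    features = []
    for k in ks:
        # All 4**k possible k-mers in a fixed lexicographic order.
        kmers = ["".join(p) for p in product("ACGT", repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        for i in range(max(len(sequence) - k + 1, 0)):
            kmer = sequence[i:i + k]
            if kmer in counts:  # skip windows containing non-ACGT symbols
                counts[kmer] += 1
        total = sum(counts.values())
        features.extend(counts[m] / total if total else 0.0 for m in kmers)
    return features


vector = knfc("GAGGAGGTCA")
print(len(vector))  # 16 + 64 + 256 feature values
```

Under this reading, a combined feature such as KNFC-MNBE would simply be the concatenation of the two vectors for the same sequence.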
Predictive performances of KNFC, MNBE, and NV.
| Methods | Sn (%) | Sp (%) | Acc (%) | MCC | AUC |
|---|---|---|---|---|---|
| KNFC (k = 2, 3, 4) | 70.3 | 66.3 | 68.3 | 0.366 | 0.744 |
| MNBE | 93.0 | 90.5 | 91.7 | 0.835 | 0.964 |
| NV | 58.1 | 50.6 | 54.3 | 0.087 | 0.566 |
| KNFC-MNBE | 91.8 | 90.1 | 90.9 | 0.819 | 0.958 |
| KNFC-NV | 70.4 | 66.5 | 68.4 | 0.369 | 0.747 |
| MNBE-NV | 92.8 | 90.3 | 91.6 | 0.832 | 0.963 |
| KNFC-MNBE-NV | 91.7 | 90.3 | 91.0 | 0.820 | 0.925 |
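The relationship between the reported metrics can be checked with a short sketch. It takes the three percentage columns of the MNBE row as sensitivity (Sn), specificity (Sp) and overall accuracy (Acc), and uses the standard confusion-matrix definitions. The counts below are hypothetical: 1,000 positives and 1,000 negatives chosen only so that Sn = 93.0% and Sp = 90.5%; they are not the study's sample sizes.

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Compute Sn, Sp, Acc and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity (true positive rate)
    sp = tn / (tn + fp)                    # specificity (true negative rate)
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical balanced counts reproducing Sn = 0.930 and Sp = 0.905.
m = binary_metrics(tp=930, fn=70, tn=905, fp=95)
print({name: round(value, 3) for name, value in m.items()})
```

With these assumed counts the accuracy is 0.9175 and the MCC rounds to 0.835, which agrees with the MNBE row to the precision shown in the table.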
Figure 5. Performance evaluation of different algorithms.
Predictive performances of five ratios on the testing and training datasets.
| Metric | 5:5 testing | 5:5 training | 6:4 testing | 6:4 training | 7:3 testing | 7:3 training | 8:2 testing | 8:2 training | 9:1 testing | 9:1 training |
|---|---|---|---|---|---|---|---|---|---|---|
| Sn (%) | 91.4 | 91.8 | 92.0 | 91.9 | 92.2 | 92.4 | 92.4 | 92.5 | 92.7 | 92.7 |
| Sp (%) | 70.9 | 90.5 | 87.7 | 90.0 | 90.6 | 90.0 | 91.7 | 90.1 | 92.1 | 90.4 |
| Acc (%) | 81.1 | 91.1 | 89.9 | 90.9 | 91.4 | 91.2 | 92.1 | 91.3 | 92.2 | 91.8 |
| MCC | 0.636 | 0.822 | 0.798 | 0.819 | 0.828 | 0.824 | 0.841 | 0.827 | 0.853 | 0.835 |
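A split of the data at these five ratios could be produced along the following lines. This is a sketch under the assumption that each ratio is a training:testing proportion applied separately to the positive and negative samples; the record itself does not state how the split was made, and `stratified_split`, the seed and the toy labels are hypothetical.

```python
import random


def stratified_split(labels, train_fraction, seed=0):
    """Return (train_idx, test_idx), keeping the class balance in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 50 positive and 50 negative samples (illustrative only).
labels = [1] * 50 + [0] * 50
for train_fraction in (0.5, 0.6, 0.7, 0.8, 0.9):
    train_idx, test_idx = stratified_split(labels, train_fraction)
    print(train_fraction, len(train_idx), len(test_idx))
```

Splitting each class separately keeps the positive-to-negative ratio the same in both parts, so that metrics such as Sn and Sp remain comparable across ratios.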
Comparison of different methods for predicting 6mA sites on the independent dataset.
| Method | Sn (%) | Sp (%) | Acc (%) | MCC | auROC |
|---|---|---|---|---|---|
| Our method | 95.8 | 93.3 | 94.6 | 0.891 | 0.981 |
| iDNA6mA-PseKNC | 76.6 | 94.3 | 85.5 | 0.721 | – |
Comparison of different methods for predicting 6mA sites in the rice genome with the jackknife test.
| Methods | Sn (%) | Sp (%) | Acc (%) | MCC | auROC |
|---|---|---|---|---|---|
| This study | 83.86 | 83.41 | 83.63 | 0.67 | 0.910 |
| i6mA-Pred | 82.95 | 83.30 | 83.13 | 0.66 | 0.886 |
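The two validation schemes mentioned in this record, fivefold cross-validation and the jackknife test, differ only in how samples are held out. The sketch below generates the index splits for both, treating the jackknife as leave-one-out cross-validation; the function names and the shuffling seed are hypothetical, and no classifier is trained here.

```python
import random


def kfold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    # Deal the shuffled indices into n_folds nearly equal folds.
    folds = [order[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train_idx, test_idx


def jackknife_indices(n_samples):
    """Jackknife (leave-one-out): k-fold with exactly one sample per fold."""
    return kfold_indices(n_samples, n_samples)


for train_idx, test_idx in kfold_indices(10, 5):
    print(sorted(test_idx), len(train_idx))
```

The jackknife trains one model per sample, so it is far more expensive than fivefold cross-validation on large datasets, but it leaves no randomness in how the folds are formed.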
Figure 6. A partial screenshot of the iDNA6mA-Rice web server page at http://lin-group.cn/server/iDNA6mA-Rice.