Qianfei Huang, Jun Zhang, Leyi Wei, Fei Guo, Quan Zou.
Abstract
MOTIVATION: The biological function of N6-methyladenine DNA (6mA) in plants is largely unknown. Rice is one of the most important crops worldwide and is a model species for molecular and genetic studies. There are few methods for 6mA site recognition in the rice genome, and an effective computational method is needed.
Keywords: 6mA; DNA; fusion; model; rice
Year: 2020 PMID: 32076430 PMCID: PMC7006724 DOI: 10.3389/fpls.2020.00004
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
All datasets.
| Datasets | Positive | Negative | Total | Species |
|---|---|---|---|---|
| Dataset 1 | 880 | 880 | 1,760 | Rice |
| Dataset 2 | 154,000 | 154,000 | 308,000 | Rice |
Figure 1. Flowchart showing the construction of this model. Feature selection uses SelectFromModel.
Figure 2. Schematic showing the process of extracting features from the transition probability matrix of the DNA sequence. The sequence “AATACATGGGGTTATGTGCCACCGGTCATAATATCTAGGGT” is used as an example to explain the process.
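As a rough illustration of the idea in Figure 2, the following minimal sketch estimates a first-order transition probability matrix from the example sequence and flattens it into a 16-value feature vector. The function names (`transition_matrix`, `markov_features`) are hypothetical, and the choice of a single sequence-wide first-order matrix is an assumption; the descriptor used in the paper may be built differently (for example, with position-specific transitions).

```python
from collections import Counter

BASES = "ACGT"

def transition_matrix(seq):
    """Estimate first-order transition probabilities P(next base | current base).

    Assumption: one matrix is estimated over the whole sequence.
    """
    # Count every adjacent pair (current base, next base).
    counts = Counter(zip(seq, seq[1:]))
    matrix = {}
    for a in BASES:
        row_total = sum(counts[(a, b)] for b in BASES)
        # Normalize each row so it sums to 1; an unseen base yields a zero row.
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in BASES
        }
    return matrix

def markov_features(seq):
    """Flatten the 4x4 matrix row by row into a 16-dimensional feature vector."""
    m = transition_matrix(seq)
    return [m[a][b] for a in BASES for b in BASES]

# Example sequence from the Figure 2 caption (41 nucleotides).
example = "AATACATGGGGTTATGTGCCACCGGTCATAATATCTAGGGT"
matrix = transition_matrix(example)
features = markov_features(example)
```

Each row of `matrix` is a probability distribution over the next nucleotide, so `features` can be passed to any classifier that accepts fixed-length numeric vectors.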
Performance of different feature descriptors and classifiers.
| Feature descriptors | SVM (Acc %) | XGBoost (Acc %) | GBDT (Acc %) | Vote (Acc %) |
|---|---|---|---|---|
| EIIP | 63.9 | 83.9 | 84.0 | 83.9 |
| ANF | 54.2 | 60.7 | 61.1 | 61.7 |
| BINARY | 82.8 | 84.4 | 83.6 | 84.7 |
| DNC | 58.4 | 61.0 | 59.7 | 61.2 |
| NCP | 82.8 | 83.3 | 83.9 | 84.3 |
| PseEIIP | 53.9 | 66.5 | 65.3 | 65.9 |
| TNC | 56.8 | 66.5 | 65.3 | 66.0 |
| KMER | 53.0 | 64.2 | 64.8 | 65.1 |
| ENAC | 73.5 | 79.4 | 78.8 | 79.0 |
| NAC | 56.3 | 55.5 | 54.6 | 55.5 |
| CKSNAP | 57.2 | 65.3 | 65.3 | 65.8 |
| RCKMER | 55.0 | 62.9 | 62.3 | 62.3 |
| MARKOV | 83.75 | 85.17 | 84.7 | 85.0 |
Figure 3. (A) Tenfold cross-validation performance of different classifiers based on dataset 1. (B) Independent test performance of different classifiers based on dataset 1 and dataset 2.
Cross-validation performance of different methods based on dataset 1.
| Method | Sn | Sp | Acc | Mcc |
|---|---|---|---|---|
| Best sequence - no fs | 81.81 | 88.30 | 85.1 | 0.702 |
| Original sequence - no fs | 84.20 | 84.77 | 84.49 | 0.690 |
| Best sequence - fs | 84.89 | 89.66 | 87.27 | 0.746 |
| Original sequence - fs | 85.0 | 89.20 | 87.10 | 0.742 |
Figure 4. Feature distribution of different feature methods based on dataset 1.
Figure 5. Independent test performance of different feature selection methods based on dataset 1 and dataset 2.
Cross-validation performance of different methods based on dataset 1.
| Method | Sn | Sp | Acc | Mcc |
|---|---|---|---|---|
| Our method | 84.89 | 89.66 | 87.27 | 0.746 |
| MM-6mAPred | 84.31 | 85.22 | 84.77 | 0.695 |
| i6mA-Pred | 82.95 | 83.30 | 83.13 | 0.662 |
Independent test performance of different methods based on dataset 1 and dataset 2.
| Method | Sn | Sp | Acc | Mcc |
|---|---|---|---|---|
| Our method | 95.97 | 75.33 | 85.65 | 0.73 |
| MM-6mAPred | 95.81 | 70.30 | 83.06 | 0.68 |
| i6mA-Pred | 94.24 | 66.59 | 80.42 | 0.63 |
Figure 6. Receiver operating characteristic (ROC) curves of 6mA-RicePred, MM-6mAPred, and i6mA-Pred.