Zeeshan Abbas, Hilal Tayara, Kil To Chong.
Abstract
N6-methyladenine (6mA) has been recognized as a key epigenetic alteration that affects a variety of biological activities. Precise prediction of 6mA modification sites is essential for understanding the logical consistency of biological activity. Various experimental methods exist for identifying 6mA modification sites, but because experimental procedures are costly and labor-intensive, in silico prediction has emerged as a promising alternative. Developing an efficient and accurate model for identifying N6-methyladenine is therefore one of the top objectives in the field of bioinformatics. To this end, we created ENet-6mA, an in silico model for the classification of 6mA modifications in plant genomes. ENet-6mA uses three encoding methods: one-hot, nucleotide chemical properties (NCP), and electron-ion interaction potential (EIIP). The encodings are concatenated and fed to ElasticNet for feature reduction, and the optimized features are then passed directly to a neural network for classification. We used a benchmark rice dataset for five-fold cross-validation testing and three other plant-genome datasets for cross-species testing. The results show that the model predicts N6-methyladenine sites well, even across species. Additionally, we split the datasets into different positive:negative ratios and measured performance using the area under the precision-recall curve (AUPRC), achieving 0.81, 0.79, and 0.50 at a 1:10 (positive:negative) ratio for F. vesca, R. chinensis, and A. thaliana, respectively.
Keywords: DNA methylation; ElasticNet; bioinformatics; epigenome engineering; epigenomics; neural networks
Year: 2022 PMID: 35955447 PMCID: PMC9369089 DOI: 10.3390/ijms23158314
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
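The abstract names three per-nucleotide encodings (one-hot, NCP, and EIIP) that are concatenated before feature reduction. The sketch below illustrates one way such a concatenation could look. The lookup values are the ones commonly used in the literature for these encodings, not values stated in this record, and the function name `encode_sequence` is hypothetical. The ElasticNet selection step and the neural network are not shown.

```python
# Per-nucleotide lookup tables. The record names the encodings but does not
# list their values; these are the commonly used ones (an assumption here).
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}
# NCP: (ring structure, functional group, hydrogen bonding)
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}
# EIIP: electron-ion interaction potential of each nucleotide
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def encode_sequence(seq):
    """Return a flat feature list with 4 + 3 + 1 = 8 values per nucleotide."""
    features = []
    for base in seq.upper():
        if base not in ONE_HOT:
            # How ambiguous bases (e.g. N) are handled is not described in
            # the record; rejecting them is a choice made for this sketch.
            raise ValueError(f"unsupported nucleotide: {base!r}")
        features.extend(ONE_HOT[base])
        features.extend(NCP[base])
        features.append(EIIP[base])
    return features


if __name__ == "__main__":
    vector = encode_sequence("ACGT")
    print(len(vector))  # 8 features per base, 4 bases
```

Under these assumptions a sequence of length n yields 8n features, which is the kind of redundant, correlated input that a sparse linear selector such as ElasticNet can prune before classification.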
Performance comparison of ENet-6mA with other state-of-the-art methods using five-fold cross-validation testing.
| Methods | Acc | Sn | Sp | MCC | AUC |
|---|---|---|---|---|---|
| SNNRice6mA | 0.9204 | 0.9433 | 0.8975 | 0.84 | 0.97 |
| DNA6mA-MINT | 0.9258 | 0.9012 | 0.9306 | 0.85 | 0.97 |
| SpineNet-6mA | 0.9431 | 0.9571 | 0.9292 | 0.88 | 0.98 |
| Proposed | | | | | |
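The columns Acc, Sn, Sp, and MCC in the table above are standard confusion-matrix metrics. As a minimal sketch, assuming the usual definitions (the record itself does not spell them out), they can be computed as follows. The function name `binary_metrics` and the counts in the usage line are hypothetical, and AUC is omitted because it needs ranked scores rather than counts.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, and MCC from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    total = tp + fn + tn + fp
    acc = (tp + tn) / total   # fraction of all samples classified correctly
    sn = tp / (tp + fn)       # sensitivity: recall on the positive class
    sp = tn / (tn + fp)       # specificity: recall on the negative class
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when a predicted class is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


if __name__ == "__main__":
    # Illustrative counts only, not taken from the study.
    print(binary_metrics(tp=90, fn=10, tn=80, fp=20))
```

On a balanced dataset, accuracy equals the mean of sensitivity and specificity, which is a quick consistency check for rows in tables of this kind.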
Cross-species performance comparison between ENet-6mA and other state-of-the-art-models on R. chinensis, F. vesca, and A. thaliana datasets.
| Methods | Accuracy (%) | MCC | Accuracy (%) | MCC | Accuracy (%) | MCC |
|---|---|---|---|---|---|---|
| SNNRice6mA-large | 81.13 | 0.62 | 87.84 | 0.75 | 77.6 | 0.57 |
| DNA6mA-MINT | 82.43 | 0.64 | 88.11 | 0.76 | 76.21 | 0.56 |
| SpineNet-6mA | 85.20 | 0.70 | 90.30 | 0.80 | 76.15 | 0.56 |
| Proposed | 87.75 | 0.75 | 93.20 | 0.86 | 79.14 | 0.60 |
Figure 1. Comparison of the AUPRCs generated with and without ElasticNet on imbalanced datasets of F. vesca, R. chinensis, and A. thaliana. (a) F. vesca (without ElasticNet); (b) F. vesca (with ElasticNet); (c) R. chinensis (without ElasticNet); (d) R. chinensis (with ElasticNet); (e) A. thaliana (without ElasticNet); (f) A. thaliana (with ElasticNet).
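Figure 1 reports AUPRC on imbalanced data. The record does not say which estimator was used to compute the area, so the sketch below assumes the step-wise "average precision" form, which is one common choice. The function name `average_precision` and the toy inputs are hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    labels: iterable of 0/1 ground-truth values
    scores: iterable of predicted scores (higher means more likely positive)
    """
    labels = list(labels)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive sample")
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    area = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        # Treat all samples sharing one score as a single threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        area += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return area


if __name__ == "__main__":
    # Toy example, not data from the study.
    print(round(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]), 4))
```

Unlike ROC AUC, this quantity depends on class balance: at a 1:10 positive:negative ratio, a classifier that ranks at random is expected to score close to the positive fraction, about 1/11 ≈ 0.09, which gives a reference point for reading values on imbalanced sets.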
Figure 2. Framework of the proposed model, ENet-6mA.
Figure 3. CNN network architecture.
Benchmark dataset (Rice-Lv) and cross-species testing datasets (Fragaria vesca, Rosa chinensis, and Arabidopsis thaliana) used in this study.
| Dataset | Pos Samples | Neg Samples | Total | Family |
|---|---|---|---|---|
| Rice-Lv | 154,000 | 154,000 | 308,000 | rice |
| Fragaria vesca | 1966 | 1966 | 3932 | Rosaceae |
| Rosa chinensis | 813 | 813 | 1626 | Rosaceae |
| Arabidopsis thaliana | 31,873 | 31,873 | 63,746 | Brassicaceae |