Md Mehedi Hasan, Balachandran Manavalan, Mst Shamima Khatun, Hiroyuki Kurata.
Abstract
N4-methylcytosine (4mC) is one of the most important epigenetic modifications and regulates many biological processes, including DNA replication and chromosome stability. Identifying 4mC sites is pivotal to understanding their specific biological functions. Herein, we developed i4mC-ROSE, the first bioinformatics tool for identifying 4mC sites in the genomes of Fragaria vesca and Rosa chinensis in the Rosaceae. It uses a random forest classifier with six encoding methods that cover various aspects of DNA sequence information. The i4mC-ROSE predictor achieves area under the curve (AUC) scores of 0.883 and 0.889 for the two genomes during cross-validation. Moreover, i4mC-ROSE outperforms the other classifiers tested in this study when evaluated on the independent datasets. The proposed tool can meet users' demand for predicting 4mC sites in Rosaceae genomes. The i4mC-ROSE predictor and the datasets used are publicly accessible at http://kurata14.bio.kyutech.ac.jp/i4mC-ROSE/.
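To make the idea of a sequence encoding concrete, the following is a minimal sketch of how a DNA window might be turned into a numeric feature vector before classification. The abstract does not name the six encoding methods, so the two shown here (binary one-hot encoding and k-mer frequency) are assumptions chosen because they are common in this kind of work, not a description of the authors' implementation. The function names are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Binary encoding: each base becomes a 4-element indicator vector."""
    table = {b: [int(b == n) for n in NUCLEOTIDES] for b in NUCLEOTIDES}
    vec = []
    for base in seq.upper():
        # Unknown symbols (e.g. N) are encoded as all zeros (an assumption).
        vec.extend(table.get(base, [0, 0, 0, 0]))
    return vec


def kmer_frequencies(seq, k=2):
    """Normalized counts of overlapping k-mers, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing unknown symbols
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def encode(seq):
    """Concatenate both encodings into a single feature vector."""
    return one_hot(seq) + kmer_frequencies(seq, k=2)
```

Feature vectors built this way could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`. How the paper itself builds and combines its encodings is not stated in the abstract.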
Keywords: DNA methylation; Linear regression; Machine learning; N4-methylcytosine site; Sequence encoding
Year: 2019 PMID: 31805335 DOI: 10.1016/j.ijbiomac.2019.12.009
Source DB: PubMed Journal: Int J Biol Macromol ISSN: 0141-8130 Impact factor: 6.953