Qiang Tang1, Juanjuan Kang2, Jiaqing Yuan1, Hua Tang1, Xianhai Li1, Hao Lin2, Jian Huang2, Wei Chen1,3. 1. Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China. 2. Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China. 3. Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China.
Abstract
MOTIVATION: DNA N4-methylcytosine (4mC) is a crucial epigenetic modification. However, the knowledge about its biological functions is limited. Effective and accurate identification of 4mC sites will be helpful to reveal its biological functions and mechanisms. Since experimental methods are cost and ineffective, a number of machine learning-based approaches have been proposed to detect 4mC sites. Although these methods yielded acceptable accuracy, there is still room for the improvement of the prediction performance and the stability of existing methods in practical applications. RESULTS: In this work, we first systematically assessed the existing methods based on an independent dataset. And then, we proposed DNA4mC-LIP, a linear integration method by combining existing predictors to identify 4mC sites in multiple species. The results obtained from independent dataset demonstrated that DNA4mC-LIP outperformed existing methods for identifying 4mC sites. To facilitate the scientific community, a web server for DNA4mC-LIP was developed. We anticipated that DNA4mC-LIP could serve as a powerful computational technique for identifying 4mC sites and facilitate the interpretation of 4mC mechanism. AVAILABILITY AND IMPLEMENTATION: http://i.uestc.edu.cn/DNA4mC-LIP/. CONTACT: hlin@uestc.edu.cn or hj@uestc.edu.cn or chenweiimu@gmail.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: DNA N4-methylcytosine (4mC) is a crucial epigenetic modification. However, the knowledge about its biological functions is limited. Effective and accurate identification of 4mC sites will be helpful to reveal its biological functions and mechanisms. Since experimental methods are cost and ineffective, a number of machine learning-based approaches have been proposed to detect 4mC sites. Although these methods yielded acceptable accuracy, there is still room for the improvement of the prediction performance and the stability of existing methods in practical applications. RESULTS: In this work, we first systematically assessed the existing methods based on an independent dataset. And then, we proposed DNA4mC-LIP, a linear integration method by combining existing predictors to identify 4mC sites in multiple species. The results obtained from independent dataset demonstrated that DNA4mC-LIP outperformed existing methods for identifying 4mC sites. To facilitate the scientific community, a web server for DNA4mC-LIP was developed. We anticipated that DNA4mC-LIP could serve as a powerful computational technique for identifying 4mC sites and facilitate the interpretation of 4mC mechanism. AVAILABILITY AND IMPLEMENTATION: http://i.uestc.edu.cn/DNA4mC-LIP/. CONTACT: hlin@uestc.edu.cn or hj@uestc.edu.cn or chenweiimu@gmail.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.