Exploring sequence-based features for the improved prediction of DNA N4-methylcytosine sites in multiple species.

Leyi Wei, Shasha Luan, Luis Augusto Eijy Nagai, Ran Su, Quan Zou.

Abstract

MOTIVATION: As one of the important epigenetic modifications, DNA N4-methylcytosine (4mC) has recently been shown to play crucial roles in restriction-modification systems. To better understand its functional mechanisms, it is fundamentally important to identify 4mC modifications. Machine learning methods have recently emerged as an effective and efficient approach for the high-throughput identification of 4mC sites, although high predictive error rates remain a challenge for existing methods. It is therefore highly desirable to develop a computational method that identifies 4mC sites more accurately.
RESULTS: In this study, we propose a machine-learning-based predictor, 4mcPred-SVM, for the genome-wide detection of DNA 4mC sites. The predictor uses a new feature representation algorithm that exploits sequence-based information. To improve the feature representation ability, we apply a two-step feature optimization strategy that retains the most representative features. Using the resulting features and a Support Vector Machine (SVM), we adaptively train the optimal models for different species. Comparative results on benchmark datasets from six species indicate that our predictor generally achieves better performance in predicting 4mC sites than state-of-the-art predictors. Importantly, the sequence-based features can reliably and robustly predict 4mC sites, facilitating the discovery of potentially important sequence characteristics for the prediction of 4mC sites.
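As a rough illustration of what "sequence-based features" followed by feature ranking can look like, the sketch below encodes DNA sequences as normalized k-mer frequencies and ranks the features by the absolute difference of class means. This is a minimal sketch under stated assumptions, not the feature representation or the two-step optimization strategy of 4mcPred-SVM, neither of which is specified in the abstract. The function names `kmer_frequencies` and `rank_features`, the choice of k-mer composition, the ranking score, and the toy sequences are all hypothetical. No SVM is trained here; the selected feature columns would be passed to an SVM implementation of one's choice.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies of a DNA sequence.

    Features are ordered lexicographically over ALPHABET (AA, AC, AG, ...).
    This encoding is an illustrative assumption, not the paper's algorithm.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def rank_features(pos, neg):
    """Rank feature indices by |mean(pos) - mean(neg)|, largest first.

    A simple filter-style stand-in for one feature-optimization step.
    """
    n_features = len(pos[0])

    def mean(rows, j):
        return sum(row[j] for row in rows) / len(rows)

    scores = [abs(mean(pos, j) - mean(neg, j)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy, made-up sequences (not from any benchmark dataset).
positives = ["ACGCGCGT", "TTCGCGAA"]
negatives = ["ATATATAT", "TTAATTAA"]

X_pos = [kmer_frequencies(s) for s in positives]
X_neg = [kmer_frequencies(s) for s in negatives]

# Indices of the four highest-ranked dinucleotide features.
top_features = rank_features(X_pos, X_neg)[:4]
```

In practice, the number of retained features and the SVM hyperparameters would be chosen per species by cross-validation, which matches the abstract's description of training separate models for different species.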
AVAILABILITY AND IMPLEMENTATION: A user-friendly webserver implementing the proposed 4mcPred-SVM is freely accessible at http://server.malab.cn/4mcPred-SVM.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Year: 2019   PMID: 30239627   DOI: 10.1093/bioinformatics/bty824

Source DB: PubMed   Journal: Bioinformatics   ISSN: 1367-4803   Impact factor: 6.937


  45 in total

1.  IHEC_RAAC: a online platform for identifying human enzyme classes via reduced amino acid cluster strategy.

Authors:  Hao Wang; Qilemuge Xi; Pengfei Liang; Lei Zheng; Yan Hong; Yongchun Zuo
Journal:  Amino Acids       Date:  2021-01-23       Impact factor: 3.520

2.  i6mA-VC: A Multi-Classifier Voting Method for the Computational Identification of DNA N6-methyladenine Sites.

Authors:  Tian Xue; Shengli Zhang; Huijuan Qiao
Journal:  Interdiscip Sci       Date:  2021-04-08       Impact factor: 2.233

3.  MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters.

Authors:  Meng Zhang; Fuyi Li; Tatiana T Marquez-Lago; André Leier; Cunshuo Fan; Chee Keong Kwoh; Kuo-Chen Chou; Jiangning Song; Cangzhi Jia
Journal:  Bioinformatics       Date:  2019-09-01       Impact factor: 6.937

4.  Deep4mC: systematic assessment and computational prediction for DNA N4-methylcytosine sites by deep learning.

Authors:  Haodong Xu; Peilin Jia; Zhongming Zhao
Journal:  Brief Bioinform       Date:  2021-05-20       Impact factor: 11.622

5.  ATGPred-FL: sequence-based prediction of autophagy proteins with feature representation learning.

Authors:  Shihu Jiao; Zheng Chen; Lichao Zhang; Xun Zhou; Lei Shi
Journal:  Amino Acids       Date:  2022-03-14       Impact factor: 3.520

6.  Large-scale comparative review and assessment of computational methods for anti-cancer peptide identification. (Review)

Authors:  Xiao Liang; Fuyi Li; Jinxiang Chen; Junlong Li; Hao Wu; Shuqin Li; Jiangning Song; Quanzhong Liu
Journal:  Brief Bioinform       Date:  2021-07-20       Impact factor: 11.622

7.  iDNA-MT: Identification DNA Modification Sites in Multiple Species by Using Multi-Task Learning Based a Neural Network Tool.

Authors:  Xiao Yang; Xiucai Ye; Xuehong Li; Lesong Wei
Journal:  Front Genet       Date:  2021-03-31       Impact factor: 4.599

8.  A sequence-based multiple kernel model for identifying DNA-binding proteins.

Authors:  Yuqing Qian; Limin Jiang; Yijie Ding; Jijun Tang; Fei Guo
Journal:  BMC Bioinformatics       Date:  2021-05-31       Impact factor: 3.169

9.  4mCPred-MTL: Accurate Identification of DNA 4mC Sites in Multiple Species Using Multi-Task Deep Learning Based on Multi-Head Attention Mechanism.

Authors:  Rao Zeng; Song Cheng; Minghong Liao
Journal:  Front Cell Dev Biol       Date:  2021-05-10

10.  i4mC-EL: Identifying DNA N4-Methylcytosine Sites in the Mouse Genome Using Ensemble Learning.

Authors:  Yanjuan Li; Zhengnan Zhao; Zhixia Teng
Journal:  Biomed Res Int       Date:  2021-05-29       Impact factor: 3.411
