Hasan Zulfiqar, Zi-Jie Sun, Qin-Lai Huang, Shi-Shi Yuan, Hao Lv, Fu-Ying Dao, Hao Lin, Yan-Wen Li.
Abstract
N4-methylcytosine (4mC) is a type of DNA modification that can regulate several biological processes, such as transcriptional regulation, replication, and gene expression. Precisely recognizing 4mC sites in genomic sequences can provide specific knowledge about their genetic roles. This study aimed to develop a deep learning-based model to predict 4mC sites in Escherichia coli. In the model, DNA sequences were encoded with the word embedding technique 'word2vec'. The resulting features were fed into a 1-D convolutional neural network (CNN) to discriminate 4mC sites from non-4mC sites in the Escherichia coli genome. Evaluation on an independent dataset showed that our model achieved an overall accuracy of 0.861, about 4.3% higher than that of the existing model. For the convenience of other researchers, the data and source code of the model can be freely downloaded from https://github.com/linDing-groups/Deep-4mCW2V.
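The encoding and convolution steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the k-mer size (k = 3), the toy embedding table, and the function names `kmer_tokens`, `embed`, and `conv1d_valid` are all assumptions made for illustration. In practice the embedding vectors would be learned by a word2vec implementation and the filters by a deep learning framework.

```python
from typing import Dict, List


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1).

    These k-mers play the role of 'words' for a word2vec-style embedding.
    k = 3 is an assumed value, not taken from the abstract.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed(tokens: List[str], table: Dict[str, List[float]], dim: int) -> List[List[float]]:
    """Look up each token's vector; tokens missing from the table map to zeros."""
    return [table.get(t, [0.0] * dim) for t in tokens]


def conv1d_valid(x: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Apply one 1-D convolutional filter along the token axis, without padding.

    x is L x D (tokens by embedding dimension), kernel is W x D.
    Returns L - W + 1 activations (cross-correlation, as in most CNN libraries).
    """
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        total = 0.0
        for j in range(width):
            for d in range(len(kernel[j])):
                total += x[start + j][d] * kernel[j][d]
        out.append(total)
    return out


# Hypothetical 2-dimensional embedding table, for illustration only.
toy_table = {"ACG": [1.0, 0.0], "CGT": [0.0, 1.0], "GTA": [1.0, 1.0]}
tokens = kmer_tokens("ACGTA")              # ['ACG', 'CGT', 'GTA']
features = embed(tokens, toy_table, dim=2)
activations = conv1d_valid(features, [[1.0, 1.0], [1.0, 1.0]])  # [2.0, 3.0]
```

A full classifier would stack several such filters, apply a nonlinearity and pooling, and end with a layer that outputs the probability that the central cytosine is a 4mC site.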
Keywords: Convolutional neural network; Feature extraction; Modification; N4-methylcytosine; Word embedding
Year: 2021; PMID: 34352373; DOI: 10.1016/j.ymeth.2021.07.011
Source DB: PubMed; Journal: Methods; ISSN: 1046-2023; Impact factor: 3.608