iEnhancer-5Step: Identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding.

Nguyen Quoc Khanh Le¹, Edward Kien Yee Yapp², Quang-Thai Ho³, N Nagasundaram⁴, Yu-Yen Ou³, Hui-Yuan Yeh⁵.

Abstract

An enhancer is a short (50-1500 bp) region of DNA that plays an important role in gene expression and the production of RNA and proteins. Genetic variation in enhancers has been linked to many human diseases, such as cancer, disorder or inflammatory bowel disease. Because of the importance of enhancers in genomics, their classification has become a popular area of research in computational biology. Although a few computational tools have been developed for this problem, their performance still needs improvement. In this study, we represent enhancer sequences with word embeddings that include sub-word information of their biological words; these embeddings serve as features for a support vector machine classifier. We present iEnhancer-5Step, a web server with a two-layer classifier that identifies enhancers (first layer) and their strength (second layer). It attains independent test accuracies of 79% and 63.5% in the two layers, respectively, outperforming current predictors on the same dataset. Moreover, this study provides a basis for further research on applying natural language processing techniques to biological sequences. iEnhancer-5Step is freely accessible via http://biologydeep.com/fastenc/.
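
The pipeline described in the abstract can be illustrated with a minimal sketch. It shows how a DNA sequence might be split into overlapping "biological words", how sub-word character n-grams of a word can be formed, and how a two-layer decision cascades. The word length `k`, the n-gram range, the function names, and the toy classifiers are all assumptions made for illustration and are not taken from the paper; training the embeddings and the support vector machine is left out.

```python
from typing import Callable, List


def kmer_words(seq: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers ("biological words").

    k = 3 is an assumed value for illustration only.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def subword_ngrams(word: str, n_min: int = 2, n_max: int = 3) -> List[str]:
    """Return character n-grams of a word, with '<' and '>' boundary markers.

    The n-gram range is an assumption; sub-word embedding models sum the
    vectors of such n-grams to build the vector of the whole word.
    """
    marked = f"<{word}>"
    grams: List[str] = []
    for n in range(n_min, n_max + 1):
        grams.extend(marked[i:i + n] for i in range(len(marked) - n + 1))
    return grams


def two_layer_predict(
    seq: str,
    is_enhancer: Callable[[str], bool],
    is_strong: Callable[[str], bool],
) -> str:
    """Cascade two binary classifiers.

    Layer 1 decides enhancer vs. non-enhancer; layer 2 is consulted only
    for predicted enhancers and decides strong vs. weak.
    """
    if not is_enhancer(seq):
        return "non-enhancer"
    return "strong enhancer" if is_strong(seq) else "weak enhancer"


# Hypothetical stand-ins for trained classifiers, used only to show the cascade.
def toy_layer1(seq: str) -> bool:
    return "ACG" in kmer_words(seq)


def toy_layer2(seq: str) -> bool:
    return len(seq) >= 8


if __name__ == "__main__":
    print(kmer_words("ACGTA"))
    print(subword_ngrams("ACG"))
    print(two_layer_predict("ACGTACGT", toy_layer1, toy_layer2))
```

In a real setting, the two callables would wrap trained models that take the embedding-based feature vector of the sequence as input; the toy functions here only make the control flow visible.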

Keywords: Continuous bag of words; Regulatory transcription factor; Sequence analysis; Skip gram; Support vector machine; Two-layer classification

Year: 2019 PMID: 30822398 DOI: 10.1016/j.ab.2019.02.017

Source DB: PubMed Journal: Anal Biochem ISSN: 0003-2697 Impact factor: 3.365

Cited

35 in total

1. XG-PseU: an eXtreme Gradient Boosting based method for identifying pseudouridine sites.

Authors: Kewei Liu; Wei Chen; Hao Lin
Journal: Mol Genet Genomics Date: 2019-08-07 Impact factor: 3.291

2. iN6-methylat (5-step): identifying DNA N⁶-methyladenine sites in rice genome using continuous bag of nucleobases via Chou's 5-step rule.

Authors: Nguyen Quoc Khanh Le
Journal: Mol Genet Genomics Date: 2019-05-04 Impact factor: 3.291

3. Some illuminating remarks on molecular genetics and genomics as well as drug development. (Review)

Authors: Kuo-Chen Chou
Journal: Mol Genet Genomics Date: 2020-01-01 Impact factor: 3.291

4. BoT-Net: a lightweight bag of tricks-based neural network for efficient LncRNA-miRNA interaction prediction.

Authors: Muhammad Nabeel Asim; Muhammad Ali Ibrahim; Christoph Zehe; Johan Trygg; Andreas Dengel; Sheraz Ahmed
Journal: Interdiscip Sci Date: 2022-08-10 Impact factor: 3.492

5. iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species.

Authors: Pengyu Zhang; Hongming Zhang; Hao Wu
Journal: Nucleic Acids Res Date: 2022-10-14 Impact factor: 19.160

6. ADH-PPI: An attention-based deep hybrid model for protein-protein interaction prediction.

Authors: Muhammad Nabeel Asim; Muhammad Ali Ibrahim; Muhammad Imran Malik; Andreas Dengel; Sheraz Ahmed
Journal: iScience Date: 2022-09-21

7. BHCMDA: A New Biased Heat Conduction Based Method for Potential MiRNA-Disease Association Prediction.

Authors: Xianyou Zhu; Xuzai Wang; Haochen Zhao; Tingrui Pei; Linai Kuang; Lei Wang
Journal: Front Genet Date: 2020-04-28 Impact factor: 4.599

8. ACP-DL: A Deep Learning Long Short-Term Memory Model to Predict Anticancer Peptides Using High-Efficiency Feature Representation.

Authors: Hai-Cheng Yi; Zhu-Hong You; Xi Zhou; Li Cheng; Xiao Li; Tong-Hai Jiang; Zhan-Heng Chen
Journal: Mol Ther Nucleic Acids Date: 2019-05-10 Impact factor: 8.886

9. EnContact: predicting enhancer-enhancer contacts using sequence-based deep learning model.

Authors: Mingxin Gan; Wenran Li; Rui Jiang
Journal: PeerJ Date: 2019-09-13 Impact factor: 2.984

10. Representation learning applications in biological sequence analysis. (Review)

Authors: Hitoshi Iuchi; Taro Matsutani; Keisuke Yamada; Natsuki Iwano; Shunsuke Sumi; Shion Hosoda; Shitao Zhao; Tsukasa Fukunaga; Michiaki Hamada
Journal: Comput Struct Biotechnol J Date: 2021-05-23 Impact factor: 7.271