
iN6-methylat (5-step): identifying DNA N6-methyladenine sites in rice genome using continuous bag of nucleobases via Chou's 5-step rule.

Nguyen Quoc Khanh Le

Abstract

DNA N6-methyladenine is a non-canonical DNA modification that occurs at low levels in various eukaryotes and has been identified as serving extremely important biological functions. In the rice genome, about 0.2% of adenines carry this mark, a higher proportion than in most other species. Identifying these sites has therefore become an important area of study, especially in biological research. Although a few computational tools have been applied to this problem, considerable effort is still needed to improve their performance. In this study, we represent DNA sequences as continuous bags of nucleobases, including sub-word information of their biological words; these representations then serve as features for a support vector machine that identifies the sites. Using this hybrid approach, our model identified DNA N6-methyladenine sites with a jackknife-test sensitivity of 86.48%, specificity of 89.09%, accuracy of 87.78%, and MCC of 0.756. Compared with the state-of-the-art predictor and other methods, the proposed model yields superior performance on all metrics. This study also provides a basis for further research that can enrich the field of applying natural language processing techniques to biological sequences.
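
The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names: splitting a DNA sequence into "biological words" with FastText-style sub-word n-grams, and computing the reported metrics (sensitivity, specificity, accuracy, MCC) from a confusion matrix. The function names, the k-mer length `k`, and the n-gram range `n_min`/`n_max` are illustrative assumptions, not values taken from the paper, and the support vector machine step is omitted.

```python
import math


def kmer_words(seq, k=3):
    """Split a DNA sequence into overlapping k-mers ("biological words").

    k=3 is an assumed value for illustration only.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def subword_ngrams(word, n_min=2, n_max=3):
    """Return FastText-style character n-grams of one word.

    The word is wrapped in '<' and '>' boundary markers before slicing.
    n_min and n_max are assumed values for illustration only.
    """
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def binary_metrics(tp, tn, fp, fn):
    """Compute sensitivity, specificity, accuracy and MCC from raw counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

For example, `kmer_words("ACGTA")` yields the words `ACG`, `CGT`, and `GTA`, and `subword_ngrams("ACG")` yields the sub-word units of `<ACG>`. In a jackknife (leave-one-out) test, each sample is predicted by a model trained on all other samples, and the resulting counts of true and false positives and negatives could be passed to `binary_metrics`.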

Keywords: Continuous bag of words; DNA N6-methyladenine; DNA replication; FastText; Skip gram; Support vector machine

Year: 2019; PMID: 31055655; DOI: 10.1007/s00438-019-01570-y

Source DB: PubMed; Journal: Mol Genet Genomics; ISSN: 1617-4623; Impact factor: 3.291


