Combining pseudo dinucleotide composition with the Z curve method to improve the accuracy of predicting DNA elements: a case study in recombination spots.

Chuan Dong, Ya-Zhou Yuan, Fa-Zhan Zhang, Hong-Li Hua, Yuan-Nong Ye, Abraham Alemayehu Labena, Hao Lin, Wei Chen, Feng-Biao Guo.

Abstract

Pseudo dinucleotide composition (PseDNC) and the Z curve have both shown excellent performance in classifying nucleotide sequences in bioinformatics. Inspired by the principle of Z curve theory, we improved PseDNC to give the phase-specific PseDNC (psPseDNC). In this study, we used the prediction of recombination spots as a case study to illustrate the capability of psPseDNC, and of PseDNC fused with Z curve theory, based on a novel machine learning method named the large margin distribution machine (LDM). We verified that combining the two widely used approaches gives better performance than using PseDNC alone with a support vector machine-based (SVM-based) model. The best Matthews correlation coefficient (MCC) achieved by our LDM-based model was 0.7037 in the rigorous jackknife test, an improvement of ∼6.6%, ∼3.2%, and ∼2.4% over three previous studies. Similarly, in an independent data test, the accuracy improved by 3.2% compared with our previous iRSpot-PseDNC web server. These results demonstrate that the joint use of PseDNC and the Z curve enhances performance and can extract more information from a biological sequence. To facilitate research in this area, we constructed a user-friendly web server for predicting hot/cold spots, HcsPredictor, which is freely accessible. In summary, we provide a unified algorithm that integrates the Z curve with PseDNC. We hope this unified algorithm can be extended to other classification problems involving DNA elements.
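The abstract does not define psPseDNC, so the following is only a minimal sketch of the idea it borrows from: the standard phase-specific Z curve variables for single nucleotides. The function name `phase_z_curve` and the choice to skip non-ACGT characters are assumptions made for illustration; this is not the authors' psPseDNC feature set, which operates on dinucleotides and is not reproduced here.

```python
from collections import Counter


def phase_z_curve(seq: str) -> list[float]:
    """Return 9 values: the Z curve (x, y, z) components for phases 0, 1 and 2.

    Hypothetical helper for illustration only. For each phase, base
    frequencies are computed over every third position starting at that
    phase, then mapped to the three standard Z curve components.
    """
    seq = seq.upper()
    features: list[float] = []
    for phase in range(3):
        # Keep only unambiguous bases at positions phase, phase+3, phase+6, ...
        counts = Counter(b for b in seq[phase::3] if b in "ACGT")
        total = sum(counts.values())
        if total == 0:
            # No usable bases in this phase (assumption: report zeros).
            features.extend([0.0, 0.0, 0.0])
            continue
        a, c, g, t = (counts[b] / total for b in "ACGT")
        features.append((a + g) - (c + t))  # x: purine vs pyrimidine
        features.append((a + c) - (g + t))  # y: amino vs keto
        features.append((a + t) - (g + c))  # z: weak vs strong hydrogen bonding
    return features


if __name__ == "__main__":
    print(phase_z_curve("ATGGCGTACGATTAG"))
```

In a pipeline like the one the abstract describes, a vector of this kind would be concatenated with composition-based features before being passed to a classifier; how the authors weight or combine them is not stated in the abstract.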

Year:  2016        PMID: 27410247     DOI: 10.1039/c6mb00374e

Source DB:  PubMed          Journal:  Mol Biosyst        ISSN: 1742-2051


  4 in total

1.  UltraPse: A Universal and Extensible Software Platform for Representing Biological Sequences.

Authors:  Pu-Feng Du; Wei Zhao; Yang-Yang Miao; Le-Yi Wei; Likun Wang
Journal:  Int J Mol Sci       Date:  2017-11-14       Impact factor: 5.923

2.  iRSpot-Pse6NC: Identifying recombination spots in Saccharomyces cerevisiae by incorporating hexamer composition into general PseKNC.

Authors:  Hui Yang; Wang-Ren Qiu; Guoqing Liu; Feng-Biao Guo; Wei Chen; Kuo-Chen Chou; Hao Lin
Journal:  Int J Biol Sci       Date:  2018-05-22       Impact factor: 6.580

3.  SFPEL-LPI: Sequence-based feature projection ensemble learning for predicting LncRNA-protein interactions.

Authors:  Wen Zhang; Xiang Yue; Guifeng Tang; Wenjian Wu; Feng Huang; Xining Zhang
Journal:  PLoS Comput Biol       Date:  2018-12-11       Impact factor: 4.475

4.  Accurate prediction of human essential genes using only nucleotide composition and association information.

Authors:  Feng-Biao Guo; Chuan Dong; Hong-Li Hua; Shuo Liu; Hao Luo; Hong-Wan Zhang; Yan-Ting Jin; Kai-Yue Zhang
Journal:  Bioinformatics       Date:  2017-06-15       Impact factor: 6.937

