Combining pseudo dinucleotide composition with the Z curve method to improve the accuracy of predicting DNA elements: a case study in recombination spots.

Chuan Dong¹, Ya-Zhou Yuan¹, Fa-Zhan Zhang¹, Hong-Li Hua¹, Yuan-Nong Ye², Abraham Alemayehu Labena¹, Hao Lin¹, Wei Chen³, Feng-Biao Guo¹.

Abstract

Pseudo dinucleotide composition (PseDNC) and the Z curve method have both shown excellent performance in classifying nucleotide sequences. Inspired by the principle of Z curve theory, we extended PseDNC to a phase-specific PseDNC (psPseDNC). In this study, we used the prediction of recombination spots as a case study to illustrate the capability of psPseDNC, and of PseDNC fused with Z curve theory, using a novel machine learning method named the large margin distribution machine (LDM). We verified that combining the two widely used approaches performs better than using PseDNC alone with a support vector machine (SVM) based model. The best Matthews correlation coefficient (MCC) achieved by our LDM-based model was 0.7037 in the rigorous jackknife test, an improvement of ∼6.6%, ∼3.2%, and ∼2.4% over three previous studies. Similarly, in an independent data test, the accuracy improved by 3.2% compared with our previous iRSpot-PseDNC web server. These results demonstrate that the joint use of PseDNC and the Z curve enhances performance and can extract more information from a biological sequence. To facilitate research in this area, we constructed a user-friendly web server for predicting hot/cold spots, HcsPredictor, which is freely accessible. In summary, we provide a unified algorithm that integrates the Z curve with PseDNC, and we hope it can be extended to other classification problems involving DNA elements.
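For readers unfamiliar with the two quantities named in the abstract, the following is a minimal Python sketch of the standard Z curve transform and of the Matthews correlation coefficient. It is not the authors' implementation. The function names are hypothetical, and "phase" is assumed here to mean sequence position modulo 3, as in common Z curve usage. PseDNC itself is not reproduced, because it depends on dinucleotide physicochemical parameters that this record does not give.

```python
from math import sqrt


def z_curve(seq):
    """Return the (x, y, z) Z curve components from base frequencies.

    Characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    n = sum(seq.count(b) for b in "ACGT")
    if n == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    a, c, g, t = (seq.count(b) / n for b in "ACGT")
    x = (a + g) - (c + t)  # purine vs. pyrimidine
    y = (a + c) - (g + t)  # amino vs. keto
    z = (a + t) - (g + c)  # weak vs. strong hydrogen bonding
    return x, y, z


def phase_specific_z_curve(seq):
    """Concatenate Z curve components for positions 0, 1, 2 modulo 3.

    Assumption: 'phase' is taken to mean position modulo 3.
    Raises ValueError if any phase contains no A/C/G/T bases.
    """
    features = []
    for phase in range(3):
        features.extend(z_curve(seq[phase::3]))
    return features


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

A feature vector built this way has nine values per sequence and could be concatenated with other composition features before training a classifier. The MCC ranges from -1 (total disagreement) to 1 (perfect prediction).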

Entities: Chemical

Substances:
Nucleotides
DNA

Year: 2016; PMID: 27410247; DOI: 10.1039/c6mb00374e

Source DB: PubMed; Journal: Mol Biosyst; ISSN: 1742-2051

Cited

4 in total

1. UltraPse: A Universal and Extensible Software Platform for Representing Biological Sequences.

Authors: Pu-Feng Du; Wei Zhao; Yang-Yang Miao; Le-Yi Wei; Likun Wang
Journal: Int J Mol Sci Date: 2017-11-14 Impact factor: 5.923

2. iRSpot-Pse6NC: Identifying recombination spots in Saccharomyces cerevisiae by incorporating hexamer composition into general PseKNC.

Authors: Hui Yang; Wang-Ren Qiu; Guoqing Liu; Feng-Biao Guo; Wei Chen; Kuo-Chen Chou; Hao Lin
Journal: Int J Biol Sci Date: 2018-05-22 Impact factor: 6.580

3. SFPEL-LPI: Sequence-based feature projection ensemble learning for predicting LncRNA-protein interactions.

Authors: Wen Zhang; Xiang Yue; Guifeng Tang; Wenjian Wu; Feng Huang; Xining Zhang
Journal: PLoS Comput Biol Date: 2018-12-11 Impact factor: 4.475

4. Accurate prediction of human essential genes using only nucleotide composition and association information.

Authors: Feng-Biao Guo; Chuan Dong; Hong-Li Hua; Shuo Liu; Hao Luo; Hong-Wan Zhang; Yan-Ting Jin; Kai-Yue Zhang
Journal: Bioinformatics Date: 2017-06-15 Impact factor: 6.937
