Yushuang Li, Yanfen Lv, Xiaonan Li, Wenli Xiao, Chun Li.
Abstract
Four new inter-nucleotide distance sequences for a DNA sequence are defined. They differ from those presented by Afreixo et al. and overcome the irreversibility defect of the global inter-nucleotide distance sequence proposed by Nair and Mahalakshmi. Five basic statistical quantities are extracted from (ordered) precise inter-nucleotide distance sequences to construct a 20-dimensional feature vector. This simple mathematical descriptor of a DNA sequence plays a crucial role in sequence comparison and essential gene identification. The Euclidean distance between feature vectors is used to compare similarities among the whole mitochondrial genomes of 18 eutherian mammals and among 23 sequences of 16S ribosomal RNA, respectively. The derived phylogenetic trees are in good agreement with a few popular studies. Furthermore, using the feature vector as input, a support vector machine (SVM)-based method is developed to identify essential and non-essential genes of 5 bacteria. AUC values higher than some well-known results (minimum 0.7971, maximum 0.8751, average 0.8174) confirm the performance of the method.
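The pipeline summarized in the abstract (distance sequences, five statistics per sequence, a 20-dimensional vector, Euclidean comparison) can be illustrated with a minimal sketch. The abstract does not give the definitions of the four new distance sequences or name the five statistical quantities, so this sketch makes two loud assumptions: it uses the classic per-nucleotide distance (the gap between consecutive occurrences of the same base), and it picks count, mean, variance, minimum and maximum as the five statistics. All function names are hypothetical and this is not the authors' method.

```python
import math
from statistics import mean, pvariance

NUCLEOTIDES = "ACGT"

def inter_nucleotide_distances(seq, base):
    """Gaps between consecutive occurrences of `base`.

    ASSUMPTION: this is the classic per-nucleotide distance, not one of
    the four new sequences defined in the paper.
    """
    positions = [i for i, ch in enumerate(seq.upper()) if ch == base]
    return [b - a for a, b in zip(positions, positions[1:])]

def five_stats(values):
    """ASSUMED statistics: count, mean, population variance, min, max."""
    if not values:
        # A base occurring fewer than twice yields no distances.
        return [0.0] * 5
    return [
        float(len(values)),
        float(mean(values)),
        float(pvariance(values)),
        float(min(values)),
        float(max(values)),
    ]

def feature_vector(seq):
    """Concatenate 5 statistics for each of 4 bases: 20 dimensions."""
    vec = []
    for base in NUCLEOTIDES:
        vec.extend(five_stats(inter_nucleotide_distances(seq, base)))
    return vec

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.dist(u, v)

if __name__ == "__main__":
    a = feature_vector("ACGTACGTTAGC")
    b = feature_vector("ACGGACGTTTGC")
    print(len(a), euclidean(a, b))
```

A pairwise matrix of such distances could then be passed to a tree-building routine, and the same vectors could serve as input features for an SVM classifier, as the abstract describes.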
Keywords: Essential gene identification; Feature vector; Inter-nucleotide distance sequence; Sequence comparison; Statistical quantity; Support vector machine
Year: 2017 PMID: 28137599 DOI: 10.1016/j.jtbi.2017.01.031
Source DB: PubMed Journal: J Theor Biol ISSN: 0022-5193 Impact factor: 2.691