
A measure of DNA sequence similarity by Fourier Transform with applications on hierarchical clustering.

Changchuan Yin, Ying Chen, Stephen S-T Yau.

Abstract

Multiple sequence alignment (MSA) is a prominent method for classifying DNA sequences, yet it is hampered by inherent limitations in computational complexity. Alignment-free methods have been developed over the past decade to compare and classify DNA sequences more efficiently than MSA. However, most alignment-free methods may lose structural and functional information of DNA sequences because they are based on feature extraction, so they may not fully reflect the actual differences among DNA sequences. Alignment-free methods that conserve information are needed for more accurate comparison and classification of DNA sequences. We propose a new alignment-free similarity measure of DNA sequences using the Discrete Fourier Transform (DFT). In this method, we map each DNA sequence into four binary indicator sequences and apply the DFT to the indicator sequences to transform them into the frequency domain. The Euclidean distance between the full DFT power spectra of the DNA sequences is used as the similarity distance metric. To compare the DFT power spectra of DNA sequences of different lengths, we propose an even scaling method that extends the shorter DFT power spectra to the length of the longest sequence compared. After the DFT power spectra are evenly scaled, the DNA sequences are compared in a DFT frequency space of the same dimensionality. We assess the accuracy of the similarity metric in hierarchical clustering using simulated DNA and virus sequences. The results demonstrate that the DFT-based method is an effective and accurate measure of DNA sequence similarity.
Copyright © 2014 Elsevier Ltd. All rights reserved.
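
The pipeline described in the abstract (indicator sequences, DFT power spectra, length equalization, Euclidean distance) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names are hypothetical, summing the four indicator power spectra into one vector is an assumption, and plain linear interpolation is used as a stand-in for the even scaling step, whose exact formula is not given in the abstract.

```python
import numpy as np

def power_spectrum(seq):
    """Sum of the DFT power spectra of the four binary indicator sequences.

    Assumption: the four per-nucleotide spectra are added into one vector.
    """
    seq = seq.upper()
    total = np.zeros(len(seq))
    for base in "ACGT":
        # Binary indicator sequence: 1 where `base` occurs, 0 elsewhere.
        indicator = np.array([1.0 if ch == base else 0.0 for ch in seq])
        total += np.abs(np.fft.fft(indicator)) ** 2
    return total

def stretch(spectrum, length):
    """Extend a spectrum to `length` points.

    Assumption: linear interpolation stands in for the paper's even scaling.
    """
    if len(spectrum) == length:
        return spectrum
    old_grid = np.linspace(0.0, 1.0, num=len(spectrum))
    new_grid = np.linspace(0.0, 1.0, num=length)
    return np.interp(new_grid, old_grid, spectrum)

def dft_distance(seq_a, seq_b):
    """Euclidean distance between length-equalized power spectra."""
    ps_a, ps_b = power_spectrum(seq_a), power_spectrum(seq_b)
    target = max(len(ps_a), len(ps_b))
    return float(np.linalg.norm(stretch(ps_a, target) - stretch(ps_b, target)))
```

A pairwise matrix of such distances could then be passed to any standard hierarchical clustering routine; that step is omitted here.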

Keywords:  Even scaling; Genome; Phylogenetic trees; Similarity distance

Year:  2014        PMID: 24911780     DOI: 10.1016/j.jtbi.2014.05.043

Source DB:  PubMed          Journal:  J Theor Biol        ISSN: 0022-5193            Impact factor:   2.691


  16 in total

1.  Periodic power spectrum with applications in detection of latent periodicities in DNA sequences.

Authors:  Changchuan Yin; Jiasong Wang
Journal:  J Math Biol       Date:  2016-03-04       Impact factor: 2.259

2.  MathFeature: feature extraction package for DNA, RNA and protein sequences based on mathematical descriptors.

Authors:  Robson P Bonidia; Douglas S Domingues; Danilo S Sanches; André C P L F de Carvalho
Journal:  Brief Bioinform       Date:  2022-01-17       Impact factor: 11.622

3.  An efficient numerical representation of genome sequence: natural vector with covariance component.

Authors:  Nan Sun; Xin Zhao; Stephen S-T Yau
Journal:  PeerJ       Date:  2022-06-16       Impact factor: 3.061

4.  Computational analysis of the SARS-CoV-2 and other viruses based on the Kolmogorov's complexity and Shannon's information theories.

Authors:  J A Tenreiro Machado; João M Rocha-Neves; José P Andrade
Journal:  Nonlinear Dyn       Date:  2020-07-04       Impact factor: 5.022

5.  Learning vector quantization as an interpretable classifier for the detection of SARS-CoV-2 types based on their RNA sequences.

Authors:  Marika Kaden; Katrin Sophie Bohnsack; Mirko Weber; Mateusz Kudła; Kaja Gutowska; Jacek Blazewicz; Thomas Villmann
Journal:  Neural Comput Appl       Date:  2021-04-27       Impact factor: 5.606

6.  A Novel Method for Alignment-free DNA Sequence Similarity Analysis Based on the Characterization of Complex Networks.

Authors:  Jie Zhou; Pianyu Zhong; Tinghui Zhang
Journal:  Evol Bioinform Online       Date:  2016-10-06       Impact factor: 1.625

7.  A novel fast vector method for genetic sequence comparison.

Authors:  Yongkun Li; Lily He; Rong Lucy He; Stephen S-T Yau
Journal:  Sci Rep       Date:  2017-09-22       Impact factor: 4.379

8.  Analysis of the Hosts and Transmission Paths of SARS-CoV-2 in the COVID-19 Outbreak.

Authors:  Rui Dong; Shaojun Pei; Changchuan Yin; Rong Lucy He; Stephen S-T Yau
Journal:  Genes (Basel)       Date:  2020-06-09       Impact factor: 4.096

9.  A coevolution analysis for identifying protein-protein interactions by Fourier transform.

Authors:  Changchuan Yin; Stephen S-T Yau
Journal:  PLoS One       Date:  2017-04-21       Impact factor: 3.240

10.  Measuring Similarity among Protein Sequences Using a New Descriptor.

Authors:  Mervat M Abo-Elkhier; Marwa A Abd Elwahaab; Moheb I Abo El Maaty
Journal:  Biomed Res Int       Date:  2019-11-22       Impact factor: 3.411

