
Determination of k-mer density in a DNA sequence and subsequent cluster formation algorithm based on the application of electronic filter.

Bimal Kumar Sarkar, Ashish Ranjan Sharma, Manojit Bhattacharya, Garima Sharma, Sang-Soo Lee, Chiranjib Chakraborty.

Abstract

We describe a novel algorithm for recovering information from DNA sequences using a digital filter. The algorithm has three parts and determines the k-mer (q-gram) word density of a sequence. The word density is calculated with a finite impulse response (FIR) digital filter. Principal component analysis is then applied to the word density distribution to analyze the dissimilarity between sequences. The resulting dissimilarity matrix reveals cluster formation. The clusters are constructed by an alignment-free method and are then used to build phylogenetic relations. The clustering is in good agreement with alignment-based algorithms. The present algorithm is simple and requires less computation time than other currently available algorithms. We tested the algorithm on beta hemoglobin (HBB) coding sequences from 10 different species and on 18 primate mitochondrial genome (mtDNA) sequences.
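The abstract does not give the filter coefficients, the window length, or the distance measure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a moving-average FIR filter applied to a 0/1 indicator sequence for each k-mer, and a plain Euclidean distance between the resulting density vectors. The principal component analysis step is omitted. All function names and the toy sequences are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_indicator(seq, kmer):
    """Return 1.0 at each position where `kmer` starts in `seq`, else 0.0."""
    k = len(kmer)
    return [1.0 if seq[i:i + k] == kmer else 0.0
            for i in range(len(seq) - k + 1)]


def fir_filter(x, taps):
    """Causal FIR filter: y[n] = sum_j taps[j] * x[n - j], with x[m] = 0 for m < 0."""
    return [sum(t * x[n - j] for j, t in enumerate(taps) if n - j >= 0)
            for n in range(len(x))]


def kmer_density(seq, k, window):
    """Mean filtered density of every k-mer over the alphabet ACGT.

    Assumption: the FIR taps form a moving average of length `window`.
    """
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    taps = [1.0 / window] * window
    density = []
    for letters in product("ACGT", repeat=k):
        filtered = fir_filter(kmer_indicator(seq, "".join(letters)), taps)
        density.append(sum(filtered) / len(filtered))
    return density


def dissimilarity_matrix(vectors):
    """Pairwise Euclidean distances (an assumed choice of dissimilarity)."""
    n = len(vectors)
    return [[sqrt(sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j])))
             for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not data from the paper.
    toy = ["ACGTACGTAC", "ACGTACGTTC", "GGGGCCCCGG"]
    profiles = [kmer_density(s, k=2, window=3) for s in toy]
    for row in dissimilarity_matrix(profiles):
        print(["%.3f" % d for d in row])
```

In a fuller pipeline, the density vectors would first be projected onto their leading principal components, and the resulting dissimilarity matrix would be passed to a clustering or tree-building method.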


Year: 2021   PMID: 34211040   DOI: 10.1038/s41598-021-93154-3

Source DB: PubMed   Journal: Sci Rep   ISSN: 2045-2322   Impact factor: 4.379

