Determination of k-mer density in a DNA sequence and subsequent cluster formation algorithm based on the application of electronic filter.

Bimal Kumar Sarkar, Ashish Ranjan Sharma, Manojit Bhattacharya, Garima Sharma, Sang-Soo Lee, Chiranjib Chakraborty.

Abstract

We describe a novel algorithm for recovering information from DNA sequences using a digital filter. The work proposes a three-part algorithm to determine k-mer (q-gram) word density. A finite impulse response (FIR) digital filter is used to calculate the k-mer word density of a sequence. Principal component analysis is then applied to the word density distribution to analyze the dissimilarity between sequences. The resulting dissimilarity matrix reveals cluster formation. Because the clusters are derived without aligning the sequences, the method is alignment-free. The clusters are then used to build phylogenetic relationships. The clustering results agree well with those of alignment-based algorithms. The algorithm is simple and requires less computation time than other currently available algorithms. We tested it on beta hemoglobin (HBB) coding sequences from 10 species and on 18 primate mitochondrial genome (mtDNA) sequences.
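The filtering and dissimilarity steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the filter coefficients or the distance measure, so the moving-average (boxcar) taps, the Euclidean distance, and all function names below are assumptions chosen for illustration, not the authors' implementation. The principal component analysis step is omitted here.

```python
from itertools import product
from math import sqrt


def kmer_indicator(seq, kmer):
    """Binary signal: 1 at each position where `kmer` starts in `seq`, else 0."""
    k = len(kmer)
    return [1 if seq[i:i + k] == kmer else 0 for i in range(len(seq) - k + 1)]


def fir_filter(signal, taps):
    """Direct-form FIR filter: y[n] = sum_j taps[j] * x[n - j], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for j, h in enumerate(taps):
            if n - j >= 0:
                acc += h * signal[n - j]
        out.append(acc)
    return out


def local_density(seq, kmer, window):
    """Windowed k-mer density along the sequence.

    Assumption: equal-weight (moving-average) taps; the paper may use others.
    """
    taps = [1.0 / window] * window
    return fir_filter(kmer_indicator(seq, kmer), taps)


def density_vector(seq, k):
    """Overall frequency of every k-mer over the alphabet ACGT, in lexicographic order."""
    n_pos = len(seq) - k + 1
    return [
        sum(kmer_indicator(seq, "".join(letters))) / n_pos
        for letters in product("ACGT", repeat=k)
    ]


def dissimilarity_matrix(seqs, k):
    """Pairwise Euclidean distance between k-mer density vectors (assumed metric)."""
    vecs = [density_vector(s, k) for s in seqs]
    return [
        [sqrt(sum((a - b) ** 2 for a, b in zip(u, v))) for v in vecs]
        for u in vecs
    ]


if __name__ == "__main__":
    print(local_density("ACGACGTT", "AC", window=4))
    print(dissimilarity_matrix(["ACGT", "ACGT", "TTTT"], k=1))
```

A matrix produced this way could be passed to any standard hierarchical clustering routine to obtain a tree. No sequences are aligned at any point: each sequence is reduced to a fixed-length density vector before comparison.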


Year: 2021; PMID: 34211040; DOI: 10.1038/s41598-021-93154-3

Source DB: PubMed; Journal: Sci Rep; ISSN: 2045-2322; Impact factor: 4.379
