
CHTKC: a robust and efficient k-mer counting algorithm based on a lock-free chaining hash table.

Jianan Wang, Su Chen, Lili Dong, Guohua Wang.   

Abstract

MOTIVATION: Calculating the frequency of occurrence of each substring of length k in DNA sequences is a common task in many bioinformatics applications, including genome assembly, error correction, and sequence alignment. Although the problem is simple to state, counting k-mers efficiently in datasets with high sequencing depth or large genome size remains a challenge.
RESULTS: We propose a robust and efficient method, CHTKC, to solve the k-mer counting problem with a lock-free hash table that uses linked lists to resolve collisions. We also design new mechanisms to optimize memory usage and to handle situations where the available memory cannot accommodate all k-mers. CHTKC has been thoroughly tested on seven datasets under multiple memory usage scenarios and compared with Jellyfish2 and KMC3. Our work shows that a hash-table-based method remains a feasible way to solve the k-mer counting problem effectively.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
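
The abstract's core idea, a hash table that resolves collisions with linked lists and still produces complete counts when memory cannot hold every k-mer, can be illustrated with a small sketch. This is not the CHTKC implementation: it is single-threaded, so the lock-free part (which would need atomic compare-and-swap on bucket heads and counters) is omitted, and reverse-complement (canonical) merging is not handled. The names `kmers`, `ChainedCounter`, `count_kmers`, `n_buckets` and `max_nodes` are hypothetical, and the multi-pass handling of k-mers that do not fit is an assumption made for illustration, not a description of the paper's mechanism.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

ENC = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2-bit base encoding


def kmers(seq: str, k: int) -> Iterator[int]:
    """Yield 2-bit encoded k-mers, restarting after any non-ACGT base."""
    mask = (1 << (2 * k)) - 1
    code, filled = 0, 0
    for base in seq.upper():
        if base not in ENC:
            code, filled = 0, 0
            continue
        code = ((code << 2) | ENC[base]) & mask
        filled += 1
        if filled >= k:
            yield code


class Node:
    """One linked-list entry in a bucket chain."""
    __slots__ = ("key", "count", "next")

    def __init__(self, key: int, nxt: Optional["Node"]) -> None:
        self.key = key
        self.count = 1
        self.next = nxt


class ChainedCounter:
    """Chaining hash table with a fixed node budget (hypothetical sketch)."""

    def __init__(self, n_buckets: int, max_nodes: int) -> None:
        self.buckets: List[Optional[Node]] = [None] * n_buckets
        self.max_nodes = max_nodes
        self.n_nodes = 0

    def add(self, key: int) -> bool:
        """Count key; return False if it is new and the node budget is spent."""
        i = hash(key) % len(self.buckets)
        node = self.buckets[i]
        while node is not None:
            if node.key == key:
                node.count += 1
                return True
            node = node.next
        if self.n_nodes >= self.max_nodes:
            return False
        # Push the new node at the head of the chain.
        self.buckets[i] = Node(key, self.buckets[i])
        self.n_nodes += 1
        return True

    def items(self) -> Iterator[Tuple[int, int]]:
        for head in self.buckets:
            node = head
            while node is not None:
                yield node.key, node.count
                node = node.next


def count_kmers(seqs: Iterable[str], k: int,
                n_buckets: int = 1024, max_nodes: int = 1_000_000) -> Dict[int, int]:
    """Count k-mers, deferring those that do not fit to a later pass."""
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")
    result: Dict[int, int] = {}
    pending = [code for s in seqs for code in kmers(s, k)]
    while pending:
        table = ChainedCounter(n_buckets, max_nodes)
        # Once the table is full it stays full for the rest of the pass, so
        # every occurrence of a given k-mer is either counted or deferred.
        deferred = [code for code in pending if not table.add(code)]
        result.update(table.items())
        pending = deferred
    return result
```

Calling `count_kmers(["ACGTACGT"], 4)` and `count_kmers(["ACGTACGT"], 4, max_nodes=1)` should return the same counts; the second call simply needs more passes. A real tool would spill deferred k-mers to disk rather than keep them in a Python list.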

Keywords:  DNA-seq; algorithm; assembly; hash table; k-mer counting; sequence analysis

Year:  2021        PMID: 32438416     DOI: 10.1093/bib/bbaa063

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  8 in total

1.  TahcoRoll: fast genomic signature profiling via thinned automaton and rolling hash.

Authors:  Chelsea J-T Ju; Jyun-Yu Jiang; Ruirui Li; Zeyu Li; Wei Wang
Journal:  Med Rev (Berl)       Date:  2022-02-14

2.  iDNA-MT: Identification DNA Modification Sites in Multiple Species by Using Multi-Task Learning Based a Neural Network Tool.

Authors:  Xiao Yang; Xiucai Ye; Xuehong Li; Lesong Wei
Journal:  Front Genet       Date:  2021-03-31       Impact factor: 4.599

3.  A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD.

Authors:  Zhiyu Tao; Yanjuan Li; Zhixia Teng; Yuming Zhao
Journal:  Comput Math Methods Med       Date:  2020-10-19       Impact factor: 2.238

4.  Application of Sparse Representation in Bioinformatics. (Review)

Authors:  Shuguang Han; Ning Wang; Yuxin Guo; Furong Tang; Lei Xu; Ying Ju; Lei Shi
Journal:  Front Genet       Date:  2021-12-15       Impact factor: 4.599

5.  Accurate identification of RNA D modification using multiple features.

Authors:  Lijun Dou; Wenyang Zhou; Lichao Zhang; Lei Xu; Ke Han
Journal:  RNA Biol       Date:  2021-03-17       Impact factor: 4.652

6.  4mCPred-MTL: Accurate Identification of DNA 4mC Sites in Multiple Species Using Multi-Task Deep Learning Based on Multi-Head Attention Mechanism.

Authors:  Rao Zeng; Song Cheng; Minghong Liao
Journal:  Front Cell Dev Biol       Date:  2021-05-10

7.  Identifying Antioxidant Proteins by Using Amino Acid Composition and Protein-Protein Interactions.

Authors:  Yixiao Zhai; Yu Chen; Zhixia Teng; Yuming Zhao
Journal:  Front Cell Dev Biol       Date:  2020-10-29

8.  SLDMS: A Tool for Calculating the Overlapping Regions of Sequences.

Authors:  Yu Chen; DongLiang You; TianJiao Zhang; GuoHua Wang
Journal:  Front Plant Sci       Date:  2022-01-03       Impact factor: 5.753

