Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems.

Tony Pan, Patrick Flick, Chirag Jain, Yongchao Liu, Srinivas Aluru.   

Abstract

Counting and indexing fixed length substrings, or $k$-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the datasets at the current generation rate of 1.8 terabases per 3-day experiment from a single sequencer. We present Kmerind, a high performance parallel $k$-mer indexing library for distributed memory environments. The Kmerind library provides a set of simple and consistent APIs with sequential semantics and parallel implementations that are designed to be flexible and extensible. Kmerind's $k$-mer counter performs similarly or better than the best existing $k$-mer counting tools even on shared memory systems. In a distributed memory environment, Kmerind counts $k$-mers in a 120 GB sequence read dataset in less than 13 seconds on 1024 Xeon CPU cores, and fully indexes their positions in approximately 17 seconds. Querying for 1 percent of the $k$-mers in these indices can be completed in 0.23 seconds and 28 seconds, respectively. Kmerind is the first $k$-mer indexing library for distributed memory environments, and the first extensible library for general $k$-mer indexing and counting. Kmerind is available at https://github.com/ParBLiSS/kmerind.
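
To make the two operations named in the abstract concrete, the following is a minimal single-process sketch of k-mer counting and k-mer position indexing. It is not Kmerind's API and does not reflect its distributed-memory implementation; the function names `count_kmers` and `index_kmers` and the toy reads are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def count_kmers(sequence, k):
    """Count every length-k substring (k-mer) of a single sequence.

    Hypothetical helper for illustration; not part of Kmerind.
    """
    # A sequence shorter than k yields an empty range and so no k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def index_kmers(reads, k):
    """Map each k-mer to the (read_id, offset) pairs where it occurs.

    Hypothetical helper for illustration; not part of Kmerind.
    """
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        for offset in range(len(read) - k + 1):
            index[read[offset:offset + k]].append((read_id, offset))
    return index


if __name__ == "__main__":
    counts = count_kmers("ACGTACGT", 3)
    positions = index_kmers(["ACGTA", "CGTAC"], 3)
    print(counts["ACG"])      # number of times ACG occurs in the sequence
    print(positions["CGT"])   # where CGT occurs across the two reads
```

A count index stores one integer per distinct k-mer, while a position index stores one entry per occurrence. This may be part of why the abstract reports a slower query time for the position index than for the count index.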

Year:  2017        PMID: 28991750     DOI: 10.1109/TCBB.2017.2760829

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


  2 in total

1.  Correlation-Based Analysis of COVID-19 Virus Genome Versus Other Fatal Virus Genomes.

Authors:  Sidharth Purohit; Suresh Chandra Satapathy; S Sibi Chakkaravarthy; Yu-Dong Zhang
Journal:  Arab J Sci Eng       Date:  2021-06-24       Impact factor: 2.807

2.  The parallelism motifs of genomic data analysis.

Authors:  Katherine Yelick; Aydın Buluç; Muaaz Awan; Ariful Azad; Benjamin Brock; Rob Egan; Saliya Ekanayake; Marquita Ellis; Evangelos Georganas; Giulia Guidi; Steven Hofmeyr; Oguz Selvitopi; Cristina Teodoropol; Leonid Oliker
Journal:  Philos Trans A Math Phys Eng Sci       Date:  2020-01-20       Impact factor: 4.226
