Fast-HBR: Fast hash based duplicate read remover.

Abstract

Next-generation sequencing (NGS) platforms produce massive amounts of data for analyzing various features of environmental samples. These data contain many duplicate reads, which reduce the efficiency and accuracy of downstream analysis. We describe Fast-HBR, a fast and memory-efficient tool that removes duplicate reads de novo, without a reference genome. It uses hash tables in which each read is represented by an integer value, which minimizes memory usage and speeds up manipulation. Fast-HBR is faster and has a smaller memory footprint than state-of-the-art de novo duplicate removal tools. Fast-HBR is implemented in Python 3 and is available at https://github.com/Sami-Altayyar/Fast-HBR.
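
The general idea of hash-based, reference-free duplicate removal can be illustrated with a minimal sketch. This is not Fast-HBR's actual code: the names `read_key` and `remove_duplicates`, the use of a 64-bit BLAKE2b digest, and the in-memory record tuples are all assumptions made for illustration. Each read sequence is reduced to an integer, and a read is kept only if its integer has not been seen before.

```python
import hashlib
from typing import Iterable, Iterator, Tuple

# Hypothetical in-memory record layout: (header, sequence, quality).
FastqRecord = Tuple[str, str, str]


def read_key(sequence: str) -> int:
    """Map a read sequence to a 64-bit integer (illustrative helper)."""
    data = sequence.upper().encode("ascii")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def remove_duplicates(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Yield only the first record seen for each distinct sequence."""
    seen = set()  # stores integer keys instead of full read strings
    for header, sequence, quality in records:
        key = read_key(sequence)
        if key in seen:
            continue  # treated as an exact duplicate of an earlier read
        seen.add(key)
        yield header, sequence, quality


reads = [
    ("@r1", "ACGTACGT", "IIIIIIII"),
    ("@r2", "TTGACCAA", "IIIIIIII"),
    ("@r3", "ACGTACGT", "HHHHHHHH"),  # same sequence as @r1
]
unique_reads = list(remove_duplicates(reads))
```

Storing only the integer key keeps memory usage low, at the cost that two different reads whose keys collide would be treated as duplicates. With a 64-bit key this is unlikely but not impossible, so a tool that must guarantee exactness would need an additional check.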

Year: 2022 PMID: 35815196 PMCID: PMC9200608 DOI: 10.6026/97320630018036

Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
