Fast-HBR: Fast hash based duplicate read remover.

Sami Altayyar, Abdel Monim Artoli.

Abstract

Next-Generation Sequencing (NGS) platforms produce massive amounts of data for analyzing various features of environmental samples. These data contain many duplicate reads, which reduce the efficiency and accuracy of downstream analysis. We describe Fast-HBR, a fast and memory-efficient tool that removes duplicate reads de novo, without a reference genome. It uses hash tables to represent reads as integer values, which minimizes memory usage and speeds up manipulation. Fast-HBR is faster and has a smaller memory footprint than state-of-the-art de novo duplicate removal tools. Fast-HBR is implemented in Python 3 and is available at https://github.com/Sami-Altayyar/Fast-HBR.
© 2022 Biomedical Informatics.
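
The general idea of hash-based duplicate removal can be illustrated with a minimal sketch. This is not the Fast-HBR implementation: the function names `read_key` and `remove_duplicates`, the choice of BLAKE2b as the hash, the 64-bit key size, and the case-insensitive comparison are all assumptions made for illustration. Each read is reduced to an integer, and only the integers are kept in memory to detect repeats.

```python
import hashlib


def read_key(seq: str) -> int:
    """Map a read to a 64-bit integer (illustrative hashing scheme, assumed)."""
    digest = hashlib.blake2b(seq.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def remove_duplicates(reads):
    """Keep the first occurrence of each read, comparing reads by integer key."""
    seen = set()      # integer keys of reads already emitted
    unique = []
    for seq in reads:
        key = read_key(seq.upper())  # assumption: case is ignored
        if key not in seen:
            seen.add(key)
            unique.append(seq)
    return unique


reads = ["ACGT", "TTGA", "ACGT", "acgt", "GGCC"]
print(remove_duplicates(reads))  # ['ACGT', 'TTGA', 'GGCC']
```

Storing only integer keys keeps memory use roughly constant per read regardless of read length. The trade-off in this sketch is that two different reads with the same 64-bit key would be treated as duplicates, which is very unlikely but not impossible.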

Year:  2022        PMID: 35815196      PMCID: PMC9200608          DOI: 10.6026/97320630018036

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063

