Jorge González-Domínguez (1), Bertil Schmidt (2). 1. Grupo de Arquitectura de Computadores, Universidade da Coruña, Campus de Elviña, 15071 A Coruña, Spain. 2. Parallel and Distributed Architectures Group, Johannes Gutenberg University Mainz, 55128 Mainz, Germany.
Abstract
Current next-generation sequencing technologies often generate duplicated or near-duplicated reads that, depending on the application scenario, provide no useful biological information but increase the memory requirements and computational time of downstream analysis. In this work we present ParDRe, a de novo parallel tool that removes duplicated and near-duplicated reads by clustering Single-End or Paired-End sequences from fasta or fastq files. It uses a novel bitwise approach to compare the suffixes of DNA strings and employs hybrid MPI/multithreading to reduce runtime on multicore systems. We show that ParDRe is up to 27.29 times faster than Fulcrum (a representative state-of-the-art tool) on a platform with two 8-core Sandy Bridge processors.
AVAILABILITY AND IMPLEMENTATION: Source code in C++ and MPI running on Linux systems, as well as a reference manual, are available at https://sourceforge.net/projects/pardre/
CONTACT: jgonzalezd@udc.es
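To make the idea of a bitwise comparison concrete, the following is a minimal sketch of one common way to compare DNA strings with bit operations. It is not taken from the ParDRe source code. It assumes a 2-bit encoding per base and equal-length reads, and the names `encode2bit`, `countMismatches` and `isNearDuplicate`, as well as the `maxMismatches` threshold, are hypothetical and introduced only for illustration.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Pack a DNA string into 64-bit words, 2 bits per base (A=00, C=01, G=10, T=11).
// Assumption of this sketch: any other character (e.g. 'N') is encoded as 'A'.
std::vector<std::uint64_t> encode2bit(const std::string& seq) {
    std::vector<std::uint64_t> words((seq.size() + 31) / 32, 0);
    for (std::size_t i = 0; i < seq.size(); ++i) {
        std::uint64_t code = 0;
        switch (seq[i]) {
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default:  code = 0; break;
        }
        words[i / 32] |= code << (2 * (i % 32));
    }
    return words;
}

// Count mismatching bases between two encoded reads of the same length.
// XOR leaves a non-zero 2-bit group wherever the bases differ; folding the
// high bit of each group onto the low bit and masking gives one set bit per
// mismatching base, so a population count yields the number of mismatches.
int countMismatches(const std::vector<std::uint64_t>& a,
                    const std::vector<std::uint64_t>& b) {
    const std::uint64_t lowBits = 0x5555555555555555ULL;
    int mismatches = 0;
    for (std::size_t w = 0; w < a.size() && w < b.size(); ++w) {
        const std::uint64_t diff = a[w] ^ b[w];
        const std::uint64_t perBase = (diff | (diff >> 1)) & lowBits;
        mismatches += static_cast<int>(std::bitset<64>(perBase).count());
    }
    return mismatches;
}

// Hypothetical near-duplicate test: equal length and at most maxMismatches
// differing bases.
bool isNearDuplicate(const std::string& r1, const std::string& r2,
                     int maxMismatches) {
    if (r1.size() != r2.size()) return false;
    return countMismatches(encode2bit(r1), encode2bit(r2)) <= maxMismatches;
}
```

With this encoding, 32 bases are compared per 64-bit word using a handful of instructions, rather than one character at a time. In a real tool the reads would be encoded once and the encoded form reused across comparisons.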