Compressive biological sequence analysis and archival in the era of high-throughput sequencing technologies.

Raffaele Giancarlo, Simona E Rombo, Filippo Utro.

Abstract

High-throughput sequencing technologies produce large collections of data, mainly DNA sequences with additional information, requiring the design of efficient and effective methodologies for both their compression and storage. In this context, we first provide a classification of the main techniques that have been proposed, according to three specific research directions that have emerged from the literature and, for each, we provide an overview of the current techniques. Finally, to make this review useful to researchers and technicians applying the existing software and tools, we include a synopsis of the main characteristics of the described approaches, including details on their implementation and availability. Performance of the various methods is also highlighted, although the state of the art does not lend itself to a consistent and coherent comparison among all the methods presented here.

Keywords:  analysis of large biological sequence collections; compressive sequence analysis; data compression in bioinformatics; data compression of large sequence collections; storage and management of HTS data; succinct data structures for bioinformatics

Year:  2013        PMID: 24347576     DOI: 10.1093/bib/bbt088

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  12 in total

1.  Comparison of high-throughput sequencing data compression tools.

Authors:  Ibrahim Numanagić; James K Bonfield; Faraz Hach; Jan Voges; Jörn Ostermann; Claudio Alberti; Marco Mattavelli; S Cenk Sahinalp
Journal:  Nat Methods       Date:  2016-10-24       Impact factor: 28.547

2.  GDC 2: Compression of large collections of genomes.

Authors:  Sebastian Deorowicz; Agnieszka Danek; Marcin Niemiec
Journal:  Sci Rep       Date:  2015-06-25       Impact factor: 4.379

3.  Sequence Factorization with Multiple References.

Authors:  Sebastian Wandelt; Ulf Leser
Journal:  PLoS One       Date:  2015-09-30       Impact factor: 3.240

4.  Compression of next-generation sequencing quality scores using memetic algorithm.

Authors:  Jiarui Zhou; Zhen Ji; Zexuan Zhu; Shan He
Journal:  BMC Bioinformatics       Date:  2014-12-03       Impact factor: 3.169

5.  MAFCO: a compression tool for MAF files.

Authors:  Luís M O Matos; António J R Neves; Diogo Pratas; Armando J Pinho
Journal:  PLoS One       Date:  2015-03-27       Impact factor: 3.240

6.  Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review.

Authors:  Kelvin V Kredens; Juliano V Martins; Osmar B Dordal; Mauri Ferrandin; Roberto H Herai; Edson E Scalabrin; Bráulio C Ávila
Journal:  PLoS One       Date:  2020-05-26       Impact factor: 3.240

7.  Light-weight reference-based compression of FASTQ data.

Authors:  Yongpeng Zhang; Linsen Li; Yanli Yang; Xiao Yang; Shan He; Zexuan Zhu
Journal:  BMC Bioinformatics       Date:  2015-06-09       Impact factor: 3.169

8.  Indexes of large genome collections on a PC.

Authors:  Agnieszka Danek; Sebastian Deorowicz; Szymon Grabowski
Journal:  PLoS One       Date:  2014-10-07       Impact factor: 3.240

Review 9.  Recommendations on e-infrastructures for next-generation sequencing.

Authors:  Ola Spjuth; Erik Bongcam-Rudloff; Johan Dahlberg; Martin Dahlö; Aleksi Kallio; Luca Pireddu; Francesco Vezzi; Eija Korpelainen
Journal:  Gigascience       Date:  2016-06-07       Impact factor: 6.524

Review 10.  Alignment-free sequence comparison: benefits, applications, and tools.

Authors:  Andrzej Zielezinski; Susana Vinga; Jonas Almeida; Wojciech M Karlowski
Journal:  Genome Biol       Date:  2017-10-03       Impact factor: 13.583
