Dynamic Alignment-Free and Reference-Free Read Compression.

Guillaume Holley, Roland Wittler, Jens Stoye, Faraz Hach.

Abstract

The advent of high-throughput sequencing (HTS) technologies raises major concerns about storing and transmitting the data they produce. In particular, large-scale sequencing projects generate an unprecedented volume of genomic sequences, ranging from tens to several thousands of genomes per species. These collections, also known as pangenomes, contain highly similar and redundant sequences. The ideal way to represent and transfer pangenomes is through compression. A number of HTS-specific compression tools have been developed to reduce the storage and communication costs of HTS data, yet none of them is designed to process a pangenome. In this article, we present dynamic alignment-free and reference-free read compression (DARRC), a new alignment-free and reference-free compression method. It addresses the problem of pangenome compression by encoding the sequences of a pangenome as a guided de Bruijn graph. The novelty of this method is its ability to incrementally update DARRC archives with new genome sequences without fully decompressing the archive. DARRC can compress both single-end and paired-end read sequences of any length using all symbols of the IUPAC nucleotide code. On a large Pseudomonas aeruginosa data set, our method outperforms all other tested tools. It provides a 30% compression ratio improvement in single-end mode compared with the best-performing state-of-the-art HTS-specific compression method in our experiments.
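
As a rough illustration of why a de Bruijn graph suits redundant read collections, the sketch below builds a plain (unguided) de Bruijn graph from short reads and counts how many new k-mers each batch contributes. It is not DARRC's guided de Bruijn graph or its archive format; the function names, the toy reads, and the choice k=4 are assumptions made for this example only.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every k-mer (substring of length k) of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def add_reads(graph, reads, k):
    """Insert reads into a de Bruijn graph and return the number of new k-mers.

    The graph maps each (k-1)-mer to the set of characters that follow it,
    so every (node, character) pair stands for one distinct k-mer.
    """
    new = 0
    for read in reads:
        for kmer in kmers(read, k):
            node, nxt = kmer[:-1], kmer[-1]
            if nxt not in graph[node]:
                graph[node].add(nxt)
                new += 1
    return new


# Hypothetical toy data: the second batch overlaps heavily with the first.
graph = defaultdict(set)
first = add_reads(graph, ["ACGTACGA", "CGTACGAT"], k=4)
second = add_reads(graph, ["ACGTACGA", "GTACGATT"], k=4)
total = sum(len(successors) for successors in graph.values())
print(first, second, total)
```

The second batch adds far fewer k-mers than the first because most of its content is already in the graph. The same call also updates an existing graph in place, which loosely mirrors the idea of extending an archive with new sequences, although DARRC's actual update procedure is not described in this abstract.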

Keywords:  guided de Bruijn graph; high throughput sequencing; sequence compression

Year:  2018        PMID: 30011247     DOI: 10.1089/cmb.2018.0068

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  2 in total

1.  BdBG: a bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs.

Authors:  Rongjie Wang; Junyi Li; Yang Bai; Tianyi Zang; Yadong Wang
Journal:  PeerJ       Date:  2018-10-19       Impact factor: 2.984

2.  Better quality score compression through sequence-based quality smoothing.

Authors:  Yoshihiro Shibuya; Matteo Comin
Journal:  BMC Bioinformatics       Date:  2019-11-22       Impact factor: 3.169
