High-throughput DNA sequence data compression.

Zexuan Zhu, Yongpeng Zhang, Zhen Ji, Shan He, Xiao Yang.   

Abstract

The exponential growth of high-throughput DNA sequence data poses great challenges to genomic data storage, retrieval and transmission. Compression is a critical tool for addressing these challenges, and many methods have been developed to reduce the storage size of genomes and sequencing data (reads, quality scores and metadata). However, genomic data are being generated faster than they can be meaningfully analyzed, leaving considerable scope for novel compression algorithms that directly facilitate data analysis, beyond data transfer and storage. In this article, we categorize and comprehensively review the existing compression methods specialized for genomic data, and present experimental results on compression ratio, memory usage, and compression and decompression time. We also present the remaining challenges and potential directions for future research.
© The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

Keywords:  compression; next-generation sequencing; reference-based compression; reference-free compression

Year:  2013        PMID: 24300111     DOI: 10.1093/bib/bbt087

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  17 in total

1.  Aligned genomic data compression via improved modeling.

Authors:  Idoia Ochoa; Mikel Hernaez; Tsachy Weissman
Journal:  J Bioinform Comput Biol       Date:  2014-12       Impact factor: 1.122

2.  iDoComp: a compression scheme for assembled genomes.

Authors:  Idoia Ochoa; Mikel Hernaez; Tsachy Weissman
Journal:  Bioinformatics       Date:  2014-10-24       Impact factor: 6.937

3.  Mind the gap: resources required to receive, process and interpret research-returned whole genome data. (Review)

Authors:  Dana C Crawford; Jessica N Cooke Bailey; Farren B S Briggs
Journal:  Hum Genet       Date:  2019-06-03       Impact factor: 4.132

4.  GDC 2: Compression of large collections of genomes.

Authors:  Sebastian Deorowicz; Agnieszka Danek; Marcin Niemiec
Journal:  Sci Rep       Date:  2015-06-25       Impact factor: 4.379

5.  Compression of Large genomic datasets using COMRAD on Parallel Computing Platform.

Authors:  Christopher Leela Biji; Manu K Madhu; Vineetha Vishnu; Satheesh Kumar K; Achuthsankar S Nair
Journal:  Bioinformation       Date:  2015-05-28

6.  Compression of next-generation sequencing quality scores using memetic algorithm.

Authors:  Jiarui Zhou; Zhen Ji; Zexuan Zhu; Shan He
Journal:  BMC Bioinformatics       Date:  2014-12-03       Impact factor: 3.169

7.  A privacy-preserving solution for compressed storage and selective retrieval of genomic data.

Authors:  Zhicong Huang; Erman Ayday; Huang Lin; Raeka S Aiyar; Adam Molyneaux; Zhenyu Xu; Jacques Fellay; Lars M Steinmetz; Jean-Pierre Hubaux
Journal:  Genome Res       Date:  2016-10-27       Impact factor: 9.043

8.  Light-weight reference-based compression of FASTQ data.

Authors:  Yongpeng Zhang; Linsen Li; Yanli Yang; Xiao Yang; Shan He; Zexuan Zhu
Journal:  BMC Bioinformatics       Date:  2015-06-09       Impact factor: 3.169

9.  Recommendations on e-infrastructures for next-generation sequencing. (Review)

Authors:  Ola Spjuth; Erik Bongcam-Rudloff; Johan Dahlberg; Martin Dahlö; Aleksi Kallio; Luca Pireddu; Francesco Vezzi; Eija Korpelainen
Journal:  Gigascience       Date:  2016-06-07       Impact factor: 6.524

10.  Effect of Dynamic Interaction between microRNA and Transcription Factor on Gene Expression.

Authors:  Qi Zhao; Hongsheng Liu; Chenggui Yao; Jianwei Shuai; Xiaoqiang Sun
Journal:  Biomed Res Int       Date:  2016-11-10       Impact factor: 3.411
