GTRAC: fast retrieval from compressed collections of genomic variants.

Kedar Tatwawadi, Mikel Hernaez, Idoia Ochoa, Tsachy Weissman.

Abstract

MOTIVATION: The dramatic decrease in the cost of sequencing has produced huge amounts of genomic data, as evidenced by projects such as UK10K and the Million Veteran Program, in which the number of sequenced genomes ranges from the order of 10 000 to 1 million. Because the genomic sequences of individuals from the same species are highly redundant, most medical research works with the variants of each sequence relative to a reference sequence rather than with complete genomic sequences. Consequently, millions of genomes represented as variants are stored in databases. These databases are constantly updated and queried to extract information such as the variants shared by individuals or groups of individuals. Previous algorithms for compressing this type of database lack efficient random access, which makes querying for particular variants and/or individuals extremely inefficient, to the point where compression is often abandoned altogether.
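To make the access pattern concrete, the following minimal sketch (plain Python, no compression) models a variant database as a binary matrix in which row i is a variant site and column j is one sample, with a 1 meaning that sample carries the variant. The class name `VariantMatrix` and its methods are hypothetical illustrations, not GTRAC's storage format. The point is that per-variant queries, per-sample queries and "shared variants" queries over a group each read the matrix along a different axis, so a compressed representation must support cheap random access along both.

```python
from typing import List, Sequence


class VariantMatrix:
    """Hypothetical uncompressed variant database: rows = variant sites, columns = samples."""

    def __init__(self, rows: Sequence[Sequence[int]]) -> None:
        self.n_samples = len(rows[0]) if rows else 0
        self.rows: List[int] = []
        for row in rows:
            if len(row) != self.n_samples:
                raise ValueError("all rows must have the same number of samples")
            # Pack each variant row into an int bitmask: bit j set if sample j carries it.
            mask = 0
            for j, bit in enumerate(row):
                if bit:
                    mask |= 1 << j
            self.rows.append(mask)

    def variant(self, i: int) -> List[int]:
        """Per-variant query: which samples carry variant i (reads one row)."""
        mask = self.rows[i]
        return [j for j in range(self.n_samples) if (mask >> j) & 1]

    def sample(self, j: int) -> List[int]:
        """Per-sample query: which variants sample j carries (touches every row)."""
        return [i for i, mask in enumerate(self.rows) if (mask >> j) & 1]

    def common_variants(self, samples: Sequence[int]) -> List[int]:
        """Variants carried by every sample in the group."""
        group = 0
        for j in samples:
            group |= 1 << j
        return [i for i, mask in enumerate(self.rows) if mask & group == group]


db = VariantMatrix([[1, 0, 1], [0, 1, 1], [1, 1, 1], [0, 0, 0]])
print(db.variant(0))                 # [0, 2]
print(db.sample(2))                  # [0, 1, 2]
print(db.common_variants([0, 2]))    # [0, 2]
```

If this matrix were compressed with a general-purpose, block-oriented compressor, answering either a per-sample or a per-variant query would require decompressing large blocks first, which is the inefficiency described above.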
RESULTS: We present a new algorithm for this task, called GTRAC, that achieves significant compression ratios while allowing fast random access over the compressed database. For example, GTRAC compresses a Homo sapiens dataset containing 1092 samples into 1.1 GB (a compression ratio of 160) while decompressing specific samples in less than a second and specific variants in 17 ms. GTRAC uses and adapts techniques from information theory, such as a specialized Lempel-Ziv compressor and tailored succinct data structures.
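The abstract does not spell out GTRAC's data structures. As a generic illustration of the kind of succinct structure that enables random access, the sketch below implements a bitvector with a block-sampled rank directory: `rank1(i)` (the number of 1s before position i) is answered from one stored prefix count plus a popcount inside a single block, so no query scans from the start. The class `RankBitVector` and the 64-bit block size are assumptions chosen for clarity; production succinct libraries add a second sampling level and `select` support. For scale, 1.1 GB at a compression ratio of 160 corresponds to roughly 176 GB of uncompressed input.

```python
from typing import List, Sequence


class RankBitVector:
    """Hypothetical bitvector with block-sampled rank support.

    Conceptually stores n bits plus one prefix counter per 64-bit block.
    """

    BLOCK = 64  # bits per block (assumed; real implementations tune this)

    def __init__(self, bits: Sequence[int]) -> None:
        self.n = len(bits)
        self.blocks: List[int] = []   # each block packed into one int
        self.prefix: List[int] = [0]  # number of 1s before each block
        for start in range(0, self.n, self.BLOCK):
            word = 0
            for offset, bit in enumerate(bits[start:start + self.BLOCK]):
                if bit:
                    word |= 1 << offset
            self.blocks.append(word)
            self.prefix.append(self.prefix[-1] + bin(word).count("1"))

    def access(self, i: int) -> int:
        """Return bit i without decoding any other block."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        return (self.blocks[i // self.BLOCK] >> (i % self.BLOCK)) & 1

    def rank1(self, i: int) -> int:
        """Number of 1s in positions [0, i)."""
        if not 0 <= i <= self.n:
            raise IndexError(i)
        b, r = divmod(i, self.BLOCK)
        if b == len(self.blocks):  # i == n and n is a multiple of BLOCK
            return self.prefix[b]
        mask = (1 << r) - 1        # keep only the r low bits of block b
        return self.prefix[b] + bin(self.blocks[b] & mask).count("1")


bits = [1 if k % 3 == 0 else 0 for k in range(200)]
bv = RankBitVector(bits)
print(bv.rank1(10))   # 4  (ones at positions 0, 3, 6, 9)
print(bv.access(99))  # 1
```

In a variant database, a structure of this kind lets a decoder jump straight to the part of a compressed row or column that holds a requested sample or variant, which is the random-access property the abstract emphasizes.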
AVAILABILITY AND IMPLEMENTATION: The GTRAC algorithm is available for download at: https://github.com/kedartatwawadi/GTRAC
CONTACT: kedart@stanford.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Year:  2016        PMID: 27587665      PMCID: PMC5013914          DOI: 10.1093/bioinformatics/btw437

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937