GTC: how to maintain huge genotype collections in a compressed form.

Agnieszka Danek, Sebastian Deorowicz.

Abstract

Motivation: Genome sequencing is now routine in many research centers. Projects such as the Haplotype Reference Consortium and the Exome Aggregation Consortium produce huge databases of genotypes across large populations. As these collections grow, fast and memory-frugal methods for representing and searching them become crucial.
Results: We present GTC (GenoType Compressor), a novel compressed data structure for representing huge collections of genetic variation data. It significantly outperforms existing solutions in both compression ratio and the time needed to answer various types of queries. We show that the largest publicly available database, comprising about 60 000 haplotypes at about 40 million SNPs, can be stored in <4 GB, while variant-related queries are answered in a fraction of a second.
Availability and implementation: GTC can be downloaded from https://github.com/refresh-bio/GTC or http://sun.aei.polsl.pl/REFRESH/gtc.
Contact: sebastian.deorowicz@polsl.pl.
Supplementary information: Supplementary data are available at Bioinformatics online.

Year:  2018        PMID: 29351600     DOI: 10.1093/bioinformatics/bty023

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  6 in total

1.  Accurate, scalable cohort variant calls using DeepVariant and GLnexus.

Authors:  Taedong Yun; Helen Li; Pi-Chuan Chang; Michael F Lin; Andrew Carroll; Cory Y McLean
Journal:  Bioinformatics       Date:  2021-01-05       Impact factor: 6.937

2.  VariantStore: an index for large-scale genomic variant search.

Authors:  Prashant Pandey; Yinjie Gao; Carl Kingsford
Journal:  Genome Biol       Date:  2021-08-19       Impact factor: 13.583

3.  Efficiently Summarizing Relationships in Large Samples: A General Duality Between Statistics of Genealogies and Genomes.

Authors:  Peter Ralph; Kevin Thornton; Jerome Kelleher
Journal:  Genetics       Date:  2020-05-01       Impact factor: 4.562

4.  genozip: a fast and efficient compression tool for VCF files.

Authors:  Divon Lan; Raymond Tobler; Yassine Souilmi; Bastien Llamas
Journal:  Bioinformatics       Date:  2020-07-01       Impact factor: 6.937

5.  Sparse Project VCF: efficient encoding of population genotype matrices.

Authors:  Michael F Lin; Xiaodong Bai; William J Salerno; Jeffrey G Reid
Journal:  Bioinformatics       Date:  2021-04-01       Impact factor: 6.937

6.  XSI - A genotype compression tool for compressive genomics in large biobanks.

Authors:  Rick Wertenbroek; Simone Rubinacci; Ioannis Xenarios; Yann Thoma; Olivier Delaneau
Journal:  Bioinformatics       Date:  2022-06-24       Impact factor: 6.931
