Using Genome Query Language to uncover genetic variation.

Christos Kozanitis, Andrew Heiberg, George Varghese, Vineet Bafna.

Abstract

MOTIVATION: With high-throughput DNA sequencing costs dropping below $1000 per human genome, data storage, retrieval and analysis have become the major bottlenecks in biological studies. To address these large-data challenges, we advocate a clean separation between evidence collection and inference in variant calling. We define and implement a Genome Query Language (GQL) that allows rapid collection of the evidence needed for calling variants.
RESULTS: We provide a number of cases that showcase the use of GQL for complex evidence collection, such as the evidence for large structural variations. Specifically, typical GQL queries can be written in 5-10 lines of high-level code and search large datasets (100 GB) in minutes. We also demonstrate its complementarity with other variant calling tools: popular variant calling tools can achieve a speed-up of one order of magnitude by using GQL to retrieve evidence. Finally, we show how GQL can be used to query and compare multiple datasets. By separating evidence from inference in variant calling, GQL frees variant detection tools from data-intensive evidence collection and lets them focus on statistical inference.
AVAILABILITY: GQL can be downloaded from http://cseweb.ucsd.edu/~ckozanit/gql.
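
The separation between evidence collection and inference described above can be illustrated with a minimal Python sketch. This is not GQL syntax and not the authors' implementation. The `ReadPair` type, the function names, and the thresholds are hypothetical, chosen only to show one plausible kind of evidence for a large deletion: read pairs whose mates map much farther apart than expected.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical record for one mapped paired-end read."""
    chrom: str
    left_start: int  # leftmost mapped position of the pair
    right_end: int   # rightmost mapped position of the pair


def collect_discordant(pairs: Iterable[ReadPair], chrom: str,
                       start: int, end: int, max_span: int) -> List[ReadPair]:
    """Evidence collection: keep pairs inside [start, end] on `chrom`
    whose mapped span exceeds `max_span` (an assumed cutoff)."""
    return [
        p for p in pairs
        if p.chrom == chrom
        and p.left_start >= start
        and p.right_end <= end
        and (p.right_end - p.left_start) > max_span
    ]


def call_deletion(evidence: List[ReadPair],
                  min_support: int) -> Optional[Tuple[int, int]]:
    """Inference: report the interval shared by all supporting pairs,
    or None if support is too low or the pairs do not overlap."""
    if len(evidence) < min_support:
        return None
    lo = max(p.left_start for p in evidence)
    hi = min(p.right_end for p in evidence)
    return (lo, hi) if lo < hi else None
```

In this sketch the inference step only ever sees the small filtered list, so it could be swapped for a different statistical model without touching the data-intensive filtering step.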

Year:  2013        PMID: 23751181      PMCID: PMC3866549          DOI: 10.1093/bioinformatics/btt250

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  9 in total

1.  Children's rare disease cohorts: an integrative research and clinical genomics initiative.

Authors:  Shira Rockowitz; Nicholas LeCompte; Mary Carmack; Andrew Quitadamo; Lily Wang; Meredith Park; Devon Knight; Emma Sexton; Lacey Smith; Beth Sheidley; Michael Field; Ingrid A Holm; Catherine A Brownstein; Pankaj B Agrawal; Susan Kornetsky; Annapurna Poduri; Scott B Snapper; Alan H Beggs; Timothy W Yu; David A Williams; Piotr Sliz
Journal:  NPJ Genom Med       Date:  2020-07-06       Impact factor: 8.617

2.  START: a system for flexible analysis of hundreds of genomic signal tracks in few lines of SQL-like queries.

Authors:  Xinjie Zhu; Qiang Zhang; Eric Dun Ho; Ken Hung-On Yu; Chris Liu; Tim H Huang; Alfred Sze-Lok Cheng; Ben Kao; Eric Lo; Kevin Y Yip
Journal:  BMC Genomics       Date:  2017-09-22       Impact factor: 3.969

3.  LW-FQZip 2: a parallelized reference-based compression of FASTQ files.

Authors:  Zhi-An Huang; Zhenkun Wen; Qingjin Deng; Ying Chu; Yiwen Sun; Zexuan Zhu
Journal:  BMC Bioinformatics       Date:  2017-03-20       Impact factor: 3.169

4.  Genomic data integration and user-defined sample-set extraction for population variant analysis.

Authors:  Tommaso Alfonsi; Anna Bernasconi; Arif Canakoglu; Marco Masseroli
Journal:  BMC Bioinformatics       Date:  2022-09-29       Impact factor: 3.307

5.  Light-weight reference-based compression of FASTQ data.

Authors:  Yongpeng Zhang; Linsen Li; Yanli Yang; Xiao Yang; Shan He; Zexuan Zhu
Journal:  BMC Bioinformatics       Date:  2015-06-09       Impact factor: 3.169

6.  GenAp: a distributed SQL interface for genomic data.

Authors:  Christos Kozanitis; David A Patterson
Journal:  BMC Bioinformatics       Date:  2016-02-04       Impact factor: 3.169

7.  GORpipe: a query tool for working with sequence data based on a Genomic Ordered Relational (GOR) architecture.

Authors:  Hákon Guðbjartsson; Guðmundur Fr Georgsson; Sigurjón A Guðjónsson; Ragnar Þór Valdimarsson; Jóhann H Sigurðsson; Sigmar K Stefánsson; Gísli Másson; Gísli Magnússon; Vilmundur Pálmason; Kári Stefánsson
Journal:  Bioinformatics       Date:  2016-06-23       Impact factor: 6.937

8.  plyranges: a grammar of genomic data transformation.

Authors:  Stuart Lee; Dianne Cook; Michael Lawrence
Journal:  Genome Biol       Date:  2019-01-04       Impact factor: 13.583
