
SeQuiLa: an elastic, fast and scalable SQL-oriented solution for processing and querying genomic intervals.

Marek Wiewiórka, Anna Leśniewska, Agnieszka Szmurło, Kacper Stępień, Mateusz Borowiak, Michał Okoniewski, Tomasz Gambin.

Abstract

SUMMARY: Efficient processing of large-scale genomic datasets has recently become possible due to the application of 'big data' technologies in bioinformatics pipelines. We present SeQuiLa, a distributed, ANSI SQL-compliant solution for fast querying and processing of genomic intervals, available as an Apache Spark package. The proposed range join strategy is significantly (∼22×) faster than the default Apache Spark implementation and outperforms other state-of-the-art tools for genomic interval processing.
AVAILABILITY AND IMPLEMENTATION: The project is available at http://biodatageeks.org/sequila/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
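
The range join mentioned in the summary pairs intervals from two datasets whenever they overlap on the same chromosome. The sketch below is only an illustration of that operation in plain Python, not SeQuiLa's implementation or its Spark strategy; the function name `overlap_join`, the `(chrom, start, end)` tuple layout, and the closed-interval convention are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict


def overlap_join(left, right):
    """Return every (l, r) pair of closed intervals that overlap on the same chromosome.

    Equivalent in spirit to the SQL join condition
        l.chrom = r.chrom AND l.start <= r.end AND r.start <= l.end
    Intervals are hypothetical (chrom, start, end) tuples.
    """
    # Index the right-hand side: per chromosome, intervals sorted by start.
    index = defaultdict(list)
    for chrom, start, end in right:
        index[chrom].append((start, end))
    starts = {}
    for chrom, intervals in index.items():
        intervals.sort()
        starts[chrom] = [s for s, _ in intervals]

    pairs = []
    for chrom, l_start, l_end in left:
        intervals = index.get(chrom, [])
        # Binary search: only intervals starting at or before l_end can overlap.
        hi = bisect_right(starts.get(chrom, []), l_end)
        for r_start, r_end in intervals[:hi]:
            # Keep the candidate only if it also ends at or after l_start.
            if r_end >= l_start:
                pairs.append(((chrom, l_start, l_end), (chrom, r_start, r_end)))
    return pairs
```

The binary search only prunes candidates by start position, so the worst case is still quadratic; it is meant to show what the join computes, not how a distributed engine would make it fast.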

Year: 2019    PMID: 30428005    DOI: 10.1093/bioinformatics/bty940

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


  2 in total

1.  SeQuiLa-cov: A fast and scalable library for depth of coverage calculations.

Authors:  Marek Wiewiórka; Agnieszka Szmurło; Wiktor Kuśmirek; Tomasz Gambin
Journal:  Gigascience       Date:  2019-08-01       Impact factor: 6.524

2.  CNVind: an open source cloud-based pipeline for rare CNVs detection in whole exome sequencing data based on the depth of coverage.

Authors:  Wiktor Kuśmirek; Robert Nowak
Journal:  BMC Bioinformatics       Date:  2022-03-05       Impact factor: 3.169

