GenoMetric Query Language: a novel approach to large-scale genomic data management.

Marco Masseroli, Pietro Pinoli, Francesco Venco, Abdulrahman Kaitoua, Vahid Jalili, Fernando Palluzzi, Heiko Muller, Stefano Ceri.

Abstract

MOTIVATION: Improvements in sequencing technologies and data processing pipelines are rapidly providing sequencing data, with associated high-level features, for many individual genomes in multiple biological and clinical conditions. These data allow for data-driven genomic, transcriptomic and epigenomic characterizations, but they require state-of-the-art 'big data' computing strategies, with abstraction levels beyond the capabilities of available tools.
RESULTS: We propose a high-level, declarative GenoMetric Query Language (GMQL) and a toolkit for its use. GMQL operates downstream of raw data preprocessing pipelines and supports queries over thousands of heterogeneous datasets and samples; as such, it is key to genomic 'big data' analysis. GMQL leverages a simple data model that provides both abstractions of genomic region data and of the associated experimental, biological and clinical metadata, and interoperability between many data formats. Based on the Hadoop framework and the Apache Pig platform, GMQL ensures high scalability, expressivity, flexibility and simplicity of use, as demonstrated by several biological query examples on ENCODE and TCGA datasets.
AVAILABILITY AND IMPLEMENTATION: The GMQL toolkit is freely available for non-commercial use at http://www.bioinformatics.deib.polimi.it/GMQL/.
© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Year:  2015        PMID: 25649616     DOI: 10.1093/bioinformatics/btv048

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  16 in total

1.  TICA: Transcriptional Interaction and Coregulation Analyzer.

Authors:  Stefano Perna; Pietro Pinoli; Stefano Ceri; Limsoon Wong
Journal:  Genomics Proteomics Bioinformatics       Date:  2018-12-19       Impact factor: 7.691

2.  GeMI: interactive interface for transformer-based Genomic Metadata Integration.

Authors:  Giuseppe Serna Garcia; Michele Leone; Anna Bernasconi; Mark J Carman
Journal:  Database (Oxford)       Date:  2022-06-03       Impact factor: 4.462

3.  Scalable analysis of multi-modal biomedical data.

Authors:  Jaclyn Smith; Yao Shi; Michael Benedikt; Milos Nikolic
Journal:  Gigascience       Date:  2021-09-11       Impact factor: 6.524

4.  Accurate and highly interpretable prediction of gene expression from histone modifications.

Authors:  Fabrizio Frasca; Matteo Matteucci; Michele Leone; Marco J Morelli; Marco Masseroli
Journal:  BMC Bioinformatics       Date:  2022-04-26       Impact factor: 3.307

5.  Single-cell Transcriptome Study as Big Data (Review).

Authors:  Pingjian Yu; Wei Lin
Journal:  Genomics Proteomics Bioinformatics       Date:  2016-02-11       Impact factor: 7.691

6.  TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas.

Authors:  Fabio Cumbo; Giulia Fiscon; Stefano Ceri; Marco Masseroli; Emanuel Weitschek
Journal:  BMC Bioinformatics       Date:  2017-01-03       Impact factor: 3.169

7.  START: a system for flexible analysis of hundreds of genomic signal tracks in few lines of SQL-like queries.

Authors:  Xinjie Zhu; Qiang Zhang; Eric Dun Ho; Ken Hung-On Yu; Chris Liu; Tim H Huang; Alfred Sze-Lok Cheng; Ben Kao; Eric Lo; Kevin Y Yip
Journal:  BMC Genomics       Date:  2017-09-22       Impact factor: 3.969

8.  Explorative visual analytics on interval-based genomic data and their metadata.

Authors:  Vahid Jalili; Matteo Matteucci; Marco Masseroli; Stefano Ceri
Journal:  BMC Bioinformatics       Date:  2017-12-04       Impact factor: 3.169

9.  Integrated Systems for NGS Data Management and Analysis: Open Issues and Available Solutions.

Authors:  Valerio Bianchi; Arnaud Ceol; Alessandro G E Ogier; Stefano de Pretis; Eugenia Galeota; Kamal Kishore; Pranami Bora; Ottavio Croci; Stefano Campaner; Bruno Amati; Marco J Morelli; Mattia Pelizzola
Journal:  Front Genet       Date:  2016-05-06       Impact factor: 4.599

10.  GenAp: a distributed SQL interface for genomic data.

Authors:  Christos Kozanitis; David A Patterson
Journal:  BMC Bioinformatics       Date:  2016-02-04       Impact factor: 3.169
