Data Management for Heterogeneous Genomic Datasets.

Stefano Ceri, Abdulrahman Kaitoua, Marco Masseroli, Pietro Pinoli, Francesco Venco.   

Abstract

Next Generation Sequencing (NGS), a family of technologies for reading DNA and RNA, is changing biological research and will soon change medical practice by quickly providing sequencing data and high-level features of numerous individual genomes in different biological and clinical conditions. The availability of millions of whole genome sequences may soon become the biggest and most important "big data" problem of mankind. In this framework, we recently proposed a new paradigm that raises the level of abstraction in NGS data management by introducing the GenoMetric Query Language (GMQL), and demonstrated its usefulness through several biological query examples. Building on that effort, here we motivate and formalize GMQL operations, focusing especially on the most characteristic and domain-specific ones. Furthermore, we address their efficient implementation, illustrate the architecture of the new software system that we have developed to execute them on big genomic data in a cloud computing environment, and evaluate its performance. The new system implementation is available for download at the GMQL website (http://www.bioinformatics.deib.polimi.it/GMQL/); GMQL can also be tested through a set of predefined queries on ENCODE and Roadmap Epigenomics data at http://www.bioinformatics.deib.polimi.it/GMQL/queries/.

Year:  2016        PMID: 27295683     DOI: 10.1109/TCBB.2016.2576447

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


  3 in total

1.  A Practical Guide to Integrating Multimodal Machine Learning and Metabolic Modeling.

Authors:  Supreeta Vijayakumar; Giuseppe Magazzù; Pradip Moon; Annalisa Occhipinti; Claudio Angione
Journal:  Methods Mol Biol       Date:  2022

2.  PyGMQL: scalable data extraction and analysis for heterogeneous genomic datasets.

Authors:  Luca Nanni; Pietro Pinoli; Arif Canakoglu; Stefano Ceri
Journal:  BMC Bioinformatics       Date:  2019-11-08       Impact factor: 3.169

3.  RGMQL: scalable and interoperable computing of heterogeneous omics big data and metadata in R/Bioconductor.

Authors:  Simone Pallotta; Silvia Cascianelli; Marco Masseroli
Journal:  BMC Bioinformatics       Date:  2022-04-07       Impact factor: 3.169

