Modeling and interoperability of heterogeneous genomic big data for integrative processing and querying.

Marco Masseroli, Abdulrahman Kaitoua, Pietro Pinoli, Stefano Ceri.

Abstract

While a huge amount of (epi)genomic data of multiple types is becoming available through Next Generation Sequencing (NGS) technologies, the most important emerging problem is the so-called tertiary analysis, concerned with sense making, e.g., discovering how different (epi)genomic regions and their products interact and cooperate with each other. We propose a paradigm shift in tertiary analysis, based on the Genomic Data Model (GDM), a simple data model which links genomic feature data to their associated experimental, biological and clinical metadata. GDM encompasses all the data formats which have been produced for feature extraction from (epi)genomic datasets. We specifically describe the mapping to GDM of the SAM (Sequence Alignment/Map), VCF (Variant Call Format), NARROWPEAK (for called peaks produced by NGS ChIP-seq or DNase-seq methods), and BED (Browser Extensible Data) formats, but GDM also supports all the formats describing experimental datasets (e.g., copy number variations, DNA somatic mutations, or gene expressions) and annotations (e.g., transcription start sites, genes, enhancers or CpG islands). We downloaded and integrated samples of all the above-mentioned data types and formats from multiple sources. GDM homogeneously describes semantically heterogeneous data and lays the groundwork for data interoperability, achieved for example through the GenoMetric Query Language (GMQL), a high-level, declarative query language for genomic big data. The combined use of the data model and the query language allows comprehensive processing of multiple heterogeneous data, and supports the development of domain-specific data-driven computations and bio-molecular knowledge discovery.

Keywords: Data interoperability; Data modeling; Genomic data management; Metadata management; Operations for genomics; Query languages
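
The abstract's central idea, a sample that pairs region-level feature data with its metadata, can be illustrated with a minimal Python sketch. This is not the authors' implementation and it does not use GMQL syntax: the class names `Region` and `Sample`, the helpers `region_from_bed` and `select_by_metadata`, and the metadata keys are all hypothetical, chosen only for illustration. The one external fact relied on is the standard BED column order (chrom, chromStart, chromEnd, name, score, strand).

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """One genomic region: shared coordinates plus format-specific attributes.

    Hypothetical layout for illustration; not the schema defined in the paper.
    """
    chrom: str
    left: int
    right: int
    strand: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Sample:
    """A sample links its regions to free-form attribute/value metadata."""
    sample_id: str
    regions: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def region_from_bed(line: str) -> Region:
    """Map one tab-separated BED line onto the shared Region layout.

    Only the first three BED columns are mandatory; name, score and strand
    are optional, so missing values fall back to None or "." (no strand).
    """
    cols = line.rstrip("\n").split("\t")
    chrom, left, right = cols[0], int(cols[1]), int(cols[2])
    name = cols[3] if len(cols) > 3 else None
    score = float(cols[4]) if len(cols) > 4 else None
    strand = cols[5] if len(cols) > 5 else "."
    return Region(chrom, left, right, strand, {"name": name, "score": score})


def select_by_metadata(samples, **conditions):
    """Keep the samples whose metadata equals every given key/value pair."""
    return [
        s for s in samples
        if all(s.metadata.get(k) == v for k, v in conditions.items())
    ]


if __name__ == "__main__":
    # Made-up BED lines and metadata values, for demonstration only.
    bed_lines = ["chr1\t100\t200\tpeak_a\t850\t+", "chr2\t500\t650\tpeak_b\t420\t-"]
    chip = Sample("S1", [region_from_bed(l) for l in bed_lines], {"assay": "ChIP-seq"})
    other = Sample("S2", [], {"assay": "RNA-seq"})
    for sample in select_by_metadata([chip, other], assay="ChIP-seq"):
        print(sample.sample_id, len(sample.regions))
```

Keeping the coordinates fixed and pushing everything format-specific into `attributes` is one way to let regions parsed from different formats sit side by side, while selection by metadata stays independent of the region schema.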

Year: 2016 PMID: 27637471 DOI: 10.1016/j.ymeth.2016.09.002

Source DB: PubMed Journal: Methods ISSN: 1046-2023 Impact factor: 3.608

Cited by

6 in total

1. PyGMQL: scalable data extraction and analysis for heterogeneous genomic datasets.

2. Implementing the FAIR Data Principles in precision oncology: review of supporting initiatives.

3. RGMQL: scalable and interoperable computing of heterogeneous omics big data and metadata in R/Bioconductor.

4. Genomic data integration and user-defined sample-set extraction for population variant analysis.

5. Combining DNA methylation and RNA sequencing data of cancer for supervised knowledge extraction.

6. A Large-Scale and Serverless Computational Approach for Improving Quality of NGS Data Supporting Big Multi-Omics Data Analyses.

Authors: Dariusz Mrozek; Krzysztof Stępień; Piotr Grzesik; Bożena Małysiak-Mrozek
Journal: Front Genet Date: 2021-07-13 Impact factor: 4.599