Karla Cervantes-Gracia¹, Richard Chahwan¹, Holger Husi²,³.
Abstract
The wealth of high-throughput data has opened new opportunities to analyze and describe biological processes at higher resolution. Data from the different omics layers have significantly accelerated scientific output and led to the creation of databases that store and report raw datasets. Because the techniques and methodologies used to produce these data are highly heterogeneous, meta-analysis has become one of the approaches of choice for correlating the resulting large-scale datasets from different research groups. Multi-study meta-analyses can generate results with greater statistical power than individual analyses. Analyses of large-scale datasets in several fields of science have identified gene signatures, biomarkers and pathways that provide new insights into a phenotype of interest. However, despite these efforts, there is still no standard for reporting large-scale data or for identifying the molecular targets and signaling networks involved. Integrative analyses have also been introduced to complement and augment meta-analysis methodologies and to generate novel hypotheses. No universal method has been established yet, and the available methods serve different purposes. Herein we describe a new unifying, scalable and straightforward methodology to meta-analyze different omics outputs and to integrate the significant outcomes into novel pathways describing biological processes of interest. We highlight the importance of using proper molecular identifiers and the potential to further correlate molecules from different regulatory levels. To show the methodology's potential, we meta-analyze a set of transcriptomic datasets as an example.
Keywords: bioinformatics; biomarker analysis; data integration; meta-analysis; omics; pathway analysis
Year: 2022 PMID: 35186042 PMCID: PMC8855827 DOI: 10.3389/fgene.2022.828786
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
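The abstract's point that multi-study meta-analyses gain statistical power over individual analyses can be illustrated with a classic p-value combination rule. The sketch below uses Fisher's method, a standard technique swapped in purely for illustration and not the HHmeta procedure itself; it assumes the per-study p-values are independent, and the example p-values are invented.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-squared distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the upper tail has the closed form
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is x / 2
    term = total = 1.0
    for i in range(1, k):
        term *= half_x / i  # (x/2)^i / i!
        total += term
    return math.exp(-half_x) * total

# Hypothetical p-values from three independent studies.
print(fisher_combined_pvalue([0.04, 0.03, 0.05]))
```

Three studies that are each only modestly significant combine to a p-value of roughly 0.0035, smaller than any single input. With a single p-value the function returns that value unchanged.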
Comparison of available integrative systems biology methodologies.
| Methodology | Strategy | Outcome | Limitations | References |
|---|---|---|---|---|
| HHmeta method | Meta-analysis of differentially expressed molecules from omics data. Data from different platforms (e.g., RNA-seq, microarray) can be integrated. Biomarker list generation by ranking the frequency distribution, and contextualization of molecules into pathways | Integration of omics biomarker lists and contextualization into pathway maps; novel hypotheses from the biomarker list; novel hypotheses from the main deregulated pathways; better understanding of the disease/condition of interest | Depends on the availability of data; relies on previous knowledge; gaps persist across the pathway maps; molecules without a defined function or interaction are not mapped | |
| Network meta-analysis | Meta-analysis of transcriptomics data that includes a differential expression analysis for each independent study | Differentially expressed gene list based on meta-analysis of independent experimental studies | Does not focus on integrating different omics; focuses on obtaining signatures/biomarkers | |
| MetaPCA | Meta-analysis of transcriptomic or epigenomic datasets through identification of a common eigenspace for dimension reduction | Clusters and patterns of gene expression profiles; robust to outliers | Does not focus on integrating different omics; focuses on obtaining signatures/molecular patterns | |
| MINT | Integration of independent omics studies based on similar biological questions. Allows supervised and unsupervised frameworks. PLS-based method to model multi-group (multi-study) data | Identification of reproducible biomarker signatures | Can only include studies with a sample size larger than 3; focuses on obtaining signatures/biomarkers | |
| NetworkAnalyst | Gene expression profiling, meta-analysis and systems-level interpretation | Creates and visualizes biological networks; web-based meta-analysis of gene expression data; comparison of multiple gene lists generated outside the tool; identification of shared and unique genes and processes through multi-list heatmaps and enrichment networks | Gene expression profiles must be formatted outside the application; integration of transcriptomics studies | |
| Mergeomics | Multi-omics association data, pathway analysis and functional genomics analysis. Corrects for dependencies between omics markers. Based on pathway- or network-level meta-analysis | Identification of key drivers of a disease and causal subnetworks for specific conditions; single dataset: causal networks or key regulatory genes can be identified; multiple datasets (same or different data types): meta-analysis, causal networks, key regulatory genes; groups of disease-associated genes: key regulators, condition subnetworks, gene set association with other conditions or organisms | Gene expression profiles must be formatted outside the application; based on comparison files (cases vs. controls) | |
| INMEX | Meta-analysis of multiple gene expression datasets that allows integration of transcriptomics and metabolomics datasets | Data preparation; statistical analysis: multiple datasets combination based on; functional analysis and ID combination between genes and metabolites | Limited to integration of transcriptomics and metabolomics | |
| DIABLO | Multi-omics integrative, holistic and data-driven method | Identification of known and novel multi-omics biomarkers; identification of correlated variables within omics datasets from the same samples | Batch-effect analyses in each dataset are needed prior to integration; restricted to different omics datasets from the same biological samples; focuses on obtaining signatures/biomarkers | |
| MOFA | Unsupervised identification of the principal sources of variation among multi-omics datasets | Identification of factors specific to individual data modalities and of factors common to multiple molecular layers | Analysis and integration restricted to different omics datasets from the same biological samples (similar to DIABLO, JIVE, PARADIGM or MCIA); linear model, so non-linear associations might be missed | |
| Ingenuity Pathway Analysis (IPA) | Multi-omics pathway analysis tool | Building of networks to represent biological systems; pathway analysis and association of process activation or inhibition with a specific condition; identification of novel targets; comparison across multiple analyses (similar to Pathway Studio, Elsevier) | Commercial; does not generate meta-analyses; non-reproducible results; based on computational approaches | Ingenuity Pathway Analysis tool (IPA; QIAGEN Inc., Germantown, MD, USA, |
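The HHmeta row describes integrating data from different platforms and generating a biomarker list by ranking how often molecules appear. A minimal sketch of that idea follows, assuming platform-specific identifiers are first mapped to one shared identifier and that "significant" means adjusted p < 0.05. The mapping table and platform IDs are hypothetical, and this is not the published implementation.

```python
from collections import Counter

# Hypothetical lookup: platform-specific IDs -> one shared molecule ID.
# The shared IDs mimic the CluSO-style IDs in the results table; the
# platform IDs on the left are invented for illustration.
ID_MAP = {
    "array_probe_17": "B2O29",
    "rnaseq_gene_903": "B2O29",
    "array_probe_42": "BO135",
    "rnaseq_gene_118": "BO135",
    "array_probe_88": "B7596",
}

def rank_by_frequency(comparisons, id_map, alpha=0.05):
    """Count in how many dataset comparisons each shared ID is significant.

    comparisons: one list per comparison of (platform_id, adj_p) pairs.
    Unmapped IDs are skipped, and each shared ID counts at most once per
    comparison (the set removes duplicate probes for the same molecule).
    """
    counts = Counter()
    for comparison in comparisons:
        hits = {id_map[pid] for pid, adj_p in comparison
                if pid in id_map and adj_p < alpha}
        counts.update(hits)
    # Most frequently significant molecules first; ties broken by ID.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

comparisons = [
    [("array_probe_17", 0.001), ("array_probe_42", 0.20), ("array_probe_88", 0.01)],
    [("rnaseq_gene_903", 0.004), ("rnaseq_gene_118", 0.03)],
    [("array_probe_17", 0.02), ("array_probe_42", 0.04), ("unknown_id", 0.001)],
]
print(rank_by_frequency(comparisons, ID_MAP))
```

Mapping to a shared identifier before counting is what lets a microarray probe and an RNA-seq gene ID contribute to the same molecule's frequency.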
FIGURE 1. Flowchart comparing the conventional (left) and our proposed (right) meta-analytic approaches. The blue box marks the similarities between the approaches; the dashed line highlights the main differences.
FIGURE 2. Theoretical and real examples of centroid clustering visualized in a volcano plot. (A) A and B represent two different molecules. Red and gray circles represent the distribution of molecule A across the different dataset (DS) comparisons. Most molecule A values cluster by regulation direction, log2FC and p-value (red circles); gray circles represent molecule A values with non-significant p-values. Green and red triangles represent the distribution of molecule B. All molecule B values are significant and cluster by regulation direction, log2FC and p-value (red triangles), except for two DS comparisons (green triangles). (B) B1629 and B8816 are real molecules within the DS matrix and represent an example of the distribution of two molecules from the biomarker list obtained through the FS index.
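Figure 2 describes grouping one molecule's values across dataset comparisons into a majority cluster plus a few dissenting or non-significant points. The sketch below shows one way to express that grouping. The significance cutoff, the tie rule, measuring agreement over all comparisons, and using mean log2FC and mean −log10(adj. p) as the centroid are all assumptions for illustration; the DS values are invented.

```python
import math
from statistics import mean

def centroid_cluster(observations, alpha=0.05):
    """Group one molecule's values across dataset (DS) comparisons.

    observations: dict of DS comparison -> (log2FC, adjusted p-value).
    Assumptions: significant means adj. p < alpha; the consensus direction
    is the sign shared by most significant values; agreement is measured
    over all comparisons, so non-significant values lower it.
    """
    significant = {ds: v for ds, v in observations.items() if v[1] < alpha}
    up = sorted(ds for ds, (fc, _) in significant.items() if fc > 0)
    down = sorted(ds for ds, (fc, _) in significant.items() if fc < 0)
    non_significant = sorted(set(observations) - set(significant))
    if len(up) == len(down):
        # No majority direction (this includes "nothing significant").
        return {"direction": 0, "agreement": 0.0, "centroid": None,
                "dissenting": sorted(up + down), "non_significant": non_significant}
    direction = 1 if len(up) > len(down) else -1
    agreeing, dissenting = (up, down) if direction == 1 else (down, up)
    # Centroid of the majority cluster in volcano-plot coordinates.
    centroid = (mean(observations[ds][0] for ds in agreeing),
                mean(-math.log10(observations[ds][1]) for ds in agreeing))
    return {"direction": direction,
            "agreement": len(agreeing) / len(observations),
            "centroid": centroid,
            "dissenting": dissenting,
            "non_significant": non_significant}

# Invented values loosely shaped like molecule B in Figure 2A: three
# significant up-regulated comparisons plus two significant values with the
# opposite sign standing in for the green triangles.
molecule_b = {"DS1": (2.0, 0.001), "DS2": (2.4, 0.01), "DS3": (1.6, 0.001),
              "DS4": (-1.2, 0.02), "DS5": (-0.9, 0.03)}
print(centroid_cluster(molecule_b))
```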
FIGURE 3. PCA grouping and ClueGO/CluePedia networks of our DLBCL cohorts. (A) PCA plot: blue dots, Group 1; orange dots, Group 2; white dots, outliers. (B) ClueGO/CluePedia network created from the manual-approach biomarker list (66% threshold). (C) ClueGO/CluePedia network created from the FS index biomarker list (absolute FS index threshold of 0.75).
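Panels B and C compare biomarker lists selected with two different thresholds. Assuming each molecule already has a signed regulation percentage (manual approach) and a precomputed FS index value, selecting and comparing the two lists reduces to set operations. The FS index formula is not reproduced here; `records` simply takes its value as input. Treating both thresholds as inclusive (≥) is an assumption, and the molecules prefixed `HYP_` are hypothetical.

```python
def compare_biomarker_lists(records, pct_threshold=66.0, score_threshold=0.75):
    """Select molecules under two absolute thresholds and compare the lists.

    records: dict of molecule ID -> (signed regulation %, FS index value).
    Both thresholds are applied to absolute values (assumed inclusive).
    """
    manual = {m for m, (pct, _) in records.items() if abs(pct) >= pct_threshold}
    scored = {m for m, (_, s) in records.items() if abs(s) >= score_threshold}
    return {
        "shared": sorted(manual & scored),
        "manual_only": sorted(manual - scored),
        "score_only": sorted(scored - manual),
    }

records = {
    "B2O29": (100.0, 1.0),   # BIRC3, values from the results table
    "BO135": (100.0, 1.0),   # BCL6, values from the results table
    "B7596": (-100.0, 1.0),  # IGF2, values from the results table
    "HYP_1": (70.0, 0.5),    # hypothetical molecule
    "HYP_2": (-60.0, -0.8),  # hypothetical molecule
}
print(compare_biomarker_lists(records))
```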
Top deregulated molecules obtained with the Manual approach and FS Index calculation.
| CluSO ID | Gene name | Final regulation (Manual approach) | Mean adj. p-value | Mean log2FC | FS index (HHmeta method) |
|---|---|---|---|---|---|
| B2Q85 | ITGA9 | 100 | 9.130E-23 | 4.29 | 1 |
| B2O29 | BIRC3 | 100 | 2.480E-05 | 2.19 | 1 |
| BO058 | HLA-DRB1 | 100 | 3.320E-03 | 2.04 | 1 |
| BO135 | BCL6 | 100 | 1.210E-04 | 2.01 | 1 |
| B8773 | LCE2D | −100 | 2.760E-03 | −2.24 | 1 |
| B9009 | LPP | −100 | 4.687E-05 | −2.25 | 1 |
| BF875 | SYCE1L | −100 | 1.270E-05 | −2.29 | 1 |
| B1137 | ATP10D | −100 | 8.127E-08 | −2.30 | 1 |
| BL316 | DNM1DN8-2 | −100 | 3.360E-04 | −2.40 | 1 |
| BH305 | TSPYL5 | −100 | 4.326E-07 | −2.49 | 1 |
| B5415 | FAM208B | −100 | 7.820E-07 | −2.55 | 1 |
| B7596 | IGF2 | −100 | 3.810E-08 | −2.59 | 1 |
| BO237 | RET | −100 | 1.400E-17 | −2.70 | 1 |
| B8780 | LCE5A | −100 | 1.660E-04 | −2.87 | 1 |
| B8612 | KRTAP5-3 | −100 | 2.240E-07 | −3.00 | 1 |
| B6621 | GPR150 | −100 | 3.360E-07 | −3.01 | 1 |
| B2W20 | YES1 | −100 | 2.380E-08 | −3.45 | 1 |
| B7559 | IFITM5 | −100 | 3.360E-07 | −3.46 | 1 |
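The table reports, per molecule, a final regulation value together with the mean adjusted p-value and mean log2FC across comparisons, sorted by mean log2FC. A sketch of that aggregation from per-comparison results follows. The definition regulation % = 100 × (n_up − n_down) / n is an assumption chosen because it yields ±100 when all comparisons agree; the `GENE_*` inputs are invented.

```python
from collections import defaultdict
from statistics import mean

def summarize(observations):
    """Aggregate per-comparison results into one row per molecule.

    observations: iterable of (molecule, log2FC, adj_p), one per comparison.
    Returns rows (molecule, regulation_pct, mean_adj_p, mean_log2fc) sorted
    by mean log2FC, highest first. regulation_pct is an assumed definition:
    100 * (n_up - n_down) / n.
    """
    grouped = defaultdict(list)
    for molecule, fc, adj_p in observations:
        grouped[molecule].append((fc, adj_p))
    rows = []
    for molecule, values in grouped.items():
        n_up = sum(1 for fc, _ in values if fc > 0)
        n_down = sum(1 for fc, _ in values if fc < 0)
        rows.append((
            molecule,
            100.0 * (n_up - n_down) / len(values),
            mean(p for _, p in values),
            mean(fc for fc, _ in values),
        ))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows

obs = [
    ("GENE_A", 4.1, 1e-20), ("GENE_A", 4.5, 2e-22),
    ("GENE_B", -2.2, 1e-4), ("GENE_B", -2.8, 3e-4),
    ("GENE_C", 1.0, 0.01), ("GENE_C", -0.5, 0.04),
]
for name, reg, p_mean, fc_mean in summarize(obs):
    print(f"{name}\t{reg:.0f}\t{p_mean:.3E}\t{fc_mean:.2f}")
```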
Top genes associated with DLBCL according to DisGeNET.
| Gene | GDA Score | Association Type | Number of PMIDs |
|---|---|---|---|
| BCL2 | 0.4 | Biomarker, Altered Expression, Genetic Variation | 222 |
| FBXO11 | 0.32 | Biomarker, Genetic Variation, Causal Mutation | 2 |
| IRF8 | 0.32 | Biomarker, Altered Expression, Causal Mutation | 2 |
| BCL6 | 0.1 | Biomarker, Altered Expression, Genetic Variation | 224 |
| BIRC3 | 0.08 | Biomarker, Genetic Variation | 8 |
| HDAC9 | 0.07 | Biomarker, Altered Expression | 7 |
| ZC3H12D | 0.05 | Biomarker | 5 |
| LIG4 | 0.04 | Biomarker, Post-translational Modification | 4 |
| HLA-DRB1 | 0.03 | Biomarker, Genetic Variation | 3 |
| PSIP1 | 0.02 | Biomarker | 2 |
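The two tables can be cross-referenced directly to see which top deregulated molecules already have a known DLBCL association. This sketch intersects the gene symbols of the top deregulated molecules with the DisGeNET table, using only values copied from the tables above, and orders the overlap by GDA score.

```python
# Gene symbols copied from the top-deregulated-molecules table.
TOP_DEREGULATED = {
    "ITGA9", "BIRC3", "HLA-DRB1", "BCL6", "LCE2D", "LPP", "SYCE1L", "ATP10D",
    "DNM1DN8-2", "TSPYL5", "FAM208B", "IGF2", "RET", "LCE5A", "KRTAP5-3",
    "GPR150", "YES1", "IFITM5",
}

# DisGeNET table: gene -> (GDA score, number of PMIDs).
DISGENET_GDA = {
    "BCL2": (0.4, 222), "FBXO11": (0.32, 2), "IRF8": (0.32, 2),
    "BCL6": (0.1, 224), "BIRC3": (0.08, 8), "HDAC9": (0.07, 7),
    "ZC3H12D": (0.05, 5), "LIG4": (0.04, 4), "HLA-DRB1": (0.03, 3),
    "PSIP1": (0.02, 2),
}

def known_associations(candidates, reference):
    """Return (gene, GDA score, PMIDs) for candidates in the reference, highest GDA first."""
    hits = [(gene, *reference[gene]) for gene in candidates if gene in reference]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

for gene, gda, pmids in known_associations(TOP_DEREGULATED, DISGENET_GDA):
    print(f"{gene}\tGDA={gda}\tPMIDs={pmids}")
```

With these inputs the overlap is BCL6, BIRC3 and HLA-DRB1; the remaining top deregulated molecules have no entry in this DisGeNET excerpt.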
FIGURE 4. PathVisio-edited pathway of the obtained biomarker list: the NFKB and JAK/STAT section of the complete pathway map in Supplementary Figure S1. This section contextualizes the molecules included in the map and shows their up-regulation trend. Molecules with an adjusted p-value < 0.05 from the FS index calculation were included.
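Figure 4 includes only molecules with an adjusted p-value below 0.05 and displays their regulation trend on the map. A sketch of that node-annotation step follows. The node list is a hypothetical pathway section and does not assert pathway membership, the red/blue color convention is an assumption, and the values for BIRC3, BCL6 and IGF2 are the means from the results table.

```python
def annotate_pathway(nodes, results, alpha=0.05):
    """Assign a display color to each pathway node.

    nodes: gene symbols in a pathway section.
    results: gene -> (mean log2FC, adjusted p-value).
    Genes with adj. p < alpha are colored by direction (assumed convention:
    red = up, blue = down); everything else stays uncolored (None).
    """
    colors = {}
    for gene in nodes:
        log2fc, adj_p = results.get(gene, (0.0, 1.0))  # unmeasured -> not significant
        if adj_p < alpha and log2fc != 0:
            colors[gene] = "red" if log2fc > 0 else "blue"
        else:
            colors[gene] = None
    return colors

# Means taken from the results table; HYP_NODE is a hypothetical gene.
results = {
    "BIRC3": (2.19, 2.480e-05),
    "BCL6": (2.01, 1.210e-04),
    "IGF2": (-2.59, 3.810e-08),
    "HYP_NODE": (0.80, 0.20),
}
section = ["BIRC3", "BCL6", "IGF2", "HYP_NODE", "UNMEASURED"]  # hypothetical section
print(annotate_pathway(section, results))
```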
FIGURE 5. GeneMANIA focus and PathVisio de novo pathway contextualization. (A) GeneMANIA results for STAT3; interactors are linked by physical interaction (red), co-expression (purple), genetic interaction (green) or shared protein domain (yellow). (B) PathVisio NFKB and JAK/STAT signaling pathway section with the elements added from GeneMANIA highlighted in purple; the analyzed gene (STAT3) is highlighted in a yellow box, and blue boxes represent feedback loops.
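Figure 5B extends the pathway map with GeneMANIA interactors and marks feedback loops. If the map is treated as a directed graph, feedback loops correspond to directed cycles. The sketch below merges hypothetical interactor edges into a pathway and enumerates each simple cycle once. Edge directions and all example edges are assumptions (GeneMANIA interactions such as co-expression are not inherently directed), not data read from the figure.

```python
from collections import defaultdict

def merge_edges(pathway_edges, interactor_edges):
    """Build a directed adjacency map from (source, target, interaction_type) triples.

    The interaction type (physical, co-expression, genetic, shared domain) is
    kept only for bookkeeping; cycle detection below ignores it.
    """
    graph = defaultdict(set)
    for src, dst, _kind in list(pathway_edges) + list(interactor_edges):
        graph[src].add(dst)
    return graph

def find_feedback_loops(graph):
    """Enumerate every simple directed cycle exactly once.

    Each cycle is reported starting from its lexicographically smallest node,
    because the search from `start` only visits nodes greater than `start`.
    """
    nodes = sorted(set(graph) | {dst for targets in graph.values() for dst in targets})
    loops = []

    def dfs(start, node, path):
        for nxt in sorted(graph.get(node, ())):
            if nxt == start:
                loops.append(list(path))
            elif nxt > start and nxt not in path:
                path.append(nxt)
                dfs(start, nxt, path)
                path.pop()

    for start in nodes:
        dfs(start, start, [start])
    return loops

# Hypothetical edges for illustration only (not read from Figure 5).
pathway = [("IL6", "JAK1", "activation"), ("JAK1", "STAT3", "activation"),
           ("STAT3", "SOCS3", "transcription")]
interactors = [("SOCS3", "JAK1", "physical")]
print(find_feedback_loops(merge_edges(pathway, interactors)))
```

The added interactor edge closes the JAK1 → STAT3 → SOCS3 → JAK1 loop, which is the kind of structure the blue boxes highlight. The enumeration is exponential in the worst case, which is acceptable for small, hand-curated pathway sections.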