Johann S Hawe1,2, Fabian J Theis1,3, Matthias Heinig1,2.
Abstract
A major goal in systems biology is a comprehensive description of all complex interactions between different types of biomolecules (also referred to as the interactome) and of how these interactions give rise to higher, cellular- and organism-level functions or diseases. Numerous efforts have been undertaken to define such interactomes experimentally, for example yeast-two-hybrid based protein-protein interaction networks or ChIP-seq based protein-DNA interactions for individual proteins. To complement these direct measurements, genome-scale quantitative multi-omics data (transcriptomics, proteomics, metabolomics, etc.) enable researchers to predict novel functional interactions between molecular species. Moreover, these data allow researchers to distinguish relevant functional from non-functional interactions in specific biological contexts. However, integration of multi-omics data is not straightforward due to their heterogeneity. Numerous methods exist for the inference of interaction networks from homogeneous functional data, but with the advent of large-scale paired multi-omics data a new class of methods for inferring comprehensive networks across different molecular species has begun to emerge. Here we review state-of-the-art techniques for inferring the topology of interaction networks from functional multi-omics data, encompassing graphical models with multiple node types and quantitative-trait-loci (QTL) based approaches. In addition, we discuss Bayesian aspects of network inference, which allow established biological information, such as known protein-protein or protein-DNA interactions, to guide the inference process.
Keywords: data integration; genomics; machine learning; mixed data; personalized medicine; prior information; single cell; systems biology
Year: 2019 PMID: 31249591 PMCID: PMC6582773 DOI: 10.3389/fgene.2019.00535
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Overview on selected resources for molecular interactions and omics datasets.
| Resource | Data type | Organisms | Reference |
| --- | --- | --- | --- |
| STRING | P-P | > 5000 | Szklarczyk et al., |
| BioGrid | P-P | > 60 | Stark et al., |
| inBio map | P-P | HS | Li et al., |
| GWAS catalog | D-PH | HS | MacArthur et al., |
| KEGG | multiple | > 5000 | Kanehisa and Goto, |
| APID | P-P | > 400 | Alonso-Lopez et al., |
| doRINA | P-R, miR-R | HS, MM, DM, CE | Blin et al., |
| REMAP | P-D | HS | Chèneby et al., |
| IntAct | P-P | multiple | Orchard et al., |
| Pathway Commons | multiple | multiple | Cerami et al., |
| AGRIS | P-D | AT | Yilmaz et al., |
| ENCODE | G, T, E | HS | The ENCODE Project Consortium, |
| modENCODE | G, T, E | DM, CE | Celniker et al., |
| GTEx | G, T | HS | Carithers et al., |
| ROADMAP | E, T | HS | Roadmap Epigenomics Consortium, |
| GEO | G, T, E | multiple | Edgar et al., |
| ARCHS4 | T | HS, MM | Lachmann et al., |
| The Human Protein Atlas | T, P | HS | Thul et al., |
| MetaboLights | M | multiple | Haug et al., |
| TCGA | G, T, E | HS | Weinstein et al., |
The data type column gives either the type of interactions (e.g., protein-protein interaction, P-P) or the type of omics data available in the data collection. Interactions: M, metabolite; P, protein; D, DNA; R, RNA; PH, phenotype. Organisms: HS, H. sapiens; AT, A. thaliana; MM, M. musculus; DM, D. melanogaster; CE, C. elegans. Omics: G, genomic; E, epigenomic; T, transcriptomic.
includes functional interactions.
focus on P-P, but arbitrary interactions possible.
Figure 1. Scheme for integration of reference interactomes with multi-omics data and phenotypes (GWAS) to obtain context-specific interactomes. (A) trans-eQTL allow the investigation of, e.g., TF binding mechanisms, which can be complemented with additional regulatory information such as CpG methylation. (B) Established associations from (A) between SNP S, TF A, CpG F, and gene B complement reference interactomes. (C) Regulatory (possibly heterogeneous) networks are inferred from multi-omics data, optionally using established biological knowledge as prior information. Integration of, e.g., genotype, expression, and methylation data allows the investigation of regulatory dependencies between different omic layers. (D) Associations identified in (C) complement reference interactomes by adding new regulatory layers (SNP S, CpG F), novel genes (gene C), or new links between already existing genes (genes B and E), similar to (B). (E) Reference interactomes are annotated with SNPs associated with specific disease contexts from GWAS results. (F) The final context-specific interactome enables detailed investigation of disease-related regulatory mechanisms across distinct omic layers.
Figure 3. Illustration of the concept of different network inference methods. (A) represents a known pathway structure which should be recovered from functional data using the different approaches: two transcription factors influence the expression of two target genes, which in turn affect the expression of other downstream genes. (B,C) show correlation-based results and their estimated matrices (correlation and partial correlation, respectively). While using Pearson correlation results in many indirect associations (shown in red), this is largely remedied by using partial correlations. (D) The graphical lasso pushes weaker associations (e.g., between TF1 and gene C) toward zero in the precision matrix and might do so even for real edges which have relatively low evidence in the data (like the edge between TF2 and target1). (E) When prior information is considered, weak associations still have a chance of being selected if their respective prior (shown in green) supports them.
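The behavior described for panels (D) and (E) can be made concrete with the penalized likelihood that the graphical lasso optimizes. The block below is a sketch in assumed notation, not taken from the source: S denotes the empirical covariance matrix, Θ = (θ_ij) the precision matrix, λ ≥ 0 the penalty strength, and w_ij a hypothetical edge-specific weight. Setting w_ij = 1 for all pairs gives the usual unweighted penalty (whether diagonal entries are also penalized differs between implementations). Choosing w_ij < 1 for edges supported by prior knowledge is one common way to let weak but prior-supported associations survive the shrinkage; individual methods differ in how they define these weights.

```latex
\hat{\Theta}
  = \arg\max_{\Theta \succ 0}
    \Big[ \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta)
          \;-\; \lambda \sum_{i \neq j} w_{ij}\,\lvert\theta_{ij}\rvert \Big],
\qquad
\rho_{ij \mid \text{rest}}
  = -\,\frac{\hat{\theta}_{ij}}{\sqrt{\hat{\theta}_{ii}\,\hat{\theta}_{jj}}}
```

An entry θ_ij shrunk exactly to zero corresponds to a zero partial correlation and hence to a missing edge in the inferred graph.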
Figure 2. Illustration of the concept of partial correlation networks. Two networks show the dependency structure between random variables depicted as nodes. Solid edges in (A) represent high Pearson correlation coefficients between random variables, also shown in the corresponding correlation matrix. Solid edges in (B) represent non-zero partial correlation coefficients between random variables, also shown in the corresponding partial correlation matrix. Considering partial correlation instead of Pearson correlation removes the edge between B and C, which arises from the effect A exerts on both B and C. Subfigure (C) compares correlation and partial correlation between A and B given C. Scatter plots show the original data (blue), the residuals (green lines) after regressing both A and B on C, and the relation between the residuals (orange). Here a clear linear relation between the residuals is observed, which is reflected in a non-zero partial correlation (represented by an edge) between A and B. Analogously, subfigure (D) compares correlation and partial correlation between B and C given A. Here no clear linear relation between the residuals is observed, which is reflected in a partial correlation between B and C that is not significantly different from zero. Consequently, there is no edge between B and C in the partial correlation graph.
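The residual-based construction described in this caption can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not the authors' code: the helper names `pearson`, `residuals`, and `partial_corr`, the sample size, and the noise levels are all assumptions chosen for the example. A is simulated as a common cause of B and C, so B and C are strongly correlated but should become (nearly) uncorrelated once A is regressed out.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of the simple linear regression y ~ x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def partial_corr(x, y, z):
    """Partial correlation of x and y given z: correlate the residuals
    that remain after regressing each of x and y on z."""
    return pearson(residuals(x, z), residuals(y, z))


# Simulated toy data (assumption): A drives both B and C.
rng = random.Random(0)
n = 2000
A = [rng.gauss(0, 1) for _ in range(n)]
B = [a + 0.5 * rng.gauss(0, 1) for a in A]
C = [a + 0.5 * rng.gauss(0, 1) for a in A]

r_bc = pearson(B, C)              # marginal correlation, induced by A
pc_bc_a = partial_corr(B, C, A)   # close to zero: no edge B - C
pc_ab_c = partial_corr(A, B, C)   # clearly non-zero: edge A - B remains

print(f"cor(B, C)      = {r_bc:.2f}")
print(f"pcor(B, C | A) = {pc_bc_a:.2f}")
print(f"pcor(A, B | C) = {pc_ab_c:.2f}")
```

With more than three variables, each pair would be regressed on all remaining variables; in practice this is usually done in one step by inverting the covariance matrix.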
| A dataset in which for each individual sample at least two different kinds of molecular information (such as genotype, gene expression, or DNA methylation information) is available. | |
| Measure of (conditional) dependence between (statistical) variables. Two variables are partially correlated, if they are still significantly correlated after the effect of all other variables in the dataset has been removed from the two target variables via linear regression. For multivariate normal distributions a partial correlation of zero is equivalent to conditional independence between two variables (Baba et al., | |
| In a Gaussian Graphical Model, where the | |
| In a statistical model, the number of variables | |
| Causal networks (also Bayesian networks) are directed acyclic graphs and establish directed dependencies between individual nodes, i.e., all edges between nodes are effectively arrows representing a direction of effect. For example, in a causal co-expression network it could be deduced that the expression of a gene changes as a result of a change in another gene, while in an undirected network this would be reflected as a mere correlation. |
List of network inference methods discussed in this review for which implementations are available.
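| Method | Concept | Mixed data | Priors | Directed | Reference |
| --- | --- | --- | --- | --- | --- |
| GeneNet | shrinkage/pcor | No | No | No | Schäfer and Strimmer, |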
| ARACNE(-AP) | Mutual information | No | No | No | Margolin et al., |
| GENIE3 | RF | Potentially | No | No | Huynh-Thu et al., |
| GRNBoost | RF | Potentially | No | No | Aibar et al., |
| gLASSO | LASSO | No | No | No | Friedman et al., |
| wgLasso | LASSO | No | No | No | Li and Jackson, |
| pLasso | LASSO | No | No | No | Wang et al., |
| iRafNet | RF | Yes | Yes | No | Petralia et al., |
| GRaFo | RF/stability selection | Yes | No | No | Fellinghauer et al., |
| causalMGM | RF/StEPS | Yes | No | Yes | Sedgewick et al., |
| bdgraph | MCMC | Yes | Yes | No | Mohammadi and Wit, |
The concept column describes the underlying statistical concept. Additional columns indicate the applicability of methods to heterogeneous data types (mixed data), the possibility of incorporating prior information (priors), and support for directed graph inference (directed). pcor, partial correlation; RF, random forest; StEPS, Stepwise Edge-specific Penalty Selection; MCMC, Markov Chain Monte Carlo.
not specifically targeted to or evaluated with respect to this aspect.
developed in single-cell context.