Xiaoxi Dong, Anatoly Yambartsev, Stephen A Ramsey, Lina D Thomas, Natalia Shulzhenko, Andrey Morgun.
Abstract
Omics technologies enable unbiased investigation of biological systems through massively parallel sequence acquisition or molecular measurements, bringing the life sciences into the era of Big Data. A central challenge posed by such omics datasets is how to transform these data into biological knowledge, for example, how to use them to answer questions such as: Which functional pathways are involved in cell differentiation? Which genes should we target to stop cancer? Network analysis is a powerful and general approach to this problem. It consists of two fundamental stages: network reconstruction and network interrogation. Here we provide an overview of network analysis, including a step-by-step guide on how to perform and use this approach to investigate a biological question. In this guide, we also list the software packages that we and others employ for each step of a network analysis workflow.
Keywords: big data; data integration; inter-omics network; network interrogation; network reconstruction; systems biology; transkingdom network
Year: 2015 PMID: 25983554 PMCID: PMC4415676 DOI: 10.4137/BBI.S12467
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Figure 1. Workflow of network analysis. (A) Network analysis starts from data obtained in high-throughput experiments, such as microarray experiments that measure the expression of genes in samples. (B) Differentially expressed genes (DEGs) are identified between two states of a system (eg, normal vs disease). (C) Correlations between DEGs are calculated from their expression values to detect regulatory relationships among them. (D) Significant correlations suggest connections between DEGs and are used to generate a network of DEGs. (E) Network interrogation is performed to detect modules, key regulators, and functional pathways that are important for state transitions. (F) Based on the findings from network interrogation, new hypotheses are generated, which can be tested in newly designed experiments. Data from the new experiments can in turn be subjected to further analysis.
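Steps (C) and (D) of the workflow can be illustrated with a minimal Python sketch. It assumes that expression values are already normalized and restricted to DEGs, and it uses a fixed cutoff on the absolute Pearson correlation as a simple stand-in for a significance test. The gene names, expression values, and the `threshold` parameter are hypothetical; a real analysis would use p-values with a multiple-testing correction.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def build_network(expression, threshold):
    """Return {(gene_a, gene_b): r} for every pair with |r| >= threshold."""
    edges = {}
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges[(gene_a, gene_b)] = r
    return edges


# Hypothetical expression of four DEGs across five samples.
expression = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [5, 3, 4, 1, 2],
    "geneD": [3, 1, 4, 1, 5],
}
network = build_network(expression, threshold=0.75)
for (gene_a, gene_b), r in network.items():
    print(f"{gene_a} - {gene_b}: r = {r:+.2f}")
```

The resulting edge list is the kind of input that the interrogation tools listed in the table operate on.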
Tools for network reconstruction and interrogation.
| STEP | METHOD (STATISTICS / MATHEMATICS) | TOOL | LINK | REF |
|---|---|---|---|---|
| Normalization | Quantile, lowess | BRB Array tools | | |
| | Quantile, lowess, etc. | Package ‘affy’ in Bioconductor | | |
| | | R package ‘phyloseq’ | | |
| Finding DEGs | | BRB Array tools | | |
| | Different test statistics, choice with Bonferroni correction | IDEG6 | | |
| Network reconstruction | SVM | SIRENE | | |
| | Semi-supervised learning; logistic regression | SEREND | | |
| | Likelihood of mutual information | CLR | | |
| | Mutual information | ARACNE | ARACNe | |
| | Mutual information | MIDER | | |
| | Itemset mining | DISTILLER | Request from authors | |
| | Bayesian hierarchical clustering; conditional entropy | LeMoNe | details/LeMoNe | |
| | Context Likelihood of Relatedness | Inferelator | | |
| Remove indirect links | Partial correlation | Corpcor | | |
| | Local partial correlation | Local partial correlation | | |
| | Global silencing of indirect correlations | Silencing | | |
| | Network deconvolution | Network deconvolution | | |
| Weighted correlation network | Pearson correlation | WGCNA | | |
| Differential co-expression | Pearson correlation | coXpress | | |
| | Pearson correlation | DAPfinder | | |
| Data integration | Bicluster | cMonkey | | |
| | Itemset mining | DISTILLER | Request from authors | |
| Meta-analysis | Fisher’s combined probability test | ‘metap’ in software ‘Stata’ | | |
| | | OpenMeta | | |
| Visualization | | Cytoscape | | |
| | | Gephi | | |
| | | Circos | | |
| Module finding | Vertex weighting by local neighborhood density | MCODE | | |
| | Union of k-cliques | CFinder | | |
| | Markov Cluster Algorithm | MCL | | |
| Function analysis/gene set enrichment | Fisher’s exact test | DAVID | | |
| | Kolmogorov-Smirnov statistic modification | GSEA | | |
| | Fisher’s exact test | GoMiner | | |
| | Hypergeometric test | GeneMerge | | |
| | Fisher’s exact test | FuncAssociate | | |
| | Dimension reduction (independent component analysis or fixed effect meta-estimate) followed by weighted Pearson correlation | ProfileChaser | | |
| | Hypergeometric test | BiNGO | | |
| | Jaccard coefficient | EnrichmentMap | | |
| | Hypergeometric distribution | SubpathwayMiner | | |
| Identify key regulators | Network topology properties | Cytoscape | Tools → NetworkAnalyzer → Analyze Network | |
| | Intramodular connectivity, causality testing | WGCNA | | |
| Pathway crosstalk | Crosstalk enrichment | CrossTalkZ | | |
| | Eigenvector | Eigengene | | |
| Gene function prediction | Bayesian network | MEFIT | | |
| | Fast heuristic algorithm from ridge regression | GeneMANIA | | 129 |
| New gene ontology | Hierarchical clustering | NeXO | | 130 |
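Several of the enrichment tools in the table rely on the hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The following is a minimal Python sketch of that calculation, not the implementation used by any of the listed tools. The function name and the gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """Probability of seeing at least `overlap` annotated genes by chance.

    population: number of genes in the background set
    annotated:  background genes that carry the annotation (eg, a pathway)
    selected:   size of the gene list being tested (eg, a network module)
    overlap:    selected genes that carry the annotation
    """
    largest_overlap = min(annotated, selected)
    favorable = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, largest_overlap + 1)
    )
    return favorable / comb(population, selected)


# Hypothetical example: 40 of 2000 background genes belong to a pathway,
# and 8 of the 50 genes in a module belong to it.
p_value = hypergeom_enrichment_p(population=2000, annotated=40, selected=50, overlap=8)
print(f"enrichment p-value: {p_value:.3g}")
```

When many annotation terms are tested, the resulting p-values still need a multiple-testing correction.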
Figure 2. Removal of indirect links. In this example, gene X regulates the expression of both gene Y and gene Z, but there is no direct regulatory relationship between Y and Z. When correlations are calculated from the expression levels of the three genes, X is correlated with both Y and Z, as expected. However, Y and Z are also significantly correlated because both are directly regulated by X. Such a correlation arising from a common cause is called an indirect link. It can be removed by techniques such as partial correlation, yielding a network that reflects regulatory relationships.
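The situation in Figure 2 can be reproduced with simulated data. The sketch below, in Python, uses the standard first-order partial correlation formula for three variables; the simulated expression values, noise levels, and sample size are assumptions chosen only for illustration.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(r_yz, r_xy, r_xz):
    """Correlation between Y and Z after controlling for X."""
    return (r_yz - r_xy * r_xz) / sqrt((1 - r_xy ** 2) * (1 - r_xz ** 2))


# Simulate a common cause: X drives both Y and Z; Y and Z never interact.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [xi + rng.gauss(0, 0.5) for xi in x]
z = [xi + rng.gauss(0, 0.5) for xi in x]

r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
r_yz_given_x = partial_corr(r_yz, r_xy, r_xz)

print(f"corr(Y, Z)     = {r_yz:.2f}")          # strong, although indirect
print(f"corr(Y, Z | X) = {r_yz_given_x:.2f}")  # close to zero
```

With more than three genes, partial correlations are usually derived from the inverse of the full correlation matrix, which is the task that packages such as Corpcor address.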
Figure 3. Illustration of expected and unexpected correlations. (A) When the expression of two genes (gene x and gene y) changes in the same direction between two states, eg, both are upregulated in disease (upper two panels), their expression levels should be positively correlated within each state if a regulatory relationship exists between them. When two genes change in opposite directions in the transition from normal to disease (lower two panels: gene x is upregulated while gene z is downregulated), a negative correlation between them is expected within each state. (B) Combinations of the direction of change between states and the sign of the correlation that define expected and unexpected correlations.
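The rule in Figure 3 reduces to a comparison of signs. The Python sketch below assumes that the change between states is summarized as a signed value per gene (for example, a log fold change) and that the correlation is computed within one state; the function name and the example numbers are hypothetical.

```python
def is_expected(change_a, change_b, correlation):
    """Classify a correlation as expected (True) or unexpected (False).

    change_a, change_b: signed change of each gene between the two states
                        (positive = upregulated, negative = downregulated)
    correlation:        within-state correlation coefficient of the two genes
    """
    if change_a == 0 or change_b == 0 or correlation == 0:
        raise ValueError("the rule needs a non-zero direction and a non-zero correlation")
    same_direction = (change_a > 0) == (change_b > 0)
    positive_correlation = correlation > 0
    # Expected: same direction with positive correlation,
    # or opposite directions with negative correlation.
    return same_direction == positive_correlation


# Both genes upregulated and positively correlated: expected.
print(is_expected(1.5, 2.0, 0.8))    # True
# Opposite regulation but positive correlation: unexpected.
print(is_expected(1.5, -1.0, 0.7))   # False
```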
Figure 4. (A) Gene 2 and gene 7 correlate with each other in both normal and disease conditions, but the correlation coefficients have opposite signs. (B) In the normal condition, gene 4 and gene 5 are not correlated, but they become positively correlated when the biological system transitions to disease. (C) Example visualization of a network transitioning between normal and disease conditions. Red lines represent positive correlations, blue lines represent negative correlations, and dotted gray lines represent correlations that are absent in one condition but appear strongly in the other (in this case, as positive correlations).
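One common way to test whether a correlation differs between two conditions, as in panels (A) and (B), is to compare Fisher z-transformed coefficients. The Python sketch below illustrates that general test; it is not necessarily the procedure used by the differential co-expression tools in the table, and the correlation values and sample sizes are hypothetical.

```python
from math import atanh, erfc, sqrt


def differential_correlation(r_normal, n_normal, r_disease, n_disease):
    """Two-sided test for a difference between two independent correlations.

    Returns (z, p). Each sample size must exceed 3 and each |r| must be below 1.
    """
    standard_error = sqrt(1 / (n_normal - 3) + 1 / (n_disease - 3))
    z = (atanh(r_disease) - atanh(r_normal)) / standard_error
    p = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p


# Panel (A): the sign of the correlation flips between conditions.
z_flip, p_flip = differential_correlation(0.8, 20, -0.8, 20)
# Panel (B): no correlation in normal, positive correlation in disease.
z_gain, p_gain = differential_correlation(0.0, 20, 0.8, 20)

print(f"sign flip: z = {z_flip:.2f}, p = {p_flip:.2g}")
print(f"gained:    z = {z_gain:.2f}, p = {p_gain:.2g}")
```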
Figure 5. Data integration for inter-omics network. (A) Networks are constructed from different data types (eg, network 1 is a genetic interaction network and network 2 is an mRNA coexpression network). The two networks can then be integrated into one by merging the nodes that correspond to each other (eg, gene 3 and its transcript mRNA 3 are merged into one node). (B) In another type of integration, links are created between nodes based on different kinds of evidence of interaction, either an experimentally demonstrated relationship (eg, knockout of gene 1 altered the expression level of mRNA13) or a statistical association between features of two nodes (eg, gene 5 and mRNA45).
Figure 6. Network interrogation. (A) Densely connected subnetworks (modules) are detected, and the functions enriched in those modules are identified. (B) Genes with unknown function (gray) can be annotated based on the functions of their neighbors in the network or of the genes in the same module. (C) New gene ontologies can be generated by analyzing the hierarchical organization of gene clusters. (D) Multiple data types can be integrated to help infer the direction of regulation and identify key regulators based on their network topological features. (E) Crosstalk between pathways can be studied by extracting eigengenes or by analyzing enriched interactions between networks. Key regulators of pathway crosstalk can also be identified based on their between-module topological properties.
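Panel (B) describes annotation by neighborhood, often called guilt by association. A minimal Python sketch of the simplest version, a majority vote over annotated neighbors, is shown below. It is an illustrative stand-in rather than the algorithm of any tool in the table, and the gene names and functions are hypothetical.

```python
from collections import Counter


def predict_function(gene, neighbors, annotations):
    """Predict a gene's function as the most frequent function among its
    annotated network neighbors. Returns None if no neighbor is annotated."""
    votes = Counter(
        annotations[neighbor]
        for neighbor in neighbors.get(gene, ())
        if neighbor in annotations
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical network: geneU is unannotated and has four neighbors.
neighbors = {
    "geneU": ["gene1", "gene2", "gene3", "gene4"],
    "geneV": [],
}
annotations = {
    "gene1": "immune response",
    "gene2": "immune response",
    "gene3": "cell cycle",
    # gene4 is itself unannotated and does not vote
}

print(predict_function("geneU", neighbors, annotations))  # immune response
print(predict_function("geneV", neighbors, annotations))  # None
```

A vote over all genes in the same module works the same way, with module membership replacing the neighbor list.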
Figure 7. Transkingdom network resulting from network analysis. The transkingdom network includes microbial genes (red) and host (mouse) genes (green). A key regulator, LasR (yellow), is identified as a gene within the top 1% of bipartite betweenness centrality. Two microbial gene subnetworks, indicated by blue circles, are enriched with genes from Pseudomonas aeruginosa and Escherichia coli.
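The caption does not define bipartite betweenness centrality. The Python sketch below assumes one plausible definition: for each node, the sum, over all pairs made of one node from each group (eg, one microbial gene and one host gene), of the fraction of shortest paths between the pair that pass through the node. The network is assumed to be undirected and unweighted, and the node names are hypothetical.

```python
from collections import deque


def bfs_counts(adj, source):
    """Distance to, and number of shortest paths to, every node reachable from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u lies on a shortest path to w
                sigma[w] += sigma[u]
    return dist, sigma


def bipartite_betweenness(adj, group_a, group_b):
    """Betweenness restricted to shortest paths from group_a nodes to group_b nodes."""
    info = {v: bfs_counts(adj, v) for v in adj}
    score = {v: 0.0 for v in adj}
    for s in group_a:
        dist_s, sigma_s = info[s]
        for t in group_b:
            if t == s or t not in dist_s:
                continue
            for v in adj:
                if v == s or v == t or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                # v is on a shortest s-t path if the two legs add up exactly
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


# Hypothetical network: every microbe-host path runs through "hub".
adj = {
    "microbe1": ["hub"],
    "microbe2": ["hub"],
    "hub": ["microbe1", "microbe2", "host1", "host2"],
    "host1": ["hub"],
    "host2": ["hub"],
}
scores = bipartite_betweenness(
    adj, group_a=["microbe1", "microbe2", "hub"], group_b=["host1", "host2"]
)
top_node = max(scores, key=scores.get)
print(top_node, scores[top_node])
```

Candidate key regulators would then be the nodes in the upper tail of this score, for example the top 1% mentioned in the caption.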