Peter Langfelder, Steve Horvath.
Abstract
BACKGROUND: Correlation networks are increasingly being used in bioinformatics applications. For example, weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across microarray samples. Weighted correlation network analysis (WGCNA) can be used for finding clusters (modules) of highly correlated genes, for summarizing such clusters using the module eigengene or an intramodular hub gene, for relating modules to one another and to external sample traits (using eigengene network methodology), and for calculating module membership measures. Correlation networks facilitate network-based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets. These methods have been successfully applied in various biological contexts, e.g. cancer, mouse genetics, yeast genetics, and analysis of brain imaging data. While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial.
Year: 2008 PMID: 19114008 PMCID: PMC2631488 DOI: 10.1186/1471-2105-9-559
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Glossary of WGCNA Terminology.
| Term | Definition |
| --- | --- |
| Co-expression network | We define co-expression networks as undirected, weighted gene networks. The nodes of such a network correspond to gene expression profiles, and edges between genes are determined by the pairwise correlations between gene expressions. By raising the absolute value of the correlation to a power |
| Module | Modules are clusters of highly interconnected genes. In an unsigned co-expression network, modules correspond to clusters of genes with high absolute correlations. In a signed network, modules correspond to positively correlated genes. |
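| Connectivity | For each gene, the connectivity (also known as degree) is defined as the sum of connection strengths with the other network genes: $k_i = \sum_{u \neq i} a_{iu}$, where $a_{iu}$ denotes the connection strength between genes $i$ and $u$. |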
| Intramodular connectivity | Intramodular connectivity measures how connected, or co-expressed, a given gene is with respect to the genes of a particular module. The intramodular connectivity may be interpreted as a measure of module membership. |
| Module eigengene | The module eigengene |
| Eigengene significance | When a microarray sample trait |
| Module Membership, also known as eigengene-based connectivity | For each gene, we define a "fuzzy" measure of module membership by correlating its gene expression profile with the module eigengene of a given module. For example, |
| Hub gene | This loosely defined term is used as an abbreviation of "highly connected gene." By definition, genes inside co-expression modules tend to have high connectivity. |
| Gene significance | To incorporate external information into the co-expression network, we make use of gene significance measures. Abstractly speaking, the higher the absolute value of |
| Module significance | Module significance is determined as the average absolute gene significance measure for all genes in a given module. When gene significance is defined as the correlation of gene expression profiles with an external trait |
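
The glossary entries for co-expression network, connectivity, and intramodular connectivity can be illustrated with a small sketch. It assumes the unsigned adjacency is the absolute Pearson correlation raised to a power; the power `beta=6`, the function names, and the toy expression values are illustrative assumptions, not values taken from this record.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def unsigned_adjacency(expr, beta=6):
    """Adjacency |cor(x_i, x_j)| ** beta for every gene pair.

    expr is a list of gene expression profiles (one list per gene).
    beta is an assumed soft-thresholding power chosen for illustration.
    The diagonal is left at zero so that sums skip self-connections.
    """
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj

def connectivity(adj, members=None):
    """Sum of connection strengths of each gene with the other genes.

    With members=None the whole network is used (whole-network
    connectivity); with a list of gene indices the sum is restricted
    to that module (intramodular connectivity).
    """
    idx = list(range(len(adj))) if members is None else list(members)
    return {i: sum(adj[i][u] for u in idx if u != i) for i in idx}

# Hypothetical toy data: 4 genes measured in 5 samples.
toy_expr = [
    [1, 2, 3, 4, 5],    # gene 0
    [2, 4, 6, 8, 10],   # gene 1: perfectly correlated with gene 0
    [5, 4, 3, 2, 1],    # gene 2: perfectly anti-correlated with gene 0
    [1, -1, 0, -1, 1],  # gene 3: uncorrelated with the others
]
toy_adj = unsigned_adjacency(toy_expr)
whole_network_k = connectivity(toy_adj)
intramodular_k = connectivity(toy_adj, members=[0, 1])
```

In this toy network the anti-correlated gene 2 is as strongly connected as gene 1, because the unsigned adjacency uses the absolute correlation; a signed network would treat it differently.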
Figure 1. Overview of WGCNA methodology. This flowchart presents a brief overview of the main steps of Weighted Gene Co-expression Network Analysis.
Figure 2. Network visualization plots. A. Log-log plot of whole-network connectivity distribution. The x-axis shows the logarithm of whole-network connectivity, and the y-axis shows the logarithm of the corresponding frequency distribution. On this plot the distribution approximately follows a straight line, which is referred to as approximately scale-free topology. B. Results of classical multidimensional scaling. Modules tend to form separate 'fingers' in this plot. Intramodular hub genes are located at the finger tips. C. Network heatmap plot. Branches in the hierarchical clustering dendrograms correspond to modules. Color-coded module membership is displayed in the color bars below and to the right of the dendrograms. In the heatmap, high co-expression interconnectedness is indicated by progressively more saturated yellow and red colors. Modules correspond to blocks of highly interconnected genes. Genes with high intramodular connectivity are located at the tip of the module branches since they display the highest interconnectedness with the rest of the genes in the module.
Figure 3. Module and eigengene network plots. A. Barplot of mean gene significance across modules. In this example we use a trait-based gene significance, Equation 2. The higher the mean gene significance in a module, the more significantly related the module is to the clinical trait of interest. B. Scatterplot of gene significance (y-axis) vs. module membership (x-axis) in the most significant module (green module, see panel A). In modules related to a trait of interest, genes with high module membership often also have high gene significance. C. Hierarchical clustering dendrogram of module eigengenes (labeled by their colors) and the microarray sample trait y. D. Heatmap plot of the adjacencies in the eigengene network including the trait y. Each row and column in the heatmap corresponds to one module eigengene (labeled by color) or the trait (labeled by y). In the heatmap, green color represents low adjacency (negative correlation), while red represents high adjacency (positive correlation).
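
The quantities plotted in panels A and B can be sketched as follows. The sketch takes gene significance as the correlation of each expression profile with the trait, module significance as the average absolute gene significance, and module membership as the correlation of each profile with the module eigengene, as the glossary describes. Treating the eigengene as the first principal component of the standardized module expression is an assumption drawn from the wider WGCNA literature, not stated in this record; the power-iteration routine, the function names, and the toy data are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardize(x):
    """Center a profile and scale it to unit sample standard deviation."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    return [(v - m) / sd for v in x]

def module_eigengene(expr, iters=200):
    """Approximate first principal component (across samples) by power iteration.

    Assumption: the eigengene is the leading principal component of the
    standardized module expression. Iteration starts from the mean
    standardized profile, so the sign follows the average expression;
    this start fails if that mean profile is exactly zero.
    """
    z = [standardize(g) for g in expr]  # genes x samples
    n_samples = len(z[0])
    e = [sum(g[s] for g in z) / len(z) for s in range(n_samples)]
    for _ in range(iters):
        # Apply Z^T Z: project each gene onto e, then map back to sample space.
        loadings = [sum(gv * ev for gv, ev in zip(g, e)) for g in z]
        e = [sum(l * g[s] for l, g in zip(loadings, z)) for s in range(n_samples)]
        norm = math.sqrt(sum(v * v for v in e))
        e = [v / norm for v in e]
    return e

# Hypothetical toy module: 3 positively correlated genes in 6 samples.
module_expr = [
    [1, 2, 3, 4, 5, 6],
    [2, 1, 4, 3, 6, 5],
    [1, 3, 2, 5, 4, 6],
]
trait = [10, 12, 13, 15, 16, 19]  # hypothetical sample trait y

gene_significance = [pearson(g, trait) for g in module_expr]
module_significance = sum(abs(v) for v in gene_significance) / len(gene_significance)
eigengene = module_eigengene(module_expr)
module_membership = [pearson(g, eigengene) for g in module_expr]
```

Plotting `gene_significance` against `module_membership` for the genes of one module gives a scatterplot of the kind described in panel B.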
Figure 4. Example WGCNA analysis of liver expression data in female mice. A. Gene dendrogram obtained by average linkage hierarchical clustering. The color row underneath the dendrogram shows the module assignment determined by the Dynamic Tree Cut. B. Heatmap plot of topological overlap in the gene network. In the heatmap, each row and column corresponds to a gene, light color denotes low topological overlap, and progressively darker red denotes higher topological overlap. Darker squares along the diagonal correspond to modules. The gene dendrogram and module assignment are shown along the left and top. C. Hierarchical clustering of module eigengenes that summarize the modules found in the clustering analysis. Branches of the dendrogram (the meta-modules) group together eigengenes that are positively correlated. D. Heatmap plot of the adjacencies in the eigengene network including the trait weight. Each row and column in the heatmap corresponds to one module eigengene (labeled by color) or weight. In the heatmap, green color represents low adjacency (negative correlation), while red represents high adjacency (positive correlation). Squares of red color along the diagonal are the meta-modules. E. A scatterplot of gene significance for weight (GS, Equation 2) versus module membership (MM, Equation 6) in the brown module. GS and MM exhibit a very significant correlation, implying that hub genes of the brown module also tend to be highly correlated with weight. F. The network of the 30 most highly connected genes in the brown module. In this network we only display a connection if the corresponding topological overlap is above a threshold of 0.08.
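
The edge filter described for panel F can be sketched as below. The topological overlap formula used here, shared-neighbor strength plus direct adjacency divided by the smaller connectivity plus one minus the adjacency, is the commonly cited definition from the WGCNA literature and is an assumption, since this record does not state it. The adjacency values, gene names, and function names are hypothetical; only the threshold of 0.08 comes from the caption.

```python
def topological_overlap(adj):
    """Topological overlap for a symmetric adjacency matrix with zero diagonal.

    Assumed formula (not given in this record):
      TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)
    where u runs over genes other than i and j, and k_i is the
    connectivity of gene i.
    """
    n = len(adj)
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom

def edges_above(tom, names, threshold=0.08):
    """List the gene pairs whose topological overlap exceeds the threshold."""
    n = len(tom)
    return [
        (names[i], names[j], tom[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if tom[i][j] > threshold
    ]

# Hypothetical adjacency matrix: three tightly connected genes and one outlier.
toy_names = ["g0", "g1", "g2", "g3"]
toy_adj = [
    [0.0, 0.8, 0.6, 0.05],
    [0.8, 0.0, 0.5, 0.02],
    [0.6, 0.5, 0.0, 0.01],
    [0.05, 0.02, 0.01, 0.0],
]
toy_tom = topological_overlap(toy_adj)
displayed_edges = edges_above(toy_tom, toy_names, threshold=0.08)
```

With these toy values the three tightly connected genes keep all of their mutual edges, while every edge to `g3` falls below the threshold and would not be drawn.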