Heiko Muller, Francesco Acquati.
Abstract
Meta-analysis of high-throughput gene expression data is often used for the interpretation of proprietary gene expression data sets. We have recently shown that co-occurrence patterns of gene expression in published cancer-related gene expression signatures are reminiscent of several cancer signaling pathways. Indeed, significant co-occurrence of up to ten genes in published gene expression signatures can be exploited to build a co-occurrence network from the sets of co-occurring genes ("co-occurrence modules"). Such a co-occurrence network is represented by an undirected graph in which single genes are assigned to vertices and an edge indicates that two genes co-occur significantly. Graph-cut methods can therefore be used to identify groups of highly interconnected vertices ("network communities") that correspond to sets of genes that are significantly co-regulated in human cancer. Here, we investigate the topological properties of co-occurrence networks derived from published gene expression signatures and show that co-occurrence networks are characterized by scale-free topology and hierarchical modularity. Furthermore, we report that genes with a "promiscuous" or a "faithful" co-occurrence pattern can be distinguished. This behavior is reminiscent of the date and party hubs that have been identified in protein-protein interaction networks.
Keywords: PubLiME; co-occurrence network; date hub; hierarchical modularity; party hub; scale-free network
Year: 2008 PMID: 19812777 PMCID: PMC2735950 DOI: 10.4137/bbi.s518
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Figure 1. The PubLiME co-occurrence network. A representation of the PubLiME co-occurrence network is shown. The Z-score cutoff during co-occurrence analysis was set to 5 and co-occurrence modules of size 3 were required to be present in at least 5 publications. Larger vertex degrees are visualized by larger vertex diameter. The gene with the largest vertex degree (CDKN3) is indicated by an arrow and vertex degrees of the five most connected genes are shown in white letters.
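The construction described in the caption (modules of a fixed size, kept only when enough publications contain them, then joined into an undirected graph) might be sketched as follows. This is a minimal illustration and not the PubLiME implementation: the function names and the input layout (one gene list per published signature) are assumptions, and the Z-score significance filter is deliberately left out because the record does not specify how it is computed.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(signatures, module_size=3, min_support=5):
    """Return an adjacency dict {gene: set of neighbouring genes}.

    signatures  -- iterable of gene collections, one per published signature
                   (hypothetical input layout).
    module_size -- number of genes per co-occurrence module.
    min_support -- minimal number of signatures a module must appear in.

    NOTE: the Z-score cutoff mentioned in the caption is NOT applied here.
    """
    support = Counter()
    for genes in signatures:
        # Sort so that the same gene set always yields the same tuple key.
        for module in combinations(sorted(set(genes)), module_size):
            support[module] += 1

    adjacency = {}
    for module, count in support.items():
        if count < min_support:
            continue
        # Every pair of genes inside a retained module becomes an edge.
        for a, b in combinations(module, 2):
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    return adjacency


def vertex_degrees(adjacency):
    """Number of distinct neighbours per gene."""
    return {gene: len(neighbours) for gene, neighbours in adjacency.items()}
```

Enumerating every gene triple in a long signature grows cubically with signature length, so a real analysis would likely need a frequent-itemset style search rather than this brute-force count.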
Z-score and Tchebyshev limit of P-values for observing these vertex degrees by chance.
| Gene | Z | P |
|---|---|---|
| CDKN3 | 7.351 | 0.019 |
| CDC2 | 5.334 | 0.035 |
| CCNB1 | 4.379 | 0.052 |
| LGALS1 | 4.273 | 0.055 |
| MYBL2 | 3.636 | 0.076 |
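
The P column above is numerically consistent with the two-sided Chebyshev bound P ≤ 1/Z², which holds for any distribution with finite variance. That reading is inferred from the tabulated values rather than stated in this record, so the sketch below should be taken as a plausibility check; the function name is hypothetical.

```python
def chebyshev_p_limit(z):
    """Chebyshev upper bound on P(|X - mean| >= z * sd), capped at 1."""
    if z <= 0:
        raise ValueError("z must be positive")
    return min(1.0, 1.0 / (z * z))


# Z-scores copied from the table above.
z_scores = {
    "CDKN3": 7.351,
    "CDC2": 5.334,
    "CCNB1": 4.379,
    "LGALS1": 4.273,
    "MYBL2": 3.636,
}

for gene, z in z_scores.items():
    print(f"{gene}: P <= {chebyshev_p_limit(z):.3f}")
```

Because the bound makes no distributional assumption, it is conservative: a Z-score of 7.351 still only guarantees P ≤ 0.019.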
Figure 2. PubLiME co-occurrence network topology.
A) The natural logarithm of the probability of observing a vertex in a given vertex degree category (k ≤ 5, 5 < k ≤ 10, 10 < k ≤ 15, 15 < k ≤ 20, 20 < k ≤ 25, 25 < k ≤ 30, 30 < k ≤ 35, 35 < k ≤ 40, 40 < k ≤ 45, 45 < k ≤ 50, k > 50) is plotted against the natural logarithm of vertex degree (black diamonds). The slope of the line fitted to these data by the least squares method, which is the scaling parameter of the scale-free model (grey triangles), is −2.19. The exponential model (grey squares) was obtained by fitting a line to the data in [k, ln(P(k))] linear-log space and is shown here in [ln(k), ln(P(k))] log-log space. exp: exponential model; sf: scale-free model.
B) Observed vertex degree distribution (black diamonds) in [k, P(k)] linear-linear space along with the predicted vertex degree distributions according to the scale-free (grey triangles) and the exponential models (grey squares) are shown.
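Panels A and B rest on two ordinary least-squares line fits: one in log-log space (scale-free model, ln P(k) = a + γ ln k) and one in linear-log space (exponential model, ln P(k) = a + b k). A minimal sketch of both fits follows. The degree values and probabilities in the example are synthetic, generated from an exact power law purely to show that the fit recovers the exponent; they are not the PubLiME data, and the choice of one representative degree per category is an assumption.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_scale_free(ks, ps):
    """Fit ln P(k) = a + gamma * ln k; returns (gamma, a)."""
    return fit_line([math.log(k) for k in ks], [math.log(p) for p in ps])


def fit_exponential(ks, ps):
    """Fit ln P(k) = a + b * k; returns (b, a)."""
    return fit_line(list(ks), [math.log(p) for p in ps])


# Synthetic example: one representative degree per category (assumed),
# with probabilities drawn from an exact power law with exponent -2.19.
ks = [2.5 + 5 * i for i in range(11)]
raw = [k ** -2.19 for k in ks]
total = sum(raw)
ps = [r / total for r in raw]

gamma, _ = fit_scale_free(ks, ps)
decay, _ = fit_exponential(ks, ps)
print(f"scale-free exponent: {gamma:.2f}")
```

Normalising the probabilities only shifts the intercept, so the fitted slope is unaffected by the normalisation constant.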
C) The natural logarithm of the average clustering coefficient of vertices with the same degree is plotted against the natural logarithm of vertex degrees. Only vertices with degree above 20 were analyzed. The slope of the line fitted to these data using the least squares method (the scaling parameter) is found to be −1.06.
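The quantity plotted in panel C, the average clustering coefficient of all vertices sharing a degree, can be computed from an adjacency structure as sketched below. The code uses the standard local clustering coefficient (the fraction of neighbour pairs that are themselves connected); the adjacency-dict layout and function names are assumptions, and the `min_degree` argument mirrors the caption's restriction to vertices with degree above 20.

```python
from collections import defaultdict
from itertools import combinations


def local_clustering(adjacency, vertex):
    """Fraction of a vertex's neighbour pairs that are directly connected."""
    neighbours = adjacency[vertex]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: undefined cases are reported as 0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return 2.0 * links / (k * (k - 1))


def mean_clustering_by_degree(adjacency, min_degree=0):
    """Map each degree k > min_degree to the mean clustering coefficient."""
    buckets = defaultdict(list)
    for vertex, neighbours in adjacency.items():
        k = len(neighbours)
        if k > min_degree:
            buckets[k].append(local_clustering(adjacency, vertex))
    return {k: sum(cs) / len(cs) for k, cs in sorted(buckets.items())}


# Toy graph: a triangle A-B-C with a pendant vertex D attached to C.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
by_degree = mean_clustering_by_degree(toy)
```

Fitting a least-squares line to ln C(k) against ln k over the returned pairs would then give the scaling parameter reported in the caption.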
D) The average clustering coefficient is shown for PubLiME co-occurrence networks derived for support 8, 7, 6, and 5. The support parameter indicates the minimal number of lists a module must be part of. The different support values cause the resulting networks to be of different sizes (number of vertices shown on the X-axis). Barabasi-Albert networks of equal size and degree distribution have been generated using the JUNG package random graph generator function for comparison purposes. The average clustering coefficient falls rapidly in Barabasi-Albert networks as network size grows. In PubLiME networks, the average clustering coefficient is stable.
Genes occurring most frequently in PubLiME.
| Gene | Occurrences | Vertex degree | Clustering coefficient | DAVID category | Benjamini P-value |
|---|---|---|---|---|---|
| CCND1 | 30 | 0 | | | |
| MYC | 28 | 39 | 0.095816464 | Cell cycle | 0.42 |
| TNFAIP3 | 26 | 14 | 0.142857143 | Apoptosis | 0.94 |
| VEGF | 25 | 9 | 0.333333333 | Signal transduction | 1 |
| CDKN1A | 25 | 0 | | | |
| FN1 | 25 | 0 | | | |
| IL8 | 25 | 0 | | | |
| CLU | 24 | 0 | | | |
| FOS | 24 | 0 | | | |
| IGFBP4 | 23 | 0 | | | |
Genes with highest co-occurrence network vertex degree.
| Gene | Occurrences | Vertex degree | Clustering coefficient | DAVID category | Benjamini P-value |
|---|---|---|---|---|---|
| CDKN3 | 19 | 77 | 0.136021873 | Cell cycle | 1.60E−18 |
| CDC2 | 16 | 58 | 0.24984876 | Cell cycle | 7.10E−31 |
| CCNB1 | 17 | 49 | 0.237244898 | M-phase | 3.60E−17 |
| LGALS1 | 20 | 48 | 0.083333333 | Immune response | 9.60E−03 |
| MYBL2 | 12 | 42 | 0.331010453 | Cell cycle | 6.10E−14 |
| MYC | 28 | 39 | 0.095816464 | Cell cycle | 0.42 |
| TK1 | 13 | 39 | 0.431848853 | Cell cycle | 5.80E−17 |
| TOP2A | 22 | 38 | 0.385490754 | Cell cycle | 2.90E−15 |
| CDC20 | 14 | 35 | 0.482352941 | Cell cycle | 1.20E−16 |
| TTK | 10 | 35 | 0.41512605 | Cell cycle | 5.00E−21 |