Tae-Min Kim, Yeun-Jun Chung, Mun-Gan Rhyu, Myeong Ho Jung.
Abstract
BACKGROUND: Gene clustering is widely used in microarray data analysis to group genes with similar expression patterns. Subsequent enrichment analysis using predefined gene sets can provide clues as to which functional themes or regulatory sequence motifs are associated with individual gene clusters. Despite their potential utility, gene clustering and enrichment analysis have been run on separate platforms, so developing an integrative algorithm that links the two methods is highly challenging.
Year: 2007 PMID: 18021416 PMCID: PMC2217565 DOI: 10.1186/1471-2105-8-453
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Schematic representation of the GSECA algorithm. A. The individual steps of functional clustering. For each functional gene set prepared from a public gene database (left), the Pearson correlation coefficient (PCC) is calculated for every pair of member genes. The distribution of the pairwise PCC values is shown as a histogram indicating how closely the member genes are correlated with each other (middle). The mean of the PCC values is taken as the expression coherence (EC), and its significance level is determined by gene permutation tests. Functional gene sets with significantly high expression coherence are then selected and grouped into functional clusters with similar expression patterns (right). B. The mean expression values of all genes belonging to a functional cluster are calculated as the seed values of that cluster (left). For every gene on the array, the similarity (PCC) to the seed values is calculated, and the genes are ordered by this similarity. The ordered gene list is then matched against regulatory motif gene sets, and the extent of enrichment (enrichment score, ES) is determined by the GSEA method (right).
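The steps in the Figure 1 caption can be illustrated with a minimal sketch. It is not the GSECA implementation: the function names, the toy `profiles` data, and the choice of null model (random gene sets of the same size drawn from all genes on the array) are assumptions made for illustration, since the caption only says that "gene permutation tests" are used. The final GSEA scoring step is omitted.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile has no defined correlation; treat as 0
    return cov / math.sqrt(vx * vy)


def expression_coherence(profiles, genes):
    """EC: mean PCC over all pairs of genes in the gene set."""
    pairs = list(itertools.combinations(genes, 2))
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def permutation_p_value(profiles, genes, n_perm=1000, seed=0):
    """Empirical P value of the observed EC.

    Assumed null model: random gene sets of the same size sampled
    from every gene in `profiles`.
    """
    rng = random.Random(seed)
    observed = expression_coherence(profiles, genes)
    all_genes = sorted(profiles)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(genes))
        if expression_coherence(profiles, sample) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def rank_by_seed(profiles, cluster_genes):
    """Order all genes by PCC with the cluster's mean (seed) profile."""
    n_samples = len(next(iter(profiles.values())))
    seed_profile = [
        sum(profiles[g][i] for g in cluster_genes) / len(cluster_genes)
        for i in range(n_samples)
    ]
    return sorted(
        profiles, key=lambda g: pearson(profiles[g], seed_profile), reverse=True
    )


# Hypothetical toy expression matrix: gene -> expression across 4 samples.
profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [1.5, 2.5, 3.5, 4.5],
    "g4": [4, 3, 2, 1],
    "g5": [1, 3, 2, 4],
    "g6": [2, 1, 4, 3],
}
gene_set = ["g1", "g2", "g3"]

ec, p_value = permutation_p_value(profiles, gene_set, n_perm=200)
ranked = rank_by_seed(profiles, gene_set)
print(ec, p_value, ranked)
```

The ranked list returned by `rank_by_seed` is the kind of input that a GSEA-style running-sum statistic would then score against each regulatory motif gene set.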
Figure 2. Functional clustering of murine myogenesis- and erythropoiesis-related functional gene sets. A. Thirty-one functional gene sets with significantly high expression coherence in the myogenesis-related expression profile are categorized into 4 functional clusters. For each functional gene set, the number of genes and the expression coherence are given in parentheses. Hierarchical clustering was used to measure the distances between functional gene sets, and those with similar expression patterns were grouped into functional clusters. The expression level of a functional gene set is the mean expression value of the genes belonging to that gene set, and it is illustrated schematically as a heat map with a gene set dendrogram. B. Three functional clusters composed of 18 functional gene sets are shown in the same way for the erythropoiesis-related expression profile.
List of regulatory motif gene sets significantly enriched in individual functional clusters
| Dataset | Functional cluster | Transcription factor (a) |
| Myogenesis | 1 | Sp-1 |
| | 2 | Arnt, SREBP-1, Sp-1, MyoD, E2A, USF |
| | 3 | Sp-1, USF, LBP-1, Myc |
| | 4 | NRF-1, E2F, ATF/CREB, ETF, NF-Y, GABP, Elk-1, ZF5 |
| Erythropoiesis | 1 | SREBP-1, USF, GATA-1 |
| | 2 | AP1 |
| | 3 | NF-Y, NRF-1, ATF/CREB, E2F, Arnt, Tel-2, Egr-3, Myc, ETF, Sp-1, GABP, YY1, HIF-1, Elk-1, ZF5 |
(a) Significantly enriched (P < 0.05, Bonferroni corrected) regulatory motif gene sets are shown for the corresponding transcription factors. When more than one regulatory motif gene set corresponding to a single transcription factor was identified, only the most significant one is listed, to account for the redundancy of regulatory motif gene sets. Transcription factors are listed in order of the significance level of enrichment within each functional cluster.
List of putative synergistic motif pairs
| Dataset | Motif 1 (gene size/EC) (a) | Motif 2 (gene size/EC) | Gene size (b) | EC | Significance (c) |
| Myogenesis | Arnt (694/0.0020) | SREBP-1 (839/0.0024) | 382 | 0.0086 | 0.02 |
| | Sp-1 (3178/0.0006) | MyoD (696/0.0082) | 318 | 0.0201 | < 0.01 |
| | Sp-1 (3178/0.0006) | E2A (906/0.0050) | 444 | 0.0043 | < 0.01 |
| Erythropoiesis | SREBP-1 (1126/0.0171) | USF (372/0.0053) | 839 | 0.0348 | < 0.01 |
(a) The two regulatory motif gene sets are shown as motif 1 and motif 2, with the number of genes and the expression coherence (EC) of each gene set. (b) For each motif pair, the genes that occur in both regulatory motif gene sets are measured separately for gene number and expression coherence. (c) The significance level was determined by permutation tests. Only motif pairs with significant expression coherence (P < 0.05) are listed.
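The motif-pair measurement described in the footnotes can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function names and toy data are hypothetical, and the null model (random gene sets of the same size as the overlap, drawn from the union of the two motif gene sets) is an assumed choice, because the source states only that permutation tests were used.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # flat profile: correlation undefined, treated as 0
    return cov / math.sqrt(vx * vy)


def expression_coherence(profiles, genes):
    """EC: mean pairwise PCC among the given genes."""
    pairs = list(itertools.combinations(genes, 2))
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def motif_pair_coherence(profiles, motif1, motif2, n_perm=500, seed=0):
    """Gene size, EC, and empirical P value for genes shared by two motif sets.

    Assumed null model: same-size gene sets sampled from the union of
    the two motif gene sets.
    """
    shared = sorted(set(motif1) & set(motif2))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes to compute EC")
    pool = sorted(set(motif1) | set(motif2))
    observed = expression_coherence(profiles, shared)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(pool, len(shared))
        if expression_coherence(profiles, sample) >= observed:
            hits += 1
    return {
        "gene_size": len(shared),
        "ec": observed,
        "p": (hits + 1) / (n_perm + 1),
    }


# Hypothetical toy data: gene -> expression across 4 samples.
profiles = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 1, 3, 2],
    "d": [2, 3, 1, 4],
    "e": [3, 1, 4, 2],
    "f": [1, 4, 2, 3],
}
motif1 = {"a", "b", "c", "d"}  # genes carrying hypothetical motif 1
motif2 = {"a", "b", "e", "f"}  # genes carrying hypothetical motif 2

result = motif_pair_coherence(profiles, motif1, motif2)
print(result)
```

A pair would be reported as putatively synergistic when the EC of the shared genes is significant at P < 0.05, which corresponds to the listing criterion given in the table footnote.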
Figure 3. Statistical comparison of GSECA results with the conventional strategy. A. Gene clustering and enrichment analysis were performed for the 7 functional gene sets corresponding to functional cluster 2 of the myogenesis dataset. K-means and SOM clustering were performed with 16 different settings for the proportion of genes to be clustered (5 – 50%) and the number of clusters (5 – 100). The significance levels (Y-axis) are shown as colored lines corresponding to the 16 settings (listed in the box below). For comparison, the unadjusted significance levels (nominal P values) of the GSECA algorithm are shown as asterisks. B. The significance levels for 2 functional gene sets of erythropoiesis are calculated in the same way and compared with the GSECA results. C and D. The comparison results for the 6 and 3 regulatory motif gene sets with significant enrichment in functional cluster 2 of myogenesis (C) and functional cluster 1 of erythropoiesis (D) are shown in the same way.