Atif Khan1, Dejan Katanic1, Juilee Thakar2,3,4.
Abstract
BACKGROUND: Despite advances in gene-set enrichment analysis methods, inadequate definitions of gene-sets remain a major limitation in the discovery of novel biological processes from transcriptomic datasets. Typically, gene-sets are obtained from publicly available pathway databases, which contain generalized definitions frequently derived by manual curation. Recently, unsupervised clustering algorithms have been proposed to identify gene-sets from transcriptomic datasets deposited in the public domain. These data-driven gene-set definitions can be context-specific, revealing novel biological mechanisms. However, the previously proposed algorithms for identifying data-driven gene-sets are based on hard clustering, which does not allow overlap across clusters, a characteristic that is predominantly observed across biological pathways.
Keywords: Dendritic cells; Epithelial cells; Gene-gene mutual information; Gene-sets; Influenza infections; Overlapping gene-sets
Year: 2017 PMID: 28587632 PMCID: PMC5461682 DOI: 10.1186/s12859-017-1669-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Schematic representation of the FIGS pipeline. The context-specific datasets obtained from public repositories were integrated as described in [20]. Fuzzy c-means (FCM) clustering is performed on the gene-gene mutual information matrix. Gene-sets obtained from optimized FCM clustering were compared with KEGG pathways for validation, and multi-functional genes connecting different gene-sets were identified.
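The caption does not say how the gene-gene mutual information was estimated. As a rough illustration only, the sketch below assumes a simple histogram (binned) estimator on a genes-by-samples expression array; the function names, the `bins` parameter, and the toy data are hypothetical and are not taken from the FIGS pipeline.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram-based mutual information (in nats) between two expression profiles."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                # joint probability table
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, bins)
    nz = pxy > 0                             # skip empty cells (0 * log 0 = 0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mi_matrix(expr, bins=8):
    """Symmetric gene-by-gene mutual information matrix for a genes x samples array."""
    n_genes = expr.shape[0]
    M = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i, n_genes):
            M[i, j] = M[j, i] = mutual_information(expr[i], expr[j], bins)
    return M

# Toy data: 5 hypothetical genes measured in 40 samples.
rng = np.random.default_rng(0)
toy_expr = rng.normal(size=(5, 40))
toy_mi = mi_matrix(toy_expr)
```

Each row of such a matrix can then serve as the feature vector of one gene for clustering.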
Fig. 2 Optimization of FCM parameters. a Average membership value (y-axis) per cluster with increasing fuzziness (x-axis); b average number of genes per cluster (y-axis) for increasing fuzziness (x-axis) and four cluster association criteria; c 50 trials conducted with random initial assignment of the centroids yielded only 16% reproducible clusters; d objective function values for FCM clustering with initial centroids assigned randomly and by Ward’s method (red line) under fuzziness 1.1, 1.2 and 1.3, respectively. Ward-based initialization converged more rapidly and produced a stable and robust clustering solution.
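To make the roles of the fuzziness parameter and the initial centroids concrete, here is a minimal sketch of the standard fuzzy c-means updates. It is not the authors' implementation: the function names, the Euclidean distance, the stopping tolerance, and the toy data are assumptions, and the initial centroids are passed in explicitly (in the paper's setting they would come from Ward's hierarchical clustering rather than being hand-picked).

```python
import numpy as np

def fcm_memberships(X, centroids, m):
    """Membership of every row of X in every cluster, for fuzziness m > 1."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)                 # guard against zero distances
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm(X, init_centroids, m=1.2, n_iter=100, tol=1e-9):
    """Alternate membership and centroid updates; return (U, centroids, objective history)."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    history = []
    for _ in range(n_iter):
        W = fcm_memberships(X, centroids, m) ** m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        history.append(float((W * d2).sum()))    # weighted within-cluster scatter
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return fcm_memberships(X, centroids, m), centroids, history

# Toy data: two well-separated groups and hand-picked starting centroids.
toy_X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
toy_init = np.array([[1.0, 1.0], [9.0, 9.0]])
U, C, hist = fcm(toy_X, toy_init, m=1.2)
```

Lower fuzziness pushes memberships toward 0 or 1, while higher fuzziness spreads each gene over more clusters, which is the trade-off panels a and b describe. The recorded objective values correspond to the kind of convergence curve shown in panel d.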
Fig. 3 Overlap observed among KEGG pathways and FCM gene-sets. The overlap among KEGG pathways is represented by a heat-map (a) and a circular graph (b), and the overlap among DC FCM gene-sets by a heat-map (c) and a circular graph (d). The color scale ranging from blue to yellow in the heat-maps (a, c) and the increasing width of the arcs (b, d) correspond to low to high numbers of overlapping genes across pairs of clusters.
Fig. 4 Validation of DC FCM gene-sets. a The enrichment of KEGG pathways and ISGs in DC FCM gene-sets; five colors ranging from blue to yellow represent –log10 (p-value) ≤1.30, >1.30 and ≤3, >3 and ≤4, >4 and ≤5, and >5, calculated by hypergeometric test; b circular graph representing overlap between the DC FCM gene-sets; c number of genes in DC FCM gene-sets; and d membership values of the genes in DC36 and DC45, with the genes overlapping between DC36 and DC45 circled in red.
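The enrichment score in panel a can be illustrated with a small, self-contained sketch. It assumes a one-sided (upper-tail) hypergeometric test on the overlap between a gene-set and a pathway, and maps the resulting –log10 (p-value) to the five bins named in the caption; the function names and the numeric bin labels 0 to 4 are hypothetical.

```python
from math import comb, log10

def hypergeom_enrichment_p(n_universe, n_pathway, n_set, n_overlap):
    """P(overlap >= n_overlap) when n_set genes are drawn without replacement
    from n_universe genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_set)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_set - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_set)

def score_bin(p):
    """Bin index 0..4 for -log10(p), using the thresholds 1.30, 3, 4 and 5."""
    if p <= 0:
        return 4                      # p underflowed to zero: strongest bin
    score = -log10(p)
    for idx, threshold in enumerate((1.30, 3, 4, 5)):
        if score <= threshold:
            return idx
    return 4

# Toy numbers: 10 genes in total, 5 in the pathway, a gene-set of 5 that
# contains all 5 pathway genes.
p_toy = hypergeom_enrichment_p(10, 5, 5, 5)
bin_toy = score_bin(p_toy)
```

The first threshold, 1.30, corresponds to p = 0.05, so bin 0 collects the gene-set and pathway pairs that are not significant at that level.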
Comparison of multifunctional genes from FCM gene-sets and KEGG pathways. Multifunctional genes that belonged to at least 3 FCM DC gene-sets also overlapped between KEGG pathways
| Multifunctional genes | No. of pathways | No. of FCM DC clusters | Enriched pathway names | FCM cluster |
|---|---|---|---|---|
| NFATC4 | 5 | 5 | MAPK_SIGNALING, VEGF_SIGNALING, NATURAL_KILLER_CELL_MEDIATED_CYTOTOXICITY, T_CELL_RECEPTOR_SIGNALING, B_CELL_RECEPTOR_SIGNALING | 34,35,37,43,45 |
| CCL23 | 2 | 4 | CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION, CHEMOKINE_SIGNALING_PATHWAY | 17,18,39,40 |
| GAB2 | 2 | 4 | FC_EPSILON_RI_SIGNALING, FC_GAMMA_R_MEDIATED_PHAGOCYTOSIS | 13,16,19,31 |
| IL21R | 2 | 4 | CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION, JAK_STAT_SIGNALING | 7,8,20,24 |
| VASP | 2 | 4 | FOCAL_ADHESION, FC_GAMMA_R_MEDIATED_PHAGOCYTOSIS | 1,4,10,50 |
| ANAPC1 | 2 | 3 | CELL_CYCLE, UBIQUITIN_MEDIATED_PROTEOLYSIS | 7,8,29 |
| ASAP1 | 2 | 3 | ENDOCYTOSIS, FC_GAMMA_R_MEDIATED_PHAGOCYTOSIS | 30,31,39 |
| CCND2 | 3 | 3 | CELL_CYCLE, FOCAL_ADHESION, JAK_STAT_SIGNALING | 7,14,29 |
| CD80 | 4 | 3 | CELL_ADHESION_MOLECULES_CAMS, TOLL_LIKE_RECEPTOR_SIGNALING_PATHWAY, INTESTINAL_IMMUNE_NETWORK_FOR_IGA_PRODUCTION, ISGs | 36,45,46 |
| CDC16 | 2 | 3 | CELL_CYCLE, UBIQUITIN_MEDIATED_PROTEOLYSIS | 8,14,29 |
| CDK4 | 2 | 3 | CELL_CYCLE, T_CELL_RECEPTOR_SIGNALING | 23,33,44 |
| DNM2 | 2 | 3 | ENDOCYTOSIS, FC_GAMMA_R_MEDIATED_PHAGOCYTOSIS | 3,12,31 |
| EP300 | 3 | 3 | JAK_STAT_SIGNALING, CELL_CYCLE, TGF_BETA_SIGNALING | 3,4,50 |
| HSPB1 | 2 | 3 | MAPK_SIGNALING, VEGF_SIGNALING | 5,25,50 |
| IL1R2 | 3 | 3 | MAPK_SIGNALING, CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION, HEMATOPOIETIC_CELL_LINEAGE | 6,21,22 |
| ITGAV | 2 | 3 | FOCAL_ADHESION, CELL_ADHESION_MOLECULES_CAMS | 2,25,50 |
| MAP3K1 | 3 | 3 | MAPK_SIGNALING_PATHWAY, UBIQUITIN_MEDIATED_PROTEOLYSIS, RIG_I_LIKE_RECEPTOR_SIGNALING | 1,27,50 |
| POLR1C | 2 | 3 | RNA_POLYMERASE, CYTOSOLIC_DNA_SENSING | 3,31,33 |
| PPP3CB | 6 | 3 | MAPK_SIGNALING, APOPTOSIS, VEGF_SIGNALING, NATURAL_KILLER_CELL_MEDIATED_CYTOTOXICITY, T_CELL_RECEPTOR_SIGNALING, B_CELL_RECEPTOR_SIGNALING | 16,28,29 |
| RPS6KB1 | 4 | 3 | ERBB_SIGNALING, MTOR_SIGNALING, TGF_BETA_SIGNALING, FC_GAMMA_R_MEDIATED_PHAGOCYTOSIS | 16,23,28 |
| TNFRSF1A | 3 | 3 | MAPK_SIGNALING, CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION, APOPTOSIS | 5,25,50 |
| WAS | 2 | 3 | CHEMOKINE_SIGNALING, FC_GAMMA_R_MEDIATED_PHAGOCYTOSIS | 23,28,44 |
| PIK3R1 | 14 | 3 | T_CELL_RECEPTOR_SIGNALING, B_CELL_RECEPTOR_SIGNALING, TOLL_LIKE_RECEPTOR_SIGNALING and 11 others | 7,8,24 |
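The selection rule behind the table (genes assigned to at least 3 FCM gene-sets) can be sketched as a simple count over overlapping gene-sets. The function name and the `min_sets` parameter are hypothetical. The toy input uses only the cluster assignments of NFATC4 and CD80 listed in the table, plus a made-up gene, `GENE_X`, that sits in a single cluster.

```python
from collections import defaultdict

def multifunctional_genes(gene_sets, min_sets=3):
    """Map each gene found in at least `min_sets` gene-sets to its sorted cluster labels."""
    membership = defaultdict(list)
    for label, genes in gene_sets.items():
        for gene in genes:
            membership[gene].append(label)
    return {
        gene: sorted(labels)
        for gene, labels in membership.items()
        if len(labels) >= min_sets
    }

# Toy subset of DC FCM clusters; real clusters contain many more genes.
toy_sets = {
    34: {"NFATC4"},
    35: {"NFATC4"},
    36: {"CD80", "GENE_X"},   # GENE_X is invented to show the filter
    37: {"NFATC4"},
    43: {"NFATC4"},
    45: {"NFATC4", "CD80"},
    46: {"CD80"},
}
result = multifunctional_genes(toy_sets)
```

With hard clustering every gene would appear in exactly one cluster, so this count is only informative when clusters are allowed to overlap.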
Fig. 5 The temporal expression of the gene-sets enriched with KEGG pathways. Mean temporal expression of gene-sets significantly enriched (p < 0.01) with a ISGs (DC21, DC26, DC36 and DC45), b Toll-like receptor signaling pathway (DC18, DC37 and DC46) and c MAPK signaling pathway (DC6, DC10 and DC27)
Fig. 6 Comparison of FCM with hard clustering methods. a Number of genes overlapping between FCM gene-sets and k-means with Ward’s initialization (bottom), and Ward’s hierarchical clustering (top), and b the enrichment of ISGs and KEGG pathways by Fisher's exact test in clusters identified by k-means, hierarchical and FCM methods
Fig. 7 Application of the FCM pipeline to the EC dataset. a The enrichment of KEGG pathways and ISGs in EC FCM gene-sets; five colors ranging from blue to yellow represent –log10 (p-value) ≤1.30, >1.30 and ≤3, >3 and ≤4, >4 and ≤5, and >5, calculated by hypergeometric test; b circular graph representing overlap between the EC FCM gene-sets; c number of genes in 50 EC FCM gene-sets; and d Venn diagram representing the number of genes overlapping between at least two FCM gene-sets in DC, EC, and KEGG/ISGs pathways