Vinay Randhawa, Vishal Acharya.
Abstract
BACKGROUND: Oral squamous cell carcinoma (OSCC) is associated with substantial mortality and morbidity, but it can be difficult to detect at its earliest stage because of its molecular complexity and clinical behavior. Identifying key gene signatures at an early stage would therefore be highly helpful.
Year: 2015 PMID: 26179909 PMCID: PMC4502639 DOI: 10.1186/s12920-015-0114-0
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 The steps involved in systems-level analysis of data on oral squamous cell carcinoma (OSCC). a Microarray data collection and preprocessing of experiments to identify differentially expressed genes (DEGs). b Construction of the OSCC network and identification of an OSCC stage-associated module and of cancer hub genes. c Development and testing of a key hub gene-based classifier model by 5-fold cross-validation
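The 5-fold cross-validation in panel c can be illustrated with a minimal sketch of how samples might be split into folds. The stratified, round-robin assignment and the function name `stratified_k_fold` are assumptions made for illustration; the source does not state how the folds were built, and the sample counts below are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with similar class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled samples of each class round-robin into the k folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


# Hypothetical labels: 1 = tumor, 0 = normal (counts are illustrative only).
labels = [1] * 20 + [0] * 10
splits = list(stratified_k_fold(labels, k=5))
```

Each sample appears in exactly one test fold, so a classifier trained on the other four folds is always evaluated on samples it has not seen.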
A list of Affymetrix datasets used in the study
| Dataset identifier | Initial number of samples (Tumor + Normal) | Samples (Tumor + Normal) left after initial preprocessing | Affymetrix platform | Reference |
|---|---|---|---|---|
| GSE31056 | 47 (23 + 24) | 47 (23 + 24) | HGU-133plus2 | [ |
| GSE9844 | 38 (26 + 12) | 38 (26 + 12) | HGU-133plus2 | [ |
| GSE30784 | 212 (167 + 45) | 210 (165 + 45) | HGU-133plus2 | [ |
| GSE3524 | 20 (16 + 4) | 20 (16 + 4) | HGU-133a | [ |
| GSE42743 | 103 (74 + 29) | 100 (73 + 27) | HGU-133plus2 | [ |
| GSE2280 | 27 (22 + 5) | 27 (22 + 5) | HGU-133a | [ |
| GSE6791 | 44 (30 + 14) | 44 (30 + 14) | HGU-133plus2 | [ |
Fig. 2 A multidimensional scaling (MDS) plot of the merged gene expression data. a Without removal of the batch effect, all samples cluster by experiment and by platform (not by the biological variable of interest) in the MDS space. b With intra-platform batch adjustment, the samples are intermingled on the basis of the biological variable. All samples are color coded by biological variable (normal: red, cancer: green), with different symbols corresponding to different studies
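To make the idea of a batch effect concrete, the sketch below removes a study-specific offset from one gene by per-batch mean centering. This is a deliberately simple stand-in chosen for illustration, not the adjustment method used in the study (which this excerpt does not name); the function name `center_by_batch` and all values are hypothetical.

```python
from collections import defaultdict


def center_by_batch(values, batches):
    """Subtract each batch's mean from its own samples (one gene at a time)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, b in zip(values, batches):
        sums[b] += v
        counts[b] += 1
    means = {b: sums[b] / counts[b] for b in sums}
    return [v - means[b] for v, b in zip(values, batches)]


# Hypothetical expression values of one gene in two studies with different offsets.
expr = [5.0, 7.0, 9.0, 11.0]
batch = ["study_A", "study_A", "study_B", "study_B"]
adjusted = center_by_batch(expr, batch)
```

After centering, both studies share the same scale for this gene. Plain centering can also erase real biological differences when the tumor-to-normal ratio differs between studies, which is one reason dedicated batch-adjustment methods exist.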
Fig. 3 Module assignments for the expression data on oral squamous cell carcinoma (OSCC). a A gene dendrogram constructed by average linkage hierarchical clustering. The color row underneath the cluster tree shows the module assignment produced by the dynamic tree cut method. b The Z-summary statistic (y-axis) of the original data modules against 100 random samples, plotted as a function of module size. Each circle represents a module labeled by a color and module name. The dashed red line denotes a significance threshold (Z = 10)
Fig. 4 Analysis of expression data on oral squamous cell carcinoma (OSCC) in the WGCNA (Weighted Gene Correlation Network Analysis) software. Suitability of the pink module is clearly visible. a A heatmap of module eigengenes (MEs) and correlations, where each row represents a module (labeled by color) and each column represents a trait. The value at the top of each square is Pearson’s correlation coefficient between the ME and the trait, with the associated p-value in parentheses. Red and blue represent a strong positive and a strong negative correlation, respectively, between an ME and a trait. b Module significance (MS) of all modules, with pink at the top of the plot, indicating that expression profiles of the pink module are strongly associated with the trait. c Analysis of topological robustness of the pink module: the size of the largest component, σ(ρ), is plotted against the fraction ρ of vertices (nodes) removed by simultaneous node deletion. The results indicate network robustness. d A plot of gene significance (GS) against scaled connectivity (K), where each point (“darkgolden” and “darkcyan”) corresponds to a gene in the pink module. Intramodular connectivity correlated significantly with gene significance (r = 0.36, p = 8.3 × 10⁻⁵). All large labeled nodes (GS > 0.2 and K > 0.3) are the identified hubs. Among these, darkgolden nodes represent hubs with the strongest correlation with the phenotype (GS > 0.2, K > 0.3, and frequency f > 675); these hubs are the “key hub genes”
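The hub criterion in panel d can be sketched as follows, assuming that scaled connectivity is a gene's summed edge weight within the module divided by the largest such sum. The adjacency matrix, the GS values, and the function name `scaled_connectivity` are hypothetical; only the thresholds (GS > 0.2, K > 0.3) come from the caption.

```python
def scaled_connectivity(adjacency):
    """Return each node's connectivity divided by the maximum connectivity."""
    n = len(adjacency)
    # Connectivity of node i: sum of its edge weights to all other nodes.
    k = [sum(adjacency[i][j] for j in range(n) if j != i) for i in range(n)]
    k_max = max(k)
    return [ki / k_max for ki in k]


# Hypothetical symmetric weighted adjacency matrix for four genes.
adjacency = [
    [1.0, 0.8, 0.6, 0.1],
    [0.8, 1.0, 0.5, 0.2],
    [0.6, 0.5, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
K = scaled_connectivity(adjacency)
gene_significance = [0.40, 0.35, 0.15, 0.05]  # hypothetical GS values

# A gene counts as a hub only if it passes both thresholds from the caption.
hubs = [i for i in range(len(K)) if gene_significance[i] > 0.2 and K[i] > 0.3]
```

In this toy module the third gene is well connected but fails the GS threshold, which shows why the two criteria are applied together rather than connectivity alone.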
Fig. 5 Visualization of hub genes in the pink module network. All gene-to-gene correlations in the pink module were selected, and the network was visualized with the Cytoscape software. Edge (grey) width is proportional to the weight of the correlation between two genes. All large labeled nodes are the identified hubs (gene significance [GS] > 0.2 and scaled connectivity [K] > 0.3), whereas darkgolden nodes represent hubs that show the strongest correlations with the phenotype (these are the “key hub genes”)
Fig. 6 Significantly enriched pathways among the hub genes. In the two-way evidence plot of signaling-pathway impact analysis (SPIA), each pathway is represented by one dot. Pathways to the right of the red oblique line (red dots) are statistically significant at the 1 % threshold after Bonferroni correction of the global p-values (PG), which are obtained by combining (by Fisher’s method) the over-representation of differentially expressed genes (DEGs) in a given pathway (PNDE) and the abnormal perturbation of the pathway (PPERT). Pathways to the right of the blue oblique line (blue dots) are statistically significant after false discovery rate (FDR) correction of PG
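The combination step named in the caption can be sketched for two p-values. For two independent p-values with product c, Fisher's method reduces to the closed form c − c·ln(c), because the test statistic −2·ln(c) follows a chi-square distribution with 4 degrees of freedom. The function names and the three example pathways are hypothetical; the 1 % Bonferroni threshold is taken from the caption.

```python
import math


def combine_fisher(p_nde, p_pert):
    """Fisher's method for two independent p-values (chi-square with 4 d.f.)."""
    c = p_nde * p_pert
    if c == 0.0:
        return 0.0
    return c - c * math.log(c)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical (PNDE, PPERT) pairs for three pathways.
pairs = [(0.001, 0.01), (0.2, 0.5), (1.0, 1.0)]
p_global = [combine_fisher(a, b) for a, b in pairs]
significant = [p < 0.01 for p in bonferroni(p_global)]
```

Only the first pathway, where both lines of evidence are strong, stays below the threshold after correction.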
Categories of functionally enriched gene ontology (GO) biological processes (BPs) in the pink module, which is the cancer-associated module. Hub genes from this module are shown in the table
| Representative GO term | BP ID | Frequency ( | Hypergeometric p | Genes in GO category |
|---|---|---|---|---|
| | GO:0006797 | 0.001 | 5.789 × 10⁻³ | |
| | GO:0071305 | 0.040 | 2.153 × 10⁻⁵ | |
| | GO:0000085 | 0.013 | 1.156 × 10⁻³ | |
| | GO:0045091 | 0.009 | 4.873 × 10⁻⁴ | |
| | GO:0010839 | 0.015 | 4.873 × 10⁻⁴ | |
| | GO:0022612 | 0.399 | 2.716 × 10⁻⁵ | |
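The hypergeometric p-values in this table measure how surprising it is to find a given number of module genes annotated to one GO term. A minimal sketch of that upper-tail calculation follows; the function name `hypergeom_upper_tail` and all counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when drawing n genes from N, of which K are annotated to the term."""
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of observing k, k+1, ..., upper annotated genes.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical counts: a module of 20 genes, 5 of them annotated to a GO term
# that covers 50 of 1000 background genes.
p_value = hypergeom_upper_tail(k=5, K=50, n=20, N=1000)
```

With these counts only about one annotated gene is expected by chance, so observing five gives a small p-value.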
Key hub gene signatures based on an ensemble of centrality and trait relevance criteria (gene significance [GS] > 0.2, scaled connectivity [K] > 0.3, frequency [f] > 675)
| Entrez ID | Approved Gene Symbol | Approved Gene Name | Scaled connectivity (K) | Gene significance (GS) | Frequency (f) |
|---|---|---|---|---|---|
| 11335 | CBX3 | chromobox homolog 3 | 0.82 | 0.43 | 982 |
| 5723 | PSPH | phosphoserine phosphatase | 0.37 | 0.32 | 980 |
| 27032 | ATP2C1 | ATPase, Ca++ transporting, type 2C, member 1 | 0.91 | 0.40 | 948 |
| 29887 | SNX10 | sorting nexin 10 | 0.72 | 0.38 | 936 |
| 23531 | MMD | monocyte to macrophage differentiation-associated | 0.57 | 0.37 | 936 |
| 79572 | ATP13A3 | ATPase type 13A3 | 0.74 | 0.38 | 929 |
| 2744 | GLS | glutaminase | 0.70 | 0.37 | 923 |
| 1956 | EGFR | epidermal growth factor receptor | 0.63 | 0.33 | 869 |
| 2768 | GNA12 | guanine nucleotide binding protein (G protein) alpha 12 | 0.80 | 0.37 | 840 |
| 10257 | ABCC4 | ATP-binding cassette, sub-family C (CFTR/MRP), member 4 | 0.46 | 0.32 | 770 |
| 3149 | HMGB3 | high mobility group box 3 | 0.74 | 0.36 | 765 |
| 8091 | HMGA2 | high mobility group AT-hook 2 | 0.81 | 0.34 | 744 |
| 3198 | HOXA1 | homeobox A1 | 0.75 | 0.34 | 676 |
Fig. 7 A receiver-operating characteristic (ROC) curve. The average area under the curve (AUC) of ~0.81 summarizes the discriminative performance of the key hub gene signature on the test dataset. The ROC curve plots the true positive rate (sensitivity) against the false positive rate (one minus specificity). The diagonal line in the ROC plot corresponds to an AUC of 0.5, the predictive power of a random guess. The graph was rendered with the ROCR software
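The AUC has a direct interpretation that a short sketch can make explicit: it is the probability that a randomly chosen tumor sample receives a higher classifier score than a randomly chosen normal sample. The function name `roc_auc`, the labels, and the scores below are hypothetical, and this pairwise computation is only an illustration, not the ROCR implementation.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for one test fold (1 = tumor, 0 = normal).
y_true = [1, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.3, 0.4, 0.2]
auc = roc_auc(y_true, y_score)
```

In a 5-fold setting, one such value would be computed per test fold and the five values averaged, which is one plausible reading of the "average AUC" reported in the caption.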