Literature DB >> 31496804

The relevance between the immune response-related gene module and clinical traits in head and neck squamous cell carcinoma.

Abstract

PURPOSE: Head and neck squamous cell carcinoma (HNSCC) is the sixth most prevalent cancer in the world, accounting for more than 90% of head and neck malignant tumors. However, its molecular mechanism is largely unknown. To help elucidate the potential mechanism of HNSCC tumorigenesis, we investigated the gene interaction patterns associated with tumorigenesis.
METHODS: Weighted gene co-expression network analysis (WGCNA) can help us to predict the intrinsic relationship or correlation between gene expression. Additionally, we further explored the combination of clinical information and module construction.
RESULTS: Sixteen modules were constructed, among which the key module most closely associated with clinical information was identified. By analyzing the genes in this module, we found that the latter may be related to the immune response, inflammatory response and formation of the tumor microenvironment. Sixteen hub genes were identified-ARHGAP9, SASH3, CORO1A, ITGAL, PPP1R16B, TBC1D10C, IL10RA, ITK, AKNA, PRKCB, TRAF3IP3, GIMAP4, CCR7, P2RY8, GIMAP7, and SP140. We further validated these genes at the transcriptional and translation levels.
CONCLUSION: The innovative use of a weighted network to analyze HNSCC samples provides new insights into the molecular mechanism and prognosis of HNSCC. Additionally, the hub genes we identified can be used as biomarkers and therapeutic targets of HNSCC, laying the foundation for the accurate diagnosis and treatment of HNSCC in clinical and research in the future.

Entities: Chemical

Keywords: co-expression; gene module; head and neck squamous cell carcinoma; hub gene; immune response-related genes; wgcna

Year: 2019 PMID： 31496804 PMCID： PMC6689548 DOI： 10.2147/CMAR.S201177

Source DB: PubMed Journal: Cancer Manag Res ISSN： 1179-1322 Impact factor: 3.989

Introduction

Head and neck squamous cell carcinoma (HNSCC) is a common malignant cancer type that originates from squamous cells of the upper respiratory tract1 (eg, the oral cavity, pharynx, and larynx) and upper gastrointestinal tract. HNSCC is the sixth most prevalent cancer in the world,2 accounting for more than 90% of head and neck malignancies. As a heterogeneous disease, HNSCC has different prognoses due to its diverse anatomical locations, disease progression and etiology.3 Smoking, drinking and infection with human papillomavirus (HPV) are all risk factors for the occurrence and development of HNSCC.4,5 Despite surgery, radiotherapy and chemotherapy, approximately half of the patients die from this disease,6 explaining why exploring the molecular mechanism of its tumorigenesis and finding new targets for drug therapy are needed. In recent years, research on the molecular mechanism of HNSCC is in progress, and the discovery of important pathway alterations, such as Notch,7 Ras,8 and VEGF signaling,9 has led to the emergence of some related targeted drugs that are expected to improve the survival and prognosis of HNSCC patients.10 Therefore, it is important to study the molecular mechanism of HNSCC for early diagnosis and optimal treatment. Exploring the molecular mechanism underlying HNSCC invasion and metastasis is of great significance in addressing the issue of identifying new drug targets for future therapy. From chip technology to high-throughput sequencing technology, biological data have increased exponentially due to the development of biological detection methods.11 An integrated organism is a complex network comprising many genes, RNA and proteins. In the occurrence of various diseases, especially cancer, there are many interactions among these components. Traditional methods are not suitable to this complex interaction network. Therefore, the biological network analysis method based on high-throughput data arises at the historic moment.12 The signaling pathway network, protein interaction network, metabolic network, gene regulation network and gene co-expression network are common biological networks.13 The most representative gene co-expression network is weighted gene co-expression network analysis (WGCNA).14 The aim of this method is to identify synergistic expression gene modules and explore the relationship between gene networks and phenotypes, as well as between hub genes in the network.15 WGCNA emphasizes the change of modules rather than individual genes, greatly alleviating the problem of multiple comparisons.16 Additionally, it links phenotypes with gene expression, which is propitious for further exploration. WGCNA has provided significant results in the gene analysis of many species, such as humans and mice,17 and has become a widely used network analysis tool. In order to identify mechanistically informative features of HNSCC, in this study, weighted gene co-expression network analysis was used to analyze the gene expression profiles of 270 HNSCC samples from the GSE65858 dataset. Several modules have been obtained that were combined with clinical information of patients to determine the biological significance by further analysis. We identified a significant correlation between the genes in the turquoise module and some clinical traits, leading us to further excavate the genes in this module. Finally, we validate these hub genes in The Cancer Genome Atlas (TCGA) dataset to confirm that these genes do have biological significance in the tumorigenesis of HNSCC. Our study lays the foundation for searching potential biomarkers to treat HNSCC and further experimental verification.

Materials and methods

Data information

The HNSCC tissue samples used in this study were all from the GSE65858 dataset of NCBI Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/). Two hundred seventy samples were obtained from HNSCC patients. The gene expression data were obtained using the GPL10558 Illumina HumanHT-12 V4.0 expression beadchip sequencing platform. The corresponding clinical information (including age, sex, smoking, alcohol consumption, tumor location, tumor grade, tumor stage, metastasis, and HPV16 infection) was also obtained from the GSE65858 dataset. Because some data were missing, the use of WGCNA for subsequent analysis required the integrity of the gene expression data. Thus, the missing data needed to be processed to ensure subsequent analysis. The advantage of deleting missing data rows or columns as a processing method is that no errors are introduced. First, we manually removed 20 samples of patients with clinical information deficiency and retained 250 samples with complete information for further follow-up analysis. Eighty-four (33.6%) of the tumors occurred in the oropharynx, 82 (32.8%) occurred in the cavum oris, 48 (19.2%) occurred in the larynx, 32 (12.8%) occurred in the hypopharynx and 4 contained missing information. In 196 samples, the expression of HPV 16 DNA was negative, 19 samples were HPV 16 dna+rna-, and 35 samples were HPV 16 dna+rna+. Number assignment of samples’ clinical traits can be seen in Table S1. Network module analysis is vulnerable to the influence of outlier samples; thus, removing outlier samples before constructing the network is particularly important to obtain meaningful subsequent analysis results. The goodGeneSample function in WGCNA was used to remove genes and samples with excessive missing values, and then the outlier samples were removed by the method of sample hierarchical clustering-pruning graph. The specific methods were as follows: 1) building a hierarchical clustering tree for all samples and drawing and 2) removing outliers whose leaf node height was significantly higher than other samples.

Co-expression network construction

We used the Affy package (using version 3.5.1 R software) to preprocess and standardize matrix files (RMA standardization). RMA is used to correct background, and impute is used to supplement missing values. According to the order of standard deviation (SD) from large to small, we selected the first five thousand genes to construct WGCNA,18 which not only reduces the size of the whole network and computational load but also maintains a scale-free topological network. WGCNA realizes network construction in the form of a “soft threshold”. The essence of the soft threshold method is to transform the correlation coefficients of similar matrices to adjacent functions in the form of function transformation and then calculate the topological overlap matrix. Next, we used pickSoftThreshold to select the appropriate soft threshold. The scale-free topology fit index, as well as the mean connectivity, serves as a function of the soft-thresholding power. By calculating the scale-free topology fit index and mean connectivity for several powers, we can choose an appropriate soft-thresholding power to build network. WGCNA uses dissimilarity to cluster. The specific algorithm used is the topological overlap dissimilarity measure (TOM)19 to calculate the degree of association between genes. Using hierarchical clustering function, the genes with a similar expression spectrum are classified into TOM-based modules. We used these 5,000 genes to visualize gene networks. Gray refers to the background gene without any module. Different modules are built from dendrograms.

Module preservation across the network

Two hundred forty-five samples clustered after removing outliers were divided into the training and validation sets using the createDataPartition function in R. The advantage of createDataPartition is that it can randomly extract the training set we need from the low-entropy dataset. We calculate the preservation of different modules by calculating the Z value, Z=observed−meanpermuted/sdpermuted. This is a permutation test. It repeatedly permutes the gene labels in the test network to estimate the mean and standard deviation under the hypothesis of null preservation. If Z is greater than 10, the module indicates strong preservation; if Z is less than 10 and greater than 2, the module is weakly preserved; if less than 2, the module is not preserved. Statistical significance is evaluated by bootstrapping (N=200 permutations).20

Identification of clinically significant modules

In the analysis, we combined modules with a various nontime dependent variables. First, we calculate the eigengene (module Eigengenes, ME) of the module, which is defined as the first principal component of all gene expression level vectors in the module. Next, for an optional gene, we defined its gene significance (GS) as the correlation coefficient between its expression level and dependent variable level. Continuous dependent variables are calculated by the Pearson correlation coefficient. For a module, the modular significances (MSs) relative to a dependent variable are defined as the correlation coefficients between its characteristic genes and level of dependent variables. The module membership (MM) of any gene in a module is defined as the correlation coefficient between the gene and characteristic gene of the module. It measures the degree of subordination of a gene in the module that can be used to screen the important genes in the module. In the module analysis, if a module and a dependent variable have a high-significance MS, the gene in the module may be an important factor affecting the dependent variable Y. Under this condition, it is necessary to carry out in-module analysis to screen the important genes in the module. The specific methods are as follows: For gene i in the module, a scatter plot of the module membership degree (MM) relative to the gene significance (GS) is made, and the genes with high MM and GS are selected as the objects for further analysis.

Function enrichment analysis

In the postgenomic era, knowledge about gene functional pathways is increasing daily. Kyoto University and Tokyo University jointly developed the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, which integrates a large amount of knowledge about gene, protein, biological pathways and other interactive networks to facilitate the use of gene information.21 The Gene Ontology (GO) database is a platform for standardized description or interpretation of the terminology of gene and gene product characteristics. It is a platform for bioinformatics researchers to unify the induction, processing，interpretation and sharing of gene and gene product data.22 In this study, “clusterProfiler”23 and “org.Hs.eg.db” packages in R software were used to analyze GO and KEGG, annotate the genes of interest in the modules, and try to identify the enrichment pathway of the genes in the modules. We set p-valueCutoff =0.01 and qvalueCutoff =0.05.

Identification of hub genes in the key module

We believe that genes that are highly interconnected with other nodes in the module are functionally important. In this study, key modules were selected and visualized by cytoscape, and hub genes in key module networks were screened according to degree. First, we selected the weight value of more than 0.3 by weight ranking, and then identified the top 30 genes by degree. We believe that these genes are highly interconnected with other genes and have biological significance.

Validation and exploration of hub genes

TCGA is a joint project initiated by NCI and NHGRI in the United States that provides a complete atlas related to all changes in the cancer genome. CBioPortal (http://www.cbioportal.org/) acts as a web page for cancer genome research that can be used to browse, visualize and analyze multidimensional cancer genome data. It provides graphical information aggregation, network analysis and survival analysis at the gene level on multiple platforms. GEPIA(http://GEPIA.cancer-pku.cn/) is a web server to analyze the RNA sequencing expression data of 9,736 tumors and 8,587 normal samples from the TCGA and The Genotype-Tissue Expression (GTEx) projects using a standard processing pipeline. GEPIA was used to analyze the survival rate of 30 candidate genes. According to similar literatures,24,25 the genes with logrank p-value less than 0.05 were identified as real hub genes. Those genes can affect prognosis. Next, we used GEPIA to check the differences in the expression of hub genes between normal and tumor tissues. In this study, we used CBioPortal to verify the related genetic changes in the hub genes identified in HNSCC samples in the TCGA database and explore the correlation between the hub genes and other genes and drugs. We hope to uncover relevant drug targets and provide direction for further research.

Results

Expression value analysis of microarray data

We downloaded the gene expression matrix files of 270 HNSCC samples from GSE65858 (https://www.ncbi.nlm.nih.gov/) and manually deleted 20 samples for which clinical information is missing. Finally, 250 samples were retained to carry out in subsequent analysis. The background was corrected, and the data were normalized by R software. The visualization results of normalization are shown in Figure S1. Gene annotations were carried out using R package, and probes corresponding to multiple genes were removed. The average value of genes corresponding to multiple probes was taken as the expression level of the gene. After obtaining the genes, we selected the first 5,000 genes in SD descending order to construct a co-expression network. We used the hclust function of the WGCNA packages to cluster 250 samples. As shown in Figure 1A, we can see the sample clustering tree. The samples are divided into two clusters. We chose 60 as the cutoff value; after 5 outliers were removed, 245 samples were retained for subsequent WGCNA analysis. Next, we reclustered the 245 samples and corresponded them with clinical traits to obtain the sample tree and characteristic heat map (Figure 1B).

Figure 1

Construction of the co-expression network using the WGCNA package in R.

Notes: (A) Sample clustering to detect outliers. Five outliers were removed after clustering—GSM1607776, GSM1607734, GSM1607907, GSM1607816 and GSM1607849. cutHeight =60. (B) Recluster samples and clinical traits dendogram. The color intensity was proportional to the gender, age, packyears, alcohol consumption, tumor type, tumor site, u1cc stage, t category, n category, distant metastasis and HPV16 infection.

Abbreviations: WGCNA, weighted gene co-expression network analysis; HPV, human papillomavirus.

Construction of the co-expression network using the WGCNA package in R. Notes: (A) Sample clustering to detect outliers. Five outliers were removed after clustering—GSM1607776, GSM1607734, GSM1607907, GSM1607816 and GSM1607849. cutHeight =60. (B) Recluster samples and clinical traits dendogram. The color intensity was proportional to the gender, age, packyears, alcohol consumption, tumor type, tumor site, u1cc stage, t category, n category, distant metastasis and HPV16 infection. Abbreviations: WGCNA, weighted gene co-expression network analysis; HPV, human papillomavirus.

Construction of the weighted co-expression network and preservation

To determine the optimal value of the soft threshold, it is necessary to search the power value that can make the adjacent function suit the scale-free condition in a certain range. As shown in Figure 2A and B, we chose 4 as the power value when the Scale-free topology fit index nearly equals to 0.9 and the mean connectivity is also close to 0. We constructed the co-expression matrix using the expression matrix and the estimated optimal power value. The connectivity between genes was calculated, and similarity between genes was calculated accordingly. Next, the coefficient of dissimilarity among genes was deduced, and the hierarchical clustering gene dendrogram was obtained accordingly. According to the standard of dynamicTreecut, the number of genes was set to 30 as the minimum number of each module. Here, we used different colors to represent all the modules, among which gray, by default, represented the genes that could not be classified into any module. Sixteen modules were finally obtained, and the visualization results are shown in Figure 3A. There were 229 genes in the black, 645 in the blue, 456 in the brown, 54 in the cyan, 393 in the green, 138 in the magenta, 50 in the midnightblue, 165 in the pink, 124 in the purple, 319 in the red, 66 in the salmon, 75 in the tan, 1092 in the turquoise, 413 in the yellow, and 601 in the gray modules. Detailed information of these genes can be found in Table S2. We verified the module preservation by randomly selecting test sets and training sets in the dataset and found that all modules had high preservation, especially the turquoise module (Figure 3B).

Figure 2

Determination of the soft-thresholding power.

Notes: (A) Analysis of the scale-free fit index for various soft-thresholding powers (β). (B) Analysis of the mean connectivity for various soft-thresholding powers. The best fit power value was 4.

Figure 3

(A) Cluster dendrogram of genes. Each branch in the figure represents one gene, and every color below represents one co-expression module. (B) Zsummary statistics of module preservation. The figure shows the preservation Zsummary statistic (y-axis) as a function of the module size. The blue (low) and green (high) lines are thresholds indicating the 2< Z <10 region corresponding to moderate/high preservation.

Determination of the soft-thresholding power. Notes: (A) Analysis of the scale-free fit index for various soft-thresholding powers (β). (B) Analysis of the mean connectivity for various soft-thresholding powers. The best fit power value was 4. (A) Cluster dendrogram of genes. Each branch in the figure represents one gene, and every color below represents one co-expression module. (B) Zsummary statistics of module preservation. The figure shows the preservation Zsummary statistic (y-axis) as a function of the module size. The blue (low) and green (high) lines are thresholds indicating the 2< Z <10 region corresponding to moderate/high preservation.

Relating clinical traits to modules and correlation between modules

The Pearson correlation coefficients of the module Eigengenes and corresponding variables represent the correlation between the modules and clinical traits. The results are shown in Figure 4. Among the clinical information included in the study, gender, age, smoking, alcohol consumption, tumor type, site, grade, stage, metastasis and infection of HPV16 were nontime-related variables. We found that the absolute correlation coefficients of each module with gender, tumor type and metastasis were small, and the correlation was almost insignificant. Age, smoking and alcohol consumption were weakly correlated with some modules. The site, grade, stage of tumors and infection of HPV16 are strongly correlated with some modules. It was obvious that the brown module, green module and turquoise module show a significant correlation with clinical traits. As illustrated in Figure 4, to further analyze genes in these modules, we found a strong correlation between the Gene Significance for the tumor site and Module Membership in the turquoise module (cor =0.58, p=3.6e−99), and the turquoise module was negatively correlated with the tumor site (cor =−0.23, p=(3e−4)) (Figure 5A). There is a negative correlation between genes in the brown module and N category, as shown in Figure 5B. Additionally, GS for the infection status of HPV16 was negatively correlated with MM in the green module, as shown in Figure 5C. Additionally, we constructed a hierarchical clustering tree based on WGCNA module eigengenes and the heat map of adjacency between modules. The colors are from low adjacency (green) to high adjacency (red) to show the relationship between the eigengenes of the WGCNA module (Figure 5D).

Figure 4

Heatmap of the correlation between the module eigengenes and clinical traits. Turquoise, green and brown modules are three most closely related modules to clinical traits.

Figure 5

(A) Scatter plot of gene significance for the tumor site and module membership in the turquoise module. (B) Scatter plot of gene significance for the n category and module membership in the brown module. (C) Scatter plot of gene significance for HPV16 infection and module membership in the green module. (D) Eigengene dendrogram and eigengene adjacency heatmap.

Heatmap of the correlation between the module eigengenes and clinical traits. Turquoise, green and brown modules are three most closely related modules to clinical traits. (A) Scatter plot of gene significance for the tumor site and module membership in the turquoise module. (B) Scatter plot of gene significance for the n category and module membership in the brown module. (C) Scatter plot of gene significance for HPV16 infection and module membership in the green module. (D) Eigengene dendrogram and eigengene adjacency heatmap.

Identification of key modules

We constructed a network heatmap plot with these 5,000 genes, as shown in Figure S2. The different colors of the horizontal axis and vertical axis represented different modules, and the color differences of each block could be clearly seen. Among them, the turquoise module had the darkest yellow color. At the same time, we found that the turquoise module has a significant correlation with many clinical features in Figure 4. Thus, this module was identified as a key module. Next, we will further analyze the genes in this module.

Function enrichment analysis in the turquoise module

We performed enrichment analysis to explore the GO database and KEGG pathway in which the turquoise module genes are involved. The main results are shown in Tables 1 and 2. The enrichment results showed that the biological process of turquoise module genes mainly involved inflammation-related and immune responses, such as T-cell activation, leukocyte cell-cell adhesion, positive regulation of T cells, and lymphocyte and leukocyte activation. The cell components associated with it included the Major Histocompatibility Complex (MHC) class II protein complex, MHC protein complex, membrane raft, and T-cell receptor complex, which are spatially related to the surface and extracellular space of immune cells. Simultaneously, the molecular function of the module was related to MHC class II receptor activity and nonmembrane spanning protein tyrosine kinase activity (see Tables S3 and S4 for more information). More specifically, pathway enrichment analysis of the module indicated that inflammation and the tumor microenvironment evolution mechanisms are related to these genes. Important pathways such as Th1 and Th2 cell differentiation, Th17 cell differentiation, hematopoietic cell lineage, and intestinal immune network for IgA production were enriched. Other pathways such as Staphylococcus aureus infection，viral myocarditis，Epstein-Barr virus infection，and human T-cell leukemia virus 1 infection also showed enrichment. The visualization of the enrichment results is shown in Figure 6. The characteristic information may further promote our research on the pathogenesis and tumorigenesis of HNSCC.

Table 1

GO analysis of genes in turquoise module

ONTOLOGY	ID	Description	p-value	p. adjust	q-value	Count
BP	GO:0042110	T cell activation	5.75E-26	2.92E-22	2.52E-22	89
BP	GO:0007159	Leukocyte cell-cell adhesion	2.06E-21	5.23E-18	4.52E-18	68
BP	GO:0050867	Positive regulation of cell activation	4.76E-20	8.07E-17	6.97E-17	69
BP	GO:0050870	Positive regulation of T cell activation	1.33E-19	1.44E-16	1.25E-16	51
BP	GO:0051249	Regulation of lymphocyte activation	1.42E-19	1.44E-16	1.25E-16	78
BP	GO:1903039	Positive regulation of leukocyte cell-cell adhesion	2.53E-19	2.15E-16	1.85E-16	52
BP	GO:0050863	Regulation of T cell activation	3.91E-19	2.84E-16	2.46E-16	63
BP	GO:0002696	Positive regulation of leukocyte activation	5.76E-19	3.66E-16	3.16E-16	66
BP	GO:0070661	Leukocyte proliferation	2.41E-18	1.36E-15	1.17E-15	57
BP	GO:0022409	Positive regulation of cell-cell adhesion	6.12E-18	3.11E-15	2.69E-15	54
BP	GO:1903037	Regulation of leukocyte cell-cell adhesion	6.81E-18	3.15E-15	2.72E-15	59
BP	GO:0022407	Regulation of cell-cell adhesion	8.12E-18	3.44E-15	2.97E-15	68
BP	GO:0046651	Lymphocyte proliferation	4.99E-17	1.95E-14	1.69E-14	53
BP	GO:0032943	MONONUCLEAR cell proliferation	7.07E-17	2.57E-14	2.22E-14	53
BP	GO:0045785	Positive regulation of cell adhesion	2.04E-16	6.91E-14	5.97E-14	66
BP	GO:0051251	Positive regulation of lymphocyte activation	2.26E-16	7.19E-14	6.21E-14	58
BP	GO:0002521	Leukocyte differentiation	5.83E-16	1.74E-13	1.50E-13	74
BP	GO:0030098	Lymphocyte differentiation	2.00E-15	5.66E-13	4.89E-13	58
BP	GO:0002768	Immune response-regulating cell surface receptor signaling pathway	5.64E-15	1.51E-12	1.30E-12	72
BP	GO:0050851	Antigen receptor-mediated signaling pathway	7.82E-15	1.99E-12	1.72E-12	52
BP	GO:0042113	B cell activation	3.43E-14	8.29E-12	7.16E-12	50
BP	GO:0002429	Immune response-activating cell surface receptor signaling pathway	1.93E-13	4.46E-11	3.85E-11	66
BP	GO:0050670	Regulation of lymphocyte proliferation	8.15E-13	1.80E-10	1.56E-10	40
BP	GO:0032944	Regulation of mononuclear cell proliferation	9.64E-13	1.96E-10	1.70E-10	40
BP	GO:0001819	Positive regulation of cytokine production	9.66E-13	1.96E-10	1.70E-10	61
CC	GO:0042613	MHC class II protein complex	1.51E-14	8.70E-12	7.81E-12	13
CC	GO:0042611	MHC protein complex	8.87E-11	2.56E-08	2.30E-08	13
CC	GO:0045121	Membrane raft	5.23E-08	8.29E-06	7.44E-06	41
CC	GO:0098857	Membrane microdomain	5.74E-08	8.29E-06	7.44E-06	41
CC	GO:0098589	Membrane region	1.55E-07	1.79E-05	1.60E-05	41
CC	GO:0042101	T cell receptor complex	2.15E-07	2.07E-05	1.86E-05	9
CC	GO:0031234	Extrinsic component of cytoplasmic side of plasma membrane	3.81E-07	3.14E-05	2.82E-05	21
CC	GO:0009898	Cytoplasmic side of plasma membrane	9.07E-07	6.54E-05	5.87E-05	27
CC	GO:0071556	Integral component of lumenal side of endoplasmic reticulum membrane	1.59E-06	9.20E-05	8.26E-05	10
CC	GO:0098553	Lumenal side of endoplasmic reticulum membrane	1.59E-06	9.20E-05	8.26E-05	10
CC	GO:0030669	Clathrin-coated endocytic vesicle membrane	1.89E-06	9.90E-05	8.89E-05	12
CC	GO:0030665	Clathrin-coated vesicle membrane	2.70E-06	0.00013	0.000117	19
CC	GO:0098562	Cytoplasmic side of membrane	9.80E-06	0.000435	0.00039	27
CC	GO:0009897	External side of plasma membrane	1.49E-05	0.000616	0.000552	34
CC	GO:0019897	Extrinsic component of plasma membrane	1.70E-05	0.000655	0.000588	24
CC	GO:0015629	Actin cytoskeleton	2.15E-05	0.000743	0.000667	49
CC	GO:0045334	Clathrin-coated endocytic vesicle	2.19E-05	0.000743	0.000667	13
CC	GO:0031252	Cell leading edge	2.75E-05	0.000881	0.000791	41
CC	GO:0030136	Clathrin-coated vesicle	3.66E-05	0.00111	0.000996	24
CC	GO:0001772	Immunological synapse	4.53E-05	0.001306	0.001172	9
CC	GO:0030139	Endocytic vesicle	0.000105	0.002875	0.00258	32
CC	GO:0042629	Mast cell granule	0.00011	0.00289	0.002594	7
CC	GO:0030662	Coated vesicle membrane	0.000141	0.003459	0.003104	22
CC	GO:0043235	Receptor complex	0.000144	0.003459	0.003104	38
CC	GO:0030666	Endocytic vesicle membrane	0.000163	0.003759	0.003374	21
CC	GO:0005925	Focal adhesion	0.000193	0.004292	0.003852	39
MF	GO:0032395	MHC class II receptor activity	3.00E-09	2.80E-06	2.64E-06	8
MF	GO:0004715	Non-membrane spanning protein tyrosine kinase activity	2.28E-06	0.001066	0.001002	13

Abbreviations: BP, biological process; CC, cellular component; GO, Gene Ontology; MF, molecular function.

Table 2

KEGG pathway analysis of turquoise module

ID	Description	GeneRatio	p-value	p. adjust	qvalue	Count
hsa04658	Th1 and Th2 cell differentiation	29/497	4.61E-13	1.33E-10	1.06E-10	29
hsa04659	Th17 cell differentiation	31/497	9.01E-13	1.33E-10	1.06E-10	31
hsa04640	Hematopoietic cell lineage	27/497	7.70E-11	6.66E-09	5.30E-09	27
hsa04672	Intestinal immune network for IgA production	19/497	8.99E-11	6.66E-09	5.30E-09	19
hsa05150	Staphylococcus aureus infection	19/497	1.31E-09	7.73E-08	6.16E-08	19
hsa05166	Human T-cell leukemia virus 1 infection	39/497	1.00E-08	4.94E-07	3.93E-07	39
hsa04662	B cell receptor signaling pathway	20/497	1.84E-08	7.79E-07	6.20E-07	20
hsa05416	Viral myocarditis	18/497	2.42E-08	8.94E-07	7.12E-07	18
hsa05310	Asthma	13/497	2.92E-08	9.60E-07	7.65E-07	13
hsa05169	Epstein-Barr virus infection	36/497	3.25E-08	9.61E-07	7.66E-07	36
hsa04064	NF-kappa B signaling pathway	23/497	3.70E-08	9.95E-07	7.92E-07	23
hsa04514	Cell adhesion molecules (CAMs)	28/497	1.96E-07	4.83E-06	3.85E-06	28
hsa05140	Leishmaniasis	19/497	2.13E-07	4.85E-06	3.86E-06	19
hsa05330	Allograft rejection	13/497	4.97E-07	1.05E-05	8.37E-06	13
hsa05332	Graft-versus-host disease	13/497	1.34E-06	2.65E-05	2.11E-05	13
hsa04612	Antigen processing and presentation	18/497	1.98E-06	3.66E-05	2.92E-05	18
hsa04062	Chemokine signaling pathway	31/497	2.18E-06	3.80E-05	3.02E-05	31
hsa04940	Type I diabetes mellitus	13/497	2.47E-06	4.06E-05	3.23E-05	13
hsa05340	Primary immunodeficiency	12/497	2.63E-06	4.10E-05	3.27E-05	12
hsa05321	Inflammatory bowel disease (IBD)	16/497	3.57E-06	5.28E-05	4.21E-05	16
hsa05145	Toxoplasmosis	22/497	3.96E-06	5.58E-05	4.45E-05	22
hsa05320	Autoimmune thyroid disease	14/497	5.98E-06	7.70E-05	6.14E-05	14
hsa05152	Tuberculosis	29/497	5.99E-06	7.70E-05	6.14E-05	29
hsa04660	T cell receptor signaling pathway	20/497	8.33E-06	0.000103	8.18E-05	20
hsa05130	Pathogenic Escherichia coli infection	14/497	9.57E-06	0.000113	9.03E-05	14

Abbreviation: KEGG, Kyoto Encyclopedia of Genes and Genomes.

Figure 6

GO and KEGG pathway enrichment analyses of turquoise module genes. (A) Dotplot of GO analysis divided into three functional groups: biological processes, cell composition and molecular function. (B) UpSet plot of KEGG pathway enrichment, visualizing the complex association between genes and different pathways. It emphasizes the genes overlapping among different pathways. (C) plotGOgraph of Biological Processes depicting tree-like relationships between different processes. (D) The terms of Biological Processes are organized as a directed acyclic graph. An insightful way to view the results of the analysis is to investigate how the significant Biological Processes are distributed over the GO graph. The goplot function shows the subgraph induced by the most significant terms.

Abbreviations: GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.

GO analysis of genes in turquoise module Abbreviations: BP, biological process; CC, cellular component; GO, Gene Ontology; MF, molecular function. KEGG pathway analysis of turquoise module Abbreviation: KEGG, Kyoto Encyclopedia of Genes and Genomes. GO and KEGG pathway enrichment analyses of turquoise module genes. (A) Dotplot of GO analysis divided into three functional groups: biological processes, cell composition and molecular function. (B) UpSet plot of KEGG pathway enrichment, visualizing the complex association between genes and different pathways. It emphasizes the genes overlapping among different pathways. (C) plotGOgraph of Biological Processes depicting tree-like relationships between different processes. (D) The terms of Biological Processes are organized as a directed acyclic graph. An insightful way to view the results of the analysis is to investigate how the significant Biological Processes are distributed over the GO graph. The goplot function shows the subgraph induced by the most significant terms. Abbreviations: GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.

Identification of hub genes in the turquoise module

We used the Matthews correlation coefficient (MCC) algorithm and degree to rank the genes in the turquoise module (see Tables S5 and S6 for more information). The first 30 genes were selected and intersected with 18 hub genes selected by HNSCC data from the TCGA database by WGCNA.26 The results are shown in Figure S3A (Detailed results are shown in Table 3). Figure S3B is a plot of the connectivity among the first 30 genes selected by degree sequencing (see Table S7 for more information). We screened these genes and identified 16 genes that have an impact on survival in patients with HNSCC by GEPIA.27 The results are shown in Figure 7, and these genes are considered as hub genes.

Table 3

Common genes between different algorithms and the hub gene obtained from TCGA

Categories	Elements	Count
MCC & TCGA & degree	NCKAP1L/IL10RA/TRAF3IP3/WAS/SASH3	5
MCC & degree	CORO1A/SP140/GIMAP4/PPP1R16B/IRF8/PRKCB1/HCST/PARVG/CD53/DOCK2/AKNA/ITK/GIMAP7/TBC1D10C/GIMAP5/ARHGAP9/ITGAL/BTK	18
TCGA & degree	SLA/APBB1IP	2
Degree only	P2RY8/GPSM3/CCR7/SH2D1A/LAT2	5
MCC only	ARHGEF6/RCSD1/FAM65B/GMFG/DOCK8/FAIM3/Septin 6	7
TCGA only	PTPN22/FERMT3/GIMAP6/DOK2/CELF2/SELPLG/TNFRSF1B/MYO1F/WIPF1/UBASH3A/AMICA1	11

Abbreviations: TCGA, The Cancer Genome Atlas; MCC, Matthews correlation coefficient.

Figure 7

Survival analysis of 16 hub genes from the turquoise module. They are ARHGAP9, SASH3, CORO1A, ITGAL, PPP1R16B, TBC1D10C, IL10RA, ITK, AKNA, PRKCB, TRAF3IP3, GIMAP4, CCR7, P2RY8, GIMAP7 and SP140. P<0.05 was regarded as significant.

Common genes between different algorithms and the hub gene obtained from TCGA Abbreviations: TCGA, The Cancer Genome Atlas; MCC, Matthews correlation coefficient. Survival analysis of 16 hub genes from the turquoise module. They are ARHGAP9, SASH3, CORO1A, ITGAL, PPP1R16B, TBC1D10C, IL10RA, ITK, AKNA, PRKCB, TRAF3IP3, GIMAP4, CCR7, P2RY8, GIMAP7 and SP140. P<0.05 was regarded as significant. GEPIA was used to validate the changes in these 16 hub genes in normal and tumor tissues. Figure 8 shows the expression difference of these genes between normal and tumor tissues in HNSCC. The expression of most genes in tumors was higher than that in normal tissues. Except for PPP1R16B, there was no significant difference between tumors and normal tissues. GIMAP7 expression in normal tissues was higher than that in tumors. CBioPortal28,29 was used to further explore these genes, and queried genes were altered in 183 (37%) patients. The frequency of change of each hub gene in HNSCC is shown in Figure 9A, and TBC1D10C is altered the most (10%). We can tell that the upregulation and amplification of these genes are the main changes in HNSCC. Figure 9B shows 16 queried genes and the 50 most frequently changed neighbor genes, which also introduces the relationship between central genes and cancer drugs. ITK and ITGAL are the targets of cancer drugs. Figure S4 is the relationship between the methylation and gene expression of these genes in HNSCC explored in CBioPortal. ARHGAP9, SASH3 and P2RY8 have no methylation information. The expression of other genes is negatively correlated with the degree of methylation, indicating that the upregulation of these genes may be caused by hypomethylation.

Figure 8

Validation of the hub genes at the transcriptional level.

Notes: Validation of the hub genes by the TCGA HNSCC dataset in GEPIA. The expression of most genes in tumors was higher than that in normal tissues. Except for PPP1R16B, there was no significant difference in expression between tumors and normal tissues. GIMAP7 expression in normal tissues was higher than that in tumors. Abbreviations: TCGA, The Cancer Genome Atlas; HNSCC, Head and neck squamous cell carcinoma.

Figure 9

(A) The alteration frequency of 16 hub genes in the TCGA HNSCC dataset is illustrated. (B) Gene network and tumor drugs associated with the hub genes in the TCGA HNSCC dataset.

Abbreviation: TCGA, The Cancer Genome Atlas.

Validation of the hub genes at the transcriptional level. Notes: Validation of the hub genes by the TCGA HNSCC dataset in GEPIA. The expression of most genes in tumors was higher than that in normal tissues. Except for PPP1R16B, there was no significant difference in expression between tumors and normal tissues. GIMAP7 expression in normal tissues was higher than that in tumors. Abbreviations: TCGA, The Cancer Genome Atlas; HNSCC, Head and neck squamous cell carcinoma. (A) The alteration frequency of 16 hub genes in the TCGA HNSCC dataset is illustrated. (B) Gene network and tumor drugs associated with the hub genes in the TCGA HNSCC dataset. Abbreviation: TCGA, The Cancer Genome Atlas.

Discussion

HNSCC is a common malignant tumor caused by squamous cell abnormalities. The 5-year survival rate of patients with early HNSCC is 40–60%. However, for patients with local recurrence and distant metastasis, the median survival time after palliative chemotherapy is only 6–9 months. The survival time of advanced patients with chemotherapy tolerance has decreased to 3–6 months.30 Therefore, studying the development of HNSCC at the molecular level and formulating effective measures to prevent and suppress the metastasis of HNSCC are of great significance for the prevention and treatment of HNSCC. Weighted network analysis can be used to provide a holistic view of disease dynamics, while also enabling us to reduce complexity to a direct relationship. This method may reveal new discoveries of the disease status and mechanism of tumor evolution. It has been used to analyze the molecular mechanisms of many diseases, such as breast cancer31 and inflammatory bowel disease.32 In this study, we used this important tool to analyze HNSCC samples in the GSE65858 dataset. We used this method to transform the expression data from 5,000 genes to 16 modules. Next, we combined them with various clinical information. In these modules, we further focused on gene pivots highly related to various clinical features. Functional enrichment of key module genes was analyzed by R software. Sixteen hub genes significantly related to the survival rate were identified in key modules and were further validated in TCGA to verify the reliability of the results. We also compared the hub genes obtained by WGCNA from the TCGA data of HNSCC,33 and found a high degree of overlap, indicating that this module is highly preservative, and these genes are indeed related to the pathogenesis of HNSCC. CBioPortal was used to explore hub genes and their genetic alteration. Based on the measurable changes in network analysis, more samples were analyzed. These gene changes provide new insights into the molecular explanation of the pathogenesis of HNSCC, and the hub genes may be used as biomarkers and therapeutic targets for the accurate diagnosis and treatment of HNSCC in the future, providing an early warning to clinicians about patients at risk for progressive disease development. Current advances in cancer treatment are focused on precise/personalized approaches based on detectable molecular abnormalities in patients’ tumors, which can then be utilized in a targeted manner. HNSCC evades immune responses through multiple resistance mechanisms. It is characterized by an immunosuppressive environment which includes the release of immunosuppressive factors,34 activation, expansion of immune cells with inhibitory activity and the reduction of tumor immunogenicity.35,36 This is why HNSCC has long been considered an immunosuppressive disease. Patients often show a low absolute lymphocyte count, spontaneous apoptosis of cytotoxic T lymphocytes (CTLs), deficiencies in NK cell activity and antigen presentation dysfunction. The key module selected by WGCNA also confirms this view. The preservation of this module is very strong, indicating that the genes in it are closely related and highly correlated. Moreover, its gene function enrichment analysis is mainly focused on immune-related pathways, suggesting that the pathogenesis of HNSCC may be related to immune disorders. The survival analysis of hub genes reveals that the expression of immune-related genes may affect the prognosis of patients with HNSCC. Survival analysis in HNSCC patients showed that patients with high expression of these genes had a higher survival rate (p<0.05), with biological significance. To further explore the expression differences of these genes in normal and tumor tissues, we found that the expression of many genes in tumors was higher than that in normal tissues, indicating that these immune-related genes were activated in HNSCC and played roles such as inhibiting the process of tumorigenesis and formation of the local tumor microenvironment. These data also provide us with a new idea for the treatment of HNSCC. Ideally, the immune system can play an active immune killing role by recognizing the relevant or specific antigens of tumors. If the immunogenicity of tumor cells is weakened, they can grow and spread without the control and establish a local immunosuppressive microenvironment.37 Among all malignant tumors, head and neck squamous cell carcinoma has a high mutation rate,38 and the establishment of an immunosuppressive microenvironment is the key to tumorigenesis and development.39 In recent years, tumor immunotherapy has gradually become one of the hotspots in clinical and basic research of cancer and has made breakthroughs in the treatment of melanoma, lung cancer and other tumors.40 Immunotherapy of tumors is mainly aimed at curing tumors by improving the patient’s own immune system and inducing an anti-tumor immune response.41 Cytokines are often in a state of imbalance in the tumor microenvironment of head and neck squamous cell carcinomas—that is, immunosuppressive cytokines are more than immunostimulatory cytokines, thus promoting the immune escape of head and neck squamous cell carcinomas. Compared with healthy people, HNSCC patients have fewer absolute T lymphocyte counts in peripheral blood and more inhibited regulatory T cells.42 At the same time, the tumor and its surrounding stromal cells can secrete many cytokines such as IL10 and tumor growth factor to form an immunosuppressive tumor microenvironment, which can avoid killing by reducing the expression of tumor antigen and promoting the apoptosis of cytotoxic T lymphocytes.43 Additionally, the inhibition of natural killer cell activity and damage of the dendritic cell antigen-presenting function are also important mechanisms to avoid epidemic surveillance of head and neck tumors.44 Therefore, immunotherapy for head and neck squamous cell carcinoma has great potential. We analyzed the hub genes screened in this study. ARHGAP9 (Rho GTPase Activating Protein 9), a member of the RhoGAP family, has been identified as a RhoGAP for Cdc42 and Rac1.45 RhoGAP proteins promote the hydrolysis of GTP-bound Rho GTP enzymes, thus inactivating Rho GTP enzymes and inhibiting various cellular processes, such as gene transcription, cytoskeleton tissue, cell proliferation, migration and invasion.46 Recent studies have found that ARHGAP9 inhibits the migration and invasion of hepatocellular carcinoma cells by upregulating FOXJ2/E-cadherin.47 We speculate that ARHGAP9 may play a similar role in the genesis of HNSCC. GTPase, a family of immuno-related proteins (GIMAP), is most widely expressed in the immune system and is differentially regulated during early human Th-cell differentiation.48 The highly conserved human GIMAP gene region consists of seven functional genes and one pseudogene.49 Although they are likely to be regulated similarly, subcellular localization and structural differences indicate that each gene has a specific function.50 GIMAP4 plays a role in Th-cell secretion.51 But there is little research on GIAMP7, and the expression of GIMAP7 in normal tissues is higher than that in tumor tissues. This also indicates that there are different functions of genes in this family. We speculate that GIMAP7 may play a regulatory role in the regulation of the immune response in coordination with other molecules. It can be used as a potential molecular target for future exploration. SASH3 was also found to have associated changes in oligodendroglioma,52 and methylation of the PPP1R16B locus was reported in patients with colorectal cancer.53 CORO1A is an actin regulatory protein mainly expressed in hematopoietic cells that is essential for the development of T cells and homeostasis.54 These genes have not yet been reported to play a role in HNSCC and warrant further exploration. The role of IL10 in the regulation of the immune response during tumorigenesis has been widely described,55 which corresponds to what we found in this study that the higher expression of IL10RA in HNSCC than in normal samples. Our study also found that the expression of IL10RA affect the prognosis of HNSCC, presumably related to the different expression of IL10. TBC1D10C was previously proven to bind and inhibit Ras and Calcineurin proteins. It has been proven that TBC1D10C exhibits a physical interaction with H-Ras and CaN in T cells, inhibits the Ras/MAPK signaling pathway and is a negative feedback inhibitor of the CaN signaling pathway.56 It was hypothesized that human AKNA is encoded by a single gene located in the FRA9E region of chromosome 9q3257 that is associated with functional deletion mutations and often leads to inflammatory and neoplastic diseases.58 Recent findings that single-nucleotide polymorphisms (SNPs) in the AT-hook domain of human AKNA increase the risk of cervical cancer support this hypothesis.59 Overall, these genes play an important role in HNSCC tumorigenesis. However, the primary limitation of the present study was that these important hub genes remain to be verified by experiments; therefore, further analyses are required to determine the mechanisms underlying the growth of HNSCC cells. Further research is required to fully explore their roles in HNSCC. In general, the innovative use of a weighted network to analyze HNSCC samples provides new insights into the molecular mechanism and prognosis of HNSCC. We can combine more information in future research, such as SNPs, leading to better understanding of HNSCC.

58 in total

1. KEGG: kyoto encyclopedia of genes and genomes.

Authors: M Kanehisa; S Goto
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Genetic network inference: from co-expression clustering to reverse engineering.

Authors: P D'haeseleer; S Liang; R Somogyi
Journal: Bioinformatics Date: 2000-08 Impact factor: 6.937

3. Isolation of a novel human gene, ARHGAP9, encoding a rho-GTPase activating protein.

Authors: Y Furukawa; T Kawasoe; Y Daigo; T Nishiwaki; H Ishiguro; M Takahashi; J Kitayama; Y Nakamura
Journal: Biochem Biophys Res Commun Date: 2001-06-15 Impact factor: 3.575

4. A general framework for weighted gene co-expression network analysis.

Authors: Bin Zhang; Steve Horvath
Journal: Stat Appl Genet Mol Biol Date: 2005-08-12

5. Requirement for coronin 1 in T lymphocyte trafficking and cellular homeostasis.

Authors: Niko Föger; Linda Rangell; Dimitry M Danilenko; Andrew C Chan
Journal: Science Date: 2006-08-11 Impact factor: 47.728

6. Transcript profiling in peripheral T-cell lymphoma, not otherwise specified, and diffuse large B-cell lymphoma identifies distinct tumor profile signatures.

Authors: Daruka Mahadevan; Catherine Spier; Kimiko Della Croce; Susan Miller; Benjamin George; Chris Riley; Stephen Warner; Thomas M Grogan; Thomas P Miller
Journal: Mol Cancer Ther Date: 2005-12 Impact factor: 6.261

Review 7. Immune escape associated with functional defects in antigen-processing machinery in head and neck cancer.

Authors: Robert L Ferris; Theresa L Whiteside; Soldano Ferrone
Journal: Clin Cancer Res Date: 2006-07-01 Impact factor: 12.531

8. The human AKNA gene expresses multiple transcripts and protein isoforms as a result of alternative promoter usage, splicing, and polyadenylation.

Authors: Jennifer C Sims-Mourtada; Shirley Bruce; Morgan R McKeller; Roberto Rangel; Liliana Guzman-Rojas; Kelly Cain; Cristina Lopez; Drazen B Zimonjic; Nicholas C Popescu; John Gordon; Miles F Wilkinson; Hector Martinez-Valdez
Journal: DNA Cell Biol Date: 2005-05 Impact factor: 3.311

9. Comparative analysis of the human gimap gene cluster encoding a novel GTPase family.

Authors: Jürgen Krücken; Regina M U Schroetel; Inga U Müller; Nadia Saïdani; Predrag Marinovski; W Peter M Benten; Olaf Stamm; Frank Wunderlich
Journal: Gene Date: 2004-10-27 Impact factor: 3.688

10. The Gene Ontology (GO) project in 2006.

Authors:
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

18 in total

Review 1. Fixing the GAP: The role of RhoGAPs in cancer.

Authors: Gabriel Kreider-Letterman; Nicole M Carr; Rafael Garcia-Mata
Journal: Eur J Cell Biol Date: 2022-02-10 Impact factor: 6.020

2. Identification of four genes associated with cutaneous metastatic melanoma.

Authors: Chen Ji; Yuming Li; Kai Yang; Yanwei Gao; Yan Sha; Dong Xiao; Xiaohong Liang; Zhongqin Cheng
Journal: Open Med (Wars) Date: 2020-06-11

3. Establishment of a p53 Null Murine Oral Carcinoma Cell Line and the Identification of Genetic Alterations Associated with This Carcinoma.

Authors: Kuo-Wei Chang; Chia-En Lin; Hsi-Feng Tu; Hsin-Yao Chung; Yi-Fen Chen; Shu-Chun Lin
Journal: Int J Mol Sci Date: 2020-12-08 Impact factor: 5.923

4. PSMC2, ORC5 and KRTDAP are specific biomarkers for HPV-negative head and neck squamous cell carcinoma.

Authors: Yushen Su; Zhirui Zeng; Dongyun Rong; Yushi Yang; Bei Wu; Yu Cao
Journal: Oncol Lett Date: 2021-02-12 Impact factor: 2.967

5. HPV infection related immune infiltration gene associated therapeutic strategy and clinical outcome in HNSCC.

Authors: Hao Zeng; Xindi Song; Jianrui Ji; Linyan Chen; Qimeng Liao; Xuelei Ma
Journal: BMC Cancer Date: 2020-08-24 Impact factor: 4.430

6. AKNA Is a Potential Prognostic Biomarker in Gastric Cancer and Function as a Tumor Suppressor by Modulating EMT-Related Pathways.

Authors: Gang Wang; Dan Sun; Wenhui Li; Yan Xin
Journal: Biomed Res Int Date: 2020-05-13 Impact factor: 3.411

7. Different transcriptome profiles between human retinoblastoma Y79 cells and an etoposide-resistant subline reveal a chemoresistance mechanism.

Authors: Wen-Ping Song; Si Zheng; Hong-Juan Yao; Xiao-Fei Zhou; Rui Li; Cheng-Yue Zhang; Jun-Yang Zhao; Lie-Wei Wang; Rong-Guang Shao; Liang Li
Journal: BMC Ophthalmol Date: 2020-03-06 Impact factor: 2.209