Head and neck squamous cell carcinoma (HNSCC) was reported as the sixth most common type of malignancy in humans, constituting ~4% of all new cases in the United States in 2015 (1,2). According to a recent report (2018), ~600,000 patients were affected worldwide yearly and the incidence rate has significantly increased (3). Improvements in clinical therapy have not led to corresponding improvements in the prognosis of patients with HNSCC. The 5-year survival rate of patients with HNSCC still remains between 40 and 50%. Revealing the underlying mechanism of HNSCC development could provide potential biomarkers or therapeutic targets in HNSCC (4).Several studies have focused on the mechanism of HNSCC, and the new generation of sequencing technology provides a rich resource for the study of significant genetic changes during tumorigenesis and for the screening of potential diagnostic and prognostic markers of cancer. For instance, it was reported that actin-like protein 8 (ACTL8) was increasingly expressed in HNSCC and regarded as an independent prognostic factor (5). The expression of neutrophil gelatinase-associated lipocalin was lower in HNSCC than in normal tissues, and was correlated with the tumorigenesis of HNSCC (6). Calpain 6 expression was significantly decreased in HNSCC and positively associated with the survival rate of patients with the disease, thereby indicating the role of calpain 6 as a tumor suppressor in HNSCC (7). However, due to the limited sample sizes, previous studies may provide false predictions.In the present study, integrated analysis was performed to identify the key genes involved in the development of HNSCC. Firstly, the differentially expressed genes (DEGs) between HNSCC and normal tissues were screened, followed by Gene Ontology (GO) enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis and protein-protein interaction (PPI) network analysis. Finally, the key candidate DEGs were identified according to Centiscape and log-rank survival analysis, and were verified using the Gene Expression Ontology (GEO) datasets. These key DEGs were identified as potential biomarkers for early diagnosis and as therapeutic targets for HNSCC.
Materials and methods
Gene expression profile data and identification of DEGs
The level-3 RNA sequence (RNA Seq) data (fragments per kilobase of transcript per million mapped reads upper quartile data) of HNSCC and corresponding normal tissue samples were downloaded from the TCGA database (dataset no. :544; http://www.cancer.gov/tcga) (8) using the Genomic Data Commons Application Programming Interface (9). A total of 544 samples from 500 patients with HNSCC and 44 normal controls were collected in December 2018. The normal controls included normal tissues from the oral cavity, oral tongue, larynx, floor of the mouth and base of the tongue. The raw data was downloaded and the log2 fold-change (log2FC) was calculated using the Limma R package (version 3.2.5; http://www.r-project.org/) to screen DEGs between HNSCC and normal tissues. The following cut-off criteria were applied: log2FC>2 and P<0.05. The adjustment of P<0.5 was set as the threshold to adjust the P-value for multiple comparisons.For verification purposes, the microarray expression dataset GSE6631 was downloaded (as minimum information about a microarray experiment notation in mark-up language formatted family files) from the GEO database (10). The GSE6631 was based on the GPL8300 Platforms (Affymetrix Human Genome U95 version 2 array) and included 44 HNCC samples and paired normal samples (submission date, 2007; last updated, 2018) (11). The sample information and expression profile data were extracted by R package (version:3.2.5) from GES6631. Statistical analyses were performed with GraphPad Prism version 8.0 software (GraphPad Software, Inc.). Single comparisons between two groups were performed using Paired Student's t-test.
GO and pathway enrichment analyses of DEGs
GO analysis and KEGG pathway enrichment of DEGs were performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) (version, 6.7; http://david-d.ncifcrf.gov/) to screen for possible biological processes, cellular components, molecular functions and signaling pathways of the involved DEGs (12). The resulting data were imported into Cytoscape ClueGo software (version: 3.6.0) for visual analysis (13). P<0.05 was considered as statistically significant. The following parameter settings were applied: Identifier, ‘official gene symbol.’; list type, ‘gene list.’; species, ‘homo sapiens.’; count threshold, 2; and ease threshold, 0.05.
PPI network construction and candidate gene identification
The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING; version, 11.0; http://string-db.org) was used to construct the PPI network (14). The minimum required interaction score was set to a medium confidence of 0.4, and the organism was set to ‘Homo sapiens’. The Cytoscape software was then used to visualize the network. Cytoscape CentiScape (version, 3.6.0; http://apps.cytoscape.org/apps/centiscape) (15) was used to screen candidate key proteins in the network, according to the degree of centrality. The genes with a node degree of ≥15 were considered as the candidate key genes.
Association between candidate key genes and clinicopathological parameters of HNSCC
By using the Gene Expression Profiling Interactive Analysis (GEPIA) database (http://gepia.cancer-pku.cn/), the effect of candidate key genes on overall survival (OS) rate was evaluated using log-rank test and the Mantel-Cox test. P<0.05 for log-rank test indicated statistical significance. Genes significantly associated with HNSCC OS rate were considered as key genes. The following parameter settings were applied: Group cut-off, ‘median.’; hazards ratio, ‘yes.’; 95% confidence interval, ‘yes.’; and axis units, ‘months.’.
Comparison of key genes in HNSCC and other types of cancer
In order to evaluate the specificity of the key genes screened in the previous step to HNSCC, the expression of these genes was examined in other tumor datasets from the GEPIA database, including adrenocortical carcinoma, bladder urothelial carcinoma (BLCA), breast invasive carcinoma, endocervical adenocarcinoma (CESC), colon adenocarcinoma (COAD), esophageal carcinoma (ESCA), liver hepatocellular carcinoma, lung squamous cell carcinoma and lung adenocarcinoma (LUAD). These datasets came from the TCGA and the GTEx projects (16). The unmatched normal and tumor tissues were compared. The raw data were filtered based on the cut-offs log2FC>2 and P<0.05.
Results
Identification of DEGs in HNSCC
RNAseq data of HNSCC and corresponding normal tissue samples were downloaded from the TCGA database. The data was screened by the Limma package, using P<0.05 and log2FC>2 as the cut-off criteria, which identified 1,181 DEGs (Fig. 1), including 354 upregulated and 827 downregulated genes.
Figure 1.
Differential gene expression between head and neck squamous cell carcinoma and normal tissues. The volcano plot presents upregulated (red points) and downregulated genes (green points), screened on the basis of fold-change >2.0 and a correction for P<0.05. The black points represent genes with no significant difference.
GO analysis and signaling pathway enrichment of DEGs in HNSCC
The GO analysis of the 1,181 DEGs was performed using the DAVID database, with the criterion set at P<0.05. The DEGs were divided into three groups, namely, biological process, cellular component and molecular function groups. As shown in Fig. 2A, the main DEG-associated biological functions were ‘cell adhesion’, ‘extracellular matrix organization’, ‘skeletal system development’ and ‘ion transmembrane transport’. As demonstrated in Fig. 2B, the cellular component analysis revealed that the selected DEGs were mainly located at the ‘extracellular exosome’, ‘extracellular region’ and ‘extracellular space’. The molecular function of the DEGs was mainly associated with ‘actin binding’, ‘heparin binding’ and ‘cytokine activity’ (Fig. 2C). As demonstrated in Fig. 3, the KEGG pathways enriched by the DEGs were mainly associated with ‘protein digestion and absorption’, ‘extracellular matrix-receptor interaction’, ‘drug metabolism’ and the ‘PPAR signaling pathway’.
Figure 2.
GO analysis of the DEGs in head and neck squamous cell carcinoma. The graphs present the number of DEGs enriched in each GO term: (A) Biological process, (B) cellular component and (C) molecular function. GO, Gene Ontology; DEGs, differentially expressed genes.
Figure 3.
KEGG pathway analysis of the differentially expressed genes in head and neck squamous cell carcinoma. The green diamond in the center represents different KEGG pathways, including ‘protein digestion and absorption’, ‘extracellular matrix-receptor interaction’, ‘drug metabolism’ and the ‘PPAR signaling pathway’. The peripheral red nodes represent the genes enriched in the corresponding pathways. KEGG, Kyoto Encyclopedia of Genes and Genomes.
Key candidate DEG identification with PPI network analysis
The PPI network of the DEG expression products was constructed using Cytoscape software and the STRING database (http://string-db.org). A total of 1,035 DEGs were incorporated into the PPI network complex. The Cytoscape CentiScape was used to screen candidate key genes in the network, with degree of centrality ≥15 set as the inclusion criterion, which identified 50 genes that were included in the following analysis (Fig. 4).
Figure 4.
PPI network of the DEGs. A total of 1,035 DEGs were incorporated into the PPI network complex. The peripheral red nodes represent the DEGs with degree of centrality ≥15. The lines represent the interactions between nodes. PPI, protein-protein interaction; DEGs, differentially expressed genes.
Log-rank survival analysis by the GEPIA database, found 8 out of the 50 candidate key genes, including integrin α-5 (ITGA5) and serpin family E member 1 (SERPINE1), to be significantly associated with HNSCC OS rate (P<0.05; Table I). If the threshold was set to P<0.01, only ITGA5 and SERPINE1 were found to be associated with the OS rate of HNSCC (Fig. 5). The Cox proportional hazard ratios of ITGA5 and SERPINE1 were both 1.5.
Table I.
Log regression analysis of differently expressed genes with node degree of centrality ≥15.
Gene symbol
Degree of centrality
Log-rank, P-value
COL1A1
45
0.30
COL1A2
37
0.84
FN1
34
0.11
COL2A1
34
0.26
COL3A1
33
0.56
MMP9
32
0.93
COL4A2
30
0.61
COL4A1
30
0.62
COL5A2
29
0.29
COL4A5
28
0.85
COL11A1
26
0.09
COL7A1
26
0.10
COL5A1
26
0.28
SPARC
26
0.30
COL6A3
26
0.86
SERPINH1
25
0.02
COL6A1
25
0.22
COL4A6
25
0.36
TIMP1
24
0.04
SPP1
24
0.04
CSF2
23
0.02
LUM
23
0.63
COL10A1
23
0.98
GNGT1
22
0.15
COL27A1
22
0.46
ITGA5
21
<0.01
MMP1
21
0.04
CXCL10
21
0.19
CENPA
21
0.30
COL12A1
21
0.53
P4HA3
20
0.06
MMP13
20
0.09
MMP3
20
0.23
POSTN
20
0.30
PLK1
20
0.41
COL13A1
20
0.44
COL22A1
20
0.62
FOXM1
19
0.55
ACAN
18
0.50
AURKB
18
0.63
KIF2C
17
0.29
CDC45
17
0.90
PTHLH
17
0.92
SERPINE1
16
<0.01
ITGA11
16
0.50
TPX2
16
0.61
OASL
16
0.67
NID1
15
0.04
CXCL9
15
0.18
UBE2C
15
0.58
Figure 5.
Prognostic relevance of ITGA5 and SERPINE1 in patients with head and neck squamous cell carcinoma. Kaplan-Meier survival analysis of patients with low and high expression of (A) ITGA5 and (B) SERPINE1 was performed using the Gene Expression Profiling Interactive Analysis database. The group cut-off value was based on the median. The dotted lines represent the 95% confidence intervals. ITGA5, integrin α-5; SERPINE1, serpin family E member 1; HR, hazard ratio.
ITGA5 is highly expressed in HNSCC specifically
To evaluate the specificity of the key genes previously screened in HNSCC, their expression levels were determined in other tumor datasets from the GEPIA database. The expression of ITGA5 was only significantly increased in HNSCC and decreased in BLCA, CESC, COAD and LUSC (Fig. 6A). However, SERPINE1 expression was significantly increased in ESCA, as well as in HNSCC, with no other significant changes in other tumors (Fig. 6B).
Figure 6.
Expression of ITGA5 and SERPINE1 in different types of cancer. The expression levels of (A) ITGA5 and (B) SERPINE1 were compared in different types of cancer in humans. Box plots were drawn using the R software and the raw data from the Gene Expression Profiling Interactive Analysis database. ITGA5 expression was only significantly increased in HNSCC, whereas it was decreased in BLCA, CESC, COAD and LUSC. The expression of SERPINE1 was significantly increased in HNSCC, as well as ESCA. The red boxplots represented tumor samples and the grey boxplots represented normal samples. ITGA5, integrin α-5; SERPINE1, serpin family E member 1; HNSCC, head and neck squamous cell carcinoma; ACC, adrenocortical carcinoma; BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinoma; CESC, endocervical adenocarcinoma; COAD, colon adenocarcinoma; ESCA, esophageal carcinoma; LIHC, liver hepatocellular carcinoma; LUSC, lung squamous cell carcinoma; LUAD, lung adenocarcinoma.
For verification, ITGA5 expression was evaluated in HNSCC based on the microarray express dataset GSE6631. Compared with ITGA5 expression in normal tissues, the expression in HNSCC tissues was upregulated (P=0.007; Fig. 7).
Figure 7.
Expression of ITGA5 in HNSCC samples compared with expression in normal samples. The horizontal axis represents the different samples and the vertical axis represents the expression level of ITGA5. ITGA5, integrin α-5.
Discussion
HNSCC is one of the most common types of malignancies in humans; it is characterized by rapid progression, a high migration capacity and high mortality rate. However, there are almost no biomarkers or targets for the diagnosis and treatment of HNSCC (1). Thus, the identification of genes that are differentially expressed between tumor and normal tissues will contribute to future studies on the pathogenesis of HNSCC and will provide potential diagnostic biomarkers and therapeutic targets.In the present study, TCGA datasets of HNSCC were integrated and analyzed using bioinformatics. Finally, a total of 1,181 DEGs were identified including 354 upregulated genes and 827 downregulated genes in the initial step. Subsequently, the 1,181 DEGs were analyzed based on GO terms and classified based on KEGG signaling pathways. The GO analysis indicated that the DEGs were mainly involved in ‘cell adhesion’, ‘extracellular matrix organization’, ‘skeletal system development’, ‘ion transmembrane transport’, ‘actin binding’, ‘heparin binding’ and ‘cytokine activity’. The KEGG pathway analysis revealed that the DEGs were mainly associated with ‘protein digestion and absorption’, ‘extracellular matrix-receptor interaction’, ‘drug metabolism’ and the ‘PPAR signaling pathway’. In addition, the PPI network complex was constructed and eight key genes, including ITGA5, SERPINE1, serpin family H member 1, colony-stimulating factor 2, tissue inhibitor of metalloproteinases 1, nidogen 1, secreted phosphoprotein 1 and matrix metallopeptidase 1, were screened based on centrality and log-rank survival analysis by GEPIA. The GEPIA database contains genotype-tissue expression data, which increases the sample size and accuracy of the analysis, and was therefore was used to identify key DEGs. As a result, ITGA5 and SERPINE1 were identified as key genes.ITGA5 is an important member of the integrin family, and its coding gene is located at the human chromosome 12q11-q13. Integrin is an extracellular matrix receptor that acts as an adhesive receptor for extracellular matrix proteins, including fibrin, laminin and collagen (17). ITGA5 forms the link between the extracellular matrix and intracellular signal transduction, and also participates in a variety of important physiological processes; it is also associated with tumor occurrence, development, invasion and metastasis (18). A study on hepatocellular carcinoma (HCC) demonstrated a negative correlation between the expression of ITGA5 and miR-128, and the downregulation of ITGA5 leading to the inhibition of HCC cell metastasis and stem cell-like properties (19). However, the expression of ITGA5 played diverse roles in breast cancer cells (20). Expression in highly invasive breast cancer cells was almost absent in comparison with that in less invasive cells, thereby indicating the negative association between ITGA5 and breast cancer metastasis (21,22). In ovarian cancer cells, downregulation of ITGA5 induced by forced expression of miR-17 significantly limited the adhesion, invasion and tumorigenesis ability of cancer cells (23). ITGA5 was reported to be highly expressed in glioblastoma compared with normal brain glial cells, and its downregulation inhibited proliferation, invasion and migration (24). ITGA5 expression was also upregulated in oral squamous cell carcinoma (OSCC), a common type of HNSCC (25). Knockdown of ITGA5 inhibited the proliferation, migration and invasion of OSCC cells, thereby indicating that ITGA5 could promote OSCC progression.SERPINE1, also known as plasminogen activator inhibitor type I, is a primary inhibitor of plasminogen activators and a marker of poor prognosis in cancer. In colorectal cancer, SERPINE1 expression was upregulated and was significantly associated with grading and microsatellite instability (26). In esophageal cancer, SERPINE1 expression was upregulated and significantly associated with age range, which was consistent with the analysis of the present study (27). SEPINE1 was highly expressed in oral carcinomas compared with in the matched tumor adjacent normal tissues (28), and its overexpression resulted in increased proliferation and tumor budding. Moreover, SEPINE1 was associated with poor progression-free and cancer-specific survival in patients with head and neck cancer (29,30).In conclusion, analysis of GEPIA datasets demonstrated the association between high expression of ITGA5 and SERPINE1 and poor OS rate in patients with HNSCC, with identical hazard ratio for both genes. The investigation of these genes was conducted in other tumor datasets, in order to determine their specificity to HNSCC. The expression of ITGA5 was significantly increased in HNSCC only and decreased in BLCA, CESC, COAD and LUSC. On the other hand, the expression of SERPINE1 was significantly increased in ESCA, as well as in HNSCC. A limitation of the present study was that the gene expression in HNSCC was only analyzed by using the public database. In subsequent experiments, molecular biology and cell biology methods will be utilized to further verify these results. Overall, these findings suggest the potential of ITGA5 as a diagnostic or prognostic marker and as a therapeutic target in HNSCC.
Authors: Anna Agnieszka Klimczak-Bitner; Radzisław Kordek; Jan Bitner; Jacek Musiał; Janusz Szemraj Journal: Oncol Lett Date: 2016-09-29 Impact factor: 2.967
Authors: Gianluigi Mazzoccoli; Valerio Pazienza; Anna Panza; Maria Rosa Valvano; Giorgia Benegiamo; Manlio Vinciguerra; Angelo Andriulli; Ada Piepoli Journal: J Cancer Res Clin Oncol Date: 2011-12-24 Impact factor: 4.553