Yuda Gong1, Xuan Liu1, Arvind Sahu2, Abhinav V Reddy3, Haiyu Wang1. 1. Department of General Surgery, Fudan University Zhongshan Hospital, Shanghai, China. 2. Department of Oncology, Goulburn Valley Health, Shepparton, Victoria, Australia. 3. Department of Radiation Oncology & Molecular Radiation Sciences, Johns Hopkins University School of Medicine, Sidney Kimmel Cancer Center, Baltimore, MD, USA.
Abstract
Background: Gastric cancer (GC) is the 5th most common cause of cancer in the world and the 3rd largest cause of cancer-related death. It is usually associated with a variety of cancers, of which cholangiocarcinoma (CCA) combined with GC accounts for about 1.6%. This study sought to examine the hub genes and role of lipid metabolism in the development and diagnosis of GC and CCA. Methods: To screen potential hub genes, The Cancer Genome Atlas (TCGA) data sets, including the GC (STAD, dataset of GC) and CCA (CHOL, dataset of CCA) data sets, were used to conduct a differentially expressed gene (DEG) analysis and an enrichment analysis of the DEGs. A weighted-gene co-expression network analysis (WGCNA) was conducted to identify the significant gene module and then find the hub genes in the module. To verify the 4 hub genes, we conducted a differentiation analysis of the 4 genes in GC and CCA and found that there were differences. A survival analysis of the hub genes was performed and mutations were mapped. Additionally, tumor immune microenvironment (TIME) and immune analyses were performed to evaluate how lipid metabolism affects the development of GC with CCA. Results: The principal component analysis showed that both GC and CCA had distinct up-regulated and down-regulated genes, which are involved in a variety of metabolic processes. Upon WGCNA, the turquoise and blue modules were meaningful, and the hub genes were identified from these 2 modules. Four hub genes were identified: amyloid beta precursor protein binding family B member 1 (APBB1), Homo sapiens armadillo repeat containing X-linked 1 (ARMCX1), DAZ interacting zinc finger protein 1 (DZIP1), and methionine sulfoxide reductase B3 (MSRB3). In survival analysis, increased expression of the 4 hub genes was associated with inferior survival outcomes, with variations in all 4 genes. Additionally, we demonstrated that genes related to lipid metabolism had an effect on immune function. Conclusions: APBB1, ARMCX1, DZIP1, and MSRB3 affect the development of GC and CCA and can be used as biomarkers. The expression of lipid metabolism genes is related to the TIME of patients with GC and CCA. 2022 Annals of Translational Medicine. All rights reserved.
Background: Gastric cancer (GC) is the 5th most common cause of cancer in the world and the 3rd largest cause of cancer-related death. It is usually associated with a variety of cancers, of which cholangiocarcinoma (CCA) combined with GC accounts for about 1.6%. This study sought to examine the hub genes and role of lipid metabolism in the development and diagnosis of GC and CCA. Methods: To screen potential hub genes, The Cancer Genome Atlas (TCGA) data sets, including the GC (STAD, dataset of GC) and CCA (CHOL, dataset of CCA) data sets, were used to conduct a differentially expressed gene (DEG) analysis and an enrichment analysis of the DEGs. A weighted-gene co-expression network analysis (WGCNA) was conducted to identify the significant gene module and then find the hub genes in the module. To verify the 4 hub genes, we conducted a differentiation analysis of the 4 genes in GC and CCA and found that there were differences. A survival analysis of the hub genes was performed and mutations were mapped. Additionally, tumor immune microenvironment (TIME) and immune analyses were performed to evaluate how lipid metabolism affects the development of GC with CCA. Results: The principal component analysis showed that both GC and CCA had distinct up-regulated and down-regulated genes, which are involved in a variety of metabolic processes. Upon WGCNA, the turquoise and blue modules were meaningful, and the hub genes were identified from these 2 modules. Four hub genes were identified: amyloid beta precursor protein binding family B member 1 (APBB1), Homo sapiens armadillo repeat containing X-linked 1 (ARMCX1), DAZ interacting zinc finger protein 1 (DZIP1), and methionine sulfoxide reductase B3 (MSRB3). In survival analysis, increased expression of the 4 hub genes was associated with inferior survival outcomes, with variations in all 4 genes. Additionally, we demonstrated that genes related to lipid metabolism had an effect on immune function. Conclusions: APBB1, ARMCX1, DZIP1, and MSRB3 affect the development of GC and CCA and can be used as biomarkers. The expression of lipid metabolism genes is related to the TIME of patients with GC and CCA. 2022 Annals of Translational Medicine. All rights reserved.
Gastric cancer (GC) is the 5th most common cancer and the 3rd leading cause of death in the world (1). The incidence rate is the highest in Asia and lowest in Africa (1). Cholangiocarcinoma (CCA) is rare, but it may be found after an initial diagnosis as GC. In 1.1–4.7% of GC cases, at least 1 other primary tumor can be found (1). The most common tumors associated with GC are colon cancer, lung cancer, and hepatocellular carcinoma. CCA is relatively rare, and is found synchronously in 1.6% of all GCs (2). However, the incidence rate of CCA is rising (2).The risk of GC and CCA is associated with deoxyribonucleic acid (DNA) replication defects, DNA methylation, and chromosomal and microsatellite instability. Surgery alone is the standard of care for early-stage GC, while surgery and chemotherapy with or without radiation therapy is used for advanced-staged GC and CCA (2,3). However, even after aggressive therapy, recurrence rates are high (4). Additionally, the mortality rate is high because most patients are diagnosed relatively late (4). Thus, it is necessary to understand the molecular mechanism involved in the carcinogenesis to determine prognostic biomarkers and potential treatment targets.High-throughput sequencing technology provides a new perspective on the genomic, transcriptome, and epigenome characteristics of cancer. Systems biology can effectively analyze complex human diseases, especially in large-scale cancer data sets (5). The weighted-gene co-expression network analysis (WGCNA) is an accurate and efficient multigene analysis method, and it is possible to construct a scale-free network to investigate the correlation between different genomes and genes. WGCNAs have been used to identify the clinical modules and hub genes of cancer (6).Lipid metabolism reprogramming has always been regarded as a new method for identifying malignant tumors, and there is increasing evidence from clinical and experimental studies that lipid metabolism disorder plays a key role in tumorigenesis (7,8). The pathogenesis of gastric cancer is complex and diverse, in which lipid metabolism plays a crucial role. Increasing the synthesis of new lipids or the uptake of exogenous lipids can promote the rapid growth of cancer cells and the formation of tumors. Lipids are not only the structural basis of biofilm, but also signal molecules and energy. It is worth noting that lipid metabolism can induce drug resistance of gastric cancer cells by reshaping the tumor microenvironment. Highly proliferative human CCA cells rely on lipid and lipoprotein uptake to promote fatty acid catabolism, suggesting that inhibiting fatty acid oxidation and/or lipid uptake may be the treatment strategy of this subclass of CCA (2,3). Some studies have shown that good control of blood lipid levels can improve the tumor immune microenvironment (7,8). The tumor immune microenvironment (TIME) reflects the immune landscape of the tumor microenvironment, which is very important in the occurrence and development of tumors (6). Immune cells play a key role in cell reprogramming. Immune cells can modify the microenvironment of tumor cells by secreting various biological factors, impacting tumorigenesis (8,9). Thus, studying the immune microenvironment of lipid metabolism-related genes (LMRGs) is of great significance in tumor risk assessment and biomarker screening.As such, we examined hub genes and role of lipid metabolism in the development and diagnosis of GC and CCA. This study used WGCNA to analyze the pathogenesis of GC and CCA. The ribonucleic acid (RNA) sequencing data of the GC and CCA samples were downloaded from TCGA, and the differentially expressed genes (DEGs) between GC, CCA, and normal tissues were analyzed to determine their expression and function levels. A Gene Ontology (GO) analysis was conducted to examine the functional enrichment of the DEGs. A WGCNA of the DEG matrix identified the modules related to the clinical characteristics of patients with GC and CCA. The hub genes identified by the bioinformatics analysis were verified by a survival analysis. Finally, we analyzed the immune microenvironment of GC and CCA as well as how genes related to lipid metabolism impact tumorigenesis. It is different from previous reports that our research explored the common hub genes of gastric cancer and CCA. We present the following article in accordance with the STREGA reporting checklist (available at https://atm.amegroups.com/article/view/10.21037/atm-22-3530/rc).
Methods
Data collection and pre-processing
The RNA-sequencing data and clinical information of patients were obtained from the “Genomic Data Commons (GDC) TCGA Stomach Cancer (STAD)” and “GDC TCGA Bile Duct Cancer (CHOL)” data sets, hosted by the Xena website of the University of California at Santa Cruz (http://xena.ucsc.edu/). The RNA-sequencing data included 415 GC samples and 35 normal tissue samples, and 36 CHOL samples and 9 normal tissue samples. The workflow of the study is shown in . The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).
Figure 1
Flowchart of the data analysis. GDC, Genomic Data Commons; TCGA, The Cancer Genome Atlas; STAD, stomach adenocarcinoma; CHOL, cholangiocarcinoma; WGCNA, weighted-gene co-expression network analysis; GEO, Gene Expression Omnibus.
Flowchart of the data analysis. GDC, Genomic Data Commons; TCGA, The Cancer Genome Atlas; STAD, stomach adenocarcinoma; CHOL, cholangiocarcinoma; WGCNA, weighted-gene co-expression network analysis; GEO, Gene Expression Omnibus.
Identification of DEGs and statistical analysis
The “limma” package in R language (version 4.1.2) was used to identify the DEGs between STAD, CHOL, and normal tissues. A DEG was defined as a DEG with a log2(fold change) value >1 and a P value <0.01. The volcano map of the DEGs was drawn using the “ggplot2” software package.
Functional enrichment of DEGs
After using “org.Hs.eg.db” package (version: 3.10) to convert the DEG identifiers, “clusterprofiler” software package (version: 3.14.3) was used to perform the function enrichment analysis of the DEG based on GO dataset. The GO terms with a P value <0.05 were considered statistically significant.
WGCNA
WGCNA package (version: 1.70-3) was used to construct the DEG networks. First, ineligible genes and samples were removed using the “good samples genes” function to filter the expression matrix in the WGCNA. Second, the samples were clustered using the “flash cluster” tool in R. Third, the Pearson correlation coefficient matrix was calculated for the paired gene comparison. Fourth, using the “picks of threshold” function, the appropriate soft threshold power (β) was selected to ensure that the network was scale-free. Finally, the “power” function was used to construct the adjacency matrix.
Identification of clinically significant modules and hub genes
First, we described the characteristics of each gene expression module. Second, we calculated the correlation between the characteristic genes and clinical characteristics to determine which modules had clinical significance. The linear relationship between gene expression and clinical features had significance (GS). If the GS was closely related to the module members (MMs), it was defined as the correlation between the module feature genes and single gene expression profiles, and the hub genes of the module were related to CHOL’s STAD. These hub genes were candidate hub genes. Since there were >20 related hub genes, we used a logistic regression to further screen the most important hub genes.
Validation of hub genes
The Gene Expression Profiling Interactive Analysis (GEPIA) website was used to study the expression levels of hub genes in the CHOL STAD sample (http://gepia.cancer-pku.cn/). The ability of the hub genes to predict survival was evaluated by a Kaplan-Meier analysis using the “survival” software package (version: 3.2-7). First, we obtained the DEG expression profiles and the prognostic data of the tumor samples of STAD and CHOL from TCGA. Second, we determined the median expression value of each gene. The samples were assigned to the “high expression” or “low expression” group according to whether the expression level of a specific gene was above or below the median. A log-rank test was used to evaluate the significance of the survival difference between the high expression group and low expression group. If the test was associated with a P value <0.05, the gene was considered an effective hub gene.Next, we screened the differences in hub gene expression between the normal and STAD and CHOL tissues according to the data of colon adenocarcinoma and rectal adenocarcinoma in TCGA and the Genotype Tissue Expression Project (GTEX) on the GEPIA website. The expression level was standardized by its mean value, and a difference with a P value <0.05 was considered statistically significant. By using the “ggpubr” software package (version: 0.4.0) and the GSE66229 data set (STAD) and GSE26566 data set (CHOL) in Gene Expression Omnibus (GEO) database, the expression differences between STAD with CHOL and normal tissues were analyzed to further verify the hub genes (https://www.ncbi.nlm.nih.gov/geo/). The independent sample test was adopted as the standard.We mapped the genome of the hub genes, including the mutation, copy number variation, and messenger RNA expression Z-score (rnaseqv2 rsem), using the data of the STAD and CHOL samples in the data set in the TCGA pan cancer map. We also used the mutation mapping tool to describe the mutation of each hub gene. We used the cBioPortal to access and analyze the data (http://www.cbioportal.org/).
Selection of the subgroups and the TIME evaluation
First, the univariate Cox regression analysis showed that 11 genes were associated with the prognosis of CHOL complicated with STAD. According to the expression matrix of the 11 genes, the R software package “consensusclusterplus” (version: 1.58.0) was used. For consistency clustering, the expression data (estimation) algorithm was used to estimate the stromal cells and immune cells in the malignant tumor tissues, and the stromal score, immune score, and tumor purity were calculated.
Immune analyses
A TIMER immune-infiltration analysis was conducted to evaluate the abundance of 6 immune infiltrating cells [i.e., B cells, dendritic cells, macrophages, neutrophils, cluster of differentiation (CD)4 T cells, and CD8 T cells]. Data, including data for 28 immune infiltrating cells and related genes, were obtained from the molecular feature database, and the concentration of 28 immuno-invasive cells in the tumor samples was evaluated by a single sample gene set enrichment analysis (SSGSEA).
Results
Pre-processing of data
Our findings were obtained from 415 GC samples, 35 normal tissue samples, 36 CCA samples, and 9 normal tissue samples. It is common practice to distinguish between the two main components of tumor samples and normal samples (8). After a principal component analysis, we found that both CHOL and STAD had obvious clusters (see ).
Figure 2
Identification of DEGs in normal, stomach, and cholangiocarcinoma. (A) Sample size of PCA. (B,C) The main components were analyzed by tSNE. STAD PCA is B and CHOL PCA is C. (D,E) Volcanic structure, the blue points represent the downregulated genes, the gray points represent the significant genes, and the red points represents the upregulated genes. STAD volcanic plot is (D) and CHOL volcanic plot is (E). (F) A GO analysis of the functional enrichment of the upregulated genes. The size of the points reflect the number of enrichment genes in a given ontological term, and the color represents the type of enrichment. DEGs, differentially expressed genes; PCA, principal component analysis; tSNE, T-distributed stochastic neighbor embedding; FDR, false discovery rate; STAD, stomach adenocarcinoma; CHOL, cholangiocarcinoma; GO, Gene Ontology.
Identification of DEGs in normal, stomach, and cholangiocarcinoma. (A) Sample size of PCA. (B,C) The main components were analyzed by tSNE. STAD PCA is B and CHOL PCA is C. (D,E) Volcanic structure, the blue points represent the downregulated genes, the gray points represent the significant genes, and the red points represents the upregulated genes. STAD volcanic plot is (D) and CHOL volcanic plot is (E). (F) A GO analysis of the functional enrichment of the upregulated genes. The size of the points reflect the number of enrichment genes in a given ontological term, and the color represents the type of enrichment. DEGs, differentially expressed genes; PCA, principal component analysis; tSNE, T-distributed stochastic neighbor embedding; FDR, false discovery rate; STAD, stomach adenocarcinoma; CHOL, cholangiocarcinoma; GO, Gene Ontology.
Identification of DEGs in the STAD with CHOL samples and the GO enrichment analysis
In 44 normal samples, 36 CHOL, and 415 STAD samples, a total of 908 DEGs were identified, including 450 upregulated genes and 458 downregulated genes. STAD volcanic plot is 2D and CHOL volcanic plot is 2E (see ). To explore the potential biological function of the DEGs in the STAD, we conducted an enrichment analysis (see ). The upregulated DEGs were mainly involved in extracellular regions and small molecule metabolism. In contrast to oxyacetic acid metabolism and organic acid metabolism (see ), the downregulated DEGs were mainly involved in the development of the nervous system, cell projection, and neurogenesis. These results are consistent with the known STAD dysfunction in STAD and CHOL patients; thus, our results appear to be reliable.
WGCNA and identification of critical modules
A WGCNA was used to build a network based on the DEG expression matrix and clinical data from the STAD and CHOL samples. We performed a cluster analysis and checked the quality of the STAD and CHOL samples. All the samples were in the cluster and within the threshold range (height <200). Thus, there were no outliers that needed to be removed. The WGCNA applied the following 6 clinical variables (see ): disease status (normal or tumor), gender (female or male), pathological stage (I–IV), sample type (primary tumor or normal solid tissue), age, weight, and survival time. The STAD sample containing CHOL was divided into the tumor sample and the normal sample (10).
Figure 3
WGCNA analysis of the DEGs in GC with cholangiocarcinoma. (A) Clinical characteristics and cluster tree of STAD data and CHOL samples. (B) Deggs’s tree view depends on the 1-tom dissimilarity measure. Each branch represents a gene, and each color represents a common expression form. (C) The number of genes in the 3 modules. (D) Heat map of the correlation between the modular signature gene and the clinical features of the CHOL patients. (E) Heat map values of gene module for the correlation coefficient and characteristic. WGCNA, weighted-gene co-expression network analysis; DEGs, differentially expressed genes; GC, gastric cancer; STAD, stomach adenocarcinoma; CHOL, cholangiocarcinoma.
WGCNA analysis of the DEGs in GC with cholangiocarcinoma. (A) Clinical characteristics and cluster tree of STAD data and CHOL samples. (B) Deggs’s tree view depends on the 1-tom dissimilarity measure. Each branch represents a gene, and each color represents a common expression form. (C) The number of genes in the 3 modules. (D) Heat map of the correlation between the modular signature gene and the clinical features of the CHOL patients. (E) Heat map values of gene module for the correlation coefficient and characteristic. WGCNA, weighted-gene co-expression network analysis; DEGs, differentially expressed genes; GC, gastric cancer; STAD, stomach adenocarcinoma; CHOL, cholangiocarcinoma.To build a scalable network, we used a soft threshold power β set to 7. The independence level was set to 0.9, and the average connectivity was close to 0. We clustered DEGs for same DEGs of grouping with the same expression module and merged the module with a <0.25 height difference (11). This process resulted in the following 3 co-expression modules: turquoise, blue, and brown (see ).The characteristic gene of the blue module [correlation (COR) =0.31, P=4.8×10−5] and turquoise module (COR =0.19, P=3.7×10−6) was significantly positively correlated with STAD and CHOL (see ). These correlations were confirmed by hierarchical clustering, heat map, and adjacency analyses (see ). The results suggested that the cyan and blue modules may contribute to the tumorigenesis of CHOL and STAD. Thus, the turquoise and blue modules were analyzed based on hub genes (12).
Identification of candidate hub genes from the turquoise and blue modules
In the turquoise and blue modules, the MMs and GS scores were strongly positively correlated. The selection criteria for the hub genes were relatively lower than the standard cut-off threshold (MM >0.8). In the turquoise module, the genes identified reached the thresholds of a “cor.gene” module membership >0.75 and a “cor.genetraitsignicac” >0.6. In the blue module, the genes met the thresholds of a “cor.gene module membership” >0.75 and a “cor.gene traitsential” >0.7.
Hub gene expression and correlation with survival
Based on the expression data and clinical information of the STAD containing CHOL tumor samples in TCGA, we studied the potential association between the expression of 37 genes identified in the turquoise module and 9 genes identified in the blue module and patient survival. The details of TCGA participants are shown in . To further analyze the relationship between the genes in the module and survival status, a forest plot was generated (see ). There were over 20 candidate hub genes, and we conducted a logistic regression to identify 4 important hub genes. The turquoise module genes APBB1, ARMCX1, and MSRB3 were associated with prognosis, as was the blue module gene DZIP1. The increased expression of these genes was associated with inferior overall survival (OS), the OS details of four genes was as follow: low level APBB1 had hazard ratio (HR) 1.46 (95% CI: 1.08, 1.97) with P value 0.01 compared with high level, low level ARMCX1 had HR 1.48 (95% CI: 1.10, 2.00) with P value 0.01, low level DZIP1 had HR 1.62 (95% CI: 1.20, 2.19) with P value <0.001 and low level MSRB3 had HR 1.54 (95% CI: 1.14, 2.08) with P value <0.001 (see ). Thus, we defined these genes as the “final” hub genes.
Table 1
The demographic information for GDC TCGA stomach cancer
Stomach cancer
Dead (N=280)
Alive (N=170)
Total (N=450)
P value
Tumor-normal, n (%)
0.32
Normal
25 (5.6)
10 (2.2)
35 (7.8)
Tumor
255 (56.7)
160 (35.6)
415 (92.2)
Age (years), mean ± SD
65.18±11.06
66.68±9.78
65.75±10.60
0.13
Gender, n (%)
0.18
Female
106 (23.6)
53 (11.8)
159 (35.3)
Male
174 (38.7)
117 (26.0)
291 (64.7)
Weight (kg), mean ± SD
67.55±22.59
65.21±18.72
66.63±24.59
0.24
Pathologic stage, n (%)
<0.001
Stage I
47 (11.1)
15 (3.5)
62 (14.6)
Stage II
101 (23.7)
40 (9.4)
141 (33.1)
Stage III
96 (22.5)
82 (19.3)
178 (41.8)
Stage IV
20 (4.7)
25 (5.9)
45 (10.6)
Sample type, n (%)
0.32
Primary tumor
255 (56.7)
160 (35.6)
415 (92.2)
Solid tissue normal
25 (5.6)
10 (2.2)
35 (7.8)
GDC, Genomic Data Commons; TCGA, The Cancer Genome Atlas; SD, standard deviation.
Table 2
The demographic information for GDC TCGA bile duct cancer
Cholangiocarcinoma
Dead (N=23)
Alive (N=22)
Total (N=45)
P value
Tumor-normal, n (%)
1
Normal
5 (11.1)
4 (8.9)
9 (20.0)
Tumor
18 (40.0)
18 (40.0)
36 (80.0)
Age (years), mean ± SD
63.87±12.13
65.27±13.04
64.56±12.46
0.71
Gender, n (%)
1
Female
12 (26.7)
11 (24.4)
23 (51.1)
Male
11 (24.4)
11 (24.4)
22 (48.9)
Weight (kg), mean ± SD
81.43±23.42
81.91±19.57
81.67±21.38
0.94
Pathologic stage, n (%)
0.37
Stage I
12 (26.7)
14 (31.1)
26 (57.8)
Stage II
5 (11.1)
5 (11.1)
10 (22.2)
Stage III
0 (0.0)
1 (2.2)
1 (2.2)
Stage IV
6 (13.3)
2 (4.4)
8 (17.8)
Sample type, n (%)
1
Primary tumor
18 (40.0)
18 (40.0)
36 (80.0)
Solid tissue normal
5 (11.1)
4 (8.9)
9 (20.0)
GDC, Genomic Data Commons; TCGA, The Cancer Genome Atlas; SD, standard deviation.
Figure 4
Survival analysis of the 4 hub genes. (A) The important hub genes were divided into forests by multiple analysis. (B) The STAD Kaplan-Meier survival curve was stratified with CHOL patients, and the low or high expression of 4 hub genes was determined by a logistic regression. (C) Differences in the gene expression of the 4 hub genes between the normal and tumor tissues. ***, P<0.0001; ****, P<0.00001. H, high risk; L, low risk; STAD, stomach adenocarcinoma; CHOL, cholangiocarcinoma.
GDC, Genomic Data Commons; TCGA, The Cancer Genome Atlas; SD, standard deviation.GDC, Genomic Data Commons; TCGA, The Cancer Genome Atlas; SD, standard deviation.Survival analysis of the 4 hub genes. (A) The important hub genes were divided into forests by multiple analysis. (B) The STAD Kaplan-Meier survival curve was stratified with CHOL patients, and the low or high expression of 4 hub genes was determined by a logistic regression. (C) Differences in the gene expression of the 4 hub genes between the normal and tumor tissues. ***, P<0.0001; ****, P<0.00001. H, high risk; L, low risk; STAD, stomach adenocarcinoma; CHOL, cholangiocarcinoma.Using the GEPIA website, we confirmed that there were significant differences in the expression of all these hub genes in the normal and CHOL bearing STAD tissues (see ). To further validate the genes related to survival, we used the GEO data sets GSE66229 and GSE26566 to identify the significant survival-related genes for STAD and CHOL, respectively (see ). The results of TCGA analysis were consistent with those of the GEO analysis.
Figure 5
According to the STAD and CHOL geographical location data, 4 hub genes were verified and mutated. (A,B) The central genes were validated using geographic data sets (GSE 66229 and GSE 26566). (C,D) The bar charts and heat maps showed mutations in 4 hub genes. *, P<0.05; **, P<0.001; ***, P<0.0001; ****, P<0.00001; –, not applicable. STAD, stomach adenocarcinoma; CHOL, cholangiocarcinoma; GSE, Genomic Spatial Event.
According to the STAD and CHOL geographical location data, 4 hub genes were verified and mutated. (A,B) The central genes were validated using geographic data sets (GSE 66229 and GSE 26566). (C,D) The bar charts and heat maps showed mutations in 4 hub genes. *, P<0.05; **, P<0.001; ***, P<0.0001; ****, P<0.00001; –, not applicable. STAD, stomach adenocarcinoma; CHOL, cholangiocarcinoma; GSE, Genomic Spatial Event.
Mutation landscape of hub genes
According to the data of the STAD patients with CHOL in TCGA, the OncoPrint view of the hub genes in the cBioPortal database was used to visualize the mutation of the 4 hub genes. In some STAD patients, there were 4 hub gene mutations (29%). In a small number of CHOL patients (11%), there were 4 hub gene mutations. DZIP1 had the highest STAD mutation rate (8%), and the most missense mutations, and mutations leading to a higher expression of messenger RNA (see ), while DZIP1 (4%) had the highest cholesterol mutation rate (see ).
Clustering of 2 molecular subtypes based on LMRGs
According to the 11 prognostic genes generated by the univariate Cox analysis, the patients with STAD and CHOL in the training cohort were divided into subgroups using the consistent clustering method (see ). When k=2, the optimal clustering stability was reached. In total, 332 patients were clustered in group 1 and 163 patients were clustered in group 2. The heat maps showed the expression levels of the LMRGs in the two subtypes (see ), and a significant difference in expression between cluster 1 and cluster 2 was found. These results suggest that the LMRGs divide the STAD with CHOL patients into two molecular subtypes.
Figure 6
In the 2 sets, clustering was performed using CHOL and STAD data. (A-C) k=2 was determined as the best value for the desirable grouping; (D) Heat map showing the expression of lipid metabolism genes in the 2 subgroups; (E) Matrix score, immune score, and estimated score. (F) The abundance of the 6 filtered immune cells. CDF, cumulative distribution function; C1, Cluster 1; C2, Cluster 2; CHOL, cholangiocarcinoma; STAD, stomach adenocarcinoma. *, P<0.05; **, P<0.001; ****, P<0.00001; –, not applicable.
In the 2 sets, clustering was performed using CHOL and STAD data. (A-C) k=2 was determined as the best value for the desirable grouping; (D) Heat map showing the expression of lipid metabolism genes in the 2 subgroups; (E) Matrix score, immune score, and estimated score. (F) The abundance of the 6 filtered immune cells. CDF, cumulative distribution function; C1, Cluster 1; C2, Cluster 2; CHOL, cholangiocarcinoma; STAD, stomach adenocarcinoma. *, P<0.05; **, P<0.001; ****, P<0.00001; –, not applicable.
Two molecular subtypes exhibited different TIME and immune status
Additionally, we also performed immunoassays to explore the immune differences between the two molecular subtypes. The estimation algorithm showed that the immune score (P<0.05) and estimated score (P<0.05) of the patients with CHOL in group 2 were significantly lower than those of patients in group 1. Cluster 1 had a lower stromal score than cluster 2 (P<0.05). Additionally, the TIMER algorithm showed that cluster 1 had a greater number of B cells (P<0.05), macrophages (P<0.05), NK cells (P<0.05), CD4 T cells (P<0.05), CD8 T cells (P<0.05), endothelial cells (P<0.05), and natural killer cells (P<0.05) (). These results showed that there were significant differences in the TIME and immune status between the two molecular subtypes.
Discussion
In this study, the WGCNA method was used to analyze the pathogenesis of GC and CCA. The RNA-sequencing data of the GC and CCA samples were downloaded from TCGA, and the DEGs between the GC, CCA, and normal tissues were analyzed in relation to their expression and function levels. The GO database was used to analyze the functional enrichment of DEGs. The WGCNA of the DEG matrix identified modules related to the clinical characteristics of patients with GC and CCA. The hub genes found by the bioinformatics analysis were verified by survival rates, and an immunohistochemical GC and CCA analysis. Additionally, an in-depth analysis of the lipid metabolism genes was conducted to explore the effect of lipid metabolism on the pathogenesis of GC and CCA patients, and to analyze the immune microenvironments of GC and CCA patients.This study provides a relatively novel idea for combination of four cancer hub genes. We performed a bioinformatics analysis on an independent group of patients to identify biomarkers. Our results suggest that the poor prognosis of GC and CCA is related to the overexpression of APBB1, ARMCX1, DZIP1, and MSRB3. Our results of the functional enrichment analysis of the GC and CCA genes are consistent with previous reports (13,14). The metabolism of cancer cells plays an important role in the occurrence and development of tumors.Through the WGCNA, we found that central modules and hub genes play an important role in cancer development. We identified two key modules (i.e., the turquoise and blue modules), whose genes were associated with GC and CCA. Complex gene networks regulate the occurrence and development of GC and CCA. We found that 4 hub genes (i.e., APBB1, ARMCX1, DZIP1 and MSRB3) were closely related to the OS rate of GC patients. ARMCX1 has been reported to have potential prognostic value for GC and may have clinical application value (13). However, further basic research needs to be conducted to clarify the specific mechanism of ARMCX1 in GC. Both ARMCX1 and ARMCX2 are involved in the biological pathways and processes related to the progression and treatment of GC. Additionally, DZIP1 is an independent prognostic factor of GC and may be involved in the progression of GC (14). DZIP1 is involved in the activation of the emergency phenotype and is related to the massive infiltration of a variety of immune cells. Additionally, the DZIP1 mutation may be related to a good prognosis in GC patients. Finally, the methylation of DZIP1, a DNA promoter, can be used as a target for future GC therapy. Additionally, patients with GC have increased expression of MSRB3 which is associated with poor prognosis (15). Research on APBB1 in GC is limited, but it may be a potential biomarker of GC. Since the completion of this study on CCA biomarkers, we have not found any other relevant studies on these 4 CCA hub genes.Consensus clustering is a reliable method for dividing samples into different subgroups based on gene expression matrices. According to the LMRG expression matrix of the GC and CCA patients, we first identified 2 molecular subgroups by consensus clustering. Next, an immune function analysis was carried out to examine the role of LMRGs in GC and CCA (16,17). Since the progress of a tumor is related to changes in the surrounding matrix, and the immune cell is the important component of the tumor matrix, the TIME plays the important role in the prognosis of the patient (18). Additionally, the abnormal metabolism of cancer cells will lead to a change in the metabolism of the TIME (19). TIME was different between the two subgroups that cluster 1 was more immunogenic. Cluster 1 may benefit more because of greater abundance of CD8 T cells. In fact, recent trials (TOPAZ-1 and KEYNOTE-811) have shown that immunotherapy can improve clinical outcomes in patients with GC and CCA (20,21).The estimation algorithm is an innovative method for calculating tumor purity and the ratio of immune cells and stromal cells in tumors according to gene expression values (22). The immune score produced by the estimation algorithm quantitatively shows the immune component in the tumor sample and reflects the TIME (23). The purity of a tumor is defined as the proportion of the malignant cells in the tumor tissue, and it is significantly correlated with the prognosis of the tumor (24). Consistent with previous findings, we found that patients with a better prognosis had a higher immune score. The cluster 1 immune score was higher than that of the cluster 2. Additionally, we also used TIMER to evaluate the immune status of these 2 molecular subsets, which was helpful in quantifying the 6 tumor infiltrating immune subsets. The TIMER analysis showed that the abundance of all the immune cells in Cluster 1 increased significantly, which indicated that the immune cells in Cluster 1 were upregulated. In addition, four hug genes APBB1, ARMCX1, DZIP1, and MSRB3 were related with TIME of GC or CCA.In conclusion, this study conducted a WGCNA and found 4 hub genes implicated in GC and CCA: APBB1, ARMCX1, DZIP1, and MSRB3. These findings are helpful in fully understanding the gene network of GC and CCA (25). Through consensus clustering, 2 molecular subtypes were identified in GC and CCA based on the LMRGs. The immunoassays showed that abnormal lipid metabolism hinders the immune system. Further studies are warranted to determine how to best use this molecular information to improve clinical outcomes in GC and CCA.The article’s supplementary files as