Feng Cao¹, Yun-Sheng Cheng¹, Liang Yu¹, Yan-Yan Xu², Yong Wang¹. 1. Department of General Surgery, The Second Hospital of Anhui Medical University, Hefei, Anhui, China (mainland). 2. Department of General Surgery, The Second Hospital of Anhui Medical University, Hefei, China (mainland).
Abstract
BACKGROUND This bioinformatics study aimed to identify differentially expressed genes (DEGs) and protein-protein interaction (PPI) networks associated with functional pathways in ulcerative colitis based on 3 Gene Expression Omnibus (GEO) datasets. MATERIAL AND METHODS The GSE87466, GSE75214, and GSE48958 MINiML formatted family files were downloaded from the GEO database. DEGs were identified in each of the 3 datasets after data standardization in R, and volcano maps and heat maps were drawn. Venn diagram software was used to identify common DEGs. PPI analysis of common DEGs was performed using the Search Tool for the Retrieval of Interacting Genes (STRING). Gene modules and hub genes were visualized in the PPI network using Cytoscape. Enrichment analysis was performed for all common DEGs, module genes, and hub genes. RESULTS A total of 90 common DEGs were identified, from which 3 functional modules and 1 hub gene module were derived. CXCL8 module genes were mainly enriched in cytokine-mediated signaling pathways and interleukin (IL)-10 signaling. CCL20 module genes were mainly enriched in the IL-17 signaling pathway and cellular response to IL-1. The hub gene module mainly involved IL-10, IL-4, and IL-13 signaling pathways. CXCL8, CXCL1, and IL-1β were the top 3 hub genes and were mainly involved in IL-10 signaling. CONCLUSIONS Bioinformatics analysis using 3 GEO datasets identified CXCL8, CXCL1, and IL-1β, which are involved in IL-10 signaling, as the top 3 hub genes in ulcerative colitis. The findings from this study remain to be validated, but they may contribute to the further understanding of the pathogenesis of ulcerative colitis.
Inflammatory bowel disease (IBD) includes Crohn’s disease and ulcerative colitis (UC), which have overlapping clinical and molecular features. The incidence of IBD is increasing worldwide. China has the highest incidence of IBD in Asia, and the ratio of UC to Crohn’s disease there has been reported to be 2.0 [1]. UC is a diffuse, chronic, nonspecific inflammatory disease of the colorectal mucosa and/or submucosa. It is clinically characterized by relapsing mucosal inflammation that manifests as persistent or recurrent mucopurulent bloody stool, abdominal pain, and other systemic symptoms [2]. UC prevalence is highest in Europe, followed by Canada and then the United States [3], and the age at onset is decreasing [4]. UC is recurrent and difficult to treat, and it has been called a “deathless cancer” because of its serious impact on patients’ quality of life. UC is also closely associated with the development of colon cancer [5], which is difficult to cure. The pathogenesis of UC has not been clarified; possible underlying mechanisms include bacterial infection, intestinal mucosal barrier dysfunction, and genetic, dietary, environmental, immunological, and psychological factors [6]. The interaction of these factors results in heightened sensitivity of the intestinal mucosa, ultimately leading to intestinal inflammation and tissue damage. Conservative medical treatment of UC in clinical practice is somewhat limited and includes dietary regulation and drug treatment with aminosalicylic acid, antibiotics, probiotics, and glucocorticoids. The criterion standard surgical treatment of UC is ileal pouch-anal anastomosis [7]. This surgery removes the target organ of UC, but the incidence of postoperative pouchitis is high [8,9]. Therefore, further understanding of the pathogenesis and regulation of UC at the molecular level may provide new directions for UC prevention and treatment.
The rapid development and extensive application of gene expression profiling have made bioinformatics analysis a popular method for exploring disease pathogenesis, and it provides significant insight into the pathophysiological mechanisms of diseases at the genetic level. For example, Cheng et al. [10] used bioinformatics to identify GNG11, GNB4, and AGT as potential molecular targets and diagnostic biomarkers for UC and Crohn’s disease. Feng et al. [11] found that TATA-binding protein 1, nuclear factor-κB, and microRNAs were closely related to 233 differentially expressed genes (DEGs) in UC and may be potential molecular targets for its treatment. Here, using 3 datasets from the Gene Expression Omnibus (GEO) database, we applied bioinformatics tools to perform a series of UC expression profile analyses to identify potential core genes that could serve as molecular targets for the prevention and treatment of UC. Through these analyses, we identified DEGs and protein–protein interaction (PPI) networks associated with functional pathways in UC.
Material and Methods
Dataset information
The GEO database, which is affiliated with the National Center for Biotechnology Information, was searched using the terms “ulcerative colitis” [MeSH Terms] AND “human” [Organism] [12]. We selected datasets containing both normal tissue samples and UC colon tissue samples: GSE87466, GSE75214, and GSE48958. The GSE87466 dataset was generated on the GPL13158 platform (HT_HG-U133_Plus_PM Affymetrix HT HG-U133+ PM Array Plate), and the GSE75214 and GSE48958 datasets were generated on the GPL6244 platform (HuGene-1_0-st Affymetrix Human Gene 1.0 ST Array transcript [gene] version). The GSE87466, GSE75214, and GSE48958 datasets contained 21, 11, and 8 normal tissue samples and 87, 97, and 13 UC colon tissue samples, respectively.
Data processing and screening of DEGs
The MINiML formatted family files for each of the GEO datasets were downloaded. R was used to extract the expression matrix files, which were then normalized using the quantile method. DEGs were identified with the “limma” R package using |logFC| >1.5 and P<0.05 as thresholds. Volcano maps were used to show all DEGs. Clustering was then performed using Euclidean distances, and the “pheatmap” R package was used to draw heat maps of the top 100 DEGs in each of the 3 datasets. Lastly, Venn diagram software was used to identify DEGs common to all 3 datasets.
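The normalization, thresholding, and intersection steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses only the standard library, it takes per-gene logFC and P values as already computed (the limma moderated statistics themselves are not reproduced), and the function names and the gene-to-(logFC, P) input format are hypothetical.

```python
from statistics import mean


def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows.

    Each sample (column) is ranked, the mean of each rank across samples is
    computed, and every value is replaced by the mean for its rank. Ties are
    broken by sort order here, which is a simplification of this sketch.
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    # Row indices of each column, ordered from lowest to highest value
    order = [sorted(range(n_genes), key=lambda i, c=col: c[i]) for col in columns]
    # Reference distribution: mean of the r-th smallest value over all samples
    rank_means = [mean(columns[j][order[j][r]] for j in range(n_samples))
                  for r in range(n_genes)]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j in range(n_samples):
        for r, i in enumerate(order[j]):
            normalized[i][j] = rank_means[r]
    return normalized


def select_degs(stats, lfc_cutoff=1.5, p_cutoff=0.05):
    """Split genes into up- and downregulated sets using |logFC| > lfc_cutoff
    and P < p_cutoff. `stats` maps gene -> (logFC, P) (hypothetical format)."""
    up = {g for g, (lfc, p) in stats.items() if lfc > lfc_cutoff and p < p_cutoff}
    down = {g for g, (lfc, p) in stats.items() if lfc < -lfc_cutoff and p < p_cutoff}
    return up, down


def common_degs(per_dataset):
    """Genes up (or down) in every dataset: the centre of the Venn diagram."""
    ups = [up for up, _ in per_dataset]
    downs = [down for _, down in per_dataset]
    return set.intersection(*ups), set.intersection(*downs)
```

After quantile normalization every sample shares the same value distribution, which is the property a before/after standardization plot is meant to show. Tied values are not averaged here; other implementations may treat ties differently.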
Common DEGs enrichment analysis
Common DEGs were subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses using the Database for Annotation, Visualization and Integrated Discovery (DAVID) 6.8, with the significance threshold set at P<0.05. GO enrichment analysis is a common method for identifying the molecular function, cellular component, and biological process characteristics of genomic or transcriptomic data. KEGG mainly includes systems information, genomic information, and chemical information, and it can be used to annotate genomes and integrate the related effects of known protein products.
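The core calculation behind a term-level enrichment P value can be sketched as a one-sided hypergeometric (Fisher exact) test. DAVID reports a modified Fisher exact P value (the EASE score), so this unmodified version is only an approximation for illustration; the function name and argument names are hypothetical.

```python
from math import comb


def enrichment_pvalue(k, M, n, N):
    """One-sided hypergeometric P value, P(X >= k).

    M: genes in the background (population)
    n: background genes annotated to the term
    N: genes in the query list (e.g. the common DEGs)
    k: query genes annotated to the term (the observed overlap)
    """
    total = comb(M, N)
    upper = min(n, N)  # the overlap cannot exceed either set size
    hits = sum(comb(n, x) * comb(M - n, N - x) for x in range(k, upper + 1))
    return hits / total
```

For example, with a background of 20 genes, 5 of them annotated to a term, and 3 of 4 query genes hitting that term, the probability of seeing at least that overlap by chance is 155/4845 (about 0.032), which would pass a P<0.05 threshold.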
PPI network analysis
PPI network analysis was performed using the Search Tool for the Retrieval of Interacting Genes (STRING) online tool to assess interactions among the protein products of the common DEGs. The Molecular Complex Detection (MCODE) and cytoHubba plugins in Cytoscape were then used to detect important modules and hub genes in the PPI network, respectively. The MCODE parameters were set as degree cutoff=2, maximum depth=100, k-core=2, and node score cutoff=0.2. The cytoHubba plugin was used to screen hub genes with the degree algorithm.
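Two of the network ideas referred to above can be illustrated with a short standard-library sketch: degree-based ranking, as used by the cytoHubba degree method, and the k-core, which the MCODE k-core parameter refers to. This is not MCODE itself (MCODE additionally weights vertices and grows clusters from seed nodes), and the helper names and the toy edge list in the example are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from (protein, protein) interaction pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return adj


def rank_by_degree(adj, top=10):
    """Rank nodes by number of interaction partners (ties broken by name)."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))[:top]


def k_core(adj, k):
    """Nodes of the k-core: repeatedly remove nodes with degree < k."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    alive = set(adj)
    stack = [v for v in alive if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:  # a node may be queued more than once
            continue
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    stack.append(u)
    return alive


# Hypothetical toy network: a triangle A-B-C with a tail A-D-E
adj = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("A", "D"), ("D", "E")])
```

In this toy network, A has the highest degree, and the 2-core is the triangle {A, B, C}: removing E drops D to degree 1, so D is removed as well.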
Enrichment analysis of module genes and hub genes
The Metascape database [13] was used to perform enrichment analysis for genes in the 2 top-scoring modules. Metascape integrates multiple authoritative data resources, including GO, KEGG, UniProt, and DrugBank, which supports comprehensive pathway enrichment and annotation of biological processes. Reactome, which provides visualization, interpretation, and analysis of biological pathways, was used for hub gene enrichment analysis. GeneCards, which provides comprehensive biological information on annotated and predicted human genes, was used to retrieve information on the individual hub genes CXCL8 and CXCL1.
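Enrichment tools commonly adjust term P values for multiple testing; the published description of Metascape, for example, reports q-values computed with the Benjamini-Hochberg procedure. The text does not state which correction, if any, was applied to these results, so the following is only a minimal sketch of that procedure, with a hypothetical function name.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted q-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending P values
    q = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest P value down, enforcing monotonic q-values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

For instance, raw P values of 0.01, 0.04, 0.03, and 0.20 for 4 terms become q-values of about 0.04, 0.053, 0.053, and 0.20, so a term that passes P<0.05 may not pass the same cutoff after adjustment.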
Results
Identifying DEGs in UC tissue and normal tissue
We uniformly standardized the 3 datasets (Figure 1). We obtained 527, 293, and 125 DEGs from GSE87466, GSE75214, and GSE48958, respectively (Figure 2A). The GSE87466 dataset contained 347 upregulated DEGs and 180 downregulated DEGs. The GSE75214 and GSE48958 datasets contained 187 upregulated and 106 downregulated DEGs and 69 upregulated and 56 downregulated DEGs, respectively. The top 100 DEGs are displayed in heat maps (Figure 2B). All up- and downregulated DEGs were screened using Venn diagram software to identify common DEGs. A total of 52 upregulated and 38 downregulated common DEGs were obtained (Figure 2C).
Figure 1
Standardization of gene expression. Standardization of gene expression in (A) GSE87466, (B) GSE75214, and (C) GSE48958 datasets. The blue bar represents the data before normalization, and the red bar represents the data after normalization.
Figure 2
Screening results for the differentially expressed genes. (A) Volcano maps of gene expression in GSE87466, GSE75214, and GSE48958 datasets. Data points in red, green, and black represent genes with upregulated, downregulated, and not significantly different expression, respectively. (B) Heat maps of gene expression in GSE87466, GSE75214, and GSE48958 datasets. (C) Venn diagram of common differentially expressed genes in GSE87466, GSE75214, and GSE48958 datasets.
GO biological functions and KEGG pathway enrichment analysis of common DEGs
The results of the GO enrichment analysis are shown in Figure 3. In the biological process analysis, DEGs were mainly involved in the inflammatory response, defense response, immune response, response to bacteria, and response to cytokine stimulus. In the cellular component analysis, DEGs were mainly enriched in extracellular regions and in the cell, membrane, and insoluble fractions. In the molecular function analysis, DEGs were mainly enriched in tetrapyrrole binding, cytokine activity, chemokine activity, heme binding, and calcium ion binding.
Figure 3
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis for common differentially expressed genes.
KEGG analysis showed that DEGs were mainly enriched in the starch and sucrose metabolism and ATP-binding cassette (ABC) transporter pathways (Figure 3).
PPI network and visualization of genes
We obtained a PPI network composed of 69 nodes and 180 edges (Figure 4A). Further analysis using the MCODE plugin identified 3 functional modules, of which the 2 top-scoring were the CCL20 and CXCL8 modules. The CCL20 module included CCL20, MMP1, CXCL13, C3, CXCL11, IDO1, NOS2, SOCS3, IL1RN, MMP3, MMP7, and TIMP1, which were closely connected with each other. CXCL8, CXCL1, IL-1β, PTGS2, LCN2, and CHI3L1 were grouped in the CXCL8 module (Figure 4B–4D). The hub genes identified using the cytoHubba plugin included CXCL8, CXCL1, IL-1β, PTGS2, TIMP1, CCL20, C3, MMP1, LCN2, and MMP3. Among them, CXCL8, CXCL1, and IL-1β were the top 3 hub genes (Figure 4E).
Figure 4
Protein–protein interaction (PPI) analysis. (A) PPI network of the common differentially expressed genes. (B–D) The 3 top-scoring modules, shown from top to bottom. (E) The PPI hub gene module.
Further enrichment analysis of the module genes and hub genes
CXCL8 and CCL20 module genes were further analyzed using the Metascape database, which integrates GO function and KEGG pathway analyses (Figure 5A, 5B). CXCL8 module genes were mainly enriched in the cytokine-mediated signaling pathway, response to bacteria, regulation of inflammatory response, female pregnancy, and positive regulation of response to external stimulus. CCL20 module genes were predominantly enriched in the cellular response to IL-1, positive regulation of angiogenesis, and antimicrobial humoral response. The pathway enrichment analysis showed that CXCL8 module genes were significantly enriched in IL-10 signaling, TNF signaling, and regulation of insulin-like growth factor. CCL20 module genes were mainly enriched in the IL-17, IL-4, and IL-13 signaling pathways.
Figure 5
Metascape enrichment analysis of the top 2 modules and Reactome pathway enrichment analysis of the hub gene module. (A) Metascape analysis of the CCL20 module. (B) Metascape analysis of the CXCL8 module. (C) Interleukin (IL)-10 signaling pathway. (D) Heat map of the expression of all proteins involved in the IL-10 signaling pathway in the human body.
The top 3 hub genes, CXCL8, CXCL1, and IL-1β, were subjected to further GO analysis (Table 1). In the biological process analysis, CXCL8 was mainly enriched in angiogenesis and antimicrobial humoral immune response mediated by antimicrobial peptides. CXCL1 was predominantly enriched in actin cytoskeleton organization and in the cellular response to lipopolysaccharide. IL-1β was significantly enriched in activation of MAPK activity and apoptotic processes. In the cellular component analysis, CXCL8, CXCL1, and IL-1β were mainly enriched in the extracellular region and in extracellular space. CXCL1 was also enriched in the specific granule lumen and tertiary granule lumen. IL-1β was also enriched in the cytosol, lysosome, and secretory granule. In the molecular function analysis, CXCL8 and CXCL1 were mainly enriched in chemokine activity, CXCR chemokine receptor binding, and IL-8 receptor binding. IL-1β was predominantly enriched in chemokine activity and integrin binding.
Table 1
The Gene Ontology enrichment analysis of the top 3 hub genes.
| Genes | Biological process | Cellular component | Molecular function |
| --- | --- | --- | --- |
| CXCL8 | GO:0001525 angiogenesis; GO:0002237 response to molecule of bacterial origin; GO:0006935 chemotaxis; GO:0006952 defense response; GO:0006954 inflammatory response; GO:0006955 immune response; GO:0007050 cell cycle arrest; GO:0007165 signal transduction; GO:0007186 G protein-coupled receptor signaling pathway; GO:0008285 negative regulation of cell proliferation | GO:0005576 extracellular region; GO:0005615 extracellular space | GO:0005125 cytokine activity; GO:0005149 interleukin-1 receptor binding; GO:0005178 integrin binding; GO:0005515 protein binding; GO:0019904 protein domain specific binding |
The main pathway enrichment results for hub genes are shown in Table 2. The top 3 hub genes were mainly enriched in IL-10, IL-4, and IL-13 signaling; signaling by interleukins; cytokine signaling in the immune system; and in the immune system. Among them, the P value of the IL-10 signaling pathway was the lowest (Figure 5C). This pathway is composed of 2 chemical components, 45 proteins, and 39 DNA and RNA molecules. The expression levels of 45 IL-10 signaling pathway genes in various tissues in the human body are shown in Figure 5D. CXCL8, CXCL1, and IL-1β are expressed in many organs including the colon, duodenum, appendix, and liver.
Table 2
The main signaling pathway enrichment analysis of the top 3 hub genes.
Discussion
This study screened gene expression datasets from UC and normal tissues and identified 90 common DEGs, from which 3 functional modules and 1 hub gene module were derived. Enrichment analysis showed that the hub gene module was enriched in the IL-10, IL-4, and IL-13 signaling pathways. In addition, CXCL8, CXCL1, and IL-1β, which are mainly involved in IL-10 signal transduction, were the top 3 hub genes in this module. Cheng et al. [10] previously suggested that GNG11, GNB4, and AGT could be potential therapeutic molecular targets and diagnostic biomarkers for UC and Crohn’s disease. Feng et al. [11] found that TATA-binding protein 1, nuclear factor-κB, and microRNAs may also be potential molecular targets for the treatment of UC. These differences in the identified genes are mainly attributable to the use of different datasets. The DEGs identified in these studies, as well as those in the current study, may play important roles in the pathogenesis of UC and could represent potential therapeutic targets.
UC is a global, progressive, and complex disease with a long course and recurrent episodes. The incidence of UC is increasing worldwide, but its pathogenesis remains unclear and current treatments are not satisfactory. The relatively recent development of bioinformatics has contributed to the emergence of targeted molecular therapies for many diseases. Zhong et al. [14] used bioinformatics analysis to show that the ATP-citrate lyase (ACLY) hub gene may be a molecular target in type 2 diabetes. Feng et al. [15] identified 4 significantly upregulated DEGs related to poor prognosis in ovarian cancer that could be potential therapeutic targets for patients with ovarian cancer. Similarly, exploring the molecular regulatory mechanisms of UC occurrence and development by screening and analyzing key genes is expected to provide new directions for the precise prevention and treatment of UC in the clinic.
IL-10 is an anti-inflammatory cytokine that inhibits a broad spectrum of activated macrophage/monocyte functions, and it plays an important role in UC. Previous studies [12,16] revealed that intestinal inflammation may be aggravated in individuals with IL-10 deficiency through inhibition of regulatory T cells/cytotoxic T-lymphocyte-associated protein 4 and through promotion of the IL-1β/T helper 2 cell pathway. Studies have also shown that IL-10 can negatively regulate CXCL8, CXCL1, IL-1β, CCL20, and PTGS2 [17-19]. These data are consistent with our finding that these genes are involved in the IL-10 signaling pathway.
Chemokine (C-X-C motif) ligand 8 (CXCL8), also known as IL-8 and neutrophil chemotactic factor, can be secreted by lymphocytes, monocytes, and epithelial cells, and it plays an important role in the occurrence and development of tumors by promoting angiogenesis and immune cell infiltration [20]. Additionally, CXCL8 can recruit neutrophils and other granulocytes to sites of inflammation and activate them [21,22]. Heidarian et al. [23] found that alterations in the intestinal flora of patients with IBD can increase IL-8 expression and thereby increase disease severity. Another study showed that IL-8 expression is increased in patients with IBD, especially those with UC [24]. These results are consistent with our findings, which indicate that CXCL8 is highly expressed in patients with UC. In our study, CXCL8 was the DEG with the highest degree value in the hub gene module. This finding suggests that it may play an important regulatory role in the pathogenesis of UC and could serve as a new therapeutic target for UC.
CXCL1 is also known as NAP-3, CINC-1, and GRO-α. This protein was originally identified in melanoma and was later confirmed to be expressed in neutrophils, macrophages, and epithelial cells and to be involved in inflammation, angiogenesis, and wound healing [25]. Alzoghaibi et al. [26] found that GRO-α levels were significantly higher in patients with IBD than in healthy controls and that elevated GRO-α levels exacerbated IBD-related inflammation. This finding is consistent with the results of Mitsuyama et al. [27], who showed that serum GRO-α is positively correlated with IBD activity and may be a marker of IBD activity. In the present study, CXCL1 was another hub gene involved in UC pathogenesis. However, the specific mechanism through which CXCL1 contributes to UC requires further study.
IL-1β is mainly produced by monocytes and macrophages. It is a pro-inflammatory factor that participates in the occurrence and development of various diseases. Tian et al. [28] showed that IL-1β can promote tumor growth and metastasis. In addition, IL-1β can enhance the host immune response through nuclear factor-κB, which increases susceptibility to IBD [29]. De Santis et al. [30] found that IL-1β exacerbates colitis and may be a potential target for the clinical treatment of UC. This finding is consistent with our results: IL-1β was identified as 1 of the top 3 hub genes, with the same degree value as CXCL1. We speculate that IL-1β also plays an important role in the pathogenesis of UC, but this requires confirmation through additional research.
Prostaglandin-endoperoxide synthase (PTGS), or cyclooxygenase, is a key enzyme in prostaglandin biosynthesis. It is involved in various biological processes, including inflammation, reproduction, and tumor migration. Song et al. [31] showed that reducing PTGS2 expression can help to reduce the occurrence of colorectal cancer. This is consistent with the results of Venè et al. [32], which showed that PTGS2 inhibitors improve survival in patients with colorectal cancer. Other researchers have shown that PTGS2 expression in colon tissue increases prostaglandin synthesis, which can cause visceral hypersensitivity in patients with irritable bowel syndrome [33]. However, relatively few studies have examined the specific molecular mechanism through which PTGS2 functions in the pathogenesis of IBD. Our results suggest that PTGS2 may be a therapeutic target for UC.
Chemokine (C-C motif) ligand 20 (CCL20) is also known as human macrophage inflammatory protein 3α. Binding of CCL20 to its receptor CCR6 may be responsible for the immunosuppressive effect of myeloid-derived suppressor cells [34]. Similarly, Woznicki et al. [35] found that the chemokines CCL20 and CCL5 are involved in the immune response in the human intestine. CCL20 may therefore be an important regulatory molecule in the pathogenesis of IBD. Skovdahl et al. [36] found increased expression of CCL20 and CCR6 in colonic mucosal epithelial cells and peripheral blood mononuclear cells of patients with IBD. Although CCL20 ranked below the top 3 hub genes in our analysis, this does not indicate that CCL20 is of less importance in UC. Confirmation of these results using larger numbers of samples and experimental studies is required.
There is overlap between UC and Crohn’s disease, which is why previous bioinformatics studies have focused on IBD as a whole. The current study used 3 GEO datasets and relied on the accuracy of the UC diagnoses in those datasets, which was not validated in this study. Larger datasets should be analyzed in the future. In addition, because this was a bioinformatics study, the potential molecular biomarkers identified were not validated experimentally.
Conclusions
Bioinformatics analysis using 3 GEO datasets identified CXCL8, CXCL1, and IL-1β as the top 3 hub genes involved in IL-10 signaling in UC. The findings from this study remain to be validated, but they may contribute to further understanding of the pathogenesis of UC.
Authors: A Mark-Christensen; R Erichsen; S Brandsborg; F R Pachler; C B Nørager; N Johansen; J H Pachler; O Thorlacius-Ussing; M D Kjaer; N Qvist; L Preisler; J Hillingsø; J Rosenberg; S Laurberg Journal: Colorectal Dis Date: 2018-01
Authors: Michael Kvorjak; Yasmine Ahmed; Michelle L Miller; Raahul Sriram; Claudia Coronnello; Jana G Hashash; Douglas J Hartman; Cheryl A Telmer; Natasa Miskov-Zivanov; Olivera J Finn; Sandra Cascio Journal: Cancer Immunol Res Date: 2019-12-12
Authors: R de Waal Malefyt; C G Figdor; R Huijbens; S Mohan-Peterson; B Bennett; J Culpepper; W Dang; G Zurawski; J E de Vries Journal: J Immunol Date: 1993-12-01
Authors: Xiaoying S Zhong; John H Winston; Xiuju Luo; Kevin T Kline; Syed Z Nayeem; Yingzi Cong; Tor C Savidge; Roderick H Dashwood; Don W Powell; Qingjie Li Journal: Cell Mol Gastroenterol Hepatol Date: 2018-03-09
Authors: Yingyao Zhou; Bin Zhou; Lars Pache; Max Chang; Alireza Hadj Khodabakhshi; Olga Tanaseichuk; Christopher Benner; Sumit K Chanda Journal: Nat Commun Date: 2019-04-03
Authors: Jan Korbecki; Iwona Szatkowska; Patrycja Kupnicka; Wojciech Żwierełło; Katarzyna Barczak; Iwona Poziomkowska-Gęsicka; Jerzy Wójcik; Dariusz Chlubek; Irena Baranowska-Bosiacka Journal: Int J Mol Sci Date: 2022-06-28