Literature DB >> 30015893

Comprehensive analysis of gene expression profiles provides insight into the pathogenesis of Crohn's disease.

Abstract

Crohn's disease (CD) is a type of inflammatory bowel disease that cannot be fully cured by medication or surgery. In the present study, the aim was to understand the underlying mechanisms of CD. Two CD microarray datasets were downloaded from The Gene Expression Omnibus database: GSE36807 (13 CD and 7 normal samples) and GSE59071 (8 CD and 11 normal samples). A series of bioinformatics analyses were conducted, including weighted gene co‑expression network analysis to identify stable modules, and analysis of differentially expressed genes (DEGs) between CD and normal samples. The common DEGs in the GSE36807 and GSE59071 datasets were screened. Subsequently, overlapping genes in the stable modules and the DEGs were selected to construct a protein‑protein interaction (PPI) network using Cytoscape software. Enrichment analysis of genes in the network was performed to explore their biological functions. A total of 10 stable modules and 927 DEGs were identified, of which 234 genes were shared in the stable modules and the DEGs. After removal of 32 uncharacterized genes, 202 genes were selected to build the PPI network. Low density lipoprotein receptor (LDLR), toll‑like receptor 2 (TLR2), lipoprotein lipase (LPL), forkhead box protein M1 (FOXM1) and neuropeptide Y (NPY) were revealed as key nodes with high degree. Pathway enrichment analysis demonstrated that LPL was enriched in the peroxisome proliferator‑activated receptor (PPAR) signaling pathway. In conclusion, LDLR, TLR2, FOXM1 and NPY, as well as LPL in the PPAR signaling pathway may serve critical roles in the pathogenesis of CD.

Entities: Chemical Disease Gene Species

Mesh：

Substances：

Year: 2018 PMID： 30015893 PMCID： PMC6102736 DOI： 10.3892/mmr.2018.9267

Source DB: PubMed Journal: Mol Med Rep ISSN： 1791-2997 Impact factor: 2.952

Introduction

Crohn's disease (CD) is a type of inflammatory bowel disease (IBD) that can affect any part of the gastrointestinal tract (1). This disease is caused by immune, bacterial and environmental factors in individuals with genetic susceptibility (2–4). Patients with CD usually suffer from weight loss, abdominal pain, fever and diarrhea (5); other complications associated with CD include arthritis, anemia, inflammation of the eye, fatigue and skin problems (6). CD is a common disease in the developed world, with increasing incidence and prevalence in developing countries (7,8). At present, there are no medical treatments or operative interventions that can fully cure CD, and in some cases surgery may even result in recurrence of the disease (9). Therefore, the underlying mechanisms of CD need to be explored in order to develop novel treatment strategies. So far, some progress has been made with understanding the pathogenesis of CD. For example, it was demonstrated that mutation of the autophagy-related 16-like 1 gene is associated with CD, and may possibly support the role of autophagy in the development and progression of IBD (10). T helper (Th) 17 cells are a unique Th cell lineage that serve an important role in inflammatory diseases by secreting proinflammatory cytokines, including interleukin 17 (IL-17) and IL-23, and may mediate the Th1/Th17 imbalance seen in CD and ulcerative colitis (11,12). A previous study reported that signal transducer and activator of transcription 3 (STAT3) and Janus kinase 2 (JAK2), which are involved in the STAT-JAK pathway, can increase the risk of CD (13). Nucleotide-binding oligomerization domain-containing 2 (NOD2) is expressed throughout the small intestine, with highest expression in the terminal ileum where there is an abundance of Paneth cells, and is also closely associated with pathogenesis of CD (14,15). Despite these profound findings, pathogenesis of CD has not been fully elucidated. Bioinformatics analysis is a useful tool for exploring disease pathogenesis and screening novel therapeutic targets (16). Kenny et al performed a genome-wide association study in an Ashkenazi Jewish population with CD and revealed that 16 replicated and novel loci made up 11.2% of the total genetic variations associated with CD risk (17). Fransen et al employed expression quantitative trait loci to select single nucleotide polymorphisms (SNPs) for follow-up and identified two CD-associated SNPs in the ubiquitin-conjugating enzyme E2L 3 and B-cell chronic lymphocytic leukemia/lymphoma 3 genes (18). These findings provide some information on the etiology of CD; however, to the best of our knowledge the mechanisms of CD pathogenesis have not yet been investigated by comprehensive bioinformatics analysis. In the present study, comprehensive bioinformatics analysis was carried out to understand the pathogenesis of CD and identify novel therapeutic targets. The gene expression profiles of patients with CD were downloaded from a public database, and subsequently a series of bioinformatics analyses were performed, including weighted gene co-expression network analysis (WGCNA), meta-analysis, protein-protein interaction (PPI) network analysis and enrichment analysis for key genes associated with CD. This study contributes towards the further understanding of the mechanisms underlying human CD.

Materials and methods

Data source

The key words ‘Crohn's disease’ and ‘Homo sapiens’ were used to search relevant gene expression profile data on the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). The inclusion criteria were as follows: i) Gene expression profiles; ii) intestinal tissues from patients with CD (not cells); iii) availability of both CD and normal samples; iv) human samples and v) number of total samples ≥18. Based on these criteria, two microarray datasets were selected: GSE36807 (Platform GPL570; 13 CD and 7 normal samples) (19) and GSE59071 (Platform GPL6244; 8 CD and 11 normal samples) (20).

Data pre-processing

The raw data (CEL files) were downloaded from GEO. Subsequently, the oligo package version 1.41.1 from R (http://www.bioconductor.org/packages/release/bioc/html/oligo.html) (21) was utilized to perform pre-treatments, including conversion of data format, summarization using median-polish, background correction using the MAS method and data normalization by quantile method.

WGCNA analysis

WGCNA is a popular systems biology tool used to construct gene co-expression networks, which can be used to detect disease-associated gene clusters and identify therapeutic targets (22). GSE36807 and GSE59071 were used as the training and validation sets, respectively. Stable gene modules associated with CD were screened using the R WGCNA package version 1.61 (https://cran.r-project.org/web/packages/WGCNA/index.html) (22), with the following parameters; cutHeight=0.95 and ≥30 genes/module; the correlation coefficients (CC) were also calculated.

Meta-analysis

Using the MetaDE.ES function in the R MetaDE package (https://cran.r-project.org/src/contrib/Archive/MetaDE/) (23), differentially expressed genes (DEGs) between CD and normal samples that were common to the two datasets were identified. To ensure homogeneity of genes, tau2=0, Qpval >0.05 and false discovery rate <0.05 were used as the cut-off criteria.

Construction of the PPI network

Genes in the stable modules and DEGs were compared, and overlapping genes were used as candidate genes for construction of the PPI network. An integrated PPI network was generated from the candidate genes by merging numerous PPI databases, including BioGRID version 3.4.153 (http://thebiogrid.org/) (24), Human Protein Reference Database release 9 (http://www.hprd.org/) (25) and STRING version 10.5 (https://string-db.org/) (26). The PPI network was visualized using the Cytoscape software version 3.3.0 (http://www.cytoscape.org/) (27).

Functional and pathway enrichment analyses

The Database for Annotation, Visualization and Integrated Discovery (DAVID) version 6.8 (https://david.ncifcrf.gov/) is an easy-to-use web tool in which genes are annotated and summarized according to shared categorical data (28). Based on DAVID analysis, Gene Ontology (GO) (29) functional enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) (30) pathway enrichment analysis were conducted for the nodes (a node represents a gene) identified in the PPI network, with the criterion of P<0.05.

Results

To ensure the GSE36807 and GSE59071 datasets were comparable, correlation of gene expression and connectivity of common genes was analyzed. The results revealed that the CC for gene expression and connectivity between the two datasets were 0.82 (P<1×10−200; Fig. 1A) and 0.51 (P<1×10−200; Fig. 1B), respectively. Therefore, it was deduced that the GSE36807 and GSE59071 datasets were comparable.

Figure 1.

Correlation between (A) gene expression and (B) gene connectivity in the GSE36807 and GSE59071 datasets. Cor, correlation.

To meet the prerequisite of scale-free network distribution, the value of the adjacency matrix weighting parameter ‘power’ was explored. After setting the ranges of parameters for the network, the scale-free topology matrix was calculated. Using GSE36807 as the essential data, the scale-free topology model fit was calculated and statistics were selected for graphing (Fig. 2A). The greater the value of CC2, the closer the network was to scale-free distribution. Based on this theory, the mean connectivity of genes was calculated using power=8 as this was the minimum power value to achieve R2=0.9. The results demonstrated that the mean connectivity was 16, which conforms to the node connection properties of a scale-free network (Fig. 2B).

Figure 2.

(A) Selection of the weighting parameter ‘power’. The red line indicates when the square of correlation coefficient equals 0.9. (B) Mean connectivity of genes according to ‘power’. The red line indicates a mean connectivity of 16 when the ‘power’ is 8.

Using GSE36807 as the training set, disease-associated modules were screened. Variation in gene expression in the samples was calculated, and the genes with variation coefficients >0.1 were selected. Subsequently, the community dissimilarities among the genes were calculated, and a clustering tree was obtained. A total of 10 modules (D1M1, D1M2, D1M3, D1M4, D1M5, D1M6, D1M7, D1M8, D1M9 and D1M10) were identified using cutHeight=0.95 (Fig. 3A). Similarly, module division was performed for GSE59071 (Fig. 3B) and the modules in this dataset were used to evaluate the stability of those identified in the training set.

Figure 3.

Module partition trees of (A) GSE36807 and (B) GSE59071; each color represents a different module.

The correlation of gene expression was analyzed for the same color modules in the two datasets, and the results demonstrated that 10 modules were stable (Table I; preservation Z-score >5; P≤0.05). According to the expression similarities of the module genes, the correlation of the modules in GSE36807 (Fig. 4A) and GSE59071 (Fig. 4B) were analyzed (Table I). By combining the correlations among the modules with the GO categories for genes in the stable modules (Table I), a network for the modules was constructed (Fig. 4C). This revealed that genes in three stable modules (M1, M2 and M8) were significantly associated with cell adhesion and genes in two stable modules (M4 and M10) were significantly associated with immune response.

Table I.

Correlation of gene expression for the same color modules in GSE36807 and GSE59071.

GSE36807	GSE59071	Color	Size	Correlation	P-value	Z-score	Module characterization
Modules				Module preservation

D1M1	D2M1	Black	163	0.40	1.2×10⁻⁷	20.34	Cell adhesion
D1M2	D2M2	Blue	431	0.37	2.0×10⁻¹⁵	10.05	Cell adhesion
D1M3	D2M3	Brown	412	0.30	5.1×10⁻¹⁰	15.88	Cell-cell signaling
D1M4	D2M4	Green	180	0.50	8.9×10⁻¹³	16.73	Immune response
D1M5	D2M5	Grey	368	0.35	4.8×10⁻¹²	4.12	–
D1M6	D2M6	Magenta	38	0.32	5.0×10⁻²	11.74	Cell cycle
D1M7	D2M7	Pink	123	0.41	2.5×10⁻⁶	5.53	Regulation of transcription
D1M8	D2M8	Red	177	0.35	1.8×10⁻⁶	21.44	Cell adhesion
D1M9	D2M9	Turquoise	944	0.68	4.2×10⁻¹²⁹	17.93	Ion transport
D1M10	D2M10	Yellow	352	0.42	1.8×10⁻¹⁶	29.51	Immune response

Size is the number of genes in one module. Module characterization is the significant functional term for each module. 510 indicate stable and highly stable, respectively. D1, Dataset1 GSE36807; D2, Dataset2 GSE59071; M, module.

Figure 4.

Correlation cluster diagrams for the modules of (A) GSE36807 and (B) GSE59071. (C) Module-module correlation network. Single lines connect modules within the same dataset and double lines connect modules across datasets. The thickness of the links indicates the extent of correlation. D1, dataset 1 GSE36807; D2 dataset 2 GSE59071; M, module.

After calculation of the parameter values of each gene, a total of 927 DEGs were screened according to the aforementioned thresholds. The heatmap revealed that changes in expression of the DEGs were similar for GSE36807 and GSE59071 (Fig. 5).

Figure 5.

Clustering heatmap of differentially expressed genes.

A total of 234 overlapping genes from genes in the stable modules (3,188 genes) and the 927 DEGs were identified and used as candidate genes (Fig. 6A). The number and proportion of these genes in different modules are shown in Fig. 6B. A total of 32 uncharacterized genes in the grey module were removed from subsequent analyses. PPIs were predicted for the remaining 202 genes and a PPI network (123 nodes and 270 edges) was constructed (Fig. 7). Notably, low density lipoprotein receptor (LDLR; node degree=22), toll-like receptor 2 (TLR2; node degree=19), lipoprotein lipase (LPL; node degree=18), forkhead box M1 (FOXM1; node degree=17) and neuropeptide Y (NPY; node degree=16) were the top five nodes with high degrees in the PPI network.

Figure 6.

(A) Venn diagram depicting the overlap between the 3,188 genes in the stable modules and the 927 differentially expressed genes. (B) Pie chart of the number of genes in different modules. WCGNA, weighted gene co-expression network analysis.

Figure 7.

Protein-protein interaction network. Upright red and inverted green triangles represent upregulated and downregulated proteins, respectively. The color intensity indicates gene expression. The enlarged nodes are the top five nodes with highest degrees. FC, fold change.

A total of 24 GO categories and 12 KEGG pathways were enriched for the PPI network nodes. Notably, the network nodes were mainly implicated in the enzyme-linked receptor protein signaling pathway (GO category; P=0.00185), cell-cell signaling (GO category; P=0.003505), peroxisome proliferator-activated receptor (PPAR) signaling pathway (KEGG pathway; P=0.001014) and cytokine-cytokine receptor interaction (KEGG pathway; P=0.004187) (Fig. 8).

Figure 8.

(A) Functional categories and (B) pathways enriched for the protein-protein interaction network nodes. BP, biological process; CC, cellular component; GO, Gene Ontology; MAPK, mitogen-activated protein kinase; MF, molecular function; PPAR, peroxisome proliferator-activated receptor.

Discussion

In the present study, a total of 10 stable CD-associated modules were identified using WGCCA; of which three stable modules (M1, M2 and M8) and two stable modules (M4 and M10) were significantly associated with cell adhesion and immune response, respectively. In addition, 927 overlapping DEGs in the GSE36807 and GSE59071 datasets were screened. In the PPI network, five important nodes with relatively high node degrees were identified as LDLR, TLR2, LPL, FOXM1 and NPY. Previous studies have investigated the relationship between inflammatory cytokines and LDLR (31,32), and demonstrated that the inflammatory cytokines IL-1β and tumor necrosis factor (TNF)-α can regulate LDLR expression leading to accumulation of high concentrations of low-density lipoprotein (33). FOXM1 belongs to the forkhead box (FOX) family of transcription factors, and regulates key genes involved in goblet cell metaplasia and lung inflammation (34,35). FOXM1 is an important mediator of cell proliferation, and its expression is enhanced during gastric carcinogenesis induced by Helicobacter pylori infection (36). Overexpression of triglyceride-rich lipoproteins can induce lipid accumulation and early inflammatory responses, and LPL serves an anti-inflammatory role by hydrolyzing triglyceride-rich lipoproteins (37). PPARs can regulate lipid metabolism-associated genes and control inflammation, and PPARα and -γ activators contribute to macrophage secretion of LPL (38,39). At present, there is no direct evidence for the involvement of LDLR, FOXM1 and LPL in CD development. However, their expression levels may be linked to disease progression. A study demonstrated that polymorphisms of LIGHT (a TNF superfamily member) are associated with CD, and inhibiting LIGHT reduces dyslypidemia in mice lacking LDLR (40). This suggests a possible association between LDLR and CD. In IBD, the susceptibility genes include FYN proto-oncogene and HCK proto-oncogene, which are involved in the phosphoinositide 3-kinase signaling network, which also contains FOXM1 (41). Since CD is a major subtype of IBD, FOXM1 may also be associated with CD. With regards to the LPL gene, decreased expression is observed in CD adipose-derived stem cells (ASCs) independent of clinical stage, compared with in healthy ASCs, indicating that CD modifies the plasticity of mesenteric ASCs (42). In the present study, LPL was upregulated in CD PPI network, indicating a potential role of LPL in the progression of CD; pathway enrichment analysis revealed that LPL was enriched in the PPAR signaling pathway. Interestingly, the alterations in LPL expression levels of ASC cells and CD tissues revealed different trends, suggesting that the creeping fat tissue may be associated with the immunomodulatory properties in patients with CD. Overall, the results demonstrated that increases in LDLR, FOXM1 and LPL may be highly relevant to the development of CD. TLRs initiate immune responses to microbial infection, with TLR2 recognizing bacterial lipoproteins, zymosan, peptidoglycan and lipoteichoic acids (43,44). TLR2, TLR4, and their transmembrane co-receptor cluster of differentiation 14, are all upregulated in the terminal ileum of patients with IBD, and their dysregulation may serve critical roles in the pathogenesis of IBD (45). NOD2 mediates the anti-inflammatory cytokine bias induced by TLR2 ligands; therefore, defective NOD2 function may promote inflammation and increase the disease risk of CD (46). TLR2 and TLR4 have been reported to be overexpressed in the inflamed colonic mucosa of children with IBD, indicating that innate immunity is associated with the pathogenesis of IBD (47). A previous study discovered that TLR2 and TLR4 are highly expressed on inflammation-associated intestinal macrophages (IMACs), leading to a higher reactivity to lipopolysaccharide and possibly CD (48). In addition, the role of Gp96, an endoplasmic reticulum chaperone for multiple protein substrates, in CD has been investigated. The results demonstrated that the lack of Gp96 in CD IMACs is associated with loss of tolerance against the host gut flora, and that the Gp96 knockdown cell line has decreased TLR2 and TLR4 expression (49). Therefore, it may be hypothesized that TLR2 has an important role in the pathogenesis of CD. Neurogenic inflammation is critical for the development of IBD, and NPY induces oxidative stress and colitis by regulating the expression of neuronal nitric oxide synthase (50). NPY and the NPY receptor Y1 (NPY1R) are associated with intestinal inflammation and suppression of NPY1R signaling may represent a potential therapeutic strategy for colonic inflammation (51,52). NPY peptides have been implicated in various gastrointestinal disorders, including IBD, malabsorption and short gut; and NPY receptor agonists and antagonists can be used for preventing diarrhea and intestinal inflammation (53). Gut neurohormones, including NPY, can affect inflammation by interacting with the immune system, and NPY represents a promising target for treating IBD (54). In the present study, NPY was reported as an important node with a high degree in the CD PPI network; however, the expression levels of NPY were lower in the CD samples of patients compared with in the healthy control, which is inconsistent to the reports (50,52). Considering the important role of NPY in CD and that its expression levels varies from that of previous reports, further validation of NPY mRNA and protein expression in clinical samples is required. Nevertheless, there are some limitations to the present study. Firstly, the results are solely based on bioinformatics analysis and although various important genes for CD etiology were identified, experimental validation is required. Secondly, only two CD microarray datasets were selected with relatively small sample sizes; therefore, this could affect the robustness of the results. Further experiments will be designed and conducted to validate these findings in follow-up studies. In conclusion, a total of 10 stable modules and 927 DEGs associated with CD were identified. Notably, LDLR, TLR2, FOXM1, NPY and LPL may be the key genes involved in pathogenesis of the disease. LPL may exert its effects through the PPAR signaling pathway.

52 in total

Review 1. Innate immunity in Crohn's disease: the reverse side of the medal.

Authors: Rodrigue Dessein; Mathias Chamaillard; Silvio Danese
Journal: J Clin Gastroenterol Date: 2008-09 Impact factor: 3.062

Review 2. Toll-like receptors and their crosstalk with other innate receptors in infection and immunity.

Authors: Taro Kawai; Shizuo Akira
Journal: Immunity Date: 2011-05-27 Impact factor: 31.745

3. BioGRID: A Resource for Studying Biological Interactions in Yeast.

Authors: Rose Oughtred; Andrew Chatr-aryamontri; Bobby-Joe Breitkreutz; Christie S Chang; Jennifer M Rust; Chandra L Theesfeld; Sven Heinicke; Ashton Breitkreutz; Daici Chen; Jodi Hirschman; Nadine Kolas; Michael S Livstone; Julie Nixon; Lara O'Donnell; Lindsay Ramage; Andrew Winter; Teresa Reguly; Adnane Sellam; Chris Stark; Lorrie Boucher; Kara Dolinski; Mike Tyers
Journal: Cold Spring Harb Protoc Date: 2016-01-04

Review 4. NPY family of hormones: clinical relevance and potential use in gastrointestinal disease.

Authors: L C Vona-Davis; D W McFadden
Journal: Curr Top Med Chem Date: 2007 Impact factor: 3.295

5. [The role of surgery in Crohn's disease treatment].

Authors: M Hut'an; M ml Hut'an
Journal: Rozhl Chir Date: 2009-04

6. Expression of Toll-like receptor 2 (TLR2), TLR4, and CD14 in biopsy samples of patients with inflammatory bowel diseases: upregulated expression of TLR2 in terminal ileum of patients with ulcerative colitis.

Authors: Lenka Frolova; Pavel Drastich; Pavel Rossmann; Klara Klimesova; Helena Tlaskalova-Hogenova
Journal: J Histochem Cytochem Date: 2007-11-26 Impact factor: 2.479

7. NOD2 mediates anti-inflammatory signals induced by TLR2 ligands: implications for Crohn's disease.

Authors: Mihai G Netea; Bart Jan Kullberg; Dirk J de Jong; Barbara Franke; Tom Sprong; Ton H J Naber; Joost P H Drenth; Jos W M Van der Meer
Journal: Eur J Immunol Date: 2004-07 Impact factor: 5.532

8. FoxM1 is overexpressed in Helicobacter pylori-induced gastric carcinogenesis and is negatively regulated by miR-370.

Authors: Yimin Feng; Lixiang Wang; Jiping Zeng; Li Shen; Xiuming Liang; Han Yu; Shili Liu; Zhifang Liu; Yundong Sun; Wenjuan Li; Chunyan Chen; Jihui Jia
Journal: Mol Cancer Res Date: 2013-04-10 Impact factor: 5.852

9. The peroxisome proliferator-activated receptor: A family of nuclear receptors role in various diseases.

Authors: Sandeep Tyagi; Paras Gupta; Arminder Singh Saini; Chaitnya Kaushal; Saurabh Sharma
Journal: J Adv Pharm Technol Res Date: 2011-10

10. Identification of novel predictor classifiers for inflammatory bowel disease by gene expression profiling.

Authors: Trinidad Montero-Meléndez; Xavier Llor; Esther García-Planella; Mauro Perretti; Antonio Suárez
Journal: PLoS One Date: 2013-10-14 Impact factor: 3.240

3 in total

Review 1. MicroRNAs overexpressed in Crohn's disease and their interactions with mechanisms of epigenetic regulation explain novel aspects of Crohn's disease pathogenesis.

Authors: Cecilia Fernández-Ponce; Roberto Navarro Quiroz; Anderson Díaz Perez; Gustavo Aroca Martinez; Andrés Cadena Bonfanti; Antonio Acosta Hoyos; Lorena Gómez Escorcia; Sandra Hernández Agudelo; Christian Orozco Sánchez; José Villarreal Camacho; Linda Atencio Ibarra; Jose Consuegra Machado; Alberto Espinoza Garavito; Francisco García-Cózar; Elkin Navarro Quiroz
Journal: Clin Epigenetics Date: 2021-02-18 Impact factor: 6.551

2. Single-Cell Sequencing of Developing Human Gut Reveals Transcriptional Links to Childhood Crohn's Disease.

Authors: Rasa Elmentaite; Alexander D B Ross; Kenny Roberts; Kylie R James; Daniel Ortmann; Tomás Gomes; Komal Nayak; Liz Tuck; Sophie Pritchard; Omer Ali Bayraktar; Robert Heuschkel; Ludovic Vallier; Sarah A Teichmann; Matthias Zilbauer
Journal: Dev Cell Date: 2020-12-07 Impact factor: 12.270

3. Diagnostic and Predictive Value of Immune-Related Genes in Crohn's Disease.

Authors: Bing Yu; Yi-Xin Yin; Yan-Ping Tang; Kang-Lai Wei; Zhi-Gang Pan; Ke-Zhi Li; Xian-Wen Guo; Bang-Li Hu
Journal: Front Immunol Date: 2021-04-16 Impact factor: 7.561

3 in total